  1. Jun 2022
    1. Reviewer #1 (Public Review):

      The authors look at a few different nematode species to compare the dynamics of anaphase. They find that in some species the spindle oscillates transversely in anaphase, and in other species it does not. They ask what accounts for this different behavior. To address this question, they use ablation of the central spindle, and conclude from the result, correctly, that after the ablation the centrosomes are pulled to the opposite poles of the cell in all species. However, the magnitude, half-time and initial velocity of the recoil differ.

      To understand what accounts for the quantitative difference, the authors

      1) use a simple viscoelastic model of a constant force, F, pulling against a spring (with constant stiffness k), while the object moves through the viscous medium.

      2) estimate the cytoplasmic viscosity from tracking yolk granules,

      3) estimate parameters F and k from fitting the exponential recoil curves. They find that the greatest correlation between having transverse oscillation or not is with lower or higher viscosity, not with magnitude of the force or stiffness of the spring.
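For reference, the model summarized in points 1) to 3) can be written out explicitly. This is only a sketch: it assumes overdamped (inertia-free) motion, and the effective drag coefficient $\gamma$ is a symbol introduced here, not one used in the review. Under those assumptions the three measured recoil quantities (magnitude, half-time, initial velocity) map onto $F$, $k$ and $\gamma$ as follows.

```latex
% Force balance: viscous drag = constant pulling force - spring restoring force
\gamma \frac{dx}{dt} = F - k\,x, \qquad x(0) = 0
% Exponential recoil
x(t) = \frac{F}{k}\left(1 - e^{-t/\tau}\right), \qquad \tau = \frac{\gamma}{k}
% Quantities read off the recoil curve
x_\infty = \frac{F}{k}, \qquad
t_{1/2} = \tau \ln 2 = \frac{\gamma \ln 2}{k}, \qquad
v_0 = \left.\frac{dx}{dt}\right|_{t=0} = \frac{F}{\gamma}
```

In this form the fit alone fixes only the ratios $F/k$ and $\gamma/k$, so separating $F$ from $k$ needs an independent estimate of the drag. That is presumably why the viscosity measurement in point 2) matters for point 3).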

      Two major problems with this study can be identified:

      1) Meaning and significance: It is not clear if the transverse oscillations have a functional significance. In fact, they are more likely than not simply a byproduct of complex nonlinear mechanics of the mitotic spindle. It is important to understand what we can learn about the spindle mechanics from these oscillations, but there may be no evolutionary significance here. If the authors were asking how, in many different species, the spindle scales with the cell size in the same way (as was done in Farhadifar et al 2020, which the authors do not cite) despite large parameter variations, that would be a different story. But asking which parameter change is responsible for the behavior change is less meaningful.

      2) The study is not convincing, mainly because the model used for the fit is overly simplistic. The force is not constant, the spring stiffness is not constant, the mechanics is not, etc. There are a few different, very complex models of the anaphase spindle with transverse oscillations; comparing to simulations of these models would be more convincing. Also, I am not quite sure whether the volume fraction of yolk is a useful parameter. Doesn't measuring the MSD give us the diffusion coefficient and viscosity directly? I think using the factor depending on the volume fraction artificially inflates the viscosity differences. Lastly, I do not understand the theoretical argument based on comparison with Nedelec's model: in that model, increasing viscosity only slowed the oscillations down rather than abolishing them.
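The reviewer's question about reading viscosity directly off the MSD can be illustrated with a minimal sketch. It assumes purely passive, two-dimensional Brownian motion of a spherical granule of known radius (so MSD = 4Dt and the Stokes-Einstein relation apply); the function names and the example numbers are hypothetical and are not taken from the manuscript under review.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def diffusion_coefficient_2d(lag_times_s, msd_m2):
    """Least-squares slope through the origin of MSD = 4 * D * t (2D tracking)."""
    num = sum(t * m for t, m in zip(lag_times_s, msd_m2))
    den = sum(t * t for t in lag_times_s)
    return num / den / 4.0


def stokes_einstein_viscosity(d_m2_per_s, radius_m, temperature_k=293.15):
    """Viscosity in Pa*s of the medium around a sphere diffusing with coefficient D."""
    return K_B * temperature_k / (6.0 * math.pi * d_m2_per_s * radius_m)


# Hypothetical example: a granule of 0.5 um radius with D = 1e-14 m^2/s
lags = [1.0, 2.0, 3.0]
msd = [4e-14 * t for t in lags]
d_est = diffusion_coefficient_2d(lags, msd)
eta_est = stokes_einstein_viscosity(d_est, radius_m=0.5e-6)
```

In a living embryo the granule motion may be actively driven or subdiffusive, in which case this passive estimate would not hold without further correction.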

      In short, much more thorough investigation would be needed to understand which differences between the species account for the presence or absence of the oscillations, and one may question whether the answer would have a deep impact on our understanding of spindle mechanics.

    1. Reviewer #2 (Public Review):

      A summary of what the authors were trying to achieve:

      The authors have developed an approach to predicting T cell receptor:peptide-MHC (TCR:pMHC) interactions that relies on 3D model building (with published tools) followed by feature extraction and machine learning. The goal is to use structural and energetic features extracted from 3D models to discriminate binding from non-binding TCR:pMHC pairs. They are not the first to make such an attempt (e.g., Lanzarotti, Marcatili, Nielsen, Mol. Imm. 2018), but they provide a detailed critical evaluation of the approach that sets the stage for future attempts. The hope is that structure-based approaches may have better power to generalize from limited training data and/or to model unseen pMHCs.

      An account of the major strengths and weaknesses of the methods and results:

      The authors first report (section 4.1) that their structural and energetic features contain information on binding mode, highlighting complexes with reversed binding polarity, for example, and partly discriminating MHC class I from MHC class II structures. This is encouraging but not terribly surprising. Also, with regard to MHC I vs II discrimination, it is not clear how the class II peptides are registered with respect to one another. This needs to be done by alignment on MHC and mapping of structurally-corresponding peptide positions, since the extent of N- and C-terminal peptide overhangs varies between structures and is largely irrelevant to the docking mode. Interactions between the TCR and MHC are ignored in the feature extraction process; it's possible that including these interactions could improve performance. The authors state: "To be noted that not all structures could be successfully modelled by TCRpMHC models, and so we could not submit them to the feature extraction pipeline." It's unclear what effect this could have on the results: if the modeling failures are cases of structures for which no good CDR templates could be identified, then perhaps this could bias the results.

      Section 4.2 reports a negative result: unsupervised learning applied to the extracted features is unable to discriminate binding from non-binding complexes. This suggests that there is not likely to be a simple energetic feature, such as overall binding energy, that reliably discriminates the true binders. In Section 4.3, the authors turn to supervised learning, in which training examples inform prediction by a classifier. One finding is that the pure-sequence approach using Atchley-factor encoding of the TCR:pMHC outperforms the structure-based approaches, though not by much. A combined model incorporating Atchley factors and structural features does slightly better. These results are a little hard to interpret because we don't know how challenging the 10-fold internal cross-validation is. It doesn't sound like there is any attempt to avoid testing on TCR:pMHCs that are nearly identical to TCR:pMHCs in the training sets, and the structural database is highly redundant, containing many slight variants of well-studied systems. It's also not clear how overlap between the template database used for 3D modeling and the testing set was handled; my guess is that since the model building is an external tool this was not controlled. Together, these factors may explain why the results on independent test sets are, for the most part, significantly worse than the cross-validation results. Another take-home message from the independent validation is that the sequence-only method seems to outperform the sequence+structure or structure-only methods. Although these are described as "out-of-sample validation", it's not clear how different these independent TCR:pMHC examples are from the structure dataset on which the model was trained.
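One way to make the cross-validation harder in the sense described above is to keep near-identical complexes in the same fold. The sketch below is illustrative only: it assumes each TCR:pMHC example already carries a cluster label (for instance from sequence-identity clustering, which is not shown), and `grouped_k_fold` is a hypothetical helper, not part of the authors' pipeline.

```python
from collections import defaultdict


def grouped_k_fold(group_ids, k):
    """Yield (train_idx, test_idx) pairs in which no group spans train and test."""
    members = defaultdict(list)
    for idx, group in enumerate(group_ids):
        members[group].append(idx)

    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest groups first, each into the smallest fold.
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[group])

    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [i for i in range(len(group_ids)) if i not in held_out]
        yield train, test


# Hypothetical cluster labels for seven complexes
clusters = ["A", "A", "B", "C", "C", "C", "D"]
splits = list(grouped_k_fold(clusters, k=3))
```

The same grouping idea would apply to the overlap between the modeling template database and the test set, provided the template identities are known.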

      Sections 4.4 and 4.5 report that prediction accuracy varies significantly across epitopes, and this is in part determined by sequence similarity to the structural database (which provides templates for modeling and also constitutes the training set for the model). In section 4.6, the authors determine that the model does not appear to be able to predict binding affinity (as opposed to the binary decision, binding versus non-binding). Finally, in section 4.7 the authors benchmark the predictor against two publicly available, sequence-based predictors. When predicting for epitopes present in their training sets, all methods do reasonably well, with the edge going to the sequence-based ERGO method. When predicting for epitopes not present in their training sets, none of the methods perform very well. The authors state that "these results suggest that the structure-based models developed in this study perform as well as the state-of-the-art sequence-based models in predicting binding to novel pMHC, despite learning from a much smaller training set." This may be true, but the predictions themselves are not much better than random guessing (AUROCs around 0.5-0.6).

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      I'm doubtful that the proposed methods will form the basis of a practical prediction algorithm. In the absence of ability to generalize to unseen epitopes, simpler sequence-based approaches that leverage the ever-growing dataset of TCR:pMHC interactions seem preferable. I still think the study has value as a template and roadmap for future efforts, and a baseline for comparison. For me, a key unanswered question is whether the model-derived structural features are just a different, slightly noisier way of memorizing sequence, or actually contain orthogonal information that can enhance predictions. It might be possible to gain insight into this question by looking more carefully at the impact of model-building accuracy on performance (the authors use sequence similarity as a proxy, but this is confounded by overlap between the training set and the template set used for modeling). If model-building really adds something, it seems plausible that it does so by accurately capturing physical features of the true binding mode.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      As stated above, I think the present work will have a positive impact on the field of TCR:pMHC prediction by critically evaluating the structure-based approach (and also by testing two previously published methods on independent data). I am less convinced of the utility of the specific methods than of the overall conceptual framework, evaluation procedures, and training/testing sets.

      Any additional context you think would help readers interpret or understand the significance of the work:

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** Techniques to probe the local environment of membrane proteins are sparse, although the influence of lipids on membrane protein function has been known for many years. Therefore, the paper by Umebayashi et al. is important. The environment-sensitive dye Nile red (NR) coupled to a membrane protein is an appropriate sensor for monitoring the local membrane fluidity. Linking of Nile red to the receptor via a flexible tether was achieved with the acyl carrier protein (ACP)-tag method. Experiments showed that, depending on the ACP site, a certain linker length is required to have NR inserted in the membrane and thus be an effective sensor for lipid disorder. This technology could be of general usability to study the environment of membrane proteins in the context of their function. As an example, the technique allowed insulin-induced membrane disorder in the close vicinity of the insulin receptor to be observed. Further, results suggested that tyrosine activity is required for this disorder to happen. The experimental results appear to be complete and controls were made.

      **Major comments:** 1) Sometimes technical terms are used without explanation: What is the GP value? What is ACP-IR? The spectrum was measured in number of ROIs? The reader can work those abbreviations out, but it would be nice to have them defined.

      We have made a list of abbreviations.

      2) Fig. 1d) is confusing. The ACP-IR labelling is evident in 3 panels, but there is no difference in the color (emission spectra of 1992-ACP-IR vs 2031-ACP-IR should be visible??). The DAPI staining is very different. When doing the latter, how difficult is it to get the staining equal?

      The differences in spectra cannot be seen because we used pseudo colors for display of the DAPI and CoA-PEG-NR staining. The reviewer’s comment about the unequal DAPI staining is correct. The reason for this is most likely that the cell membrane is unequally permeabilized by PFA treatment. As the point of this figure is just to show that the plasma membrane is labeled, dependent upon the expression of the ACP-tagged insulin receptor, we don’t think that the variable intensity of the DAPI staining is important. DAPI is simply used to indicate the position of the cells.

      3) How can one interpret Fig. 4: a) Control goes over 4 frames, at 240" insulin is added, and 10 frames should show a fluctuation difference?

      We showed 4 frames after control treatment to demonstrate that the control treatment caused no significant change. We expected that insulin treatment would produce clear changes in the GP images; however, these changes, while visible in the GP images, are difficult for the untrained observer to see. This is the reason why we used the ZNCC method in the subsequent figures to better visualize the changes.

      b) A color shift from blue to green is visible after insulin addition. But it is faint - difficult to assess from the pseudo color scheme. What does 1000 pixel top/1000 pixel bottom mean in c)? Is it an attempt to better visualize the fluctuation? It is difficult to recognize a difference before and after adding insulin. d) It seems that the kymograph set should show this. What is the color scale? Why is 3 so untypical, i.e., no change? Box 6 is also peculiar: the left side does not show a strong change upon insulin administration, the right side does. Why?

      We appreciate the helpful comments for improving our manuscript.

      As pointed out, the change of GP value is extremely small before and after insulin addition, so it is difficult to fully visualize the change with normal pseudo-color expression. To deal with this, we adopted the following two methods to visualize minute changes.

      1) Visualization of local changes of the statistical GP value, shown by ZNCC, throughout the time-lapse images (Fig. 6 and Fig. S2B).

      2) Visualization of the top/bottom 1000 pixels of the sorted ZNCC values in each image (Fig. 7 and Fig. S2C). The top 1000 pixels are the ones that showed the largest changes. The bottom 1000 pixels are the ones that showed the smallest changes.

      With these representations, we found that the level of the response to the insulin signal was spatially and temporally heterogeneous in the membrane.
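The top/bottom selection in method 2) can be sketched as a simple ranking of per-pixel values. This is an illustrative assumption about the procedure, not the authors' code: `extreme_pixels` is a hypothetical helper, the input is any 2D list of per-pixel values (NaN marks pixels without a value, such as image edges), and how the ranked value relates to "largest change" follows the authors' own description above.

```python
import math


def extreme_pixels(values, n=1000):
    """Return (top, bottom): (row, col) positions of the n highest and n lowest values."""
    ranked = sorted(
        (v, r, c)
        for r, row in enumerate(values)
        for c, v in enumerate(row)
        if not math.isnan(v)  # skip pixels that carry no value
    )
    if n < 1 or 2 * n > len(ranked):
        raise ValueError("n must be between 1 and half the number of valid pixels")
    bottom = [(r, c) for _, r, c in ranked[:n]]
    top = [(r, c) for _, r, c in ranked[len(ranked) - n:]]
    return top, bottom


# Tiny hypothetical 3x3 map with one invalid pixel
demo = [[math.nan, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
top_px, bottom_px = extreme_pixels(demo, n=2)
```

The returned positions can then be overlaid on the membrane image for each time point.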

      As for the color scale, in order to clarify the meaning of the difference of color, we have added the description about the relationship between the color and the ZNCC value in the results section.

      4) How is the kymogram calculated? The legend says 'The horizontal dimension represents the averaged ZNCC inside the rectangular area, and the vertical dimension represents time'. The averaged ZNCC is a single value, so it is not clear why the kymogram shows a variation from left to right. May it be the ZNCC was averaged just vertically?

      We apologize that we did not provide information regarding making the kymograph.

      In the yellow rectangular area (Fig. 6B), the ZNCC values of the pixels with the same x coordinate were averaged vertically, and these column averages are plotted along the horizontal direction of the kymograph. That is, one horizontal line of the kymograph holds the spatial distribution of the ZNCC value along the horizontal direction of the membrane, and the vertical direction shows how it changes over time. To make it easier to understand, we refined the description of the kymograph in the legend of Fig. 6.
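The construction described above can be sketched in a few lines. The rectangle bounds and the function name `kymograph` are hypothetical, and the frames are assumed to be 2D lists of ZNCC values indexed as `frame[y][x]`.

```python
def kymograph(zncc_frames, top, bottom, left, right):
    """Build one kymograph row per frame.

    Each entry of a row is the mean ZNCC over rows top..bottom-1 at one x
    position, for x in left..right-1, so x runs horizontally and time vertically.
    """
    height = bottom - top
    rows = []
    for frame in zncc_frames:
        rows.append([
            sum(frame[y][x] for y in range(top, bottom)) / height
            for x in range(left, right)
        ])
    return rows


# Two hypothetical 3x3 frames; the rectangle covers rows 0-1 and columns 1-2
frame_a = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
frame_b = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
kymo = kymograph([frame_a, frame_b], top=0, bottom=2, left=1, right=3)
```

Because only the vertical direction is averaged, variation from left to right survives in each row, which addresses the reviewer's question about why the kymograph is not a single value per time point.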

      5) When calculating cross-correlation values on images, they need to be aligned. What fraction of the total image does the selected 19x19 box represent? As described, I imagine that a rolling CC over 19x19 pixels is calculated over an image from the time lapse series comparing it with the reference Iave(x,y). Compared to the 3x3 median filtered CP image, the ZNCC image should then be much more blurred??

      Below we provide more information regarding the calculation of ZNCC.

      Each local window for the ZNCC calculation is set to 19x19 pixels centered on a single pixel, for every pixel except those at the edges of the image. The ZNCC value calculated in that window is assigned to the center pixel of that window. After that, a new window centered on the adjacent pixel is set and a new ZNCC value is calculated. That is, the calculation window is slid across the whole image. Because the calculated ZNCC value is assigned only to the center pixel of the window, not to all the pixels of the window, there is no blurring effect like that of median filtering.

      The figure below shows a schematic view of our ZNCC calculation.

      Schematic view of our ZNCC calculation
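The sliding-window procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes, following Reviewer #1's description, that each frame is compared against a reference image of the same size, and it uses the standard zero-mean normalized cross-correlation formula. The name `zncc_map` and the choice to mark edge pixels and flat windows as NaN are assumptions made here.

```python
import math


def zncc_map(image, reference, window=19):
    """Per-pixel ZNCC between image and reference over a sliding square window."""
    half = window // 2
    height, width = len(image), len(image[0])
    out = [[math.nan] * width for _ in range(height)]  # edge pixels stay NaN

    for cy in range(half, height - half):
        for cx in range(half, width - half):
            ys = range(cy - half, cy + half + 1)
            xs = range(cx - half, cx + half + 1)
            a = [image[y][x] for y in ys for x in xs]
            b = [reference[y][x] for y in ys for x in xs]
            mean_a = sum(a) / len(a)
            mean_b = sum(b) / len(b)
            da = [v - mean_a for v in a]
            db = [v - mean_b for v in b]
            denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
            if denom > 0:  # a flat window has no defined correlation
                # The value is written to the center pixel only.
                out[cy][cx] = sum(p * q for p, q in zip(da, db)) / denom
    return out


# Hypothetical 5x5 example with a 3x3 window
ref = [[x + 2 * y + x * y for x in range(5)] for y in range(5)]
img = [[2 * v + 5 for v in row] for row in ref]  # linear rescaling of the reference
result = zncc_map(img, ref, window=3)
```

Since ZNCC subtracts the window mean and normalizes by the window variance, a uniform offset or gain change between frame and reference leaves the value unchanged, so the map responds to changes in local pattern rather than overall intensity.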

      **Minor comment:** On page 16 supplementary is not spelled properly.

      corrected

      Reviewer #1 (Significance (Required)):

      The key point of this paper is convincing and the new technology appears to have a lot of potential. It can be applied to study membrane protein function in the context of its environment, the lipid bilayer.

      Membrane fluidity measurements have been developed (e.g., using fluorescent probes like laurdan). However, the trick to link a probe like nile red by ACP technology to the insulin receptor and to observe its activity is quite new.

      A most recent description of such a technology is in TrAC Trends in Analytical Chemistry Volume 133, December 2020, 116092.

      This is an interesting review, but not directly impacting on our work.

      **Referees cross-commenting**

      All comments are constructive and important. The paper is important but needs to be amended as proposed.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): **Summary:** In this manuscript, authors generated an ACP-attached Nile Red probe in order to specifically label Insulin receptor in the membrane. Owing to this specificity, one can measure the lipid membrane properties around a specific protein in the membrane. **Major comments:**

      For the conclusions in the manuscript to be convincing, in my opinion, these additional data need to be added. Some of these are new experiments, and some are detailed analysis of existing data. The new experiments are not for a new line of investigation; instead they are to confirm their statements and conclusions. The major point is the reliability of spectral shift. With the usual environment-sensitive probes, it is certain that they are in the membrane whatever is done to the membrane. However, when the probe is attached to a protein, it is not trivial to have the same confidence that the probe is always inside the membrane, and that it is in the same plane of the membrane. 1992-ACP-IR is a good example; authors state that it binds to the protein outside the membrane, but when there is cholesterol addition and -maybe more interestingly- cholesterol removal, the dye still reacts and changes its emission (even PreCT changes its emission quite a bit at the 570 nm region). This is a clear indication of a change in localization of the probe upon some changes in the membrane. This implies that observed spectral shifts may not be due to lipid packing differences, but due to localization of the probes. For this reason, it is crucial to know where any environment-sensitive probe localizes in the membrane with respect to the membrane normal, and this knowledge is more important for this probe. Related to this, the spectral difference upon insulin treatment and activation of insulin receptor could be due to changes in the probe's localization in the membrane, especially because authors show in Fig1e that the spectra can change depending on the probe localization. Relatedly, quantum yield of NR should be significantly different when it is inside vs outside membrane. Authors should show QY for 1992-ACP-NR and 2031-ACP-NR with different PEG lengths and upon insulin treatment.

      We understand the logic of the request to measure the QY, since the QY of Nile red is much higher in organic solvents than in aqueous solutions, so it might be predicted that the QY of Nile red is higher in a lipid bilayer than when covalently bound to the protein in an aqueous environment. However, this argument depends upon the mechanism for the increase in quantum yield when going from aqueous to a non-polar solution. One possible explanation is based on the intrinsic properties of the dye under the two conditions. The alternative explanation would be that the dye would aggregate (be insoluble) in aqueous solution and therefore either not fluoresce or self-quench. In this case, we believe that the latter is the explanation because we and others have previously shown the turn-on properties of the probe when binding to proteins (SNAP-tag and others). It is not simple to measure QY in the cell under a microscope, but we have done something similar, shown in supplementary figure 4. We labeled the three ACP-receptor complexes with PEG11-Nile red and co-stained with antibody to the Insulin Receptor. We then calculated a relative quantum yield. There was very little difference between the relative quantum yields, so we conclude that it is not the environment of the probe that affects the quantum yield under these conditions, but the fact that it is covalently attached to a protein and incapable of forming aggregates. What distinguishes these constructs is the emission spectrum, not the quantum yield. In supplementary Table 2 we also did QY measurements in vitro and we could reproduce the increase of quantum yield by association with liposomes or in organic solvents. We tested whether non-covalent association with a protein would increase the QY by incubation with the lipid binding protein, BSA, in PBS. This was not the case, strongly pointing to the conclusion that it is the covalent association with the protein that increases the QY, not non-covalent association with a protein. We believe that our demonstration of changes in fluorescent spectra with changes in cholesterol, large changes in fluorescent spectra with linker length for the 1992 construct and voltage sensitivity using patch-clamp prove that the Nile red is reporting on the membrane environment under the conditions we propose.

      **Minor comments:**

      • Fig 1d requires quantification

      We do not agree on this. This is simply to show that the labeling is dependent upon expression of the relevant ACP-IR constructs. There is no detectable labeling of the control.

      • Voltage sensitivity of different PEG length of 2031-ACP probe should be added.

      We have added this data in figure 2 panel E.

      • Fig 3a graph should show all data points, not only bar graphs. Also, the band in 3a for +CoA-PEG-NR is dimmer than other bands, is it specific to this particular gel since quantification does not show any difference?

      There is no significant difference.

      • Fig 4d, colour code is needed.

      Done

      • Fig 5b and Fig3d are basically the same experiments in terms of control measurement, so why is the difference in 3b 0.04 GP units while it is 0.007 GP units?

      We explain in the MS, but have improved the title of the Y-axis in the Fig. 5b graph so that the difference in what is plotted is clear.

      • Why is inhibitor data so noisy? We should discuss.

      We don’t know the exact reason why the inhibitor data is noisy, but we speculate that the actin cytoskeleton and phosphoinositide-dependent signaling could affect the membrane stability, and that the membrane environment would fluctuate in the presence of latrunculin B or PI3K inhibitor.

      Reviewer #2 (Significance (Required)): Overall, this is a very useful approach, and this line of research will yield very useful tools to shed light on how lipids surrounding proteins can change their function. Major advance of the paper is the new chemical biology tool. There is also biological data on how insulin can change the insulin receptor's membrane environment which is contradictory to some old literature claiming that InsR becomes more "rafty" upon insulin treatment (e.g., PMID: 11751579).

      If this type of tagging proves robust and reproducible (limitations and concerns listed above and below), it could be used by other researchers to tag their protein of interest and investigate the lipid environment around those proteins.

      The downside of this method is that the probe requires ACP tag, a relatively less used tag than others in biology, therefore researchers interested in using this probe should have their proteins with ACP tag. Moreover, the linker length and ACP-tag position are quite crucial parameters (and probably should be optimized for each protein). Longer PEG lengths cannot report on changes efficiently (Fig3b), while shorter lengths are prone to artefacts as they can go out of membrane (Fig1 and Fig2). This might limit its widespread use.

      The reason for using the ACP tag is that neither the SNAP tag nor the HALO tag worked. The tethered Nile Red preferred to bind to the tag rather than inserting into the membrane.

      **Referees cross-commenting** I agree with all comments and concerns of other reviewers. I see the usability and potential of this new technology along with its limitations as all three reviewers pointed out.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): See below. No concerns on any of these issues.

      Reviewer #3 (Significance (Required)): **Critique:** This MS reports a proof-of-principle for using site-directed environmentally sensitive probe technology to assess the local membrane environment of a receptor tyrosine kinase (IR) upon activation. This technology addresses a major gap in our arsenal of tools to study the mechanisms of membrane signaling as the parameters of interest are biophysical parameters rather than purely biochemical ones. How to do this with spatial and temporal resolution is a major challenge. This study builds on previous work by the Riezman group that develops an extrinsic labeling system to tether Nile Red to specific sites on the ectodomain of a signaling receptor and then probe local membrane environments as a function of receptor activity. This is a carefully done study that is well-controlled, clever in design and well-described. Although the major issues to which such a general technology could contribute involve intracellular (and not extracellular) events, the advances described will be of general interest -- particularly that local membrane order decreases when IR becomes activated. Specific comments for the authors' consideration follow:

      **Specific Comments:** (i) As a general comment, the authors are measuring extracellular plasma membrane leaflet properties that may or may not translate to what is happening in the local inner leaflet environment. A general reader may well miss the significance of this. This point needs to be more explicitly emphasized in the Discussion.

      This has been discussed in the revised version.

      (ii) Why not treat cells with a PLC inhibitor to block PIP2 hydrolysis and ask if that inhibits membrane disorder. It is PIP2 hydrolysis/resynthesis that regulates the actin cytoskeleton at signaling receptors and this seems an attractive candidate for study.

      There is a long list of attractive post-signaling events of the insulin receptor and how this works in different cell types that could be tested. We believe that this is beyond the scope of this study and we encourage others to do this.

      (iii) The data acquisition time is at least 4 min which is long enough for activated receptors to be recruited to sites of endocytosis. Can the authors exclude the possibility that what they are measuring isn't reflective of such spatial reorganization? Does a clathrin inhibitor block the observed change in local membrane order for activated IR?

      We determined localization to AP2 adaptor containing clathrin coated pits at the cell surface and showed that during the time-course of the experiment there is no significant change in co-localization or evidence for endocytosis (new figure 9). Therefore, we decided not to do the clathrin inhibitor blocking experiment because we believe that it could only lead to indirect effects.

      (iv) Receptor activation is accompanied by other transitions such as dimerization, etc. Can the authors exclude the possibility that what they are measuring is related to changes in depth of insertion of the NR probe into the plasma membrane outer leaflet that is a consequence of IR conformational transitions associated with activation?

      This is highly unlikely given the fact that fluidification of the membrane environment is found with linkers of all lengths. Given the intervals in increases in linker length on the 2031 construct, which is the closest to the membrane, it is very difficult to conceive that any of the ones larger than 5 PEGs significantly restrict the membrane insertion of the dye.

      **Referees cross-commenting**

      I think we have a consensus opinion

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      See below. No concerns on any of these issues.

      Significance

      Critique:

      This MS reports a proof-of-principle for using site-directed environmentally sensitive probe technology to assess the local membrane environment of a receptor tyrosine kinase (IR) upon activation. This technology addresses a major gap in our arsenal of tools to study the mechanisms of membrane signaling as the parameters of interest are biophysical parameters rather than purely biochemical ones. How to do this with spatial and temporal resolution is a major challenge. This study builds on previous work by the Riezman group that develops an extrinsic labeling system to tether Nile Red to specific sites on the ectodomain of a signaling receptor and then probe local membrane environments as a function of receptor activity.

      This is a carefully done study that is well-controlled, clever in design and well-described. Although the major issues to which such a general technology could contribute involve intracellular (and not extracellular) events, the advances described will be of general interest -- particularly that local membrane order decreases when IR becomes activated. Specific comments for the authors' consideration follow:

      Specific Comments:

      (i) As a general comment, the authors are measuring extracellular plasma membrane leaflet properties that may or may not translate to what is happening in the local inner leaflet environment. A general reader may well miss the significance of this. This point needs to be more explicitly emphasized in the Discussion.

      (ii) Why not treat cells with a PLC inhibitor to block PIP2 hydrolysis and ask if that inhibits membrane disorder. It is PIP2 hydrolysis/resynthesis that regulates the actin cytoskeleton at signaling receptors and this seems an attractive candidate for study.

      (iii) The data acquisition time is at least 4 min which is long enough for activated receptors to be recruited to sites of endocytosis. Can the authors exclude the possibility that what they are measuring isn't reflective of such spatial reorganization? Does a clathrin inhibitor block the observed change in local membrane order for activated IR?

      (iv) Receptor activation is accompanied by other transitions such as dimerization, etc. Can the authors exclude the possibility that what they are measuring is related to changes in depth of insertion of the NR probe into the plasma membrane outer leaflet that is a consequence of IR conformational transitions associated with activation?

      Referees cross-commenting

      I think we have a consensus opinion

    1. Author Response

      Reviewer #1 (Public Review):

      This study addresses the important question of understanding the cellular physiology of cholinergic interneurons in the striatum. These interneurons play a key role in learning and performance of motivated behaviors, and are central to movement disorders, psychiatric disease, and addiction. Their unique physiology, which includes tonic pacemaking activity and active conductances that shape integration of dendritic inputs, is critical to their function but is still incompletely understood. The authors cleverly integrate a series of innovative electrophysiological and optical approaches to gain insight into dendritic physiology of these neurons. Their creative approach yields some interesting and novel findings. However, there are technical and conceptual concerns that need to be addressed before these results can be readily interpreted. Some refinement of analysis and presentation, and potentially some additional experiments, will therefore be required to strengthen the conclusions and facilitate interpretation of the results.

      We believe that with several new sets of experiments and simulations, we have successfully refined the analysis and addressed the technical and conceptual problems. Indeed, we strengthened the conclusion with a novel pharmacological experiment that provided model-independent evidence of proximal-only boosting.

      Major concerns:

      1) This manuscript focuses on how the differential physiology of proximal and distal dendrites contributes to physiological activity and integration of inputs in cholinergic interneurons, suggesting that NaP and HCN currents act in concert to selectively boost inputs onto proximal dendrites (from thalamus), relative to inputs onto distal dendrites (from cortex). The results presented in Figures 1-4 are consistent with a distinct physiology of proximal-vs-distal dendrites based on purely electrical properties. Indeed, Figure 5 initially appears consistent with this model as well, since thalamic inputs (onto proximal dendrites) are boosted by an NaP conductance, while cortical inputs (onto distal dendrites) are not. This raises a key conceptual question: why are cortical inputs onto distal dendrites not boosted? Any depolarization of distal dendrites must pass through proximal dendrites before reaching the recording electrode at the soma. Shouldn't this signal be subject to the same active and passive conductances, and consequently the same boosting that shapes thalamic inputs onto proximal dendrites?

      You are absolutely right in the case of a linear model (passive or quasi-linear). However, for a nonlinear system, there can be preferential boosting of proximal inputs. The new Appendix 2 addresses this point with computer simulations.

      2) The quasi-linear approach to characterizing active and passive membrane properties is promising, and the choice of a cable-based model is well supported. However, the model itself is rather opaque, which limits confidence in the interpretation of the results. Additional analysis and description should be presented to alleviate concerns about whether the experimental data, which has a limited number of measurable values, may be over-fit by a model with too many free parameters. For example, why is the radius of the dendrite a free parameter that is allowed to vary in the full field vs proximal experiment (Lines 253-256) - and isn't it a serious red flag that the value returned for proximal dendrites is smaller than for the full field? Additional tables (e.g. fixed and free parameters and how they were determined), and figures (plots of how those parameters influence the fits, and how the parameters interact with one another) would considerably strengthen confidence in the conclusions drawn by the authors.

      Thank you very much for this comment. We have added to the new ms a table with all the parameters fit in the various figures, and have discussed the possible pitfalls of overfitting. Most importantly, we have provided a new appendix (#1) to the manuscript that explains the effects of the various model parameters in a systematic fashion, beginning with passive dendrites, followed by the effects of boosting and then the effect of restorative currents that give rise to resonances. This appendix addresses the questions raised by the reviewer regarding how the various parameters influence the fits.

      We apologize if we created confusion with respect to the meaning of the parameter r. It does not represent the radius of the dendrites (which is not explicitly represented at all, only implicitly through the space constant) but rather the electrotonic range of illumination. We indeed find that the fits consistently estimate a value of r for the proximal illumination which is smaller than that estimated for the full-field illumination, as it should be.

      Finally, our new pharmacological demonstration of differential boosting in the case of proximal vs. full-field illumination (see above) is entirely independent of the quasi-linear model fit. So the main thrust of the ms, which is to demonstrate a proximal localization of nonlinearities and its correspondence to the spatial localization of excitatory afferent inputs, is now achieved, at least vis-à-vis the NaP current, independently of the quasi-linear model. However, we still find the model useful, as it is used to estimate the distribution of HCN currents and provides a framework to think about how to manipulate dendritic nonlinearities experimentally.

      3) Technically, the use of ChR2 to modulate dendritic currents is creative. While the authors rightly acknowledge that activation/deactivation kinetics of the ChR2 channel will contribute to filtering, this important point should be expanded with additional analysis and potentially with new experiments. Of particular concern is the transition of ChR2 channels to an inactivated state over the comparatively long oscillating light pulse in Figure 3. Inactivation of ChR2 is prominent over this timescale and would precisely co-vary with the shift in oscillation frequency. To address this, the authors should present a direct measurement of this inactivation and account for it in their analysis of the chirp data. Alternatively, the chirp stimulus could be presented backwards (starting at high frequency), so that comparison of forwards-vs-backwards chirp recordings could disentangle this artefact. Either one or both of these additional experiments would be critical for interpreting the roll-off in photocurrent responses at high frequencies reported in Figure 3.

      Touché! You were spot on with this critique and we were wrong. We have now conducted several new experiments (that appear in the main text and in Figure 3 and all its supplements) that show that including ChR2 kinetics explicitly in the model fits actually makes the fits more self-consistent and removes some of the glaring differences between the results from the somatic voltage perturbations (Figures 1–2) and the optogenetic illumination (Figure 3). So as per your request, we have now presented a direct measurement of the deactivation (Figure 3–figure supplement 1) and we have played the “chirp” backwards (Appendix 1–figure 2) to address the issue of inactivation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      First we would like to express our deep gratitude to the reviewers for thoroughly and fairly reviewing our work.


      Reviewer #1:

      Major Concerns

      1. A major concern I have is with the use of DAPT to modulate Notch signaling, and investigate the impact on integrins, Yap, cadherins, etc. Gamma-secretase, the target of DAPT, cleaves not only Notch receptors, but also IntegrinB1, Nectins, Cadherins, Ephrins and more. This recent review lists 149 substrates (Guner & Lichtenthaler Seminars in Cell & Developmental Biology 2020). The risk that some of the results reflect DAPT impact on IntegrinB1, Cadherins etc themselves is significant. The authors should validate their findings with more specific modulation of Notch activity, for example with a Notch blocking antibody, with siRNA, or with SAHM1.

      We agree with the reviewer's comment and will add additional key experiments using SAHM1 as an alternative inhibitor of Notch activity.

      Furthermore, EGTA was used to "acutely destabilize VE-Cadherin". But EGTA chelates Calcium, which is essential for Notch structure, and EGTA is thus a well-known activator of Notch signaling (see eg Rand MD et al. (2000) Calcium depletion dissociates and activates heterodimeric notch receptors. Mol Cell Biol). The authors rightfully describe and cite this paper, but the use of EGTA nonetheless confounds interpretation. The authors check for NICD levels (at what timepoint?) but the staining is cytoplasmic (also not labelled in the figure per se, but described in the figure legend? - please label the staining in the panel). And in any case, NICD is very short-lived and nuclear staining cannot be taken as a hallmark of signaling activity. In particular if staining is performed at a time point at which the receptor and NICD may have been exhausted/depleted. The authors should validate these observations/conclusions with the Notch reporter to conclusively demonstrate whether EGTA does not activate Notch in their system.

      To test whether transient treatment with EGTA causes Notch activation we will repeat this experiment with Notch reporter activity as readout.

      Trans-endocytosis of NECD on different substrates: the authors suggest that trans-endocytosis of NECD by Dll4 increases on softer substrates. But the authors also show that soft substrates lead to spreading out of cells, which could confound interpretation (i.e., the signal could reflect overlapping membranes, not internalization). The authors could validate trans-endocytosis by FACS: check if red Dll4+ cells contain more NECD. It is also not clear to me in this experiment whether the authors are looking at green NECD, or Notch1 full length, since they write "overlap of Notch1 and Dll4", which would not reflect trans-endocytosis but interactions at the cell surface for both cells. Please also define "overlay intensity", or explain further.

      We will validate the trans-endocytosis by flow cytometry. In addition, we now describe the procedure for microscopic analysis more clearly (methods section, p 4; results section, p 17-19).

      The authors conclude their introduction with a statement that mechanosensitivity of Notch is linked to endocytosis, but their conclusion from Fig 6C was that Notch stiffness-dependence was independent of endocytosis, using the rhDll4..?

      We have now rephrased this sentence.

      Minor concerns

      1. In the introduction, the authors describe Dll3 as a Notch ligand that activates Notch signaling in trans. To my knowledge, Dll3 has only been described as a cis-inhibitor of Notch signaling. (I think this may have arisen during repeated edits of the manuscript!)

      This has now been corrected in the current version.

      In the introduction, the authors state that Notch1, Dll4 and Jag1 control angiogenesis, but then they only describe what Notch1/Dll4 do in the next few sentences. Perhaps one sentence to describe the role of Jag1 would help avoid the feeling of being "left hanging".

      This has now been corrected in the current version.

      Data presentation: please show all bar graphs with the individual replicates (dotplots).

      We have now changed all bar graphs into scatter plots.

      Data analysis/normalization: many graphs represent normalization of values in multiple steps which are not described in the methods/legends/results. For example, Notch reporter gene activity (Fig 1A) is Firefly divided by Renilla, and presumably normalized to the control condition at 1 (or an average of 1 for the three controls?). This is not explained. Also, it is not clear whether the data reported for the Control condition are Huvec on rhDll4 compared (normalized) to Huvec on control substrate (and similar for each other condition). What controls are included in this experiment? Please provide the full data to provide insight into the magnitude of activation by Dll4 itself. Perhaps "Control" is without rhDll4? But the bar underneath A/B implies this rhDll4 was used in all conditions.

      We have edited our manuscript accordingly to avoid these ambiguities.
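
      The two normalization steps the reviewer asks about can be sketched as follows. This is a minimal illustration only, assuming that each well's Firefly signal is divided by its Renilla signal and that the resulting ratios are then divided by the mean ratio of the control condition; the function names and the numbers are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def firefly_over_renilla(firefly, renilla):
    """Step 1: per-well ratio of Firefly to Renilla luminescence."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_control(ratios_by_condition, control="control"):
    """Step 2: divide every ratio by the mean ratio of the control condition.

    The control replicates then average to 1, but each replicate keeps
    its own value, so the spread of the control remains visible.
    """
    control_mean = mean(ratios_by_condition[control])
    return {
        condition: [ratio / control_mean for ratio in ratios]
        for condition, ratios in ratios_by_condition.items()
    }


# Hypothetical raw luminescence values, three replicates per condition.
ratios = {
    "control": firefly_over_renilla([100.0, 120.0, 110.0], [50.0, 50.0, 50.0]),
    "treated": firefly_over_renilla([300.0, 330.0, 360.0], [50.0, 50.0, 50.0]),
}
normalized = fold_over_control(ratios)
```

      Normalizing to the control mean, rather than setting each control replicate to exactly 1, is one way to keep the variability of the control condition in the plot.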

      Statistics: data should be presented as means +/- standard deviation, not standard error of the mean (see for example Barde & Barde Perspect Clin Res. 2012): "SEM quantifies uncertainty in estimate of the mean whereas SD indicates dispersion of the data from mean. As readers are generally interested in knowing the variability within sample, descriptive data should be precisely summarized with SD."

      We now use SD instead of SEM.
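
      The distinction quoted above can be made concrete with a small calculation. The sketch below assumes the usual definitions (sample SD with an n-1 denominator, and SEM = SD / sqrt(n)); the helper name and the example values are hypothetical.

```python
import math
from statistics import stdev


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sd = stdev(values)        # dispersion of the data around the mean
    sem = sd / math.sqrt(n)   # uncertainty of the estimated mean
    return sd, sem


# Hypothetical replicate values.
sd, sem = sd_and_sem([1.0, 2.0, 3.0, 4.0])
```

      Because SEM shrinks with the square root of the number of replicates while SD does not, error bars based on SEM look tighter for the same data as n grows.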

      Statistics: In the Methods section, the authors state that one-way ANOVA was followed by Dunnett's multiple comparison test, and two-way ANOVA was followed by Tukey's multiple comparison test. Dunnett is used to compare every mean to a control mean, while Tukey is used to compare every mean with every other mean. Fig 1 describes using Dunnett for Fig 1B, but the end of the legend says Tukey was used. However, Fig 1A,C show internal pairwise comparisons to plastic. Please be sure to explain which statistics were used where, and why, and if plastic was set as the comparator, please be explicit about this. Fig 3 uses "Sidak's corrected two-way ANOVA" and "Sidak's multiple comparison test"? I think Sidak is a method to correct alpha or p for multiple comparisons, as stated in the first instance, but it is not described why this was used here, and not in other analyses, or whether the authors then applied Tukey's post-hoc test as described in the methods section. Similar comments for Fig 6. It is counter-intuitive that the plastic-1.5 kPa PDMS difference with no error-bar overlap in 1A would be 1-star significance, while the plastic-70 kPa difference with almost overlapping error bars in 1B would be 4-star significance. Please check/show values. In Fig 1B Figure legend, the authors write "Data is presented in a bar plot and compared with the integrin β1 intensities without DAPT treatment", but this is not the statistical comparison presented. Fig 3B shows a very minor difference with overlapping error bars as 3-star significance? Is this correct?

      We have checked all statistical issues and corrected where necessary. Since the sample size and variance were homogeneous in all comparisons, we now uniformly use ANOVA with Tukey's multiple comparison test as post hoc test to keep things simple.

      How much nuclear NICD (NICD intensity) is there in control conditions? (Control missing from Fig 1B, D).

      We will repeat the experiment and compare the NICD levels with those in non-activated cells on plastic.

      A DAPI counterstaining for 1B/D right panels would facilitate evaluation of whether NICD nuclear intensity is increased. The same applies for nuclear YAP assessment in Fig 3B. I assume a nuclear counter-stain was done for quantification of nuclear NICD intensity, and nuclear YAP intensity, but this is not described in the Materials and Methods; please add a description of how intensity was quantified, and provide nuclear counterstain images. (Also, what is the unit on the y-axis of "intensity" graphs? Arbitrary units (a.u.)?)

      The counterstaining method with Hoechst as well as the use of the nuclear staining for quantitative analysis of images are now described in the Methods section and where needed in the figure legends. The y-axis of the intensity graphs now has a dimension (a.u.). We decided against overlay of the nuclear staining with the NICD or YAP images for graphical reasons (visibility of the respective staining).

      How much "overall" integrin B1 is there in DAPT-treated conditions in Fig 2C? (related to the concept that DAPT could be cleaving integrin B1, it could be depleted at 24 hours..?)

      We will additionally add this experiment and validate the effect of Notch inhibition on the overall integrin level using the alternative inhibitor SAHM1.

      More details regarding the analysis procedure need to be added to the Methods Section. Were cells segmented and then mean intensity estimated for the whole cell? Was this done by means of the Intensity Ratio Nuclei Cytoplasm Tool plugin for Fiji alone? Were images background corrected, corrected for inhomogeneous illumination, normalized? In the case of active Integrin beta 1, the expression seems to be patterned; was intensity expressed as mean intensity of every pixel corresponding to cytoplasm? For VE Cadherin staining, how was intensity estimated (only pixels corresponding to membrane were considered or every pixel of the cell)? Many figures originate from a confocal microscope: were z-stacks acquired and then maximum projections done? Were z-stacks acquired and then fluorescence quantified in 3D images? Was a single plane acquired or analyzed, and if that is the case, how was this plane chosen?

      The requested information has now been inserted in the respective results and method sections.

      In Fig 4A, how is VE-Cadherin intensity quantified? As an average per field of view? Or per cell? And if per cell, how was each cell delineated? And if not per cell, how were equal cell numbers ensured? In FRAP experiment, how was intensity quantified? Was it per cell, per field of view or per region? Was each bleached region analyzed separately, or each cell? The datapoints should be either added to Figure 4C or as supplementary to assess the fitting. How many bleached regions per cell were done and how many cells were analyzed? In FRAP experiment, was bleaching done with an increased pixel dwell time? Was laser intensity increased? Do you have an estimation of laser power (not percentage) or flux?

      These issues are now described in more detail in the respective figure legend.

      Figure S2 is not referenced in the manuscript - I think a reference to "Figure S3" in the NECD transendocytosis section (no page numbers or line numbering) should be to Fig S2 instead?

      Sorry for this mistake! We corrected this now.

      In Figure 5A, NICD nuclear intensity is normalized somehow (normalization not explained), and stiffness no longer appears to regulate NICD levels as shown in Figure 1B.

      We have now described the normalization better in the figure legend. The difference to the results in Fig. 1B is that in Fig. 5A the cells were not activated by Dll4 sender cells or rhDll4 (endogenous Notch activity). This is now stated more clearly.

      Fig 6B: From the immuno at right there is a clear stiffness-dependent difference in Transferrin uptake. How were "single cell uptake" and "number of particles" quantified? (How were cell bodies identified?) Uptake could also be verified with FACS.

      On this point, we disagree with the reviewer: we really do not see a systematic difference in intensities between the different substrates. The process of image analysis is now better described in the figure legend. The result was so clear that we did not use FACS as a complementary approach.

      Fig 6C: there appear to be very different numbers of cells in the brightfield image at right. Are the 70, 1.5, and 0.5 kPa Notch reporter activities different from one another or only different from plastic? Might these results reflect cell density/increased Notch signaling due to more cell-cell contacts?

      Unfortunately, with decreasing stiffness the PDMS gels become optically more and more cloudy, giving the false impression of a higher cell number. We tried to circumvent this by changing contrast and brightness of the images, but to no satisfying effect. We now mention this issue in the figure legend.

      How was the Dll4 coating of the different substrates done?

      The coating of the substrates is now described under a specific subheading in the Methods section.

      It would be helpful to describe the composition of Collagen G (Collagen I) in the text (it is a risk to expect vendor information to remain available indefinitely).

      The role and composition of the Collagen G coatings was included in the text (p 7). Further information on the manufacturer of the product used is included in the methods section.

      Please list catalog numbers for all reagents, and dilutions used for antibodies.

      We have added this information wherever possible.

      Instead of using red and green for images, maybe cyan, yellow and/or magenta could be used to help the reader see what is being shown (especially if the reader might be color blind).

      We will of course adhere to the respective policy of the publishing journal, once the manuscript is accepted.

      Packages and tools such as Intensity Ratio Nuclei Cytoplasm Tool plugin for FIJI should be referenced.

      We have now referenced respective tools.

      Reviewer #2:

      Major comments:

      Is there a difference in the growth rate of cells on softer vs. stiffer gels that could affect cell morphology/signaling pathways?

      This is an important point and we will perform additional respective experiments.

      Nuclear localization of NICD and YAP would be good to validate with western blot.

      Quantification of Western Blots (especially after nuclear isolation) is – at least in our hands – much less sensitive and reliable than quantitative imaging. We do not think that this experiment would strengthen our study.

      In Figure 3 and Figure 5, siRNA experiments would strengthen the data. DAPT is not only an inhibitor of Notch but affects other proteins as well. This should be stated.

      A similar point was raised by Reviewer#1 with the suggestion to use SAHM1 as an alternative to DAPT. As suggested we will add these experiments.

      How was the mean VE-cadherin branch length determined? This term often refers to angiogenesis assay/sprout formation and maybe another one should be considered here to describe VE-cadherin junction morphology.

      Add to all figure texts how many cells were used for the analyses.

      The cell number is now added wherever appropriate.

      In Fig. 6C the cell morphology of HUVECs look abnormal in comparison to other images and should be re-done.

      In contrast to all other experiments, the cells were not confluent in this case. The different morphology is a sign of the lack of neighbours, not of some problem with the cells.

      Was all the data normally distributed and thus ANOVA was used? Please add more details on the statistics part. Did you remove outliers?

      As also suggested by Reviewer #1, we have added more information on statistics and streamlined this. The data are normally distributed; outliers were not removed.

      MTT assay of DAPT would need to be presented as it can be cytotoxic. Cells are not well visible in Fig 2C with DAPT. DAPI and F-actin staining would help to see the cell morphology.

      We will add respective data on cell viability after DAPT (and SAHM1) treatment in a revised version of the manuscript.

      Minor comments:

      Please clarify how coating with rhDll4 is done as this was unclear at least for this reviewer.

      The coating of the substrates is now described under a specific subheading in the Methods section.

      HUVECs are known to be hard to transfect. Please provide data on transfection efficiencies of all transiently transfected cells.

      We did not systematically monitor transfection efficiencies in this context, since there was always an internal control (e.g. co-reporter in the reporter gene assay) or the data were obtained by single-cell-based quantification. Generally, we obtain transfection efficiencies around 30% with HUVECs.

      Reviewer #3:

      Major comments:

      1) The authors use recombinant Dll4 or Dll4-expressing ("sender") cells to activate Notch in co-cultured cells. This is per se fine; however, one might over-estimate all other observed downstream effects as endogenous Notch activity is lower. It would be important to see how naïve HUVEC or other primary endothelial cells respond to changes in stiffness. qPCR of Notch target genes such as Hey1, Hey2, Hes5, Dll4 is frequently used as a readout of Notch activity in this context. Also, the Notch transcriptional reporter assay might be a suitable read-out.

      In Fig.5A we show data on endogenous Notch activity (- EGTA) on substrates with different stiffness. In this case NICD levels in the nucleus do not differ. It will definitely be interesting to repeat this experiment based on the reporter gene assay.

      2) As the authors mention in the Discussion, cell density could be of utmost importance given the fact that Notch signaling usually is assumed to be an in trans signaling event between adjacent cell membranes. However, other signaling modes (in cis, cis inhibition, JAG1 vs DLL4 ratio) might also be important. As such, the authors should carefully document and report on cell density in all experiments. Secondly, the authors should use other conditions such as sparse cell density, and thirdly, the authors should measure transcriptional effects of stiffness on Notch ligand expression.

      In all experiments (with the exception of Fig. 6C) we used confluent cells. With the sparse cells (Fig. 6C) we also observe stiffness dependency. Investigating Notch ligand expression is definitely a good idea and will be investigated in the revised manuscript.

      3) The authors need to compare stiffness in their model with physiological conditions in developing tissues and ideally also in tumor which often have increased tissue stiffness.

      Good point! We have now integrated such comparisons in the Discussion.

      4) Is Notch activation due to changes in stiffness dependent on the presence of ligands or could it be that (unspecific) binding of Notch receptors to ECM could trigger cleavage just by conformational change?

      Since there is no stiffness dependent response on collagen (Fig. 6C, left panel), an effect of unspecific binding is highly unlikely.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, the authors investigated the role of sleep and brain oscillations in visual cortical plasticity in adult humans. The authors tested the effect of 2 hours of monocular deprivation (MD) on ocular dominance measured by binocular rivalry. In the main MDN session, MD was performed in the late evening, followed by 2 hours of sleep, during which EEG was measured. After the sleep session, ocular dominance was measured, which was followed by 4 hours of sleep, then ocular dominance was measured again in the morning. The results show that the effect of MD was preserved 6 hours after MD. The effect of MD correlated with sleep spindle and slow oscillation measures. The questions asked by the study are timely and findings are important in understanding the visual cortical plasticity in human adults, but I have some concerns regarding the experimental design, analysis, and interpretation of the results, which are listed below.

      Thank you for the positive summary of our results.

      • The authors investigated EEG activities in the central and occipital regions. The results of the relationship between slow oscillations / sleep spindles and deprivation index are very interesting. However, it appears that the activities were averaged across hemispheres in the occipital region. Previous studies (e.g. Lunghi et al., 2011; Binda et al., 2018) have demonstrated that MD is associated with up-scaling of the deprived eye and with down-scaling of the non-deprived eye (page 11). I wonder whether sleep slow oscillations and / or spindles are modulated locally in the deprived occipital region? To answer the first question raised by the authors (how MD affects subsequent sleep), wouldn't it be important to compare between deprived vs. non-deprived regions?

      In humans, the purely monocular recipient cortical regions are very small and represent only the very far visual periphery. These regions are impossible to locate with EEG and are difficult to locate even with high-resolution fMRI (ref to Koulla CB). Visual cortical organization is based on the visual field map: neurons whose visual receptive fields lie next to one another in visual space are located next to one another in cortex, forming one complete representation of contralateral visual space, independently of the eye from which the visual information comes. However, at finer scales ocular dominance columns exist, and Binda et al (2018) showed that in adult humans MD boosts the BOLD response to the deprived eye, changing ocular dominance of V1 vertices, consistent with homeostatic plasticity. All of these are well-known facts in the vision community, and we believe they are not worth discussing.

      • To answer the second question (how sleep contributes to consolidation of visual homeostatic plasticity), the authors compared the deprivation index between two sessions, the main MDN and a control MDM session. The experimental designs for these two sessions were quite different. For example, MD was conducted in the evening in MDN, whereas it was conducted in the morning in MDM. Since there may be circadian effects on plasticity (Frank, 2016), the comparisons between these sessions may not be sufficient in investigating the effect of sleep itself (it could be merely due to circadian effect).

      Thank you for raising this important issue. We performed the dark exposure experiment in the morning because we wanted to minimize the occurrence of sleep during the two hours spent by participants lying down in complete darkness. Preventing sleep under these conditions in the late evening would have been extremely challenging. In order to investigate a possible influence of the circadian rhythm on visual homeostatic plasticity and its decay over time, we have performed an additional experiment. In this experiment, we tested the effect of 2h of monocular deprivation in the same participants either early in the morning or late at night (at a time of the day comparable to the MDnight and MDmorn conditions in the main study). We report the results of this control experiment in the supplementary materials (Figure S2). We found that the effect of monocular deprivation follows a similar timecourse for the two conditions (ocular dominance returns to baseline levels within 120 minutes after eye-patch removal). Moreover, we also report that the effect of MD is slightly (but significantly) larger in the morning, compared to the evening. The results of this experiment rule out a contribution of circadian effects and reinforce the evidence of a specific effect of sleep in maintaining visual homeostatic plasticity.

      • The authors argue that NREM sleep consolidates the effect of MD. However, consolidation may last days to months or even years (Dudai et al., 2015). Since the effect is gone in 6 hours or so, it may be difficult to interpret it as consolidation. Although the findings of the effects of sleep on ocular dominance plasticity are interesting, the interpretations of the results may need to be clarified or revised.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition and not merely darkness would be responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. Having said that, we would like to point out that the MD boost in amblyopic patients gets consolidated for up to one year and increases across night sleep, as we reported in Lunghi, Sframeli et al (2019). Although these data strongly suggest that real consolidation may occur, we agree with the reviewer that our data did not directly address this question, and we have changed the manuscript accordingly.

      Reviewer #2 (Public Review):

      This manuscript is an interesting follow up on a substantial literature on the role of sleep in promoting critical period ocular dominance plasticity, and the role of sleep in promoting adult V1 plasticity following presentation of a novel visual stimulus. For nearly all of that literature (i.e. coming from cats and mice), the focus has mainly been on Hebbian mechanisms. The authors here propose to advance the field by investigating plasticity in adult human V1, which the authors consider to be homeostatic rather than Hebbian, and which the authors consider to be a form of sleep-dependent consolidation. This is an exciting goal, and the overall study designs and control will test the effects of brief MD and subsequent sleep or wake in the dark on V1 processing for the two eyes.

      Thank you for the positive commentary on our study.

      However, the outcomes of the study suggest that the changes observed in V1 across sleep may actually be the opposite of consolidation - rather it is decay of an effect on V1 function caused by prior wake experience (MD), which disappears over subsequent hours.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition and not merely darkness would be responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. We have revised the entire manuscript, across its various sections, to handle this important aspect and to consider that a classic correlate of memory consolidation during sleep (spindle density) also turns out to be associated with maintenance of the MD-induced ocular dominance effect.

      The authors claim differences due to sleep, but there is not a direct statistical comparison between sleep and awake-in-the-dark controls.

      We now directly compare the effect of monocular deprivation and its decay after two hours in the sleep vs dark exposure condition (MDnight vs MDmor). We now plot the results of the two conditions in the same graph (Figure 2). We found a significant interaction effect between the factors TIME (before and after) and CONDITION (MDnight and MDmor), indicating a specific role of sleep in prolonging the decay of short-term monocular deprivation.
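
      As a rough illustration of what such an interaction test measures, the sketch below assumes a fully within-subject 2×2 design (the same participants measured before and after, in both conditions). In that case the TIME × CONDITION interaction is equivalent to a one-sample t-test on each participant's difference of differences (the interaction F equals t²). The function name and the data layout are hypothetical and are not taken from the manuscript's analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_t(before_a, after_a, before_b, after_b):
    """Paired t statistic for a 2x2 within-subject TIME x CONDITION interaction.

    Each list holds one value per participant, in the same participant order.
    Returns (t, degrees_of_freedom). Assumes at least two participants and
    non-identical difference scores (otherwise the standard deviation is zero).
    """
    # Change over time within each condition, per participant
    change_a = [post - pre for pre, post in zip(before_a, after_a)]
    change_b = [post - pre for pre, post in zip(before_b, after_b)]
    # Difference of differences: how much more the index changes in A than in B
    dod = [a - b for a, b in zip(change_a, change_b)]
    n = len(dod)
    t = mean(dod) / (stdev(dod) / sqrt(n))
    return t, n - 1
```

      A t value far from zero indicates that the change over time differs between the two conditions, which is what a significant interaction expresses.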

      There is also no quantification of sleep architecture across the sleep period, to determine whether REM or NREM play a role.

      We have provided a summary table of sleep architecture in the revised version of the Supplementary Materials. The table shows descriptive statistics of sleep architecture on MDnight and CN. Also, we report the result of the paired comparison between the nights and the Spearman correlations between the deprivation indices (DI before and DI after) and the changes between the nights in sleep architecture. Tests indicate that MD does not produce any main effect on the sleep architecture and that there are no substantial associations found between sleep architecture parameters and deprivation indices. Thus, it appears that changes in SSO and spindle frequency and amplitude did not lead to an alteration in the amount of N2 or N3 sleep, as we might expect. At the beginning of the Results section we refer to the table and to the lack of statistically significant effects.

      Finally, while there are tests of changes in NREM oscillations with previous plasticity in wake, there are no direct tests of changes across sleep - i.e. the very changes that could be considered consolidation.

      We thank the reviewer for stimulating us to investigate whether there are any NREM parameters whose change within the sleep cycle can be related to the degree of plasticity maintenance observed at the end of the two hours of sleep.

      For this aim, we 1) partitioned SSO and spindle events into tertiles according to their occurrence time, 2) estimated the average measures of events belonging to the first and last tertile, and considered the variation between tertiles as an estimate of the changes across sleep. We then tested whether there is a consistent relationship between measures of individual retained plasticity (DI after) and changes in SSO and sleep spindles across sleep.

      We performed this across-sleep analysis on the SSO and spindle measures and found that none of the parameters showed an association between its change across sleep and the individual DI after sleep. We report these results in the supplementary materials (Figure S8).
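
      The tertile procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: events are split into equal-count tertiles after sorting by occurrence time (the response does not specify whether tertiles were defined by event count or by elapsed time), and the association with DI after is measured with a Spearman correlation computed from average ranks. All function names are hypothetical.

```python
from statistics import mean


def tertile_change(events):
    """events: list of (occurrence_time, measure) pairs for one participant.

    Returns mean(measure in last tertile) - mean(measure in first tertile).
    """
    ordered = sorted(events, key=lambda e: e[0])
    size = len(ordered) // 3
    if size == 0:
        raise ValueError("need at least three events")
    first = [value for _, value in ordered[:size]]
    last = [value for _, value in ordered[-size:]]
    return mean(last) - mean(first)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

      In use, one would compute `tertile_change` per participant for each SSO or spindle measure and then pass the per-participant changes and the DI after values to `spearman`.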

      Finally, it is also not clear that the decay of response changes is due to homeostatic plasticity - it could be just that - decay of plasticity that occurred previously. The terminology used - e.g. consolidation, homeostatic vs. Hebbian - doesn't seem well founded based on the data.

      Thank you for raising an important point. In our study, homeostatic plasticity refers to the effect of short-term monocular deprivation (so the plasticity occurred before sleep). We have rephrased the interpretation of our results in terms of stabilization/maintenance rather than consolidation of plasticity.

      Regarding homeostatic vs Hebbian plasticity, there is fairly broad agreement in the literature that the two effects are indeed different. We now make clear in the text that Hebbian plasticity is usually associated with a boost of the signals that are most successful in driving a neuronal response or a behavior. Here, MD produced a boost of the unused, and probably silent, eye, and as such the boost is very difficult to explain in terms of Hebbian plasticity. We now make this clear in the introduction.

      Reviewer #3 (Public Review):

      In this study, Menicucci et al. induced plastic changes in ocular dominance by applying an eye-patch to the dominant eye (monocular deprivation, MD). This manipulation resulted in a shift toward even more dominance of the deprived eye, as assessed through a binocular rivalry protocol. This effect was stabilized during sleep, whereas it quickly decreased during wakefulness (in the dark). The authors interpret the MD effect as the result of cortical plasticity over primary visual areas and its maintenance during sleep as the consolidation of these changes. The authors thus connect their work to the literature on sleep consolidation. They further show that the magnitude of the MD effect is positively correlated with sleep markers that are involved in memory consolidation (slow oscillations and sleep spindles).

      However, I first have conceptual issues with this study. Indeed, previous findings on the replay of memories during sleep and their consolidation were mostly obtained in hippocampus-dependent forms of learning. Here, I do not really see what it is that would be replayed. Thus, I struggle to understand how rhythms such as sleep spindles, which have been linked to the transfer of hippocampal memories to the neocortex, would be mechanistically associated with low-level plastic changes restricted to primary visual areas. In addition, the effects were observed over occipital electrodes, where sleep spindles are far fewer and lower in amplitude than in other cortical regions. Furthermore, the association between MD-related plasticity and slow oscillations is interesting but, since these slow oscillations organize sleep slow waves, the lack of correlation with slow waves is surprising.

      We agree with the reviewer that many of our results are indeed surprising, especially those related to the involvement of the spindles, and for these reasons we believe that eLife would be the appropriate journal to present our work. At present, the fact that sleep spindles have mainly been associated with mediating the transfer of memory does not exclude a more general involvement in other sensory functions.

      Connected to these conceptual issues, I think the present work has some important methodological limitations. First of all, the analyses included a rather small number of participants, which could make some analyses, in particular correlational analyses, severely underpowered.

      We thank you for prompting us to emphasize this limitation. In the Participants section of Materials and methods, we now point out that, given the complexity of the experimental design, the need to account for the complexity of sleep as expressed through different parameters, the sample size used, and the need to correct for multiple tests, only associations characterized by a strong effect size could be highlighted.

      Secondly, the approach used to explore the correlation between plasticity and sleep features focused on a subset of electrodes (ROI) defined a priori. It is therefore difficult to draw conclusions about the specificity of the results. Given the topographical maps provided by the authors, I am wondering if a more exhaustive analysis of the effect at the electrode level could not yield more robust findings.

      The need for ROIs is based on the interindividual variability of brain structures, in particular the large anatomical variability of V1 orientation implying a variably oriented dipole and a variable maximal representation of visual potentials over electrodes from Oz to CPz. Moreover, we have to cope with the volume conduction effect that limits EEG spatial resolution.

      With these limitations in mind, we very gladly adhere to the reviewer's request to evaluate the effects on individual electrodes in more detail. To this end we have prepared supplementary figures which show boxplots and scatterplots for the electrodes inside the ROIs to evaluate main effects and associations, respectively.

      Finally, given the number of features tested, I think it is important to clarify the strategy used to correct for multiple comparisons.

      We thank the reviewer for highlighting an unclear point. In the revised version of the Statistical analyses section, we have provided missing details of the procedure used for handling false positives due to multiple testing. Basically, we applied the FDR correction for each question we asked.

      For example, “at which time points does dominance remain significantly different from baseline?” or, “which EEG feature and in which area of the scalp shows changes significantly dependent on plasticity induced by monocular deprivation?” For each of these questions, we made a group of tests (for the first example, dependent on the number of points at which ocular dominance was assessed until the morning; for the second example, on the number of EEG features examined multiplied by the number of areas in which they were assessed) to which Benjamini & Hochberg's FDR correction was then applied.
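
      The per-question correction described above can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up procedure applied separately to each family of tests. The family names and p-values are hypothetical placeholders and are not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if it survives BH FDR correction."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, sorted by ascending p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis ranked at or below largest_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


def correct_per_question(families, alpha=0.05):
    """Apply BH independently to each family (one family per question asked)."""
    return {question: benjamini_hochberg(p, alpha) for question, p in families.items()}


# Hypothetical p-values, for illustration only
families = {
    "dominance_vs_baseline_by_timepoint": [0.001, 0.012, 0.030, 0.20],
    "eeg_feature_by_scalp_area": [0.004, 0.04, 0.5, 0.03, 0.9, 0.6],
}
results = correct_per_question(families)
```

      Because each question forms its own family, the number of tests m that enters the threshold depends only on the tests run for that question (for instance, the number of EEG features multiplied by the number of scalp areas), not on the total number of tests in the paper.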

    1. Author Response

      Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1 and more drastic effects were observed in presumably more complex tasks (i.e. working-memory and bandit task). If mice were trained with these more complex tasks prior to training in the simple discrimination task, then the same manipulations produced large deficits suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view with differential signaling in these cortical areas depending on the task considered and the order in which they were presented to the animals. Overall the study is interesting and the fact that all tasks were assessed relying on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternate ways of interpreting the data than "training history".

      1) Theoretical framework:

      The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. delay condition which is indeed typically assessed in a Y- or a T-maze in rodent) or a two-armed bandit task (i.e. the switching task), respectively. So these three tasks are qualitatively different, are therefore reliant on at least partially dissociable neural circuits and this should be clearly analyzed to explain the rationale of the focus on the three cortical regions of interest.

      We are glad to see that the reviewer finds our study interesting overall and sees value in the experimental design. We agree that in the previous version, we did not provide enough motivation for the specific tasks we employed and the cortical areas studied.

      Navigating to reward locations based on sensory cues is a behavior that is crucial for survival and amenable to a head-fixed laboratory setting in virtual reality for mice. In this context of goal-directed navigation based on sensory cues, we chose to center our study on posterior cortical association areas, PPC and RSC, for several reasons. RSC has been shown to be crucial for navigation across species, poised to enable the transformation between egocentric and allocentric reference frames and to support spatial memory across various timescales (Alexander & Nitz, 2015; Fischer et al., 2020; Pothuizen et al., 2009; Powell et al., 2017). It furthermore has been shown to be involved in cognitive processes beyond spatial navigation, such as temporal learning and value coding (Hattori et al., 2019; Todd et al., 2015), and is emerging as a crucial region for the flexible integration of sensory and internal signals (Stacho & Manahan-Vaughan, 2022). It thus is a prime candidate area in the study of how cognitive experience may affect cortical involvement in goal-directed navigation.

      RSC is heavily interconnected with PPC, which is generally thought to convert sensory cues into actions (Freedman & Ibos, 2018) and has been shown to be important for navigation-based decision tasks (Harvey et al., 2012; Pinto et al., 2019). Specific task components involving short-term memory have been suggested to cause PPC to be necessary for a given task (Lyamzin & Benucci, 2019), so we chose such task components in our complex tasks to maximize the likelihood of large PPC involvement to compare the simple task to.

      One such task component is a delay period between cue and the ultimate choice report, which is a common design in decision tasks (Goard et al., 2016; Harvey et al., 2012; Katz et al., 2016; Pinto et al., 2019). We agree with the reviewer that traditionally such a task would be referred to as a working-memory task. However, we refrain from using this terminology because it may cause readers to expect that to solve the task, mice use a working-memory dependent strategy in its strictest and most traditional sense, that is mice show no overt behaviors indicative of the ultimate choice until the end of the delay period. If the ultimate choice is apparent earlier, mice may use what is sometimes referred to as an embodiment-based strategy, which by some readers may be seen as precluding working memory. Indeed, in new choice-decoding analyses from the mice’s running patterns, we show that mice start running towards the side of the ultimate choice during the cue period already (Figure 1—figure supplement 1). Regardless of these seemingly early choices, however, we crucially have found much larger performance decrements from inhibition in mice performing the delay task compared to mice performing the simple task, along with lower overall task performance in the delay task, indicating that the insertion of a delay period increased subjective task difficulty. As traditional working-memory versus embodiment-based strategies are not the focus of our study here and do not seem to inform the performance decrements from inhibition, we chose to label the task descriptively with the crucial task parameter rather than with the supposedly underlying cognitive process.

      For the switching task, we appreciate that the reviewer sees similarities to a two-armed bandit task. However, in a two-armed bandit task, rewards are typically delivered probabilistically, whereas in our task, cue and action values are constant within each of the two rule blocks, and only the rule, i.e. the cue-choice association, reverses across blocks. This is a crucial distinction because in our design, blocks of Rule A in the switching task are identical to the simple task, with fixed cue-choice associations and guaranteed reward delivery if the correct choice is made, allowing a fair comparison of cortical involvement across tasks.
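
      The distinction drawn here can be made concrete with a small sketch. The cue labels, the left/right mapping, and the function names below are hypothetical placeholders chosen for illustration. Only the structure follows the description above: deterministic reward under a block-wise rule that reverses, in contrast to probabilistic payoff per arm in a bandit.

```python
import random

# Hypothetical cue labels and cue-choice mappings (assumptions, not from the paper)
RULE_A = {"cue_1": "left", "cue_2": "right"}   # also the fixed rule of the simple task
RULE_B = {"cue_1": "right", "cue_2": "left"}   # reversed cue-choice association


def switching_task_reward(rule, cue, choice):
    """Deterministic reward: a correct choice under the current rule always pays."""
    return 1 if rule[cue] == choice else 0


def bandit_reward(choice, reward_prob, rng):
    """Two-armed bandit: each arm pays off only with some probability."""
    return 1 if rng.random() < reward_prob[choice] else 0


rng = random.Random(0)
# In a Rule A block the contingencies match the simple task exactly
simple_like = switching_task_reward(RULE_A, "cue_1", "left")
# After a rule switch, the same cue-choice pair is no longer rewarded
after_switch = switching_task_reward(RULE_B, "cue_1", "left")
# In a bandit, even the better arm can go unrewarded on a given trial
bandit_outcome = bandit_reward("left", {"left": 0.8, "right": 0.2}, rng)
```

      Under this sketch, a Rule A block and the simple task are the same function call, which is the property that allows the comparison of cortical involvement across tasks.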

      We have now heavily revised the introduction, results, and discussion sections of the manuscript to better explain the motivation for the tasks and the investigated brain areas. These revisions cover all the points mentioned in this response.

      Furthermore, we agree with the reviewer that the three tasks are qualitatively different and likely depend on at least partially dissociable circuits. We consider the large differences in cortical inhibition effects between the simple and the complex tasks as evidence for this notion. We also want to highlight that in fact, we performed task-specific optogenetic manipulations presented in the Supplementary Material to further understand the involvement of different areas in task-specific processes. In what is now Figure 1—figure supplement 4, we restricted inhibition in the delay task to either the cue period only or delay period only, finding that interestingly, PPC or RSC inhibition during either period caused larger performance drops than observed in the simple task. We also performed epoch-specific inhibition of PPC in the switching task, targeting specifically reward and inter-trial-interval periods following rule switches, in what is now Figure 1—figure supplement 5. With such PPC inhibition during the ITI, we observed no effect on performance recovery after rule switches and thus found PPC activity to be dispensable for rule updates.

      For the working-memory task we do not know the duration of the delay, but this really is critical information; by definition, performance in such a task is delay-dependent, and this is not explored in the paper.

      We thank the reviewer for pointing out the lack of information on delay duration and have now added this to the Methods section.

      We agree that in classical working memory tasks where the delay duration is purely defined by the experimenter and varied throughout a session, performance is typically dependent on delay duration. However, in our delay task, the delay distance is kept constant, and thus the delay is not varied by the experimenter. Instead, the time spent in the delay period is determined by the mouse, and the only source of variability in the time spent in the delay period is minor differences in the mice’s running speeds across trials or sessions. Notably, the differences in time in the delay period were greatest between mice because some mice ran faster than others. Within a mouse, the time spent in the delay period was generally rather consistent due to relatively constant running speeds. Also, because the mouse had full control over the delay duration, it could very well speed up its running if it started to forget the cue and run more slowly if it was confident in its memory. Thus, because the delay duration was set by the mouse and not the experimenter, it is very challenging or impossible to interpret the meaning and impact of variations in the delay duration. Accordingly, we had no a priori reason to expect a relationship between task performance and delay duration once mice have become experts at the delay task. Indeed, we do not see such a relationship in our data (see plot here, n = 85 sessions across 7 mice). In order to test the effect of delay duration on behavioral performance, we would have to systematically change the length of the delay period in the maze, which we did not do and which would require an entirely new set of experiments.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      We acknowledge that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. So trials with poor performance are thus providing an essential feedback which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      We completely agree with the reviewer that the periods following rule switches are an essential part of the switching task and of high interest. Indeed, ongoing work in the lab is carefully quantifying the mice’s strategy in this task and exploring how mice use errors after switches to update their belief about the rule. In this project, however, a detailed quantification of switching task strategy seemed beyond the scope because our focus was on training history and not on the specifics of each task. While we agree with the reviewer about the interesting nature of the switching period, it would be too much for a single paper to investigate the detailed mechanisms of each task on top of what we already report for training history. Instead, we have now added quantifications of performance recovery after rule switches in Figure 1—figure supplement 2, showing that rule switches cause below-chance performance initially, followed by recovery within tens of trials.

      2) Training history vs learning sets vs behavioral flexibility:

      The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are just simply provided with a cognitive challenge assessing behavioral flexibility given that they must identify the new rule while restraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      We thank the reviewer for raising these interesting ideas. We have now evaluated these ideas in the context of our experimental design and results. One of the main points to consider is that for mice transitioned from either of the complex tasks to the simple task, the simple task is not a novel task, but rather a well-known simplification of the previous tasks. Mice that are experts on the delay task have experienced the simple task, i.e. trials without a delay period, during their training procedure before being exposed to delay periods. Switching task expert mice know the simple task as one rule of the switching task and have performed according to this rule in each session prior to the task transition. Accordingly, upon the transition to the simple task, both delay task expert mice and switching task expert mice perform at very high levels on the very first simple task session. We now quantify and report this in Figure 2—figure supplement 1 (A, B). This is crucial to keep in mind when assessing ‘learning sets’ or ‘behavioral flexibility’ as possible explanations for the persistent cortical involvement after the task transitions. In classical learning sets paradigms, animals are exposed to a series of novel associations, and the learning of previous associations speeds up the learning of subsequent ones (Caglayan et al., 2021; Eichenbaum et al., 1986; Harlow, 1949). This is a distinct paradigm from ours because the simple task does not contain novel associations that are new to the mice already trained on the complex tasks. Relatedly, the simple task is unlikely to present a challenge of behavioral flexibility to these mice given our experimental design and the observation of high simple task performance in the first session after the task transition.

      We now clarify these points in the introduction, results, and discussion sections, also acknowledging that it will be of interest for future work to investigate how learning sets may affect cortical task involvement.

      3) Calcium imaging data versus interventions:

      The value of the calcium imaging data is not entirely clear. Does this approach bring a new point to consider to interpret or conclude on behavioral data or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue, does this really reflect training history, or that a new rule now must be implemented or something else? I don't really get how this correlative approach can help to address this issue.

      We thank the reviewer for pointing out that the relationship between the inhibition dataset and calcium imaging dataset is not clear enough. We restricted analyses of inhibition and calcium imaging data in the switching task to the identical cue-choice associations as present in the simple task (i.e. Rule A trials of the switching task). We did this because we sought to make the fairest and most convincing comparison across tasks for both datasets. However, we can now see that not reporting results with trials from the other rule causes concerns that the reported differences across tasks may only hold for a specific subset of trials.

      We have now added analyses of optogenetic inhibition effects and calcium imaging results considering Rule B trials. In Figure 1—figure supplement 2, we show that when considering only Rule B trials in the switching task, effects of RSC or PPC inhibition on task performance are still increased relative to the ones observed in mice trained on and performing the simple task. We also show that overall task performance is lower in Rule B trials of the switching task than in the simple task, mirroring the differences across tasks when considering Rule A trials only.

      We extended the equivalent comparisons to the calcium imaging dataset, only considering Rule B trials of the switching task in Figure 4—figure supplement 3. With Rule B trials only, we still find larger mean activity and trial-type selectivity levels in RSC and PPC, but not in V1, compared to the simple task, as well as lower noise correlations. We thus find that our conclusions about area necessity and activity differences across tasks hold for Rule B trials and are not due to only considering a subset of the switching task data.

      In Figure 4—figure supplement 4, we further leverage the inclusion of Rule B trials and present new analyses of different single-neuron selectivity categories across rules in the switching task, reporting a prevalence of mixed selectivity in our dataset.

      Furthermore, to clarify the link between the optogenetic inhibition and the calcium imaging datasets, we have revised the motivation for the imaging dataset, as well as the presentation of its results and discussion. Investigating an area’s neural activity patterns is a crucial first step towards understanding how differential necessity of an area across tasks or experience can be explained mechanistically on a circuit level. We now elaborate on the fact that mechanistically, changes in an area’s necessity may or may not be accompanied by changes in activity within that area, as previous work in related experimental paradigms has reported differences in necessity in the absence of differences in activity (Chowdhury & DeAngelis, 2008; Liu & Pack, 2017). This phenomenon can be explained by differences in the readout of an area’s activity. We now make more explicit that in contrast to the scenario where only the readout changes, we find an intriguing correspondence between increased necessity (as seen in the inhibition experiments) and increased activity and selectivity levels (as seen in the imaging experiments) in cortical association areas depending on the current task and previous experience. Rather than attributing the increase in necessity solely to these observed changes in activity, we highlight that in the simple task condition already, cortical areas contain a high amount of task information, ruling out the idea that insufficient local information would cause the small performance deficits from inhibition. Our results thus suggest that differential necessity across tasks and experience may still require changes at the readout level despite changes in local activity. We view our imaging results as an exciting first step towards a mechanistic understanding of how cognitive experience affects cortical necessity, but we stress that future work will need to test directly the relationship between cortical necessity and various specific features of the neural code.

      Reviewer #2 (Public Review):

      The authors use a combination of optogenetics and calcium imaging to assess the contribution of cortical areas (posterior parietal cortex, retrosplenial cortex, S1/V1) on a visual-place discrimination task. Head-fixed mice were trained on a simple version of the task where they were required to turn left or right depending on the visual cue that was present (e.g. X = go left; Y = go right). In a more complex version of the task the configurations were either switched during training or the stimuli were only presented at the beginning of the trial (delay).

      The authors found that inhibiting the posterior parietal cortex and retrosplenial cortex affected performance, particularly on the complex tasks. However, previous training on the complex tasks resulted in more pronounced impairments on the simple task than when behaviourally naïve animals were trained/tested on a simple task. This suggests that the more complex tasks recruit these cortical areas to a greater degree, potentially due to increased attention required during the tasks. When animals then perform the simple version of the task their previous experience of the complex tasks is transferred to the simple task resulting in a different pattern of impairments compared to that found in behaviorally naïve animals.

      The calcium imaging data showed a similar pattern of findings to the optogenetic study. There was overall increased activity in the switching tasks compared to the simple tasks, consistent with the greater task demands. There was also greater trial-type selectivity in the switching task compared to the simple task. This increased trial-type selectivity in the switching tasks was subsequently carried forward to the simple task so that activity patterns were different when animals performed the simple task after experiencing the complex task compared to when they were trained on the simple task alone.

      Strengths:

      The use of optogenetics and calcium-imaging enables the authors to look at the requirement of these brain structures both in terms of necessity for the task when disrupted as well as their contribution when intact.

      The use of the same experimental set up and stimuli can provide a nice comparison across tasks and trials.

      The study nicely shows that the contribution of cortical regions varies with task demands and that longer-term changes in neuronal responses can transfer across tasks.

      The study highlights the importance of considering previous experience and exposure when understanding behavioural data and the contribution of different regions.

      The authors include a number of important controls that help with the interpretation of the findings.

      We thank the reviewer for pointing out these strengths in our work and for finding our main conclusions supported.

      Weaknesses:

      There are some experimental details that need to be clarified to help with understanding the paper in terms of behavior and the areas under investigation.

      The use of the same stimuli throughout is beneficial as it allows direct comparisons with animals experiencing the same visual cues. However, it does limit the extent to which you can extrapolate the findings. It is perhaps unsurprising to find that learning about specific visual cues affects subsequent learning and use of those specific cues. What would be interesting to know is how much of what is being shown is cue specific learning or whether it reflects something more general, for example schema learning which could be generalised to other learning situations. If animals were then trained on a different discrimination with different stimuli would this previous training modify behavior and neural activity in that instance. This would perhaps be more reflective of the types of typical laboratory experiments where you may find an impairment on a more complex task and then go on to rule out more simple discrimination impairments. However, this would typically be done with slightly different stimuli so you don't introduce transfer effects.

      We agree with the reviewer that investigating the effects of schema learning on cortical task involvement is an exciting future direction and have now explicitly mentioned this in the Discussion section. As the reviewer points out, however, our study was not designed to test this idea specifically. Because investigating schema learning would require developing and implementing an entirely new set of behavioral task variants, we feel this is beyond the scope of the current work. As to the question of how generalized the effects of cognitive experience are, our data in the run-to-target task suggest that if task settings are sufficiently distinct, cortical involvement can be similarly low regardless of complex task experience (now Figure 3—figure supplement 1). This finding is in line with recent work (Pinto et al., 2019), in which cortical involvement appears to change rapidly depending on major differences in task demands. However, work in MT has shown that previous motion discrimination training using dots can alter MT involvement in motion discrimination of gratings (Liu & Pack, 2017), highlighting that cortical involvement need not be tightly linked to the sensory cue identity.

      It is not clear whether length of training has been taken into account for the calcium imaging study given the slow development of neural representations when animals acquire spatial tasks.

      We apologize that the training duration and the temporal relationship between task acquisition and calcium imaging were not documented for the calcium imaging dataset. Please see our detailed reply under the ‘recommendations for the authors’ from Reviewer 2 below.

      The authors are presenting the study in terms of decision-making, however, it is unclear from the data as presented whether the findings specifically relate to decision making. I'm not sure the authors are demonstrating differential effects at specific decision points.

      We understand that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      While we removed the emphasis on the decision-making process in our tasks, we found the reviewer’s suggestion to measure ‘decision points’ a useful additional behavioral characterization across tasks. So, we quantified how soon a mouse’s ultimate choice can be decoded from its running pattern as it progresses through the maze towards the Y-intersection. We now show these results in Figure 1—figure supplement 1. Interestingly, we found that in the delay task, choice decoding accuracy was already very high during the cue period before the onset of the delay. Nevertheless, we had shown that overall task performance and performance with inhibition were lower in the delay task compared to the simple task. Also, in segment-specific inhibition experiments, we had found that inhibition during only the delay period or only the cue period decreased task performance substantially more than in the simple task, thus finding an interesting absence of differential inhibition effects around decision points. Overall, how early a mouse made its ultimate decision did not appear predictive of the inhibition-induced task decrements, which we also directly quantify in Figure 1—figure supplement 1.

    2. Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1, and more drastic effects were observed in presumably more complex tasks (i.e. working-memory and bandit task). If mice were trained with these more complex tasks prior to training in the simple discrimination task, then the same manipulations produced large deficits, suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view, with differential signaling in these cortical areas depending on the task considered and the order in which they were presented to the animals. Overall the study is interesting and the fact that all tasks were assessed relying on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternate ways of interpreting the data than "training history".

      1) Theoretical framework:
      The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. delay condition which is indeed typically assessed in a Y- or a T-maze in rodents) or a two-armed bandit task (i.e. the switching task), respectively. So these three tasks are qualitatively different and are therefore reliant on at least partially dissociable neural circuits, and this should be clearly analyzed to explain the rationale of the focus on the three cortical regions of interest. For the working-memory task we do not know the duration of the delay, but this really is critical information; by definition, performance in such a task is delay-dependent, and this is not explored in the paper.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. So trials with poor performance are thus providing an essential feedback which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      2) Training history vs learning sets vs behavioral flexibility:
      The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are just simply provided with a cognitive challenge assessing behavioral flexibility, given that they must identify the new rule while refraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      3) Calcium imaging data versus interventions:
      The value of the calcium imaging data is not entirely clear. Does this approach bring a new point to consider to interpret or conclude on behavioral data or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue, does this really reflect training history, or that a new rule now must be implemented or something else? I don't really get how this correlative approach can help to address this issue.

    1. Reviewer #1 (Public Review): 

      This study compares concentrations of immune mediators in vaginal samples of young women who report having had or report not having had vaginal sex. The study finds that the concentration of many immune markers is higher in samples of women who report having had sex than in samples of women who report not yet having had sex. While the results are interesting and suggestive, I do not believe this result necessarily indicates that vaginal sex increases levels of these immune mediators (a causal relationship) and that the evidence presented here is strong enough to draw this conclusion. 

      This study presents many methodological strengths. The sample size is amply sufficient to achieve high statistical power for this research question. A particular strength of this analysis is the relatively large number of participants who provided paired before and after sex samples. These samples are particularly valuable because stronger conclusions can be drawn from them, as their comparison is less likely to be confounded by unmeasured confounders. The statistical methods are largely appropriate for the research question, with the use of random effects to account for the correlation in multiple measures per participant. 

      The reason I would not draw causal conclusions from this analysis is that there is a high potential for unmeasured confounding of the association between sex and the concentration of immune mediators. The variables that were included in the multivariable analysis were for the most part not confounders, so the authors cannot claim that their results are free from potential confounding. Confounders are in general variables which are common causes of both the exposure of interest (vaginal sex) and the outcome (level of immune markers), and which are not on the causal pathway and are not a downstream effect of the outcome (inverse causality). The only variable included that is a potential confounder is age. Most other variables (pregnancy, contraception, Nugent score, Chlamydia infection, and HSV-2 seropositivity) are either potential mediators of the effect of sex or downstream effects of the level of immune markers. It does not follow that adjustment for these variables would necessarily lead to an underestimation of the causal effect, as it is possible some of these variables have complex relationships with immune mediators, so it is difficult to predict how adjusting for these variables would influence results. Some of these variables are also potentially colliders, so adjustment for them may lead to bias (see an introduction to this topic in Holmberg MJ, Andersen LW. Collider Bias. JAMA. 2022;327(13):1282-1283. doi:10.1001/jama.2022.1820). There is no consideration of general social determinants of health that are more likely to be confounders because they potentially influence both sexual behavior and the immune system: socioeconomic status, ethnicity, education, employment, housing, food security, access to health care, etc. There is overwhelming evidence that young people who are sexually active tend to have very different socioeconomic characteristics than young people who are not sexually active. It is therefore difficult to assess whether the higher level of immune markers in women who are sexually active truly represents a causal effect of sex or simply reflects differences in the type of women who have sex.

      The paired analysis also suggests that the main analysis is likely to be confounded. The evidence from the paired analysis is much stronger than the evidence from the unpaired main analysis because the paired analysis inherently adjusts for many unmeasured confounders that lead to women having sex by a certain age; the differences in paired samples are likely much closer to the causal effect of sex than the differences from the unpaired samples. We see that, in the paired analysis, the differences in levels of immune mediators before and after sex are systematically much smaller and non-significant for most immune markers. This suggests to me that the main analysis is confounded and overestimates the effect of sex on immune markers. If there is a causal effect, it is likely to be much smaller than the one estimated in the main unpaired analysis.
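
      The reasoning in the paragraph above can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a reanalysis of the study: the confounder `u`, the logistic exposure model, the noise levels and the values of `true_effect` and `confounding` are all hypothetical. It shows how a between-person (unpaired) comparison can absorb an unmeasured confounder, while a within-person (paired) difference cancels it.

```python
import math
import random
import statistics


def simulate_paired_vs_unpaired(n=5000, true_effect=0.2, confounding=1.0, seed=0):
    """Return (unpaired_estimate, paired_estimate) of the exposure effect.

    All parameters are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    exposed_after = []      # marker level after exposure, exposed people only
    unexposed_levels = []   # marker level in people who are never exposed
    paired_diffs = []       # after minus before, within the same exposed person

    for _ in range(n):
        u = rng.gauss(0, 1)  # unmeasured confounder
        # The confounder raises the probability of exposure ...
        is_exposed = rng.random() < 1 / (1 + math.exp(-u))
        # ... and also raises the person's stable baseline marker level.
        baseline = confounding * u + rng.gauss(0, 0.5)
        before = baseline + rng.gauss(0, 0.3)
        if is_exposed:
            after = baseline + true_effect + rng.gauss(0, 0.3)
            exposed_after.append(after)
            paired_diffs.append(after - before)
        else:
            unexposed_levels.append(before)

    # Unpaired: compare different people, so baseline differences leak in.
    unpaired = statistics.mean(exposed_after) - statistics.mean(unexposed_levels)
    # Paired: each person is their own control, so the baseline cancels.
    paired = statistics.mean(paired_diffs)
    return unpaired, paired


if __name__ == "__main__":
    unpaired, paired = simulate_paired_vs_unpaired()
    print(f"unpaired estimate: {unpaired:.2f}")
    print(f"paired estimate:   {paired:.2f}")
```

      In this toy setting the paired estimate stays close to `true_effect`, whereas the unpaired estimate is inflated by the confounder. The sketch does not model an effect of time since exposure, which is the alternative explanation the authors raise.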

      The authors argue that the smaller effects seen in the paired analysis might be due to an effect of time, where samples closer to the start of sex show smaller differences. However, I would need more evidence to be convinced of this. Notably, they use a spline analysis in Figure 4 to show the effect of time since vaginal sex. However, I would have liked to see the p-values for the time-dependent spline effect, in order to see whether the data supports that a difference in slopes before and after sex significantly improves the model. I suspect many of the splines are not significant and may not lend strong support to the hypothesis that time since sex has an effect. It is however difficult to assess this visually without a formal test. 

      While the results from the systematic review and meta-analysis are interesting and show that at least two other studies have shown similar results, I wonder whether these other studies do not have similar issues of confounding. The other previous studies have even fewer paired samples, so are likely to have weaker evidence than the current study. 

      In summary, I think this study has some important methodological strengths in terms of sampling and study design. However, I believe the interpretation of the results should be more tempered and cautious; while there are differences in levels of immune markers in women who have had and not had sex, there is not to my mind sufficient evidence that this difference is the result of a causal effect of initiation of vaginal sex, as there is likely to be some collider bias and unmeasured residual confounding in the analysis.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Radtke et al. use a model of helminth infection in IL-4-IRES-eGFP (4get) mice, in which transcription at the Il4 locus is reported by eGFP, in order to define the transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in the mesenteric lymph nodes (mLN) and lungs. By infecting 4get mice with the hookworm Nippostrongylus brasiliensis, which is well described to induce a robust type 2 immune response, the authors isolated and sorted eGFP+CD4+ T cells from the mLN and lungs at day 10 post infection and performed single cell RNA-seq analysis using the 10X Chromium platform. Transcriptional profiling of activated CD4+ T cells with scRNA-seq has been performed in a murine model of allergic asthma, including the lung and lung-draining lymph nodes, but this study involved unbiased capture of all activated CD4+ T cells (Tibbitt et al., Immunity, 2019). Radtke et al. have used a distinct model with Nippostrongylus brasiliensis and have focused on sorting Il4-licensed, CD4+ T cells, allowing for a greater number of captured CD4+ T cells with a "type 2" lymphocyte program for single cell analysis. Furthermore, this study sought to identify distinct and overlapping transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in two "distant" tissues. In support of such an approach, there is growing evidence for tissue-specific and model-specific features of CD4+ T cell differentiation (Poholek, Immunohorizons, 2021; Hiltensperger et al., Nature Immunol, 2021; Kiner et al., Nature Immunol, 2021).

      Upon dimension reduction, the authors found mLN- and lung-specific clusters, including two juxtaposed clusters that form a "bridge" between the mLN and lung compartments, suggesting immigrating and/or emigrating cells. Consistent with previous studies, the dominant lung cluster (L2) exhibited unique expression of Il5 and Il13, enhanced IL-33 and IL-2 signaling, and exhibited an effector/resident memory profile. The authors did find a small cluster in the mLN (ML4) with an effector/resident memory signature that also expressed CCR9, suggesting the potential for homing to the gut mucosa. Whether this population is specific to the mLN or would also be found in the lung-draining lymph nodes remains unclear. In the mLN, the authors also describe an iNKT cell cluster with CCR9 expression and a CD4+ T cell cluster with a myeloid gene signature, but the significance of these populations remains unclear.

      The authors then use RNA velocity analysis to infer the developmental trajectory of Il4-licensed, CD4+ T cells from the two tissue sites. Consistent with previous studies, the authors found that T cell proliferation was associated with fate decisions. Furthermore, among the two lung CD4+ T cell clusters, L1 represents highly differentiated, effector Th2 cells while L2, which is juxtaposed to the mLN clusters, represents a population likely entering the lung with the potential to differentiate into L1 cells.

      Next, the authors perform TCR repertoire analysis. The authors identified a broad TCR repertoire with the majority of distinct TCRs being found in only one cell. Among the TCRs found in more than one cell, a substantial number of clones can be found in both tissue sites, which is consistent with the findings that individual CD4+ T cell clones can produce different types of effector cells (Tubo et al., Cell, 2013). The authors find significant overlap of clones between the mLN and lung. In addition, they also identify clones enriched in a particular site and suggest that this represents local expansion. However, an alternative possibility is that certain CD4+ T cell clones are expanded at a particular site because the specific TCR preferentially instructs a particular cell fate. For example, fate-mapping of individual naïve CD8+ T cells suggests that certain T cell clones exhibit a greatly heightened capacity to form tissue-resident memory T cells over other cell fates (Kok et al., J Exp Med, 2020). Lastly, the authors analyze CDR3 sequences, finding the most abundant CDR3 motif belonging to the invariant TCRa chain of iNKTs. Among conventional CD4+ T cells, the abundant CDR3 motifs were not restricted to an exact TCRa/TCRb combination beyond a slight preferential usage of the Trbv1 gene. While TCR repertoire analysis allows for defining clonal relatedness among Il4-licensed, CD4+ T cells, the importance and relevance of the above findings to the in vivo type 2 immune response remain unclear.

      There are several limitations of the study:

      (1) The authors use the term "Th2 cells" to describe all Il4-licensed, CD4+ T cells. While CD4+ T helper cell nomenclature has evolved, Th2 cells and Tfh2 cells are generally used to describe distinct subsets driven by unique transcriptional programs (Ruterbusch et al., Annu Rev Immunol, 2020). While previous data suggested that Tfh2 cells are precursors to effector Th2 cells, subsequent studies support a model in which Tfh2 and Th2 cells represent distinct developmental pathways and should be designated as distinct subsets (Ballesteros-Tato et al., Immunity, 2016; Tibbitt et al., Immunity, 2019). Consequently, the authors' broad use of "Th2 cells" and a description of "Th2 cell heterogeneity" includes CD4+ T cell subsets with distinct developmental pathways that include canonical Th2 cells as well as Tfh2 and iNKT cells. The clarity of the manuscript would be improved by describing eGFP+CD4+ cells as Il4-licensed, CD4+ T cells rather than Th2 cells.

      We thank the reviewer for the helpful comment and state now that our IL-4 reporter positive population also includes cells that don’t meet the Th2 criteria in the introduction (lines 76-78).

      (2) The authors used perfused lungs to isolate Il4-licensed, CD4+ T cells for scRNA-seq of "Th2 cells" in the lung tissue. However, previous studies indicate that leukocytes, including CD4+ T cells, in lung vasculature are not completely removed by perfusion, which confounds the interpretation of a tissue cell profile due to contaminating circulating cells (Galkina, E et al., J Clin Invest, 2005; Anderson, KG et al., Nat Protoc, 2014). This is particularly true in the lung and relevant as the authors found a lung cluster (L2) with a circulating signature and suggested that L2 may represent recent immigrant "Th2 cells". Thus, it is unclear whether the L2 cluster identifies immigrant Th2 cells or simply reflects the circulating Th2 cells trapped in the lung vasculature. The study would benefit from using intravascular staining to discriminate cells within the lungs from those in the circulation (Anderson, KG et al., Nat Protoc, 2014) for the proper isolation of Il4-licensed lung CD4+ T cells to truly define immigrant "Th2 cells" within the lung parenchyma.

      Following the reviewer's suggestion, we performed an intravascular staining to discriminate cells within the lungs from those in the circulation (new Figure 2—figure supplement 1). According to the vascularity staining method (with slightly increased time between i.v. and sacrifice compared to Anderson, KG et al., Nat Protoc, 2014 for higher probability of successful staining), the L2 lung cluster is a mixture of circulating cells and immigrating cells, which we describe in the text (lines 210-213). The finding that the cells from the vasculature and the cells we classified as “migrating” seem to cluster together based on the similarity of their expression profiles on our UMAP further supports the classification of the L2 tissue fraction as “recent immigrants”. We thank the reviewer for this helpful comment, which improved the quality of the manuscript.

      (3) The authors describe T cell exchange/trafficking across organs. However, in general, inter-organ trafficking refers to lymphocyte trafficking between distinct non-lymphoid tissues, rather than trafficking between lymph nodes and peripheral tissues (Huang et al., Science, 2018). Rather than inter-organ trafficking, the authors have described shared and distinct features of Il4-licensed, CD4+ T cells from a draining lymph node of one organ (gut) and a distant non-lymphoid organ (lung). The experimental approach used makes interpretation of some of the findings challenging. Specifically, canonical effector Th2 cell differentiation is well described to occur via two checkpoints, including the draining lymph node and the peripheral (non-lymphoid) tissue (Liang et al., Nature Immunol, 2011; Van Dyken et al., Nature Immunol, 2016; Tibbitt et al., Immunity, 2019). In the draining lymph node, Th2 cells acquire the capacity to express IL-4 alone, but do not complete effector Th2 cell differentiation until trafficking to the inflamed peripheral tissues and receiving additional inflammatory signals. Consequently, it is unclear whether the differences identified in the mesenteric lymph node and lungs simply reflect well-described differences between the two Th2 cell checkpoints or organ-specific differences (gut vs lung). Il4-licensed, CD4+ T cells from the intestinal mucosa and lung-draining lymph node would also be needed to truly define organ-specific differences during helminth infection.

      Following the reviewer's suggestion, we now avoid the term “inter-organ trafficking” and have replaced it with “at distant sites” in the title. As the reviewer points out, we chose the setup of comparing a lymphoid and a non-lymphoid organ to acquire a broad picture of Th2 developmental stages in Nb infection. The limited overlap in clusters on the UMAP shows that expression profiles between MLN and lung strongly differ. However, this notion is not in conflict with cells of both organs being in a different developmental stage. We added information to highlight it in the manuscript (lines 99-101). Lung and MLN (rather than medLN and MLN) were selected to enable clonal relatedness/distribution analysis of T cells at distant sites. As part of the revision we additionally provide newly generated single cell sequencing data that compares medLN and MLN cells at day 10 after Nb infection and find that UMAP clusters are largely overlapping between medLN and MLN (new Figure 1—figure supplement 3). This suggests that there is no broad medLN/MLN site-specific signature present that would force the medLN and MLN cells to cluster apart. Addition of the newly generated medLN/MLN data on the lung/MLN UMAP based on shared anchors (Stuart et al. Cell. 2019) also leads to a clear separation between all LN and lung cells, supporting that cells don’t cluster due to a site-specific respiratory tract vs intestinal tract signature but likely based on developmental stages (new Fig. 1C,D). Exceptions are defined effector clusters that show signs of a site-specific signature (L1 expresses Ccr8, MLN4 and MLN6 express Ccr9; differences are also suggested by clustering described in lines 247-252). A similar phenotype to the one observed on the transcriptional level is observed when we cluster medLN/MLN and lung cells based on scRNAseq-suggested surface marker expression after flow cytometric analysis, extending analysis to medLN on protein level (new Fig. 3). It would have also been interesting to include lamina propria T cells as the reviewer suggested, but we were not able to extract high quality cells at day 10 after Nb infection, which is a common limitation in the Nb model.

      (4) The study includes a single time point (day 10) whereas Tibbitt et al. performed scRNAseq in the lung and lung-draining lymph node at multiple time points during type 2 immunity (Tibbitt et al., Immunity, 2019). As a result, it remains unclear how similarities or differences between the mesenteric lymph node and lung response would change over the duration of helminth infection, especially given the helminth life cycle involves multiple infection stages.

      As part of the revision we screened for surface marker expression in the single cell sequencing dataset on transcript level and stained these on protein level (new Fig. 3 and Figure 3—figure supplement 1). This allows us to follow the populations defined by scRNAseq longitudinally (d0, d6, d8, d10) by flow cytometry during Nb infection. We compared medLN, MLN and lung. The dynamic of the response in the medLN and the MLN seems similar, with a small delay in the MLN compared to medLN.

      Nb with its relatively well defined migratory path through the body provides a relevant complex model antigen naturally present in the respiratory tract and the intestine during infection. However, analysis of complexity and relevance does often invoke limitations. While stage 4 larvae are found in lung and gut and certainly provide a shared antigen basis between both sites (migration stage from lung to intestine; Camberis et al. Curr Protoc Immunol. 2003), we also think that there is a reasonable number of antigens shared between different larval stages and antigen (either actively secreted or from dying larvae) that are systemically distributed. However, there are probably immunogenic differences between larval stages but to analyze these is beyond the scope of the manuscript.

      While, e.g., Tibbitt et al. nicely define cell clusters with a limited number of cells, they don’t include any TCR analysis and clonal information. Not much was known about the expansion of T cells in the different clusters in one organ and between organs and we provide relevant data in this regard. Furthermore, HDM as an allergy model might invoke different Th2 differentiation pathways, as, e.g., Tfh13 cells are found in allergic settings but not in worm models (Gowthaman U, Science. 2019). With our approach on single cell level we were able to show effective distribution of a number of T cell clones in a highly heterogeneous immune response and describe and functionally validate successfully expanded clones / expanded TCR chains later on (e.g. new Fig. 6). This kind of analysis has not been performed for a worm model before.

      (5) The study analyzed one scRNA-seq experiment that included two mice without validation via flow cytometry or other method to infer a role of a particular finding to the type 2 immune response in vivo.

      As noted above, we screened for surface marker expression in the single cell sequencing dataset on transcript level and measured these on protein level by flow cytometry as the reviewer suggested. This allows us to follow the populations defined by scRNAseq longitudinally (d0, d6, d8, d10) during Nb infection (new Fig. 3). Furthermore, we added a newly generated set of scRNAseq data which confirms and extends findings made in the initial sequencing experiment (Fig. 1C,D and Figure 1—figure supplement 3). We also included validation experiments based on the performed TCR analysis and retrovirally expressed three TCRs from our study and confirm Nb specific expansion for one of them in vivo (new Fig. 6 and Figure 6—figure supplement 1).

    1. The third UDL principle is to provide multiple means of expression and action. We find it helpful to think of this as the principle that transcends social annotation: at this point, students use what they’ve learned through engagement with the material to create new knowledge. This kind of work tends to happen outside of the social annotation platform as students create videos, essays, presentations, graphics, and other products that showcase their new knowledge.

      I'm not sure I agree here as one can take other annotations from various texts throughout a course and link them together to create new ideas and knowledge within the margins themselves. Of course, at some point the ideas need to escape the margins to potentially take shape with a class wiki, new essays, papers, journal articles or longer pieces.

      Use of social annotation across several years of a program this way may help to super-charge students' experiences.

    1. “How might we, both individually and as a society, creatively generate new visions of what it means to grow old?”

      I agree with Minha's assessment of the project. Her research question is phrased perfectly for the overall topic of these combined videos. I can't stop, and I think I won't stop, thinking about what it truly means for me to age. Each voice represents a background that provides a resource for both the voice owner and the audience to answer this question. Aging for me means being more cautious with words and actions. I consciously do this because I see everyone around me go through this process and talk about it. Aging for me means looking at my grandparents and thinking about what I will do and what I will look like when I reach their age. I thought about this question a few times when I was much younger, then there was a long period of me not worrying about it at all, and in college, the question came back to me more frequently. I often ask myself if my future kids/grandkids (if I ever have them) would care about me, and life after death is something that seems to have been in my head for the longest time. Aging for me means carrying new responsibilities. I know that there are things that were acceptable when I was one year younger and became inapplicable for me the year after, and vice versa. "What it means to age?" is repeatedly asked throughout the video, motivating us to give it a try and craft our own response. This research question summarizes well the purpose that these 'storytellers' and collaborators embed in this project. Like taylortots, I may revisit this project from time to time with newer perspectives about the definition of growing old. Thank you for the insightful post!

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors set out to clarify the relationship between brain oscillations and different levels of speech (syllables, words, phrases) using MEG. They presented word lists and sentences and used task instructions to attempt to focus listeners' attention on different levels of linguistic analysis (syllables, words, phrases).

      1) I came away with mixed feelings about the task design: following each stimulus (sentence or word list), participants were asked to (a) press a button (i.e. nothing related to what they heard), (b) indicate which of two syllables was heard, (c) indicate which of two words was heard, (d) indicate which pair of words was present in the correct order. This task is the critical manipulation in the study, as it is intended to encourage (or in the authors' words, "require") participants to focus on different timescales of speech (syllable, word, and phrase, respectively). I very much like the idea of keeping the physical stimuli unchanged, and manipulating attention through task demands - an elegant and effective approach. At the same time, I have reservations about the degree to which these task instructions altered attention during listening. My intuition is that, if I were a participant, I would just listen attentively, and then answer the question about the specific level. For example, I don't know that knowing I would be doing a "word pair" task, I would be attending at a slower rate than a "word" task, as in both cases I would be motivated to understand all of the words in the sentence. I fully acknowledge my introspection (n=1) may be flawed here, but nevertheless, any additional support validating the effect of these instructions would help the interpretation of the MEG results.

      The reviewer points out that to do any task on sentences (such as a word task or a syllable task), participants’ strategy could be to understand the full meaning of the sentence and infer the lower-level properties from that understanding. We fully share this introspection, which would suggest that extracting sentence meaning is partly automatic (or at least a default mode of processing) and independent of behavioral relevance. While the reviewer sees this as a downside of the design, this is part of what our study tried to disentangle (automatic versus task-dependent processing at lower-frequency time scales). If, as the reviewer points out, all processing of sentences were automatic, we should not find any effect of task (as the task should not affect the tracking response at all). We found that overall the tracking response is robust to task-induced manipulation of attention: the main effect that MI to phrases is higher for sentences than for word lists holds across passive and task conditions. But that is not the whole story at the source level, where we do find some task effects, which indicates that task instructions do matter. This means that participants changed their strategy depending on the instructions, but that overall, tracking of linguistic structures such as phrases is automatic. We show that in the IFG, MI at phrasal time scales is stronger during the phrase task than during the other tasks. This is also reflected in stronger STG-IFG connectivity during the phrasal versus the passive task. These results speak against the interpretation that task instructions did not alter attention during listening. While there are these subtle task differences, especially in IFG, overall our findings do speak for an automatic tracking of phrasal-rate structure in sentences independent of task. We therefore concluded that “automatic understanding of linguistic information, and all the processing that this entails, cannot be countered to substantially change the consequences for neural readout, even when explicitly instructing participants to pay attention to particular time-scales” (line 548-549).

      The analysis steps generally seem sensible and well-suited to answering the main claims of the study. Controlling for power differences between conditions through matching was a nice feature.

      2) I had a concern about accuracy differences (as seen in Figure 1) across stimulus materials and tasks. In particular, for the phrase task, participants were more accurate for sentence stimuli than word list stimuli. I think this makes a lot of sense, as a coherent sentence will be easier to remember in order than a list of words. But, I did not see accuracy taken into account in any of the analyses. These behavioral differences raise the possibility that the MEG results related to the sentence > word list contrast in phrases (which seems one of the most interesting findings in IFG) simply reflect differences in accuracy.

      With the caveat of the concern regarding accuracy differences, the research goals were clear and the conclusions were generally supported by the analyses.

      Thank you for pointing this out. We have now taken accuracy into account in our analysis. It did not change any of our main findings or conclusions, and strengthened the argument that tracking of phrases in sentences vs. word lists is stronger. The influence of task difficulty is a relevant point to investigate (also see point 1 of reviewer 2 and point 4 of reviewer 3). To do so we added accuracy (per participant per condition) as a factor in the mixed model (as well as all interactions with task and condition) for the MI, power, and connectivity analyses at the phrasal rate/delta band. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.
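
      As a reading aid, the model described in this response can be written out in one plausible form. This is only a sketch under stated assumptions: the response specifies accuracy as an added factor with all interactions with task and condition, and random intercepts only; the symbols below, and the choice of participant as the grouping unit for the random intercept, are assumptions made for illustration and are not taken from the manuscript.

```latex
% y_{p,t,c}: MEG measure (MI, power, or WPLI) for participant p, task t, condition c
% a_{p,t,c}: behavioural accuracy of participant p in task t and condition c
% u_p: assumed by-participant random intercept (no random slopes)
\begin{aligned}
y_{p,t,c} ={}& \beta_0 + \beta^{T}_{t} + \beta^{C}_{c} + \beta^{A} a_{p,t,c}
             + \beta^{TC}_{t,c} + \beta^{TA}_{t}\, a_{p,t,c} + \beta^{CA}_{c}\, a_{p,t,c}
             + \beta^{TCA}_{t,c}\, a_{p,t,c} \\
           & + u_p + \varepsilon_{p,t,c},
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{p,t,c} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

      Under this reading, the three-way interaction reported for MTG corresponds to the $\beta^{TCA}_{t,c}$ terms, and the follow-up analyses per task correspond to fitting the condition and accuracy terms within a single task $t$.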

      For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition × accuracy interaction was only (uncorrected) significant for the word combination task (F(1,24.8) = 5.296, p = 0.03 (uncorrected)) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was the strongest at high accuracies (see the predicted values of the model in the figure below). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.

      MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.

      For power we did not find any effect of accuracy. For the connectivity analysis we found a significant condition × accuracy interaction in the STG-MTG connectivity (F(1, 80.23) = 5.19, p = 0.025). The condition × accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition (as the reviewer also indicated in point 1).

      After correcting for accuracy there was also a significant task × condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).

      We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.

      The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analyse steps as before. Since we reduced our degree of freedom, we could however only create random intercept and not random slope models”.

      The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracy × task × condition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition × accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).

      No relation between accuracy and power was found. For the connectivity analysis we found a significant condition × accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task × condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”

      The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found that stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant to understand sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences are accurately tracked (as measured with MI) performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”

      Reviewer #2 (Public Review):

      In a MEG study, the authors investigate as their main question whether neural tracking at the phrasal time scale reflects linguistic structure building (testing different conditions: sentences vs. word-lists) or an attentional focus on the phrasal time scale (testing different tasks, passive listening, syllable task, word task, word combination/phrasal scale task). They perform the following analyses at brain areas (ROIs: STG, IFG, MTG) of the language network: (1) Mutual information (MI) between the acoustic envelope and the delta band neuronal signals is analyzed. (2) Power in the delta band is analyzed. (3) Connectivity is analyzed using debiased WPLI. For all analyses, linear mixed-models are separately conducted for each ROI. The main finding is that the sentence compared to the word-list condition is more strongly tracked at the phrasal scale (MI). In STG the effect was task-independent; in MTG the effect only occurred for active tasks; and in IFG additionally, the word-combining/phrasal scale task resulted in higher tracking compared to all other tasks. The authors conclude that phrasal scale neural tracking reflects linguistic processing which takes place automatically, while task-related attention contributes additionally at IFG (interpreted as combinatorial hub involved in language and non-language processing). The findings are stable when power differences are controlled. The connectivity analysis showed increased connectivity in the delta band (phrasal time scale) between IFG-STG in the phrasal-scale compared to the passive task (adding to the IFG MI findings). (Additionally, they separately analyze neural tracking at the syllabic and word time scale, which however is not in the main focus).
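
      For readers unfamiliar with the connectivity measure named in this summary, the following is a minimal sketch of the standard debiased squared WPLI estimator, computed from per-trial cross-spectra at a single frequency. It is not the authors' pipeline: the function name and the toy inputs are hypothetical, and the manuscript's own toolbox, filtering, and epoching choices are not reproduced here.

```python
from typing import Sequence


def debiased_wpli_squared(cross_spectra: Sequence[complex]) -> float:
    """Debiased squared WPLI from per-trial cross-spectra at one frequency.

    Uses the pairwise form of the estimator: sums over all pairs of distinct
    trials of the imaginary parts (numerator) and of their magnitudes
    (denominator), expressed through sums and sums of squares.
    """
    imag = [c.imag for c in cross_spectra]
    sum_imag = sum(imag)                    # sum of Im(X_j)
    sum_abs = sum(abs(v) for v in imag)     # sum of |Im(X_j)|
    sum_sq = sum(v * v for v in imag)       # sum of Im(X_j)^2
    denom = sum_abs ** 2 - sum_sq
    if denom == 0:
        raise ValueError("need at least two trials with non-zero imaginary parts")
    return (sum_imag ** 2 - sum_sq) / denom


# Hypothetical toy data: a consistent phase lead gives 1.0,
# perfectly inconsistent signs of the imaginary part give -1.0.
consistent = [complex(0.5, 1.0), complex(-0.2, 2.0), complex(1.0, 3.0)]
inconsistent = [complex(0.0, 1.0), complex(0.0, -1.0)]
print(debiased_wpli_squared(consistent), debiased_wpli_squared(inconsistent))
```

      Because only the imaginary part of the cross-spectrum enters the estimate, zero-lag coupling (for example from field spread) does not contribute, which is the usual motivation for choosing this measure in MEG source-space analyses.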

      Major strength/weaknesses of the methods and results:

      1) A major strength of the results is that part of them replicate the authors' earlier findings (i.e. higher tracking at the phrasal time scale for sentences compared to word-lists; Kaufeld et al., 2020), while they complement this earlier work by showing that the effects are due to linguistic processing and not to an attentional focus on the phrasal time scale due to the task (at least in STG and MTG; while the task plays a role for the IFG tracking). Another strength is that a power control analysis is applied, which allows excluding spurious results due to condition differences in power. A weakness of the method is that analyses were applied separately per ROI, and combined across correct/incorrect trials (if I understood correctly), no trial-based analysis was conducted (which is related to how MI is computed). Furthermore, several methodological details could be clarified in the manuscript.

      The authors achieved their aims by providing evidence that neuronal tracking at the phrasal time scale in STG and MTG depends on the presence of linguistic information at this scale rather than indicating an attentional focus on this time scale due to a specific task. Their results support the conclusion. Results would be strengthened by showing that these effects are not impacted by different amounts of correct/incorrect trials across conditions (if I understood that correctly).

      We thank the reviewer for her comments. It is correct that we collapsed across the correct and incorrect trials. This had various reasons (also see point 2 and 9 of reviewer 1 and point 4 of reviewer 3). First, our tasks function solely to direct participants’ attention to the various linguistic representations (syllables, words, phrases) and the timescales that they occur on. The three tasks are in a sense control tasks that let us study the tracking response and manipulate attention while spoken language comprehension occurs, rather than a case where the neural response to the tasks is itself to be studied. For example, in a typical working memory paradigm, it is only during correct trials that the relevant cognitive process occurs. In contrast, in our paradigm, it is likely that the spoken stimuli are heard and processed; in other words, sentence comprehension and word list perception occur, even during incorrect trials in the syllable condition. As such, we do not expect MI tracking responses to explain the behavioral data. However, we agree it is crucially important to show that MI differences are not a function of task performance differences.

      Second, there are clear differences in the difficulty level of the trials within conditions. For example, if the target question was related to the last part of the audio fragment, the task was much easier than when it was related to the beginning of the audio fragment. In the syllable task, if syllables also were (by chance) a part-word, the trial was also much easier. If we were to split the trials into correct and incorrect, we would not solely isolate processes related to accurate processing of the speech fragments, but would also confound the analysis with the individual difficulty level of the trials.

      To acknowledge this, we added this limitation to the methods. The methods now read: “Note that different trials within a task were not matched for task difficulty. For example, in the syllable task syllables that make a word are much easier to recognize than syllables that do not make a word. Additionally, trials pertaining to the beginning of the sentence are more difficult than ones related to the end of the sentence due to recency effects.”.

      To still investigate if overall accuracy influenced the results we did add accuracy (across participants) to the mixed models. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.

      For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition × accuracy interaction was only (uncorrected) significant for the word combination task (F(1,24.8) = 5.296, p = 0.03 (uncorrected)) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was the strongest at high accuracies (see the predicted values of the model in the attached figure). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.

      For power we did not find any effect of accuracy. For the connectivity analysis we found a significant condition × accuracy interaction in the STG-MTG connectivity (F(1, 80.23) = 5.19, p = 0.025). The condition × accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure below; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition.

      MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.

      After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).

      We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.

      The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analyse steps as before. Since we reduced our degree of freedom, we could however only create random intercept and not random slope models”.

      The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracy × task × condition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition × accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).

      No relation between accuracy and power was found. For the connectivity analysis we found a significant condition × accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task × condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”

      The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found that stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant to understand sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences are accurately tracked (as measured with MI) performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”

      The findings are an important contribution to the ongoing debate in the field whether neuronal tracking at the phrasal time scale indicates linguistic structure processing or more general processes (e.g. chunking).

      Reviewer #3 (Public Review):

      This manuscript presents a MEG study aiming to investigate whether neural tracking of phrasal timescales depends on automatic language processing or specific tasks related to temporal attention. The authors collected MEG data of 20 participants as they listened to naturally spoken sentences or word lists during four different tasks (passive listening vs. syllable task vs. word task vs. phrase task). Based on mutual information and connectivity analyses, the authors found that (1) neural tracking at the phrasal band (0.8-1.1 Hz) was significantly stronger for the sentence condition compared to the word list condition across the classical language network, i.e., STG, MTG, and IFG; (2) neural tracking at the phrasal band was (at least at trend level) stronger for the phrase task than for the other tasks in the IFG; (3) the IFG-STG connectivity was increased in the delta-band for the phrase task. Ultimately, the authors concluded that neural tracking of phrasal timescales relied on both automatic language processing and specific tasks.

      Overall, this study is trying to tackle an interesting question related to the contributing factors for neural tracking of linguistic structures. The study procedure and analyses are well executed, and the conclusions of this paper are mostly well supported by data. However, I do have several major concerns.

      1. The title of the manuscript uses the description "tracking of hierarchical linguistic structure". In general, hierarchical linguistic structures involve multiple linguistic units with different timescales, such as syllables, words, phrases, and sentences. In this study, however, the main analysis only focused on the phrasal band (0.8-1.1 Hz). It seemed that there was no significant stimulus- or task-effect on the word band or syllabic band (supplementary figures). Therefore, it is highly recommended that the authors modify the related descriptions, or explain why neural tracking of phrases can represent neural tracking of hierarchical linguistic structures in the current study.

      We thank the reviewer for this comment. We meant to refer to the task manipulation directing attention to different levels of representation across the linguistic hierarchy. We have changed the title to “Neural tracking of phrases during spoken language comprehension is automatic and task-dependent.” We hope this resolves any inadvertent confusion we created. Furthermore, throughout the manuscript we now make sure to refer to effects occurring for phrasal tracking at low frequency bands and not across any hierarchical linguistic structure. We agree that our findings cannot speak for any task-dependent effects along the hierarchy, only that at the phrasal level there is a difference between sentences and word lists.

      2. In Methods, the authors employed MI analyses on three frequency bands: 0.8-1.1 Hz for the phrasal band, 1.9-2.8 Hz for the word band, and 3.5-5.0 Hz for the syllabic band (line 191-192). As the timescales of linguistic units vary and overlap in natural speech, I wonder how the authors define the boundaries of these frequency bands, and whether these bands are proper for the naturally spoken stimuli in the current study. These important details should be clarified.

      The frequency bands of the MI analysis were based on the stimuli, or in other words, are data driven. They reflect the syllabic, word, and phrasal rates in our stimulus set (calculated in Kaufeld et al., 2020). They were calculated by annotating the sentences for syllables, words, and phrases and converting the rate of the linguistic units to frequency ranges. The information has been added to the manuscript. We acknowledge that, unlike in our stimulus set, in natural speech the boundaries of these bands can overlap and now also state this (“While in our stimulus set the boundaries of the linguistic levels did not overlap, in natural speech the brain has an even more difficult task as there is no one-to-one match between band and linguistic unit [26]”, line number 211-213).
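
      The conversion from annotated units to frequency ranges can be illustrated with a short sketch. The response does not state how a range was derived from the per-stimulus rates, so the rule used below (lowest to highest rate across stimuli) is an assumption, and the function name and toy numbers are hypothetical rather than taken from the stimulus set.

```python
from typing import List, Tuple


def rate_band(unit_counts: List[int], durations_s: List[float]) -> Tuple[float, float]:
    """Return the (lowest, highest) unit rate in Hz across stimuli.

    unit_counts[i] is the number of annotated units (e.g. phrases) in
    stimulus i, and durations_s[i] is that stimulus's duration in seconds.
    Assumption: the band is the min-to-max range of per-stimulus rates.
    """
    if not unit_counts or len(unit_counts) != len(durations_s):
        raise ValueError("need one positive duration per stimulus")
    if any(d <= 0 for d in durations_s):
        raise ValueError("durations must be positive")
    rates = [n / d for n, d in zip(unit_counts, durations_s)]
    return min(rates), max(rates)


# Hypothetical example: two sentences with 2 and 3 phrases,
# lasting 2.5 s and 3.0 s respectively.
low_hz, high_hz = rate_band([2, 3], [2.5, 3.0])
print(f"phrasal band: {low_hz:.1f}-{high_hz:.1f} Hz")
```

      The same helper would apply unchanged to word and syllable annotations; only the counts differ. Other rules, such as percentile bounds, would give narrower bands and are equally compatible with the description given in the response.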

      3. What is missing in the manuscript are the explanations of the correlation between behavioral performance and neural tracking. In Results, the behavioral performance shows significant differences across the active tasks (Figure 1), but the MI differences across the tasks are relatively weak in IFG (Figure 3). In addition, the behavioral performance only shows significant differences between the sentence and word list conditions during the phrasal task, but the MI differences between the conditions are significant in MTG during the syllabic, word, and phrasal tasks. Explanations for these inconsistent results are expected.

      We answer this point together with point 4 below where we analyze the behavioral performance and the MEG responses.

      4. Since the behavioral performance of these active tasks is likely related to the temporal attention to relevant timescales of different linguistic units, I wonder whether there exist underlying neural correlates of behavioral performance (e.g., significant correlation between performance and mutual information). If so, it may be interesting and bring a new bright spot for the current study.

      The influence of task difficulty is a relevant point to investigate (also see point 1 of reviewer 2 and point 4 of reviewer 3). To do so we added accuracy (per participant per condition) as a factor in the mixed model (as well as all interactions with task and condition) for the MI, power, and connectivity analyses at the phrasal rate/delta band. Note that because there is no accuracy measure for the passive task, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.

      For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was only (uncorrected) significant for the word combination task (F(1,24.8) = 5.296, p = 0.03 (uncorrected)) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was the strongest at high accuracies (see the figure below for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.

      MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.

      For power we did not find any effect of accuracy. For the connectivity analysis we found in the STG-MTG connectivity a significant condition*accuracy interaction (F(1, 80.23)=5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure attached; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition.

      After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).

      We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.

      While the findings can explain some behavioral effects, we agree with the reviewer that the behavioral results and the MI results don’t align. We note that our use of tasks to guide attention to different timescales and linguistic representations differs from the use of, for example, a working memory task where only the correct trials contain the relevant cognitive process. In working memory type paradigms, the MEG data should indeed explain the behavioral response. Our study was designed to test for effects of task demands on the neural tracking response to speech and language. As we are only using the tasks to control attention, we do not attempt to explain behavior through the MEG data or differences in MI.

      Thus, the phrasal tracking cannot explain all of the behavioral results (point 3). It is at this point unclear what could have caused this effect, but it is quite likely that neural sources outside the speech and language ROIs we selected are in play. We discuss this now.

      The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analysis steps as before. Since we reduced our degrees of freedom, we could however only create random intercept and not random slope models”.

      The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracy*task*condition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).

      No relation between accuracy and power was found. For the connectivity analysis we found a significant condition*accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”

      The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant to understand sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI) performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examined the relationships between humans' heartbeats and their ability to perceive objects using touch.

      Strengths: This study is a large and sophisticated one, with great attention to detail and systematic analysis of the resulting data. The hypotheses are clear and the study was carried out well. The presentation of the data visually is very informative. With such a large and high-quality set of data, the conclusions that we can draw should be clear and strong.

      Weaknesses: The main drawbacks for me were first, exactly how the data were analysed, and second that there seem to be too many results reported to get an overall view of what the study has found.

      First, there are always a number of choices that researchers can make when analysing their data. Too many choices in fact. So we always need to see a consistent, principled, and transparent account of how those choices were made and what the effects on the data were. At present, I think this needs to be improved, partly in the justification of the analyses that were done; partly by re-doing some analyses and the presentation of results.

      Second, I admit to being a little lost when trying to understand all of the analyses - why they were done, what choices were made, and what the findings were. In some cases, it felt a little bit like the analyses were decided on only quite late - after exploring the data. One clear way to address this would be to divide the main results into two kinds: confirmatory (those that the authors expected to do before the study was run), and exploratory (those that the authors decided to do only after seeing the data). This would be both good practice and would help to focus the reader on what are the most critical findings.

      Achievements: I think the presentation of results needs to be strengthened before I can decide whether the aims are achieved.

      Impact: This will also depend on the revision of the results.

      We thank the Reviewer for these comments. In the original manuscript we thought we had been clear as to which analyses were planned and which were exploratory. The planned analyses are in keeping with the previous studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021; Grund et al. 2021). The only exploratory analysis was the inclusion of touch variance as a co-variate. We had not expected that participants would differ so much in how long they held their touch.

      Reviewer #2 (Public Review):

      In this article, the authors set out to discover whether the cardiac cycle influences active tactile discrimination, to better understand the putative relationship between interoception rhythms and exteroceptive perception. While numerous articles have looked at these relationships in the passive domain, here the authors designed an innovative active sensing task to better understand the interaction of sensorimotor processes with the cardiac rhythm.

      The authors report a series of consecutive analyses. In the first, they find that while active discriminative touch is not modulated by the cardiac cycle, non-discriminative touch is such that the start, median duration, and end time of touches are shifted forward along the cardiac cycle towards diastole. Next, the authors examined the proportion of total start and end touches within systole versus diastole and found that across both discrimination and control conditions, touch was roughly 10-25% more likely to terminate during diastole. Further, examining the median holding time, the authors found that touches initiated during systole were lengthened in duration, consistent with a perceptual inhibition by this phase. This last effect appeared to be greatest for the highest stimulus difficulty levels, further supporting the notion that some cardiac inhibition of sensory processing may be at stake. Finally, when examining physiological responses, the authors found that cardiac inter-beat intervals were lengthened during active touch, consistent with the hypothesis that the brain may exploit strategic cardiac deceleration to minimize inhibitory effects.

      Overall, the key effects of the manuscript are fascinating and robust. A major strength of the approach here is the task itself, which utilizes a well-controlled stimulus with multiple levels of task difficulty, as well as an elegant positive control condition. This enabled the authors to look rigorously at difficulty and stimulus condition interactions with the cardiac phase. This clearly pays off in the analyses, as the authors are able to construct a more informative story about how precisely cardiac timing events modulate perception.

      Statistically speaking, I found the overall approach to be rigorous and sound. The study is well powered for a psychophysical investigation of this nature, and the interpretation of results is based on robust effects in the presence of a strong positive control.

      We thank the reviewer for these positive comments on the original version of this paper.

      Reviewer #3 (Public Review):

      The manuscript presents a carefully designed and well-controlled study on active tactile perception and its relationship to internal bodily rhythms - the cardiac cycle. This work builds on previous studies which also showed that active perception/voluntary actions occur in certain phases of the cardiac cycle, but the previous research failed to show/was not designed to show the significance of these synchronizations for perception or behaviour. To my knowledge, this is the first report that seems to experimentally show that active perception in the cardiac diastole leads to behavioural advantages - better tactile discrimination.

      The manuscript itself is very clearly written, the introduction is concise but sufficient, while the results section is very well organised and I especially like how the authors guide the reader through the analysis and additional steps taken to understand the findings even better.

      Yet, despite careful study design, effective visualisations, and elegantly constructed story, there are some analytical choices that, in my opinion, are not sufficiently justified or explained (e.g., selecting a diastolic window equal in length to the duration of systole, instead of using the whole duration of diastole). Such analytical decisions could have (at least some) effects on the obtained results and thus conclusions drawn.

      We thank the Reviewer for these comments. The analyses referred to here were planned and specifically the choice of the windows for defining systole and diastole were identical to the studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021).

    1. Reviewer #2 (Public Review):

      Context: The authors propose a new analysis of an already well-studied conceptual model of adaptation to a new environment. Individual genotypes are characterized by some (breeding value for) phenotype under gaussian stabilizing selection (meaning that fitness is a gaussian function of phenotype, centered around some optimum value). The scenario assumed is that an isolated population of fixed size is initially at equilibrium (between mutation, selection and genetic drift). This population is diploid and sexual with many unlinked loci acting additively on phenotype (across loci and between homologous chromosomes). This view simplifies the analysis but is also not inconsistent with various empirical analyses of locus specific effects on quantitative traits (the empirical support is discussed and reviewed in both introduction and discussion).

      Then a change in the environment induces a shift in the optimum without affecting any other parameter (strength of selection, population size, mutation effects, existing phenotypes), see figure 1. We wish to know how the population responds to this change, both in terms of phenotype distributions, and the underlying genetic basis (how alleles of various effects change in frequency and contribute to the phenotypic response).
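      To make the scenario concrete, the sketch below encodes a gaussian fitness function and a textbook quantitative-genetic approximation for the response of the mean phenotype after the optimum shifts. It assumes that selection acts directly on breeding values, that these stay normally distributed with a constant additive variance, and that drift and mutation are ignored; the names `v_a` (additive variance) and `v_s` (width of the fitness function) are labels chosen here, and the recursion is an illustration rather than the derivation used in the manuscript.

```python
import math
from typing import List


def gaussian_fitness(z: float, optimum: float, v_s: float) -> float:
    """Fitness of phenotype z under gaussian stabilizing selection of width v_s."""
    return math.exp(-((z - optimum) ** 2) / (2.0 * v_s))


def mean_trajectory(z0: float, optimum: float, v_a: float, v_s: float,
                    generations: int) -> List[float]:
    """Mean phenotype over time after an optimum shift.

    Assumption: constant additive variance v_a, no drift, no mutation.
    Each generation the mean moves a fraction v_a / (v_s + v_a) of the
    remaining distance to the optimum.
    """
    step = v_a / (v_s + v_a)
    means = [z0]
    for _ in range(generations):
        current = means[-1]
        means.append(current + step * (optimum - current))
    return means


# Toy parameters (hypothetical): optimum shifts from 0 to 1.
toy_means = mean_trajectory(z0=0.0, optimum=1.0, v_a=1.0, v_s=9.0, generations=50)
```

      Under these assumptions the distance to the new optimum shrinks geometrically by a factor `v_s / (v_s + v_a)` per generation, which is the rapid first phase of the response; the slower changes in the genetic basis discussed in the review are not captured by this recursion.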

      This process has been at the core of the modelling of adaptation for more than a century, as it is maybe the most natural conceptual framework to describe adaptation to a new environment (a "niche shift" so to speak). It is relevant to both the study of demographic/ecological and phenotypic responses to changing conditions, and to the genomics of the changes associated with this process.

      However, in spite of this long history (reviewed in broad lines in the introduction), we do not have an exact mathematical description of this process. The reason is that the problem is in fact very complex: the genome is a sea of various genes, each bearing various alleles (depending on the individual), that further interact mutually by selection (even though loci are additive on phenotype), because fitness is not a linear function of phenotype. The simple population genetics with two alleles and one locus seem far away...

      I think it is fair to say that the main route to handle this problem, in predominantly sexual species, has been through the approximations of quantitative genetics. There, each locus is assumed of small effect and linkage disequilibrium between them is neglected. This has led to empirically testable, and often quite accurate, predictions on the response to selection in terms of mean phenotypic change. Yet, even under this broad approximation strategy, there are various ways to derive predictions, each neglecting one force or another (genetic drift most of the time), or looking at the process over short or longer timescales.

      Aim and achievements: The authors include their work within this broad framework, but set out to derive new approximations that are intended to cover several of the existing approaches as subcases, and especially to handle genetic drift effects in finite populations (large ones), and short vs. longer timescales. I believe they succeed quite well in doing so: they provide clear approximation methods (in appendix mostly) and substantial simulations to show their accuracy. The derivations are fairly technical but most of the time they manage to give an intuition of where they come from and illustrate this intuition via figures in the main text. They produce a prediction of two main observable dynamics: that of the (breeding value for) phenotype itself (its mean over time, variance, third moment), and that of the genetic contribution of various loci and alleles along the genome (depending on the allelic effect on phenotype). They also describe two timescales where the dynamics are fairly different, a short timescale where the mean phenotype is shifting (quite rapidly over tens to hundreds of generations) towards the new optimum, and a longer timescale where the higher moments and mostly the genetic basis change while the mean phenotype merely wanders in a narrow vicinity of the new optimum. The connection between the two timescales is important as it is the slight differences in allele fates during the first one that result in differences in long term behavior in the longer one (illustrated in figure 3).

      The main achievement on the phenotypic response is mostly to reobtain previous approximations under somewhat different or broader assumptions. This is not useless as it may explain why these known predictions (the "Lande model") are surprisingly robust to deviations from the required conditions (e.g. figure 2). However, I think that some extra exploration of the parameter space (away from the required conditions) would make it possible to see when the Lande model does fail on mean phenotype dynamics over short timescales, as anticipated. Whether this range is relevant remains an open empirical question.

      Therefore, the main contribution of this ms is not on phenotypic responses but on the underlying genetic basis, and what we may expect to observe when measuring QTLs or GWAS between two populations separated by an environmental shift in the past: are there many loci contributing limited difference, or fewer loci contributing most of it. In that respect, eqs 20-21 and 25-26-27, and figures 5 and 6 display the main findings and their check by simulations. These findings, although stemming from quite elaborate derivations, yield a fairly simple and yet accurate outcome, at least in the parameter range studied. Various other parameter sets are also checked against simulations in the appendix, and the simulation code is made available for any further check (as exploring all the possible parameters is a fairly daunting task, for an article of its own probably).

      Limits: I believe the main limit of this work is fairly explained in the discussion: to achieve mathematical tractability (a full numerical treatment being inherently impossible given the many parameters), many simplifying assumptions must be made (simple fitness landscape, simple effect of the environmental change, simple demography etc.). This means that it is possible that empirical observations will differ from the predictions for various reasons. However, quantitative genetics have already proven reasonably robust and accurate in predicting observed phenotypic dynamics, using comparable approximations so it is not madness to hope that the same will happen concerning the genetic basis of adaptation. Also, I would suspect that the methods proposed in appendix will most likely extend fairly easily to some deviations from the model's assumptions: change in phenotypic variance with the new environment (a form of plasticity), or in width of the fitness function, or change in the population size, without too much effect on the main conclusions. Still, some other limits may not be overcome as easily (e.g. pleiotropy among multiple traits, or non-stationary optimum), but it seems (a priori) that part of the approach could still be adapted for these situations. The main "wall-hitting" limit of the paper is inherent in the very basis of the approach, namely assuming mild changes occurring in weakly linked polymorphic and numerous loci as opposed to strong changes occurring on more tightly linked and fewer loci. These limits are all fairly described in discussion.

      Overall, this paper is not an easy read, not for lack of clarity but because the problem at hand is complex and there is a lot of material to describe. Each part flows quite well in my opinion, but there are many parts to read.

      Potential impact: I believe that because it yields relatively simple analytic outcomes (at least the predictions in main text), the paper could be useful to data analysis, mostly in the field of genomics of adaptation where it may provide testable predictions for GWAS and QTL data. It could also be used to infer genetic distributions (v(a),f(a)) from observed QTL or GWAS data, if the model is deemed valid.

      In the field of theoretical population genetics, it may also provide a methodology to capture sexual adaptation dynamics in other contexts by mixing various approximation methods: connecting distinct timescales, connecting deterministic approximations for phenotype and diffusion approximations for allele frequencies. This may not be the first time of course (see e.g. "stochastic house of cards" and their extensions), but it is here used in the context of adaptation dynamics rather than equilibria, for the first time I think.

    1. Rule #7: Predict the future. The same way you would predict what's going to happen in the next season of your favorite show. Is Beechum going to kill the President? Figure out what you think is going to happen in the future based on the details of what's happened in the past. This can however very quickly lead into the mistake of Historicism. The predictive power of History has a limit. While it is true that you can identify historical trends and you can make educated guesses as to them happening again if the conditions are met, I don't think it's necessarily a good thing to rely on it. The reason for this is that we then begin to look for similarities and lose sight of other factors which may change the situation, and we run the risk of attempting to formulate historical laws. In this sense I agree with Karl Popper, in that these 'laws' aren't falsifiable and aren't testable; unlike the sciences, history does not have this luxury.

      Predict with the objective of validating and changing your mental model, trying to be as comprehensive as possible. Try to think about the powers that rule and how their opposition can react to their actions.

      Do not extend your prediction too far. The further ahead you try to predict, the faster the chance of making a critical error in your prediction grows.

      Some propms always have reveal valuable and memorable information about the period.

      Changing Your Mental Model, Not To Get IT

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting and well-performed study that adds to the literature base. The authors investigated the role of a discrete brain pathway in binge drinking of alcohol. They adopted a multidisciplinary approach that overall suggested that alcohol-induced changes at synapses of anterior insula (AI) cortex inputs to the dorsolateral striatum (DLS) maintain binge drinking. Further, they suggest this may be a biomarker for the development of alcohol use disorder (AUD).

      Strengths:

      1. Extends previous studies and builds further evidence for AI→DLS involvement in aberrant alcohol intake.

      2. Adopts elegant approaches to isolate the defined connections. This included in vivo optogenetic stimulations (both open and closed loop), recording of defined synapses in slice preparations, and applying in vivo optogenetic stimulation parameters to isolated brain slices.

      3. Well-controlled for the most part, although at times the authors assert "specific" effects without unequivocal proof. For example, the insula also projects to the ventral striatum and this pathway has been implicated in regulation of alcohol intake in rodent models (Jaramillo et al., 2018), and is activated in heavy drinking humans during high threat related alcohol cue presentation (Grodin et al., 2018).

      4. Measures the microstructure of drinking behavior in subjects.

      5. Employed an artificial neural network and machine learning to interrogate data. After training the network it could predict both the fluid consumed (water vs alcohol) and the virus type based on drinking microstructure data.

      6. Applied a series of behavioral tests to confirm that stimulating the defined pathway was not in and of itself reinforcing or anxiogenic and did not alter locomotion.

      Weaknesses:

      1. Only used male mice; in humans, binge drinking in females is a major problem, and rates of AUD between males and females have been converging in recent times (Grant et al., 2015).

      We took age-matched female mice that were injected with AAV-ChR2 into AIC and had them undergo the same 3 weeks of Drinking in the Dark to replicate the male data displayed in Figure 1 with an experimental focus on AIC inputs. We then performed whole cell patch clamp electrophysiology in DLS brain slices from these female mice. We measured optically evoked input-output responses (oEPSCs), AMPA/NMDA current ratios (oNMDA/oAMPA), and paired pulse ratios (oPPR). These data are presented in supplemental figure 4. In contrast to males, we did not observe any effect of alcohol consumption on AIC inputs into the DLS of female mice. We also combined the male and female datasets to determine statistically whether there were sex differences for these specific measures, as indicated by a main effect of sex and/or a sex x fluid interaction. We report these statistics in text from lines 180 to 195, where we note that we did not have a sex x fluid effect for oEPSCs but did have a sex x fluid effect for our oNMDA/oAMPA synaptic plasticity measure. This finding further justifies the behavioral data and circuit manipulations being conducted in solely male mice.

      While this is a fascinating sex difference and important data for the field, this manuscript is not specifically about exploring sex differences per se. We believe we have done our due diligence and correctly reported the existence of sex differences, or the possibility of sex differences, but the electrophysiological findings that we later modulate in vivo are only present in males. We point out that future work is needed to determine the contribution of circuit-specific changes in females at these synapses. Ultimately it will take much more work to fully elucidate sex difference circuit-specific mechanisms that we feel are far beyond the scope of this manuscript.

      2. At times over-interpreted, especially with regards to specificity.

      We are not exactly sure what the reviewer is referring to with “regards to specificity,” but we have done our best to address what we think they are asking and hope that we have adequately addressed this critique. We added sentences (lines 173-178) regarding alcohol-induced plasticity at other inputs to DLS that were not tested and (lines 442 - 446) how we are not sure whether these synapses control consumption of other non-alcohol substances (but point out our prior sucrose drinking data from Muñoz et al., Nat. Comm. 2018).

      3. Lacks a mechanism, although the authors do acknowledge this.

      This is just a first step towards discovering a mechanism. We previously identified an unusually alcohol-sensitive synapse and are now elucidating its behavioral role and some associated plasticity at that synapse that may be part of a mechanism. With our new single session alcohol data to compare our 3 week drinking data to, we are closer to beginning the process of discovering a mechanism. Additional work that is beyond the scope of this manuscript is needed.

      4. I would like some more discussion about the potential for this to be a biomarker in humans.

      We have removed language in the body of the manuscript and expanded on the implications of our findings at the end of our results and discussion from lines 514 to 548.

      Reviewer #3 (Public Review):

      Haggerty et al. assess how the projection from the agranular insular cortex to the dorsolateral striatum contributes to binge drinking in mice. The authors use whole-cell patch-clamp electrophysiology to examine synaptic adaptations following binge drinking (Drinking-in-the-Dark) in male mice, finding a constellation of changes that include increased AMPA and NMDA receptor function at insula synapses onto striatal projection neurons. They go on to assess a causal role for this projection in regulating binge drinking using optogenetics, finding that stimulating insula->striatal transmission in vivo reduces total ethanol consumed during DID, along with several specific behavioral measurements of drinking microstructure. One of the most interesting of these findings is a decrease in "front-loading", or drinking during the very beginning of the session, a phenotype that has been associated with problematic drinking and alcohol use disorder in humans. Finally, the authors use machine learning to build a predictive model that can reliably discern stimulated mice from controls. These studies improve our understanding of the neurocircuitry that mediates binge drinking and synaptic and circuit adaptations that occur following binge drinking. Experiments are blinded and performed in a rigorous manner, including physiological validation experiments in support of the in vivo optogenetic manipulation. Despite many strengths, there are significant limitations and gaps in the electrophysiology studies included in this version of the manuscript. As acknowledged by the authors, there are curious findings that are seemingly at odds with each other, and further studies addressing cell type specificity and/or feedforward inhibition would significantly improve the interpretation of this work. 
Furthermore, the manuscript would be significantly improved by an expanded Introduction containing more specific background information along with a standalone Discussion to place these findings within the broader literature. Lastly, a major limitation of these studies is the low number of mice used for the in vivo optogenetic control experiments and the exclusion of female mice throughout.

      Major concerns:

      1) Expanded Introduction and Discussion. The Introduction does not discuss and/or downplays historical literature investigating neuroadaptations following binge drinking. Studies examining changes in glutamate receptor function within striatal circuits should be discussed in greater detail, rather than the broad pass and review citation included. Behavioral studies examining how the function of the insula and DLS regulate ethanol exposure should also be discussed, especially including work examining the insula to accumbens pathway. It would also be worthwhile to reference human studies implicating the insula and DLS in AUDs.

      We have expanded the introduction and discussion to include these topics.

      2) It is difficult to form a comprehensive picture of the electrophysiological changes reported in Figure 1. The data seem to indicate increased AMPAR function, even more increased NMDAR function, decreased glutamate release probability, and decreased population spikes. These conflicting findings are acknowledged, and the manuscript mentions two possible factors: differential engagement of MSN populations and changes in feedforward inhibition through local interneurons. I disagree with the authors' dismissal of potential MSN subtype-specific effects contributing to these discrepancies. Although AIC inputs innervate D1 and D2 MSNs comparably under control conditions, it is quite possible that the pathways are differentially altered following DID, as has been observed in many reports of alcohol or drug exposure (e.g. Cheng et al. Biological Psychiatry 2017). On the other hand, I wholeheartedly agree with the authors that AIC-driven feedforward inhibition through local interneurons (or even MSNs) could explain the curious divergence between the synaptic and population-level changes depicted in Figure 1. I think additional experiments to help connect the dots are critical in interpreting the changes described in this manuscript. The authors could consider targeted recordings from specific cell types (e.g. D1, D2, and/or interneurons), measurements of AMPA/NMDA receptor subunit stoichiometry, and/or additional experiments in conditions where feedforward transmission is blocked (e.g. PTX or TTX/4AP).

      The reviewer has excellent points that will help elucidate a mechanism. Many of these suggestions are planned experiments in our laboratory, but are, in our opinion, beyond the scope of the present manuscript. Please see our response to Reviewer #2’s 3rd stated weakness. We have revised the text to incorporate some of the points raised here.

      3) N=2 mice in the ICSS experiment in Figure 4J is not sufficient to interpret, and including error bars on this data set is misleading. There also appears to be a difference in distance traveled between GFP and ChR2 mice in Figure 4C, but statistics are not reported. It is also hard to understand what that might mean given the way these data are normalized.

      For this revised manuscript we reran this experiment with 6 animals per group and updated Figure 4I and J and the accompanying methods section titled “Intracranial self-stimulation” to reflect the change. We also note that the new, adequately powered experiment confirmed the previous claim that AIC inputs to the DLS do not modulate operant responding behaviors.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors have used every possible combination and permutation of treatments at different stages of diapause and post diapause development in the mouse and used conditional gene knockouts at different stages to tease out the interactions of Foxa2 with Msx1 and LIF in the reactivation and implantation process in mice. The authors extend diapause further after treatments with progesterone and an estrogen-degrading chemical to show that this will prolong diapause in the presence of Msx1. Overall this study advances our knowledge of the cross-talk between uterine endometrium and the blastocyst during and after the remarkable phenomenon that is diapause.

      Strengths

      Demonstrating that Msx1 is critical to maintaining diapause, and that diapause is maintained in Foxa2 deficient mice have clarified their interactions. It is interesting that LIF triggers implantation on day 8 but cannot support the pregnancy to full term. Suppression of the estrogen effects by progesterone or fulvestrant increases the duration of diapause. Demonstrating that Foxa2 induces diapause via interactions with MSX1 shows Foxa2 plays such an important role in the control of diapause and adds another 'cog' to the complex wheel of its control.

      Weaknesses

      There is an assumption that everyone will understand the various manipulations that are done in this study - some effort needs to be made to clarify each experimental stage. How long are the embryos viable after the extension of the diapause by the various manipulations?

      The very positive review by a well-known expert in the field of diapause is reassuring, and we agree with her suggestions to improve the quality of the manuscript. As recommended, we now provide a scheme to summarize our findings to illustrate the length of embryo dormancy (see Fig. 7).

      Reviewer #3 (Public Review):

      Matsuo et al. have authored a manuscript describing the effects of depletion of the forkhead box gene, Foxa2, on embryogenesis and gestation in the mouse. The effects of this treatment are the induction of the diapause arrest in the development of the embryo and consequent dormancy. The manuscript is well prepared, and the figures, for the most part, are didactic and interpretable. Although the conclusions are interesting, the principal weaknesses of the manuscript are the lack of novelty and the perceived absence of some controls and follow-up experiments.

      Controls and Follow-ups:

      1) The Cre/lox system depletes rather than deletes genes. Although in situ data are presented, these are not judged to be quantitative. The usual qPCR analysis of tissues could have established the quantity of depletion. Stupid but can be done. This is important because the frequency of implantation sites in both Cre/lox models (lines 111-113) may be attributable to the residual expression of Foxa2.

      The Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ mouse models used in the current study have been used in previous studies (refs 7 and 8 in the manuscript). The deletion efficiency of Foxa2 in Foxa2f/fPgrCre/+ mice was examined by RT-PCR and IHC (figure 2 in ref 7), while the deletion efficiency in Foxa2f/fLtfCre/+ mice was examined by IHC (figure S1 in ref 8). The deletion efficiency has been proven by hundreds of publications since the generation of Pgr-cre mice in 2005 and Ltf-cre mice in 2014.

      Although these mouse lines have been used before, we confirmed the deletion of Foxa2 at the beginning of our study at the protein level (fig 1c) and the RNA level (fig 1d). We understand that the reviewer is trying to link the observation that some of the knockout animals still carried implantation sites on day 8 of pregnancy with the possibility that the deletion of Foxa2 is not complete. However, it is not uncommon to observe phenotypes that are not fully penetrant even in systemic knockout mouse models. Nonetheless, we now provide real-time PCR results of uterine Foxa2 on day 4 of pregnancy in all mouse models used in the current manuscript in the new supplemental figure 1.

      2) The most novel and salient finding of the present study is that the depletion of Foxa2 results in embryos that are in a state that "morphologically resembled dormant blastocysts". A useful experiment would have been to transplant these embryos to normal recipients or to culture them in vitro to determine whether they were capable of reactivation from the dormant state.

      Whether dormant embryos in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be reactivated is the main question we studied. The results in figures 4-6 address this question. The blastocysts in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be activated on day 4 as shown in figure 4b. Without any support, blastocysts in Foxa2f/fLtfCre/+ uteri can still be reactivated on day 8 (figure 4b). In the following experiments, with results shown in figures 5 and 6, we tried to improve the uterine environment by supplementing progesterone and estrogen. Dormant embryos were successfully reactivated by a LIF injection, and the pregnancies proceeded to full term.

      This reviewer suggests using normal recipients to test the reactivation of dormant embryos. Given that dormant embryos can be reactivated in a knockout uterine environment, embryo transfer experiments using normal recipients would be an additional measure to test the integrity of embryonic dormancy. However, the embryo transfer experiments may be a futile attempt in our study for the following reasons.

      The numbers of mated mutant females that yield blastocysts are relatively meager, and so are the numbers of blastocysts recovered, especially from diapausing donors. It is well known that implantation rates after blastocyst transfer are compromised due to surgical trauma and anesthesia. Therefore, the results from these experiments may not provide meaningful information.

      Furthermore, during the pandemic our mouse colonies were drastically reduced, and we are still recovering from this downturn during this “New Normal”. Notably, pregnancy rate fluctuates throughout the year even if mice are housed in a controlled environment, and pregnancy rate is often relatively poor in mutant mice, which of course depends on the genetic background and diets (DOI: 10.1126/scisignal.aam9011). Most importantly, viability of diapausing embryos is amply evident from our experiments (Figs. 4-6).

      3) Figure 3C indicates that embryos recovered on Day 8 had an extensive proliferation of ICM cells, but not trophoblast. Previous studies have explored the progression of entry and exit from diapause in the mouse (DOI: 10.1093/biolre/ioz017) showing that reactivation of the embryo from diapause commences in the ICM and then proceeds to the trophoblast. It therefore may be possible that proliferation in the trophoblast is not suspended, but rather that the recovered blastocyst has resumed development and that mitotic activity has not yet reached the trophoblast.

      It is common to see Ki67 expression in the ICM of dormant embryos. Figure 4D from the paper quoted by this reviewer presents Ki67 staining on embryos undergoing diapause at different stages. In our study, we showed Ki67 staining on dormant embryos collected on day 8, which equals D7.5 in their figure. Our data in figure 3C are consistent with the observation shown there. Without LIF, embryos remain dormant in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri.

      4) In Figure 4B, neither the Ltf nor the Pgr Cre treated uteri appear normal on Day 8. This is not consistent with the conclusion in lines 170 et seq. of the manuscript. It is difficult to discern normality from Figure 4C, but it is clear that the PgrCre-lox uterus does not conform to the controls. It is later noted that there is edema in the uteri at this time in the Day 8-treated PgrCre/lox mice (lines 217-218).

      We have clarified our description.

      Lines 173-176: Notably, implantation sites with a normal appearance were observed in Foxa2f/fLtfCre/+ uteri when LIF was given on day 8 of pregnancy (Figure 4b), albeit Foxa2f/fPgrCre/+ uteri with edema have only faint blue bands. Histology of implantation sites confirmed this observation.

      In line 217, we stated that “the uterine edema in Foxa2f/fPgrCre/+ females two days after LIF injection on day 8…”. Figure 4B showed that Foxa2f/fPgrCre/+ uteri with edema have some very faint blue bands, suggesting an implantation-like reaction. But we do not think they are real implantation sites, which is confirmed by figures 4c and e.

      5) In Figure 6B, the implantation sites appear substantially smaller in mice of both mutant genotypes. Supplemental Figure 4 suggests that this is not the case. It is unclear whether the samples chosen for figures are representative of the uteri and whether variation in the size of implantation sites was observed.

      In figure 6B, the Foxa2f/f uteri samples were collected on day 10 of pregnancy, the same day on which Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ tissues were collected. Since embryos implanted in Foxa2f/f uteri on the night of day 4 but in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri on day 8 after LIF injections, the implantation sites are bigger in Foxa2f/f uteri. However, in supplemental figure 4 the implantation sites were collected from Foxa2f/f females on day 6 of pregnancy, and they are similar in size to implantation sites collected from Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ females 2 days after LIF injection.

    1. Reviewer #2 (Public Review):

      Transgenerational effects (TE) (usually defined as multigenerational effects lasting for at least three generations) generated a lot of interest in recent years but the adaptive value of such effects is unclear. In order to understand the scope for adaptive TE we need to understand i) whether such effects are common; ii) whether they are stress-specific; and iii) if there are trade-offs with respect to performance in different environments. The last point is particularly important because F1, F2 and F3 descendants may encounter very different environments. On the other hand, intergenerational effects (lasting for one or two generations) are relatively common and can play an important role in evolutionary processes. However, we do not know whether intergenerational and transgenerational effects have the same underlying mechanisms.

      This study makes a big step towards resolving these questions and strongly advances our understanding of both phenomena. Much of the previous work on mechanisms of multigenerational effects has been conducted in C. elegans and this work uses the same approach. The authors focus on bacterial infection, Microsporidia infection, larval starvation and osmotic stress. I did not quite understand why the authors chose to focus on P. vranovensis rather than P. aeruginosa PA14 that has been used in previous studies of transgenerational effects in C. elegans. However, this is a minor point because I guess they were interested in broad transgenerational responses to bacterial infection rather than in strain-specific ones. The authors used different Caenorhabditis species, which is another strength of this study in addition to using multiple stresses.

      They found 279 genes that exhibited intergenerational changes in all Caenorhabditis species tested, but most interestingly, they show that a reversal in gene expression corresponds to a reversal in response to bacterial infection (beneficial in two species and deleterious in one). This is very intriguing! This was further supported by similar observations of osmotic stress response.

      They also report that intergenerational effects are stress-specific and that they have deleterious effects in mismatched environments, and, importantly, when worms were subject to multiple stresses. It is quite likely that offspring will experience a range of environments and that several environmental stresses will be present simultaneously in nature. I really liked this aspect of this work as I think that tests in different environments, especially environments with multiple stresses, are often lacking, which limits the generality of the conclusions.

      Another interesting piece of the puzzle is that beneficial and deleterious effects could be mediated by the same mechanisms. It would be interesting to explore this further. However, this is not a real criticism of this work. I think that the authors collected an impressive dataset already and every good study generates new research questions.

      Given these findings, I was particularly keen to see what comes of transgenerational effects. The general answer was that there aren't many, and the authors conclude that all intergenerational effects that they studied are largely reversible and that intergenerational and transgenerational effects represent distinct phenomena. While I think that this is a very important finding, I am not sure whether we can conclude that intergenerational and transgenerational effects are not related.

      In my view, an alternative interpretation is that intergenerational effects are common while transgenerational effects are rare. Because intergenerational effects are stress-specific, transgenerational effects could be stress-specific as well.

      Perhaps different mechanisms regulate intergenerational responses to, say, different forms of starvation (e.g. compare opposing transgenerational responses to prolonged larval starvation (Rechavi et al. doi:10.1016/j.cell.2014.06.020) and rather short adulthood starvation (Ivimey-Cook et al. 2021 https://doi.org/10.1098/rspb.2021.0701)). Perhaps some (most?) forms of starvation generate only intergenerational responses and do not generate transgenerational responses. But some do. Those forms of starvation that generate both intergenerational and transgenerational effects could do so via the same mechanisms and represent the same phenomenon. I am by no means saying this is the case, but I am not sure that the absence of evidence of transgenerational effects in this study necessarily suggests that inter- and trans-generational effects are different phenomena.

      The only real concern was the lack of phenotypic data on F3 beyond gene expression. Ideally, I would like to see tests of pathogen avoidance and starvation resistance in F3. However, given the amount of work that went into this study, the lack of a strong signature of potential transgenerational effects in gene expression, and the fact that most of these effects were shown previously to last only one generation, I do not think this is crucial.

      It would be very interesting to compare gene expression and other phenotypic responses in F1 and F3 between P. vranovensis and PA14. Also, it would be interesting to test the adaptive value of intergenerational and transgenerational effects after exposure to both strains in different environments. This would be very informative and help with understanding the evolutionary significance of transgenerational epigenetic inheritance of pathogen avoidance as reported previously. Why is the response to P. vranovensis erased while the response to PA14 is maintained for four generations? Are nematodes more likely to encounter one species than the other? Again, however, this is not something necessary for this study.

      The main strengths of this paper are i) use of multiple stresses; ii) use of multiple species; iii) tests in different environments; and iv) simultaneous evaluation of intergenerational and transgenerational responses. This study is the first of its kind, and it provides several important answers, while highlighting clear paths for future work. Excellent work and I think it will generate a lot of interest in the community.

    1. Reviewer #2 (Public Review):

      This paper seeks to test the extent to which adaptation via selective sweeps has occurred at disease-associated genes vs genes that have not (yet) been associated with disease. While there is a debate regarding the rate at which selective sweeps have occurred in recent human history, it is clear that some genes have experienced very strong recent selective sweeps. Recent papers from this group have very nicely shown how important virus interacting proteins have been in recent human evolution, and other papers have demonstrated the few instances in which strong selection has occurred in recent human history to adapt to novel environments (e.g. migration to high altitude, skin pigmentation, and a few other hypothesized traits).

      One challenge in reading the paper was that I did not realize the analysis was exclusively focused on Mendelian disease genes until much later (the first reference is not until the end of the introduction on pages 7-8 and then not at all again until the discussion, despite referring to "disease" many times in the abstract and throughout the paper). It would be preferred if the authors indicated that this study focused on Mendelian diseases (rather than a broader analysis that included complex or infectious diseases). This is important because there are many different types of diseases and disease genes. Infectious disease genes and complex disease genes may have quite different patterns (as the authors indicate at the end of the introduction).

      The abstract states "Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution." This seems to diminish a large body of work that has been done in this area. The authors acknowledge some of this literature in the introduction, but it would be worth toning down the abstract, which suggests there has been no work in this area. A review of this topic by Lluis Quintana-Murci (ref. 1) was cited, but in a way that diminished many of the developments that have been made at the intersection of population genetics and human disease biology. Quintana-Murci says "Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation-selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection", and argues that positive selection signals should be rare among Mendelian disease genes. Several other examples come to mind. For example, comparing Mendelian disease genes, complex disease genes, and mouse essential genes was the major focus of a 2008 paper (ref. 2), which pointed out that Mendelian disease genes exhibited much higher rates of purifying selection while complex disease genes exhibited a mixture of purifying and positive selection. This paper was cited, but only in regard to its findings on complex diseases. A similar analysis of McDonald-Kreitman tables (ref. 3) was performed around Mendelian disease genes vs non-disease genes, and found "that disease genes have a higher mean probability of negative selection within candidate cis-regulatory regions as compared to non-disease genes, however this trend is only suggestive in EAs, the population where the majority of diseases have likely been characterized". Both of these studies focused on polymorphism and divergence data, which target older instances of selection than the iHS and nSL statistics used in the present study (but should have substantial overlap since iHS is not sensitive to very recent selection like the SDS statistic). Regardless, the findings are largely consistent, and I believe warrant a more modest tone.

      There are some aspects of the current study that I think are highly valuable. For example, the authors study most of the 1000 Genomes Project populations (though the text should be edited since the admixed and South Asian populations are not analyzed, so not all 26 populations are included; only the populations from Africa, East Asia, and Europe are analyzed, for a total of 15 populations in Figures 2-3). Comparing populations allows the authors to understand how signatures of selection might be shared vs population-specific. Unfortunately, the signal that the authors find regarding the depletion of positive selection at Mendelian disease genes is almost entirely restricted to African populations. The signal is not significant in East Asia or Europe (Figure 2 clearly shows this). It seems that the mean curve of the fold-enrichment as a function of rank threshold (Figure 3) trends downward in East Asian and European populations, but the sampling variance is so large that the bootstrap confidence intervals overlap 1. The paper should therefore revise the sentence "we find a strong depletion in sweep signals at disease genes, especially in Africa" to "only in Africa". This opens the question of why the authors find the particular pattern they find. The authors do point out that a majority of Mendelian disease genes are likely discovered in European populations, so is it that the genes' functions predate the Out-of-Africa split? They most certainly do. It is possible that the larger long-term effective population size of African populations resulted in stronger purifying selection at Mendelian disease genes compared to European and East Asian populations, where smaller effective population sizes due to the Out-of-Africa bottleneck diminished the signal of most selective sweeps and hence there is little differentiation between categories of genes ("drift noise"). It is also surprising to note that the authors find selection signatures at all using iHS in African populations while a previous study using the same statistic could not differentiate signals of selection from neutral demographic simulations (ref. 4).

      The authors find that there is a remarkably (in my view) similar depletion across all but one MeSH disease classes. This suggests that "disease" is likely not the driving factor, but that Mendelian disease genes are a way of identifying where there are strongly selected deleterious variants recurrently arising and preventing positively selected variants. This is a fascinating hypothesis, and is corroborated by the finding that the depletion gets stronger in genes with more Mendelian disease variants. In this sense, the authors are using Mendelian disease genes as a proxy for identifying targets of strong purifying selection, and are therefore not actually studying Mendelian disease genes. The signal could be clearer if the test set is based on the factor that is actually driving the signal.

      One of the most important steps that the authors undertake is to control for possible confounding factors. The authors identify 22 possible confounding factors, and find that several confounding factors have different effects in Mendelian disease genes vs non-disease genes. The authors do a great job of implementing a block-bootstrap approach to control for each of these factors. The authors talk specifically about some of these (e.g. PPI), but not others that are just as strong (e.g. gene length). I am left wondering how interactions among other confounding factors could impact the findings of this paper. I was surprised to see a focus on disease variant number, but not a control for CDS length. As I understand it, gene length is defined as the entire genomic distance between the TSS and TES. Presumably genes with larger coding sequence have more potential for disease variants (though number of disease variants discovered is highly biased toward genes with high interest). CDS length would be helpful to correct for things that pS does not correct for, since pS is a rate (controlling for CDS length) and does not account for the coding footprint (hence pS is similar across gene categories).
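      To make the statistic under discussion concrete, the following is a minimal sketch of a fold-enrichment estimate with a percentile bootstrap confidence interval. It is an illustration only and not the authors' pipeline: it resamples individual genes rather than genomic blocks, does no matching on confounding factors, and the function names and 0/1 sweep flags are hypothetical.

```python
import random

def fold_enrichment(disease_flags, control_flags):
    """Fraction of sweep candidates among disease genes divided by
    the same fraction among control genes (flags are 0 or 1)."""
    frac_disease = sum(disease_flags) / len(disease_flags)
    frac_control = sum(control_flags) / len(control_flags)
    return frac_disease / frac_control

def bootstrap_ci(disease_flags, control_flags, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fold enrichment.
    Assumption: genes are resampled independently, which ignores
    linkage between neighbouring genes (a block bootstrap would not)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        d = rng.choices(disease_flags, k=len(disease_flags))
        c = rng.choices(control_flags, k=len(control_flags))
        if sum(c) == 0:
            continue  # ratio undefined for this replicate; skip it
        stats.append(fold_enrichment(d, c))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

if __name__ == "__main__":
    # Toy data: 5% of disease genes and 10% of control genes flagged.
    disease = [1] * 5 + [0] * 95
    control = [1] * 10 + [0] * 90
    print(fold_enrichment(disease, control))
    print(bootstrap_ci(disease, control))
```

      An interval that contains 1 is compatible with no depletion, which is the situation described above for the East Asian and European populations.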

      The authors point out that it is crucial to get the control set right. This group has spent a lot of time thinking about how to define a control set of genes in several previous papers. But it is not clear if complex disease genes and infectious disease genes are specifically excluded or not. Number of virus interactions was included as a confounding factor, so VIPs were presumably not excluded. It is clear that the control set includes genes not yet associated with Mendelian disease, but the focus is primarily on the distance away from known Mendelian disease genes.

      Minor comments:

      On page 13, the authors say "This artifact is also very unlikely due to the fact that recombination rates are similar between disease and non-disease genes (Figure 1)." However, Figure 1 shows that "deCode recombination 50kb" is clearly higher in disease genes and comparable at 500kb. The increased recombination rate locally around disease genes seems to contradict the argument formulated in this paragraph.

      1. Quintana-Murci L. Understanding rare and common diseases in the context of human evolution. Genome Biol. 2016 Nov 7;17(1):225. PMCID: PMC5098287
      2. Blekhman R, Man O, Herrmann L, Boyko AR, Indap A, Kosiol C, Bustamante CD, Teshima KM, Przeworski M. Natural selection on genes that underlie human disease susceptibility. Curr Biol. Elsevier BV; 2008 Jun 24;18(12):883-889. PMCID: PMC2474766
      3. Torgerson DG, Boyko AR, Hernandez RD, Indap A, Hu X, White TJ, Sninsky JJ, Cargill M, Adams MD, Bustamante CD, Clark AG. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. Public Library of Science (PLoS); 2009 Aug;5(8):e1000592. PMCID: PMC2714078
      4. Granka JM, Henn BM, Gignoux CR, Kidd JM, Bustamante CD, Feldman MW. Limited evidence for classic selective sweeps in African populations. Genetics. Oxford University Press (OUP); 2012 Nov;192(3):1049-1064. PMCID: PMC3522151

    1. Author Response:

      Reviewer #1:

      The manuscript by Bellio and colleagues is based on the experimental model of T. cruzi infection in WT, MyD88-/- and IL-18-/- mice previously described by the same group in a 2017 eLife publication. The main message of the current study is that, in addition to IFN-g+ Th1 effectors, T. cruzi infection induces an even larger population of cytotoxic CD4+ T cells.

      The characterization of the cytotoxic CD4+ T cells is well documented. The data shown are convincing. However, since Burel et al. (2012) described the existence of a similar population in humans infected with P. falciparum (an intracellular pathogen), the authors should modify the statement (lines 35-36) in the abstract.

      First, we would like to thank Reviewer #1 for the positive comments on our work.

      Please note that our statement in the abstract, “Here, for the first time, we showed that CD4CTLs abundantly differentiate during mouse infection with an intracellular parasite”, refers to mouse experimental models of parasite infection and not to human studies. We could not find any article with Burel JG as first author published in 2012; we believe that Reviewer #1 is referring to a study published in 2016 (Burel et al. PLoS Pathog. 2016 Sep 23;12(9):e1005839), in which a population of CD4 T cells with cytotoxic properties was described in humans after primary exposure to blood-stage malaria parasites. Please note that the finding of the important role of T-cell intrinsic IL-18R/MyD88 signaling for the development of a strong CD4CTL response is also part of the main message of our manuscript.

      Similarly, the title "Cytotoxic CD4+ T cells… predominantly infiltrate Trypanosoma cruzi-infected hearts" is an overstatement. If cytotoxic CD4+ T cells outnumber 10:1 IFN-g-secreting population (in lymphoid tissue) their higher representation in hearts of infected mice is not a selective phenomenon but rather expected.

      We would like to thank Reviewer #1 for this comment, giving us the opportunity to clarify this point. Of note, we were not referring to the ratio of CD4CTL to Th1 cells, but to the frequency of CD4CTL among all the CD4+CD44+ (activated/memory) T cells. In fact, as shown in Figure 7-figure supplement 2, (now added to the revised ms), we found that the frequency of GzB+ cells among all activated/memory CD4+CD44hi T cells is significantly increased in the heart compared to the frequency of GzB+ among CD4+CD44hi T cells found in the spleen. Please also note that the frequency of CD4+ T cells expressing both GzB and PRF also increases in the heart compared to the spleen (Fig. 7F, middle panel and Fig. 1D left panel). We are now including this information in the revised manuscript, clarifying this point.

      My major concern is that the function of these cells remains undefined. Are they beneficial or detrimental for the host? It appears that the authors themselves could not make up their minds. The GzB+ CD4+ T cells protect but do not decrease the parasite load (Fig 6G).

      Our results in the mouse model of infection with T. cruzi, employing the adoptive transfer of WT CD4+GzB+ T cells to the susceptible Il18ra-/- mouse strain, indicate a clear beneficial role of CD4CTLs in the acute phase of experimental T. cruzi infection. Significantly extended survival was observed in the group of mice receiving sorted CD4+GzB+ cells, without, however, a decrease in parasite load (Figure 6G). We would like to comment here that, in order to be beneficial to the host, an immune response does not always have to decrease the pathogen load. In fact, in certain circumstances, hindering an excessive inflammatory response (which can lead to host death) is an advantage for the host, even if this does not reduce pathogen numbers. The advantage conferred to the host by regulating the inflammatory response was probably also exploited in pathogen/host co-evolution, giving rise to chronic infections, where the host can survive for a longer period and the pathogen increases its chances of transmission (Schneider DS & Ayres JS., 2008, Nat Rev Immunol;8(11):889; Medzhitov R, et al, 2012, Science; 335(6071):936). Therefore, the results shown in Figure 6G are fully compatible with a potential regulatory role exerted by CD4CTLs, previously proposed by other authors (Mucida et al, Nat. Immunol. 2013), and point to the beneficial role of CD4CTLs for the host in the acute phase of infection with T. cruzi, probably by contributing to the decrease of immunopathology, the detrimental side of an exacerbated immune response, as discussed. Also favoring this hypothesis, the frequency of CD4CTLs expressing immunoregulatory molecules is increased when compared to other activated CD4+ T cell subsets (Figure 3 and new Figure 7-figure supplements 3 and 4). Please see our complete discussion on this subject in the revised manuscript.

      On the other hand, during the chronic phase of the disease, the persistence of the immune response against the parasite might involve functional changes in the CD4 T cell response. This hypothesis could explain the association found between CD4CTLs and cardiomyopathy in chronic Chagas patients. Therefore, a beneficial role for CD4CTLs in the acute phase is totally compatible with the hypothesis that, during the chronic response in a persistent infection, CD4CTLs might acquire a detrimental role, contributing to immunopathology. Of note, several studies in the literature have shown a beneficial role for Th1 cells during the acute phase of infection with T. cruzi, while the Th1 response has also been associated with a pathologic outcome during the chronic phase of Chagas disease (reviewed in Ferreira et al, 2014 World J Cardiol 2014 6(8):7820 and in Fresno & Girones, 2018, Front.Immunol. 9;351). Therefore, it is not implausible that the CD4CTL subpopulation could also display different roles in the acute versus the chronic phases of the infection with T. cruzi. However, at present, this hypothesis remains speculative, as stated in the manuscript discussion. An extensive investigation of the role of CD4CTLs, as well as of the immunoregulation mechanisms acting in chronic Chagas patients, needs to be conducted to fully answer this question, which is beyond the scope of the present work. Nevertheless, we acknowledge that the alternative possibility remains, in which the higher levels of CD4CTLs in chronic patients reflect elevated parasite burden and/or inflammation in the heart, without a direct involvement of this cell subset in the pathology. Please see our answer to Reviewer #2 on this topic and the inclusion of discussion clarifying this point in the revised manuscript.

      Are they terminally differentiated or "exhausted" effectors? GzB+ CD4+ T cells can be found in the hearts of chronically infected mice, but we do not know if they are specific for pathogen or self Ags. Do they express the markers of exhaustion on day 14 in the heart?

      1) We have commented in the first version of the manuscript that one of the limitations of our work is the fact that very few CD4 epitopes of T. cruzi presented by I-Ab have been described so far, and this limits the investigation on the specificity of CD4CTLs in our model. This is a very interesting and important question, which, however, is not possible to address in the present work.

      We would like to thank Reviewer #1 for the suggestion of performing a broader analysis on the expression of immunoregulatory markers associated with exhaustion and/or terminal differentiation, which adds to the understanding of CD4CTL biology in the model of acute infection with T. cruzi. Whether GzB+CD4+ T cells are terminally differentiated or "exhausted" effectors is an interesting and debated question. It was initially hypothesized that since exhausted T cells share features with terminally differentiated T cells, this would suggest a developmental relationship between these cell states (Akbar, A.N. & Henson, S.M., 2011 Nat. Rev. Immunol.11:289; Blank, C.U. et al, 2018, Nat.Rev.Immunol,19:665). However, subsequent studies showed that exhausted T cells seem to be derived from effector cells that retain the capacity to be long-lived (Angelosanto, J.M. et al., 2012, J. Virol. 86: 8161). In the first version of our manuscript, we investigated the expression of several markers associated with exhaustion such as 2B4, Lag-3, Tim-3 and CD39, besides the downregulation of CD27 on GzB+ CD4+ T cells (Figures 1E, 3B, 3D-E and 5E). In general, cells losing the expression of CD27 have been characterized as Ag-experienced, further differentiated cells (Takeuchi and Saito, 2017, Front.Immunol. 8:194). Our finding that, unlike GzB-negative cells, most GzB+CD4+ T cells had lost the expression of CD27 suggested to us that CD4CTLs present in the spleen of mice infected with T. cruzi might be further differentiated T cells (Figure 3E). The transcription factor Blimp-1 controls the terminal differentiation of cells in a variety of immunological settings and its high expression in CD4+ and CD8+ T cells is associated with the expression of immunoregulatory markers (Chihara, N. et al, 2018, Nature 558:454). The observed high expression of Blimp-1 by GzB+CD4+ T cells (Figure 5D) is also compatible with the hypothesis that CD4CTLs are terminally differentiated.
Of note, most of the exhaustion studies were performed on CD8+ T cells and it is still not well established if this phenomenon is equally regulated in CD4+ T cells. We have now extended the investigation on the expression of terminal differentiation/exhaustion markers, including PD-1 staining, on GzB+PRF+ CD4+ T cells in the spleen and in the heart of infected mice. Results in Figure 7-figure supplement 3 show that CD44hiGzB+PRF+ CD4+ T cells compose the subset of activated cells among which the highest frequency of cells expressing these markers is found, both in the spleen and in the heart, at day 14 pi. The only exception was the equal ratio of cells expressing PD-1, and at equivalent levels, when comparing CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells in the spleen. Non-significant differences in the percentages of cells expressing PD-1 among CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells were found in the heart. However, the intensity of expression of the PD-1 marker (MFI) was significantly higher among CD44hiGzB+PRF+ compared to CD44hiGzB-PRF- CD4+ T cells infiltrating the heart. Furthermore, we also compared the frequency of CD44hiGzB+PRF+ CD4+ T cells expressing Lag-3, Tim-3, CD39 and PD-1, and their corresponding MFI values, between the spleen and the heart (Figure 7-figure supplement 4). Of note, while MFI values of Tim-3, CD39 and PD-1 expression were increased on CD4CTLs (CD44hiGzB+PRF+) in the heart compared to CD4CTLs in the spleen, Lag-3 expression levels were decreased on CD4CTLs infiltrating the cardiac tissue. Despite exhaustion being often seen as a dysfunctional state, it is important to note that the expression of these inhibitory molecules allows strongly activated T cells to persist and partially contain chronic viral infections without causing immunopathology and that highly functional effector T cells can also express such inhibitory receptors (reviewed in Wherry, E.J., 2011, Nat. Immunol.,12:492; Blank, C.U. et al, 2018, Nat. Rev. Immunol., 19:665). Interestingly, only PD-1, but not Lag-3, Tim-3 or CD39 expression is upregulated on CD8CTLs in the heart relative to the spleen, an indication that the T. cruzi-infected cardiac tissue is a less so-called exhaustion-inducing environment compared to certain tumors (Figure 7-figure supplement 4). It is known that many immunomodulatory molecules, including Lag-3, Tim-3, PD-1 and CD39, are co-expressed as part of a module composing a larger co-inhibitory gene program, which is expressed in both CD4+ and CD8+ T cells under certain activation conditions, driven by the cytokine IL-27 (Chihara, N. et al, 2018, Nature 558:454). The opposing behavior of Lag-3 expression, which is downmodulated on CD4CTLs in the heart in comparison to the spleen, indicates that CD4CTLs infiltrating the heart are not typically exhausted cells. Of note, a recent study has shown that exhausted CD8+ T cells can partially reacquire phenotypic and transcriptional features of T memory cells, in a process that includes the downmodulation of Lag-3 expression (Abdel-Hakeem, M.S. et al, 2021, Nat.Immunol., 22:1008). As requested, these new data were included (Figure 7-figure supplements 3 and 4) and discussed in the revised manuscript.

      The factors that control differentiation of cytotoxic CD4+ T cells are the same as for IFN-g- Th1 cells. MyD-88-/- and IL-18-/- mice significantly lack both populations and succumb to T. cruzi infection. In their 2017 eLife publication, this group reported that survival of infected MyD-88-/- and IL-18-/- mice can be rescued by adoptive transfer of purified total WT CD4+ T cells, which was attributed entirely to their ability to secrete IFN-g (at least in the case of MyD-88-/- recipients). In the current study, the authors only used infected IL-18-/- recipients and show that this time transfer of GzB+ CD4+ T cells is sufficient to confer the protection. When compared with the old data, the rescue of the infected IL-18-/- with only GzB+ CD4+ T cells looks weaker (2 surviving animals out of 10 pooled from 2 experiments), strongly suggesting that IFN-g Th1 cells do play a significant role. It is unclear when the parasite load in Fig G6 was evaluated. It would be good to show deltaCT values for individual mice.

      We thank Reviewer #1 for the opportunity to clarify the point on the protective role of Th1 and CD4CTL cells during T. cruzi infection and to better discuss our data. Please note that we do not question the beneficial role of Th1 cells in this infection model. In our paper published in 2017 in eLife, we showed that the adoptive transfer of IFN-g-deficient CD4+ T cells does not result in the decrease of parasite loads in susceptible recipient mice. These results are totally in agreement with the known beneficial role of Th1 cells during infection with T. cruzi, through the microbicidal action of IFN-g, which was also described by other groups.

      The new information that our present study brings is that the adoptive transfer of GzB+CD4+ T cells with poor (GzB-YFP+) or no (Ifng-/-) capacity of IFN-g secretion, also significantly extended survival of infected Il18r-/- mice, which have lower levels of both Th1 and CD4CTLs, compared to WT mice (Figure 6G and Figure 6-figure supplement 2). Please note that 3 (not 2) out of 10 mice that received GzB+CD4+ T cells survived. We stated in our discussion that, together, our present and past data demonstrate that both Th1 and CD4CTL are important for improving survival, although through different mechanisms, since adoptively transferred GzB+CD4+ T cells (as well as Ifng-/- CD4+ T cells) were not capable of reducing parasite load but, notwithstanding, extended survival.

      Following the guidelines of the Animal Care and Use Committee, in order to prevent/alleviate animal suffering, all laboratory animals found near death must be euthanized. Therefore, parasite load in the hearts was evaluated in mice found in a moribund condition (a severely debilitated state that precedes imminent death, as defined in Toth, L.,2000; ILAR J, 41:72), presenting unambiguous signs that the experimental endpoint had been reached. We have now included 2^DeltaCT values for individual mice in Figure 6G, as requested.

      Because donor IFN-g-/- CD4+ T cells do express IFN-gR (Supp Fig 6-2), IFN-g produced by IL-18-/- host cells could enhance the activity and/or help expand cytotoxic CD4+ T cells among the IFN-g-/- CD4+ donor population. To directly test the protective role of cytotoxic CD4+ T cells in the absence of IFN-g, the authors should treat infected IL-18-/- mice that have received IFN-g-/- CD4+ T cells with anti-IFN-gamma Ab.

      It is known that IFN-g is critically important for resistance against infection with T. cruzi. Accordingly, Ifng-/- mice are extremely susceptible, dying at early time points of infection (Campos, M. et al, 2004, J.Immunol, 172:1711). Of note, IFN-g production by other cell types, and not only by CD4+ T cells, is relevant for resistance against infection, as demonstrated for CD8+ T cells (Martin D & Tarleton R. Immunol Rev. 2004, 201:304). In our present work, we performed experiments where Ifng-/- CD4+ T cells were adoptively transferred to susceptible Il18ra-/- mice, with the goal of testing whether the transferred cells would be able to confer some increment in the survival time of infected mice, despite not being able to decrease parasite loads, a direct consequence of their deficiency in IFN-g production, as previously shown (Oliveira et al., 2017, eLife). In fact, this turned out to be the case and we showed that the transfer of purified Ifng-/- CD4+ T cells extended survival (Figure 6-figure supplement 2). Of note, our data demonstrate that the percentage of GzB+CD4+ T cells is not affected in the total absence of IFN-g, since Ifng-/- mice display the same frequency of this cell population as found in WT mice (Figure 4B). The increased survival of adoptively transferred mice is compatible with a regulatory function of GzB+CD4+ T cells, which additionally express several immunoregulatory molecules, as shown. Whether IFN-g produced by the host is enhancing the activity and/or expanding cytotoxic CD4+ T cells among the transferred T cell population is not an essential point here, since we were not aiming to test the protective role of cytotoxic CD4+ T cells in the total absence of IFN-g in the host mice.

      The intracellular cytokine staining in this study appears to be suboptimal. Instead of stimulating with PMA/ionomycin in the presence of Golgi block, Roffe et al. (2012) stimulated lymphocytes with anti-CD3 prior to adding Brefeldin A, an important technical difference which may explain the rather low frequencies of IFN-g+ and IL-10+ cells in this study.

      We respectfully disagree with Reviewer #1 on this point. The frequency of IFNg+ CD4+ and IL-10+CD4+ T cells in the spleen of mice infected with T. cruzi Y strain obtained in our experiments is in the same range as that previously described by other research groups investigating the immune response to this parasite, including studies that have employed anti-CD3 stimulation and brefeldin A, such as Jankovic, D. et al, 2007, JEM 204:273 (Fig.S1), cited in our manuscript (page 9, lines 218-219), among others (Nihei J et al, 2021, Front. Cell. Infect. Microbiol.11:758273; Martins GA et al, 2004, Microbes Infect 6:1133 – Fig.6B; Hamano S. et al, 2003, Immunity, 19:657- Fig. 2A). In the present work, we used the combination of monensin and brefeldin A after PMA/iono treatment, and found the same frequency of IFN-g+CD4+ T cells described in a previous study of our group, where staining was performed after incubation of splenocytes with parasite-derived protein extract and brefeldin A alone (Oliveira AC et al., 2010, PLoSPath 6(4):e1000870 –Fig. 8D). On the other hand, please note that the study cited by Reviewer #1 (Roffe et al., JI 2012) employed a different strain of T. cruzi, the Colombiana strain, which differs in several aspects from the Y strain used in our work. Colombiana induces a different pathology, with distinct kinetics. In that study, intracellular IFN-g and IL-10 detection was performed at a much later time point of infection (day 30 pi), and in cells infiltrating the heart, not the spleen. In summary, the frequencies of IFN-g- and IL-10-secreting CD4+ T cells described in our manuscript are comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups in the peer-reviewed articles cited above.

      Reviewer #2:

      In this work, Professor Bellio and her colleagues provide compelling evidence to show unusually strong induction of cytotoxic CD4 T cells (CD4CTLs) in Trypanosoma cruzi-parasitized mice. Using genetic models and mixed bone marrow chimeras they dissect the signals responsible for CD4CTL induction in this infection and identify T cell-intrinsic IL-18R/MyD88 signaling as the key inducer. The CD4CTLs that clonally expand in T. cruzi infection outnumber CD4 cells with typical Th1 profile (IFN-γ secretion) and bear the hallmarks of CD4CTLs described in other model systems and in humans. Utilizing GzmbCreERT2/ROSA26EYFP reporter mice, the authors show that adoptive transfer of CD4 cells that have made GzB can increase the survival of T. cruzi parasitized Il18ra-/- mice. Finally, the authors describe a clear correlation between the frequency of CD4CTLs in the circulation of patients and T. cruzi-induced chronic Chagas cardiomyopathy, implying a pathogenic role for these cells in chronic disease.

      The findings reported here are an important addition to the understanding of both the origin of CD4CTLs and their potential role in host protection or disease. The evidence provided in support of the main claims is very strong and the association between CD4CTLs and Chagas disease quite intriguing. There are, however, some aspects of the work that would benefit from further clarification or experimental support, so that alternative interpretations of the data can be excluded.

      The defining characteristic of CD4CTLs that separates them from other CD4 subsets is the production of granzymes and perforin and, by extension, the ability to kill target cells in a granzyme/perforin-dependent manner. In contrast, all T cells can kill target cells via alternative mechanisms that are not dependent on granzyme/perforin, for example through expression of TNF family members. It would appear that much, if not most, of the killing activity of T. cruzi-induced CD4CTLs can be attributed to FasL (Fig. 1B). FasL-mediated killing is not restricted to CD4CTLs and as the title of one of the cited studies (Kotov et al., 2018) states, "many Th cell subsets have Fas ligand-dependent cytotoxic potential". It would be important to ascertain if expression of granzyme/perforin by CD4CTLs in T. cruzi infection is also associated with granzyme/perforin-dependent cytotoxicity. This affects the direct and indirect in vitro cytotoxicity assays, as well as the interpretation of in vivo protection.

      Similarly, the protective effect of transferring GzmbCreERT2/ROSA26EYFP reporter-positive cells to Il18ra-/- mice may not be necessarily mediated in a granzyme/perforin-dependent manner or by CD4CTLs for that matter. The reporter will mark cells that express GzB at the time of tamoxifen administration but does not guarantee that these cells will continue to express GzB or that they will prolong survival of recipients in a granzyme/perforin-dependent manner.

      While the authors provide evidence that GzB-producing cells are largely distinct from IFN-γ-producing cells, the reporter-positive cells may still contain genuine Th1 cells. Given Th1 cells have been previously found necessary for protection of Il18ra-/- mice in the T. cruzi model, can a role for Th1 cells in this transfer model be formally excluded? The authors do convincingly demonstrate that IFN-γ itself is not essential for protection, but that does not leave granzyme/perforin-dependent as the only other alternative. For example, the experiment described in Fig. 6G lacks an important control, the transfer of reporter-negative cells. What would the conclusion be if reporter-negative (but T. cruzi-specific) cells proved as protective as reporter-positive cells?

      We would like to thank Reviewer #2 for the positive comments on our study and for giving us the opportunity to better discuss and clarify the relevant points raised in this review.

      (i) Concerning the role of GzB/PRF in cytotoxicity: as explained in more details in our next answer to Reviewer #2, we have now shown that the cytolytic activity of the CD4 T cell subset differentiating in the murine T. cruzi-infection model is totally dependent on a GzB- and PRF-mediated mechanism.

      (ii) Concerning a possible role for Th1 in the adoptive transfer experiments: please note that the parasite load is not decreased by the adoptive transfer of CD4+GzB+ T cells (Figure 6G). Additionally, we showed that the adoptive transfer of Ifng-/- CD4+ T cells also extends the survival of infected mice (Figure 6-figure supplement 2), but does not decrease parasite levels (Oliveira et al., 2017). These results exclude a role for Th1 cells, which are known to exert an important microbicidal function through the production of IFN-g, as previously demonstrated by us (Oliveira, 2017) and other groups. Together, our present and past data support the notion that both Th1 and CD4CTL are important for extending survival, although through different mechanisms. Our results are in accordance with an immunoregulatory role played by CD4CTLs, likely through the GzB/PRF/FasL-mediated killing of infected APCs in an IFN-g-independent manner, although it is not possible to attribute the beneficial role of the adoptively transferred CD4CTLs exclusively to their cytolytic function, as discussed in the revised manuscript. Of note, we also show here that most CD4+GzB+PRF+ T cells express high levels of immunomodulatory molecules, raising the possibility that the beneficial role of adoptively transferred CD4CTLs might rely on the concerted action of their cytolytic function and immunomodulatory activity. Please see the full discussion on this point in the revised version of the manuscript.

      (iii) Concerning the adoptive transfer of GzB-EYFP-negative cells: unfortunately, GzB-EYFP-negative cells cannot be employed as a control, since in the GzmBCreERT2/ROSA26EYFP mouse lineage, only 1-3% of total splenic CD4+ T cells express EYFP after induction by tamoxifen (Figure 2-figure supplement 3). This contrasts with the 10-40% of GzB+ and PRF+ cells among CD4+ T lymphocytes observed by intracellular staining. Consequently, the majority of the CD4+GzB+ T cell population is EYFP-negative in this system and thus cells sorted as “GzB-EYFP-negative”, based on the absence of EYFP expression, would not be bona-fide GzB-negative cells. If it were possible to sort GzB reporter-negative cells, Th1 cells would be among the sorted cells and upon adoptive transfer they would secrete IFN-g and, consequently, decrease the parasite load in recipient mice (Oliveira, 2017). However, in the absence of the proposed immunoregulatory action of CD4CTLs, Th1 cells transferred alone might also increase pathology and, consequently, it is possible that they would not extend survival, albeit diminishing parasite load. It is expected that higher levels of extended survival would be attained when both Th1 and CD4CTLs are transferred, as discussed in the manuscript and in answer (ii) above. Importantly, please note that one current hypothesis is that CD4CTLs differentiate from Th1 and, therefore, the adoptive transfer of Th1 cells would not guarantee that Th1-derived CD4CTLs do not develop in vivo, unless specially engineered mouse strains, not available at present, were employed for these experiments.

      Reviewer #3:

      By modelling Trypanosoma cruzi infection in mice, the authors highlighted the presence of a subset of CD4 T cells expressing canonical markers and transcription factors of CTLs and capable of exerting antigen-specific and MHC class II-restricted cytotoxic activity. Mechanistically, using KO mice, the authors have shown that Myd88 expression is required for strengthening the CD4 CTL phenotype during the infection.

      Moreover, by investigating the presence of a previously published CD4 CTL gene signature in a mixed bone marrow chimera setting, they highlighted a cell-intrinsic role for Myd88 in imprinting the signature. The study also identifies IL18R as a Myd88 upstream receptor potentially responsible for CD4 CTL development by showing that lack of IL18R phenocopied Myd88 deficiency in failing to promote a CD4 CTL phenotype.

      Finally, by showing the direct correlation between perforin-expressing CD4 T cells in Chagas-infected individuals and parameters of heart dysfunction, the authors hinted at a possible involvement of CD4CTLs in a clinical setting.

      -The core finding of the paper, providing the first evidence of CD4 CTLs development in a mouse model of intracellular parasite is well supported by the data. The expression of markers correlated to CD4 cytotoxicity in other settings and gene signatures fits well the phenotype described and suggests possible common features for CD4 CTLs development across infection with different pathogens.

      This manuscript will advance knowledge of the involvement of non-canonical CD4 types in the immune responses to parasites. Moreover, the finding that CD4 CTLs are the predominant phenotype in organs important for viral replication implies an involvement of these cells in the development of the pathology that will have to be taken into account in future studies.

      • The parental relationship between CD4CTLs and Th1 remains unclear and is complicated by the low numbers of IFNg-producing CD4 T cells (IFNg being regarded as a hallmark of functional Th1) detected in the model. IFN-g production by CD4 is lower than 10% even when achieved by PMA/Iono stimulation and half of Gzb+ CD4 stain positive for the cytokine. On the other hand, the putative transcription factor of Th1 development, Tbet, is expressed by all Gzb-positive CD4s. This discrepancy and the low number of IFNG+ cells should be better discussed by the authors.

      First, we would like to thank Reviewer #3 for the constructive criticism of our manuscript. Regarding the apparent discrepancy in the frequencies of IFN-g+ and Tbet+ CD4+ T cells in our model, please first note that the percentage of IFN-g+ CD4+ T cells detected in the present study is comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups (please see our complete answer to Reviewer #1 on this topic). That said, we think that the apparent discrepancy between the expression of T-bet and the low fraction of GzB+CD4+ T cells producing IFN-g is a very interesting question. It is known that T-bet is a key transcription factor associated with the development of IFN-g-producing CD4+ T cells and that it also coordinates the expression of multiple other genes in CD4+ T cells and in other cell types. Also, T-bet can interact with other proteins, resulting in the induction or inhibition of key factors in T cell differentiation (reviewed in Hunter, 2019, Nat. Rev. Immunol, 19:398). Importantly, it has been shown that during the late stages of Th1 cell activation, T-bet recruits the transcriptional repressor Bcl-6 to the Ifng locus to limit IFNg transcription (Oestreich, 2011, JEM, 208:1001). Therefore, T-bet action is not limited to transactivation of the Ifng gene, but can also act as part of a negative-feedback loop to limit IFN-g production in certain cells. We do not believe that Bcl-6 is playing a role in CD4+GzB+ T cells in our model, since we found that the majority of CD4+GzB+ T lymphocytes express Blimp-1 (Figure 5D), and Blimp-1 and Bcl-6 are known to be reciprocally antagonistic transcription factors.

      However, the possibility remains that another repressor factor is downregulating Ifng gene transcription in the majority of T-bet+ CD4+GzB+ T cells, with the participation of T-bet or not. Of note, Blimp-1 was shown to be a critical regulator for CD4 T cell exhaustion during infection with T. gondii, and CD4+ T cells deficient in Blimp-1 produced higher levels of IFN-g in infected mixed-bone marrow chimeric mice reconstituted with WT and Blimp-1 conditional knock-out cells (Hwang, S., 2016, JEM 213:1799). Furthermore, Blimp-1 attenuates IFN-g production in CD4 T cells activated under nonpolarizing conditions and chromatin immunoprecipitation showed that Blimp-1 binds directly to a distal regulatory region in the Ifng gene (Cimmino, L. et al. 2008, JI 181:2338). We have also shown that, like Blimp-1, Eomes is expressed by around 60% of the GzB+CD4+ T cells (Figure 2G). It is known that Eomes controls the transcription of cytotoxic genes and promotes IFN-g production in CD8+ T cells, binding to the promoter of the Ifng gene. Interestingly, Eomes was also shown to participate in the induction of immunoregulatory/exhaustion receptors, such as PD-1 and Tim-3. Furthermore, deficiency of Eomes led to increased cytokine production (Paley, M.A. et al., 2012, Science 338: 1220). More recently, evidence in favor of the participation of Eomes in the repression of IFN-g production in TCR-gamma-delta T cells was also published (Lino, C. et al.,2017, EJI 47:970). Therefore, these studies indicate the complex control of the Ifng gene, in which T-bet, Eomes, Blimp-1 and possibly other TFs might play concerted roles. We think it would be interesting to investigate the role of Eomes and/or Blimp-1 in the repression of the Ifng gene in GzB+CD4+ T cells. Kinetic studies on the expression of these TFs may contribute to a better understanding of the parental relationship between CD4CTLs and Th1 cells, a fundamental question that is not yet completely understood.
A comment on this subject was included in the revised manuscript.

      On the same note, while the confirmation of a CD4 CTLs gene signature in the model is very convincing, it must be noted that the one used as a reference was obtained by performing single cell RNA seq , taking into account only IFNg+ CD4 cells and then comparing Gzb+ and Gzb- negative in the setting. The authors are instead using bulk RNA seq and comparing populations of cells that would have none VS low levels of Th1. In this view, while the confirmation of the CD4 CTLs signature is striking, addressing the relative relationship with Th1 cells is complicated. Using Gzb YFP reporters in the setting could help improving the resolution between the 2 subsets.

      Our analysis clearly demonstrated the presence of the CD4CTL signature among WT CD4+ T cells, and its absence among Myd88-/- CD4+ T cells from the same mixed-BM chimeric mice. Together with our past work (Oliveira, 2017) and results included in the present manuscript, this analysis strongly contributes to demonstrating the importance of T-cell intrinsic IL-18R/MyD88 signaling for the development of a robust CD4CTL response to infection with an intracellular parasite. Although these results argue in favor of a common origin for CD4CTLs and Th1 cells during infection, an interesting point is that Ifng-/- mice display the same percentage of GzB+CD4+ T cells as WT mice (Figure 4B), suggesting that GzB+CD4+ T cells might emerge independently of IFN-g-dependent Th1 cells. Therefore, the possibility remains that not all CD4CTLs are derived from the putative terminal differentiation of Th1 cells but that, instead, a divergence between the Th1 and CTL differentiation programs might occur at an earlier step. Although addressing this fundamental question goes beyond the possibilities of the present study, we believe that our results make an important and substantial contribution to the understanding of the biology of CD4CTLs in response to infection and highlight the importance of IL-18R/MyD88 signaling for the reinforcement and/or stabilization of CD4+ T cell commitment into the CD4CTL phenotype. Regarding the use of GzB-YFP reporters, please see our answer below.

      • The dependency on the Myd88/IL18r axis to promote CD4 CTLs is well characterized and the prolonged survival rate of IL18r-/- after the adoptive transfer of Gzb YFP+ CD4 is very convincing. However, instead of using PBS as control, the authors could have used YFP- or total CD4 cells for the task. While a previous publication already showed that protection was achieved by transferring the total CD4 population, comparing GzB+ versus GzB- cells would have added useful insights into the amount of protection conferred by the subtypes and the relative roles of CD4 CTLs and Th1 in the model. Parasitemia could also be reassessed in this view.

      We have already discussed the impossibility of sorting bona-fide GzB-negative cells from the reporter mouse strain available. Please see our complete answer to Reviewer 2 on this issue (iii) in this point-by-point letter. Moreover, due to the low percentage of GzB-EYFP cells labeled in the tamoxifen-treated reporter mice, a high number of mice is necessary for performing these adoptive transfer experiments. Unfortunately, due to the COVID-19 pandemic and its consequences on our animal facility, at present it is impossible to repeat this experiment including total CD4+ T cells within a reasonable time. However, we have already shown in our past study (Oliveira, 2017) that the transfer of total WT CD4+ T cells to Il18ra-/- mice increased survival and lowered parasite load. On the other hand, our current data demonstrate that the adoptive transfer of GzB+CD4+ T cells increases survival but does not change the parasite load (Figure 6G). Therefore, these data strongly support that GzB+CD4+ T cells act in an IFN-g-independent way and, hence, differ from Th1 in the effector mechanism employed for extending survival of the recipient mice. In summary, our results favor the notion that CD4CTLs and Th1 cells have complementary roles, both being able to extend survival of recipient mice, although only Th1 are effective in lowering parasite load.

    1. Author Response

      Reviewer #1 (Public Review):

      The results are quite interesting and potentially have important therapeutic implications. Nevertheless, in the current form there are several weaknesses that diminish the strength of the findings.

      1) As the authors note, they do not provide direct evidence for the ultimate conclusion of the study that assembly with β2a and β2e subunits are necessary for CaV2.3 channels to contribute to pacemaking in SN DA neurons. The authors state siRNA knockdown experiments in SN DA neurons are technically challenging. Nevertheless, shRNA knockdown studies in SN neurons have been previously published. Such a study is critical to provide direct evidence for what would be a very important and impactful finding.

      Please refer to our detailed response to essential revision 1 above.

      2) Relative contribution of CaV1.3 (L‐type) and CaV2.3 channels to pacemaking in SN DA neurons. As the authors note, a phase III clinical trial for the L‐type channel blocker, isradipine, showed no efficacy for neuroprotection, even though some mice studies suggested this might be efficacious. On the other hand, the authors' previous work with CaV2.3 knockout mice suggest inhibition of this channel would be more appropriate for a neuroprotective response. It would be useful to get a direct comparison of the impact of isradipine and SNX‐482 on pacemaking in SN DA neurons (Figs. 1 and 2). If their impacts on pacemaking (and Ca2+ oscillations) are similar it would suggest something beyond the pacemaking Ca2+ influx could be responsible for neuroprotection (e.g. changes in NCS‐1 expression as previously suggested by the authors).

      The question about the relative contribution of Cav1.3 and Cav2.3 on pacemaking is complex due to the finding that different results have been obtained regarding the role of L‐type channels on pacemaking. In Cav1.3 knockout mice pacemaking frequency is normal (7, 8). Inhibition (of Cav1.2 and Cav1.3) by dihydropyridine Ca2+ channel inhibitors (e.g. isradipine, nimodipine) was found to inhibit pacemaking in some (e.g. 9‐11) but not in all (8, 12) reports. This seems to be dependent on experimental conditions, but the reasons for these discrepancies are currently unclear. Similarly, we find inhibition of pacemaking by SNX‐482 in cultured midbrain neurons (this paper) but, as previously reported, not in Cav2.3‐deficient mice (1). While this toxin is well suited to isolate Cav2.3‐mediated Ca2+ current components, effects on pacemaking in DA neurons have to be interpreted with more caution because (as clearly outlined in our original MS and our previous paper, 1), SNX‐482 is also a potent inhibitor of Kv4.3 channels. We consider this limitation even more in the discussion of SNX‐482 effects on pacemaking in cultured neurons (data now moved to Suppl Fig. 5) in the revised MS (end of page 15, top of page 16), although the SNX‐482 changes suggest an involvement of Cav2.3 for AP generation.

      Although we acknowledge the relevance of the question raised by the reviewer, based on our previous findings (1) the absence of an obvious role of Cav2.3 for pacemaking in SN DA neurons (despite their role for Ca2+ transients) as an experimental read‐out prevents a straightforward approach to study the contribution of different β‐subunits and their splice variants for this process.

      3) The slice recording data (Fig. 9) are confusing and raise concerns about adequacy of pharmacological isolation of CaV2.3 currents in this preparation. The accuracy of interpretation of the data in Fig. 9 rests critically on the idea that the cocktail of CaV channel blockers given successfully isolates CaV2.3 currents. Yet, the amplitudes of the exemplar currents shown for plus or minus the CaV channel blocker cocktail are almost the same. This cannot be due to CaV2.3 providing the dominant current in the slice preparation since addition of SNX‐482 only decreased Ca2+ current amplitude by 13% (Suppl Fig. 5). It is not clear to me why the steady‐state activation and inactivation curves experiments were not conducted in the cultured neuron preparation (Figs. 1 and 2) where there seems to be better control of pharmacological block of different Cav channel isoforms.

      We have now performed the isolation of SNX‐482‐sensitive currents not only in the cultured neuron preparation as suggested but, in addition, also in SN DA neurons. The latter experiments gave essentially identical steady‐state inactivation parameters as compared to our "R‐type" current (current remaining in the presence of all other channel blockers). This now also allows a direct comparison of SNX‐482‐sensitive current properties in cultured neurons and in slices (see response above). We now also specifically discuss previous reports of SNX‐482‐sensitive R‐type components in the introduction to allow comparison of these reports with our findings. Please also note that in our legend to Fig. 9A (original MS, now Fig. 6) we have explicitly stated that recordings of "similar amplitudes were chosen" to facilitate comparison of current kinetics. We still think that this makes sense and kept this part of the figure but now strengthened this point even more in the figure legend (Fig. 6).

      4) While the transcript data show that β2a and β2e are present in SN DA neurons, numerically they would still represent only a minority of the beta subunits present (<25%). I don't think sufficient thought has been given to this in the discussion of the results. Unless there is some preferential association of CaV2.3 with β2a and/or β2e, there would be a mix of channels with the majority incapable of supporting pacemaking in SN DA neurons. Given this, one would not necessarily expect that the gating characteristics of CaV2.3 would be the same as what is obtained with reconstituted channels in tsA201 cells where all the channels are assembled with β2a or β2e (see point #5 below).

      We now give this important point more thought in the discussion and mention that our data would imply such a preferential association of Cav2.3 with β2a and/or β2e and provide possible explanations. In addition, as in the original MS, we also provide alternative interpretations (Discussion, pg 14, 2nd and 3rd paragraph).

      5) The V0.5,inact of putative CaV2.3 channels in SN DA neurons of ‐52.4 mV was said to be 'very similar' to the value of ‐40 mV that was observed in tsA201 cells. A difference of +12 mV in voltage‐dependence gating of ion channels is substantial and should not be brushed off. A more nuanced interpretation would be that in SN DA neurons CaV2.3 likely associates with other beta subunits in addition to b2a and b2e and so one would not necessarily expect the V0.5,inact to be the same as what is observed in reconstituted channels in tsA201 cells.

      The V0.5,inact of ‐52.4 mV refers to the control current. We correctly stated that the V0.5,inact of R‐type current was ‐47.5 mV (as also shown in Table 3), i.e. only about 7 mV more negative than in tsA‐201 cells. We have now rephrased this section because we also included the new inactivation data for SNX‐482‐sensitive currents in cultured neurons and in SN DA neurons recorded in slices (Discussion, page 13, 2nd paragraph). As suggested, we no longer refer to these values as "very similar" (difference ~5 mV).
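      To make the size of the disputed shift concrete, the sketch below evaluates a standard Boltzmann steady-state inactivation curve at the two half-inactivation voltages quoted in this exchange (‐47.5 mV and ‐40 mV). It is an illustration only: the slope factor and the test potential are assumptions chosen for the example and are not values from the manuscript.

```python
import math


def availability(v_mv, v_half_mv, slope_mv):
    """Boltzmann steady-state inactivation.

    Returns the fraction of channels still available (not inactivated)
    at membrane potential v_mv, given the half-inactivation voltage
    v_half_mv and a positive slope factor slope_mv.
    """
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))


SLOPE_MV = 7.0      # ASSUMED slope factor, not taken from the manuscript
V_TEST_MV = -50.0   # ASSUMED test potential, chosen only for illustration

# Half-inactivation voltages as quoted in the review and the response.
conditions = {
    "R-type current, SN DA neurons": -47.5,
    "reconstituted channels, tsA-201 cells": -40.0,
}

for label, v_half in conditions.items():
    frac = availability(V_TEST_MV, v_half, SLOPE_MV)
    print(f"{label}: V0.5 = {v_half} mV, available fraction = {frac:.2f}")
```

      Under these assumptions, a shift of a few millivolts in V0.5,inact changes the available fraction noticeably at potentials near the midpoint, which is why the reviewer asks that the difference not be dismissed.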

      Reviewer #2 (Public Review):

      This reviewer is very enthusiastic about the work but notes that most of the conclusions are based on data obtained by overexpressing Cav2.3 and accessory subunits in a heterologous expression system. The authors make a good argument for cross‐correlation between data in tsA‐201 cells and dopaminergic neurons, but it is unclear that the results will translate from one system to another. More data may be needed to do so (the reviewer does understand that these are challenging experiments), which the authors acknowledge in a section about the study's limitations. Based on this, it seems that the title is misleading without additional data supporting the role of Cav2.3 in dopaminergic neurons. Along the prior line, statements linking the study results to potential pathological implications seem a big stretch not supported by current data, and therefore should be eliminated.

      An issue with this manuscript is that the narrative and organization of the data are difficult to follow. The reviewer understands that the authors are weaving a complex story that involves using multiple techniques and approaches. Still, the way the data is organized and described makes the reader go back and forward to compare and contrast results constantly. This is further complicated by the fact that some experiments are done in dopaminergic neurons and others in tsA‐201 cells (the identity of the cell type used should be made clearer), the order of some figures is not appropriate (Supp Fig 1 for example) and some figure panels are not discussed (Supp Fig 5E to 5J).

      The MS has been completely rewritten, based on the additional SNX‐482 experiments we have now performed both in the cultured DA neurons as well as in the midbrain slices. We therefore also moved data on effects on the spontaneous activity of cultured neurons by SNX‐482 into the supplement to make the key results easier to follow. The identity of neurons is indicated in all headers of table and figure legends to identify cell types. We also changed the title to “β2‐subunit alternative splicing stabilizes Cav2.3 Ca2+ channel activity during continuous midbrain dopamine neuron‐like activity” to attenuate our previous statement regarding a role in dopaminergic midbrain neurons.

    1. Author Response

      Reviewer #1 (Public Review):

      As we lack empirical data of the response of most species to environmental changes, developing predictive tools based on traits that are easier to access or infer may help us developing better management tools. This is the case even for terrestrial mammals, a rather well studied group but with a large study bias towards temperate Europe and North America. This study uses maximum longevity, litter size and body mass to predict the sign and size of the relationships between annual temperature and precipitation anomalies and population growth rates, using the Living Planet database for times series of abundance and Chelsa for weather anomalies. The authors use a Bayesian framework to relate the size and absolute magnitude of the relationships between detrended population growth rates and weather anomalies, the framework accounting for the uncertainty in estimates as well as phylogenetic dependencies. They did not find any systematic effects -- on average the slopes of the relationships were close to 0 -- but the magnitude of the coefficients decreases for species with high maximum longevity and low litter size. Therefore, this study points to possible predictions of the magnitude of the response to weather variability using simple demographic indices such as longevity and litter size. The study has clear limitations that are common to similar "meta-regressions" using publicly available databases, but they are not ignored when discussing the results. One would hope that such limitations would lead to improving the quality of such databases, both in terms of taxonomic and geographic coverage as well as quality of data.

      We would like to thank Reviewer 1 for their overall positive feedback and constructive comments on the method and our predictions. We have now included complementary analyses based on high-quality subsets (≥ 20-year records; using life history traits estimated from structured population models), have clarified our set of hypotheses and discussed our results accordingly. Detailed responses are given below.

      I would like to challenge the authors in terms of why one would expect relationships of a given sign or magnitude. First with respect to sign of relationships, even for the same species and the same weather parameters, one could expect different signs depending on where the study is done with regards to the climatic niche. If one is close to the warm (or wet) edge, any positive temperature (or precipitation) anomalies would probably have a negative effect, but the reverse would happen when close to the cold or dry edge. There are studies showing such demographic and growth rate variability differences. I find therefore hard to interpret the sign of such weather anomalies and what it tells us about the "effect" of weather variability.

      We think that this is an important point to discuss with respect to the importance of within-species variability in population dynamics. Certainly, from the results (L203-206) it is clear that populations of the same species can have responses of differing signs. It is also interesting to note that this may be the result of a population’s position in the climatic niche. However, aside from exploring this for species with long-term demographic monitoring across the range, we do not feel that exploring this was in the scope of the current study across species. We agree fully, however, that adding this perspective to studies of how populations are responding to changing climates is critical. As well as the paper mentioned below by Gaillard et al. (2013), recent work in Plantago lanceolata with extensive spatial replication has also begun to reveal these within-range dynamics as a function of latitudinal or climatic gradients (Römer et al. 2021). We have added further discussion of this to the manuscript (L330-340). We believe that this point adds to the context of our results highlighting variability within species. In addition, we have clarified in the introduction that no clear directional responses of populations to weather anomalies were expected among and within species (L133-135).

      Römer, G., Christiansen, D. M., de Buhr, H., Hylander, K., Jones, O. R., Merinero, S., ... & Dahlgren, J. P. (2021). Drivers of large‐scale spatial demographic variation in a perennial plant. Ecosphere, 12(1), e03356.

      Second with regards to the magnitude, it is clear that the maximum growth rate is strongly linked to maximum longevity and litter size -- slow species have a much lower maximum rate of growth than fast species. So, one would expect that variability of population growth rates is larger in fast species than slow species, and therefore the magnitude of their response to environmental variability. Now the question might also be whether weather variability explains a smaller or larger proportion of the variability in population growth rates -- that is, does weather have a relatively larger influence in fast species than slow species? You might have the answer but with the multiple standardizations of the response and predictor variables it is not obvious (that is, when you standardize the response and predictor variables, coefficients are correlations, but this is across species, not for a given population).

      The reviewer raises a very interesting and important point on whether the patterns we observe are simply a result of larger variability in growth rates in short-lived species. We have two responses to this point: 1) while there is indeed larger variation in the population growth rates of short-lived species, we believe that this variability is likely an evolved life-history strategy in response to the environment, and thus a key component of patterns we observe, 2) we also feel that our use of models that included annual effects, and state-space models with explicit process-noise terms, account for any confounding effect of this variation.

      To address the first point in more detail, we expect that life-histories (and thus population dynamics) are evolved responses to the environment (Stearns, 1992). For ‘fast’ organisms therefore, their intrinsic life-history strategy results in boom-bust population dynamics relative to ‘slow’ species. This is clearly observable in transient or non-asymptotic dynamics, where short-lived species more often have short-term population dynamics with a greater magnitude (Stott et al. 2011). On this point, we therefore argue that this variation in population growth is part of what we are trying to capture. Anomalies in the weather are therefore expected to act more strongly in ‘fast’ species. Following this point and the comments of Reviewer #3, we have now included more explicit hypotheses in terms of life-history L133-144.

      For the second point, while we may expect this variability to be the result of dynamics we are trying to capture, this does not preclude other sources of variation in population size confounding the patterns we could observe. For example, hunting pressure may influence both short-term population variability and long-term trends. As a result, we aimed to capture this residual variation using auto-regressive terms for year in our GAMs. While these terms do not explicitly model variability in population growth, they do account for a component of the trend, with variation (error around the trend, which is expected to be larger for fast species), and auto-regressive components of population change. Moreover, we did additional analyses using a state-space modelling approach. In the state-space approach, process noise, which in our case would equate to variability in population growth, is explicitly modelled and accounted for. We therefore believe that our analyses account for residual variability in population growth rates. State space models were also highly correlated with our auto-regressive GAMs, and we can therefore conclude that we do not expect that this variability influences our findings. We have now asserted this in the Methods section L531-535.

      Stearns, S.C., 1992. The evolution of life histories (No. 575 S81).

      Stott, I., Townley, S. and Hodgson, D.J., 2011. A framework for studying transient dynamics of population projection matrix models. Ecology Letters, 14(9), pp.959-970.

      Your analyses remove trends -- that is, climate or other systematic change as opposed to weather anomalies (yearly differences) -- and trends might be the main concerns in terms of conservation. This is made clear in the discussion but perhaps not as much in the introduction where you seem to focus on climate change (the title reflects this well, however, as you mention weather, not climate). This confusion between weather and climate is often made in the literature, when reference is made to climate effects rather than weather effects.

      We agree with the reviewer that climate and weather are often conflated in ecological studies. We apologise for this oversight in the introduction, and agree that the narrative and link to weather was not made explicit in the previous version. Following this point and the suggestions of Reviewer #3, we have now restructured large sections of the introduction to improve the clarity of our hypotheses. To address this point, we have now included specific introduction of different components of climate that species populations may respond to, including short-term extreme weather patterns as we explore in this study. Please find this revised section L80-97.

      Finally, I would like to see a measure of how good is the prediction you can make using traits. You may have "significant effects" but not helping much in terms of prediction (see PB Adler et al. 2011 in Science, for an example with species richness and productivity).

      On this point we disagree with the reviewer. The core of our analysis framework was to examine the predictive performance of models. We do not report any significant effects, and instead use Bayesian inference. Throughout the analysis framework, we used explicit tests of out-of-sample predictive performance with leave-one-out cross validation (Vehtari et al. 2017). This is asserted in the manuscript title and results section when introducing our spatial analysis (L188-191). Cross validation was combined with model selection to test the predictive performance of a set of candidate models with respect to base models excluding predictors of interest. This predictive performance framework was not applied to examine the directional effects (question 1), as these models did not contain key predictors. However, model selections using predictive performance were done throughout questions 2 and 3, to explore spatial and life-history effects. We highlight this point in both the results (L188-191) and methods (L608-615) sections. In the case of life-history, we found that out-of-sample predictions were improved when including univariate life-history traits relative to the base model, and thus life-history traits aid in predicting weather responses.
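      The logic of comparing a base model with a trait model by out-of-sample prediction can be sketched as follows. This is a simplified illustration and not the authors' pipeline: it uses exact leave-one-out refitting of an ordinary least-squares Gaussian model instead of the Bayesian approximation of Vehtari et al. (2017), and both the data and the variable names (`longevity`, `response`) are simulated or hypothetical.

```python
import numpy as np


def loo_elpd(X, y):
    """Exact leave-one-out expected log predictive density.

    Each observation is held out in turn, a Gaussian linear model is
    fitted to the rest by least squares, and the log density of the
    held-out point under that fit is accumulated.
    """
    n, p = X.shape
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = y[keep] - X[keep] @ beta
        # Residual SD with a degrees-of-freedom correction (n - 1 - p).
        sigma = np.sqrt(np.sum(resid**2) / (keep.sum() - p))
        mu = X[i] @ beta
        total += -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y[i] - mu) / sigma) ** 2
    return total


# SIMULATED data: a hypothetical standardised trait and a response that
# depends on it. None of these numbers come from the study.
rng = np.random.default_rng(1)
n = 60
longevity = rng.normal(size=n)
response = -0.5 * longevity + rng.normal(scale=0.5, size=n)

X_base = np.ones((n, 1))                            # intercept only
X_trait = np.column_stack([np.ones(n), longevity])  # intercept + trait

elpd_base = loo_elpd(X_base, response)
elpd_trait = loo_elpd(X_trait, response)
print(f"base model elpd:  {elpd_base:.1f}")
print(f"trait model elpd: {elpd_trait:.1f}")
```

      A higher value for the trait model than for the base model indicates that the trait improves prediction of held-out observations, which is a different statement from the trait having a "significant" coefficient.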

      We did not explore the relative predictive performance of life-history traits with respect to other traits such as dietary specialisation, which have been shown to be important in climate responses (Pacifici et al. 2017). We believe that this would have been out of scope for the purpose of the current study, where we aimed to test specific hypotheses established in life-history theory.

      Pacifici, M., Visconti, P., Butchart, S.H., Watson, J.E., Cassola, F.M. and Rondinini, C., 2017. Species’ traits influenced their response to recent climate change. Nature Climate Change, 7(3), pp.205-208.

      Vehtari, A., Gelman, A. and Gabry, J., 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and computing, 27(5), pp.1413-1432.

      Reviewer #2 (Public Review):

      Jackson et al. present a global analysis of the effects of life history on the response of terrestrial mammal populations to weather, showing that litter size and longevity significantly alter how populations respond to anomalies in temperature and rainfall. The topic is highly interesting, as it has implications for what data we should monitor to make more reliable predictions about species' responses to climatic change, and how we should prioritise which species to conserve by identifying those which might be at greatest risk.

      The authors comprehensively validate their results with substantial secondary analyses, and I believe that their assertions are supported by the results presented here. Whilst global scale analyses such as this provide useful generalities, they should be taken as that: an investigation of the general trends observed across large spatial scales, and caution should be taken extrapolating too far away from the species which have been analysed for this study.

      We thank the reviewer for their positive feedback, and agree with not drawing too many generalities from our findings. In the first paragraph of the discussion L253-262, we now explicitly refer to the results in the context of mammal population-dynamics/conservation.

      Reviewer #3 (Public Review):

      In this study, the authors aim to investigate how mammalian species are likely to respond to climate change. To this end, they investigate the effects of weather anomalies on the growth rates of mammalian populations. They use long-term population records for 157 terrestrial mammals from the Living Planet database. They explore three different questions using a two-step modelling approach: (1) whether temperature and precipitation anomalies have significant effects on population growth rates across species; (2) whether responses differ among species and biomes; and (3) whether life-history traits explain species responses to weather anomalies.

      The work undertaken in this manuscript is of broad appeal in the field and has the potential to inform conservation. Overall, the methodology is sound and the modelling framework robust; the authors took care to test the robustness of their models by fitting alternative sets of models. The two-step design of this study is interesting and the choice of the study system is relevant for the questions the authors aim to tackle. The authors also paid attention to some important points that are at times overlooked such as resolving taxonomy before running their analyses. I also appreciated the fact that the authors made their code available.

      We thank the reviewer for their positive feedback on the manuscript, which highlights many of our key goals with the paper.

      I nevertheless think that, in its present form, the main weakness of this manuscript is the clarity of the writing, the framing of the study and the overall flow. I found the manuscript at times a bit difficult to follow. That said, I think there is much scope for the authors to improve it. First, I think the work would benefit from better explanation of the underlying hypotheses. Second, in some places I think the authors go into a lot of details at the expense of clarity. As such, I think the authors should strive to better balance clarity with detailed information (notably in the results and methods; adding summary sentences, for example, could help clarify these sections). Third, I think there is room for improvement in the narrative and the flow of the introduction and the discussion. Finally, I think stronger justifications are sometimes required regarding specific points of the analysis.

      I believe that the conclusions of this work are supported by the data and the analyses, and think they are of interest and relevant to the field. However, I think the discussion should highlight the main limitations of the study. In particular, I think the biases in the data should be discussed, and notably whether these biases are expected to affect the results (and if so, in what way).

      To conclude, I think that beyond the aforementioned weaknesses of this study, the results and the methods are of interest for the field. I think the modelling framework is applicable to other study systems and relevant to the field as well.

      We warmly thank the reviewer for their positive words and thorough constructive feedback. We have extensively re-worked large sections of the manuscript (particularly the discussion and introduction) based on these points, and done our best to address all of them. Generally, we have strived to improve the clarity and succinctness of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Guggenmos proposes a process model for predicting confidence reports following perceptual choices, via the evidence available from stimuli of various intensities. The mechanisms proposed are principled, but a number of choices are made that should be better motivated - I develop below a number of concerns by order of importance.

      I’d like to thank the reviewer for their thorough and excellent review. It is no empty phrase to say that this review substantially improved the manuscript.

      1) Lack of separability of the two metacognitive modules.

      Can the author show that the proposed model can actually discriminate between the noisy readout module and the noisy report module? The two proposed modules have a different psychological meaning, but seem to similarly impact the confidence output. Are these two mutually exclusive (as Fig 1 suggests), or could both sources of noise co-exist? It will be important to show model recovery for introducing readout vs. report at the metacognitive level, e.g., show that a participant best-fitted by a nested model or a subpart of the full model, with a restricted number of modules (some of the parameters set to zero or one), is appropriately recovered? (focusing on these two modules) This raises the question of how the two types of sigma_m are recoverable/separable from each other (and should they both be called sigma_m, even if they both represent a standard deviation)? If they capture independent aspects of noise, one could imagine a model with both modules. More evidence is needed to show that these two capture separate aspects of noise.

      Testing the separability of the two noise types (readout, report) is a great idea and I have now performed a corresponding recovery analysis. Specifically, I have simulated data with both noise types for different regimes of sensory and metacognitive noise. As shown in the new Figure 7—figure supplement 6, the noise type can be precisely recovered in the most typical regimes.

      I now refer to this analysis in the subsection 2.4 Model recovery (Line 521ff):

      “One strength of the present modeling framework is that it allows testing whether inefficiencies of metacognitive reports are better described by metacognitive noise at readout (noisy-readout model) or at report (noisy-report model). To validate this type of application, I performed an additional model recovery analysis which tested whether data simulated by either model are also best fitted by the respective model. Figure 7—figure supplement 6 shows that the recovery probability was close to 1 in most cases, thus demonstrating excellent model identifiability. With fewer trials per observer, recovery probabilities decrease, as expected, but remain at a very good level. The only edge case with poorer recovery was a scenario with low metacognitive noise and high sensory noise. Model identification is particularly hard in this regime because low metacognitive noise reduces the relevance of the metacognitive noise source, while high sensory noise increases the general randomness of responses.”

      In principle, both noise modules can co-exist and model inversion should be possible (though mathematically more complicated). On the other hand, I anticipate that parameter recovery would be extremely noisy in such a scenario. For this work, I decided to not test this possibility as it would add a lot of complexity, with a high probability of ultimately being unfeasible.
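      The general shape of a model recovery analysis like the one described above can be sketched with two toy models. The sketch assumes stand-in generative models (zero-mean Gaussian versus zero-mean Laplace noise), not the noisy-readout and noisy-report models of the manuscript, and the trial counts are arbitrary. Data are simulated from each model, both models are fitted by maximum likelihood, and the recovery rate is the fraction of datasets for which the generating model fits best.

```python
import numpy as np


def gaussian_loglik(x):
    """Maximised log-likelihood of a zero-mean Gaussian (one free parameter)."""
    s2 = np.mean(x**2)
    return -0.5 * len(x) * (np.log(2 * np.pi * s2) + 1.0)


def laplace_loglik(x):
    """Maximised log-likelihood of a zero-mean Laplace (one free parameter)."""
    b = np.mean(np.abs(x))
    return -len(x) * (np.log(2 * b) + 1.0)


def recovery_rate(generator, n_trials, n_datasets, rng):
    """Fraction of simulated datasets best fitted by their generating model."""
    hits = 0
    for _ in range(n_datasets):
        if generator == "gaussian":
            x = rng.normal(0.0, 1.0, n_trials)
        else:
            x = rng.laplace(0.0, 1.0, n_trials)
        # Both models have one parameter, so comparing log-likelihoods
        # is equivalent to comparing AIC values.
        winner = "gaussian" if gaussian_loglik(x) > laplace_loglik(x) else "laplace"
        hits += winner == generator
    return hits / n_datasets


rng = np.random.default_rng(0)
rates = {
    (gen, n): recovery_rate(gen, n, 200, rng)
    for gen in ("gaussian", "laplace")
    for n in (50, 1000)  # ASSUMED trial counts, for illustration only
}
for (gen, n), rate in rates.items():
    print(f"generator={gen:8s} trials={n:5d} recovery={rate:.2f}")
```

      Running the same loop over a grid of noise levels and trial counts gives the kind of recovery map reported in the figure supplement, with the real models substituted for the toy ones.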

      2) The trade-off between the flexibility of the model (modularity of the metacognitive part, choice of the link functions) and the generalisability of the process proposed seems in favor of the former. Does the current framework really allow to disambiguate between the different models? Or at least, the process modeled is so flexible that I am not sure it allows us to draw general conclusions? Fig 7 and section 3 of the results explain that all models are similar, regardless of module of functions specified; Fig 7 supp shows that half of participants are best fitted by noisy readout, while the other half is best fitted by noisy report; plus, idiosyncrasies across participants are all captured. Does this compromise the generalisability of the modeling of the group as a whole?

      This is a fair point and I understand the question has two components: a) is the model too flexible, potentially preventing generalized conclusions? b) is the flexibility of the model recoverable?

      Regarding a), I should emphasize that the manuscript (and toolbox) provides a modeling framework, rather than a single specific model. In other words, researchers applying the framework/toolbox must make a number of decisions: which noise type? which metacognitive biases should be considered? which link function? To ensure interpretability / generalizability, researchers have to sufficiently constrain the model. Due to this framework character, it makes sense that the manuscript is submitted under the Tools & Resources Article format rather than the Research Article format.

      On the other hand, I agree that it is the duty of the manuscript introducing the framework to provide all necessary information to help the researcher make these decisions. This is where the reviewer’s point b) is critical and I hope that with the new parameter and model recovery analyses in the present revision (see other comments) I meet this requirement to a satisfactory degree.

      To clarify the scope and aim of the paper, I now put a new subsection in front of the example application to the data from Shekhar and Rahnev, 2021 (Line 534ff):

      “It is important to note that the present work does not propose a single specific model of metacognition, but rather provides a flexible framework of possible models and a toolbox to engage in a metacognitive modeling project. Applying the framework to an empirical dataset thus requires a number of user decisions: which metacognitive noise type is likely more dominant? which metacognitive biases should be considered? which link function should be used? These decisions may be guided either by a priori hypotheses of the researcher or can be informed by running a set of candidate models through a statistical model comparison. As an exemplary workflow, consider a researcher who is interested in quantifying overconfidence in a confidence dataset with a single parameter to perform a brain-behavior correlation analysis. The concept of under/overconfidence already entails the first modeling decision, as only a link function that quantifies probability correct (Equation 6) allows for a meaningful interpretation of metacognitive bias parameters. Moreover, the researcher must decide for a specific metacognitive bias parameter. The researcher may not be interested in biases at the level of the confidence report, but, due to a specific hypothesis, rather at metacognitive biases at the level of readout/evidence, thus leaving a decision between the multiplicative and the additive evidence bias parameter. Also, the researcher may have no idea whether the dominant source of metacognitive noise is at the level of the readout or report. To decide between these options, the researcher computes the evidence (e.g., AIC) for all four combinations and chooses the best-fitting model (ideally, this would be in a dataset independent from the main dataset).”
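
      The model-selection step of this exemplary workflow can be sketched as follows. This is a minimal illustration and not the toolbox's API: the function `aic`, the model labels, the parameter count, and the negative log-likelihood values are hypothetical placeholders. In practice each value would come from fitting the corresponding candidate model.

```python
from itertools import product


def aic(neg_log_lik: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L) = 2k + 2 * NLL."""
    return 2.0 * n_params + 2.0 * neg_log_lik


# The four candidate models from the workflow: noise type x evidence bias.
noise_types = ("noisy-readout", "noisy-report")
evidence_biases = ("multiplicative", "additive")
candidates = list(product(noise_types, evidence_biases))

# Hypothetical placeholder values, NOT results from any dataset. In practice
# each entry would be the minimized negative log-likelihood of a fitted model.
placeholder_nll = dict(zip(candidates, (1012.4, 1009.8, 1003.1, 1005.6)))
n_free_params = 5  # assumed identical across candidates in this sketch

aic_scores = {m: aic(placeholder_nll[m], n_free_params) for m in candidates}
best_model = min(aic_scores, key=aic_scores.get)  # lowest AIC wins
print(best_model, round(aic_scores[best_model], 1))
```

      Because all four candidates are assumed to have the same number of free parameters here, the AIC ranking reduces to a ranking by likelihood; the penalty term only matters when candidates differ in complexity.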

      In addition, the website of the toolbox now provides a lot more information about typical use cases: https://github.com/m-guggenmos/remeta

      3) More extensive parameter recovery needs to be done/shown. We would like to see a proper correlation matrix between parameters, and recovery across the parameter space, not only for certain regimes (i.e. more than fig 6 supp 3), that is, the full grid exploration irrespective of how other parameters were set.

      The recovery of the three metacognitive bias parameters is displayed in Fig 4, but what about the other parameters? We need to see that they each have a specific role. The point in the Discussion "the calibration curves and the relationships between type 1 performance and confidence biases are quite distinct between the three proposed metacognitive bias parameters may indicate that these are to some degree dissociable" is only very indirect evidence that this may be the case.

      A comprehensive parameter recovery analysis is indeed a key analysis that was missing in the first version of the manuscript. I have now performed several analyses to address this, and rewrote and extended section 2.3 on parameter recovery. The new parameter recovery analysis was performed as follows (Line 455ff):

      “To ensure that the model fitting procedure works as expected and that model parameters are distinguishable, I performed a parameter recovery analysis. To this end, I systematically varied each parameter of a model with metacognitive evidence biases and generated data. Specifically, each of the six parameters (σs, ϑs, δs, σm, 𝜑m, δm) was varied in 500 equidistant steps between a sensible lower and upper bound. The model was then fit to each dataset. To assess the relationship between fitted and generative parameters, I computed linear slopes between each generative parameter (as the independent variable) and each fitted parameter (as the dependent variable), resulting in a 6 x 6 slope matrix. Note that I computed (robust) linear slopes instead of correlation coefficients, as correlation coefficients are sample-size-dependent and approach 1 with increasing sample size even for tiny linear dependencies. Thus, as opposed to correlation coefficients, slopes quantify the strength of a relationship. Comparability between the slopes of different parameters is given because i) slopes are – like correlation coefficients – expected to be 1 if the fitted values precisely recover the true parameter values (i.e., the diagonal of the matrix) and ii) all parameters have a similar value range which makes a comparison of off-diagonal slopes likewise meaningful. To test whether parameter recovery was robust against different settings of the respective other parameters, I performed this analysis for a coarse parameter grid consisting of three different values for each of the six parameters except σm, for which five different values were considered. This resulted in 3^5 · 5^1 = 1215 slope matrices for the entire parameter grid.”
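
      The slope-matrix idea described in this passage can be sketched as follows. The sketch rests on several assumptions: the quoted text does not name the robust estimator, so a Theil-Sen estimator (median of pairwise slopes) is used as one plausible choice; the row/column orientation of the matrix is chosen for illustration; and the "fitted" values are synthetic stand-ins rather than output of the actual model fits. The helper names `theil_sen_slope` and `slope_matrix` are hypothetical.

```python
import numpy as np


def theil_sen_slope(x, y):
    """Robust slope: median of all pairwise slopes (Theil-Sen estimator)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # all index pairs with i < j
    dx = x[j] - x[i]
    keep = dx != 0
    return float(np.median((y[j] - y[i])[keep] / dx[keep]))


def slope_matrix(generative, fitted):
    """generative: (n_params, n_steps), swept values of each parameter.
    fitted: (n_params, n_steps, n_params), all fitted parameters per sweep.
    Returns S with S[p, q] = slope of fitted q against generative p."""
    n_params = generative.shape[0]
    S = np.empty((n_params, n_params))
    for p in range(n_params):
        for q in range(n_params):
            S[p, q] = theil_sen_slope(generative[p], fitted[p, :, q])
    return S


# Synthetic stand-in for model fits (NOT real recovery results): the swept
# parameter is recovered with small noise, the others stay near a constant.
rng = np.random.default_rng(0)
n_params, n_steps = 3, 200
sweep = np.linspace(0.0, 1.0, n_steps)
generative = np.tile(sweep, (n_params, 1))
fitted = 0.5 + rng.normal(0.0, 0.01, size=(n_params, n_steps, n_params))
for p in range(n_params):
    fitted[p, :, p] = sweep + rng.normal(0.0, 0.01, size=n_steps)

S = slope_matrix(generative, fitted)
print(np.round(S, 2))  # near-identity indicates separable parameters

# Size of the coarse grid described in the text: three values for each of
# five parameters and five values for sigma_m.
n_grid_points = 3**5 * 5**1
```

      A diagonal close to 1 with off-diagonal entries close to 0 is the pattern expected when each parameter is recovered independently of the others; systematic off-diagonal slopes would indicate trade-offs.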

      In addition, I computed additional supplementary analyses assessing a case with fewer trials, a model with confidence biases, and models with mixed evidence and confidence biases. For details about these analyses, I kindly point the reviewer to section 2.3. Together, these new analyses demonstrate that parameter recovery works extremely well across different regimes and for all model parameters, including the metacognitive bias parameters mentioned in the reviewer’s comment.

      1.8: It would be important to report under what regimes of other parameters these simulations were conducted. This is because, even if dependence of Mratio onto type 1 performance is reproduced, and that is not the case for sigma_m, it would be important to know whether that holds true across different combinations of the other parameter values.

      I now repeated this analysis for various settings of other parameters and include the results as new Figure 6—figure supplement 2. While the settings of other parameters affect the type 1 performance dependency of Mratio (with some interesting effects such as Mratio > 1), parameter recovery of sigma_m is largely unaffected. The same basic point thus holds: Mratio shows a nonlinear dependency on type 1 performance, but sigma_m can be recovered largely without bias under most regimes (the main exception is a combination of low sensory noise and high metacognitive noise under the noisy-readout model, which is also mentioned in the manuscript).

      Is lambda_m meaningfully part of the model, and if so, could it be introduced into the Fig 1 model, and be properly part of the parameter recovery?

      I now reworked the part about metacognitive biases to make it more consistent and to introduce lambda_m on equal footing with the other metacognitive bias parameters. I now distinguish between metacognitive evidence biases (the two main bias parameters of the original model, phi_m and delta_m) and metacognitive confidence biases, i.e. lambda_m and a new additive confidence bias parameter kappa_m. The schematic presentation of the model framework in Figure 1 is updated accordingly.

      This change also complies with reviewer 2, who rightfully pointed out that the original model framework put much stronger emphasis on bias parameters loading on evidence than on confidence. The metacognitive confidence bias parameters are now also part of the parameter recovery analyses (Figure 7—figure supplement 2).

      While it is still feasible to combine the two evidence-related bias parameters and lambda_m – as queried by the reviewer – not all mixed combinations of evidence- and confidence-related bias parameters perform well in terms of model recovery (in particular, combining all four parameters; cf. Figure 7—figure supplement 3). Hence, a decision on the side of the modeler is required. I comment on this important aspect at the end of the section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery can become unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios one or two metacognitive bias parameters are a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

      4) An important nuance in comparing the present sigma_m to Mratio is that the present model requires that multiple difficulty levels are tested, whereas instead, the Mratio model based on signal detection theory assumes a constant signal strength. How does this impact the (unfair?) comparison of these two metrics on empirical data that varied in difficulty level across trials? Relatedly, the Discussion paragraph that explained how the present model departs from type 2 AUROC analysis similarly omits to account for the fact that studies relying on the latter typically intend to not vary stimulus intensity at the level of the experimenter.

      I thank the reviewer for this comment which made me realize that I incorrectly assumed that my model requires multiple stimulus difficulty levels. The only parameter that would require multiple stimulus intensities is the sensory threshold parameter, but for this parameter I already state that it requires additional stimulus difficulties close to threshold (Line 147ff). Otherwise I now made extensive tests that the model works just fine with constant stimuli. My reasoning mistake (iirc) was related to the fact that I fit a metacognitive link function, which I thought would require variance on the x-axis; but of course there is already plenty of variance introduced through noise at the sensory level, so multiple difficulty levels are not required to fit the metacognitive level. I now removed the relevant references to this requirement from the manuscript.

      Nevertheless, I agree that it is interesting to perform the comparison between Mratio and sigma_m also for a scenario with constant stimuli. See both the new Figure 6–supplement 1 with constant stimuli, and the (updated) main Figure 6 with multiple stimulus levels for comparison.

      The general point still holds also for constant stimuli: Mratio is not independent of type 1 performance. Thus, the observed dependence on type 1 performance is not due to the presence of varying stimulus levels. I now reference this new supplementary figure in Result section 1.8 (Line 389).

      5) 'Parameter fitting minimizes the negative log-likelihood of type 1 choices (sensory level) or type 2 confidence ratings (metacognitive level)'. Why not fitting both choices and confidence at the same time instead of one after the other? If I understood correctly, it is an assumption that these are independent, why not allow confidence reports to stem from different sources of choice and metacognitive noise? Is it because sensory level is completely determined by a logistic (but still, it produces the decision values that are taken up to the metacognitive level)?

      The decision to separate the two levels during parameter inference was deliberate. I now explain this choice in the beginning of Result section 2 (Line 416ff):

      “The reason for the separation of both levels is that choice-based parameter fitting for psychometric curves at the type 1 / sensory level is much more established and robust compared to the metacognitive level for which there are more unknowns (e.g., the type of link function or metacognitive noise distribution). Hence, the current model deliberately precludes the possibility that the estimates of sensory parameters are influenced by confidence ratings.”

      Indeed, I would regard it as highly problematic if the estimates of sensory parameters were influenced by confidence ratings, which are shaped by a manifold of interindividual quirks and biases and for which computational models are still in a developmental stage. Yet, from a pure simulation-based parameter recovery perspective, in which the true confidence model is known, using confidence ratings would indeed make sensory parameter estimation more precise (because of the rich information contained in continuous confidence ratings which is lost in the binarization of type 1 choices).

      6) Fig 4 left panels: could you clarify the reasoning that due to sensory noise, overconfidence is expected, instead of having objective and subjective probability correct aligning on the diagonal? Shouldn't the effects of sensory noise average out? In other words, why would the presence of sensory noise systematically push towards overconfidence rather than canceling out on average?

      As an intuitive explanation consider the case that no signal is present in a stimulus, e.g., a line grating in a clockwise/counterclockwise orientation discrimination task with an angle of 0 degrees. Since there is no true information in the stimulus, type 1 performance will be at chance level irrespective of sensory noise.

      However, sensory noise matters for the metacognitive level. Assuming no sensory noise (i.e., sigma_s = 0), the observer’s stimulus/decision variable would be zero and thus confidence would be zero. Thus, confidence would exactly match type 1 performance. Yet, assuming the presence of sensory noise, the stimulus estimate (“decision value”) will be always different from point-zero, if ever so slightly. While the average estimate of the stimulus variable across trials will indeed cancel out to zero, each individual trial will be different from zero (in either direction) and hence also the confidence will be different from zero in each trial. Since confidence is unsigned, the average confidence will be greater than zero and thus give the impression of an overconfident observer.

      Note that this explanation was implicitly included in the paragraph on the 0.75 signature of confidence (“When evidence discriminability is zero, an ideal Bayesian metacognitive observer will show an average confidence of 0.75 and thus an apparent (over)confidence bias of 0.25. Intuitively this can be understood from the fact that Bayesian confidence is defined as the area under a probability density in favor of the chosen option. Even in the case of zero evidence discriminability, this area will always be at least 0.5 − otherwise the other choice option would have been selected, but often higher.”, Line 257ff).
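
      The 0.75 signature can be checked with a short simulation. This is a sketch under stated assumptions rather than the manuscript's exact model: sensory noise is taken to be Gaussian, and confidence is expressed on the probability-correct scale as Φ(|y| / σs), i.e. the posterior area in favor of the chosen option under a flat prior. The function name `mean_confidence_at_zero_signal` is hypothetical.

```python
import math

import numpy as np


def mean_confidence_at_zero_signal(sigma_s, n_trials=200_000, seed=0):
    """Average Bayesian confidence when the stimulus carries no signal."""
    rng = np.random.default_rng(seed)
    # Decision values scatter around zero purely because of sensory noise.
    y = rng.normal(0.0, sigma_s, size=n_trials)
    # Confidence is unsigned: it depends on |y|, not on the sign of y.
    z = np.abs(y) / sigma_s
    # Standard normal CDF expressed through the error function.
    conf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return float(conf.mean())


for sigma_s in (0.2, 1.0, 5.0):
    print(sigma_s, round(mean_confidence_at_zero_signal(sigma_s), 3))
```

      Under these assumptions Φ(|y| / σs) is uniformly distributed between 0.5 and 1, so its mean is 0.75 regardless of σs, while type 1 accuracy stays at 0.5. The signed decision values average out to zero, but the unsigned confidence does not.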

      7) The same analysis as Fig 6 but for noisy readout instead of noisy reports do not show the same results: both sigma_m and m-ratio vary as a function of type 1 performance. Does this mean that the present model with readout module does not solve the issue of dependency upon type 1 performance?

      I refer to this in the Result section: “The exception is a regime with very high metacognitive noise and low sensory noise under the noisy-readout model, in which recovery becomes biased” (Line 391ff). Indeed, the type 1 performance dependency of sigma_m recovery in this edge case is not as good as in the noisy-report model. However, note that recovery is stable across a large range of d’ including the range typically aimed for in metacognition experiments (i.e., medium performance levels to ensure sufficient variance in confidence ratings).

      It is also important to point out that a failure to recover true parameters under certain conditions is not a failure of the model, but a reflection of the fact that information can be lost at the level of confidence reports. For example, if sensory noise is very high, the relationship between evidence and confidence becomes essentially flat (Figure 3), producing confidence ratings close to zero irrespective of the level of stimulus evidence. It becomes increasingly impossible to recover any parameters in such a scenario. Vice versa if sensory noise is extremely low, confidence ratings approach a value of 1 irrespective of stimulus evidence, and the same issue arises. In both cases there is no meaningful variance for an inference about latent parameters. This issue is more pronounced in the noisy-readout case because it requires an inversion of precisely the relationship between evidence and confidence.
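
      The loss of information at the extremes of sensory noise can be illustrated numerically. The sketch below assumes a Gaussian-noise link in which confidence on a 0 to 1 scale equals 2Φ(|evidence| / σs) − 1; the manuscript's actual link function may differ, and the helper names and evidence levels are hypothetical.

```python
import math


def confidence(evidence: float, sigma_s: float) -> float:
    """Assumed link: confidence in [0, 1] as 2 * Phi(|evidence| / sigma_s) - 1,
    which equals erf(|evidence| / (sigma_s * sqrt(2)))."""
    return math.erf(abs(evidence) / (sigma_s * math.sqrt(2.0)))


def confidence_range(sigma_s: float, evidence_levels) -> float:
    """Spread of confidence across stimulus evidence levels."""
    values = [confidence(e, sigma_s) for e in evidence_levels]
    return max(values) - min(values)


evidence_levels = [0.1 * k for k in range(1, 11)]  # 0.1 ... 1.0, arbitrary units
for sigma_s in (100.0, 0.3, 0.001):  # very high, intermediate, very low noise
    print(sigma_s, round(confidence_range(sigma_s, evidence_levels), 3))
```

      With very high or very low sensory noise the confidence values barely vary across evidence levels (they sit near 0 or near 1, respectively), so there is little variance left from which latent parameters could be inferred. Only the intermediate regime produces a wide spread.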

      8) In Eq8, could you explain why only the decision values consistent with the empirical choice are filtered. Is this an explicit modeling of the 'decision-congruence' phenomenon reported elsewhere (eg. Peters et al 2017)? What are the implications of not keeping only the congruent decision values?

      I apologize, this was a mistake in the manuscript. The integration is over all decision values, not just those consistent with the choice. I corrected it accordingly.

      Reviewer #2 (Public Review):

      This paper presents a novel computational model of confidence that parameterises links between sensory evidence, metacognitive sensitivity and metacognitive bias. While there have been a number of models of confidence proposed in the literature, many of these are tailored to bespoke task designs and/or not easily fit to data. The dominant model that sees practical use in deriving metacognitive parameters is the meta-d' framework, which is tailored for inference on metacognitive sensitivity rather than metacognitive biases (over- and underconfidence). This leaves a substantial gap in the literature, especially as in recent years many interesting links between metacognitive bias and mental health have started to be uncovered. In this regard, the ReMeta model and toolbox is likely to have significant impact on the field, and is an excellent example of a linked publication of both paper and code. It's possible that this paper could do for metacognitive bias what the meta-d' model did for metacognitive sensitivity, which is to say have a considerable beneficial impact on the level of sophistication and robustness of empirical work in the field.

      The rationale for many of the modelling choices is clearly laid out and justified (such as the careful handling of "flips" in decision evidence). My main concern is that the limits to what can be concluded from the model fits need much clearer delineation to be of use in future empirical work on metacognition. Answering this question may require additional parameter/model recovery analysis to be convincing.

      I thank the reviewer for these encouraging and constructive comments!

      Specific comments:

      • The parameter recovery demonstrated in Figure 4 across range of d's is impressive. But I was left wondering what happens when more than one parameter needs to be inferred, as in real data. These plots don't show what the other parameters are doing when one is being recovered (nor do the plots in the supplement to Figure 6). The key question is whether each parameter is independently identifiable, or whether there are correlations in parameter estimates that might limit the assignment of eg metacognitive bias effects to one parameter rather than another. I can think of several examples where this might be the case, for instance the slope and metacognitive noise may trade off against each other, as might the slope and delta_m. This seems important to establish as a limit of what can be inferred from a ReMeta model fit.

      This is an excellent point and was also raised by reviewer #1. See major comment 3 of reviewer #1 for a detailed response. In short, I now provide comprehensive analyses that demonstrate successful parameter recovery across different regimes and both noise types (noisy-readout, noisy-report). See Figure 7.

      Regarding the anticipated trade-offs between the confidence slope (now referred to as multiplicative evidence bias) and metacognitive noise / delta_m (now additive evidence bias), there is a single scenario in which this becomes an issue. I describe this in the Results section as follows (Line 480ff):

      “Here, the only marked trade-off emerges between metacognitive noise σm and the metacognitive evidence biases (𝜑m, δm) in the noisy-readout model, under conditions of low sensory noise. In this regime, the multiplicative evidence bias 𝜑m becomes increasingly underestimated and the additive evidence bias δm overestimated with increasing metacognitive noise. Closer inspection shows that this dependency emerges only when metacognitive noise is high – up to σm ≈ 0.3 no such dependency exists. It is thus a scenario in which there is little true variance in confidence ratings (due to low sensory noise many confidence ratings would be close to 1 in the absence of metacognitive noise), but a lot of measured variance due to high metacognitive noise. It is likely for this reason that parameter inference is problematic. Overall, except for this arguably rare scenario, all parameters of the model are highly identifiable and separable.” In my experience, certain trade-offs in specific edge cases are almost inescapable for more complex models. Overall, I think it is fair to say that parameter recovery works extremely well, including the ‘trinity’ of metacognitive noise / multiplicative evidence bias / additive evidence bias.

      • Along similar lines, can the noisy readout and noisy report models really be distinguished? I appreciate they might return differential AICs. But qualitatively, it seems like the only thing distinguishing them is that the noise is either applied before or after the link function, and it wasn't clear whether this was sufficient to distinguish one from the other. In other words, if you created a 2x2 model confusion matrix from simulated data (see Wilson & Collins, 2019 eLife) would the correct model pathway from Figure 1 be recovered?

      Great point. I introduced a new subsection 2.4 “Model recovery”, in which I demonstrate successful recovery of noisy-readout versus noisy-report models. See also my response to the first comment of Reviewer #1, which includes the new model recovery figure and the associated paragraph in the manuscript. The key new figure is Figure 7—figure supplement 6.

      • Again on a similar theme: isn't the slope parameter rho_m better considered a parameter governing metacognitive sensitivity, given that it maps the decision values onto confidence? If this parameter approaches zero, the function flattens out which seems equivalent to introducing additional metacognitive noise. Are these parameters distinguishable?

      Indeed, the parameter recovery analysis shows a slight negative correlation between the slope parameter (now termed multiplicative evidence bias) and metacognitive noise (Figure 7). As the reviewer mentions, this is likely caused by the fact that both parameters lead to a flattening/steepening of the evidence-confidence relationship. For reference, in the empirical dataset by Shekhar & Rahnev, the correlation between AUROC2 and the multiplicative evidence bias is almost absent at r = −0.017. Critically, however, while an increase of the metacognitive noise parameter σm will ultimately lead to a truly flat/indifferent relationship between evidence and confidence, the multiplicative evidence parameter 𝜑m only affects the slope (i.e., asymptotically confidence will still reach 1). This is one reason why parameter recovery for both σm and 𝜑m works overall very well. The differential effects of σm and 𝜑m are now better illustrated in the updated Figure 3:

      Also conceptually, the multiplicative evidence parameter 𝜑m plausibly represents a metacognitive bias, with either interpretation that I suggest in the manuscript: as an under/overestimation of the evidence or as an over/underestimation of one’s own sensory noise, leading to under/overconfidence, respectively. In sum, I think there are strong arguments for the present formalization and interpretation.

      • The final paragraph of the discussion was interesting but potentially concerning for a model of metacognition. It explains that data on empirical trial-by-trial accuracy is not used in the model fits. I hadn't appreciated this until this point in the paper. I can see how in a process model that simulates decision and confidence data from stimulus features, accuracy should not be an input into such a model. But in terms of a model fit, it seems odd not to use trial by trial accuracy to constrain the fits at the metacognitive level, given that the hallmark of metacognitive sensitivity is a confidence-accuracy correlation. Is it not possible to create accuracy-conditional likelihood functions when fitting the confidence rating data (similar to how the meta-d' model fit is handled)? Psychologically, this also makes sense given that the observer typically knows their own response when giving a confidence rating.

      While I agree of course that metacognitive sensitivity quantifies the confidence-accuracy relationship, a process model is a distinct approach and requires distinct methodology. Briefly, the current model fit cannot be improved upon, as it is based on a precise inversion of the forward model. Computing accuracy-conditional likelihoods would lead to biased parameter estimates, because it would incorrectly imply that the observer has access to the accuracy of their choice. While the observer knows their choice, as the reviewer correctly notes, they do not know the true stimulus category and hence not their accuracy.

      I argue in the manuscript that both approaches (descriptive meta-d’, explanatory process model) have their advantages and disadvantages. The concept of meta-d’ / metacognitive sensitivity does not care why a particular confidence rating is the way it is, or whether an incorrect response is caused by sensory noise or by an attentional lapse. On the one hand, this implies that one cannot draw any conclusions about the causes and mechanisms of metacognitive inefficiency, which could be perceived as a major drawback. In this respect, it is a purely descriptive measure (cf. last comment of Reviewer #1). On the other hand, because it is descriptive, it can simply compare the confidence between correct and incorrect choices and thus, in a sense, capture a more thorough picture of metacognitive sensitivity; that is, being metacognitively aware not only of the consequences of one’s own sensory noise (as in typical process models), but also of all other sources of error (attentional lapses, finger errors, etc.). I now added an additional paragraph in which I summarize the comparison of type 2 ROC / meta-d’ and process models along these lines (Line 800ff):

      “In sum, while a type 2 ROC analysis, as a descriptive approach, does not allow any conclusions about the causes of metacognitive inefficiency, it is able to capture a more thorough picture of metacognitive sensitivity: that is, it quantifies metacognitive awareness not only about one’s own sensory noise, but also about other potential sources of error (attentional lapses, finger errors, etc.). While it cannot distinguish between these sources, it captures them all. On the other hand, only a process model approach will allow to draw specific conclusions about mechanisms – and pin down sources – of metacognitive inefficiency, which arguably is of major importance in many applications.”

      • I found it concerning that all the variability in scale usage was being assumed to load onto evidence-related parameters (eg delta_m) rather than being something about how subjects report or use an arbitrary confidence scale (eg the "implicit biases" assumed to govern the upper and lower bounds of the link function). It strikes me that you could have a similar notion of offset at the level of report - eg an equivalent parameter to delta_m but now applied to c and not z. Would these be distinguishable? They seem to have quite different interpretations psychologically: one is at the level of a bias in confidence formation, and the other at the level of a public report.

      I substantially reworked the section about metacognitive biases, including an additive metacognitive bias (κm) also at the level of confidence. The previous version of the manuscript already included a multiplicative bias parameter loading onto confidence (previously referred to as ‘confidence scaling’ parameter, now multiplicative confidence bias λm), but it was considered optional and e.g. not part of the parameter recovery analyses.

      My previous emphasis on biases that load onto evidence-related variables was due to a more principled interpretation (e.g. ‘underestimation of sensory noise’), but I agree that metacognitive biases must not necessarily be principled and may be driven e.g. by the idiosyncratic usage of a particular confidence scale. Updated Figure 1 sketches the new, more complete model.

      Is a mix of evidence- and confidence-related metacognitive bias parameters distinguishable? I tested this in Figure 7—figure supplement 3.

      The slope matrices show that e.g., the model suggested by the reviewer (two evidence-related bias parameters 𝜑m and δm + an additive confidence-based bias parameter κm) is to some degree dissociable, although slight tradeoffs start to emerge with such a complex model. By contrast, a mix of only one evidence-related and one confidence-related bias parameter is much more robust. In general, I thus recommend using at most two metacognitive bias parameters, which are selected either based on a priori hypotheses or on a model comparison. I comment on the necessity of choosing one’s bias parameters in a new paragraph in section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery is more unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios one or two metacognitive bias parameters are a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

    1. Author Response

      Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency-independent. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If indeed, the authors still maintain that OHC eM is frequency-independent, à la Frank et al. (and in disregard of other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      We thank reviewer #1 for the suggestions on the kinetics of prestin and previous literature.

      Although there are no data (to the best of our knowledge) on electromotility (eM) in isolated basal murine OHCs, a more thorough review of the existing literature on the topic suggests that the assumed parameters are indeed a reasonably conservative estimate of eM in situ.

      Additionally, the OHC parameters are pessimistic enough to account for a doubling of effective capacitance due to NLC.

      Regarding the fallacy of composition, we are puzzled that the reviewer interpreted it as a “scorning” of the OHC biophysics, which is obviously important for cochlear function. The point raised is simple and rather obvious: the fact that a system is built from low-pass filters does not mean that the system is itself a low-pass filter. This is illustrated by the analogy, familiar to electrical engineers, that high- and band-pass filters are often built by cascading and mixing the responses of low-pass filters. The “fallacy of composition” therefore lies in the conclusion that since eM is “low-pass”, it can’t possibly contribute to high-frequency amplification. Strikingly, this conclusion is often based on measured vibrations near the OHCs showing transfer functions with >30 dB peak-to-tail ratio, and that are somewhat consistent with the inner workings of cochlear models. That is, we are criticizing one specific interpretation of the biophysical data, and are certainly not suggesting that collecting and analyzing the data in the first place is unimportant.
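
      The filter analogy can be illustrated numerically. The following is a minimal sketch, not taken from the manuscript: it subtracts the responses of two first-order low-pass filters and shows that the combination behaves as a band-pass filter. The corner frequencies (1 kHz and 10 kHz) and the function names are illustrative assumptions only.

```python
import math

def lowpass(f_hz, corner_hz):
    """Complex transfer function of a first-order low-pass filter."""
    return 1.0 / complex(1.0, f_hz / corner_hz)

def difference_of_lowpass(f_hz, corner_low_hz=1e3, corner_high_hz=1e4):
    """Difference of two low-pass responses (hypothetical corner frequencies).

    Both terms equal 1 at DC and both vanish at high frequency, so the
    difference is small at both extremes and largest in between.
    """
    return lowpass(f_hz, corner_high_hz) - lowpass(f_hz, corner_low_hz)

if __name__ == "__main__":
    # Print the magnitude response at a few frequencies, including the
    # geometric mean of the two corners, where the response peaks.
    for f in (10.0, 100.0, 1e3, math.sqrt(1e3 * 1e4), 1e4, 1e5, 1e6):
        gain_db = 20.0 * math.log10(abs(difference_of_lowpass(f)))
        print(f"{f:>10.0f} Hz  {gain_db:6.1f} dB")
```

      With these assumed values the response is zero at DC, rises to a maximum between the two corner frequencies, and falls again above them, even though every building block is low-pass.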

      Reviewer #2 (Public Review):

      In the inner ear, the cochlea transforms sound-induced vibrations into electrical signals that are sent to the brain. Cochlear outer hair cells (OHCs) are thought to amplify these vibrations, but it is unclear how amplification works. Sound-induced vibrations modulate the current entering an OHC, which drive its receptor potential, causing the OHC to change length. The change in length owing to the receptor potential variation, known as the OHC's electromotile response, depends on the size of the receptor potential. However, the receptor potential decreases with increasing sound frequency, because of the resistance (R) and capacitance (C) of the OHC's membrane. This paper addresses the RC problem, limitations on high-frequency amplification owing to the OHC's receptor potential decreasing with frequency.

      The authors use a well-known simplification of the RC problem and some back-of-the-envelope calculations to argue that OHCs can amplify sufficiently well at high frequencies to match experimental data, despite the decrease in their receptor potentials. They argue that changes to OHC properties along the cochlea allow them to amplify at high frequencies and that OHCs reduce noise and distortion. They argue against OHCs as being cochlear impedance regulators and that OHCs do not limit cochlear tuning.

      Figure 1 and Equations 1-6 are useful teaching tools but are not novel. The back-of-the-envelope calculations use these equations and a limited number of data points from the literature. There are many prior models that show amplification despite the RC problem, but they are not analyzed or discussed in much detail.

      How RC OHC filtering reduces noise without reducing the signal is not explained. The type of noise calculation done in Appendix 1 is well-known and the application is again a rough back-of-the-envelope calculation. Most of the statements about noise are not fleshed out or supported by calculations.

      The discussion about tonotopic variations has little new data. Fig. 2 uses two data points from the literature and an unpublished data point from a colleague. The fact that BM displacement is smaller at the base than at the apex is well known. There is speculation that reduced OHC motion is "effectively counteracted" by gradients in OHC capacitance and MET current, but no evidence is presented.

      The discussion about distortions is pedagogical but is again speculation without new or strong-supporting evidence. Fig. 3 argues that OHCs might reduce high-frequency distortions, but don't limit the cochlear amplifier. The plots shown are either well-known consequences of filtering or a summary of the authors' previous model data.

      The arguments against OHCs as regulators and that they don't limit tuning are not well fleshed out, speculative, and unsupported by new calculations or data.

      This paper does not clarify OHC operation or the RC problem, because it mixes speculation, limited data, and topics that are not clearly related to the problem.

      We agree with reviewer #2 that there are no new physics principles elucidated here, and that most of the discussion relies on simple calculations. But we believe that such simple calculations are the missing piece (absent in the literature) that allows one to appreciate the magnitude of the problem under examination, a magnitude typically inflated by focusing on quantities whose physical significance is uncertain. In other words, we believe that the simplicity of the calculations and physical reasoning is not a bug, but a feature of the paper.

      We believe that in his criticism regarding various topics of discussion presenting little or speculative new evidence, this reviewer might not have fully considered that most of the evidence provided here is fundamentally a physics-based review of the recent experimental data, incidentally the same type of data previously employed to argue that the RC problem is dramatic in the first place. Likely we didn't convey this message clearly enough in the manuscript.

      While the arguments against OHCs as regulators are not all new, they are often ignored (or perhaps forgotten) and we believe there is a value in synthesizing them all in one place. The support for these arguments comes from fundamental hydrodynamic principles, previous modeling studies, and most importantly from OCT data collected over the last 6 years. Of course, the discussion on the plausibility of suggested mechanisms lacking a concrete proposal cannot be 100% “analytic”.

      About noise and signal amplification, the missing piece perhaps is that distributed internal noise sources (e.g., thermal and shot noise) are independent of each other and hence spatially incoherent. While the manuscript doesn’t specifically deal with signal vs. noise amplification in cochlear models, spatially distributed amplification is known to boost signals more than internal noise—a principle universally used in telecommunications and addressed in >60-year-old literature.
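
      One textbook way to see why a coherent signal can grow faster than spatially incoherent noise is sketched below. This is an idealized illustration rather than the manuscript's model: it assumes N identical sources whose signal contributions add in phase while their noise contributions are statistically independent. The function names are assumptions made for this example.

```python
import math

def coherent_amplitude(n_sources, amplitude=1.0):
    """In-phase signal contributions add linearly in amplitude."""
    return n_sources * amplitude

def incoherent_rms(n_sources, rms=1.0):
    """Independent noise sources add in power, so the rms grows as sqrt(N)."""
    return math.sqrt(n_sources) * rms

def snr_gain_db(n_sources):
    """SNR improvement relative to a single source, in dB."""
    ratio = coherent_amplitude(n_sources) / incoherent_rms(n_sources)
    return 20.0 * math.log10(ratio)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"N = {n:5d}: SNR gain = {snr_gain_db(n):5.1f} dB")
```

      Under these assumptions the signal-to-noise ratio improves by 10·log10(N) dB, which is the sense in which distributed amplification favors the signal over internal noise.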

      Reviewer #3 (Public Review):

      This paper discusses the effect of the low-pass filtering between outer hair cell transducer current and receptor voltage. The filter's cut-off frequency (where the response is down by a factor of 0.71 of its maximum) can be quantified by the resistance and capacitance of the hair cell's basolateral membrane. The capacitance value is determined mainly by the lipid membrane and is augmented by the charge movement of the piezoelectric prestin molecule, which endows the OHC with its electromotile properties. The OHC's capacitance (C) value is pretty well known. The resistance (R) is determined mainly by K+ channels in the basolateral membrane, a value that is also known reasonably well. The low-pass cut-off frequency is equal to (2pi*RC)^-1 and has a value of ~1 to a few kHz - a value that has both experimental and theoretical support. The low-pass filtering of membrane voltage is important because the cell responds to membrane voltage by shortening and lengthening - this electromotility is thought to be key to the cochlea's operation and in particular to cochlear amplification, the process that enhances the magnitude and tuning of the cochlea's passive response to sound. However, the auditory system works to 80 kHz and even higher in some animals. Thus, it has been posed (let's say by team A) that the RC cut-off frequency value of a few kHz makes electromotility too slow to operate "cycle-by-cycle" up to several 10s of kHz. The article under review, representing team B, supports "cycle-by-cycle" action, arguing that the several kHz cut off frequency is not a problem and is even an advantage.
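
      The cut-off relation quoted above can be checked with a short calculation. The sketch below assumes a single-pole RC filter; the resistance and capacitance values are hypothetical placeholders chosen only to land in the kHz range and are not taken from the manuscript or the review.

```python
import math

def cutoff_hz(r_ohm, c_farad):
    """Corner frequency of a first-order RC low-pass filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def lowpass_gain(f_hz, fc_hz):
    """Magnitude of the first-order low-pass response relative to DC."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** 2)

# Hypothetical membrane values, for illustration only.
R_OHM = 10e6      # 10 megaohm
C_FARAD = 10e-12  # 10 picofarad

if __name__ == "__main__":
    fc = cutoff_hz(R_OHM, C_FARAD)
    print(f"corner frequency: {fc:.0f} Hz")
    for f in (fc, 10e3, 50e3, 80e3):
        g = lowpass_gain(f, fc)
        print(f"{f:8.0f} Hz: gain {g:.4f} ({20.0 * math.log10(g):.1f} dB)")
```

      At the corner frequency the gain is 1/sqrt(2), about 0.71, which matches the definition given in the review; above the corner the response falls by roughly 20 dB per decade.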

      The arguments put forward in favor of cycle-by-cycle action are:

      1. The size of the motions, even with the low-pass-filtered attenuation, is as large as or larger than those measured in the cochlea at high frequencies.

      2. Noise is often increasing as frequency decreases, thus low-pass-filtering is actually good, to reduce the predominantly low frequency noise.

      3. Harmonic distortion is at supra-CF frequencies, so it's good if the hair cell is low-pass-filtering to reduce harmonics.

      These three points are reasonable, and the quantification relating to statement 1 is convincing. However, the quantification associated with point 2 is muddled. The hair cell voltage signal is expressed in volts, but the noise value is given in terms of the current mediated by 1-5 channels. A quantitative comparison should be made, with signal and noise expressed in the same units, preferably volts and volts/root(Hz), with a bandwidth estimated. The appendix attempts to be more quantitative and something like that short appendix should be incorporated into the paper. If a quantitative comparison in standard units is not possible with current data, that can be stated and underscores that we really don't know whether the noise is a problem for cycle-by-cycle amplification. Point 3 is reasonable and nicely illustrated in Fig. 3B. I did not get anything from Fig. 3A and the corresponding discussion on page 8 lines 320-335. Panels C and D were under-explained and could be removed, and the caption's reference to "short wave hydrodynamics" was also under-explained.
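
      The kind of same-unit comparison requested here can be sketched as follows. This is only an illustration under stated assumptions: it considers thermal (Johnson) noise of a membrane resistance filtered by a single-pole RC circuit, and the resistance, capacitance and temperature in the test values are hypothetical. It converts a noise density in V/sqrt(Hz) into an rms voltage over the filter's noise-equivalent bandwidth, which for a first-order RC filter is 1/(4RC).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def johnson_density(r_ohm, temp_k):
    """Thermal noise voltage density of a resistor, in V/sqrt(Hz)."""
    return math.sqrt(4.0 * K_B * temp_k * r_ohm)

def rc_noise_bandwidth_hz(r_ohm, c_farad):
    """Noise-equivalent bandwidth of a first-order RC low-pass filter."""
    return 1.0 / (4.0 * r_ohm * c_farad)

def rms_noise(density_v_per_rthz, bandwidth_hz):
    """rms voltage of white noise with the given density over a bandwidth."""
    return density_v_per_rthz * math.sqrt(bandwidth_hz)

def snr_db(signal_rms_v, noise_rms_v):
    """Signal-to-noise ratio in dB, with both quantities in volts rms."""
    return 20.0 * math.log10(signal_rms_v / noise_rms_v)
```

      For this idealized case the integrated noise reduces to sqrt(kT/C), independent of R, so a signal amplitude in volts can be compared directly with a noise amplitude in volts once a bandwidth has been fixed.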

      The arguments put forward to challenge gain control mechanics, which employ DC shifts to set effective operating conditions:

      4. Operation based on DC and quasi-DC operating points is sensitive to noise, which as noted above is often increasing as frequency decreases.

      5. Operation that employs a DC shift for the operating point is likely to work in such a way as to reduce stiffness, which has been shown to be inconsistent with active cochlear responses. For example, stiffness reduction would reduce traveling wave wavelength and thus alter the response phase and timing to a degree that has not been observed experimentally. This has long been known and relevant papers are cited.

      Point 4 was not convincing to me because the motions related to setting operating conditions could be larger than the nanoscale cycle-by-cycle response motions - thus these operating point motions could be above the noise values that seem limiting to cycle-by-cycle amplification. Point 5 is a nice reminder of the conclusion that, based on experimental findings and physics-based basic cochlear models, the cochlear amplifier must work by means of energy injection. This point was made clearly by Kolston (well cited in this paper) and later supported by other work.

      The present paper is informative in many ways and offers useful insights for further exploration. It is nicely written and illustrated. Because the signal and noise values are not quantified, the basic claim, that the cochlear amplifier can amplify a noisy signal effectively, is not convincing and that basic question is still unsettled. Overall, the paper would be improved if the claims and arguments were presented more tightly, with fewer digressions, and more modestly.

      We thank reviewer #3 for the many comments and suggestions.

      We agree that plotting the spectral density of a “near-threshold” OHC signal vs. inherent electric noise results in much simplification. Regarding noise and signal amplification, previous work on transmission lines points out that amplification is the way to increase SNR along the line.

      We believe that part of the ongoing confusion is that the problem is not how OHCs can amplify a “noisy signal” (the cochlea amplifies “noisy” sounds in the same way as it amplifies pure tones) but how OHCs can amplify signals in the presence of internal noise. Amplification and detection are two distinct things, and signal amplification does not rely on detection. Detection is an intrinsically nonlinear decision process (e.g., signal present/absent). Amplification in relevant frequency ranges is what allows signals to be detected in the real world (e.g., radio receivers). The cochlea (as portrayed by classic theories) does not seem exceptional in this regard.

      We agree that the effect of noise on DC responses is not very clear in the manuscript. Although it is difficult to make quantitative statements on a hypothesis that lacks a concrete mechanistic proposal, ~63% of (inherent) electric noise power is confined below the RC corner frequency, i.e., the frequency band of the regulatory OHC. In the presence of (unavoidable) flicker and brown noise (e.g., Brownian motion of stereocilia), this percentage can only increase. Conversely, in the frequency band of OHC cycle-by-cycle amplification, the noise power is only a tiny fraction of the total.

    2. Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency-independent. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If indeed, the authors still maintain that OHC eM is frequency-independent, à la Frank et al. (and in disregard of other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      1. J. Santos-Sacchi, Asymmetry in voltage-dependent movements of isolated outer hair cells from the organ of Corti. J. Neurosci. 9, 2954-2962 (1989).
      2. A. J. Hudspeth, How the ear's works work. Nature 341, 397-404 (1989).
      3. J. Santos-Sacchi, W. Tan, The Frequency Response of Outer Hair Cell Voltage-Dependent Motility Is Limited by Kinetics of Prestin. J. Neurosci. 38, 5495-5506 (2018).
      4. G. Frank, W. Hemmert, A. W. Gummer, Limiting dynamics of high-frequency electromechanical transduction of outer hair cells. Proc. Natl. Acad. Sci. U. S. A. 96, 4420-4425 (1999).
      5. J. Santos-Sacchi, D. Navaratnam, W. J. T. Tan, State dependent effects on the frequency response of prestin's real and imaginary components of nonlinear capacitance. Sci. Rep. 11, 16149 (2021).
      6. J. Santos-Sacchi, W. Tan, Complex nonlinear capacitance in outer hair cell macro-patches: effects of membrane tension. Sci. Rep. 10, 6222 (2020).
      7. A. Sasmal, K. Grosh, Unified cochlear model for low- and high-frequency mammalian hearing. Proc. Natl. Acad. Sci. U. S. A. 116, 13983-13988 (2019).

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Abdellatef et al. describe the reconstitution of axonemal bending using polymerized microtubules (MTs), purified outer-arm dyneins, and synthesized DNA origami. Specifically, the authors purified axonemal dyneins from Chlamydomonas flagella and combined the purified motors with MTs polymerized from purified brain tubulin. Using electron microscopy, the authors demonstrate that patches of dynein motors of the same orientation at both MT ends (i.e., with their tails bound to the same MT) result in pairs of MTs of parallel alignment, while groups of dynein motors of opposite orientation at both MT ends (i.e., with the tails of the dynein motors of both groups bound to different MTs) result in pairs of MTs with anti-parallel alignment. The authors then show that the dynein motors can slide MTs apart following photolysis of caged ATP, and using optical tweezers, demonstrate active force generation of up to ~30 pN. Finally, the authors show that pairs of anti-parallel MTs exhibit bidirectional motion on the scale of ~50-100 nm when both MTs are cross-linked using DNA origami. The findings should be of interest for the cytoskeletal cell and biophysics communities.

      We thank the reviewer for these comments.

      We might be misunderstanding this reviewer’s comment, but the complexes with both parallel and anti-parallel MTs had dynein molecules with their tails bound to two different MTs in most cases, as illustrated in Fig.2 – suppl.1. The two groups of dyneins produce opposing forces in a complex with parallel MTs, and the majority of our complexes had a parallel arrangement of the MTs. To clarify the point, we have modified the Abstract:

      “Electron microscopy (EM) showed pairs of parallel MTs crossbridged by patches of regularly arranged dynein molecules bound in two different orientations depending on which of the MTs their tails bind to. The oppositely oriented dyneins are expected to produce opposing forces when the pair of MTs have the same polarity.”

      Reviewer #2 (Public Review):

      Motile cilia generate rhythmic beating or rotational motion to drive cells or produce extracellular fluid flow. A cilium is made of nine microtubule doublets forming a spoke-like structure, and it is known that dynein motor proteins, which connect adjacent microtubule doublets, are the driving force of ciliary motion. However, the molecular mechanism that generates motion is still unclear. The authors proved that a pair of microtubules stably linked by DNA origami and driven by outer dynein arms (ODAs) causes beating motion. They employed an in vitro motility assay and negative-stain TEM to characterize this complex. They demonstrated that stable linking of the microtubules and ODAs anchored on both microtubules are essential for oscillatory motion and bending of the microtubules.

      Strength

      This is an interesting work, addressing an important question in the motile cilia community: what is the minimum system needed to generate a beating motion? It is an established fact that the dynein power stroke on the microtubule doublet is the driving force of the beating motion. It was also known that the radial spokes and the central pair are essential for ciliary motion under physiological conditions, but cilia without radial spokes and the central pair can beat under some special conditions (Yagi and Kamiya, 2000). Therefore, from a mechanistic point of view, they are not prerequisites. It is generally thought that a fixed connection between adjacent microtubules by nexin converts the sliding motion of dyneins into bending, but this was never experimentally investigated. Here the authors successfully built a simple system of nexin-like inter-microtubule linkage using the DNA origami technique to generate oscillatory and beating motions. This enables an interesting system where ODAs form groups, anchored on two microtubules, oriented oppositely and therefore causing tug-of-war type force generation. The authors demonstrated that this system, under constraints imposed by DNA origami, generates oscillatory and beating motions.

      The authors carefully coordinated the experiments to demonstrate oscillations using optical tweezers and sophisticated data analysis (Fourier analysis and a step-finding algorithm). They also proved, using negative stain EM, that this system contains two groups of ODAs forming arrays with opposite polarity on the parallel microtubules. The manuscript is carefully organized with impressive movies. Geometrical and motility analyses of individual ODAs used for statistics are provided in the supplementary source files. They appropriately cited similar past works from Kamiya and Shingyoji groups (they employed systems closer to the physiological axoneme to reproduce beating) and clarify the differences from this study.

      We thank the reviewer for these comments.

      Weakness

      The authors claim this system mimics two pairs of doublets at opposite sites of the 9+2 cilia structure by having two groups of ODAs between two microtubules facing opposite directions within the pair. This is not exactly the case. In the real axoneme, ODAs form a continuous array along the entire length of the doublets, which means that at any point there are ODAs facing opposite directions. In their system, opposite ODAs cannot exist at the same point (therefore the scheme of the Dynein-MT complex in Fig.1B is slightly misleading).

      Actually, opposite ODAs can exist at the same point in our system as well, and previous work using much higher concentrations of dynein (e.g., Oda et al., J. Cell Biol., 2007) showed two continuous arrays of dynein molecules between a pair of microtubules. To observe the structures of individual dynein molecules we used low concentrations of dynein and searched for the areas where dynein could be observed without superposition, but there were some areas where opposite dyneins existed at the same point.

      We realize that we did not clearly explain this issue, so we have revised the text accordingly.

      In the 1st paragraph of Results: “In the dynein-MT complexes prepared with high concentrations of dynein, a pair of MTs in bundles are crossbridged by two continuous arrays of dynein, so that superposition of two rows of dynein molecules is observed in EM images (Haimo et al., 1979; Oda et al., 2007). On the other hand, when a low concentration of the dynein preparation (6.25–12.5 µg/ml (corresponding to ~3-6 nM outer-arm dynein)) was mixed with 20-25 µg/ml MTs (200-250 nM tubulin dimers), the MTs were only partially decorated with dynein, so that we were able to observe single layers of crossbridges without superposition in many regions.” Legend of Fig. 1(C): “Note that the geometry of dyneins in the dynein-MT complex shown in (B) mimics that of a combination of the dyneins on two opposite sides of the axoneme (cyan boxes), although the dynein arrays in (B) are not continuous.”

      If they want to project their result to the ciliary beating model, more insight/explanation would be necessary. For example, arrays of dyneins at certain positions within the long array along one doublet are activated and generate force, while dyneins at different positions are activated on another doublet at the opposite site of the axoneme. This makes the distribution of dyneins and their orientations similar to the system described in this work. Such a localized activation, shown in physiological cilia by Ishikawa and Nicastro groups, may require other regulatory proteins.

      We agree that the distributions of activated dyneins in 3D are extremely important in understanding ciliary beating, and that other regulatory proteins would be required to coordinate activation in different places in an axoneme. However, the main goal of this manuscript is to show the minimal components for oscillatory movements, and we feel that discussing the distributions of activated dyneins along the length of the MTs would be too complicated and beyond the scope of this study.

      They attempted to reveal conformational change of ODAs induced by power stroke using negative stain EM images, which is less convincing compared to the past cryo-ET works (Ishikawa, Nicastro, Pigino groups) and negative stain EM of sea urchin outer dyneins (Hirose group), where the tail and head parts were clearly defined from the 3D map or 2D averages of two-dynein ODAs. Probably three heavy chains and associated proteins hinder detailed visualization of the tail structure. Because of this, Fig.2C is not clear enough to prove conformational change of ODA. This reviewer imagines refined subaverage (probably with larger datasets) is necessary.

      As the reviewer suggests, one of the reasons for less clear averaged images compared to the past images of sea urchin ODA is the three-headed structure of Chlamydomonas ODA. Another and perhaps the bigger reason is the difficulty of obtaining clear images of dynein molecules bound between 2 MTs by negative stain EM: the stain accumulates between MTs that are ~25 nm in diameter and obscures the features of smaller structures. We used cryo-EM with uranyl acetate staining instead of negative staining for the images of sea urchin ODA-MT complexes we previously published (Ueno et al., 2008) in order to visualize dynein stalks. We agree with the reviewer that future work with larger datasets and by cryo-ET is necessary for revealing structural differences.

      That having been said, we did not mean to prove structural changes, but rather intended to show that our observation suggests structural changes and thus this system is useful for analyzing structural changes in future. In the revised manuscript, we have extensively modified the parts of the paper discussing structural changes (Please see our response to the next comment).

      It is not clear, from the inset of Fig.2 supplement3, how to define the end of the tail for the length measurement, which is the basis for the authors to claim conformational change (Line263-265). The appearance of the tail would be altered, seen from even slightly different view angles. Comparison with 2D projection from apo- and nucleotide-bound 3-headed ODA structures from EM databank will help.

      We agree with the reviewer that difference in the viewing angle affects the apparent length of a dynein molecule, although the 2 MTs crossbridged by dyneins lie on the carbon membrane and thus the variation in the viewing angle is expected to be relatively small. To examine how much the apparent length is affected by the view angle, we calculated 2D-projected images of the cryo-ET structures of Chlamydomonas axoneme (emd_1696 and emd_1697; Movassagh et al., 2010) with different view angles, and measured the apparent length of the dynein molecule using the same method we used for our negative-stain images (Author response image 1). As shown in the plot, the effect of view angles on the apparent lengths is smaller than the difference between the two nucleotide states in the range of 40 degrees measured here. Thus, we think that the length difference shown in Fig.2-suppl.4 reflects a real structural difference between no-ATP and ATP states. In addition, it would be reasonable to think that distributions of the view angles in the negative stain images are similar for both absence and presence of ATP, again supporting the conclusion.

      Nevertheless, since we agree with the reviewer that we cannot measure the precise length of the molecule using these 2D images, we have revised the corresponding parts of the manuscript, adding description about the effect of view angles on the measured length in the manuscript.

      Author response image 1. Effects of viewing angles on apparent length. (A) and (B) 2D-projected images of cryo-electron tomograms of Chlamydomonas outer arm dynein in an axoneme (Movassagh et al., 2010) viewed from different angles. (C) apparent length of the dynein molecule measured in 2D-projected images.

      In this manuscript, we discuss two structural changes: 1) a difference in the dynein length between no-nucleotide and +ATP states (Fig.2-suppl.4), and 2) possible structural differences in the arrangement of the dynein heads (Fig.2-suppl.3). Although we realize that extensive analysis using cryo-ET is necessary for revealing the second structural change, we attempted to compare the structures of oppositely oriented dyneins, hoping that it would lead to future research. In the revised manuscript, we have added 2D projection images of emd_1696 and emd_1697 in Fig.2-suppl.3, so that the readers can compare them with our negative stain images. We had an impression that some of our 2D images in the presence of ATP resembled the cryo-ET structure with ADP.Vi, whereas some others appeared to be closer to the no-nucleotide cryo-ET structure. We have also attempted to calculate cross-correlations, but difficulties in removing the effect of MTs, which sometimes overlapped with a part of dynein, and in adjusting the magnifications and contrast of different images prevented us from obtaining reliable results.

      To address this and the previous comments, we have extensively modified the section titled ‘Structures of dynein in the dynein-MT-DNA-origami complex’.

      In Fig.5B (where the oscillation occurs), the microtubule was once driven >150nm unidirectionally and went back to the original position, before oscillation starts. Is it always the case that relatively long unidirectional motion and return precede oscillation? In Fig.7B, where the authors claim no oscillation happened, only one unidirectional motion was shown. Did oscillation not happen after MT returned to the original position?

      Long unidirectional movement of ~150 nm was sometimes observed, but not necessarily before the start of oscillation. For example, in Figure 5 – figure supplement 1A, oscillation started soon after the UV flash, and then unidirectional movement occurred.

      With the dynein-MT complex in which dyneins are unidirectionally aligned (Fig.7B, Fig.7-suppl.2), the MTs kept moving and escaped from the trap or just stopped moving probably due to depletion of ATP, so we did not see a MT returning to the original position.

      Line284-290: More characterization of bending motion will be necessary (and should be possible). How high frequency is it? Do they confirm that other systems (either without DNA-origami or without ODAs arraying oppositely) cannot generate repetitive beating?

      The frequencies of the bending motions measured from the movies in Fig.8 and Fig.8-suppl.1 were 0.6 – 1 Hz, and the motions were rather irregular. Even if there were complexes bending at high frequencies, it would not have been possible to detect them due to the low time resolution of these fluorescence microscopy experiments (~0.1 s). Future studies at a higher time resolution will be necessary for further characterization of bending motions.

      To observe bending motions, the dynein-MT complex should be fixed to the glass or a bead at one part of the complex while the other end is free in solution. With the dynein-MT-DNA-origami complexes, we looked for such complexes and found some showing bending motions as in Fig. 8. To answer the reviewer’s question asking if we saw repetitive bending in other systems, we checked the movies of the complexes without DNA-origami or without ODAs arraying oppositely but did not notice any repetitive bending motions. However, future studies using the system with a higher temporal resolution and perhaps with an improved method for attaching the complex would be necessary in these cases as well.
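
      The detection limit mentioned in this response can be made concrete with the Nyquist criterion. The sketch below is illustrative only and assumes that the quoted time resolution of ~0.1 s corresponds to a uniform frame interval; the function names are hypothetical and not taken from the manuscript.

```python
def nyquist_frequency_hz(frame_interval_s: float) -> float:
    """Highest frequency that can be sampled without aliasing."""
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    return 1.0 / (2.0 * frame_interval_s)


def frames_per_cycle(signal_hz: float, frame_interval_s: float) -> float:
    """Number of frames recorded during one period of the motion."""
    return 1.0 / (signal_hz * frame_interval_s)


# Assumption: the ~0.1 s time resolution is treated as the frame interval.
frame_interval_s = 0.1
limit_hz = nyquist_frequency_hz(frame_interval_s)
print(f"Nyquist limit: {limit_hz:.1f} Hz")

# Frequencies reported for the bending motions in the response (0.6 to 1 Hz).
for freq_hz in (0.6, 1.0):
    n = frames_per_cycle(freq_hz, frame_interval_s)
    print(f"{freq_hz} Hz -> {n:.1f} frames per cycle")
```

      Under this assumption, the reported 0.6 to 1 Hz motions are sampled at roughly ten or more frames per cycle, whereas beating faster than a few hertz would be aliased or missed, which is in line with the authors' call for higher time resolution.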

    1. Author Response

      Reviewer #2 (Public Review):

      Schrecker, Castaneda and colleagues present cryo-EM structures of RFC-PCNA bound to a 3' ss/dsDNA junction or nicked DNA, stabilized by the slowly hydrolyzable ATP analogue ATPγS. They discover that PCNA can adopt an open form that is planar, different from previous models for the loading of a sliding clamp. The authors also report a structure with closed PCNA, supporting the notion that closure of the sliding clamp does not require ATP hydrolysis. The structures explain how DNA can be threaded laterally through a gap in the PCNA trimer, as this process is supported by partial melting of the DNA prior to insertion. The authors also visualise and assign a function to the N-terminal domain in the Rfc1 subunit of the clamp loader, which they find modulates PCNA loading at the replication forks, in turn required for processive synthesis and ligation of Okazaki fragments.

      This work is extremely well done, with several structures at resolutions better than 3 Å, which is a significant achievement given the dynamic nature of the PCNA ring loading process. To investigate the role of the N-terminal domain of Rfc1 in PCNA loading, the authors use in vitro reconstitution of the entire DNA replication reaction, which is a powerful method to identify specific defects in Okazaki fragment synthesis and ligation.

      Important issues

      1. Figure 3B,D,F. I would find them much more informative if the authors showed the overlay between atomic model and cryo-EM density in the main figure. If the figure becomes too busy, the authors could decide to just add additional panels with the overlay as well as the atomic models alone. I do not think that showing segmented density for the DNA alone, as done in Figure 6C, is sufficient. Also, including the density for e.g. residues Trp638 and Phe582 seems important.

      We thank the reviewer for the suggestion. However, we have been unable to establish a way to show the density for both the protein and DNA in a meaningful manner due to the large number of atoms in the fields of view. For an example, please see Figure 1, which corresponds to Figure 3H. To aid the reader, we have revised several of the Figures and Figure Supplements to include density for the DNA.

      Consistent with our structures, recent work from the Kelch group has identified Trp638 and Phe582 as facilitating DNA base flipping (Gaubitz et al., 2022a). Despite this role in base flipping, no growth defects were observed in cells in which either of these residues was mutated, and thus their functional role and the role of DNA base flipping remain unclear.

      2. Cryo-EM sample preparation included substoichiometric RPA, which has been shown to promote DNA loading of PCNA by RFC. Would the authors expect a subset of PCNA-RFC-DNA particles to contain RPA as well? The glycerol gradient gel indicates that, at least in fraction 5, a complex might exist. If the authors think that the particles analyzed cannot contain RPA, it would be useful to mention this.

      We have no evidence to suggest that RPA cannot be present in the imaged particles. We have revised the text (lines 150 - 152) to clarify that while RPA was present in the sample, we did not observe any density that could not be assigned to either DNA, RFC or PCNA. We therefore suggest that RPA does not interact with the complex in a stable manner.

      3. Published kinetic data indicate that ATP hydrolysis occurs before clamp closure. To incorporate this notion in their model, the authors suggest that ATP hydrolysis might promote PCNA closure by disrupting the planar RFC:PCNA interaction surface and hence the dynamic interaction of PCNA with Rfc2 and -5 in the open state. In addition, ATP hydrolysis promotes RFC disengagement from PCNA-DNA by reverting from a planar to an out-of-plane state. This model appears reasonable and nicely combines published data with the new findings reported by the authors. However, the model is oversimplified in Figure 6, where the only depicted effect of ATP hydrolysis is RFC release. Perhaps the authors could use the figure caption to acknowledge that ATP hydrolysis likely still has a role in facilitating PCNA closure.

      We have revised Figure 6 to show that ATP hydrolysis may occur either before or after ring closure.

      4. Can the authors explain what steps should be taken to describe PCNA loading by RFC in conditions where ATP hydrolysis is permitted? How would such experiments further inform the molecular mechanism for the loading of the PCNA clamp?

      As highlighted in point 3 above and by the other reviewers, ATP and ATPγS may alter the behavior and energetic landscape of RFC. In our studies, ATPγS was added to trap the complex in a pre-hydrolysis state in which all components are assembled. We have added a section to the discussion noting the potential differences and highlighting the need for future studies to better elucidate the role of nucleotide hydrolysis. To achieve a hydrolysis-competent complex, one could apply time-resolved cryo-EM approaches where the complex is formed on the grids and quickly vitrified. Such an approach, particularly if coupled with stopped-flow kinetic analyses, may provide additional insights into the kinetics of loading of PCNA onto DNA by RFC.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to Reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This is a well-executed and interesting study addressing a still controversial issue in clathrin-mediated endocytosis, namely the nature of curvature generation during formation of endocytic clathrin coated vesicles. The authors have applied new techniques to this old question, including state-of-the-art high resolution 3D single-molecule localization microscopy (SMLM, i.e. super-resolution microscopy), a new maximum-likelihood based fitting framework to fit complex geometric models into localized point clouds (Wu et al., 2020, bioRxiv) and mathematical modeling leading to a new cooperative curvature model of clathrin coat remodeling and temporal reconstruction of CCP structural dynamics based on the distribution of static super-resolution images. This is an important contribution, but will it resolve the controversy of constant curvature vs constant area for CCP invagination? I doubt it. In some ways the controversy is somewhat contrived and, as this paper shows, the answer is unlikely to be either/or. Below are some specific comments, in somewhat random order, from someone (a curmudgeon?) who has reviewed and/or carefully read these papers since 1980. Points that the authors should address are in bold. All can be addressed with modifications to the text, as the one experiment I asked for (quantification of clathrin recruitment) is impossible with this approach.

      • I wonder how many people who cite Heuser's 1980 paper have ever read it carefully. Indeed, many of the observations made here were also made by Heuser. Below, for example, is a summary I wrote, but then removed from a review as it was too lengthy: "While Heuser favored the model that CCPs assemble first as flat structures and then rearrange during invagination, he was also careful to note several caveats. First, he observed that the edges of CCPs were 'ragged', likely reflecting sites of assembly of new polygons, and that pentagons were more abundant at the edges. Thus, he argued that 'if even a few of these edge pentagons were destined to become completely surrounded with hexagons, it would be necessary to conclude that some degree of curvature can be built into coats as soon as they form'. Second, by examining tilted sections he observed that 'even the flattest baskets have a small degree of inward curvature, and many were complete hemispheres'. Finally, he cautioned that his images were snap-shots and a precursor-product relationship could not, therefore, be unambiguously established, and that the very large flat lattices he observed might well 'prove to be some sort of dead end'. We now know that fibroblasts, in particular, have large numbers of static flat clathrin plaques."

      Thus, many of the authors' conclusions, i.e. that 'completely flat clathrin coats are rare' (pg 12, although they're not numbered), and that curved structures can be seen to emerge from the edges of flat lattices (see Supplemental Figure 1a, 3 examples on the right), are indeed consistent with Heuser's observations. In many ways, Heuser's 1980 paper is used as a straw man argument for the constant area model. The authors should more accurately cite and acknowledge this seminal paper.

      Response: We thank the reviewer for this insightful and constructive input on the interpretation of the constant area model (CAM). We have revised the discussion (Page 14, Lines 397-402), citing Heuser’s observations more carefully and in line with what the reviewer eloquently suggested. We agree that the strict interpretation of the CAM is misleading, and early evidence already suggests that it is a flawed approximation of the endocytic mechanism (now further mentioned on Page 15, Lines 429-431).

      • As Heuser did in his 1980 classic, the authors here would do well to note several caveats related to their analyses. These include:

      Like Heuser, they have assembled static images to create a pseudotemporal model, albeit using a much more quantitative approach. Nonetheless, it seems that this assumes only a single, stereotypic pathway for CCV formation. How good is this assumption? We know from dynamic imaging that there exists significant heterogeneity in both the kinetics and the molecular composition of CCPs. The authors should acknowledge this limitation.

      Response: We agree with the reviewer that the lack of direct temporal information is a clear limitation of our approach.

      We now introduce this limitation on Page 16, Lines 474-484, where we discuss the disadvantage of reconstructing an average trajectory based on static images. Here, the assumption of a single, stereotypic pathway of endocytosis is addressed. We cannot exclude the possibility of slight mechanistic variations being averaged out using our approach. However, we want to highlight the fact that our approach seems sensitive enough to distinguish between structures that originate via endocytosis and structures that derive from a different pathway, potentially from the Golgi.

      We further address the kinetic variability in terms of abortive events on Page 14, Lines 405-411, and discuss their effect on the mechanistic interpretation of our results. Generally speaking, abortive events are characterized as dim and short-lived structures in live-cell acquisitions. As the earliest structures in our data set already contain half the final coat area, we are most likely not capturing these abortive events in the first place (potential technical reasons for not capturing earlier structures are discussed on Page 14, Lines 385-395).

      • The method, which required that they 'optimized the sample preparation to densely label clathrin at endocytic sites', involves labeling cells to near saturation with rabbit polyclonal antibodies to both clathrin light chains and clathrin heavy chains, followed by detection with a second polyclonal donkey anti-rabbit. This gives 20 nm of additional and presumably flexible linker on the label. How might this affect the measurements and modeling? The Wu et al paper, which BTW has not been peer-reviewed, shows high precision fitting of the nuclear pore structure, but using endogenously tagged NUP-95, not two layers of antibodies. The authors will need to discuss this limitation; it is my biggest concern regarding the analysis shown.

      Response: We acknowledge the limitations imposed by indirect immunolabelling and formulated a hypothesis on how this could affect our model fit (mentioned on Page 13, Line 363, illustrated in Supplementary Figure 6). A larger linkage error between label and target molecule would increase the distribution of localizations around the true underlying structure. As LocMoFit fits our spherical model directly to the localization coordinates, it is able to take this distribution into account, and will weigh the fit results based on the uncertainty of the localization estimation. A uniform distribution of labels around the true underlying structure should therefore be fitted accurately also at larger linkage error. A non-uniform labeling could occur should e.g. the densely crowded space between the coat and the plasma membrane not allow for the diffusion of the antibody to the clathrin epitopes. In that case, labeling would be one-sided, and instead of the true underlying structure, LocMoFit would optimize the spherical model to the highest probability density of label, around +10 nm from the true clathrin coat. This would result in an overestimation of the radius by the model, which we could correct by subtracting 10 nm from the experimentally determined radius. This was done in Supplementary Figure 6 for the hypotheses of (1) uniform displacement by the antibodies; (2) biased displacement of the antibodies towards the cytosol; and (3) biased displacement of the antibodies towards the plasma membrane. Whilst we see that the fitting parameters scale with the corrected radii, the mechanistic interpretation of partial flat pre-assembly on the membrane, and subsequent bending and surface area growth, still holds true.
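
      The radius correction described in this response can be illustrated with a short calculation. This is a minimal sketch and not the authors' analysis code: it assumes a spherical-cap coat with radius R and closing angle theta, uses the standard cap area A = 2*pi*R^2*(1 - cos(theta)), and picks a hypothetical fitted radius of 60 nm and closing angle of 90 degrees. The sign convention for the membrane-biased case is also an assumption made by symmetry with the cytosol-biased case described above.

```python
import math


def corrected_radius_nm(fitted_radius_nm: float, label_offset_nm: float) -> float:
    """Remove a systematic radial label displacement from a fitted radius."""
    corrected = fitted_radius_nm - label_offset_nm
    if corrected <= 0:
        raise ValueError("corrected radius must be positive")
    return corrected


def cap_area_nm2(radius_nm: float, theta_rad: float) -> float:
    """Area of a spherical cap with closing angle theta (pi = closed sphere)."""
    return 2.0 * math.pi * radius_nm ** 2 * (1.0 - math.cos(theta_rad))


# Hypothetical offsets for the three labeling scenarios (assumed signs):
# positive means the label sits outside the coat, so the fit overestimates R.
LABEL_OFFSETS_NM = {
    "uniform displacement": 0.0,
    "biased towards cytosol": 10.0,
    "biased towards plasma membrane": -10.0,
}

fitted_radius_nm = 60.0          # hypothetical fit result
theta_rad = math.radians(90.0)   # hypothetical closing angle

for scenario, offset_nm in LABEL_OFFSETS_NM.items():
    radius = corrected_radius_nm(fitted_radius_nm, offset_nm)
    area = cap_area_nm2(radius, theta_rad)
    print(f"{scenario}: R = {radius:.0f} nm, A = {area:.0f} nm^2")
```

      In this sketch the correction rescales the radius and hence the area, while the closing angle is left untouched, which is one way to see why absolute fit parameters could shift with the labeling scenario without changing the qualitative sequence of flat pre-assembly followed by bending.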

      • One reason for continued controversy in this field is the lack of any attempt to resolve findings obtained using different methods. Can a parsimonious explanation be found, or are there artifacts or misinterpretations of previous findings that can explain the discrepancies? Any valid model should fit all of the valid data. For example, the authors fail to cite a recent paper by Willy et al in Dev Cell (PMID 34774130), which has been on bioRxiv since 2019 (doi: https://doi.org/10.1101/715219). Here, similar to this present study, the authors used high resolution SIM-TIR to analyze ~1000 CCPs in 3 different cell lines (sadly non-overlapping with the cells used herein) and in Drosophila embryos to quantitatively test the two models. They conclude that their findings unambiguously support a constant curvature model. The authors would do the field a favor if they carefully read this paper and identified areas of commonality (i.e. that curvature is detected at early stages in both cases) and possible explanations for the discrepancies. Certainly, they should not ignore it.

      Response: We agree with the reviewer on the importance of consolidating findings from different studies to converge to a generally accepted mechanism of clathrin coat formation. We had indeed cited Willy et al in the introduction, but agree that further discussion of their findings should be included. We therefore discuss their findings in more detail, also in comparison to our work, on Page 17, Lines 502-511. We agree that we reach contradictory conclusions, which we think lies at least in part with the way that Willy et al. analyze their data. Willy et al. acquire 2D projections of the endocytic clathrin structures, whose size is just at the limit of their image resolution. They then compare their projected sizes to a purist constant area model, which assumes that a coat has to grow to its entire surface as an entirely flat structure and then instantaneously snaps to an increased curvature, resulting in a sudden drop of the projected area (footprint). As we and others (e.g. Bucher et al 2011, Heuser, 1980) have observed, completely flat lattices are rare, and curvature is initiated before final surface area is acquired. We do not agree that the absence of a purist constant area model implies that clathrin mediated endocytosis follows a constant curvature trajectory. Instead, we imagine that our cooperative curvature model is likely to fit well with the observations of Willy and colleagues.

      • An important body of evidence that is not considered in their model or discussion is that derived from live cell imaging. In addition to the heterogeneity mentioned above, studies have shown that clathrin addition to CCPs (i.e. the growth phase) is complete within the first ~20-30 s, followed by a plateau phase of variable length (0 to >100 s) (Loerke et al, PMID 21447041). Both the current study and the Willy et al study admit that they may not be able to detect the earliest intermediates in CCP assembly. Indeed, in this study the smallest surface area CCPs are only 2-fold smaller than the largest CCPs, suggesting that over half of the triskelions have been recruited before a CCP can be distinguished from the background of clustered, nonspecifically-bound antibodies. Could the authors be monitoring events during the plateau phase and not the earliest events? Regardless, the findings are important as they address the nature of curvature generation during this plateau phase. While monitoring curvature generation during early events in CME, a recent study (Wang et al., eLife, PMID 32352376) showed that the acquisition of curvature within the first 20 s of CCP assembly was a distinguishing feature between abortive and productive events. The authors might discuss how these studies on CCP dynamics might (or might not) inform their models.

      Response: We thank the reviewer for this very insightful comment and discuss this hypothesis on Page 16-17, Lines 485-511. We suggest that part of the initiating/growth phase observed in live-cell dynamics falls into the fast, flat assembly that we are unable to capture with our approach. It is challenging to clearly identify at which point in real-time we are detecting our earliest sites. We would however argue that the plateau phase in real-time could coincide with curvature generation and final addition of triskelia at the lattice rim. The variability in the duration of this plateau phase could therefore result from variable recruitment speed of triskelia and other factors during the finalizing of the vesicle neck.

      • The authors advertise 'quantitative' description of clathrin coated structure and indeed their measurements and models are quantitative; but there is no measure of intensity/numbers of triskelions and CCP growth: an important piece of quantitative data. I expect this is impossible with indirect immunofluorescence but should be considered as a limitation of the approach. Indeed, to my knowledge no one has yet quantitatively measured curvature generation in parallel to clathrin addition at CCPs (closest is Saffarian and Kirchhausen, PMID 17993495), but they don't discuss the relationship.

      Response: We agree with the reviewer that quantifying the number of triskelia would be an essential piece of information to correlate area growth and curvature generation with dynamic information retrieved from fluorescence intensity in live-cell studies. Unfortunately, the indirect immunolabelling approach used in this work complicates this quantification, and a direct comparison between number of localizations and fluorescence intensity cannot be made. However, we do observe a correlation between coat surface area and number of localizations in our data and show this in the newly added Supplementary Figure 7. This allows us to formulate the hypothesis on Page 16-17, Lines 485-511, which suggests that the plateauing of fluorescence intensity coincides with curvature generation and final triskelia addition to the coat rim. We further highlight the necessity of capturing both high spatial and temporal resolution simultaneously, to ultimately overcome this limitation.

      • On page 7 equation 1, you assume a constant growth rate for addition of triskelia, but later describe that the rate might be cooperative (as the number of edges increases). How would this affect your modeling?

      Response: We formulate the surface area growth rate of the clathrin coat to be proportional to the rim length, with a constant rate. We consider the cooperativity between clathrin molecules to affect the rate of curvature generation: the more molecules are present, the more the entire coat is inclined to bend. We rephrased that section to emphasize this distinction (Page 8, Line 217).
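
      The distinction drawn in this response can be written out for a spherical-cap coat. The derivation below is a sketch under stated assumptions, not a reproduction of the manuscript's equation 1: the symbols $k_{\mathrm{on}}$ (constant rate), $\mathcal{E}$ (rim length), $R$ (radius) and $\theta$ (closing angle) are chosen here for illustration, and the flat-disc limit with patch radius $r$ is added only to show what a constant rate implies.

```latex
% Growth law stated in the response: area growth proportional to rim length
\dot{A} = k_{\mathrm{on}}\,\mathcal{E}

% Spherical-cap geometry (assumed): area and rim length
A = 2\pi R^{2}\,(1-\cos\theta), \qquad \mathcal{E} = 2\pi R \sin\theta

% Flat limit: a disc of radius r has A = \pi r^{2} and \mathcal{E} = 2\pi r, so
2\pi r\,\dot{r} = k_{\mathrm{on}}\,2\pi r
\;\Longrightarrow\; \dot{r} = k_{\mathrm{on}}
\;\Longrightarrow\; A(t) = \pi\,(r_{0} + k_{\mathrm{on}}\,t)^{2}
```

      Read this way, a constant $k_{\mathrm{on}}$ does not mean a constant rate of area gain: the area of a flat patch grows faster as its rim lengthens, while any cooperativity enters separately through the rate of curvature generation.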

      Minor points:

      • Can you indicate in the first paragraph of the results that you are using indirect immunofluorescence with rabbit anti-CLCA, anti-CHC and detection with donkey anti-rabbit for labeling, to augment the rather vague statement 'we optimized the sample preparation to densely label clathrin at endocytic sites'.

      Response: We added a clear indication of the labelling strategy used in this work on Page 4, Lines 109-110.

      • I'm not comfortable with the conclusion on page 5 that your data 'indicates that at the time point of scission, the clathrin coat of nascent vesicles is still incomplete'. Other explanations might be the relative kinetics of scission vs CCP growth (i.e. these structures are too transient to detect), or that deeply invaginated pits are sheared off the membrane during sample preparation (there is evidence that most biochemically isolated CCVs are derived from sheared CCPs).

      Response: We extended the explanation for the absence of fully closed vesicles with the hypotheses mentioned by the reviewer on Page 5, Lines 159-161.

      • Bottom of page 5, can you briefly mention what data is shown in Supplemental Figure 2 (ie. Figure 2D and examples of likely non-endocytic CCPs shown in Supplemental Figure 2). When I read this, I questioned your speculation.

      Response: We clarified the cross reference to (now) Supplementary Figure 3 accordingly on Page 6, Lines 184-185.

      • Can you indicate N CCPs from N cells in the data in Tables 2-3 for fibroblasts and U2OS cells? Do you observe and have to ignore a larger number of flat/clustered CCPs in the fibroblasts?

      Response: We indicated the number of cells and sites per data set in the Table captions on Page 36, Lines 51; 959; and 967. We did not quantify the number of flat/clustered, plaque-like structures in our data sets. During data acquisition, we would specifically select cells with a minimal number of these structures present, and even within these cells chose an area in the periphery exhibiting a low number of plaques. Our data is therefore not ideal to reliably quantify plaque density between different cell lines. Qualitative observations showed that whilst we had to disregard a few cells from the U2OS and SK-MEL-2 cell lines due to high plaque formation, the 3T3 fibroblasts were relatively straightforward to image, as few cells showed high plaque density. A recent study by Hakanpää et al., 2022 (bioRxiv) showed decreased formation of plaques when cells were seeded on fibronectin. The fact that fibroblasts excrete their own fibronectin agrees well with our observations of relatively few 3T3 cells exhibiting extensive plaque formation.

      • The last 3 paragraphs of the Introduction are results. The Introduction might best be used to review literature in more detail, discuss the reasons why uncertainty still exists and perhaps indicate how the methods applied here will help.

      Response: We re-wrote the last 3 paragraphs of the introduction, now clearly stating the knowledge gap in the field, and what methods would be required to bridge it (Page 3, Lines 80-102).

      Reviewer #1 (Significance (Required)):

      This is another excellent addition to a growing list of papers seeking to define the process of curvature generation at endocytic clathrin coated pits. In my opinion, its impact would be increased by better integrating the results presented here with other studies and methods, including the recent paper by Willy et al and the large body of literature on coated pit dynamics, some of which might be relevant in interpreting results, or at least placing them in a real vs pseudo-temporal perspective. The methods introduced and the quality of imaging, modeling and quantification further increase the study's significance. The findings will be of interest to those in the CME field, those studying membrane curvature generation in other contexts, those modeling CME, vesicle formation and curvature generation, and those using SMLM to discern the structure of macromolecular assemblies.

      Reviewer expertise: Clathrin-mediated endocytosis (Sandra Schmid)

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this article, the authors aimed to investigate the dynamics of the clathrin lattice during clathrin-mediated endocytosis (CME). Overall, they successfully achieved the goal by observing a large number of clathrin spots from several cell lines with 3D single-molecule localization microscopy (SMLM). With the help of this high-resolution imaging technique, they were able to describe the physical properties of each spot and reconstruct the assembly and remodeling of the clathrin coat. Moreover, by comparing the constant area/curvature models with their own data, the authors highlighted that neither of the prevailing models perfectly explained what they observed and proposed a 'cooperative curvature model'. With the novel model, the authors were able to reconstruct the clathrin coat remodeling in different cell lines and concluded that the simultaneous bending and assembly of the clathrin coat is a homogeneous property of endocytosis.

      The experiments and analytical procedures are well-designed and performed, and the manuscript is well-organized. The conclusion, the 'cooperative curvature model', was deduced from a large amount of data analysis and is clearly stated in the text. I would like to recommend its publication if the following issues are clarified.

      Major comments:

      1. The authors compared the morphological dynamics of clathrin-coated pits among three different cell lines (SK-MEL-2, U2OS, and 3T3) and found slight differences. As U2OS cells were derived from bone tissue, they have different mechanical properties (membrane tension, elasticity of the cortical layer, etc.). It would be interesting to consider those mechanical properties in understanding the morphology (Figure 2) and progress (Figure 4) of CME. Considering that the bending energy of the plasma membrane depends on membrane tension, the authors may be able to find some relationships between the mechanical properties of the cell cortex and CME.

      Response: We thank the reviewer for this comment and very much agree that the relationship between mechanical properties and structural adaptation of the endocytic machinery is a highly interesting question. We came to the same conclusion and are therefore exploring this relationship at the moment. This is, however, not a straightforward task, and the complex nature of plasma membrane mechanics necessitates careful experimental design. It is therefore outside the scope of this publication. We do think this point further highlights the potential of the method presented here, as it allows the investigation of additional principles in clathrin-mediated endocytosis mechanics. We hope to share our insights on this topic soon.

      In Figure 4, the authors estimated the progression of CME using the frequency distribution of theta. However, I wonder how they handled events that were aborted partway through CME. It has been suggested that some CME events are aborted during the initial step. The authors should consider (or at least discuss) those abortive events, which can disturb the analysis.

      Response: Generally speaking, abortive events (now discussed on Page 14, Lines 405-411) are characterized as dim and short-lived structures in live-cell acquisitions. As the earliest structures in our data set already contain half the final coat area, we are most likely not capturing these abortive events in the first place (potential technical reasons for not capturing earlier structures are discussed on Page 14, Lines 385-395).

      Abortive events throughout the later process of endocytosis would, according to our data, still follow the same mechanistic trajectory as other sites. They could potentially slightly skew our pseudotime analysis, as they would result in an overestimation of specific endocytic stages. The overall mechanistic insight of our work would not be greatly affected, as curvature generation would still occur according to the same trajectory. Due to the low impact on our overall results we do not discuss these late abortive events further.

      Minor comments:

      1. Page 5, result section 2. The authors should further explain why vesicles from the trans Golgi could be responsible for the small disconnected set of data points corresponding to vesicles with larger curvatures.

      Response: We extended our explanation for the presence of non-endocytically derived structures in our data set on Page 6, Lines 184-189. We further extended the supplementary information with an additional experiment (Supplementary Figure 4), highlighting the absence of AP2-positive structures within the disconnected population. As AP2 is a specific marker for CME, these results further solidify our hypothesis. Further experiments would be required to determine their exact origin, and are outside the scope of this publication.

      Page 7, line 6. The authors assumed that the clathrin coat starts growing on a flat membrane. However, as mentioned in the discussion, clathrin has been shown to have curvature-sensing ability, which can be amplified several-fold by adapter proteins (Zeno et al., 2021). It therefore seems that clathrin prefers a highly curved membrane over a flat one. Is it still reasonable to make this assumption?

      Response: Whilst our assumption states that the clathrin coat grows on flat membranes, we do not restrict our model to an intercept through 0, and it would therefore still hold true even if growth started on slightly bent membranes. The impact of the preference of clathrin for curvature is considered as a potential mechanistic explanation for the positive feedback in curvature generation described by our model. We therefore already cite the reference mentioned by the reviewer on Page 8, Line 224.

      As we do observe flat structures in our data set (now discussed in more detail on Page 14, Lines 396-404), we still think the assumption of early flat growth holds true.

      Page 9, result section 4. In the sentence: "we effectively generated the average trajectories of how curvature, surface area, projected area and lattice edge change during endocytosis in SK-MEL-2 cells (Figure 4B-E)." Here I think the authors are describing Figure 4C-F.

      Response: That is correct, an oversight on our part. We changed the cross-reference.

      Page 11, discussion. In the sentence: "A deviation of the cross-sectional profile from a circle is nevertheless preserved in the averaging (Supplementary Figure 5)." I didn't see supplementary figure 5 in the article.

      Response: We changed the cross-reference. We were addressing a subsection of Supplementary Figure 8.

      Reviewer #2 (Significance (Required)):

      Drawing on a vast amount of microscopic images and data analysis, the manuscript gives a clear model of the progress of CME, which integrates two opposing models: the constant area and constant curvature models. This is significant progress in our understanding of the molecular mechanism of CME, and will attract many researchers in the field of cell biology. From the viewpoint of my expertise (molecular imaging of the plasma membrane and endocytic processes), this manuscript has a significant impact on the related research fields.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors used single-molecule localization microscopy of clathrin in fixed cells (two human cell lines, one mouse) to capture snapshots of clathrin-mediated endocytosis (CME), fitted these localizations to a geometric model of a forming vesicle, and used these fitted measurements to test existing models of clathrin-mediated vesicle formation before refining their own. Specifically, the closing angle, a measure of vesicle completeness, was used as a proxy for the growth stage of the vesicle such that the many captured snapshots could reconstruct a pseudo-timeline with an unknown parameterization of time on closing angle. Two standard models of CME vesicle formation, where the surface area is kept constant or where the curvature is kept constant, were examined and determined to be incommensurate with the pseudo-timelines of curvature and surface area. The authors then describe their own model for CME vesicle formation, in which neither surface area nor curvature is constant during the evolution of the vesicle, and cooperative forces are hypothesized to non-linearly modulate the curvature growth as a function of closing angle. Additionally, by binning snapshots and then aligning, scaling, and azimuthally smoothing each bin, they reconstruct representations of distinct endocytic stages.
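To make the quantities in this summary concrete, the following minimal sketch relates closing angle, curvature, surface area and edge length. It assumes the coat is an idealized spherical cap, which is an assumption made here for illustration and not necessarily the exact geometric model fitted in the manuscript; the function name and units are hypothetical.

```python
import math


def spherical_cap_geometry(closing_angle_deg, radius_nm):
    """Geometry of an idealized spherical-cap coat (illustrative assumption).

    closing_angle_deg: 0 is the flat limit, 180 a closed sphere.
    radius_nm: radius of the sphere the cap lies on.
    Returns (curvature, surface_area, edge_length).
    """
    theta = math.radians(closing_angle_deg)
    curvature = 1.0 / radius_nm  # H = 1/R
    # Cap area on a sphere: A = 2*pi*R^2*(1 - cos(theta))
    surface_area = 2.0 * math.pi * radius_nm ** 2 * (1.0 - math.cos(theta))
    # Length of the open rim: 2*pi*R*sin(theta)
    edge_length = 2.0 * math.pi * radius_nm * math.sin(theta)
    return curvature, surface_area, edge_length
```

Under this assumption, a constant-curvature trajectory fixes the radius while the closing angle grows, whereas a constant-area trajectory requires the radius to shrink as the angle grows. A truly flat coat corresponds to an infinite radius and is not representable by this function.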

      Major comments:

      Most results are quite convincing, and the authors do a nice job of displaying examples of SMLM data, both with fit results as well as example clathrin assemblies that are too far removed from their budding-vesicle model to be included for analysis, for example. It is also worth noting that the clathrin images themselves appear to be very high-quality - clearly, as detailed in the methods, attention was given to each step of the imaging and reconstruction process.

      While the presented cooperative curvature model seems reasonable and surely fits the curvature-, surface area-, and rim length-vs. closing angle data better than the simplistic constant surface-area and constant curvature models, it also has more parameters, namely: gamma (the initial rate of curvature change with closing angle) and H_0 (the final preferred curvature). It would be appropriate to calculate an information criterion (e.g. Bayesian), using an assumption of Gaussian-distributed errors (presumably the data fitting in R was least squares, so this would match) to justify the additional parameters.

      Response: This is an important observation by the reviewer. Indeed, our model uses one more parameter compared to the models we compare it with. To justify this, we performed the calculation as suggested by the reviewer, and found that the cooperative curvature model (CoopCM) indeed results in the lowest BIC (Supplementary Notes). We are therefore confident that, out of the three models tested in this work, our CoopCM fits the underlying experimental data best (Page 8, Lines 232-235).
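For readers unfamiliar with the criterion, the comparison described above can be sketched as follows. This is a minimal illustration of the standard BIC for a least-squares fit under the Gaussian-error assumption the reviewer proposes, not the authors' actual calculation; the function name and inputs are hypothetical.

```python
import math


def bic_least_squares(residuals, n_params):
    """BIC of a least-squares fit, assuming i.i.d. Gaussian errors.

    Uses BIC = n*ln(RSS/n) + k*ln(n), which omits an additive constant
    that is identical for all models fitted to the same data.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if n == 0 or rss <= 0.0:
        raise ValueError("need at least one residual and a non-zero RSS")
    return n * math.log(rss / n) + n_params * math.log(n)
```

The model with the lowest value is preferred. An extra parameter adds ln(n) to the score, so it is only justified when it reduces the residual sum of squares enough to offset that penalty.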

      A related issue relates to the error in the extracted value of the closing angle from a single 3D reconstruction - the error distribution should be quantified for this very important parameter. The errors in the other parameters extracted from the fits are less important, but would enhance the paper.

      Response: We thank the reviewer for pointing out the importance of the estimation error of the key parameter, the closing angle. To address this point, based on the geometrical model, we simulated clathrin-coated structures with closing angles evenly distributed across the entire range (0-180°). This realistic simulation represents the data quality (e.g., localization precision and labeling efficiency) of the experimental data (corresponding methods are included on Pages 22-23, Lines 679-706). The result of fitting these structures using LocMoFit shows an unbiased estimation with a small spread of the error (overall STD = 2.82°; see the newly included Supplementary Figure 2a).
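The bias and spread reported above are summary statistics of the difference between fitted and simulated closing angles. A minimal sketch of that summary is given below; it does not reproduce the simulation or the LocMoFit fitting, the function name is hypothetical, and whether the authors used the sample or population standard deviation is not stated, so the sample form here is an assumption.

```python
import statistics


def closing_angle_error_summary(true_deg, fitted_deg):
    """Return (bias, spread) of fitted minus ground-truth closing angles.

    bias: mean error in degrees (close to 0 for an unbiased estimator).
    spread: sample standard deviation of the error in degrees.
    """
    if len(true_deg) != len(fitted_deg):
        raise ValueError("inputs must have the same length")
    errors = [f - t for t, f in zip(true_deg, fitted_deg)]
    return statistics.mean(errors), statistics.stdev(errors)
```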

      Pseudo-temporal sorting on closing angle makes sense and I appreciate the authors mentioning potential caveats to the monotonicity, etc. However, a comment about the impact of closing angle errors on the pseudo-time determinations would be helpful. The agreement of theta-rank plots with the hypothesized sqrt(t) scaling is reassuring.

      I additionally appreciate the robustness of fitting a geometric structure from localizations rather than relying on pseudo-temporal sorting on clathrin count extracted from localization-merging of multi-blinking emitters.

      Response: The pseudo-temporal sorting is based on the precisely estimated closing angle, and is therefore also precise, as the distribution of the fitted closing angle shows no significant distortion compared to the expectation (Supplementary Figure 2b).
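The idea of pseudo-temporal sorting discussed here can be sketched in a few lines. The sketch assumes that the closing angle increases monotonically during endocytosis and that fixed-cell snapshots sample the process uniformly in time, so that the normalized rank of a structure approximates its pseudotime; the function name and the half-rank offset are choices made for this illustration, not details taken from the manuscript.

```python
def pseudotime_from_rank(closing_angles):
    """Assign each structure a pseudotime in (0, 1) from its closing-angle rank.

    The structure with the smallest closing angle gets the earliest
    pseudotime; output order matches input order.
    """
    n = len(closing_angles)
    order = sorted(range(n), key=lambda i: closing_angles[i])
    pseudotime = [0.0] * n
    for rank, idx in enumerate(order):
        pseudotime[idx] = (rank + 0.5) / n  # centre of the rank interval
    return pseudotime
```

Because only the ordering of closing angles enters, an estimation error changes a structure's pseudotime only when it is large enough to swap that structure with its neighbours in the ranking.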

      The authors did a nice job of qualifying their more speculative claims, in particular I appreciated their mentioning the possibility that smaller clathrin coats could be below their detection limit.

      The authors state a set of data points in suppl. figure 2D (and suppl. Fig 3A-C) are "likely" small clathrin-coated vesicles from the trans Golgi. I appreciate the examples rendered in that figure so a reader can appraise, but if they have my background they might not know how reasonable exclusion of this data is from model testing. This claim could be rephrased or the rationale expanded upon to justify the Golgi hypothesis.

      Response: We agree with the reviewer and further expanded on our hypothesis on the origin of the structures within the disconnected cloud of data points (Page 6, Lines 184-189). We further performed an additional experiment (Supplementary Figure 4), where we simultaneously imaged the clathrin coat at high resolution, and the CME-specific AP2 complex tagged with GFP at diffraction-limited resolution. We observed that there were no AP2-GFP-positive structures present in the disconnected cloud of our data set, and conclude that these structures indeed must originate via a different pathway.

      The data and methods are presented such that they could be reproduced, and replicating their experiment in multiple cell lines, across multiple species, would seem to be adequate replication. As mentioned above, the statistical analysis of whether the model complexity is justified by improved goodness of fit is currently missing but can readily be checked and added.

      Minor comments:

      Last paragraph of the introduction, positive feedback is mentioned but not the slowing down as preferred curvature is realized (inclusion of which might help foster a clearer understanding of the model early on).

      Response: We now mention the slowing down towards a preferred curvature in our introduction on Page 3, Lines 100-102.

      In Fig. 1, please state in the figure caption what is being displayed in the two large panels and what is the color map. Is this the 3D data from the overlapping elliptical Gaussians projected on the plane in a "hot" map? Further, in the top right small panels, are the x-y images projections of all z, or measured at a specific z?

      Response: We adjusted Figure 1 and the figure caption to clearly explain what is shown in each superresolution panel. The exact details of image rendering, including the color map and Gaussian blurring of the localization coordinates, are now described in the methods on Page 21, Lines 625-627. The x-y images represent an enlarged view of the projections visible in the previous two panels. We hope that the rephrased Figure 1 legend clarifies this.

      In Eqn. (1), epsilon is not defined.

      Response: The definition is mentioned on Page 8, Line 210, right before the equation, same as for kon.

      For the theta-rank plots (Fig4 B, SFig D-F ii) moving the theta(t)=sqrt(t) red curves behind sorted theta data would make the data easier to see.

      Response: We adjusted the figures according to the reviewer's suggestion.

      "Laser" in sentence about the speckle reducer should probably be plural.

      Response: We corrected this grammar mistake, and changed “laser” to “lasers” on Page 20, Line 586.

      I would like to see the "custom" algorithm based on redundant cross-correlation for drift correction briefly described.

      Response: We added an explanation of the algorithm used for the drift correction on Pages 20-21, Lines 611-617.

      A legend for supplemental figure 3 A-C would be nice.

      Response: We added a legend for the various models in (now) Supplementary Figure 5, and further made some clarifications in the figure caption.

      If the definition of the abbreviation flat-to-curved-transition as FTC was explicit I missed it.

      Response: As we do not use this abbreviation anywhere else in the manuscript, we removed it from the Supplementary Note to avoid confusion.

      Resolution of 20 and 30 nm (laterally and axially, respectively) was quoted once towards the beginning of the manuscript as being an improvement resulting from the localization method described in Li et al., 2018. Resolution can be difficult to speak about precisely, but the methods section would seem to indicate that localizations are filtered at 20 nm lateral localization precision (potentially 30 nm axially?), and I think the authors could consider rephrasing to depict this unless I am missing elsewhere a description of the resolution metric being used.

      Response: The original 20 and 30 nm resolution (laterally and axially) was calculated based on the median localization precision values in x-y and z for a representative image, using the FWHM approach (described in Methods, Page 21, Lines 621-624). After consideration of the reviewer's question, we found the modal value to be a better quantity to calculate the resolution, and changed this in the text accordingly (Page 4, Lines 113-115, and Methods, Page 21, Lines 621-624).
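As a rough illustration of the FWHM approach mentioned above, a localization precision σ can be converted to a resolution estimate through the full width at half maximum of a Gaussian, FWHM = 2·sqrt(2·ln 2)·σ ≈ 2.355·σ. The sketch below also estimates a modal precision from a histogram; the function names, the 1 nm bin width and the tie-breaking rule are assumptions made for this example and are not taken from the manuscript.

```python
import math
from collections import Counter

# FWHM of a Gaussian with standard deviation sigma is this factor times sigma.
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_precision(sigma_nm):
    """Convert a localization precision (Gaussian sigma) to a FWHM in nm."""
    return FWHM_FACTOR * sigma_nm


def modal_precision(precisions_nm, bin_width_nm=1.0):
    """Histogram-based mode: centre of the most populated bin.

    Ties are resolved in favour of the lowest bin (arbitrary choice).
    """
    bins = Counter(math.floor(p / bin_width_nm) for p in precisions_nm)
    best_bin = max(bins, key=lambda b: (bins[b], -b))
    return (best_bin + 0.5) * bin_width_nm
```

Localization precision distributions are typically right-skewed, so the mode sits below the median; this is one reason the two choices can give noticeably different resolution figures.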

      Reviewer #3 (Significance (Required)):

      Proteins involved in inducing curvature in membranes are in general very exciting targets for localization microscopy, yet for many systems questions still remain unanswered. The authors tackle one such question in this manuscript. In other, unresolved, discussions, the posed hypotheses are quite similar to the simplistic models surpassed in this work (e.g. that curvature scales linearly with local protein copy number, or that surface area scales linearly with local protein copy number). The idea of cooperativity may be useful for others to consider, and the authors additionally demonstrate a seemingly smooth workflow using their separately described tools (primarily LocMoFit; Wu et al. 2021).

      I myself am not an expert on CME or vesicle trafficking. My background is primarily in SMLM method development and SMLM / fluorescence image analysis. From my perspective, the novelty of the biological conclusions appears to be the authors' specific cooperative model and the presence of two structural states which are enriched (closing angle 70° and 130°). As referenced, and authors F. Frey and U. S. Schwarz nicely present in Bucher et al. 2018, the constant curvature and constant surface area models are known to be inaccurate descriptions of CME evolution, and further it is also known that clathrin first assembles small flat structures before beginning to curve the membrane. However, the 3D super-resolution imaging and direct evaluation of a 3D model geometry in this work is a nice extension of the 2D super-resolution imaging and projection evaluation in the authors' previous work studying endocytosis through ensemble averaging in yeast (Mund et al. 2018) as well as the analysis on projections in Bucher et al. 2018. Fully 3D treatment of the clathrin structures allows the authors to orient asymmetric assemblies such that they are averaged out in their ensemble reconstruction, and as they point out the molecular specificity afforded by a fluorescence-based technique ensures unbiased segmentation of clathrin-involved endocytic sites. In other words, while this work does not describe a technical advance not already described elsewhere, it sets a nice example for those researching protein-membrane interactions of how to leverage the right tools to clearly and directly answer their questions. With their additional work to make these tools extensible to other geometries, multiple color channels, etc., I expect their work to inspire quality studies in other systems. That significance is complementary to their proposal of a reasonable model for the geometric evolution of CME.

      References:

      Maximum-likelihood model fitting for quantitative analysis of SMLM data, Yu-Le Wu, Philipp Hoess, Aline Tschanz, Ulf Matti, Markus Mund, Jonas Ries, bioRxiv 2021.08.30.456756; doi: https://doi.org/10.1101/2021.08.30.456756

      Bucher, D., Frey, F., Sochacki, K.A. et al. Clathrin-adaptor ratio and membrane tension regulate the flat-to-curved transition of the clathrin coat during endocytosis. Nat Commun 9, 1109 (2018). https://doi.org/10.1038/s41467-018-03533-0

      Markus Mund, Johannes Albertus van der Beek, Joran Deschamps, Serge Dmitrieff, Philipp Hoess, Jooske Louise Monster, Andrea Picco, François Nédélec, Marko Kaksonen, Jonas Ries, Systematic Nanoscale Analysis of Endocytosis Links Efficient Vesicle Formation to Patterned Actin Nucleation, Cell, 174, 4, (2018). https://doi.org/10.1016/j.cell.2018.06.032.


    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this article, the authors aimed to investigate the dynamics of the clathrin lattice during clathrin-mediated endocytosis (CME). Overall, they achieved this goal by observing a large number of clathrin spots from several cell lines with 3D single-molecule localization microscopy (SMLM). With the help of this high-resolution imaging technique, they were able to describe the physical properties of each spot and reconstruct the assembly and remodeling of the clathrin coat. Moreover, by comparing the constant area and constant curvature models with their own data, the authors showed that neither of the prevailing models fully explains what they observed, and they proposed a 'cooperative curvature model'. With this novel model, the authors were able to reconstruct clathrin coat remodeling in different cell lines and concluded that simultaneous bending and assembly of the clathrin coat is a general property of endocytosis. The experiments and analytical procedures are well designed and performed, and the manuscript is well organized. The conclusion, the 'cooperative curvature model', was deduced from a large amount of data analysis and is clearly stated in the text. I would recommend publication once the following issues are clarified.

      Major comments:

      1. The authors compared the morphological dynamics of clathrin-coated pits among three different cell lines (SK-MEL-2, U2OS, and 3T3) and found slight differences. As U2OS cells were derived from bone tissue, they have different mechanical properties (membrane tension, elasticity of the cortical layer, etc.). It would be interesting to consider those mechanical properties in understanding the morphology (Figure 2) and progress (Figure 4) of CME. Considering that the bending energy of the plasma membrane depends on membrane tension, the authors may be able to find some relationships between the mechanical properties of the cell cortex and CME.
      2. In Figure 4, the authors estimated the progression of CME using the frequency distribution of theta. However, I wonder how they handled events that were aborted partway through CME. It has been suggested that some CME events are aborted during the initial step. The authors should consider (or at least discuss) those abortive events, which can disturb the analysis.

      Minor comments:

      1. Page 5, result section 2. The authors should further explain why vesicles from the trans Golgi could be responsible for the small disconnected set of data points corresponding to vesicles with larger curvatures.
      2. Page 7, line 6. The authors assumed that the clathrin coat starts growing on a flat membrane. However, as mentioned in the discussion, clathrin has been shown to have curvature-sensing ability, which can be amplified several-fold by adapter proteins (Zeno et al., 2021). It therefore seems that clathrin prefers a highly curved membrane over a flat one. Is it still reasonable to make this assumption?
      3. Page 9, result section 4. In the sentence: "we effectively generated the average trajectories of how curvature, surface area, projected area and lattice edge change during endocytosis in SK-MEL-2 cells (Figure 4B-E)." Here I think the authors are describing Figure 4C-F.
      4. Page 11, discussion. In the sentence: "A deviation of the cross-sectional profile from a circle is nevertheless preserved in the averaging (Supplementary Figure 5)." I didn't see supplementary figure 5 in the article.

      Significance

      Drawing on a vast amount of microscopic images and data analysis, the manuscript gives a clear model of the progress of CME, which integrates two opposing models: the constant area and constant curvature models. This is significant progress in our understanding of the molecular mechanism of CME, and will attract many researchers in the field of cell biology. From the viewpoint of my expertise (molecular imaging of the plasma membrane and endocytic processes), this manuscript has a significant impact on the related research fields.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors attempt to optimize the FluoroSpot assay to allow assessment, at the B-cell level, of cross-reactive antibodies targeting conserved epitopes shared by multi-allelic antigens and of antibodies specific to a unique antigen variant. This is a critical aspect to consider when identifying targets of a broad range of cross-reactive antibodies for vaccine development, and the antigen VAR2CSA used in this work is one that will benefit from the method described in the manuscript.

      Overall, this is a method manuscript with extensive detail of the assay validation process. The description of the assay performance steps using first monoclonal antibodies and later hybridoma/immortalized B cells was important to understand conditions that can influence the antigen-antibody interactions in the assay. This multiplex approach can assess the cross-reactivity of antibodies to up to four allelic variants of an antigen, with the possibility to explore the affinity of an antibody to a particular variant using the RSV measurements. The validation of the assay with PBMC from malaria-exposed donors, both men and women (who naturally acquired high titers of antibodies to VAR2CSA during pregnancy), is a strength of this work, as this is in the context of polyclonal antibodies with more heterogeneous antibody binding specificities.

      The ability of the assay to detect cross-reactive antibodies using all four tags appears highly variable, even in the context of a monoclonal antibody targeting the homologous antigen labelled with all four tags.

      We understand the concern about variability, but we think that in general the assay was very consistent. Regardless of the configuration used, we detected strikingly comparable numbers of spots/well, especially when the homologous antigen labelled with four tags was used (Figure 2A). Similar consistency has been previously reported when a similar assay was used to study cross-reactivity in dengue-specific antibodies.

      Overall, it appears that the assessed antibody reactivity with TWIN tagged antigens was relatively low and this needs to be explained and discussed as the current multiplex method, as it is, might just be optimized for study of cross-reactive antibodies to 3 antigens.

      The LED380 (used to detect and visualize the TWIN tag) indeed gave more background than the other three detection channels. We normally observed a ring of fluorescence at the edge and the middle of the wells, accompanied by lower intensity of the spots. These two characteristics are apparent in the figures and RSV plots presented in the manuscript. In an attempt to reduce these issues, we tried replacing the TWIN tag with a BAM tag detected with a peptide-specific antibody (data not presented). However, that approach did not improve the readout and we therefore decided to keep the TWIN-StrepTactin pair for all the experiments. Importantly, even with these issues, routine manual inspection of the wells confirmed that the Apex software automatically and efficiently counted “real” spots, giving us confidence in the performance of the assay. We acknowledge that exclusion of the LED380 data would lead to higher assay accuracy. However, it would result in reduced ability to assess broad antibody cross-reactivity, which was the primary objective of our study. We have added text briefly discussing this to the revised manuscript (lines 154-160).

      As acknowledged by the authors, the validation of this assay on PBMC from only 10 donors (7 women and 3 men) is a caveat to the conclusion and increasing this number of donors (the authors have previously excelled in B cells analyses of PfEMP1 proteins and would have PBMC readily available) will strengthen the validity of this assay.

      We thank the reviewer for this comment and agree the number of donors tested is far from sufficient to provide any conclusive evidence regarding frequencies of VAR2CSA-specific and cross-reactive B cells in the context of placental malaria. However, we firmly believe that the validation of the assay, which was the objective of the study, is sufficient, especially because we included human B-cell lines isolated from donors naturally exposed to VAR2CSA-expressing parasites. Future studies including more donors and full-length VAR2CSA antigens are certainly warranted. As the performance of the assay has now been validated (this manuscript) to our satisfaction, we are indeed planning such studies.

      Reviewer #2 (Public Review):

      The manuscript describes the development of a laboratory-based assay as a tool designed to identify individuals who have developed broadly cross-reactive antibodies with specificity for regions that are common to multiple variants of a given protein (VAR2CSA) of Plasmodium falciparum, the parasite that causes malaria. The assay has potential application in other diseases for which the question of acquisition of antibody-mediated immunity, either through natural exposure or through vaccination, remains unresolved.

      From a purely technical/methodological viewpoint, the work described is of high quality, relying primarily on the availability of custom-designed, in-house-derived protein and antibody reagents that had, for the most part, been validated through use in earlier studies. The authors demonstrate a high degree of rigour in the assay development steps, culminating in a convincing demonstration of the ability to accurately and reproducibly quantify cross-reactive antibody types under controlled conditions using well-characterized monoclonal antibodies.

      In a final step, the authors used the assay to assess the content of broadly cross-reactive antibodies in samples from a small number of malaria-exposed African men and women. Given that VAR2CSA is a parasite-derived protein that is exclusively and intimately involved in the manifestation of malaria during pregnancy, with specific localisation to the maternal placental space, the premise is that antibodies (including those with cross-reactive specificities) should be almost exclusively detectable in samples from women, either pregnant at the time of sampling or having been pregnant at least once. The assay functioned technically as expected, identifying antibodies predominantly in women rather than men, but it failed to identify broadly cross-reactive antibodies in the women's samples used, only revealing antibodies with specificity for just one of the different variants used. The latter result could have two mutually non-exclusive explanations. On the one hand, the small number of women's samples (7) screened in the assay could simply be insufficient, demanding the use of a much larger panel. On the other hand, for technical reasons the assay involves the use of only relatively restricted parts of the VAR2CSA protein, and this particular aspect may represent its primary limitation. In earlier work, the authors did identify broadly cross-reactive antibodies in samples from African women, but that work relied on the use of the whole VAR2CSA protein present in its natural state embedded in the membrane of the infected red cell, or as a complete protein produced in the laboratory. The important point is that the whole protein likely interacts with antibodies that recognize protein structures that the isolated smaller parts of the whole protein used in the assay fail to reproduce, and that the cross-reactive antibodies identified recognize these structures that are conserved across different VAR2CSA variants. The authors recognize these potential weaknesses in their discussion of the results. It is also possible that VAR2CSA variants expressed by parasites from geographically distinct regions (Africa, Asia, South America) are themselves distinct, and this aspect could also have affected the outcome, since the variant protein sequences used in the assay were derived from parasites originating in these different regions.

      The assay could find application in the malaria research field in the specific context of assessments of antibody responses to a range of different parasite proteins that are, or have been, considered candidates for vaccine development but for which their extensive inherent allelic polymorphism has effectively negated such efforts.

      We thank the reviewer for the kind evaluation. We fully acknowledge the need for more comprehensive studies to assess the robustness of the pilot data regarding antibody cross-reactivity after natural exposure in the present study, which was aimed to document the performance of the complicated multiplexed assay rather than to provide such evidence. As mentioned above, we are currently planning such a study. We also acknowledge the need to assess the degree of cross-reactivity to full-length antigens rather than domain-specific components of them. This is obviously particularly true for large, multi-domain antigens such as PfEMP1 (including VAR2CSA). Such an exercise is complicated by the need for appropriately tagged antigens. We are intrigued by the apparent discrepancy between the degree of antibody cross-reactivity in depletion experiments using individual DBL domains of VAR2CSA (low cross-reactivity) versus full-length VAR2CSA antigens (very substantial cross-reactivity) reported by Doritchamou et al., and are keen to apply our approach to explore that finding. Therefore, as also mentioned above, we are currently planning a study employing tagged full-length VAR2CSA allelic variants as well.

    1. Reviewer #1 (Public Review):

      The authors test whether neurons in V1 show "multiplexing", which means that when two stimuli A and B are presented inside their receptive fields (RFs), the neuronal response fluctuates across trials between coding one of the two, leading to a bimodal spike count histogram. They find evidence for this "mixture" model response in a subset of V1 neurons. They next test whether the spike count noise correlations (Rsc) vary between pairs of neurons that prefer the same versus different stimuli, and show that Rsc is positive for neurons that prefer the same stimulus but negative for neurons that prefer different stimuli.

      While this paper shows some intriguing results, I feel that there are a lot of open questions that need to be addressed before convincing evidence of multiplexing can be established. These points are discussed below:

      1. The best spike count model shown in Figure 2C is confusing. It seems that the number of "conditions" is a small fraction of the total number of conditions (and neurons?) that were tested. Supplementary Figure 1 provides more details (for example, the "mixture" corresponds to only 14% of total cases), but it is still confusing (for example, what does WinProb>Min mean?). From what I understood, the total number of neurons recorded for the Adjacent case in V1 is 1604, out of which 935 are Poisson-like with substantially separated means. Each one has 2 conditions (for the two directions), leading to 1870 conditions (perhaps a few less in case both conditions were not available). I think the authors should show 5 bar plots - the first one showing the fraction for which none of the models won by 2/3 probability, and then the remaining 4 ones. That way it is clear how many of the total cases show the "multiplexing" effect. I also think that it would be good to only consider neurons/conditions for which at least some minimum number of trials are available (a cutoff of say ~15) since the whole point is about finding a bimodal distribution for which enough trials are needed.

      2. More RF details need to be provided. What was the size of the V1 RFs? What was the eccentricity? Typically, the RF diameter in V1 at an eccentricity of ~3 degrees is no more than 1 degree. It is not enough to put 2 Gabors of size 1 degree each to fit inside the RF. How close were the Gabors? I am confused about the statement in the second paragraph of page 9 "typically only one of the two adjacent gratings was located within the RF" - I thought the whole point of multiplexing is that when both stimuli (A and B) are within the RF, the neuron nonetheless fires like A or B? The analysis should only be conducted for neurons for which both stimuli are inside the RF. When studying noise correlations, only pairs that have overlapping RFs, such that both A and B are within the RFs of both neurons, should be considered. The cortical magnification factor at ~3-degree eccentricity is 2-2.5mm/degree, so we expect the RF center to shift by at least 2 degrees from one end of the array to the other.

      3. Eye data analysis: I am afraid this could be a big confound. Removing trials that had microsaccades is not enough. Typically, in these tasks the fixation window is 1.5-2 degrees, so that if the monkey fixates on one corner in some trials and another corner in other trials (without making any microsaccades in either), the stimuli may nonetheless fall inside or away from the RFs, leading to differences in responses. This needs to be ruled out. I do not find the argument presented on pages 18 or 23 completely convincing, since the eye positions could be different for a single stimulus versus when both stimuli are presented. It is important to show that the eye positions are similar in "AB" trials for which the responses are "A" like versus "B" like, and these, in turn, are similar to when "A" and "B" are presented alone.

      4. Figures 5 and 6 show that the difference in noise correlations between the same preference and different preference neurons remains even for non-mixture type neurons. So, although the reason for the particular type of noise correlation was given for multiplexing neurons (Figure 3 and 4), it seems that the same pattern holds even for non-multiplexers. Although the absolute values are somewhat different across categories, one confound that still remains is that the noise correlations are typically dependent on signal correlation, but here the signal correlation is not computed (only responses to 2 stimuli are available). If there is any tuning data available for these recordings, it would be great to look at the noise correlations as a function of signal correlations for these different pairs. Another analysis of interest would be to check whether the difference in the noise correlation for simply "A"/"B" versus "AB" varies according to neuron pair category. Finally, since the authors mention in the Discussion that "correlations did not depend on whether the two units preferred the same stimulus or different", it would be nice to explicitly show that in figure 5C by showing the orange trace ("A" alone or "B" alone) for both same (green) and different (brown) pairs separately.
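      To make the quantity discussed in point 4 concrete, the following is a minimal sketch of one common way to compute a spike count ("noise") correlation: counts are z-scored within each stimulus condition so that stimulus-driven differences are removed, and the Pearson correlation of the z-scores is then taken across trials. This is an illustration under stated assumptions, not the authors' analysis pipeline; the function names and the toy spike counts are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def zscore_within_condition(counts, conditions):
    """Z-score spike counts separately within each stimulus condition."""
    by_cond = defaultdict(list)
    for count, cond in zip(counts, conditions):
        by_cond[cond].append(count)
    stats = {cond: (mean(v), pstdev(v)) for cond, v in by_cond.items()}
    z = []
    for count, cond in zip(counts, conditions):
        m, s = stats[cond]
        # A condition with zero variance contributes no trial-to-trial signal.
        z.append((count - m) / s if s > 0 else 0.0)
    return z


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spike_count_correlation(counts_1, counts_2, conditions):
    """Rsc for one pair of units: correlate condition-wise z-scored counts."""
    z1 = zscore_within_condition(counts_1, conditions)
    z2 = zscore_within_condition(counts_2, conditions)
    return pearson(z1, z2)


# Hypothetical toy data: 4 trials of stimulus "A" followed by 4 of stimulus "B".
conditions = ["A"] * 4 + ["B"] * 4
unit_1 = [10, 12, 9, 13, 3, 5, 2, 6]
unit_2 = [20, 23, 19, 24, 8, 10, 7, 11]   # fluctuates together with unit_1
unit_3 = [24, 19, 23, 20, 11, 7, 10, 8]   # fluctuates opposite to unit_1

rsc_same = spike_count_correlation(unit_1, unit_2, conditions)
rsc_opposite = spike_count_correlation(unit_1, unit_3, conditions)
```

      In this toy example `rsc_same` is positive and `rsc_opposite` is negative, even though all three units respond more strongly to "A" than to "B"; a signal correlation would instead be computed from the mean responses across stimuli, which requires more than two stimulus conditions to be informative.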

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their constructive comments and are pleased that all reviewers share our opinion, that the present study “makes an important contribution to the molecular architecture of mitochondria”, is in addition “an important advancement in our understanding of the mechanism by which Cqd1 regulates CoQ distribution” and will “thereby appealing to the broad readership of the journals”. We are convinced that addressing the important points raised by the reviewers will further strengthen the manuscript and result in additional significant insights in the molecular function of Cqd1.

      Reviewer #1:

      The major concerns affecting the conclusions are: 1) Experimental evidence is lacking on the contribution of contact site formation by Cqd1 to the effects on mitochondrial architecture and respiration-dependent growth. Determining the effects of the overexpression of the kinase-dead mutant on mitochondrial morphology and contact site formation with Por1-Om14 can address that.

      We thank reviewer #1 for raising these important points. Indeed, the various functions of Cqd1 might be independent from each other and so far we cannot distinguish between them. As suggested by the reviewer we will analyze the effect of overexpression of CQD1 in the Δups1 deletion mutant and make use of the point mutant in the conserved ATP binding domain which cannot complement the phenotype of the Δups1 Δcqd1 double deletion mutant. We generated a yeast mutant strain expressing Om14-3xHA in the absence of wild type Cqd1. Expression of the cqd1(E330A) mutant in the Om14-3xHA background and subsequent immunoprecipitation will allow us to test whether ATP binding is also essential for contact site formation. Preliminary experiments showed that the overexpression of cqd1(E330A) in the Δcqd1 deletion background results in a growth defect comparable to that caused by overexpression of CQD1 WT. Therefore, we think it might be more promising to analyze the interaction of Om14 and Cqd1 E330A at wild type level in order to avoid pleiotropic effects.

      In addition, we will further characterize the cqd1(E330A) mutant by analyzing the effect of its overexpression on mitochondrial morphology, cell growth and assembly of MICOS and F1FO ATP synthase in the Δcqd1 deletion background.

      2) Related to point #1, Cqd1 overexpression in Δups1 cells could have addressed whether the role of Cqd1 in contact sites and mitochondrial architecture is independent of its role in CoQ distribution and phospholipid metabolism. Further characterization of the kinase-dead Cqd1 mutant on CoQ distribution, contact sites, mitochondrial architecture and phospholipid metabolism might help discern how these activities can be separated.

      We agree that the related points 1) and 2) raised by reviewer #1 are important and addressed our plans in the response on point 1).

      3) It is unclear how both Cqd1 overexpression and deletion induce mitochondrial fragmentation. Performing live cell imaging with a mitochondrial photoactivatable GFP to measure mitochondrial fusion rates could help discern the causes of fragmentation. It is possible that overexpression induced fragmentation by activating fission without changing fusion, while deletion induced fragmentation by blocking fusion.

      We thank reviewer #1 for bringing up this point. Perhaps our explanation in this respect was too short. Fig. 4E shows that deletion of CQD1 does not result in altered mitochondrial morphology; however, deletion of CQD1 in the Δups1 background leads to virtually complete fragmentation of the mitochondrial network. This is likely due to inhibition of mitochondrial fusion through disturbed processing of the fusion protein Mgm1 (see Fig. 4D). In contrast, overexpression of CQD1 does NOT result in formation of small mitochondrial fragments, but in formation of huge mitochondrial clusters which in addition contain a large proportion of ER membranes. So, we don’t think that this phenotype is related to either enhanced fission or reduced fusion. We will clarify this point in the text of the revised manuscript.

      Minor comment:

      1) Figure 4 claims that mitochondrial function is impaired by ups1 deletion, which Cqd1 deletion exacerbates. However, no respiration data is shown in figure 1, only measurements of mitochondrial architecture are shown. Thus, oxygen consumption measurements are needed to claim effects on mitochondrial function.

      We did not want to claim that mitochondria lose respiratory competence upon simultaneous deletion of CQD1 and UPS1. Actually, our results indicate that the Δups1 Δcqd1 double deletion mutant grows like wild type on complete medium containing glycerol. Therefore, respiration is not impaired in this mutant. However, mitochondrial function is not restricted to ATP production by oxidative phosphorylation. The reviewer probably refers to Figure 4, where we show that mitochondrial biogenesis and dynamics are impaired in the Δups1 Δcqd1 double deletion mutant; the heading of the legend summarizes this as "mitochondrial function". We will be more precise on this point in the revised version and add a panel showing growth of the mutant strain on a non-fermentable carbon source to avoid any further confusion.

      2) Some Western blots lack quantifications and statistical analyses of independent experiments.

      It is correct that some quantification and the respective statistics were missing in the initially submitted manuscript. We will add the requested information in the revised version of the manuscript.

      Reviewer #2:

      I have the following concerns for the authors to consider. (1) Although biochemical evidence shows that Cqd1 is likely a factor that forms CS structures in mitochondria, it would make the manuscript stronger if the authors can observe uneven distribution of Cqd1 in the mitochondrial membranes (assessed by fluorescent microscopy or ideally high-resolution microscopy) and the presence of Cqd1 in the region of close apposition of the OM and IM by immunogold labeling for electron microscopy.

      Two independent lines of evidence show that Cqd1 is a novel contact site protein: (i) it is found in the contact site fraction in density gradients (Fig. 6A), and (ii) it can be co-immunoprecipitated with outer membrane proteins (Fig. 6G, H, I). Furthermore, the co-IP is supported by cross-links of expected size (Fig. 6F). In sum, we feel that this is solid evidence to support our claim that Cqd1 is present in mitochondrial contact sites. However, it still might be interesting to check an uneven distribution of Cqd1 in mitochondria, as suggested by the reviewer. We will do this by 3D deconvolution fluorescence microscopy.

      (2) Since the structural characterization of Cqd1 is important for understanding its interactions with the OM proteins, and since other UbiB protein kinase-like family proteins, Coq8 and Cqd2, take different orientations, the membrane topology of Cqd1 should be experimentally analyzed. The authors state, "two hydrophobic stretches can be identified in the Cqd1 sequence, of which the first one (amino acids 125-142) might be a bona fide transmembrane segment" (lines 97-100); is Cqd1 then a protein that spans the membrane once or twice?

      Unfortunately, it was not possible to test the location of the N terminus experimentally because an N-terminally tagged variant of Cqd1 (tag inserted between presequence and mature part) turned out to be unstable. We consider it very unlikely that the second hydrophobic stretch is a transmembrane domain as it is rather short (only 11 amino acids). Furthermore, several Cqd1 homologs in other fungi, including Yarrowia lipolytica, Aspergillus niger and Schizosaccharomyces pombe, lack the second hydrophobic stretch. Therefore, we propose that the major part of Cqd1, including the protein kinase-like domain, is exposed to the intermembrane space. We will point this out more clearly in the revised manuscript.

      (3) The authors state, "conserved GxxxG dimerization motif (amino acids 504‐508)" (Fig. 1A caption), but this description needs a reference. The GxxxG motif was proposed to mediate transmembrane helix-helix association (https://doi.org/10.1006/jmbi.1999.3489), which is not consistent with the membrane topology proposed by the authors.

      We thank reviewer #2 for this comment. It is correct that GxxxG motifs are usually present in transmembrane α-helices. However, there is information available indicating that these motifs may also be present in soluble proteins, where they stabilize dimeric interactions, for instance in the homodimeric Holliday-junction protein resolvase (Kleiger et al., 2002; doi: 10.1021/bi0200763). However, as this point is not critical for our conclusions, we will remove the discussion of the GxxxG motif from the revised manuscript.

      (4) What is the role of the kinase activity of Cqd1 in the CS formation? The effects of overexpression of Cqd1 (Fig. 7) should be tested for its E330A mutant.

      We also thank reviewer #2 for raising this important point similar to reviewer #1. Please see our response to point 1) of reviewer #1.

      (5) Is there stoichiometric as well as quantitative information on the 400 kD complex consisting of Cqd1, Por1 and Om14? Does the stoichiometry and amount of the complex depend on the growth condition? Does the complex contain other Por1 interacting IM proteins like Mdm31?

      We appreciate that reviewer #2 points out this important aspect. It might well be that the amount of the Cqd1 containing complex depends on growth conditions since its presence might be important for phospholipid homeostasis, CoQ distribution and mitochondrial architecture and morphology which for sure strongly depend on growth conditions. Therefore, we will try to analyze the amount of the Cqd1 complex present in mitochondria isolated from yeast cells grown on different media by BN-PAGE. So far we do not have any information on the stoichiometry of this complex and we feel that an analysis would go beyond the scope of this study. We agree with reviewer #2 that Mdm31 is an obvious candidate for an interaction partner of Cqd1. We actually tested this by co-immunoprecipitation using Cqd1-3xHA or Mdm31-3xHA. However, none of these approaches resulted in successful co-isolation of the potential interaction partner. We will mention this result in the revised manuscript.

      (6) For Fig. 7E, the authors state, "consistently, we observed dramatically increased mitochondria‐ER interactions Cqd1 overexpression", but this observation could be due to secondary effects because overexpression of Cqd1 itself already caused abnormal morphology of mitochondria.

      We thank reviewer #2 for bringing up this important point. To check whether the increased mitochondria‐ER interactions are a secondary effect due to altered mitochondrial morphology we will analyze the mitochondria‐ER interactions in other mitochondrial morphology mutants by fluorescence microscopy. This will reveal whether abnormal mitochondrial morphology generally leads to disturbed ER structure.

      (7) Since the antagonistic role of Cqd2 to Cqd1 was proposed, the results of the experiments for Cqd1 can be compared with those for Cqd2. For example, what is the effect of overexpression of Cqd2 instead of Cqd1 for Fig. 7? What is the lipid composition of the cqd1Δ cqd2Δ double deletion mutant cells (is the decreased PA level recovered)? Lines 424-425: In summary, overexpression of Cqd1 causes severe phenotypes on growth, formation of mitochondrial structural elements, and mitochondrial architecture and morphology. Is this phenotype affected by overexpression of Cqd2?

      This point raised by reviewer #2 is very interesting. Our preliminary experiments and previously published data (Tan et al., 2013) indicate that overexpression of Cqd2 is also toxic and results in the formation of huge mitochondrial clusters. Therefore, we will extend our study and analyze the effect of overexpression of CQD2, either alone or in combination with overexpression of CQD1.

      Reviewer #3:

      1) The central point of the paper is that Cqd1 is part of a novel contact site between the inner and the outer membrane. Om14 and Por1 were identified as outer membrane components of this contact site by immunoprecipitation. The data look convincing but they were generated from targeted experiments to test the involvement of suspected proteins. Ideally, one would like to see a cross-linking mass spectrometry (XL-MS) experiment that identifies the physical interactions of Cqd1 without bias.

      We thank reviewer #3 for acknowledging the presented data as convincing. Considering the significant amount of experiments planned for the revised version of the manuscript, we hope that reviewer #3 agrees that this point is not essential.

      2) Could an analogous blot of the MICOS complex be added to Figure 6D?

      Of course, we are happy to include BN-PAGE analysis showing the running behavior of MICOS next to the Cqd1 containing complex in Fig. 6D.

      3) In the Introduction, a host of contact sites is mentioned, which are partly from older papers. I'm not sure whether this is the accepted view of the field. Also, newer data suggest that the permeability transition pore is derived from complex V rather than ANT, CK, and VDAC. The authors should double check in order to represent the current state of the art.

      We thank reviewer #3 for this comment. We will update this part according to the more recent literature.

    1. Author Response

      Reviewer #2 (Public Review):

      First, I want to congratulate the author team on this manuscript, which I read with great pleasure. I think this will be a fine addition to the literature!

      The present MS by Clement et al. provides a comprehensive overview of the brain shapes of lungfishes. Besides previously known/described brain endocasts, the work includes models and descriptions of previously undescribed taxa. Notably, all CT data are deposited online following best practices when working with digital anatomy. The specimen sample is impressive, especially as the sampled material is housed in museums all over the world. Although the sample size may seem numerically low (12 taxa), this actually is a comprehensive sample of fossil (and extant) lungfishes in terms of what's preserved in the first place.

      The study at hand has several goals: (1) The description of lungfish brains for taxa that were previously undescribed; (2) the quantification of aspects of brain shape using morphometric measurements; (3) the characterization of brain shape evolution of lungfishes using exploratory methods that ordinate morphometric measurements into a morphospace.

      The provided 3D data and descriptions will serve as valuable comparisons in future lungfish work. This type of data is imperative for palaeontological studies in general, and the anatomical information will be extremely valuable in the future. For example, anatomical characters related to brain architecture have been shown to be informative about phylogeny in the past, and the presented data may inform future phylogenetic studies. The quantification of brain shape via (largely linear) measurements is relatively simplistic, and can thus only detect gross trends in brain shape evolution among lungfishes. The authors describe several such trends, such as high variation in the olfactory brain region in comparison to other parts of the brain. The results and interpretations drawn by the authors are supported by their data, and the approach taken is valid, even if more sophisticated shape quantification methods (e.g. 3D landmarking) and analytical methods (e.g. explicit phylogenetic comparative methods) are available, which could provide additional insights in the future.

      We agree with Reviewer #2 that 3D geometric morphometrics could have provided more sophisticated analytical methods. However, geometric morphometrics has some limitations with regard to the type of data that we analysed: (1) low sample size and (2) missing/incomplete data. To cover brain shape comprehensively, numerous landmarks (and semilandmarks) would have been required to represent its complexity.

      First, our sample size (12 taxa) is low (although it is an impressive sample size when considering the type of data). Although there is no universal rule concerning the ratio “number of specimens / number of landmarks” (Zelditch et al., 2012), ideally the sample size should be two to three times the number of landmarks. Thus, with a sample size of 12 we could have used ca. 4-6 landmarks, which is very limited for describing complex shapes. In addition, in order to use geometric morphometrics (2D or 3D), the landmarks should be present on all the specimens. Because of the partial completeness of the studied fossils, the brain endocasts are not uniformly known for each species. Incomplete and deformed specimens prompt the removal of potential landmarks from analyses. Even using right-left reflection of the endocasts, most specimens do not share all neurocranial information.

      We agree with Reviewer #2 that a phylogenetic PCA could have provided interesting analytical perspectives. Phylogenetic PCA is available for standard PCA, but it is uncertain whether it can be used with Bayesian PCA and InDaPCA (this method has been published very recently, and we haven’t found much literature about it). We did not find an adaptation of phylogenetic PCA to either the BPCA or the InDaPCA; we even contacted Liam Revell, who created the phylogenetic PCA, about this issue.

      The presented results and interpretations in this regard must be seen as a preliminary assessment of lungfish brain evolution, but it is clearly written and generally well performed.

      A potential shortcoming of the paper is the lack of explicit hypothesis testing, which is not problematic per se, but puts limits on the conclusions the authors can draw from their data.

      We decided to address the issues using exploratory methods rather than testing hypotheses. It is a more conservative approach, since this is the first quantitative analysis of dipnoan endocasts. Future analyses will be able to formulate hypotheses based on our interpretation of our exploratory approach. We hope to stimulate such hypothesis testing when further dipnoans are added in the future; however, one has to remember that ossified neurocrania are known in Devonian dipnoans and one partially ossified neurocranium is known from the Carboniferous; the remaining dipnoans have cartilaginous neurocrania, which limits the sample size from which endocast data could be gathered.

      For example, the authors state that different anatomical parts of the labyrinth (particularly, the utricle with respect to the semicircular canals or saccule) may show modular dissociation from other labyrinth modules, based on the polarity of eigenvalue signs of the PCA analysis. I think this is fine as a first approximation, but of course there are explicit statistical tools available to test for modularity/integration, such as two-block partial least squares regression analysis (Rohlf & Corti 2000, Syst. Biol.). I don't see the lack of usage of such methods as problematic, because you cannot do everything in one paper, and the authors remain careful in their interpretation.

      We agree with Reviewer #2 that different geometric morphometrics methods have been developed to look at variational modularity; one of the co-authors (RC) has published a few papers on patterns of morphological integration and modularity in fishes (see Larouche, Cloutier & Zelditch, 2015, Evol. Biol.; Lehoux & Cloutier, 2015, J. Exp. Zool. Mol. Dev. Evol.; Larouche, Zelditch & Cloutier, 2018, Sci. Rep.). Interesting a priori hypotheses of brain modules could have been formulated and tested for modularity using, for example, the Covariance Ratio (CR) and a distance matrix approach. However, the low sample size and the incompleteness of the data remain major constraints on testing modularity. We would nevertheless endeavour to use such methods in future work as more complete material becomes available.

      It may be advisable, however, to add the odd sentence or statement about how some findings are preliminary or hypothesized, and that these should receive further treatment and testing using other methods in the future. I think this approach is actually very rewarding, because then you can inspire future work by outlining outstanding research problems that arise from the new data presented herein.

      We have now included an additional sentence early in the Discussion section stating: “We acknowledge that our investigation of lungfish brain evolution as elucidated from morphometric analysis of cranial endocasts is still preliminary in several respects. We hope that our study can inspire future work on the neural evolution of both fossil and extant lungfish.”

      In the following, I comment on a few aspects of the manuscripts. These represent instances where I had additional thoughts or ideas on how to slightly improve various aspects of the manuscript.

      1) Presentation of PCA results

      The authors provide several PCA analyses (preliminary analyses on partial matrices, BPCA, InDaPCA), and are very explicit about the procedures in general. For instance, I appreciate they explicitly state using correlation matrices for PCA analyses due to the usage of different measurement units among their data.

      Visually, the BPCA and InDaPCA are presented in figures 2 and 3, whereas the preliminary partial matrix PCAs are only reported as supplementary figures. While I don't object to any of this, I find the sequence of information given in the results section suboptimal.

      The figures have now been substantially reorganised to include more within the main body text and not as Supplementary Information, and we hope that this improves the sequence of information within the manuscript.

      The authors start by discussing the partial matrix analyses, although none of these analyses are visually/graphically depicted in the main text figures, and although their results do not seem to be of real importance for the narrative of the discussion. The other two PCA analyses actually are presented afterwards and separately, but they convey some common signals, particularly that the major source of variation seems to be a decreasing olfactory angle with increasing olfactory length, and a scaling relationship between all linear measurements (which all have the same eigenvector signs on the first PC axis). I wonder if an alternative way of presenting the PCA results would be better for this particular MS. For example, the authors could give "first level observations" first ("PCA analyses agree in X,Y,Y"), and then move to second order observations ("Morphospace of BPCA has some interesting taxon distribution with regard to chirodipterids"; "InDaPCA axis projections continuously retrieve clustering of specific variables"). I suspect this would shorten the text somewhat and could serve as a clearer articulation of the take home messages?

      In accordance with Reviewer #2's suggestion, we have now provided “first level” observations based on the standard PCA. We added some further comments on the species distribution in the morphospaces.

      2) Selection of PC axes for interpretation

      You describe how you use the broken-stick method to decide how many PC axes are retained for the interpretation of results, which I agree is a good procedure. However, I have a few questions regarding this. First, in line 331 (description of InDaPCA) you state that the first three axes are non-trivial "based on the screeplot" - which got me confused because it sounds a bit like eyeballing off the screeplot. Have you used the broken stick method for all your PCA analyses?

      Originally, we used both the screeplot and the broken-stick method; however, we are now solely using the broken-stick method to determine the number of non-trivial axes. We agree with Reviewer #2 that this method is more rigorous than the screeplot. Our choice is greatly inspired by the studies of Jackson (1993, Ecology) and Peres-Neto et al. (2005, Computational Statistics & Data Analysis). We have now edited the text so that our methods are clearer (and removed the text relating to the screeplot such as “based on the screeplot…”).

      The second question relates to the results of the broken stick method, which I did not find reported. Unless I am mistaken, for the xth axis, the method sums the fractions of 1/i (whereby i = x..n; n = number of axes), and divides this number by n to get a value of expected variation per axis. This number is then compared with the actual value of variance explained by the axis. So for the 1st of 17 axes, the broken-stick expectation is = (1 + 1/2 + .. + 1/17) / 17. If you apply this to your BPCA, the third axis' value (i.e., (1/3 + ... + 1/17)/17) is 0.114, which is smaller than the reported 0.120 that PC3 explains. Thus, following the broken stick method, PC3 does explain more variation than expected (and should thus be retained, contra your comment in line 311 which refers to two non-trivial axes)?
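      The arithmetic in the comment above can be written out as a short worked equation. This is a sketch only: the symbols $n$ (number of axes) and $b_k$ (expected proportion of variance for axis $k$ under the broken-stick model) are introduced here for illustration and do not come from the manuscript.

```latex
b_k = \frac{1}{n}\sum_{i=k}^{n}\frac{1}{i},
\qquad
b_3\Big|_{n=17}
  = \frac{1}{17}\left(\frac{1}{3}+\frac{1}{4}+\dots+\frac{1}{17}\right)
  \approx \frac{1.9396}{17}
  \approx 0.114 \;<\; 0.120
```

      Under this criterion an axis is retained when the proportion of variance it explains exceeds $b_k$, which is the case for the reported 0.120 of PC3.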

      We thank Reviewer #2, who took the time to validate each step of our analyses, for the insightful evaluation of our paper. Indeed, we agree with Reviewer #2 that, based on the broken stick method, the third axis is non-trivial. The value for the third axis is 1.0531310. Thus, we are presenting these results as well as discussing the three PCA projections (axis 1 versus axis 2, axis 2 versus axis 3, axis 1 versus axis 3).

      Related to this potential issue is the presentation of the BPCA results in Fig. 2: You present loadings of three PC axes, although only the first two are considered in morphospace bi-plots and although the text also mentions only two non-trivial axes. If the third axis is indeed non-trivial, then the loading presentation could be retained in the figure, but then the authors should consider showing a PC1 vs. PC3 plot in addition to the currently presented biplot showing the first and second axis only. If the third axis is indeed trivial, as currently suggested by the text, then showing the loadings is unnecessary.

      We consider showing a biplot of PC1 vs PC3 unnecessary as those shown (PC1 vs PC2) already account for 83.4% of the variation captured. We have edited these figures so that the loadings related to PC3 have also now been omitted.

      It would be great if you could clarify the usage/application of the broken stick method for all your PCAs. An easy way to report the results may be to add a row to each of your PCA loading tables in the supplements, in which you divide the actual value of variation explained by the value expected under the broken stick method - this way, all axes which explain more variation than expected by the broken stick method have values larger than 1, and axes which explain less have values lower than 1.
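
      The suggested table row (observed variance divided by the broken-stick expectation) could be computed along the following lines. This is a minimal sketch, not the authors' code: the function names `broken_stick` and `broken_stick_ratios` are hypothetical, and the only figures taken from the review are the 17 axes and the 0.120 reported for PC3.

```python
def broken_stick(p):
    """Expected fraction of variance for each of p axes under the broken-stick model."""
    expected = []
    tail = 0.0
    # Accumulate the tail sums 1/p, 1/p + 1/(p-1), ... from the last axis backwards.
    for i in range(p, 0, -1):
        tail += 1.0 / i
        expected.append(tail / p)
    expected.reverse()  # index 0 now corresponds to PC1
    return expected


def broken_stick_ratios(explained):
    """Observed / expected variance per axis; a value above 1 marks a non-trivial axis."""
    expected = broken_stick(len(explained))
    return [obs / exp for obs, exp in zip(explained, expected)]


# Only PC3 (0.120 of the variance, 17 axes) is quoted in the review,
# so this check is limited to that single axis.
pc3_expected = broken_stick(17)[2]
pc3_ratio = 0.120 / pc3_expected
print(round(pc3_expected, 3), pc3_ratio > 1.0)
```

      With the full vector of explained-variance fractions from a loading table, `broken_stick_ratios` would return the whole row in one call; the ratio obtained from the rounded 0.120 need not match a value computed from unrounded variances.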

      We have taken this suggestion from Reviewer #2 on board and have now recalculated all values for the broken-stick method for each analysis; we also provide broken-stick values in their respective loading tables in the SI.

      3) Missing commentary on allometry

      In basically all PCA analyses, the first PC axis seems to be dominated by allometric size effects, given that all linear measurements have the same eigenvalue signs. The authors do acknowledge this (lines 314-316; 335-336), but offer no further comment on size effects/allometry.

      We agree that normally the first axis represents variation related mainly to size changes and shape changes related to size (allometry). However, we are reluctant to assume that our first axis corresponds to evolutionary allometry. Among others, Klingenberg & Zimmermann (1992) and Klingenberg (1996) used standard PCA (or multi-group PCA) to disentangle evolutionary and ontogenetic allometry (as well as static allometry), mainly by analysing multiple specimens for each group (or species) in order to have a better partitioning of the covariance. Since our sample is limited to 12 species, all represented by a single specimen (except for Dipterus), it would be difficult to clearly discriminate variation associated with allometry. Even in a case of ontogenetic allometry, a sample size of 12 would have been too small to draw unambiguous conclusions about any variation.

      For example, it would be interesting to see how the linear measurements scale with overall head size. Similarly, the authors note that the semicircular canal measurements covary strongly, as do the utricle and saccule height/length measurements (paragraph line 346). Basically, it seems that the semicircular canal measurements scale with one another: as one gets bigger, so does the other. It is interesting that the utricle does not seem to follow the same scaling pattern as the saccule and semicircular canals, and it would be good to hear if the authors think that there is a functional implication for this. Increases in utricular/saccular/semicircular canal sizes are usually explained by increased sensitivity - so is an increased utricular size a compensatory development for decreased semicircular canal+saccule size to retain an overall level of sensitivity, or is it perhaps related to a relative change of importance of the specific functions, e.g. increased importance of linear accelerations in the horizontal plane with simultaneous decrease of importance of angular and vertical accelerations?

      We thank Reviewer #2 for this suggestion about scaling endocast measurements against overall head size. Our original study design also included measurements of dermal skulls, but we omitted these from the final version as the material available was far too incomplete to allow meaningful analyses. It is a topic that some of us (AC, RC) have already discussed as a potential future project.

      With respect to the functional implications of the modular dissociation of the labyrinths, we have expanded the final paragraph of the “implications for sensory abilities” within the Discussion, and similarly added the sentence “However, we acknowledge that it is difficult to determine if increased relative utricular size results from greater reliance of sensitivity in the horizontal plane alone, or if it expands to compensate for e.g. relative stagnation of the sacculus + semicircular canals in some way. Further studies, such as investigation of neuronal densities in extant lungfish labyrinths, may potentially in part clarify this uncertainty in future.”

      4) Labyrinth size

      With the above-mentioned utricular exception, labyrinth size measurements, particularly on the semicircular canals, seem to imply that there is a relatively consistent scaling relationship between the canals. When one canal gets larger, so do the others, perhaps thereby retaining canal symmetry across different absolute labyrinth sizes. Labyrinth size in tetrapods is often interpreted in relation to body size/mass or head size (e.g. Melville Jones & Spells 1963, Proc. R. Soc. Lond. Biol. Sci.; Spoor & Zonneveld 1998, Yearb. Phys. Anthr.; Spoor et al. 2002, Nature; Spoor et al. 2007, PNAS; Bronzati et al. 2021, Curr. Biol.), as deviations from the expected labyrinth size per head size indicate increased or decreased relative labyrinth sensitivities. Large relative head sizes of birds and (within) mammals have generally been interpreted as indicative of "active" or "agile" behaviour, although doubt has been cast on these relationships recently (e.g., Bronzati et al. 2021). Increased sampling of relative labyrinth size from various vertebrate groups would be important to better understand labyrinth size-function relationships. Melville Jones & Spells (1963) have shown that fishes have large labyrinth sizes compared to most tetrapods, but they don't have lungfish data and the large labyrinth sizes of fishes have often remained uncommented on in tetrapod works. I think this study offers a fantastic opportunity to provide comparative labyrinth size data for lungfishes. In this regard, it would be really interesting to quantify labyrinth size relative to head size, and show a respective (phylogenetic) regression analysis. Ideally, the size of the labyrinth could be quantified along the arc lengths of the semicircular canals, but other ways are also conceivable (for example a box volume of labyrinth size from the existing measurements, contrasted with a box volume of the skull, i.e. height × width × length).

      Many thanks for the suggested reading of Bronzati et al. (2021). While we consider a labyrinth-skull size regression analysis to be a worthwhile suggestion, we have chosen not to include one in this study, firstly because there is no phylogenetic regression based on the new methods that we are using, and secondly because it forms the basis of another study currently underway by some of the authors.

    1. Reviewer #1 (Public Review):

      In this study, the authors aimed to address the important question of the mechanism of deep brain stimulation (DBS) in treating Parkinson's disease, based on a mouse model that the authors established previously.

      The strength of the study lies in 1) avoiding the stimulation artefacts that interfere with electrophysiological recording techniques, and 2) examining effects on cell-type- or projection-specific targets.

      However, there are several critical problems in this study. First, the low temporal resolution and the averaged population signal (rather than signals from individual neurons) of the fibre photometry data prevent a sufficiently in-depth analysis of the effects of DBS on the target areas to draw useful conclusions. Thus, all interpretations were based on an average rise in GCaMP-reported calcium signals with low temporal resolution. As a result, important readouts that were analysed in many previous studies, such as firing patterns (e.g. rhythmic) or synchrony among neurons, were missed by this approach. Take one example. The conclusion that antidromic activation is excluded as a possible mechanism is based partly on the lack of good correlation of the averaged calcium signal with the behavioral improvement. However, such a lack of correlation is also evident between the averaged calcium signal and the improvement in movement behavior under 60 Hz and 100 Hz stimulation (Figure 2). While a higher average calcium signal is observed under 60 Hz DBS than under 100 Hz DBS, the improvement in motor behavior is lower than that induced by 100 Hz DBS. This highlights the severe limitation of the fibre photometry data in revealing the therapeutic mechanism of DBS.

      Second, there is no clear elucidation of the pathological changes revealed by the fibre photometry in PD mice to illustrate what is normal and what is abnormal, and how the DBS rectifies the abnormal changes. For example, when we need to interpret the effect of the DBS on calcium activities in the subthalamic nucleus (STN), the substantia nigra pars reticulata (SNr) and the primary motor cortex (M1), what abnormal GCaMP signal did the authors find, compared with healthy control mice? Without such information, it is difficult to get a sense of what an increase in GCaMP signal in STN, SNr and M1 means with respect to motor control, and therefore what it means with respect to the effect of DBS. In the specific context of a peak (actually a biphasic waveform) of the calcium activity in the PD animal, it is puzzling that a surge of STN activity is correlated with movement onset, while in principle it should result in movement termination. Therefore, it is critical to know whether there is such a correlation in healthy animals. If yes, this may not indicate a pathological change that needs to be rectified by DBS. If no, how the pathological appearance of such a change leads to parkinsonian motor symptoms (akinesia, bradykinesia etc.) must be established.

      Third, it is well known that clinical DBS employs stimulation of at least 120 Hz. In fact, the authors had also demonstrated in their previous report that the optimal stimulation frequency in the mouse model is around 180 Hz. But the present study utilised clearly suboptimal frequencies (60 and 100 Hz only) to address the mechanism. It is possible that different mechanisms or combinations of mechanisms may take place under different stimulation frequencies. As such, any conclusion drawn from this study may not represent the whole picture.

      Given the above considerations, I do not think that the authors have achieved the aim of their study, as the results cannot convincingly support their conclusions.

    1. Reviewer #3 (Public Review):

      Zadbood and colleagues investigated the way key information used to update interpretations of events alters patterns of activity in the brain. This was cleverly done by the use of "The Sixth Sense," a film featuring a famous "twist ending," which fundamentally alters the way the events in the film are understood. Participants were assigned to three groups: (1) a Spoiled group, in which the twist was revealed at the outset, (2) a Twist group, who experienced the film as normal, and (3) a No-Twist group, in which the twist was removed. Participants were scanned while watching the movie and while performing cued recall of specific scenes. Verbal recall was scored based on recall success, and evidence for descriptive bias toward two ways of understanding the events (specifically, whether a particular character was or was not a ghost). Importantly, this allowed the authors to show that the Twist group updated their interpretation. The authors focused on regions of the Default Mode Network (DMN) based on prior studies showing responsiveness to naturalistic memory paradigms in these areas and analyzed the fMRI data using intersubject pattern similarity analysis. Regions of the DMN carried patterns indicative of story interpretation. That is, encoding similarity was greater between the Twist and No-Twist groups than in the Spoiled group, and retrieval similarity was greater between the Twist and Spoiled groups than in the No-Twist group. The Spoiled group also showed greater pattern similarity with the Twist group's recall than the No-Twist group's recall. The authors also report a weaker effect of greater pattern similarity between the Spoiled group's encoding and the Twist group's recall than between the Twist group's own encoding and recall. Together, the data all converge on the point that one's interpretation of an event is an important determinant of the way it is represented in the brain.

      This is a really nice experiment, with straightforward predictions and analyses that support the claims being made. The results build directly on a prior study by this research group showing how interpretational differences in a narrative drive distinct neural representations (Yeshurun et al., 2017), but extend an understanding of how these interpretational differences might work retrospectively. I do not have any serious concerns or problems with the manuscript, the data, or the analyses. However I have a few points to raise that, if addressed, would make for a stronger paper in my opinion.

      1) My most substantive comment is that I did not find the interpretive framework to be very clear with respect to the brain regions involved. The basic effects the authors report strongly support their claims, but the particular contributions to the field might be stronger if the interpretations could be made more strongly or more specifically. In other words: the DMN is involved in updating interpretations, but how should we now think about the role of the DMN and its constituent regions as a result of this study? There are a number of ideas briefly presented about what the DMN might be doing, but it just did not feel very coherent at times. I will break this down into a few more specific points:

      While many of us would agree that the DMN is likely to be involved in the phenomena at hand, I did not find that the paper communicated the logic for singularly focusing on this subset of regions very compellingly. The authors note a few studies whose main results are found in DMN regions, but I think that this could stand to be unpacked in a more theoretically interesting way in the Introduction.

      Relatedly, I found the summary/description of regional effects in the Discussion to be a bit unsatisfying. The various pattern similarity comparisons yielded results that were actually quite nonoverlapping among DMN regions, which was not really unpacked. To be clear, it is not a 'problem' that the regional effects varied from comparison to comparison, but I do think that a more theoretical exploration of what this could mean would strengthen the paper. To the authors' credit, they describe mPFC effects through the lens of schemas, but this stands in contrast to many other regions which do not receive much consideration.

      Finally, although there is evidence that regions of the DMN act in a coordinated way under some circumstances, there is also ample evidence for distinct regional contributions to cognitive processes, memory being just one of them (e.g., Cooper & Ritchey, 2020; Robin & Moscovitch, 2017; Ranganath & Ritchey, 2012). The authors themselves introduce the idea of temporal receptive windows in a cortical hierarchy, and while DMN regions do appear to show slower temporal drift than sensory areas, those studies show regional differences in pattern stability across time even within DMN regions. Simply put, it is worth considering whether it is ideal to treat the DMN as a singular unit.

      2) I think that some direct comparison to regions outside the DMN would speak to whether the DMN is truly unique in carrying the key representations being discussed here. I was reluctant to suggest this because I think that the authors are justified in expecting that DMN regions would show the effects in question. However, there really is no "null" comparison here wherein a set of regions not expected to show these effects (e.g., a somatosensory network, or the frontoparietal network) in fact do not show them. There are not really controls or key differences being hypothesized across different conditions or regions. Rather, we have a set of regions that may or may not show pattern similarity differences to varying degrees, which feels very exploratory. The inclusion of some principled control comparisons, etc. would bolster these findings. The authors do include a whole-brain analysis in Supplementary Figure 1, which indeed produced many DMN regions. However, notably, regions outside the DMN such as the primary visual cortex and mid-cingulate cortex appear to show significant effects (which, based on the color bar, might actually be stronger than effects seen in the DMN). Given the specificity of the language in the paper in terms of the DMN, I think that some direct regional or network-level comparison is needed.

      3) If I understand correctly, the main analyses of the fMRI data were limited to across-group comparisons of "critical scenes" that were maximally affected by the twist at the end of the movie. In other words, the analyses focused on the scenes whose interpretation hinged on the "doctor" versus "ghost" interpretation. I would be interested in seeing a comparison of "critical" scenes directly against scenes where the interpretation did not change with the twist. This "critical" versus "non-critical" contrast would be a strong confirmatory analysis that could further bolster the authors' claims, but on the other hand, it would be interesting to know whether the overall story interpretation led to any differences in neural patterns assigned to scenes that would not be expected to depend on differences in interpretation. (As a final note, such a comparison might provide additional analytical leverage for exploring the effect described in Figure 3B, which did not survive correction for multiple comparisons.)

      4) I appreciate the code being made available and that the neuroimaging data will be made available soon. I would also appreciate it if the authors made the movie stimulus and behavioral data available. The movie stimulus itself is of interest because it was edited down, and it would be nice for readers to be able to see which scenes were included.

      To sum up, I think that this is a great experiment with a lot of strengths. The design is fairly clean (especially for a movie stimulus), the analyses are well reasoned, and the data are clear. The only weaknesses I would suggest addressing are with regards to how the DMN is being described and evaluated, and the communication of how this work informs the field on a theoretical level.

    1. we need to treat one another with respect despite our differences. This is like an aspiration for people, right? Except they thought it was in the bottom quarter of stuff for everybody else. So what happens if I'm like, I would like to get back to treating other people with respect, but I don't think they care about that for me? Back to those ambiguous interactions that we have all the time: I'm gonna read disrespect into most everything I see, right? And so I think it's really critical. I talked about this as congruence, right, this need for our private selves and our public selves to be as closely aligned as possible. We've known for a long time that that's a critical part of fulfillment and self-actualization. I mean, how do you get there? You're the expert on that. How do you get there if you have a divided self, like my private self is different than my public self? So we know that at an individual level, but given the fact of collective illusions, I believe this idea of congruence may be the most important thing you can do for other people, right? Because it doesn't help anyone when we misread each other so profoundly.

      Congruence is the antidote to collective illusion.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank all reviewers for their very helpful comments. We feel that the comments pointed to a few main issues that we could remedy. First, we found that many comments and concerns could be addressed with work from our previous paper (doi.org/10.1101/2020.11.24.396002). To fix this, we added additional descriptions of experiments done previously and additional citations. We discussed more in depth an experiment that shows that ciliary membrane and membrane proteins can indeed come from the cell body plasma membrane, we talked more about how we determined that the actin puncta are representative of membrane remodeling functions like endocytosis, and we discussed some of the mechanistic insights provided by our previous work that are applicable here. We hope that this helps to answer several of the reviewer questions. Second, there were a few experiments we thought would be useful to add. These are represented in bold in our responses below. Briefly, we added a measure of internalization or endocytosis in the drp3 mutant, we added some images of cilia to the phalloidin figure to orient readers’ views of the cell, we added some additional mechanistic insight (supplemental figure 3), and we added an axoneme stain to confirm that the axoneme was extending (supplemental figure 4). Finally, we fixed some of our wording in the paper to represent our findings more accurately. Together, we hope that these revisions will address the reviewer concerns.

      Additionally, we added some data that we collected while waiting on reviews. We investigated the requirement for myosin in this pathway and include this data in the supplement.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The current manuscript by Bigge et al. demonstrates that the chemical inhibition of GSK3 causes ciliary elongation in Chlamydomonas reinhardtii. They show that lithium-induced ciliary lengthening is mainly due to GSK3 inhibition. Consistent with earlier reports, they show that new protein synthesis is not required for lithium-induced ciliary elongation. The authors report that when endocytosis is targeted, either with chemical inhibitors (dynasore and CK-666) or in genetic mutants (drp3 and Arpc4), lithium does not induce ciliary elongation. They further reveal enhanced actin dynamics in lithium-treated cells, and such activity is lost in Arpc4 mutants. Based on these results, the authors concluded that endocytic pathways may be involved in lithium-induced ciliary lengthening. The results are interesting, and this work is important in understanding more about ciliary length regulation. However, more experimental evidence addressing the current interpretation that endocytic pathways may be involved in lithium-induced ciliary lengthening is required.

      Major comments: 1. The authors use chemical inhibitors as major tools for their study. However, the specificity of these inhibitors is a concern. How specific are these GSK3 inhibitors such as LiCl? Can the authors show that LiCl-mediated ciliary lengthening is due to inhibition of GSK3? The authors used BFA and Dynasore to show that endocytosis-derived membrane, not the Golgi, is required for ciliary lengthening. Again, here the specificity of these inhibitors is a concern, especially as Dynasore has been shown to have non-specific effects.

      We agree that the specificity of chemical inhibitors can be a concern. This is why we used 4 separate inhibitors of GSK3, each showing elongation of cilia and an increase in actin puncta (suggesting an increase in actin dynamics at the membrane). While these different inhibitors may have different off-target effects, their intended target, GSK3, is the same, suggesting that the phenotype they share results from GSK3 inhibition. The ability of LiCl to affect GSK3 activity in Chlamydomonas was also investigated in depth with a kinase assay and a western blot in Wilson, 2004 (doi: 10.1128/EC.3.5.1307-1319.2004). To address the off-target effects of Dynasore, we employed the drp3 mutant to confirm genetically what we saw from the chemical inhibition. We also show in our previous paper that Dynasore and PitStop2 have similar effects in Chlamydomonas, both of them inhibiting the internalization of a dye-labelled membrane, suggesting that they both function to block endocytosis (doi.org/10.1101/2020.11.24.396002). While no mutant or alternative inhibitor is available to look at the effects of BFA, this inhibitor and its effects on cilia have been well-characterized in Dentler, 2013 (doi.org/10.1371/journal.pone.0053366).

      Does inducing/enhancing endocytosis independently of GSK3 by other means have any effect on ciliary length regulation?

      Our concern with the proposed experiment is that even if elongation requires endocytosis, not all endocytosis will necessarily lead to ciliary elongation. For example, endocytosis could occur for other purposes, like nutrient uptake, that will have no effect on cilia. The plasma membrane to cilium pathway may be a targeted pathway triggered by specific disruptions. Therefore, we don’t feel that the proposed experiments will add to our model.

      The major claim of this paper is that LiCl mediated ciliary lengthening is due to enhanced endocytosis. Although authors showed that inhibition of endocytosis results in reduced ciliary length, it is important to show if GSK3 inhibition by LiCl (or any other inhibitor) causes any increased cellular endocytosis? Similarly, what is the effect of GSK3 mutants on endocytosis?

      We show an increase in actin dynamics at the membrane and actin puncta following treatment with LiCl and the other GSK3 inhibitors. We show here and in our previous paper (doi.org/10.1101/2020.11.24.396002) that these puncta are likely endocytic based on the timing of their appearance and the proteins required for puncta formation (including the Arp2/3 complex and Clathrin) (Figure 7, previous paper). We updated our latest version to reflect the data we have already collected and presented as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). Thus, we stained cells with phalloidin to visualize filamentous actin and these endocytosis-like punctate structures when cells are treated with GSK3 inhibitors.”

      A phenotypic mutant of GSK3 does not currently exist in Chlamydomonas, and methods of reliably introducing mutations in Chlamydomonas do not currently exist. Thus, we used the array of GSK3 inhibitors.

      Are these endocytic processes enhanced specifically at/or around the cilium during the ciliary lengthening process?

      Based on our phalloidin staining data, these processes are primarily enhanced near the cilium, but puncta also exist throughout the cell. To more clearly show this and in response to a comment from reviewer 2, we added a set of images with brightfield to demonstrate where the dots are in relation to cilia. We also added arrows to the images in the figure to point out the apex of the cell as determined by the filamentous actin structures in the cells.

      The authors claim that drp3 is a target of GSK3 and, similar to the canonical dynamin, functions in endocytosis. While it is an important observation, experiments are required to show the role of drp3 in endocytosis and also to show that it is indeed a target of GSK3.

      To address this comment, we are employing an experiment that was designed in our previous paper (doi.org/10.1101/2020.11.24.396002, Figure 5B-E). This experiment uses a lipophilic membrane dye, FM4-46FX. The dye binds to the membrane but is unable to enter the cell alone. It is quickly endocytosed and results in vesicle-like structures within the cell. We added a panel to Figure 3 where we do this experiment in wild-type and drp3 mutant cells. This shows that endocytosis is affected by the mutation in DRP3. The discussion of this new data is summarized in the text as follows:

      “Additionally, we showed that this DRP is required for internalization of a lipophilic membrane dye, FM4-46FX through endocytosis. This dye binds to the membrane but is unable to enter the cells on its own and must be endocytosed. In wild-type cells it is quickly endocytosed and visible as puncta within the cell (Figure 3F, H) (Bigge et al. 2020). However, in drp3 mutants the amount of dye endocytosed is significantly lower (Figure 3G-H), suggesting that DRP3 is required for optimal endocytosis in these cells.”

      Mechanistic insights into how endocytosis/actin dynamics regulate ciliary lengthening would be interesting to see. Further, it is interesting to see if the ciliary signaling defects caused by abnormal ciliary length can be rescued by inhibition of endocytosis.

      In our previous paper (doi.org/10.1101/2020.11.24.396002), we dive into the mechanisms tying together actin dynamics, endocytosis, and cilia. We find that Arp2/3 complex-nucleated actin networks are required for endocytosis to reclaim ciliary membrane and membrane proteins from a pool in the plasma membrane for the rapid early stages of ciliary assembly. We believe that this is a similar mechanism to what is occurring when cells elongate following lithium treatment. This is because there are several parallels in phenotypes:

      -The Arp2/3 complex is required for both ciliary assembly (Figure 1, previous paper) and ciliary elongation resulting from lithium treatment. In the case of ciliary assembly, treating with cycloheximide to block the synthesis of new protein fully eliminates regrowth in the absence of the Arp2/3 complex, suggesting this Arp2/3 complex dependent mechanism in early ciliary assembly does not involve new protein synthesis (Figure 2, previous paper). Similarly, the process of ciliary elongation in response to lithium does not require new protein synthesis.

      -A burst in actin dynamics/actin puncta occurs immediately following deciliation during early regrowth and during growth initiated by lithium treatment. We know these puncta are Arp2/3 complex and clathrin dependent (Figures 4 and 7, previous paper).

      -Both initial ciliary assembly (or ciliary maintenance) and elongation of cilia due to lithium treatment require endocytosis (Figures 5, 7-8, previous paper) but do not require Golgi-derived membrane (Figure 3, previous paper).

      -Also in the previous paper, we find that this mechanism is required for the internalization and relocalization of a ciliary membrane protein for mating (Figure 6, previous paper). We also find that ciliary membrane proteins move from the plasma membrane to the cilia during ciliary assembly (Figure 7-8, previous paper).

      This is summarized in the text as follows:

      In the introduction we added:

      “Previous data from our lab suggest that the Arp2/3 complex and actin are involved in reclaiming material from the cell body plasma membrane that is required for normal ciliary assembly (Bigge et al. 2020). We show that the Arp2/3 complex is required for the normal assembly of cilia and for endocytosis of both plasma membrane and plasma membrane proteins in various contexts. Further, we find that deciliation triggers Arp2/3 complex-dependent endocytosis by observing an increase in actin puncta immediately following deciliation (Bigge et al. 2020).”

      And in the discussion we added:

      “Previous work has shown that while the Golgi is required for ciliary maintenance and assembly (Dentler 2013), it is not the only source of membrane. Instead, we found that membrane reclaimed through actin and Arp2/3-complex dependent endocytosis is required for ciliary assembly or growth from zero length (Bigge et al. 2020). More specifically, we found that the Arp2/3 complex is required for normal ciliary maintenance and ciliary assembly, especially in the early stages when membrane and protein are needed quickly. The Arp2/3 complex is also required for the internalization of membrane and a specific ciliary membrane protein required for mating. Further, we show that endocytosis-like actin puncta form immediately following deciliation in an Arp2/3 complex and clathrin-dependent manner, and that membrane from the cell body plasma membrane can be reclaimed and incorporated into cilia (Bigge et al. 2020). This led us to question whether that same mechanism might be required for ciliary elongation from steady state length induced by lithium treatment.”

      Minor comments: 1. The paper needs a thorough proof reading as it harbors many spelling mistakes, grammatical errors, and poor sentence formation in multiple instances.

      *The paper was thoroughly read, and spelling mistakes and grammar were fixed. *

      Supplemental Figure S2A and S2B should be quoted separately from S2C and S2D.

      *This was updated in the latest version of the paper. *

      On page 6, paragraph 2, the authors wrote "To determine if GSK3 could be a potential kinase for this protein, we employed ScanSite4.0, which confirmed that of the 9 DRPs of Chlamydomonas, the only one with a traditional GSK3 target sequence was DRPs (Supplemental Figure 2)." No data is shown in S2 with regard to this. Either the data need to be shown or the text needs to be changed to avoid confusion.

      *The text was changed in a way to avoid confusion. *

      It would be nice to see if GSK3 can actually phosphorylate DRP3.

      *This would be interesting, however there is not currently a simple way to test this. There is not an antibody for DRP3 that shares enough of its immunogen sequence with the Chlamydomonas DRP3 sequence to use for a western blot. *

      The authors observe that arpc4 mutants do not form actin puncta upon LiCl treatment. Could this phenotype be rescued by complementing with WT ARPC4.

      *We showed in our previous paper (doi.org/10.1101/2020.11.24.396002) that the actin puncta could be rescued by re-expression of wild-type ARPC4 (Figure 4). *

      The concentration of inhibitors is described differently in the text and figure legends (for example Fig. 4A)

      *In the figure legend of figure 4, the concentration of 6-BIO was accidentally reported as 100 µM instead of the correct value (100 nM) as it was throughout the rest of the paper. This was addressed in the latest version. *

      The p values are not significant in some of the figures. (Fig. 4D &Fig. 5C)

      P values were provided for all comparisons in an effort to be transparent and so that readers could draw their own conclusions about the data.

      Reviewer #1 (Significance (Required)):

      The current manuscript by Bigge et al. demonstrates that endocytosis is required for GSK3 inhibition mediated ciliary lengthening. Maintenance of proper length of cilia is crucial and its dysregulation results in pathogenesis. This work takes the field forward and helps in our understanding of how ciliary length is regulated. This work is of interest to researchers working in the field of ciliary biology as well as to those working on endocytosis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors show in this study that Lithium and other GSK3-beta inhibitors induce cilia elongation in Chlamydomonas. They further demonstrate that inhibition of endocytosis by Dynasore prevents the induced elongation of cilia. They speculate that a Dynamin-related protein might be involved in this process, and determine 9 Dynamin related proteins (DRPs) in Chlamydomonas of which DRP3 shows the highest sequence similarity. Lithium-induced ciliary elongation is prevented in DRP3 mutants supporting the authors' hypothesis and indicating that DRP3 might be a GSK3-beta target, similar to some animal Dynamins. Since Dynamins interact with the F-actin regulator ARP2/3-complex, and because F-actin reorganization is observed in cells after GSK3-beta inhibition, they test the induction of ciliary elongation in arpc4 mutants and after blocking the ARP-complex by CK-666. Indeed, F-actin remodeling and cilia elongation were prevented after loss of ARP-complex function. The induction of ciliary elongation and F-actin remodeling also correlates with the emergence of strong F-actin punctae in cells, and the authors interpret that as induction of Dynamin-dependent endocytosis (also addressed in a current preprint from the group). From that, they conclude that endocytosis is required for delivering membrane to the growing cilium and that this is required for the observed effects. While this claim is somewhat supported by a lack of cilia elongation inhibition after treatment to prevent protein synthesis or Golgi function, direct evidence for membrane delivery to the cilium, the need for membrane delivery for ciliary elongation, and presence of bona fide endocytotic vesicles is sadly missing. Therefore, this study sheds new light on an important process in ciliary functional regulation and also furthers our understanding on why GSK3-beta inhibition induces elongated cilia in many cell systems, but I am not convinced that the conclusions are actually supported by the data, as the two key points in question were not experimentally addressed at this point.

      Main points: 1. The authors need to demonstrate that new membrane is delivered in the process to the growing cilium. E.g. this could be done by membrane stains (pulse) and static or live-cell imaging analysis in untreated, GSK3-beta inhibitor treated and in mutants.

      *In our previous paper (doi.org/10.1101/2020.11.24.396002), we do an experiment similar to the one described here (Figure 8, previous paper). We biotinylated all surface proteins, then removed the cilia (and therefore all labelled ciliary surface proteins) and allowed them to regrow. We then isolated the new cilia and probed for biotinylated proteins because any biotinylated proteins must have come from the surface of the cell. We found that the cilia did contain membrane proteins from the surface of the cell. This experiment shows that membrane and membrane proteins derived from the plasma membrane are entering growing cilia during regeneration. We added a description of this experiment to the text as follows: *

      “Conversely, when treated with Dynasore to inhibit endocytosis, cilia could not elongate to the same degree as untreated cells (Figure 3A-B), implying endocytosis is required for lithium-induced elongation and that endocytosis requires dynamin. This is consistent with results from our previous studies which show that ciliary membrane and membrane proteins are delivered from the cell body plasma membrane to the cilia. In an experiment first performed in Dentler 2013 and then later in Bigge et al. 2020, we biotinylated all cell surface proteins. We then deciliated cells and allowed cilia to regrow. We then isolated cilia and probed for biotinylated proteins. Any biotinylated proteins present must have come from the cell body plasma membrane, and we found that indeed biotinylated proteins exist in the newly grown cilia, suggesting that ciliary membrane and membrane proteins can be recruited from the cell body plasma membrane (Dentler 2013; Bigge et al. 2020).”

      However, this experiment cannot be done in the case of lithium because cilia are not removed, meaning they will already contain labelled surface proteins. Additionally, cells do not regrow cilia in the presence of lithium, meaning that we cannot add a regeneration step. Regardless, work from our previous paper described above does establish that ciliary membrane and membrane proteins are able to come from the cell body plasma membrane as the reviewer requested.

      Along the same line, the authors need to demonstrate that the punctae are truly endocytotic vesicles. For that uptake assays/stains could be used and additional markers. Furthermore, there are multiple modes of endocytosis (e.g. Clathrin) besides Dynamin. The authors should determine if blocking other modes of endocytosis has similar or divergent effects on cilia elongation.

      *In our previous paper (doi.org/10.1101/2020.11.24.396002) we supplement the actin puncta data with membrane labelling to show that the puncta are likely endocytic pits (doi.org/10.1101/2020.11.24.396002, Figure 5). We also show that the puncta require both the Arp2/3 complex and active clathrin to form, further suggesting that they are endocytic (Figure 7, previous paper). We added this to the paper as follows: *

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). To provide additional evidence that these are endocytic puncta, we also showed that a corresponding increase in membrane internalization occurs during this same timeframe using a fluorescent membrane dye that is endocytosed in wild-type cells (Bigge et al. 2020).”

      Additionally, Dynamin is required for most forms of endocytosis, including clathrin mediated endocytosis. In the previous paper (doi.org/10.1101/2020.11.24.396002), which we cite here, we do a deep dive into which endocytic proteins are present in Chlamydomonas. We found that clathrin mediated endocytosis is the most highly conserved of the endocytic processes we looked at (Figure 5, previous paper).

      We did add a new figure to this paper (Figure 4) using a dye that labels membrane in lithium treated cells. This dye binds to the plasma membrane but is unable to enter cells by itself and must be endocytosed. We found that during the first 30 minutes of lithium treatment there is increased membrane dye internalization.

      No cilia are actually shown in the study. I personally would like to see what these cilia look like, especially in relation to the sites of F-actin remodeling and punctae formation. What comes first? Please also provide an axoneme staining to confirm elongation of the ciliary core. What happens to the tubulin pool when cilia cannot elongate any more? Is it accumulating at the ciliary base?

      We added a panel demonstrating where the puncta are in relation to cilia in Figure 4 with a brightfield overlay.* We also look at the appearance and timing of these puncta more in depth in our previous paper (doi.org/10.1101/2020.11.24.396002, Figure 7). We find that puncta form immediately following deciliation and start to return to normal following about 10 minutes of regrowth. We think that this mechanism of ciliary elongation in lithium is similar to what occurs during those early steps of ciliary assembly suggesting that the dots likely form very early on. *

      We also included axoneme staining in Supplemental figure 4*. We show that the axoneme does continue to elongate with the cilia. After about 90 minutes, the cilia actually stop growing and detach from the cells (doi: 10.1128/EC.3.5.1307-1319.2004, doi: 10.1247/csf.12.369). However, we are interested in the more acute mechanisms that result in ciliary elongation. *

      The authors also claim that the method of GSK3 inhibition is not important. It would be more correct to say that the mode/drug of GSK3 inhibition is not important, but discuss how some of the minor variance between treatments could be explained (incl. the timeline and temporal dynamics of the diverging effects; and the dose-dependency as low concentrations of BIO seem to induce shortening but high doses induce elongation of cilia).

      *We further discussed this in the text as follows: *

      “The minor variances between the drugs could be explained by the timeline in which we tested cilia (90 minutes) or the exact dosages we used. An example of this is 6-BIO where treatment with a low dose of 100 nM caused ciliary lengthening, but treatment with a higher concentration of 2 µM reportedly caused ciliary shortening (Kong et al. 2015). Together, the data suggest that the mode of inhibition by chemical targets of GSK3 is not important for ciliary lengthening. Whether GSK3 was inhibited via competition for ATP binding or phosphorylation, cilia were able to elongate.”

      They propose here a positive effect of F-actin build up in cilia length regulation, while most studies to date report ciliary shortening to correlate with increased F-actin at the ciliary base. I believe that this is not highlighted and discussed enough, which I find reduces the overall quality of the paper (but is easy to improve). It might also be interesting to test if other F-actin inducers/stabilizers have the same effect?

      *This is addressed in the discussion in the latest version in depth as follows: *

      “One important detail to point out is that Chlamydomonas differ from mammalian cells in that they have a cell wall. The stability awarded by the cell wall means that Chlamydomonas does not require a cortical actin network as mammalian cells do. Thus, in Chlamydomonas, we are able to investigate actin dynamics and functions without the interference of the cortical actin network. This also means that some of the effects we see might be masked in mammalian cells by the presence of the cortical actin network and the effect that it has on ciliary assembly and maintenance.”

      *We also added a section to the introduction to address this concern early on so that readers will have this difference in mind as they read the paper: *

      “Additionally, unlike mammalian cells, Chlamydomonas lacks a cortical actin network which simplifies the relationship between cilia and actin and makes this an ideal model to study such interactions.”

      Also, F-actin inducers/stabilizers do not typically have the same effect because the filamentous actin needed for these processes must be dynamic, or able to undergo rapid depolymerization and repolymerization as needed during this fairly quick timeframe. This is demonstrated in Avasthi, 2014 (*doi.org/10.1016/j.cub.2014.07.038). Cells were treated with several actin targeting inhibitors including LatB which results in depolymerization of filaments and Jasplakinolide which results in stabilization of filaments. In both cases, ciliary regeneration is impaired suggesting that actin must be dynamic for its functions related to cilia. *

      Minor points: 1. In many Figures, the x-axis is labeled "Number of values", but I think that maybe number of observations might be more appropriate.

      We discussed this point and decided to change the axis titles to “Number of cilia”.

      The author often use the word "normally" elongating, but in all cases the elongation is induced = abnormal situation. Maybe the authors could use a different term.

      We originally used “normally” because there are times when elongation is defective rather than absent. In the latest version we changed this to “elongation consistent with untreated wild-type cells” or something along those lines.

      It is puzzling as to why DRP3 was chosen, while DRP2 actually is most similar in terms of domain composition. Maybe they could discuss that. They also could explain a bit better how the mutants were generated in which a "cassette was inserted early in the gene". What kind of disruption is expected?

      DRP3 was chosen because it has the highest sequence identity (and similarity). DRP2, while containing all domains, has low overall sequence conservation. DRP3 is also the only DRP that showed a potential GSK3 target site when investigated with ScanSite4.0. This was all made clearer in the text as follows:

      “Chlamydomonas contains 9 DRPs with similarity to a canonical dynamin (DRP1-9). Despite lacking 2 of the canonical dynamin domains, the DRP with the highest sequence similarity and identity to canonical dynamin is DRP3 (Supplemental Figure 2C-D). To determine if GSK3 could be a potential kinase for this protein, we employed ScanSite4.0, which confirmed that of the 9 DRPs of Chlamydomonas, the only one with a traditional GSK3 target sequence was DRP3.”

      The representative images in Figure 4A do not really seem to match the quantifications.

      *The quantitative data suggest that these different treatments have increased dots, which we believe the representative images do show. LiCl and CHIR99021 have the most dots, while 6-BIO and Tideglusib have more dots than the control, but fewer than LiCl and CHIR99021. *

      line 109: "of-targets" should be off-targets

      Fixed in the latest version, thanks for pointing this out.

      line 141: "delivery form the Golgi" should be FROM the Golgi

      Fixed in the latest version, thanks for pointing this out.

      line 160: "was DRPs" should be was DRP3

      Fixed in the latest version, thanks for pointing this out.

      line 204/205: the sentence starting "Thus, we phalloidin..." should be rephrased. It sounds not quite correct

      Fixed in the latest version, thanks for pointing this out.

      line 209: Figure 4A should refer to Figure 4B

      Fixed in the latest version, thanks for pointing this out.

      line 211: "times or rapid ciliary" should be of rapid ciliary...

      Fixed in the latest version, thanks for pointing this out.

      line 257: "in lithium." Should be in lithium treated cells

      Fixed in the latest version, thanks for pointing this out.

      Reviewer #2 (Significance (Required)):

      This study sheds new light on an important process in ciliary functional regulation and also furthers our understanding on why GSK3-beta inhibition induces elongated cilia in many cell systems, but I am not convinced that the conclusions are actually supported by the data, as the two key points in question were not experimentally addressed at this point.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Chlamydomonas maintains a relatively regular length of cilia (flagella). However, when the cell is exposed to a high concentration of lithium ions, it elongates cilia further. In this work, Bigge and Avasthi performed experiments to build a potential hypothesis for the molecular mechanism of this unusual cilia elongation. Their hypothesis is (1) cilia elongation is triggered, depending on supply of extra membrane (not proteins), (2) membrane is supplied from plasma membrane by clathrin-dependent endocytosis (not from Golgi), (3) this endocytosis contains Arp2/3 complex, (4) GSK3 downregulates Arp2/3 dependent endocytosis and (5) GSK3 is suppressed by lithium. They conducted well-organized experiments to prove each step. While some of them are indirect, their hypotheses were supported experimentally in outline.

      (1) is undoubted, since the authors demonstrated that inhibition of protein production by cycloheximide did not influence cilia elongation.

      (2) The authors clearly demonstrated that the source of ciliary membrane for elongation is plasma membrane and not Golgi by examining specific inhibitors' effect. They also showed protein transfer from plasma membrane to cilia, by biotinylating surface proteins in the cell, deciliating and growing cilia and detecting biotinylated proteins in cilia. This part rather characterizes initial growth of cilia, not elongation. Therefore this result must be properly described in the context of this work (which is elongation of cilia).

      This comment was particularly helpful as it also helps us address some of the comments from the other reviewers. We updated the description of this experiment in the context of this work in the latest version as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). To provide additional evidence that these are endocytic puncta, we also showed that a corresponding increase in membrane internalization occurs during this same timeframe using a fluorescent membrane dye that is endocytosed in wild-type cells (Bigge et al. 2020).”

      For (3)-(4), they visualized Arp2/3 localization, showing highly condensed Arp2/3. They interpreted these particles as sign of clathrin endocytosis. Since so far such an endocytosis particle has not been reported in Chlamydomonas, the authors confirmed that DRPs are target of GSK3 to indirectly show GSK3 influences formation of endocytosis. This reviewer thinks the author should be able to directly confirm endocytosis for example by electron microscopy (of traditional epon-embedded and stained cells).

      We visualized Arp2/3 complex-dependent filamentous actin localization. We provide DRP3 as a potential target of GSK3, but do not report that it is the target that results in increased endocytosis or increased ciliary length. We agree that electron microscopy would be ideal to visualize endocytosis in these cells. However, we feel this is outside the scope of this current work. But, we do have plans to look at endocytosis in Chlamydomonas *using electron microscopy in the future and hope that the increased context from the previous data are sufficient at this time. *

      (5) was elegantly proved by multiple drugs (all known as inhibitor of GSK3), including lithium.

      After fixing these points, this manuscript will be ready for publication.

      Minor points: Line188-191: not clear. What are *** and ****?

      Fixed in the latest version, thanks for pointing this out.

      Line262-264: It would be helpful to describe the initial cilia growth of the arpc4 cell.

      We agree that this would be helpful information, and included more of a description of how ciliary growth is affected by loss of Arp2/3 complex function in the latest version: “Specifically, we found that the Arp2/3 complex is required for reclamation of membrane from a pool in the plasma membrane during the rapid growth that occurs during early ciliary assembly”.

      Line321: it should read as follows. Cang 2014; carlsson and Bayly 2014). While we...

      Fixed in the latest version, thanks for pointing this out.

      Line329: were -> where

      Fixed in the latest version, thanks for pointing this out.

      Line365-366: Lithium-treated cells are not motile. Any thought why? Maybe protein production is not necessary for apparent cilia elongation, but necessary for elongation of functional cilia.

      *This is an interesting idea. However, even when protein production is allowed to proceed, Lithium-treated cells are not motile. This is a ciliary dysfunction, and in fact, after about 90 minutes incubation with lithium, the cilia of these cells start to crash out or fall off, demonstrating that these are not healthy cells or healthy cilia. *

      Reviewer #3 (Significance (Required)):

      This work is an important step toward the understanding of cilia elongation and thus growth mechanism. It will attract wide audience who have interest in cell biology and motility. My expertise is about motile cilia and their 3D structure.

    1. Author Response

      Joint Public Review:

      Strengths: The study represents a step forward in relating immune responses to infection outcomes that are of urgent interest to public health, especially the timing of shedding and frequency of supershedding events. Nguyen et al.'s model provides a useful framework for understanding the links between immune effectors and infection outcomes, and it can be expanded to encompass further biological complexity. The study system is a good choice, given the ubiquity of both helminth and bacterial infections, and experimental infections of rabbits provide a useful point of comparison for past work in mice.

      We appreciated these general comments.

      Limitations: The present study does not explicitly account for differences in helminth infection dynamics across the two species represented in the data nor does it include feedbacks between the bacterial and helminth infections. Nguyen et al. therefore show the limits of what can be learned from focusing on the bacterial and immune dynamics alone, and this study should serve to motivate further work that can build on this modeling approach to produce a more comprehensive view of the interactions among species infecting the same host. Future studies examining the impact of helminth infection intensity would be tremendously useful for assessing the potential of anthelminthics to reduce the prevalence of bacterial respiratory diseases. Finally, subsequent studies may need to look beyond the factors examined here to understand why shedding varies so much through time for individual hosts.

      We agree that focusing only on the bacterial infection is a limitation in this study. We followed a parsimonious approach and decided to concentrate on B. bronchiseptica shedding in the four types of infection. While we do have data on the dynamics of infection of the two helminth species, adding these data would have been an enormous amount of work and too much to present in a single paper. Yet, we have already investigated some of these bi-directional effects using the BT group (Thakar et al. 2012 Plos Comp. Biol.) and plan to keep working on these rich datasets in the future.

      We also agree that it is important to understand the rapid variation in Bordetella shedding observed, which appears to be a common feature in many other host-pathogen systems. This requires a completely new set of experiments on infection and shedding at the local tissue level.

      Specific comments

      Definition of supershedding: A major stated goal of the MS is to investigate the effect of coinfection by helminths on supershedding. In order to compare animals with different coinfections, it is therefore necessary to have a common definition of supershedding. At present, the authors use a definition that depends on which arm of the experiment the animals belong to. This complicates the analysis and clouds its interpretation.

      We value this comment and see the implication of using different datasets to quantify supershedding. To overcome this problem, we now propose a slightly different approach where we pool the four infections together and calculate a common 99th or 95th percentile threshold. This common threshold is then used to calculate the number of hosts with at least one supershedding event above this cut-off, for every type of infection. Therefore, while the threshold is the same, the percentage of hosts with supershedding events varies among infection groups.
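
      As a minimal sketch of the pooled-threshold calculation described above (not the authors' code): shedding values from all infection groups are combined, a single percentile cut-off is computed, and each group is scored by the fraction of hosts with at least one value above it. The data layout, the function names, the group and host labels, the toy values, and the use of Python's `statistics.quantiles` with the inclusive method are all assumptions made for illustration.

```python
from statistics import quantiles


def pooled_threshold(shedding_by_group, pct=99):
    """Return the pct-th percentile of shedding values pooled over all groups.

    shedding_by_group maps group label -> {host id -> list of shedding rates}.
    """
    pooled = [
        value
        for hosts in shedding_by_group.values()
        for series in hosts.values()
        for value in series
    ]
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    return quantiles(pooled, n=100, method="inclusive")[pct - 1]


def fraction_supershedders(shedding_by_group, threshold):
    """Per group, fraction of hosts with at least one value above threshold."""
    return {
        group: sum(any(v > threshold for v in series) for series in hosts.values())
        / len(hosts)
        for group, hosts in shedding_by_group.items()
    }


# Hypothetical toy data: two groups, two hosts each (values are made up).
toy = {
    "group_1": {"host_a": [0, 1, 2], "host_b": [0, 0, 100]},
    "group_2": {"host_c": [0, 1, 1], "host_d": [2, 3, 1]},
}
cutoff = pooled_threshold(toy, pct=95)
print(cutoff, fraction_supershedders(toy, cutoff))
```

      Because the cut-off is computed once on the pooled values, differences between groups show up only in the per-group fractions, which is the property the response relies on.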

      Inconsistent approach: Within each experimental treatment, the data display variability on at least three levels: (i) within animals, day-to-day shedding displays variability on a fast timescale; (ii) within animals, infection status varies more slowly over the course of infection; (iii) between animals, there is variation in both (i) and (ii). The authors' model seems well-designed to handle this variability, but the authors are strangely inconsistent in their use of it. To be specific, to account for level (i), the authors very sensibly adopt a zero-inflated model for the shedding data, whereby the rate of shedding (colony-forming units per second, CFU/s) is assumed to arise from a mixture of a quantitative process (which we might think of as intensity of potential shedding) and an all-or-nothing process (which might arise, for example, if some discrete behavior of the animal is necessary for shedding to occur at all). The inclusion of the all-or-nothing process necessitates an additional parameter, but it allows the non-zero shedding data to inform the model. To account for level (ii), the authors use a four-dimensional deterministic dynamical system. Three of the four variables are related to the measured components of the immune response. The fourth is related to the aforementioned potential shedding. Level (iii) is accounted for using a hierarchical Bayesian approach, whereby the individual animals have parameters drawn from a common prior distribution. This approach seems very well designed to address the authors' questions using the data at hand. However, they fail to exploit this, in at least three ways. First, even though the model appears designed specifically to allow for non-shedding animals, the authors exclude animals on an ad hoc basis. Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate. Third, despite the fact that the model appears specifically designed to account for variability at each of the three levels, they do not give enough information to allow the reader to judge whether the model does in fact do a good job of partitioning this variability.
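
      To make the zero-inflated idea in level (i) concrete, here is a hedged sketch of one such mixture: with probability `1 - p_shed` an observation is exactly zero, and otherwise it is drawn from a continuous distribution. The lognormal choice for the non-zero part, the static (time-independent) parameters, and all names are assumptions for illustration only; the comment above does not specify the distribution, and in the authors' model the potential shedding is driven by the dynamical system rather than fixed.

```python
import math


def zi_lognormal_loglik(obs, p_shed, mu, sigma):
    """Log-likelihood of shedding rates under a zero-inflated lognormal.

    With probability 1 - p_shed the rate is exactly zero; otherwise it is
    lognormal with log-mean mu and log-standard-deviation sigma.
    """
    ll = 0.0
    for y in obs:
        if y == 0:
            ll += math.log(1.0 - p_shed)
        else:
            z = (math.log(y) - mu) / sigma
            ll += (
                math.log(p_shed)
                - math.log(y * sigma * math.sqrt(2.0 * math.pi))
                - 0.5 * z * z
            )
    return ll


def zi_lognormal_mle(obs):
    """Closed-form maximum-likelihood estimates (p_shed, mu, sigma).

    The zero and non-zero parts separate: p_shed is the non-zero fraction,
    and mu, sigma come from the logs of the non-zero observations.
    """
    logs = [math.log(y) for y in obs if y > 0]
    p_shed = len(logs) / len(obs)
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return p_shed, mu, sigma
```

      In this simplified form the zeros inform only the all-or-nothing probability and the non-zero values inform only the intensity parameters, which is why non-shedding observations can be kept without disturbing the intensity estimates.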

      Please see comments to each specific matter below.

      Exclusion of animals: In view of the fact that the model the authors describe can account for variability on all three levels, it is strange that they exclude animals that shed too little or not at all. It would be preferable were the authors to base their conclusions on all the data they collected rather than on a subset chosen a posteriori. It is true that the non-shedders will have no information about the time-course of shedding; on the other hand, including them does not complicate the analysis, and it does allow for estimation of the all-or-nothing probability in a coherent fashion. In particular, the fact that coinfection appears to have an impact on whether animals shed at all is itself directly related to the authors' central questions. More generally, ad hoc exclusion of data raises concerns about the repeatability of the experiments that, in this case, appear entirely avoidable.

      Rabbits that were infected but never shed were excluded from all our original analyses and continue to be excluded in our updated version. Our focus is on the dynamics of shedding and including animals that do not shed is not informative to our objective. Moreover, these animals do not provide meaningful information on rabbits that are infected but do not shed, since this is a very small number (n=7) to draw meaningful conclusions across four types of infection. Rabbits with three or fewer shedding events larger than zero (i.e. CFU/s>0) were originally excluded from the modeling and continue to be excluded. This decision was motivated by technical reasons of model convergence and our commitment to generate meaningful results; in other words, it is difficult to fit a model, and provide robust results, on a time series with only three points larger than zero, irrespective of the number of zero points in the time series.

      In summary, our subset of animals was not chosen a posteriori but based on clear objectives (i.e. pattern of shedding between and within types of infections), a rigorous approach and reliable results. We have further clarified our approach in the Results and Material and Methods.

      Incomplete description of the analysis: The description of the statistical analysis will not be complete until sufficient information is provided to allow the interested reader to decide for him- or herself whether the conclusions are warranted and for the motivated reader to reproduce the analysis. In particular, it is necessary to specify all priors fully. At present, these are not described at all, except in vague, and even incoherent, ways. Also, it is necessary to provide details of the MCMC performed. Specifically, the authors should describe the MCMC sampler and show their MCMC convergence diagnostics. Finally, it is good practice to display both the priors and the posteriors: it is impossible to assess the posteriors without an understanding of the priors.

      We have carefully revised our approach and results and now provide a complete description of our analysis with additional/new details on Parameter calibration, Model fitting, Model validation and Model selection in Material and Methods, and Appendix (Appendix-3 and 4). Specifically, we have included all priors, along with all posteriors, for the four types of infection in Table 2. We have also explained how the MCMC simulations were performed and how model convergence diagnosis was assessed (section ‘Parameter calibration and Model fitting’). In Appendix-3 we also show the parameter MCMC trace plots for the four types of infection.
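
      To illustrate the kind of convergence diagnostic the reviewer asks for, here is a minimal sketch of the Gelman-Rubin statistic computed from several chains of one parameter. The response does not say which diagnostic the authors used, so the choice of this statistic, the function name `gelman_rubin`, and the toy chains are all assumptions made only for illustration.

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])                                   # draws per chain
    chain_means = [statistics.mean(c) for c in chains]
    # W: average within-chain sample variance
    within = statistics.mean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

# Hypothetical toy chains: the first pair never overlaps, the second pair mixes.
stuck = gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11]])
mixed = gelman_rubin([[0, 1, 0, 1], [1, 0, 1, 0]])
print(stuck, mixed)
```

      Values well above 1 (the `stuck` case) indicate that the chains have not converged to a common distribution; values close to or below 1 (the `mixed` case) are consistent with convergence, although they do not prove it.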

      Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate.

      A clear feature of our shedding data is that there is large variation in the level of shedding both within and between hosts. Because of this, data were presented as log(1+CFU/s) to reduce the skewness of the datasets, and thus the variance, and facilitate the visualization of the experimental and simulated results. The use of data in the form of CFU/s would have made the visualization much harder, especially at low shedding where a large fraction of the data come from.

      The practice of displaying the data on a log-scale is appropriate when the underlying process is exponential or when the amount of relative variation is large, including when representing rates. This practice is widely used when modeling infectious diseases and describing biomedical results. A typical example is the overdispersion of macroparasite infections in host populations, or the large variation in the size of outbreaks by microparasite infections; these data are often described on a log-scale. An example closer to our case is the study on influenza-bacteria coinfection by Smith et al. 2013 PLoS Pathogens. Given the nature of our data we found that plotting the level of shedding on a log-scale was the most effective way to represent our results.
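
      The reviewer's concern about units can be made concrete with a small worked example. This is a sketch with made-up shedding rates (not values from the study): the gap between two animals on a plain log scale is unchanged when CFU/s is converted to CFU/min, whereas the gap on the log(1+x) scale is not.

```python
import math

def log_gap(rate_a, rate_b, unit_scale=1.0):
    """Difference between two rates on a plain log scale after a change of units."""
    return math.log(rate_a * unit_scale) - math.log(rate_b * unit_scale)

def log1p_gap(rate_a, rate_b, unit_scale=1.0):
    """Difference between two rates on the log(1 + x) scale after a change of units."""
    return math.log1p(rate_a * unit_scale) - math.log1p(rate_b * unit_scale)

# Hypothetical shedding rates for two animals, in CFU/s.
high, low = 0.5, 0.05

plain_per_s = log_gap(high, low)           # units: CFU/s
plain_per_min = log_gap(high, low, 60.0)   # units: CFU/min
shift_per_s = log1p_gap(high, low)
shift_per_min = log1p_gap(high, low, 60.0)

print(plain_per_s, plain_per_min)   # identical up to rounding
print(shift_per_s, shift_per_min)   # differ substantially
```

      The unit dependence matters mostly for rates near or below 1 in the chosen units, which is where the response notes that a large fraction of the data lie.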

      Model adequacy: The authors' argument rests on the model's ability to adequately account for the data. The authors need to provide some evidence of this, in one form or another. Ultimately, the question is whether the data are a plausible realization of the model. The authors should show simulations from the model (including the measurement error and not merely the deterministic trajectories) and compare these simulations to the data. In particular, it seems worryingly possible that the fitted model is capable of capturing certain averages in the data while, at the same time, failing to describe the infection progression for any of the actual infected animals.

      As previously reported, we have now provided full details on model fitting and model convergence in the sections ‘Parameter calibration and Model fitting’ and ‘Model validation’ in Material and Methods, and ‘Model validation’ and ‘Model convergence’ in the Appendix (Appendix-3 and 4).

      Regarding the evidence that the data are a plausible realization of the model, we have moved the original figure S1 into the main text (now figure 5). This figure shows the good fit of the model to neutrophil, IgA and IgG, both using individual and group data from every infection. We have also revised the quality of the plot to highlight individual simulations. To avoid too much crowding, the 95% CIs for every individual are not reported; however, in Appendix-1 we provide the posterior parameter estimations and their 95% CIs, for every individual and as a group average, for the three co-infections (simulations for B rabbits were performed at the group level only).

      In the new figure 6 (original figure 5), we have now included the individual trajectories (without 95% CIs to avoid overcrowding), alongside the group trends, for the neutralization rates of neutrophils, IgA and IgG, which are the important parameters regulating infection and where the CIs are large enough to show the individual data. The other rates have CIs too narrow to single out individual trajectories and, thus, we only reported the group trends.

      In the revised figure 7 (original figure 6) we have revised the quality of the plots to highlight individual trajectories, in addition to the median trend, but have not included the individual 95% CIs, again to avoid overcrowding.

      Finally, the main text associated with these figures has been updated accordingly.

      Confusion of correlation and causation: At various points, the authors succumb to the temptation to interpret their model literally and to interpret the correlations they observe as evidence for a causal linkage between the three immune components they measure, bacterial shedding, and coinfection. They should be more careful and circumspect in the description of their results.

      We have thoroughly revised the presentation and discussion of the results to avoid the overinterpretation of the findings.

      Additional Issues:

      Eqs 1-4. These equations are not mechanistic in any meaningful sense. Essentially, they posit the existence of exponential time-lags between the three immunity variables, and a simple linear killing relationship between each of the variables and pathogen load. To interpret the equations literally risks making unwarranted conclusions. For example, any physiological variable correlated with any of the three variables in the model might equally well be credited with the influence on shedding attributed to IgA, IgG, or neutrophils.

      This work tests the hypothesis that neutrophils, IgA and IgG affect the dynamics of B. bronchiseptica infection and, in turn, bacterial shedding. Of course, there are many other immunological mechanisms that could contribute to the pattern observed and that can be tested, as there are many other variables correlated with these dynamics that do not play any role in these patterns, as noted by the reviewer. We follow a parsimonious approach by focusing on three immune variables previously identified as important in regulating Bordetella infection. To avoid excessive complexity and allow model tractability, our informed decision was to simplify the relationship between immunity and infection, without losing the important role of the immune variables selected. Finally, by referring to previous work by others and us, we do note that the immune mechanisms described can be much more complex.

      l 456. Do the authors account for the variability in time spent with plates? Implicitly, the assumption is made that the amount of time a rabbit spends with a plate, i.e., the decision as to whether to engage in a behavior that will terminate the plate interaction, is independent of everything else. This raises the question: Does the time spent per plate correlate with anything?

      We always recorded the amount of time spent with the plate, and every rabbit had a maximum interaction time of 10 minutes. Rabbits are very inquisitive, and we rarely had animals that did not interact or had to remove the plate because they were chewing the media; usually animals used the entire 10 minutes. Analyses do account for the interaction time and are presented as Colony Forming Unit/second (CFU/s). As noted in the Material and Methods section ‘Observation model’: ‘The probability of having a shedding event is independent of time since inoculation, in that shedding can occur anytime during the experiment and anytime during the interaction with the petri dish’. This assumption is based on our observations of rabbit behavior during the trials.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors present a new technique for analysing low complexity regions (LCRs) in proteins- extended stretches of amino acids made up from a small number of distinct residue types. They validate their new approach against a single protein, compare this technique to existing methods, and go on to apply this to the proteomes of several model systems. In this work, they aim to show links between specific LCRs and biological function and subcellular location, and then study conservation in LCRs amongst higher species.

      The new method presented is straightforward and clearly described, generating comparable results with existing techniques. The technique can be easily applied to new problems and the authors have made code available.

      This paper is less successful in drawing links between their results and the importance biologically. The introduction does not clearly position this work in the context of previous literature, using relatively specialised technical terms without defining them, and leaving the reader unclear about how the results have advanced the field. In terms of their results, the authors further propose interesting links between LCRs and function. However, their analyses for these most exciting results rely heavily on UMAP visualisation and the use of tests with apparently small effect sizes. This is a weakness throughout the paper and reduces the support for strong conclusions.

      We appreciate the reviewer’s comments on our manuscript. To address comments about the clarity of the introduction and the position of our findings with respect to the rest of the field, we have made several changes to the text. We have reworked the introduction to provide a clearer view of the current state of the LCR field, and our goals for this manuscript. We also have made several changes to the beginnings and ends of several sections in the Results to explicitly state how each section and its findings help advance the goal we describe in the introduction, and the field more generally. We hope that these changes help make the flow of the paper more clear to the reader, and provide a clear connection between our work and the field.

      We address comments about the use of UMAPs and statistical tests in our responses to the specific comments below.

      Additionally, whilst the experimental work is interesting and concerns LCRs, it does not clearly fit into the rest of the body of work focused as it is on a single protein and the importance of its LCRs. It arguably serves as a validation of the method, but if that is the author's intention it needs to be made more clearly as it appears orthogonal to the overall drive of the paper.

      In response to this comment, we have made more explicit the rationale for choosing this protein at the beginning of this section, and clarify the role that these experiments play in the overall flow of the paper.

      Our intention with the experiments in Figure 2 was to highlight the utility of our approach in understanding how LCR type and copy number influence protein function. Understanding how LCR type and copy number can influence protein function is clearly outlined as a goal of the paper in the Introduction.

      In the text corresponding to Figure 2, we hypothesize how different LCR relationships may inform the function of the proteins that have them, and how each group in Figure 2A/B can be used to test these hypotheses. The global view provided by our method allows proteins to be selected on the basis of their LCR type and copy number for further study.

      To demonstrate the utility of this view, we select a key nucleolar protein with multiple copies of the same LCR type (RPA43, a subunit of RNA Pol I), and learn important features driving its higher-order assembly in vivo and in vitro. We learned that in vivo, at least two copies of RPA43’s K-rich LCRs are required for nucleolar integration, and that these K-rich LCRs are also necessary for in vitro phase separation.

      Despite this protein being a single example, we were able to gain important insights about how K-rich LCR copy number affects protein function, and that both in vitro higher order assembly and in vivo nucleolar integration can be explained by LCR copy number. We believe this opens the door to ask further questions about LCR type and copy number for other proteins using this line of reasoning.

      Overall I think the ideas presented in the work are interesting, the method is sound, but the data does not clearly support the drawing of strong conclusions. The weakness in the conclusions and the poor description of the wider background lead me to question the impact of this work on the broader field.

      For all the points where Reviewer #1 comments on the data and its conclusions, we provide explanations and additional analyses in our responses below showing that the data do indeed support our conclusions. In regards to our description of the wider background, we have reworked our introduction to more clearly link our work to the broader field, such that a more general audience can appreciate the impact of our work.

      Technical weaknesses

      In the testing of the dotplot based method, the manuscript presents a FDR rate based on a comparison between real proteome data and a null proteome. This is a sensible approach, but their choice of a uniform random distribution would be expected to mislead. This is because if the distribution is non-uniform, stretches of the most frequent amino acid will occur more frequently than in the uniform distribution.

      Thank you for pointing this out. The choice of null proteome was a topic of much discussion between the authors as this work was being performed. While we maintain that the uniform background is the most appropriate, the question from this reviewer and the other reviewers made us realize that a thorough explanation was warranted. For a complete explanation for our choice of this uniform null model, please see the newly added appendix section, Appendix 1.

      The authors would also like to point out that the original SEG algorithm (Wootton and Federhen, 1993) also made the intentional choice of using a uniform background model.
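
      The reviewer's statistical point can be checked with a short calculation. This is a sketch under simplifying assumptions (independent residues, a hypothetical skewed composition in which one amino acid makes up 10% of the proteome, and a window of five residues); it is not the authors' null model or their FDR procedure.

```python
def homopolymer_window_prob(freqs, k):
    """Probability that k consecutive independent residues are all the same amino acid."""
    return sum(p ** k for p in freqs)

# Uniform background: each of the 20 amino acids has frequency 1/20.
uniform = [1 / 20] * 20
# Hypothetical skewed background: one residue at 10%, the other 19 share the rest equally.
skewed = [0.10] + [0.90 / 19] * 19

p_uniform = homopolymer_window_prob(uniform, 5)
p_skewed = homopolymer_window_prob(skewed, 5)
print(p_uniform, p_skewed)
```

      Under these assumptions a homopolymer window is more than twice as likely with the skewed composition, which is the effect the reviewer describes. Whether that makes a composition-matched background preferable is the question the authors address in their Appendix 1.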

      More generally I think the results presented suggest that the results dotplot generates are comparable to existing methods, not better and the text would be more accurate if this conclusion was clearer, in the absence of an additional set of data that could be used as a "ground truth".

      We did not intend to make any strong claims about the relative performance of our approach vs. existing methods with regard to the sequence entropy of the called LCRs beyond them being comparable, as this was not the main focus of our paper. To clarify the text such that it reflects this, we have removed ‘or better’ from the text in this section.

      The authors draw links between protein localisation/function and LCR content. This is done through the use of UMAP visualisation and Wilcoxon rank sum tests on the amino acid frequency in different localisations. This is convincing in the case of ECM data, but the arguments are substantially less clear for other localisations/functions. The UMAP graphics show generally that the specific functions are sparsely spread. Moreover, when considering the sample size (in the context of the whole proteome), the p-value threshold obscures what appear to be relatively small effect sizes.

      We would first like to note that some of the amino acid frequency biases have been documented and experimentally validated by other groups, as we write and reference in the manuscript. Nonetheless, we have considered the reviewer's concerns, and upon rereading the section corresponding to Figure 3, we realize that our wording may have caused confusion in the interpretation there. In addition to clarifying this in the manuscript, we believe the following clarification may help in the interpretations drawn from that section.

      Each point in this analysis (and on the UMAP) is an LCR from a protein, and as such multiple LCRs from the same protein will appear as multiple points. This is particularly relevant for considering the interpretation of the functional/higher order assembly annotations because it is not expected that for a given protein, all of the LCRs will be directly relevant to the function/annotation. Just because proteins of an assembly are enriched for a given type of LCR does not mean that they only have that kind of LCR. In addition to the enriched LCR, they may or may not have other LCRs that play other roles.

      For example, a protein in the Nuclear Speckle may contain both an R/S-rich LCR and a Q-rich LCR. When looking at the Speckle, all of the LCRs of a protein are assigned this annotation, and so such a protein would contribute a point in the R/S region as well as elsewhere on the map. Because such "non-enriched" LCRs do not occur as frequently, and may not be relevant to Speckle function, they are sparsely spread.

      We have now changed the wording in that section of the main text to reflect that the expectation is not all LCRs mapping to a certain region, but enrichment of certain LCR compositions.

      Reviewer #3 (Public Review):

      The authors present a systematic assessment of low complexity sequences (LCRs) and apply the dotplot matrix method for sequence comparison to identify low-complexity regions based on per-residue similarity. By taking the resulting self-comparison matrices and leveraging tools from image processing, the authors define LCRs based on similarity or non-similarity to one another. Taking the composition of these LCRs, the authors then compare how distinct regions of LCR sequence space compare across different proteomes.

      The paper is well-written and easy to follow, and the results are consistent with prior work. The figures and data are presented in an extremely accessible way and the conclusions seem logical and sound.

      My big picture concern stems from one that is perhaps challenging to evaluate, but it is not really clear to me exactly what we learn here. The authors do a fine job of cataloging LCRs, offer a number of anecdotal inferences and observations are made - perhaps this is sufficient in terms of novelty and interest, but if anyone takes a proteome and identifies sequences based on some set of features that sit in the tails of the feature distribution, they can similarly construct intriguing but somewhat speculative hypotheses regarding the possible origins or meaning of those features.

      The authors use the lysine-repeats as specific examples where they test a hypothesis, which is good, but the importance of lysine repeats in driving nucleolar localization is well established at this point - i.e. to me at least the bioinformatics analysis that precedes those results is unnecessary to have made the resulting prediction. Similarly, the authors find compositional biases in LCR proteins that are found in certain organelles, but those biases are also already established. These are not strictly criticisms, in that it's good that established patterns are found with this method, but I suppose my concern is that this is a lot of work that perhaps does not really push the needle particularly far.

      As an important caveat to this somewhat muted reception, I recognize that having worked on problems in this area for 10+ years I may also be displaying my own biases, and perhaps things that are "already established" warrant repeating with a new approach and a new light. As such, this particular criticism may well be one that can and should be ignored.

      We thank the reviewer for taking the time to read and give feedback for our manuscript. We respectfully disagree that our work does not push the needle particularly far.

      In the section titled ‘LCR copy number impacts protein function’, our goal is not to highlight the importance of lysines in nucleolar localization, but to provide a specific example of how studying LCR copy number, made possible by our approach, can provide specific biological insights. We first show that K-rich LCRs can mediate in vitro assembly. Moreover, we show that the copy number of K-rich LCRs is important for both higher order assembly in vitro and nucleolar localization in cells, which suggests that by mediating interactions, K-rich LCRs may contribute to the assembly of the nucleolus, and that this is related to nucleolar localization. The ability of our approach to relate previously unrelated roles of K-rich LCRs not only demonstrates the value of a unified view of LCRs but also opens the door to study LCR relationships in any context.

      Furthermore, our goal in identifying established biases in LCR composition for certain assemblies was to validate that the sequence space captures higher order assemblies which are known. In addition to known biases, we use our approach to uncover the roles of LCR biases that have not been explored (e.g. E-rich LCRs in nucleoli, see Figure 4 in revised manuscript), and discover new regions of LCR sequence space which have signatures of higher order assemblies (e.g. Teleost-specific T/H-rich LCRs). Collectively, our results show that a unified view of LCRs relates the disparate functions of LCRs.

      In response to these comments, we have added additional explanations at the end of several sections to clarify the impact of our findings in the scope of the broader field. Furthermore, as we note in our main response, we have added experimental data with new findings to address this concern.

      That overall concern notwithstanding, I had several other questions that sprung to mind.

      Dotplot matrix approach

      The authors do a fantastic job of explaining this, but I'm left wondering, if one used an algorithm like (say) SEG, defined LCRs, and then compared between LCRs based on composition, would we expect the results to be so different? i.e. the authors make a big deal about the dotplot matrix approach enabling comparison of LCR type, but, it's not clear to me that this is just because it combines a two-step operation into a one-step operation. It would be useful I think to perform a similar analysis as is done later on using SEG and ask if the same UMAP structure appears (and discuss if yes/no).

      Thank you for your thoughtful question about the differences between SEG and the dotplot matrix approach. We have tried our best to convey the advantages of the dotplot approach over SEG in the paper, but we did not focus on this for the following reasons:

      1) SEG and dotplot matrices are long-established approaches to assessing LCRs. We did not see it in the scope of our paper to compare between these when our main claim is that the approach as a whole (looking at LCR sequence, relationships, features, and functions) is what gives a broader understanding of LCRs across proteomes. The key benefits of dotplots, such as direct visual interpretation, distinguishing LCR types and copy number within a protein, are conveyed in Figure 1A-C and Figure 1 - figure supplements 1 and 4. In fact, these benefits of dotplots were acknowledged in the early SEG papers, where they recommended using dotplots to gain a prior understanding of protein sequences of interest, when it was not yet computationally feasible to analyze dotplots on the same scale as SEG (Wootton and Federhen, Methods in Enzymology, vol. 266, 1996, Pages 554-571). Thus, our focus is on the ability to utilize image processing tools to "convert" the intuition of dotplots into precise read-out of LCRs and their relationships on a multi-proteome scale. All that being said, we have considered differences between these methods as you can see from our technical considerations in part 2 below.

      2) SEG takes an approach to find LCRs irrespective of the type of LCR, primarily because SEG was originally used to mask LCR-containing regions in proteins to facilitate studies of globular domains. Because of this, the recommended usage of SEG commonly fuses nearby LCRs and designates the entire region as "low complexity". For the original purpose of SEG, this is understandable because it takes a very conservative approach to ensure that the non-low complexity regions (i.e. putative folded domains) are well-annotated. However, for the purpose of distinguishing LCR composition, this is not ideal because it is not stringent in separating LCRs that are close together, but different in composition. Fusion can be seen in the comparison of specific LCR calls of the collagen CO1A1 (Figure 1 - figure supplement 3E), where even the intermediate stringency SEG settings fuse LCR calls that the dotplot approach keeps separate. Finally, we did also try downstream UMAP analysis with LCRs called from SEG, and found that although certain aspects of the dotplot-based LCR UMAP are reflected in the SEG-based LCR UMAP, there is overall worse resolution with default settings, which is likely due to fused LCRs of different compositions. Attempting to improve resolution using more stringent settings comes at the cost of the number of LCRs assessed. We have attached this analysis to our rebuttal for the reviewer, but maintain that this comparison is not really the focus of our manuscript. We do not make strong claims about the dotplot matrices being better at calling LCRs than SEG, or any other method.
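
      The distinction drawn above between adjacent LCRs of different composition can be illustrated with a toy self-comparison dotplot. This is a minimal sketch using exact residue identity on a made-up sequence; the function names are hypothetical, and the authors' actual pipeline (including its image-processing steps and thresholds) is not reproduced here.

```python
def dotplot(seq):
    """Self-comparison matrix: cell (i, j) is 1 when residues i and j are identical."""
    return [[1 if a == b else 0 for b in seq] for a in seq]

def block_density(matrix, rows, cols):
    """Fraction of matching cells in the block spanned by the given row and column ranges."""
    cells = [matrix[i][j] for i in rows for j in cols]
    return sum(cells) / len(cells)

# Made-up sequence: a K-rich tract immediately followed by a G/P-rich tract.
seq = "MADE" + "KKKKKK" + "GPGPGP" + "LSTV"
matrix = dotplot(seq)
k_tract = range(4, 10)
gp_tract = range(10, 16)

k_self = block_density(matrix, k_tract, k_tract)      # dense square on the diagonal
gp_self = block_density(matrix, gp_tract, gp_tract)   # second, separate square
cross = block_density(matrix, k_tract, gp_tract)      # empty off-diagonal block
print(k_self, gp_self, cross)
```

      The two tracts show up as separate dense squares on the diagonal, and the empty off-diagonal block indicates that they differ in composition even though they are adjacent. A method that only flags a region as low complexity would report them as one stretch.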

      UMAPs generated from LCRs called by SEG

      LCRs from repeat expansions

      I did not see any discussion on the role that repeat expansions can play in defining LCRs. This seems like an important area that should be considered, especially if we expect certain LCRs to appear more frequently due to a combination of slippy codons and minimal impact due to the biochemical properties of the resulting LCR. The authors pursue a (very reasonable) model in which LCRs are functional and important, but it seems the alternative (that LCRs are simply an unavoidable product of large proteomes and emerge through genetic events that are insufficiently deleterious to be selected against) should also be considered. Some discussion on this would be helpful. It also makes me wonder if the authors' null proteome model is the "right" model, although I would also say developing an accurate and reasonable null model that accounts for repeat expansions is beyond what I would consider the scope of this paper.

      While the role of repeat expansions in generating LCRs has been studied and discussed extensively in the LCR field, we decided to focus on the question of which LCRs exist in the proteome, and what may be the function downstream of that. The rationale for this is that while one might not expect a functional LCR to arise from repeat expansion, this argument is less of a concern in the presence of evidence that these LCRs are functional. For example, for many of these LCRs (e.g. a K-rich LCR, R/S-rich LCR, etc. as in Figure 3), we know that the LCR is sufficient for the integration of that sequence into the higher order assembly. Moreover, in more recent cases, variation of the length of an LCR was shown to have functional consequences (Basu et al., Cell, 2020), suggesting that LCR emergence through repeat expansions does not imply lack of function. Therefore, while we think the origin of an LCR is an interesting question, whether or not that LCR was gained through repeat expansions does not fall into the scope of this paper.

      In regards to repeat expansions as it pertains to our choice of null model, we reasoned that because the origin of an LCR is not necessarily coupled to its function, it would be more useful to retain LCR sequences even if they may be more likely to occur given a background proteome composition. This way, instead of being tossed based on an assumption, LCRs can be evaluated on their function through other approaches which do not assume that likelihood of occurrence inversely relates to function.

      While we maintain that the uniform background is the most appropriate, the question from this reviewer and the other reviewers made us realize that a thorough explanation was warranted for this choice of null proteome. For a complete explanation for our choice of this uniform null model, please see the newly added appendix section, Appendix 1.

      The authors would also like to point out that the original SEG algorithm (Wootton and Federhen, 1993) also made the intentional choice of using a uniform background model.

      Minor points

      Early on the authors discuss the roles of LCRs in higher-order assemblies. They then make reference to the lysine tracts as having a valence of 2 or 3. It is possibly useful to mention that valence reflects the number of simultaneous partners that a protein can interact with - while it is certainly possible that a single lysine tract interacts with a single partner simultaneously (meaning the tract contributes a valence of 1) I don't think the authors can know that, so it may be wise to avoid specifying the specific valence.

      Thank you for pointing this out. We agree with the reviewer's interpretation and have removed our initial interpretation from the text and simply state that a copy number of at least two is required for RPA43’s integration into the nucleolus.

      The authors make reference to Q/H LCRs. Recent work from Gutiérrez et al. eLife (2022) has argued that histidine-richness in some glutamine-rich LCRs is above the number expected based on codon bias, and may reflect a mode of pH sensing. This may be worth discussing.

      We appreciate the reviewer pointing out this publication. While this manuscript wasn’t published when we wrote our paper, upon reading it we agree it has some very relevant findings. We have added a reference to this manuscript in our discussion when discussing Q/H-rich LCRs.

      Eric Ross has a number of very nice papers on this topic, but sadly I don't think any of them are cited here. On the question of LCR composition and condensate recruitment, I would recommend Boncella et al. PNAS (2020). On the question of proteome-wide LCR analysis, see Cascarina et al. PLoS CompBio (2018) and Cascarina et al. PLoS CompBio (2020).

      We appreciate the reviewer for noting this related body of work. We have updated the citations to include work from Eric Ross where relevant.

    2. Reviewer #3 (Public Review):

      The authors present a systematic assessment of low complexity sequences (LCRs) and apply the dotplot matrix method for sequence comparison to identify low-complexity regions based on per-residue similarity. By taking the resulting self-comparison matrices and leveraging tools from image processing, the authors define LCRs based on similarity or non-similarity to one another. Taking the composition of these LCRs, the authors then compare how distinct regions of LCR sequence space compare across different proteomes.

      The paper is well-written and easy to follow, and the results are consistent with prior work. The figures and data are presented in an extremely accessible way and the conclusions seem logical and sound.

      My big picture concern stems from one that is perhaps challenging to evaluate, but it is not really clear to me exactly what we learn here. The authors do a fine job of cataloging LCRs, offer a number of anecdotal inferences and observations are made - perhaps this is sufficient in terms of novelty and interest, but if anyone takes a proteome and identifies sequences based on some set of features that sit in the tails of the feature distribution, they can similarly construct intriguing but somewhat speculative hypotheses regarding the possible origins or meaning of those features.

      The authors use the lysine-repeats as specific examples where they test a hypothesis, which is good, but the importance of lysine repeats in driving nucleolar localization is well established at this point - i.e. to me at least the bioinformatics analysis that precedes those results is unnecessary to have made the resulting prediction. Similarly, the authors find compositional biases in LCR proteins that are found in certain organelles, but those biases are also already established. These are not strictly criticisms, in that it's good that established patterns are found with this method, but I suppose my concern is that this is a lot of work that perhaps does not really push the needle particularly far.

      As an important caveat to this somewhat muted reception, I recognize that having worked on problems in this area for 10+ years I may also be displaying my own biases, and perhaps things that are "already established" warrant repeating with a new approach and a new light. As such, this particular criticism may well be one that can and should be ignored.

      That overall concern notwithstanding, I had several other questions that sprung to mind.

      Dotplot matrix approach
      The authors do a fantastic job of explaining this, but I'm left wondering: if one used an algorithm like (say) SEG, defined LCRs, and then compared between LCRs based on composition, would we expect the results to be so different? That is, the authors make a big deal about the dotplot matrix approach enabling comparison of LCR type, but it's not clear to me whether this is simply because it combines a two-step operation into a one-step operation. It would be useful, I think, to perform a similar analysis as is done later on using SEG and ask if the same UMAP structure appears (and discuss if yes/no).

      LCRs from repeat expansions
      I did not see any discussion on the role that repeat expansions can play in defining LCRs. This seems like an important area that should be considered, especially if we expect certain LCRs to appear more frequently due to a combination of slippy codons and minimal impact due to the biochemical properties of the resulting LCR. The authors pursue a (very reasonable) model in which LCRs are functional and important, but it seems the alternative (that LCRs are simply an unavoidable product of large proteomes and emerge through genetic events that are insufficiently deleterious to be selected against) should also be considered. Some discussion on this would be helpful. It also makes me wonder if the authors' null proteome model is the "right" model, although I would also say developing an accurate and reasonable null model that accounts for repeat expansions is beyond what I would consider the scope of this paper.

      Minor points
      Early on the authors discuss the roles of LCRs in higher-order assemblies. They then make reference to the lysine tracts as having a valence of 2 or 3. It is possibly useful to mention that valence reflects the number of simultaneous partners that a protein can interact with. While it is certainly possible that a single lysine tract interacts with a single partner simultaneously (meaning the tract contributes a valence of 1), I don't think the authors can know that, so it may be wise to avoid specifying the specific valence.

      The authors make reference to Q/H LCRs. Recent work from Gutiérrez et al. eLife (2022) has argued that histidine-richness in some glutamine-rich LCRs is above the number expected based on codon bias, and may reflect a mode of pH sensing. This may be worth discussing.

      Eric Ross has a number of very nice papers on this topic, but sadly I don't think any of them are cited here. On the question of LCR composition and condensate recruitment, I would recommend Boncella et al. PNAS (2020). On the question of proteome-wide LCR analysis, see Cascarina et al. PLoS CompBio (2018) and Cascarina et al. PLoS CompBio (2020).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to create a machine learning framework for analyzing video recordings of animal behavior, which is both efficient and runs in an unsupervised fashion. The authors construct Selfee from recent computational neural network codes. As the paper is methods-focused, the key metrics for success would be (1) whether Selfee performs similarly or more accurately than existing methods, and more importantly (2) whether Selfee uncovers new behavioral features or dynamics otherwise missed by those existing methods.

      Weaknesses:

      Although the basic schematics of Selfee are laid out, and the code itself is available, I feel that material in between these two levels of description is somewhat lacking. Details of what other previously published machine learning code makes up Selfee, and how those parts work would be helpful. Some of this is in the methods section, but an expanded version aimed at a more general readership would be helpful.

      Thanks for the suggestions. We expanded the paragraphs describing training objectives and AR-HMM analysis. We also revised Figure 2C for clarity, and we have added a new figure, Figure 6, to describe how our pipeline works in detail. We also added detailed instructions for Selfee usage on our GitHub page.

      *The paper highlights efficiency as an important aspect of machine learning analysis techniques in the introduction, but there is little follow-up on this aspect.

      Our model only had a more efficient training process compared with other self-supervised learning methods. We also found our model could perform zero-shot domain transfer, so training may not even be necessary. However, we did not mean that our model was superior in terms of data efficiency or inference speed. We have revised some of the claims in the Discussion.

      *In comparing Selfee to other approaches, the paper uses DeepLabCut, but perhaps running other recent methods for more comprehensive comparison would be helpful as well.

      We compared Selfee features with features extracted by FlyTracker and JAABA, two widely used software packages. We also visualized the tracking results of SLEAP and FlyTracker to complement the DeepLabCut experiment.

      *Using Selfee to investigate courtship behavior and other interactions was nicely demonstrated. Running it on simpler data (say, videos of individual animals walking around or exploring a confined space) might more broadly establish the method's usefulness.

      We applied Selfee to the open field test (OFT) of mice after chronic immobilization stress (CIS) treatment. We demonstrated our whole pipeline, from data preprocessing to all the data mining algorithms, with this experiment, and the results were added to the last section of Results.

      Reviewer #2 (Public Review):

      Jia et al. present a CNN based tool named "Selfee" for unsupervised quantification of animal behavior that could be used for objectively analyzing animal behavior recorded in relatively simple setups commonly used by various neurobiology/ethology laboratories. This work is very relevant but has some serious unresolved issues for establishing credibility of the method.

      Overall Strengths: Jia et al have leveraged a recent development "Simple Siamese CNNs" to work for behavioral segmentation. This is a terrific effort and theoretically very attractive.

      Overall Weakness: Unfortunately, the data supporting the method is not as promising. It is also riddled with incomplete information and lack of rationale behind the experiments.

      Specific points of concern:

      1) No formal comparison with pre-existing methods like JAABA which would work on similar videos as Selfee.

      We added some comparisons with features extracted by JAABA and FlyTracker, and also visualized FlyTracker and SLEAP tracking results alongside DeepLabCut. This result is now in the new Table 1. To avoid tracking inaccuracy during intensive interactions and potentially inappropriately tuned parameters, we used a peer-reviewed dataset focused on wing extension behavior only. Our results showed that Selfee performed competitively with the other methods.

      2) For all Drosophila behavior experiments, I'm concerned about the control and test genetic background. Several studies have reported that social behaviors like courtship and aggression are highly visual and sensitive to genetic background and presence of "white" gene. The authors use Canton S (CS) flies as control data. However, it is unclear whether any or all of the test genotypes have been crossed into this background. It would be helpful if authors provide genotype information for test flies.

      We have added a detailed sheet listing their genotypes in this version. The genetic information for all animals can also be found at the Bloomington fly center using the IDs provided. In brief, five fly lines used in this work are in the CS background: CCHa2-R-RAGal4, CCHa2-R-RBGal4, Dop2RKO, DopEcRGal4 and Tdc2RO54. We did not backcross the other flies into the CS background for three reasons. First, most mutant lines are compared with their appropriate control lines. For example, in the original Figure 3B (the new Figure 4B), CCHa2-R-RBGal4 > Kir2.1 flies contained the wild-type white gene, so the comparison with CS flies would not cause any problem. TrhGal4 flies were in the white background, and so were other lines that had no phenotype. At the same time, in the original Figure 3G to J (the new Figure 4G to J), we used w1118 as controls for TrhGal4 flies, which were all in the mutated white background. Second, in the original Figure 4F and G (the new Figure 5F and G), we admit that the comparison between NorpA36, in the mutated white background, and CS flies was not very convincing. Nevertheless, the delayed dynamics of NorpA mutants were reported before, and our experiment was just a demonstration of the DTW algorithm. Lastly, our work focused on the methodology of animal behavior analysis, and original videos were provided for research replication. Therefore, even if the behavioral difference was due to genetic backgrounds, it would not affect the conclusion that our method could detect the difference.

      3) Utility of "anomaly score" rests on Fig 3 data. Authors write they screened "neurotransmitter-related mutants or neuron silenced lines" (lines 251-252). Yet Figure 3B lacks some of the most commonly occurring neurotransmitter mutants/neuron labeling lines (e.g. acetylcholine, GABA, dopamine; instead there are some neurotransmitter receptor lines, but then again prominent ones are missing). This reduces the credibility of this data.

      First of all, this paper did not intend to conduct new screening assays; rather, we used pre-existing data in the lab to demonstrate the application of Selfee. Previous work in our lab focused on the homeostatic control of fly behaviors, so most lines listed here were originally used to test the roles of neuropeptides or neurons in nutrient and metabolism regulation, such as CCHa-related lines, a CNMa mutant, and Taotie neuron silenced flies. Some other important genes were not included in this dataset. Some of the most common transmitters are not included for two reasons. First, common neurotransmitters usually have a very global and broad effect on animal behaviors, and even if there is any new discovery, it could be difficult to interpret the phenomenon due to a large number of disturbed neurons. Second, most mutants of those common neurotransmitters are not viable, for example, paleGal4 as a mutant for dopamine, Gad1A30 for GABA, and ChATl3 for acetylcholine. However, we did perform experiments on serotonin-related genes (SerT and Trh), octopamine-related genes (Tdc and Oamb), and some other viable dopamine receptor mutants.

      4) The utility of AR-HMM following "Selfee" analysis rests on the IR76b mutant experiment (Fig 4). This is the most perplexing experiment! There are so many receptors implicated in courtship and IR76b is definitely not among the most well-known. None of the citations for IR76b in this manuscript have anything to do with detection of female pheromones. IR76b is implicated in salt and amino acid sensation. The authors still call this "an extensively studies (co)receptor that is known to detect female pheromones" (lines 310-311). Unsurprisingly the AR-HMM analysis doesn't find any difference in modules related to courtship. Unless I'm mistaken the premise for this experiment is wrong and hence not much weight should be given to its results.

      We have removed the Ir76b results from the Results. AR-HMM is now demonstrated with a mouse open field assay.

      Reviewer #3 (Public Review):

      This paper describes a machine learning method applied to videos of animals. The method requires very little pre-processing (end-to-end) such as image segmentation or background subtraction. The input images have three channels, mapping temporal information (live-frames). The architecture is based on twin deep neural networks (Siamese network) and does not require human annotated labels (unsupervised learning). However, labels can still be used if they are produced, as in this case, by the algorithm itself - self-supervised learning. This flavor of machine learning is reflected in the name of the method: "Selfee." The authors convincingly apply Selfee to several challenging animal behavior tasks, which results in biologically relevant discoveries.

      A significant advantage of unsupervised and self-supervised learning is twofold: 1) it allows for discovering new behaviors, and 2) it doesn't require human-produced labels.

      In this case of self-supervised learning the features (meta-representations) are learned from two views of the same original image (live-frame), where one of the views is augmented in several different ways, with a hope to let the deep neural network (ResNet-50 architecture in this case) learn to ignore such augmentations, i.e. learn the meta-representations invariant to natural changes in the data similar to the augmentations. This is accomplished by utilizing a Siamese Convolutional Neural Network (CNN) with the ResNet-50 version as a backbone. Siamese networks are composed of twin deep nets, where each member of the pair is trying to predict the output of another. In applications such as face recognition they normally work in the supervised learning setting, by utilizing "triplets" containing "negative samples." These are the labels.

      However, in the self-supervised setting, which "Selfee" is implementing, the negative samples are not required. Instead the same image (a positive sample) is viewed twice, as described above. Here the authors use the SimSiam core architecture described by Chen, X. & He, K (reference 29 in the paper). They add Cross-Level Discrimination (CLD) to the SimSiam core. Together these two components provide two Loss functions (Loss 1 and Loss 2). Both are critical for the extraction of useful features. In fact, removing the CLD causes major deterioration of the classification performance (Figure 2-figure supplement 5).
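
      To make the "same image viewed twice, no negative samples" idea concrete, here is a minimal NumPy sketch of the symmetric negative-cosine-similarity objective described in the SimSiam paper. It covers only the SimSiam term, not CLD, and it is not taken from the Selfee code; the names `negative_cosine` and `simsiam_loss` and the random toy arrays are assumptions for illustration.

```python
import numpy as np

def negative_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """Mean negative cosine similarity between matching rows of p and z."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(-(p * z).sum(axis=1).mean())

def simsiam_loss(p1, z1, p2, z2) -> float:
    """Symmetric loss: each predictor output p is matched to the other view's z.

    In real training z1 and z2 are treated as constants (stop-gradient);
    that detail does not change the loss value computed here.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)

# Toy batch: 4 samples, 8-dimensional embeddings (hypothetical values).
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))   # projection of view 1
z2 = rng.normal(size=(4, 8))   # projection of view 2
p1 = rng.normal(size=(4, 8))   # predictor output for view 1
p2 = rng.normal(size=(4, 8))   # predictor output for view 2

print("loss for unrelated embeddings:", simsiam_loss(p1, z1, p2, z2))
print("loss for perfect predictions: ", simsiam_loss(z2, z1, z1, z2))
```

      The loss is bounded between -1 and 1 and reaches -1 when each view's prediction points in exactly the same direction as the other view's embedding.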

      The authors demonstrate the utility of Selfee by using the learned features (meta-representations) for classification (supervised learning; with human annotation), for discovering short-lasting new behaviors in flies by anomaly detection, and for analyzing long time-scale dynamics by AR-HMM and Dynamic Time Warping (DTW).

      For the classification the authors use k-NN (flies) and LightGBM (mice) classifiers and they infer the labels from the Selfee embedding (for each frame), and the temporal context, using the time-windows of 21 frames and 81 frames, for k-NN classification and LightGBM classification, respectively. Accounting for the temporal context is especially important in mice (LightGBM classification) so the authors add additional windowed features, including frequency information. This is a neat approach. They quantify the classification performance by confusion matrices and compute the F1 for each.

      Overall, I find these classification results compelling, but one general concern is the criticality of the CLD component for achieving any meaningful classification. I would suggest that the authors discuss in more depth why this component is so critical for the extraction of features (used in supervised classification) and compare their SimSiam architecture to other methods where the CLD component is implemented. In other words, to what degree is the SimSiam implementation an overkill? Could a simpler (and thus faster) method be used - with the CLD component - instead to achieve similar end-to-end classification? The answer would help illuminate the importance of the SimSiam architecture in Selfee.

      We added more about the contribution of the CLD loss in the last paragraph of "Siamese convolutional neural networks capture discriminative representations of animal posture", the second section of Results. Further optimization of neural network architectures is discussed in the Discussion section. As for why CLD is so important, there are two main reasons. First of all, all behavior images are so similar that it is not easy to distinguish them from each other. In the field of so-called self-supervised learning without negative samples, researchers use either batch normalization or similar operations to implicitly utilize negative samples within a mini-batch. However, when all samples are quite similar, this might not be enough. CLD uses explicit clusters to utilize negative samples within a mini-batch (in the words of the authors, “Our key insight is that grouping could result from not just attraction, but also common repulsion”), which provides more powerful discrimination. The second reason, as the authors argued in the CLD paper, is that CLD is very powerful in processing long-tailed datasets. As shown in the original Figure 2—figure supplement 5 (the new Figure 3—figure supplement 5), behavior data are highly unbalanced. As explained in the CLD paper, CLD counters a long-tailed distribution in two ways. One is that it scales up the importance of negative samples within a mini-batch from 1/B to 1/K by k-means; the other is that the clustering operation could relieve the imbalance between the tail and head classes within a mini-batch. To quote: “While the distribution of instances in a random mini-batch is long-tailed, it would be more flattened across classes after clustering.” This was also visualized in Fig. 5 of the CLD paper.

      To the best of our knowledge, SimSiam is the simplest method that would work with CLD. In the original CLD paper, the authors combined the CLD method with other popular frameworks including BYOL and MoCo v2. However, those popular frameworks are more complicated than SimSiam networks. We attempted to combine CLD with BarlowTwins but failed. As the author of CLD suggested on GitHub: “Hi, good to know that you are trying to combine CLD with BarLowTwins! My concern is also on the high feature dimension, which may cause the low clustering quality. Maybe it is necessary to have a projection layer to project the high-dimensional feature space to a low-dimensional one.” In terms of speed, there are two major parts. For inference, only one branch is used, so efficiency is determined mainly by the CNN backbone. In theory, light backbones like MobileNet would work, but ResNet50 is already fast enough on a modern GPU. As for training, the major computational cost aside from the CNN backbone comes from the Siamese branches: two branches require twice the computation. Nevertheless, CLD relies on this kind of structure, so even if the learning framework is simpler than SimSiam, it is not likely to achieve a faster training speed. As for other structures, I think this new instance learning framework (https://arxiv.org/abs/2201.10728) may achieve a similar result with fewer data and in a shorter time. However, this powerful method could be used with CLD. We might try it in the future.

      One potential issue with unsupervised/self-supervised learning is that it "discovers" new classes based, not on behavioral features but rather on some other, irrelevant, properties of the video, e.g. proximity to the edges, a particular camera angle, or a distortion. In supervised learning the algorithm learns the features that are invariant to such properties, because human-made labels are used and humans are great at finding these invariant features. The authors do mention a potential limitation, related to this issue, in the Discussion ("mode splitting"). One way of getting around this issue, other than providing negative samples, is to use a very homogeneous environment (so that only invariance to orientation, translation, etc, needs to be accomplished). This has worked nicely, for example, with posture embedding (Berman, G. J., et al; reference 19 in the manuscript). Looking at the t-SNE plots in Figure 2 one must wonder how many of the "clusters" present there are the result of such learning of irrelevant (for behavior) features, i.e. how good is the generalization of the meta-representations. The authors should explore the behaviors found in different parts of the t-SNE maps and evaluate the effect of the irrelevant features on their distributions. For example, they may ask: to what extent does the distance of an animal from the nearest wall affect the position in the t-SNE map? It would be nice to see how various simple pre-processing steps might affect the t-SNE maps, as well as the classification performance. Some form of segmentation, even very crude, or simply background subtraction, could go a very long way towards improving the features learned by Selfee.

      In the new Figure 3—figure supplement 1, the visualization demonstrates that our features contained a lot of physical information, including wing angles, animal distance and positions in the chamber. “Mode-split” can be partially explained by those features. We actually performed background subtraction and image crop for mice behaviors, where we found them useful.

      The anomaly detection is used to find unusual short-lasting events during male-male interaction behavior (Figure 3). The method is explained clearly. The results show how Selfee discovered a mutant line with a particularly high anomaly score. The authors managed to identify this behavior as "brief tussle behavior mixed with copulation attempts." The anomaly detection analyses were also applied to discover another unusual phenotype (close body contact) in another mutant line. Both results are significant when compared to the control groups.

      The authors then apply AR-HMM and DTW to study the time dynamics of courtship behavior. Here too, they discover two phenotypes with unusual courtship dynamics, one in an olfactory mutant, and another in flies where the mutation affects visual transduction. Both results are compelling.
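
      For readers unfamiliar with DTW, the following is a minimal textbook sketch of the dynamic time warping distance between two one-dimensional sequences. It is not the authors' implementation; the function name `dtw_distance`, the absolute-difference cost, and the toy sequences are assumptions chosen only to show how a delayed but otherwise identical time course can be aligned.

```python
import numpy as np

def dtw_distance(a, b) -> float:
    """Classic O(n*m) dynamic time warping distance with |a_i - b_j| cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = best cumulative cost aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j],      # a advances alone
                                    cost[i, j - 1],      # b advances alone
                                    cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])

# Hypothetical traces: the second is the first with a one-frame delay.
control = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]
print("DTW distance, delayed copy:  ", dtw_distance(control, delayed))
print("DTW distance, different peak:", dtw_distance([0, 1], [0, 2]))
```

      Because warping absorbs the delay, a delayed copy has zero distance to the original, while the warping path itself records where the delay occurred.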

      The authors explain their usage of DTW clearly, but they should expand the description of the AR-HMM so that the reader doesn't have to study the original sources.

      We expanded the section that talks about AR-HMM mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      This work offers a simple explanation to a fundamental question in cell biology: what dictates the volume of a cell and of its nucleus, focusing on yeast cells. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. The novelty resides in an effort to provide actual numbers experimentally.

      In this work, Lemière and colleagues combine physical modeling and quantitative measures to establish the basic principles that dictate the volume of a cell and of its nucleus. By doing so, they also explain an observation reported many times and in many different types of cells, of a proportionality between the volume of the cell and of its nucleus. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. This is because, in yeast cells, while the cell has a wall that can contribute to the equilibrium, the nucleus does not have a lamina and there is thus no elastic contribution in the force balance for the nucleus, as the authors show very nicely experimentally, using both cells and protoplasts and measuring the cell and nucleus volume for various external osmotic pressures (the Boyle Van't Hoff Law for a perfect gas, also sometimes called the Ponder relation). This was performed before for mammalian cells (Finan et al.), as cited and commented in the discussion by the authors, showing that mammalian cells have no significant elastic wall (linear relation) while the nucleus has one (non-linear relation). This is well explained by the authors in the discussion. It is one of the clearer experimental results of the article. Together, the data and model presented in this article offer a simple explanation to a fundamental question in cell biology. In this matter, the principles are indeed seemingly simple, but what really counts are the actual numbers. While this article sheds some light on this aspect, it does not totally solve the question. The experiments are very well done and quantified, but some approximations made in the modeling are questionable and should at least be discussed at greater length.

      Overall, this article is extremely valuable in the context of the recent effort of the cell biology and biophysics communities to understand the fundamental question of what dictates the size of cells and organelles. I have a few concerns detailed below. Importantly, there are many very interesting points of the article that I am not discussing below, simply because I completely agree with them.

      1) The main concern is about the assumption made by the authors that the small osmolytes do not count to establish the volume of the nucleus. It was shown that small osmolytes such as ions are a vast majority of the osmolytes in a cell (more than ten times more abundant than proteins for example, which represent about 10 mM, for a total of 500 mM of osmolytes). This means that just a small imbalance in the amount of these between the nucleus and cytoplasm might have a much larger effect than the number of proteins, which is the osmolyte that authors choose to consider for the nuclear volume.

      The point of the authors to disregard small osmolytes is that they can freely diffuse between the cytoplasm and the nucleus through the nuclear pores. They thus consider that the nuclear volume is established thanks to the barrier function of the nuclear envelope, which would retain larger osmolytes inside the nucleus and that the rest is balanced. This reasoning is not correct: for example, the volume of charged polymers depends on the concentration of ions in the polymer while there is no membrane at all to retain them. This is because of an important principle that the authors do not include in their reasoning, which is electro-neutrality.

      Because most large molecules in the cell are charged (proteins and also DNA for the nucleus), the number of counterions is large, and is probably much larger than the number of proteins. So it is hard to argue that this could be ignored in the number of osmotically active molecules in the nucleus. This is known as the Donnan equilibrium and the question is thus whether this is actually the principle which dictates the nuclear volume.

      The question then becomes whether the number of counterions differs between the cytoplasm and the nucleus, and more precisely whether the difference is larger than the difference considered by the authors in the number of proteins.

      How is it possible to estimate this number? One of the numbers found in the literature is the electric potential across the nuclear envelope (Mazzanti et al., Physiological Reviews 2001). The number is between 1 and 10 mV, with more cations in the nucleus than in the cytoplasm. This number could correspond to many more cations than the number of proteins, although the precise number is not so simple to compute and the precision of the measure matters a lot, since there is an exponential relation between the concentrations and the potential.
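
      To give a feel for the exponential relation mentioned above, the following sketch evaluates the Boltzmann (Nernst) factor for a monovalent cation over the 1 to 10 mV range. The temperature (298 K), the sign convention (nucleus negative relative to cytoplasm), and ideal-solution behavior are assumptions of this illustration; it is not a calculation from the manuscript, and the name `cation_ratio` is hypothetical.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

def cation_ratio(potential_mv: float, z: int = 1, temp_k: float = 298.0) -> float:
    """Nuclear/cytoplasmic concentration ratio for an ion of valence z.

    potential_mv is how many millivolts the nucleus sits below the
    cytoplasm (assumed sign convention), at equilibrium in an ideal solution.
    """
    thermal_mv = K_B * temp_k / E_CHARGE * 1e3  # thermal voltage, about 25.7 mV at 298 K
    return math.exp(z * potential_mv / thermal_mv)

for mv in (1.0, 5.0, 10.0):
    print(f"{mv:4.1f} mV -> nuclear/cytoplasmic cation ratio {cation_ratio(mv):.3f}")
```

      Under these assumptions the enrichment ranges from about 4% at 1 mV to nearly 50% at 10 mV. Applied to a pool of small ions that is far larger than the protein pool, even the lower end could represent a substantial number of osmolytes, which is why the precision of the potential measurement matters.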

      This point above is simply made to explain that the authors cannot rule out the contribution of small osmolytes to the nuclear volume and should at least leave this possibility open in the discussion of their article.

      As a conclusion, I totally agree with equation 3 which defines the N/C ratio, but I think that the Ns considered might not be the number of large macromolecules which cannot pass the nuclear envelope, but rather the small ones. Whether it is the case or not and what is actually the important species to consider depends on the actual numbers and these numbers are not established in this article. It is likely out of the scope of the article to establish them, but the point should at least be discussed and left open for future studies.

      We appreciate these excellent points made by the reviewer and their numerous consultants. We amend the discussion of colloid osmotic pressure in the text to reflect these points.

      2) The authors refer to the notion of colloidal pressure, discussed in the review by Mitchison et al. This term could be confusing and the authors should either explain it better or just not use it and call it perfect gas pressure or Van't Hoff pressure. Indeed, what is meant by colloidal pressure is simply the notion that all molecules could be considered as individual objects, independently of their size, and that it is then possible to apply the Van't Hoff Law just as if it were a perfect gas, hence the notion of 'colloidal' pressure, which would be the osmotic pressure of all the individual molecules. The authors might want to discuss, or at least mention, that it is a bit surprising that all these crowded large macromolecules would behave like a perfect osmometer and that the Van't Hoff law applies to them. Alternatively, it could be simpler to consider that what actually counts for the volume is mostly small freely diffusing osmolytes, to which this law applies well, and which are much more numerous.

      3) Very small point: on page 7 the authors refer to BVH's Law (Nobel, 1969). It is not clear what they mean. If they refer to the Nobel prize of Van't Hoff, it dates from 1901 (he died in 1911) and not 1969. I am not sure if there is something in one of the Nobel prizes delivered in 1969 which relates to this law. I checked but it does not seem to be the case, so it is probably a mistake in the date.

      The citation is correct. It's a JTB paper by Park S. Nobel describing the BVH relation in biology.

      4) On page 11, bottom, the result of the maintenance of the N/C ratio in protoplast is presented as an additional result, while it is a simple consequence of the previous results: both the cell and nuclear volume change linearly with the external osmotic pressure, so it is obvious that their ratio does not change when the external pressure is changed.

      This result was not trivial. Although both cell and nuclear volumes change linearly with the inverse of the external osmotic concentration in protoplasts, it was not obvious whether the two volumes change in the same proportion (i.e., with the same normalized slope on the BVH graph).
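
      The point can be illustrated with a small numerical sketch of the Boyle-van 't Hoff relation, written here as V = b + (V_iso - b)(C_iso / C), where b is the osmotically inactive volume. All numbers, the function names `bvh_volume` and `nc_ratio`, and the choice to define the N/C ratio as nuclear volume over cell volume are assumptions for illustration, not values from the manuscript.

```python
def bvh_volume(v_iso: float, inactive_fraction: float, c_iso: float, c: float) -> float:
    """Boyle-van 't Hoff volume at external concentration c.

    inactive_fraction is the share of the isotonic volume v_iso that
    does not respond osmotically (the non-osmotic volume b / v_iso).
    """
    b = inactive_fraction * v_iso
    return b + (v_iso - b) * (c_iso / c)

def nc_ratio(c: float, nuc_inactive: float, cell_inactive: float,
             v_nuc_iso: float = 8.0, v_cell_iso: float = 100.0,
             c_iso: float = 1.0) -> float:
    """Nuclear volume divided by cell volume at external concentration c."""
    v_nuc = bvh_volume(v_nuc_iso, nuc_inactive, c_iso, c)
    v_cell = bvh_volume(v_cell_iso, cell_inactive, c_iso, c)
    return v_nuc / v_cell

# Concentrations are in units of the isotonic concentration (hypothetical).
for c in (0.5, 1.0, 2.0):
    same = nc_ratio(c, nuc_inactive=0.25, cell_inactive=0.25)
    diff = nc_ratio(c, nuc_inactive=0.50, cell_inactive=0.25)
    print(f"C = {c:.1f}: equal inactive fractions -> {same:.4f}, "
          f"unequal -> {diff:.4f}")
```

      Both volumes are linear in 1/C in every case, yet the ratio stays constant only when the nucleus and the cell have the same osmotically inactive fraction, which is why linearity alone does not guarantee a constant N/C ratio.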

      Another result, not commented by the authors, is that this should be true only in protoplasts, since in whole cells, the cell wall is affecting the response of the cell volume, but not the nucleus, so the ratio should change.

      In whole cells, the N/C ratio is in fact also maintained, consistent with the model. This result is now clarified in the manuscript (Figure 1C and D plus Figures 3D and S1C).

      5) The results in Figure 5, with the inhibition of export from the nucleus, are presented as supporting the model. It is not really clear that they do. First the effect is very small, even if very clear. Again, the numbers matter here, so the interpretation of this result is not really direct and more calculation should be made to understand whether it can really be explained by a change of number of proteins. The result in panel F is even more problematic. The authors try to argue that the nucleus transiently gets denser, based on the diffusion of the GEMs and then adapts its density. It rather seems that it is overall quite constant in density, while it is the cell which has a decreasing density, perhaps, as suggested by the authors, because there are fewer ribosomes in the cytoplasm, so protein production is reduced. This could have an indirect effect on the number of amino acids (which would then be less consumed). A recent article by Neurohr et al (Trends in cell biology, 2020) suggests that such an effect can lead to cell dilution, in yeast, because the number of amino acids increases. In this particular case, this increase would affect the nuclear volume rather than the cell volume because of the presence of the cell wall and the rather small change.

      We agree that there are different possible interpretations for these results. We have carefully reconsidered the interpretation and have rewritten the entire text for Figure 5.

      6) Page 16: it seems to me that the experiments presented in the chapter lines 360 to 376, on the ribosomal subunits, simply confirm that export is impaired, and they do not really contribute to confirm the hypothesis of the authors that it is the number of proteins in the nucleus which counts.

      We agree. We highlight the ribosomal subunit proteins as they are very abundant nuclear shuttling proteins that provide a good example for the dynamics of nuclear protein accumulation.

      The next paragraph with the estimation of the number of proteins in the nucleus and cytoplasm and how they change relatively upon export inhibition also appears to mostly demonstrate that export has been inhibited.

      The authors propose to use the number they find, 8%, to compare it to the change in the N/C ratio, which is of the same order. Given how small these numbers are, and the precision of such measures, it is very hard to believe that these 8% are really precise at a level which could allow such a comparison. The authors should really estimate the precision of their measures if they want to claim that. It is more likely that what they observe is a small but significant change in both cases; a small change means it is small compared to the total, so it is a fraction of it, and it is measurable, which means it is more than just a few percent, which is usually not possible to measure. So it means that it is in the order of 10%. This is the typical value of any small but measurable change given a method for the measure which can detect changes around 10%. In conclusion, these numbers might not prove anything.

      It could also be that the numbers match not just by chance, but that the osmolyte which matters is, for this type of experiment, changing in proportion to the amount of proteins (which would be possible for counter ions for example). But determining all that requires precise calculations and additional measures. It is thus more a matter of discussion and should be left more open by the authors.

      We agree that these measurements are not so precise. We have carefully reworded this section and removed these specific comparisons.

      Reviewer #2 (Public Review):

      The goal of the paper is to test the idea that colloidal osmotic pressure controls nuclear growth as suggested by Tim Mitchison in a recent review.

      In fleshing out the idea, Lemiere and colleagues develop a simple mathematical model that focuses on the forces generated by the movement of macromolecules across the nuclear-cytoplasmic boundary, ignoring any contribution of ions or small molecules which they assume equilibrate across the nuclear envelope. In testing this model, they focus their quantitative analysis on the response of cells that lack a wall (protoplasts) to osmotic shocks and to perturbations of nuclear export, protein synthesis and symmetric cell division. They also analyse the motion of small 40nm particles to test how diffusion is affected by these perturbations in both compartments.

      Their analysis leads them to make some important observations that suggest that the system is even simpler than they might have hoped, since under the conditions tested nuclei (which lack lamins) behave as ideal osmometers. That is, the nuclei and cytoplasm grow and shrink in concert following sudden osmotic shocks. This suggests that the tension in the nuclear envelope, which gives nuclei their spherical shape, plays no role in constraining nuclear size.

      While most of the paper's claims are well supported by their data under the assumptions of the model, there are a few claims that are less convincing.

      For example, while their data are consistent with the idea that cells regulate their nuclear/cytoplasmic size ratio using an adder type mechanism, in which a fixed ratio of nuclear and cytoplasmic proteins is synthesised per unit time as cells grow, this has not been rigorously put to the test. In addition, while the diffusion analysis is very interesting, it does not fully support the authors' simple model linking diffusion, molecular crowding and colloidal osmotic pressure, something that could be more thoroughly discussed in the manuscript.

      We added new data showing that slowing growth rate leads to a proportionate decrease in N/C ratio correction. This strengthens this portion of the paper.
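      The link between growth rate and N/C ratio correction can be sketched with a toy model. This is a minimal sketch under stated assumptions, not the model from the manuscript: cell volume grows linearly, nuclear volume grows by a fixed fraction of every increment of cell volume (standing in for a fixed share of newly synthesised protein going to the nucleus), and there is no term that senses the N/C ratio. The function name and all parameter values are hypothetical.

```python
def nc_ratio_after_growth(n0, v0, synthesis_fraction, growth_rate,
                          duration, dt=0.01):
    """Euler integration of passive nuclear growth with no feedback term.

    n0, v0             : starting nuclear and cell volumes (arbitrary units)
    synthesis_fraction : share of each cell-volume increment added to the nucleus
    growth_rate        : cell volume added per unit time (linear growth assumed)
    """
    n, v = n0, v0
    steps = int(round(duration / dt))
    for _ in range(steps):
        dv = growth_rate * dt
        n += synthesis_fraction * dv  # nucleus receives a fixed share of growth
        v += dv
    return n / v


# Start from an aberrantly high N/C ratio of 0.12; the synthesis fraction is 0.08.
start_ratio = 12.0 / 100.0
fast = nc_ratio_after_growth(12.0, 100.0, 0.08, growth_rate=1.0, duration=50.0)
slow = nc_ratio_after_growth(12.0, 100.0, 0.08, growth_rate=0.5, duration=50.0)
```

      In this toy model the ratio drifts toward the synthesis fraction purely because new material is added in a fixed proportion, and the slower-growing cell corrects less over the same time window.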

      We have added an improved discussion of the GEMs data and its limitations.

      Reviewer #3 (Public Review):

      This manuscript by Lemière and colleagues presents a view on how nuclear size is set by simple physical principles. The first part of the work describes a theoretical framework with the nucleus and the cell as two nested osmometers. Using fission yeast as a model, the authors then show that protoplasts and nuclei behave as ideal osmometers, i.e. show linear changes in volume upon change in external osmotic pressure. Consequently, the nuclear to cell volume ratio remains constant upon osmotic changes, but increases upon block of nuclear export, which leads to higher nuclear protein contents. Measurements of diffusion in the cytoplasm and nucleoplasm back these data. Finally, in the last part of the manuscript, the authors show that nuclear growth through a passive osmotic model can explain the previously described homeostasis of nuclear volume.

      The manuscript is clearly written, and the data are clean and overall solid. I very much liked the simple view on the phenomenon of constant nuclear to cytosol ratio and the mix of modelling and experiments supporting the model that nuclear size is set passively by osmotic principles.

      There are however a few points that are slightly at odds with the model and/or require further explanation to make the model compelling and discuss it in view of previous findings.

      1) Isn't the finding that diffusion rates are faster in the nucleus (line 298, Fig S4C), indicating lower crowding in the nucleus, at odds with the finding that the non-osmotic volumes are similar in the two compartments? If the nucleus is less crowded, does this not suggest a lower pressure than the cytosol? I would also like to see this finding appear in Figure 4, which only reports on the normalized diffusion rates in both nuclei and cytosol.

      We have added this figure to the main Figure 4, as requested. We agree that this raises some interesting questions. Our current interpretation is that the compositions of the nucleoplasm and cytoplasm are different and therefore affect GEMs diffusion and colloid osmotic pressure slightly differently.

      2) Similarly, I don't understand the observed change in diffusion rates of GEMs upon LMB treatment (Fig 5F). If the nucleus behaves as an ideal osmometer, then any change in protein density between the nucleus and the cytosol, leading to change in osmotic pressure, will lead to a change in nuclear size that should re-equilibrate the osmotic pressures between the two compartments. The prediction would thus be that, if LMB treatment does not change overall protein concentration, at equilibrium there is no change in either osmotic pressure or density as measured by GEM diffusion rates. This is indeed illustrated by the constant normalized non-osmotic volume of the nucleus after LMB treatment. Is the change in diffusion rates perhaps only transient until a new steady state is reached? Or is there a change upon total protein content in the cell after LMB treatment?

      3) In the experiments labelling proteins with FITC, are the reported values really those of protein concentrations or rather protein amounts? Isn't the enlargement of the nucleus upon LMB treatment compensating for this increase in amounts, returning the nucleus to a similar concentration as before treatment? A change in concentration is not in agreement with the reported constant non-osmotic volume of the nucleus.

      These measurements of intensity are of concentrations. We add in the text this prediction that changes in concentration will be compensated for by swelling in nuclear volume and now interpret the data in light of this prediction. We add new data that total FITC staining for protein and RNA shows no change in concentration in compartments, consistent with this model.

      4) The authors state that "a previous paper proposed a model for N/C ratio homeostasis based upon an active feedback mechanism (Cantwell and Nurse, 2019)" (lines 471-472). My understanding of this previous study is that nuclear size was proposed to be set by a limiting component, itself proportional to cell volume. No feedback was postulated. This previous model is in fact not too different from what the authors propose here, with the previously proposed limiting component now corresponding to the nuclear macromolecules that produce colloid osmotic pressure and thus set nuclear size. Though the present study goes significantly further in presenting the passive role of osmosis in setting nuclear size, it is a misrepresentation to portray this previous model as fundamentally different. Furthermore, it is not clear whether the new osmotic pressure-based model produces a better fit than the previous 'limiting component model'. Figure 7E here is very similar to Fig 4I in Cantwell and Nurse 2019, but it is difficult to judge the similarity of the fits.

      The Cantwell and Nurse paper tested two models. The first was based upon nuclear growth being a fraction of cell growth. This model is qualitatively similar to ours. However, they discarded this initial model because it fitted poorly with their data. They then went on to propose a second model, which contains a critical equation in which nuclear growth rate is a function of the N/C ratio, i.e. the system is sensing the N/C ratio and adjusting nuclear growth rate as a function of the N/C ratio. In other words, this is a feedback mechanism. The Cantwell paper does not describe this "feedback" term explicitly in the text, but it is clearly present in the equations. Therefore, our model, which lacks any feedback term, is fundamentally different from the Cantwell limiting component model.

      We show that our model fits our data much better than the Cantwell model. We believe that the different views in these studies arise from differences in the experimental data. These differences may arise from two technical differences: 1) Their use of binning could be responsible for flattening the nuclear growth rate as a function of the nuclear volume at start. 2) Their estimates of cell and nuclear volumes using a 2D image and geometric assumptions may be less accurate than our automated 3D volume method.

      5) If nuclear size is set purely by osmotic regulation, how do you explain that mutants in membrane regulation (such as nem1 and spo7, see Kume et al 2017; or lem2, see Kume et al 2019) previously shown to have an enlarged nucleus, display increased nuclear size?

      This is an interesting question that we are currently pursuing. It is likely that these mutants affect multiple processes besides nuclear envelope expansion. For example, at least some of these mutants have altered chromatin organization, which could cause an increase in colloid osmotic pressure. There may also be significant defects in chromosome segregation, which lead to the production of different-sized nuclei with abnormal numbers of chromosomes. Some of the N/C ratio defects reported in these papers may arise from their 2D measurement methods, which are not accurate for misshapen nuclei. In our preliminary results, lem2 mutants do not have N/C ratio defects.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented in the first part of the study are convincing. However, it is unclear whether each step of cell elongation and alignment, cell migration, cell dedifferentiation and regenerative response, is required for fin regeneration following amputation. As indicated in the discussion, the authors cannot provide evidence for the requirement of migration or dedifferentiation for the overall success of fin regeneration. Such limitations should be more clearly stated.

      We have modified the title and abstract to avoid overstating the requirement of the particular responses to successful regeneration. Furthermore, we have stated the limitations of our study more clearly in the discussion.

      We have removed the word “requires” from the title, it now reads: Zebrafish fin regeneration involves generic and regeneration-specific osteoblast injury responses

      In the discussion we state the limitations on page 21 as follows:

      “Unfortunately, currently existing tools to block dedifferentiation are either mosaic (activation of NF-κB signalling using the Cre-lox system) or cannot be targeted to osteoblasts alone (treatment with retinoic acid). Due to these limitations in our assays, we can currently not test what consequences specific, unmitigated perturbation of osteoblast dedifferentiation has for overall fin / bone regeneration. Conversely, the interventions presented here that specifically perturb osteoblast migration are limited as they act only transiently, that is they can severely delay, but not fully block migration. Furthermore, while interference with actomyosin dynamics reduces regenerative growth, we cannot distinguish whether this is caused by the inhibition of osteoblast migration or due to other more direct effects on cell proliferation and tissue growth. Thus, an unequivocal test of the importance of osteoblast migration for bone regeneration requires different tools.”

      In the second part of the study, the term trauma needs to be clarified or reconsidered. A trauma model would imply that healing is impaired. Evidence for a non-healing phenotype is lacking and is expected in support of a trauma model.

      We apologize if our use of the term trauma has caused confusion. We have simply used it interchangeably with “injury”. We have now removed all references to “trauma” in the text.

      The authors describe the process of fin regeneration that may share common features with bone regeneration in other species. In the absence of direct evidence of common mechanisms between fin regeneration and bone regeneration in other systems, the authors should remain focused on "fin regeneration" in their conclusions rather than referring to "bone regeneration" and "bone formation" in more general terms.

      We have rephrased the conclusion to have it more centred on bone regeneration in the fin. The relevant parts of the discussion now read on page 25 as follows:

      In conclusion, our findings support a model in which zebrafish fin bone regeneration involves both generic and regeneration-specific injury responses of osteoblasts. Morphology changes and directed migration towards the injury site as well as dedifferentiation represent generic responses that occur at all injuries even if they are not followed by regenerative bone formation. While migration and dedifferentiation can be uncoupled and are (at least partially) independently regulated, they appear to be triggered by signals that emanate from all bone injuries. In contrast, migration off the bone matrix into the bone defect, formation of a population of (pre-) osteoblasts and regenerative bone formation represent regeneration-specific responses that require additional signals that are only present at distal-facing injuries. The identification of molecular determinants of the generic vs regenerative responses will be an interesting avenue for future research.

      Reviewer #2 (Public Review):

      The study by Sehring et al. depends on an extensive and thoroughly acquired collection of data points in combination with a robust and rigorous statistical analysis. I see that the authors have spent a lot of effort into this and I am overwhelmed by the number of analyzed data points that again depend on careful measurements at the cellular level in a more or less intact tissue. However, since just a fraction of cells has been chosen to be incorporated into the statistical analysis, there is a certain risk of a biased selection. I think the reader of the paper would appreciate a somewhat clearer picture of how the authors get to their final numbers, starting from the original image data. This appears of particular importance when it comes to determining the elongation of cells and the angular deviations from the proximo-distal axis. In many cases (e.g. Fig.2 A, B, D and E), the reader has to take those numbers without seeing any primary image data. A practicable solution to that issue would be to complement the accompanying Excel sheets of raw data with corresponding image material. This should show an overview of a representative sample for the dedicated experiment, together with some appropriate magnifications of analyzed cells including the axes along which those measurements have been performed. Also, it would be important to state within the methods section of the paper whether the measurements have been done manually using Fiji or whether a certain automated Fiji plug-in has been used for this part of the analysis.

      Osteoblasts line the bony hemirays on the inner and outer surface (see Figure 1A), and for quantifications of osteoblast morphology, we analysed the osteoblasts of the outer layer of one hemiray (the hemiray facing the objective in whole mount imaging). While we have no direct evidence for this, we think it is reasonable to assume that osteoblasts in the other “sister” hemiray behave the same, and we have anecdotal evidence that osteoblasts on the inner surface of the hemirays also migrate and dedifferentiate. Thus, we don’t think that restriction of the analysis to one hemiray and the outer surface introduces bias.

      For measurement of morphology, we used a transgenic line expressing a fluorescent protein (FP) in osteoblasts in combination with Zns5 antibody labelling. Zns5 is a pan-osteoblastic marker which localizes to the cell membrane. Therefore, combination of a cytosolic FP labelling with the membrane labelling by Zns5 provides solid definition of single cell outlines. For general morphology studies and drug intervention studies, we used bglap:GFP transgenics. In the transgenic intervention studies (manipulation of NF-κB signalling), mCherry is expressed together with CreERT2 under the osterix promoter and used as cytosolic labelling of osteoblasts. Our analyses are always based on segments, e.g. we present data for segments 0, -1, -2. Within these segments all FP+ Zns5+ cells were included into the analysis, and cells along the whole proximodistal axis of a segment were measured. Measurements were performed manually, and the analyst was blinded. With these set-ups, not only a fraction but all FP+ Zns5+ osteoblasts present in those segments that we analysed were included into the analysis, and thus no selection was necessary that could have introduced bias. As suggested by Reviewer #2, we have added representative sample images to the accompanying Excel sheets of raw data for the dedicated experiments. Within these, the axes along which the measurements have been performed are indicated.

      We have expanded the description of the analysis in the method section. It now reads on page 36 as follows:

      “To quantify osteoblast cell shape and orientation, the transgenic line bglap:GFP in combination with Zns5 AB labelling was used. Osteoblasts of the outer layer of one hemiray (facing the objective in whole fin mounting) were imaged and analysed. As Zns5 localizes to the plasma membrane of all osteoblasts, the combination of both markers provides solid definition of single cell outlines. All GFP+ Zns5+ cells with such a defined outline within an analysed segment were included into the analysis, and cells along the whole proximodistal axis of a segment were measured. In the transgenic intervention studies, mCherry is expressed under the osx promoter and was used as cytosolic labelling of osteoblasts. Using Fiji (Schindelin et al., 2012), the longest axis of a FP+ Zns5+ cell was measured as maximum length, the short axis as maximum width, and the ratio calculated. Simultaneously, the angle of the maximum length towards the proximodistal ray axis was measured for angular deviation. All measurements were performed manually, with the analyst being blinded.”
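      The measurements described in this passage reduce to simple geometry. The following is a minimal sketch, assuming each cell is recorded as the two endpoints of its manually drawn long axis plus a maximum width, and that the proximodistal ray axis is supplied as a direction vector. The function name, argument names and example coordinates are hypothetical, and taking the ratio as width divided by length is an assumption about which way the ratio was computed.

```python
import math


def cell_shape_metrics(long_axis_start, long_axis_end, max_width,
                       ray_axis=(1.0, 0.0)):
    """Return length, width/length ratio and angular deviation in degrees.

    long_axis_start, long_axis_end : (x, y) endpoints of the longest cell axis
    max_width                      : maximum width of the cell, same units
    ray_axis                       : direction of the proximodistal ray axis
    """
    dx = long_axis_end[0] - long_axis_start[0]
    dy = long_axis_end[1] - long_axis_start[1]
    length = math.hypot(dx, dy)

    # An axis has no head or tail, so the absolute value folds the
    # angle into the 0-90 degree range.
    dot = abs(dx * ray_axis[0] + dy * ray_axis[1])
    cos_angle = dot / (length * math.hypot(ray_axis[0], ray_axis[1]))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))

    return {
        "length": length,
        "width_to_length": max_width / length,
        "angular_deviation_deg": angle,
    }


# Made-up example: a cell lying diagonally to a horizontal ray axis.
metrics = cell_shape_metrics((0.0, 0.0), (10.0, 10.0), max_width=5.0)
```

      A ratio close to 1 would correspond to a roundish cell and a small ratio to an elongated one; an angular deviation near 0 degrees would indicate alignment with the ray axis.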

      Along the same line, it would strengthen the statement provided by the statistical diagram in Fig.3A if the authors could show images of cells from segment -1 and -2 for all three experimental conditions. In particular, since the depicted segment -1 osteoblasts look rather roundish than elongated (compare with Fig.1 C and D, images and width/length ratio).

      As suggested by the reviewer, we have added representative sample images of cells in segment -1 to the figure; the images that were already there in the previous version of the figure were from segment -2 (new data in Figure 4A). As can be seen from the graphs, there is a certain range of morphology within each segment / assay with an obvious overlap between the segments. This can make it difficult to recognize the difference between the segments by looking at the images alone, and we have therefore added arrowheads to highlight examples of roundish and elongated cells. Yet as mentioned above, all cells were included into the analysis.

      In regards to the biology itself, Sehring and colleagues claim that the complement system is required for injury-induced directed osteoblast migration. To strengthen this point it would be beneficial if the authors could show that the central complement components C3 and C5 are indeed expressed at the amputation site where the dedifferentiated pre-osteoblasts migrate to. It would be interesting to learn about the localization of C3 and C5 expression in the conventional amputation as well as the double-injury condition. Apparently, the RNAscope-based in situ hybridization seems to work quite well in the Weidinger lab.

      Complement precursor proteins are thought to be mainly expressed in the liver and distributed throughout the body via the circulation. Injury would then result in local production of the activated C3a and C5a peptides via a cascade of proteolytic processing. Unfortunately, we lack the tools to detect the C3 and C5 precursor proteins or the mature cleavage products of the complement factors, which mediate the biological function of the cascade (e.g. antibodies against the zebrafish proteins / peptides). We have also attempted RNAScope for c5a and c3a.1 in fins, but these did not produce any specific staining; the results of these experiments therefore remained inconclusive, and we have not included them in the manuscript.

      However, we analysed expression of the RNA coding for the precursors of the complement factors c5 and the six zebrafish paralogs of c3 using qRT-PCR on liver, non-injured fins and fins at 6 hpa (samples derived from segment -1 plus segment 0). These new data can be found in Figure 5B. Compared to the expression levels in the liver, expression in non-injured fins could hardly be detected. Interestingly, c5 and c3a.5 levels were upregulated in injured fins, but compared to the expression in the liver still only slightly, e.g. c5 is about 17 Ct values (2 to the power of 17 = 130000 times) more highly expressed in the liver than in the injured fin. These results are consistent with the idea that the majority of complement factors that are activated after injury is derived from precursors that are expressed in the liver and are distributed via the circulation to the fin, as is considered standard for the complement system. Interestingly, however, local production might contribute as well.
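      The conversion from a Ct difference to a fold difference quoted above can be written out explicitly. This is a minimal sketch, assuming perfect doubling of product per PCR cycle (amplification efficiency of 2) and identical normalization of both samples; the function name is hypothetical.

```python
def fold_difference(delta_ct, efficiency=2.0):
    """Fold difference in template abundance implied by a Ct difference.

    Assumes the amount of product is multiplied by `efficiency` each cycle,
    so a sample that crosses threshold delta_ct cycles earlier started with
    efficiency ** delta_ct times more template.
    """
    return efficiency ** delta_ct


liver_vs_injured_fin = fold_difference(17)
```

      With a difference of 17 cycles this gives 2^17 = 131072, which matches the rounded value of about 130000 quoted in the response. A lower real-world efficiency would reduce the estimate.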

      Overall our new data support our conclusion that the complement system is an important regulator of osteoblast migration in vivo, since the receptors are present in osteoblasts (see also response to the next issue), while systemic and local expression can provide the precursors for injury-induced production of the activated factors that might act as guidance cues.

      To judge whether this osteoblast's migratory response is cell-type specific and cell-autonomous it would be good to know if c5ar1 and c3ar are solely expressed in osteoblasts, or rather broadly within tissue lining the hemirays.

      While we had already shown that c5aR1 is expressed in osteoblasts, we have now added additional RNAscope in situ analysis for c5aR1 showing that the receptor is also expressed in other cell types (new data in Figure 5 – figure supplement 1A). We have also attempted RNAScope for c3aR in fins, which, however, did not produce specific staining and thus remained inconclusive; we have not added these data to the manuscript. However, we established fluorescence-activated cell sorting from bglap:GFP transgenic fins, which gives us an additional tool to analyse to which extent expression is specific to osteoblasts. By qRT-PCR analysis we found that c5aR1 and c3aR are expressed in both GFP+ osteoblasts and other cells that are GFP– (these will mainly represent epidermis and fibroblasts, to a lesser extent endothelial and other cell types). These new data can be found in Figure 5 – figure supplement 1B.

      While our qRT-PCR data and the c5aR1 RNAScope results show that the complement receptors are not specifically expressed in osteoblasts, we do not consider this result to be in conflict with our model that the complement system regulates osteoblast migration. Other cell types migrate after fin amputation as well, which is best described for epidermal cells (Chen et al., Dev Cell 2016, 10.1016/j.devcel.2016.02.017), but likely also occurs for fibroblasts (Poleo et al., DevDyn 2001, doi: 10.1002/dvdy.1152), and it is conceivable that the complement system plays a role in regulating these events as well.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) The major conclusions on osteoblast dedifferentiation and migration are solely based on a bglap:GFP strain, which does not allow a pulse-chase approach in injury responses. Specificity of this strain to osteoblasts is also doubtful because as many as 20% of GFP+ cells are in proliferation. Specificity of bglap:GFP to mature osteoblasts is a major concern. Important caveats associated with this reporter strain are not carefully considered.

      To address these comments, we have performed several additional experiments as described below. In addition, we would like to refer the reviewer to our previous papers, where we have analysed the process of osteoblast dedifferentiation (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016). Using transgenic reporters and immunofluorescence we have shown in these previous papers that osteoblasts in the non-injured fin express Bglap but not the pre-osteoblast marker Runx2 (and are thus by our definition differentiated). We apologize if we failed to explain the logic of our approach in this manuscript; we have restructured the results to clarify this, as indicated below.

      We have also performed the following additional experiments.

      1) To confirm the specificity of the bglap:GFP line for mature osteoblasts, we have performed three experiments:

      a) immunofluorescence against Runx2 on 7 dpa regenerates, at a stage where blastema proliferation at the distal tip of the regenerate produces new osteoblast progenitors, while in more proximal (older) regions osteoblasts have already started to differentiate and new bone matrix has formed. We found that Runx2 is expressed in distal regions in pre-osteoblasts, while bglap:GFP is only expressed in proximal regions in osteoblasts which do not express Runx2. Thus, during the formation of new bony segments in regenerative growth, bglap:GFP is activated in mature osteoblasts, and the labelled population does not include osteoblast precursor cells. These new data are found in Figure 2 – figure supplement 2B.

      b) we have refined and expanded our methods and are now able to determine the expression patterns of markers of the osteoblast differentiation status with single cell resolution using RNAScope in situ hybridization. Using this, we can now show that at 1 day post amputation, in segment -2 of the fin stump, which represents a segment equivalent to the non-injured state, since no dedifferentiation occurs here, bglap:GFP+ cells do not express endogenous runx2a. These new data are found in Figure 1 – figure supplement 1A.

      c) Using RNAScope, we can show that cyp26b1, a gene associated with dedifferentiated osteoblasts, is likewise not detected in bglap:GFP+ cells in segment -2 at 1 dpa (new data in Figure 1 – figure supplement 1B).

      Together, these data confirm that the bglap:GFP line is specific for differentiated osteoblasts, and does not label osteoblast progenitors. See the response to issue 2 below for how we describe these new data in the revised version of the manuscript.

      2) Regarding the proliferation of bglap:GFP osteoblasts: In the experiment the reviewer refers to (now Figure 5 – figure supplement 3A), we make use of the persistence of the GFP protein in the bglap:GFP line to detect dedifferentiated osteoblasts. Thus, at the time of analysis, when these GFP+ cells proliferate, they are not differentiated anymore. We can show this as follows:

      Although bglap expression is downregulated during osteoblast dedifferentiation and thus also GFP levels eventually drop in the transgenic line, we can nevertheless use this line to trace osteoblasts, since GFP protein persists for up to three days in cells that shut down endogenous bglap and also bglap:GFP transgene transcription. While we have already shown this previously (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016), we have now also used RNAScope to confirm this. We analysed the expression of GFP on protein and RNA level in the bglap:GFP line. In bglap:GFP fish, in a mature segment in non-injured fins the regions close to the joints are devoid of cells expressing GFP (Figure 1G). Yet after amputation, we observe GFP+ cells in this distal part of segment -1 (Figure 1G, D). RNAscope in situ shows that these GFP+ cells are negative for gfp RNA (new data in Figure 1D). Thus, the observed fluorescence is due to the persistence of the GFP protein and not due to a potential upregulation of the transgene (Figure 1E).

      Importantly, we have now also added data describing the proliferative state of bglap:GFP+ osteoblasts. First, in the non-injured fin, bglap:GFP+ cells are non-proliferative (new data in Figure 5 – figure supplement 2B). After amputation, proliferation can be detected in GFP+ cells at 2 dpa (Figure 5 – figure supplement 2B), and proliferation is restricted to segment -1 and segment 0 (new data in Figure 5 – figure supplement 2C). As we show in Figure 1B, at 2 dpa, dedifferentiation as defined by bglap downregulation is not complete in segment -1; rather, a mixture of cells with different bglap levels is found here. We have thus combined EdU labelling with RNAscope against bglap in segment -1 to analyse to which extent bglap and EdU anticorrelate. These data show that EdU is hardly ever incorporated into cells expressing high levels of bglap, while the majority of the proliferating osteoblasts are dedifferentiated, as they express only low levels of bglap (new data in Figure 5 – figure supplement 2D). Together, these data show that mature osteoblasts are non-proliferative, and upon amputation, when they are dedifferentiated, they become proliferative. Thus, the absence of proliferation in bglap:GFP+ cells in the non-injured fin adds to the evidence that this line is specific for mature osteoblasts, but due to the persistence of the GFP protein it can be used to analyse dedifferentiated osteoblasts.

      These data are described on page 14 of the manuscript as follows:

      “In the non-injured fin, bglap:GFP+ osteoblasts are non-proliferative, but upon amputation osteoblasts proliferate at 2 dpa (Figure 5 – figure supplement 2A, B). Proliferation is restricted to segment -1 and segment 0 (Figure 5 – figure supplement 2C), and RNAscope in situ analysis of bglap expression revealed that the majority of EdU+ osteoblasts have strongly downregulated bglap (Figure 5 – figure supplement 2D). Inhibition of C5aR1 with PMX205 had no effect on osteoblast proliferation in segment -1 at 2 dpa (Figure 5 – figure supplement 3A). Furthermore, upregulation of Runx2 was not changed by PMX205 treatment (Figure 5 – figure supplement 3B), and regenerative growth was not affected in fish treated with either W54011, PMX205 or SB290157 (Figure 5 – figure supplement 3C). We conclude that the complement system specifically regulates injury-induced osteoblast migration, but not osteoblast dedifferentiation or proliferation in zebrafish.”

      3) To support our conclusion that osteoblasts migrate, we performed time-lapse imaging using a transgenic line expressing the photoconvertible protein kaede in osteoblasts (entpd5:kaede). Local photoconversion of only the proximal half of a segment allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F and they are described on page 7 of the revised manuscript as follows: To trace osteoblasts, we used the transgenic line entpd5:kaede (Geurtzen et al., 2014), in which Kaede fluorescence can be converted from green to red by UV light (Ando et al., 2002). We photoconverted osteoblasts in the proximal half of segment -1, while osteoblasts in the distal half remained green (Fig. 1F). At 1 dpa, red osteoblasts were found in the distal half (Fig. 1F), showing that photoconverted osteoblasts had relocated distally.

      2) The authors poorly define dedifferentiation. They use reduced bglap:GFP or bglap mRNA expression as a sole criterion for dedifferentiation. The authors state that NF-kB and retinoic acid can inhibit osteoblast dedifferentiation. However, this simply reflects the well-described fact that these signals promote osteoblast differentiation.

      We define dedifferentiation as the reversion of a mature cell into an undifferentiated progenitor-like status. This involves the following characteristics: 1) the expression of markers of the differentiated state is downregulated; 2) early lineage markers are re-expressed; 3) the cells become proliferative; and 4) they have the ability to re-differentiate into mature cells. Based on this definition, the downregulation of an osteoblast-specific marker can be used as a read-out for osteoblast dedifferentiation. Bglap is an established marker for mature osteoblasts (Kaneto et al., 2016 doi.org/10.1186/s12881-016-0301-7; Yoshioka et al., 2021 doi: 10.1002/jbm4.10496; Kannan et al., 2020 doi: 10.1242/bio.053280; Sojan et al., 2022 doi.org/10.3389/fnut.2022.868805; Valenti et al., 2020 doi.org/10.3390/cells9081911). While we use downregulation of bglap expression as our main read-out for osteoblast dedifferentiation in our experimental interventions (actomyosin inhibition, retinoic acid treatment, complement inhibition), we have expanded our methods to characterize osteoblast dedifferentiation, and have re-arranged our manuscript to show these data in the beginning of the results.

      Already in the previous version of the manuscript we have shown that endogenous bglap is strongly expressed in segment -2 (the segment that does not respond to fin amputation and thus represents the non-injured state), while it is downregulated in a graded manner in segment -1 and segment 0 (the segments where dedifferentiation happens). We have now moved this data to the re-designed Figure 1B. In addition to bglap, we can now show that entpd5, a gene required for bone mineralization, is strongly expressed in osteoblasts of segment -2, while it is massively downregulated in segment -1 and segment 0. These new data can be found in Figure 1C. Thus, entpd5 is another differentiation marker whose loss characterizes osteoblast dedifferentiation. Importantly, we can confirm by RNAScope that the pre-osteoblast marker runx2a is absent in mature segments but is upregulated in segment 0 and segment -1 at 1 dpa (new data in Figure 1 – figure supplement 1A). Similarly, cyp26b1, an enzyme shown to regulate dedifferentiation, is upregulated in segment 0 and segment -1, but not expressed in segment -2 (new data in Figure 1 – figure supplement 1B). Furthermore, we have repeated all experiments where we have previously quantified dedifferentiation upon experimental interventions using downregulation of bglap:GFP (actomyosin inhibition, retinoic acid treatment, complement inhibition). We now can fully confirm the previous conclusions using the more rigorous quantification of dedifferentiation using RNAScope analysis of endogenous bglap levels. We have replaced all bglap:GFP data with the new bglap RNAScope data. These new data are found in Figure 3F, Figure 3 – figure supplement 1A, Figure 4B and Figure 5F.

      Overall, we support our conclusion that osteoblasts dedifferentiate by the loss of the two differentiation markers bglap and entpd5, the upregulation of the pre-osteoblast marker runx2a and the dedifferentiation-associated gene cyp26b1, and the fact that osteoblasts become proliferative. We hope that the reviewer considers this sufficient evidence.

      In mammals, the available literature relatively convincingly concludes that NF-kB signaling negatively regulates osteoblast differentiation (Yao et al., 2014, doi: 10.1002/jbmr.2108; Swarnkar et al., 2014 doi.org/10.1371/journal.pone.0091421, Chang et al., 2009, doi.org/10.1038/nm.1954). Yet in zebrafish osteoblasts, we have previously shown that NF-kB signaling is active in mature osteoblasts and needs to be downregulated for dedifferentiation to occur (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Importantly, in our previous work we showed that at least during fin regeneration, NF-kB signalling is not involved in osteoblast differentiation (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Specifically, osteoblasts in which Nf-kappaB signaling is enhanced or inhibited differentiate completely normally during the later stages of fin regeneration in the fin regenerate. Hence, our findings with the Nf-kappaB intervention studies done in this manuscript, where we look at osteoblasts in the stump within 1 dpa, cannot be explained by them affecting osteoblast differentiation.

      For retinoic acid signalling, multiple roles in bone development and repair have been described in mammals. For zebrafish osteoblasts, it was shown that during the outgrowth phase of bone regeneration, retinoic acid negatively regulates osteoblast differentiation in the blastema (Blum & Begemann, 2015, 10.1242/dev.120204). Yet importantly, it also negatively controls the dedifferentiation of osteoblasts in the stump right after amputation (Blum & Begemann, 2015, 10.1242/dev.120204). Thus, the effects we observe at the early timepoints we analyse in our intervention studies (retinoic acid treatment) are due to the effect on osteoblast dedifferentiation.

      We have added a short definition of dedifferentiation to the results section (page 6). There it reads as follows:

      “We have previously shown that osteoblasts dedifferentiate in response to fin amputation, that is they revert from a mature, non-proliferative state into an undifferentiated progenitor-like state, which includes loss of bglap expression and upregulation of the pre-osteoblast marker runx2 (Knopf et al., 2011; Geurtzen et al., 2014).”

      In addition, we have restructured the results to describe our use of tools and the new data on page 6 of the revised manuscript as follows:

      Using RNAScope in situ hybridization, we can now show that downregulation of bglap occurs in a graded manner and that entpd5 expression is similarly downregulated during dedifferentiation (Figure 1B, C). At 1 day post amputation (1 dpa), expression of entpd5 and bglap remains high in segment -2, but gradually decreases towards the amputation plane and is almost entirely absent from segment 0, with entpd5 downregulation being more pronounced (Figure 1B, C). While RNA expression of these genes is downregulated within hours after injury, GFP or Kaede fluorescent proteins (FPs) expressed in bglap or entpd5 reporter transgenic lines persist for up to three days, even though transgene transcription is shut down rapidly as well (Knopf et al., 2011). We can confirm these earlier findings using the more sensitive RNAScope in situs. In bglap:GFP transgenics at 2 dpa, gfp RNA and GFP protein colocalized to the same cells in segment -2, where osteoblasts do not dedifferentiate (Fig. 1D). In contrast, in the distal segment -1 GFP protein was present, but barely any gfp transcript could be detected (Fig. 1D). Thus, persistence of FPs in reporter lines can be used for short-term tracing of dedifferentiated osteoblasts (Fig. 1E). At 1 dpa, bglap:GFP+ cells upregulated expression of the pre-osteoblast marker runx2a and of cyp26b1, an enzyme involved in retinoic acid signalling (Blum and Begemann, 2015), which regulates dedifferentiation (Figure 1 – figure supplement 1A, B). Both markers were exclusively upregulated in segment -1 and segment 0 at 1 dpa, but were absent in segment -2. Together, these data show that osteoblasts in segment -1 and segment 0 lose expression of mature markers and gain expression of dedifferentiation markers.

      3) The authors do not rigorously demonstrate that mature osteoblasts indeed migrate. What they showed in this study is simply cell shape changes.

      We have the following evidence for osteoblast migration:

      1) bglap:GFP+ cells relocate from the centre of segments towards the amputation plane (after fin amputations) or towards both injuries in the hemiray model. In this revised manuscript we show that transgene expression is not upregulated in these regions, but that GFP fluorescence there must be due to relocation of cells in which GFP protein persists (new data in Figure 1D, E; see also response to “Weaknesses, issue 1” above)

      2) Using the entpd5:kaede transgenic line, which is expressed in mature osteoblasts throughout segments, we have photoconverted only the proximal half of a segment, which allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F.

      3) Already in the previous version of the manuscript, we have performed live imaging to track single cell behaviour. Using double transgenic fish expressing both GFP and kaede in osteoblasts, we deliberately only partly converted kaedeGreen to kaedeRed, which resulted in different hues for each osteoblast. This distinct colouring facilitates observing single cells. Video 1 shows the directed movement of cell bodies relative to their surroundings within 2 hours (see also Figure 2 – figure supplement 1A).

      4) Osteoblasts display the typical cell shape changes associated with active migration (elongation along the axis of migration, extension of dynamic protrusions), data in Figure 2.

      Together, we think these are convincing data supporting the conclusion that osteoblasts actively migrate.

      4) The hemiray removal model is highly innovative, but this part of the study is not very well connected to the rest of the study.

      We have rephrased the first sentence of the hemiray paragraph to make the connection more perceptible. It now reads as follows:

      In response to fin amputation, all osteoblast injury responses are directed towards the amputation plane, that is, dedifferentiation is more pronounced distally, osteoblasts migrate distalwards and the proliferative pre-osteoblast population forms distal to the amputation plane. We wondered how osteoblasts respond to injuries that occur proximal to their location. To test this, we established a fin ray injury model featuring internal bone defects.

    1. Author Response

      Reviewer #2 (Public Review):

      Summary: This substantial collaborative effort utilized virus-based retrograde tracing from cervical, thoracic and lumbar spinal cord injection sites, tissue clearing and cutting-edge imaging to develop a supraspinal connectome or map of neurons in the brain that project to the spinal cord. The need for such a connectome-atlas resource is nicely described, and the combination of the actual data with the means to probe that data is truly outstanding.

      They then compared the connectome from intact mice to those of mice with mild, moderate and severe spinal cord injuries to reveal the neuronal populations that retain axons and synapses below the level of injury. Finally, they look for correlations between the remaining neuronal populations and functional recovery to reveal which are likely contributing to recovery and its variability after injury. Overall, they successfully achieve their primary goals with the following caveats: The injury model chosen is not the most widely employed in the field, and the anatomical assessment of the injuries is incomplete/not ideal.

      Concerns/issues:

      1) I would like to see additional discussion/rationale for the chosen injury model and how it compares to other more commonly employed animal models and clinical injuries. Please relate how what is being observed with the supraspinal connectome might be different for these other models and for clinical injuries.

      We have added text to the Results and Discussion to explain our rationale for selecting the crush injury model, and to acknowledge differences between this model and more clinically relevant contusion models. (Results: line 360-364, Discussion 608-615). We agree wholeheartedly that a critical future direction will be to deploy brain-wide quantification in contusion models, and we are currently seeking funding to obtain the needed equipment.

      2) The assessment of the thoracic injuries employed is not ideal because it provides no anatomical description of spared white matter (or numbers of spared axons) at the injury epicenter.

      We address this more fully in the related point below. Briefly, we agree with a need to improve the assessment of the lesion but are hampered by tissue availability. We are unable to assess white matter sparing but can offer quantification of the width of residual astrocyte tissue bridges in four spinal sections from each animal (new Figure 5 – figure supplement 3). As discussed below, however, we recognize the limitations of the lesion assessment and agree with the larger point that the current quantification methods do not position us to make claims about the relative efficacy of spinal injury analyses versus whole-brain sparing analyses to stratify severity or predict outcomes. Our approach should be seen as a complement, not a substitute, for existing lesion-based analyses. We have edited language throughout the manuscript to make this position clearer.

      3) Related to this, but an issue that requires separate attention is the highly variable appearance of the injury and tracer/virus injection sites, the variability in the spatial relationship with labeled neurons (lumbar) and how these differences could influence labeling, sprouting of axons of passage and interpretation of the data. In particular this is referring to the data shown in Figure 6 (and related data).

      It is true that there is some variability in the relative position of the injury and injection, a surgical reality. The degree of variability was perhaps exaggerated in the original Figure 6 (Now Figure 5), in which one image came from one of two animals in the cohort with a notably larger gap between the injury and injection. Nevertheless, this comment raises the important question of how variability in injection-to-injury distance might affect supraspinal label. First, we would emphasize the data in Figure 1 – Figure Supplement 6, in which we showed that the number of retrogradely labeled supraspinal neurons is relatively stable as injection sites are deliberately varied across the lower thoracic and lumbar cord. Indeed, the question raised here is precisely the reason we performed this early test to determine how sensitive the results might be to shifts in segmental targeting. The results indicate that retrograde labeling is fairly insensitive to L1 versus L4 targeting. As an additional check for this specific experiment we also measured the distance between the rostral spread of viral label and the caudal edge of the lesion and plotted it against the total number of retrogradely labeled neurons in the brain. If a smaller injury/injection gap favored more labeling we might expect negative correlation, but none is apparent. We conclude that although the injury/injection distance did vary in the experiment, it likely did not exert a strong influence on retrograde labeling.
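      The check described above, plotting the injury-to-injection gap against the whole-brain count of labeled neurons, can be sketched as a rank correlation. This is only an illustration: the response does not say which statistic was used, so Spearman's coefficient is an assumption here, and the numbers in the usage example are hypothetical placeholders rather than the study's data.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for illustration only (not data from the manuscript).
gap_um = [350, 500, 620, 800, 1100, 1400]         # injury-to-injection gap
labeled = [5200, 3100, 6100, 2900, 5800, 4000]    # whole-brain labeled neurons
rho = spearman(gap_um, labeled)
```

      If a smaller gap favored more labeling, `rho` would be clearly negative; a value near zero is what the response reports observing.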

      Reviewer #3 (Public Review):

      In this manuscript, Wang et al describe a series of experiments aimed at optimizing the experimental and computational approach to the detection of projection-specific neurons across the entire mouse brain. This work builds on a large body of work that has developed nuclear-fused viral labelling, next-generation fluorophores, tissue clearing, image registration, and automated cell segmentation. They apply their techniques to understand projection-specific patterns of supraspinal neurons to the cervical and lumbar spinal cord, and to reveal brain and brainstem connections that are preferentially spared or lost after spinal cord injury.

      Strengths:

      Although this work does not put forward any fundamentally new methodologies, their careful optimization of the experimental and quantification process will be appreciated by other laboratories attempting to use these types of methods. Moreover, the observations of topological arrangement of various supraspinal centres are important and I believe will be interesting to others in the field.

      The web app provided by the authors provides a nice interface for users to explore these data. I think this will be appreciated by people in the field interested in what happens to their brain or brainstem region of interest.

      Weaknesses:

      Overall the work is well done; however, some of the novelty claims should be better aligned with the experimental findings. Moreover, the statistical approaches put forward to understand the relationship between spinal cord injury severity and cell counts across the mouse brain need to be more carefully considered.

      The authors state that they provide an experimental platform for these types of analysis to be done. My apologies if I missed it but I could not find anywhere the information on viral construct availability or code availability to reproduce the results. Certainly both of these aspects would be required for people to replicate the pipeline. Moreover, the described methodology for imaging and processing is quite sparse. While I appreciate that this information is widely provided in papers that have developed these methods, I do not think it is appropriate to claim to have provided a platform for people to enable these types of analyses without a more in-depth description of the methods. Alternatively, the authors could instead focus on how they optimized current methodologies and avoid the overstatement that this work provides a tool for users. The exception to this is of course the viral constructs, the plasmids of which should be deposited.

      We agree that we have not provided a tool per se, more of an example that could be followed. We have revised language in the abstract, introduction, and discussion to make it clear that we optimized existing methods and provide an example of how this can be done, but are not offering a “plug and play” solution to the problem of registration that would, for example, allow upload of external data. For example, in the abstract we replaced “We now provide an experimental platform” with “Here we assemble an experimental workflow.” (Line 28). The term “platform” no longer appears in the manuscript and has been replaced throughout by “example.” We hope this matches the intention of the comment and are happy to revise further as needed. Note that the plasmids have been deposited to Addgene.

      It was not completely clear to me why or when the authors switch back and forth between different resolutions throughout the manuscript. In the abstract it states that 60 regions were examined, but elsewhere the number is as many as 500. My understanding is that current versions of the Allen Brain Annotation include more than 2000 regions. I think it would make things clear for the readers if a single resolution was used throughout, or at least justified narratively throughout the text to avoid confusion.

      Thank you for pointing this out. The Cellfinder application recognizes 645 discrete regions in the brain, and across all experiments we detected supraspinal nuclei in 69 of these. This number, however, includes some very fine distinctions, for example three separate subregions of vestibular nuclei, three subregions of the superior olivary complex, etc. True experts may desire this level of information, but with the goal of accessibility we find it useful to collapse closely related / adjacent regions to an umbrella term. Doing so generates a list of 25 grouped or summary regions. In the revised version we move the 69-region data completely to the supplemental data (there for the experts who wish to parse), and use the consistent 25-region system (plus cervical spinal cord in later sections) to present data in the main figures. We have added text to the Results section (lines 157-162) to clarify this grouping system.
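      The grouping step described above amounts to a lookup from each fine-grained region to an umbrella term, with counts summed within each group. A minimal sketch follows; the region names, the mapping, and the counts are hypothetical placeholders, and the actual 69-to-25 assignment is the one given in the manuscript.

```python
from collections import defaultdict

# Hypothetical mapping from fine-grained regions to summary regions.
# The real assignment is defined by the authors, not by this sketch.
FINE_TO_SUMMARY = {
    "vestibular subregion 1": "Vestibular nuclei",
    "vestibular subregion 2": "Vestibular nuclei",
    "vestibular subregion 3": "Vestibular nuclei",
    "superior olivary subregion 1": "Superior olivary complex",
    "superior olivary subregion 2": "Superior olivary complex",
    "superior olivary subregion 3": "Superior olivary complex",
}


def collapse_counts(fine_counts, mapping):
    """Sum per-region cell counts into their summary regions.

    Regions missing from the mapping are kept under their own name.
    """
    summary = defaultdict(int)
    for region, count in fine_counts.items():
        summary[mapping.get(region, region)] += count
    return dict(summary)


# Hypothetical counts for one animal.
example = collapse_counts(
    {"vestibular subregion 1": 10, "vestibular subregion 2": 5, "red nucleus": 7},
    FINE_TO_SUMMARY,
)
```

      Keeping the fine-grained table alongside the mapping means the summary view can be regenerated at any time, which fits the authors' choice to retain the 69-region data in the supplement.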

      The authors provide an interesting analysis of the difference between cervical and lumbar projections. I think this might be one of the more interesting aspects of the paper - yet I found myself a bit confused by the analysis, and whether any of the differences observed were robust. Just prior to this experiment the authors provide a comparison of the mScarlet vs. the mGL, and demonstrate that mGL may label more cells. Yet, in the cervical vs. lumbar analysis it appears they are being treated 1 to 1. Moreover, I could not find any actual statistical analysis of this data? My impression would be that given the potential difference in labelling efficiency between the mScarlet and mGL this should be done using some kind of count analysis that takes into account the overall number of neurons labelled, such as a Chi-sq test or perhaps something more sophisticated. Then, with this kind of statistical analysis in place, do any of the discussed differences hold up? If not, I do not think this would detract from the interesting topological observations - but would call on the authors to be a bit more conservative about their statements and discussion regarding differences in the proportions of neurons projecting to certain supraspinal centers.

      This is an important point. In response to this input and related comments from other reviewers we performed new experiments to assess co-localization. The new data address the point above by including quantification of the degree of colocalization that results from titer-matched co-injection of the two fluorophores, providing baseline data. The results of this can be found in Figure 3 – figure supplement 3 and form the basis for statistical comparisons to experimental animals shown in Figure 3.
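      For readers unfamiliar with the count-based comparison the reviewer proposes, a chi-square test on a 2x2 table for a single region can be sketched as below. The table compares cells in the region with cells elsewhere, separately for each fluorophore, so a difference in overall labelling efficiency is absorbed into the expected counts. This illustrates the reviewer's suggestion only; it is not the analysis the authors adopted, and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are the two fluorophores (injection levels),
# columns are "cells in the region of interest" and "cells elsewhere".
stat, p = chi_square_2x2(20, 80, 40, 60)
```

      With roughly 25 regions tested, the resulting p-values would still need a multiple-comparison adjustment before any single region is called different.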

      Finally, I do have some concerns about the authors' use of linear regression in their analysis of brain regions after varying severities of SCI. First of all, the BMS score is notoriously non-linear. Despite wide use of linear regressions in the field to attempt to associate various outcomes to these kinds of ordinal measures, this is not appropriate. Some have suggested a rank conversion of the BMS prior to linear analyses, but even this comes with its own problems. Ultimately, the authors have here 2-3 clear cohorts of behavioral scores and drawing a linear regression between these is unlikely to be robustly informative. Moreover, it is unclear whether the authors properly adjusted their p-values from running these regressions on 60 (600?) regions. Finally, the statement in the abstract and discussion that the authors "explain more variability" compared to typical lesion severity analysis is also unsupported. My suggestion would be the following:

      Remove the linear regression analyses associated with BMS. I do not think these add value to the paper, and if anything provide a large window of false interpretation due to a violation of the assumptions of this test.

      Consider adding a more appropriate statistical analysis of the brain regions, such as a non-parametric group analysis. Knowing which brain regions are severity dependent, and which ones are not, would already be an interesting finding. This finding would not be confounded by any attempt to link it to crude measures of behavior.

      We agree that the linear regression approach was flawed and appreciate the opportunity to correct it. After consultation with two groups of statisticians we were forced to conclude that the data are simply underpowered for mixed model and ranking approaches. We therefore adopted a much simpler strategy. As you point out (and as noted by the statisticians), the behavioral data are bimodal; one group of animals regained plantar stepping ability, albeit with varying degrees of coordination (BMS 6-8), while the others showed at most rare plantar steps (BMS 0-3.5). We therefore asked whether the number of spared neurons in each brain region differed between the two groups and also examined the degree of “overlap” in the sparing values between the two groups. The data are now presented in Figure 6.
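      A two-group comparison of this kind can be sketched for a single brain region as follows. The response does not name the test or define "overlap", so both choices here are assumptions: an exact permutation test on the Mann-Whitney U statistic, and overlap counted as the number of low-performing animals whose value reaches the minimum of the high-performing group. The values in the usage example are hypothetical.

```python
from itertools import combinations


def u_statistic(high, low):
    """Mann-Whitney U: pairs where the high-group value wins (ties count 0.5)."""
    return sum(1.0 if h > l else 0.5 if h == l else 0.0 for h in high for l in low)


def exact_p_value(high, low):
    """Two-sided exact p-value by enumerating every possible group assignment.

    Practical only for small groups, since the number of assignments grows
    combinatorially with group size.
    """
    pooled = list(high) + list(low)
    n_high = len(high)
    centre = n_high * len(low) / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(high, low) - centre)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_high):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(g1, g2) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total


def count_overlapping(high, low):
    """Hypothetical overlap measure: low-group values at or above min(high)."""
    floor = min(high)
    return sum(1 for value in low if value >= floor)


# Hypothetical spared-neuron counts in one region (not data from the study).
high_performers = [410, 520, 610]   # e.g. animals with BMS 6-8
low_performers = [35, 120, 90]      # e.g. animals with BMS 0-3.5
p_region = exact_p_value(high_performers, low_performers)
n_overlap = count_overlapping(high_performers, low_performers)
```

      Because the test would be repeated for every region, the per-region p-values would need adjustment for multiple comparisons before being interpreted.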

      If the authors would like to state anything about 'explaining more variability' then the proper statistical analysis should be used, which in this case would be to compare the models using a LRT or equivalent. However, as I mentioned it does not seem to be appropriate to be doing this with linear models so the authors should consider a non-linear equivalent if they choose to proceed with this.

      We thank the reviewer for the excellent suggestion. However, as we explained above, after consultation with two groups of statisticians we were forced to conclude that the data are underpowered, and we could not apply some of the methods suggested. Especially in light of our simplified analysis, we think it is better to remove any claims of the relative success of the sparing in different regions to explain more or less variability. Instead we can simply report that sparing in some regions, but not others, is significantly different between “low-performing” and “high-performing” groups.

    1. We are not sorry for him—we learn that, not to be sorry for the dead. But for ourselves? This terror is always so fresh, so unexampled.

      This is quite a bold statement, especially for an opening paragraph. It makes the reader stop and think, potentially reflecting on their own life. It also allows us to connect with the narrator as they think about the terror they may have experienced in their own lives.

    1. Author Response

      Reviewer #1 (Public Review):

      Using the Tet-off system, Kir2.1 was expressed (or not) during the key time of callosal development from E15 to P15. Restoring activity either by adding Dox during a critical period from P6 to P15 or using DREADDs from P10-14 could rescue the callosal projection to the cortex, whereas later restoration of activity (with Dox) was not successful. Did this successful rescue lead to normal activity? Calcium imaging in animals with Kir2.1 had low levels of any kind of activity, both highly correlated and low correlation, but P6-13 Dox treatment partially restored only low-correlation activity and not high-correlation activity at P13. The effects of DREADDs on activity were not similarly measured, though DREADD treatment was effective in at least partially restoring the callosal projection.

      Overall this study builds on earlier findings regarding the importance of neuronal activity in the formation of a normal callosal projection, using in utero electroporation which is particularly well suited for this subject. It makes the case very compellingly that near-normal callosal connectivity can be produced if activity is permitted during a critical period window from P6 or P10 to P15, though the exact timing of this window is imprecise because the elimination of Kir expression was not systematically quantified. For transmembrane proteins like channels it can often take many days for protein expression to completely abate.

      We thank the reviewer for their positive evaluation and the constructive comments. Based on the comment on Kir expression, we conducted new experiments using pTRE-Tight2Kir2.1EGFP, with which EGFP signals reflect localization of over-expressed Kir2.1, and examined when the expression of Kir2.1EGFP went down after Dox treatment at P6. At P6 (before Dox treatment), the signals of Kir2.1EGFP (stained with anti-GFP antibody) were observed in the periphery of the soma and along dendrites, implying that Kir2.1EGFP was transported to the cellular membrane. At P10 and P15 (4 days and 9 days after Dox treatment), Kir2.1EGFP signals were not observed in the periphery of the soma and along dendrites. We noted that low-level green signals were observed in the central part of the cell body. These may stem from low-level expression of Kir2.1EGFP in nuclei or cytosol even after Dox treatment. Alternatively, and more likely, these may reflect bleed-through of RFP signals into GFP channel. Overall, we confirmed that Kir2.1 proteins that were localized to the cellular membrane were largely down-regulated. We described these observations in detail in the figure legend of Figure 1-figure supplement 3, and added the result as Figure 1-figure supplement 3.

      I found the quantification of the callosal projection to be rather minimal and the normalization approach not entirely transparent. For example does activity from P10-15 restore the full normal PATTERN of callosal connectivity or merely the density of input overall?

      We thank the reviewer for this comment. Based on the comment, we added analyses of the pattern of callosal projections: the width of the callosal axon innervation zone in layers 2/3 and 5, and densitometric line scans across all cortical layers. Our original quantification showed that the density of callosal axons reaching their target layer (i.e. cortical layer 2/3) is almost recovered in the P6-P15 Dox condition (Fig. 1B-D), but new analyses suggest some aspects of callosal axon projections (the width of the innervation zone in layers 2/3 and 5 (Figure 1-figure supplement 4A,B), and the lamina-specific innervation pattern (Figure 1-figure supplement 4C)) might be only partially recovered. We have added these new results as Figure 1-figure supplement 4. In a future study, we would like to assess the effect of the manipulations at finer resolution by 3D morphological reconstruction of axons of individual neurons.

      Also in the discussion it would be nice to more clearly establish whether activity is thought to be maintaining a projection already formed by P10 or permitting the emergence of such a pattern.

      Thank you for the suggestion. We have added thorough discussions about this point as follows. Page 7, lines 198-208:

      “In the previous study, we showed that callosal axons could reach the innervation area almost normally under activity-reduction, and that the effects of activity-reduction became apparent afterwards (Mizuno et al., 2007). Callosal axons elaborate their branches extensively in P10-P15 (Mizuno et al., 2010), and axon branching is regulated by neuronal activity (Matsumoto and Yamamoto, 2016). It is likely that activity is required for the processes of formation, rather than the maintenance of the connections already formed by P10, but the current study employed massive labeling of callosal axons which is not suited to clarify this. In addition, the restoration of activity in the Tet-off (Figure 1) or DREADD (Figure 2) experiment may not completely rescue the ramification pattern of individual axons. Single axon tracing experiments (Mizuno et al., 2010; Dhande et al., 2011) would be required to clarify this. Nonetheless, our findings suggest that callosal axons retain the ability, or are permitted, to grow and make region- and lamina-specific projections in the cortex during a limited period of postnatal cortical development under an activity-dependent mechanism.”

      The calcium imaging is a valuable validation of the Kir expression approach, but the study here appears to overinterpret what may simply be an intermediate level of activity restoration rather than a specific restoration of L events, as it seems that L events would be the most likely to occur under conditions of reduced overall activity. One possibility is that the absence of H events at P13 in the calcium imaging is due to residual Kir expression creating a drag on high level network activation rather than any more complicated change in patterned spontaneous activity/connectivity. The conclusions from this study regarding the permissive role of activity during a critical window and the lack of a requirement for highly correlated activity are valuable, even if somewhat imprecise on both counts. The authors should probably refrain from use of the term patterned activity given that this was measured but not systematically compared to unpatterned spontaneous activity.

      We thank the reviewer for this constructive comment. Based on this comment, we removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. For example, in the Discussion, we revised as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

      Reviewer #2 (Public Review):

      Tezuka et al. use in vivo manipulations of spontaneous activity to identify the activity-dependent mechanisms of callosal projection development. Previous research of the authors' and other labs had shown that overexpressing the potassium channel Kir2.1, which reduces activity levels in the developing cortical network, blocks the formation of callosal connections almost entirely.

      The current manuscript corroborates and extends these previous discoveries by:

      1) Demonstrating that the effect of Kir overexpression can be rescued by pharmacogenetic network activation using DREADDs.

      2) Revealing the requirement of network activity for the development of callosal projections during a particular developmental time window and by

      3) Directly relating perturbed callosal development to the actual changes in activity patterns caused by the experimental manipulations.

      Thus, this paper is important for our understanding of the role of neuronal activity in the development of long-range connections in the brain. In addition it provides strong evidence for a role of specific activity patterns in this process.

      In general, the approach is very straightforward and the results clearly interpreted. Nevertheless, there are a few points to consider.

      We thank the reviewer for these positive and supportive comments.

      1) It is not clear in which cortical area(s) the in vivo 2-photon recordings were performed and in how far cortical areas that actually receive/send callosal projections were included or not in the analysis.

      In response to this comment, we revised the text in the method section as follows.

      “We aimed to record spontaneous neuronal activity in putative binocular zones in V1 (2.5 mm lateral of midline and 1 mm anterior of the posterior suture). Since the boundaries between V1 and higher visual areas, AL/LM are not as obvious as those in adult, our recordings likely contained juxtaposed lateral monocular V1 and AL/LM as well.”

      Based on our colleagues' unpublished observations, V1 and AL/LM can be distinguished solely by spontaneous activity patterns even before eye-opening. They also found frequencies of spontaneous activity are similar across mono/binocular regions of V1 and AL/LM (Murakami, Ohki, et al. unpublished). Thus, our results should hold even with the variability in recording sites.

      2) It is not discussed what the duration of the CNO effect is. Do daily injections rescue activity patterns for 24 hours or a significant proportion of this period?

      In response to this critical comment, we revised the text in the method section as follows.

      “A previous study showed that an intraperitoneally injected CNO was effective (in terms of increasing activity) for about 9hrs (Alexander et al., 2009). The “partial rescue” effect we observed (Figure 2) may suggest that activity was not fully restored during 24hrs by our daily CNO injections.”

      Reviewer #3 (Public Review):

      The manuscript by Tezuka adds to an emerging story about the role of activity in the formation of callosal connections across the brain. Here, the authors show that they can use a TET system to switch off the activity of an exogenous potassium channel, in order to probe when activity might be necessary or sufficient for the formation of callosal connections. The authors find that artificial restoration of activity with DREADDs is sufficient to rescue the formation of callosal connections, and that there is a critical period (somewhere between P5-P15) where activity must occur in order for the connections to form within the cortex. Finally, the authors show that when the potassium channel is removed during the critical period, the cortex exhibits activity, but few highly synchronous events. These results indicate that it is activity in general and not specifically highly synchronous activity that is necessary for the final innervation of the callosal cortex.

      In general, the study is well done, and the writeup is polished and well summarized. The figures are solid. There are only a few criticisms/suggestions.

      We thank the reviewer for the positive evaluation.

      Major issue: Have the authors demonstrated a requirement for "patterned spontaneous activity"?

      The authors claim variously in the abstract ("a distinct pattern of spontaneous activity") and in the results (pg 6, "our observations indicate that patterned spontaneous activity") and discussion (pg 6, "we demonstrated that patterned spontaneous activity") that it is "patterned" spontaneous activity that is key for the formation of callosal connections. However, when I was reading the paper, I came to the opposite conclusion: that any sufficiently high spontaneous activity is sufficient for the formation of these connections.

      The authors showed that relieving the KIR expression from P5-15 allows the connections to form; however, in Figure 4, the authors show that the nature of the activity produced in the cortex (in terms of mixtures of H and L events) is very different. Nevertheless, the connections can form. Further, the authors showed that increasing activity when KIR is expressed using DREADDs restores the connections. The pattern of activity produced by this DREADDs + KIR expression is likely to be very different from the pattern of activity of a typically-developing animal. In total, I thought that the authors demonstrated, quite nicely, that it is just the presence of sufficient activity that is key to the innervation of the contralateral cortex. (It's not cell autonomous, as the authors showed before; there seems to be a "sufficient activity" requirement).

      Therefore, I think the authors should remove references to the requirement of patterned activity and instead say something about sufficiently high activity (or some characterization that the authors choose). I think they've shown quite nicely that a specific pattern of the spontaneous activity is not important.

      We thank the reviewer for this very important insight and interpretation. After considering all the currently presented data again, we have come to agree with the interpretation stated by the reviewer. We removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. Nevertheless, we would not completely discard the possibility that specific patterns of spontaneous activity, such as L-events, could potentially have some active contribution to the development of projection circuits, and would like to further address this in future study.

      For example, in the Discussion, we revised the text as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

    1. Author Response

      Reviewer #1 (Public Review):

      In computational modeling studies of behavioral data using reinforcement learning models, it has been implicitly assumed that parameter estimates generalize across tasks (generalizability) and that each parameter reflects a single cognitive function (interpretability). In this study, the authors examined the validity of these assumptions through a detailed analysis of experimental data across multiple tasks and age groups. The results showed that some parameters generalize across tasks, while others do not, and that interpretability is not sufficient for some parameters, suggesting that the interpretation of parameters needs to take into account the context of the task. Some researchers may have doubted the validity of these assumptions, but to my knowledge, no study has explicitly examined their validity. Therefore, I believe this research will make an important contribution to researchers who use computational modeling. In order to clarify the significance of this research, I would like the authors to consider the following points.

      1) Effects of model misspecification

      In general, model parameter estimates are influenced by model misspecification. Specifically, if components of the true process are not included in the model, the estimates of other parameters may be biased. The authors mentioned a little about model misspecification in the Discussion section, but they do not mention the possibility that the results of this study itself may be affected by it. I think this point should be discussed carefully.

      The authors stated that they used state-of-the-art RL models, but this does not necessarily mean that the models are correctly specified. For example, it is known that if there is history dependence in the choice itself and it is not modeled properly, the learning rates depending on valence of outcomes (alpha+, alpha-) are subject to biases (Katahira, 2018, J Math Psychol). In the authors' study, the effect of one previous choice was included in the model as choice persistence, p. However, it has been pointed out that not including the effect of a choice made more than two trials ago in the model can also cause bias (Katahira, 2018). The authors showed that the learning rate for positive RPE, alpha+, was inconsistent across tasks. But since choice persistence was included only in Task B, it is possible that the bias of alpha+ was different between tasks due to individual differences in choice persistence, and thus did not generalize.

      However, I do not believe that it is necessary to perform a new analysis using the model described above. As for extending the model, I don't think it is possible to include all combinations of possible components. As is often said, every model is wrong, and only to varying degrees. What I would like to encourage the authors to do is to discuss such issues and then consider their position on the use of the present model. Even if the estimation results of this model are affected by misspecification, it is a fact that such a model is used in practice, and I think it is worthwhile to discuss the nature of the parameter estimates.

      We thank the reviewer for this thoughtful question, and have added the following paragraph to the discussion section that aims to address it:

      “Another concern relates to potential model misspecification and its effects on model parameter estimates: If components of the true data-generating process are not included in a model (i.e., a model is misspecified), estimates of existing model parameters may be biased. For example, if choices have an outcome-independent history dependence that is not modeled properly, learning rate parameters have been shown to be biased [63]. Indeed, we found that learning rate parameters were inconsistent across the tasks in our study, and two of our models (A and C) did not model history dependence in choice, while the third (model B) only included the effect of one previous choice (persistence parameter), but no multi-trial dependencies. It is hence possible that the differences in learning rate parameters between tasks were caused by differences in the bias induced by misspecification of history dependence, rather than a lack of generalization. Though pressing, this issue is difficult to resolve in practice, because it is impossible to include all combinations of possible parameters in all computational models, i.e., to exhaustively search the space of possible models ("Every model is wrong, but to varying degrees"). Furthermore, even though our models were likely affected by some degree of misspecification, the research community is currently using models of this kind. Our study therefore sheds light on generalizability and interpretability in a realistic setting, which likely includes models with varying degrees of misspecification. Lastly, our models were fitted using robust computational tools and achieved good behavioral recovery (Fig. D.7), which also reduces the likelihood of model misspecification.”
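
      To make the misspecification argument concrete, the following is a minimal sketch of the kind of model under discussion: a two-armed bandit learner with valence-dependent learning rates (alpha+, alpha-), an inverse temperature, and a one-trial-back persistence term. It is an illustration only, not the authors' model code; the function name, the initial values of 0.5, and the additive form of the persistence bonus are all assumptions. Fitting the same data with `persist` fixed at 0 is one way to see how an omitted history term could shift the estimated learning rates.

```python
import math

def neg_log_likelihood(choices, rewards, alpha_pos, alpha_neg, beta, persist=0.0):
    """Negative log-likelihood of a two-armed bandit choice sequence.

    choices: list of 0/1 actions; rewards: list of 0/1 outcomes.
    alpha_pos / alpha_neg: learning rates for positive / negative prediction errors.
    beta: inverse temperature; persist: bonus for repeating the previous choice
    (hypothetical parameterization, chosen for illustration).
    """
    q = [0.5, 0.5]   # initial action values (an assumption)
    prev = None      # previous choice; none on the first trial
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Decision variables: scaled values plus one-trial-back persistence.
        logits = [beta * q[0], beta * q[1]]
        if prev is not None:
            logits[prev] += persist
        # Log softmax probability of the chosen action, computed stably.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        nll -= logits[c] - log_z
        # Valence-dependent value update for the chosen action only.
        delta = r - q[c]
        q[c] += (alpha_pos if delta >= 0 else alpha_neg) * delta
        prev = c
    return nll
```

      With `beta=0` and `persist=0` every choice has probability 0.5, so the function returns the number of trials times log 2, which is a convenient sanity check before fitting.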

      2) Issue of reliability of parameter estimates

      I think it is important to consider not only the bias in the parameter estimates, but also the issue of reliability, i.e., how stable the estimates will be when the same task is repeated with the same individual. For the task used in this study, has test-retest reliability been examined in previous studies? I think that parameters with low reliability will inevitably have low generalizability to other tasks. In this study, the use of three tasks seems to have addressed this issue without explicitly considering the reliability, but I would like the author to discuss this issue explicitly.

      We thank the reviewer for this useful comment, and have added the following paragraph to the discussion section to address it:

      “Furthermore, parameter generalizability is naturally bounded by parameter reliability, i.e., the stability of parameter estimates when participants perform the same task twice (test-retest reliability) or when estimating parameters from different subsets of the same dataset (split-half reliability). The reliability of RL models has recently become the focus of several parallel investigations [...], some employing very similar tasks to ours [...]. The investigations collectively suggest that excellent reliability can often be achieved with the right methods, most notably by using hierarchical model fitting. Reliability might still differ between tasks or models, potentially being lower for learning rates than other RL parameters [...], and differing between tasks (e.g., compare [...] to [...]). In this study, we used hierarchical fitting for tasks A and B and assessed a range of qualitative and quantitative measures of model fit for each task [...], boosting our confidence in high reliability of our parameter estimates, and the conclusion that the lack of between-task parameter correlations was not due to a lack of parameter reliability, but a lack of generalizability. This conclusion is further supported by the fact that larger between-task parameter correlations (r>0.5) than those observed in humans were attainable---using the same methods---in a simulated dataset with perfect generalization.”
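
      The claim that generalizability is bounded by reliability can be illustrated with a small simulation. The sketch below is not the authors' simulation: it assumes a classical measurement model in which each participant has one true parameter value shared across two tasks (perfect generalization) and each task's estimate adds independent Gaussian noise. Under that assumption, reliability is the share of estimate variance due to the true value, and the expected between-task correlation is the square root of the product of the two reliabilities. All function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_between_task_r(n, noise_sd_a, noise_sd_b, seed=0):
    """Between-task correlation when the true parameter is identical across tasks."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 1.0) for _ in range(n)]           # shared true values, variance 1
    est_a = [t + rng.gauss(0.0, noise_sd_a) for t in true]   # noisy estimate from task A
    est_b = [t + rng.gauss(0.0, noise_sd_b) for t in true]   # noisy estimate from task B
    return pearson(est_a, est_b)

def expected_ceiling(noise_sd_a, noise_sd_b):
    """Expected correlation under the measurement model assumed above."""
    rel_a = 1.0 / (1.0 + noise_sd_a ** 2)   # reliability = true variance / total variance
    rel_b = 1.0 / (1.0 + noise_sd_b ** 2)
    return math.sqrt(rel_a * rel_b)
```

      For example, `simulate_between_task_r(20000, 1.0, 1.0)` should land close to `expected_ceiling(1.0, 1.0)`, which is 0.5 under these assumptions, even though the underlying parameter generalizes perfectly. An observed between-task correlation is therefore more informative when compared against such a ceiling than against 1.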

      3) About PCA

      In this paper, principal component analysis (PCA) is used to extract common components from the parameter estimates and behavioral features across tasks. When performing PCA, were each parameter estimate and behavioral feature standardized so that the variance would be 1? There was no mention about this. It seems that otherwise the principal components would be loaded toward the features with larger variance. In addition, Moutoussis et al. (Neuron, 2021, 109 (12), 2025-2040) conducted a similar analysis of behavioral parameters of various decision-making tasks, but they used factor analysis instead of PCA. Although the authors briefly mentioned factor analysis, it would be better if they also mentioned the reason why they used PCA instead of factor analysis, which can consider unique variances.

      To answer the reviewer's first question: We indeed standardized all features before performing the PCA. Apologies for omitting this information - we have now added a corresponding sentence to the methods section.
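
      The reviewer's concern about unstandardized features can be shown with two features on very different scales. The sketch below is illustrative and uses made-up numbers, not data from the study. It relies on the closed-form orientation of the leading eigenvector of a 2x2 covariance matrix, so it applies to the two-feature case only; the function names are hypothetical.

```python
import math

def pc1_direction(x, y):
    """Unit vector of the first principal component for two features."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def zscore(v):
    """Standardize a feature to mean 0 and (sample) variance 1."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]

# Hypothetical features: one in milliseconds, one as a proportion.
slow_feature = [300, 420, 510, 650, 780, 900]
small_feature = [0.52, 0.55, 0.61, 0.60, 0.70, 0.74]

raw_pc1 = pc1_direction(slow_feature, small_feature)
std_pc1 = pc1_direction(zscore(slow_feature), zscore(small_feature))
```

      On the raw data, `raw_pc1` points almost entirely along the large-variance feature, whereas after z-scoring both features contribute equally to `std_pc1`. This is the behavior the reviewer anticipated and the reason standardization matters when features have different units.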

      We also thank the reviewer for the mentioned reference, which is very relevant to our findings and can help explain the roles of different PCs. Like in our study, Moutoussis et al. found a first PC that captured variability in task performance, and subsequent PCs that captured task contrasts. We added the following paragraph to our manuscript:

      “PC1 therefore captured a range of "good", task-engaged behaviors, likely related to the construct of "decision acuity" [...]. Like our PC1, decision acuity was the first component of a factor analysis (variant of PCA) conducted on 32 decision-making measures on 830 young people, and separated good and bad performance indices. Decision acuity reflects generic decision-making ability, and predicted mental health factors, was reflected in resting-state functional connectivity, but was distinct from IQ [...].”

      To answer the reviewer's question about PCA versus FA, both approaches are relatively similar conceptually, and oftentimes share the majority of the analysis pipeline in practice. The main difference is that PCA breaks up the existing variance in a dataset in a new way (based on PCs rather than the original data features), whereas FA aims to identify an underlying model of latent factors that explain the observable features. This means that PCs are linear combinations of the original data features, whereas factors in FA are latent variables that give rise to the observable features of the dataset with some noise, i.e., including an additional error term.

      However, in practice, both methods share the majority of computation in the way they are implemented in most standard statistical packages: FA is usually performed by conducting a PCA and then rotating the resulting solution, most commonly using the Varimax rotation, which maximizes the variance between feature loadings on each factor in order to make the result more interpretable, thereby foregoing the optimal solution that has been achieved by the PCA (which lacks the error term). Maximum variance in feature loadings means that as many features as possible will have loadings close to 0 or 1 on each factor, reducing the number of features that need to be taken into account when interpreting this factor. Most relevant in our situation is that PCA is usually a special case of FA, with the only difference that the solution is not rotated for maximum interpretability. (Note that this rotation can be minor if feature loadings already show large variance in the PCA solution.)
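
      The effect of a Varimax-style rotation can be sketched for the two-factor case. The code below is a deliberately simple illustration and not the procedure used by any statistical package: it assumes exactly two factors and finds the rotation angle by brute-force search over a grid, whereas standard implementations use an iterative algorithm. The criterion is the sum, over factors, of the variance of the squared loadings; all names are hypothetical.

```python
import math

def rotate(loadings, theta):
    """Apply an orthogonal rotation by angle theta to two-factor loadings."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]

def varimax_criterion(loadings):
    """Sum over the two factors of the variance of squared loadings."""
    n = len(loadings)
    total = 0.0
    for j in range(2):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / n
        total += sum((v - mean) ** 2 for v in sq) / n
    return total

def varimax_two_factors(loadings, steps=90):
    """Grid search over angles in [0, 90) degrees for the best criterion value."""
    def angle(k):
        return math.radians(k * 90.0 / steps)
    best = max(range(steps), key=lambda k: varimax_criterion(rotate(loadings, angle(k))))
    return rotate(loadings, angle(best))

# Hypothetical loadings: a clean two-factor structure, deliberately mixed by 30 degrees.
simple = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
mixed = rotate(simple, math.radians(30))
rotated = varimax_two_factors(mixed)
```

      After rotation, each feature in `rotated` loads strongly on one factor and close to zero on the other, which is the interpretability gain described above. Because the rotation is orthogonal, each feature's sum of squared loadings is unchanged: the rotation redistributes explained variance between factors but does not change how much of each feature is explained.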

      To determine how much our results would change in practice if we used FA instead of PCA, we repeated the analysis using FA. Both are shown side-by-side below, and the results are quite similar:

      We therefore conclude that our specific results are robust to the choice of method used, and that there is reason to believe that our PC1 is related to Moutoussis et al.’s F1 despite the differences in method.

      Reviewer #2 (Public Review):

      I am enthusiastic about the comprehensive approach, the thorough analysis, and the intriguing findings. This work makes a timely contribution to the field and warrants a wider discussion in the community about how computational methods are deployed and interpreted. The paper is also a great and rare example of how much can be learned from going beyond a meta-analytic approach to systematically collect data that assess commonly held assumptions in the field, in this case in a large data-driven study across multiple tasks. My only criticism is that at times, the paper misses opportunities to be more constructive in pinning down exactly why authors observe inconsistencies in parameter fits and interpretation. And the somewhat pessimistic outlook relies on some results that are, in my view at least, somewhat expected based on what we know about human RL. Below I summarize the major ways in which the paper's conclusions could be strengthened.

      One key point the authors make concerns the generalizability of absolute vs. relative parameter values. It seems that at least in the parameter space defined by +LRs and exploration/noise (which are known to be mathematically coupled), subjects clustered similarly for tasks A and C. In other words, as the authors state, "both learning rate and inverse temperature generalized in terms of the relationships they captured between participants". This struck me as a more positive and important result than it was made out to be in the paper, for several reasons:

      • As the authors point out in the discussion, a large literature on variable LRs has shown that people adapt their learning rates trial-by-trial to the reward function of the environment; given this, and given that all models tested in this work have fixed learning rates, while the three tasks vary on the reward function, the comparison of absolute values seems a bit like a red herring.

      We thank the reviewers for this recommendation and have reworked the paper substantially to address the issue. We have modified the highlights, abstract, introduction, discussion, conclusion, and relevant parts of the results section to provide equal weight to the successes and failures of generalization.

      Highlights:

      ● “RL decision noise/exploration parameters generalize in terms of between-participant variation, showing similar age trajectories across tasks.”

      ● “These findings are in accordance with previous claims about the developmental trajectory of decision noise/exploration parameters.”

      Abstract:

      ● “We found that some parameters (exploration / decision noise) showed significant generalization: they followed similar developmental trajectories, and were reciprocally predictive between tasks.”

      The introduction now introduces different potential outcomes of our study with more equal weight:

      “Computational modeling enables researchers to condense rich behavioral datasets into simple, falsifiable models (e.g., RL) and fitted model parameters (e.g., learning rate, decision temperature) [...]. These models and parameters are often interpreted as a reflection of ("window into") cognitive and/or neural processes, with the ability to dissect these processes into specific, unique components, and to measure participants' inherent characteristics along these components.

      For example, RL models have been praised for their ability to separate the decision making process into value updating and choice selection stages, allowing for the separate investigation of each dimension. Crucially, many current research practices are firmly based on these (often implicit) assumptions, which give rise to the expectation that parameters have a task- and model-independent interpretation and will seamlessly generalize between studies. However, there is growing---though indirect---evidence that these assumptions might not (or not always) be valid.

      The following section lays out existing evidence in favor and in opposition of model generalizability and interpretability. Building on our previous opinion piece, which---based on a review of published studies---argued that there is less evidence for model generalizability and interpretability than expected based on current research practices [...], this study seeks to directly address the matter empirically.”

      We now also provide more even evidence for both potential outcomes:

      “Many current research practices are implicitly based on the interpretability and generalizability of computational model parameters (despite the fact that many researchers explicitly distance themselves from these assumptions). For our purposes, we define a model variable (e.g., fitted parameter, reward-prediction error) as generalizable if it is consistent across uses, such that a person would be characterized with the same values independent of the specific model or task used to estimate the variable. Generalizability is a consequence of the assumption that parameters are intrinsic to participants rather than task dependent (e.g., a high learning rate is a personal characteristic that might reflect an individual's unique brain structure). One example of our implicit assumptions about generalizability is the fact that we often directly compare model parameters between studies---e.g., comparing our findings related to learning-rate parameters to a previous study's findings related to learning-rate parameters. Note that such a comparison is only valid if parameters capture the same underlying constructs across studies, tasks, and model variations, i.e., if parameters generalize. The literature has implicitly equated parameters in this way in review articles [...], meta-analyses [...], and also most empirical papers, by relating parameter-specific findings across studies. We also implicitly evoke parameter generalizability when we study task-independent empirical parameter priors [...], or task-independent parameter relationships (e.g., interplay between different kinds of learning rates [...]), because we presuppose that parameter settings are inherent to participants, rather than task specific.

      We define a model variable as interpretable if it isolates specific and unique cognitive elements, and/or is implemented in separable and unique neural substrates. Interpretability follows from the assumption that the decomposition of behavior into model parameters "carves cognition at its joints", and provides fundamental, meaningful, and factual components (e.g., separating value updating from decision making). We implicitly invoke interpretability when we tie model variables to neural substrates in a task-general way (e.g., reward prediction errors to dopamine function [...]), or when we use parameters as markers of psychiatric conditions (e.g., working-memory parameter and schizophrenia [...]). Interpretability is also required when we relate abstract parameters to aspects of real-world decision making [...], and generally, when we assume that model variables are particularly "theoretically meaningful" [...].

      However, amid the growing recognition of computational modeling, the focus has also shifted toward inconsistencies and apparent contradictions in the emerging literature, which are becoming apparent in cognitive [...], developmental [...], clinical [...], and neuroscience studies [...], and have recently become the focus of targeted investigations [...]. For example, some developmental studies have shown that learning rates increased with age [...], whereas others have shown that they decrease [...]. Yet others have reported U-shaped trajectories with either peaks [...] or troughs [...] during adolescence, or stability within this age range [...] (for a comprehensive review, see [...]; for specific examples, see [...]). This is just one striking example of inconsistencies in the cognitive modeling literature, and many more exist [...]. These inconsistencies could signify that computational modeling is fundamentally flawed or inappropriate to answer our research questions. Alternatively, inconsistencies could signify that the method is valid, but our current implementations are inappropriate [...]. However, we hypothesize that inconsistencies can also arise for a third reason: Even if both method and implementation are appropriate, inconsistencies like the ones above are expected---and not a sign of failure---if implicit assumptions of generalizability and interpretability are not always valid. For example, model parameters might be more context-dependent and less person-specific than we often appreciate [...]”

      In the results section, we now place more emphasis on findings that are compatible with generalization: “For α+, adding task as a predictor did not improve model fit, suggesting that α+ showed similar age trajectories across tasks (Table 2). Indeed, α+ showed a linear increase that tapered off with age in all tasks (linear increase: task A: β = 0.33, p < 0.001; task B: β = 0.052, p < 0.001; task C: β = 0.28, p < 0.001; quadratic modulation: task A: β = −0.007, p < 0.001; task B: β = −0.001, p < 0.001; task C: β = −0.006, p < 0.001). For noise/exploration and Forgetting parameters, adding task as a predictor also did not improve model fit (Table 2), suggesting similar age trajectories across tasks.”

      “For both α+ and noise/exploration parameters, task A predicted tasks B and C, and tasks B and C predicted task A, but tasks B and C did not predict each other (Table 4; Fig. 2D), reminiscent of the correlation results that suggested successful generalization (section 2.1.2).”

      “Noise/exploration and α+ showed similar age trajectories (Fig. 2C) in tasks that were sufficiently similar (Fig. 2D).” And with respect to our simulation analysis (for details, see next section):

      “These results show that our method reliably detected parameter generalization in a dataset that exhibited generalization.”

      We also now provide more nuance in our discussion of the findings:

      “Both generalizability [...] and interpretability (i.e., the inherent "meaningfulness" of parameters) [...] have been explicitly stated as advantages of computational modeling, and many implicit research practices (e.g., comparing parameter-specific findings between studies) showcase our conviction in them [...]. However, RL model generalizability and interpretability has so far eluded investigation, and growing inconsistencies in the literature potentially cast doubt on these assumptions. It is hence unclear whether, to what degree, and under which circumstances we should assume generalizability and interpretability. Our developmental, within-participant study revealed a nuanced picture: Generalizability and interpretability differed from each other, between parameters, and between tasks.”

      “Exploration/noise parameters showed considerable generalizability in the form of correlated variance and age trajectories. Furthermore, the decline in exploration/noise we observed between ages 8-17 was consistent with previous studies [13, 66, 67], revealing consistency across tasks, models, and research groups that supports the generalizability of exploration / noise parameters. However, for 2/3 pairs of tasks, the degree of generalization was significantly below the level of generalization expected for perfect generalization. Interpretability of exploration / noise parameters was mixed: Despite evidence for specificity in some cases (overlap in parameter variance between tasks), it was missing in others (lack of overlap), and crucially, parameters lacked distinctiveness (substantial overlap in variance with other parameters).”

      “Taken together, our study confirms the patterns of generalizable exploration/noise parameters and task-specific learning rate parameters that are emerging from the literature [13].”

      • Regarding the relative inferred values, it's unclear how high we really expect correlations between the same parameter across tasks to be. E.g., if we take Task A and make a second, hypothetical, Task B by varying one feature at a time (say, stochasticity in reward function), how correlated are the fitted LRs going to be? Given the different sources of noise in the generative model of each task and in participant behavior, it is hard to know whether a correlation coefficient of 0.2 is "good enough" generalizability.

      We thank the reviewer for this excellent suggestion, which we think helped answer a central question that our previous analyses had failed to address, and also provided answers to several other concerns raised by both reviewers in other sections. We have conducted these additional analyses as suggested: we simulated artificial behavioral data for each task, fitted these data using the models used in humans, repeated the analyses performed on humans on the newly fitted parameters, and used bootstrapping to statistically compare humans to the resulting ceiling of generalization. We have added the following section to our paper, which describes the results in detail:

      “Our analyses so far suggest that some parameters did not generalize between tasks, given differences in age trajectories (section 2.1.3) and a lack of mutual prediction (section 2.1.4). However, the lack of correspondence could also arise due to other factors, including behavioral noise, noise in parameter fitting, and parameter trade-offs within tasks. To rule these out, we next established the ceiling of generalizability attainable using our method.

      We established the ceiling in the following way: We first created a dataset with perfect generalizability, simulating behavior from agents that use the same parameters across all tasks (suppl. Appendix E). We then fitted this dataset in the same way as the human dataset (e.g., using the same models), and performed the same analyses on the fitted parameters, including an assessment of age trajectories (suppl. Table E.8) and prediction between tasks (suppl. Tables E.9, E.10, and E.11). These results provide the practical ceiling of generalizability. We then compared the human results to this ceiling to ensure that the apparent lack of generalization was valid (significant difference between humans and ceiling), and not in accordance with generalization (lack of difference between humans and ceiling).

      Whereas humans had shown divergent trajectories for parameter alpha- (Fig. 2B; Table 1), the simulated agents did not show task differences for alpha- or any other parameter (suppl. Fig. E.8B; suppl. Table E.8), even when controlling for age (suppl. Tables E.9 and E.10), as expected from a dataset of generalizing agents. Furthermore, the same parameters were predictive between tasks in all cases (suppl. Table E.11). These results show that our method reliably detected parameter generalization in a dataset that exhibited generalization.

      Lastly, we established whether the degree of generalization in humans was significantly different from agents. To this aim, we calculated the Spearman correlations between each pair of tasks for each parameter, for both humans (section 2.1.2; suppl. Fig. H.9) and agents, and compared both using bootstrapped confidence intervals (suppl. Appendix E). Human parameter correlations were significantly below the ceiling for all parameters except alpha+ (A vs B) and epsilon / 1/beta (A vs C; suppl. Fig. E.8C). This suggests that humans were within the range of maximally detectable generalization in two cases, but showed less-than-perfect generalization between other task combinations and for parameters Forgetting and alpha-.”
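      The comparison between human and simulated-agent correlations described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the exact bootstrap procedure is in suppl. Appendix E, which is not reproduced here, so the resampling scheme (participants resampled with replacement, percentile interval on the difference in Spearman correlations), the function names, and the toy data are all assumptions.

```python
import random


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_difference(human, agent, n_boot=1000, seed=0):
    """95% percentile interval for rho(agent) - rho(human).

    `human` and `agent` are lists of (parameter in task 1, parameter in
    task 2) pairs, one pair per participant or simulated agent.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        h = [rng.choice(human) for _ in human]  # resample participants
        a = [rng.choice(agent) for _ in agent]  # resample agents
        diffs.append(spearman(*zip(*a)) - spearman(*zip(*h)))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy data (hypothetical): agents share one parameter across tasks, so their
# fitted values correlate strongly; "humans" correlate only weakly.
_rng = random.Random(1)
_base = [_rng.gauss(0, 1) for _ in range(200)]
agent = [(b, b + 0.3 * _rng.gauss(0, 1)) for b in _base]
human = [(b, 0.2 * b + _rng.gauss(0, 1)) for b in _base]

lo, hi = bootstrap_difference(human, agent, n_boot=500, seed=1)
print(f"95% interval for ceiling minus human correlation: [{lo:.2f}, {hi:.2f}]")
```

      Under this scheme, an interval that excludes zero would correspond to "significantly below the ceiling", while an interval that includes zero would correspond to the cases described as within the range of maximally detectable generalization.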

      • The +LR/inverse temp relationship seems to generalize best between tasks A/C, but not B/C, a common theme in the paper. This does not seem surprising given that in A and C there is a key additional task feature over the bandit task in B -- which is the need to retain state-action associations. Whether captured via F (forgetting) or K (WM capacity), the cognitive processes involved in this learning might interact with LR/exploration in a different way than in a task where this may not be necessary.

      We thank the reviewer for this comment, which raises an important issue. We are adding the specific pairwise correlations and scatter plots for the pairs of parameters the reviewer asked about below (“bf_alpha” = LR task A; “bf_forget” = F task A; “rl_forget” = F task C; “rl_log_alpha” = LR task C; “rl_K” = WM capacity task C):

      Within tasks:

      Between tasks:

      To answer the question in more detail, we have expanded our section about limitations stemming from parameter tradeoffs in the following way:

      “One limitation of our results is that regression analyses might be contaminated by parameter cross-correlations (sections 2.1.2, 2.1.3, 2.1.4), which would reflect modeling limitations (non-orthogonal parameters), and not necessarily shared cognitive processes. For example, parameters alpha and beta are mathematically related in the regular RL modeling framework, and we observed significant within-task correlations between these parameters for two of our three tasks (suppl. Fig. H.10, H.11). This indicates that caution is required when interpreting correlation results. However, correlations were also present between tasks (suppl. Fig. H.9, H.11), suggesting that within-model trade-offs were not the only explanation for shared variance, and that shared cognitive processes likely also played a role.

      Another issue might arise if such parameter cross-correlations differ between models, due to the differences in model parameterizations across tasks. For example, memory-related parameters (e.g., F, K in models A and C) might interact with learning- and choice-related parameters (e.g., alpha+, alpha-, noise/exploration), but such an interaction is missing in models that do not contain memory-related parameters (e.g., task B). If this is indeed the case, i.e., parameters trade off with each other in different ways across tasks, then a lack of correlation between tasks might not reflect a lack of generalization, but just the differences in model parameterizations. Suppl. Fig. \ref{figure:S2AlphaBetaCorrelations} indeed shows significant, medium-sized, positive and negative correlations between several pairs of Forgetting, memory-related, learning-related, and exploration parameters (though with relatively small effect sizes; Spearman correlation: 0.17 < |r| < 0.22).

      The existence of these correlations (and differences in correlations between tasks) suggests that memory parameters likely traded off with each other, as well as with other parameters, which potentially affected generalizability across tasks. However, some of the observed correlations might be due to shared causes, such as a common reliance on age, and the regression analyses in the main paper control for these additional sources of variance, and might provide a cleaner picture of how much variance is actually shared between parameters.

      Furthermore, correlations between parameters within models are frequent in the existing literature, and do not prevent researchers from interpreting parameters---in this sense, the existence of similar correlations in our study allows us to address the question of generalizability and interpretability in similar circumstances as in the existing literature.”

      • More generally, isn't relative generalizability the best we would expect given systematic variation in task context? I agree with the authors' point that the language used in the literature sometimes implies an assumption of absolute generalizability (e.g. same LR across any task). But parameter fits, interactions, and group differences are usually interpreted in light of a single task+model paradigm, precisely b/c tasks vary widely across critical features that will dictate whether different algorithms are optimal or not and whether cognitive functions such as WM or attention may compensate for ways in which humans are not optimal. Maybe a more constructive approach would be to decompose tasks along theoretically meaningful features of the underlying Markov Decision Process (which gives a generative model), and be precise about (1) which features we expect will engage additional cognitive mechanisms, and (2) how these mechanisms are reflected in model parameters.

      We thank the reviewer for this comment, and will address both points in turn:

      (1) We agree with the reviewer's sentiment about relative generalizability: If we all interpreted our models exclusively with respect to our specific task design, and never expected our results to generalize to other tasks or models, there would not be a problem. However, the current literature shows a different pattern: Literature reviews, meta-analyses, and discussion sections of empirical papers regularly compare specific findings between studies. We compare specific parameter values (e.g., empirical parameter priors), parameter trajectories over age, relationships between different parameters (e.g., balance between LR+ and LR-), associations between parameters and clinical symptoms, and between model variables and neural measures on a regular basis. The goal of this paper was really to see if and to what degree this practice is warranted. And the reviewer rightfully alerted us to the fact that our data imply that these assumptions might be valid in some cases, just not in others.

      (2) With regard to providing task descriptions that relate to the MDP framework, we have included the following sentence in the discussion section:

      “Our results show that discrepancies are expected even with a consistent methodological pipeline, and using up-to-date modeling techniques, because they are an expected consequence of variations in experimental tasks and computational models (together called "context"). Future research needs to investigate these context factors in more detail. For example, which task characteristics determine which parameters will generalize and which will not, and to what extent? Does context impact whether parameters capture overlapping versus distinct variance? A large-scale study could answer these questions by systematically covering the space of possible tasks, and reporting the relationships between parameter generalizability and distance between tasks. To determine the distance between tasks, the MDP framework might be especially useful because it decomposes tasks along theoretically meaningful features of the underlying Markov Decision Process.“

      Another point that merits more attention is that the paper pretty clearly commits to each model as being the best possible model for its respective task. This is a necessary premise, as otherwise, it wouldn't be possible to say with certainty that individual parameters are well estimated. I would find the paper more convincing if the authors include additional information and analysis showing that this is actually the case.

      We agree with the sentiment that all models should fit their respective task equally well. However, there is no good quantitative measure of model fit that is comparable across tasks and models - for example, because of the difference in difficulty between the tasks, the number of choices explained would not be a valid measure to compare how well the models are doing across tasks. To address this issue, we have added the new supplemental section (Appendix C) mentioned above that includes information about the set of models compared, and explains why we have reason to believe that all models fit (equally) well. We also created the new supplemental Figure D.7 shown above, which directly compares human and simulated model behavior in each task, and shows a close correspondence for all tasks. Because the quality of all our models was a major concern for us in this research, we also refer the reviewer and other readers to the three original publications that describe all our modeling efforts in much more detail, and hopefully convince the reviewer that our model fitting was performed according to high standards.

      I am particularly interested to see whether some of the discrepancies in parameter fits can be explained by the fact that the model for Task A did not account for explicit WM processes, even though (1) Task A is similar to Task C (Task A can be seen as a single condition of Task C with 4 states and 2 possible visible actions, and stochastic rather than deterministic feedback) and (2) prior work has suggested a role for explicit memory of single episodes even in stateless bandit tasks such as Task B.

      We appreciate this very thoughtful question, which raises several important issues. (1) As the reviewer said, the models for task A and task C are relatively different even though the underlying tasks are relatively similar (minus the differences the reviewer already mentioned, in terms of visibility of actions, number of actions, and feedback stochasticity). (2) We also agree that the model for task C did not include episodic memory processes even though episodic memory likely played a role in this task, and agree that neither the forgetting parameters in tasks A and C, nor the noise/exploration parameters in tasks A, B, and C are likely specific enough to capture all the memory / exploration processes participants exhibited in these tasks.

      However, this problem is difficult to solve: We cannot fit an episodic-memory model to task B because the task lacks an episodic-memory manipulation (such as, e.g., in Bornstein et al., 2017), and we cannot fit a WM model to task A because it lacks the critical set-size manipulation enabling identification of the WM component (modifying set size allows the model to identify individual participants’ WM capacities, so the issue cannot be avoided in tasks with only one set size). Similarly, we cannot model more specific forgetting or exploration processes in our tasks because they were not designed to dissociate these processes. If we tried fitting more complex models that include these processes to these tasks, they would most likely lose in model comparison because the increased complexity would not lead to additional explained behavioral variance, given that the tasks do not elicit the relevant behavioral patterns. Because the models therefore do not specify all the cognitive processes that participants likely employ, the situation described by the reviewer arises, namely that different parameters sometimes capture the same cognitive processes across tasks and models, while the same parameters sometimes capture different processes.

      And while the reviewer focussed largely on memory-related processes, the issue of course extends much further: Besides WM, episodic memory, and more specific aspects of forgetting and exploration, our models also did not take into account a range of other processes that participants likely engaged in when performing the tasks, including attention (selectivity, lapses), reasoning / inference, mental models (creation and use), prediction / planning, hypothesis testing, etc., etc. In full agreement with the reviewer’s sentiment, we recently argued that this situation is ubiquitous to computational modeling, and should be considered very carefully by all modelers because it can have a large impact on model interpretation (Eckstein et al., 2021).

      If we assume that many more cognitive processes are likely engaged in each task than are modeled, and consider that every computational model includes just a small number of free parameters, parameters then necessarily reflect a multitude of cognitive processes. The situation is additionally exacerbated by the fact that more complex models become increasingly difficult to fit from a methodological perspective, and that current laboratory tasks are designed in a highly controlled and consequently relatively simplistic way that does not lend itself to simultaneously test a variety of cognitive processes.

      The best way to deal with this situation, we think, is to recognize that in different contexts (e.g., different tasks, different computational models, different subject populations), the same parameters can capture different behaviors, and different parameters can capture the same behaviors, for the reasons the reviewer lays out. Recognizing this helps to avoid misinterpreting modeling results, for example by focusing our interpretation of model parameters to our specific task and model, rather than aiming to generalize across multiple tasks. We think that recognizing this fact also helps us understand the factors that determine whether parameters will capture the same or different processes across contexts and whether they will generalize. This is why we estimated here whether different parameters generalize to different degrees, which other factors affect generalizability, etc. Knowing the practical consequences of using the kinds of models we currently use will therefore hopefully provide a first step in resolving the issues the reviewer laid out.

      It is interesting that one of the parameters that generalizes least is LR-. The authors make a compelling case that this is related to a "lose-stay" behavior that benefits participants in Task B but not in Task C, which makes sense given the probabilistic vs deterministic reward function. I wondered if we can rule out the alternative explanation that in Task C, LR- could reflect a different interpretation of instructions vis-à-vis what rewards indicate: do the authors have an instruction check measure in either task that can be correlated with this "lose-stay" behavior and with LR-? And what does the "lose-stay" distribution look like, for Task C at least? I basically wonder if some of these inconsistencies can be explained by participants having diverging interpretations of the deterministic nature of the reward feedback in Task C. The order of tasks might matter here as well: was task order the same across participants? It could be that due to the within-subject design, some participants may have persisted in global strategies that are optimal in Task B, but sub-optimal in Task C.

      The PCA analysis adds an interesting angle and a novel, useful lens through which we can understand divergence in what parameters capture across different tasks. One observation is that loadings for PC2 and PC3 are strikingly consistent for Task C, so it looks more like these PCs encode a pairwise contrast (PC2 is C with B and PC3 is C with A), primarily reflecting variability in performance - e.g. participants who did poorly on Task C but well on Task B (PC2) or Task A (PC3). Is it possible to disentangle this interpretation from the one in the paper? It also is striking that in addition to performance, the PCs recover the difference in terms of LR- on Task B, which again supports the possibility that LR- divergence might be due to how participants handle probabilistic vs. deterministic feedback.

      We appreciate this positive evaluation of our PCA and are glad that it could provide a useful lens for understanding parameters. We also agree with the reviewer's observation that PC2 and PC3 reflect task contrasts (PC2: task B vs task C; PC3: task A vs task C), and phrase it in the following way in the paper:

      “PC2 contrasted task B to task C (loadings were positive / negative / near-zero for corresponding features of tasks B / C / A; Fig. 3B). PC3 contrasted task A to both B and C (loadings were positive / negative for corresponding features on task A / tasks B and C; Fig. 3C).”

      Hence, the only difference between our interpretation and the reviewer’s seems to be whether PC3 contrasts task C to task B as well as task A, or just to task A. Our interpretation is supported by the fact that loadings for tasks A and C are quite similar on PC3; however, both interpretations seem appropriate.

      We also appreciate the reviewer's positive evaluation of the fact that the PCA reproduces the differences in LR-, and its relationship to probabilistic/deterministic feedback. The following section reiterates this idea:

      “alpha- loaded positively in task C, but negatively in task B, suggesting that performance increased when participants integrated negative feedback faster in task C, but performance decreased when they did the same in task B. As mentioned before, contradictory patterns of alpha- were likely related to task demands: The fact that negative feedback was diagnostic in task C likely favored fast integration of negative feedback, while the fact that negative feedback was not diagnostic in task B likely favored slower integration (Fig. 1E). This interpretation is supported by behavioral findings: "Lose-stay" behavior (repeating choices that produce negative feedback) showed the same contrasting pattern as alpha- on PC1. It loaded positively in task B, showing Lose-stay behavior benefited performance, but it loaded negatively on task C, showing that it hurt performance (Fig. 3A). This supports the claim that lower alpha- was beneficial in task B, while higher alpha- was beneficial in task C, in accordance with participant behavior and developmental differences.“

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Sasaki et al. titled "Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes" employed conditional GWAS to identify trans-regulators of mCHG levels in Arabidopsis natural accessions, after controlling for mCHH. Using loss-of-function mutants for a couple of these genes, the authors also tested their effects on mCHG levels.

      Overall, this manuscript makes a nice contribution. I suggest the following improvements to enhance the quality of this manuscript.

      Comments:

      1. MSI1 has been shown to co-purify with TCX5, a component of the DREAM complex. The DREAM complex transcriptionally regulates CMT3, MET1, and DDM1 in a cell-cycle-dependent manner (ref: Yong-Qiang Ning, 2020, Nature Plants). Tcx5/6 double mutants have ectopic gain of TE and genic mCHG. It would be nice to refer to this paper and add to the MSI1 part accordingly.

      Absolutely: thanks for suggesting this!

      Multifaceted regulation of mCHG levels seems to be evident from this and previous studies. Why would such complex pathways evolve to regulate mCHG? Bewick et al. 2016 and Wendte et al. 2019 showed that lack of CMT3 or ectopic expression of CMT3 can influence CG gene body methylation (gbM). One possibility is that these five factors regulate CHG to maintain it at a level that is just enough to target TEs. Irrespective of the functional relevance of gbM, differences in the levels of these five factors might result in erroneous gbM. It would be interesting to look at the rates of gbM and the number of gbM genes in the natural accessions carrying one to four mCHG-decreasing alleles, and also in the one line from the Iberian peninsula carrying polymorphisms in all five genes.

      Yes, the connection between CHG and gbM is very interesting and deserves more attention. We looked for an effect of cumulative mCHG-decreasing alleles on gbM, but there was no association, which is not surprising given the stable epigenetic inheritance of gbM. The Iberian peninsula line carrying all decreasing alleles did have slightly lower gbM levels, but it is impossible to exclude the effects of population structure. Since we have nothing to add beyond speculation, we prefer not to go into this topic.

      The authors mentioned that a significant peak for mCHG|mCHH on RdDM-targeted transposons was located 196 bp downstream of MIR823a and not on the mature miRNA. Therefore, this cannot directly impair miR823 base pairing with CMT3 mRNA transcripts and its cleavage. Moreover, natural accessions carrying the alternative MIRNA823 allele show reduced CMT3 and mCHG levels, meaning more miR823 levels? Does this 196 bp downstream region contain any regulatory feature that affects miR823 transcription? Or does this region still fall in the primary miRNA hairpin region? Could a single nucleotide change in the pri-miRNA have a significant impact on its secondary structure that impedes DICER processivity and, effectively, the levels of mature miR823 molecules? It will be beyond the scope of this paper to pin down the exact mechanism. But a simple stem-loop RT-PCR for miR823 levels in reference and alternative accessions would be informative (on accessions that grow at the same speed). Perhaps the authors can at least model SNP-induced pri-miRNA secondary structure variations using Vienna RNAFold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) and present MFE values (minimum free energy) for representative accessions.

      Stem-loop qRT-PCR for MIR823a expression would indeed be helpful to confirm allelic effects. However, comparing lines with wildly different genetic backgrounds is fraught with difficulty due to trans-effects. Furthermore, MIR823a is expressed specifically during embryogenesis, and the expression quickly decreases after the early heart stage (Papareddy et al., 2021). Thus, we would need to extract microRNA from embryos at exactly the same developmental stage, from lines that may develop at different speeds. Most likely, time-series data would be required, and generating such data is a massive undertaking. As noted in the paper, we did measure MIR823a expression by stem-loop qRT-PCR for several lines carrying reference and alternative alleles, but the results were inconclusive. A proper study of this is beyond the scope of this paper.

      Testing predicted effects on RNA secondary structure, on the other hand, is eminently feasible. As suggested, we used Vienna RNAFold for the region, including the GWAS peak. Since the SNP is linked to a 35 bp deletion (shown in S4A), it is closer to the MIR823A coding region than 196 bp. However, the results indicate that the SNP (Chr3:4496626) is not within the stem-loop. It remains possible that this SNP tags multiple SNPs in the annotated stem regions. This is now mentioned.

      Figure 1A can be made more reader friendly. Perhaps this can be broken down into correlation plots for individual conditions or tissue types. In addition, it might be good to add individual r-square values for each of them instead of compound r-square.

      We respectfully disagree, since the main point of the figure is the overall correlation and heterogeneity, rather than the correlation within sub-sets. Instead of splitting the plot, we changed color contrasts to make it easier to read.

      Page 3, Paragraph 1 from line 3 to end of paragraph. The authors wrote "Much of this variation is due to differences in the environment (including tissue, which can be viewed as a cellular environment)". A possible explanation is that these two tissues have different mitotic indices (fraction of dividing and non-dividing cells; flowers have more dividing cells, leaves have more non-dividing and endoreduplicated cells), which explains the non-CG variation. I would suggest that the authors change the text accordingly and refer to the Filipe Borges et al. 2021 Current Biology paper.

      This is certainly a possibility, although higher mCHG levels in flower buds presumably also reflect higher CMT3 expression during embryogenesis (Feng et al. 2020; Gutzat et al. 2020; Papareddy et al. 2021). We now mention both explanations and cite Borges et al. (2021).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Sasaki et al. carried out a conditional GWAS analysis of TE-CHG methylation in Arabidopsis thaliana natural accessions. They revealed multiple associations with SNPs in known DNA methylation genes. A new finding is the association found proximal to JMJ26, which had no previously described role in the maintenance/establishment of RdDM-targeted transposons. The authors validate the JMJ26 association using a loss-of-function mutant of JMJ26, which essentially recapitulates the GWAS effect, suggesting that JMJ26 is likely causal. An important point of the study is that the associations detected with conditional GWAS have not been seen in previous univariate (i.e. unconditional) GWAS, probably due to a lack of power. At the sub-genome-wide threshold the authors discovered further, albeit weaker, associations that were also highly enriched for known DNA methylation genes.

      Overall impression:

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs. What I personally found very distracting throughout the manuscript was the strong emphasis on the methodological aspect; that is, the conditional GWAS, which is really not new. Furthermore, the conceptual/philosophical discussion about what is a complex trait or what can be called polygenic was slightly pedantic and distracted from the biological message.

      There are three points here. First, we disagree that the GWAS results are confirmatory. Sure, only one of our associations is connected with a novel gene, but the fact that the four other genes apparently harbor major polymorphisms is a new finding that contributes to our understanding of the function of this trait (and, possibly, these genes). Second, while it is possible that we emphasize statistical methodology too much, we do this for clarity, not to claim that what we are doing is novel. Third, we are similarly not interested in defining what is polygenic and what isn’t, but rather in putting the results in the context of other studies. We have changed the writing in various places to make it clearer (and hopefully less distracting/pedantic).

      A conceptual comment:

      • The conditional GWAS presented here is conceptually very similar to conditional QTL mapping approaches where candidate loci are included, a priori, as covariates in the model, and a scan is performed to search for additional modifiers. It is known that this approach increases power because the scan is performed on the residual trait variation (having accounted for the effect of candidate loci). This is also the idea behind MQM mapping, although in the latter the inclusion is not restricted to candidate loci. Instead of including candidate SNPs as covariates, the authors include TE-CHH methylation levels as a covariate, as it is highly correlated with TE-CHG methylation. By doing this, the authors essentially "control" for any SNP affecting the covariance between CHG and CHH, even if these SNPs (and their genetic architecture) remain unknown. Hence, the conditional scan is mainly on the residual variation in TE-CHG methylation that is unique to this context (i.e. independent of CHH). That additional TE-CHG associated loci pop up in this scan is perhaps not so surprising.

      We agree, and have even written papers on this very subject. We were surprised by this comment as we felt we had included lengthy sections (see also comment above) about methodology, emphasizing that multi-trait analysis is a good idea in principle. One of our purposes here is to provide a beautiful example demonstrating this. We have tried to make these points clearer.
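      A toy simulation may help readers see why conditioning on a correlated trait can increase power. This is a minimal sketch on simulated data using ordinary least squares, not the mixed-model GWAS used in the manuscript. The sample size, effect size, noise levels and all names (`residualize`, `slope_t`, `mchh`, `mchg`) are assumptions chosen purely for illustration: a single SNP affects "mCHG" only, while a large shared factor drives both traits.

```python
import math
import random
import statistics


def residualize(v, x):
    """Return the residuals of v after regressing it on x (with intercept)."""
    mx, mv = statistics.fmean(x), statistics.fmean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]


def slope_t(y, g, df):
    """t statistic for the slope of y on g, given df residual degrees of freedom."""
    my, mg = statistics.fmean(y), statistics.fmean(g)
    sgg = sum((gi - mg) ** 2 for gi in g)
    beta = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y)) / sgg
    rss = sum(((yi - my) - beta * (gi - mg)) ** 2 for gi, yi in zip(g, y))
    return beta / math.sqrt(rss / df / sgg)


# Simulated data (all values hypothetical).
rng = random.Random(1)
n = 1000
snp = [rng.randint(0, 1) for _ in range(n)]      # haploid-style 0/1 genotype
shared = [rng.gauss(0, 1) for _ in range(n)]     # factor common to both traits
mchh = [u + rng.gauss(0, 0.1) for u in shared]   # covariate trait
mchg = [0.15 * g + u + rng.gauss(0, 0.1) for g, u in zip(snp, shared)]

# Univariate scan: mCHG ~ SNP.
t_uni = slope_t(mchg, snp, n - 2)
# Conditional scan: mCHG ~ SNP + mCHH, computed by residualizing both the
# trait and the SNP on mCHH (Frisch-Waugh-Lovell), which gives the same
# slope and residual sum of squares as the two-predictor regression.
t_cond = slope_t(residualize(mchg, mchh), residualize(snp, mchh), n - 3)
print(f"univariate t = {t_uni:.2f}, conditional t = {t_cond:.2f}")
```

      In this setup the shared factor inflates the residual variance of the univariate model, so the same SNP effect yields a much smaller test statistic than in the conditional model. The sketch says nothing about SNPs that act through the shared factor itself, which is the reviewer's point about the covariance remaining unexplained.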

      The finding that this conditional GWAS yields again a handful of loci that explain a considerable part of the trait (now residual trait) variation leads the authors to suggest that the genetic architecture underlying non-CG methylation of TEs is not "polygenic". I think this is semantics. All the authors have done is relegate any causal SNPs underlying the covariance between TE-CHG and TE-CHH to the right-hand side of the equation of their GWAS model, and subsumed it under the predictor "TE-CHH methylation levels". That is, the genetic architecture underlying this covariance is still unknown, difficult to identify and probably highly polygenic.

      Again we agree, and fail to see why the reviewer thinks we do not. Nowhere do we claim that the overall covariance has a simple basis, and we explicitly state that it is the conditional mCHG variation that has an oligogenic basis. We did write that “univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic”, which was imprecise, and arguably erroneous. The word “erroneously” has been removed in the revision.

      The authors essentially decompose a complex trait into parts and map genetic architectures for each part. Although each part seems less complex and more oligogenic than polygenic, when putting all the parts back together, I would argue we are getting close to a complex trait with a polygenic architecture. The study by Hüther et al., which the authors also cite, is another example of how a complex trait can be decomposed into parts. In reference to one of the authors' GWAS associations, they say "...this association was also recently found by Hüther et al. (2022) using GWAS for unconditional mCHG levels of individual transposons. The MIR823A polymorphism appears to almost exclusively affect mCHG (Figs. S2, S3), primarily targeting the same transposons as a CMT3 knock-out...". In the case of Hüther et al., the complex TE-CHG methylation trait is simplified by selecting specific TEs, a priori, that are differentially methylated in CMT3 knock-out lines. One could go on like this, and continue to peel away this complex trait. But, again, this does not mean that the overall TE-CHG methylation trait is not complex or polygenic. It spirals down into a discussion of what is actually meant by "complex" or "polygenic", which is an interesting discussion but, in the case of this manuscript, takes away from the biological message. My point is perhaps best reflected in the following statement from the discussion section: "Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic (Kawakatsu et al., 2016)." But a few lines below the authors seem to realize what they have actually done: "We believe that, by controlling for mCHH, we have effectively simplified the trait, revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation."

      The phrase “seem to realize” is unwarranted and unnecessary sarcasm. Given that we cite the two century-old papers that first demonstrated that it was possible to decompose complex traits into Mendelian ones, it should be obvious that we understand what we have done. That our writing could have been better is another matter. As noted above, the word “erroneously” has been dropped, and we have also changed the second sentence to make it obvious that this is obvious. We suspect that whether one finds this part of the Discussion “distracting” or not depends on training and background — our objective was to explain our results to readers who (unlike us and the reviewer) are not well-versed in quantitative genetics.

      Specific comments

      1. A large part of the manuscript focuses on SNPs that are enriched for a priori genes and that fall below the genome-wide significance threshold. While I see the reasons for doing this in this particular manuscript, I do not see how this is useful in general (again this approach is partly "sold" on methodological grounds). The approach can obviously not be extended to study traits where a priori gene sets are unavailable or incomplete. Moreover, the "FDR" approach based on the a priori gene set labels GWAS hits that are not within the a priori set "false discoveries", which may or may not be true. Moreover, there is no "natural" stopping point for going below GWAS thresholds. An alternative to this would be to perform a targeted GWAS for a priori genes (+ an LD window around them). Since this alleviates the multiple testing burden, I would be curious to see what this yields both in terms of conditional and unconditional analysis. Candidates that show a signal could be included as covariates and a conditional scan for unknown genes could then be performed.

      The FDR analysis using a set of a priori genes should be explained in detail in this ms. It is cumbersome to go to another manuscript to see what was done exactly, especially since this information is also difficult to dig up in the Atwell 2010 study. Although I understand the idea behind this approach, I would be concerned that this type of "FDR" analysis assumes that all methylation genes are known. A novel candidate that was perhaps never identified in mutant screens before would be classified as a false discovery. Similarly, known candidates that carry no functional polymorphisms in nature, perhaps because they are highly constrained, will never become a discovery.

      Comments 1 and 9 largely overlap, and so we moved 9 here for clarity and respond to both at the same time. We agree that the enrichment analysis should be explained in this article as well, so as to save the reader from finding the supplement to an old paper. A new section has been added to Methods. In this section, we also try to preempt some of the misunderstandings in the reviewer's comments.

      First, our approach is indeed generally applicable. Whether it is useful depends on what you want to do, and yes, the utility will depend on the quality of the independent data, but note that the a priori gene set does not have to be genes: you could use this approach to compare coding vs non-coding regions of the genome, for example.

      Second, we are not trying to “sell” our approach (or anything else for that matter).

      Third, the approach does not label GWAS hits that are not within the a priori set as false discoveries: it says nothing about these hits.

      Fourth, we are not sure what is meant by a ‘“natural” stopping point for going below GWAS thresholds’, but our approach does provide a simple way to explore how FDR (in the a priori set!) depends on the threshold used.

      Fifth, the proposed alternative of “targeted GWAS” (non-genomewide association, as it were) is not equivalent, because our approach was not designed to increase power by alleviating the multiple testing burden, but rather to rigorously demonstrate that there is a signal in the data when faced with uncalibrated p-values. That it can also be used to explore sub-significant associations is a nice side-effect that we exploit here.

      Sixth, we do not assume that all methylation genes are known, nor is our goal to find them all.
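      The enrichment logic discussed in the points above can be sketched generically. This is an illustration under stated assumptions, not necessarily the exact procedure used in the manuscript or in Atwell et al. (2010): each SNP is assumed to carry a p-value and a flag saying whether it lies near an a priori gene, and false hits are assumed to fall near a priori genes at the genome-wide background rate. The function name `enrichment_and_fdr` and the toy data are hypothetical.

```python
def enrichment_and_fdr(pvalues, near_candidate, threshold):
    """Enrichment of a priori candidates among hits, and an FDR upper bound.

    pvalues        -- one p-value per SNP
    near_candidate -- one bool per SNP, True if the SNP is near an a priori gene
    threshold      -- p-value cut-off defining a "hit"
    Returns (enrichment, fdr_upper_bound), or None if undefined.
    """
    hits = [flag for p, flag in zip(pvalues, near_candidate) if p <= threshold]
    background = sum(near_candidate) / len(near_candidate)
    if not hits or background == 0 or sum(hits) == 0:
        return None
    observed = sum(hits) / len(hits)
    enrichment = observed / background
    # If every hit were false, candidates would appear at the background rate,
    # so at most a fraction 1/enrichment of the candidate hits can be false.
    return enrichment, min(1.0, 1.0 / enrichment)


# Toy data: ten SNPs, two of them near a priori genes.
pvalues = [1e-8, 1e-7, 1e-6, 0.01, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
near_candidate = [True, True] + [False] * 8

for threshold in (1e-5, 0.05, 1.0):
    print(threshold, enrichment_and_fdr(pvalues, near_candidate, threshold))
```

      Looping over thresholds, as in the last lines, is one way to explore how the bound changes as one moves below the genome-wide cut-off. Note that the bound applies only to hits inside the a priori set and is silent about hits outside it.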

      With regards to the CMT2 signals (particularly section "Further evidence for allelic heterogeneity at CMT2") it would have been useful/clearer to break down CHH into CWA and non-CWA.

      While this is a sensible suggestion, the focus of this paper is on mCHG, and refining the mCHH measurement would essentially amount to re-doing all analyses.

      I understand that the authors set out to do this conditional analysis because previously no hits could be found for CHG TE methylation. However, have the authors considered going the other way around and performing a CHH|CHG analysis to find additional QTL affecting CHH methylation, partly independently of CHG?

      Yes, this was in the paper, but we only mention it in the Discussion (and Fig S13) as the results were only of methodological interest (as expected, they were very similar).

      The authors write: "While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these "traits" are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes." Please clarify what they mean by "not behave like standard phenotypes".

      Done.

      The authors write: "Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set (Kawakatsu et al., 2016), especially for RdDM-targeted transposons (Fig. 1A; see Methods). Much of this variation ....". What is meant by "this variation"?

      The sentence has been changed to make this clearer.

      A few lines below, they write "...huge". Please rephrase.

      Done.

      The authors write: "sample data set ("Leaf SALK ambient temperature"; n=846). Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig. 1B) (Stroud et al., 2013). This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons." It is noticeable that many double mutants are off the diagonal. To me this indicates that they affect one context more than the other (i.e. they break covariance). Second, it suggests that they are probably interacting non-additively. It would be great if the authors could comment on this observation; perhaps also later in the ms, where they make a case for additivity.

      We are not convinced that the double or triple mutants show non-additivity. Adding up effects in Figure 1 works pretty well. As for our GWAS results, it is clear that small effects (like the ones in our GWAS) will always tend to look additive for simple mathematical reasons. This does not mean that no interactions exist, and we emphasize this in the paper. We also have an example of non-linearity when it comes to TE activity. This is now also emphasized.

      The authors write: " it is difficult to say what fraction of these factors is genetic and what is environmental, but, regardless of this, we hypothesized that the substantial covariance could reduce power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better...". The arguments given thus far are not sufficient to understand why a "substantial covariance" between traits would reduce the power to map individual traits. I think more needs to be done here to motivate this.

      The sentence following the one quoted is “In essence, we sought to simplify a complex trait by breaking it into constituent parts”, which is very much part of the motivation. As the reviewer noted above, it is not surprising that a conditional analysis turns out to be more powerful. The comment may have arisen from the statement “This insight is the basis for this paper”, which is misleading: there is no insight here, just a very obvious hypothesis, which turned out to be correct. We have changed the writing to make this clearer.

      The authors write: "However, MSI1 is required to control DNA methylation via repression of MET1, and a loss of FAS2 in CAF-1 induces mCHG hypermethylation (Fig 1B) (Stroud et al., 2013; Jullien et al., 2008)...". Where is the "FAS2 in CAF-1" result visible in Fig. 1B?

      fas2 induces mCHG hypermethylation in CMT2-targeted TEs, presumably via a complex that also involves MSI1. It is marked in Fig. 1B. We have rephrased the sentence to make this clearer.

      The results presented in "A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons" could have been showcased better. Only after reading the methods part did I realize that the authors generated CRISPR mutants. It reads as if the authors just picked up some available loss-of-function mutants and profiled them. But, clearly, much more work was involved here and the authors could have brought that out more. Perhaps more generally, I think all the new functional analysis the authors perform is largely "under-sold" in this manuscript at the expense of unnecessary methodological/conceptual discussion (see point above).

      We actually generated CRISPR/Cas9 mutants only for MIR823A (Table S5). For JMJ26, a T-DNA insertion line was available, and results based on this line and on rescue lines provided sufficient evidence. To clarify this, we corrected the subsection titles.

      In section "The power and complexity of conditional GWAS", the authors write "The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results." This seems like too obvious a statement.

      Indeed: it is meant ironically. It is obvious, yet people do it.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Sasaki et al. carried out a conditional GWAS analysis of TE-CHG methylation in Arabidopsis thaliana natural accessions. They revealed multiple associations with SNPs in known DNA methylation genes. A new finding is the association found proximal to JMJ26, which had no previously described role in the maintenance/establishment of RdDM-targeted transposons. The authors validate the JMJ26 association using a loss-of-function mutant of JMJ26, which essentially recapitulates the GWAS effect, suggesting that JMJ26 is likely causal. An important point of the study is that the associations detected with conditional GWAS have not been seen in previous univariate (i.e. unconditional) GWAS, probably due to a lack of power. At the sub-genome-wide threshold the authors discovered further, albeit weaker, associations that were also highly enriched for known DNA methylation genes.

      Overall impression:

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs. What I personally found very distracting throughout the manuscript was the strong emphasis on the methodological aspect; that is, the conditional GWAS, which is really not new. Furthermore, the conceptual/philosophical discussion about what is a complex trait or what can be called polygenic was slightly pedantic and distracted from the biological message.

      A conceptual comment:

      • The conditional GWAS presented here is conceptually very similar to conditional QTL mapping approaches where candidate loci are included, a priori, as covariates in the model, and a scan is performed to search for additional modifiers. It is known that this approach increases power because the scan is performed on the residual trait variation (having accounted for the effect of candidate loci). This is also the idea behind MQM mapping, although in the latter the inclusion is not restricted to candidate loci. Instead of including candidate SNPs as covariates, the authors include TE-CHH methylation levels as a covariate, as it is highly correlated with TE-CHG methylation. By doing this, the authors essentially "control" for any SNP affecting the covariance between CHG and CHH, even if these SNPs (and their genetic architecture) remain unknown. Hence, the conditional scan is mainly on the residual variation in TE-CHG methylation that is unique to this context (i.e. independent of CHH). That additional TE-CHG associated loci pop up in this scan is perhaps not so surprising.

      The finding that this conditional GWAS yields again a handful of loci that explain a considerable part of the trait (now residual trait) variation leads the authors to suggest that the genetic architecture underlying non-CG methylation of TEs is not "polygenic". I think this is semantics. All the authors have done is relegate any causal SNPs underlying the covariance between TE-CHG and TE-CHH to the right-hand side of the equation of their GWAS model, and subsumed it under the predictor "TE-CHH methylation levels". That is, the genetic architecture underlying this covariance is still unknown, difficult to identify and probably highly polygenic.

      The authors essentially decompose a complex trait into parts and map genetic architectures for each part. Although each part seems less complex and more oligogenic than polygenic, when putting all the parts back together, I would argue we are getting close to a complex trait with a polygenic architecture. The study by Hüther et al., which the authors also cite, is another example of how a complex trait can be decomposed into parts. In reference to one of the authors' GWAS associations, they say "...this association was also recently found by Hüther et al. (2022) using GWAS for unconditional mCHG levels of individual transposons. The MIR823A polymorphism appears to almost exclusively affect mCHG (Figs. S2, S3), primarily targeting the same transposons as a CMT3 knock-out...". In the case of Hüther et al., the complex TE-CHG methylation trait is simplified by selecting specific TEs, a priori, that are differentially methylated in CMT3 knock-out lines. One could go on like this, and continue to peel away this complex trait. But, again, this does not mean that the overall TE-CHG methylation trait is not complex or polygenic. It spirals down into a discussion of what is actually meant by "complex" or "polygenic", which is an interesting discussion but, in the case of this manuscript, takes away from the biological message. My point is perhaps best reflected in the following statement from the discussion section: "Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic (Kawakatsu et al., 2016)." But a few lines below the authors seem to realize what they have actually done: "We believe that, by controlling for mCHH, we have effectively simplified the trait, revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation."

      Specific comments

      • A large part of the manuscript focuses on SNPs that are enriched for a priori genes and that fall below the genome-wide significance threshold. While I see the reasons for doing this in this particular manuscript, I do not see how this is useful in general (again this approach is partly "sold" on methodological grounds). The approach can obviously not be extended to study traits where a priori gene sets are unavailable or incomplete. Moreover, the "FDR" approach based on the a priori gene set labels GWAS hits that are not within the a priori set "false discoveries", which may or may not be true. Moreover, there is no "natural" stopping point for going below GWAS thresholds. An alternative to this would be to perform a targeted GWAS for a priori genes (+ an LD window around them). Since this alleviates the multiple testing burden, I would be curious to see what this yields both in terms of conditional and unconditional analysis. Candidates that show a signal could be included as covariates and a conditional scan for unknown genes could then be performed.
      • With regards to the CMT2 signals (particularly section "Further evidence for allelic heterogeneity at CMT2") it would have been useful/clearer to break down CHH into CWA and non-CWA.
      • I understand that the authors set out to do this conditional analysis because previously no hits could be found for CHG TE methylation. However, have the authors considered going the other way around and performing a CHH|CHG analysis to find additional QTL affecting CHH methylation, partly independently of CHG?
      • The authors write: "While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these "traits" are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes." Please clarify what they mean by "not behave like standard phenotypes".
      • The authors write: "Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set (Kawakatsu et al., 2016), especially for RdDM-targeted transposons (Fig. 1A; see Methods). Much of this variation ....". What is meant by "this variation"?
      • A few lines below, they write "...huge". Please rephrase.
      • The authors write: "sample data set ("Leaf SALK ambient temperature"; n=846). Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig. 1B) (Stroud et al., 2013). This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons." It is noticeable that many double mutants are off the diagonal. To me this indicates that they affect one context more than the other (i.e. they break covariance). Second, it suggests that they are probably interacting non-additively. It would be great if the authors could comment on this observation; perhaps also later in the ms, where they make a case for additivity.
      • The authors write: " it is difficult to say what fraction of these factors is genetic and what is environmental, but, regardless of this, we hypothesized that the substantial covariance could reduce power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better...". The arguments given thus far are not sufficient to understand why a "substantial covariance" between traits would reduce the power to map individual traits. I think more needs to be done here to motivate this.
      • The FDR analysis using a set of a priori genes should be explained in detail in this ms. It is cumbersome to go to another manuscript to see what was done exactly, especially since this information is also difficult to dig up in the Atwell 2010 study. Although I understand the idea behind this approach, I would be concerned that this type of "FDR" analysis assumes that all methylation genes are known. A novel candidate that was perhaps never identified in mutant screens before would be classified as a false discovery. Similarly, known candidates that carry no functional polymorphisms in nature, perhaps because they are highly constrained, will never become a discovery.
      • The authors write: "However, MSI1 is required to control DNA methylation via repression of MET1, and a loss of FAS2 in CAF-1 induces mCHG hypermethylation (Fig 1B) (Stroud et al., 2013; Jullien et al., 2008)...". Where is the "FAS2 in CAF-1" result visible in Fig. 1B?
      • The results presented in "A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons" could have been showcased better. Only after reading the methods part did I realize that the authors generated CRISPR mutants. It reads as if the authors just picked up some available loss-of-function mutants and profiled them. But, clearly, much more work was involved here and the authors could have brought that out more. Perhaps more generally, I think all the new functional analysis the authors perform is largely "under-sold" in this manuscript at the expense of unnecessary methodological/conceptual discussion (see point above).
      • In section "The power and complexity of conditional GWAS", the authors write "The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results." This seems like too obvious a statement.

      Significance

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs.

  2. May 2022
    1. scanned for solutions to long-standing problems in his reading, conversations, and everyday life. When he found one, he could make a connection that looked to others like a flash of unparalleled brilliance

      Feynman’s approach encouraged him to follow his interests wherever they might lead. He posed questions and constantly

      Creating strong and clever connections between disparate areas of knowledge can appear to others to be a flash of genius, in part because they didn't have the prior knowledge, nor did they put in the work of collecting, remembering, or juxtaposing.

      This method may be one of the primary (only) underpinnings supporting the lone genius myth. This is particularly the case when the underlying ideas were not ones fully developed by the originator. As an example, if Einstein had fully developed the ideas of space and time by himself and then put the two together as spacetime, then he would have independently built two separate layers; in reality, he cleverly juxtaposed two broadly pre-existing ideas and combined them in an intriguing new framing to come up with something new. Because he did this a few times over his life, he's viewed as an even bigger genius, but when we think about what he did and how, is it really genius, or simply an underlying method that may have shaken out anyway by means of the statistical thermodynamics of people thinking, reading, communicating, and writing?

      Are there other techniques that also masquerade as genius like this, or is this one of the few/only?

      Link this to Feynman's mention that his writing is the actual thinking that appears on the pages of his notes. "It's the actual thinking."

    2. You may find this book in the “self-improvement” category, but in a deeper sense it is the opposite of self-improvement. It is about optimizing a system outside yourself, a system not subject to your

      limitations and constraints, leaving you happily unoptimized and free to roam, to wonder, to wander toward whatever makes you feel alive here and now in each moment.

      Some may categorize handbooks on note taking within the productivity space as "self-help" or "self-improvement", but still view it as something that happens outside of one's self. Doesn't improving one's environment as a means of improving things for oneself count as self-improvement?

      Marie Kondo's minimalism techniques are all external to the body, but are wholly geared towards creating internal happiness.

      Because your external circumstances are important to your internal mental state, external environment and decoration can be considered self-improvement.


      Could note taking be considered exbodied cognition? Vannevar Bush framed the Memex as a means of showing associative trails. (Let's be honest, As We May Think used the word trail far too much.)

      How does this relate to orality vs. literacy?

      Orality requires the immediate mental work for storage while literacy removes some of the work by making the effort external and potentially giving it additional longevity.

    1. Joint Public Review:

      The present manuscript compares the connectomes of a large range of mammal species using diffusion MRI data. The manuscript reports two main findings: (1) connectomes of more related species are generally more similar, as assessed using Laplacian eigenspectra, than of unrelated species; (2) differences between species' connectomes are generally driven by local regional connectivity profiles, whereas global features are generally preserved.

      The first finding is comforting, but in a way not extremely surprising. It would be extremely surprising if more related species do not show more similarity in their connectome. Indeed, this is the reason many phylogenetic analyses use statistical techniques that take the relatedness of species explicitly into account. I find the statement that connectome organization recapitulates traditional taxonomies a bit over the top, as this suggests that a phylogenetic tree constructed based on connectomes would be similar to a tree based on other measures, such as morphology or genetics. This will probably be the case, but is not what the authors have tested here.

      The second result is in my opinion the key result of the paper. The main novelty of the paper is that it finally, for the field, bridges the approaches taken by researchers searching for differences across species (usually researchers interested in anatomy) and by researchers searching for conserved principles across species (usually researchers approaching connectivity from a network or graph theory perspective). By showing which aspects of a connectome are generally conserved and which are changed, this paper starts unifying the two views, and this is an important contribution.

      It would, however, have been nice if the authors had explored this notion a bit further. Now, they just state that taking certain features into account means the connectomes look more different, but they do not zoom into the specific brains to see what this means at a biological level. Some of the authors have published, for instance, on the unique connectivity profiles of parts of the human brain and it would have been nice to show that these fall under the local regional connectivity profile aspects of the connectomes. This is a missed opportunity to even further unify the different research traditions.

      The manuscript suggests that white matter connectivity in mammals is more similar between species within one taxonomic group than across different groups, proposing that the brain's connectome reflects phylogenetic relationships. The manuscript further details which features of the network organisation are associated with larger differences across groups and hence may drive speciation; and which features seem to be a common principle across mammals.

      The authors present evidence based on the analysis of diffusion-weighted brain imaging data across 124 species, 111 of which were included in the comparison. The dataset is a great resource to address their research question.

      The paper is clear and the evidence compelling. The manuscript adds valuable insights into the connectome architecture across species, potentially opening a new perspective on the link between genetics and behaviour. I would like to point out the great open science practice of the authors - code is available with a great ReadMe to guide potential users, connectivity matrices are available, and all software packages used in the analyses have been cited.

      The figures are clear and complement the manuscript.

      Technical Comments:

      - Spectral approach / Interpretation<br /> It would be good to have more insight into the meaning of the spectral distance results. My understanding is this: the eigenvalues of the normalised Laplacian necessarily have a mean of 1 (because their sum equals the trace of the Laplacian, which is equal to N [number of nodes]). Therefore, the distances between the spectra essentially amount to comparing higher moments, and in particular the variance (as the histograms look quite Gaussian, I am guessing the distances are dominated by differences in the variance). But what does it mean that bats have a higher variance in these eigenvalues than primates? I know that the authors try to give *some* insight, e.g. that when the distribution is peaky around 1, it means there are more stereotypical local patterns of connectivity. I understand that. But what are these patterns?
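
      The mean-of-1 property mentioned above can be checked numerically. The following is a minimal sketch, not the authors' pipeline: it assumes an undirected, binary graph with no self-loops and no isolated nodes, and the random graph and the function name `normalized_laplacian_spectrum` are hypothetical illustrations.

```python
import numpy as np

def normalized_laplacian_spectrum(adj):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("isolated nodes make the normalised Laplacian ill-defined")
    inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)  # symmetric matrix, so eigenvalues are real

# Hypothetical test graph: sparse random edges plus a ring so every node has degree > 0.
rng = np.random.default_rng(0)
n = 50
upper = np.triu(rng.random((n, n)) < 0.2, k=1)
adj = (upper | upper.T).astype(float)
idx = np.arange(n)
adj[idx, (idx + 1) % n] = 1.0
adj[(idx + 1) % n, idx] = 1.0

eig = normalized_laplacian_spectrum(adj)
spectrum_mean = eig.mean()  # trace(L) / N, which equals 1 under the assumptions above
spectrum_var = eig.var()    # the quantity that can differ between graphs
```

      Because the mean is fixed by construction, any distance between two such spectra can only reflect the variance and higher moments, which is the point made in the comment.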

      - Effect Size / Null Distribution<br /> I like the idea and the ambition of this paper. My main concern is that the differences are very small. Pretty much all the measures (Laplacian eigenspectra and network-theoretic measures) are very similar between animals. This can be interpreted in two ways. (1) It may mean that the brain organisation is preserved, which is the interpretation of the authors. But it could also mean that (2) the metrics are not very informative. How do we know if we are in situation (1) or (2)? There is no comparison to a good null model (except in Fig. 4, but I don't think a random network is a good null). One possible null is two random networks connected to each other with a few random connections (to mimic left-right brains)?
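
      One way the suggested null could be generated is sketched below. This is only an illustration of the idea in the comment: the function name `two_module_null`, the within-module edge probability, and the number of between-module edges are all hypothetical choices, not values from the manuscript.

```python
import numpy as np

def two_module_null(n_per_side, p_within, n_between, rng):
    """Two random undirected graphs joined by a few random left-right edges."""
    def random_block(n):
        upper = np.triu(rng.random((n, n)) < p_within, k=1)
        return (upper | upper.T).astype(int)

    n = n_per_side
    adj = np.zeros((2 * n, 2 * n), dtype=int)
    adj[:n, :n] = random_block(n)   # "left hemisphere"
    adj[n:, n:] = random_block(n)   # "right hemisphere"

    # Pick n_between distinct (left, right) node pairs and connect them.
    pairs = rng.choice(n * n, size=n_between, replace=False)
    left, right = np.divmod(pairs, n)
    adj[left, n + right] = 1
    adj[n + right, left] = 1
    return adj

rng = np.random.default_rng(0)
null_adj = two_module_null(n_per_side=100, p_within=0.1, n_between=15, rng=rng)
between_edges = int(null_adj[:100, 100:].sum())
```

      The same metrics computed on the empirical connectomes could then be computed on many such graphs to see how much spread a structure-free null produces.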

      * The authors use cosine similarity to compare the eigenspectra distributions. I think this does them a disservice. Cosine similarity normalises the distributions quadratically instead of linearly. But the main thing that is changing is the variance. So normalising quadratically diminishes the dissimilarities between distributions. I have looked at their data (thanks for sharing!) and using multidimensional scaling with Euclidean distance looks much better than with cosine distance. I would suggest using Euclidean distance.

      * The authors use a bootstrapping method to calculate an average distance which they claim is useful because they don't have the same number of animals in each category. I don't think this bootstrapping is useful at all. If anything, it just adds noise. Averaging 10,000 samples with replacement does not change the outcome compared to simply averaging the matrices without the sampling. To test this: vary n and it should converge to the average of the original non-sampled data. (I've tried it!)
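
      The convergence described above is easy to reproduce in a toy setting. The sketch below uses 20 hypothetical distance values rather than the authors' matrices, and only illustrates that the average over many resamples with replacement approaches the plain average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a set of pairwise distances within one category.
distances = rng.random(20)
plain_mean = distances.mean()

# Resample with replacement many times and average the resampled means.
n_resamples = 10_000
resamples = rng.choice(distances, size=(n_resamples, distances.size), replace=True)
bootstrap_mean = resamples.mean(axis=1).mean()

gap = abs(bootstrap_mean - plain_mean)  # shrinks as n_resamples grows
```

      Varying `n_resamples` shows the gap shrinking, so in this toy setting the resampling step adds Monte Carlo noise without changing the expected value.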

      * The authors should clarify whether they are using the weighted or binarised connectivity matrices in the spectral approach (and also what threshold). I suspect that they are using binarised matrices, which probably explains why the spectral results fit better with the graph topology results when the latter uses binarised matrices.

      - Parcellation.<br /> One main issue is the way in which the connectomes are divided up into 200 regions each, independent of the brain size. This, to me, seems a confound. I know it's rather standard practice in the field, but I have yet to see a validation that this does not influence the results. Given the enormity of the dataset here, I would ask the authors to run their analyses in a way that the number of regions is a function of the size of the brain. This is a much more realistic assumption, as we know that a shrew-sized brain has about 20 cortical areas, whereas the human has about 180 according to Glasser et al.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current frameworks, but more work is needed to make these claims convincing.

      1) Obtain stable estimate from multiple types of data:

      It turns out the stable estimates just repeatedly apply the same approach to different time series (Figure 3A middle). From the wording, I think there should be some method to combine these time series into a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. It would be suboptimal if, for example, there were big differences between the results from the death time series and the reported cases time series: which one should I trust?

      We think it is a strength to compute different Rt values based on different data, as this allows researchers, policy makers and the public alike to compare the information from different observation types directly. Any discrepancy between two Re trajectories (e.g. between the Re based on cases, Rcc(t), and that based on hospitalisations Rh(t)) is an indication to investigate which external variables (e.g. testing strategy) have changed. We have found it a great advantage when communicating and sharing our results outside of academia that we could point to these separately obtained Re estimates: if the estimates all agreed, more confidence could be given to them.

      If one wanted to obtain a single estimate, this would require adopting a fundamentally different framework to estimate Re, which exceeds the scope of this work. One could use heuristics (weights representing the trustworthiness of a given source at a given time) to combine the various Re estimates into a single ensemble estimate. Alternatively, one could model the full underlying population dynamics (e.g. with a compartmental model including hospitalization and death) and adopt a fully Bayesian approach to fitting such a model. However, both options require heuristics or priors that will vary substantially through time and per country (as discussed in the Supplementary Discussion), and thus limit how widely the pipeline can be applied.

      We have revised the manuscript to make it more clear (early on) that we estimate multiple Re values from separate types of data (see also the response to reviewer 3, item #5). In addition, we now discuss more explicitly what the advantages and disadvantages are of showing these estimates separately (lines 281-290).

      2) Adequate representation of uncertainty:

      This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would be expected that the proposed approach would provide a wider CI and hence that the proportion covered would be higher. However, I think that to validate that the wider CI is the correct one, simulation studies are required. I think the most related one would be Figure 1B. The results suggest that the approach works when the Rt is not rapidly changing. However, I have concerns about the methods for simulation (details below).

      Indeed, the difference in coverage between our method and EpiEstim is due to observation noise. We agree the CI from EpiEstim should be correct assuming that the infection incidence time series can be observed perfectly. However, in reality quite a bit of variability is introduced between infection and case observation: not only due to the delay from infection to observation, but also due to e.g. reduced testing capacity on weekends or reporting errors. To accurately assess the coverage of our method (and whether the CIs are too narrow or too wide) we need to include realistic amounts of observation noise in the simulations. This is why we add autocorrelated noise to our simulated observations, where this noise mimics observed residuals in Switzerland and other countries (Figs. S3, S4, S15, S17).

      We have now added explicit comparison to the EpiEstim confidence intervals to supplementary Fig. S4. In addition, we extended the corresponding method section to describe more extensively why and how we added observation noise to our simulations (lines 498-518; see also the detailed response to comment 4 below).

      3) Real-time of the Rt

      There is no simulation about the real-time property of the Rt. The most related one is still Figure 1B. However, looking at the right tail of the figure (the real-time performance), the proportion covering the true value is decreasing, and more work is needed to show that the framework can be accurate in real time. For example, how is the real-time performance when Rt is increasing, or when Rt decreases sharply due to lockdown?

      As suggested, we included an additional simulation study to investigate the accuracy and stability of the last possible Re estimate. We present this analysis in a new results paragraph (subsection "Stability of Re estimates in an outbreak monitoring context"; line 121) and Figure S10. Using this analysis, we highlight the trade-off that exists between the timeliness of the Re estimates and their stability.

      4) simulation methods to estimate Rt

      Both 2) and 3) need simulation to support the results, and hence the simulation approach is critical. The first part, based on a Poisson distribution to generate an infection time series, is OK. However, the issue is the second part, about how the authors obtained the time series for death/hospitalization/reported cases. To me, after generating the infection time series, we could obtain those time series based on the delay distribution from infection to death/hospitalization/reporting. It is not clear to me whether the authors' approach of using smoothing and fitting ARIMA to get those time series is correct.

      We believe there may have been some confusion about how our simulation set-up works, and we provided insufficient detail on the design decisions behind this set-up. We have added more explanation for both points to the paper (lines 503-518; additional supplementary Figs. S15-S17). In brief, our simulation process consists of three parts. We first conduct the two steps the reviewer also mentioned: (i) simulating the infection time series, and (ii) simulating the observed time series by using the delay distribution from infection to death/hospitalisation/case report.

      However, we find that the observations simulated this way are too smooth compared to real data (see Figure S17). Possible reasons for this are that the delay distribution does not account for weekend and holiday effects, the random and occasional delay in recording confirmed cases, nor irregular components such as confirmed cases that are imported from abroad. We therefore added a noise term in our simulations, resulting in a third step: (iii) adding noise generated from an ARIMA model.

      To obtain a realistic ARIMA model for this third step, we fitted a model based on the confirmed case data for SARS-CoV-2 in Switzerland. Specifically, we first obtained the additive residuals based on the log-transformed confirmed cases. We then fitted ARIMA models of various orders and assessed the resulting ACF and PACF plots of their residuals. Based on this, we chose an ARIMA(2,0,1)(0,1,1) model. We refer to Figure S16 to support this: The first row shows the ACF and PACF plots of the original residuals, showing strong autocorrelation. The second row shows the ACF and PACF plots of the residuals after fitting the ARIMA model. We see that there is little autocorrelation left, indicating that this model is reasonable.

      In Figure S17, we present simulated observations based on all three steps, and one can see that they look more realistic than the simulated observations after step (ii).

      We would also like to point out that the ARIMA model is only used to obtain simulated observations. Our main method to estimate Re and obtain the related confidence intervals does not require fitting an ARIMA model.
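
      The three simulation steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in the paper: the renewal-equation form of step (i), the generation-interval and delay weights, the Re trajectory, and the seed size are hypothetical, and step (iii) uses a plain AR(1) process on the log scale as a stand-in for the fitted ARIMA(2,0,1)(0,1,1) model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 60

# Hypothetical inputs (not taken from the manuscript).
re_series = np.where(np.arange(n_days) < 30, 1.3, 0.8)        # piecewise Re(t)
gen_interval = np.array([0.1, 0.3, 0.3, 0.2, 0.1])            # weights for lags 1..5
delay = np.array([0.0, 0.05, 0.15, 0.25, 0.25, 0.15, 0.1, 0.05])  # infection -> report

# (i) Infection time series: Poisson draws around Re(t) times past infectiousness.
infections = np.zeros(n_days)
infections[0] = 100
for t in range(1, n_days):
    lags = np.arange(1, min(t, len(gen_interval)) + 1)
    pressure = np.sum(gen_interval[lags - 1] * infections[t - lags])
    infections[t] = rng.poisson(re_series[t] * pressure)

# (ii) Observations: convolve infections with the delay distribution.
observations = np.convolve(infections, delay)[:n_days]

# (iii) Autocorrelated noise, additive on the log scale (AR(1) stand-in for ARIMA).
phi, sigma = 0.6, 0.15
eps = np.zeros(n_days)
for t in range(1, n_days):
    eps[t] = phi * eps[t - 1] + rng.normal(0.0, sigma)
noisy_observations = observations * np.exp(eps)
```

      Comparing `observations` with `noisy_observations` gives a rough sense of why the output of step (ii) alone looks smoother than real case counts.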

      Minor comments:

      1) What does near real-time mean? Are the estimates of Rt delayed by a few days, as in other approaches?

      Indeed, the estimates of Rt are delayed by the time it takes from infection until a case is observed. We have replaced the term “near real-time” by “timely” throughout the manuscript, and added this explanation of the delay more explicitly to the text (line 86).

      2) For the results in Table 1, I think if there are some results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach, it would be better to illustrate the value of the proposed approach.

      We have improved and extended the comparison of our method against others in two ways: (i) we added further comparison of the coverage of our method vs. that of EpiEstim to Fig. S4 (see also the response to major comment 2), and (ii) we added comparison against different commonly used pipelines (see minor comment 3 below). Instead of comparing to other approaches, the analysis in Table 1 was meant to illustrate the use of the Re estimates resulting from our method alone.

      3) I think more discussion is needed of the similarities to and differences from current approaches. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.

      We added a section to the results (paragraph starting line 182; Fig. 3), dedicated to comparing our approach with relevant alternatives. We compared some of our empirical results with the estimates published on epiforecasts.io (based on EpiNow2 package from Abbott et al.), as well as official COVID-19 Re estimates for Austria (by AGES) and Germany (by RKI). We find that estimates published by the RKI and AGES health authorities are likely to be overconfident and to suffer from previously-identified biases (notably in Gostic et al., 2020, PLOS Computational Biology). We provide a detailed comparison of the features and approaches of these methods (EpiNow2, AGES, RKI), with the addition of the epidemia R-package (Supp File S2). This comparison highlights the unique features of the method developed: its ability to account for time-varying delay distributions and to combine symptom onset data with case data.

      4) Figure S11 is about accounting for known imports. If local cases are dominant, imported cases would not have a big impact on estimates of Rt. However, the impact of imported cases on estimates of Rt could be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming that imported cases and 'exported' cases cancel out, it is also assumed that the imported cases had similar transmissibility to the local cases, which may not be true if there is border control.

      We thank the reviewer for this interesting comment and reference. We added a brief discussion in the result section of the manuscript to address this limitation (lines 174-177).

      Reviewer #2 (Public Review):

      This manuscript describes an algorithm for estimating the real-time effective reproductive number R_e (t). This algorithm combines several methods in a reasonable way: deconvolution of the time series of reported cases into a time series of infections, a Poisson model for the generation of infections, and a block bootstrap of residuals to assess uncertainty. Each component is not necessarily novel, but the performance of this algorithm has been validated using comprehensive simulation studies. The algorithm was applied to COVID-19 surveillance data in selected countries across continents, revealing a great deal of heterogeneity in the association of R_e (t) with nonpharmaceutical interventions. Overall, the conclusions seem reliable.

      I have several moderate critiques and suggestions:

      1) From a statistical point of view, it seems much more natural to integrate the infection generation process and the delay from infection to reporting, possibly with reporting errors, into the same model, with which you will avoid combining the bootstrap and the credible intervals in a somewhat awkward way. I understand you can take advantage of EpiEstim package, but the likelihood is very simple and easy to program up. Nevertheless, I'm not strongly against the current paradigm.

      We agree that such an integrated approach is useful, and makes the uncertainty interval estimation more coherent. However, in such an integrated approach one cannot use the analytical solution for the likelihood, and methods that choose this approach (like EpiNow2 and epidemia) tend to pay for it in computational complexity. It also makes it harder to include time-varying delay distributions in the model, one aspect that sets our pipeline apart from existing alternatives.

      An additional advantage of our method is that estimates for the infection incidence are not influenced by priors on Re. In case of a bad model fit this allows us to separate more easily which part of the model may be misbehaving; and as such can help as a sanity check.

      Lastly, our framework has the advantage of modularity: pieces of the pipeline can be (and were) continuously refined or replaced with better pieces. This continuous improvement process allowed a flexible response to the pressing circumstances (the COVID-19 pandemic), and allowed us to extend it to entirely new types of proxy data (e.g., wastewater viral loads - https://ehp.niehs.nih.gov/doi/10.1289/EHP10050 ).

      2) Is there a strong reason to believe the residuals are autocorrelated? The block sampling with block size 10 seems arbitrary. The authors fitted an ARIMA model to the residuals for some countries, how good was the fitting? If the block size doesn't matter, then probably the stronger but simpler assumption of independent residuals may not compromise the estimation of R_e (t) much.

      Yes, there is reason to believe the residuals are autocorrelated. New supplementary Figure S15 shows the ACF and PACF of the residuals based on the confirmed cases of Switzerland, China, New Zealand, France and the US, and one can see that for most countries, the obtained residuals are clearly autocorrelated. We added this point to the simulations method section in the paper (lines 503-518). Please also see our response to Reviewer 1, major point 4 above.

      Choosing an optimal block size for the block bootstrap method is generally difficult. To capture weekly patterns, we need a block size of at least 7. We tried different sizes and found that 10 tended to work well in a variety of simulation settings (an example is given in Fig. S19).
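
      For readers unfamiliar with the technique, a block bootstrap of residuals with block size 10 can be sketched as below. This is a generic moving-block variant written for illustration; whether the pipeline uses overlapping blocks, and how it handles the series ends, are assumptions here, and the function name `block_bootstrap` is hypothetical.

```python
import numpy as np

def block_bootstrap(residuals, block_size, rng):
    """Resample a residual series in contiguous blocks to preserve short-range autocorrelation."""
    residuals = np.asarray(residuals)
    n = len(residuals)
    n_blocks = -(-n // block_size)  # ceiling division
    # Overlapping blocks: any start position that leaves room for a full block.
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [residuals[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]  # trim to the original length

rng = np.random.default_rng(0)
residuals = np.arange(35.0)  # toy residuals whose values reveal their original position
resampled = block_bootstrap(residuals, block_size=10, rng=rng)
```

      With a block size of at least 7, each block spans a full week, so weekday patterns in the residuals are carried over into the resampled series.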

      3) I don't see the necessity of using segmented R_e (t) instead of a smooth curve in the simulation studies. The inferential performance, especially the coverage of the CI's, is much less satisfactory when a segment has a steep slope. The authors may consider constructing splines based on the segments or using basis functions directly.

      We started using a segmented Re(t) trajectory to allow for simple parametric generation of different scenarios (e.g. in new Fig. S10), and to specifically study our ability to estimate sudden transitions in Re (discussed wrt. Table 1, Fig. S2). We agree this approach makes our method look worse than necessary, since it is generally difficult to estimate such abrupt changes in Re. However, we thought this would be the more stringent test of our method, as we will perform better on any smoother trajectory.

      4) The authors smoothed the log-transformed observed incidences to come up with the residuals. For Poisson data, the variance-stabilizing transformation is the square root, not the logarithm. In addition, as you already have bootstrap estimates, why not use quantiles directly for the CIs instead of a normal (asymptotic) approximation? When incidence is low, the normal approximation may be much less satisfactory. Also, when using a normal approximation for the CI, it's much safer to calculate the standard deviation and construct the CI on the log scale, i.e., log(θ̂*(t)), and then exponentiate back.

      Our goal in transforming the original case observations is to stabilize the variance of the residuals. Indeed, the square root transformation is generally recommended if the data to be transformed are Poisson distributed. In our case, however, the original case observations are not quite Poisson. Specifically, the infection incidence at time t given the past incidence is modelled with a Poisson process (see Section 4.4), but the case observations are modelled with an additional convolution step of the infection incidence with a delay distribution, and there is additional variation due to e.g. weekday effects. It is thus not clear a priori which transformation works best for our data, and we therefore investigated various possible transformations (including the square root transformation). We found that no transformation was uniformly the best for data of different countries, but that the log-transformation tended to perform best overall. This is why we chose the log-transformation. Please see the new supplementary Figure S14, where we show the residuals after the square root transformation and the log transformation for various countries.

      Regarding the bootstrap confidence intervals, we also investigated different options. Again, it is not clear a priori which bootstrap confidence interval performs best for our data, so we compared common choices like quantile, reversed quantile and normal-based in a simulation study. Specifically, we assessed their coverage and found that the normal-based confidence intervals performed best overall (see Fig. S4).

      For low incidence settings, none of the bootstrap methods perform very well (as bootstrap consistency does not apply). We now mention this consideration in the paper (line 442).

      Finally, regarding the suggestion to compute exp(SD(log(X))): this quantity is generally different from SD(X), which we need for the confidence intervals. We also refer to the coverage in the various supplementary figures (e.g. S2, S4, S5) to support that our approach works well.
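
      The quantities in this exchange can be compared numerically. The sketch below is a toy illustration only: the bootstrap replicates are hypothetical log-normal draws around an assumed point estimate of 1.2, and the two intervals are generic normal-approximation constructions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical bootstrap replicates of an Re estimate (not real data).
theta_hat = 1.2
replicates = theta_hat * np.exp(rng.normal(0.0, 0.1, size=2000))

sd_x = replicates.std(ddof=1)            # SD(X) on the original scale
sd_log = np.log(replicates).std(ddof=1)  # SD(log X)
exp_sd_log = np.exp(sd_log)              # exp(SD(log X)), a different quantity from SD(X)

z = 1.96
# Normal-based interval built on the original scale.
ci_original = (theta_hat - z * sd_x, theta_hat + z * sd_x)
# Normal-based interval built on the log scale and exponentiated back.
ci_log = (theta_hat * np.exp(-z * sd_log), theta_hat * np.exp(z * sd_log))
```

      In this toy setting `sd_x` is roughly 0.12 while `exp_sd_log` is roughly 1.1, so the two cannot be interchanged. The log-scale interval is always positive and asymmetric around the estimate, whereas the original-scale interval is symmetric.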

      5) The stringency index is a convenient metric for intervention intensity. However, it doesn't reflect actual compliance as the authors admitted. Another likely more pertinent metric is human movement (could be multiple movement indices). Human movement indices may not be available in all countries, but they are available in some, e.g., the US, and first wave in China. In some states of the US, it was clear that human movement decreased substantially even before initiation of lockdown. Lack of human movement metrics most likely has contributed to the difficulty in the interpretation of Figure 4.

      We have added mobility data (from Apple and Google location data) to our general dashboard, and to the analysis shown in Fig. 5. The mobility traces give more detailed insight into the behavior that may have led to decreases in Re. However, we find similar patterns wrt. decreases in Re as with the stringency index. A more extensive analysis that focuses on different phases of the pandemic may allow for more detailed insights, but we believe this is beyond the scope of our manuscript.

    2. Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current frameworks, but more work is needed to make these claims convincing.

      1) Obtain stable estimate from multiple types of data:<br /> It turns out the stable estimates just repeatedly apply the same approach to different time series (Figure 3A middle). From the wording, I think there should be some method to combine these time series into a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. It would be suboptimal if, for example, there were big differences between the results from the death time series and the reported cases time series: which one should I trust?

      2) Adequate representation of uncertainty:<br /> This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would be expected that the proposed approach would provide a wider CI and hence that the proportion covered would be higher. However, I think that to validate that the wider CI is the correct one, simulation studies are required. I think the most related one would be Figure 1B. The results suggest that the approach works when the Rt is not rapidly changing. However, I have concerns about the methods for simulation (details below).

      3) Real-time of the Rt<br /> There is no simulation about the real-time property of the Rt. The most related one is still Figure 1B. However, looking at the right tail of the figure (the real-time performance), the proportion covering the true value is decreasing, and more work is needed to show that the framework can be accurate in real time. For example, how is the real-time performance when Rt is increasing, or when Rt decreases sharply due to lockdown?

      4) simulation methods to estimate Rt<br /> Both 2) and 3) need simulation to support the results, and hence the simulation approach is critical. The first part, based on a Poisson distribution to generate an infection time series, is OK. However, the issue is the second part, about how the authors obtained the time series for death/hospitalization/reported cases. To me, after generating the infection time series, we could obtain those time series based on the delay distribution from infection to death/hospitalization/reporting. It is not clear to me whether the authors' approach of using smoothing and fitting ARIMA to get those time series is correct.

      Minor comments:<br /> 1) What does near real-time mean? Are the estimates of Rt delayed by a few days, as in other approaches?<br /> 2) For the results in Table 1, I think if there are some results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach, it would be better to illustrate the value of the proposed approach.<br /> 3) I think more discussion is needed of the similarities to and differences from current approaches. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.<br /> 4) Figure S11 is about accounting for known imports. If local cases are dominant, imported cases would not have a big impact on estimates of Rt. However, the impact of imported cases on estimates of Rt could be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming that imported cases and 'exported' cases cancel out, it is also assumed that the imported cases had similar transmissibility to the local cases, which may not be true if there is border control.

    1. Author Response

      Reviewer #1 (Public Review):

      Xiong and colleagues use an elegant combination of theory development, simulations, and empirical population genomics to interrogate a largely unexplored phenomenon in speciation/ hybridization genomics: the consequences and implications of admixture between species with differing substitution rates. The work presented in this well-written manuscript is thorough, thought provoking, and represents an important advancement for the field. However, there are a few instances where I feel the strength of the conclusions drawn is not fully supported.

      Thank you for the positive comments!

      The authors begin by presenting evidence based on whole genome sequencing that the two focal species, P. syfanius and P. maackii, are highly diverged despite ongoing hybridization. However, the discussion of the remarkable mitochondrial sequence similarity is underdeveloped. I do not understand how such a pattern is not most likely the result of introgression from one species to the other, given the relatively high FST across much of the nuclear genome coupled with the generally higher mitochondrial mutation rate in animals.

      That’s a very good point. We have included this likely explanation of mitochondrial genome similarity in lines 84-86.

      Next, they posit that barrier loci are likely to exist. To support this assertion, the authors use a combination of parental population genetic diversity and divergence comparisons and ancestry pattern analysis in hybrid populations. They show that there is a strong correlation between divergence across pure species and within-species diversity across the autosomes. Then, using four hybrid individuals, they show that low ancestry randomness, as quantified by estimates of between-group and within-group entropy, is associated with genomic regions of reduced within-group diversity and elevated between-group divergence. The use of entropy estimates as a stand-in for admixture proportions and ancestry block analysis when sample size is severely limited is particularly clever. I must admit that I do not fully understand the derivations of the two entropy measures, but it seems to me that relatedness might have a strong effect on the interpretability of between-individual entropy estimates (Sb). With very small population sizes this may be a real issue.

      Yes, genetic relatedness will play a big role in between-individual entropy (Sb). A group of highly correlated individuals will produce highly predictable ancestry (knowing one individual’s local ancestry gives much information on the local ancestries of others), and Sb will be small because entropy is a measure of uncertainty. If inbreeding is very severe, Sb will no longer be a useful measure because it will be too small across the entire genome. In our hybrid samples, although some genomic regions imply the possibility of inbreeding (see local ancestry of Z chromosomes in Figure 3–Figure supplement 1), there is still considerable variation of Sb across the genome which allows us to test for its correlation with DXY and π.

      A brief discussion of potential caveats in using the new method developed here seems warranted given its potential usefulness to the population genomics field more broadly. One plausible but less likely alternative interpretation of these patterns is briefly discussed.

      We have now devoted the first subsection of the Discussion to the caveats and various motivations for the entropy metrics. The appendix also contains further explanation of our intuition (section “Appendix-The entropy of ancestry”).

      The authors then move on to evidence for divergent substitution rates. Analysis of both D3 and D4 statistics using several different outgroups and a series of progressively stringent FST thresholds shows that site patterns between the two species are highly asymmetrical, with the P. maackii lineage harboring more substitutions than P. syfanius. The authors offer two possible explanations for this finding and then test both hypotheses. First, they use a comparative tree-based method to show that there is little phylogenetic evidence for lineage-biased hybridization from outgroups into either of the focal lineages. Further, the range overlaps of the study species do not correspond with the inferred direction of allele sharing from the Dstat analysis. This is a good argument against contemporary gene flow between the outgroups and P. syfanius, but I am not convinced that ancient gene flow, which could have occurred when, say, species distributions were different, can be ruled out using this analysis.

      Yes, we also felt that our original wording was overly strong. Now we say that our argument is based on current geographic distributions, but that archaic gene flow cannot be totally ruled out. However, we also point out that archaic gene flow with outgroups should still leave some detectable fractions of paraphyletic local gene trees after phylogenetic reconstruction (lines 192-194).

      To test whether this asymmetry can be explained by a difference in substitution rate between the two species, the authors show that observed D3 increases and D4 decreases with increasingly divergent outgroups, as predicted by theory developed here. The authors take this as evidence supporting divergent substitution rates, though they claim only that the existence of such rate divergence is likely. The unfortunately limited sample sizes seem to preclude attaining more certainty than this. Interestingly, as a byproduct of using D4 as an extended measure of site pattern asymmetry, the authors highlight one way in which the ABBA-BABA test can give false positives for introgression. This is an important contribution to the field.

      We agree with the reviewer that, for our data type (a handful of unphased genomes), it will be difficult to obtain more direct evidence for substitution rate differences. In lines 182-187, we show using maximum-likelihood gene tree reconstruction that P. maackii samples often inherit more derived mutations than P. syfanius. This could be viewed as a separate test utilizing more accurate substitution models in phylogenetic software, while our theoretical calculation provides a coarse but testable signature of D3 and D4.

      To provide more direct evidence, we believe one ought to measure spontaneous mutation rates in both species under their native habitats, and obtain better knowledge of generation times and population sizes. The limitation of sampling and rearing these rare species are major barriers for incorporating this kind of evidence into this study.
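
      The site-pattern asymmetry discussed in this exchange can be made concrete with a short sketch of the four-taxon ABBA-BABA statistic in its commonly used form, D = (nABBA - nBABA) / (nABBA + nBABA). The function name and the counts below are hypothetical, and the sign convention and exact definitions of D3 and D4 in the manuscript may differ from this sketch.

```python
def abba_baba_d(n_abba, n_baba):
    """Normalized excess of ABBA over BABA site patterns.

    Illustrative sketch of the standard ABBA-BABA statistic; the counts
    passed in are assumed to come from some upstream site-pattern tally.
    """
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (n_abba - n_baba) / total


# Hypothetical counts: symmetric patterns give D = 0, whereas an excess of
# one pattern gives a nonzero D regardless of what caused the excess.
print(abba_baba_d(50, 50))
print(abba_baba_d(60, 40))
```

      Because the statistic only measures asymmetry, a nonzero value does not by itself identify its cause. Introgression and unequal substitution rates between lineages can both skew the counts, which is the false-positive scenario the reviewer highlights.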

      Finally, the authors observe a monotonic relationship between substitution rate ratio and relative genetic divergence across the genome, which is in line with their theoretical predictions for differential substitution rates in the face of gene flow. From this they infer an 80% increase in substitution rate from P. syfanius to P. maackii. It is remarkable to be able to extract these substitution rates from genomic regions with the least gene flow. However, the veracity of these estimates relies on the assumptions I have highlighted above and should be presented with appropriate caution.

      We have included the limitations of our conclusions in the final subsection of the Discussion. Because high FST regions are relatively rare, estimates of observed rate ratio “r” have larger errors in those regions. This problem is partially resolved by using the entire monotonic relationship between r and FST to estimate the true rate ratio, so we rely not only on regions with the least gene flow but the full dataset.

      However, we do agree with the reviewer that ours is still a coarse theoretical framework since we do not impose a realistic substitution model (e.g., we don’t allow reverse mutations). We have now emphasized this weakness in the Discussion (Line 348-350).

      Reviewer #2 (Public Review):

      In their manuscript ("Admixture of evolutionary rates across a hybrid zone"), Xiong et al. use whole genome resequencing data to assess rates of genome evolution between two species of butterflies and determine whether putative barrier loci between the species are also those that evolve at asymmetric rates between them. This work presents a novel hypothesis and rigorously tests these ideas using a combination of empirical and theoretical work. I think the authors could more formally link loci that are evolving at highly asymmetric rates with those that are most likely to be barrier loci by evaluating the relationship between ancestry entropy and ratios of substitution rates between species. Additionally, clarifying the relationship between barrier loci and asymmetric evolution would be beneficial (i.e. are loci that we typically envision to be barrier loci, such as loci involved in reproductive isolation, evolving at asymmetric rates or do asymmetrically evolving loci represent a new type of barrier loci?).

      Many thanks for these comments! For the second point (clarifying the relationship between barrier loci and asymmetric evolution), we specifically mean that barrier loci (which specifically are of interest to those who study speciation) cause asymmetric rates of evolution to be preserved between hybridizing species. Asymmetric rates themselves are caused by other factors (spontaneous mutation rate differences, generation times, environmental effects) specific to each species, and barrier loci merely prevent the mixing of asymmetric rates. For the first point (evaluating the relationship between entropy and ratios of substitution rates).

    1. Gyuri Lajos 2 minutes ago https://youtu.be/5IfgBX1EW00?t=887 Listen to Frank Herbert for 3 minutes. What he says there is in perfect harmony with what you say. Thank you for saying it. Top Quotes from the Frank Herbert Interview: "remember that there's nothing at all wrong with saying that the Protestant ethic is full of it that it's all right to enjoy your work you don't have to fight your way out of bed every morning you can get up every morning eager to go do whatever it is you do have a love affair with your with your world and remember that you're not going to be able to predict every consequence of what you do" Fiduciary roots of science: "question things I have the most fun that I'm writing questioning things that people do not question the assumptions that everybody knows are true I'm going to declare a heresy for you all science if you go back into its ruts saying why do I believe this well I believe this because of these tests and this this proof well why do I believe this why did I set up this test why did I believe that proof all science goes back to something that we believe because we believe it we believe it because we believe it and we have no proof for it it's like a religion so" And the message: Being comfortable with the unknown, as a finite human being: "when you dig into the roots of science a gray area at the bottom but it's like a balloon and the surfaces word the computer science has given us I love this language the surface of the balloon is their face with what we do not know inside the balloon as we blow into it is what we have proved okay but as we increase what we think we know we increase our exposure to what we do not know this is one of the inevitable laws of our universe" As we increase what we think we know we increase our exposure to what we do not know; this is one of the inevitable laws of our universe. No dead end, on and on and on: "but isn't it more interesting to live in a universe where there are unknowns to discover new lands to explore than to live in an absolute box where when you find the edge that's it baby no place to go from there I like the fact that we cannot predict everything I like the fact that we live in a universe where anything may happen because the alternative to me is a constricting dead end" No End is the Ending, never Ending! Thank you Quinn. You've got it. Creating a space where I can share the same learnings. Anybody who got as far as Chapter House, maybe on the second time of reading it all, will be sure to get THIS. I believe that
       Gyuri Lajos 42 minutes ago Thank you for articulating what I felt back then when I read it when it came out. I have since learned that the message is "being comfortable with the unknown", nay, delight in it with pious awe towards the dignity of being reflected in human being

      never ending is the ending

      being comfortable with the unknown

      Frank Herbert Dune

    1. Author Response

      Reviewer #1 (Public Review):

      Redman and colleagues employed microprisms and two-photon optical imaging to track separately the structure of dorsal CA1 pyramidal neurons or the activity patterns of dorsal Dentate Gyrus, CA3, CA2 and CA1 pyramidal neurons, longitudinally in live mice. First, they carried out a characterization of the optical properties of their system. Second, they performed an example tracking of dendritic spines in the apical aspect of dorsal CA1 pyramidal neurons. Finally, they characterized differences in spatial coding along the tri-synaptic pathway, in the same animals. The main focus of the manuscript is technological and the authors show interesting data to support their technique, which I believe will be of relevance to neuroscientists interested in the hippocampal formation.

      Strengths.

      While using microprisms to achieve a "side" view of neurons in specific brain areas is not new per se [see Chia et al., J. Neurophysiol. (2009), Andermann et al., Neuron (2013), Low et al., PNAS (2014) etc.], the authors were able to visualize activity of a large neuronal circuit such as the hippocampal trisynaptic pathway - for the first time - in the same animal exploring an environment. This is not only a technical feat but it also opens new scientific avenues to study how information is transformed at different stages within the hippocampus; as such, I think this will be of broad interest for people in the field. In addition, the authors demonstrated imaging of dendritic spines in the apical aspect of pyramidal neurons, but limited to dorsal CA1 due to the labelling density of the transgenic mouse line they decided to use. Despite the fact that imaging apical dendritic spines in dorsal CA1 has been shown earlier [see Schmid et al., Neuron (2016) and Ulivi et al., JoVE (2019)], the use of the micro periscope greatly increases the flexibility of these sorts of experiments by enabling tracking of large portions (both apically and basally) of the dendritic arbors of dorsal CA1 pyramidal neurons.

      Thank you for the positive comments. We have clarified that apical CA1 dendrites have been imaged in previous work as you point out, just not along the somatodendritic axis (lines 127-130). We have also clarified that we were able to image CA2 and CA3 spines as well (only DG exhibited the increased labeling density in Thy1-GFP-M mice; lines 130-132).

      Weaknesses.

      While the data are sufficient to demonstrate the technique, the conceptual advance of the paper is very narrow. The findings on spatial coding differences in different hippocampal subregions - namely a nonuniform distribution of spatial information in the different hippocampal subregions - do not add new knowledge but largely confirm the literature. The results on the dynamics of apical dendritic spines of pyramidal neurons in dorsal CA1 seem to confirm previous work, but the interpretation of these results differs fundamentally. In fact both papers cited by the authors (Attardo et al., and Pfeiffer et al.,) come to the conclusion that dendritic spines on basal dendrites of CA1 pyramidal neurons are highly unstable, at least by comparison to other neocortical areas. The authors seem to ignore this discrepancy. However, this discrepancy has importance also to the characterization of the technique the authors developed. In fact, the optical resolution of the system strongly affects the ability to resolve neighboring spines - especially at the high density of dorsal CA1 - and thus it has a direct effect on the measures of synaptic stability [Attardo et al., Nature, (2015)]. The authors duly report lateral and axial resolutions for their micro periscopes and both are lower than the ones of Attardo and Pfeiffer, thus the authors should consider the effects of this difference on the interpretation of their data.

      We agree that the advance described in this manuscript is more methodological than conceptual. We do have other studies in progress that will be of greater conceptual interest. However, we believe the technique is of sufficient interest to the field that it is worth publishing the methodological approach and characterization as soon as possible.

      We have also addressed the comparison with Attardo et al. and Pfeiffer et al. mentioned by the reviewer. We actually agree with the previous work that dendritic spines in CA1 show a high degree of instability compared to cortex, finding ~15% spine addition and ~13% spine subtraction between consecutive days (Fig. 3H, I), similar to single-day turnover rates observed in Attardo et al. and other papers. Despite the high turnover rate, the fraction of experimentally observed spines that persist across 8-10 days plateaus around 75-80%, indicating that there is a substantial fraction of apical spines that remain stable in the face of ongoing daily turnover. This was also observed in basal dendrites by Attardo et al. (with similar survival fractions) and Pfeiffer et al. (albeit with lower survival fractions), so we would not necessarily characterize this as a discrepancy. We have clarified these points in the manuscript (lines 157, 162-168, 331-332).
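
      The distinction drawn here between high daily turnover and a high long-term survival fraction can be sketched in a few lines. The spine identifiers, the normalization by the earlier day's spine count, and the function names are assumptions for illustration; the manuscript's own definitions may normalize differently.

```python
def daily_turnover(prev_day, next_day):
    """Return (addition, subtraction) fractions between two sessions.

    prev_day and next_day are sets of tracked spine IDs. Normalizing by
    the spine count of the earlier session is an assumption of this sketch.
    """
    added = next_day - prev_day
    lost = prev_day - next_day
    return len(added) / len(prev_day), len(lost) / len(prev_day)


def survival_fraction(first_day, later_day):
    """Fraction of the spines seen on the first day that are still present."""
    return len(first_day & later_day) / len(first_day)


# Hypothetical data: 20 spines on day 0; by day 1 two are lost and three appear.
day0 = set(range(20))
day1 = (day0 - {0, 1}) | {20, 21, 22}

print(daily_turnover(day0, day1))
print(survival_fraction(day0, day1))
```

      If the same small subset of spines accounts for most of the daily additions and subtractions, the turnover fractions can stay high from day to day while the survival fraction of the original population plateaus, which is the pattern described in the response.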

      The reviewer pointed out that some previous studies used super-resolution microscopy to detect smaller structures and reduce optical merging. This would be an excellent extension of our work, as in principle super-resolution microscopy could be used with the implanted microperiscopes. Although the survival fractions we observed were similar to Attardo et al., they were higher than Pfeiffer et al., possibly due to the predicted effects of optical merging. We have updated the text to note that our results may inflate the degree of stability due to resolution limitations (lines 165-68, 335-340).

      Reviewer #2 (Public Review):

      Strengths

      The Hippocampus is a key brain region for episodic and spatial memory. The major Hippocampal subregions: Dentate Gyrus (DG), CA3, and CA1 have predominantly been investigated independently due to technical limitations that only allow one subregion to be recorded from at a time. In this paper the authors developed a new method that allows DG, CA3, and CA1 to be imaged simultaneously in the same mouse during behavior with a 2-photon microscope. This method will allow investigation of the interactions between Hippocampal subregions during memory processes - a critical yet unexplored area of Hippocampal research. This method therefore provides a new tool that will help provide insight into the complex functions of the Hippocampus during behavior.

      This method also provides high resolution optical access to deep dendritic structures that have been out of reach with existing methods. The authors demonstrate they can measure the structure of single spines on distal apical dendrites of CA1 cells. They track populations of spines and quantify spine changes, spines loss, and spine appearance. Spine turnover is thought to be a key process in how the Hippocampus encodes and consolidates memories, and this method provides a means to quantify spine dynamics over very long time periods (months) and can be used to study spine dynamics in CA3 and DG.

      We appreciate the comments.

      Weaknesses

      This method requires the implantation of a relatively large glass microperiscope that cuts through part of the Septal end of the Hippocampus. This is a necessary step to image transversally and observe all the major subregions simultaneously. This is an unfortunate limitation as it damages the very circuits being investigated. The authors attempt to address this by measuring the functional properties of Hippocampal cells, such as their place field features, and claim they are similar to those measured with other methods that do not damage the Hippocampus. However, it is very likely the implant-induced damage is affecting the imaged cells in some way, so caution should be taken when using this method. The authors are very aware of this and briefly discuss the issue. In addition, the authors observe damage adjacent to the face of the glass microperiscope that extends to ~300 um from the face. This area should therefore be avoided when imaging the Hippocampus through the microperiscope.

      We agree. This will be important for the interpretation of experiments using the microperiscope approach. For many experiments, electrophysiology or traditional CA1 imaging approaches might be preferable to avoid damage to the hippocampal structure. We have tried to be straightforward about these caveats in our discussion. However, we believe the capability of imaging the transverse hippocampal circuit will allow a number of experiments that are currently intractable, and that the benefits will outweigh the caveats in these cases.

      Reviewer #3 (Public Review):

      Redman et al. describe a novel approach for long-term cellular and sub-cellular resolution functional and structural imaging of the transverse hippocampal circuit in mice. The authors discuss their procedure for implanting a glass microperiscope and show data that clearly support their ability to simultaneously record from neurons within the DG, CA3, and CA1 subregions of the hippocampus. They offer optical characterization demonstrating sufficient resolution to image at the cellular and subcellular level, which is further supported by experimental data characterizing changes in morphology of CA1 apical dendritic spines. Finally, neurons are recorded from as mice engage in navigation behavior, allowing authors to characterize spatial properties of hippocampal cells and relate findings to prior work in the field.

      The ability to image from multiple hippocampal subregions simultaneously is a great technical achievement, sure to advance study of the hippocampal circuit. In particular, this approach will likely have tremendous application for addressing the question of how neural representations dynamically change across the hippocampal subfields during initial encoding of novel contexts or later during retrieval of familiar ones. While the feasibility and utility of this preparation are supported by the data, further characterization of recorded cells will aid the comparison of data collected using this imaging approach to data previously collected with other methodologies.

      Thank you for the comments, we have addressed the specific concerns below.

      1) Further measures could be taken to more thoroughly evaluate the impact of the implant on cell health. While authors evaluate glial markers, it is not obvious how long after implant these measurements were taken. Additionally, authors could characterize cell responses of neurons recorded proximal to and more distal to their implant to further evaluate implant effect on cell health.

      Good points. We have added the date post implantation for the histology samples (Figure 1F caption). To address the second point, we added additional experiments characterizing functional response properties as a function of depth (Figure S7). We did not find systematic changes in place field width or place cell spatial information as a function of imaging depth (lines 220-224; Figure S7A, B). We did, however, find a significant relationship between the decay constant for the fitted transients and depth, with cells close (<130 um) to the surface of the microperiscope face exhibiting slower decay (Figure S7C). This appeared to be due to a small fraction of cells exhibiting longer decay times closer to the microperiscope face. As a result, we advise only imaging neurons >150 um from the microperiscope face (lines 224-226).

      2) More in-depth analysis of place cells will aid the comparison of data collected using this novel approach to previously published data. For instance, trial-by-trial data and clearer descriptions of inclusion criteria will allow readers a more detailed understanding of observed place cells.

      We have included example place cells with individual trial data (Figure 5C) and have added additional discussion and detail on our selection process for identifying place cells (lines 207-209, 663-666, 674-676). In the revised manuscript, we further increased the stringency of our place cell criteria so that none of the cells with time-shuffled responses pass the criteria. It should be noted that our place cells were not as reliable as those recorded in the presence of reward (Go et al, 2021). We chose to forgo reward to help ensure that the neurons were responding to spatial location and not to other task variables, but this likely reduced response reliability (see Krishnan et al, bioRxiv; Pettit et al, 2022). We have added discussion of this issue to the manuscript (lines 307-318).

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the work was to test for direct and indirect fitness costs associated with specific types of constructs that could be used for gene drive. The authors conclude that there are no direct fitness costs associated with the presence and expression of either Cas9 or the guide RNAs but that the Cas9 is causing off-target cuts that result in loss of fitness. They also conclude that a newer form of Cas9 doesn't cause these off-target cuts. While the goal of this study is important, there are many caveats associated with the work as reported, and these limit interpretation of the results. Many of the caveats are pointed out in the discussion.

      1.a) I am specifically concerned by the fact that, from what I read, a company made the transgenic lines and that there was only one transgenic line per treatment. Unless the fly line used for the insertion was completely homozygous for the chromosome where the insertion was made, the lines could have differed in fitness, due to somewhat deleterious recessives captured in one G1 but not another. This cost could have persisted for a number of generations after the crosses were made, especially in the high frequency "releases". This may not have been a real problem, but without any replication it is difficult to know.

      We apologize that this was unclear in our initial submission. We did in fact generate several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where four lines were used in seven population cages (replicates 1 to 4 were founded with the same line). All of these were also crossed to w1118 flies before we obtained homozygous lines, so the impact of deleterious alleles would have been minimized. We have edited the section “Generation of transgenic lines” in the Methods to clarify this.

      We also examined the possibility of fitness effects being caused by such alleles in our maximum likelihood analysis (assuming they are unlinked from the construct — otherwise they should have appeared as direct fitness effects). This model was not a good match for the data, nor was the model with direct fitness effects. Based on these results, we consider it unlikely that such deleterious alleles had a major impact on the observed frequency trajectories in our cage populations.

      1.b) My concern is reinforced by the fact that the no-Cas9, no-gRNA line goes up in frequency for the first 5 generations and then becomes stable in frequency. The loss of the fitness advantage is consistent with a fitness effect partially linked to the insertion site in that one cross but not others.

      Both of these cages were made with independent lines. We agree with the reviewer that the increase in frequency of the no-Cas9_no-gRNAs construct at the beginning of the experiment seems surprising at first. However, if an initial fitness advantage was truly driving the dynamics of this construct, we would expect that the “initial off-target model” (where fitness costs originated before the experiment) should have yielded the highest model quality in our maximum likelihood analysis, since we also allowed advantageous cut off-target alleles (i.e., fitness estimates > 1) in this model. While the maximum likelihood fitness estimate in the “initial off-target model” indeed exceeded the reference value of 1, its 95% confidence interval still included a fitness value of 1, and a neutral model actually yielded the lowest AICc value (i.e., best model quality, Table 3). We think that one possible explanation for this apparent initial frequency increase is that population cages tend to undergo larger than average fluctuations in the first one or two generations due to the smaller initial population size and potential health differences between founding fly lines (which can persist for a generation or two). We briefly note this in the manuscript methods section.
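
      The model comparison invoked here rests on the small-sample corrected Akaike information criterion, AICc = 2k - 2 ln L + 2k(k + 1) / (n - k - 1), where lower values indicate better model quality. The sketch below uses hypothetical log-likelihoods, parameter counts, and sample size; none of these numbers come from the manuscript.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected Akaike information criterion (lower is better)."""
    k = n_params
    if n_obs - k - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n_obs - k - 1)


# Hypothetical comparison: a model with one extra fitness parameter fits
# slightly better, but not by enough to offset the penalty for the extra
# parameter, so the simpler model gets the lower AICc.
simpler_model = aicc(log_likelihood=-100.0, n_params=1, n_obs=30)
richer_model = aicc(log_likelihood=-99.5, n_params=2, n_obs=30)

print(simpler_model, richer_model)
```

      This mirrors the reasoning in the response: a fitness estimate above 1 whose confidence interval includes 1 improves the likelihood only marginally, so the added parameter is not justified and the neutral model is preferred.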

      1.c) It is important to note that the starting points are cages with separate vials of the control and experimental strain. Even a small difference in development time of the two strains in the first generation could lead to an excess of homozygotes in the next generation.

      We agree. In our maximum likelihood framework, such differences in development time should show up as a viability difference (fraction of offspring that made it to adulthood in the time window of our experiment). We now note in our revised manuscript that fitness differences between genotypes could be due to longer development time rather than an increase in the juvenile death rate in Cas9_gRNAs carriers. In the “Phenotypic fitness assays” section of our revised manuscript, we additionally state that “longer development time of individuals carrying the Cas9_gRNAs construct would also have appeared as a viability cost in our cage study but not in these fitness assays.”

      1.d) I am also concerned by the fact that the main conclusion is that the decline in frequency in the Cas9-gRNA line is due to off-target cuts, but there was no sequencing to back up that conclusion. In the discussion, this problem is mentioned but dismissed. I don't see how it can be dismissed when this is a major conclusion that remains based on very indirect evidence.

      We thank the reviewer for raising this important concern, which touches on the issue of how our approach differs from previous approaches that sought to directly detect off-target cleavage through sequencing. Our approach, by contrast, seeks to provide a “direct” measurement of the fitness of an allele. While this allows us to avoid the challenging task of detecting off-target mutations in vivo through whole-genome, population-level sequencing (and then predicting their potential effects), it comes at the price that inferences about the molecular nature of these fitness effects will rely on indirect evidence. However, we want to point out that our conclusion of these fitness effects being primarily due to off-target cleavage is based on three independent lines of evidence: (i) The maximum likelihood analysis of the frequency trajectory of the Cas9_gRNAs construct, where statistical model comparison ranked the off-target effect model higher than the direct fitness costs model; (ii) The fact that we inferred fitness costs only for the Cas9_gRNAs construct but not the construct in which Cas9 was replaced with the high-fidelity Cas9HF1 endonuclease (which should have similar expression and thus, similar direct fitness costs); and (iii) The heterogeneity we observed in the frequency trajectories of the Cas9_gRNAs construct in our cages, which is consistent with a model where off-target sites accumulate over the course of the experiment yet more difficult to reconcile with a model of direct fitness costs.

      Inspired by the reviewer’s recommendation, we wondered whether we may in fact be able to directly detect cuts at a few computationally predicted off-target sites. To this end, we performed Sanger sequencing at six sites that were computationally predicted for our Cas9_gRNAs construct by CRISPR Optimal Target Finder, which unfortunately revealed only wild-type sequences (this analysis is described in the new section “Evaluation of computationally predicted off-target sites”). However, we believe that this does not rule out off-target cutting as the primary driver of fitness costs for the Cas9_gRNAs construct due to the following arguments we state in the discussion section of our revised manuscript:

      “For example, our sequencing approach would not have allowed us to detect larger insertion/deletion events, which are frequently observed at on-target sites (48, 49). More likely though, we suspect that cleavage events occurred at other sites than the six computationally predicted ones. Indeed, the predictions by CRISPR Optimal Target Finder are based on cleavage specificity in cell lines, where off-target cutting is known to occur more frequently than in animals (47). All but one of the predicted off-target sites carry combinations of single nucleotide mismatches in the PAM-proximal and the distal region, which could make in-vivo cleavage less likely at these sites. Generally, our results are consistent with other studies that found off-target cleavage to frequently occur at sites which would have been difficult to predict computationally (50).”

      In a sense, our inability to detect any mutated alleles at this small set of computationally predicted off-target sites might actually highlight a key benefit of our approach: It can estimate the potential fitness costs of a construct without having to rely on accurate computational predictions of putative off-target sites or requiring the very costly approach of whole-genome, population-scale sequencing.

      Additionally, we would like to point out that while we found off-target effects to explain the empirical data best, we would probably consider our estimation of the overall magnitude of the fitness costs of the Cas9_gRNAs construct as one of the main conclusions of our manuscript, together with the fact that these were avoided when using the high-fidelity Cas9HF1 endonuclease instead. Thus, even if some readers may remain skeptical about the role of off-target cleavage (and we made sure to qualify our claims on this in the Discussion section accordingly), our systematic analysis of the overall fitness effects is more robust and should be of broad interest.

      1.e) When releasing homing gene drives, the initial frequency of the transgenic line is very low, and as in the Garrood et al paper cited, it is possible for the gene drive to outpace the non-target cutting. The modeling does not address what the impact of the presumed fitness costs in this experiment would be for a replacement/suppression drive released at low frequency.

      We thank the reviewer for raising this point. It has led us to add a completely new analysis on the “Effect of off-target fitness costs on gene drive performance”, in which we now show simulation results to illustrate the effect of direct and off-target fitness effects on both modification and suppression homing drives. We have also added more discussion on how these different types of fitness costs may affect other frequency-dependent CRISPR based gene drives.

      Reviewer #2 (Public Review):

      This paper reports a set of Drosophila population cage experiments aimed at quantifying fitness effects associated with the expression of Cas9 gene drive constructs in the absence of homing. The study attempts to deconvolve fitness effects due to the presence of the active nuclease at a genomic location from those that arise from off-target effects elsewhere in the genome: an important issue when considering gene drive strategies in the wild. To distinguish effects due to cleavage at the target site from activity elsewhere in the genome, a construct where Cas9 was replaced with a high fidelity nuclease (Cas9HF1) was employed. The experimental design compares the active nuclease-gRNA constructs targeting a site on another chromosome with no gRNA and reporter only controls, all inserted in the same locus. The Cas9 construct was assayed in 7 replicates with Cas9HF1 and controls assessed as duplicates with cages running for between 8 and 19 generations.

      2.a) There is a lack of clarity in terms of the cage set-up design; the description in the supplementary methods could clarify whether all the replicates came from a single founder and explain the differences in set-up that necessitated ignoring some first generations.

      Thank you for pointing this out. We have thoroughly revised and extended our Methods section on “Generation of transgenic lines” to clarify this point. We now explicitly mention that we generated several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where we used four lines in seven population cages (replicates 1 to 4 were founded with the same line).

      For the cage start conditions, we now note that “To avoid potentially confounding maternal fitness effects on the construct frequency dynamics (which could arise based on minor differences in health or age between the initial batches of flies mixed together), we excluded the first generation of five cage populations…” In general, it is quite common for this to happen in insect population cage studies (please see some examples below) and is always a very short-term effect.

      2.b) The main finding reported from this part of the work is that with the control populations the frequency of the construct remained fairly constant across the generations, but the active nuclease tended to decline. I am somewhat confused by some of the claims here. First, the authors report a "bottoming out" effect where construct frequency declines then levels off: I am not entirely convinced that Figure 2 shows this. For example, comparing replicates 4 and 5 (8 and 16 generations respectively), it looks to me that there is a steady decline at the same rate with no evidence for a plateau. Perhaps replicates 2 and 3 show "some" evidence of leveling. In addition, replicates 4, 5, 6 and 7 have similar construct starting frequencies (particularly 5 and 7, which are only a few % different) yet the former show a steady decline whereas the latter maintain the construct at a steady level. This does not appear to be consistent with the authors' explanation of higher off-target effects in populations carrying high frequencies of the construct. It would be helpful if the authors could more clearly explain the trajectories presented in Figure 2.

      We agree with the reviewer that our initial description of the raw construct frequency dynamics, based solely on visual clues, made overly strong claims (e.g., “different frequency dynamics between single replicates”) without providing more quantitative statistical support. This was originally intended as some basic introduction, with our maximum likelihood analysis then providing a more rigorous assessment in the next section. To improve clarity, we have completely restructured this in our revised manuscript. We removed the comparison of Cas9_gRNAs replicates solely based on visual clues, highlighted the general heterogeneity in trajectories among replicates (without making any specific claims), and instead of the vaguely defined “bottoming out” interpretation, we now only mention the average construct frequency change for the Cas9_gRNAs construct. In addition, we now present our more rigorous maximum likelihood analysis of the construct frequency trajectories and statistical model comparison earlier on in the Results section, so that all of our conclusions are now based on this statistical analysis, rather than an initial visual inspection of the curves. Please see also our comments to point 3.a) below, as reviewer 3 made very similar comments and suggestions.

      2.c) Utilising the allele frequencies obtained from the cages, 2 locus ML models were applied with the construct insertion site and an idealised off target site. They argue, correctly in my view, that fitness effects can be attributed to off target activity and not cleavage at the 3L target since the Cas9HF1 construct shows no substantive effect. In the models they assume that the presence of Cas9 in the germline (or maternally contributed) will invariably lead to cleavage at the idealised site. The model indicates that the construct insertion per se has no direct fitness costs but that off-target effects may have fitness consequences of approximately 30%, and seek to support this conclusion with simulations. I found this section difficult to follow but I feel that the conclusions are supported.

      We agree with the reviewer that the “Maximum likelihood analysis” section was too dense and therefore challenging to follow, especially for non-expert readers who may not be very familiar with such methods. We have revised and extended this section. In particular, we now also provide a brief summary of the modeling approach at the beginning of the section and have added subsection titles aiming to better guide the reader through the various steps of the analysis. Furthermore, we added a table with an overview of all tested models and highlighted the best-fitting models in Tables 2 and 3. We hope that this has improved the clarity of our revised manuscript.

      2.d) Direct phenotypic assays with the active Cas9 nuclease were performed, looking at viability, mating preference and fecundity. Relegating these data to the supplements is not useful. While significant effects are attributed to the Cas9-gRNA construct, the authors cannot rule out a DsRed effect and it is a shame they did not assay at least one of the control constructs. In addition, in their modelling they assume that Cas9 activity will always cleave but see no evidence for this in the heterozygote viability assay. Whether this is due to the difference in rearing conditions that the authors claim is debatable.

      We thank the reviewer for this valuable feedback. As suggested, we have moved the phenotypic assays (Methods & Results) of the Cas9_gRNAs construct to the main part of the revised manuscript. We decided to conduct phenotypic assays only for the Cas9_gRNAs construct, because it was the only one that displayed some fitness costs in our maximum likelihood analysis (in particular, the DsRed construct did not display any fitness costs in the cages). However, given more time and capacity, we agree that additional phenotypic assays would have been desirable (e.g., a larger sample size per construct and additional constructs). Regarding our choice of model for the maximum likelihood analysis, we used a highly simplified off-target approach, which was necessary given the available information.

      2.e) Finally, since the initial cage experiments suggest that the Cas9HF1 enzyme reduces off-target effects they assay this enzyme in a model homing drive, indicating that this enzyme performs as well as the regular Cas9. Again, relegation of these data to supplementary datasets is unhelpful and it would improve the manuscript if these results could be simply summarised in a figure.

      We added an additional figure at the end of the “Cas9HF1 homing drive” section in the Results showing the gene drive inheritance rate and the resistance allele formation rate in early embryos for the Cas9HF1 and Cas9 homing drives, respectively. The gene drive inheritance rate is the percentage of offspring with DsRed fluorescence when crossing individual gene drive heterozygotes with “wildtype” homozygotes (i.e., not carrying any gene drive allele) and is used to calculate the gene drive conversion rate (i.e., the rate at which wildtype alleles are converted to drive alleles) mentioned in the main text. We hope that this has improved the clarity of our revised manuscript.

      2.f) Taken together, I think this is a useful study but is presented in a way that is at times impenetrable to the non-expert. More clarity in presenting the cage and modelling data, as well as promotion of figures from supplementary material to the main manuscript would considerably aid the non-expert and provide greater confidence in the interpretations. If these issues could be clarified I feel the work provides a useful addition to the gene drive field and will help those thinking about developing such strategies, particularly relevant are the findings related to the Cas9HF1 enzyme.

      We thank the reviewer for the valuable feedback. We have significantly revised the Results as well as the Discussion, provided additional information on the modeling approach, and shifted supplementary material to the main text of the manuscript. We hope this has improved the overall clarity of the manuscript.

      Reviewer #3 (Public Review):

      The manuscript by Langmuller, Champer and colleagues reports a set of experiments and models investigating the fitness effects of transgenes in Drosophila melanogaster carrying CRISPR components to determine how useful such transgenes may be for population control. This study benefits from well-designed transgene constructs that allow the investigators to distinguish the effects of on-target and off-target Cas9 endonuclease activity, and a sophisticated maximum likelihood modeling framework that allows estimation of the fitness effects of the transgene constructs. The manuscript's major shortcoming is the absence of statistical analysis of the allele frequency data and some potentially unrealistic assumptions that went into the model.

      3.a) My first recommendation is that a statistical analysis of the allele frequency data should be included in the manuscript, rather than inferring patterns solely from visual inspection of the data. Specifically, the manuscript claims that (lines 176-180): "We found Cas9_gRNAs to be the only construct that systematically decreased in frequency across all replicate cages (Figure 2). Interestingly, the allele frequency change was not consistent with fixed direct fitness costs. Instead, the construct frequency "bottomed out" in most replicates, and this occurred more quickly when the starting frequency was higher (Figure 2)." These conclusions regarding allele frequency changes should be supported by statistical analyses. What is the uncertainty surrounding the allele frequency estimates? Some indication of this uncertainty (such as error bars) could be added to Figure 2. Which of the trajectories in Figure 2 show a statistically significant change in allele frequency over the course of the experiment? Is the increase in the frequency of the no-Cas9_no-gRNA replicates significant? What support is there for the claim that the allele frequency changes "bottomed out"? Does a non-linear model fit these data significantly better than a linear trend? What is the evidence that allele frequency decreases slowed earlier "when the starting frequency was higher"? What is the evidence that "replicates 3 and 4 ... had very different frequency dynamics"? While they started at different frequencies, the slope of those two trajectories could be statistically indistinguishable. What is the authors' interpretation of the Cas9_gRNAs replicates 6 & 7 whose trajectories did not decrease?

      We thank the reviewer for this detailed recommendation. We agree that our description of construct frequency dynamics solely from visual clues did indeed make overly strong claims (e.g., regarding “different frequency dynamics”) without providing enough statistical support for these specific statements. We had originally thought that some readers would prefer we first provide such a qualitative description of the allele frequency trajectories, prior to going into the mathematically more rigorous (but therefore also more complicated) maximum likelihood inference of fitness costs and statistical model comparison of different selection scenarios (“full inference model” vs. “construct model” vs. “off-target model”, etc.).

      In response to the reviewer’s comments, we decided to completely restructure this first part of the Results section. Specifically, we have removed our comparison of Cas9_gRNAs replicates solely based on visual clues, and also any mention of the admittedly vaguely defined “bottoming out” behavior. Instead, we now only mention the average frequency change for the Cas9_gRNAs construct across all replicates, while highlighting the heterogeneity among replicates. The maximum likelihood analysis is now introduced right after this and has also been revised extensively to improve clarity. We believe that this analysis provides a very powerful framework for the systematic inference of fitness costs and for assessing which of the different selection scenarios best explains our empirical data. This is because it combines the data from all replicates while fully accounting for the heterogeneity among them. For example, it could well be that construct frequency trajectories in individual replicates may not be statistically distinguishable from neutral evolution, yet in aggregate, an inferred fitness cost of the construct becomes highly significant. Note that the maximum likelihood framework also provides confidence intervals for its estimates, based on the entirety of the data. So the question of whether a departure from a neutral model is significant comes down to whether the 95% confidence interval surrounding the fitness estimate of the given construct still includes a value of 1 (which it does for the “direct fitness” estimate of the full model, but not for the “off-target fitness” estimate, see Table 2).
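
      The logic of pooling replicates and checking whether the confidence interval includes 1 can be sketched with a deliberately simplified toy model. The code below is not the two-locus framework of the manuscript: it assumes a single fitness parameter acting on the construct allele, binomial drift with a fixed hypothetical number of allele copies, a grid search instead of numerical optimization, and noise-free toy trajectories. All names and values are illustrative.

```python
import math


def expected_next_freq(p, w):
    """Expected construct frequency after one generation of selection,
    where w is the relative fitness of a construct allele (neutral: w = 1)."""
    return p * w / (p * w + (1.0 - p))


def log_binom_pmf(k, n, q):
    """Log probability of k construct alleles among n sampled allele copies."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1.0 - q))


def log_likelihood(w, replicates, n):
    """Sum log-likelihoods over every generation transition of every replicate."""
    total = 0.0
    for counts in replicates:
        for k_now, k_next in zip(counts, counts[1:]):
            q = expected_next_freq(k_now / n, w)
            total += log_binom_pmf(k_next, n, q)
    return total


def fit_fitness(replicates, n, grid):
    """Grid-search MLE of w with a 95% profile-likelihood interval."""
    scored = [(log_likelihood(w, replicates, n), w) for w in grid]
    best_ll, best_w = max(scored)
    # 1.92 is half the 95% quantile (3.84) of a chi-square with 1 df.
    inside = [w for ll, w in scored if best_ll - ll <= 1.92]
    return best_w, min(inside), max(inside)


def deterministic_counts(k0, n, w, generations):
    """Noise-free toy trajectory of construct allele counts."""
    counts = [k0]
    for _ in range(generations):
        q = expected_next_freq(counts[-1] / n, w)
        counts.append(round(n * q))
    return counts


N_ALLELES = 500                              # hypothetical number of allele copies
GRID = [i / 100 for i in range(50, 151)]     # candidate fitness values 0.50..1.50

# Two toy replicates with different starting frequencies, true w = 0.8.
costly = [deterministic_counts(250, N_ALLELES, 0.8, 10),
          deterministic_counts(400, N_ALLELES, 0.8, 10)]
neutral = [deterministic_counts(250, N_ALLELES, 1.0, 10)]

for label, data in (("costly", costly), ("neutral", neutral)):
    w_hat, low, high = fit_fitness(data, N_ALLELES, GRID)
    print(label, w_hat, low, high, "CI includes 1:", low <= 1.0 <= high)
```

      Because the likelihood is summed over all transitions of all replicates, the interval narrows as replicates are added, which is why an aggregate estimate can exclude 1 even when single trajectories look compatible with neutrality.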

      Regarding the comment about error bars for the allele frequency trajectories in Figure 2, we want to point out that our construct frequency estimates are actually based on the genotype counts of all adult flies present in the given cage experiment at the specific time point. We therefore did not include uncertainty estimates in Figure 2, nor did we include sampling noise in the maximum likelihood analysis. We have now clarified this in the caption of Figure 2 and in the Methods section (“Maximum Likelihood framework for fitness cost estimation”). We also acknowledge that we still cannot rule out sampling noise completely (for example through escaped flies, phenotyping errors, or loss of frozen flies due to destruction or other issues). However, we expect that the relative contribution of these errors should be negligible compared to drift.

      The reviewer raises an interesting question: Why did the Cas9_gRNAs construct frequency not decrease in the two replicates with the highest construct starting frequency (replicate 6 and 7)? A possible explanation could be that — given a limited set of off-target sites — cut off-target alleles that impose a fitness cost will accumulate and start to independently segregate from the construct alleles very quickly in populations where the construct has a high starting frequency (and thus a higher overall rate of cleavage events). We now state this possible explanation in the section on “Construct frequency dynamics suggest moderate off-target fitness costs” of our revised manuscript.

      3.b) My second recommendation involves the assumptions that went into the maximum likelihood modeling. In particular, it strikes me as unrealistic to assume that 1) the genome contains only a single off-target site that is entirely responsible for the decrease in fitness due to Cas9 activity; and 2) that the rate of off-target mutation is as high as it is assumed to be ("In individuals that carry a construct, all uncut off-target alleles are assumed to be cut in the germline, which are then passed on to offspring that could suffer negative fitness consequences."). Regarding point 1), isn't a more realistic scenario that there are multiple off-target sites, each with a potentially different fitness consequence resulting from Cas9-induced mutations? If so, doesn't the likelihood that all off-target sites have been cut depend on the number of such sites, as multiple off-target sites should reduce the mutation rate at any single site. This possibility also suggests that there may be multiple loci with potentially deleterious Cas9-induced alleles segregating within the experimental populations. Regarding point 2), even assuming only a few potential off-target sites per genome, it seems like the rate of off-target cutting would have to be unrealistically high to approach mutating all off-target sites in the population. The conversion efficiency of the constructs used here is reported as ~80% and 60% in females and males, respectively; it seems likely that the rate of Cas9 mutation at off-target sites is lower than this efficiency for the target site. These assumptions should be justified or relaxed before claiming that mutational saturation of off-target sites is responsible for a decreasing fitness loss over the course of the experiments (after confirming that there is statistical support for the claim that the allele frequency trajectories bottom out).

      The reviewer raises a very important point: modeling only one off-target site that represents the net fitness effect of Cas9 cleavage outside the target region as well as a cut rate of 100 % (i.e., the off-target site is always cut in the presence of Cas9) is highly idealized.

      (1) We agree with the reviewer that in reality, the experimental populations might have a polygenic off-target landscape, where the fitness of cleavage alleles could differ vastly within as well as between loci. However, given the limited number of data points (e.g., n=87 generation transitions for experimental populations with the Cas9_gRNAs construct), it would be extremely difficult if not impossible to disentangle the numerous parameters that would be necessary to describe such a more complex off-target scenario with our modeling approach. We have now highlighted our model choices, potential caveats, and resulting limitations in both the Discussion section and also the section “Construct frequency dynamics suggest moderate off-target fitness costs” in the Results.

      (2) Similar to the single off-target locus, our cut rate of 100 % is an idealized assumption that was chosen to reduce model complexity. As outlined above, it would be extremely hard to disentangle the cut rate from other parameters (such as the number of target sites if fitness effects are multiplicative across loci). Additionally, we would like to point out that the reported conversion efficiencies (~80 % in males, ~60% in females) are not the conversion efficiencies of the constructs in the experimental populations shown in Figure 2, but of separate homing drives with a single gRNA. All constructs in the experimental populations are designed in a way that no homing can occur, and they have four gRNAs if any. We apologize for the confusion. Our revised manuscript now contains a paragraph in the “Cas9HF1 homing drive” section in the Results that highlights the differences between the constructs in the cage populations and the homing drives assessed in this study. Furthermore, we have added an additional figure that displays the individual results of the homing drive (Figure 5); we hope this improves clarity.
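
      The identifiability problem described in (1) and (2) can be illustrated with a short calculation. This is a sketch under stated assumptions rather than part of the manuscript's analysis: it assumes independent off-target sites, an identical cost per cut site, multiplicative fitness effects across sites, and a hypothetical net fitness of 0.74. The function names are invented for this example.

```python
import math


def net_fitness(n_sites, cut_rate, cost_per_site):
    """Expected fitness when each of n_sites independent off-target sites is
    cut with probability cut_rate and each cut site multiplies fitness by
    (1 - cost_per_site)."""
    return (1.0 - cut_rate * cost_per_site) ** n_sites


def cost_for_target(n_sites, cut_rate, target_fitness):
    """Per-site cost that reproduces target_fitness for a given scenario."""
    return (1.0 - target_fitness ** (1.0 / n_sites)) / cut_rate


TARGET = 0.74  # hypothetical net fitness, i.e. a 26% reduction
# (number of off-target sites, cut rate) combinations, all hypothetical.
scenarios = [(1, 1.0), (2, 1.0), (4, 0.5), (10, 0.3)]
for n_sites, cut_rate in scenarios:
    s = cost_for_target(n_sites, cut_rate, TARGET)
    print(n_sites, cut_rate, round(s, 4),
          round(net_fitness(n_sites, cut_rate, s), 4))
```

      Every scenario yields the same expected net fitness, so frequency data that constrain only the net effect cannot separate the number of sites, the cut rate, and the per-site cost without further information.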

      3.c) My third suggestion involves the correspondence between the results of the likelihood modeling and the phenotypic assays. The best fit model inferred a viability loss of 26% and no detectable effects on female choice (or male attractiveness) or fecundity. In contrast, the phenotypic assays inferred no detectable effect on viability, but a 50% reduction in male attractiveness and 25% reduction in female fecundity. I think that the authors' conclusion that "[t]hese assays broadly confirmed our previous findings" needs some context or explanation as to how these numerically discrepant findings are broadly confirming, beyond the speculation that the discrepancy in viability may be due to rearing in vials vs. population cages.

      We thank the reviewer for pointing this out. We removed the claim that the phenotypic assays “broadly confirmed our previous findings” and now highlight the differences in estimated fitness costs for males and females in the phenotypic assays as well as the discrepancy with our maximum likelihood estimates. Furthermore, we now provide additional explanations for what might be causing this phenomenon (i.e., single crosses vs. large populations, vial vs. cage, interactions between individual genotypes and the environment, delayed development of construct homozygotes being interpreted as reduced viability in the maximum likelihood analysis). We also point towards the discrepancies in the Discussion of our revised manuscript and recap potential explanations.

      3.d) My fourth suggestion involves the comparison between the Cas9_gRNAs and Cas9HF1_gRNAs transgenes. The inference that off-target cuts are the major source of fitness loss for the Cas9_gRNAs construct relies heavily on the observation that there was no decrease in allele frequency for the two Cas9HF1_gRNAs replicates. It therefore seems critical to be confident in this observation, and to rule out alternative explanations as much as possible. For example, did the authors confirm that the Cas9HF1_gRNAs construct has on-target Cas9 activity levels as high as the Cas9_gRNAs construct? Although I am not certain about this (see comments in the next paragraph on this point), I think the transgene constructs used to estimate drive conversion rates are different from the constructs used for the population cage experiments; if this is correct, I think it would be helpful to provide the on-target mutation rates for the actual constructs used in the population cages.

      The reviewer is correct: the constructs in the population cages are different from the homing gene drives for which we estimated the gene drive conversion rates. However, we were able to confirm at least one mutated gRNA target site in every PCR-based genotyped offspring of individuals carrying either the Cas9_gRNAs or the Cas9HF1_gRNAs construct (this is now specified in the manuscript). Thus, we did not expect a systematic difference in on-target mutation rates between the Cas9_gRNAs and Cas9HF1_gRNAs constructs. We acknowledge in the Discussion that construct performance might vary substantially with genomic sites and even organisms.

      3.e) Relatedly, I was confused about the portion of the manuscript that reports the drive conversion efficiency. The manuscript states, "As a proof-of-principle that Cas9HF1 is indeed a feasible alternative, we designed a homing drive that is identical to a previous drive (45), except that it uses Cas9HF1 instead of standard Cas9. This drive targets an artificial EGFP target locus with a single gRNA (see Methods)." Given that the rate of drive conversion was estimated by the loss of GFP, these homing drive constructs must be different from the constructs used in the population cage experiments, as those constructs targeted a site on chromosome 3L which does not contain GFP. I could not find a description of these homing constructs in the Methods - while a reader might be able to puzzle this out by reading reference #45, I think it would be helpful to explicitly describe these details in this manuscript.

      We apologize for the confusion. We have highlighted the similarities (e.g., nanos promoter, DsRed) as well as the differences (e.g., number of gRNAs) between the homing drives and the constructs in the cage populations at the beginning of the section “Cas9HF1 homing drive” in the Results. We hope this makes it clearer.

    1. Author Response

      Reviewer #3 (Public Review):

      The primary strength of this study is in establishing the N999S heterozygous mouse as a useful model system for debilitating paroxysmal non-kinesigenic dyskinesia (PNKD), with or without epilepsy. This outcome was hard-won following a comprehensive analysis of biophysical, neurophysiological, and behavioral tests. Ultimately the convincing evidence was demonstrated through a clever application of a stress-related behavioral test (quite in alignment with triggers in patients) to elicit the hypo-motility associated with PNKD. Like patients who exhibit variable penetrance, even highly inbred mice exhibit much variability, and uncovering a robust phenotype took a nuanced approach and perseverance.

      To reach this point, several experiments provided mechanistic insights into the mutant channel behavior. First, whole-cell patch clamp experiments revealed shifts in the G-V consistent with gain-of-function behavior previously characterized using the N999S and D434G mutants expressed heterologously. Novel observations of H444Q revealed a loss-of-function (LOF) behavior with the G-V shifted to positive potentials but to a lesser degree. These electrophysiological phenotypes establish the rank of predicted severity as N999S>D434G>H444Q.

      This prediction was tested in brain slices of heterozygous animals where the mutant channels would be normally spliced and associate with WT subunits and other components such as beta subunits. The investigators evaluated BK currents by patch clamp from hippocampal neurons where BK channels are known to play key functional roles. Both N999S and D434G showed the predicted increase in current magnitude, though interestingly the differences between them apparent in heterologous expression were lost in the native setting. Curiously, no differences in BK current magnitude were observed in neurons of heterozygotes carrying the putatively LOF mutation H444Q.

      In terms of seizure susceptibility, D434G mutants differed from WT and were less severe than N999S mutants with respect to time to evoked seizure, although differences in "EEG power" were not statistically significant between D434G and WT. These observations support the conclusion that D434G represents an intermediate disease phenotype.

      The behavioral studies were the most effective in revealing differences among the variants and in defining GOF N999S heterozygotes as a compelling animal model for PNKD and providing evidence that the LOF mutation conferred the opposite effect of hyperkinetic mobility. The findings provide the new insight that KCNMA is the target of heritable, monogenic disease, a conclusion that was previously not forthcoming because known human mutations have arisen de novo. The dyskinetic phenotypes in response to stress induction are wholly consistent with patient symptoms.

      With respect to rigor and reproducibility, it is commendable that the investigators were blinded to genotype during data collection and analysis. Moreover, the study provides an important confirmation of previous findings from another lab regarding the cellular phenotype of the N999S mutant. WT controls were compared to transgenic littermates within individual transgenic lines. In some cases, the sample sizes were rather low (see below), but otherwise the study seems rigorous.

      The strengths of the manuscript far outweighed the weaknesses. The experiments interpreted to suggest a gene dosage effect with D434G were not compelling to this reviewer and might be better documented in the supplement with the conclusion that further work is required.

      Due to pandemic-related animal and lab issues, we were unable to generate and surgically implant full Kcnma1D434G/D434G homozygous cohorts for the EEG/seizure portion of the study. We focused instead on using the limited mice of this genotype for the novel PNKD3 assays (n=7), leaving the seizure dataset at n=3.

      To address the concern, the Kcnma1D434G/D434G data was removed from Figure 4 to avoid overinterpretation of a gene dosage effect. However, we did retain the individual measurements within the Results text (lines 383 and 385), on the basis of facilitating direct comparisons between our study and other D434G studies. For example, even with only three measurements, the trend toward the shortest seizure latencies in Kcnma1D434G/D434G mice is similar to the result obtained with an independently generated D434G mouse model (Dong et al, 2022). Yet seizure power and the presence of spontaneous seizures do not show a similar trend, suggesting our results differ from theirs in these important aspects. This is now stated more clearly in the revised conclusion for that paragraph, ‘While not conclusive and requiring substantiation in a larger cohort, the Kcnma1D434G/D434G seizure data raise the possibility of a gene dosage effect with D434G that qualitatively differs from an independently-generated D434G mouse model (Dong et al., 2022),’ (lines 388-390).

      In contrast to the seizure part of the study, the increased severity of Kcnma1D434G/D434G PNKD-immobility is fully supported by the data with sufficient statistical power (Figure 5D). However, the idea that the increased severity with homozygous D434G in PNKD-immobility was consistent with gene dosage observations for seizure was removed for consistency (lines 549-550).

      As a side note, we also added additional clinical descriptors (akinesia) and colloquial descriptions for PNKD3 (‘drop attack’) to disambiguate how a PNKD3 episode appears different from other types of motor dysfunction. This was to facilitate comparison with the two other KCNMA1-D434G models (mouse and fly; Dong et al, 2022; Kratschmer et al., 2021), which report aspects of dyskinesia in the setting of baseline locomotor dysfunction. To our knowledge, these models have not been evaluated for the striking ‘drop attack’ immobility presenting in patients (lines 84-85).

      The consequences of the altered BK current levels were assessed on the voltage dependence of firing frequency in the hippocampal neurons, but it was not very clear how increased BK current would enhance neuronal excitability. Also, how might it lead to the PNKD phenotype? A paragraph even speculating on these mechanistic links in the Discussion would be welcome.

      The mechanism by which BK currents increase action potential firing is not fully identified in this study (see also response to reviewer #2). In the Results, a new paragraph was added at the end of the action potential section to summarize the AHP changes in more detail and to speculate on an indirect mechanism of action for the increase in BK current, predicted from a similar ‘GOF’ BK current type, where β4 regulation of BK channels is lost (lines 294-304). Additional details have also been added to the Discussion regarding the factors contributing to lower seizure threshold (lines 675-680).

      Additional re-organization of Discussion text addresses the basis for PNKD. We added a direct statement that it is not yet clear which neurons/circuits are the most critical for PNKD-like symptomology, or which of these express BK channels (lines 680-700). We follow with a succinct summary of phenotypically-relevant PNKD models. While there is a lot to unpack with respect to similarities and differences between different paroxysmal dyskinesia models in the literature, they ultimately shed little light on the question of KCNMA1 PNKD3-related dysfunction. With the addition of the d-amp rescue control, we focus mainly on the amphetamine response predicting a CNS locus (lines 692-693). The d-amp response may even suggest dopaminergic pathways (some of which express BK channels) as a plausible target to investigate in future studies, but due to the complex interplay of d-amp dosage and the novel motor assay, we don’t think speculating on a specific circuit is supported with enough actual data to add in the Discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The 2019, Johnson et al., Science study (referred to as "2019 study" or "prior study" in the rest of the comments) measured mutational robustness in F1 segregants derived from a yeast cross between a laboratory and a wine strain, which differ at >35,000 loci. To realize this, the authors developed a pipeline 1) to create the same set of transposon insertion mutations in each yeast strain via transformation; and 2) to measure the fitness effects of these specific insertion mutations.

      In this manuscript, the authors applied the same pipeline to laboratory evolved yeast strains that differ in only tens or hundreds of loci and thus are much less divergent than those used in the prior study. Both studies aim to characterize how the fitness of the sets of insertion mutations (mostly deleterious) vary depending on the existing mutations (mostly beneficial) in those yeast strains. However, the current manuscript, especially when compared to the prior study, suffers from several major weaknesses.

      First, only 91 genes out of >6,000 genes in the yeast genome are perturbed in the manuscript. The small set of disruption mutations is unlikely to faithfully capture the pattern of epistasis in the selected clones. By comparison, >1,000 insertion mutations were evaluated in the 2019 study. Because the majority of the >1,000 tested mutations were neutral, the authors focused on 91 insertions that had significant fitness effects. The same 91 insertion mutations are used in the current study. However, as evident in both studies, epistasis plays an important role in how insertion mutations interact with different genetic backgrounds. Considering the vastly different genetic backgrounds between clones used in the prior and current studies, the insertion mutations of interest in the current study are unlikely to be the same as those in the prior study. The large-scale insertion screen used in the prior study should be conducted in the current study as well.

      This concern is summarized in Essential Revision 1 above; see our comments there for our detailed response. Briefly, we have added an additional Figure Supplement (Fig. 1 – Supplement 8; see above) demonstrating that the 91 insertion mutants have a similar range of effects in this study as in the previous one (which may be expected since the genetic backgrounds here are as closely related to those in the 2019 study as the backgrounds in the 2019 study are to each other).

      Second, the statistical power in the current manuscript is insufficient to support the conclusions. Fitness errors were not considered when several main conclusions were drawn (fitness errors on the y-axis of Figure 1B are not available; fitness errors on the x-axis of Figure 2 are not available). The current conclusions are invalid without knowing the magnitude of fitness error. Fitness of each clone should be measured in at least two replicates in order to infer errors of fitness measurements. Additionally, the authors isolated two clones from the same timepoint of each population and treated them as biological replicates based on the fitness correlation between the two clones. However, this practice can be problematic because the extent of fitness correlation varies across populations and it is less likely to capture the patterns of epistasis when clones are isolated from more heterogeneous populations. Similarly, the authors could avoid this bias by measuring the fitness of each clone in multiple replicates and treat the two clones from the same timepoint/population separately.

      We agree that details about statistical methods, most of which are taken from Johnson et al. (2019), were not clear in our text. As we also describe in our response to the Essential Revisions above, we have rewritten a large part of the methods text to provide more details about statistical methods and have calculated and reported errors more broadly:

      Errors on fitness effects: We have expanded our methods text describing how the fitness effect of a mutation is determined for a single clone / condition. This text now emphasizes the internal replication provided by redundant barcodes, which allows us to calculate a standard error for the effect of a mutation in a single clone / condition. These errors are shown in Figure 1 – Figure Supplements 1-3. We have also added details on how errors are calculated for a mutation for a population-timepoint, and these errors are now included in Figure 2.

      Errors on the DFE mean: We discuss this below.

      Considering clones separately: As we also describe in the essential revisions above, Johnson et al. (2021) shows that the mutational dynamics in these evolving populations are dominated by successive selective sweeps, so we expect clones isolated from the same population-timepoint to rarely differ by many mutations. However, we agree that there are likely some cases in which the two clones have important genetic differences. To address this concern, we have reanalyzed our data as you suggest, considering each clone separately. The results of this analysis are included for every main text figure in the form of figure supplements (Figure 1 - figure supplement 7, Figure 2 - figure supplement 5, Figure 3 - Figure supplement 5, and Figure 4 - figure supplement 1), which show that our qualitative conclusions are unchanged.

      Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast strains. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing cost epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in rich media at permissive temperature (YPD 30) and one in a defined media at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection-and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study-giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs.

      We thank the reviewer for these positive comments and the nice summary of our work.

      As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      Related points were also raised by the other reviewer. To address this, we have added multiple-hypothesis-corrected p-values for these least-squares Wald Tests (using the Benjamini-Hochberg method) to our dataset (Supplementary File 1). As you suggest, for this particular analysis in which we compare the overall number of mutations following each pattern, we are willing to accept the possibility of false positives, so we still use the original p-values to categorize the mutations in Figure 2. We address this point in the main text and provide the numbers of mutations falling in each category after we perform this correction:

      “Because we are primarily focused on comparing the frequency of each pattern across environments, we report these values before multiple-hypothesis-testing correction here and in Figure 2; after a Benjamini Hochberg multiple-hypothesis correction these values fall to 24/77 (~31%), 15/74 (~20%), 9/77 (~12%), and 11/74 (~15%), respectively.”
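
      For readers unfamiliar with the correction named above, the following is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It is an illustration only, not the authors' analysis code; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative sketch: the adjusted value for the p-value of rank r
    (1 = smallest) among m tests is min over ranks >= r of p * m / rank,
    capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # never larger than the one ranked above it (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four mutations; a mutation would be called
# significant at a 5% false discovery rate if its adjusted value is <= 0.05.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in example]
```

      Because the adjusted values are never smaller than the raw p-values, the number of mutations passing a fixed threshold can only stay the same or fall after correction, which is the direction of change reported in the quoted text.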

      From here the authors turn to using formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.

      The reason this matters is because the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection-but as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. I am left asking how the idiosyncratic model would perform on data that arose, for example, under the fitness-mediated model? Or how it would perform on data originating under the full model? Is it true that when data arises under a different model (say the full model) but is fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients will always represent positive epistasis and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients-is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative-and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed-with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

      Thanks for these detailed comments about the modeling approach and analysis, which raise points that were also described in the Essential Revisions and by Reviewer 1. We agree that these details were not presented sufficiently clearly in the original manuscript. In the revised manuscript, we have added a much more in-depth section on the details of the modeling procedures in the Materials and Methods, including formulas for each model and a discussion of how noise could affect our modeling results (see responses to essential revisions and reviewer 1 above for more information). This includes an analysis of shuffled and simulated datasets, which will give readers a better sense of how to interpret these modeling results. We have also included a new paragraph in the results that compares the models for each mutation and for the entire dataset using the Bayesian Information Criterion (BIC):

      “We can also ask which model best explains the data using the BIC, which penalizes models based on the number of parameters. The small squares below the bars in Figure 3A indicate which model has the lowest BIC for each mutation. In YPD 30°C, the full model has the lowest BIC for 40/77 (~52%) mutations and the idiosyncratic model has the lowest BIC for 37/77 (~48%). In SC 37°C, the full model has the lowest BIC for 49/73 (~67%) mutations and the idiosyncratic model has the lowest BIC for 24/73 (~33%). When we assess how well each model fits the entire dataset in each environment, the full model has a lower BIC than the idiosyncratic model in both environments.”
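
      As a rough illustration of how a BIC comparison penalizes extra parameters, the sketch below uses the common least-squares form BIC = n ln(RSS/n) + k ln(n), which assumes independent Gaussian residuals. The response does not state which likelihood the authors used, so this form, the function name `bic_least_squares`, and the numbers in the example are all assumptions made for illustration.

```python
import math


def bic_least_squares(rss, n_obs, n_params):
    """BIC for a least-squares fit, assuming Gaussian residuals.

    rss      -- residual sum of squares of the fitted model (must be > 0)
    n_obs    -- number of data points the model was fit to
    n_params -- number of free parameters in the model
    Lower values indicate the preferred model.
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Hypothetical fits to the same 30 measurements of one insertion mutation.
# The richer model always has the smaller RSS when the models are nested,
# but it is only preferred if the improvement outweighs the penalty.
simpler = bic_least_squares(rss=2.0, n_obs=30, n_params=2)
richer_small_gain = bic_least_squares(rss=1.8, n_obs=30, n_params=5)
richer_large_gain = bic_least_squares(rss=0.5, n_obs=30, n_params=5)

prefer_simpler = simpler < richer_small_gain        # small gain: penalty wins
prefer_richer = richer_large_gain < simpler         # large gain: fit wins
```

      This is the sense in which a nested comparison asks whether the more complex model fits the data enough better to justify its additional parameters.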

      We also appreciate the suggestion to look at how coefficients are spread among mutations. We have made a new supplemental figure (Figure 3 - Figure supplement 3) that clearly shows the coefficients broken down by mutation for each condition. This figure shows that coefficients are often clustered for one mutation. That is, multiple populations often have similar coefficients / patterns of epistasis for a particular mutation. We don’t view this as a source of bias in our data, but as an indication that the mutations fixing in these populations sometimes exhibit similar patterns of epistasis with these insertion mutations. We now reference this supplemental figure in the main text (“see Figure 3 – figure supplement 3 for a breakdown of coefficients by individual mutations”) as a better representation of the coefficients that result from our modeling.

    2. Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast strains. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing cost epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in rich media at permissive temperature (YPD 30) and one in a defined media at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection-and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study-giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs. As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      From here the authors turn to using formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.

      This matters because the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection, and as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. How would the idiosyncratic model perform on data that arose, for example, under the fitness-mediated model? Or on data originating under the full model? Is it true that when data arise under a different model (say the full model) but are fit under the idiosyncratic model, negative coefficients always represent negative epistasis, positive coefficients always represent positive epistasis, and model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients. Is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative, and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed, with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

    1. Reviewer #3 (Public Review):

      Punishment is a key form of learning and behavior change, yet its core behavioural and brain mechanisms remain poorly understood and certainly less well understood than reward learning. This manuscript by Jacobs et al from the Moghaddam laboratory uses dual fibre photometry for calcium transients to make an important advance in understanding how punishment is learned by studying how punishment changes action and punisher coding in the PFC and VTA of rats. This work builds on the elegant single unit work from this group reported previously. The authors use a single action, probabilistic task whereby rats are first trained to nosepoke for sugar pellets on an FR1, with a 5 sec DS signalling reinforcement. Then, in blocks of 30 trials each, the nosepoke is punished on a probabilistic contingency of 0%, 6%, 10%. The authors used dual fibre photometry to concurrently record calcium transients in "dmPFC" and VTA, with a focus on transients related to action emission and punisher as well as reward delivery.

      There are quite a few key findings here: 1) action transients in dmPFC change across punishment from modest inhibitory transients in 0% risk to no change (i.e., possible loss of inhibitory transient in PFC) or modest positive transients (in VTA) as risk increased from 6-10%; 2) comparison with past single-unit data suggested similarity between photometry and single unit measures for the action but not DS; 3) there was no change in punisher transients in these regions; 4) diazepam, which had modest behavioral effects to alleviate punishment, had no effects on PFC transients to the action or punisher but did reveal peri-action ramping-like transients in VTA; 5) diazepam increased correlated activity between VTA and PFC at 0% and 6% risk.

      Overall, I enjoyed reading this manuscript and I learned much from it. The work builds neatly and clearly on the past work of this group in this task, providing new information on how punishment shapes action coding in the prefrontal cortex and VTA, how it shapes correlated activity between these regions, and how benzodiazepines may affect these to achieve their anxiolytic effects. The critical conclusions are that these regions are important for action, but not punisher, encoding, and that peri-action ramping in VTA neurons and VTA-PFC correlated activity contribute to the anxiolytic effects of benzodiazepines in this task.

      Comments

      1. I think it is worth drawing the distinction between punishment (i.e. learning and performance) versus the punisher (footshock). For example, the title (and across the manuscript) refers to "punishment coding" to mean transients to the punisher itself. I would suggest using "punisher" when referring to the outcome used (footshock) or its associated transients and "punishment" when referring to learning. So, learning punishment involves changes in action but not punisher encoding in these regions.

      2. "dmPFC". Different researchers mean different things by this term. Would it be possible to state exactly where the fibres were instead (e.g., Laubach et al., eNeuro, 2018)?

      3. I did struggle to understand the functional significance of the PFC transients. I am convinced they are real and robust because we see precisely the same in our own unpublished work. But I am still puzzled as to what a loss of an 'inhibitory' transient around the punished action in PFC means. This is not really addressed but it is the main effect of punishment on action coding in the PFC and I think some readers would appreciate the authors' interpretation of this.

      4. Related to 3, it was also not clear why these PFC transients differed only at 6% risk and not also 10% risk. Again, I think this is worth discussing.

      5. Re: analyses. I thought these were generally well done. There are two questions one might be interested in. The first is whether the transients are different from 0%. The second is whether transients differ across sessions. The figures do a good job at answering the second question (which to me is the most important question) by using coloured bars above transients to show when session differences are present as assessed by a robust analysis. However, I do think some readers would also appreciate knowing whether and when transients themselves were significantly < or > 0%. Perhaps these figures could be presented as supplementary data.

      6. The comparison with previously published single-unit data was very interesting. Here I was persuaded that these correlations were meaningful because of the difference between these correlations for cue and action. I am not suggesting the authors do the following, I only offer it for their consideration in future work. Kriegeskorte has developed ways of assessing dissimilarity in different data types from the same behavioural designs that could prove very helpful and persuasive here (e.g., Front. Syst. Neurosci., 24 November 2008; https://doi.org/10.3389/neuro.06.004.2008).

      7. The authors comment on the overgeneralisation of punishment learning. That is, in session 1 there is a broad suppression of behavior by punishment that was not obviously present in the remaining sessions. I am not sure overgeneralisation is the best term because this implies punishment learning generalised. More likely is that Pavlovian fear was present in session 1 to generally suppress nosepoking and this fear was reduced in the remaining sessions as the instrumental punishment contingency was learned. Bolles made this point some years ago and it may be worth citing Bolles et al. Learning and Motivation Volume 11, Issue 1, February 1980, Pages 78-96, on this point.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript from Shi, Ballesta, and Padoa-Schioppa examines the relationship between neural activity in the monkey orbitofrontal cortex (OFC) and various choice patterns that arise in sequential (versus simultaneous) choice. This approach addresses a central question in the study of decision-making: how can one identify value-dependent versus value-independent effects on choice behavior when value is defined from that behavior itself? Here, the authors document three behavioral differences in sequential choice: choosers are noisier, show an order bias, and show a preference bias. Leveraging a conceptual computational framework for OFC activity that the authors have developed over many years, the authors link reduced accuracy to changes in neural valuation in the OFC, order effects to post-valuation decision activity in the OFC, and preference effects to extra-OFC processes. For decision neuroscientists, these findings show specific differences between sequential and simultaneous choice, and suggest the integration of multiple stages (valuation, decision, and post-decision) in the selection process. More broadly, this work shows how an examination of neural activity can shed light on aspects of the decision process that cannot be distinguished by an examination of behavior alone.

      Strengths:

      Overall, this paper presents a novel and thoughtful task design that allows comparison of neural and behavioral value and choice effects. In concert with an established circuit-based framework for parsing different types of OFC response patterns, the authors test and validate a number of hypotheses on the link between neural activity and choice.

      (1) Comparing sequential and simultaneous choice tasks in an interleaved manner is a clever approach to separate valuation and comparison processes in time. While not entirely novel (e.g. see work from the Hayden group), the combination of this approach with the OFC response pattern (offer value, chosen value, chosen juice) framework allows a distinction between valuation and comparison-related effects.

      (2) This paper is the latest in a significant series of related papers on orbitofrontal activity from this group, and cleverly utilizes their expertise in characterizing, analyzing, and conceptualizing different patterns of OFC activity. In addition to the long-established offer value/chosen value/chosen juice categorization, recent papers from this group have established the causal contribution of OFC offer value activity to economic choice and established similar OFC neural contributions to sequential and simultaneous choice tasks.

      (3) Apart from a causal test (e.g. cell type specific stimulation) of the contribution of different neural responses to different choice effects, the next strongest evidence is a demonstration of a consistent relationship across sessions. The authors show such a relationship between offer value coding strength and choice accuracy, between chosen value sequence effects and behavioral order bias, and between chosen juice inhibition and order bias. At the least, these relatively strong effects show a strong correlation between different OFC responses and behavior.

      Thank you for emphasizing these points.

      Weaknesses:

      While the experimental approach and rigor of the analyses are strengths, there are issues of interpretation and generality of analytical approaches that should be clarified.

      (1) The abstract, introduction, and discussion touch on canonical behavioral economic choice effects as a prelude to the behavioral effects documented here, but it's not clear they are so closely related. [A] Many of the effects in the cited literature (framing effects in risky choice, preference reversals, etc.) are robust across different task paradigms, whereas the effects shown here arise specifically from a comparison of choice across different task paradigms (sequential vs. simultaneous). Furthermore, [B] it's not clear that the term "bias" adequately captures the array of effects in the behavioral economic literature (for that matter, [C] one of the main effects in this paper is reduced choice accuracy rather than a bias). [D] The paper would benefit from a clearer conceptual linkage between documented behavioral biases (particularly in humans) and the effects shown here.

      [B] We beg to differ. In our reading of the literature, the term “bias” is very general and it is invoked practically every time choices present some effect that seems idiosyncratic or “irrational”. The list of documented biases is very long – a good reference is the Wikipedia page on cognitive biases (for more scholarly references, see (Gilovich et al., 2002; Kahneman et al., 1982)).

      [A] As for whether biases documented in behavioral economics are robust across task paradigms, that’s really a matter of perspective. For example, we all understand the phenomenon of loss aversion (a.k.a. “status quo bias”) to be very robust and almost intuitive. But before the prospect theory paper of Kahneman and Tversky (1979), that was not at all the case. In the 15 years following that paper, much of what Kahneman and Tversky did was to show how loss aversion affected choices in different domains (Kahneman and Tversky, 2000). Other biases are much less reliable. For example, there is an extensive literature on decoy effects – i.e., violations of the axiom of “independence of irrelevant alternatives”. However, it turns out that the strength and even the direction of decoy effects depend on seemingly minor details (Spektor et al., 2021). In other words, decoy effects are not as robust as one might think. As for the biases discussed here, our hunch is that the order bias is quite ubiquitous. Indeed, it was already documented using different tasks in different species (Krajbich et al., 2010; Rustichini et al., 2021). The preference bias might also be the manifestation of a rather general phenomenon. After all, there is a common intuition that when a decision is difficult we sometimes fail to finalize it, and eventually choose some default option. In conclusion, we think of the two biases discussed here as conceptually very comparable to biases described in behavioral economics.

      [C] We agree that the drop in accuracy is (strictly speaking) not a choice bias, and we carefully chose the title and wrote the whole manuscript to keep that point clear. However, let us note that the drop in accuracy observed under sequential offers could easily be construed as a choice bias – specifically, a bias favoring in any situation the lesser option (lower value). As we conclude the present study, this phenomenon continues to fascinate us. Indeed, while it is clear that the behavioral effect arises at the valuation stage, we still don’t understand why the activity range of offer value cells is reduced under sequential offers. Naively, one might have guessed the opposite – i.e., that when only one offer is on display, the lack of competition translates to stronger offer value signals. We plan to give this issue more thought in the future. One possibility is that the system modulates the activity range of offer value cells depending on the task and/or the behavioral context. If so, differences in choice accuracy measured under sequential versus simultaneous offers would be a manifestation of a more general phenomenon. Of course, this matter remains open for future research.

      [D] The link between the biases discussed here and other biases described in the literature is conceptual. The main point we want to make is this: Over the past 20 years, we have gained some understanding of the neural circuit and mechanisms underlying simple economic choices. While our understanding remains incomplete and is the object of ongoing research, notions acquired for simple choices can be used to make sense of a broader class of choices. Thus, in principle at least, it is possible to shed light on a variety of traits and biases by observing the activity of particular cell groups. The last paragraph of the ms conveys this point.

      (2) The analyses rely on a particular quantification of choice behavior (probit regression), which interprets choice effects (e.g. relative valuation of the two juices, sigmoid steepness) via specific parameter combinations and relies on specific assumptions about the construction of choice (e.g. cumulative normal distribution, constant sigmoid slope across order effects). This method of quantifying choice behavior is well-documented in previous studies, allowing a comparison to past work. However, given the importance of this approach to both quantifying choice effects and comparing choice to OFC responses, the paper would benefit from directly addressing two issues: (1) how well does probit regression actually capture stochastic choice behavior (in both Task 1 and Task 2), and (2) do the findings rely on specific choice modeling assumptions? The second issue is most important for the order bias effects, which assume a constant sigmoid across conditions - do the authors reach similar conclusions if this assumption is relaxed?

      Thanks for raising this question. We address it more thoroughly below (under “Recommendations for the authors”, point (2)). In a nutshell, when we designed the behavioral analysis, we chose the probit function and the log value ratio model (as opposed to the value difference model) based on general considerations and for consistency with our previous studies. We now conducted a series of control analyses using logit instead of probit and value difference instead of log value ratio. We also repeated all the analyses of neuronal activity using measures for relative value, choice accuracy and order bias derived from these behavioral models. The upshot is that all of our results hold true independently of the regression model used to analyze choices. Thus we kept the results as in the original ms, and we included a new section in the Methods to describe our control analyses (p.16-17).
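
      One reason a logit control would be expected to agree closely with a probit analysis can be sketched numerically. The snippet below is only an illustration under stated assumptions: it compares the two link functions directly rather than refitting any data, the rescaling constant 1.702 is the standard approximation constant, and the parameterization X = a0 + a1 * log(qB/qA) with indifference point exp(-a0/a1), along with the coefficient values, is assumed here for illustration rather than taken from the manuscript.

```python
import math


def probit_link(x):
    """Cumulative standard normal, the sigmoid used in probit regression."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logit_link(x):
    """Logistic function, the sigmoid used in logit regression."""
    return 1.0 / (1.0 + math.exp(-x))


# Rescaling the logistic argument by about 1.702 brings it close to the
# cumulative normal everywhere.
SCALE = 1.702
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(probit_link(x) - logit_link(SCALE * x)) for x in grid)
print("largest difference in choice probability:", max_gap)


def indifference_ratio(a0, a1):
    """Quantity ratio qB/qA at which choice probability is 0.5 (assumed form)."""
    return math.exp(-a0 / a1)


# Hypothetical coefficients: both sigmoids cross 0.5 at X = 0, so rescaling
# the coefficients (as a change of link roughly does) leaves this point fixed.
rho_probit = indifference_ratio(-2.0, 3.0)
rho_logit = indifference_ratio(-2.0 * SCALE, 3.0 * SCALE)
print(rho_probit, rho_logit)
```

      Because the two sigmoids differ by less than one percentage point of choice probability after rescaling, measures derived from them would be expected to agree closely; the value difference versus log value ratio comparison is a separate assumption that this sketch does not address.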

      (3) There are some issues with the strength and interpretation of the preference bias that need to be addressed. Re: strength and significance of the preference bias, the text seems to overemphasize the dependence of the effect on relative value (rotation of the rho-2 vs rho-1 ellipse) at the cost of the simple task difference (shift in the ellipse above the identity line). Conceptually, a preference bias (a shift in relative value towards the favored item) requires only the task difference, not the dependence on relative value. It would be clearer, for example, if the main text (pg. 6) presented the statistics (t-test, Wilcoxon) supporting the difference in relative values (rhos) between Tasks 1 and 2. Furthermore, the rotation does not seem as robust: the text states that the result is significant in both animals (p<0.04) but the ANCOVA results (Fig 3C and 3F) suggest that the effect is only significant in Monkey J. Is the preference effect significant only in one animal, and if so, is the effect significant across the combined data?

      Let us refer to Fig.3C. There is no question that the separation between the red and blue lines is statistically significant (order bias). In addition, the two lines appear (a) displaced upwards and (b) rotated counterclockwise compared to the identity line. In our understanding, the question raised by R2 is whether the two effects – displacement (a) and rotation (b) – are both present and both necessary to define the preference bias. We actually gave this issue extensive thought early on, and we concluded that displacement and rotation are not easily dissociable, at least in our data set. The reason is simple: to dissociate them, we would have to make some assumption about the center of rotation. For example, if we assume that the center of rotation is [0, 0], then there clearly is a rotation but the displacement is close to zero. Conversely, if the center of rotation is [1, 1] (which, in some ways, is a more logical assumption), the rotation is still there but the displacement is >0. When we considered these elements, we realized that any choice of a center of rotation would be somewhat arbitrary. Further complicating things, once a center of rotation is chosen, rotation and displacement are non-commutative operations. Importantly, this issue only affects the displacement, meaning that the rotation angle (and its statistical significance) does not depend on choosing any particular center of rotation. In this light, we chose to define the preference bias in a way that is more closely tied to the rotation than to the displacement, while noting that the net effect of the phenomenon was to bias choices in favor of the preferred juice (hence, the phrase “preference bias”). The only problem with this definition is that it doesn’t do full justice to the phenomenon in monkey G (Fig.3F), where the displacement is more clearly evident than the rotation (indeed, the latter only trends towards statistical significance (p=0.07)). Still, we don’t see a better way to design our analyses. Thus we kept the ms unchanged in this respect.
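
      The geometric argument in this response can be checked with a short sketch. The points, angle, and shift below are arbitrary values chosen for illustration and have no connection to the data in Fig.3; the function names are hypothetical. The sketch shows that rotating and displacing do not commute, and that changing the assumed center of rotation changes only the implied displacement, not the angle.

```python
import math


def rotate_about(point, angle, center=(0.0, 0.0)):
    """Rotate a 2D point counterclockwise by `angle` radians about `center`."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cos_a * dx - sin_a * dy + center[0],
            sin_a * dx + cos_a * dy + center[1])


def translate(point, shift):
    """Displace a 2D point by `shift`."""
    return (point[0] + shift[0], point[1] + shift[1])


angle = math.radians(10.0)   # arbitrary counterclockwise rotation
shift = (0.0, 0.3)           # arbitrary upward displacement
point = (2.0, 2.0)           # arbitrary point on the identity line

# 1) Order matters: rotate-then-shift differs from shift-then-rotate.
rotate_then_shift = translate(rotate_about(point, angle), shift)
shift_then_rotate = rotate_about(translate(point, shift), angle)
order_gap = math.hypot(rotate_then_shift[0] - shift_then_rotate[0],
                       rotate_then_shift[1] - shift_then_rotate[1])


# 2) The same rotation about [1, 1] instead of [0, 0] equals the rotation
#    about the origin plus a fixed displacement that is the same for every
#    point, so the angle is unaffected by the choice of center.
def implied_shift(p):
    about_origin = rotate_about(p, angle, (0.0, 0.0))
    about_unit = rotate_about(p, angle, (1.0, 1.0))
    return (about_unit[0] - about_origin[0], about_unit[1] - about_origin[1])


shift_a = implied_shift((2.0, 2.0))
shift_b = implied_shift((0.5, 3.0))
print("order gap:", order_gap)
print("implied displacement:", shift_a, shift_b)
```

      The implied displacement is identical for both test points, which is the sense in which the split between rotation and displacement depends on an arbitrary choice of center while the rotation angle does not.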

      (4) On a related note, the authors present and view the effects as detrimental for the animals, but I think they have to more explicitly state how they are defining outcomes. For example, the abstract states "By neuronal measures, each phenomenon reduced the value obtained on average in each trial and was thus costly to the monkey". Does this mean that outcomes are less valuable, with value defined by (offer value cell) firing rates? A clarification is particularly important for the preference bias, where animals show a stronger bias for the preferred option compared to simultaneous choice. At the behavioral level, this effect seems to only be a poorer outcome if one assumes that simultaneous choice demonstrates true values - can it not be assumed that sequential choice demonstrates true preference, and the preference bias reduces performance in simultaneous choice? The authors may have an explanation in mind based on OFC value coding, and it would be helpful to be explicit here.

      Thank you for raising this question. The revised ms includes a new section (Discussion; ‘The cost of choice biases’; p.13) that discusses this important issue. In a nutshell, if in two conditions subjective values are the same but choices are different, in one or both conditions the subject fails to choose the higher value. In that sense, the choice bias is detrimental. Our analyses of neuronal activity indicated that subjective offer values were (a) the same in the two tasks and (b) independent of the presentation order in Task 2. Hence, both the preference bias and the order bias were detrimental to the animal.

      (5) Finally, at a broad level, the authors rigorously define and test hypotheses about how the different behavioral effects relate to OFC activity within the context of their neurocomputational framework (offer value, chosen value, chosen juice cells arranged in a competitive inhibition network; Fig. 1). However, it should be acknowledged that the primary conclusions - about how the different behavioral effects arise during valuation, comparison, or post-comparison - relies on the assumption that the different OFC response patterns reflect these specific circuit functions, and that OFC is causally related to choice. It would be more balanced if the authors could acknowledge this point in the discussion, and discuss any relevant potential alternative explanations for their findings.

      This issue is addressed above (Essential revision, point 1). In essence, R2 is correct: all our analyses were designed, and all our results are interpreted, under a series of assumptions. Most of these are backed by empirical evidence (e.g., showing that the encoding of decision variables in OFC is categorical in nature). However, one assumption remains a working hypothesis. Specifically, we assume that the cell groups identified in OFC constitute the building blocks of a decision circuit. If so, the activity of different cell groups may be associated with different computational stages. We edited the Discussion to clarify this point (p.11-12). As for possible alternative explanations, we agree that it is a very reasonable question to ask, but we honestly are at a loss addressing it. Indeed, one would never conduct the analyses presented in this ms if not in the framework of Fig.1. Consequently, it is hard to come up with any interpretation for the results without embracing that computational framework. If R2 can propose some alternative interpretation for the results presented in the ms, we would be more than happy to think about it, and possibly revise our thinking.

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, this study is well designed with convincing experimental data. The following critiques should be considered:

      1) It is important to examine whether the phenotype of METTL18 KO is mediated through a change in RPL3 methylation. The functional link between METTL18 and RPL3 methylation in regulating translation elongation needs to be examined in detail.

      We truly thank the reviewer for the suggestion. Accordingly, we set up experiments combined with hybrid in vitro translation (Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) and the Renilla–firefly luciferase fusion reporter system (Kisly et al. NAR 2021) (see Figure 5A).

      To test the impact of RPL3 methylation on translation directly, we purified ribosomes from METTL18 KO cells or naïve HEK293T cells supplemented with ribosome-depleted rabbit reticulocyte lysate (RRL) and then conducted an in vitro translation assay (i.e., hybrid translation, Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) (see figure above and Figure 5A). Indeed, we observed that removal of the ribosomes from RRL decreased protein synthesis in vitro and that the addition of ribosomes from HEK293T cells efficiently recovered the activity (see Figure 5 — figure supplement 1A).

      To test the effect on Tyr codon elongation, we harnessed the fusion of Renilla and firefly luciferases; this system allows us to detect the delay/promotion of downstream firefly luciferase synthesis compared to upstream Renilla luciferase and thus to focus on elongation affected by the sequence inserted between the two luciferases (Kisly et al. NAR 2021) (see figure above and Figure 5A). For better detection of the effects on Tyr codons, we used the repeat of the codon (×39, the number was due to cloning constraints in our hands). We note that the insertion of Tyr codon repeats reduced the elongation rate (or processivity), as we observed a reduced slope of downstream Fluc synthesis (see Figure 5 — figure supplement 1B).

      Using this setup, we observed that, compared to ribosomes from naïve cells, RPL3 methylation-deficient ribosomes led to faster elongation at Tyr repeats (see Figure 5B). These data, which are directly reflected by the ribosomes possessing unmethylated RPL3, provided solid evidence of a link between RPL3 methylation and translation elongation at Tyr codons.

      2) The obvious discrepancy between the recent NAR paper and this study lies in the ribosome profiling results (such as Fig.S5). The cell line-specific regulation between HAP1 (previously used in NAR) and the 293T cells used in this study needs to be explored. For example, would METTL18 KO in HAP1 cells cause a polysome profiling difference in the authors' hands? Some of the negative findings in this study (such as Fig.S3B, Fig.S5A) would need some kind of positive control to make sure that the assay conditions are working.

      According to the reviewer’s suggestion, we conducted polysome profiling of the HAP1 cells with METTL18 knockout. For this assay, we used the same cell line (HAP1 METTL18 KO, 2-nt del.) as in the earlier NAR paper. As shown in Figure 9 — figure supplement 2A and 2B, we observed reduced polysomes in this cell line, as observed in the NAR paper.

      We did not find a change in the abundance of 40S and 60S upon METTL18 KO in HAP1 cells, as assessed by the rRNAs and the complex mass in the sucrose gradient (see Figure 9 — figure supplement 2C-E). This observation was again consistent with earlier reports.

      Overall, our sucrose density gradient (SDG) experiments (polysome and 40S/60S ratio) were congruent with the NAR paper. A difference from our finding in HEK293T cells was the limited effect on polysome formation by METTL18 deletion (Figure 4 — figure supplement 1A and 1B). To further provide a careful control for this observation, we induced a 60S biogenesis delay, as requested by the Reviewer. Here, we treated cells with siRNA targeting RPL17, which is needed for proper 60S assembly (Wang et al. RNA 2015). The quantification of SDG showed a reduction of 60S (see figure below and Figure 3 — figure supplement 1D-F) and polysomes (see Figure 4 — figure supplement 1C and 1D), highlighting the weaker effects of METTL18 depletion on 60S and polysome formation in HEK293T cells. We note that all the sucrose density gradient experiments were repeated 3 times, quantified, and statistically tested.

      To further assess the difference between our data and those in the earlier NAR paper, we also performed ribosome profiling on 3 independent KO lines in HAP1 cells, including the one used in the NAR paper (METTL18 KO, 2-nt del.). Indeed, all METTL18 KO HAP1 cells showed a reduction in footprints on Tyr codons, as observed in HEK293 cells (see Figure 4H), and thus, there was a consistent effect of RPL3 methylation on elongation irrespective of the cell type. On the other hand, we could not find such a trend (see figure below) by reanalysis of the published data (Małecki et al. NAR 2021).

      Thus far, we could not find the origin of the difference in ribosome profiling compared to the earlier paper. Culture conditions or other conditions may affect the data. Given that, we amended the discussion to cover the potential of context/situation-dependent effects on RPL3 methylation.

      3) For loss-of-function studies of METTL18, it will be beneficial to have a second sgRNA to KO METTL18 to solidify the conclusion.

      We thank the reviewer for the constructive suggestion. Instead of screening additional METTL18 KO in HEK293T cells, we conducted additional ribosome profiling experiments in HAP1 cells with 3 independent KO lines. In addition to ensuring reproducibility, these experiments should assess whether our results are specific to the HEK293T cells that we mainly used. As mentioned above, even in the different cell lines, we observed faster elongation of the Tyr codon by METTL18 deficiency.

      4) In addition to loss-of-function studies for METTL18, gain-of-function studies for METTL18 would be helpful for making this study more convincing.

      Again, we thank the reviewer for the constructive suggestion. To address this issue, we conducted RiboTag-IP and subsequent ribosome profiling. Here, we expressed C-terminally FLAG-tagged RPL3, either WT or the His245Ala mutant (which METTL18 cannot methylate; Figure 2A), in HEK293T cells, treated the lysate with RNase, immunoprecipitated FLAG-tagged ribosomes, and then prepared a ribosome profiling library (see figure below, left). This experiment assessed the translation driven by the tagged ribosomes. Indeed, we observed that, compared to the difference in Tyr codon elongation in METTL18 KO vs. naïve cells, His245Ala had a weaker impact (see figure below, right). Given that METTL18 KO provides unmodified His, the enhanced Tyr elongation may be mediated by the bare His but not by Ala in that position. Since this point may be beyond the scope of this study, we omitted it from the manuscript. However, we are happy to add the data to the supplementary figures if requested.

      Reviewer #3 (Public Review):

      In this article, Matsuura-Suzuki et al. provided strong evidence that the mammalian protein METTL18 methylates a histidine residue in the ribosomal protein RPL3 using a combination of Click chemistry, quantitative mass spectrometry, and in vitro methylation assays. They showed that METTL18 was associated with early sucrose gradient fractions prior to the 40S peak on a polysome profile and interpreted that as evidence that RPL3 is modified early in the 60S subunit biogenesis pathway. They performed cryo-EM of ribosomes from a METTL18-knockout strain, and showed that the methyl group on the histidine present in published cryo-EM data was missing in their new cryo-EM structure. The missing methyl group gave minor changes in the residue conformation, in keeping with the minor effects observed on translation. They performed ribosome profiling to determine what is being translated efficiently in cells with and without METTL18, and found decreased enrichment of Tyrosine codons in the A site of ribosomes from cells lacking METTL18. They further showed that longer ribosome footprints, corresponding to sequences within ribosomes that have already bound to A-site tRNA, contained fewer Tyrosine codons in the A site when lacking METTL18. This suggests methylation normally slows down elongation after tRNA loading but prior to EF-2 dissociation. They hypothesize that this decreased rate affects protein folding and follow up with fluorescence microscopy to show that EGFP aggregated more readily in cells lacking METTL18, suggesting that the translation elongation slowdown mediated by METTL18 leads to enhanced folding. Finally, they performed SILAC on aggregated proteins to confirm that more tyrosine was incorporated into protein aggregates from cells lacking METTL18.

      The article is interesting and uses a large number of different techniques to present evidence that histidine methylation of RPL3 leads to decreased elongation rates at Tyrosine codons, allowing time for effective protein folding.

      We thank the reviewer for the positive comments.

      I agree with the interpretation of the results, although I do have minor concerns:

      1) The magnitude of each effect observed by ribosome profiling is very small, which is not unusual for ribosome modifications or methylation. Methylation seems to occur on all ribosomes in the cell since the modification is present in several cryo-EM structures. The authors suggest that the modification occurs during biogenesis prior to folding and being inaccessible to METTL18, so it is unlikely to be removed. For that reason, I do not think it is warranted to claim that this is an example of a ribosome code, or translation tuning. Those terms would indicate regulated modifications that come on and off of proteins, but the authors have not presented evidence that the activity is regulated (and don't really need to for this paper to be impactful).

      We thank the reviewer for making this point, and we agree that the nuance of the wording may not fit our results. We amended the corresponding sentences to avoid using the terms “ribosome code” and “translation tuning” throughout the manuscript.

      2) In Figure 4-supplement 1, it appears there is slightly more 80S and less 60S in the METTL18 knockout, with no change in 40S. It might be normal variability in this cell type, but quantitation of the peaks from 2 or more experiments is needed to make the claim that ribosome biogenesis is unaffected by METTL18 deletion. Likewise, the authors need to quantitate the area under the curve for 40S and 60S levels from several replicates and show an average -/+ error for figure 3, supplement 1 because that result is essential to claim that ribosome biogenesis is unaffected.

      Accordingly, we repeated all the sucrose density gradient experiments 3 times, quantified the data, and statistically tested the results. Even in the quantification, we could not find a significant change in either the 40S or 60S levels by METTL18 deletion in HEK293T cells (see Figure 3 — figure supplement 1B and 1C).

      Moreover, for the positive control of 60S biogenesis delay, we treated cells with siRNA targeting RPL17, which is needed for proper 60S assembly (Wang et al. RNA 2015). The quantification of SDG showed a reduction in 60S (see figure below and Figure 3 — figure supplement 1D-F) and polysomes (see Figure 4 — figure supplement 1C and 1D), highlighting the weaker effects of METTL18 depletion on 60S and polysome formation.

      3) The effect of methylation could be any step after accommodation of tRNA in the A site and before dissociation of EF-2, including peptidyl transfer. More evidence is needed for claiming strongly that methylation slows translocation specifically. This could be followed up in vitro in a new study.

      We truly thank the reviewer for the suggestion. Accordingly, we set up experiments combined with hybrid in vitro translation (Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) and the Renilla–firefly luciferase fusion reporter system (Kisly et al. NAR 2021) (see Figure 5A).

      To test the impact of RPL3 methylation on translation directly, we purified ribosomes from METTL18 KO or naïve HEK293T cells, added them to ribosome-depleted rabbit reticulocyte lysate (RRL), and then conducted an in vitro translation assay (i.e., hybrid translation, Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) (see figure above and Figure 5A). Indeed, we observed that removal of the ribosomes from RRL decreased protein synthesis in vitro and that the addition of ribosomes from HEK293T cells efficiently recovered the activity (see Figure 5 — figure supplement 1A).

      To test the effect on Tyr codon elongation, we harnessed the fusion of Renilla and firefly luciferases; this system allows us to detect the delay/promotion of downstream firefly luciferase synthesis compared to upstream Renilla luciferase and thus to focus on elongation affected by the sequence inserted between the two luciferases (Kisly et al. NAR 2021) (see figure above and Figure 5A). For better detection of the effects on Tyr codons, we used the repeat of the codon (×39, the number was due to cloning constraints in our hands). We note that the insertion of Tyr codon repeats reduced the elongation rate (or processivity), as we observed a reduced slope of downstream Fluc synthesis (see Figure 5 — figure supplement 1B).

      Using this setup, we observed that, compared to ribosomes from naïve cells, RPL3 methylation-deficient ribosomes led to faster elongation at Tyr repeats (see Figure 5B). These data, which directly reflect the activity of ribosomes possessing unmethylated RPL3, provided solid evidence of a link between RPL3 methylation and translation elongation at Tyr codons.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      General Statements

      We are grateful for the very kind, thoughtful, and detailed comments of the reviewers, which we have strived to fully integrate into the revised manuscript.

      Of note are the concerns with the data from stages S21 and S22, which we acknowledge do appear to be qualitatively and quantitatively distinct from the other samples. While we are unable to completely disambiguate meaningful biological variation from technical or experimental noise using our data, we hope a few additional analyses and visualization tools we have included can provide greater confidence in the reliability of our findings.

      Additionally, while attempting to evaluate Reviewer #2’s suggestions about examining the distribution of intergenic peaks along the genome, we discovered an error in our code that caused intronic and exonic peaks to be improperly assigned as intergenic peaks. While the largest group of peaks in our dataset remains distal intergenic peaks (30.2%), and distal intergenic peaks remain a larger proportion of our intergenic peaks than proximal intergenic peaks, many of the peaks originally assigned to the intergenic categories have been reclassified as exonic or intronic peaks. We have updated our code and figures upon reanalysis of our data and have revised our findings and discussion accordingly.

      Description of the planned revisions

      Reviewer #3, Comment #3 of 11

      “In general, I thought that the bioinformatic methods (i.e., the code or the options used for each program) would have been helpful for my understanding in some cases. The authors say that these will be published on an accompanying GitHub repository, which should be fine if this is sufficient for journal policy.”

      We are still at work compiling the code for our analyses into a more reader-friendly form and setting up a GitHub repository to enable easy access to more detailed methods for interested readers. Some of the most important settings have been included in the Methods and Supplementary Methods sections, but we hope to include more thorough detailing of our pipelines in the GitHub repository. The raw data for portions of the RNA-Seq and all of the ATAC-Seq data have been uploaded to the Sequence Read Archive, and we are finalizing additional raw data submission. We are also in the process of determining what data to include in our Gene Expression Omnibus submission, in which we hope to include all pertinent final data analysis files as well as any intermediate or accompanying datasets that would facilitate downstream analyses. The large size and number of our final analysis files have resulted in some challenges with data transfer and storage, which have delayed the upload and submission process.

      We are also collating several of the data visualization scripts built for this manuscript into a Jupyter notebook. This tool will enable the visualization of ImpulseDE2 models and peak classifications for arbitrary genes and genome regions of a user’s choice, alongside additional functions which are discussed in this revision plan.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have addressed the following substantive concerns with the manuscript:

      Reviewer #2, Comment #2 of 3:

      “Authors have repeatedly used S21 and S22 throughout the manuscript to support their claims with clustering etc. May authors shed some light on the differences in replicates for these timepoints. Furthermore, I could not find Fig 3J, perhaps author would like to point out Fig 3H.”

      Reviewer #3, Cross-comment #2 of 3:

      “Focus on stages S21/S22: This might indeed be somewhat problematic. The libraries from these two stages (particularly S21) seem to be very different from those from the other stages. In the PCA (Fig. 1C), S21 doesn't cluster well with anything, and the difference between the two replicates is massive compared to other stages. The accessibility pattern (Fig. 1D) also looks odd. The libraries also have the lowest scores for % of mapped reads (Fig. S2B), fragment size distribution (S2E), and Spearman correlation (S2I). All this could be biologically sound and be due to a major developmental transition at this point, but maybe it justifies revisiting the data and testing whether leaving out S21 (and/or S22) makes a big difference for the clustering analyses.”

      1. Reviewers #2 and #3 discussed concerns with the outlying nature of libraries S21 and S22. We had also previously held concerns about these samples and had performed some analyses to examine whether the global properties of our dataset are dramatically changed upon removing those samples. We did not observe dramatic changes to the structure of our data in the absence of the S21/S22 samples.

        • a. Samples S21 and S22 appear to be highly separated from the rest of our data using Principal Components Analysis. We had also previously believed that this suggested that these samples might be problematic. However, a colleague indicated to us that researchers in microbiome ecology had observed similar phenomena, often caused by strong single axes of variation (or “linear gradients”) in the datasets. In “Uncovering the Horseshoe Effect in Microbial Analyses” (mSystems, 2017) by Morton et al., the authors describe how a strong linear gradient can create a “horseshoe effect” or “Guttman effect”, where PCA results in the two ends of a linear gradient appearing to come together in ordination space. The authors also describe a similar “arch effect” which strongly resembles the general shape of our PCA curve. We suggest that the strong apparent “outlier” appearance of S21 and S22 may be exaggerated or induced by the technical “arch effect” phenomenon, and may be caused by a strong single biological gradient – a developmental timecourse – which our data aimed to capture.
        • b. We also performed PCA on our dataset with the S21 and S22 time points removed prior to performing the analysis (see right panel, bottom). When we did so, we observed that the relative positions of the remaining libraries remain largely similar, with time points closer to the middle of development showing a positive loading in PC2, and time points closer to the beginning and end of development showing a negative loading. This suggests that the second major axis of variation in our dataset would remain a contrast between middle vs. terminal timepoints, even without the S21/S22 data, and that the relative positioning of the remaining data within PC-space is not entirely driven by S21/S22.
        • c. To further assess the degree of the S21/S22 samples’ outlying effects, we also performed ImpulseDE2 analysis to generate model fits without S21/S22 data. Doing so allowed us to determine to what degree the S21/S22 stages are necessary for driving the accessibility trajectory of individual peaks, and of the data more broadly. We performed IDE2 with either all data, or the S21/S22 data removed prior to input into IDE2. This generated two sets of model fits to the “cloud” of accessibility vs. time measurements: one that included the S21/S22 data, and one without. We evaluated, for each peak in our dataset, the time point at which the IDE2 model achieved maximum accessibility (the “IDE2 max fit”), and plotted both the “all” and “noS21S22” data as a histogram (see right panel, top graph). The presence of peaks that achieve predicted maximum accessibility in the S21/S22 stages in the “no S21/S22” data is a result of how we calculate “max fit”, which does not require that there is a known accessibility value at a given timepoint; only that the time point during which the model fit is maximum is closest to the timing of that developmental stage. Overall, we still observed early, middle, and late enrichment of IDE2 max fit even when the S21/S22 data are removed. We do see a rightward shift in the middle timepoint histogram in the direction of later stages, although this may be expected given the absence of concrete accessibility values at S21/S22 in the “no S21/S22” data. This indicates that our data globally retain the general trends of early, middle, and late enrichment of accessibility in the absence of the S21/S22 data. Moreover, this suggests that, even without the S21/S22 data, the remaining data from early and late stages result in a model fit that still predicts maximum accessibility at middle developmental stages for many peaks.
        • d. To further measure the influence of the S21/S22 data in IDE2 model fit, we also evaluated the degree of change in the global behavior of a peak when the S21/S22 stages were removed. This analysis aimed to assess whether removing S21/S22 data resulted in an IDE2 model with the same general trajectory as with all data, as opposed to the more stringent requirement of evaluating whether the exact developmental stage of the peak was changed. To perform this analysis, we grouped developmental stages into five quintiles, each representing three stages of development. We asked, for each peak in our dataset, whether that peak’s IDE2 max fit was “stable” when the S21/S22 data were removed; that is, if the quintile of the IDE2 max fit was altered when the S21/S22 data were removed (i.e. if a peak moved more than 3 developmental stages away from its original position), a peak was considered “unstable”. We observed that over 80% of peaks in each quintile remained “stable” after removing the S21/S22 data, suggesting that the vast majority of peaks show the same general trajectory of accessibility even without the S21/S22 data. Peaks within the middle time points appeared to be more unstable than peaks at the terminal timepoints, which could be expected given that the S21/S22 timepoints constituted the middle-most timepoints in our dataset.
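      The quintile-stability check in point d can be sketched in a few lines of Python. This is a minimal illustration, not our analysis code: the function names, the 0-based stage indices (0 to 14), and the example peaks are all hypothetical, and a peak counts as “stable” here simply when the bin of its max fit is the same with and without the S21/S22 data.

```python
from collections import defaultdict

STAGES_PER_BIN = 3  # five bins of three stages each, as in point d


def stage_bin(stage_index):
    """Map a 0-based stage index (0..14, hypothetical encoding) to a bin 0..4."""
    return stage_index // STAGES_PER_BIN


def stability_by_bin(max_fit_all, max_fit_reduced):
    """Return, per bin, the fraction of peaks whose max-fit bin is unchanged.

    max_fit_all:     peak id -> max-fit stage index using all samples
    max_fit_reduced: peak id -> max-fit stage index with S21/S22 removed
    Peaks are grouped by their bin in the full-data fit.
    """
    total = defaultdict(int)
    stable = defaultdict(int)
    for peak, stage_all in max_fit_all.items():
        original_bin = stage_bin(stage_all)
        total[original_bin] += 1
        if stage_bin(max_fit_reduced[peak]) == original_bin:
            stable[original_bin] += 1
    return {b: stable[b] / total[b] for b in sorted(total)}


# Hypothetical example values, for illustration only.
example_all = {"peak1": 0, "peak2": 7, "peak3": 8, "peak4": 14}
example_reduced = {"peak1": 1, "peak2": 9, "peak3": 8, "peak4": 13}
print(stability_by_bin(example_all, example_reduced))
```

      In this toy input, peak2 moves from the middle bin to the next one and is therefore counted as unstable, while the other three peaks stay within their original bins.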

      We acknowledge that the S21/S22 timepoints still appear to be qualitatively different in other ways. Moreover, we acknowledge that some of the peaks in our dataset are “dependent” on the S21/S22 stages, given that their accessibility trajectory changes when these stages are removed. It is difficult to determine whether a change in accessibility trajectory for a given peak caused by the removal of S21/S22 data is indicative of technical differences in sample preparation, such as batch effects; biological variation, such as a potentially unknown mutant or sick embryo; or genuine wildtype biological processes that occur at the S21/S22 stages.

      These caveats acknowledged, a comparative analysis of the data in the absence of the S21/S22 stages suggests that much of the global picture of development remains the same. In the interest of providing the data we generated as a resource, we decided to include the S21/S22 data in the final manuscript we have prepared for submission.

      We have included an additional supplementary figure (Supp. Fig. 2.2) highlighting these further analyses, which we hope future readers will consider when performing their own analyses with these timepoints, as well as a summary of the ways we evaluated this potential concern in the Supplementary Methods. To facilitate future users of this dataset, we will include the model parameters calculated from IDE2 using both the full dataset and the data with S21/S22 removed in the GEO accession data, as well as a Jupyter notebook (ParhyaleATACExplorer.ipynb) that allows users to plot the raw accessibility data and IDE2 model fits for individual peaks of interest (C, example on right panel), so that downstream experiments can consider the potential differences with the S21/S22 samples.

      Reviewer #2, Comment #3:

      “The majority of ATAC-seq peaks in the distal intergenic regions is a very surprising result. Authors defend this result by suggesting that this organism has big genome. May author perform a short analysis that shows that these peaks are indeed represent nearby genes or may point towards 3D genome organisation. For example, I see that this genome might have regions in the genomes that are densely organised in gene clusters, in those cases does the pattern remains same i.e. the majority of the genes are very distant from each other and hence use vital regulatory elements?”

      Reviewer #3, Cross-comment #3 of 3:

      Peaks in distal intergenic regions: I agree that this could be elaborated on. It might also be that >10 kb is not actually that distal for Parhyale. I would suggest to split the "distal peaks" further (e.g., in 10 kb or 2-log steps, or whatever makes most sense) and try to understand if >10 kb is mostly <20 kb, or if most of them are hundreds of kb from the nearest gene?

      1. Reviewers #2 and #3 expressed interest in understanding the absolute distribution of distal intergenic peak distances from nearby genes in our dataset. In generating the analyses to address this question, we stumbled upon an error in our code that revealed that the true number of intergenic peaks is much lower than we had originally reported. We discuss the nature of the error below. Moreover, we address the previous question using the new data, which overall still indicate that distal intergenic peaks remain a large portion of the peaks in the Parhyale genome.
        • a. To address Reviewer #2’s comments with respect to the presence of potential clusters of intergenic regions, we built a Python tool (included in ParhyaleATACExplorer.ipynb) enabling the visualization of different cis-regulatory element categories along a genomic coordinate. Upon plotting our data with this tool, we observed problems with the categorization of the peaks – namely, that intronic and exonic peaks were erroneously classified as intergenic peaks (see right panel, top). We analyzed our script for classifying annotations more carefully and realized that we had erroneously used “bedtools closest” instead of “bedtools intersect” to try to identify all peaks overlapping with gene annotations in our genome. We corrected this error and observed the expected distribution and categories of peaks in our data (right panel, bottom).
        • b. The revised peak categories have been added to the updated manuscript in Fig. 3H and Fig. 5C. The categories of peaks we observed differ substantially from our previous results, in that we observe a much higher representation of exonic and intronic peaks in our dataset, with intronic peaks now representing 28.2% of all peaks (increased from <1%), and distal intergenic peaks representing 30.2% (decreased from 51.2%). While distal intergenic peaks remain the largest category over time, their proportion is roughly equal to the fraction of intronic peaks. Intergenic peaks (distal and proximal combined) now make up only a slightly larger fraction of peaks (37.2%) than gene body peaks (exon, intron; total 34.4%). This updated result is a significant departure from our previous report, and we have updated the text of the manuscript to correct this mistake.
        • c. While intergenic and distal intergenic peaks constitute a much smaller portion of our data, we still wanted to address Reviewer #2 and #3’s questions about the distribution of distances between intergenic peaks and nearby genes. We generated a plot to illustrate the number of intergenic peaks at variable distances to the nearest gene (B, right panel). As illustrated in the plot, there are a very large number of distal intergenic peaks, including many peaks >100kb away from the nearest gene. The average distance of intergenic peaks from the nearest gene was 73,351bp. We neglected to mention in the original manuscript that one of the rationales for choosing a 10kb cutoff as “distal intergenic” was that peaks beyond this distance would be considerably more difficult to isolate as single fragments combined with a proximal promoter using PCR, agnostic of their orientation with respect to the promoter element. Such peaks could not have been easily identified using previous transgenic approaches, and are thus distinguished from “proximal” peaks by their necessary identification using techniques such as ATAC-Seq. We have updated the text to reflect this distinction.
        • d. Given that both intergenic and gene body peaks appeared to comprise large fractions of our revised data, we also examined the relative enrichment of intergenic and gene body peaks with respect to time (after normalizing for the fraction of “unknown” peaks, as suggested by Reviewer #3). We observed that the proportion of peaks belonging to intergenic and promoter regions declined slightly as development progressed, while the proportion of gene body peaks increased (E, below). There appeared to be slightly more intergenic peaks than gene body peaks at all developmental time points, and the ratio of intergenic peaks to gene body peaks declined very slightly over time (F, below). These data indicate that intergenic and gene body peaks have different enrichment trajectories over time. As development progresses, gene body peaks are increasingly enriched, and may have a greater impact on gene regulation. We have added these additional observations to the text and to a new Supplementary Figure 2.3.
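      The logic behind the corrected classification in points a–c can be sketched as follows. This is a simplified illustration under stated assumptions, not our pipeline and not a reimplementation of bedtools: the function names, the dictionary layout for genes, and the example coordinates are hypothetical, all features are assumed to lie on one chromosome, and promoter and “unknown” categories are omitted. The key point is that overlap with a gene body is tested first, and distance to the nearest gene is only used for peaks that overlap no gene.

```python
DISTAL_CUTOFF_BP = 10_000  # the 10 kb "distal intergenic" cutoff discussed in point c


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def gap_to_gene(peak, gene_span):
    """Number of bases separating a peak from a gene it does not overlap."""
    p_start, p_end = peak
    g_start, g_end = gene_span
    return max(p_start - g_end, g_start - p_end)


def classify_peak(peak, genes, distal_cutoff=DISTAL_CUTOFF_BP):
    """Assign a peak to exonic, intronic, proximal or distal intergenic."""
    p_start, p_end = peak
    # Step 1: overlap test against gene bodies (the step that was missing
    # when only the nearest gene was looked up).
    for gene in genes:
        g_start, g_end = gene["span"]
        if overlaps(p_start, p_end, g_start, g_end):
            in_exon = any(overlaps(p_start, p_end, e_start, e_end)
                          for e_start, e_end in gene["exons"])
            return "exonic" if in_exon else "intronic"
    # Step 2: only non-overlapping peaks are intergenic; split by distance.
    nearest = min(gap_to_gene(peak, gene["span"]) for gene in genes)
    return "distal intergenic" if nearest > distal_cutoff else "proximal intergenic"


# Hypothetical gene model and peaks, for illustration only.
example_genes = [{"span": (1000, 5000), "exons": [(1000, 1500), (4000, 5000)]}]
example_peaks = [(1200, 1400), (2000, 2300), (6000, 6200), (50000, 50300)]
for example_peak in example_peaks:
    print(example_peak, classify_peak(example_peak, example_genes))
```

      A nearest-gene lookup alone would report a distance of zero for the first two example peaks without distinguishing them from intergenic peaks unless that zero is handled explicitly, which is the kind of misassignment described in point a.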

      We have also addressed the following textual and conceptual concerns with the manuscript:

      Reviewer #3, Comment #1 of 11

      I felt that the first paragraph of the introduction is not necessary.

      1. We believe the introductory paragraph helps frame the paper in the context of the broader scope of advances in technologies for emerging research organisms – currently, it has become straightforward to both generate a genome sequence and to identify and manipulate coding genes of interest across diverse taxa, but the identification of gene regulatory mechanisms remains more difficult. We have edited the introduction to better reflect this perspective and to link the first paragraph to the rest of the paper.

      Reviewer #2, Comment #1 of 3

      “In Introductory paragraph 2, sentence one, authors suggest that gene regulation plays more important role in evolutionary process than genes. Although a significant amount of research has been dedicated to gene regulation based evolution still this field is in nascent form. For example evidence of inheritance of the gene regulation pattern across generation is scarce and requires more evidence. I suggest authors to modulate the claim that still gene based evolution is the main paradigm instead otherwise.”

      Reviewer #3, Cross-comment #1 of 3

      Evolution via gene regulation vs. coding sequence: While (to my understanding) it is largely accepted in the field that changes to the CDS will often have more deleterious effects than changes to the expression of a gene, I agree that this could be elaborated on a bit.

      1. As requested by Reviewers #2 and #3, we have clarified the language surrounding the debate between gene functional and gene regulatory evolution to indicate that both mechanisms appear to be important for evolutionary processes, with the importance of the latter more recently revealed.

      Reviewer #3, Comment #2 of 11

      Use of Genrich: I presume this was run on both duplicates simultaneously? This is not clear from the methods section. It might have implications for downstream analyses (e.g., differential accessibility between time points) because running on both sequencing library replicates simultaneously leads to a single "replicate" of peaks per time point, while running it individually leads to two. However, I have never tested if this actually does make a difference. Maybe the authors have and can comment on this?

      1. In response to Reviewer #3’s inquiry about Genrich, we have added clarifying information to the Methods section: “Genrich analysis was run on both duplicate libraries simultaneously; Genrich performs peak calling on each replicate individually, and then merges the p-values of the replicates using Fisher’s method to generate a q-value, obviating the need to calculate an Irreproducible Discovery Rate (IDR).” We did not test running Genrich on individual libraries, opting for the more conservative approach of using the combined q-value as a filtering score for peak quality. For further information, the reviewer can see the relevant section of the Genrich GitHub repository: https://github.com/jsh58/Genrich#multiple-replicates
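      For readers unfamiliar with Fisher’s method, the combination step can be illustrated with a short, self-contained Python sketch. This is not Genrich’s code: the function name and example p-values are hypothetical, and only the merging of per-replicate p-values is shown, not the subsequent conversion to q-values. The statistic is -2 times the sum of the log p-values, compared against a chi-square distribution with 2k degrees of freedom for k replicates; for even degrees of freedom the tail probability has a closed form, which the sketch uses.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis (k = number of
    p-values). For even degrees of freedom the survival function is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_statistic = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        series += term
    return math.exp(-half_statistic) * series


# Hypothetical p-values for one genomic position in two replicates.
print(fisher_combine([0.05, 0.05]))
```

      With a single replicate the function returns the input p-value unchanged, which is a convenient sanity check on the formula.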

      Reviewer #3, Comment #4 of 11

      The section on the IDE2 models (the paragraph at the end of page 4/beginning of page 5) was unclear to me but appears sound. (The only instance where I didn't quite understand what the program actually does.) Maybe this can be explained a bit easier?

      1. As requested by Reviewer #3, we have attempted to explain the methods and logic of using ImpulseDE2 a bit more clearly:

      “To identify regions of dynamically accessible chromatin, we used the ImpulseDE2 (IDE2) pipeline (Fischer et al., 2018). IDE2 differs from other software for differential expression analysis in that it allows the investigation of trajectories of dynamic expression over large numbers of timepoints. It does so by modeling a gene expression trajectory as an “impulse” function that is the product of two sigmoid functions (Chechik and Koller, 2009; Yosef and Regev, 2011). This approach enables the modeling of a trajectory of gene expression in three parts: an initial value, a peak value, and a steady state value, thus summarizing an expression trajectory using a fixed number of parameters. With the ability to capture the differences between early, middle, and late expression values for each gene in a dataset, IDE2 also enables the detection of transient changes in gene expression or accessibility during a time course. Identifying differential expression over large numbers of timepoints is difficult for more categorical differential expression software such as edgeR and DESeq2, which generally use pairwise comparisons between timepoints to assess change over time (Love et al., 2014; Robinson et al., 2010).”
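      As a rough illustration of the “impulse” shape described above, the following Python sketch evaluates a product of two sigmoids in the parameterization commonly attributed to Chechik and Koller (2009). It is a sketch under assumptions rather than ImpulseDE2’s implementation: the parameter names (h0, h1, h2 for initial, peak, and steady-state values; t1, t2 for the two transition times; beta for the slope), the stage times, and the simple “max fit” rule of picking the sampled stage with the highest model value are all chosen here for illustration.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def impulse(t, h0, h1, h2, t1, t2, beta):
    """Impulse model: product of a rising and a falling sigmoid.

    Approaches h0 well before t1, h1 between t1 and t2, and h2 well after t2.
    """
    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t1))
    fall = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))
    return rise * fall / h1


def max_fit_stage(stage_times, params):
    """Index of the stage at which the fitted model takes its largest value."""
    values = [impulse(t, **params) for t in stage_times]
    return values.index(max(values))


# Hypothetical parameters and evenly spaced stage times, for illustration only.
example_params = {"h0": 1.0, "h1": 5.0, "h2": 2.0, "t1": 4.0, "t2": 10.0, "beta": 2.0}
example_times = [float(t) for t in range(15)]
print(max_fit_stage(example_times, example_params))
```

      Because the model is a continuous function of time, it can be evaluated at stages for which no measurement was supplied, which is why a max fit can still fall on S21 or S22 when those samples are left out of the fit.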

      Reviewer #2, Comment #2 of 3

      Authors have repeatedly used S21 and S22 throughout the manuscript to support their claims with clustering etc. May authors shed some light on the differences in replicates for these timepoints. Furthermore, I could not find Fig 3J, perhaps author would like to point out Fig 3H.

      Reviewer #3, Comment #5 of 11

      On page 7, Fig.3J needs changing to 3H. This figure should, in my opinion, also contain the absolute number of peaks for each time point to set the individual proportions into context.

      1. As requested by Reviewer #3, we have added bar charts representing the number of peaks found at each time point (Fig. 3H) and the number of peaks found in each cluster (Fig. 5C) to the peak type proportion plots. We have also fixed references to Fig. 3J to instead refer to Fig. 3H – we apologize for the confusion.

      Reviewer #3, Comment #6 of 11

      Last paragraph of the "Improving the Parhyale genome annotation" section: I think this needs to focus on those regions of the genome for which the location is known - after all, the "unknown" regions could all be "distal transgenic", which would significantly change the relative proportions.

      1. We have revised our analysis of this topic with our updated peak type proportions, as described in point 2d above under “substantive concerns”.

      Reviewer #3, Comment #7 of 11

      “On page 9, t-SNE is mentioned but doesn't seem to be cited.”

      1. As requested by Reviewer #3, we have added citations for the t-SNE method, as well as scikit-learn, the software we used for t-SNE visualization.

      Reviewer #3, Comment #8 of 11

      “The third paragraph on page 9 ("We evaluated the differences...") should mention the fact that clusters 1 and 2 are the only ones with significant proportions of exonic and intronic peaks. In the accompanying figure (5C), the total number of peaks would again be helpful.”

      1. After identifying the error in our peak category classification pipeline, this observation was no longer true. However, upon examining the new distributions by cluster, we observed that in Clusters 3–7, for which we observed GO enrichment for developmental processes, there appeared to be slightly higher enrichment of intronic regulatory elements than distal intergenic regulatory elements. These results resemble the observation from recent work showing that tissue-specific enhancers are enriched in intronic regions in various human cell types (e.g. Borsari et al. 2021, Genome Research). We have noted this new observation in the text.

      Reviewer #3, Comment #9 of 11

      In figure 5D, I can't quite make out at which stage the dip in the peak of Cluster 8 occurs. This is quite an unusual pattern of accessibility change, and I can't help but wonder if it has something to do with the quality of one of the libraries? Also, the fact that half of the peaks fall into unmapped regions of the genome is unusual, and I feel this deserves more discussion.

      1. In Figure 5D, Reviewer #3 asks about a dip in accessibility for Cluster 8 peaks. The dip in accessibility was actually observed for Cluster 9 peaks and is marked by the asterisk in that panel. We have updated the figure legend to clarify the significance of the asterisk and have referred readers to examine Supp. Fig. 5.1B, where the IDE2 model fits more clearly show a collective dip in accessibility for Cluster 9 peaks. Upon examining the size distribution of the clusters, we have also noticed that Cluster 8 is the smallest cluster. We have noted the small cluster size and high “unknown” peak enrichment for Cluster 8 in the text.

      Reviewer #3, Comment #10 of 11

      “On page 10, the abbreviation PFM appears, but it is only explained in the legend of Fig.4. This should appear in the text.”

      1. Reviewer #3 mentions that on page 10, we use the abbreviation for position frequency matrices (PFMs) without previous reference. We first introduce the abbreviation on page 8, but given the repeated use of “PFM” on page 10, we have added an additional explanation of the abbreviation on page 10, for ease of reading.

      Reviewer #3, Comment #11 of 11

      “The section on "Concordant and discordant expression and accessibility" is the one I disagree most with. The authors seem to suggest that a repressive cis-regulatory module should become less accessible when the gene is activated. However, they leave trans-acting factors completely out of their conceptualisation here. It is in general likely the availability of transcription factors that leads to repression, while the "silencer" can be well accessible in all cells. Moreover, it has become clear in recent years that CRMs are not just repressors or enhancers per se but can act as either depending on the availability of transcription factors. I think these facts could partially explain the weak correlation and should be discussed.”

      1. We appreciate the comments from Reviewer #3, which alerted us to the more recent literature around the bifunctional potential of regulatory elements. We have revised our claims to clarify that concordance and discordance analysis cannot be used to directly assign “enhancer” or “silencer” identity to given regulatory elements. Instead, we suggest that evaluating concordance and discordance can be useful for downstream users of our data, such as those aiming to build reporter constructs for a given gene of interest. To facilitate such tool development, we have built additional functions into a Jupyter notebook to enable the visualization of accessibility, gene expression, fold change of accessibility and gene expression, significance of fold change, and concordance/discordance assignment for arbitrary peak-gene pairs. An example of this visualization is shown on the following page. Panel A shows the region around the Engrailed-1 and Engrailed-2 loci in Parhyale (text labels within the plot region were added manually in Illustrator). Panel B shows visualization of the En1 promoter peak alongside En1 expression. Significant log fold changes (DESeq2 padj < 0.05) are marked by asterisks in the bar plots, and concordance/discordance assignment at each time point is indicated by the color of the comparison text (red = concordant, blue = discordant). Panels C and D show accessibility and expression visualization for a single peak (En1 peak5) compared to two nearby genes (En1 and En2). We hope to include sufficient documentation in our GitHub repository such that using these tools is accessible for most researchers, even with limited programming knowledge.
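
The concordance/discordance assignment described in this response can be illustrated with a minimal Python sketch. This is not the authors' notebook code: the function name `classify_pair`, its parameters, and the rule itself (a pair is called only when both fold changes are significant, concordant when they share a sign, discordant otherwise) are assumptions made for illustration. Only the significance threshold (DESeq2 padj < 0.05) comes from the text.

```python
from typing import Optional


def classify_pair(lfc_accessibility: float, padj_accessibility: float,
                  lfc_expression: float, padj_expression: float,
                  alpha: float = 0.05) -> Optional[str]:
    """Label one peak-gene pair for one time-point comparison.

    Hypothetical helper: the rule below is an assumed reading of
    "concordance" and "discordance", not the published implementation.
    """
    # Assumption: make no call unless both changes pass the padj threshold.
    if padj_accessibility >= alpha or padj_expression >= alpha:
        return None
    # Same direction of change -> concordant; opposite direction -> discordant.
    same_direction = (lfc_accessibility > 0) == (lfc_expression > 0)
    return "concordant" if same_direction else "discordant"


# Example: accessibility and expression both increase significantly.
label = classify_pair(1.2, 0.01, 0.8, 0.001)
```

Returning `None` for non-significant comparisons keeps "no call" distinct from either label, which matters when a peak is compared against several nearby genes.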

      Description of analyses that authors prefer not to carry out

      We were unable to easily visualize the distribution of regulatory elements across the whole genome as suggested by Reviewer #2. One challenge of working with the Parhyale genome is the lack of complete chromosomes. The genome is distributed across ~290,000 contigs of variable size. We were unable to find any software that could be easily and quickly set up to visualize our data, although we will provide in a Jupyter notebook the tools for local visualization of peak types that we developed.

    1. publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.

      That is the key issue

    1. A lot of us may have felt pressure at times to find our purpose — to find our one true cause, our personal mission, what we personally should be doing and where we fit in.

      I think everyone rushes to find out their purpose in life but I think it's fine not to know. You'll get there eventually in life. My purpose in life has always been to be a good person and I realized that purpose a long time ago while I was in a bad place in life. Your purpose doesn't have to be the same as anyone else's. It's simply yours and you choose what to make of it.

    1. The new lines you mention really are present in the text content of the element. HTML tags are not being replaced by new lines, they just get omitted entirely. If you look at the textContent property of the <p> element you selected in the browser console, you'll see the same new lines. Also, if you select the text and run window.getSelection().getRangeAt(0).toString() in the browser console, you'll see the same new lines. In summary, this is working as it is currently expected to. What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When copying to the clipboard, new lines in the source get replaced with spaces, and <br> tags get converted to new lines. Browser specifications distinguish the original text content of HTML "in the source" as returned by element.textContent from the text content "as rendered" returned by element.innerText. Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text. This behavior causes issues with line breaks as well. It might make sense for us to look at capturing the rendered text (as copied to the clipboard) rather than the source text in future. We'd need to be careful to handle all the places where this distinction comes up, and also make sure that all existing annotations anchor properly. We should also talk to other parties interested in the Web Annotations specifications to discuss how this impacts interoperability.
      What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When copying to the clipboard, new lines in the source get replaced with spaces, and <br> tags get converted to new lines. Browser specifications distinguish the original text content of HTML "in the source" as returned by element.textContent from the text content "as rendered" returned by element.innerText. Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text.
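
The difference between "source" and "rendered" text can be sketched in Python with the standard-library HTML parser. This is a rough approximation for illustration only, not how browsers or Hypothesis compute these values: the class `TextExtractor` and the function `extract` are hypothetical names, and the "rendered" rule here only collapses whitespace runs to a single space and turns `<br>` into a new line.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects two views of the text in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.source_parts = []    # roughly what element.textContent returns
        self.rendered_parts = []  # crude stand-in for element.innerText

    def handle_starttag(self, tag, attrs):
        # Tags contribute nothing to the source text; <br> renders as a new line.
        if tag == "br":
            self.rendered_parts.append("\n")

    def handle_data(self, data):
        # Source text keeps new lines exactly as written in the markup.
        self.source_parts.append(data)
        # Rendered text collapses whitespace runs (including new lines) to a space.
        self.rendered_parts.append(re.sub(r"\s+", " ", data))


def extract(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.source_parts), "".join(parser.rendered_parts)


source, rendered = extract("<p>first\nline<br>second line</p>")
# source keeps the "\n" from the markup and drops the <br> entirely;
# rendered swaps that "\n" for a space and turns the <br> into "\n".
```

A quote selected from the rendered view would therefore not match the source view character for character, which is the anchoring mismatch the reply describes.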
    1. "As We May Think" predicted (to some extent) many kinds of technology invented after its publication, including hypertext, personal computers, the Internet, the World Wide Web, speech recognition, and online encyclopedias such as Wikipedia:

      An advanced device for its time, it managed to predict in general terms how the web works today. Even so, that level of thinking has not yet been matched, since the Memex proposed a way of imitating complex neural processes of organization and association.

    1. Herald: Nay, ill it were to mar with sorrow's tale The day of blissful news. The gods demand Thanksgiving sundered from solicitude. If one as herald came with rueful face To say, "The curse has fallen, and the host Gone down to death; and one wide wound has reached The city's heart, and out of many homes Many are cast and consecrate to death, Beneath the double scourge, that Ares loves, The bloody pair, the fire and sword of doom"-- If such sore burden weighed upon my tongue, 'Twere fit to speak such words as gladden fiends. But--coming as he comes who bringeth news Of safe return from toil, and issues fair, To men rejoicing in a weal restored-- Dare I to dash good words with ill, and say How the gods' anger smote the Greeks in storm? For fire and sea, that erst held bitter feud, Now swore conspiracy and pledged their faith, Wasting the Argives worn with toil and war. Night and great horror of the rising wave Came o'er us, and the blasts that blow from Thrace Clashed ship with ship, and some with plunging prow Thro' scudding drifts of spray and raving storm Vanished, as strays by some ill shepherd driven. And when at length the sun rose bright, we saw Th' Aegaean sea-field flecked with flowers of death, Corpses of Grecian men and shattered hulls. For us indeed, some god, as well I deem, No human power, laid hand upon our helm, Snatched us or prayed us from the powers of air, And brought our bark thro' all, unharmed in hull: And saving Fortune sat and steered us fair, So that no surge should gulf us deep in brine, Nor grind our keel upon a rocky shore. So 'scaped we death that lurks beneath the sea, But, under day's white light, mistrustful all Of fortune's smile, we sat and brooded deep, Shepherds forlorn of thoughts that wandered wild, O'er this new woe; for smitten was our host, And lost as ashes scattered from the pyre. Of whom if any draw his life-breath yet, Be well assured, he deems of us as dead, As we of him no other fate forebode. 
But heaven save all! If Menelaus live, He will not tarry, but will surely come: Therefore if anywhere the high sun's ray Descries him upon earth, preserved by Zeus, Who wills not yet to wipe his race away, Hope still there is that homeward he may wend. Enough--thou hast the truth unto the end.

      Herald: Menelaus has disappeared; don't make me taint good news with bad.

      There was a storm and boats crashed, but we were spared. They may be alive, but they will think we are dead, just as we think they are dead.

      Wait for Menelaus's return, because Zeus favors him.

    2. Think you--this very morn--the Greeks in Troy, And loud therein the voice of utter wail! Within one cup pour vinegar and oil, And look! unblent, unreconciled, they war. So in the twofold issue of the strife Mingle the victor's shout, the captives' moan. For all the conquered whom the sword has spared Cling weeping--some unto a brother slain, Some childlike to a nursing father's form, And wail the loved and lost, the while their neck Bows down already 'neath the captive's chain. And lo! the victors, now the fight is done, Goaded by restless hunger, far and wide Range all disordered thro' the town, to snatch Such victual and such rest as chance may give Within the captive halls that once were Troy-- Joyful to rid them of the frost and dew, Wherein they couched upon the plain of old-- Joyful to sleep the gracious night all through, Unsummoned of the watching sentinel. Yet let them reverence well the city's gods, The lords of Troy, tho' fallen, and her shrines; So shall the spoilers not in turn be spoiled. Yea, let no craving for forbidden gain Bid conquerors yield before the darts of greed. For we need yet, before the race be won, Homewards, unharmed, to round the course once more. For should the host wax wanton ere it come, Then, tho' the sudden blow of fate be spared, Yet in the sight of gods shall rise once more The great wrong of the slain, to claim revenge. Now, hearing from this woman's mouth of mine, The tale and eke its warning, pray with me, "Luck sway the scale, with no uncertain poise. For my fair hopes are changed to fairer joys."

      We won. Troy's triumphant and subdued are like oil and water: the triumphant revel in it, the subdued weep and toil.

      If we don't desecrate Troy's shrines we'll be fine, but if our people do, it'll be bad.

      we all have cause to celebrate

    1. Tucker Carlson: Biden Giving WHO Power to 'Deploy Proactive Countermeasures Against Misinformation and Social Media Attacks' By Craig Bannister | May 20, 2022 | 10:39am EDT Pres. Biden has found a new way to censor free speech – by giving the World Health Organization (WHO) control of Americans’ speech – Fox News Host Tucker Carlson warned on Thursday. After dissolving his “Disinformation Governance Board,” due to public outcry, Biden is preparing to sign WHO’s new World Pandemic Treaty, giving a global operational control and power – through ‘proactive countermeasures’ - to combat what it deems “disinformation,” Carlson explained, citing a WHO working group's draft text: “So, what would this ‘operational control’ mean? “Let’s be specific. Right off the bat, the treaty demands ‘National and global coordinated actions to address the misinformation, disinformation, and stigmatization that undermines public health.’ “Oh! Here we go! Right to censorship: ‘People are criticizing us, and for public health reasons, that can't be allowed. If you criticize us, people will die.’ “So, you saw yesterday that the Biden administration, in the face of universal laughter and derision, had to fire the head of its new Ministry of Truth - but they found another way to do it: ‘W.H.O. Secretariat to build capacity to deploy proactive countermeasures against misinformation and social media attacks.’” “So, they are going to get to censor anybody who doesn't agree with what they do, as they control the intimate details of your life,” Carlson explained: “And they will control those details. 
Under this treaty, the World Health Organization will get to establish vaccine passports and regulate travel. World Health Organization will ‘Develop standards for producing a digital version of the international certificate of vaccination and prophylactics.’ “Okay. “So you may think, ‘Well, it is just about COVID and I went along with mandatory vaccines and vaccine passports at the time, how bad could it be?’ [Laughs] First of all, if you went along with that, you should be repenting right about now. But, it is not just about COVID because the W.H.O. will be in charge of ‘The digitalization of all health forms.’ The World Health Organization will also ‘Share real-time information about travel measures.’ “So you are going to find out exactly when you are allowed to get on a bus or train or airplane, or how about your bicycle, will they regulate that too? Maybe. Now the World Health Organization has sought this authority for years. Of course. Who doesn't want more power?” Carlson then played a foreboding comment by W.H.O. Director-General Tedros Adhanom Ghebreyesus. “Here’s Tedros back in April of 2020: “People in countries with stay-at-home orders are understandably frustrated with being confined to their homes for weeks on end. But the world will not and cannot go back to the way things were. There must be a new normal. 
A world that is healthier, safer, and better prepared.” Americans should question relinquishing control over their lives to an unelected person and global authority they had no say in choosing, Carlson said: “Okay, so there’s a guy with a long and documented history of subverting public health, who is clearly a liar, who is acting as an agent for the Chinese government, and you have to ask yourself, ‘Did I vote for that guy? Is he one of my elected representatives in this democracy? How did he get power over where I can travel and when?’ “Good question.”

      Summary of Tucker's televised evening talk show.

    1. Author Response

      Reviewer #3 (Public Review):

      The import of soluble precursor proteins into the mitochondrial matrix is a complex process that involves two membranes, multiple protein interactions with the translocating substrate, and distinct forms of energetic input. The traditional approaches for in vitro measurement of protein translocation across membranes typically involve radiography or immunodetection-based assays. These end-point approaches, however, often lack optimal resolution to analyze the sequential processes of protein transport. Therefore, the development of techniques to dissect the kinetic steps of this process will be of great interest to the field of protein trafficking.

      This study by Ford et al. employs a novel bioluminescence-based technique to analyze the import of presequence-containing precursors (PCPs) into the mitochondrial matrix in real time. As a follow-up study to previous work from the Collinson group (Pereira et al. 2019), this approach makes use of the split NanoLuc luciferase enzyme strategy, whereby mitochondria are isolated from yeast expressing matrix localized 'LgBiT' (encoded by the mt-S11 gene) and used for import experiments with purified PCPs containing 'SmBiT' (the 11-residue pep86 sequence). The light intensity that results from the high-affinity interaction of pep86 with mt-S11 is convincingly shown in this study to be a reliable reporter of protein import into the matrix space. Therefore, from a technical stance, this appears to be a very promising approach for making high-resolution measurements of the different kinetic steps of protein translocation.

      The authors leverage this technology to seek insights into several features of mitochondrial protein import, with some observations challenging key longstanding paradigms in the field. Using series of PCP constructs differing in length and placement of the pep86 peptide, the authors perform luminescence-based import tests with varying protein concentration, energetic input, and presequence charge distribution. Fits to the time course data suggest two main kinetic steps that govern matrix-directed import: transit of the PCP across the TOM complex into the IMS and association of the PCP with the TIM23 motor complex. The results support some very interesting insights into TIM23-mediated protein import, including: that precursor accumulation is strongly dependent on length; that the kinetically limiting step of IM transport is engagement with the TIM23 complex, not transmembrane transport itself; and that presequence charge distribution differently affects import rate and matrix accumulation. The results of this study appear repeatable among samples and the mathematical fits to time courses are well explained. However, there remain some questions about the nature of the experimental approach and the interpretation of the kinetics data in terms of the underlying biological processes. These questions are as follows:

      Major points

      Overall system characterization and mathematical analysis

      1) The Western-based characterization of the amount of matrix-localized 11S (shown in Figure 1 - figure supplement 1) shows that the concentration of 11S varies significantly (> twofold concentration difference, quantified as a ratio to Tom40) among yeast/mitochondria preps. Is there a particular reason for this large variability? Perhaps more significantly, the import efficiency (judged by luminescence amplitude) shows high batch variability as well (> twofold efficiency difference). While this series of experiments makes the case that the luminescence readout of import is not limited by matrix-localized 11S, it does raise a potential concern of batch-to-batch variation in import competence. Could this have any implications for the reproducibility of results by this assay, particularly regarding the kinetic parameters reported?

      It is very difficult to know what causes this variability as it can be seen even between triplicate preparations carried out on the same day. It could be due to slight differences in the flasks used to grow cells (such as the size of the baffles). However, we have determined that the variability in 11S concentration does not correlate with import competence (Figure 1 – figure supplement 1C), and that the kinetics of import are not affected (Figure 1 – figure supplement 2C).

      2) My understanding from the Pereira 2019 JMB paper is that the yeast expressing the matrix-targeted 11S were engineered so that the 11S construct contained a 35 residue presequence from ATP1. In Figure 1 - figure supplement 1, panel A, it looks like the mitochondria-derived 11S constructs are significantly larger than the purified 11S constructs used to calibrate concentration. If the added residues on the mitochondrial 11S constitute a presequence, then they should be cleaved upon import to yield the mature sized protein. Why are the mitochondrial 11S constructs so much larger than the purified ones? Explicit labeling of MW markers would be useful here.

      We noted that it seemed likely that the presequence was not getting cleaved off. There may also be an SDS-PAGE mobility issue for 11S (common for beta-barrels), such that the purified version has a different mobility to the matrix-localised version. Therefore, the possibility remains that the MTS is cleaved off, but the mature product migrates anomalously on gels. For this reason we carried out experiments to show that 11S is matrix localised, which turned out to be the case (Figure 1 – figure supplement 1D). So irrespective of non-MTS cleavage, or unexpected gel mobility of correctly processed 11S, the reporter is where it should be – in the matrix. These points are elaborated in the text.

      Labels have been added to molecular weight markers, as requested.

      3) From Figure 1D, given that the amplitude linearly increases with added Acp1pep86 up to ~45 nM, this suggests that matrix-localized 11S is in stoichiometric excess of imported peptide within this range of added substrate. Given a matrix [11S] of 2.8 uM, a stoichiometrically equivalent amount of Acp1-pep86 would be equivalent to an import of <0.5% of added substrate, and it is suggested that import efficiency is actually much lower than that. How can this very low import efficiency be explained?

      Import is single turnover under our assay conditions and is therefore limited by the number of import sites rather than matrix [11S]. Under standard conditions, we intentionally add substrate in vast excess and only anticipate that a very small proportion will be imported.

      4) Apropos of point #3 above: Given the low efficiency of import observed for the purified PCP substrates in this study, one wonders if this is due to the formation of off-pathway (translocation incompetent) precursors established during the import reaction, before substrates have a chance to engage OM receptors (e.g., due to aggregation, etc.) In this case, the interpretation of single-turnover conditions may instead be caused by a vast majority of PCP losing translocation competence, rather than the requirement for energetic resetting that is suggested. Might this be a possibility?

      We anticipate that some PCP will aggregate and add substrate in excess to allow for that. Our interpretation of the reaction as single turnover was drawn from a comparison of PCP-pep86-DHFR import amplitude in the presence versus absence of MTX, rather than amplitudes from absolute amounts of PCP. We cannot think of a reason why MTX would affect protein solubility.

      5) Import time courses in many cases show a progressive drop in luminescence at later time points after a maximum value has been reached. This reduction in signal cannot be accounted for by the two rate constants in the equation used in the two-step kinetic model. How were such luminescence deviations accounted for when fitting data to obtain these kinetics parameters? What might be the reason for this downward drift in signal once maximum amplitude has been reached?

      We almost always see this gradual drop in luminescence in both the mitochondrial and bacterial systems. The data points acquired after the maximum amplitude are excluded from the fitting. The assay is based on an enzymatic reaction and we think that the downward drift is due to a combination of substrate depletion and accumulation of reaction products.
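
The exclusion of late data points can be sketched in Python. The exact equation the authors fit is not given in this response, so the model below is an assumption: two consecutive irreversible first-order steps with rate constants `k1` and `k2`, for which the signal rises to a plateau and cannot fall. The names `two_step_signal` and `truncate_at_peak` are hypothetical.

```python
import math


def two_step_signal(t, amplitude, k1, k2):
    """Product formed after two sequential first-order steps (assumed model)."""
    if math.isclose(k1, k2):
        # Limiting form when both rate constants are equal.
        return amplitude * (1.0 - (1.0 + k1 * t) * math.exp(-k1 * t))
    decay = (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)
    return amplitude * (1.0 - decay)


def truncate_at_peak(times, signal):
    """Keep points up to and including the maximum signal, dropping later drift."""
    peak = max(range(len(signal)), key=signal.__getitem__)
    return times[:peak + 1], signal[:peak + 1]


# Example trace whose last two readings drift downward after the maximum.
times = [0, 1, 2, 3, 4]
signal = [0.0, 5.0, 9.0, 8.0, 7.0]
fit_times, fit_signal = truncate_at_peak(times, signal)
```

Because this assumed model is monotonic, any points left in after the maximum would pull the fitted amplitude and rate constants toward values the model cannot reproduce, which is one reason to truncate before fitting.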

      Import kinetics: dependence on total protein size

      6) In Figure 3 - figure supplement 1, some of the kinetic parameters from the PCP concentration-dependent responses are quite noisy. For instance, responses for the shortest constructs (L and DL) show a lot of variability in the k1 and k2 parameters. Is this (partly) due to difficulty in resolving these two parameters during the nonlinear least-squares fitting protocol for these particular constructs?

      It is difficult to resolve k1 and k2 perfectly, so the numbers are only estimates.

      7) The data in Figure 3, panels E and F (derived from Figure 3 - figure supplement 1) in some cases show non-linear dependence of kinetic parameters on the 'N to pep86 distance' for the length (panel E) and position (panel F) variants. For instance, from the length series, the k1 mean goes from 132 to 385 to 237 nM for the DL, DDL, and DDDL constructs, respectively. The variances suggest that these differences are real. Is there a reason that kinetic parameters would have such non-monotonic dependence on length?

      We don’t know the reason for this variance, but it could be investigated in future studies.

      Import kinetics: dependence on energetic input

      8) The data of Figure 4A show the results of partial dissipation of the membrane potential by 10 nM valinomycin. Most studies designed to cause a gradual dissipation of membrane potential do so by protonophore (e.g., CCCP) titration. Given that matrix-directed import is completely blocked by low micromolar amounts of this potent ionophore, it would be useful to have an independent readout (e.g., TMRM measurements) of the residual membrane potential that exists upon treatment with the lower concentrations of valinomycin used here.

      We have now included data that shows the partial effect of 10 nM valinomycin on membrane potential (TMRM measurements) and protein import (Figure 4 – figure supplement 1A-B).

      9) The step associated with k1, designated as transport across the TOM complex, is suggested to go to completion before starting the step associated with k2, engagement of the TIM23 complex. The k1 step shows a strong dependence on membrane potential (Fig. 4A, middle), particularly for the length series. Why would this be, given that no part of translocation across the OM should be associated with a valinomycin-sensitive electric potential?

      This effect is relatively small and mainly affects shorter PCPs. Our interpretation is that passage of the PCP through TOM is reversible, and committing PCP to import across the IMM (which requires ∆ψ) prevents this reversibility. However, it is also possible that transport through TOM and TIM23 are partially coupled. Both these possibilities are discussed in the discussion.

      Working model

      10) One of the most surprising outcomes of this study is that passive transport of substrates across the TOM complex and energy-coupled transport via the TIM23 complex are kinetically separable and independent events. As the authors note in the Discussion, the current paradigm of the field is that matrix-targeted substrates concurrently traverse the OM and IM via the TOM-TIM23 supercomplex, and this model is supported by quite a bit of experimental evidence. Even in this study, the fact that the PCP-pep86-DHFR construct exposes the pep86 sequence to the matrix in the presence of MTX (Figure 2) is evidence of a two membrane-spanning intermediate. Key mechanistic questions arise regarding the model proposed in this study. For example, if PCPs traverse the TOM complex as a stand-alone step, what is the driving force (e.g., a simple pathway of protein interactions with increasing affinity)? And would soluble, matrix-directed substrates be expected to accumulate in the very restricted space of the IMS? If so, how would TIM23-directed membrane proteins keep from aggregating in the aqueous IMS? These questions would be worth addressing in the discussion of the model.

      We have included a discussion of the experimental evidence for TOM-TIM23 supercomplexes. The acid chain hypothesis has been proposed as the driving force for PCP transport through TOM ‒ an interaction between positive charges of the presequence and negatively charged residues within the TOM40 channel. Proteins that are targeted to the IMS are imported through TOM without the participation of TIM23 and we think that matrix-targeted proteins can do the same. This could explain why TOM is in excess over TIM23. We also think that some matrix-targeted PCPs can accumulate in the IMS, although this may not be true of membrane proteins.

      Import kinetics: dependence on MTS charge distribution

      11) The fact that import rates are increased with a more electropositive presequence makes sense in terms of the electrophoretic pull exerted on the PCP (matrix, negative). However, the greater accumulation of precursors containing more electronegative presequences remains puzzling. In the manuscript, this is explained based on the concept that accumulation of positive charges will cause partial collapse of the membrane potential. However, I am still uncertain about this explanation for a few reasons. First, for each PCP, the presequence will constitute just a small fraction of the total length of the precursor, and therefore contribute a small fraction of the total charge density of imported protein. Would such a small change in total PCP charge be expected to have the dramatic effect observed among samples?

      The majority of the total PCP charge is from the mature region. Whilst the positive charges in the presequence undoubtedly deplete ∆ψ, the differences in the extent of ∆ψ depletion that we see between PCPs that vary in charge are due to the difference in charge of the mature regions (as their presequences are identical).

      Second, given the small amount of protein imported under these conditions, would the total charge of imported PCPs be expected to affect transmembrane ion distribution so significantly? For instance, as I recall, it takes up to micromolar amounts of mitochondria-targeted lipophilic cations (e.g., TPP+) to cause a major change in the TMRM-detected membrane potential.

      The effect was indeed unexpected. Despite the seemingly small number of PCPs that are imported, the total number of charged residues will be much greater.

      Finally, I would expect isolated mitochondria to be capable of respiratory control. It is well known, for example, that isolated mitochondria can respond to temporary draw-down of the membrane potential (e.g., by ADP/Pi addition) by going into state 3 respiration and restoring membrane gradients. Why would that not be the case here (Figure 5D)?

      The isolated mitochondria that we used for the import assays demonstrate increased O2 consumption in response to ADP addition, as expected (Figure 5 – figure supplement 1A-B). In addition to this new figure, we have now included TMRM data (Figure 6 – figure supplement 2B) that shows a depletion of ∆ψ in response to ADP addition, that is temporary and dependent on the amount of ADP added. We are therefore confident that our isolated mitochondria are capable of respiratory control as expected. We think that the lack of restoration of ∆ψ, following import-induced dissipation, is a consequence of the import process in vitro. Perhaps the import process compromises the channel resulting in concomitant ion/ charge dissipation during the active process. Moreover, this is likely to be exacerbated in vitro upon acute exposure to PCP, causing a sudden saturation of the import sites – thereby compromising the ∆ψ and the mitochondria’s ability to rapidly recover (this possibility has been noted in the MS).

      General

      12) Although the spectral approach in this study is developed as an alternative to the more traditional import assays, it would be useful to have some control import tests (done with Westerns or autoradiography) as complements to the luminescence-based imports. For example, control tests to accompany Figure 1 that show import efficiency or tests that accompany Figure 3 to show import of the different length and position series constructs. Perhaps this could be done with immunodetection of Acp1 or the pep86 epitope, showing protease-protected, processed import substrates that appear in a membrane potential/ATP-dependent manner. Even if the results from the more traditional techniques ran contrary to the results using the NanoLuc system, this would still allow the authors to compare which effects are consistent and which are dissimilar between different approaches.

      We have now included a Western blot import assay for the PCP-pep86-DHFR substrate and show that import is ∆ψ-dependent (Figure 2 ‒ figure supplement 1).

      13) The authors might also consider conducting imports with mitoplasts as a way to test the kinetic model that includes the TIM23-mediated step alone.

      We conducted import assays with mitoplasts and have now included this as a main Figure 5.

      14) It is difficult to follow the logic in the Discussion regarding the number of TIM23 sites limiting the number of 11S imported into mitochondria in live cells (page 15, lines 23-27). Are the authors suggesting that in vivo, one TIM23 complex serves to transport a single protein? This needs to be clarified.

      This has been removed, and this section of the discussion has been clarified.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper is very well written, the question is interesting, and the analyses are innovative. However, I do have concerns about the overall approach. My main concern is about looking at asymmetries in the low dimensional representation of connectivity. A secondary concern has to do with looking at the parcellated connectome. I explain these concerns in succession below.

      We thank the Reviewer for the appreciation of our work and the insightful comments, which we have addressed below. The page numbers correspond to the clean version of the manuscript.

      The first concern is to me quite a fundamental issue: looking at connectivity in a low dimensional space, that of the laplacian eigenvectors. There are two issues with this. The first one, which is less important than the second, is that the authors have a reference embedding to which they align other embeddings using a procrustes method with no scaling. While the 3D embedding is still optimally representing the connectivity (because distances don't change under rotations), we can no longer look at one axis at a time, which is what the authors do when they look at G1. In this case, G1 is representative of the connectivity of the reference matrix (LL), but not the others.

      But even if the authors only projected their matrices onto a single G1 dimension with no procrustes (and only sign flipping if necessary), there is still a major issue. One implicit assumption of this whole approach is that if there is a change in connectivity somewhere in the original matrix, the same "nodes" of the matrix will change in the embedding. This is not the case. Any change in the original matrix, even if it is a single edge, will affect the positions of all the nodes in the embedding. That is because the embedding optimises a global loss function, not a local one.

      To make this point clear, consider the following toy example. Say we have 4 brain regions A,B,C,D. Let us say that we have the following connectivity:

      In the Left Hemisphere: A-B-C-D

      In the Right Hemisphere: A-B=C-D

      So the connection between B and C is twice as strong in the right hemi, and everything else remains the same.

      The low dimensional embedding of both will look like this:

      Left: ... A ... B ....... C ... D ...

      Right: A ... ... ... B ... C ... ... ... D

      Note how B,C are closer to each other in the RIGHT, but also that A,D have moved away from each other because the eigenvector has to have norm 1.

      So if we were to calculate an asymmetry index, we would say that:

      A is higher on the LEFT

      B is higher on the RIGHT

      C is higher on the LEFT

      D is higher on the RIGHT

      So we have found asymmetry in all of our regions. But in fact the only thing that has changed is the connection between B and C.

      This illustrates the danger of using a global optimisation procedure (like low-dim embedding) to analyse and interpret local changes. One has to be very careful.
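
      The toy example above can be checked numerically. The sketch below is an illustration only: it assumes a one-dimensional spectral embedding given by the second eigenvector of the unnormalized graph Laplacian (a simple stand-in for the embedding used in the manuscript), and the function name `fiedler_embedding` and the edge weights 1 and 2 are hypothetical choices that mirror A-B-C-D and A-B=C-D.

```python
import numpy as np

def fiedler_embedding(weights):
    """1-D spectral embedding of a chain graph.

    `weights` holds the edge weights between consecutive nodes
    (A-B, B-C, C-D for the four-node toy example). Returns the
    unit-norm eigenvector of the unnormalized graph Laplacian with
    the second-smallest eigenvalue, sign-fixed so node A is positive.
    """
    n = len(weights) + 1
    adjacency = np.zeros((n, n))
    for i, w in enumerate(weights):
        adjacency[i, i + 1] = adjacency[i + 1, i] = w
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    v = vecs[:, 1]
    return v if v[0] > 0 else -v

left = fiedler_embedding([1.0, 1.0, 1.0])   # A-B-C-D
right = fiedler_embedding([1.0, 2.0, 1.0])  # A-B=C-D (only the B-C edge differs)

for name, l, r in zip("ABCD", left, right):
    print(f"{name}: left={l:+.3f} right={r:+.3f} left-right={l - r:+.3f}")
```

      Under these assumptions, all four coordinates differ between the two embeddings even though a single edge was changed: B and C move closer together while A and D move apart, because the eigenvector is constrained to unit norm. The sign of each left-right difference depends on the arbitrary sign convention of the eigenvector.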

      We thank the Reviewer for the detailed description of the first concern. We agree that low-dimensional embeddings describe global embedding of local features, rather than local phenomena. Moreover, we indeed assume that the connectivity embedding of a given node gives us information about its position along ‘gradients’ relative to other nodes and their respective embedding. Thus, indeed, when a single node (node X) has a different connectivity profile in the right hemisphere relative to the left, this will also have some impact on the embeddings of all nodes showing a relevant (i.e., top 10%) connection to node X.

      To evaluate whether asymmetry could also be observed in average connectivity within functional networks, we took an alternative approach to measuring asymmetry: we computed the average connectivity within each functional network and then compared this within-network connectivity between left and right. We have now added this analysis to the robustness analysis section of our Results. In short, we observed that transmodal networks (DMN, FPN, and language network) showed higher connectivity in the left hemisphere, whereas other networks showed higher connectivity in the right hemisphere. This indicates that observations made with respect to asymmetry of functional gradients are similar to those observed for within-network functional asymmetry between the left and right hemispheres. We have now detailed the outcome of this analysis in our Results section and Supplementary Materials.

      Results, p.14.: “As low-dimensional embedding is a global approach to summarize functional connectivity we reiterated our analysis by evaluating asymmetry of within network functional connectivity in the current sample. Observations made with respect to asymmetry of functional gradients are similar to those observed for within-network functional asymmetry between the left and right hemispheres.”

      “To further explore functional connectivity asymmetry between left and right hemispheres, we calculated the LL within network FC and RR within network FC (Figure 2-figure supplement 5). It showed that connections in the left hemisphere and right hemisphere were relatively equal in the global scale. However, for the local differences, networks showed significant subtle leftward or rightward asymmetry (vis1: t = -5.203, P < 0.001; vis2: t = -22.593, P < 0.001; SMN: t = -8.262, P < 0.001; CON: t = -32.715, P < 0.001; DAN: t = -11.272, P < 0.001; Lan.: t = 33.827, P < 0.001; FPN: t = 24.439, P < 0.001; Aud.: t = 0.191, P = 0.849; DMN: t = 11.303, P < 0.001; PMN: t = -35.719, P < 0.001; VMN: t = -11.056, P < 0.001; OAN: t = 0.311, P = 0.756).”
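
      The within-network comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names `within_network_fc` and `paired_t`, the left-minus-right sign convention, and the toy inputs are all assumptions.

```python
import numpy as np

def within_network_fc(fc, labels):
    """Mean off-diagonal FC among parcels that share a network label.

    `fc` is a square parcels-by-parcels matrix for one hemisphere (LL or RR);
    `labels` assigns each parcel to a network. Networks with a single parcel
    have no off-diagonal entries and would yield NaN.
    """
    labels = np.asarray(labels)
    means = {}
    for net in np.unique(labels):
        idx = np.flatnonzero(labels == net)
        block = fc[np.ix_(idx, idx)]
        off_diag = block[~np.eye(len(idx), dtype=bool)]
        means[str(net)] = float(off_diag.mean())
    return means

def paired_t(left, right):
    """Paired t statistic across subjects; positive values mean left > right."""
    d = np.asarray(left, dtype=float) - np.asarray(right, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))

# Hypothetical 4-parcel hemisphere with two networks of two parcels each.
toy_fc = np.array([[1.0, 0.5, 0.1, 0.1],
                   [0.5, 1.0, 0.1, 0.1],
                   [0.1, 0.1, 1.0, 0.2],
                   [0.1, 0.1, 0.2, 1.0]])
toy_means = within_network_fc(toy_fc, ["Lan", "Lan", "Vis", "Vis"])
toy_t = paired_t([1.0, 2.0, 4.0], [0.0, 0.0, 1.0])
```

      In practice `within_network_fc` would be applied to each subject's LL and RR matrices, and `paired_t` to the resulting per-subject values for each network. With a left-minus-right convention, positive t values would correspond to leftward asymmetry.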

      Regardless, we have further highlighted that such a global interpretation of the asymmetry of areas is still meaningful, given that a node is always placed in a global context. We have now further explained in the Introduction (p. 3) that our metrics give insight into the local embedding of global phenomena.

      Introduction, p. 3: “These low-dimensional gradient embeddings describe global embedding of local features, rather than local phenomena. Thus, interpretation for asymmetry of areas is under a global context.”

      My second concern is about interpreting the brain asymmetry as differences in connectivity, as opposed to differences in other things like regional size. The authors use a parcellated approach, where presumably the parcels are left-right symmetric. If one area is actually larger in one hemisphere than in the other, this will manifest itself in the connectivity values. To mitigate this, it may be necessary to align the two hemispheres to each other (maybe using spherical registration) using connectivity prior to applying the parcellation.

      Thanks for this nice idea. We have now computed the differences of the mean rsfMRI connectome along the first gradient at the vertex level using 100 random subjects, as we have the data mapped to a symmetric template (fs_LR_32k), indicating that each vertex has a symmetric counterpart in the right hemisphere. Our results show left-right asymmetry as language/default mode-visual-frontoparietal vertices, which is consistent with the main results of the parcel-based approach. We have also added this response to the Supplementary materials.

      Though the overall findings are consistent, spherical registration may introduce issues of its own. Full anatomical spatial symmetry may not provide functional comparability at the vertex level between the left and right hemispheres. For example, during language tasks in the current sample, the activated frontal region in the left hemisphere is larger than the activated contralateral region in the right hemisphere. In the current study, we aimed to evaluate asymmetry between functionally and structurally homologous regions, as described by the Glasser atlas. For the resting-state fMRI data, we used the region-wise symmetric multimodal parcellation (Glasser et al., 2016), which provides functionally corresponding contralateral regions in both hemispheres. A previous study (Williams et al., 2021) investigated structural and functional asymmetry in newborn infants. They used spherical registration (making fs_LR symmetric) for structural asymmetry but not for functional asymmetry. As spherical registration may hide functional information, we think it may be more suitable for structural studies.

      To address the concern regarding the alignment of hemispheres, we used joint alignment for LL and RR and compared the results with those of the Procrustes alignment technique (Pearson r = 0.930, P_spin < 0.001). The figure below shows asymmetry along the principal gradient (upper: joint alignment; lower: Procrustes alignment), indicating convergence between the two approaches. We have reported this information in the Supplementary Materials.

      Lastly, we do agree that parcel size might be an important issue influencing the asymmetry pattern. To test for such an effect, we correlated the rank of parcel-size asymmetry, (left - right)/(left + right), with the rank of the asymmetry index. This yielded only a small, non-significant correlation along G1 (Spearman r_intra = 0.130, P_spin = 0.105; Spearman r_inter = 0.130, P_spin = 0.084). Of note, there is a systematic difference in parcel size as a function of the sensory-association hierarchy, indicating that the link between parcel size and asymmetry may differ between sensory and associative regions.
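
      The parcel-size check described above amounts to a rank correlation between two per-parcel vectors. The sketch below shows the arithmetic under stated assumptions: `normalized_asymmetry` and `spearman_r` are hypothetical helper names, the input values are made up, ties are assumed absent, and the spin-permutation test behind P_spin is not implemented here.

```python
import numpy as np

def normalized_asymmetry(left, right):
    """(left - right) / (left + right), computed element-wise per parcel."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return (left - right) / (left + right)

def spearman_r(x, y):
    """Spearman rank correlation; assumes there are no tied values."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical parcel sizes (e.g., vertex counts) and gradient asymmetry values.
size_left = [120.0, 80.0, 200.0, 150.0, 95.0]
size_right = [100.0, 90.0, 180.0, 170.0, 60.0]
gradient_asymmetry = [0.02, -0.01, 0.05, 0.03, -0.04]

size_asymmetry = normalized_asymmetry(size_left, size_right)
r = spearman_r(size_asymmetry, gradient_asymmetry)
```

      A value of `r` near zero would suggest that parcel-size differences do not drive the asymmetry pattern. Assessing significance on the cortical surface would additionally require a spatially constrained null model such as the spin test.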

      Reviewer #2 (Public Review):

      Using recently-developed functional gradient techniques, this study explored human brain hemispheric asymmetry. The functional gradient is a hot technique in recent years and has been applied to study brain asymmetries in two papers of 2021. Compared to previous studies, the current study further evaluated the degree of genetic control (heritability) and evolutionary conservation for such gradient asymmetries by using human twin data and monkey's fMRI data. These investigations are of value and do provide interesting data. However, it suffers from a lack of specific hypotheses/questions/motivations underlying all kinds of analyses, and the rich observational or correlational results seem not to offer significant improvement of theoretical understanding about brain asymmetries or functional gradient. In addition, given the limited number of twins in HCP project (for a heritability estimation), the limited number of monkeys (20 monkeys), and the relatively poor quality of monkeys' resting functional MRI data, the results and conclusion should be taken cautiously. Below are major concerns and suggestions.

      We thank the Reviewer for the evaluation of our work and the helpful suggestions.

      The gradient from resting-state functional connectome has been frequently used but mainly at the group level. The current study essentially applied the gradient comparison (i.e., gradient score) at the individual level. Biological interpretation for individual gradient score at the parcel level as well as its comparability between individuals and between hemispheres should be resolved. This is the fundamental rationale underlying the whole analyses.

      We thank the Reviewer for this remark, and are happy to provide further rationale for using and comparing individual gradient scores to evaluate individual variation in asymmetry and associated heritability. Though gradients from resting-state functional connectivity have frequently been used at the group level, various studies have also examined individual differences. Examples include using linear mixed models to compare gradient scores between left and right across subjects (Liang et al., 2021), applying individual gradient scores to compare patients and controls (Dong et al., 2020, 2021; Hong et al., 2019; Park et al., 2021), and linking individual hippocampal gradients to memory recollection (Przeździk et al., 2019). Together, these studies show individual variation in local gradients, indicating changes in node centrality and hubness (Hong et al., 2019) and in connectivity profile distance (Y. Wang et al., 2021). Of note, low-dimensional embeddings describe the global embedding of local features, rather than local phenomena. Thus, asymmetry of areas is interpreted in a global context. Biologically, individual gradients reflect the degree to which segregation and integration of the system relate to patterns of ongoing neural activity (Mckeown et al., 2020), and they indicate that individuals have different functional boundaries between anatomical regions. Individual neurons, in turn, are embedded within these global-local boundaries through a cortical wiring space consisting of intricate long- and short-range white matter fibers (Paquola et al., 2020).

      Introduction, p. 4: “We applied the individual gradient scores to study the asymmetry, consistent with prior studies (Gonzalez Alam et al., 2021; Liang et al., 2021). Individual variation along the gradients reflects a global change across subjects in the functional connectome integration and segregation, and it is under genetic control (Valk et al., 2021). Moreover, to what degree the system segregated and integrated relates to patterns of ongoing neural activity (Mckeown et al., 2020), and different individuals have different functional boundaries between anatomical regions.”

      Results, p. 5: “Next, individual gradients were computed for each subject and the four different FC modes and aligned to the template gradients with Procrustes rotation. It rotates a matrix to maximum similarity with a target matrix minimizing sum of squared differences. As noted, Procrustes matching was applied without a scaling factor so that the reference template only matters for matching the order and direction of the gradients. Therefore, it allows comparison between individuals and hemispheres. The individual mean gradients showed high correlation with the group gradients LL (all Pearson r > 0.97, P spin < 0.001).”

      Only the first three gradients are used but why? What about the fourth gradient? Specific theoretical interpretation is needed. At the individual level, is it ensured that the first gradients of all individuals correspond to each other? In this study, it is unclear whether we should or should not care about the G2 and G3. The results of G2 and G3 showed up randomly to some degree.

      In the current study we focused on the principal gradient in the main analysis, given its association with sensory-transmodal hierarchy, microstructure, and evolutionary alterations (Margulies et al., 2016; Paquola et al., 2019; Xu et al., 2020).

      Conversely, gradient 2 reflects the dissociation between visual and sensory-motor networks and gradient 3 is linked to task-positive, control, versus ‘default’ and sensory-motor regions. We analyzed asymmetry and its heritability of the first three gradients (explaining respectively 23.3%, 18.1%, and 15.0% of the variance of the rsFC matrix). However, we extracted the first ten gradients to maximize the degree of fit (Margulies et al., 2016; Mckeown et al., 2020). We have now also shown G4-10 mean asymmetry results as a supplementary figure. To ensure correspondence of gradients across individuals, we aligned the individual gradients to the group level template with Procrustes rotation. Procrustes rotation rotates a matrix to maximum similarity with a target matrix minimizing sum of squared differences. The approach is typically used in comparison of ordination results and is particularly useful in comparing alternative solutions in multidimensional scaling. Figure S1 shows the mean gradients across subjects of each FC mode, which is close to the Figure 1D template gradient space.
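
      For readers unfamiliar with the alignment step, Procrustes rotation without scaling can be sketched as an orthogonal Procrustes problem solved by a singular value decomposition. The code below is a minimal illustration with synthetic data; the function name `procrustes_rotation` and the array shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target||_F.

    No scaling is applied, so distances between rows of `source` are
    preserved. R may include reflections, i.e. sign flips of axes.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)
template = rng.standard_normal((180, 3))        # hypothetical parcels x gradients
mixing, _ = np.linalg.qr(rng.standard_normal((3, 3)))
individual = template @ mixing.T                # same geometry, different axes

rotation = procrustes_rotation(individual, template)
aligned = individual @ rotation                 # individual expressed in template axes
```

      Because the transform is orthogonal, it can reorder, rotate, and flip gradient axes to match the template but cannot stretch them. After alignment, each axis of `aligned` is in general a mixture of the individual's original axes.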

      Results, p. 5: “The current study analyzed asymmetry and its heritability of the first three gradients explaining most variance (Figure 1d). As they all have reasonably well described functional associations (G1: unimodal-transmodal gradient with 24.1%, G2: somatosensory-visual gradient with 18.4%, G3: multi-demand gradient with 15.1%). However, given we extracted ten gradients to maximize the degree of fit 26,52. We stated mean asymmetry of G4-10 in Figure 1-figure supplement 1.”

      The intra-hemispheric gradient is intuitive. However, it is hard to understand what the inter-hemispheric gradient means. From the data perspective, yes, you can do such a gradient comparison between the LR and RL connectomes, but what does this mean? Why should we care about such asymmetry? From the introduction to the discussion, the authors simply showed the data of inter-hemispheric gradients without useful explanation. This issue should be solved.

      We are happy to further clarify. The LR and RL connectivity reflects cross-hemispheric functional signal interaction via corpus callosum, whose structural asymmetry is usually studied (Karolis et al., 2019). Such intra-hemispheric connections, compared to the inter-hemispheric connections, have been suggested to reflect the inhibition of corpus callosum, and underlie hemispheric specialization. Different information relies on hemispheric specialization (e.g., visual, motor, and crude information) and/or inter-hemispheric information transfer (e.g., language, reasoning, and attention) (Gazzaniga, 2000). To clarify and motivate the analysis of both intra- and inter-hemispheric asymmetry in functional gradients, we have now added further detail in the introduction, p. 5.

      Here is text: Introduction, p. 4. “The full FC matrix contains both intra-hemispheric and inter-hemispheric connections. Intra-hemispheric connections, compared to the inter-hemispheric connections, have been suggested to reflect the inhibition of corpus callosum and may underlie hemispheric specializations involving language, reasoning, and attention. Conversely, inter-hemispheric connectivity may reflect information transfer between hemispheres, for example a wide range of modal and motor information, and crude information concerning spatial locations 48. Previous studies have reported intra-hemispheric FC to study gradient asymmetry 6,38. By having the callosum related to association white matter fibers, one hemisphere could develop for new functions while the other hemisphere could continue to perform the previous functions for both hemispheres 48. Therefore, in addition to the intra-hemispheric FC gradients, we depicted the inter-hemispheric FC, which is abnormal in patients with schizophrenia 23,49 and autism 24.”

      as well as Discussion, p. 16 “Conversely, the transmodal frontoparietal network was located at the apex of rightward preference, possibly suggesting a right-ward lateralization of cortical regions associated with attention and control and ‘default’ internal cognition 62,63. The observed dissociation between language and control networks is also in line with previous work suggesting an inverse pattern of language and attention between hemispheres 3,64. Such patterns may be linked to inhibition of corpus callosum 65, promoting hemispheric specialization. It has been suggested that such inter-hemispheric connections set the stage for intra-hemispheric patterns related to association fibers 48. Future research may relate functional asymmetry directly to asymmetry in underlying structure to uncover how different white-matter tracts contribute to asymmetry of functional organization.”

      and Discussion, p.18 “Though overall intra- and inter-hemispheric connectivity showed a strong spatial overlap in humans, we also observed marked differences between both metrics across our analysis. For example, although we found both intra- and inter-hemispheric differences in gradient organization to be heritable, only for intra-hemispheric asymmetry we found a correspondence between degree of asymmetry and degree of heritability. Similarly comparing asymmetry observed in human data to functional gradient asymmetry in macaques, we only observed spatial patterning of asymmetry was conserved for intra-hemispheric connections. Whereas intra-hemispheric asymmetry relates to association fibers, commissural fibers underlie inter-hemispheric connections 77 It has been suggested that there is a trade-off within and across mammals of inter- and intra-hemispheric connectivity patterns to conserve the balance between grey and white-matter 76. Consequently, differences in asymmetry of both ipsi- and contralateral functional connections may be reflective of adjustments in this balance within and across species. Secondly, previous research studying intra- and inter-hemispheric connectivity and associated asymmetry has indicated a developmental trajectory from inter- to intra-hemispheric organization of brain functional connectivity, varying from unimodal to transmodal areas 78,79. It is thus possible that a reduced correspondence of asymmetry and heritability in humans, as well as lack of spatial similarities between humans and macaques for inter-hemispheric connectivity may be due to the age of both samples (young adults in humans, adolescents in macaques). Further research may study inter- and intra-hemispheric asymmetry in functional organization as a function of development in both species to further disentangle heritability and cross-species conservation and adaptation.”

      When aligning intra-hemispheric gradient, choosing averaged LL mode as the reference may introduce systematic bias towards left hemisphere. Such an issue also applies to LR-RL gradient alignment as well as cross-species gradient alignment. This methodological issue should be solved.

      We thank the Reviewer for raising this point. We also used RR as the reference, and the results were virtually identical. We have stated this in the Results, p. 13. Regarding the cross-species alignment, we averaged the left and right hemispheres to reduce the systematic bias. The correlation and comparison results remained robust. We have now updated the method and corresponding results (p.10). Here is the text:

      Results (p.15): “We also set the RR FC gradients as reference, the first three of which explained 22.8%, 18.8%, and 15.9% of total variance. We aligned each individual to this reference. It suggested all results were virtually identical (Pearson r > 0.9, P spin < 0.001).”

      Results (p.10): “To reduce a possible systematic hemispheric bias during the cross-species alignment, we averaged the left and right hemisphere. We found that the macaque and macaque-aligned human AI maps of G1 were correlated positively for intra-hemispheric patterns (Pearson r = 0.345, P spin = 0.030). For inter-hemispheric patterns, we didn’t observe a significant association (Pearson r = -0.029, P spin = 0.858)”

      The sample size of monkey (i.e., 20) is far less than human subjects (> 1000). Such limitation raises severe concern on the validity of the currently observed gradient asymmetry pattern in the monkey group, as well as the similarity results with human gradient asymmetry pattern. Despite the marginal significance of G1 inter-hemisphere gradient between humans and monkeys, I feel overall there is no convincingly meaningful similarity between these two species. However, the authors' discussion and conclusion are largely based on strong inter-species similarity in such asymmetry. The conclusion of evolutionary conservation for gradient asymmetry, therefore, is not well supported by the results.

      We agree with your comments. Although it is a small sample compared to humans, in NHP studies, it is a relatively decent sample size (most of the studies have N<10). Of note, recent work suggested that the individual variation pattern can be captured using 4 subjects in both human and macaques (Ren et al., 2021).

      To overcome potential overinterpretation of our findings, we have now changed the title to a more descriptive format: “Heritability and cross-species comparisons of asymmetry of human cortical functional organization”

      We have also detailed the findings in the Abstract: “These asymmetries were heritable in humans and, for intra-hemispheric asymmetry of functional connectivity, showed similar spatial distributions in humans and macaques, suggesting phylogenetic conservation.”

      We have pointed out the small sample size in the limitation. Please find the text below: Discussion, p. 18: “Due to the small sample size of macaques, it is important to be careful when interpreting our observations regarding asymmetry in macaques, and its relation to asymmetry patterning observed in humans. Therefore, further study is needed to evaluate the asymmetry patterns in macaques using large datasets 53,79”

      We have also nuanced the conclusion, p.19: “This asymmetry was heritable and, in the case of organization of intra-hemispheric connectivity, showed spatial correspondence between humans and macaques. At the same time, functional asymmetry was more pronounced in language networks in humans relative to macaques, suggesting adaptation.”

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) It is surprising that certain enzymes with established depalmitoylation activity were excluded from the BrainPalmSeq database (e.g. ABHD4, ABHD11, ABHD12, ABHD6).

      We have now included additional depalmitoylating enzymes in our database and manuscript.

      2) Albeit not essential it will be of great interest to include in the established database enzymes necessary for synthesis of ACYL-CoA (e.g. ACSL enzymes). One improvement may include the ability of future researchers to add such curated analysis to the platform within future research studies.

      We agree with the reviewer there are many expansions of our gene set that would be interesting to include. Given the size of the current manuscript however, for brevity we have decided at present to curate data for the core set of genes that directly regulate dynamic palmitoylation. We have also added a ‘Contact Us’ feature to the website, so that repeatedly requested genes or datasets can be added in future.

      3) The experimental validation presented in figure 6 relies on over-expression of substrates and ZDHHC enzymes. This setup is known to often provide unspecific S-acylation events which result from excess enzyme or substrate availability. Hence, such validation would be greatly strengthened by loss of function experiments.

      We have now done loss-of-function experiments and included results in major discussion point 1 above. If the editors/reviewers think it is appropriate to add to the manuscript, we will comply. However, as our negative data does not negate the fact that ZDHHC9 is able to palmitoylate the myelin proteins tested, but merely suggests it may not be necessary for protein palmitoylation in vivo, we do not think it strengthens the manuscript.

      4) The authors relevantly use in-situ hybridization images from the Allen Brain atlas to validate their predictions. Although it is understandable that an extensive experimental validation of the predictions here established would be out of the scope of the current study, this work could be improved by validating the RNA expression at the protein level of certain abundant ZDHHC enzymes in available neuro-associated cell types.

      We have now validated RNA expression at the protein level for a few palmitoylating and depalmitoylating enzymes.

      5) It would be interesting if the authors further compared the predicted association clusters (e.g. figure 1), substrates (figures 1 and 2), and S-acylation pairs (figure 4) determined here with previously determined ZDHHC enzyme associations described in different cell types and biological systems. Alternatively, further relevant validation could include testing whether other established ZDHHC-ZDHHC cascades (e.g. ZDHHC3-7) can also be detected within specific cells or regions of the CNS.

      On our website, all expression data can be downloaded below the heatmaps for each study, and the cell type expression relationships between any 2 genes can be plotted by the user to reveal cell types (if any) within which genes are co-expressed. In response to this comment and that of Reviewer 3 below, we have now performed such analysis on ZDHHC5/ZDHHC20 and ZDHHC6/ZDHHC16, which are to our knowledge the best established ZDHHC cascades. We have included these plots in new Figure 1 – figure supplement 2, along with discussion on line 172. Similar analysis has been performed on the known ZDHHC-accessory protein pairs (see below).

      6) Figure 3B: it is not clear why the cluster of ZDHHCs with high layer-specific expression displayed at the top of the graph does not follow the low-to-high expression scale of the table.

      The expression data in this figure is grouped by hierarchical clustering, rather than in order of low-to-high expression, in order to be consistent with Figure 2B. While we believe this is the better way to display the data, we are willing to modify if the editors/reviewers have a strong preference.

      7) Figure 4D: the more relevant potential cooperative pairs (ZDHHCs-APTs) could be highlighted in more contrasted colours.

      We thank the reviewer for this suggestion but at this stage would prefer to keep the color scheme as it is so that readers are better able to formulate their own hypotheses when observing these figures.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) There is a vast amount of data available and the description and discussion of this could be endless, but there are a few points that could be brought out in more detail. For example, the correlation (or lack of correlation) of expression of the proposed zDHHC-PAT accessory proteins with their cognate zDHHCs. The dominance of a relatively small number of zDHHC enzymes (20, 2, 17, 3, 21, 8) in the CNS also merits some discussion. Is the combination of a high-capacity, low-specificity enzyme (zDHHC3) with others that are regarded as more 'specific'? I believe none of these are ER-resident - they represent Golgi and PM?

      The reviewer brings up many interesting questions. Indeed, we were hopeful that this type of mining of RNAseq data would bring to light many questions that can be followed up on in future publications.

      We have addressed the correlation in expression of accessory proteins with their cognate ZDHHCs with new data.

      We are unsure how to address the dominance of a relatively small number of ZDHHC enzymes (20, 2, 17, 3, 21, 8) in the CNS, beyond highlighting this expression pattern. We believe that any interpretation of this pattern (e.g. co-expression of high-capacity, low-specificity enzymes (ZDHHC3) with more 'specific' ZDHHCs) would be merely speculative. However, we are open to adding further discussion with some guidance from the reviewer.

    1. [Bruno Giussani, co-curator of TED] gave the example of Steven Pinker’s popular TED talk on the decline of violence over the course of history, based on his book The Better Angels of Our Nature. Pinker is a respected professor of psychology at Harvard, and few would accuse him of pulling his punches or yielding to thought leadership’s temptations. Yet his talk became a cult favorite among hedge funders, Silicon Valley types, and other winners. It did so not only because it was interesting and fresh and well argued, but also because it contained a justification for keeping the social order largely as is. Pinker’s actual point was narrow, focused, and valid: Interpersonal violence as a mode of human problem-solving was in a long free fall. But for many who heard the talk, it offered a socially acceptable way to tell people seething over the inequities of the age to drop their complaining. ‘It has become an ideology of: The world today may be complex and complicated and confusing in many ways, but the reality is that if you take the long-term perspective you will realize how good we have it,’ Giussani said. The ideology, he said, told people, ‘You’re being unrealistic, and you’re not looking at things in the right way. And if you think that you have problems, then, you know, your problems don’t really matter compared to the past’s, and your problems are really not problems, because things are getting better.’ Giussani had heard rich men do this kind of thing so often that he had invented a verb for the act: They were ‘Pinkering’ — using the long-run direction of human history to minimize, to delegitimize the concerns of those without power. There was also economic Pinkering, which ‘is to tell people the global economy has been great because five hundred million Chinese have gone from poverty to the middle class. And, of course, that’s true,’ Giussani said. ‘But if you tell that to the guy who has been fired from a factory in Manchester because his job was taken to China, he may have a different reaction. But we don’t care about the guy in Manchester. So there are many facets to this kind of ideology that have been used to justify the current situation.’ —Winners Take All, pp. 126-127

      An early example of the verbification of Steven Pinker's name. Here it denotes the tendency of predominantly privileged men to argue that, because the direction of history has been so positive, those without power shouldn't complain.

      I've also heard it used more generally to mean marshaling a preponderance of evidence on a topic, as Pinker does in his book The Better Angels of Our Nature, without necessarily proving one's thesis convincingly.

    1. It is ironical that we Senators can in debate in the Senate directly or indirectly, by any form of words, impute to any American who is not a Senator any conduct or motive unworthy or unbecoming an American -- and without that non-Senator American having any legal redress against us -- yet if we say the same thing in the Senate about our colleagues we can be stopped on the grounds of being out of order. It is strange that we can verbally attack anyone else without restraint and with full protection and yet we hold ourselves above the same type of criticism here on the Senate Floor.  Surely the United States Senate is big enough to take self-criticism and self-appraisal.  Surely we should be able to take the same kind of character attacks that we "dish out" to outsiders. I think that it is high time for the United States Senate and its members to do some soul-searching -- for us to weigh our consciences -- on the manner in which we are performing our duty to the people of America -- on the manner in which we are using or abusing our individual powers and privileges.

      Aristotelian criticism is largely concerned with how effectively an artifact reaches its intended audience. In this case, Senator Smith never mentions Sen. McCarthy by name, but given the historical context, it is obvious she refers to him and his supporters in condemning "the Senate and its members". One measure of her success in doing so comes from Lorraine Boissoneault's Smithsonian article, which states, "The one person who didn’t forget Smith’s speech was McCarthy himself. 'Her support for the United Nations, New Deal programs, support for federal housing and social programs placed her high on the list of those against whom McCarthy and his supporters on local levels sought revenge,' writes Gregory Gallant in Hope and Fear in Margaret Chase Smith’s America. When McCarthy gained control of the Permanent Subcommittee on Investigations (which monitored government affairs), he took advantage of the position to remove Smith from the group, replacing her with acolyte Richard Nixon, then a senator from California." This may not have been an intended effect, but it nonetheless shows the significance of Smith's speech and the degree to which it reached her audience. Unfortunately, however, it's hard to say how effective her speech would otherwise have been, as its popularity waned when the Korean War broke out later the same month, inclining many toward the more right-wing, anti-communist approach favored by McCarthy and many other Republicans.

    1. It is ironical that we Senators can in debate in the Senate directly or indirectly, by any form of words, impute to any American who is not a Senator any conduct or motive unworthy or unbecoming an American -- and without that non-Senator American having any legal redress against us -- yet if we say the same thing in the Senate about our colleagues we can be stopped on the grounds of being out of order. It is strange that we can verbally attack anyone else without restraint and with full protection and yet we hold ourselves above the same type of criticism here on the Senate Floor. Surely the United States Senate is big enough to take self-criticism and self-appraisal. Surely we should be able to take the same kind of character attacks that we "dish out" to outsiders. I think that it is high time for the United States Senate and its members to do some soul-searching -- for us to weigh our consciences -- on the manner in which we are performing our duty to the people of America -- on the manner in which we are using or abusing our individual powers and privileges.

      Aristotelian criticism is largely concerned with how effectively an artifact reaches its intended audience. In this case, Senator Smith never mentions Sen. McCarthy by name, but given the historical context, it is obvious she refers to him and his supporters in condemning "the Senate and its members". One measure of her success in doing so comes from Lorraine Boissoneault's Smithsonian article, which states, "The one person who didn’t forget Smith’s speech was McCarthy himself. 'Her support for the United Nations, New Deal programs, support for federal housing and social programs placed her high on the list of those against whom McCarthy and his supporters on local levels sought revenge,' writes Gregory Gallant in Hope and Fear in Margaret Chase Smith’s America. When McCarthy gained control of the Permanent Subcommittee on Investigations (which monitored government affairs), he took advantage of the position to remove Smith from the group, replacing her with acolyte Richard Nixon, then a senator from California." This may not have been an intended effect, but it nonetheless shows the significance of Smith's speech and the degree to which it reached her audience. Unfortunately, however, it's hard to say how effective her speech would otherwise have been, as its popularity waned when the Korean War broke out later the same month, inclining many toward the more right-wing, anti-communist approach favored by McCarthy and many other Republicans.

    Annotators

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Excellent quality of cell biology and biochemistry. Additional support is needed for the claim of actin elongation using different formin variants.

      Reviewer #1 (Significance (Required)): Ingrid Billault-Chaumartin and co-authors described interesting research that provides insights on formin-isoform specific function in fission yeast and a new role of Fus1 FH2 domain in cell-cell fusion event. While three formin isoforms have different localization, research proposed an additional dissection in their functional differences by having different functions in C-terminus, including FH1 FH2 and formin C-terminus. The work also described additional factors that regulate cell fusions from autotrophy effect and formin expression level, in addition to the well-accepted formin biochemical activities. Here are my comments regarding the strengths of the work and improvements that could further strengthen the story.

      Major comments 1. Fig. 1 shows Cdc12C could recapitulate Fus1 function by ~80% if fused with Fus1C, whereas deletion of the C-terminal tail of Cdc12 following FH2 introduces drastic dysfunction. Together with Fig. 3, these results indicate Cdc12 Cter plays a more important role than Fus1 Cter in their respective functions. Such results suggest a Cter-mediated mechanism that differentiates the functions of the three fission yeast formin isoforms. The authors examined contributions from the differences in FH1 (Figs 4,5) and FH2 residues (Fig. 6), whereas the obvious phenotype of Cter was not further investigated and not much discussed. The Cter of budding yeast formins interacts with nucleation-promoting factors, Bud6 and Aip5. Although S. pombe does not have orthologs of budding yeast Bud6 and Aip5, I wonder whether the authors could discuss the potential contribution of Cter in differentiating S. pombe formins.

      The reviewer is correct that the C-terminal tail region of Cdc12 beyond the FH1-FH2 domains has a strong influence on the ability of Cdc12C to replace Fus1C. This is one reason why we specifically investigated the possible role of the Fus1 C-terminal tail, which is much shorter than that of Cdc12. We found that the Fus1 C-terminal tail plays only a very minor role in regulating Fus1 function, as described in Figure 3. We note that contrary to what the reviewer states, Bud6 exists in S. pombe and binds the C-terminal tail of the formin For3 (see Martin et al, MBoC 2007), but whether it binds Fus1 is unknown. We have expanded our discussion to include a paragraph on the role of formin C-termini.

      Because the manuscript is focused on the function of Fus1 formin, we did not explore further the role of the Cdc12 C-terminal tail. It was previously shown that this region of Cdc12 contains an oligomerization domain that promotes actin bundling (Bohnert et al, Genes and Dev 2013). It is thus likely that this helps Cdc12 FH1-FH2 perform well in replacement of Fus1. In fact, it is likely that oligomerization boosts formin function, as we have discovered that Fus1 N-terminus contains a disordered region that fulfils exactly this function. This is described in a distinct manuscript under review elsewhere and just deposited on BioRxiv (Billault-Chaumartin et al, BioRxiv 2022; DOI: 10.1101/2022.05.05.490810). We have now cited this point in the discussion.

      1. Here, the study focuses on the FH1 between Fus1 and Cdc12 to understand their different functions in actin polymerization. FH1 mediates actin elongation through its interaction with profilin via polyP. The transfer rate of G-actin from profilin and profilin sliding depends on the polyP patterns regarding the length of each polyP motif and their distance to FH2 (Naomi Courtemanche and Thomas D. Pollard, JBC, 2012). To better understand the mechanisms by which these engineered FH1 variants act on both Fus1 and Cdc12 in Fig. 4, the authors may want to list the sequences of these engineered FH1 domains, including the number and length of polyP motifs, and discuss these patterns.

      This list and discussion were available in the initial paper that characterized each of the constructs in vitro (Scott et al, MBoC 2011). We have now re-drawn it in a supplemental figure for convenience (as also answered in response to minor point 2), which is already provided in the revised manuscript as Figure S1. (Previous supplementary figures are re-numbered S1>S2, S2>S3 and S3>S4).

      1. Figs.4,5 cell biology results do not directly support the point of specific elongation rate unless the LifeAct-labeled actin cable elongation speed could be followed and quantified. The fluorescent tagging of tropomyosin does not show the actin cable pattern, which makes it very difficult to be used to study actin cable dynamics, such as elongation. Therefore, I feel the data in current Fig. 4 and Fig. 5 could not claim the differences in actin elongation without a quantitative comparison of elongation rate. I suggest a CK666 treatment to increase the visibility of the actin cable pattern of LifeAct, used before in both fission and budding yeasts, which would allow the author to quantify the actin cable elongation rate. Another way is to use the TIRF assay used in this study, which would give a better quantitation of formin nucleation and profilin-aided elongation.

      We respectfully disagree with the reviewer on this point. All the constructs we use in vivo have been characterized in vitro and their elongation rate carefully measured (Scott et al, MBoC 2011). These values are thus known and can be directly compared to our results in vivo.

      Of course, it would be fantastic to be able to directly measure formin elongation rates in vivo, but we are not aware that this has been done in any system. The proxy experiments that the reviewer suggests would be good ones, but each faces technical challenges that make them impossible in our system. First, because the fusion focus is a structure that forms in response to cell-cell pheromonal communication, we cannot add CK-666 or any other drug during this phase, as this perturbs the pheromone signal. Indeed, we had shown that a simple buffer wash leads to loss of the fusion focus (see Dudin et al, Genes and Dev 2016). Second, the fusion focus is at the contact site between partner cells, i.e. somewhat distant (1-2 µm) from the coverslip during imaging. It is thus impossible to use TIRF. Finally, the fusion focus is a tightly packed actin structure. This, rather than the use of the tropomyosin marker, is the reason why we cannot image single actin filaments (or even bundles) whose dynamics we could follow, as has been done to measure the retrograde flow of actin cables in yeast.

      What we have done is to use a better tropomyosin tag, mNeonGreen-Cdc8, which was just described (Hatano et al, BioRxiv 2022; DOI: 10.1101/2022.05.19.492673) to quantify amounts of linear actin. Although this is not a measure of elongation rate, it would give some sense about amounts of polymer assembled. We have obtained images with mNeonGreen-Cdc8 of all experiments previously conducted with GFP-Cdc8 and have replaced them in Figure 4C, Figure 5E, Figure 6E and Figure S2B. We have also quantified the relevant strains. The relative intensities of mNeonGreen-Cdc8 at the fusion focus at fusion time reflect remarkably well the measured elongation rates of the various formin constructs characterized in vitro. These data are now provided as new panels Figure 4F and Figure 5F.

      1. I appreciated the detailed biochemical dissections of multiple aspects of WTFus1 and Fus1R1054E, although the biochemical assays could not identify the mechanism by which R1054E causes the cell fusion. In many cases, the formin functions are diverse in diverse biological processes and sophisticated that cannot be explained well only from its biochemical activities in actin polymerization, such as the bundling, nucleation, and elongation studied in this story regarding fusion. This exciting information allows us to think of more possibilities that might regulate formin function rather than a direct change of formin activities in actin polymerization. I think a discussion of different aspects of functional regulation of formin might inspire society to investigate new possibilities to solve the mysteries. For example, the changes in formin behaviors and functions could be regulated by stress-induced formin turnover by degradation, cell signaling-regulated formin clustering and complex assembly, and their potential relevance to recruit protein constituents for fusion progression.

      We have added a paragraph on the role of Fus1 C-terminus. If you feel we should expand more on the diverse modes of regulation of formins, we could, but we have so far kept the discussion centred around the points of investigation in this paper, whose aim was to probe how changes in nucleation and elongation rates, rather than other regulations, affect the in vivo function of Fus1.

      Minor comments. 1. There are two types of "C", one includes FH1/FH2 and one following FH2, used in the manuscript, and it is a bit confusing. Better to differentiate them that allows an easy following. Fig. 1 uses Cdc12C-deltaC, Fig. 3 uses Fus1-delta Cter.

      We have updated the nomenclature to make this clearer: the C-terminal region beyond the FH1-FH2 domains is now called Cter throughout the manuscript.

      1. It's better to specify the amino acid position on the schematic of formins, such as panel A in many figures. It's always more informative to compare formin activities by considering the domain lengths, especially for the C-terminal tail that is variable in lengths and sequences. With similar thoughts, I suggest a supplementary figure that lists the sequence of all FH1 domains variants and Cter domains, such as the FH2 domain in Fig. S1.

      We have made a supplementary figure (new Figure S1) listing all constructs with specific aa positions as well as the FH1 domain variants and their sequences (see also answer to point 2 above). We have not added the sequence of the Cter domains in this figure, as these are extremely divergent and not particularly informative at this point.

      1. "n" for the statistic needs to be provided for Fig. S3.

      We have added the information to the legend of the figure (now Fig S4).

      1. The SDS-PAGE staining gel of the purified recombinant proteins for biochemical assays should be provided, particularly for these newly reported mutant variants.

      This is now provided as new panel S4C. We show the purified recombinant Cdc122FH1-Fus1FH2 proteins, which are the newly reported ones.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In this study, Billault-Chaumartin and colleagues investigate the molecular specialization of the S. pombe formin, Fus1. The authors systematically modulate the actin filament elongation and nucleation activities of Fus1 by expressing chimeric constructs that contain Formin Homology 1 and 2 domains from two other formins with known polymerization activities. By characterizing the architecture of the fusion focus and the efficiency of cell fusion, they find that both the elongation and nucleation properties of Fus1 are specifically tailored for its cellular role. Comparison of formin constructs with similar elongation and nucleation activities also reveals that the Fus1 FH2 domain possesses a specific property that promotes efficient cell fusion. Using sequence alignment and homology modeling, the authors identify R1054 as the residue that confers this novel, fusion-specific activity to Fus1, despite producing no effect on its bundling or polymerization properties in vitro.

      Overall, this study is well motivated, and the results support the conclusions that are drawn. I have only minor suggestions, as described below.

      Minor comments: (1) The schematic diagrams of the chimeric formin constructs are very helpful. However, it is difficult to distinguish the colors from one another, especially in the case of the Cdc12FH1-Fus1FH2 variant, which requires discernment of the relatively small purple region within the dark blue molecule. Would it be possible to modify the colors to increase their contrast? Similarly, the blue and gray data sets in Figure 3B are very difficult to discern.

      We have changed the colours to improve contrasts.

      (2) The affinities (Kd) with which the formins bind the barbed ends as described in the second-to-last paragraph on page 8, in Figure Legend 7G, and in the "Analysis of pyrene data" section of the Materials and Methods should be defined as dissociation "constants", rather than dissociation "rates". Also, these affinities are lacking units in the following sentence on page 8.

      We have corrected this. The unit is nM.

      (3) When comparing the TIRF micrographs in Figure S3A, it looks as though both formins (but especially the R1054E variant) nucleate more filaments in the presence of profilin than in its absence. Is this a reproducible effect? If so, can the authors provide an explanation for this?

      There is strong variability in the filament numbers observed by TIRF in replicate experiments, which makes it difficult to use this technique to determine the nucleation efficiency. This may be due for instance to the stickiness of the glass, which may influence the number of observed filaments. We have measured the number of filaments after 130s of polymerization for each condition to test whether there are any significant differences between conditions despite overall variability. The measurements suggest that the addition of profilin increases the number of actin filaments. However, these results should be taken very carefully due to the experimental variations (very large error bars). Additionally, because Fus1-associated filaments are very short in absence of profilin, it is quite likely that this influences their crowding at the glass surface compared to longer filaments (in presence of profilin). Since in TIRF we can only observe the filaments at the glass surface, we may miss a portion of short Fus1-bound actin filaments in absence of profilin.

      For these reasons, and because the possible role of profilin in modulating nucleation efficiency by formins is not the object of the work here, we would thus prefer not to include this graph in the manuscript.

      Reviewer #2 (Significance (Required)): This study contributes a key advancement towards understanding how the polymerization activities of formins are tailored to support diverse and specific cellular functions. The results in this study nicely complement and expand upon similar recent work that dissected the polymerization requirements of the formin Cdc12, which mediates cytokinetic ring assembly in S. pombe, and For2, which drives the assembly of apical networks that are necessary for polarized growth in Physcomitrella patens. As such, this work will likely be of significant interest to scientists who study mechanisms of actin dynamics regulation. The identification of R1054 as a residue that confers a novel regulatory activity to the FH2 domain of Fus1 will also likely be of great interest to biochemists and other scientists who study formins at the molecular level.

      My expertise is in the field of formins and actin polymerization.

    1. Reviewer #1 (Public Review): 

      In this article Farrell et al. leverage existing datasets which measure frailty longitudinally in mice and humans to model 'robustness' (the ability to resist damage) and 'resilience' (the ability to recover from damage), their dynamics across age, and their relative contributions to overall frailty and mortality. The concept of separating damage/robustness from recovery/resilience is valid and has many important applications including better assessment and prediction of effective intervention strategies. I also appreciate the authors' sophisticated attempts to effectively model longitudinal data, which is a challenge in the field. The use of human and mouse data is another strength of the study, and it is quite interesting to see overlapping trends between the two species. 

      While I find the rationale sound and appreciate the approach taken at a high level, there are a few key considerations of the specific data used which are lacking. The authors conceptualize resilience based on studies which primarily use short time scales and dynamic objective measures (ex. complete blood cell counts in Pyrkov et al.) often in conjunction with an acute stress stimulus. For example, they heavily cite Ukraintseva et al. who define resilience as "the ability to quickly and completely recover after deviation from normal physiological state or damage caused by a stressor or an adverse health event." 

      Given these definitions, the human data used seem to fit within this framework, but we should carefully consider the mouse data. The mouse frailty index is a very useful tool for efficiently measuring the organismal state in large cohorts. A tradeoff for quickly measuring a broad range of health domains is that the individual measurements are low resolution (categorical) and involve inherent subjectivity (which may be considered part of the measurement error). Some transitions in individual components are due to random measurement error and I believe this is especially likely with decreases (or 'resilience' transitions). 

      The reason I think the resilience transitions are subject to high measurement error is that I am skeptical as to whether many of the deficits in the mouse index are reversible under normal physiologic conditions. For example, it is exceptionally unlikely for a palpable/visible tumor to resolve in an aged mouse over the time scales studied here, thus any reversal that was observed is very likely due to random measurement error. Other components whose reversibility I doubt are alopecia, loss of fur color, loss of whiskers, tumors, kyphosis, hearing loss, cataracts, corneal opacity, vision loss, rectal prolapse, genital prolapse.

      In summary, I applaud the authors' efforts in generating complex models to better understand longitudinal aging data. This is an important area that needs further development. I appreciate their conceptualization of resilience and robustness and think this framework has an important place in aging research. I also appreciate their cross-species approach. However, the authors may have over-conceptualized and made some assumptions about the mouse data which may not be valid. It will be important to assess the results with careful consideration of the time scales of the underlying biology and the resolution and measurement error inherent to these tools.

    1. What did Franklin himself think about abortions? In 1728 during his early years as a printer, he generated controversy over something he would end up doing himself. According to “Benjamin Franklin: An American Life” by Walter Isaacson, he “manufactured” an abortion debate, largely because he wanted to crush a rival, but his own opinions may not have been too strong about it. Franklin wrote a series of anonymous letters for another paper to draw attention away from Samuel Keimer’s paper: The first two pieces were attacks on poor Keimer, who was serializing entries from an encyclopedia. His initial installment included, innocently enough, an entry on abortion. Franklin pounced. Using the pen names “Martha Careful” and “Celia Shortface,” he wrote letters to Bradford’s paper feigning shock and indignation at Keimer’s offense. As Miss Careful threatened, “If he proceeds farther to expose the secrets of our sex in that audacious manner [women would] run the hazard of taking him by the beard in the next place we meet him.” Thus Franklin manufactured the first recorded abortion debate in America, not because he had any strong feelings on the issue, but because he knew it would help sell newspapers.

      Benjamin Franklin manufactured the first recorded abortion debate in America to help sell his newspapers and to crush a rival.

    1. The student doesn’t have a strong preference for any of these archetypes. Their notes serve a clear purpose that’s often based on a short-term priority (e.g, writing a paper or passing a test), with the goal to “get it done” as simply as possible.

      The typical student note-taking method of transcribing, using (or often not using at all), and keeping notes is doomed to failure.

      Many students make the mistake of not making their own actual notes. By this I don't mean they're not writing information down. In fact many are writing information down, but we can't really call these notes. Notes by definition ought to transform something seen or heard into one's own words. Without the transformation, these students think that they're taking notes, but in reality they're focusing their efforts on being transcriptionists. They're attempting to capture something for later consumption. This is a deadly trap! By only transcribing, they're not taking advantage of transforming information by putting ideas down in their own words to test their understanding. Often worse, even if they do transcribe notes, they don't revisit them. If they do revisit them, they're simply re-reading them and not actively working with them. Only re-reading them will lead to the illusion that they're learning something when in fact they're falling into the mere-exposure effect.

      Students who are acting as transcriptionists would be better off simply reading a textbook and taking notes directly from that.

      A note that isn't revisited or revised may as well be a note not taken. If we were to consider a spectrum of useful, valuable, and worthwhile notes, these notes would be at the lowest end of the spectrum.

      link to: https://hypothes.is/a/QgkL6IkIEeym7OeN9v9New

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      From the start, the authors would like to thank all the reviewers for their careful and constructive consideration of our manuscript. We have now made several changes to the paper and believe it to be better for the feedback.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, Rees et al. perform an RNA-seq circadian time course experiment in the recently formed allopolyploid wheat. Through comparisons with other circadian transcriptomic datasets in other species it appears that the period of rhythmic genes is much more variable in wheat with a shift to longer periods compared to the other species examined. Interestingly, by analyzing circadian parameters among expressed genes, they find evidence that this newly formed allopolyploid already shows signs of divergence in circadian traits among homoeologs. A thorough comparison with circadian regulated genes in Arabidopsis reveals overlap in phasing of genes involved in certain biological processes such as photosynthesis and light signaling whereas genes involved in starch metabolism were found to have different levels of rhythmicity and phasing. This dataset will be a great resource for the community and enable new predictions about the influence of polyploidy on the circadian control of important crop improvement traits and the circadian regulation of gene expression.

      Major Comments

      1. The results section starts with very little explanation of the experiment. It would help to provide a little more detail at the start of the results to explain the context for the experiment and what was done, when samples were collected and for how long. For the methods section, it isn't until line 650 that it is clearly stated that the sampling started at ZT0. It would be better to put this in the plant materials and growth condition section.

      Thank you for highlighting the need for this context; we agree that the manuscript is improved by an introduction to the experiments. We have now included an “Experimental context” section in the results and have taken the opportunity to explain how the full 0-68h and 24-68h datasets are used within our analysis (Ln 74-82). We have also edited the Methods as suggested (Ln 610-615).

      The low proportion of circadian regulated genes is likely due to the very low cutoff for calling a gene expressed, especially when there are three days of repeated timepoints. If a gene is expressed across the time course it should have values above TPM 0 for at least 3 time points in order for it to be expressed each day. I'd also be suspicious of a gene with a TPM value less than 0.5. Comparing these types of numbers is always challenging due to the various cutoffs used. Along those lines, why was a different filtering scheme used for Arabidopsis (line 657)?

      We completely agree that the proportion of genes described as rhythmic changes a great deal with the threshold at which low-expression transcripts are excluded, as well as with the window over which measurements are taken and the q-value cut-off for rhythmicity. We performed an analysis to test the effects of applying a pre-filtering step to exclude low-expression genes. Briefly, we removed genes with expression less than 0.1 TPM in six or more timepoints and again ran Metacycle to define numbers of rhythmic genes. Our results are discussed in Supplementary Note 1 and are presented in Supplementary Table 1. Regardless of the cut-offs applied, Arabidopsis and wheat data were treated identically, and our findings reported in the main results were consistent with those reported in the Supplementary analysis. Thank you for raising this point; we have now improved our description of this analysis in the main text (Ln 92-95).

      Regarding the different filtering schemes, the filtering mentioned by Reviewer 1 was applied to both Arabidopsis and wheat data for a stricter retention of rhythmic genes, as part of the pre-WGCNA clustering analysis. Filtering to retain genes with >0.5 TPM across 3 timepoints was applied to reduce lowly expressed genes, which act as background 'noise' when defining clusters. We applied this across 3 timepoints, rather than the WGCNA suggestion of 90% of samples, because the patterns of expression in our rhythmically filtered datasets were cyclical in nature.
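
      As an illustration only (not the code used in the study), the two expression filters described in this reply can be sketched as follows. The gene names and TPM values are hypothetical, and reading "across 3 timepoints" as "in at least 3 timepoints" is an assumption.

```python
# Hypothetical TPM values per gene, one value per timepoint (illustrative only).
tpm = {
    "gene_low":  [0.0, 0.05, 0.0, 0.02, 0.0, 0.01, 0.3, 0.0],
    "gene_mid":  [0.2, 0.6, 0.7, 0.3, 0.2, 0.4, 0.4, 0.2],
    "gene_high": [5.0, 9.0, 14.0, 8.0, 4.0, 9.5, 13.0, 7.5],
}


def count_below(values, threshold):
    """Number of timepoints with TPM strictly below the threshold."""
    return sum(v < threshold for v in values)


def count_above(values, threshold):
    """Number of timepoints with TPM strictly above the threshold."""
    return sum(v > threshold for v in values)


def passes_prefilter(values, threshold=0.1, max_low_timepoints=6):
    """Pre-filter: drop a gene that is below `threshold` in
    `max_low_timepoints` or more timepoints."""
    return count_below(values, threshold) < max_low_timepoints


def passes_cluster_filter(values, threshold=0.5, min_timepoints=3):
    """Pre-clustering filter (assumed reading): keep a gene that exceeds
    `threshold` in at least `min_timepoints` timepoints."""
    return count_above(values, threshold) >= min_timepoints


prefiltered = [g for g, v in tpm.items() if passes_prefilter(v)]
cluster_input = [g for g, v in tpm.items() if passes_cluster_filter(v)]
print(prefiltered)
print(cluster_input)
```

      Keeping the thresholds as parameters makes it easy to check how sensitive the number of retained genes is to the chosen cut-off, which is the concern raised by the reviewer.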

      In reference to the shortening of the period every day, this should be interpreted with caution. Period estimates from a single cycle are not very reliable and the SD for each day is around 3h, so it is difficult to draw any conclusions about changes in period each day. One option would be to only include genes with an SD less than 1h or alternatively to remove the discussion surrounding the comparison of period across the three days and focus on the period results for the full 24h-68h window shown in 1b. While 2 days is better it is still not ideal for calling period; however, your first day will still have a strong diurnal driven pattern that will likely skew your circadian period.

      Thank you for your comments. Our question here was to determine whether the mean period lengths of rhythmic transcripts in wheat were always immediately longer upon transfer to constant light, or whether they got progressively longer over time. Upon reading the reviewer’s comment, we realize that the explanation provided of how we conducted this analysis was misleading. Our approach was to take a 44h sliding window (almost 2 days) and measure period at 0-44h, 12-56h and 24-68h. We have now added the previously missing statistics that support our findings in the main text, and which hopefully show the significance of the period changes over time (Supplementary Note 2). One of the most surprising findings from this analysis was that the periods in the first window were the longest, at 28.61h (SD=3.421), suggesting that the diel (driven) oscillation had little impact upon immediate transfer to free run. Our interpretation is that the mean period initially lengthens trying to follow the missing dusk signal, before the free-running endogenous period asserts itself in later cycles (Ln 129-128).
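
      As an illustration only, the three 44h windows named above can be generated from the sampling scheme. The 4h sampling interval is taken from the reviewers' description of the time course; the 12h step is inferred from the window boundaries quoted in this reply, and all names below are hypothetical.

```python
SAMPLING_INTERVAL_H = 4   # samples every 4 h
LAST_TIMEPOINT_H = 68     # final sample of the time course
WINDOW_LENGTH_H = 44      # window length stated in the reply
WINDOW_STEP_H = 12        # inferred from 0-44h, 12-56h, 24-68h

timepoints = list(range(0, LAST_TIMEPOINT_H + 1, SAMPLING_INTERVAL_H))


def sliding_windows(start=0, end=LAST_TIMEPOINT_H,
                    length=WINDOW_LENGTH_H, step=WINDOW_STEP_H):
    """Return (start, stop) pairs for every full window that fits in the series."""
    windows = []
    w_start = start
    while w_start + length <= end:
        windows.append((w_start, w_start + length))
        w_start += step
    return windows


def timepoints_in(window, points=timepoints):
    """Sampled timepoints that fall inside a window (both ends inclusive)."""
    lo, hi = window
    return [t for t in points if lo <= t <= hi]


windows = sliding_windows()
for w in windows:
    print(w, len(timepoints_in(w)), "timepoints")
```

      Each window in this sketch contains the same number of samples, so period estimates from the three windows rest on equal amounts of data.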

      Line 87-93: If the dusk cue is important for clock expression you would think this would be biased towards genes that peak later in the day or near dusk. This argument should be connected better to the period results discussed on lines 98-101.

      Following on from our statement above, we have now combined our hypothesis for why wheat transcripts expressed at dusk have longer periods with the discussion about longer periods upon transfer to constant light. We agree that the two processes are likely to be connected and have now placed them together in Ln 129-128.

      1. Lines 650-652 of the Methods mentions that one of the main interests was the response to transfer to L:L, but this isn't mentioned in the introduction and doesn't come up much in the Results section. Most of the expression comparisons are focused on the 24-68h window. It also isn't clearly explained why the first day in LL is still a diurnal cycle. This would be helpful for non-circadian readers who may wonder why the first day is not included in all the analyses.

      We believe this point is now also addressed by the addition of an Experimental Context section in the results (Ln 74-82), in response to the reviewer’s previous comment.

      1. The phase comparisons shown in Figure suppl 4 are confusing. Suppl. Note 3 states that the period from the 24-68h data window was used to establish the bins but then the phase is shown for 3 different windows for each column? When calculating the phase for each of those 3 windows which period was used as the denominator in the phase calculation? Was it the period that matches the window used to calculate phase? What does the plot look like if phase is called on the same window used to calculate period (24-68)? What method was used to call phase in Suppl. Fig 4? As shown in Suppl Fig. 3 the method can influence the phase distributions. The methods suggest that the phase was determined with Metacycle but then FFT and MESA were used to verify. What does this mean verify, were they adjusted if FFT/MESA didn't agree?

      We agree that this Figure was unnecessarily complicated. We have now simplified Supplementary Figure 4 so that only the phases from 24-68h are presented. We have also clarified the legend to explain why we used FFT-NLLS to improve accuracy of Metacycle predictions.

      It is difficult to interpret the value of the period and phase comparisons shown in Fig. 1b, c, e and f after the preceding section about how variable the period and phase is across days. It is also surprising that the full 3 days were used to calculate the circadian statistics considering the first day is still under diurnal control. Do the ratios remain the same if the statistics are performed only on the 24h-68h window? For consistency with the rest of the paper and avoid confusion it would be best to have all circadian parameters measured using the same time window (24h-68h).

      Thank you for your comments; we can see how our logic in using the different data windows was not clear enough. As mentioned above, we have now explained the use of the full and shortened data windows in the Experimental context section (Ln 74-82). Fig 1c is a comparison between different circadian datasets and as such we have only compared periods across the 24-68h window. Similarly, Fig 1b is a global analysis of periods in rhythmic genes in comparison with Arabidopsis and so is again measured from 24-68h. We have now clarified this in the Figure legend for 1b.

      For comparisons of homoeologs within wheat triads, our question was in identifying homoeologs which behaved differently when placed under free-running conditions. We therefore still feel justified in using the full 0-68h dataset to identify homoeolog periods and phases which indicate differential circadian regulation, but we have now clarified that we are using the full dataset for the triad analysis in the results (Ln 140).

      Fig 1h-m. How were those genes chosen? It would help to see the SD of the replicates shown, since this is just showing one triad. It would be helpful to see a plot that represents the full set of triads rather than just one that looks best. If normalized to a standard phase they could be put on the same plot. For example, panel j is meant to show the 8h lag of subgenome D. If the data is normalized so that A and B are set to the same phase all the triads could be displayed with shaded SD bars to show the variation. Something like this would be a better representation of the data rather than showing just one example.

      Fig. 1h-m are case studies illustrating the different forms of circadian imbalance between homoeologs. We agree that it is helpful to see the standard deviation as error bars on these triad plots and have added it as suggested. In line with Reviewer 2’s suggestion, we have removed Fig 1k and have replaced this with a comparison of mean normalised data for Triad 408 and Triad 2454, highlighting the difference between imbalanced rhythmicity and imbalanced amplitudes between homoeologs. Fig 1 i and m do not have error bars as adding standard deviations to mean normalised data wasn’t appropriate.

      Thank you for your suggestion on how to display the different phases between homoeologs. We feel that if we were to plot all of the triads displaying imbalanced phases, the differences in period length and accompanying noise differences would make the plot so busy as to be unreadable. We hope that the pie charts Fig 1 d-g give a global overview of the proportions of triads with circadian imbalance, but agree with the point that it is useful to allow readers to view triads of their own preference. Therefore, we have now provided the replicate level TPM data with the triad IDs annotated (Supplementary File 12) and Supplementary file 11 provides the classification of each triad alongside Metacycle statistics, ortholog identification and cluster information discussed elsewhere in the paper. Readers can now look up a triad or gene of interest and see how it was classified and what the expression looks like over the full dataset.

      It is surprising that there aren't more comparisons with the B. rapa dataset, especially when discussing the clock genes that show balanced or imbalanced expression. Are they similar in B. rapa and does it support your hypothesis that unbalance for certain genes are selected against?

      While we agree that a thorough, multiple-species, comparative transcriptomic analysis is undoubtedly of interest for the future, we feel it is beyond the scope of the questions being addressed in this paper. We do compare paralogs defined as “similar” in the Greenham dataset with homoeologs described as “balanced” in our dataset and find that genes involved with “photosynthesis” and “generation of precursor metabolites and energy” tend to be common between the two groups, potentially suggesting conservation of balance for certain types of genes (Ln 206-217).

      Figure 2 networks. Why were these specific modules selected? Is it actually appropriate to directly compare these modules? I do see that some of the comparisons have high correlations from panel a, but not all. For example, in panel b the W9 and A9 modules have a correlation value of 0.92, which seems appropriate. However, panel c (modules W3 and A2) have a correlation of 0.42, which seems far too low to make any sort of comparison meaningful.

      The modules were selected to simplify the comparison of genes expressed at dawn, midday, dusk, and night. We were interested in identifying common GO-enrichment in genes peaking throughout the day, although as you have identified, the differences in period length between Arabidopsis and wheat made this difficult. Our reason for comparing module W3 with module A2 was that, even though their eigengenes are not highly correlated per se, when period length is taken into account, both modules peak during the subjective day (CT 6.34h and 6.19h) and they share commonly enriched GO terms which make sense for day-peaking genes.
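
      As an illustration only, one common convention for taking period length into account is to rescale a peak phase to a 24h circadian-time (CT) scale, CT = phase / period × 24. Whether the CT values quoted above were computed in exactly this way is an assumption, and the inputs below are hypothetical.

```python
def circadian_time(phase_h, period_h):
    """Rescale a peak phase (hours) to a 24 h circadian-time scale.

    Uses the convention CT = (phase / period) * 24, so transcripts with
    different free-running periods can be compared on a common axis.
    """
    if period_h <= 0:
        raise ValueError("period must be positive")
    return (phase_h % period_h) / period_h * 24.0


# Hypothetical example: the same absolute peak time corresponds to an
# earlier circadian time when the free-running period is longer.
print(circadian_time(7.0, 24.0))
print(circadian_time(7.0, 28.0))
```

      Under this convention, two modules whose raw eigengenes drift apart because of different periods can still peak at a similar CT.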

      Further, as described in methods comments, using a cutHeight as low as 0.15 will likely lead to some number of genes in any given module that do not necessarily "share" a similar expression pattern. These genes could have a pattern that has very low correlation to their module eigengene and were only placed in that module because the pattern was "less similar" to other module eigengenes. The current expression plots in this figure follow a clear pattern, but I suspect this would be even more apparent if the genes within these modules had a higher correlation to the module eigengene. Perhaps the current genes in these modules could just be filtered to have a higher correlation score?

      Thank you for your comments; we have now made changes to the Results and Methods to clarify our approach (Ln 237-239 and Ln 738-765). Merging modules with highly correlated module eigengenes (ME) is the final step in constructing our co-expression networks. To do this, as the reviewer describes, we used the WGCNA default mergeCutHeight parameter of 0.15. This results in the merging of modules with highly correlated ME, as the mergeCutHeight of 0.15 refers to a dissimilarity metric of 1 minus the eigengene correlation. So for WGCNA, a mergeCutHeight of 0.15 corresponds to a correlation of 0.85. For the wheat modules, we took the additional step of merging closely related modules (mergeCloseModules()) using a cutHeight of 0.25, again a dissimilarity metric of 1 minus the eigengene correlation (corresponding to a correlation of 0.75). Reducing the stringency of the cutHeight to merge highly correlated wheat modules enabled us to more easily compare significantly correlated wheat and Arabidopsis co-expression modules, to identify groups of genes in wheat and Arabidopsis expressed at similar times in the day, and to assess whether similarly phased transcripts in wheat and Arabidopsis had similar biological roles.
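
      As an illustration only, the relationship between a cut height and an eigengene correlation can be sketched with a simplified average-linkage merge on the dissimilarity 1 minus correlation. This is not WGCNA's implementation (which is in R and recalculates eigengenes after merging); the module names and correlations below are hypothetical.

```python
def cut_height_to_correlation(cut_height):
    """A cut height on the dissimilarity 1 - cor corresponds to this correlation."""
    return 1.0 - cut_height


def merge_modules(names, corr, cut_height):
    """Greedy average-linkage merge: repeatedly join the two closest groups
    while their mean dissimilarity (1 - correlation) is <= cut_height."""
    clusters = [[i] for i in range(len(names))]

    def avg_dissimilarity(a, b):
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, ia, ib = min(
            (avg_dissimilarity(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        if d > cut_height:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return [sorted(names[i] for i in c) for c in clusters]


# Hypothetical eigengene correlation matrix for four modules.
names = ["M1", "M2", "M3", "M4"]
corr = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.78, 0.20],
    [0.80, 0.78, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
]

strict = merge_modules(names, corr, cut_height=0.15)   # correlation >= 0.85
relaxed = merge_modules(names, corr, cut_height=0.25)  # correlation >= 0.75
print(strict)
print(relaxed)
```

      In this toy example the stricter cut height merges only the most similar pair, while the relaxed cut height also absorbs the third module, mirroring the effect of reducing stringency described above.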

      Lines 327-334: I am not following the connection between 'response to abiotic stimulus' and the photoreceptor and light signaling proteins. At the start of this section (line 308) the authors say that the GO analysis was only done on rhythmically expressed genes but the reference to only one PHYA being rhythmic and yet multiple genes are shown in the plot in fig. S16. Does this mean that all the genes were shown and not just the rhythmic ones? This would explain why many of the PHY and CRY genes don't seem to have rhythms. This should be clarified better in the text or indicated in the plot which ones were called rhythmic. Since the first day following transfer is still the diel pattern from the entrainment condition, what does the PHY and CRY expression look like? Does it appear rhythmic under diel but lose rhythmicity in LL? It should be noted in the text that arrhythmicity in circadian conditions doesn't mean there isn't rhythmicity under diel conditions. This could be an additional explanation apart from the current one in the text that the regulation is at the level of protein stability/localization. Overall, this entire section is very long and entirely based on data shown in the supplemental material. I do appreciate having the individual gene plots that supplement Figure 4 and would suggest either providing a main figure to highlight a small subset of genes or pathways in this section or shorten it and focus on the results shown in the main figures.

      Upon reading the reviewer’s comment, we realize that we should have made our motivations and processes clearer within this section. We used the data filtered for rhythmicity to conduct the GO-enrichment analysis and then used that to identify processes which should be of interest for further investigation. We have now added an additional sentence (Ln 352-354) to explain this more clearly. We then considered the orthologs of well-known Arabidopsis gene networks and extracted their expression from our circadian dataset, whether rhythmic or not. Supplementary Table 10 contains all of the genes we investigated, their expression and their MetaCycle statistics. We have also indicated here which genes are plotted in which of Supplementary Figures 18-20. The reason for plotting non-rhythmic genes in some cases was that it illustrates the differences between circadian control in Arabidopsis versus wheat (as is the case for the PHY and CRY genes). We understand that it is useful to see at a glance which genes are classified as rhythmic or arrhythmic, so have now highlighted each row in Supplementary Table 10 to make this more intuitive, and added a read me tab.

      Regarding your point about oscillation under diel cycles, we agree that some transcripts will show rhythmic behaviour under entraining environments but not under constant conditions, and may perform time-of-day specific functions. However, these transcripts are likely to not be regulated by the circadian clock (at the transcriptional level) and so are not discussed in the context of a circadian transcriptome.

      For your interest, here is the full expression of PHY and CRY transcripts starting at ZT0:

      [Image]

      It is difficult to say for definite, but it seems likely that some of these photoreceptors will have rhythmic patterns of expression under diel cycles, but these rhythms do not endogenously persist under constant conditions.

      We appreciate your feedback that this section would benefit from cutting down of text and addition of a Figure to illustrate the text. We have now cut some of this section down and created a new main figure based on some of the oscillation plots from Supplementary Figure 18 and 19. We chose examples that reflect a conservation of relationships between transcripts of different peak phases, as we find it interesting that both species have similar patterns. (Main Figure 4, Ln 361-363, 382).

      1. Primary metabolism section: in terms of the supplemental figure, similar to the previous one I think it would declutter the plots if the genes that are not rhythmic were left out and simply indicate below the plot that they didn't meet the rhythmicity cutoff. This is another area where there is more discussion surrounding the supplemental figures than the main figure 4.

      One of the overall findings of this section was that many of the genes involved in starch and T6P metabolism which are rhythmically expressed in Arabidopsis are not rhythmically expressed in wheat. We feel removing these genes from the results would detract from the importance of this finding. We have now edited Supplementary Table 10 to highlight which genes are classified as rhythmic. We have also added a sentence to the start of this section which lays out our motivations for this analysis, summarises our findings and better connects the text with an explanation of Fig. 5 (Ln 408-430).

      For all gene expression figures there should be SD or SE shown either as bars or ribbons to represent the variation in replicates.

      Although we agree that error bars are informative for showing variation between replicates (and have added them to Fig. 1 to show differences within wheat triads), we feel that adding error bars to the gene expression plots in Fig. 3, Fig 4 and Supplementary Fig 19-20 would make these plots difficult to read, particularly where the wheat homoeologs are very similar. The purpose of these gene expression plots is to compare circadian profiles in Arabidopsis and wheat orthologs rather than to claim significant differences in expression at any particular timepoint. This is fairly common in other circadian biology studies:

      https://www.pnas.org/doi/10.1073/pnas.1408886111 ,

      https://www.jbc.org/article/S0021-9258(17)49454-3/fulltext#seccestitle20 , https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0169923 , https://www.science.org/doi/10.1126/science.290.5499.2110?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed,

      https://www.frontiersin.org/articles/10.3389/fgene.2021.664334/full,

      https://www.science.org/doi/full/10.1126/science.1161403

      The replication level information for each gene has now been made available in Supplementary file 12.

      1. It would be very helpful if the code used to generate the networks and perform the cross-correlation of eigengenes across networks were included in the Methods. This will also save you from responding to email requests!

      Thank you for your comment. Code for the cross-correlation analysis, Loom plots and WGCNA network construction is now available from our group's GitHub repository: https://github.com/AHallLab/circadian_transcriptome_regulation_paper_2022/tree/main

      Minor Comments

      1. Figure 1, panel d: - The "unbalanced" triads that are depicted by the lighter shading; do these in fact have a different cutoff than the original rhythmic homoeologs? In the figure it says q

      Thank you for bringing this to our attention; this has now been corrected.

      Hard to directly compare the GO term overlap in Figure 2f. Might be better to only show the results for the 4 pairs shown in b-e and put them side by side in the bubble plot.

      Thank you for this feedback. We have tried to make this plot easier to understand without losing any of the available information. Hopefully it is now more intuitive to understand which columns are being compared. We have changed the coloured lines to make them slightly wider, put the modules in corresponding coloured boxes and highlighted GO-slim terms shared by modules being compared.

      1. Line 314 -316 don't see supp tables 10, 11

      Our apologies, these files were previously missing from the upload and are now available.

      1. For the selection of B. rapa circadian paralogs with similar and differential expression patterns (starting line 714), the authors choose a hard cut off of 0.001 (differentially patterned) OR 0.1 (similarly patterned). What happens to the genes that are between these two cut offs, or is this a typo? Since all the other cutoffs for rhythmicity were set at 0.01 it seems likely that this is a typo.

      We have now clarified this in the methods (Ln 807-822). This is not a typo, but it is a different method to the Metacycle approach we have used for our wheat data. We defined similar/different paralogs as characterized in Greenham et al. (2020) using DiPALM p-values. We chose these DiPALM p-value cut-offs as they gave us approximately equal numbers of paralogs in each category, which represent tails of similarly expressed or differently expressed circadian genes. We checked these cut-offs by calculating average Pearson’s correlation statistics between paralogs and found that differential Brassica paralogs had a mean Pearson correlation coefficient of 0.31 (SD = 0.43) and similar Brassica paralogs had a mean Pearson correlation coefficient of 0.75 (SD = 0.23), which confirms that the DiPALM method of defining expression patterns makes sense in the context of this analysis.
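
      As an illustration only, the check described above (a Pearson correlation per paralog pair, then a mean and SD per category) can be sketched as follows. The expression profiles are hypothetical and are not taken from the Brassica dataset.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def summarise(pairs):
    """Mean and sample SD of the pairwise correlations in one category."""
    r = [pearson(x, y) for x, y in pairs]
    return statistics.fmean(r), statistics.stdev(r)


# Hypothetical paralog pairs: (profile of copy 1, profile of copy 2).
base = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]
in_phase = [2.0 * v for v in base]       # same shape, higher amplitude
anti_phase = [4.0 - v for v in base]     # mirrored shape

similar_pairs = [(base, in_phase), (base, [v + 0.5 for v in base])]
different_pairs = [(base, anti_phase), (base, [2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0])]

print(summarise(similar_pairs))
print(summarise(different_pairs))
```

      Because Pearson correlation ignores differences in scale and offset, it compares the shape of the two profiles rather than their expression level.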

      Line 681. Should be supplemental Figure 6 not 9.

      1. References to most supplemental figures are not the correct number.

      2. Labels above the plots in Supp Fig5 do not match the legend.

      We apologise for these mistakes. We realize that we had mistakenly submitted an earlier draft of the Supplementary materials file, which was missing Supplementary Figure 5, 6 and 9 which therefore shifted the order of the remaining figures. This is now updated.

      1. Suppl table 7 should be as a separate .csv file or similar to be able to see the full table.

      This is a good suggestion, and we have added this.

      1. Line 723 should be B. rapa not B. napus.

      Thank you for catching this! Corrected.

      1. Figure 4. There is no explanation for what the black boxes represent in the figure legend.

      Thank you for your comment. Figure 4 (new Figure 5) has now been updated.

      Reviewer #1 (Significance (Required)):

      This study provides new insight into the circadian regulation of the transcriptome in a new allopolyploid. It adds a valuable resource to a growing collection of circadian studies in important crops and will greatly improve our efforts to learn more about the circadian control of important crop improvement traits. The dataset will be of interest to other plant circadian biologists as well as the general plant biology community who focus on monocot crops. My expertise is more on the transcriptomic side and I do not have the expertise to evaluate the phylogenetic work presented in this study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary Rees et al. present an RNAseq time course of bread wheat. Its recent polyploidisation is one motivation for this study as gene expression dosage is known to be important for clock function in other plants. The time course covers 3 days at sampling intervals of 4h of 2-week old wheat plants (all aerial tissues), in triplicates. The subsequent analysis of the RNAseq data includes analysis of the generated data by itself (e.g. GO analysis, rhythmicity, period and phase analysis, rhythmicity of transcription factor families as well as TF binding sites) as well as thorough comparison with published datasets of other species (Arabidopsis, Brassica rapa, Brachypodium distachyon). One of the key findings is that the mean period length and the period spread are larger in wheat than in these other species. Circadian clock genes largely have similar dynamics in wheat compared to Arabidopsis. In addition, one focus is the analysis of the dynamics of three genes of one triad and imbalance / balance of such triads. To the surprise of the authors, circadian regulated and clock genes were not necessarily balanced. Silencing is one of their explanations for imbalance of circadian genes as arrhythmic genes of one triad are typically those with the lowest expression level. Finally, the authors point out more examples of rhythmic processes and genes (photoreceptors and signalling, auxin, carbon metabolism) and their commonalities and differences with Arabidopsis.

      Major comments - The key conclusions and the data are convincing

      We thank the reviewer for their supportive comments.

      • line 120 and figure 1: In my opinion, q > 0.05 is not a good definition of arrhythmicity as non-significant q-values can result from either noise in spite of rhythmicity or from arrhythmicity. A more statistically sound way to detect arrhythmicity could for example be two-one-side tests (for example in the R package 'equivalence', e.g. see usage for time courses by Noordally et al. 2018, https://www.biorxiv.org/content/10.1101/287862v1).

      Thank you for pointing us in the direction of this package; we agree that choosing methods for circadian quantification and q-value cut-offs is always tricky and different approaches will perform better for noisier or non-sinusoidal waveforms. For future work, we will investigate the application of the suggested method in circadian rhythmicity analysis. However, we believe that the criteria used in this paper for rhythmicity quantification are suitable for addressing our questions, and overall, we are satisfied that rhythms with a q-value of >0.05 would also be classified by eye as being arrhythmic, and rhythms with a q-value Many other studies have used meta2d B.H q-values as a metric of rhythmicity: e.g. (https://bmcplantbiol.biomedcentral.com/articles/10.1186/s12870-022-03565-1 , https://link.springer.com/content/pdf/10.1186%2Fs12915-022-01258-7 , https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8782462/pdf/pcbi.1009762.pdf )
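
      As an illustration only, Benjamini-Hochberg (BH) q-values and a threshold-based rhythmicity call can be sketched as follows. This is not MetaCycle's code. The p-values are hypothetical; the 0.05 arrhythmic cut-off is taken from the text above and the 0.01 rhythmic cut-off from Reviewer 1's description, with everything in between left unclassified as an assumption.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def classify(q, rhythmic_below=0.01, arrhythmic_above=0.05):
    """Label a transcript from its q-value; thresholds are parameters."""
    if q < rhythmic_below:
        return "rhythmic"
    if q > arrhythmic_above:
        return "arrhythmic"
    return "unclassified"


pvals = [0.001, 0.02, 0.03, 0.5]   # hypothetical rhythmicity p-values
qvals = bh_qvalues(pvals)
labels = [classify(q) for q in qvals]
print(list(zip(qvals, labels)))
```

      Leaving a gap between the two thresholds, as in this sketch, reflects the reviewer's point that a non-significant q-value alone does not demonstrate arrhythmicity.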

      • lines 480-484 and intro: In the introduction, the authors write that expression levels of clock components are important for the function of the clock, and that this is one motivation for the current study where polyploidisation is expected to affect the expression levels of clock genes and their outputs. I wonder what answers or speculations this study provides in the end, or whether such answers / speculations should be made clearer. For example, do the authors think that the higher variability of periods in wheat could be a consequence of lower robustness (in addition to possible spatial differences that are mentioned) due to polyploidisation? Is anything known about the period of rhythms of close wheat relatives that did not undergo polyploidisation? Did you look at dampening over the time course in wheat vs. Arabidopsis?

      The point above is an interesting one, and we thank the reviewer for raising it. We agree that the high variability of periods in wheat may be a product of polyploidisation, as functional redundancy between homoeologs may allow a tolerance for less tightly regulated, non-dominantly expressed circadian transcripts. We have now added this hypothesis to our discussion: Ln 536-550.

      In our comparative analysis of period distributions, we looked at periods of transcripts from a diploid relative of hexaploid wheat, Brachypodium distachyon. In Brachypodium, period lengths have around the same SD as in Arabidopsis but the mean period length is slightly longer (Supplementary Table 2). We have now edited our results to make the relationship between wheat and Brachypodium clearer (Ln 109-110).

      Minor comments:

      Introduction - lines 49: it is unclear what is meant by ppd-1 at this position of the sentence

      We agree this was unclear and have revised it to “notably the ppd-1 locus within TaPRR3/7” (Ln 52).

      • line 54/55: clarify that this refers to Arabidopsis thaliana

      Corrected.

      Results - line 69 and 76: cite references for these tools here (not only in the methods section)

      Corrected.

      • line 90-93: Why wouldn't the same thing happen on subsequent subjective evenings?

      Thank you for your comments. We have now combined our hypothesis for why wheat transcripts expressed at dusk have longer periods with the discussion about longer periods upon transfer to constant light. We think that the two processes are likely to be connected and have now placed them together in Ln 126-131.

      The behaviour of mean period lengths of wheat transcripts upon transfer to constant light was unexpected and, we believe, quite interesting. One explanation is that the light zeitgeber, continuing when dusk was expected, delays the expression of evening-peaking genes. Then, on subsequent evenings, the influence of the diel dusk signal is ‘forgotten’ as the governance of the endogenous clock takes over. The very long period observed at 0-24h (28.61h) may be due to a phase shift rather than an intrinsic lengthening of period per se. Whether this trait is unique to wheat or can also be seen in other plant species is, to our knowledge, unknown.

      • line 118: what is your defined cutoff for significance of the Chi square test (p=0.03 not regarded significant?)

      The reviewer is completely right; we have now clarified this (Ln 145-149).

      • figure 1h,i: In order for the reader to see whether A and D (Figure 1h) or A (figure 1i) are indeed arrhythmic, one would need to see plots with a normalisation as done in figure 1m for 1l.

      We have now removed the triad showing one rhythmic gene and two arrhythmic genes (as Fig. 1h already illustrates this type of circadian imbalance) and replaced this with a side-by-side comparison of how imbalance in rhythmicity differs from imbalance in relative amplitude, as suggested.

      • figure 1h-m (and others with circadian time course traces): could a measure of variation (e.g. SD, SEM, confidence interval) be plotted as a shaded region around the curves (unless they're so small that they are there but not visible)?

      We have now added error bars to these plots to show standard deviation between replicates, in Fig. 1 h, j, k and l. We could not think of an accurate way to display this information for the mean normalised data (Fig. 1 i and m), so we have not put error bars on these plots.

      • line 139 (also in 737 and 450): give reference to Ramirez-Gonzalez et al in the same style as the rest of the manuscript (number)

      Thank you for raising this, we believe we have corrected all in-text citations (both narrative and fully parenthetical form) for consistency with the APA format used by the majority of Review Commons affiliate journals.

      • Clustering (modules): What is the reason for choosing 9 clusters? Was this number optimised or chosen for other reasons?

      WGCNA uses an unsupervised clustering algorithm that works within the supplied parameters to determine the optimum number of clusters to explain the dataset, without prior specification of the number of clusters. We have amended the manuscript text to clarify this (Ln 237-239).

      • lines 280 - 284: The TaELF3-1D phenotype could be explained a bit better to the non-wheat specialist, for example by mentioning in the beginning of this set of sentences.

      Done (Ln 314-318).

      • The authors present an analysis of TF binding sites. Can they say something about binding sites in a less sophisticated manner, such as on some very well-known motifs in promoters like the evening element?

      We agree that this is a very interesting question, and one that we may investigate in more detail with our data in the future. In this paper, we performed a global analysis of wheat TFBS predicted from orthologous Arabidopsis TF targets. These targets have been experimentally validated in Arabidopsis using DAP-seq, but we have not validated that these binding sites exist in wheat promoters. We therefore took a tentative approach, and presented only enrichments at the superfamily level rather than talking about specific regulatory motifs.

      The evening element would most likely fit within the MYB or MYB-related TFBS superfamily; however, the diversity of transcription factors in this family means that there is significant enrichment of these TFBS in multiple modules throughout the day (Supplementary Figure 11). In summary, a more in-depth TFBS analysis of known circadian motifs is of great interest, but we feel it would be a substantial work in its own right.

      • Figure 1h-l: If known or meaningful, it would be interesting to know the gene identities behind the triads shown, as in supplementary figure 5.

      These triads were selected as case studies to exemplify the ways in which we were defining imbalanced circadian triads. They have no particular relevance to the figure, but out of curiosity, these are the closest Arabidopsis orthologs for the triads displayed in Fig. 1:

      Triad 408 has highest identity to a hypothetical protein (AT4G26415).

      Triad 2454 is similar to AT3G07600, a heavy metal transport/detoxification superfamily protein

      Triad 13405 is similar to AT3G22360, encoding an ALTERNATIVE OXIDASE 1B, AOX1B

      Triad 10854 is similar to NSE4A, a δ-kleisin component of the SMC5/6 complex, possibly involved in synaptonemal complex formation (AT1G51130).

      Information about wheat gene names in each triad and their Arabidopsis orthologs can be viewed in Supplementary Table 11, so that readers can search for genes of particular interest to them.

      • Figure 4 and text: The illustration of starch metabolism is very helpful. However, I think the paper would benefit from giving a better reason for the selection of this specific set of processes, for example by relating these findings to functional differences in starch metabolism in the two species (in contrast to Arabidopsis, wheat stores little starch in leaves but uses fructans as main reserve carbohydrate)? Are there known differences in the dynamics of starch degradation during the night?

      The reviewer raises an interesting point, and we have now clarified in our results that the stated differences between starch regulation in Arabidopsis and wheat were part of the motivation behind studying this pathway. Starch is at the centre of plant primary metabolism as a carbon storage source and is arguably one of the most important features that breeders look for in regard to grain filling and yields. Additionally, it is of interest to circadian biologists, as starch and sucrose have both been shown to transiently cycle and to be regulated by the circadian clock. However, in wheat, carbon storage primarily uses sucrose rather than starch, and we have now added sucrose to Figure 5 to place it in this context. We think your suggestion has improved our explanation for why we focused on starch in the manuscript, and we are grateful for your input (Ln 408-421).

      We also agree that the differences in the ways that Arabidopsis and wheat utilise starch versus sucrose, and perhaps the role that fructans have as a reserve carbohydrate and in protection against freezing in wheat, may be among the reasons we are seeing differences in circadian regulation of starch. We have now added this to our discussion (Ln 584-592).

      • Figure 4: triose-phosphates can be transported in and out of the chloroplast, as is illustrated in the figure. However, the illustration looks as though they are converted to hexose phosphates during the transport process. In order to be consistent with other transport processes of the figure (maltose and glucose), triose-phosphate should be repeated on the cytosolic side.

      We have now amended this (new Fig. 5). Thank you for your feedback.

      Methods - line 543: if I understand correctly that triplicates were collected and analysed for each time point, '18 samples' is mis-leading (18 time points would be more accurate).

      We agree this was badly worded. Changed Ln 615.

      Supplementary - Supplementary figure 3: x axis label very small and contains typo

      Now corrected. Also enlarged axis for Supplementary Figure 2.

      • Supplementary table 1: Romanowski et al 2020 (add year), or use ref. number citation style as in the rest of the manuscript

      Thank you for raising this. We believe we have now corrected all in-text citations (both narrative and fully parenthetical form) to be consistent with the APA format used by the majority of Review Commons affiliate journals.

      • Supplementary table 9, primary metabolism: does bold highlighting of Arabidopsis accession numbers have a meaning or is it accidental?

      We apologise that this was unclear. We have corrected this. Supplementary Table 10 now also has a “Read me” tab which explains that table.

      Reviewer #2 (Significance (Required)):

      I believe this is a precious, carefully generated and analysed dataset which many biologists will benefit from, beyond wheat or circadian specialists. The dataset expands the knowledge of circadian transcriptome regulation to an important crop and contributes a resource of which only a handful of others exist in other species. Many high impact papers on RNAseq include some follow-up on candidates, for example in Romanowski et al 2020, which is admittedly easier to do in Arabidopsis than wheat due to the availability of genetic resources.

      My expertise: Plant circadian clock (Arabidopsis), dataset analysis (but not specifically for RNAseq)

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript is based on the analysis of a single experiment consisting of transcriptomic profiling of one (hexaploid) wheat genotype over 3 days (samples taken every 4 hours). The experiment is performed in constant light conditions, allowing detection of transcripts controlled by the circadian clock. The bioinformatic analysis studies the dynamics of the different homoeologous transcripts in the polyploid genome and compares cycling transcripts in wheat with what is known from Arabidopsis.

      The manuscript is well written, the methods are correct, the analysis performed is sufficiently extensive and the figures are clear. The manuscript finds interesting expression patterns among homeologous genes, and goes into detail on important differences in circadian regulation of relevant gene families between Arabidopsis and wheat. The work is purely descriptive and does not aim at associations with physiological phenotypes, but the bioinformatic analysis is very thorough and uncovers interesting examples.

      Only one caveat: From what I gather, there is no replication in the RNA-seq experiment, although the exact method does not appear in the text. From the Methods section: "tissue was sampled every 4h for 3 days (18 samples in total)" and "At each timepoint, we sampled the entire aerial tissue from 3 replicate plants". Whether these samples were pooled or not is not described. The "Data Availability" section links to 18 RNA-seq paired end libraries, which suggests that the replicates were pooled, although some type of barcoding might have been used. The text should mention if the replicates were pooled or not, and, if so, what method was used for pooling (tissue, RNA or libraries). Even in the case of no biological replication the manuscript brings interesting insights into wheat transcriptomics and circadian biology. The editor (or the rules of the journal) should decide if they accept articles with no "real" biological replication (I am sure we all understand by now the benefits and limitations of pooling biological replicates into a single RNA-seq library).

      There was replication within the RNA sequencing experiment, and we apologise that this was unclear from our manuscript. Each timepoint consisted of three independent biological replicates. We have now created a new “Experimental context” section in the results to explain this (Ln 74-82) and have clarified in the methods how our data was processed (Ln 609-615 and 636-638).

      We have now included an additional matrix with TPMs at the replicate level to assist readers in looking at specific genes of interest (Supplementary Table 12).

      Minor comments:

      The description of the experimental setup in the first sentence of the Results section is too brief. Could you please talk about for how long the experiment was running? At what intervals the samples were taken? What conditions were used?

      We apologise that this was unclear. We hope that the new Experimental Context section, added in response to comments from several reviewers, makes this much clearer, alongside the clarification in the methods (Ln 609-615 and 636-638).

      Line 280: "...due *to* an introgression..."

      Corrected. Ln 315

      The legend of Figure 3l says elf4 instead of elf3

      We thank the reviewer for noticing this mistake, which we have now corrected.

      Line 306 "says Supplementary Note 7 instead of Supplementary Note 7

      We are not sure what is to be corrected here!

      Reviewer #3 (Significance (Required)):

      This work advances our knowledge on how genome wide expression levels are controlled by the circadian clock in polyploids. Although previous works had performed similar analyses in other polyploid plants, this is the first time this is done in a hexaploid. This work is a starting step to understand gene regulation in this important crop, and is of interest for researchers working in fundamental and applied plant biology.

      Thank you for your positive comments and your feedback in improving this manuscript. We would like to clarify that, to our knowledge, this work presents the first analysis of a circadian transcriptome in a polyploid crop. The work by Greenham et al, although undoubtedly providing insight into circadian regulation of ancient paralogs, was performed in the diploid Brassica rapa.

    1. Author Response

      Reviewer #1 (Public Review):

      Bice et al. present new work using an optogenetics-based stimulation to test how this affects stroke recovery in mice. Namely, can they determine if contralateral stimulation of S1 would enhance or hinder recovery after a stroke? The study provides interesting evidence that this stimulation may be harmful, and not helpful. They found that contralesional optogenetic-based excitation suppressed perilesional S1FP remapping, and this caused abnormal patterns of evoked activity in the unaffected limb. They applied a network analysis framework and found that stimulation prevented the restoration of resting-state functional connectivity within the S1FP network, and resulted in limb-use asymmetry in the mice. I think it's an important finding. My suggestions for improvement revolve around quantitative analysis of the behavior, but the experiments are otherwise convincing and important.

      Thank you for the positive feedback regarding our work.

      Other comments - Data and paper presentation:

      1) Figure 1A is misleading; it appears as if optogenetic stimulation is constant (which indeed would be detrimental to the tissue). Also, the atlas map overlaps color-wise with conditions; at a glance it looks like the posterior cortex might be stimulated; consider making greyscale?

      We have updated Figure 1A to address these concerns.

      Reviewer #2 (Public Review):

      These studies test the effect of stimulation of the contralateral somatosensory cortex on recovery, evoked responses, functional interconnectivity and gene expression in a somatosensory cortex stroke. Using transgenic mice with ChR2 in excitatory neurons, these neurons are stimulated in somatosensory cortex from days 1 after stroke to 4 weeks. This stimulation is fairly brief: 3min/day. Mice then received behavioral analysis, electrical forepaw stimulation and optical intrinsic signal mapping, and resting state MRI. The core finding is that this ChR2 stimulation of excitatory neurons in contralateral somatosensory cortex impairs recovery, evoked activity and interconnectivity of contralateral (to the stimulation, ipsilateral to the stroke) cortex in this localized stroke model. This is a surprising result, and resonates with some clinical findings, and a robust clinical discussion, on the role of the contralateral cortex in recovery. This manuscript addresses several important topics. The issue of brain stimulation and alterations in brain activity that the studies explore are also part of human brain stimulation protocols, and pre-clinical studies. The finding that contralateral stimulation inhibits recovery and functional circuit remapping is an important one. The rsMRI analysis is sophisticated.

      Thank you for the supportive comments regarding our manuscript.

      Concerns:

      1) The gene expression data is to be expected. Stimulation of the brain in almost any context alters the expression of genes.

      We agree with the reviewer that stimulation of the brain is expected to broadly alter gene expression. However, in this set of studies, we examined a subset of genes that are of particular interest in neuroplasticity, and compared expression in ipsi-lesional vs. contra-lesional cortex in the presence or absence of contralesional stimulation during the post stroke recovery period. Genes like Arc, for example, have been shown by our group to be necessary for perilesional plasticity and recovery (Kraft, et al., Science Translational Medicine, 2018). The finding that validated plasticity genes are suppressed by contralesional stimulation is consistent with the central finding that contralesional stimulation suppresses the recovery of normal patterns of brain organization and activity. Importantly, there were also genes associated with spontaneous recovery that were unaltered or increased by contra-lesional brain stimulation. While these data do not provide causal associations, they may prove to be useful for developing hypotheses regarding molecular mechanisms involved in spontaneous brain repair for future studies.

      In light of the reviewer’s comment, we have altered text throughout to not focus on specific directionality of transcripts. Instead, we indicate that relevant transcript changes are those that are altered in association with spontaneous recovery, and which are altered in the opposite direction with contralesional brain stimulation.

      Minor points.

      1) Was the behavior and the functional imaging done while the brain was being stimulated?

      We have updated the methods (page 17) to clarify that the only experiments during which the photostimulus occurred during neuroimaging are reported in new Figure 6, and to clarify that photostimulation did not occur during the behavioral tests of asymmetry.

      2) It would be useful to understand what is being stimulated. The stimulation method is not described. Is an entire cortical width of tissue stimulated, and this is what is feeding back onto the contralateral cortex? Or is this stimulation mostly affecting excitatory (CaMKII+) cells in upper or lower layers? This will be important to be able to compare to the Chen et al study that gave rise to the stimulation approach here. This gets to the issue of the circuitry that is important in recovery, or in inhibiting recovery. One might answer this question by doing the stimulation and staining tissue for immediate early gene activation, to see the circuits with evoked activity. Also, the techniques used in this study could be applied with OIS or rsMRI during stimulation, to determine the circuits that are activated.

      We have clarified the stimulation protocol in response to Essential point 2.2. Due to light scattering and appreciable attenuation of 473 nm light in brain tissue, only ~1% of photons penetrate to a depth of 600 microns. Experimentally, this provides superficial-layer specificity to Layer 2/3 Camk2a cells (https://doi.org/10.1016/j.neuron.2011.06.004).
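
      As a rough illustration of the attenuation figure quoted above, the sketch below back-calculates an effective attenuation coefficient from "~1% of photons at 600 microns". It assumes simple single-exponential decay with depth, which is an assumption made here for illustration and not the model used in the cited study; the function names are hypothetical.

```python
import math

def effective_attenuation_per_mm(fraction_remaining, depth_mm):
    """Effective attenuation coefficient (per mm), assuming
    I(z) = I0 * exp(-mu * z). This single-exponential form is an
    illustrative assumption; it ignores scattering geometry."""
    return -math.log(fraction_remaining) / depth_mm

def fraction_at_depth(mu_per_mm, depth_mm):
    """Fraction of photons remaining at a given depth under the same assumption."""
    return math.exp(-mu_per_mm * depth_mm)

# ~1% of photons remaining at 600 microns (0.6 mm), as stated in the response
mu = effective_attenuation_per_mm(0.01, 0.6)

# Under this assumption, the fraction remaining at half that depth (0.3 mm)
half_depth_fraction = fraction_at_depth(mu, 0.3)
```

      Under this assumption the intensity falls by a factor of ten for every 300 microns of depth, which is consistent with stimulation being largely confined to superficial layers.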

      To answer the question of what circuits are affecting recovery, we performed 2 sets of additional experiments – Experiment 1: OISI during photostimulation before and after photothrombosis, and Experiment 2: tissue staining for IEG expression (cFOS). We describe each below:

      Experiment 1: New results are included from 16 Camk2a-ChR2 mice (Results, page 10-11; Methods, page 18) and reported as new Figure 6. Similar to the previously reported experiments, all mice were subject to photothrombosis of left S1FP, half of which received interventional optogenetic photostimulation beginning 1 day after photothrombosis (+Stim) while the other half recovered spontaneously (-Stim). To visualize in real time whether contralesional photostimulation differentially affected global cortical activity in these 2 groups, concurrent awake OISI during acute contralesional photostimulation was performed in +Stim and –Stim groups before, 1, and 4 weeks after photothrombosis. At baseline, all mice exhibited focal increases in right S1FP activity during photostimulation that spread to contralateral (left) S1FP and other motor regions approximately 8-10 seconds after stimulus onset. While activity increases within the targeted circuit, subtle inhibition of cortical activity can also be observed in surrounding non-targeted cortices. Thus, activity both increases and decreases in different cortical regions during and after optogenetic stimulation of the right S1FP circuit. Of note, regions that are inhibited by S1FP stimulation show more pronounced decreases in activity in +Stim mice at 1 and 4 weeks compared to baseline, and these decreases were significantly larger in +Stim mice than in –Stim mice. We conclude that focal stimulation of contralesional cortex results in significant, widespread inhibitory influences that extend well beyond the targeted circuit.

      Experiment 2: We hypothesized that IEG expression would increase in photostimulated regions, cortical regions functionally connected to targeted areas, and potentially deeper brain regions. For the IEG experiments, healthy ChR2 naïve animals (C57 mice) or CamK2a-ChR2 mice were acclimated to the head-restraint apparatus described in the manuscript used for photostimulation treatment. Once trained, awake mice were subject to the same photostimulus protocol as described in the manuscript, applied to forepaw somatosensory cortex in the right hemisphere. After stimulation, mice were sacrificed and perfused, and brains were harvested for tissue slicing and immunostaining for cFOS. Tissue slices containing right and left primary forepaw somatosensory cortex and primary and secondary motor cortices (+0.5mm A/P) or visual cortex (-2.8mm A/P) were examined for cFOS staining and compared across groups.

      Below is a summary table of our findings, and representative tissue slices. While c-FOS IHC was successful, results are not consistent with expectations from the mouse strains used. Only 1 ChR2+ mouse exhibited staining patterns consistent with local S1FP photostimulation, while expression in ChR2- mice was more variable, and in some instances was higher in targeted circuits than in ChR2+ mice. It is possible that awake behaving mice already exhibit high activity in sensorimotor cortex at rest, which might obscure changes specific to optogenetic photostimulation. Regardless, because the tissue staining experiments were inconclusive in healthy animals, we did not proceed with further experiments in the stroke groups, and do not report these findings in the manuscript.

      3) Also, it is possible that contralateral stimulation is impairing recovery, not through an effect on the contralateral cortex (the site of the stroke), but on descending projections, or theoretically even through evoking activity or subclinical movement of the contralateral limb (ipsilateral to the stroke). By more carefully mapping the distribution of the activity of the stimulated brain region, and what exactly is being stimulated, these issues can be explored.

      The reviewer raises an excellent point. We have added this to the “Limitations and Future work” section of the Discussion on pages 15-16.

    1. • About 99% of the time, the right time is right now. • No one is as impressed with your possessions as you are. • Dont ever work for someone you dont want to become. • Cultivate 12 people who love you, because they are worth more than 12 million people who like you. • Dont keep making the same mistakes; try to make new mistakes. • If you stop to listen to a musician or street performer for more than a minute, you owe them a dollar. • Anything you say before the word “but” does not count. • When you forgive others, they may not notice, but you will heal. Forgiveness is not something we do for others; it is a gift to ourselves. • Courtesy costs nothing. Lower the toilet seat after use. Let the people in the elevator exit before you enter. Return shopping carts to their designated areas. When you borrow something, return it better shape (filled up, cleaned) than when you got it. • Whenever there is an argument between two sides, find the third side. • Efficiency is highly overrated; Goofing off is highly underrated. Regularly scheduled sabbaths, sabbaticals, vacations, breaks, aimless walks and time off are essential for top performance of any kind. The best work ethic requires a good rest ethic. • When you lead, your real job is to create more leaders, not more followers. • Criticize in private, praise in public. • Life lessons will be presented to you in the order they are needed. Everything you need to master the lesson is within you. Once you have truly learned a lesson, you will be presented with the next one. If you are alive, that means you still have lessons to learn. • It is the duty of a student to get everything out of a teacher, and the duty of a teacher to get everything out of a student. • If winning becomes too important in a game, change the rules to make it more fun. Changing rules can become the new game. • Ask funders for money, and they’ll give you advice; but ask for advice and they’ll give you money. • Productivity is often a distraction. 
Don’t aim for better ways to get through your tasks as quickly as possible, rather aim for better tasks that you never want to stop doing. • Immediately pay what you owe to vendors, workers, contractors. They will go out of their way to work with you first next time. • The biggest lie we tell ourselves is “I dont need to write this down because I will remember it.” • Your growth as a conscious being is measured by the number of uncomfortable conversations you are willing to have. • Speak confidently as if you are right, but listen carefully as if you are wrong. • Handy measure: the distance between your fingertips of your outstretched arms at shoulder level is your height. • The consistency of your endeavors (exercise, companionship, work) is more important than the quantity. Nothing beats small things done every day, which is way more important than what you do occasionally. • Making art is not selfish; it’s for the rest of us. If you don’t do your thing, you are cheating us. • Never ask a woman if she is pregnant. Let her tell you if she is. • Three things you need: The ability to not give up something till it works, the ability to give up something that does not work, and the trust in other people to help you distinguish between the two. • When public speaking, pause frequently. Pause before you say something in a new way, pause after you have said something you believe is important, and pause as a relief to let listeners absorb details. • There is no such thing as being “on time.” You are either late or you are early. Your choice. • Ask anyone you admire: Their lucky breaks happened on a detour from their main goal. So embrace detours. Life is not a straight line for anyone. • The best way to get a correct answer on the internet is to post an obviously wrong answer and wait for someone to correct you. • You’ll get 10x better results by elevating good behavior rather than punishing bad behavior, especially in children and animals. 
• Spend as much time crafting the subject line of an email as the message itself because the subject line is often the only thing people read. • Don’t wait for the storm to pass; dance in the rain. • When checking references for a job applicant, employers may be reluctant or prohibited from saying anything negative, so leave or send a message that says, “Get back to me if you highly recommend this applicant as super great.” If they don’t reply take that as a negative. • Use a password manager: Safer, easier, better. • Half the skill of being educated is learning what you can ignore. • The advantage of a ridiculously ambitious goal is that it sets the bar very high so even in failure it may be a success measured by the ordinary. • A great way to understand yourself is to seriously reflect on everything you find irritating in others. • Keep all your things visible in a hotel room, not in drawers, and all gathered into one spot. That way you’ll never leave anything behind. If you need to have something like a charger off to the side, place a couple of other large items next to it, because you are less likely to leave 3 items behind than just one. • Denying or deflecting a compliment is rude. Accept it with thanks, even if you believe it is not deserved. • Always read the plaque next to the monument. • When you have some success, the feeling of being an imposter can be real. Who am I fooling? But when you create things that only you — with your unique talents and experience — can do, then you are absolutely not an imposter. You are the ordained. It is your duty to work on things that only you can do. • What you do on your bad days matters more than what you do on your good days. • Make stuff that is good for people to have. • When you open paint, even a tiny bit, it will always find its way to your clothes no matter how careful you are. Dress accordingly. 
• To keep young kids behaving on a car road trip, have a bag of their favorite candy and throw a piece out the window each time they misbehave. • You cannot get smart people to work extremely hard just for money. • When you don’t know how much to pay someone for a particular task, ask them “what would be fair” and their answer usually is. • 90% of everything is crap. If you think you don’t like opera, romance novels, TikTok, country music, vegan food, NFTs, keep trying to see if you can find the 10% that is not crap. • You will be judged on how well you treat those who can do nothing for you. • We tend to overestimate what we can do in a day, and underestimate what we can achieve in a decade. Miraculous things can be accomplished if you give it ten years. A long game will compound small gains to overcome even big mistakes. • Thank a teacher who changed your life. • You cant reason someone out of a notion that they didn’t reason themselves into. • Your best job will be one that you were unqualified for because it stretches you. In fact only apply to jobs you are unqualified for. • Buy used books. They have the same words as the new ones. Also libraries. • You can be whatever you want, so be the person who ends meetings early. • A wise man said, “Before you speak, let your words pass through three gates. At the first gate, ask yourself, “Is it true?” At the second gate ask, “Is it necessary?” At the third gate ask, “Is it kind?” • Take the stairs. • What you actually pay for something is at least twice the listed price because of the energy, time, money needed to set it up, learn, maintain, repair, and dispose of at the end. Not all prices appear on labels. Actual costs are 2x listed prices. • When you arrive at your room in a hotel, locate the emergency exits. It only takes a minute. 
• The only productive way to answer “what should I do now?” is to first tackle the question of “who should I become?” • Average returns sustained over an above-average period of time yield extraordinary results. Buy and hold. • It’s thrilling to be extremely polite to rude strangers. • It’s possible that a not-so smart person, who can communicate well, can do much better than a super smart person who can’t communicate well. That is good news because it is much easier to improve your communication skills than your intelligence. • Getting cheated occasionally is the small price for trusting the best of everyone, because when you trust the best in others, they generally treat you best. • Art is whatever you can get away with. • For the best results with your children, spend only half the money you think you should, but double the time with them. • Purchase the most recent tourist guidebook to your home town or region. You’ll learn a lot by playing the tourist once a year. • Dont wait in line to eat something famous. It is rarely worth the wait. • To rapidly reveal the true character of a person you just met, move them onto an abysmally slow internet connection. Observe. • Prescription for popular success: do something strange. Make a habit of your weird. • Be a pro. Back up your back up. Have at least one physical backup and one backup in the cloud. Have more than one of each. How much would you pay to retrieve all your data, photos, notes, if you lost them? Backups are cheap compared to regrets. • Dont believe everything you think you believe. • To signal an emergency, use the rule of three; 3 shouts, 3 horn blasts, or 3 whistles. • At a restaurant do you order what you know is great, or do you try something new? Do you make what you know will sell or try something new? Do you keep dating new folks or try to commit to someone you already met? The optimal balance for exploring new things vs exploiting them once found is: 1/3. 
Spend 1/3 of your time on exploring and 2/3 time on deepening. It is harder to devote time to exploring as you age because it seems unproductive, but aim for 1/3. • Actual great opportunities do not have “Great Opportunities” in the subject line. • When introduced to someone make eye contact and count to 4. You’ll both remember each other. • Take note if you find yourself wondering “Where is my good knife? Or, where is my good pen?” That means you have bad ones. Get rid of those. • When you are stuck, explain your problem to others. Often simply laying out a problem will present a solution. Make “explaining the problem” part of your troubleshooting process. • When buying a garden hose, an extension cord, or a ladder, get one substantially longer than you think you need. It’ll be the right size. • Don’t bother fighting the old; just build the new. • Your group can achieve great things way beyond your means simply by showing people that they are appreciated. • When someone tells you about the peak year of human history, the period of time when things were good before things went downhill, it will always be the years when they were 10 years old — which is the peak of any human’s existence. • You are as big as the things that make you angry. • When speaking to an audience it’s better to fix your gaze on a few people than to “spray” your gaze across the room. Your eyes telegraph to others whether you really believe what you are saying. • Habit is far more dependable than inspiration. Make progress by making habits. Don’t focus on getting into shape. Focus on becoming the kind of person who never misses a workout. • When negotiating, don’t aim for a bigger piece of the pie; aim to create a bigger pie. • If you repeated what you did today 365 more times, will you be where you want to be next year? • You see only 2% of another person, and they see only 2% of you. Attune yourselves to the hidden 98%. • Your time and space are limited. 
Remove, give away, throw out things in your life that don’t spark joy any longer in order to make room for those that do. • Our descendants will achieve things that will amaze us, yet a portion of what they will create could have been made with today’s materials and tools if we had had the imagination. Think bigger. • For a great payoff be especially curious about the things you are not interested in. • Focus on directions rather than destinations. Who knows their destiny? But maintain the right direction and you’ll arrive at where you want to go. • Every breakthrough is at first laughable and ridiculous. In fact if it did not start out laughable and ridiculous, it is not a breakthrough. • If you loan someone $20 and you never see them again because they are avoiding paying you back, that makes it worth $20. • Copying others is a good way to start. Copying yourself is a disappointing way to end. • The best time to negotiate your salary for a new job is the moment AFTER they say they want you, and not before. Then it becomes a game of chicken for each side to name an amount first, but it is to your advantage to get them to give a number before you do. • Rather than steering your life to avoid surprises, aim directly for them. • Don’t purchase extra insurance if you are renting a car with a credit card. • If your opinions on one subject can be predicted from your opinions on another, you may be in the grip of an ideology. When you truly think for yourself your conclusions will not be predictable. • Aim to die broke. Give to your beneficiaries before you die; it’s more fun and useful. Spend it all. Your last check should go to the funeral home and it should bounce. • The chief prevention against getting old is to remain astonished.

      So much wisdom and stuff to think about here.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The connectivity patterns along the anterior-posterior hippocampal axis broadly follow an anterior-posterior cortical bias, such that posterior regions, e.g. the visual cortex, are preferentially connected to the hippocampal tail, and anterior regions, e.g. the temporal pole, are preferentially connected to the hippocampal head. The authors focus on the twenty regions with the highest connectivity profiles, which appears to capture the majority of all connections. However, some of the present structural connectivity patterns differ in interesting ways from previously described cortical networks reported in resting-state fMRI studies. Most notably, the medial PFC and orbitofrontal regions combined account for less than 1% of all connections in the present investigation (Table S1 & S2). This is an interesting contrast to functional investigations which tend to find that these regions cluster with the aHPC (e.g., Adnan et al. 2016 Brain Struct Func; Barnett et al. 2021 PLoS Biol; Robinson et al. 2016 NeuroImage). In contrast, the present DWI results suggesting preferential pHPC-medial parietal connectivity dovetail with those observed in fMRI studies. It seems important to discuss why these differences may arise: whether this is a differentiation between structural and functional networks, or whether this is due to a difference in methods.

      We thank Reviewer 1 for making this important point and agree that these observations are deserving of further expansion. We have now included additional text where we place the surprising observation of sparse connectivity between PFC regions and the hippocampus more firmly in the context of recent evidence and argue that these observations suggest a potential differentiation between structural and functional networks.

      We have included the following text in the Discussion (pp. 16-17, lines 439-457):

      “While many of our observed anatomical connections dovetail nicely with known functional associations, patterns of anatomical connectivity strength did not always mirror well characterised functional associations between the hippocampus and cortical areas. For example, a surprising observation from our study was that only weak patterns of anatomical connectivity were observed between the hippocampus and the ventromedial prefrontal cortex (vmPFC) and other frontal cortical areas. This lies in contrast to well documented functional associations between these regions (46-48). Our observation, however, supports a growing body of evidence that direct anatomical connectivity between the hippocampus and areas of the PFC may be surprisingly sparse in the human brain. For example, Rosen and Halgren (49) recently reported that long range connections between the hippocampus and functionally related frontal cortical areas may constitute fewer than 10 axons/mm² and more broadly observed that axon density between spatially distant but functionally associated brain areas may be much lower than previously thought. Our observation of sparse anatomical connectivity between the hippocampus and PFC mirrors this recent work and suggests a potential differentiation between structural and functional networks as they relate to the hippocampus. It remains possible, however, that methodological factors may contribute to these differences. We return to this point later in the discussion. A future dedicated study aimed at assessing whether the well characterised functional associations between the hippocampus and vmPFC are driven by sparse direct connections or primarily by intermediary structures is necessary to address this issue in an appropriate level of detail.”

      2) While the analytic pipeline is described in sufficient detail in the Methods, it is somewhat unclear to a non-DWI expert what the major methodological advance is over prior approaches. The authors refer to a tailored processing pipeline and 'an advance in the ability to map the anatomical connectivity' (p. 5), but it's not immediately clear what these entail. It would be useful to highlight the key methodological differences or advances in the Introduction to help with the interpretation of the similarities and differences with previous connectivity findings.

      We have now included a brief description in the Introduction highlighting the key methodological advances used in the current study.

      We have included the following text in the Introduction (pp. 4-5, lines 130-144):

      “In typical fibre-tracking studies, we cannot reliably ascertain where streamlines would naturally terminate, as they have been found to also display unrealistic terminations, such as in the middle of white matter or in cerebrospinal fluid (39). While methods have been proposed to ensure more meaningful terminations (40), for example, with terminations forced at the grey matter-white matter interface (gmwmi), this approach is still not appropriate for characterising terminations within complex structures like the hippocampus. A key methodological advance of our approach was to remove portions of the gmwmi inferior to the hippocampus (where white matter fibres are known to enter/leave the hippocampus). This allowed streamlines to permeate the hippocampus in a biologically plausible manner. Importantly, we combined this with a tailored processing pipeline that allowed us to follow the course of streamlines within the hippocampus and identify their ‘natural’ termination points. These simple but effective methodological advances allowed us to map the spatial distribution of streamline ‘endpoints’ within the hippocampus. We further combined this approach with state-of-the-art tractography methods that incorporate anatomical information (40) and assign weights to each streamline (41) to achieve quantitative connectivity results that more faithfully reflect the biological accuracy of the connection’s strength (39).”

      3) Related to the point above, it was a bit unclear to me how the present connections map onto canonical white matter tracts. In Fig. 4A, the tracts are shown for a single participant, but it would be helpful to map or quantify how many of the connections for a given hippocampal subregion are associated with a given tract to provide a link to prior work or clarify the approach. A fairly large body of prior research on hippocampal white matter connectivity has focused on the fornix, but it's a little difficult to align these prior findings with the connectivity density results in the current paper.

      We thank Reviewer 1 for this comment and agree this would be an interesting avenue to pursue. However, the reliable segmentation of white matter fibre bundles is currently an area of contention in the DWI community. This pervasive and problematic issue was highlighted in a recently published large multi-site study that revealed a high degree of variability in how white matter bundles are defined, even from the same set of whole-brain streamlines (Schilling et al., 2021, Neuroimage. Nov; 243:118502. https://pubmed.ncbi.nlm.nih.gov/34433094/). This means that, even if we were to choose a particular method to segment white matter bundles, our results would not be readily translatable to those reported in previous DWI studies. This significantly limits meaningful comparison and/or interpretation. Indeed, such an approach may paradoxically take away from the detailed characterisations we have achieved in the current study. As highlighted in that study, it is now paramount that consensus is reached in this field to define criteria to reliably and reproducibly define white matter fibre bundles. Once that is achieved, we plan to conduct a follow-up study to characterise this in more detail, with bundles that will be able to be reliably reproduced by others.

      4) Finally, on a more speculative note: based on the endpoint density maps, there seems to be a lot of overlap between the EDMs associated with different cortical regions (which makes sense given the subregion results). Does this effectively mean that the same endpoints may be equally connected with multiple different cortical regions? Part of the answer can be found in Fig. 3D showing the combined EDM for three different regions, but how spatially unique is each endpoint? This is likely not a feasible question to address analytically but it might be helpful to provide some more context for what these maps represent and how they might relate to differences across individuals.

      The primary aim of the current analysis was to characterise broad patterns of endpoint density captured by our averaged group level analysis. However, Reviewer 1 is astute in assuming that, although there is overlap in the group averaged endpoint density maps (EDMs) associated with different cortical areas, at the single participant level, there are both overlaps and spatial uniqueness in the location of individual endpoints. For example, while group level analysis revealed that area V1 and area V2 showed preferential connectivity with overlapping regions of the posterior medial hippocampus, when visualising individual endpoints associated with each of these areas at the single participant level, we can see that some endpoints overlap while others display spatially unique patterns (see image below). Although a more in-depth analysis of individual variability in these patterns was beyond the scope of this investigation (as noted on Page 14; Lines 379-381), we agree with Reviewer 1 that this is an important point to note in the manuscript. We have, therefore, included additional text touching on this and have included a new Supplementary Figure (Page 42; also see below) to emphasise that, at the single participant level, different cortical areas display both overlapping and spatially unique endpoints within specific regions of the hippocampus (using areas V1 and V2 as an example).

      We have included the following text in the Results section (p. 14, lines 370-379):

      “Finally, while we observed clear overlaps in the group averaged EDMs associated with specific cortical areas, a closer inspection of individual endpoints at the single participant level revealed that endpoints associated with different cortical areas displayed both overlapping and spatially unique characteristics within these areas of overlap. For example, at the group level, areas V1 and V2 showed preferential connectivity with overlapping regions of the posterior medial hippocampus (see Supplementary Figure S5) while, at the single participant level, individual endpoints associated with each of these areas display both overlapping and spatially unique patterns (see Supplementary Figure S6). This suggests that, while specific cortical areas display overlapping patterns of connectivity within specific regions of the hippocampus, subtle differences in how these cortical regions connect within these areas of overlap likely exist.”

      Reviewer #2 (Public Review):

      Dalton and colleagues present an interesting and timely manuscript on diffusion weighted imaging analysis of human hippocampal connectivity. The focus is on connectivity differences along the hippocampal long axis, which in principle would provide important insights into the neuroanatomical underpinnings of functional long axis differences in the human brain. In keeping with current models of long-axis organisation, connectivity profiles show both discrete areas of higher connectivity in long axis portions, as well as an anterior-to-posterior gradient of increasing connectivity. Endpoint density mapping provided a finer grained analysis, by allowing visualisation of the spatial distribution of hippocampal endpoint density associated with each cortical area. This is particularly interesting in terms of the medial-lateral distribution within hippocampal head, body and tail. Specific areas map to precise hippocampal loci, and some hippocampal loci receive inputs from multiple cortical areas.

      This work is well-motivated, well-written and interesting. The authors have capitalised on existing data from the Human Connectome Project. I particularly like the way the authors try to link their findings to human histological data, and to previous NHP tracing results.

      Many thanks.

      1) There are some important surprises in the results, particularly the relatively strong connectivity between hippocampus and early visual areas (including V1) and low connectivity with areas highly relevant from functional perspectives, such as the medial prefrontal cortex (rank order by strength of connectivity 7th and 78th of all cortical structures, respectively). This raises a concern that the fibre tracking method may be joining hippocampal connections with other tracts. In particular, given the anatomical proximity of the lateral geniculate nucleus to the body and tail of the hippocampus, the reported V1 connectivity potentially reflects a fusion of tracked fibres with the optic radiation. In visualizing the putative posterior hippocampus-to-V1 projection (Figure 4B, turquoise), the tract does indeed resemble the optic radiation topography. Although care was taken to minimise the hippocampus mask 'spilling' into adjacent white matter, this was done with focus on the hippocampal inferior margin, whereas the different components of the optic radiation lie lateral and superior to the hippocampus.

      We agree with Reviewer 2 that our observations relating to area V1 could be the result of limitations inherent to current tracking methodology. Indeed, probabilistic tracking can result in tracks mistakenly ‘jumping’ between fibre bundles. Unfortunately, primarily due to limitations in image resolution, we do not believe that we can categorically rule this possibility out in the current dataset beyond the measures we have already taken in our analysis pipeline. We have now included additional text in the Discussion acknowledging and emphasising this possible limitation of our study.

      We have included the following text in the Discussion section (Page 25; Lines 694-699):

      “Also, we cannot rule out that some connections observed in the current study may result from limitations inherent to current probabilistic fibre-tracking methods whereby tracks can mistakenly ‘jump’ between fibre bundles (e.g. for connections between the posterior medial hippocampus and area V1 due to the proximity to the optic radiation), especially in “bottleneck” areas. Again, future work using higher resolution data may allow more targeted investigations necessary to confirm or refute the patterns we observed here.”

      Beyond the possibility of tracks jumping between fibre bundles, we feel it is important to emphasise that an integral part of our analysis was the detailed attention we took to minimise mask ‘spillage’ of the entire hippocampus mask. It is not the case that we primarily focussed on inferior portions of the hippocampus as stated by Reviewer 2. Equal focus was paid to medial, lateral and superior portions of the mask which lie adjacent to visual thalamic nuclei, the optic radiation posteriorly and a number of other structures. We can see that our description relating to this lacked the necessary detail to convey this important point clearly and we apologise for the confusion. We have, therefore, included additional text in the Methods section clarifying this further.

      We have included the following text in the Methods section (Page 26; Lines 751-755):

      “We took particular care to ensure that all boundaries of the hippocampus mask (including inferior, superior, medial and lateral aspects) did not encroach into adjacent white or grey matter structures (e.g., amygdala, thalamic nuclei). This minimised the potential fusion of white matter tracts associated with other areas with our hippocampus mask.”

      These points notwithstanding, our results support recently observed structural and functional associations between the posterior hippocampus and early visual processing areas. We agree that these findings are potentially of great conceptual importance for how we think about the hippocampus and its connectivity with primary sensory cortices in the human brain and we have now included a brief comment relating to this in the Discussion.

      We have included the following text in the Discussion (Pages 23-24; Lines 638-644):

      “However, this observation supports recent reports of similar patterns of anatomical connectivity as measured by DWI in the human brain (38) and functional associations between these areas (43, 60). Collectively, these findings are potentially of great conceptual importance for how we think about the hippocampus and its connectivity with early sensory cortices in the human brain and open new avenues to probe the degree to which these regions may interact to support visuospatial cognitive functions such as episodic memory, mental imagery and imagination.”

      2) A second concern pertains to the location of endpoint densities within the hippocampus from the cortical mantle. These are almost entirely in CA1/subiculum/presubiculum. It is, however, puzzling why, in Supp Figure 2, the hippocampal endpoints for entorhinal projections are really quite similar to what is observed for other cortical projections (e.g., those from area TF). One would expect more endpoint density in the superior portions of the hippocampal cross section in head and body, in keeping with DG/CA3 termination. I note that streamlines were permitted to move within the hippocampus, but the highest density of endpoints is still around the margins.

      We agree with Reviewer 2 that, in relation to the entorhinal cortex, we would expect to see more endpoint density in areas aligning with the dentate gyrus (DG) and CA3 regions of the hippocampus. We noted in the discussion that “Despite the high-quality HCP data used in this study, limitations in spatial resolution likely restrict our ability to track particularly convoluted white-matter pathways within the hippocampus and our results should be interpreted with this in mind”. We believe that this limitation applies to pathways between the entorhinal cortex and DG/CA3. We have now included additional text specifically noting that this limitation likely affects our ability to track streamlines as they relate to DG/CA3. A targeted investigation of this effect using higher resolution diffusion MRI data may help address this issue, and this will be the subject of future work.

      We have included the following text in the Discussion (Page 25; Lines 690-693):

      “Indeed, this may explain the surprising lack of endpoint density observed in the DG/CA4-CA3 regions of the hippocampus where we would expect to see high endpoint density associated with, for example, the entorhinal cortex which is known to project to these regions. Future dedicated studies using higher resolution data are needed to assess these pathways in greater detail.”

      3) On a related point, the use of "medial" and "lateral" hippocampus can be confusing. In the head, CA2/3 is medial to CA1, but so are subicular subareas, just that the latter are inferior.

      We agree that applying the terms ‘medial’ and ‘lateral’ to our three-dimensional representations can lead to some ambiguities and confusion. We have included a new description defining our use of these terms in the Results section.

      We have included the following text in the Results section (Page 10; Lines 268-273):

      “In relation to nomenclature, our use of the term ‘medial’ hippocampus refers to inferior portions of the hippocampus aligning with the distal subiculum, presubiculum and parasubiculum. Our use of the term ‘lateral’ hippocampus refers to inferior portions of the hippocampus aligning with the proximal subiculum and CA1. In instances that we refer to portions of the hippocampus that align with the DG or CA3/2 we state these regions explicitly by name”.

    1. Discussion, revision and decision


      Decision

      Verified with reservations: The content is scientifically sound, but has shortcomings that could be improved by further studies and/or minor revisions.

      Dr. Bañuelos: Verified manuscript

      Dr. Morris: Verified with reservations


      Revision

      Response to Reviewer 1 (Dr. Bañuelos)

      1. Most importantly, I would like to see an introduction that explains the authors’ general arguments about grading changes – including the trajectory of these changes at Dalhousie and why this arc contributes to our knowledge of the history of higher education more broadly. Then, the authors might continually remind us of the arc they present at the outset of their paper – especially when they are highlighting a piece of evidence that illustrates their central argument. To me, the quotes from students and faculty responding to grading changes are among the most interesting parts of the paper and placing these in additional context should make them shine even more brightly!

      Our Response: Thank you so much for your thoughtful review. We have added a larger new introduction section of the paper (paragraphs 1-5 in the latest draft are new) that outlines the general importance of the topic, the Canadian context, details on Dalhousie University, and our overall thesis statement (i.e., most decisions were to improve the external communication value of grades). Moreover, we have added three new student quotes from the Dalhousie Gazette to build a stronger picture of student reactions and a better case for our overall thesis statement (i.e., that changes in grading were often to increase the external communication value of grades). In addition, throughout we have added some details on the overall funding trajectory for institutions in Canada that created some pressure to standardize grading. We think that these changes have improved the manuscript.

      1. I’d like to read a little more about Dalhousie itself – why it is either a remarkable or unremarkable place to study changes in grading policies. Is it representative of most Canadian universities and thus, a good example of how grading changes work in this national context? Is it unlike any other institution of higher education and thus, tells us something important about grades that we could not learn from other case studies? I don’t think this kind of description needs to be particularly long, but it should be a little more involved than the brief sentences the authors currently include (p.3, paragraph 1) and should explain the choice of this case.

      Our Response: This comment revealed that two additional pieces of context were needed for the introduction: (a) some national context for higher education policy in Canada and (b) some extended description of Dalhousie University when compared to other universities in Canada. To this end, two new paragraphs have been added to the paper (paragraphs 2 & 3 in the current draft).

      Notably, Jones (2014) notes that “Canada may have the most decentralized approach to higher education than any other developed country on the planet” (pg 20). With this in mind, any historical review of education policy is by necessity specific to province and institution – that is, the information can be placed in its context, but resists wide generalization to the country as a whole. In the newest draft, we tried to describe the national, provincial, and institutional context in some more detail in paragraphs 2 & 3.

      1. I’d also like to know more about the archival materials the authors used. The authors mention that they drew from “Senate minutes, university calendars, and student newspapers” (p. 3), but what kinds of conversations about grades did these materials include? At various points, the authors engage in “speculation” (e.g. p.4) about why a particular change occurred. This is just fine and, in fact, it’s good of the authors to remind us that they are not really sure why some of these shifts happened. But, they might go one step further and tell us why they have to speculate. Were explicit discussions of grading changes – including in inter- and intradepartmental letters and memos, reports, and other documents – not available in these archives? Why are these important discussions absent from the historical record?

      Our Response: We have added a new paragraph (paragraph 4) to the paper discussing the sources in some more detail. It is true that the verbatim discussions are frequently absent from the record, especially earlier in history – or if they exist, we have not found them! Instead, we frequently are reviewing meeting minutes or committee reports, which are summaries of discussions. As we now note in the paper, “Thus, the sources used showed what policy changes were implemented, when they were implemented, and a general sense of whether there was opposition to changes; however, there were notable gaps in faculty and student reactions to grade policy changes, as these reactions were frequently not written down and archived.”

      This gap was most apparent in the Senate minutes around the 1940s, where I (the first author) could not find any direct discussions of why changes were implemented. Under the 1937-1947 heading, we more clearly indicate that the rationale for the changes was absent from the Senate minutes during this period. I add some further speculation on why these records might be absent, based on summaries from Waite (1998b); specifically, the university president of the time often made unilateral decisions, circumventing Senate, which might account for why the changes are absent from the records.

      This will hopefully make the limitations of what can be learned from this approach more apparent.

      1. At various points, the authors make references to the outside world – for example, WWII (p. 5), the Veteran’s Rehabilitation Act (pp. 6-7), and British versus American grading schemas (p. 6). But, these references are brief and seem almost off-handed. I know space is limited, but putting these grading changes in their broader context might help make the case for why this study is interesting and important. Are the changes in the 1940s, for example, related to the ascendance of one national graduate education model over another (e.g. American versus British)? Are there any data on how many Canadian undergraduates enrolled in British versus American graduate programs over time? If so, I would share any information you might have on these broader trends.

      Our Response: To our knowledge, there isn’t any comparable report to what we’ve written here documenting the transition from British “divisions” to American “letter grades” in Canadian Universities, making our report novel in this regard. It might well be that a similar historical arc exists in many of the 223 public and private universities in Canada, but we don’t believe such data exists in any readily accessible way – excepting perhaps undergoing a similar deep dive into historical documents at each respective institution! So, we do not have the answer to your question: “Are there any data on how many Canadian undergraduates enrolled in British versus American graduate programs over time?” However, we did add one reference which provided a snapshot point of comparison in 1960, noting in the paper “Baldwin (1960) notes that the criteria for “High First Class” grades in the humanities was around 75-80% at Universities of Toronto, Alberta, and British Columbia in 1960, suggesting that Dalhousie’s system was similar to other research-intensive universities around this time.” That said, there are a few major national events related to the funding of universities in Canada that we have elaborated on in the text to address the spirit of your recommendation for describing the national context:

      a) In the “Late 1940s” section of the paper, we added: “Though Dalhousie had an unusually high proportion of veterans enrolled relative to other maritime universities during this period (Turner, 2011), the Veteran’s Rehabilitation Act was a turning point for large increases in enrollment and government funding Canada-wide, at least until the economic recession of the 1970s (Jones, 2014).”

      b) In the 1990s, there were major government cuts to funding, creating challenging financial times for the university. We discuss the funding pressures that likely contributed to standardization of grading during this time by saying the following in the 1980s-2000s section: “Starting in the 1980s-1990s there were major government cuts to university funding nation-wide, with the cuts becoming more severe in the 1990s (Jones, 2014; Higher Education Strategy Associates, 2021). Because of the nature of the funding formulas, cuts in Nova Scotia were especially deep. Beyond tuition increases, university administrators knew that obtaining external research grants, Canada Research Chairs, and scholarship funding was one of the few other ways for a university to balance budgets, so there was extra pressure to be competitive in these pools. […] The increased standardization was likely related to increased financial pressures at this time – standardization is an oft-employed tool to deal with ever-increasing class sizes with no additional resources.”

      c) In the 2010s section of the paper, we added context on how universities country-wide have become increasingly dependent on tuition fees for funding: “Following the 2008 recession, federal funding decreased again (Jones, 2014; Higher Education Strategy Associates, 2021); however, this time universities tended to balance budgets by increasing tuition and international student fees. This trend towards increased reliance on tuition for income is especially pronounced in Nova Scotia, which has the highest tuition rates in the country (Higher Education Strategy Associates, 2021). Thus, the university moved closer to a “consumer” model of education, so it makes sense that a driving force for standardization was student complaints.”

      1. This is a very nitpicky concern that doesn’t fit well elsewhere, so please take it with a grain of salt. I was surprised at the length of the reference list – it seemed quite short for a historical piece! I wonder, again, if more description of the archival material - including why you looked at these sources, in particular, and what was missing from the record – would help explain this and further convince the reader that you have all your bases covered.

      Our Response: In the introduction section, paragraph 4, we describe our sources in more detail, including what is likely missing from the record and why we used them. Regarding the length of the reference list, we did add ~12 new references to the list in the course of making various revisions, which partially addresses your concern. Beyond this, though, it’s worth noting that some of the sources are more extensive than they seem, even though they don’t take up much space in the reference list (e.g., there is one entry for course calendars, but this covers ~100 documents reviewed!). Moreover, there were many dead-ends in the archives that are not cited (e.g., reviewing 10 years of Senate minutes in the 1940s produced little of relevance), so the reference list is curated to only those sources where relevant materials were found.

      Reviewer response to revisions

      The new introduction to the piece addresses many of my previous questions about the authors’ general arguments, the Dalhousie context, and the source material. Thank you for addressing these! Reading this version, it is much clearer that the key argument is that standardized, centralized grading practices were “to improve the external communication value of the grades, rather than for pedagogical reasons” (p. 6). I also really enjoyed the added quotes from students in the Dalhousie Gazette.

      The authors’ response to Reviewer 2 really gave me a better sense of why they wrote this piece and also helped me to more clearly put my finger on what was troubling me in the first round. It still reads a little like a report for an internal audience – which is just fine and, in fact, can be extremely useful for historians of the future. But, as Reviewer 2 notes, this means it does not really seem like a piece of historical scholarship. I do worry that shaping it into this form would take an extensive revision and might not be in the spirit of what the authors intended to do.

      A different version of this article might start with this idea that grades were standardized for external audiences and in response to financial pressures. It would then develop a richer story behind the sudden importance of these external audiences and the nature (i.e. source, type) of financial pressures Dalhousie was facing. It would highlight the impact such changes had on students and their future careers/graduate experiences. It could then connect these trends to other similar changes for external audiences and the increasing interconnectedness of American, Canadian, and British systems through graduate education. It might even turn to sociological theories of organizational change and adaptation and make an argument for when (historically) similar forms of decoupling were likely to occur in the Canadian higher education system. Finally, it might connect these grading changes to current trends – including accusations of grade inflation and accepted best practices for measuring learning outcomes.

      But, it doesn’t seem that the authors necessarily want to do this, which I can understand and respect. I think there is enormous value in a piece of scholarship like this existing – both for internal audiences and for future historians. Indeed, imagine if every university had a detailed history of its grading policies like this available somewhere online! Comparing such practices across institutions would certainly tell us a lot about why grading currently looks the way it does.

      Decision changed

      Verified manuscript: The content is scientifically sound, only minor amendments (if any) are suggested.


      Response to Reviewer 2 (Dr. Morris)

      The authors dove headfirst into Dalhousie’s archives, unpacking the subtle shifts in grading policy. Their work seems to be comparable to archaeologists, digging deep beneath mountains of primary sources to find nuggets of clues into Dalhousie’s grading evolution. I particularly liked when the authors were able to link these changes to student voices, as seen in moments when they referenced student publications.

      Ultimately, I kept coming back to one main comment that I wrote in the margins: “So what?” I would humbly suggest that the authors reflect on why this history matters to them. Granted, they do this in the conclusion, where they touch on Schneider & Hutt’s argument that grades evolved to increasingly be a form of external communication with audiences beyond school communities. Sure. But I want more. I wanted to see a new insight that makes this microhistory of Dalhousie significant to the history of Canada or the history of education more generally.

      If the authors are so inclined, there might be several approaches to transform this manuscript. I would suggest the following. First, instead of tracing the entire history of grading at the institution, choose one moment of change that you think is the most important. Perhaps in the 1920s and the lack of transparency in grading, or the post-war shift toward American grading. Second, show me – don’t tell me – what Dalhousie was like at this moment. Paint a picture of the institution with details about student demographics, curriculum, educational goals, the broader town, etc. Make the community come alive. Show me what makes Dalhousie unique from other institutions of higher ed. Once you establish that picture, perhaps you could link the change in grading practices to subtle changes at the university community, thereby establishing a before and after snapshot. This will require considerable amounts of work, and the skills of a historian. You will have to find primary and secondary sources that go far beyond what you’ve relied on thus far.

      In the end, I found myself wanting the authors to humanize this manuscript, meaning I wanted them to show me that changes in grading practices have tangible effects on real-life human beings. A humanization of their research would mean going narrower and deeper; or, in other words, eliminating much of what they have documented.

      However, if that is too tall of an order, I would ask that the authors clarify for themselves who this manuscript is for. Is this a chronicling of facts for an internal audience at Dalhousie’s faculty, alumni, and students? Fine. But my guess is that even members of the Dalhousie community want to read something relatable.

      I am suggesting revisions, although not because of objective errors. History is more of an art, in my opinion. With that in mind, I would suggest that the authors paint a more vivid picture (metaphorically) of Dalhousie, showing me how one moment of change in grading practices impacted the lives of human beings.

      Our Response: Thank you very much for taking the time to read our paper and provide your thoughts and recommendations. It may be helpful to begin by describing why I (the first author) decided to write this paper. Ultimately, I wrote this paper to satisfy my own personal curiosity and to connect with other people at my own place of employment by exploring our shared history. At present, Dalhousie has a letter grading scheme with a standardized percentage conversion scheme that all instructors use. I wanted to know why this particular scheme was used, but I quickly realized that nobody at Dalhousie really knew how we ended up grading this way! There was an institutional memory gap, and a puzzle that was irresistible to me. So, I wrote this paper for the most basic of all academic reasons: Pure curiosity. I do very much recognize that the subject matter is very niche, perhaps too niche for a traditional journal outlet. Thus, my publishing plan is to self-publish a manuscript to the Education Resources Information Center (ERIC) database and a preprint server as a way of sharing my work with others who might be interested in what I found. Nonetheless, I believe in the importance and value of peer review, especially since I am writing in a field different than most of my scholarly work. That is why I chose PeerRef as a place to submit, so that I could undergo rigorous peer review to improve the work while still maintaining the niche subject matter and focus that drives my passion and curiosity for the project. Of course, if you feel the whole endeavor is so flawed that it precludes publication anywhere, then we can consider this a “rejection” and I will not make any further edits through PeerRef.

      The core of your critique suggested that I should write a fundamentally different paper on different subject matter. While I don’t necessarily disagree that the kind of paper you describe might have broader appeal, it would no longer answer the core research question I wanted an answer to: How has Dalhousie’s grading changed over time? So, I must decline to rewrite the paper to focus on a single timeframe as recommended. All this said, I did try my best to address the spirit of your various concerns to improve the quality of the manuscript. Below, I outline the major changes we made to improve the manuscript along the lines you described, while maintaining our original vision for the structure and focus of the paper:

      a) Two new paragraphs (now paragraphs 1-2 of the revised manuscript) were added to explain the “so what” part of the question. Specifically, we describe why we think the subject matter might be of interest to others and summarize the general dearth of historical information on grading practices in Canada as a whole.

      b) Consistent with recommendations from the other reviewer, we now state a core argument (i.e., that most major grading changes were implemented to improve the external communication value of the grades) earlier in the introduction in paragraph 5 and describe how various pieces of evidence throughout the manuscript tie back to that core theme.

      c) In an attempt to “humanize” the manuscript more, we added more student quotes from the Dalhousie Gazette throughout the paper so that readers can get a better sense of how students thought about grading practices at various times throughout history. Specifically, three new quotes were added in the following sections: 1901-1936, late 1940s, 1950s-1970s. We also added this short note about the physical location where grades used to be posted: “Naturally, this physical location was dreaded by students, and was colloquially referred to as “The Morgue” (Anonymous Dalhousie Gazette Author, 1937).”

      d) Early in the paper, we describe why we chose Dalhousie and the potential audience of interest: “As employees of Dalhousie, we naturally chose this institution as a case study due to accessibility of records and because it has local, community-level interest. The audience was intended to be members of the Dalhousie community; however, it may also be a useful point of comparison for other institutions, should similar histories be written.”

      e) We have described some of the limitations of our sources in paragraph 4, which may explain why the manuscript takes the form it does – it has conformed to the information that is available!

      f) We have linked events at Dalhousie to the national context in some more detail, by detailing some national events related to the funding of universities in Canada. See our response to Reviewer 1, #4 above for more details on the specific changes.

      g) Consistent with your stylistic recommendations, we have changed various spots throughout the paper from the present tense (e.g., “is”) to the past tense (e.g., “was”), and were careful in our new additions to maintain the past tense, when appropriate. If there are any spots that we missed, let us know the page number / section, and we will make further changes, as necessary.

      h) We retained the first person in our writing – this may be discipline-specific, but in Psychology (the first author’s home discipline), first person is acceptable in academic writing. If you feel strongly about this, we can go through the manuscript and remove all instances of the first person, but we would prefer to keep it, if at all possible.

      Hopefully this helps address the spirit of your concerns, and I look forward to hearing your thoughts in the second round of reviews.

      Decision changed

      Verified with reservations: The content is scientifically sound, but has shortcomings that could be improved by further studies and/or minor revisions.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the reviewer for a very constructive evaluation of our work and for a fair summary of its main strengths. We have addressed her/his main concerns as follows:

      1) The experiments involve an invasive neurosurgical procedure used to perform hippocampal imaging, which removes the ipsilateral overlying somatosensory cortex, and it is not possible to evaluate from the data provided that this surgery does not disrupt network function, especially given the focus on movement-related activity patterns.

      We thank the reviewer for bringing up this important issue. Indeed, our experimental access to early hippocampal activity with 2-photon calcium imaging relies on a quite invasive procedure. However, the many control experiments we have performed indicate that early hippocampal dynamics were not significantly altered by the surgery. First, our extracellular electrophysiological recordings from a sample of 6 mice (ranging from P6 to P11, Figure 1- figure supplement 1C) show that the frequency of early sharp waves (eSW) was slightly but not significantly reduced in the ipsilateral hemisphere compared to the contralateral one. Of note, a similar “non-significant” decrease had been previously reported by another group (Graf et al 2021 Fig S6C). As suggested by the reviewer, we can speculate that this slight decrease may result from a reduction of the sensory feedback re-afference originating from the right limbs. Indeed, we observed that movements of the right limbs (contralateral to the window implant) elicited a slightly smaller response than those from the left limbs. This observation has been added to Figure 1 - Supplement 1E and described in the results (lines 128-134) and discussion (lines 314-320).

      We have performed additional control experiments using EMG nuchal electrodes in two pups aged P5 and P6. We observed that, an hour following the surgery (corresponding to the recovery time in our experimental procedure), the composition of the sleep-wake cycle (with 70 to 80 % of active sleep) was comparable to previous reports (Jouvet-Mounier, 1969, Fig 4). This quantification was added to Figure 1- figure supplement 1B (lines 82-86).

      2) State-dependent parameters are not adequately described, controlled, and examined quantitatively to ensure that data from similar behavioral states is being used for analysis across ages. Network activity from wakefulness, REM/active sleep and NREM/quiet sleep should not be presumed to be indistinguishable.

      We would like to point out that our analysis across ages focused on the population response following animal movements, and not across all behavioral states. That said, it is true that two types of movements can be distinguished, namely twitches and complex movements. To take this behavioral heterogeneity into account, we have now separately quantified the hippocampal activation following twitches (movements during active sleep) and complex movements (during wakefulness). We show in Figure 2 - figure supplement 1B that the hippocampal response to twitches and complex movements is similar across ages. Thus, even if the amount of time spent in each behavioral state is modified over the developmental period that we have studied, we are pretty confident that it does not impact the transition we have described in the relationship between animal movements and hippocampal activity. Additionally, we were able to combine 2p-imaging with nuchal EMG recordings in one P5 mouse pup and separately computed the PMTH for movements observed during REM or wakefulness (Figure 2 - figure supplement 1C). We show that CA1 hippocampal neurons were activated time-locked to movement in both behavioral states, with only the amplitude of the population response differing between wakefulness and REM. This point is now included in the result section (lines 148-152) and discussed (lines 324-327).

      3) Currently employed statistics are not rigorous, unified, or sensitive, and do not support all of the authors' claims. Data shown suggest potentially significant changes that have not been identified due to suboptimal statistical approach and/or underpowering.

      We obviously agree with this reviewer that rigorous statistics should be employed and can certify that the data analyzed in the submitted manuscript were carefully examined following that principle. We feel that his/her strong criticism regarding that point was not fully justified. In particular, we do not understand why statistical tests should be “unified” across different figures of the paper. Rather, statistical tests should be adapted to the sample size and distribution. Of course, the same tests were used for similar datasets. This revised manuscript now contains further description and justification of all the tests included in every figure panel.

      4) The authors use an artificial neural network approach to infer cell classification (pyramidal cell vs. interneuron). From the data provided, it is not possible to adequately evaluate whether these 'inferred' interneurons represent the same population as conventionally labeled interneurons.

      We thank the reviewer for this important remark and apologize for the lack of detailed description of our method to ‘infer’ interneurons. This method was previously published (Denis et al., 2020), and designed to identify interneurons from their calcium fluorescence signals in the absence of a reporter. Most importantly, this cell type classifier was trained and tested on a dataset in which interneurons were labeled using a reporter mouse line (GAD 76-Cre). This dataset is included in this article. This means that all the ‘labeled’ interneurons included here were also used for the training and the test dataset. As for the activity classifier, the training and test data sets covered all the developmental ages used in the study. Thus, the previously published statistics (accuracy/sensitivity) of this classifier should well account for the present analysis. This method is now described in better detail in the results (line 183) and methods parts (lines 616-619). We now also illustrate in the figures how this classifier can infer interneurons with 91% precision (the split of prediction vs ground truth in test data is reported from Denis et al) and that these ‘inferred’ interneurons are activated with movement just as genetically ‘labeled’ interneurons (Figure 3 - figure supplement 1B-E).

      5) Functional GABAergic activity is not assessed across development (only at P9-10), limiting mechanistic conclusions that can be drawn.

      We thank the reviewer for this comment that reveals some lack of clarity in the previous description of our experiments. Indeed, functional GABAergic activity was also assessed before P9; however, given that there are no GABAergic axons in the CA1 pyramidal layer at early stages (for both CCK cf. Morozov and Freund 2003, and prospective PV cells cf. Figure 4A,B), there is no signal to be measured either. We have now added a new figure (Figure 4 - figure supplement 1) to clarify this point. In agreement with our Syt2 longitudinal quantification, we show, using tdTomato expression in the Gad67cre driver mouse line, that GABAergic perisomatic innervation is only visible after P9. This also matches our attempted imaging experiments using axon enriched GCaMP in mice before P9.

      6) The present analyses are almost exclusively focused on movement-related epochs, substantially limiting conclusions that can be drawn as to what neural dynamics are actually occurring during epochs that the authors propose comprise internal representations.

      We agree with this reviewer that our study is focusing on movement-related episodes and that we are not assessing hippocampal representations, especially since the pups are recorded in conditions that minimize external environmental influences. Still, we observe that there is a switch in the distribution of spontaneous activity in CA1 after P9, with most activity occurring outside of the synchronous calcium events and detached from movement. The exact nature of this activity remains to be studied; however, it is most likely not evoked by extrinsic phasic inputs and rather represents local dynamics. We have now removed reference to “internal representations” or “internal models” in the two previous instances of use (in the abstract and discussion) and replaced them, when possible, by “self-referenced” representations alluding to self-generated-movement-triggered activity.

      Reviewer #2 (Public Review):

      The study by Dard et al aims to uncover the post-natal emergence of mature network dynamics in the hippocampus, with a particular focus on how pyramidal cells and interneurons change their response to spontaneous limb movement. Several previous studies have investigated this topic using electrophysiology, but this study is the first to utilize 2-photon calcium imaging, enabling the recording of hundreds of individual neurons, and discrimination between pyramidal cell and interneuron activity. The aims of the study are of broad interest to all neuroscientists studying development (including neurodevelopmental disorders) and the basic science of network dynamics.

      The main conclusions of the study are that (1) in early life, most pyramidal cell activity occurs in bursts synchronized to spontaneous movement, (2) by P12, pyramidal cell activity is largely desynchronized from spontaneous movement, and indeed movement triggers an inhibition in the pyramidal network (approximately 2-4sec following movement), (3) unlike pyramidal cells, interneuron activity remains positively modulated by movement, throughout the period P1-P12, (4) the changes in pyramidal cell activity are achieved by means of increases in perisomatic inhibition, between P8 and P10.

      It should be noted that conclusion (1) and to some extent conclusion (2) have already been reported, by previous studies using electrophysiology (as clearly acknowledged by the authors).

      A principal strength of this manuscript is the extremely high quality of the data that the authors are able to use in support of (1) and (2), with very large numbers of neurons being analyzed to clearly delineate the relationship between neural activity and movement. The finding that pyramidal cells become inhibited following movement is novel, I believe. Furthermore, this study offers the first description of the development of interneuron activity, in this experimental context.

      The main weakness of the manuscript is that the authors cannot provide direct functional evidence for the conclusion (4). As shown by the analysis in support of conclusion (3), interneuron activity with respect to movement does not actually change during the developmental period being studied, making it prima facie unlikely that this is the cause of changes in pyramidal network responses to movement. To overcome this, the study describes the activity of GABA-ergic axon terminals in the pyramidal cell layer at P9-10, but it appears that due to technical problems this was not possible in younger animals. It, therefore, remains unknown if the functional inhibitory inputs to pyramidal cells are changing over the ages studied.

      We thank this reviewer for acknowledging the broad interest of the study, its novelty, and the high quality of our dataset. The main concern raised by this reviewer (lack of axonal activity experiments in younger pups) was in fact a misunderstanding of the experiments performed and we apologize for this lack of clarity. Reviewer #2 is correct in that the relationship between interneuron activity and movement does not change over the developmental period studied. However, we have only included GABAergic axonal imaging after P9, not due to a technical problem but rather because there are no GABAergic axons in the pyramidal layer before (we see GABAergic neurites only outside the layer). We have now dedicated a new supplementary figure (Figure 4 - figure supplement 1) to explain why we could not image GABAergic axons in the pyramidal cell layer at earlier developmental stages.

      The study does describe increases in the protein synaptotagmin-2, in the pyramidal cell layer, between P3 and P11, but in my opinion, this molecular evidence for increases in perisomatic inhibition does not match the (very high) standards of neuronal function/activity reported elsewhere in the manuscript.

      In the absence of parvalbumin expression in early development, synaptotagmin-2 has been described as the best marker of prospective PV boutons in the cortex (Someijer et al. 2012). This molecular marker has been used in other studies (Modol et al. Neuron 2020, Sigal et al. PNAS 2019). We respectfully disagree with this reviewer, and think that quantification from immunohistochemistry experiments is as high of a standard as functional imaging as it is the only way to describe the anatomical structure of active neuronal processes.

      Reviewer #3 (Public Review):

      Dard and colleagues use both in vivo calcium imaging and computational modelling to explore the relationship between early movement and CA1 hippocampal activity in neonatal mice.

      The manuscript represents a significant technical advance in that the authors have pioneered the use of multiphoton imaging to record activity in the hippocampus of awake neonates. Overall the presentation of the data is convincing although I would recommend a number of tweaks to the figures and the inclusion of some raw data to better direct and inform non-expert readers. I also believe that the assessment of long-range inputs using pseudo-rabies virus should be present in the main body of the manuscript as opposed to supplemental material. The computational modeling supports their idea but does not exclude other possibilities. Further, it is not clear to what extent the strengthening of local excitatory input onto the interneurons - the dominant route of recurrent input in the hippocampus, is important; something that the authors acknowledge in the discussion.

      Overall, I believe the paper adds to our knowledge of the timeline of development and further identifies the postnatal day (P)9-P10 window as important in emergent cortical processing. The fact that this is linked to an increase in GABAergic innervation has implications for our understanding of both normal and dysfunctional brain development.

      We thank the reviewer for his constructive comments and helpful suggestions. As suggested, this revised version now includes some raw data and better descriptions to guide non-expert readers. Regarding the inclusion of rabies-tracing experiments in the main part of the MS, we would like to state here that there are still a number of limitations with the use of this method during development (incubation time, spatial precision of the injection site, etc.) that limit the interpretation and quantification of the results. As a result, we have decided to remain only qualitative, focusing on identifying the brain regions that could send projections onto CA1 pyramidal cells and interneurons. We believe that this type of description is more suited for a supplementary figure than a principal figure, but will be happy to change this, if the reviewer and editors think otherwise.

    1. Some students do as well in online courses as in in-person courses, some may actually do better, but, on average, students do worse in the online setting, and this is particularly true for students with weaker academic backgrounds.

      I think this statement is important because it shows that the argument is not as simple as, "Online courses are bad and in-person classes are good". It shows that, while plenty of students do just fine learning online, the online courses themselves lack a lot of the edge that an in-person course can give a student. This is an important observation because we can use this research to optimize the way we learn online moving forward!

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2021-01219

      Corresponding author(s): Rajan, Akhila

      1) General Statements [optional]

      The goal of this study is to:

      • Define how prolonged exposure to a high-sugar diet (HSD) regime alters both the lipid landscape and feeding behavior.
      • Determine how changes in lipid classes within the adipose tissue regulate feeding behavior.

      Key findings:

      In this study, by taking an unbiased systems level and genetic approach, we reveal that phospholipid status of the fat tissue controls global satiety sensing.

      Impact of Key findings:

      By uncovering a critical role for adipose tissue phospholipid balance as a key regulator of organismal feeding, our work raises the possibility that the rate-limiting enzymes in phospholipid synthesis, including Pect, are potential targets for therapeutic interventions for obesity and feeding disorders.

      Peer review comments:

      This study has benefited immensely from the thoughtful peer review of three reviewers. As per their recommendations, we have carried out a major revision by performing additional experiments (see summary table in the next section) and have strived to address the major concerns raised. Based on our reading, two major concerns were shared among the reviewers. They are as follows:

      • Does the genetic disruption of Pect in fly fat body alter phospholipid levels? Two reviewers (#2 and #3) recommended that we perform lipidomic analyses on adult flies with adipose tissue-specific knockdown of Pect. For the revised version, we have completed this lipidomic experiment, and present the results as a new main Figure 6 and Supplemental Figures S7 and S9.
      • Is the dampened HSD-induced hunger-driven feeding (HDF) behavior due to increased baseline feeding (#1 and #3)? In addition, reviewer #1 asked us whether HSD flies experience an energy deficit. In other words, we were asked to determine whether what we observed was HSD-driven allostasis or, as we had interpreted, an HSD-induced dampening of the hunger-driven feeding response.

      Hence, they recommended that we:

      1. Re-analyze our hunger-driven feeding datasets and present non-normalized data (also requested by Reviewer #3) and show baseline feeding behavior on HSD. To address this, we have completed this analysis and present our results in Figure 1B-D and S1.
      2. Determine whether the HSD-fed flies display an energy deficit on starvation. To this end, we assayed starvation-induced fat mobilization on HSD; the results are now presented in Figure 1E-G and S2.

      Conclusions after the revision:

      First, it is important to note here that the additional experiments have not caused a significant revision of the major conclusions of the original version of our study. In fact, we hope that the revised version provides clarity and further substantiation to our original arguments.

      • The lipidomics experiments on Pect fat-specific knockdown flies show that reducing Pect in the fat body causes a significant reduction in certain PE lipid species (specifically PE 36.2; Figure 6B). This is consistent with a prior report on lipidomics of the Pect null allele by Tom Clandinin’s group (PMID: 30737130). Furthermore, we note that when Pect is knocked down in the fat body, there is a significant increase in two other classes of phospholipids, LPC and LPE (Figure 6A). Together, this suggests an imbalance in phospholipid composition in the absence of Pect activity in fat.
      • The starvation-induced fat mobilization experiments show that despite being fed a prolonged HSD, adult flies sense starvation and effectively mobilize fat stores, at a level comparable to Normal food (NF) fed adult flies, suggesting that even despite HSD exposure, adult flies experience an energy deficit on starvation.
      • In our non-normalized data, we find that the baseline feeding events are not significantly altered between HSD and NF-fed flies (Figure 1D). This suggests that the effects we observe are not due to an increase in the “denominator”, but to a dampening of hunger-driven feeding on HSD.

      With regard to our original version, all three peer reviewers found the study interesting, significant, important, and novel. Reviewer #1: “The work is potentially novel and interesting”; #2: “I find the study to be potentially very important - the authors combine a longitudinal study that would be difficult in any other model with the powerful genetic tools available in the fly. The conclusions are mostly convincing”; #3: “This manuscript demonstrates how fat body Pect levels affect HSD induced changes in hunger-driven feeding response. I agree with all the reviewers points; potentially very interesting”. However, they requested that we provide further substantiation and clarification.

      We sincerely hope that the peer-reviewers find that our revised version with additional new experimental datasets, improved data visualization, and the presentation of non-normalized raw data points, makes this study clear, compelling, and well-substantiated.

      • Point-by-point description of the revisions

      This section is mandatory. Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript.

      Below, in Part A, we summarize the key experiments that were performed to address the major concerns. In Part B, we provide a point-by-point response to each reviewer with embedded datasets.

      Part A:

      We performed several new experiments, including:

      • To address the primary concern of Reviewer #1 regarding whether the HSD flies have a similar energy deficit to normal food (NF) fed flies, we analyzed stored neutral fat (triacylglycerol, TAG) reserves and how HSD-fed flies mobilize fat stores on starvation. We present these results in Figure 1E-G and S2. These results show that HSD flies, despite accumulating more TAG (S2), break down a similar amount of fat reserves as NF-fed flies on starvation at any time point (Figure 1E-G). This suggests that HSD-fed flies do sense and respond to an energy deficit.
      • To address the concerns of reviewers #2 and #3 on whether Pect genetic manipulation affects specific phospholipid classes, we performed lipidomic analyses. The table below summarizes the 3 new figures and 4 new supplemental figures (blue text marks all new figure numbers and figure panels) and the three new Supplementary files provided as per the reviewers’ request.

      | Figure # | Main point | New datasets in revision | Companion Supplement |
      |---|---|---|---|
      | 1 | HSD alters feeding behavior, but flies still break down TAG on starvation. | TAG storage and breakdown over longitudinal HSD shows that HSD and NF fed flies show similar levels of TAG breakdown on starvation, despite consistently elevated TAG on HSD. This supports the idea that flies do sense starvation even on HSD, but there is an uncoupling of the feeding behavior after Day 14. Revised the data representation of Figure 1 to show non-normalized data over time. S1 and S2 companions are new in the revision. Panels 1D to 1E are new for the revision. | S1: Raw data of feeding events plotted. S2: Elevated TAG at all time points. |
      | 2 | HSD causes insulin resistance | S3A added to show that insulin transcript levels remain the same, in response to reviewer #3’s concerns. | S3 |
      | 3 | Phospholipid concentration raw data from lipidomics on Day 7 and Day 14 HSD suggest that PC, PE levels are increased on Day 14 HSD. | Figure 3 revamped to show new data visualization and non-normalized raw data to address Reviewer #2’s major concerns. S4A and S4B added. In addition, Supplementary Files 1 and 2 provided with raw lipidomics data as per reviewer #2’s request. | S4. S4A: non-normalized raw data of all other lipid classes on HSD. S4B: fatty acid species data on Day 14 added as per request of reviewer #2. |
      | 4 | HSD regulates Apo-I levels in the IPCs and phenocopies Pect KD. | Added Figure 4A to show that HSD phenocopies Pect-KD in terms of delivery to brain | S5: validation of the Apo-I antibody. S6: validation of Pect KD and over-expression, and Pect mRNA level dysregulation on HSD. |
      | 5 | Pect RNAi is insulin resistant | N/A | N/A |
      | 6 | Pect knockdown shows significant increase in LPC and LPE, and a non-significant reduction in PC, PE levels. Specifically, the PE lipid class PE36.2 is downregulated. | Fig 6, S7, S9 are completely new based on reviewer #2 and #3 requests. In addition, Supplementary File 3 provided with raw lipidomics data as per reviewer #2’s request | S7, S8, S9#. S7: new Pect KD other classes. S8: new PE classes for day 14 and Pect associated classes. S9: Pect OE lipidomics |
      | 7 | Pisd and Pect activity in adipocytes are required for hunger-driven feeding behavior in normal diets | Pisd RNAi data was moved from supplement to main figure. | N/A |

      Note on revised text: We have revised the text not only in the results section but, as per reviewer #2’s recommendation, have also revamped our introduction and discussion. Since the manuscript has been significantly revised to include a new main Figure 6, fully altered Figures 1 and 3, and multiple new supplemental figures, the changes in text are extensive. Hence, they are unmarked in the main text. Nonetheless, we hope that the reviewers will be able to evaluate these changes, as we have provided the specific locations in the text and embedded key figures in the point-by-point response below.

      Part B: Point-by-point responses to reviewer comments.

      Reviewer #1 comments in Blue, author response in black.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Kelly et al. show that the difference between the feeding behavior of fed and starved flies (hunger-driven feeding; HDF) is absent in animals fed a high-sugar diet (HSD) for two weeks or more. The disappearance of HDF with HSD coincides with changes in phospholipid profiles caused by HSD. Furthermore, RNAi-mediated downregulation of Pect (a key enzyme in the PE biosynthesis pathway) in the fat body phenocopies physiological effects of HSD. Moreover, downregulation or overexpression of Pect in the fat body abolishes or induces HDF, respectively, independent of HSD treatment.

      Overall, the manuscript is well-written and the phenotypes are clear. However, I have major concerns regarding the authors' interpretation of the data and their conclusion. Most importantly, while it is clear that the authors' high-sugar dietary treatment affects feeding behavior and physiology, I am not convinced that the changes can be considered "hunger-driven"-which is central to the main point of the manuscript. Therefore, it is my recommendation that the authors substantially revise the manuscript by either showing additional/re-analyzed data that rule out alternative hypotheses, or rewriting the manuscript keeping alternative interpretations in mind.

      We are thankful to this reviewer for their thoughtful critique, and for their constructive and specific suggestions on how we can redress these concerns. We have taken on board the concerns of this reviewer regarding our interpretation of whether the changes in feeding behavior can be considered hunger-driven or not. Based on their advice, we have made significant changes by addressing: i) whether HSD increases baseline feeding: we now show non-normalized raw data, and the data support the conclusion that baseline feeding is not higher; ii) whether HSD-fed flies can sense an energy deficit at levels similar to NF-fed flies: we show that HSD flies sense an energy deficit. We have provided a detailed response below, and we hope the reviewer finds that the additional datasets and re-analyzed data are consistent with the interpretation that prolonged HSD dampens starvation-induced feeding. In addition to this key concern, this reviewer has made many other salient points that we have addressed with additional data or by clarifying the text.

      Major comments:

      1) The data do not sufficiently show that the long-term HSD regime disrupts "hunger-sensing." The manuscript should address alternative hypotheses by showing raw instead of normalized data, rewriting the manuscript with a new central conclusion, or running additional experiments that actually show a defect in hunger-driven response.

      a. The main result that the authors rely on for the argument is that the ratio of feeding events between starved and non-starved flies is different between the groups fed normal food or HSD. However, because the authors only show normalized data (normalized to non-starved flies; Fig. 1), it is difficult to tell whether the change is due to chronically increased feeding in non-starved HSD flies (maybe in perpetual hunger-like allostasis) or to a dampened starvation response. Indeed, the data shown in Fig S1 show that flies fed HSD for as short as 5 days show more frequent feeding events compared to age-matched controls fed normal food. It is possible that because the HSD-fed flies eat more than NF-fed flies, even without being starved, the ratio of starved/non-starved feeding is lower in the HSD-fed group due to changes in the denominator, rather than the numerator.

      We have taken on board the concern that presenting only normalized data clouded the interpretation and left open other possibilities. In the completely revised Figure 1 and S1, we now show non-normalized data as a function of time. First, we note that HSD-fed flies do not show higher baseline feeding than NF-fed flies, except on Day 10 of HSD, when there is a modest but significant elevation (Figure 1D).

      Nonetheless, on Day 10 of HSD, flies still display increased hunger-driven feeding (HDF; Figure 1C); it is only after Day 14 that HSD dampens the starvation-induced feeding.
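      The reviewer's denominator concern can be made concrete with a small arithmetic sketch. The feeding-event counts below are hypothetical values chosen purely for illustration (they are not data from the study), and `fold_change` is an illustrative helper name, not a function from the authors' analysis. The sketch shows why a starved/fed ratio alone cannot separate a raised baseline from a dampened starvation response, and why non-normalized counts are needed.

```python
def fold_change(starved_events, fed_events):
    """Ratio of feeding events in starved flies to events in fed (non-starved) flies."""
    if fed_events <= 0:
        raise ValueError("fed_events must be positive")
    return starved_events / fed_events


# Hypothetical counts for illustration only (not measurements from the study).
nf = fold_change(starved_events=40, fed_events=20)            # normal food
baseline_raised = fold_change(starved_events=40, fed_events=40)   # denominator up
response_dampened = fold_change(starved_events=20, fed_events=20)  # numerator down

print(nf, baseline_raised, response_dampened)
```

      In this toy case both alternative scenarios give the same normalized ratio, so only the raw fed and starved counts distinguish them.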

      1. It is also possible that the HSD-fed flies are simply not in as big an energy deficit physiologically, due to the increased fat deposits they've accumulated (as the authors show later in the manuscript). It may take longer for the fat HSD flies to reach substantial energy deficiency than the NF flies, but they still may eventually be able to appropriately respond to hunger, just like NF flies. In such case, it would be a misnomer to call this behavioral change a 'defect in hunger-driven feeding behavior.' Maybe an experiment with a dose-response curve of "hunger driven feeding response" as a function of duration of starvation would help?

      Prompted by this reviewer's question, we asked whether HSD-fed flies, which have a higher baseline neutral fat store (triacylglycerol, TAG) level, can sense an energy deficit. For this, we revisited the longitudinal assays for TAG storage that our lab had generated along with the HSD-HDF studies. We now present this evidence as Figure 1E-1G and Figure S2. Overall, our experiments point to the idea that adult flies fed HSD are able to sense an energy deficit and mobilize TAG stores effectively throughout the 28-day time course that we analysed.

      First as shown in Figure S2, flies fed HSD display an increase in TAG levels. But it is to be noted that while TAG stores increase, the increase is not linear with time. This suggests that adult flies exposed to HSD store excess energy as TAG, but the increased TAG stores stay within a certain range despite the length of HSD exposure. This suggests that adult flies on HSD still display TAG homeostasis.

      Next, to directly address the reviewer's point about HSD-fed flies not sensing an energy deficit, we subjected HSD-fed flies to an overnight starvation, the same regime as used in the overnight feeding experiments, and asked whether they mobilize TAG. We noted that flies exposed to HSD break down TAG throughout the 28-day exposure at statistically significant levels from Day 3 to Day 28, except on Days 14 and 21 (Figure 1F). While there is TAG mobilization on Days 14 and 21, the difference is not statistically significant. Nonetheless, we note the same levels of TAG breakdown for normal lab food (NF) fed flies on Days 14 and 21 (Figure 1E). Overall, HSD-fed flies sense and display an energy deficit, as measured by TAG store mobilization, throughout the 28 days of HSD exposure, at levels comparable to NF-fed flies (Figure 1G).

      Taken together, these results suggest that HSD-fed flies experience an energy deficit on starvation, at levels comparable to NF-fed flies, throughout the 28-day period assayed. However, their starvation-driven feeding response is dampened by Day 14, and by Day 28 the HSD-fed flies display more feeding events than HSD-starved flies. These results are consistent with the interpretation that in HSD-fed flies the starvation-induced feeding behavior becomes desynchronized from the starvation-induced TAG mobilization, suggesting that there is an absence of hunger-driven feeding.

      2) How can you be sure that lower Dilp5 immunofluorescence is indicative of increased Dilp5 secretion? Wouldn't decreased production of dilp5 also have the same results?

      It has been shown previously that HSD-fed larvae are hyperinsulinemic, i.e., they have a 55% increase in circulating Dilp2 (PMID: 22567167). Additionally, we have shown that ectopic activation of the insulin-producing neurons by expressing TRPA1, an ion channel that activates neurons, reduces Dilp5 accumulation without a change in Dilp5 mRNA levels (PMID: 32976758), suggesting that reduced Dilp5 accumulation without alterations to mRNA levels is a proxy for increased secretion. Now, in response to this concern, we have added qPCR data for Dilp2 and Dilp5 to the revised manuscript (Figure S3A), which show no difference in expression levels after 14 days on HSD. Therefore, there is no dip in Dilp5 mRNA production. Given that Dilp2 and Dilp5 mRNA levels remain the same but we see reduced Dilp5 accumulation, we interpret this to mean that Dilp5 secretion is increased.

      1. Also, the authors should state in the main text that it is Dilp5, not just any Dilp.

      Thanks for this suggestion. We have fixed this and now refer to Dilp5 specifically throughout the text in the results section.

      3) Data presentation: a. Sometimes the data are normalized to NF (Fig 4B-C), sometimes not (ex. Fig 4A, S4C). Unless there is a specific rationale for the data transformation, it would be more appropriate to show untransformed data (ex. Fig 4A, S4C), especially as the authors use two-way ANOVA to determine significance. Only showing the differences implies comparison against a hypothetical mean (i.e. μ0=0), not between two group means.

      We thank the reviewers for bringing this issue to our attention. We updated all the figures to show untransformed data in the revised manuscript.

      1. Some figures show both individual data points and summary statistics (mean, SD, ... ex. Fig 2A), which I believe is ideal, but some show only one or the other (ex. Fig 2B, no summary statistics; Fig. 3, no data points). The manuscript would read more convincingly if data visualization were consistent across figures.

      We thank the reviewers for their feedback. We have made changes to all the figures in the revised manuscript to improve visual consistency.

      Minor comments: 1) High sugar diet: what is the actual sugar concentration in the NF v. HSD diets? The authors write that the HSD diet contains "30% more sugar" than the NF, but providing the final sugar concentrations (sucrose or others) would be informative for other scientists studying the effect of high sugar diets.

      We thank the reviewer for their suggestion and have now updated the methods to include this sentence: “After 7 days, flies were either maintained on normal diet or moved to a high sugar diet (HSD), composed of the same composition as normal diet but with an additional 300g of sucrose per liter”.

      1. Additionally, the definition of HSD is inconsistent. Main text (Page 5, line 17) states that their HSD is "60% more sugar than normal media," whereas the figure legend (Fig 1) and the Methods state that the HSD contains "30% more sugar."

      We apologize for this egregious typo! We have now fixed this to say 30% HSD. Only 30% HSD was used throughout this study.

      2) Starvation medium: please provide justification for why the authors used 1% sucrose/agar for starvation medium, instead of plain agar/water that most labs use. At least clarify and provide a reference for the claim that the 1% sucrose/agar "is a minimal food media to elicit a starvation response."

      We are very grateful to this reviewer for identifying this error in the methods description and bringing it to our attention. We used 0% sucrose agar for overnight starvation in this study, as most labs do. The error occurred because we were using another manuscript from the lab to help draft the methods section (PMID: 29017032). In that study, where we assayed the effect of chronic starvation, our lab used “1% sucrose agar for 5 days at 25C”. However, in the current study, because we are testing acute effects of overnight starvation, we used 0% sucrose agar.

      3) Pect mRNA level is higher with HSD. This is surprising because not only, as the authors mention, does increased PC32.2 with HSD suggest lower Pect activity, but also because Pect RNAi phenocopies long-term HSD in HDF behavior, lipid morphology, and FOXO accumulation in fat body. The authors speculate that the data "likely shown an upregulation in an attempt to mediate the Pect dysregulation occurring at the protein level." If that were true, a western blot may be informative. Zhao and Wang (2020, PLoS Genetics) generated a Pect antibody that seems compatible with western blot applications. That being said, I don't think such data is critical for the manuscript. I mention this simply as a suggestion for the authors.

      a. page 8, line 22-23, did you mean to write "Given how PC32.2 is elevated after 14 days of exposure to HSD, we assumed that Pect levels would be low for flies under HSD," not "high?" Otherwise the subsequent 2 sentences don't make sense.

      We agree that the most confusing aspect of the study was that Pect mRNA levels were very high on Day 14 HSD, yet the effects of Pect-KD phenocopied HSD. To resolve this, we have now performed lipidomic analyses on whole adult flies in which Pect is knocked down (KD) by RNAi in the fat tissue. We now present a new dataset in Figure 6. Two striking changes occur. They are:

      1. Pect-KD shows increase in the phospholipid classes LPC and LPE (Figure 6A). In contrast, LPE is significantly downregulated on HSD Day 14 (Figure 3).
      2. Pect-KD shows a significant reduction in a specific PE species, PE 36.2 (Figure 6B). Our data regarding the decrease in PE 36.2 agree with a previous lipidomic analysis of Pect mutant retina (PMID: 30737130). In contrast, PE 36.2 trends upwards on 14-day HSD (Figure S7C), though not significantly.

      On 14-day HSD, consistent with the extreme upregulation of Pect mRNA in HSD-fed flies (Figure S6A; Pect mRNA 200-250 fold), PE trends upwards (Figure 3) and PE 36.2 trends higher (Figure S7C). We note that, on the surface, the PE and LPE changes contrast between the 14-day HSD lipidome and fat-specific Pect-KD. But there is a significant commonality: under both states there is an imbalance of the phospholipid classes PE and LPE. Hence, we propose that maintaining the compositional balance of the phospholipid classes PE and LPE is critical to hunger-driven feeding and insulin sensitivity. Hence, either an increase or a decrease of these key phospholipid species may lead to abnormal hunger-driven feeding.

      We agree that a western blot would be informative as well, but we were unable to obtain the reagent from Dr. Wang’s group, precluding us from performing this request. See email snapshot.

      To ensure that we appropriately discuss and clarify this issue, we have now included a section in the discussion (Page 14, Lines 26-34) under the subtitle “The implications of relationship between Pect levels and HSD”. We have pasted an excerpt from that subsection below for this reviewer's assessment.

      “Also, we note that over-expression of Pect cDNA in the fat-body does not alter phospholipid balance (Figure S9) and indeed improves HDF on HSD (Figure 7B). While this may appear inconsistent, it is critical to note that over-expression of Pect cDNA using UAS/Gal4 only increases Pect mRNA expression by 7-fold (Figure S6A), whereas HSD causes its upregulation by 250-fold (Figure S6B). Hence, we speculate that an increased ‘basal’ level of Pect such as by that provided by a cDNA over-expression in fat, may be protective to the negative effects of HSD (Figure 7B) without affecting overall phospholipid levels (Figure S9), but extreme upregulation Pect on HSD affects the PE and LPE balance (Figure 3).”

      Reviewer #1 (Significance (Required)):

      The work is potentially novel and interesting, but at this stage it's difficult to interpret what the phenotype signifies. Although the manuscript could be revised simply by modifying the text, experimentally addressing the concerns would significantly improve the work.

      In sum, we hope we have addressed the key concern for Reviewer #1 as to whether the behavior we report here is indeed a dampening of starvation-induced feeding, or an effect of increase in baseline feeding. We hope that by reviewing our non-normalized data, they can appreciate that it is the former. Also, we hope that Reviewer #1 appreciates that we have strived to address the concerns by additional experiments, to clarify our findings and improve the impact of the work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This intriguing manuscript by Kelly and colleagues uses the fruit fly Drosophila melanogaster as a model to understand how diet-induced obesity alters the feeding response over time. In particular, the authors' findings indicate that chronic exposure to a high-sugar diet significantly alters the starvation-induced feeding response. These behavioral studies are complemented by a lipidomics approach that reveals how a chronic high-sugar diet affects many lipid species, including phospholipids. The authors then pursue mechanistic studies that indicate phospholipid metabolism within the fat body appears to remotely affect insulin secretion from the insulin producing cells. Moreover, the changes in phospholipid abundance are associated with changes in insulin-signaling, including increased insulin secretion from the IPCs and elevated levels of FOXO within the nucleus.

      I find the study to be potentially very important - the authors combine a longitudinal study that would be difficult in any other model with the powerful genetic tools available in the fly. The conclusions are mostly convincing, but a few follow-up experiments are required:

      We are grateful for the reviewer's constructive, detail-oriented, and balanced feedback, and their recognition of the value of this study. We have now performed additional experiments to address the key concerns raised by all reviewers. We hope that, on reading the revised version of our study, the reviewer continues to feel positive about the message of this study and its potential impact.

      1. The key conclusions from the manuscript assume that manipulation of Pect expression levels alters phosphatidylethanolamine (PE) levels. However, the authors make no attempt to verify that the genetic experiments described herein actually affect PE levels. At a minimum, changes in PE levels should be verified for the Pect knockdown and overexpression lines. Similarly, there is no evidence that manipulation of either EAS or Pcyt2 induces the expected metabolic effects. I'm not asking that the longitudinal feeding experiments be repeated, simply that the authors measure the relevant lipid species, preferably with a targeted LC-MS approach.

      Prompted by this reviewer, we performed targeted LC-MS on whole adult flies, on normal diet, to assess lipid levels for fat-specific Pect-KD and overexpression. We decided to focus on Pect, as its knock-down even on normal diet causes a dampened hunger-driven feeding behavior (Figure 7A) and phenocopied a 14-day HSD feeding phenotype.

      We now present a new dataset in Figure 6. Two striking changes occur:

      • Pect-KD shows a significant reduction in a specific PE species, PE 36.2 (Figure 6B). Our data regarding the decrease in PE 36.2 agree with a previous lipidomic analysis of Pect mutant retina (PMID: 30737130). It is to be noted that though overall levels of all PE species trend downwards, like the Clandinin lab study on Pect (PMID: 30737130), we did not find a significant change in the overall PC and PE levels.

      • Pect-KD shows an increase in the phospholipid classes LPC and LPE (Figure 6A). In contrast, LPE is significantly downregulated on HSD Day 14 (Figure 3).

      On 14-day HSD, consistent with the extreme upregulation of Pect mRNA in HSD-fed flies (Figure S6A; Pect mRNA 200-250 fold), PE trends upwards (Figure 3) and PE 36.2 trends higher (Figure S7C). We note that, on the surface, the PE and LPE changes contrast between the 14-day HSD lipidome and fat-specific Pect-KD. But there is a significant commonality: under both states there is an imbalance of the phospholipid classes PE and LPE. Hence, we propose that maintaining the compositional balance of the phospholipid classes PE and LPE is critical to hunger-driven feeding and insulin sensitivity. Hence, either an increase or a decrease of these key phospholipid species may lead to abnormal hunger-driven feeding.

      Finally, fat-specific Pect-OE did not cause significant changes to lipid species (Figure S9). This could be because the fat-specific Pect-OE flies were assayed under normal food, or because we were assaying whole-body lipid levels and not fat-specific lipid changes. Against the latter explanation, even a 60% reduction in Pect mRNA levels (Figure S6A) was sufficient to produce an effect on whole-body phospholipid balance (Figure 6). Hence, we speculate that maintaining a basally higher Pect level (7-fold higher Pect mRNA; Figure S6A) might allow 14-day HSD-fed flies to buffer the negative effects of HSD, and we predict that it might take longer to disrupt the phospholipid balance and HDF response.

      We have now included a section in the discussion (Page 14, Lines 26-34) under the subtitle “The implications of relationship between Pect levels and HSD”. We have pasted an excerpt from that subsection below for this reviewer's assessment.

      “Also, we note that over-expression of Pect cDNA in the fat-body does not alter phospholipid balance (Figure S9) and indeed improves HDF on HSD (Figure 7B). While this may appear inconsistent, it is critical to note that over-expression of Pect cDNA using UAS/Gal4 only increases Pect mRNA expression by 7-fold (Figure S6A), whereas HSD causes its upregulation by 250-fold (Figure S6B). Hence, we speculate that an increased ‘basal’ level of Pect such as by that provided by a cDNA over-expression in fat, may be protective to the negative effects of HSD (Figure 7B) without affecting overall phospholipid levels (Figure S9), but extreme upregulation Pect on HSD affects the PE and LPE balance (Figure 3).”

      A central hypothesis in the study is that the HSD over a period of 14 days results in insulin resistance and that these changes are leading to changes in hunger-dependent feeding. I would encourage the authors to determine if Foxo mutants are resistant to these HSD-induced effects on HDF.

      We thank the reviewers for this suggestion. However, given that dFOXO nuclear localization rather than expression levels regulate insulin sensitivity, we feel that disrupting dFOXO levels via mutation or knockdown will produce a plethora of indirect effects including developmental abnormalities (PMID: 24778227, PMID: 16179433, PMID: 29180716, PMID: 12893776). Our data suggest that chronic HSD treatment and Pect affect insulin sensitivity in fat tissue. However, we feel that investigating whether insulin sensitivity/FOXO signaling in fat tissue regulates feeding behavior is outside the scope of our work.

      1. In lines 25-30, the authors draw the conclusion that an increase in unsaturated fatty acid species is associated with the HSD and that these changes result in a more fluid lipid environment. While I agree with the model, the manuscript contains no evidence to support such a model. Either test the hypothesis or move the last line of the section to the discussion.

      We thank the reviewer for this important and insightful comment. We agree that the interpretation we presented and discussed in the original version is, at the moment, speculative. Testing the hypothesis that an increase in unsaturated fatty acid species results in a more fluid lipid environment will require us to build tools and expertise. Hence, this hypothesis is better suited for exploration in a future study. Given this, we have moved this out of the results section into the Discussion section titled “HSD and fat-specific PECT-KD causes changes to phospholipid profile” (see excerpt below from page 13, lines 24-35).

      “In addition to changes in phospholipid classes, we found that HSD caused an increase in the concentration of PE and PC species with double bonds (Figure S4C and S4D). Double bonds create kinks in the lipid bilayer, leading to increased lipid membrane fluidity which impacts vesicle budding, endocytosis, and molecular transport (14, 92). Hence it is possible that a mechanism by which HSD induces changes to signaling is by altering the membrane biophysical properties, such as by increased fluidity, which would have a significant impact on numerous biological processes including synaptic firing and inter-organ vesicle transport.”

      Also, as per the reviewer’s guidance, given that we are speculating here, we have also shifted this dataset from Main figure 4 to supplement S4C and S4D.

      In addition, lines 25-30 state that FFAs are increased after 14 days of a HSD. Figure 3A shows the exact opposite - FFAs are significantly decreased in 14 day fed animals despite being elevated in the 7 day fed animals. This is an interesting result that warrants discussion. Moreover, I would encourage the authors to examine the lipidomic data more carefully to ensure that the text accurately portrays the lipid profiles.

      We apologize for misstating that FFAs are increased on 14-day HSD in lines 25-30. It was an error and we have corrected this. We agree with the reviewer that the reduction of FFA on Day 14-HSD is an intriguing and unexpected observation that needs to be emphasized and further discussed. To this end, we have added Figure S4B, wherein we have provided the difference in FFA concentration (by species) after days 7 and 14.

      Furthermore, we have discussed the potential meaning of reduced FFA at Day 14 on page 12, lines 19-27 of the Discussion section titled “HSD and fat-specific PECT-KD causes changes to phospholipid profile”. We have stated the following:

      We speculate that this reduction in FFA may be due to their involvement in TAG biogenesis (PMID: 13843753). We were interested to see if the decrease in FFA correlated to a particular lipid species, as PE and PC are made from DAGs with specific fatty acid chains. However, further analysis of FFAs at the species level did not reveal any distinct patterns. The majority of FFA chains decreased in HSD, including 12.0, 16.0, 16.1, 18.0, 18.1, and 18.2 (Figure S4B). This data was more suggestive of a global decrease in FFA, likely being converted to TAG and DAG, rather than a specific fatty acid chain being depleted.”

      The processed lipidomics data should also be included as supplementary data table so that they can be independently analyzed by the reader.

      We thank the reviewer for this suggestion. As per the reviewer's request, we have included the raw data as an attachment in our supplementary material (Supplementary Files 1-3), so that interested readers can use the datasets generated in this study for future work and further analysis.

      Beyond these experimental suggestions, the manuscript needs significant editing for clarity. While I won't provide a comprehensive list, the authors need to provide accurate descriptions and annotation of genotypes (including w[1118], which is written as W1118), typos, and formatting. I've listed a few examples below:

      1. Page 3, Line 1 and 2: "...have been shown to impact feeding behavior and metabolism that leads to..." This is an awkward and grammatically incorrect sentence.
      2. Page 3, Lines 7-32 is one very large paragraph but contains concepts that should be broken down over at least three paragraphs.
      3. Page 3, Line 25: A description of the reaction catalyzed by Pect would be helpful for a manuscript focused on Pect activity.
      4. Page 4, Line 10: "previously characterized method of eliciting diet induced feeding behavior." As stated in the text, the method is previously described yet the manuscript characterizing the method isn't cited.
      5. Figure legend 3 contains a random assortment of capitalized lipid species. Also, the names of lipid species are inappropriately broken into multiple names. Please use correct nomenclature throughout the manuscript.

      The list above is nowhere near comprehensive. The manuscript requires significant editing.

      We are grateful to the reviewer for drawing our attention to these errors. We have made significant edits to the revised manuscript to address the above-mentioned concerns, made additional textual changes throughout, and copyedited it. We hope that the reviewer will find that the manuscript reads better and that its clarity and precision are significantly improved.

      Reviewer #2 (Significance (Required)):

      I find the study to be potentially very important - the authors combine a longitudinal study that would be difficult in any other model with the powerful genetic tools available in the fly. The findings will significantly advance our understanding of how lipid metabolism links dietary nutrition with feeding behavior.

      Once again, we are grateful for this reviewer’s thoughtful critique and encouraging words regarding our work and its potential impact.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript uses Drosophila to investigate how diet-induced obesity and the changes in the lipid metabolism of the fat body modulate hunger-driven feeding (HDF) response. The authors first demonstrate that chronic exposure (14 days) to a high sugar diet (HSD) suppresses HDF response. Through lipidome analysis, the authors identify a specific class of lipids to be elevated upon chronic HSD feeding. This coincided with the changes in expression of Pect, an enzyme that regulates the biosynthesis of these lipids. Modulating the expression of Pect specifically in the fat body affected HDF response.

      We thank this reviewer for their rigorous and thoughtful critique and for identifying a key issue with our original study, namely that Pect mRNA levels are elevated on 14-day HSD while Pect-KD nonetheless phenocopies the HSD-induced loss of HDF. Now, by performing whole-body adult fly lipidomics on fat-specific Pect-KD, we have resolved this issue and provided clarity on the role of Pect in maintaining phospholipid homeostasis, which in turn impacts hunger-driven feeding. We hope the reviewer finds that the revised manuscript provides further clarity on the functional link between Pect’s role in the fat body and hunger-driven feeding.

      Major comments: The authors claim that the HDF response in HSD is distinct between early (5d, 7d) and chronic (day 14) HSD feeding. However, the data seem to indicate that HDF response is significantly decreased at all time points in HSD. For example, at day 5 HDF response was increased only 3-fold in HSD (Figure 1C) compared to around 50-fold increase in NF (Figure 1B). The scale of the Y-axis in Figure 1B and 1C is an order of magnitude different. Including the starved data (NFstv and HSDstv) in Figure S1, normalized to NF fed group, would better visualize the overall trends. Related to this, having the source data for the actual number of feeding events would be useful (e.g., to see the baseline changes in feeding in different time points in Figure 1 and the effect of genetic manipulations in Figure 7).

      As per the reviewer's request, we now have modified our graphs to show source data (Figure S1) and show the raw feeding events.

      Then in the non-normalized graphs we plot, over a longitudinal time course, baseline and hunger-driven feeding events (Figure 1B-D). We also show that HSD fed flies do not display increased baseline feeding (Figure 1D), suggesting that the effects we see on HDF are not clouded by increased baseline feeding.

      Yes, the reviewer makes an important point that HDF response on HSD fed flies is of a lower magnitude than NF fed flies. We think that is a biologically meaningful observation, as it suggests that flies have a remarkably fine-tuned ability to coordinate food-intake with nutrient store levels.

      Now we have included a paragraph in the Discussion, Page 11 Lines 23-27, that says the following to ensure the readers appreciate this salient point raised by this reviewer.

      *It is to be noted that the HDF response of HSD-fed flies (Figure 1C, Days 3-10) is of lower order of magnitude than the NF-fed flies. This suggests that in addition to sensing an energy deficit and mobilizing fat stores (Figure 1F, 1G, S1), HSD fed flies calibrate their starvation-induced feeding to compensate only for the lost amount of fat. Overall, this suggests that flies have a remarkably fine-tuned ability to coordinate food-intake with nutrient store levels.*

      The association between fat body Pect level and phospholipid levels is not clear. Day 14 of HSD feeding shows high expression of Pect in the fat body and elevated levels of PC32.0 and PC32.2. The authors assume the high expression of Pect in the fat body is due to the compensatory response, but there are no data indicating downregulation of Pect levels at the earlier time points of HSD feeding. A previous study demonstrated that Pect mutant flies have lower levels of PC32.0 but higher PC32.2 (PMID: 30737130).

      We agree that one puzzling aspect of the original version of this study was that Pect mRNA levels were very high on Day 14 HSD, but the effects of Pect-KD nonetheless phenocopied HSD. To resolve this, prompted by the concerns of Reviewers #2 and #3, for this revised version we have now performed lipidomic analyses on whole adult flies in which Pect is knocked down (KD) by RNAi in the fat tissue. We now present a new dataset in Figure 6. Two striking changes occur. They are:

      1. Pect-KD shows an increase in the phospholipid classes LPC and LPE (Figure 6A). In contrast, LPE is significantly downregulated on HSD Day 14 (Figure 3).
      2. Pect-KD shows a significant reduction in a specific PE species, PE 36.2 (Figure 6B). Our data regarding increase in PE 36.2 agree with a previous lipidomic analyses of Pect mutant retina (PMID: 30737130). In contrast, PE 36.2 trends upwards on 14-day HSD (Figure S7C), though not significantly. On 14-day HSD, consistent with the extreme upregulation of Pect mRNA in HSD-fed flies (Figure S6A; Pect mRNA 200-250 fold), PE trends upwards (Figure 3) and PE 36.2 trends higher (Figure S7C). We note that, on the surface, the PE and LPE changes per se contrast between the 14-day HSD lipidome and fat-specific Pect-KD. But there is a significant commonality: under both states there is an imbalance of the phospholipid classes PE and LPE. Hence, we propose that maintaining the compositional balance of phospholipid classes PE and LPE is critical to hunger-driven feeding and insulin sensitivity, and that either an increase or a decrease of these key phospholipid species may lead to abnormal hunger-driven feeding.

      On day 14, HDF response was increased 70-fold in w1118 flies in NF (Figure 1B; w1118), but only 2.5-fold in lpp>LucRNAi control flies in NF (Figure 7A). This suggests that lpp-gal4 driver lines have a significant effect on HDF response. Using a different fat-body specific Gal4 line would be necessary to validate conclusions.

      Regarding the reduced HDF magnitude, in our experience using UAS-Gal4 consistently reduces the HDF response magnitude, which cannot be compared to that of w1118, which is more robust. To account for background differences, we use UAS-Gal4 with control RNAi. It clearly shows differences in HDF response on starvation, but Pect and Pisd RNAi do not (Figure 7A). Hence, given that this experiment internally controls for any changes in HDF response for UAS-Gal4>RNAi, we conclude that HDF response is disrupted in Pect and PISD KD (Figure 7).

      We only presented the Lpp-driver in our study, as this driver is the only fat-specific driver that has no leaky expression in other tissues, and is specific to fat as the apolpp promoter used to generate this Gal4 line is only expressed in fat tissue (Eaton and colleagues, PMID: 22844248). Other widely used fat-specific drivers, including the pumpless-Gal4 (ppl-Gal4) driver, have leaky expression in gut or other tissues (see Table 2 of this detailed study by Dr. Drummond-Barbosa https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7642949/). If the reviewer is aware of a fat-specific Gal4 line, other than Lpp-Gal4, which has a highly specific expression in the fat tissue without leaky expression in other tissues, then we are happy to take onboard the reviewer’s suggestion and try the fat-specific Gal4 that they suggest.

      HSD feeding promotes Pect expression (Figure S3C) and global changes in phospholipid levels (Figure 3, 4). Therefore, shouldn't Pect overexpression (not Pect RNAi) in a normal diet mimic HSD feeding state and promote loss of HDF response? Conversely shouldn't knockdown of Pect in HSD rescue loss of HDF response?

      We agree that a puzzling aspect is that Pect mRNA levels are significantly elevated in HSD Day-14, but Pect-KD displays the inappropriate HDF response. As we have described in our response to this reviewer on Page 19, we believe that Pect-KD and HSD disrupt PE and LPE balance overall but in different ways, whereas Pect-OE using cDNA expression in fat body does not cause a significant change to any lipid class (Figure S9), and our results suggest that a basally higher level of PECT is likely to be protective on HSD with respect to HDF (Figure 7B).

      To ensure that we appropriately discuss and clarify this issue, we have now included a section in the discussion - Page 14 Lines 26-33 - under the subtitle “The implications of relationship between Pect levels and HSD”. We have pasted an excerpt from that subsection below for this reviewer's assessment.

      Also, we note that over-expression of Pect cDNA in the fat-body does not alter phospholipid balance (Figure S9) and indeed improves HDF on HSD (Figure 7B). While this may appear inconsistent, it is critical to note that over-expression of Pect cDNA using UAS/Gal4 only increases Pect mRNA expression by 7-fold (Figure S6A), whereas HSD causes its upregulation by 250-fold (Figure S6B). Hence, we speculate that an increased ‘basal’ level of Pect, such as that provided by cDNA over-expression in fat, may be protective against the negative effects of HSD (Figure 7B) without affecting overall phospholipid levels (Figure S9), but extreme upregulation of Pect on HSD affects the PE and LPE balance (Figure 3).”

      We would have liked to test Pect protein expression on HSD, but we were unable to access the antibodies for Pect published in a prior study (PMID: 33064773) from Dr. Wang’s lab (see Page 10-11 of the response to Reviewer #1). Hence, we were unable to test how the protein levels of Pect correlate with the 250-fold increase in mRNA expression.

      In conclusion, we hope the reviewer appreciates that our results regarding Pect function are consistent with the main conclusion that achieving the right phospholipid balance between PE and LPE, is critical for an organism to display an appropriate HDF response.

      Minor comments: All graphs should plot individual data points and be shown as box and whisker plots as much as possible.

      Thanks for this suggestion; we have added individual data points to the vast majority of figures in the paper. We have made exceptions for graphs such as those in Figure 1 and Figure S4B-D, where we find individual data points add an unnecessary layer of complexity. We hope these changes provide additional clarity and strength to the claims made in this manuscript.

      Data for day 14 missing in Figure S4A and S4B.

      We have provided Day 14 for the PC composition and PE composition, due to changes in Figures, they are now S7A and S7B.

      Reviewer #3 (Significance (Required)):

      The interactions between diet-induced obesity, peripheral tissue homeostasis and feeding behavior are an interesting topic that can be addressed using Drosophila. This manuscript demonstrates how fat body Pect levels affect HSD induced changes in hunger-driven feeding response. However, at this point, the functional association between fat body Pect level, global phospholipid level, and loss of hunger-driven feeding response in chronic HSD feeding is not clear.

      We hope the revised data and discussion of the paper provide a well-substantiated functional association on the importance of maintaining phospholipid balance, driven by the Pect enzyme, as a critical regulator of hunger-driven feeding behavior. As stated in the revised discussion, the key take-home message of our manuscript is that on prolonged HSD exposure PC, PE and LPE levels are dysregulated, and that this loss of phospholipid homeostasis coincides with a loss of hunger-driven feeding. Following this lead on phospholipid imbalance, we then uncovered a critical requirement for the activity of the rate-limiting PE enzyme PECT within the fat tissue in controlling hunger-driven feeding.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Kelly et al. show that the difference between the feeding behavior of fed and starved flies (hunger-driven feeding; HDF) is absent in animals fed a high-sugar diet (HSD) for two weeks or more. The disappearance of HDF with HSD coincides with changes in phospholipid profiles caused by HSD. Furthermore, RNAi-mediated downregulation in the fat body of PECT, a key enzyme in the PE biosynthesis pathway, phenocopies physiological effects of HSD. Moreover, downregulation or overexpression of PECT in the fat body abolishes or induces HDF, respectively, independent of HSD treatment.

      Overall, the manuscript is well-written and the phenotypes are clear. However, I have major concerns regarding the authors' interpretation of the data and their conclusion. Most importantly, while it is clear that the authors' high-sugar dietary treatment affects feeding behavior and physiology, I am not convinced that the changes can be considered "hunger-driven"-which is central to the main point of the manuscript. Therefore, it is my recommendation that the authors substantially revise the manuscript by either showing additional/re-analyzed data that rule out alternative hypotheses, or rewriting the manuscript keeping alternative interpretations in mind.

      Major comments:

      1. The data do not sufficiently show that the long-term HSD regime disrupts "hunger-sensing." The manuscript should address alternative hypotheses by showing raw instead of normalized data, rewriting the manuscript with a new central conclusion, or running additional experiments that actually show a defect in hunger-driven response.
        • a. The main results that the authors rely on for the argument is that the ratio of feeding events that the starved and non-starved flies eat is different between the groups fed normal or HSD. However, because the authors only show normalized data (normalized to non-starved flies; Fig. 1), it is difficult to tell whether the change is due to a chronically increased feeding in non-starved HSD flies-maybe in perpetual hunger-like allostasis-or dampened starvation response. Indeed, the data shown in Fig S1 show that flies fed HSD for as short as 5 days show more frequent feeding events compared to age-matched controls fed normal food. It is possible that because the HSD-fed flies eat more than NF-fed flies, even without being starved, the ratio of starved/non-starved feeding is lower in the HSD-fed group-due to changes in the denominator, rather than the numerator.
        • b. It is also possible that the HSD-fed flies are simply not in as big an energy deficit physiologically, due to the increased fat deposits they've accumulated (as the authors show later in the manuscript). It may take longer for the fat HSD flies to reach substantial energy deficiency than the NF flies, but they still may eventually be able to appropriately respond to hunger, just like NF flies. In such case, it would be a misnomer to call this behavioral change a 'defect in hunger-driven feeding behavior.' Maybe an experiment with a dose-response curve of "hunger driven feeding response" as a function of duration of starvation would help?
      2. How can you be sure that lower Dilp5 immunofluorescence is indicative of increased Dilp5 secretion? Wouldn't decreased production of dilp5 also have the same results?
        • a. Also, the authors should state in the main text that it is Dilp5, not just any Dilp.
      3. Data presentation:
        • a. Sometimes the data are normalized to NF (Fig 4B-C), sometimes not (ex. Fig 4A, S4C). Unless there is a specific rationale for the data transformation, it would be more appropriate to show untransformed data (ex. Fig 4A, S4C), especially as the authors use two-way ANOVA to determine significance. Only showing the differences implies comparison against a hypothetical mean (i.e. μ0=0), not between two group means.
        • b. Some figures show both individual data points and summary statistics (mean, SD, ... ex. Fig 2A), which I believe is ideal, but some show only one or the other (ex. Fig 2B, no summary statistics; Fig. 3, no data points). The manuscript would read more convincingly if data visualization were consistent across figures.
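
      The denominator concern in comment 1a can be illustrated with a minimal numerical sketch. All feeding-event counts below are hypothetical values chosen only for illustration and are not taken from the manuscript; `fold_change` is an illustrative helper name, not a function from any analysis pipeline used by the authors.

```python
def fold_change(starved_events: float, fed_events: float) -> float:
    """Starved feeding events normalized to the non-starved (fed) baseline."""
    if fed_events <= 0:
        raise ValueError("baseline feeding events must be positive")
    return starved_events / fed_events


# Hypothetical counts (NOT from the manuscript): both groups show the same
# number of feeding events after starvation, but the HSD-fed group has a
# higher non-starved baseline.
nf_fed, nf_starved = 2.0, 100.0
hsd_fed, hsd_starved = 40.0, 100.0

nf_ratio = fold_change(nf_starved, nf_fed)
hsd_ratio = fold_change(hsd_starved, hsd_fed)

print(f"NF fold change:  {nf_ratio:.1f}")
print(f"HSD fold change: {hsd_ratio:.1f}")
```

      In this made-up case the normalized response of the HSD group is much smaller even though the starved feeding counts are identical, which is why the raw (non-normalized) counts are needed to distinguish a raised baseline from a dampened starvation response.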

      Minor comments:

      1. High sugar diet: what is the actual sugar concentration in the NF v. HSD diets? The authors write that the HSD diet contains "30% more sugar" than the NF, but providing the final sugar concentrations-sucrose or others-would be informative for other scientists studying the effect of high sugar diets.
        • a. Additionally, the definition of HSD is inconsistent. Main text (Page 5, line 17) states that their HSD is "60% more sugar than normal media," whereas the figure legend (Fig 1) and the Methods state that the HSD contains "30% more sugar."
      2. Starvation medium: please provide justification for why the authors used 1% sucrose/agar for starvation medium, instead of plain agar/water that most labs use. At least clarify and provide a reference for the claim that the 1% sucrose/agar "is a minimal food media to elicit a starvation response."
      3. PECT mRNA level is higher with HSD. This is surprising not only because, as the authors mention, increased PC32.2 with HSD suggests lower PECT activity, but also because PECT RNAi phenocopies long-term HSD in HDF behavior, lipid morphology, and FOXO accumulation in fat body. The authors speculate that the data "likely shown an upregulation in an attempt to mediate the PECT dysregulation occurring at the protein level." If that were true, a western blot may be informative. Zhao and Wang (2020, PLoS Genetics) generated a PECT antibody that seems compatible with western blot applications. That being said, I don't think such data is critical for the manuscript. I mention this simply as a suggestion for the authors.
        • a. page 8, line 22-23, did you mean to write "Given how PC32.2 is elevated after 14 days of exposure to HSD, we assumed that PECT levels would be low for flies under HSD," not "high?" Otherwise the subsequent 2 sentences don't make sense.

      Significance

      The work is potentially novel and interesting, but at this stage it's difficult to interpret what the phenotype signifies. Although the manuscript could be revised simply by modifying the text, experimentally addressing the concerns would significantly improve the work.

      The co-reviewer and I have expertise in Drosophila neurobiology and behavior.

      Referees cross-commenting

      Hi all, although the reviews hit upon some overlapping, but mostly different points, I agree with all of the concerns raised. There's some really interesting stuff here but some of the results, as presented, don't make sense. It's possible this will be clarified by revising the text, although I suspect it's more likely that the authors will have to add a number of the experimental suggestions made by the reviewers.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      *Reviewers' comments in italics*

      We thank all reviewers for their positive and encouraging comments and criticisms to improve our work. Here we present a revised version of the manuscript according to the comments raised.

      • Reviewer #1 (Evidence, reproducibility and clarity (Required)): This is an interesting paper that identifies Tns3 as a potential effector of oligodendrocyte differentiation based on an ingenious strategy comparing regulatory binding sites of known master regulators of differentiation, and then shows using in vivo genetics that this role is indeed correct. Next, a potential mechanism is identified by showing co-localization with beta 1 integrin, known to regulate apoptosis of newly-formed oligodendrocytes. The results are well illustrated and the experiments performed with appropriate power using a broad range of techniques that combine in silico, in vitro and in vivo work to great effect.

      I think this represents an important contribution that will be of significant interest to neuroscientists - the mechanisms regulating oligodendrocyte generation remain poorly understood and the evidence that this contributes to adult learning (adaptive myelination) and CNS regeneration makes this a key question. I would suggest that the following are considered before publication:

      We thank the reviewer for these positive comments and critiques to improve the manuscript.

      The work describing the KO mice that were not used as they proved unsuitable need not be described - it breaks the logical flow.*

      In agreement with the reviewer's comment, we have reduced this part to a short paragraph indicating that our analyses of several Tns3 constitutive KO lines showed developmental lethality and possible genetic compensation in Tns3 expression, leading us to conclude that they are inappropriate tools to study Tns3 function in oligodendrogenesis. We have summarized the data in Fig. S7 and the description in the method section.

      It would be useful to compare the extent of cell death in the Tns3 cKO mice with that described in the alpha6 integrin KO and the integrin beta1 cKO (the Colognato and Benninger papers). Do they match? If not (and I suspect the Tns3 cKO death is greater) could other mechanisms be downstream of the Tns3?

      In agreement with the reviewer comment, we have added the following paragraph to the discussion:

      ‘Knockout mice for integrin-a6 present a 50% reduction in brainstem MBP+ OLs at E18.5, just before they die at birth, accompanied by an increase in TUNEL+ dying OLs (Colognato et al, 2002). Similarly, conditional deletion of integrin-b1 in immature OLs by Cnp-Cre also leads to a 50% reduction in cerebellar OLs at P5, with a parallel increase in TUNEL+ dying OLs (Benninger et al., 2006). Therefore, given that Tns3-induced deletion in postnatal OPCs also leads to 40-50% reduction in OLs in both grey and white matter regions of the postnatal telencephalon (this study), paralleled by similar increase in TUNEL+ apoptotic oligodendroglia, we suggest that Tns3 is required for integrin-b1 mediated survival signal in immature oligodendrocytes.’

      I'm not sure why the authors argue that the activation of beta 1 would not be an informative experiment? This will regulate actin dynamics just as it regulates other integrin signaling pathways. Indeed, I would argue that an integrin activation experiment would be a neat way to prove mechanism (as it would be predicted to rescue the Tns3 cKO phenotype).

      In agreement with the reviewer's comment, we have removed this sentence: ‘If so, exogenous activation of integrin a6b1 in cultured OPCs by Mn2+ (Colognato et al., 2004) would not be expected to increase oligodendrogenesis in Tns3-iKO oligodendroglia.’

      In an effort to understand Tns3 function by acute Tns3 deletion in postnatal OPCs, we have compared the transcriptome of Tns3-iKO oligodendroglia to that of control cells. We present these results in Figure 7, pinpointing deregulated genes leading to reduced oligodendroglial differentiation, integrin dysregulation, increased apoptosis, and conflicting cell cycle signaling, and leaving for further studies the full characterization of how the loss of Tns3 leads to the deregulation of these processes.

      Can the authors provide any data on GM oligos and their OPCs? Is the requirement for Tns3 the same, and if so what might the implications be in the adult where new oligodendrocytes are being generated throughout life?

      Indeed, in our analyses of Tns3-iKO mice, we provide quantifications of the cortex as a grey matter territory, showing a similar 40-50% reduction in OLs as in white matter areas (corpus callosum and fimbria) and mixed regions such as the striatum.

      I note in S13 that integrin beta1 is not highly expressed in human oligos at the time in question. Does this call into question the relevance for human disease?

      We realize that scRNAseq plots are never easy to interpret, but it is important to note that the levels of expression are coded by the intensity of the color scale, while the surface of each dot indicates the experimental sensitivity to detect transcript expression in a larger or smaller proportion of the cells in a given cluster/cell type (due to the drop-out limitation of current single cell RNA-seq technologies). Considering this, please note that beyond a stronger expression in neural progenitor cells (NPCs, blue color), integrin-b1 (Itgb1) transcripts are expressed at medium to high levels (green to blue) in human immature OLs (Fig. S13B), similar to their pattern of expression in mouse oligodendroglia (Fig. S13A).

      Reviewer #1 (Significance (Required)): See above

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *In this article, the authors identify and characterise Tensin3 (Tns3) as a target of key oligodendroglial transcription factors driving differentiation in the mouse. They use multiple transgenic models to describe loss of function, and suggest Tns3's action through integrin B1 signalling, with the key function being oligodendroglial survival.

      There is extensive and impressive work here, including identification of Tns3 by ChIPseq, expression of Tns3 in brain development, analysis of human (ES-derived) and mouse scRNAseq to infer timing of expression in the differentiation pathway, generation of V5-tagged Tns3-KI mice to overcome antibody limitations, identification of its expression in mouse remyelination, generation of a new Tns3KO mouse, in vivo Crispr Tns3KO in development, generation of a conditional KO, for deletion in adulthood, and finally some culture work to investigate potential mechanisms of actions. The bottom line is that Tns3 is required for survival of OPCs and immature oligodendrocytes in development/remyelination in mouse at least, and loss leads to apoptosis (through p53 increase and loss of integrin-B1 signalling), leading to a failure of proper differentiation.

      The experiments are carefully done, convincing and the tools generated impressive. There is clearly more to be done on clarifying the mechanism of action of Tns3, but I do not think further experiments on this topic are needed for this paper - they can wait for the next.*

      We thank the reviewer for the positive and encouraging reviewing comments. In an effort, to understand Tns3 function by acute Tns3-deletion in postnatal OPCs, we have compared the transcriptome of Tns3-iKO oligodendroglia compared to control cells, and we present these results in figure 7 pinpointing deregulated genes leading to reduced oligodendroglial differentiation, integrin dysregulation, increase apoptosis, and conflicting cell cycle signaling, and leaving for further studies the full characterization how the loss of Tns3 leads to the deregulation of these processes.

      My only query is whether Tns3 is also expressed in immature OLs in human brain (rather than human ES-derived OLs). This should be easily checked by interrogating online Shiny apps for already published snRNAseq from various groups on human post mortem adult brain, and, if not present there, then also in baby/fetal brain. This would be interesting and may well be different from the ES-derived cells, which tend to be very immature, and would add interest to the possible translational impact.

      Following the reviewer's suggestion, we analyzed 69,174 snRNAseq profiles from GW9-GW22 fetal cerebellum (Aldinger & Miller, 2021; https://doi.org/10.1038/s41593-021-00872-y), which we now present in Figure S3. We found a cluster of cells expressing iOL markers, including NKX2-2, TNS3, ITPR2, and BCAS1, similar to the hiPSC-derived iOL1/iOL2 clusters and the mouse iOL1/iOL2 clusters shown in Fig. S2.

      We also analyzed other datasets but did not find iOLs in them, given their age or cell numbers, including:

      • Immunopanned PDGFRA+ cells from GW20-GW24 human cortex (2,690 cells, Huang and Kriegstein, Cell 2020), containing OPCs but not iOLs.

      • The recently published dataset from GW8-GW10 human forebrain oligodendroglia (van Bruggen & Castelo-Branco, Dev Cell 2022; https://doi.org/10.1016/j.devcel.2022.04.016), containing OPCs but not iOLs.

      • The GW17-GW18 human cortex (40,000 cells, Polioudakis & Geschwind, 2019; https://doi.org/10.1016/j.neuron.2019.06.011), containing OPCs but not iOLs.

      Reviewer #2 (Significance (Required)): This work extends our knowledge of oligodendroglial differentiation, links it to the ECM and provides interest in manipulating this in diseases including glioma. My expertise: myelin, oligodendroglia, remyelination, human neuropathology

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)): *

      see below

      Reviewer #3 (Significance (Required)): Using purified oligodendrocytes, target genes of key regulators of oligodendrocyte differentiation were analyzed, which led to the identification of Tensin-3. The authors performed a detailed characterization of Tensin-3 expression. They found that Tensin-3 is highly expressed in immature mouse and human oligodendrocytes. Interestingly, Tensin-3 is selectively enriched in immature oligodendrocytes, and not present at detectable levels in OPCs and mature oligodendrocytes. Subsequently, the authors characterized Tensin-3 function by a series of knockdown approaches in vitro and in vivo. These experiments revealed an essential function of Tensin-3 in supporting oligodendrocyte survival. In the absence of Tensin-3, a large fraction of oligodendrocytes undergo apoptosis while differentiating to mature oligodendrocytes. This is a remarkable study applying an impressive array of methods that led to an important discovery in the field of oligodendrocyte biology. The main advances for the field are: 1) identification of a novel marker for premyelinating oligodendrocytes, 2) elucidation of Tensin-3 as a pro-survival factor in oligodendrocyte differentiation, 3) evidence of a link between Tensin-3-integrin signalling and survival of oligodendrocytes. The data is well presented and organized, and the paper well written. I recommend publication with only minor suggestions for a revision:


      We thank the reviewer for these positive comments and for the critiques that helped improve the manuscript.

      In Figure 2, only images are shown, and the data is referred to as highly expressed or strong co-localization. Even if the data looks clear, the authors should provide some quantification of the data in the figure.

      We thank the reviewer for this comment. We now provide a quantification of the fraction of Tns3+ cells expressing different markers of oligodendrocyte lineage progression/stages, and the percentage of each stage expressing Tns3.

      Figure 3 is given too much weight in the manuscript text. I would recommend to shorten the text in the result section, and to move this figure to the supplement as it does not advance the story. It mainly shows that the KO mice still express transcripts in the brain. Were the transcripts lost in peripheral tissue?


      As mentioned above, in agreement with the comments of reviewers #1 and #3, we have reduced this part to a short paragraph indicating that our analyses of several Tns3 constitutive KO lines showed developmental lethality and possible genetic compensation in Tns3 expression, leading us to conclude that they are inappropriate tools to study Tns3 function in oligodendrogenesis. We have summarized the data in Fig. S7 and the description in the Methods section.

      Page 11: the authors describe in the text how the floxed allele was generated. This should be shifted to the supplement.

      Following the reviewer's suggestion, we have moved the description of Tns3 floxed allele generation to the Methods section.

      Page 16: the authors refer to Bcas1 as a problematic marker for immature oligodendrocytes, because the transcript is also expressed in mature oligodendrocytes. The authors are correct that the transcript is expressed in mature oligodendrocytes. However, the protein changes its localization when oligodendrocytes mature. At the protein level, it is a valuable and selective marker, as antibodies only label pre-myelinating and actively myelinating cells. In mature oligodendrocytes, antibodies against Bcas1 do not label the cell, only myelin. The text is misleading and needs to be corrected.

      In agreement with the reviewer's comment, we have modified the text as follows: ‘An optimized protocol for immunodetection using Bcas1-recognizing antibodies has been shown to label iOLs (Fard et al., 2017).’

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Tran et al. describes the mechanism by which IFNa treatment prevents the development of liver CRC metastasis in several mouse models. They show how continuous administration of IFNa strengthens the liver vascular barrier by a direct effect on endothelial cells and prevents the trans-sinusoidal migration of tumour cells.

      Major points:

      1. Authors use an elegant orthotopic model of liver metastasis to confirm the effect of continuous IFNa on hepatic colonization (Fig.3). Although they extensively characterize the metastatic lesions, they do not show data on the potential impact of IFNa treatment on the primary caecum tumour. Authors should clarify whether the described effects are taking place in the liver and/or in the caecum. It would be interesting to show whether IFNa affects the primary tumour size, the extravasation of cancer cells and the immune infiltration, since all these factors could have an impact on the number of liver lesions.

      We thank the reviewer for acknowledging the importance of our results, particularly in the context of the orthotopic mouse model we developed. We agree that displaying the results of continuous IFNα therapy on primary intracecal tumors, as well as the results pertaining to the few mice that develop microscopic or macroscopic liver metastasis, is important for the interpretation of our work. Thus, we evaluated the dimension of primary intracecal CRC lesions (Fig 3D,E) and we performed additional IHC characterization of the primary tumors (Fig S4A,B). The analysis showed that the dimension of the primary lesions and the markers we analyzed were not significantly modified by continuous IFNα therapy (Fig 3D,E and Fig S4A,B). These results favor the hypothesis that IFNα therapy does not modify the number of cells that spread from the primary tumors and seed into the liver, but rather impinges on the intravascular containment of CRC cells circulating within the liver (Fig 3F). As said earlier, the data also highlight the possibility that CRC tumors may become refractory to IFNα or that the dose and schedule we adopted do not significantly affect the growth of established liver CRCs at late time points. The data are also consistent with results obtained with MC38Ifnar1_KO CRC cells indicating that continuous IFNα therapy does not require Ifnar1 expression by tumor cells to exert its antimetastatic function (Fig 4A,C-D). This is also in line with the high IFNα concentrations required to activate the "tunable" direct antiproliferative functions of this cytokine, which exceed those achieved in our system (Catarinella et al, 2016; Schreiber, 2017). Text has been added in the revised manuscript at lines 175-197 and in the discussion at lines 425-431.

      2. Figure 3f right shows liver images without any obvious metastatic lesion. Since authors are analysing the effect of IFNa treatment on proliferation, vascularization and immune composition in liver tumours, they may show and quantify images with metastatic lesions and restrict the analysis to the tumour area.

      Since the main finding of our manuscript regards the prevention of hepatic colonization by continuous IFNα therapy, we think that the original data presented in Fig 3G,H are representative of the overall efficacy of our strategy, which confers protection in up to 60% of the mice carrying intramesenteric tumors of increasing dimensions (Fig 3H). We have thus maintained our original results, adding the quantification of all IHC data on groups of Sham control livers (n=6), as suggested. In any case, we also included the same IHC characterization of the few and small intrahepatic lesions that have bypassed the intravascular antimetastatic barrier (Fig S4C,D). Indeed, in agreement with the results observed in primary intracecal lesions, the metastatic lesions that developed in IFNα-treated mice showed similar markers of cell proliferation, neoangiogenesis, F4/80 macrophages and CD3+ T cells as control lesions detected in NaCl-treated mice. Once again, the results highlight the possibility that CRC tumors, once established as micro/macroscopic metastases, may become refractory and resistant to IFNα therapy by downregulating Ifnar1 in various components of the tumor microenvironment (Boukhaled et al., 2021; Katlinski et al., 2017). Text has been added in the revised manuscript at lines 175-197 and in the discussion at lines 496-515.

      3. Authors analyse the recombination efficiency of different mouse CRE lines by non-quantitative methods (PCR of hepatic genomic DNA and GFP expression by immunofluorescence in healthy liver). Since PDGFRβ-Cre/ERT2 and CD11c-Cre lines are used to exclude a role of IFNa on the targeted cells, authors should provide stronger evidence to support this. They may consider studying the ablation of Ifnar1 in FACS-sorted fibroblasts and myeloid cells. Moreover, it would be important to show the proportion of GFP+ cells in the sorted populations to understand how broadly these stromal populations are targeted.

      We thank the referee for raising this important issue, which is related to the relative efficiency of Ifnar1 recombination in each of the Cre-expressing mouse models we have used in the study. In this regard, we performed a new, extensive colocalization analysis quantifying the percentage of GFP+ cells that colocalize with cell-specific markers (i.e., PDGFRβ, CD11c, F4/80 and CD31) of the various mouse models (PDGFRβCreERT2, CD11cCre and VeCadCreERT2, respectively) crossed with RosaZsGreen reporter mice. Colocalization analysis of GFP in the different systems was performed using the ImageJ “colocalization” algorithm developed by Pierre Bourdoncle (Institut Jacques Monod, Service Imagerie, Paris; 2003–2004). The method allows the generation of unsupervised profiles of co-localized pixels between two channels. This methodology has been included in the section Methods and Protocols, lines 806-809. Of note, we observed an almost complete recombination in liver fibroblasts (GFP+/PDGFRβ+), with about 98.2 ± 0.72% of hepatic stellate cells co-expressing GFP+ and PDGFRβ+ signals (see the new Fig S5E). Similarly, hepatic DCs (GFP+/CD11c+) had 94.17 ± 2.16% colocalization, while F4/80+ KCs or LCMs (GFP+/F4/80+) colocalized in 78.14 ± 5.03% (see the new Fig S5E). Finally, HECs, including LSECs, (GFP+/CD31+) showed 85.3 ± 5.03% colocalization (see the new Fig S5E,F), with no expression of GFP signals in cells other than CD31+. Note that these values indicate an almost complete colocalization of the Cre recombinase in the target cell types analyzed (see representative IF shown in Fig S5E). Text has been added in the revised manuscript at lines 225-233.
Moreover, DEG analysis between NaCl-treated VeCadIfnar1_KO and Ifnar1fl/fl HECs showed a significant downregulation of Ifnar1 expression in CD31+ VeCadIfnar1_KO cells, with a log2 fold-change of -0.387 and an adjusted p-value of 0.033, further confirming Cre recombination in HECs isolated from VeCadIfnar1_KO mice (as depicted in the heatmap of Fig 6B; the 12th gene of the Type I IFN response is Ifnar1). We have prepared all source images at a larger size to better appreciate the colocalization within the liver microvasculature. In addition, we performed several flow cytometry analyses to identify liver cell populations of Cre-recombinant mice that express Ifnar1. Unfortunately, the predicted low cell-surface expression of this molecule, coupled with the experimental conditions needed to extract viable non-parenchymal cells from the liver, has prevented us from obtaining informative results.

      4. Ifnar1 ablation in VeCad+ cells prevents the effect of IFNa on tumour growth (Fig. 4d), suggesting the existence of anti-tumour mechanisms beyond the effects on hepatic colonization. Authors may consider checking proliferation, vascularization and immune infiltration in these tumours to strengthen their conclusion.

      We fully agree with the referee’s concern and, as mentioned above, we have followed his/her suggestion and examined the existence of antitumor mechanisms beyond the effects on hepatic colonization in VeCadIfnar1_KO mice treated with NaCl or IFNα. To this end, 4 NaCl-Ifnar1fl/fl, 7 IFNα-Ifnar1fl/fl, 4 NaCl-VeCadIfnar1_KO and 4 IFNα-VeCadIfnar1_KO mice were intrasplenically injected with MC38 CRC cells (Fig S7A,B). Twenty-one days after injection, mice were euthanized and their livers analyzed for tumor size, proliferation, signs of angiogenesis (as denoted by CD34 staining) and immune infiltration (F4/80+ macrophages and CD3+ T cells). Consistent with data presented in Fig 4D, histological analysis showed that IFNα-treated Ifnar1fl/fl mice did not develop liver metastases. Furthermore, metastatic lesions detected in VeCadIfnar1_KO mice treated or not with IFNα did not show significant differences in Ki67 positivity, CD34 staining or the amount of F4/80+ resident macrophages and CD3+ T cells. This further supports the view that the antimetastatic potential of IFNα therapy may primarily depend on the inhibition of hepatic trans-sinusoidal migration, a limiting step in the metastatic cascade that could secondarily influence colonization and outgrowth (Chambers et al, 2002). Corresponding text has been added at lines 248-252.

      5. Immune properties of LSECs are analysed in vivo by using a mouse CRE line that targets all endothelial cells, including those located in lymphoid organs, and evaluating T cell composition in the spleen. I find it difficult to conclude that these properties are exerted directly by LSECs and not by other endothelial cells in vivo. To clarify the local effect of LSECs in modulating anti-tumour immunity, T cell composition and activation should be checked in tumours shortly after tamoxifen administration.

      We thank the reviewer for pointing out this issue, which cannot be tested directly because - as also mentioned by reviewer 2 - LSEC-specific Cre-recombinant driver mice do not exist. As also indicated in the cited literature, central memory T cells accumulate after peripheral priming in secondary lymphoid organs such as the spleen (Sallusto et al, 2004; Stone et al, 2009; Yu et al, 2019). The generation and regulation of antitumor immunity is a highly orchestrated multistep process involving the uptake of tumor-associated antigens by professional APCs, their time-consuming migration to draining lymph nodes and the generation of protective T cells. Unlike other APCs, HECs/LSECs do not need to migrate to draining lymph nodes to activate effector T cells, leading to a rapid intrahepatic CD8+ T cell activation. In this context, LSECs must not only efficiently take up, process and present CRC-derived antigens coming from intravascularly contained tumor cells, but they also require the attraction and retention within the liver microvasculature of T cell populations necessary for the generation of effective antitumor immune responses, where chemokines play an important role (Lalor et al, 2002). As shown in Fig 6A-C, two prominent chemokines (Cxcl10 and Cxcl9) required for T cell recruitment to the liver are specifically upregulated only in HECs/LSECs from IFNα-treated Ifnar1fl/fl mice, whereas HECs from VeCadIfnar1_KO mice maintained low expression of these chemoattractants in both NaCl- and IFNα-treated mice.
These data are also consistent with the in vitro cross-priming results (see Fig 7A,B) showing that in the absence of IFNα, HECs have a low capacity to prime naïve T cells (Katz et al, 2004), indicating that LSECs primed by tumor-derived antigens coming from apoptotic intravascular CRC metastatic cells play an important role in inducing tolerance (Berg et al, 2006; Katz et al., 2004), especially when CRC cells quickly extravasate and position within the space of Disse, likely becoming less accessible to intravascular patrolling by naïve and effector T cells (Benechet et al, 2019; Guidotti et al, 2015). On the contrary, in IFNα-treated Ifnar1fl/fl mice, CRC cells are rapidly contained in the liver microvasculature (Fig 5A,B) with CRC-derived antigens that could be immediately taken up by LSECs due to their anatomical proximity and efficient endocytosis capacity, which is among the highest of all cell types in the body (Sorensen, 2020). Here, the continuous sensing of IFNα by LSECs upregulates several genes related to antigen processing and presentation pathways (Fig. 6B,D), leading to efficient cross-priming of tumor-specific CD8+ T cells to the same extent as professional APCs, such as splenic DCs (Fig 7B). Text has been added in the revised manuscript at lines 496-515. Finally, regarding the suggestion to analyze the role of HECs/LSECs in inducing antitumor T cell immunity shortly after tamoxifen administration: while we agree that it would be interesting to analyze HEC/LSEC-mediated T cell activation by treating NaCl- and IFNα-treated Ifnar1fl/fl and VeCadIfnar1_KO mice with tamoxifen after CRC cell injection, we would like to point out that tamoxifen treatment would not only induce Cre recombination and Ifnar1 loss on endothelial cells but may also induce several “off-target” effects complicating the interpretation of the results.
Indeed, tamoxifen is known to i) inhibit the in vitro proliferation of several CRC cell lines (Ziv et al, 1994), ii) impair the growth of CRC liver metastases in vivo (Kuruppu et al, 1998) and iii) modify matrix stiffness to reduce tumor cell survival (Cortes et al, 2019). Further, as IFNα modifies the hepatic vascular barrier and the accessibility of antigens to LSECs, the specific timing of tamoxifen treatment could also affect the immunological consequences of Ifnar1 deletion, making this experiment impractical. For these reasons, we would prefer not to perform the suggested experiment with tamoxifen.

      Reviewer #1 (Significance):

      The conclusions of this study are consistent with previously published literature and the biological insights are potentially useful to the cancer biology community.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study Dr. Sitia's group investigated the effect of IFNα1 as a perioperative agent preventing liver metastasis formation of colorectal carcinoma (CRC). To this end, various mouse models were used, such as liver colonization models, i.e. intrasplenic and mesenterial injections of MC38 and CT26 CRC cell lines. In addition, spontaneous metastasis of CRC was analyzed by orthotopic injection of MC38 into the cecum. To study the influence of IFNα1 in these settings, mini-osmotic pumps releasing IFNα1 were used. Moreover, conditional mouse models with a cell-type specific deficiency of Ifnar1 were compared. Altogether, the application of IFNα1 led to a reduction in liver colonization of CRC in all models studied. This was ascribed to decreased trans-sinusoidal migration of CRC and increased cross-priming by LSEC, resulting in T cell activation.

      Major comments:

      Overall the study is well performed and the major conclusions seem to be drawn well. However, there are certain points I would like to address:

      • First, the authors started their experiments with MC38 and CT26 CRC cell lines. At the end they just applied MC38. The rationale behind this should be clearly stated. Second, as in their previous publication (Catarinella et al, 2016), F1 hybrids of C57BL/6 x BALB/c mice were used for the experiments. However, I believe that the genetic heterogeneity might be strongly increased by this approach, which might make the results difficult to reproduce.

      We thank the referee for raising this important issue; additional text describing the reason for our choice has been introduced at lines 203-205. We respectfully disagree with the comment that CB6F1 hybrids may increase genetic heterogeneity and impair reproducibility of our results. Each CB6F1 hybrid individual is genetically identical to its littermates, sharing 50% of the genes of each parental mouse line and being tolerant to reciprocal MHC-I genes (thus permitting the correct engraftment of both cell lines). We agree that the use of mismatched backcrosses after the F1 generation would increase genetic heterogeneity and thus may affect outcome. This is also the reason why we could not perform experiments with CT26 in the Ifnar1fl/fl conditional lines, which are in the C57BL/6 background and would have needed at least 10 generations of backcrossing into the BALB/c background before being suitable for such experiments. Finally, all experiments described in Fig 4, 5, 6 and 7 were performed in C57BL/6 mice using MC38 CRC cells, with results that reproduced those obtained in CB6F1 hybrids and were very similar to what we have previously reported with MC38 in C57BL/6 mice (see Fig 5 (Catarinella et al., 2016)).

      • At page 16 the authors conclude that "patients suffering from chronic liver fibrotic disease... display lower incidence of hepatic metastases". In the community there is contradictory data (see Kondo et al, BJC, 2016, https://www.nature.com/articles/bjc2016155). This should be precisely discussed, otherwise this claim should be removed.

      We thank the referee for raising this issue and have modified the discussion accordingly. Text has been added in the revised manuscript at lines 455-457.

      We agree with the reviewer's suggestion and have added new text to recognize the interplay between different cell types, such as dendritic cells, within the hepatic niche (see new text at lines 505-515).

      • Last, multiple times the authors write about data that is "not shown". Please either include these data in the manuscript or delete corresponding phrases because it is not possible for the reader to scrutinize it.

      We fully agree with the referee’s concern and now display all “not shown” results in Fig S1E and Fig S9C-I.

      • Besides, I suggest additional experiments further substantiating the study:
      • To see if this effect of IFNα1 is cell type-specific, liver metastasis of other solid tumors such as breast cancer or melanoma should be investigated.

      We agree with the reviewer's suggestion, as also indicated in our original discussion. We believe that additional experiments with other solid tumor cell lines would be important to generalize the potential of perioperative IFNα therapy. In particular, we believe that pancreatic ductal adenocarcinoma (PDAC), a highly lethal disease that most commonly metastasizes to the liver (Lambert et al, 2017), may benefit from our approach. It should be noted, however, that the pleiotropic nature of IFNα allows this cytokine to inhibit tumor growth by several mechanisms. Above all, the ability of IFNα therapy to directly reduce tumor growth depends on the relative surface expression of Ifnar1 on each tumor cell and the ability to maintain such expression in the harsh tumor microenvironment during IFNα therapy. As the degradation of Ifnar1 by CRC tumors has been well described (Katlinski et al., 2017), it is possible that CRC tumors that thus escape the antitumor properties of endogenous type I interferons may respond less efficiently to therapeutic IFNα regimens such as those herein described. This notion is consistent with our data on primary orthotopic tumors (Fig. 3D,E), which are no longer responsive to continuous IFNα therapy as early as 7 days after implantation of CT26LM3 cells. In addition, the definition of the HEC/LSEC antimetastatic barrier has been possible only because CRC cells are not directly susceptible to the IFNα antiproliferative activity, which we observed in vitro at extremely high IFNα dosages (Catarinella et al., 2016) but not in vivo (as formally demonstrated by using MC38Ifnar1_KO cells, Fig 4A). At any rate, we followed the reviewer’s suggestion and performed an additional experiment in which we intramesenterically injected the PDAC cell line Panc02 (H-2b, C57BL/6-derived) (Soares et al, 2014) into C57BL/6 mice 7 days after initiation of NaCl or IFNα therapy.
As shown below, MRI analysis at day 21 showed that none of the IFNα-treated, Panc02-challenged mice developed metastatic lesions, while NaCl controls displayed a high metastatic burden that required about 67% of these mice to be euthanized for ethical reasons shortly after MRI analysis. These data indicate that perioperative IFNα therapy completely curbs metastatic development in IFNα-treated PDAC animals. The notion that these cells may be more IFNα-susceptible than CRCs may well depend on the relative capacity of the former cells to maintain Ifnar1 expression, as suggested by others (Zhu et al, 2014). Properly addressing the reviewer’s comment would thus require extensive investigations involving the establishment of new mouse models of metastases from other solid tumors, starting from the in vitro and in vivo regulation of surface Ifnar1 expression in each tumor cell. We strongly believe that this work has merit, but we think that it should be reported separately.

      • The authors applied a broad range of cell type-specific mice. However, a thorough characterization of the deletion of Ifnar1 in the corresponding cell types is missing. This is crucial for the manuscript.

      We fully agree with the referee’s concern and as previously mentioned, we have improved the characterization of Ifnar1 deletion (see response to the same critique received from reviewer 1, comment 3).

      • The capillarization of the hepatic vascular niche is a crucial point in this story. I believe that the hepatic endothelium should be further characterized by additional vascular markers.

      In response to the reviewer’s suggestion, we have included in our analysis the characterization of Lyve-1, a marker of hepatic capillarization (Pandey et al, 2020; Wohlfeil et al, 2019). Indeed, IFNα treatment of Ifnar1fl/fl mice significantly increased the expression of Lyve-1, whereas IFNα treatment of VeCadIfnar1_KO mice showed no effect (Fig S9A,B), further corroborating our findings. Text has been added in the revised manuscript at lines 291-294. To better aid readers, we have prepared high-resolution images for each IF channel and have provided these as source data for Fig S9A.

      • Last, the data and methods appear adequately presented and experiments seem to be reproducible. Just in Figure 4 the exact number of mice and replicates are not clearly presented. Otherwise, everything is fine.

      We thank the reviewer for raising this issue, which apparently was not properly described in our original submission. We have now included the exact number of mice in each experimental group in the figure legend to Fig 4.

      Minor comments:

      Overall the text and figures are accurately presented. However, I would like to add further minor comments:

      • In Fig. 1 you present the IFNα dosing regimen. How do you explain the decrease in serum IFNα after day 2? Besides, the data points at day 0 should be excluded since measuring started from day 2! Why did you decide to treat for seven days until the start of the experiment? One could think 2 days might already be enough.

      We thank the reviewer for raising these important points. Regarding the pharmacokinetic-pharmacodynamic (PK-PD) behavior of our approach, we do not believe that the MOP reduced its pumping efficacy after day 2 (Theeuwes & Yum, 1976), nor that counterregulatory mechanisms, such as the induction of anti-IFNα blocking antibodies, occurred in such a short time frame (Wang et al, 2001). Nor is it plausible that IFNα treatment significantly downregulated Ifnar1 in the liver (as demonstrated by pSTAT1 activation after MOP treatment in Fig S1E). Rather, our results reflect the PK-PD behavior of other long-lasting formulations of IFNα, which depends on intrinsic pharmacological properties of IFNα already described in (Jeon et al, 2013). Text has been added in the revised manuscript at lines 110-112. We also corrected the figures in which we quantified serum IFNα. Indeed, blood was drawn one day before MOP implantation rather than on the same day of surgery to avoid additional blood loss, which could be a source of unnecessary stress for the animals. Therefore, we corrected the results section and Fig S1A-C and Fig 1A,B. The decision to start treatment 7 days rather than 2 days before seeding was made for several reasons: i) this study follows our previous gene/cell therapy approach, in which the time interval between reconstitution of the transduced bone marrow with Tie2-IFNα and tumor challenge was at least 7-8 weeks. We therefore thought that 7 days might be a sufficient/necessary time period to induce similar phenotypes in the liver after continuous IFNα administration; ii) 7 days is a time frame compatible with the perioperative period in humans (Horowitz et al, 2015). Furthermore, the side effects that patients may experience after IFNα therapy are generally limited to the first few days after administration, allowing patients to benefit from IFNα-induced vascular antimetastatic barriers at the time of surgery without potential side effects of IFNα.
Because oncologic guidelines recommend starting adjuvant chemotherapy at least 4 weeks after surgery in stage 2-3 CRC patients at risk of later developing liver metastases (Engstrand et al, 2019; van Gestel et al, 2014), our proposed perioperative time frame does not conflict with these indications (Van Cutsem et al, 2016). We have included additional text at lines 131-132 to motivate the timing of our regimens.

      • Fig. 2: Did you check for metastases in organs other than the liver at the timepoint of euthanization, e.g. lungs? In the discussion section you talk about a potential influence of IFNα1 on other organs. Therefore, I think that the mice should be thoroughly analyzed and the data presented. The manuscript will benefit from it.

      We thank the reviewer for this valuable comment. Indeed, we always check for dissemination of CRC metastases by MRI analysis and at necropsy. As stated at lines 146-147 and 158, CRC tumors seeded in the liver vasculature do not spread to other organs such as the lungs after colonizing the liver. Indeed, CRC cells intravascularly seeded in the portal circulation are trapped at the beginning of hepatic sinusoids because their diameter is larger than that of liver sinusoids (Fig S8A,B). These micro-anatomic peculiarities are also thought to impede the spreading of tumor cells from periportal to centrilobular areas and to the general circulation (Catarinella et al., 2016; Vidal-Vanaclocha, 2008), and this is consistent with studies showing that in CRC patients undergoing surgery the majority of CRC-derived circulating tumor cells are found in the portal vein (Deneve et al, 2013).

      • Overall, MRI pictures and pictures of IHC or IF are sometimes too small to see. Please provide pictures with larger magnification or enlarge the images.

      We thank you for this suggestion and we have indeed increased the size of all MRI, IHC, and IF images to the maximum that will fit within the figure. In addition, we presented the images at the highest magnification available, without making digital enlargements that would significantly reduce resolution.

      • Fig. 3 F, G: immune cell infiltration in the liver was analyzed. Please compare it to untreated, tumor-free wildtype liver tissue.

      We appreciate the reviewer's suggestion and have included the results of six sham mice per marker in our analysis. Text was added to the figure legends of Fig 3H and Fig S4B,D.

      • Fig. 6: the graphs are too small to be read, especially the volcano plot and the gene names of the heatmap.

      We increased the font size of genes in the volcano plots and heatmap in Fig 6A,B, as suggested.

      • Fig. S6: Pictures of co-immunofluorescences are presented. For the reader it is really hard to distinguish the stainings and to identify colocalized areas. Please provide pictures with one channel to better compare the marker expression.

      We thank the reviewer for pointing this out and we have tried to make each panel as large as possible to fit into a two-column figure. We have also prepared high magnification images of each channel for all immunofluorescence images, which we provide as source data. We hope that this is sufficient to help readers to interpret our results without increasing the number of main or supplementary figures.

      • From page 8 onwards (section about transgenic mice) LSEC was used as kind of synonym for hepatic endothelial cells. Since there is still no LSEC-specific driver mouse, it should be stated "hepatic endothelial cells" instead.

      We agree with this suggestion and thus have indicated that the results refer to HECs but include a large majority of LSECs. Indeed, LSECs make up the majority (~89%) of the total HEC population (Su et al, 2021). In addition, some SEM and TEM analyses were performed only on LSECs, as were the IF analyses. Therefore, we believe that LSECs play an important role in this process. Although not specifically suggested, we have also changed the title of our manuscript to reflect the reviewer's suggestion. Thus, we propose "Continuous sensing of IFNα by hepatic endothelial cells shapes a vascular antimetastatic barrier" as the new title.

      • P. 11: there is a typo: Fig. Fig. S6G,H

      We corrected this typo.

      • P. 13: the authors describe Gata4 as inhibitor of subendothelial matrix deposition. This should be precisely written, since Gata4 originally is described as master-regulator of liver sinusoidal differentiation which leads to liver fibrosis development upon loss of Gata4. Besides, I came across a study of the same group that investigated the role of Notch signaling in hepatic CRC and melanoma metastasis (Wohlfeil et al, Cancer Res, 2019, https://aacrjournals.org/cancerres/article/79/3/598/638600/Hepatic-Endothelial-Notch-Activation-Protects). Similar to your study they tie the reduction in hepatic metastasis to capillarization of the hepatic microvasculature.

      We agree with this suggestion and have modified the text accordingly. We are also glad that our results agree with the previously reported literature, which is now correctly cited at lines 351-356 and in the discussion at lines 474-476.

      • The discussion reads like paraphrasing the results section. The manuscript would clearly benefit if the discussion section had been rewritten short and concisely.

      We agree with this suggestion, and we have modified the discussion accordingly. We are also willing to shorten the discussion by removing the schematic model that could possibly be used as a graphical abstract.

      References

      Benechet AP, De Simone G, Di Lucia P, Cilenti F, Barbiera G, Le Bert N, Fumagalli V, Lusito E, Moalli F, Bianchessi V et al (2019) Dynamics and genomic landscape of CD8(+) T cells undergoing hepatic priming. Nature 574: 200-205

      Berg M, Wingender G, Djandji D, Hegenbarth S, Momburg F, Hammerling G, Limmer A, Knolle P (2006) Cross-presentation of antigens from apoptotic tumor cells by liver sinusoidal endothelial cells leads to tumor-specific CD8+ T cell tolerance. Eur J Immunol 36: 2960-2970

      Boukhaled GM, Harding S, Brooks DG (2021) Opposing Roles of Type I Interferons in Cancer Immunity. Annu Rev Pathol 16: 167-198

      Catarinella M, Monestiroli A, Escobar G, Fiocchi A, Tran NL, Aiolfi R, Marra P, Esposito A, Cipriani F, Aldrighetti L et al (2016) IFNalpha gene/cell therapy curbs colorectal cancer colonization of the liver by acting on the hepatic microenvironment. EMBO Mol Med 8: 155-170

      Chambers AF, Groom AC, MacDonald IC (2002) Dissemination and growth of cancer cells in metastatic sites. Nat Rev Cancer 2: 563-572

      Cortes E, Lachowski D, Robinson B, Sarper M, Teppo JS, Thorpe SD, Lieberthal TJ, Iwamoto K, Lee DA, Okada-Hatakeyama M et al (2019) Tamoxifen mechanically reprograms the tumor microenvironment via HIF-1A and reduces cancer cell survival. EMBO Rep 20

      Deneve E, Riethdorf S, Ramos J, Nocca D, Coffy A, Daures JP, Maudelonde T, Fabre JM, Pantel K, Alix-Panabieres C (2013) Capture of viable circulating tumor cells in the liver of colorectal cancer patients. Clin Chem 59: 1384-1392

      Engstrand J, Stromberg C, Nilsson H, Freedman J, Jonas E (2019) Synchronous and metachronous liver metastases in patients with colorectal cancer-towards a clinically relevant definition. World J Surg Oncol 17: 228

      Guidotti LG, Inverso D, Sironi L, Di Lucia P, Fioravanti J, Ganzer L, Fiocchi A, Vacca M, Aiolfi R, Sammicheli S et al (2015) Immunosurveillance of the liver by intravascular effector CD8(+) T cells. Cell 161: 486-500

      Horowitz M, Neeman E, Sharon E, Ben-Eliyahu S (2015) Exploiting the critical perioperative period to improve long-term cancer outcomes. Nature reviews Clinical oncology 12: 213-226

      Jeon S, Juhn JH, Han S, Lee J, Hong T, Paek J, Yim DS (2013) Saturable human neopterin response to interferon-alpha assessed by a pharmacokinetic-pharmacodynamic model. Journal of translational medicine 11: 240

      Katlinski KV, Gui J, Katlinskaya YV, Ortiz A, Chakraborty R, Bhattacharya S, Carbone CJ, Beiting DP, Girondo MA, Peck AR et al (2017) Inactivation of Interferon Receptor Promotes the Establishment of Immune Privileged Tumor Microenvironment. Cancer cell 31: 194-207

      Katz SC, Pillarisetty VG, Bleier JI, Shah AB, DeMatteo RP (2004) Liver sinusoidal endothelial cells are insufficient to activate T cells. Journal of immunology 173: 230-235

      Kuruppu D, Christophi C, Bertram JF, O'Brien PE (1998) Tamoxifen inhibits colorectal cancer metastases in the liver: a study in a murine model. Journal of gastroenterology and hepatology 13: 521-527

      Lalor PF, Shields P, Grant A, Adams DH (2002) Recruitment of lymphocytes to the human liver. Immunol Cell Biol 80: 52-64

      Lambert AW, Pattabiraman DR, Weinberg RA (2017) Emerging Biological Principles of Metastasis. Cell 168: 670-691

      Pandey E, Nour AS, Harris EN (2020) Prominent Receptors of Liver Sinusoidal Endothelial Cells in Liver Homeostasis and Disease. Front Physiol 11: 873

      Sallusto F, Geginat J, Lanzavecchia A (2004) Central memory and effector memory T cell subsets: function, generation, and maintenance. Annu Rev Immunol 22: 745-763

      Schreiber G (2017) The molecular basis for differential type I interferon signaling. J Biol Chem 292: 7285-7294

      Soares KC, Foley K, Olino K, Leubner A, Mayo SC, Jain A, Jaffee E, Schulick RD, Yoshimura K, Edil B et al (2014) A preclinical murine model of hepatic metastases. J Vis Exp: 51677

      Sorensen KK, Smedsrod B (2020) The Liver Sinusoidal Endothelial Cell: Basic Biology and Pathobiology. In: The Liver: Biology and Pathobiology, Sixth Edition, pp. 422-434. John Wiley & Sons Ltd.

      Stone JD, Chervin AS, Kranz DM (2009) T-cell receptor binding affinities and kinetics: impact on T-cell activity and specificity. Immunology 126: 165-176

      Su T, Yang Y, Lai S, Jeong J, Jung Y, McConnell M, Utsumi T, Iwakiri Y (2021) Single-Cell Transcriptomics Reveals Zone-Specific Alterations of Liver Sinusoidal Endothelial Cells in Cirrhosis. Cell Mol Gastroenterol Hepatol 11: 1139-1161

      Theeuwes F, Yum SI (1976) Principles of the design and operation of generic osmotic pumps for the delivery of semisolid or liquid drug formulations. Ann Biomed Eng 4: 343-353

      Van Cutsem E, Cervantes A, Adam R, Sobrero A, Van Krieken JH, Aderka D, Aranda Aguilar E, Bardelli A, Benson A, Bodoky G et al (2016) ESMO consensus guidelines for the management of patients with metastatic colorectal cancer. Ann Oncol 27: 1386-1422

      van Gestel YR, de Hingh IH, van Herk-Sukel MP, van Erning FN, Beerepoot LV, Wijsman JH, Slooter GD, Rutten HJ, Creemers GJ, Lemmens VE (2014) Patterns of metachronous metastases after curative treatment of colorectal cancer. Cancer Epidemiol 38: 448-454

      Vidal-Vanaclocha F (2008) The prometastatic microenvironment of the liver. Cancer microenvironment : official journal of the International Cancer Microenvironment Society 1: 113-129

      Wang DS, Ohdo S, Koyanagi S, Takane H, Aramaki H, Yukawa E, Higuchi S (2001) Effect of dosing schedule on pharmacokinetics of alpha interferon and anti-alpha interferon neutralizing antibody in mice. Antimicrob Agents Chemother 45: 176-180

      Wohlfeil SA, Hafele V, Dietsch B, Schledzewski K, Winkler M, Zierow J, Leibing T, Mohammadi MM, Heineke J, Sticht C et al (2019) Hepatic Endothelial Notch Activation Protects against Liver Metastasis by Regulating Endothelial-Tumor Cell Adhesion Independent of Angiocrine Signaling. Cancer research 79: 598-610

      Yu X, Chen L, Liu J, Dai B, Xu G, Shen G, Luo Q, Zhang Z (2019) Immune modulation of liver sinusoidal endothelial cells by melittin nanoparticles suppresses liver metastasis. Nat Commun 10: 574

      Zhu Y, Karakhanova S, Huang X, Deng SP, Werner J, Bazhin AV (2014) Influence of interferon-alpha on the expression of the cancer stem cell markers in pancreatic carcinoma cells. Exp Cell Res 324: 146-156

      Ziv Y, Gupta MK, Milsom JW, Vladisavljevic A, Brand M, Fazio VW (1994) The effect of tamoxifen and fenretinimide on human colorectal cancer cell lines in vitro. Anticancer Res 14: 2005-2009

      Reviewer #2 (Significance):

      • Since liver metastases of various tumors are tremendously hard to treat and mediate therapy resistance, the authors focus on a very important field of research - prevention of liver metastasis formation.
      • This study adds insights into the mechanisms of action of IFNα1 in the hepatic microenvironment. It extends previous findings of Toyoshima who described anti-tumoral effects of IFNα1 released by dendritic cells in the liver.
      • The study is well designed and will be of great interest for the scientific community. Besides, it will be appreciated by physicians. However, as mentioned in the discussion, further clinical studies by physicians are needed to translate its findings into the clinic.
      • The author of this review works as physician and often deals with liver metastasis. It is one field of focus of her/his research.
    1. Author response


      • A comment on the overall organization of the paper. Figure 2 has a major location in the paper, but it seems that its main takeaway is that these MAPs aren't really involved in the main process this paper is probing. While these are important findings, it might be more satisfying to move some of the central results earlier.

      We agree that this figure displays mostly negative results. However, most work on anaphase B microtubule dynamics from our group and others has focused on the effect that motors and MAPs may have on microtubule dynamics (EB1 and kinesin-8 in budding yeast, klp9 in fission yeast). Therefore, we consider it is important to clearly show that previously proposed candidates are not required for the observed decrease in microtubule growth speed, prior to introducing the unexpected effect of the membrane.

      • A model schematic might drive home the main finding of the paper, and be particularly useful for readers who are not experts in microtubule or spindle dynamics. That said, the Discussion does an excellent job of summarizing the findings and explaining the takeaway message(s), even for the non-expert.

      We have added a model schematic and we have referred to it in the main text.

      Specific comments

      • ‘In higher eukaryotes’ - Suggest avoiding the terms higher and lower when describing organisms, and instead, directly defining which organisms, for instance in animals/metazoans that would be a better description.

      We have removed this terminology.

      • Figure 1 E-F - It is hard to see the difference in the distribution, maybe a different color could be used instead of stars.

      We have used a different color.

      • Figure 1 Data shown in pink in G comes from 832 midzone length measurements during anaphase, from 60 cells in 10 independent experiments - The pink here does not correspond to the pink coding in D, consider colour choice for clarity across panels.

      We have changed this.

      • Finally, yeasts undergo closed mitosis - How does this relate to the findings in the Dey paper (cited here) which shows it was somewhat semi-closed or semi-open. According to the Dey paper, the membrane disassembles locally twice, at the SPB and the bridge.

      Membrane disassembly at the nuclear membrane bridge occurs at late anaphase, and leads to the disassembly of the spindle, presumably by the action of cytoplasmic factors (Dey et al. 2020). We do not believe the membrane disassembly itself has a role in spindle elongation or microtubule dynamics, as the spindle is disassembled when it happens. However, the fact that les1Δ reduces the decrease in microtubule growth speed associated with internalisation of microtubules in the nuclear membrane bridge suggests that the organisation of the nuclear membrane bridge required for its local disassembly at late anaphase might affect microtubule growth (see section “Formation of Les1 stalks […]”).

      • ‘vertical comets in kymographs (Fig. 1C) do not correspond to non-growing microtubules, but rather microtubules that grow at a speed matching the sliding speed’- For clarity, it might be nice to add: "(as the SPB moves away from the plus end in the kymograph)".

      We have included this useful clarification.

      • ‘significantly shorter than in interphase, where growth events last more than 120 seconds on average [42, 43]. Microtubule shrinking speed did not change during anaphase either (Fig. 1-Supplement 1D), and was on average 3.56±1.75 μm/min, also lower than in interphase (~8 min/μm)’ - This comment concerns the comparison of growth and shrinking rate as well as growth duration. The authors did not measure microtubule dynamics in interphase in this manuscript but compared their numbers to literature values. The comparison raises some questions for three reasons: 1) the microscopy method used is different in this paper and the two references provided, 2) the sample is mounted differently compared to the two references provided - 1) and 2) combined could lead to different levels of stress on the cells which could affect MT dynamics-, 3) (probably the most important caveat) the experiments are done at different temperatures: 27C in this paper versus 25C in the references provided. Microtubule dynamics are sensitive to temperature so this could explain part of the differences observed. Also, there are multiple values published for MT dynamics in interphase depending on the strain used and the microscopy method used. Suggest that the authors measure microtubule dynamics in interphase cells at 27C in SIM to ensure that the differences are not due to the technical parameters employed. Small item - should ‘8 min/μm’ read “8 μm/min"?

      We have measured microtubule growth speed and growth event duration using GFP-Mal3 during interphase and anaphase B in the same conditions as proposed (see Figure 1 – Supplement 2). Unfortunately, shrinkage speed cannot be measured using GFP-Mal3, so we cannot confirm that the difference between our measurements and the literature values would be observed.

      • ‘we observed two populations of microtubules (fast and slow growing)’ - Does this statement about the fast and slow growing populations refer to the data in Fig. 1C and 2A?

      Yes, we have added references to these figures in the next sentence (mentioned below).

      • ‘In some cells, all microtubules seemed to switch to the slow growing phase simultaneously (Fig. 1C), while in others fast and slow growing microtubules co-existed (Fig. 2A)’ - This is a very interesting observation, could we know how many cells (%) were detected in each case? Is it that in 90% of the cells the switch is simultaneous, and hence the microtubule growth is somehow synchronized? Or is it more random, e.g. around 50%?

      This was just to point the reader to two kymographs and show that a clear point where all microtubules change speed is not present in all kymographs, as one may think from Fig. 1C. Later in the paper, we show that the change in growth depends on whether the microtubule rescue occurs inside or outside the nuclear membrane bridge, so it is a matter of where microtubules are rescued once the dumbbell transition occurs, which is a stochastic process. We have added another sentence pointing the reader to examples in the kymograph (see line 152, This representation captures…).

      • ‘On such a plot, the data points visibly cluster in two separate clouds and the variation of growth speeds can be fitted by an error function (Fig. 1F)’ - It is unclear that there are two distinct clusters, maybe the assertion should be toned down, or some sort of cluster analysis provided.

      We acknowledge that the data are widely spread across the y axis, and given that the magnitude “distance to the closest pole at rescue” is continuous, the transition is not clear-cut. However, we consider the fact that the averaged curve closely matches the error function fit to be sufficient evidence for the existence of two populations of microtubule growth. Additionally, the R² of the fit is ~0.5, indicating that half of the variance is explained by this model. In any case, we show later that these two populations do exist (Fig. 3D), and why plotting microtubule growth against distance to the closest pole at rescue is a good way to segregate them (Fig. 3E).
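
To make the R² argument above concrete, the following is a minimal sketch of how an error-function transition between a fast and a slow growth speed can be written down and how R² is computed for it. The parametrisation (`v_fast`, `v_slow`, `d0`, `width`), the data points and the parameter values are all hypothetical and chosen only for illustration; they are not the authors' data or their fitting procedure.

```python
from math import erf


def erf_model(d, v_fast, v_slow, d0, width):
    """Hypothetical sigmoidal transition from v_fast (close to the pole)
    to v_slow (far from the pole), centred at d0 with the given width."""
    return v_slow + (v_fast - v_slow) * 0.5 * (1.0 - erf((d - d0) / width))


def r_squared(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up example: distance to the closest pole at rescue (um)
# and microtubule growth speed (um/min).
distances = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
speeds = [1.8, 1.4, 1.7, 1.2, 1.0, 0.6, 0.9, 0.5]

# Illustrative parameter values, not fitted ones.
params = dict(v_fast=1.6, v_slow=0.7, d0=2.25, width=1.0)
predicted = [erf_model(d, **params) for d in distances]
r2 = r_squared(speeds, predicted)
print(f"R^2 = {r2:.2f}")
```

An R² near 0.5 would mean the curve accounts for about half of the spread in growth speed, with the remainder left as scatter around the curve; on its own it does not establish that the scatter forms two separate clusters.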

      • ‘speed of interphase microtubules (~2.3 μm/min)’ - It would be interesting to see the dynamics in a les1 mutant (Dey Nature 2020) paper. Just as a control for presence/absence of the bridge?

      We thank the reviewers for kindly suggesting this interesting experiment. We have included it after the ase1 section. Les1 forms stalks at the edges of the nuclear membrane bridge that restrict nuclear membrane disassembly to the center of the bridge at the end of mitosis (Dey et al. 2020). While les1 deletion does not prevent the formation of the nuclear membrane bridge, it has been proposed that Les1 stalks may constitute sites of close interaction between the nuclear membrane and the spindle. Therefore, these sites may influence microtubule growth. Indeed, we have found that removing these Les1 stalks by deleting either les1 or nem1 leads to a smaller decrease in microtubule growth speed when plus ends enter the nuclear membrane bridge (see section “Formation of Les1 stalks […]”).

      • ‘Figure 2, Transition from fast to slow microtubule growth occurs in the absence of known anaphase MAPs’ - It looks like the overlap zone is larger on the mal3 kymograph. Is the size of the midzone changed in some of the mutants? It could be important to report. Related to it, is the spindle length changed in some of the mutants? (It does not look like it from the kymographs displayed).

      The midzone is indeed longer in mal3Δ strains; this can now be seen in Fig. 2 – Supp. 2 and is mentioned in the main text at line 272. As for the spindle length, diverse kinds of alterations in spindle length have been previously reported for the mutants that we used in this study. For instance, ase1Δ / cls1off cells have shorter spindles at anaphase onset (Loiodice et al. 2005 and data not shown), and klp5Δklp6Δ have longer spindles at anaphase onset (Syrivatkina et al. 2013). klp9Δ / clp1Δ / dis1Δ cells have lower spindle elongation velocity and may not reach the wild-type spindle length by the end of anaphase (Kruger et al. 2019). Despite these differences, the decrease in microtubule growth as a function of distance to the closest pole has a similar tendency across conditions, suggesting that the mentioned differences in spindle length are unlikely to have an important effect.

      • Additionally, adding the data about rescue localization in the mutant (equivalent of Fig 1 G) would be interesting to better describe the role of these different proteins. Figure 2, Panel G to L - Could the authors indicate the value for the average +/- error in each bin for the WT and the mutants? Also, it is hard to say from the plots, but it looks like the WT average speed in the first bin is different in every panel, that would be good to know to have an idea of the reproducibility/variability.

      We have added a figure with the rescue distribution (see Fig. 2 – Supp. 2). The apparent difference in the wild-type speed between experiments might have come from looking at normalised data. The new way of representing the data in Fig. 2H and J shows that the microtubule growth velocity in the wild-type is very consistent across experiments. We have added a table with microtubule growth velocity values (Table 1), and the source data are available.

      • The dots making up the "thick lines" are centered on 1.5/2.5/etc.. in some panels (G and K) and centered on 1/2/3/etc.. the others (I,J,L). Could the authors provide some clarification?

      We have fixed this inconsistency across the paper.

      • Figure 3 - Can the authors indicate the average values +/- error for each of the distributions in Fig. 3D? Maybe on the plot itself, in the legend or as a table. This would make them easily available without having to infer them from the Y axis. This comment is also valid for Fig 4I and 4J.

      We have added tables with average values and confidence intervals in the appendix.

      • Figure 3E ‘Distance from the plus-end to the nuclear membrane bridge edge at rescue as a function of distance from the plus-end to the closest pole at rescue’ - The Y axis reads as "distance to the bridge edge" but it shows negative values, could this be "position to the bridge edge" instead? (same item throughout the text).

      We have fixed this.

      • Figure 3 ‘Number of events: 442 (30 cells) wt, 260 (27 cells) klp9OE, 401 (35 cells) cdc25-22, from 3 independent experiments’ - P values this small raise a concern. Presumably the number of degrees of freedom in the regression analysis should not exceed the number of independent experiments. Instead, the DoF listed under "error" in the analysis output is hundreds or thousands instead of 3. To address this, the regression analysis should use either the "Error" function in R or a linear mixed-effects model to account for the nesting of the repeated measurements within each independent experiment. Alternatively, it is also possible to just calculate summary means for each independent experiment, and calculate p values based on that N=3. See: Lazic. Experimental Design for Laboratory Biologists. p. 157. and the supplemental file of: https://doi.org/10.1371/journal.pbio.2005282 and the additional file 1 of: https://doi.org/10.1186/s12868-015-0228-5 and this for an alternative plotting approach: https://doi.org/10.1083/jcb.202001064 Recommend either recalculating the p values by one of the methods above or removing the reported p values from the paper. The large effects observed in many cases are self-evident without a significance metric, so eliminating the p values would be acceptable here. (This comment applies to other figures through the paper that report p values based on number of cells or number of measurements instead of number of independent samples/experiments.)

      We thank the reviewers for suggesting the improvements to the statistical analysis, as well as for pointing us to useful resources that describe the statistical methods and their implementation in detail. We have followed Aarts et al. 2015 and used a linear mixed-effects model (see Methods>Statistical Analysis).

      Because of the change in statistical method, we included more cells from our existing data to show that some of the differences we had reported previously were significant. We did this for klp5Δklp6Δ kymographs (Fig. 2I and Fig. 2 – Supp. 1), spindle dynamics in ase1Δ (Fig. 5D and Fig. 5 – Supp. 1) and klp9Δ (Fig. 2 – Supp. 3 A, C), and cell length (Fig. 3 – Supp. 1A).

      For the same reason, we measured anaphase spindle elongation velocity (Fig. 3 – Supp. 1C) from kymographs instead of measuring them from the 1 minute interval movies that we had used previously (from Fig. 3 – Supp 1B). We have reflected this in the methods (see added text in line 800 and deleted text in line 809 in the document with changes highlighted).

      None of these changes has altered our conclusions.
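
The simpler alternative mentioned in the reviewers' comment, calculating one summary mean per independent experiment and testing on those, can be sketched as follows. This is only an illustration of why the degrees of freedom shrink when the experiment, not the individual measurement, is treated as the unit of replication. It is not the linear mixed-effects analysis the authors report, and the measurements, labels and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def experiment_means(measurements):
    """Collapse repeated measurements to one mean per independent experiment.

    `measurements` maps an experiment label to a list of values.
    """
    return [mean(values) for values in measurements.values()]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up growth speeds (um/min): three independent experiments per condition,
# each with several repeated measurements.
wild_type = {
    "exp1": [1.6, 1.5, 1.7, 1.6],
    "exp2": [1.4, 1.5, 1.6],
    "exp3": [1.7, 1.6, 1.8, 1.7],
}
mutant = {
    "exp1": [1.1, 1.0, 1.2],
    "exp2": [0.9, 1.0, 1.1, 1.0],
    "exp3": [1.2, 1.1, 1.3],
}

wt_means = experiment_means(wild_type)
mut_means = experiment_means(mutant)
t_stat, df = welch_t(wt_means, mut_means)

# Number of individual measurements, which are nested and not independent.
pooled_n = sum(len(v) for v in wild_type.values()) + sum(len(v) for v in mutant.values())

print(f"experiments per condition: {len(wt_means)}")
print(f"Welch t = {t_stat:.2f}, df = {df:.1f}")
print(f"individual measurements (nested): {pooled_n}")
```

With three experiments per condition the test has at most four degrees of freedom, however many individual measurements were taken. A mixed-effects model instead keeps every measurement but models the experiment as a random effect.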

      • Figure 4 - Nice experiment. It brings the question of how cell-shape affects all these dynamics (probably out of the scope of this work). But a for3 mutant for example?

      This is an interesting suggestion, to be tested in the future. Furthermore, we believe that nuclear shape should also have an important effect, since the spindle is confined inside the nuclear membrane. We would expect that mutants that perturb nuclear shape might have effects on microtubule growth. We have observed that the decrease in growth speed associated with internalisation of microtubules in the nuclear membrane bridge is reduced upon nem1 deletion, which increases nuclear membrane surface and produces membrane ruffling (Fig. 4-Supplement 2). However, nem1 deletion also removes Les1 stalks from the nuclear bridge (Dey et al. 2020). It would be interesting to find a perturbation of the nuclear membrane that does not remove the Les1 stalks.

      • ‘Ase1 is required for microtubule growth speed to decrease during anaphase B, this is unlikely to be a direct effect’ - If it is unlikely to be a direct Ase1 effect is the title of the section accurate? "Ase1 is required for normal rescue distribution and for microtubule growth speed to decrease in anaphase B"

      Ase1 recruits multiple proteins to the spindle midzone, so the fact that ase1 deletion produces a given phenotype does not necessarily mean that this phenotype results from the absence of Ase1 protein activity. For instance, deleting ase1 perturbs rescue distribution, but it does not mean that Ase1 acts as a rescue factor itself, or at least to a relevant extent, given that deletion of cls1 completely prevents rescue, but ase1 deletion does not. In the discussion we propose some indirect effects of ase1 deletion that may produce this effect. In any case, upon more careful analysis we have found that ase1 deletion does not prevent the decrease in microtubule growth speed during anaphase B, but rather makes it smaller (see section “The decrease in growth speed associated with internalisation of microtubules in the nuclear membrane bridge is reduced upon ase1 deletion”).

      • Figure 5 - What about an ase1 lem1 double mutant?

      We suppose that the intended gene is les1. We have studied the effects of les1 deletion in the new version of the manuscript. However, we do not see what additional information we would obtain from an ase1Δ les1Δ double deletion.

      • ‘In summary, Ase1 is required for rescue organisation and for microtubule growth speed to decrease during anaphase B’ - In this context it could make sense to discuss the observations from this paper (doi:10.1371/journal.pone.0056808) about the role of Ase1 ortholog's MAP65-1 in coordinating MT dynamics within bundles.

      In the mentioned paper, the authors showed that the presence of PRC1 (ase1 orthologue) in bundles increases microtubule rescue rate, and that it slightly reduces microtubule growth speed.

      We observe a small increase in microtubule growth speed throughout anaphase upon ase1 deletion (Fig. 5), which is consistent with the in vitro observation that PRC1 decreases microtubule growth. However, once more this might not be a direct effect of Ase1, since less Cls1 is recruited if ase1 is deleted, and Cls1 reduces microtubule growth speed (Fig. 2). In addition, this could also result from a higher concentration of tubulin / MAPs, due to less polymerised tubulin in ase1-deleted cells, which have fewer spindle microtubules on average.

      Regarding the increase in rescue rate produced by PRC1 in vitro, it is possible that Ase1 contributes to microtubule rescue in the spindle. However, given that no rescues occur upon inactivation of cls1 (Bratman et al. 2007), we believe Cls1 is the dominant factor, and Ase1 contribution is likely negligible.

      • ‘We initially set the microtubule growth velocity to 1.6 μm/min (early anaphase speed, Fig. 1F), and aimed to reproduce the experimental distribution of positions of rescue and catastrophe at early anaphase (spindle length < 6 μm’ - Kudos to the authors for detailing the model and its parameters in a way that even non-modelling experts can understand.

      Discussion - ‘Our data suggests that microtubule growth speed is mainly governed by spatial cues’ - Is it right to assume that in the cases where fast and slow growing microtubules were simultaneously observed, the fast microtubules were not/had not yet reached the midzone?

      Our data suggests that it’s not about being inside the midzone, but rather inside the nuclear membrane bridge formed after the dumbbell transition. We have elaborated more on this in the main text, pointing the reader to examples in the kymograph, and giving a quantitative argument for distance to the closest pole being a better predictor than anaphase progression or position with respect to the center (which is equivalent to distance to the midzone), see line 152.

      • Methods - ‘PIFOC module (perfect image focus), and sCMOS camera’ - Is this Nikon's "Perfect Focus" autofocus, or some other manufacturer's system? And back-thinned sCMOS.

      We have clarified this in the Methods section.

    1. Author Response

      Reviewer #1 (Public Review):

      1) In terms of the prior hypothesis here I think the authors justify a prior with respect to striatum and I think the most principled analysis of their hypothesis would be based on volumes of interest in striatum. Figure 1 does show difference in MTsat in striatum between neurotypicals and DLDs but the changes are all in the caudate I think- I cannot see anything in putamen. The authors actually describe changes in only one part of anterior caudate. The authors do describe a number of previous conflicting studies that examine caudate structural changes but that is not their hypothesis. The discussion goes into developmental changes affecting striatum at different times that might be relevant and would require a longitudinal study for a definitive study - as the authors acknowledge.

      The reviewer is correct that at this statistical threshold we only observe MTsat differences in the caudate nucleus. Changes in the putamen did not survive this threshold. Neither lowering the threshold for MTsat (our maps are openly available on Neurovault) nor an ROI analysis (see https://osf.io/2ba57/) reveals statistically significant differences in the putamen. As we noted in the paper, there are differences in the putamen in R1 (these are also observed in the ROI analysis).

      2) There is a lot of overlap between the caudate signal in the two groups - although the correlation of individual differences is reasonable. The caudate signal would not allow group classification.

      Yes, it is clear that these differences would not be sufficient to allow for group classification of DLD. We have discussed this overlap in the discussion.

      3) Outside of the caudate they do show changes in left IFG and auditory cortex that are hypothesised. But there is a lot else going on - I was struck by occipital changes in figure 1 which are only mentioned once in the manuscript.

      We now discuss these differences in the discussion. Note that we did not have any a priori hypotheses about these regions; to our knowledge, they have not been previously described and are not predicted by any theoretical accounts of DLD.

      4) Should I be concerned by i) apparent signal changes in right anterior lateral ventricle from group comparison in figure 1 ii) signal change correlation in right anterior lateral ventricle in figure 4 (slice 22) and iii) signal change outside the pial surface of the occipital lobe in figure 1?

      No – these may be accounted for by smoothing during analyses. Note, these changes at tissue boundaries are fairly commonly seen in statistical maps following smoothing but are not evident when data are projected onto a 3D surface.

      Reviewer #2 (Public Review):

      This work demonstrates the value that multiparameter mapping imaging protocols can have in uncovering microstructural neural differences in populations with atypical development. Previous studies looking at differences in brain structure have typically used voxel based morphometry (VBM) approaches where differences in volumes can be hard to interpret due to complex tissue compositions. The imaging protocol outlined in this paper can specifically index different tissue properties e.g. myelin, giving a much more sensitive and interpretable measure of structural brain differences. This paper applies this methodology to a population of adolescents with developmental language disorder (DLD). Previous evidence of structural brain differences in DLD is very inconsistent and, indeed, using traditional VBM the authors do not find a difference between children with DLD and those with typical language development. However, they provide convincing evidence that despite no macrostructural differences, children with DLD show clear differences in levels of myelin in the dorsal striatum and in brain regions in the wider speech and language network. This can help to reconcile previous inconsistent findings and provide a useful springboard for both theoretical and empirical work uncovering the nature of the brain bases of language disorders.

      We are grateful for these comments, and to the reviewer for pointing out some key strengths of this work.

      Strengths:

      The imaging protocol is robust and is explained very clearly by the authors. It has been used before in other populations so is an established method but has not been applied to populations of children with DLD before, yielding novel and very interesting results. The authors demonstrate that this is a methodology which could have great value in other populations that display atypical development, increasing the impact of these findings.

      The sample size is large for research in this area which increases confidence in the results and the conclusions.

      Rather than relying solely on group differences in brain microstructure to draw conclusions about neural bases of language development, the authors correlated brain microstructural measures with performance on standardised language tests, allowing stronger inferences to be drawn about the relationships between structure and function. This is often an important omission from developmental neuroimaging work. It gave increased confidence in the finding that alterations in striatal myelin are linked to language difficulties.

      Weaknesses:

      The authors rightly use the CATALISE definition of developmental language disorder, which differs from much of the previous literature by not requiring that children with language difficulties have nonverbal ability that is in the normal range. As can be common when using this definition of DLD, the group with DLD have significantly weaker nonverbal ability than the typically developing group. The authors show that brain microstructural differences correlate with language ability but they don't rule out a correlation with nonverbal or wider cognitive skills. Given the widespread differences in myelination across areas of the brain, including those that weren't predicted e.g. medial temporal lobe, it is plausible that perhaps some of the brain microstructural differences are not linked directly to language impairment but a broader constellation of difficulties. Some of the arguments in the paper would be strengthened if this interpretation could be ruled out.

      To rule out the effect of nonverbal IQ or wider cognitive differences, we have conducted stepwise regression analyses on the quantitative data extracted from the statistical cluster covering the caudate nuclei, assessing the influence of factors such as language proficiency, verbal memory and IQ. We find that language status accounts for the most variance, rather than nonverbal IQ or verbal memory (details are included in the paper).

      We also discuss this point in the discussion, pointing to the presence of co-occurring differences in DLD and how these might account for some of the broader group differences we observe.

      The authors acknowledge in the limitations section that their data cannot speak to whether brain differences are a cause or consequence of language impairment. However, there are some implied assumptions throughout the discussion of the results that brain differences in myelination have functional consequences for language learning. A correlation between structure and function does not indicate this level of causality, particularly in an adolescent population - function could just as easily have had structural consequences or environmental differences could have influenced both structure and function. In my view, the speculations about functional consequences of myelin differences are not fully supported by the data collected.

      The reviewer is correct in saying that the myelin deficit could be either a cause or a consequence of DLD or even that both are caused by a third factor. We specifically address this in the discussion section, and note a longitudinal analysis would be the best way to address this question. Indeed, R3 notes about our paper, “…it does a very good job of avoiding the common trope of assuming neural differences play a causal role in DLD (when in fact, reduced atypical development could cause neural differences)”.

      The data suggest that there is much greater variability in left caudate nucleus MTsat values for the DLD group than the other two groups. The impact this may have on the results is not discussed in the interpretation and it is unclear whether this greater variability occurs throughout all of the key MPM measures for the DLD group.

      Thank you for raising this important issue. In figure 1, we only plot the MTsat values from the caudate nucleus for visualisation, and as you note, there is a considerable degree of variability within the DLD group. However, and crucially, this difference would not influence statistical interpretation of our results. The whole-brain analysis used involves permutation testing, and is robust to a difference in group variability. However, the issue of variability within DLD is important and we now highlight this in our discussion, noting that not every child with DLD will have reduced striatal myelin. Indeed, this variability is even more evident in figure 4. An important challenge for future studies is to understand the link between striatal myelination and the spectrum of language variability.
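
For readers unfamiliar with permutation testing, the following is a minimal sketch of the general idea for a single measure and two groups. It is not the whole-brain pipeline used in the study, which the response does not detail; the function name `permutation_p_value`, the choice of the absolute difference in means as the test statistic, and the number of permutations are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration only: group labels are shuffled
    at random and the observed statistic is compared with the resulting
    null distribution.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            extreme += 1

    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_permutations + 1)
```

The null distribution here is built from the data themselves rather than from a parametric model, which is the property the response appeals to.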

      Reviewer #3 (Public Review):

      Developmental Language Disorder (DLD) is observed in children who struggle to learn and use oral language despite no obvious cause. It is extremely widespread, affecting 7-10% of children, and extremely consequential as it persists throughout life and has downstream effects on reading, academic outcomes, and career success. A large number of prior studies have attempted to identify the structural neural differences that are associated with DLD. These have generally shown mixed results, but support a number of candidate regions including left hemisphere language areas (particularly the inferior frontal gyrus), and striatal regions that are possibly linked to learning. However, these studies have suffered from small sample sizes and conflicting results. Part of this may be their reliance on traditional voxel-based-morphometric techniques which estimate cortical thickness and gray matter density. The authors argue that these measures are biologically imprecise; gray matter can be thinner, for example, due to synaptic pruning or increased myelination.

      The authors of this study offer a powerful new tool for understanding these differences. Multi-Parameter Mapping (MPM) is based on standard MRI techniques but offers several measures with much greater biological precision that can be tied specifically to myelination, a key marker of efficient neural transmission. They test a very large number of children (>150) with and without DLD using MPM and show strong evidence for fundamental biological differences in these children.

      This study features a number of key strengths. First, at the level of neuro-imaging, the MPM technique is new in this population and offers fundamental insight that cannot be obtained by other measures. Indeed, the authors wisely use a traditional gray matter approach (voxel based morphometry) and find few if any differences between children with DLD and typical development. This offers a powerful proof of the sensitivity of this approach. Moreover, the authors analyze their data comprehensively, looking at two measures of myelin (MTsat and R1) and their convergence.

      However, at the most important level, I think structural approaches (like MPM, diffusion weighted imaging and so forth) offer tremendous promise for dealing with this as they avoid the ambiguity associated with interpreting functional MRI. Are children showing reduced BOLD because they are less good at language processing? Or do the differences in brain function cause poorer language processing? Structural approaches - and MPM in particular - offer tremendous promise as they unambiguously assess the fundamental neuro-biology.

      Beyond the neuro-imaging this study is also strong in their sample and the measurements of language. The sample size is very large and an order of magnitude larger than existing studies. It is well characterized, and the authors use a large set of well-motivated measures that capture the relevant dimensionality of language. Moreover, the authors treat language both as a clinical category and a continuous measure which is consistent with current thinking on the nature of DLD as potentially the low end of a continuous scale rather than a discrete disorder.

      Finally, the discussion of this paper for the most part does a good job of fitting these neurobiological findings into our broader understanding of DLD. It does an excellent job of mapping the observed brain differences onto functional differences in the child. Importantly, in doing this it does a very good job of avoiding the common trope of assuming neural differences play a causal role in DLD (when in fact, reduced atypical development could cause neural differences).

      We are very grateful to the reviewer for taking the time to read our work so closely and for pointing out these strengths in the work.

      Despite these strengths, I have a number of substantive concerns that if addressed will improve the overall impact of this paper.

      First, as the authors are aware, there is a long running and active debate in DLD as to whether DLD is the tail end of a continuous distribution of children or a unique disorder (Leonard, 1987, 1991; Tomblin, 2011; Tomblin & Zhang, 1999). The results here offer great promise for informing that debate. And in that vein the authors quite appropriately analyze their data in two ways: once using DLD as a categorical variable and once using continuous measures of language. However, they don't really attempt to wrestle with the differences between the models.

      We have now included a section on the implications of our results for DLD in the discussion.

      Second, I was a little surprised to see the authors highlight left IFG in the discussion to the degree they did. While there was clear evidence for reduced myelin there in the MTsat analysis, this did not hold up in R1 analysis, and even in the MTsat, IFG was clearly not the primary locus. Rather the areas of differences seemed to be centered at Pre- and Post-Central gyrus and extending ventrally (to IFG) and posteriorly from there. Given debate on the role of IFG in language specific processing in general (Diachek, Blank, Siegelman, Affourtit, & Fedorenko, 2020; Fedorenko, Duncan, & Kanwisher, 2013), it was not immediately clear to me why that area was important to highlight. For example, some of the posterior temporal areas (and motor areas) that were found were equally important for perceptual, lexical and phonological processing that are important for other theories of DLD.

      We do see group differences in left IFG in the R1 analysis (see Figure 2) and they were more extensive than those seen in the MTsat analysis with which they overlapped. The reviewer is correct that the differences were limited to the opercular part of the IFG in both analyses whereas they extended more dorsally in the R1 analysis. They also extended ventrally to the anterior insular cortex. We respectfully disagree with the reviewer about the importance of highlighting these differences, given the importance of this region for language processing, and our previous hypotheses about this region. Even so, we agree that the posterior temporal and motor areas are of equal importance and have highlighted these in the discussion.

      The authors rightly point to their differences in the striatum as supporting theories of DLD centered around differences in learning. However, as they discuss, there are also large differences throughout the brain in perceptual, motor and language areas. These would seem to support theories of DLD centered around processing and representation. In particular, the differences in myelination likely are linked to differences in the efficiency of neural coding. This would seem to favor two theoretical views that might be worth mentioning - speed of processing (Miller, Kail, Leonard, & Tomblin, 2001), and approaches based on lexical processing (McMurray, Klein-Packard, & Tomblin, 2019; McMurray, Samelson, Lee, & Tomblin, 2010; Nation, 2014). I was surprised these were not mentioned, given the clear link to the timecourse of processing. Does this then suggest that these theories might complement each other? It would be useful to see some more discussion of the implications of these findings for broader theories.

      We have now incorporated mention of these theories in the discussion and discuss implications. We agree with the reviewer that it would be interesting to see whether the different theories could be reconciled.

    1. Reviewer #2 (Public Review):

      Suvorov and colleagues present a well-supported genome-scale phylogeny for 149 Drosophila species based on thousands of single-copy-orthologs. They then use several approaches to estimate the extent of introgression across the phylogeny, and report that it is common both recently and deeper in the past.

      The main strength of this paper is that it uses a scale of sequencing that allows an assessment of genus-wide trends with reasonably good power. It also presents two new analysis approaches, but these represent fairly minor modifications of existing techniques to suit multiple gene alignments, and unfortunately their reliability is not evaluated in this paper. Nevertheless, the main finding that introgression is common appears to be well supported. This finding echoes those of similar recent studies on taxa such as cichlid fishes and Heliconius butterflies. The different approaches used, and different levels of sampling in these different studies do not allow for quantitative comparisons, leaving us with the somewhat vague conclusion that introgression is 'common' in all of these taxa. Perhaps most critically, the present paper does not delve any deeper into the evolutionary impacts of introgression, nor the factors at the species or genomic level that might determine its frequency. Below I describe some areas of concern in more detail.

      1. Extent of introgression

      Perhaps equally as interesting as the frequency of introgression per species across the phylogeny is the proportion of the genome of each species that is affected. Without such estimates, the full extent of introgression is difficult to assess.

      2. Sampling effects

      Since this paper is attempting to make an (admittedly crude) estimate of the extent of introgression in the entire genus, some discussion is needed to address the possible consequences of the fact that only around 10% of species in the genus are represented. For example, if sampling is very even, perhaps most ancient events would be detectable, but more recent events may tend to be missed simply because the species involved are not sampled.

      3. Ancestral structure

      The reasoning provided for dismissing the possible effect of ancestral population structure is unconvincing. First, the authors argue that it "seems less likely" that non-sister taxa would have bred more frequently in the ancestral population. However, this is the entire basis of the problem: it might be unlikely, but it can happen. Eriksson and Manica (2012 https://doi.org/10.1073/pnas.1200567109) provided a very reasonable scenario in which colonisation of a new region can lead to this pattern.

      Second, the authors argue that QuIBL "should not be impacted by ancestral structure because this method searches for evidence of a mixture of coalescence times: one older time consistent with ILS and one time that is more recent than the split in the true species tree and that therefore cannot be explained by ancestral structure." This argument needs clarification. My understanding is that the split in the "true species tree" would also be inflated if there was ancestral structure.

      My view is that ancestral structure leading to discordance between gene trees and species trees is itself an interesting phenomenon. In some ways, it is not conceptually distinct from introgression occurring soon "after" speciation if we consider ancestral structure as the beginning of a continuous speciation process, so I don't think it would weaken the paper to accept this as a possible contributing process.

      4. Discordant count test

      The statistical analysis in the DCT accounts for multiple testing of many triplets for introgression, but there is no mention of the fact that these triplets are non-independent. It is not clear to me whether this makes the correction used more or less conservative.

      If there are any cases where the internal branch is long and the number of ILS gene trees is very small or zero, use of a chi-squared test may not be appropriate.
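
The small-count concern can be illustrated with a minimal sketch. It assumes, for illustration only, that the test for one triplet compares the counts of the two discordant gene-tree topologies against an expectation of equal frequencies; this is not the authors' implementation, and the function names below are hypothetical.

```python
from math import exp, lgamma, log


def chi_squared_statistic(n1: int, n2: int) -> float:
    """Chi-squared statistic (1 df) for two counts expected to be equal."""
    expected = (n1 + n2) / 2
    return ((n1 - expected) ** 2 + (n2 - expected) ** 2) / expected


def _binomial_pmf_half(n: int, k: int) -> float:
    """P(K = k) for K ~ Binomial(n, 0.5), computed in log space."""
    log_p = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) - n * log(2.0)
    return exp(log_p)


def exact_binomial_p_value(n1: int, n2: int) -> float:
    """Two-sided exact binomial p-value for H0: both topologies equally likely.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one, which stays valid when counts are very small.
    """
    n = n1 + n2
    if n == 0:
        return 1.0  # no discordant trees at all: no evidence either way
    probs = [_binomial_pmf_half(n, k) for k in range(n + 1)]
    observed = probs[n1]
    p = sum(pk for pk in probs if pk <= observed * (1 + 1e-7))
    return min(1.0, p)
```

With counts of 3 and 0, for example, the chi-squared statistic is 3.0 (a p-value of about 0.08 on one degree of freedom), whereas the exact two-sided p-value is 0.25, so the asymptotic approximation overstates the evidence when few discordant trees are available.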

      5. Branch length test

      The authors acknowledge that the BLT is "conceptually similar" to that of Hahn and Hibbins 2019 https://doi.org/10.1093/molbev/msz178, but to me it seems that the only material difference is the statistical procedure for testing for a significant difference between branch lengths.

      An important consideration that appears to have been ignored is whether selection can impact the distribution of branch lengths, especially since many of the BUSCO genes used here will be under strong selective constraint.

      6. Intra-locus recombination

      The paper needs to address the possible impact of intra-locus recombination on all of the introgression tests. For the DCT, I imagine that counts would be biased toward the species tree topology if the inferred trees span multiple distinct genealogies (see for example simulations by Martin and Van Belleghem 2017 https://doi.org/10.1534/genetics.116.194720 Figure S7). This might reduce test sensitivity.

      Similarly, for the BLT, I would expect that true introgression would be more difficult to detect in the presence of recombination. It is possible that the block jackknife procedure of Hahn and Hibbins (2019, https://doi.org/10.1093/molbev/msz178) may be more suitable than the comparison of distributions of point estimates for genes used here.
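
A block jackknife of this kind can be sketched as follows. The sketch assumes the D3 statistic takes the form (d_BC - d_AC) / (d_BC + d_AC) for a triplet in which A and B are sisters, computed from mean pairwise distances across loci, and that loci are supplied in genomic order so that contiguous blocks make sense. The function names and the blocking scheme are hypothetical and are not taken from either paper's code.

```python
from math import sqrt


def d3(d_ac, d_bc):
    """D3-style statistic from per-locus A-C and B-C distances (assumed form)."""
    mean_ac = sum(d_ac) / len(d_ac)
    mean_bc = sum(d_bc) / len(d_bc)
    return (mean_bc - mean_ac) / (mean_bc + mean_ac)


def block_jackknife_d3(d_ac, d_bc, n_blocks):
    """Return (D3, jackknife standard error) using delete-one-block resampling."""
    d_ac, d_bc = list(d_ac), list(d_bc)
    n = len(d_ac)
    if len(d_bc) != n or not 2 <= n_blocks <= n:
        raise ValueError("need matching inputs and 2 <= n_blocks <= number of loci")

    # Contiguous, near-equal blocks of loci.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    estimates = []
    for i in range(n_blocks):
        lo, hi = bounds[i], bounds[i + 1]
        # Recompute the statistic with block i left out.
        estimates.append(d3(d_ac[:lo] + d_ac[hi:], d_bc[:lo] + d_bc[hi:]))

    mean_est = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean_est) ** 2 for e in estimates)
    return d3(d_ac, d_bc), sqrt(variance)
```

Dividing the statistic by its jackknife standard error gives an approximate Z-score. Because whole blocks are removed together, correlation among neighbouring loci is reflected in the standard error, which is the property that makes the approach attractive when recombination and linkage are a concern.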

    2. Reviewer #3 (Public Review):

      The authors compiled a collection of published and newly sequenced genomes to assemble the largest collection of Drosophila genomes to date. Using this dataset they extracted a set of single copy orthologs to use for phylogenomic analyses, with a focus on estimating a time-calibrated phylogeny and introgression.

      This new dataset is a valuable resource that will serve the broader community of Drosophila researchers opening many new avenues for future phylogenomics research. The workflow of focusing on BUSCO genes for all comparative analyses is simple in a good way -- it is easy to understand how the data were collected and it should be easily reproducible -- which makes it easy to read past the genomics details and focus on the analyses of these data.

      However, I feel this is an important aspect of the paper that should receive more details, perhaps in the supplement. I may have missed it, but I could not find statistics about this ortholog data set. On average, how long is each locus, how many variable sites are there, how many taxa are missing data for any given locus due to paralogy? Do the BUSCO genes include both introns and exons? It is also unclear from the description exactly how the BUSCO genes were extracted from genomes. Are they extracted from the final assembled genomes, or do you perform variant calling after identifying them to call heterozygous sites? If heterozygosity is excluded, how might this impact metrics such as the branch length tests, especially among close relatives? It likely impacts node age estimates as well?

      The authors use this dataset to infer phylogenetic relationships among taxa using both ML concatenation (IQtree) and a two-step MSC approach (Astral) which yielded quite similar topologies, and they examined the impact of filtering loci with treeshrink, which had minimal impact. This new topology represents a substantial step forward for understanding the relationships among major Drosophila clades.

      One of the main results of this study is a new set of node age estimates on the tree. For this they estimated branch lengths in mcmctree from a concatenated matrix of 1000 loci in the presence of fossil calibrations. The fossil calibration scheme selected as the best option includes three fossils, one dating the divergence at the split from mosquitos (uniform 195-230Ma) and two ingroup calibrations (U(43,64) and U(15,43)). To me, the credible intervals on node ages seem incredibly narrow. The authors mention this as an improvement compared to earlier studies, but they also mention later that the total amount of sequence data does not greatly impact node dating. So I'm a bit confused why the node ages are expected to be more accurate here. It seems to me that time calibrations should be most accurate when the greatest number of fossils are available, and when very appropriate Bayesian priors are set on the analysis. The effect of sequence variation is then relatively small. But here there are very few fossils, one of which is hugely distant, and so I would not expect highly precise age estimates. So I guess my question to the authors is, what do you think is going on here? Perhaps further description in the supplement of how the mcmctree method implemented here differs from traditional node dating done in a program like BEAST would help to clarify.

      Considering that this paper aims to infer the new best time calibrated tree for the Drosophila community, I think that the current description of fossil calibration schemes, which primarily refers to other publication names in the supplement, is insufficient. Which fossils are used in those studies, are you using those fossils as calibrations here, or are you implementing secondary calibrations based on their phylogenetic results? The reader should not have to read every one of those papers to understand the basis of the calibrations in this paper.

      Fig. 1 shows nodal age posterior probabilities. Are these 95% confidence intervals? The taxon labels are too small in this figure, both on the large tree and especially in the inset figure. The legend refers to fossil taxon names used for calibrations, but it is still unclear to me where the fossils are placed on the tree. Are the calibrations indicated somewhere in the figure?

      The authors demonstrate evidence of introgression by showing mostly overlapping evidence from two different types of tests. Together, these tests show that most major clades contain significant imbalanced discordance in gene tree counts or branch lengths. The taxon labels in Figure 2 are unfortunately quite unreadable, especially the matrix labels, which makes it difficult to interpret.

      I do not see a reason for presenting new names and acronyms for the introgression tests used in this study. The "DCT" is described as being similar to a suite of existing tests which are also based on comparison of rooted-triplet gene tree frequencies. These methods have been presented in many frameworks (BUCKy, D-stat, f4, etc.) and the only difference here seems to be the precise method used to determine significance. Similarly "the BLT is conceptually similar to the D3 test" could be replaced by just saying we implemented the D3 test which we refer to here as a 'branch length test (BLT)' to clarify that you have not in fact created a new test (e.g., you say "The first method we developed was the discordant-count test...")

      I am not very satisfied with the estimates of the "upper bounds" of introgression used here. It seems that there could possibly be many ways in which admixture edges could be drawn on the tree to explain the matrix of significant test results, and it is better to let formal network inference methods (e.g., SNAQ, Phylonet) infer these edges rather than guess at their placement. The current approach of "placing introgression events between pairs of branches for which most descendant extant taxa show evidence of introgression" leaves significant room for subjectivity.

      The authors did implement phylonet, but not very exhaustively. Why only fit a single edge on the tree instead of multiple? The authors state "networks with more reticulation events would most likely exhibit a better fit to observed patterns of introgression but the biological interpretation of complex networks with multiple reticulations is more challenging". I don't think this type of result is any more complicated to understand than the current approach used by the authors of drawing edges manually. And it is much less subjective. The authors say that it is computationally intractable, and this may be true for clades above ~15 tips, but testing on smaller trees by subsampling 10-12 tips seems feasible. From my experience network inference using pseudo-likelihood methods in SNAQ or phylonet takes a few minutes to fit 1 edge, and a few hours to fit 2-3 edges.

      Currently the two major results of the paper seem disjointed. The authors infer a time-calibrated tree, and they infer introgression events, but there is not much connection between the two. I applaud the authors on one hand for being cautious in interpreting their "upper bounds" of introgression to say too much about when they think introgression has occurred in the context of the time-calibrated tree. I think there is insufficient confidence in the introgression timing estimates to do that. But, what about the inverse relationships? Does this extent of introgression across the tree impact your confidence in the estimated timing of divergence events? One expectation would be that it is biasing all of the divergence times to appear younger. See my suggestions for addressing this.

      Overall, this study presents an impressive new dataset and important new results that greatly impact our understanding of the evolutionary history of Drosophila. Although the estimates of node ages and introgression events may be imperfect, they are clearly a step forward. It is clear from these results that introgression has occurred throughout the history of Drosophila, and this study paves the way for further investigation of these patterns, as the authors propose in their conclusions.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their careful and constructive analysis of our work. Our manuscript aims to exemplify the use of cryo-soft-X-ray tomography (cryoSXT) as a technique to study the dynamic changes to host-cell morphology that accompany virus infection. This emerging method has several strengths when compared to other ultrastructural analysis techniques. Specifically, cryoSXT does not require the addition of contrast agents and therefore samples can be prepared via plunge cryopreservation alone, allowing us to capture them in a near-native state. Furthermore, the penetrating power of soft X-rays and large field of view in cryoSXT allow rapid data acquisition, facilitating quantitative analysis of 10s to 100s of individual cells. We combined high-throughput cryoSXT data collection with semi-automated tomogram segmentation and fluorescence cryo-microscopy to study a recombinant herpes simplex virus (HSV)-1 that produces a pattern of fluorescence indicative of the stage of the infection in a single cell (‘timestamp’ HSV-1) and quantitatively monitored changes in lipid droplet, vesicle and mitochondrial morphology as HSV-1 infection progresses. In response to the reviewers’ comments, we have expanded our analysis of lipid droplet morphology, identifying a transient increase in the size of lipid droplets at early stages of HSV-1 infection, and completed additional fluorescence microscopy analysis to support our statements about the changes to microtubule, mitochondrial and Golgi morphology that accompany infection. Furthermore, we have included additional discussion on the relative merits of cryoSXT versus other ultrastructural analysis techniques like transmission electron microscopy, electron cryo-microscopy and electron cryotomography. We believe that our study serves as a powerful example of how cryoSXT can be used for quantitative cell biology and will be of broad interest to an audience of cell biologists and colleagues who study infection processes.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors have performed an explorative study, investigating morphological changes that occur in cells upon infection with Herpes Simplex Virus 1 (HSV-1) by the use of cryo soft X-ray tomography (cryoSXT). cryoSXT is an emerging technique for imaging of biological material, that allows for 3D imaging of significant volumes of cells under near-native conditions, without the need for sectioning or sample preparation other than rapid freezing. Reference (Groen et al. 2019) provides a nice list of examples from various biological samples. By the use of cryoSXT, the authors confirm findings that they have previously published by use of light and expansion microscopy (ref 16 from manuscript), namely an enrichment of small vesicles close to the nucleus and elongation and branching of mitochondria into interconnected networks in infected cells.

      Infection experiments were done in two different cell types in this study (HFF and U2OS), and a timestamp reporter virus that allows to distinguish between early and late stages of infection was used to provide more context to the observed morphological changes in the cells.

      Major comments

      It is a bit difficult to follow the main message throughout the manuscript, as the topics brought up in the introduction, results and discussion sections are not very coherent. The introduction gives some background on the virus and the timestamp reporter system, and further focuses on cryoSXT as a method and how this can overcome sample preparation artefacts that might be introduced by chemical fixation and sample processing. The results do not contain any direct comparisons between cryoSXT and other methods or sample preparations (light microscopy or EM-based), and the discussion only to a small extent comes back to the advantages brought by cryoSXT compared to other methods. Rather the discussion largely revolves around the possible involvement of microtubules in generating the observed morphological changes, and the possible meaning of elongated mitochondria in infected cells. Both of these topics are barely introduced, and not at all experimentally interrogated in the case of microtubules. There is also some discussion about Golgi fragmentation, although this is also not directly interrogated by cryoSXT in the current manuscript.

      We thank the reviewer for these comments. We have:

      - Updated the introduction to enunciate more clearly the aims of our study
      - Included a substantial comparison of the relative merits of cryoSXT versus other ultrastructural analysis techniques (TEM, cryoEM and cryoET) in the discussion
      - Updated the introduction to introduce the concepts of microtubule and mitochondrial morphology changes during infection that are covered in depth in the discussion
      - Included additional microscopy experiments, including super-resolution structured illumination microscopy (SIM), to demonstrate the changes in Golgi (Figures 6 and 7), microtubule (Figure 8) and mitochondrial (Suppl. Figure 4) morphology that accompany HSV-1 infection. These additional experiments support the hypotheses presented in the submitted manuscript, namely that microtubule organising centres are disrupted, Golgi membranes dispersed, and mitochondria redistributed as HSV-1 infection progresses.

      The authors perform imaging with a 40nm or a 25nm zone plate, where the 25nm zone plate provides improved resolution of a smaller volume compared to the 40nm zone plate. The authors do not really make use of the improved resolution offered by the 25nm zone plate in the results, so the motivation for turning to this (and therefore also changing cell line) is a bit unclear. The reason for the U2OS cell line to be better preserved during X-ray imaging is also not discussed; maybe it has to do with the thickness of the cells (as the U2OS cells are very flat). Furthermore, images from the 25 nm zone plate are not compared side by side to either the 40nm zone plate or standard TEM, which makes it hard to judge what the increased resolution really brings.

      Only one zone plate can be installed at any one time in the microscope and altering the zone plates requires extensive hardware changes that are outside the control of beamline users. We agree that this was not clearly discussed in the text. We have included additional text in the results (lines 207–208) and methods (lines 633–638) explaining this operational limitation and clarifying which zone plate was used for which experiment. In this study we observed that tomograms acquired with the 25 nm zone plate did not provide significantly more biological information than with the 40 nm zone plate, and thus both are suitable for characterisation of overarching cellular ultrastructural changes that accompany infection. We have added a sentence to this effect to the discussion (lines 410–412). Like U2OS cells, HFF-hTERT cells are also very flat. They appear more robust compared to HFFs when used for protracted exposures to soft X-rays and less likely to suffer from heat deposition after an extensive data collection round. We can speculate at this point that this could conceivably be due to the particular chemical composition of the intracellular environment in different cell lineages but it is impossible to offer anything other than speculation and therefore we have refrained from commenting further on this in the manuscript.

      The switch from a 40 to a 25nm zone plate required a switch in the model system, as mentioned above. The chosen cell types are not linked to biological relevance however (neurons and epithelial cells are mentioned as relevant cell types in the introduction), and it is therefore a bit unclear what the relevance is of keeping results from both cell types and comparing the two, rather than sticking to the one that works with cryoSXT. The results from the U2OS cells could still be compared by LM to the HFF cells if this contributes to the aim of the study.

      U2OS cells were chosen because they have been used previously for studies of HSV-1 infection (references 55–56) and are known to be well suited to cryoSXT analysis (references 32–33). We have added a sentence to this effect to the results (lines 208–211).

      The distribution of the viral proteins of the timestamp reporter virus is used to categorize infected HFF cells into 4 infection stages. In the U2OS cells the protein distribution is a bit different, which only allows them to be categorized into early (stage 1+2) and late (stage 3+4) stage of infection. Although this is what the authors state in the text, all 4 stages are included in Fig.2 for the U2OS cells, so it is not clear how this subdivision is performed and it does not seem like an accurate representation of the data. Furthermore, the uninfected population is not included in the timecourse, and there is not really a gradual change in infection states over the different timepoints as one could have expected. Therefore it is a bit hard to see the relevance of the timecourse. In the paper where the reporter virus is published (ref 16), shorter infection times were used, which leads to a more gradual change in infection stages.

      We thank the reviewer for pointing out these omissions. We have updated Figure 2A to only show the categories early (stage 1+2) and late (stage 3+4) for the U2OS cells. Furthermore, we have repeated the infection time course experiment, quantitating uninfected cells in addition to infected cells and including additional time points (2-, 4- and 6-hours post-infection). These new data (Figure 2B) demonstrate that the temporal profiles of infection progression are similar in HFF-hTERT and U2OS cells. Furthermore, they support our choice of 9 hours post-infection as a suitable time point for plunge freezing of samples in order to obtain a mixture of cells at early and late stages of infection.

      There is a lot of importance given to the morphological changes of mitochondrial networks in infected cells. However, the quantification represented in Fig.5B is a bit unclear. The mitochondria are classified into different groups, but there is no specific description of the definition and cutoff values of each group. The name of some groups is also confusing, such as "short and long" mitochondria. Furthermore, there are large differences between replicates (suppl. fig. 2). The authors state that some mitochondria are swollen, which they interpret as a sign of apoptosis. They find these swollen mitochondria in 75% of the tomograms of uninfected cells in replicate number 3. If this is indeed cell death this replicate is not healthy.

      We apologise that the categorisation of mitochondria was not sufficiently clear in the submitted manuscript. The categories were percentage of tomograms that had the different mitochondrial morphologies present, not percentages of mitochondria. Thus, tomograms with both short and long mitochondria were classified as “short and long”. We have re-generated Figure 5C and Suppl. Figure 2C as a Venn diagram to illustrate this point more clearly. We have also updated the legend of Figure 5C (lines 845–850) to state clearly that the diagram shows percentage of tomograms with the relevant mitochondrial morphologies. The categorisation was performed manually and we have included examples of each category in Figure 5A. Manual classification can be subjective but, given the large number of tomograms analysed and the clear distinction between morphology in uninfected vs early- and late-stage infected cells, we are confident that our results are robust. We note that we have deposited all of the source tomograms in the Apollo repository at the University of Cambridge (https://doi.org/10.17863/CAM.78593); the data we used for this analysis are thus freely available for inspection and re-analysis by interested colleagues. We note that the swollen mitochondria were observed in multiple samples of uninfected and infected cells. This suggests that, regardless of infection, this is a common phenotype of U2OS cells. Others have observed this morphology by EM in the context of apoptosis and suggest it may represent porous mitochondria (reference 61). Although the proportion of tomograms containing these swollen mitochondria was higher in the uninfected sample of replicate 3, the other 25% contained typical mitochondrial morphologies that we could include in our analysis.
The presence of inter-cell morphological variability such as this highlights the importance of imaging multiple cells within a population and performing several distinct biological replicates, as we have done in this study, to ensure project-relevant information is captured and delineated from the background structural variability inherent within a cell population. Previous cryoSXT studies had observed (but did not specifically comment on) a similar swollen mitochondrial morphology (reference 59). However, out of an abundance of caution we excluded all tomograms with swollen mitochondria from our analysis of mitochondrial branching (Figure 5C). Moreover, Tukey tests were performed per replicate for each pair of conditions in Figure 5C and statistical significance was reported only if it was observed independently in all three replicates. We are thus confident that any sampling error in replicate 3 that may arise from excluding tomograms will not have meaningfully altered our conclusions.
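
      As an illustration only, the decision rule described above (a pairwise comparison is reported as significant only when it is significant independently in every replicate) can be sketched in Python. The function name, the 0.05 threshold and the p-values below are assumptions made for this sketch; the p-values are placeholders, not results from the study, and the per-replicate Tukey tests themselves are assumed to have been run elsewhere.

```python
ALPHA = 0.05  # assumed significance threshold; not stated in the text above


def significant_in_all_replicates(pvalues_by_replicate, alpha=ALPHA):
    """Return the condition pairs whose p-value is below alpha in every replicate.

    pvalues_by_replicate: list with one dict per biological replicate, each
    mapping a (condition_a, condition_b) tuple to the p-value of the pairwise
    post-hoc test for that replicate.
    """
    if not pvalues_by_replicate:
        return set()
    # Only pairs that were tested in every replicate can qualify.
    pairs = set(pvalues_by_replicate[0])
    for replicate in pvalues_by_replicate[1:]:
        pairs &= set(replicate)
    return {
        pair
        for pair in pairs
        if all(replicate[pair] < alpha for replicate in pvalues_by_replicate)
    }


# Placeholder p-values for three hypothetical replicates (illustrative only).
example = [
    {("uninfected", "early"): 0.01, ("uninfected", "late"): 0.001, ("early", "late"): 0.20},
    {("uninfected", "early"): 0.03, ("uninfected", "late"): 0.002, ("early", "late"): 0.04},
    {("uninfected", "early"): 0.20, ("uninfected", "late"): 0.004, ("early", "late"): 0.01},
]

consistent = significant_in_all_replicates(example)
print(sorted(consistent))
```

      With these placeholder values only the uninfected versus late comparison passes in all three replicates, which shows how the rule discards differences that appear in only one or two replicates.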

      Minor comments

      Results section 1, line 115-117: Where the authors state that it is unclear whether "naked" HSV-1 capsids would be visible by cryoSXT, it would be useful to refer to literature where these are observed by TEM, or to compare to TEM in their own experiments.

      We have included references to previous TEM studies in the results (lines 128–129), as requested. However, we note that TEM and cryoSXT are fundamentally different as TEM uses contrast agents whereas contrast in cryoSXT arises from differential elemental densities (in particular the density of oxygen versus carbon or phosphorus). We have updated the results (lines 129–131) to clarify this point.

      Results line 143: The authors state that it's hard to observe the perinuclear viruses with TEM, but there are several examples of this in the literature that could be referenced, e.g. (Skepper et al. 2001; Leuzinger et al. 2005; Baines et al. 2007; Johnson and Baines 2011), although this does not mean that they are not hard to find or that 3D is not advantageous.

      We thank the reviewer for these references and we have added them to the manuscript.

      Fig.4: It is unclear why all the vesicles are open-ended

      This is due to the differential path-length of carbon-rich (and thus high-contrast) membrane traversed by the X-rays for the membranes normal or parallel to the incident X-ray beam. We have clarified this point in the results (lines 290–301).

      Some places in the manuscript PFU per cell is used, other places MOI

      Thank you for pointing this out. For consistency, we have changed all instances of PFU per cell to MOI.

      If some specific adjustments to the methods had to be implemented for biosafety reasons (virus work), this should be stated in the methods.

      We have added a section on biosafety measures to the methods (lines 562–568).

      Access to the synchrotron should also be described

      We have expanded the synchrotron access attribution in the Acknowledgments section (lines 737–738).

      Discussion line 320: "consistent with previous research" - there is a reference missing.

      Thank you for spotting this. We have now added the reference.

      The quantifications are based on a limited number of tomograms, but there is no statement as to how the specific tomograms were selected. With a variability between replicates and tomograms, a random selection is important.

      We included all tomograms collected for the relevant experimental condition in all our analyses unless otherwise stated. For the vesicle segmentation we chose four reconstructed tomograms from each condition at random (lines 690–691). For lipid droplet volume analysis and mitochondrial branching analysis we included all tomograms that matched our quality-control criteria. We have added a few sentences to the Segmentation and Graphs and Statistics sections of the methods (lines 691–694 and 724–733) describing our selection criteria for the lipid droplet, vesicle and mitochondrial branching analysis, respectively.

      If gold fiducials are visible in the tomograms it could be useful to indicate, as they can look similar to lipid droplets to a non-expert reader.

      We have indicated gold fiducials in Figure 1H, the only figure in which they are visible, with a gold star as requested.

      Suppl. Fig.2: For clarity it would be good not to use the same color arrows to indicate different things in A and B.

      Suppl. Figure 2B has been removed in response to another reviewer request.

      Reviewer #1 (Significance):

      The authors of this study demonstrate that cells infected by HSV-1 virus can be investigated by the use of cryoSXT, and use this to show that infected cells have more elongated and interconnected mitochondria, and an enrichment of small vesicles close to the nucleus. They thereby also show that cryoSXT offers a nice resolution for characterizing morphological changes in significant volumes of near native-state cells, and that the method offers a promising throughput for screening of large amounts of cells. However, the study does not really present new biological or technical advances compared to previously published literature, see e.g. Müller et.al. 2012, Duke et.al 2014, Perez Berna et.al. 2016, Groen et.al. 2019, Weinhardt et.al. 2020, Loconte et.al. 2021 (not cryo but demonstrates the advantage of capillaries), Kounatidis et.al. 2020, Scherer 2021 (ref 16 from paper), some of which are also referenced in the current study. The study could thus have profited from a more defined focus and possibly further experiments (live-cell imaging, CLEM, TEM, microtubules or more mechanistically focused) depending on the main interest of the authors. The advantage with the current broad focus (assuming that the main concerns are addressed) is that the study could interest a larger audience, ranging from virology, cell biology and immunology to microscopy and methods development.

      We thank the reviewer for recognising the broad audience that will be interested in our manuscript. We believe that our analysis highlights the broad applicability of cryoSXT for analysing cell ultrastructure and changes that occur in response to infection. Furthermore, we think that our use of robust numerical analysis to quantitate the phenotypes we observe highlights the strength of cryoSXT as a high throughput technique for ultrastructural analysis. Our study is the first to investigate HSV-1 infection using cryoSXT and, in addition to confirming previous ultrastructural changes observed using other methods, we present new biological insight in organelle architecture and distribution such as that lipid droplets undergo a transient size increase during early stages of infection. We believe that we have demonstrated the robust utility of cryoSXT as a tool to study ultrastructural changes in response to insults, such as infection by intracellular pathogens, and hope that our manuscript will act as inspiration for others seeking to use cryoSXT to image cellular ultrastructure.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors use soft X-ray tomography to examine cell structure following infection by herpes simplex virus-1 (HSV-1). This imaging method can provide 3D images of cryo-preserved intact cells without chemical fixation or staining. The authors find several morphological differences between uninfected and infected cells, including changes in the number and size of vesicles and in the size and shape of mitochondria.

      This is a well-done study with careful and extensive analysis that in general produces convincing images to support the authors' conclusions. The procedures are clearly described and reproducible, and the authors have examined an impressive number of images and have performed appropriate statistical analyses.

      We thank the reviewer for their positive comments.

      I had two comments / suggestions regarding the findings about changes in morphology after infection. First, in the Discussion, the authors consider the possibility of Golgi fragmentation. Can the authors test this by counting Golgi before and after fragmentation?

      We did not frequently observe well-defined Golgi apparatuses in our tomograms, consistent with previous cryoSXT studies (reference 61). We therefore performed new experiments using SIM microscopy to demonstrate the disruption of Golgi apparatus and trans-Golgi network in fixed U2OS cells stained with the markers GM130 and TGN46, respectively. These new results are presented in Figures 6 and 7 and in the results (lines 342–355).

      Second, in the Results the authors report that they did not observe a change in lipid droplets after infection. However, the late-stage image in Fig. 5A seems to show such a change, with the lipid droplets becoming larger and darker relative to the early stage or uninfected cells. Maybe this is just the particular image that was selected, but perhaps it is worth looking at more images by eye just in case the segmentation procedure somehow missed this change.

      We thank the reviewer for suggesting we re-visit the properties of lipid droplets. Based on this suggestion we segmented the lipid droplets from 94 tomograms and found a robust change in the median volume of lipid droplets at early stages of infection. We have included this new data in Figure 4C, Suppl Figure 2 and the text of the results (lines 302–312). The observation that lipid droplet volumes change is particularly interesting as another group recently observed similar changes in lipid droplets in response to HSV-1 infection of astrocytes and they postulate that this may modulate the cellular immune response (reference 85). Our data support and extend their conclusions, as described in the discussion (lines 476–494).

      Minor comments:

      Line 127 - As I understand it, the alignment by fiducial markers corrects primarily for small inaccuracies in tilting of the stage. Hopefully there are not significant vibrations in the microscope because this would also lead to loss of resolution during the exposure of each tilt angle.

      Thank you, we have corrected “vibrations” to “small inaccuracies in tilting of the microscope stage”.

      Line 145 - "electron light" Is this common usage? To me it seems more accurate to just say electrons because light to me means photons.

      Thank you, we have corrected “electron light” to “electrons”.

      Line 390 - detection OF ("of" is missing)

      Thank you, we have made the correction.

      Line 564 - Fig. 2 legend. "partial retention in the nucleus of U2OS cells". I am not sure where the nucleus is in the images. To me, it looks like there is almost no stain for ICP0 in hTERT at stage 1 and stage 3, and then cytoplasmic stain at stage 2 and stage 4. In contrast, for U2OS, the stain looks mostly nuclear until stage 4 when it is partially cytoplasmic. This all needs to be better explained, and perhaps arrows added to the images such that the reader does not have to guess.

      We agree and have added a silhouette around each nucleus in Figure 2 to make this clearer. We have also added arrows to indicate the gC-mCherry enriched juxtanuclear compartment in cells at stage 3 (HFF-hTERT) or a late stage (U2OS) of infection.

      Line 585 - The authors could consider rotating the images by 180° in panel A (late) in order to maintain the same orientation of nucleus and cytoplasm. This would make it easier for readers to see the point.

      Done as requested.

      Line 614 - I could not find the length of the scale bar in the legend.

      We apologise for omitting this – it has now been added.

      Reviewer #2 (Significance):

      The significance of the study is two-fold. First, it is a nice technical demonstration of what can be accomplished using soft X-ray tomography. I am qualified to evaluate this, since my expertise is in biological applications of this technique. The second significant aspect of the study is the demonstration of morphological changes in mitochondria and vesicles. I am not a virologist, so I do not know the literature on this point with regard to virus infection, but I find it interesting that the authors were able to detect such changes.

      We thank the reviewer for their positive assessment of our work.

      I believe the authors should cite a couple of papers:

      10.1016/j.cell.2015.11.029 which looks at HSV infection and reports viral particles between the inner and outer nuclear membrane.

      We have included a citation to this work as requested (lines 162–165).

      10.1016/j.jsb.2011.11.025 which also reports nuclear membrane separations or bulges by soft X-ray tomography.

      We have elaborated on this section and incorporated the reference as requested (lines 265–276).

      Regarding these nuclear membrane bulges, there are a number of papers that show they can also arise from mutations in nuclear-lamin associated proteins like nesprin and SUN (see for example https://doi.org/10.1093/hmg/ddm338). This is perhaps something interesting for the authors to think about, but not necessary for the current manuscript.

      Thank you for this comment. We did consider studying the breakdown of the nuclear lamina during HSV-1 infection, as this has been shown in previous studies [e.g. 10.1101/2021.06.02.446771]. However, we could not robustly resolve the nuclear lamina from the nuclear envelope in uninfected cells. The nuclear lamina is quite thin (30–100 nm in width) and this may have confounded its identification.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Nahas et al. describes the structural studies performed in U2OS cells infected with a recombinant HSV-1 virus that enables tracing the stage of the infection using fluorescent markers. This system was used to determine major structural changes in HSV-1 infected cells using cryo-soft X-ray tomography (cryo-SXT) on near native-state samples. The data presented complement previous studies (particularly ref.16) using similar reagents but different microscopy techniques. While the data are generally well presented and discussed, they do not provide any substantially novel information on the structural changes in HSV-1. Nevertheless, they constitute an interesting technical achievement.

      We thank the reviewer for supporting the technical quality of the analysis. In response to the comments of another reviewer we have extended our analysis and documented new biological information for this system relating to lipid droplet re-shaping and distribution in response to HSV-1 infection; all our new findings are included in the updated manuscript.

      Major comments:

      There are no major concerns on the data, although some of the statements could be revised for a more realistic interpretation of the results.

      • In Figure 1F and lines 152-156 it is stated that a bulging of the nuclear envelope occurs around some of the putative particles, while in lines 243-244 and lines 625-628, it is stated that bulging occurs both in mock and infected cells. This should be clarified to avoid confusion. It is possible that authors differentiate both situations and this should be more clearly stated.

      Many thanks for identifying a possible area of confusion. We have updated the results to clearly distinguish the expansion of the perinuclear space that accompanies virus nuclear egress (lines 160–175) from the bulges of the nuclear envelope that are observed in uninfected and infected cells (lines 265–276).

      • The statistical tests are different for different hypothesis testing throughout the manuscript. The authors should justify in the methods section the use of one or another test. This will contribute to clarity in the hypothesis that is being tested and will clarify the reason for the selected test.

      We have significantly expanded the Graphs and Statistics section of the methods (lines 703–734) to further justify the statistical tests used throughout our study.

      • Sentence: "Our observation..." in lines 349-352. Even though the sentence is in the Discussion it is wildly speculative. The authors could use different approaches to tackle experimentally the question of whether active fusion or faulty fission is involved, but this is not the main subject of the manuscript. Please revise the sentence or address experimentally; this would provide new insight into the impact of HSV-1 infection on mitochondrial network morphology. This sentence could be qualified as "speculative".

      We agree that this section of the discussion strayed into speculative territory and have removed it from the updated manuscript.

      • Although ref.16 provides evidence supporting Golgi fragmentation and mitochondrial elongation after HSV-1_timestamp virus infection in HFF cells, it would be important to show confocal microscopy data in U2OS cells, which were used for cryo-SXT, particularly since the authors refer differential virus kinetics and subcellular distribution of viral antigens in these cells. These would greatly contribute to support the statements regarding these two phenomena. It is very likely that the authors already have the data and could easily show them.

      We have included new microscopy experiments to demonstrate changes in mitochondrial (Suppl. Figure 4) and Golgi (Figures 6 and 7) morphology that accompany HSV-1 infection, and these new experiments are now included in the results (lines 335–310 and 342–355).

      • Line 269: Apposition of lipid droplets and mitochondria is not thoroughly described. This statement requires quantitation. Optimally, confocal imaging using Mitotracker and bodipy493/503 or superresolution imaging using specific antibodies may also contribute to strengthen the statement.

      We agree with the reviewer that we do not at this stage have adequate data to support this assertion and have therefore removed it from the manuscript.

      • It would be of great interest to document the budding events observed by cryo-SXT using higher resolution techniques and the kinetic resolution provided by the fluorescent infection fiducials. This would confirm the nature of the particles (using immunogold) and would demonstrate the usefulness of the cryo-SXT data. This by itself would justify the use of cryo-SXT to temporally locate events that are difficult to visualize otherwise (as stated by the authors).

      We agree with the reviewer that a correlative imaging strategy involving cryoSXT and fluorescence microscopy could aid in identifying features of infection, and have highlighted this interesting future direction in the discussion (lines 406–409). However, performing such analysis will be a substantial experimental commitment on its own and is outside the scope of our current manuscript.

      Minor comments:

      • Given that the software used for segmentation (Contour) is not published, a minimal comparative description between manual and semi-automated segmentation may be shown in the supplementary, to illustrate the robustness of the new method and the reliability of the measurements.

      We have now published a preprint (recently accepted in the journal Biological Imaging) that describes Contour in detail, which we have referenced in the updated manuscript: Nahas, K. L., Ferreira Fernandes, J., Crump, C., Graham, S. C. & Harkiolaki, M. (2021) Contour, a semi-automated segmentation and quantitation tool for cryo-soft-X-ray tomography. http://biorxiv.org/lookup/doi/10.1101/2021.12.03.470962

      • Lines 278-280: statistical test and p value are not shown.

      We have updated the text to include details of the statistical test and p value as requested (lines 326–330 of the updated manuscript).

      • After line 376: It would be interesting to mention that transient elongation of mitochondria is observed during dengue virus infection (https://doi.org/10.1016/j.chom.2016.07.008) and that this has also consequences for innate immunity against viruses.

      We thank the reviewer for this suggestion, which we have incorporated into the discussion (lines 522–523).

      • Given that HSV-1 is a BSL-2 level virus and that a recombinant version (GMO) has been used in the study, the authors should describe the biosafety measures taken to image non-inactivated infectious samples by cryo-SXT. The authors should state that a biosafety committee has reviewed these activities.

      We have added a Biosafety Measures section to the methods (lines 562–568) that details the biosafety measures used and their approval by the relevant committees.

      Reviewer #3 (Significance):

      This study constitutes an incremental technical advance in the study of HSV-1 infection. The broad context and the quasi-native structure of the cells enable documenting events that are difficult to observe in thin sections for TEM.

      This study is one of the few examples of the use of cryo-SXT for infected cell imaging. Other examples of the literature are cited as well as previous structural studies performed with higher resolution techniques.

      The manuscript may be suitable for HSV-1 specialists and cell biologists interested in using near-native samples for gross cellular imaging and documentation of low-resolution maps revealing alterations in large subcellular structures.

      We thank the reviewer for highlighting that ours is one of only a few comprehensive studies using cryoSXT, illustrating how it can be used to image cellular processes that are hard to ‘catch’ using techniques that require ultra-thin sectioning, and as such that it will be of interest to cell biologists studying infection processes in cellulo.

    1. Reviewer #3 (Public Review):

      The manuscript presents data that high expression of Protein Phosphatase 1 inhibitor in triple-negative breast cancer contributes to the poor outcome by downregulation of an important kinase, GSK3β. If substantiated, this would enhance our understanding of the pathophysiology of this important disease and might suggest new treatment options. Indeed, changes in PPP1R14C expression alter the behaviour of TNBC in cells and in mouse models, but the mechanistic links to GSK3 are not robustly established.

      Figs 1-2 identify PPP1R14C as upregulated in TNBC, with a significant correlation with worse outcome. Figs 3 and 4 show in vitro and in vivo effects of changes in PPP1R14C consistent with increased proliferation, migration and metastasis in vivo. These studies look very solid and appear to identify a role for this phosphatase regulator in TNBC.

      The weaker part of the manuscript is the mechanistic link to GSK3 regulation. Over-expression and knockdown of PPP1R14C have effects on GSK3β phosphorylation and downstream targets, but the direct connection is unclear and made challenging by a number of complex experimental issues.

      The big questions:

      1. Is GSK3 directly ubiquitylated by TRIM25 on K183? I don't think the data are strong here, for reasons elaborated on below.

      2. Is GSK3 really the important target of PPP1R14C/PP1 complex? The biological data are correlative and the direct experiment, does GSK3β (S9A/K183R) rescue PPP1R14C over-expression, would need to be done. But since I suspect K183R is kinase-dead, this may fail.

      3. The studies with C2 are confounded by the broad effects (including on PP2A) of treating cells with ceramide. Calling C2 a specific PP1 activator is I think unwarranted.

      Specific comments:

      Why is there a band in Fig 5D lane 2, the Flag-PPP1R14C lane, in the absence of Flag-PPP1R14C?

      Why in Fig 5E, F, G are there two bands in the pGSK3bS9 blot?

      The authors would need to show the total GSK3 coming down here too, and the total GSK3 present in Fig 5H as well.

      I have trouble understanding the result in Fig 5H. According to this, global PP1 phosphatase activity increases 3 fold when PPP1R14C is knocked down. First, there is no method noted for this assay. How do we know this is specific to PP1? Second, PPP1R14C is only one of many PP1 interactors. How can its knockdown change cellular PP1 activity 3-fold? I note the knockout mouse for PPP1R14C had a 15% increase in thalamus PP1 activity (see fig 3, https://doi.org/10.1016/j.neuroscience.2009.10.007). This experiment needs much more in the way of controls.

      Fig 6 evaluates the role of PPP1R14C in GSK3 protein stability. There is a fundamental weakness here - How do the authors know the ubiquitylated smear in the various Fig 6 assays is GSK3 versus a ubiquitylated protein that interacts with active GSK3? GSK3 phosphorylation directs many proteins (famously β-catenin and Myc) for ubiquitylation and degradation, so the co-IP of ubiquitylated proteins with GSK3 is to be expected if the IP stringency is not very very high. This is consistent with inactive pSER9 GSK3 not bringing down ubiquitylated proteins. An IP after for example boiling in SDS to break up large complexes would be needed to test if GSK3 itself, rather than associated substrates, is directly ubiquitylated.

      Is TRIM25 specific for GSK3? It's identified by mass spectrometry. However, when I plug TRIM25 into the CRAPome database (https://reprint-apms.org) I find it comes down in 136/716 (19%) of all MS IP studies, making it a very common contaminant in IP. Thus the bar is high to show this is specific. Here the interaction is validated with over-expression of various truncation mutants.

      Line 235: "K183 of GSK3β has been recognized as the ubiquitylation site". First, what is the reference for this statement? I found one paper (https://doi.org/10.1074/jbc.M116.771667) that claims this residue is important for FBXO17 K48 modification, not the K63 linkage associated with TRIM25. In the crystal structure of GSK3β, K183 appears to coordinate the phosphates of ATP, so the effect of the K183R mutation may be to make the kinase inactive, which would confound their results. So an important experiment is, does K183R retain wildtype kinase activity? Or is it inactive, and so acts like the phosphorylated S9 GSK3?

      The reference for ceramide as a PP1 activator is not a primary reference; it is to a paper in the Journal of Endodontics that uses it. It would be important to cite primary literature for this usage of C2. I note that many papers cite C2 ceramide as a PP2A activator. What is the rationale for using it as a specific PP1 activator?

    1. I’d want to learn a lot from Professor Zimmerman so that I may obtain as much information as possible and use it in reality. It’s not about the work.

      This is a "free write" that we did in class recently to think on how we want our experiences in this class to play out during the rest of the semester. As you can see from the first few phrases, I explained how I wanted to learn as much as possible to help me in the future. I made it very obvious that "it wasn't about the work" and that it goes far deeper than that.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We would like to thank the reviewers for their helpful and constructive comments.

      2. Point-by-point description of the revisions

      Reviewer #1

      This reviewer thought our findings would be of interest to a broad range of scientists from both the centrosome and mitosis fields, but noted some important aspects for improvements.

      Additional Experiments (we number these points for ease of discussion).

      1. Figure 3. The reviewer points out that because our analysis of Ana2-∆CC and Ana2-∆STAN mutant proteins was conducted in the presence of endogenous WT protein, we should be more cautious in our interpretation. We agree and apologise for overstating these findings. We have now rewritten the title and text of this section to be more cautious (p11, para.2).

      2. Figure 5A. The reviewer wonders whether the reduced recruitment of Sas-6 in the presence of Ana2(12A) is due to reduced binding, and they request we test this biochemically. This is our favoured interpretation, but we have been unable to test this biochemically for two reasons. First, although we have successfully purified several recombinant Sas-6 and/or Ana2 fragments (Cottee et al., eLife, 2015), the full-length proteins are poorly behaved (tending to precipitate, likely due to their inherent ability to self-oligomerise). Thus, we have been unable to reconstitute their interaction in vitro. Second, as we show here, the proteins are normally expressed in embryos at surprisingly low concentrations (~5-20 nM), and we can detect no interaction between them in coimmunoprecipitation experiments from embryo extracts (not shown). Indeed, this concentration is so low that Sas-6 does not even appear to form a homo-dimer in the embryo, even though Sas-6 clearly functions as a homo-dimer in centriole assembly (new Figure S4A). We now explain these points, and state that our favoured hypothesis that Ana2(12A) has reduced affinity for Sas-6 (or other core duplication proteins) remains to be tested (p22, para.2).

      3. The reviewer wonders if all 12 of the potential Cdk1 phosphorylation sites that we mutate in Ana2(12A) are important in vivo, and whether we have tested whether mutating fewer sites (e.g. the two sites [S284/T301] that we show are phosphorylated by Cdk1/Cyclin B in vitro) might be sufficient to recapitulate the Ana2(12A) phenotype. We have now tested this by mutating just the S284/T301 sites to Alanine [Ana2(2A)], but the results were not very informative (Reviewer Figure 1 [RF1]). Whereas Ana2(12A) is recruited to centrioles for a longer period and to higher levels than WT Ana2 (Figure 4A), Ana2(2A) is recruited to centrioles for a normal period but to lower levels (RF1A,B). The interpretation of this result is complicated because western blots show that Ana2(2A) is also present at lower levels than normal (RF1B). Thus, it is clear that Ana2(2A) does not recapitulate well the behaviour of Ana2(12A). We have decided not to present this data as it is difficult to interpret and it does not change any of our conclusions.

      4. Figure 6. The reviewer asks whether the 12A mutations impair the interaction with Plk4, influence Plk4’s kinase activity or the ability of Plk4 to phosphorylate Ana2. These are excellent questions but, for the same reasons described in point 2 above, we cannot address them biochemically as we cannot purify well-behaved recombinant full-length Ana2 or active Plk4 in vitro, and both proteins are present at such low levels in the embryo that we cannot detect any interaction between them in embryo extracts. We are working hard to reconstitute in vitro systems to probe these important points, but it may be some time before we are able to do so.

      5. Figure 7. The reviewer suggests that the 12D/E phosphomimetic substitutions introduce more negative charge than the putative phosphorylation of Ser/Thr residues and they ask if Ana2(2D/E) [stated as Ana2(3D/E)] is, like the Ana2(12D/E) mutant, not efficiently recruited to centrioles. This is a fair comment, but we have not analysed an Ana2(2D/E) mutant because, as described in point 3 above, the Ana2(2A) mutant did not recapitulate well the Ana2(12A) phenotype.

      Minor comments

        • Figure S1. The reviewer requests that we show that the mNG tag on its own is not recruited to centrioles. We do not show this (as it would create a lot of white space in this Figure), but now state that mNG and dNG do not detectably localise to centrioles (p7, para.1).
        • Figure S4C. We have included the missing error bars (now Figure S4B).
        • Figure S5A. The reviewer asks about the expression levels of the Ana2(12A) mutant, which are not shown in this Figure. They also state that the expression levels of the transgenes shown in Figure 5A are not similar. The expression level of Ana2(12A) is shown in Figure S9, as this data was analysed independently of the other mutant proteins shown in Figure S5. We agree that it was overly simplifying the situation to state that the expression levels of WT Ana2-mNG, eAna2(∆CC)-mNG and eAna2(∆STAN)-mNG were “similar” (Figure S5), and we now specifically mention the differences between them (p11, para.3).

      Reviewer #2

      This reviewer found this a rigorous study that advances our understanding of the regulation of centriole duplication, but raised some minor points.

      Minor Points

      The reviewer requests that we mention the literature describing how Ana2/STIL can influence the abundance and centriolar localisation of Plk4. We apologise for this omission, and have amended our description of this literature in the Introduction to include this point (p3, para.2).

      The reviewer notes that we interpret the ability of the Ana2(12A) mutant to keep incorporating into the centrioles for a longer period as being consistent with our idea that rising levels of Cdk activity during S-phase normally reduce the ability of WT Ana2 to bind to the centriole. They ask us to show how Cdk activity increases over this time-course, and to test whether dampening Cdk has the same effect on Ana2 recruitment (i.e. allows Ana2 to be recruited for a longer period). The time-course of Cdk activation in these embryos has been reported previously (Deneke et al., Dev. Cell, 2016; we present the relevant data from this paper in RF#2A [black line]). This reveals how Cdk activity rises throughout S-phase, which is crucial for our model. To assess the effect of dampening Cdk activity in these embryos we have now analysed the effect of halving the genetic dose of Cyclin B (RF#2B). This perturbation extends S-phase length, but has a complicated effect on the recruitment dynamics of Ana2 (RF#2B). As we would predict, Ana2 is recruited to centrioles for a longer period in these embryos, but it is also recruited more slowly (so it accumulates to lower levels). This is consistent with our hypothesis that Cdk1 activity might first stimulate and then ultimately inhibit the centriolar recruitment of Ana2. The interpretation of this experiment is not straightforward, however, as dampening Cdk1 activity alters Ana2 recruitment dynamics (and many other processes in the embryo) in complicated ways, so we have decided not to include it in the manuscript.

      The reviewer suggests that it would be valuable to show that all 12 of the potential Cdk1 phosphorylation sites in Ana2 can be phosphorylated by Cdk1 in vitro. We think this would not be particularly informative as our hypothesis does not rely on all 12 sites being phosphorylated to generate the Ana2(12A) phenotype. We simply mutate all 12 sites because we don’t know which, if any, are relevant. Thus, showing that some/all of the 12 sites can/cannot be phosphorylated in vitro does not test any hypothesis and would not change any of our conclusions. We now explain our thinking on this in more detail (p12, para.2).

      Other points

      Figure 3. We have corrected the amino-acid numbering mistakes.

      Figure 5Aii. We have changed the x-axis (time) labelling in this and all other Figures.

      Figure Legends. We have tried to eliminate the typos from the Figure legends, and apologise that these errors made it through to the final submitted version of our manuscript.

      Reviewer #3

      This reviewer thought our manuscript would be of great interest to not only the centrosome field but also to cell biologists more generally. Although they had no major concerns, they made a number of suggestions for improvements.

      1. As the reviewer suggests, we now explicitly state that although the Ana2(12A) mutant appears to be largely functional, the overall conformation of the protein may be altered, changing its function in ways we do not appreciate (p21, para.2).

      2. The reviewer suggests we include a multiple sequence alignment of Ana2/STIL proteins to provide more context about the distribution and conservation of the 12 S/T-P sites mutated in Ana2(12A). This is an excellent idea, and we now include this in a new Figure S6, where we also provide more information about which of these sites have been shown to be phosphorylated in embryo or S2-cell extracts.

      3. The reviewer is confused as to why the 12A and 12D/E mutants rescue the ana2-/- mutant flies so well, which suggests that the mechanism we propose here cannot be essential for centriole duplication. We understand this confusion and we now make this point more clearly and explain why we think this occurs in more detail (e.g. p22, para.1). We propose that Cdk normally phosphorylates Ana2 to inhibit its ability to promote centriole duplication, but this phosphorylation does not entirely block this function. So, if all other elements of the system are functional, Ana2(12A) is recruited to centrioles for longer than normal, but this does not dramatically perturb centriole duplication because the many other factors that regulate centriole duplication (such as the pulse of Plk4 recruitment to centrioles [Aydogan et al., Cell, 2020]) still occur normally and are sufficient to ensure that centrioles still duplicate normally. When Ana2 phosphorylation is mimicked [Ana2(12D/E)], the ability of Ana2 to promote centriole duplication is perturbed (but not abolished). This perturbation is lethal in the early embryo—where the centrioles must duplicate in just a few minutes to keep pace with the rapid nuclear divisions. In somatic cells S-phase is much longer, so these cells can still duplicate their centrioles (as we observe) even though Ana2(12D/E) does not function efficiently. As we now explain, this phenotype (being lethal in the early embryo, but not in somatic cells) is a common feature of mutations that influence the efficiency of centriole and centrosome assembly (p17, para.2).

      4A. The reviewer asks us to comment in more detail on why centrioles do not seem to be elongated in the Ana2(12A) mutant wing disc cells (now Figure S8C), even though we show that Ana2(12A) (Figure 4A), and also Sas-6 (Figure 5), are recruited to centrioles for an abnormally long period. This is an excellent question and, although we do not know the answer, we now discuss this interesting point in more detail (p16, para.1). We think this is likely due to the “homeostatic” nature of centriole growth: in our hands, almost any perturbation that makes centrioles grow for a longer/shorter period, also makes them grow more slowly/quickly, so that they tend to grow to a similar size (Aydogan et al., JCB, 2018; Cell, 2020). This is fascinating, but poorly understood. When we perturb the system by expressing Ana2(12A), both Ana2(12A) and Sas-6 incorporate into centrioles for a longer period, as we predict (Figure 4A and 5A). Unexpectedly, however, Sas-6 is also recruited to centrioles much more slowly. Thus, as so often happens, when we perturb the system so the centrioles grow for a longer time, the centrioles “adapt” by growing more slowly. We do not currently understand why this occurs (although we speculate that Ana2 may also be regulated by Cdk/Cyclins to help recruit Sas-6 to centrioles in early S-phase). In the embryo, where S-phase is very short, this homeostatic compensation is not perfect, and the centrioles appear to actually be shorter than normal. In somatic wing-disc cells, where S-phase is much longer, we suspect that there is more scope for homeostatic compensation and so the centrioles grow to the correct size.

      4B. In this point (also labelled [4] by the reviewer, so we have retained this numbering but labelled the points A and B) the reviewer asks why levels of Ana2(12A) eventually decline at centrioles once the embryos actually enter mitosis. The reviewer notes our rheostat theory, but suggests a discussion of other mechanisms might be interesting. This is a good point, and we agree that the observation that Ana2(12A) levels ultimately still decline at centrioles during mitosis is likely to be important in explaining why centriole duplication is not more dramatically perturbed by Ana2(12A). We now expand our discussion of this point, highlighting that other mechanisms must help to ensure that Ana2 is not recruited to centrioles during M-phase, and discussing the possibility that the receptors that recruit Ana2 to centrioles are themselves inactivated during mitosis by high levels of Cdk activity (p15, para.1). In such a model, the rapid drop in WT Ana2 centriolar levels is due to a combination of switching off Ana2’s ability to bind to centrioles (as we propose here) and switching off the ability of the centrioles to recruit Ana2. For Ana2(12A), only the latter mechanism would operate, so Ana2(12A) levels would start to drop later in the cycle (as the inflexion point at which Ana2 recruitment and loss balances out would be moved to later in the cycle), and these levels would drop more slowly—as we observe.

      • The reviewer is confused as to how the Ana2(12D/E) mutant can rescue the mutant phenotype when it is recruited to centrioles so poorly. Ana2(12D/E) is indeed recruited very poorly to centrioles in the experiment shown in Figure 7. However, this experiment had to be conducted in the presence of WT untagged Ana2—as the embryos do not develop in the presence of only Ana2(12D/E). We would predict that WT Ana2 would bind more efficiently to centrioles than Ana2(12D/E) (which appears to behave as if it has been phosphorylated by Cdk/Cyclins, and so cannot be recruited to centrioles efficiently). Thus, in the experiment we show in Figure 7, the Ana2(12D/E) protein is probably being “outcompeted” for binding to the centriole by the WT protein. In somatic cells expressing only Ana2(12D/E), presumably sufficient mutant protein can be recruited to centrioles to support normal centriole duplication (as it no longer has to compete with the WT protein). We now explain our thinking on this point (p18, para.1).

      • The reviewer wonders whether Ana2(12D/E) may be unable to homo-oligomerize, and this may explain why the protein is not recruited to centrioles efficiently even in the presence of WT protein. This is indeed a possibility, but we think it unlikely as it is widely believed that Ana2/STIL proteins must multimerize to be functional (Arquint et al., eLife, 2015; Cottee et al., eLife, 2015; Rogala et al., eLife, 2015; David et al., Sci. Rep., 2016). As Ana2(12D/E) strongly restores centriole duplication in ana2-/- mutant somatic cells, it seems unlikely that it cannot multimerize. Nevertheless, we now specifically highlight that the 12D/E (and 12A) mutations might alter the ability of Ana2 to multimerise (p21, para.2).

      We thank the reviewers again for their thoughtful and constructive comments. We hope they will agree that the revised manuscript is now improved and would be appropriate for publication in The Journal of Cell Biology.

      With best wishes,

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Point-by-point description of the revisions

      Black: Comments from reviewers

      Green: Answers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Yamamoto and colleagues have investigated the interplay between microtubules (MTs) and actin in positioning the MTOC at "the cell centre". They have developed a novel experimental setup akin to a synthetic cell to study this question. Essentially a cell-sized (15 µm) microwell that is coated in lipid and then tubulin/actin added and the positioning of a MTOC proxy is studied by microscopy. This is a well executed study. These complicated biochemical reconstitutions are the hallmark of Blanchoin and Théry's group, but even so, it's clear that the exact conditions (e.g. tubulin concentration) are fiddly and critical for these experiments to work. The data are clear, well analysed and presented. In brief, the conditions for centring a cytoskeletal network and decentring/polarising it are recapitulated. This is a short, straightforward paper and I found the results to be clear and the authors' interpretation to be well supported by the data.

      Two questions occurred to me as I read the paper:

      1. While the setup is reminiscent of a cell, I suspect that the edge/wall of the microwell is much stiffer than the plasma membrane. So a MT that encounters the wall may behave differently in the cell. This would affect the non-actin conditions but possibly also the conditions where an actin mesh is present. Maybe my intuition is not even correct, but I think this issue should be discussed in the paper as a potential limitation of the system.

      Author response: We thank the reviewer for this wise comment. Indeed, the deformation of the container may impact the organization of the MT network, the force balance and the final position of the MTOC. We commented on this limitation in the revised discussion (page 10 line 31). However, it should be noted that in the presence of a cortical actin network, MTs are much less capable of deforming the cell than in a vesicle or in a cell treated with actin drugs, so our conditions with a cortical actin network are physiologically relevant although the container cannot be deformed.

      2. The graphs in 3C and 4G (lesser extent Fig 1) show nicely that the aMTOC position has apparently rested at a steady state. Some representative trajectories are shown in some figures, but not mentioned much in the text. How does the pathlength (cumulative distance) over time compare to the "distance to centre" measurement? Is there more or less travel under the different conditions? From the supplementary videos it looks like there is a difference. An apparent resting position may still represent significant motion, e.g. circling the centre. What does an analysis of tracklength tell us, if anything?

      Author response: We appreciate the reviewer’s comment and followed his/her advice. We measured the pathlength (cumulative distance moved) based on the data shown in Figure 3C and 4G. The analysis confirmed that the MTOC was static in the presence of a bulk actin network (shown in the new Supplementary Figure 6B). Interestingly, it also showed that the final position adopted by the MTOC in conditions where it could move more freely was also static, as revealed by the saturation of the pathlength after 1 hour. These analyses are shown in the new Supplementary Figure 6B for the centering in the absence of cortical actin, in Supplementary Figure 7E for the non-centering with long microtubules, and in Supplementary Figure 7E for the centering with long MTs and a cortical actin network.
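
      The distinction the reviewer draws between pathlength and "distance to centre" can be illustrated with a minimal sketch. It assumes a 2D track given as a list of (x, y) positions in arbitrary units; the function names and the circular example track are hypothetical and are not taken from the authors' analysis.

```python
import math

def path_length(track):
    """Cumulative distance travelled at each time point of a 2D track."""
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        # Add the length of each step to the running total.
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    return cumulative

def distance_to_centre(track, centre=(0.0, 0.0)):
    """Distance from the (assumed) well centre at each time point."""
    cx, cy = centre
    return [math.hypot(x - cx, y - cy) for x, y in track]

# Hypothetical track: an object circling the centre once at radius 1.
track = [
    (math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
    for k in range(101)
]

distances = distance_to_centre(track)   # stays at about 1 throughout
travelled = path_length(track)          # keeps growing, ending near 2*pi
print(min(distances), max(distances), travelled[-1])
```

      In this toy case the distance to centre is flat while the pathlength keeps increasing, which is the situation the reviewer describes; a pathlength that saturates, as the authors report, would instead indicate that the object has stopped moving.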

      Very minor clerical point: - the first two sentences of the abstract could be clearer. "The position of centrosome, the main microtubule-organizing center (MTOC), is instrumental in the definition of cell polarity. It is defined by the balance of tension and pressure forces in the network of microtubules (MTs)." In the second sentence, "it" and "defined" are confusing. Are you talking about the position of the centrosome or cell polarity?

      Author response: We thank the reviewer for this comment. As the reviewer suggested, this was a confusing description. Accordingly, we corrected the sentence in the abstract to:

      The orientation of cell polarity depends on the position of the centrosome, the main microtubule-organizing center (MTOC). It is determined by the balance of tension and pressure forces in the network of microtubules (MTs).

      Reviewer #1 (Significance (Required)):

      As I see it, the main advance here is the novel experimental setup, which has real potential in the field. Existing methods such as MTs inside lipid bubbles are limited, whereas the microwell method, combined with fabrication methods, allows the shape of the "synthetic cell" to be carefully modulated. Tying the results together with cytosim simulations is also a powerful combination. There is a lot of interest in bottom-up reconstitution of cell biological phenomena, especially those that underlie specialised cell processes, e.g. polarity. My expertise: microtubules in a cellular context with limited experience of MT reconstitution assays.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript describes the use of an elegant in vitro reconstitution system to study the effect of variations in the organization of the actin network on the positioning of a microtubule organizing center (MTOC) within the cell. By using a reconstituted system the authors are able to specifically study the contribution of the "pushing" forces generated by microtubule (MT) growth, without the confounding influence of other factors, like pulling forces from MT motors. The authors find that bulk actin networks at sufficient density can impair MTOC displacement, likely a result of the large viscous drag of the MTOC. Next they show that MTOC centering is more resilient to changes in microtubule length. Finally they show that an asymmetric actin network can cause asymmetric positioning of the MTOC.

      Major comments:

      1) The model the authors put forth is that the growth of long MTs leads to decentering as a result of the MTs slipping along the well edge. The presence of a cortical actin mesh prevents this slipping. Their argument would be strengthened with an analysis of the MT behaviors in the various conditions. For example when discussing MTOC in wells without actin...

      "As they grew, they first ensured a proper centering but after an hour, MT elongation and slippage along microwell edges broke the network symmetry and MTs pushed aMTOC away from the center (Figure 1I, J and Supplementary Movie 2)"

      In this movie I don't see evidence of MTs hitting the cortex and sliding on the "short" side of the well relative to the MTOC. An analysis of the behavior of MTs in various circumstances would help link the behavior of MTs to the movement of the MTOC for all of their conditions. What fraction of MTs hit the cortex and remain relatively motionless, what fraction slide, what fraction catastrophe, what fraction turn and follow the curve of the well? And how does this behavior change for microtubules that end up on the short side vs. the long side of the MTOC? This type of analysis would solidify their model for how centering/decentering occurs in the various conditions they test.

      Author response: This is a fair criticism. The possibility of performing fine analysis of MT dynamics is technically limited by the fluorescent background due to free tubulin dimers. This is why classical in vitro assays are monitored by TIRF microscopy, which is not possible here since MTOCs move in 3D in the microwells. In addition, working with higher laser power to increase the signal-to-noise ratio causes severe photodamage to MTs. Nevertheless, we could visualize MT dynamics and displacements near the edge of the microwells and describe their behavior more precisely than in the previous version of our manuscript. New images and tracking of MT behavior are now reported in the new Figure 4E, 4F and 5G, as well as the new Supplementary Figure 4C, 4D, 7B, and 7C. We also replaced Supplementary Movie 2 and Figure 1I in order to show more clearly MTs hitting and slipping along the well boundary. In addition, we also characterized the pivoting of MTs around the MTOC and near the edge of the microwell in order to better characterize the effect of cortical actin. This is now shown in the new Figure 4G and 4H as well as in the new Supplementary Figure 7C-D. We found that the changes in MT orientation and position, at the centrosome and at the contact with the microwell, were clearly prevented by the presence of cortical actin.

      2) The authors use simulations to support their in vitro findings. However, their simulations have many more microtubules emanating from the MTOC than their experiment (Looks like about 50 in the cytosim and they state they are aiming for 15-20 in the aMTOCs). Do the simulations still reproduce the behavior of the in vitro system with a similar number of MTs?

      Author response: This is another fair criticism. We addressed this point by performing simulations with 10 to 30 microtubules (the number of MTs is variable because of MT dynamics), which is closer to the number of MTs that we obtained in our experimental conditions. Results were consistent with previous simulations with a higher number of MTs and are now shown in the new Supplementary Figures 6E-F, 7G and 8I.

      3) When the actin networks are asymmetric, the authors see decentering of the MTOC towards the side with less actin. However there is still actin on the side where the MTOC will move to and in some of their images it looks pretty thick. Is the actin on that side not dense enough to prevent MT sliding along the "cortex"? If so, can they generate less dense, but uniform actin networks on the "cortex", where MTs can slide? Again descriptions of MT behaviors would be useful in understanding what is happening.

      Author response: We thank the reviewer for asking this important question. We followed the reviewer’s advice and generated a homogeneous and less dense cortex by working at a lower concentration of actin (0.5 mM). In such conditions, we could not see the centering effect that was observed with a dense cortex. These new data are now shown in the new Supplementary Figure 7I. This effect was also tested with numerical simulations (new Supplementary Figure 7J), which were consistent with the key role played by actin network density for MT network positioning by cortical friction.

      Minor Comments:

      1) Title - the current title implies that actin is balancing the forces generated by the MTs. I'm not sure this is a good description of what is shown in the paper.

      Author response: We thank the reviewer for pointing out this issue. We revised the title to:

      Reconstitution of centrosome positioning by the production of pushing forces in microtubules growing against the actin network.

      2) The discussion would benefit from more explanation about how the results of this paper relate to the classic examples of MTOC positioning they cite. How do they envision the actin and MTs interacting in these systems, and what new insight have we gained from the experiments in this manuscript?

      Author response: This is a good suggestion. We added some comments in our discussion about the actin network asymmetry in several classical examples of cell polarization and explained how our observations suggest some new interpretation on the role of this asymmetry in the reorganization of forces in the MT network and on the consequential peripheral positioning of the MTOC.

      Reviewer #2 (Significance (Required)):

      Overall, this work is a significant advance in our understanding of the potential mechanisms of MTOC movement in cells via pushing by MT growth. The experimental system they have developed is a powerful advance, allowing meaningful MTOC reconstitution experiments to be performed in chambers of approximately cellular size. This is an important contribution to understanding the interaction between microtubule pushing and the actin cortex.

      Reviewer expertise: Cell biology of MTOC assembly and positioning. I do not have the expertise to assess the parameters used to generate their cytosim models.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Review of "The architecture of the actin network can balance the pushing forces produced by growing microtubules" by Yamamoto et al.

      The means by which cells maintain their characteristic cytoskeletal architectures is not well understood. This is in part because there is considerable variation in such architectures with, for example, fibroblasts, neurons, and epithelial cells. It is also in part because the microtubule, actin and intermediate filaments engage in a wide range of mechanical and signaling crosstalk mediated by a wealth of proteins and signaling networks, which further complicates the picture.

      In the current study, Yamamoto et al. take the welcome step of developing a simplified system for assessing the mutual contributions of microtubules and F-actin for general cytoskeletal organization in vitro (specifically, in lipid-lined microwells). This allows them to define basic principles of microtubule-F-actin interactions in the absence of the various confounding factors alluded to above. Using their model, they show that artificial MTOCs (aMTOCs) alone will center but as a complex function of microtubule length (controlled by varying tubulin concentrations). That is, the aMTOCs are randomly positioned with short microtubules, stably centered with intermediate-length microtubules, and randomly oriented with very long microtubules (following symmetry breaking).

      They then assess the contributions of F-actin to the centering process. In low concentrations of "bulk" F-actin (ie F-actin distributed throughout the droplet) there is no effect on centering whereas at higher concentrations of bulk F-actin, centering is impaired as is the translocation of the aMTOCs. In the presence of uniform peripheral F-actin, in contrast, aMTOC centering is enhanced, and rendered less sensitive to variations in microtubule length. Finally, when the authors contrive a situation in which the peripheral F-actin is non-uniform (by lowering the concentration of actin and adding alpha-actinin, which creates a peripheral ring of F-actin with (I think) relatively less F-actin within the ring), the aMTOCs position themselves within the ring.

      Finally, the authors extend their results with simulations that indicate that the various behaviors can be explained by a combination of friction, pushing and slippage.

      This study is fascinating and will be of general interest to anyone who seeks to understand the contributions of mechanical forces to cytoskeletal organization in a minimal system. I have only minor concerns; these are listed below.

      1. Some of the terminology was a little confusing. The authors introduce the term "inner zone" (pg. 8) without defining it. From the context, it seems like they are talking about the approximate center of the ring of peripheral F-actin. If so, why not just do away with the term "inner zone" and refer to the ring center? If it isn't the ring center, then more explanation is needed as to what the inner zone actually is.

      Author response: We apologize for this confusion and appreciate the reviewer’s comment. We earlier coined the term “actin inner zone” to define the central cytoplasmic region in cells that is devoid of actin filaments (Jimenez et al., Current Biology, 2021). Because it was a confusing point, we clarified this in the revised version of the manuscript (Page 8, Line 20). What we would like to call the “inner zone” is the region inside of the actin cortex. The definition of this zone and of its geometrical reference points is also pictured more precisely in the new Supplementary Figure 9B.

      2. It is not clear from the text or the images if the region within the F-actin ring has less F-actin, more F-actin, or the same amount of F-actin as the region outside the F-actin ring. This point should be clarified, as it makes a big difference in the interpretation of the findings.

      Author response: We apologize for this lack of clarity. In the revised version of our manuscript, we plotted a line scan intensity profile of the actin fluorescence (new Supplementary Figure 9B). It showed that the region within the actin inner zone contained much less actin than in the cortex. This is consistent with our interpretation of a region-selective pattern of friction acting on microtubules.

      3. Ideally, the authors would include manipulations in which the high concentration of peripheral F-actin is combined with alpha-actinin because, as currently presented, the authors are drawing conclusions from changing two variables at once (ie going from a high concentration of peripheral F-actin to a lower concentration with added alpha-actinin). Thus, the authors cannot cleanly distinguish between effects that arise from F-actin asymmetry versus the presence of an F-actin crosslinker. Since the crosslinking is likely to change the mechanical properties of the peripheral F-actin network, this point should at least be addressed in the text, if not by experiments.

      Author response: We are not sure we fully understand the reviewer’s point. We don’t understand how the crosslinking of a symmetric actin network could break the symmetry of the MT network and force its off-centering. The opposite is clearer to us. A homogeneous and loose actin network can allow MT gliding and MTOC off-centering (as in Supplementary Figure 7J). The mechanical reinforcement of this network by crosslinkers could indeed resist gliding. But the consequence of this resistance would be similar to the consequence of a dense network: a more robust centering (as in Figure 4). So we don’t understand how the crosslinking by alpha-actinin, rather than the asymmetry of the actin network, could be at the origin of the off-centering we observed. In addition, the off-centering of the MTOC was systematically aligned with the asymmetry of the actin network, so both parameters were clearly connected.

      Reviewer #3 (Significance (Required)):

      This is an elegant, well-designed study that provides a clear description of how basic mechanical forces can contribute to cytoskeletal organization in a simplified model system.

    1. Author Responses

      Reviewer #1 (Public Review):

      This study uses a nice longitudinal dataset and performs relatively thorough methodological comparisons. I also appreciate the systematic literature review presented in the introduction. The discussion of confound control is interesting and it is great that a leave-one-site-out test was included. However, the prediction accuracy drops in these important leave-one-site-out analyses, which should be assessed and discussed further.

      Furthermore, I think there is a missed opportunity to test longitudinal prediction using only pre-onset individuals to gain clearer causal insights. Please find specific comments below, approximately in order of importance.

      We thank the reviewers for their positive remarks and for providing important suggestions to improve the analysis. Please see our detailed comments below.

      1) The leave-one-site-out results fail to achieve significant prediction accuracy for any of the phenotypes. This reveals a lack of cross-site generalizability of all results in this work. The authors discuss that this variance could be caused by distributed sample sizes across sites resulting in uneven folds or site-specific variance. It should be possible to test these hypotheses by looking at the relative performance across CV folds. The site-specific variance hypothesis may be likely because for the other results confounds are addressed using oversampling (i.e., sampling with replacement) which creates a large sample with lower variance than a random sample of the same size. This is an important null finding that may have important implications, so I do not think that it is cause for rejection. However, it is a key element of this paper and I think it should be assessed further and discussed more widely in the abstract and conclusion.

      We thank the reviewer for raising this point and providing specific suggestions. As mentioned by the reviewer, the leave-one-site-out results showed high variance across sites, that is, across cross-validation (CV) folds. Therefore, as suggested by the reviewer, we further investigated the source of this variance by observing how the model accuracies correlate with each site and its sample size, ratio of AAM-to-controls, and sex distribution. We ranked the sites from low to high accuracy and observed different performance metrics such as sensitivity and specificity:

      As shown, the models performed close-to-chance for sites ‘Dublin’, ‘Paris’ and ‘Berlin’ (<60% mean balanced accuracy) in the leave-one-site-out experiment, across all time-points and metrics. Notably, the order of the performance at each site does not correspond to the sample sizes (please refer to the ‘counts’ column in the above figure). It also does not correspond to the ratio of AAM-to-controls, or to the sex distribution.

      To further investigate this, we performed an additional leave-one-site-out experiment with all 8 sites. Here, we repeated the ML (Machine Learning) exploration by using the entire data, including the data from the Nottingham site that was kept aside as the holdout. Since there are 8 sites now, we used an 8-fold cross validation and observed how the model accuracy varied across each site:

      The results were comparable to the original leave-one-site-out experiment. Along with ‘Dublin’ and ‘Berlin’, the models additionally performed poorly on the ‘Nottingham’ site. Results on ‘London’ and ‘Paris’ also fell below 60% mean balanced accuracy.

      Finally, we compared the above two results to the main experiment from the paper where the test samples were randomly sampled across all sites. The performance on test subjects from each site was compared:

      As seen, the models struggled with subjects from ‘Dublin’ followed by ‘Nottingham’, ‘London’ and ‘Berlin’ respectively, and performed well on subjects from ‘Dresden’, ‘Mannheim’, ‘Hamburg’ and ‘Paris’.

      Across all the three results discussed above, the models consistently struggle to generalize to subjects particularly from ‘Dublin’ and ‘Nottingham’. As already pointed out by the reviewer, the variance in the main experiment in the manuscript is lower because of the random sampling of the test set across all sites. Since these results have important implications, we have included them in the manuscript and also provided these figures in the Appendix.
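
      As a rough illustration of the per-site evaluation discussed in this response, the sketch below holds out one site at a time and reports balanced accuracy (the mean of sensitivity and specificity) for each held-out site. It is not the authors' pipeline: the threshold "classifier", the single feature, the toy data and the site labels A-C are all assumptions made up for the example, chosen so that one site follows a different pattern from the others.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in the test fold")
    return 0.5 * (tp / pos + tn / neg)


def fit_threshold(xs, ys):
    """Toy stand-in for a model: threshold halfway between the class means.

    Assumes class 1 tends to have the larger feature value.
    """
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return 0.5 * (mean0 + mean1)


def leave_one_site_out(xs, ys, sites):
    """Return {site: balanced accuracy}, holding each site out once."""
    scores = {}
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        thr = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        preds = [1 if xs[i] > thr else 0 for i in test]
        scores[held_out] = balanced_accuracy([ys[i] for i in test], preds)
    return scores


# Hypothetical data: sites A and B share one pattern, site C reverses it.
xs = [0.1, 0.9, 0.2, 0.8, 0.6, 0.4]
ys = [0, 1, 0, 1, 0, 1]
sites = ["A", "A", "B", "B", "C", "C"]
scores = leave_one_site_out(xs, ys, sites)
print(scores)
```

      In this toy setting the model generalizes to sites A and B but fails on site C even though all three sites have the same sample size and class ratio, which is the kind of site-specific variance the response describes.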

      2) The authors state that "83.3% of subjects reported having no or just one binge drinking experience until age 14". To gain clearer insights into the causality, I recommend repeating the MRIage14 → AAMage22 prediction using only these 83% of subjects.

      We thank the reviewer for this valuable comment. As suggested by the reviewer, we now repeated the MRIage14 → AAMage22 analysis by including (a) only the subjects who had no binge drinking experiences (n=477) by age 14 and (b) subjects who had one or fewer binge drinking experiences (n=565). The results are shown below. The balanced accuracies on the holdout set were 72.9 +/- 2% and 71.1 +/- 2.3% respectively, which are comparable to the main result of 73.1 +/- 2%.

      These results provide further evidence that a certain form of cerebral predisposition might be preceding the observed alcohol misuse behavior in the IMAGEN dataset. We now discuss these results in the Results section and the 2nd paragraph of the Discussion.

      3) The feature importance results for brain regions are quite inconsistent across time points. As such, the study doesn't really address one of the main challenges with previous work discussed in the introduction: "brain regions reported were not consistent between these studies either and do not tell a coherent story". This would be worth looking into further, for example by looking at other indices of feature importance such as permutation-based measures and/or investigating the stability of feature importance across bootstrapped CV folds.

      The feature importance results shown in Figure 9 are intended to be illustrative and to show where the most informative structural features are mainly clustered in the brain, for each time point. We would like to acknowledge that this figure could be a bit confusing. Hence, we have now provided an exhaustive table in the Appendix, consisting of all important features and their respective SHAP scores obtained across the seven repeated runs. In addition, we address the inconsistencies across time points in the 3rd paragraph of the Discussion chapter and contrast our findings with previous studies. These claims can now be verified from the table of features provided in the Appendix.

      Addressing the reviewer's suggestions, we would like to point out that SHAP is itself a type of permutation-based measure of feature importance. Since it derives from the theoretically sound Shapley values, is model agnostic, and has already been applied in biomedical applications, we believe that running another permutation-based analysis would not be beneficial. We have also investigated the stability of our feature importance scores by repeating the SHAP estimation with different random permutations. This process is explained in the Methods section Model Interpretation.

      Additionally now, the SHAP scores across the seven repetitions are also provided in the Appendix table 6 for verification.
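
      For readers unfamiliar with the permutation-based measures the reviewer mentions, the sketch below shows the generic idea: shuffle one feature at a time, record the drop in accuracy, and repeat with different random permutations to see how stable the score is. This is plain permutation importance, not SHAP and not the authors' procedure; the `model` function, the two-feature toy data and the repeat count are assumptions for illustration only.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """For each feature, return (mean, spread) of the accuracy drop
    observed when that feature's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    result = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        mean = sum(drops) / n_repeats
        spread = (sum((d - mean) ** 2 for d in drops) / n_repeats) ** 0.5
        result.append((mean, spread))
    return result


def model(row):
    """Hypothetical fitted model that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 2.0]]
labels = [0, 1, 0, 1]
importance = permutation_importance(model, rows, labels)
print(importance)
```

      A feature the model ignores (feature 1 here) gets zero importance with zero spread, while the spread reported for an informative feature gives a simple handle on how much the score depends on the particular random permutation.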

    1. It's a little hard to tell if "IndieWeb" is in practice just its own community of people who like to talk about #indieweb things. (That's what gets surfaced when I try to learn more, but of course it is.) I like the idea more than most "fediverse" incarnations, though.

      The Logos, Ethos, and Pathos of IndieWeb

      Where is the IndieWeb?

      Logos

      One might consider the IndieWeb's indieweb.org wiki-based website and chat the "logos" of IndieWeb. There is a small group of about a hundred active to very active participants who hang out in these spaces on a regular basis, but there are also many who dip in and out over time as they tinker and build, ask advice, get some help, or just to show up and say hello. Because there are concrete places online as well as off (events) for them to congregate, meet, and interact, it's the most obvious place to find these ideas and people.

      Ethos

      Beyond this there is an even larger group of people online who represent the "ethos" of IndieWeb. Some may have heard the word before, some have a passing knowledge of it, but an even larger number have not. They all act and operate in a way that either seemed natural to them because they grew up in the period of the open web, or because they never felt accepted by the thundering herds in the corporate social enclosures. Many are not necessarily easily found or discovered because they're not surfaced or highlighted by the sinister algorithms of corporate social media, but through slow and steady work (much like the in person social space) they find each other and interact in various traditional web spaces. Many of them can be found in spaces like Tilde Club or NeoCities, or through movements like A Domain of One's Own, some can be found through a variety of webrings, via blogrolls, or just following someone's website and slowly seeing the community of people who stop by and comment. Yes, these discovery methods may involve a little more work, but shouldn't healthy human interactions require work and care?

      Pathos

      The final group of people, and likely the largest within the community, are those that represent the "pathos" of IndieWeb. The word IndieWeb has not registered with any of them and they suffer with grief in the long shadow of corporate social media wishing they had better user interfaces, better features, different interaction, more meaningful interaction, healthier and kinder interaction. Some may have even been so steeped in big social for so long that they don't realize that there is another way of being or knowing.

      These people may be found searching for the IndieWeb promised land on silo platforms like Blogger, Tumblr or Medium where they have the shadow on the wall of a home on the web where they can place their identities and thoughts. Here they're a bit more safe from the acceleration of algorithmically fed content and ills of mainstream social. Others are trapped within massive content farms run by multi-billion dollar extractive companies who quietly but steadily exploit their interactions with friends and family.

      The Conversation

      All three of these parts of the IndieWeb, the logos, the ethos, and the pathos comprise the community of humanity. They are the sum of the real conversation online.

      Venture capital backed corporate social media has cleverly inserted themselves between us and our interactions with each other. They privilege some voices not only over others, but often at the expense of others and only to their benefit. We have been developing a new vocabulary for these actions with phrases like "surveillance capitalism", "data mining", and analogizing human data as the new "oil" of the 21st century. The IndieWeb is attempting to remove these barriers, many of them complicated, but not insurmountable, technical ones, so that we can have a healthier set of direct interactions with one another that more closely mirrors our in person interactions. By having choice and the ability to move between a larger number of service providers there is an increasing pressure to provide service rather than the growing levels of continued abuse and monopoly we've become accustomed to.

      None of these subdivisions---logos, ethos, or pathos---is better or worse than the others, they just are. There is no hierarchy between or among them just as there should be no hierarchy between fellow humans. But by existing, I think one could argue that through their humanity they are all slowly, but surely making the web a healthier, happier, fun, and more humanized and humanizing place to be.

    1. Author Response

      Reviewer #2 (Public Review):

      Schumacher and Carlson present volumetric data on the brain and main brain areas in several lineages of fish that have independently evolved electroreceptors and electrogenesis. The main question is whether the evolution of this novel sensory system has led to similar changes in the brain. Previously, the same authors (Sukhum et al 2018) have shown an increase in the relative size of the cerebellum and hindbrain in mormyrid fishes, one group of electrogenic fish. Here they have collected data on South American weakly electric fishes (Gymnotiformes) and weakly electric catfishes (Synodontis spp.) as well as some outgroups (22 additional species). I think the question is very interesting, and the inclusion of electrogenic catfishes is particularly interesting as they are a largely understudied group. I do have some concerns about how the data has been analysed and presented.

      1) A first conclusion is that gymnotiform and siluriform brains are not as enlarged as mormyrid brains, and that this suggests that an increase in brain size is not directly tied to the evolution of an electrosensory system. I think the story here is more complicated than that. From the data presented, it seems that mormyrids have a different body size-brain volume slope than other groups, but it is unclear if this was tested in the PGLS model for brain vs body size, although mormyrids show different slopes than other groups in the scaling of the cerebellum to brain volume. This difference in slope for body-brain allometry has been confirmed by a manuscript published after the submission of this manuscript (Tsuboi 2021 BBE) with a large data set (~ 850 species, 21 of Osteoglossiformes). This steep slope close to one means that mormyrids with large body size have very large relative brain sizes but smaller mormyrids don't (this can be seen in figure 2). I think this needs to be addressed more carefully, first by testing in the PGLS for body size vs brain size whether mormyrids have a different slope, and then in the discussion. Why have mormyrids, but not other electrogenic fish, evolved such a unique brain scaling?

      We thank the reviewer for this suggestion. We combined our data with the data from Tsuboi 2021 and assessed how the brain-body allometry has changed across 870 actinopterygians. We identified 3 shifts in lineages with at least 3 descendants and 7 shifts total that were supported by both the OUrjMCMC and PGLS analyses. One of these identified shifts was along the branch leading to osteoglossiforms, with a secondary decrease in one lineage within mormyroids. A second identified shift was along the branch leading to Synodontis multipunctatus. However, we find no shifts along the branches leading to other electrosensory lineages. This suggests that although mormyroids do have a different brain-body allometry compared with other electrogenic fishes, this shift predates the origin of mormyrids as it is found in all osteoglossiforms and thus is unlikely to be related to the evolution of electrosensory systems. These changes are reflected in lines 778-826, 110-153, 513-528, 530-538, 569-575 and figure 3 and associated source data files. See also our detailed response to essential revision 1.

      2) I think the outgroup species used are too few and spread among several different lineages of teleosts. I think this unfortunately tempers some of the conclusions. In particular, it seems to leave unanswered the question of whether other electrogenic fish have brains larger than non-electrosensory or non-electrogenic fish. A large data set of brain and body size data for teleosts has been published (Tsuboi et al 2018; 2021). Adding this data should allow testing for changes in body-brain size relationships in each electrogenic clade. The additional data should allow an accurate test for differences in relative brain size between and within electrogenic clades and make it possible to test when exactly in the phylogeny of teleosts grade shifts in the body-brain allometry have happened.

      We thank the reviewer for this suggestion. We explicitly addressed this question by fixing shifts along the branches that evolved our three electrosensory phenotypes: evolution of electrogenesis, tuberous electroreceptors, and ampullary electroreceptors. After comparing these models to the unfixed shift model, a model where only osteoglossiforms have a shifted allometry (following the finding of Tsuboi 2021), a model where only intercept can shift, and a model with one shared allometry across all actinopterygians, we found that the unfixed shift model has a better fit than any of the electrosensory phenotype associated models. This further supports the conclusion that a shifted allometry/ large brain size is not necessary to evolve an electrosensory system. These additions are reflected in lines 778-826, 110-153, 513-528, 530-538, 569-575 and figure 3 and associated source data files. See also our detailed response to essential revision 1.

      3) Next, the authors use a principal component analysis and phylogenetic linear models to test how much of brain variation is explained by concerted vs mosaic evolution and where the mosaic changes have happened. Here, despite the few non-electrogenic/electroreceptive species, the differences are clearer. I do think that in the case of the linear models, the use of brain volume as the independent variable is unnecessary. By regressing against the total brain volume, the authors are regressing each structure partially against the same value, and not surprisingly, this generates tight linear correlations. Further, this makes grade shifts (i.e. changes in relative size) less apparent. I think only brain volume minus the structure should be used and shown in all figures. This has been the standard in the field when testing for grade shifts.

      We thank the reviewer for this comment. There is much debate in the field regarding whether to use brain volume or brain volume – region of interest as the independent variable, and both are commonly used. Originally, we had looked at both and found qualitatively similar results, but only presented the ‘region x brain volume’ results in the main text for brevity. We have revised this to include the results of statistical analyses for ‘region x brain volume – region’ and the accompanying figures in the main text for both the electrosensory phenotype comparisons and the within electrosensory phenotype comparisons (broadly distributed throughout the results and figure 5—figure supplement 1, figure 5—source data 4-6, figure 7—figure supplement 1, figure 7—source data 2). All of the major findings of relative mosaic shifts between tuberous receptor taxa and non-electric taxa, between electrogenic + ampullary only and non-electric taxa for cerebellum and torus, and no mosaic shifts with electrosensory phenotype in telencephalon hold regardless of the method, and we only find minor differences between the analyses for comparisons that had p values near 0.05. These discrepancies do not change any major conclusions. However, we have kept the reporting of ‘region x total brain volume’ analyses in the main text figures to be consistent with other large comparative studies in the field and our group’s previous work (Yopak et al 2010, Sukhum et al 2018).
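
      The reviewer's concern about part-whole regression can be illustrated with a small numerical sketch. It fits log-log regressions of a region's volume against either total brain volume or brain volume minus that region, for a baseline lineage and a lineage in which the region is enlarged, and compares the resulting intercept difference (the grade shift). All volumes and proportions are invented for the example, and ordinary least squares is used here in place of the phylogenetic models (PGLS) in the manuscript.

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def log_log_fit(region, total, subtract_region):
    """Fit log10(region) against log10(total) or log10(total - region)."""
    ref = [t - r for r, t in zip(region, total)] if subtract_region else total
    x = [math.log10(v) for v in ref]
    y = [math.log10(v) for v in region]
    return ols_slope_intercept(x, y)


# Hypothetical volumes (arbitrary units): the region is 10% of the brain
# in the baseline lineage and 30% in the lineage with an enlarged region.
totals = [100.0, 200.0, 400.0, 800.0]
baseline = [0.1 * t for t in totals]
enlarged = [0.3 * t for t in totals]


def grade_shift(subtract_region):
    """Intercept difference between the two lineages (log10 units)."""
    _, a0 = log_log_fit(baseline, totals, subtract_region)
    _, a1 = log_log_fit(enlarged, totals, subtract_region)
    return a1 - a0


shift_vs_total = grade_shift(False)
shift_vs_rest = grade_shift(True)
print(round(shift_vs_total, 3), round(shift_vs_rest, 3))
```

      With these made-up numbers the same enlargement produces a smaller intercept difference when the region is regressed against a total that includes it, because the enlarged region also inflates the reference value.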

      4) Related to the previous point, the authors report significant decreases in electrogenic clades in the size of the olfactory bulb, rest of the brain and optic tectum. I think this is an artifact that results from including the cerebellum and other enlarged areas (TS and hindbrain) in the dependent variable. Similarly, the authors state that they found no increase in the size of the telencephalon in electrogenic clades and that non-electric osteoglossiforms have a mosaic increase in telencephalon relative to non-electric otophysans. Again, I think this suffers from the same problem. Figure 4-figure supplement 2 actually provides some insight in this respect. When plotted against the rest of the brain, no apparent differences are found in the size of the optic tectum. In the case of the olfactory bulb, only two of the outgroup species seem to have a larger OB than all other species. Regarding the telencephalon, when plotted against RoB, all osteoglossiforms seem to have similar telencephalon size. These conclusions need to be carefully evaluated.

      We thank the reviewer for identifying this miscommunication. We have moved previous figure 4—figure supplement 2 to the main text (now figure 6) and have added the statistical analyses and discussion of this point to both the results and discussion. We have also clarified the distinction between relative and absolute shifts in region sizes throughout but see in particular lines 261-295, 307-317, 330-331, 473-499. See also our detailed response to essential revision 3.

      Reviewer #3 (Public Review):

      The authors use micro-CT scanning and sophisticated statistical techniques to compare the sizes of various major brain regions across a sample of 32 fish species, including lineages that have independently evolved passive electroreception and, in a smaller subset, the ability to generate and sense weakly electric fields. They found that most of the variation in brain region sizes is linked to variation in total brain size, indicating concerted evolution. However, the analysis also reveals that the electrogenic lineages/species have selectively enlarged the cerebellum, the midbrain torus semicircularis, and the hindbrain. These findings are interesting and usefully extend the last author's prior work on a subset of these species.

      A significant strength of the work is that it includes a relatively large number of species, makes a good attempt to understand how these species are related to one another (though the authors admit that the phylogeny is tentative), and that the analytical methods are quantitative and relatively sophisticated. It is also true that other researchers have long argued about the relative frequency and importance of concerted versus mosaic evolution. The present study is a valiant attempt to address this issue.

      However, some key results must be viewed cautiously. Most important is that the dramatic increase in the cerebellum (and torus semicircularis and hindbrain), relative to the rest of the brain, must necessarily lead to some other brain regions appearing to have decreased in size. Therefore, their absolute size may well have stayed the same or even increased in evolution; it's just that the enlarged brain regions decrease the proportions of at least some other regions. The authors mentioned this caveat in their previous paper on mormyroids (Sukhum et al., 2018), but not in the present manuscript. As a result of the problem, it is difficult to interpret the documented variation in olfactory bulb, optic tectum, or telencephalon size; is that variation "real" or just an artifact of major changes in the size of other brain regions (mainly cerebellum, torus, and hindbrain)? The best way to address this problem would have been to repeat the analysis using a "reference" brain region that is thought not to vary dramatically in size across the species of interest (e.g., "rest of brain"). However, I acknowledge that this approach also has limitations. Still, the problem should be addressed somehow.

      We thank the reviewer for identifying this miscommunication. We have moved previous figure 4—figure supplement 2 to the main text (now figure 6) and have added the statistical analyses and discussion of this point to both the results and discussion. We have also clarified the distinction between relative and absolute shifts in region sizes throughout but see in particular lines 261-295, 307-317, 330-331, 473-499. See also our detailed response to essential revision 3.

      One strength of the manuscript is that it provides information about y-intercepts and slopes. Many other studies simply note increases or decreases in average volume (before or after correcting for absolute brain size). I like knowing which changes in relative brain region size are grade shifts (changes in intercept) versus changes in slope. However, the authors don't really do anything with those results. What do they mean? Are there different kinds of evo-devo mechanisms that underlie the two types of changes (slope versus intercept)?

      We thank the reviewer for this suggestion. We have added some discussion on potential mechanisms for evolutionary changes in intercept and slope (lines 543-559). Unfortunately, this topic is not well studied in fishes, which have extensive adult neurogenesis.
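
      The difference between a grade shift (change in intercept) and a change in slope can be sketched with a log-log fit. This is a minimal illustration using ordinary least squares on synthetic power-law data; it is not the phylogeny-informed analysis used in the manuscript, and the lineage names, coefficients, and the helper `fit_allometry` are assumptions made for the example.

```python
import numpy as np

def fit_allometry(brain_volume, region_volume):
    """Ordinary least-squares fit of log10(region) = intercept + slope * log10(brain)."""
    slope, intercept = np.polyfit(np.log10(brain_volume), np.log10(region_volume), 1)
    return slope, intercept

# Hypothetical total brain volumes shared by three synthetic lineages.
brain = np.array([10.0, 20.0, 40.0, 80.0])

lineage_a = 0.1 * brain ** 0.8   # baseline scaling
lineage_b = 0.2 * brain ** 0.8   # same slope, higher intercept: a grade shift
lineage_c = 0.1 * brain ** 1.1   # same intercept, steeper slope

slope_a, intercept_a = fit_allometry(brain, lineage_a)
slope_b, intercept_b = fit_allometry(brain, lineage_b)
slope_c, intercept_c = fit_allometry(brain, lineage_c)

print(slope_a, intercept_a)
print(slope_b, intercept_b)
print(slope_c, intercept_c)
```

      With a grade shift the region is proportionally larger at every brain size; with a slope change the difference between lineages grows as brains get larger.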

      On a related note, do the major brain regions vary in allometric slope within a given lineage? The realization that such differences do exist (at least in mammals and cartilaginous fishes) contributed much to the excitement around the concept of concerted evolution, since it means that evolutionary changes in absolute brain size can lead to major shifts in brain region proportions, but the authors seemingly ignore this point.

      We thank the reviewer for this suggestion. We do find variability in slope for different regions of each lineage. We reported these values (figure 5—source data 1, figure 7—source data 1) and add discussion of this point (lines 539-542).

      Finally, I must confess that some of the study's findings didn't surprise me. It is well known among fish neurobiologists that mormyrids have a dramatically enlarged cerebellum and that all electrogenic gymnotoids and mormyroids have a very large torus semicircularis and dorsal/alar hindbrain. One didn't need the fancy analytical techniques to confirm this. To be fair, however, it had not been clear whether the cerebellum is enlarged in gymnotoid electric fish and their non-electrogenic relatives (the authors report that it is). Nor was it known that the weakly electric catfishes have a larger cerebellum (not so much for the torus) than their non-electric relatives. This is new information that raises interesting questions about how the electric catfishes are using their electrosensory system (I would have liked to see some discussion of this).

      We thank the reviewer for this comment. We too agree that electric catfishes warrant further study into which species are electrogenic, whether their discharges are sporadic versus continuous, and how they are using their electrosensory systems. We have added further discussion on electric catfishes (lines 411-416, 425-437).

      On balance, I appreciate that the authors have provided a large and useful data set, which they used to address an interesting set of questions about how brain evolution "works." I'm just disappointed that, for me, there are relatively few significant, novel insights. For example, the notion that "selection can impact structural brain composition to favor specific regions involved in novel behaviors" (last sentence of the abstract) is one that I've accepted for a long time. Maybe the conclusion can be made more interesting by focusing more explicitly on changes in the size of major brain regions versus smaller cell groups (where mosaic evolution is widely accepted).

      We thank the reviewer for this suggestion. We agree that mosaic evolution is more readily detected in smaller subregions/ nuclei/ circuits and is found less so at the scale of major brain regions. We have adjusted the text throughout to further highlight this distinction, but see in particular lines 42-48, 500-528.

      Reviewer #4 (Public Review):

      The authors present a detailed and thorough comparative analysis of brain composition across 3 different lineages of weakly electric fish, and several non-electric fishes. The goal of this comparison was to determine whether the evolution of electrosensory systems is associated with common changes in brain composition across the three lineages. Several aspects of this research are highly novel, such as the use of μ-CT imaging and phylogeny-informed multivariate statistics. Overall, the authors show that cerebellar enlargement is key to the evolution of electrosensory systems of all three groups and the enlargement of the hindbrain and torus semicircularis varies depending on the types of electroreceptors and electrical signals produced. This is one of very few examples in evolutionary neuroscience of convergent evolution of brain anatomy and behaviour and sets the stage for future research on other sensory specialists and clades.

      Strengths

      The comprehensive analysis provided by Schumacher and Carlson has several strengths. First, the use of μ-CT scans to derive neuroanatomical measurements in fish is relatively novel and the detailed descriptions of brain region borders were greatly appreciated. Few papers that focus on comparative neuroanatomy put this degree of effort into describing how regions were differentiated and defined, but the level of detail provided here will allow other researchers to acquire data in an identical method and is therefore an important resource.

      Second, the statistical analysis is phylogeny-informed and uses an array of approaches. Too many neurobiology papers either avoid phylogeny-informed statistics or execute them poorly. This paper is neither of those and should serve as a template for future studies in the field.

      Third, the inclusion of some recording data for Synodontis is an important contribution. I am not an expert on weakly electric fish, but I do know that the catfish are understudied compared with gymnotiforms and mormyroids. Hopefully, this will result in some well-deserved attention to the diversity of catfishes.

      Fourth, I found the manuscript as a whole well written and presented. In particular, the authors provided a novel way of incorporating additional statistical information into Figures 3 and 4.

      Last, the supplemental video was a great addition to the data presented.

      Weaknesses

      First, the Introduction was a bit brief for readers unfamiliar with weakly electric fishes. It would be helpful to provide a bit more information to a general audience. Including a figure depicting the phylogenetic relationships among some (not all) bony fish clade to illustrate the independent evolution of electrosensory systems across the three clades would be particularly helpful in this regard.

      We thank the reviewer for this comment. We have included more background on the evolution of electrosensory systems in actinopterygians and included a figure showing this (lines 76-83, figure 1).

      Second, I think it is important to determine if the principal component analysis changes if the volumetric data is scaled. One issue that can affect multivariate analyses is including variables that differ greatly in scale. For example, if one brain region varies between 0.5-1.2 mm3, but another varies from 10-50 mm3 across species, that difference in scale can sometimes affect the PCA. I suggest checking that the analyses are broadly the same if the volumetric data is scaled (e.g., converting to z-scores).

      We thank the reviewer for this suggestion. We z-score normalized the regions and repeated the pPCA and found nearly identical results (lines 175-177, figure 4—figure supplement 1).
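
      The scaling check the reviewer asks for can be sketched as follows. This is a minimal, non-phylogenetic illustration: the volumes are invented to match the ranges in the reviewer's example (0.5-1.2 mm3 and 10-50 mm3), and the helper name `zscore_columns` is an assumption. It shows that z-scoring removes the scale difference, so a PCA on the scaled data is driven by the correlation structure rather than by the largest region.

```python
import numpy as np

def zscore_columns(X):
    """Center each column and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical volumes in mm^3: rows are species, columns are two brain regions.
volumes = np.array([
    [0.5, 10.0],
    [0.7, 25.0],
    [0.9, 30.0],
    [1.2, 50.0],
])

# Before scaling, the second region dominates the total variance.
raw_variances = volumes.var(axis=0, ddof=1)

scaled = zscore_columns(volumes)

# After scaling, the covariance matrix equals the correlation matrix of the raw data.
cov_scaled = np.cov(scaled, rowvar=False)
corr_raw = np.corrcoef(volumes, rowvar=False)

print(raw_variances)
print(cov_scaled)
```

      If the principal components computed from the raw and the scaled data agree, as the authors report for their pPCA, the result is unlikely to be an artifact of differences in scale.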

      Third, is there any information regarding malapterurid catfish? Are they similar enough to Synodontis or could they exhibit yet another brain type from that discussed in this study? The reason I ask is that the authors raise the issue of Torpedo, but do not discuss other strongly electric fish like Malapterurus (which is a siluriform related to Synodontis).

      We thank the reviewer for this comment. We too agree that they would be worthwhile species to add. Unfortunately, there is no data available on malapterurid catfish, and we were unable to sample any. We have added discussion of this point to lines 411-416.

      Last, some of the graphs in the supplemental material are too small with datapoints too crowded to effectively read them. Larger graphs would enable a more effective evaluation of how the various clades differ from one another.

      We thank the reviewer for this comment. We enlarged the region x region plots and plotted species means instead to make it easier to visualize these data (Figure 6, figure 7—figure supplement 2-4).

    1. Not at this time. You know, we believe that the way we collect images is just like any other search engine. And you know, this is stuff in the public domain. And for the purposes that it’s being used for I think, they can be very pro-social. I don't think we want to live in a world where any big tech company can send a cease and desist, and then control, you know, the public square. So, I think it's an issue that is really important because the issue of collecting publicly available online data is not just images, any kind of data. It affects researchers who may be, you know, studying things like discrimination or studying other things like misinformation, and it affects academics and a whole wide range of other types of use cases as well.

      the companies that have asked Clearview to delete these images, has Clearview done so?

      • Clearview has not deleted anything
      • He thinks Clearview collects images the same way any other search engine does
      • He believes the purposes the data is collected for can be pro-social
      • He does not want big tech companies to control public data by simply sending a cease and desist
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Unlike other cell organelles, mitochondria contain a small fraction of their genetic information. However, most of the genetic information about mitochondrial proteins is still in the cell's nucleus and the localization of the respective proteins to mitochondria is facilitated by localized translation of their mRNAs. In turn, the mRNA localization to the mitochondria is partly due to the co-translational association, via the mitochondrial target sequence (MTS) of the nascent peptide.

      The manuscript "Mitochondrial mRNA localization is governed by translation kinetics and spatial transport" investigates the mechanisms of mRNA transport and attachment to mitochondria. Concerning mitochondria-localized mRNAs, two types of mRNAs have been distinguished before: mRNAs that are always attached to the mitochondrium (called "constitutively binding" by the authors) and mRNAs that become "sticky" only under certain conditions (called "conditionally binding" by the authors). Modeling the corresponding cellular processes biophysically, the authors infer that yeast cells exercise control over the localization of mRNA (and consequently over their metabolism) in two ways: via varying the mitochondrial volume fraction, and via varying the speed of translation elongation. Data from previously published genome-wide measurements of mRNAs that localize constitutively and conditionally via their MTS in budding yeast S. cerevisiae were used to investigate these mechanisms.

      The manuscript is very well written and the analysis is of high quality. It starts with an introduction that thoroughly reviews many facets around the conducted research and briefly, but self-consistently, summarizes the current knowledge regarding mitochondrial localization of mRNAs. Next, the consequences of the modeling work (presented in the "methods"-section) are explored in the "Results"-section, which contains meaningful and instructive figures and explanations. The manuscript concludes with a comprehensive evaluation of the consequences of the conducted research. All in all, there are only very few minor changes that could be considered.

      Content-wise, we suggest:

      The modeling of translation kinetics is pretty coarse-grained, using only an average elongation rate per amino acid. Much work in this field was done using totally asymmetric simple exclusion process (TASEP)-based models (e.g. MacDonald, J.H. Gibbs, A.C. Pipkin: Kinetics of biopolymerization on nucleic acid templates; Duc, Saleem, Song: Theoretical analysis of the distribution of isolated particles in totally asymmetric exclusion processes: Application to mRNA translation rate estimation). Perhaps this work can be mentioned, and furthermore, the consequences of inhomogeneity of elongation rate for different codons and amino acids could be explored or at least discussed. In particular, this could shed light on the question of whether ribosome interference and tRNA charging times have any impact on mitochondrial mRNA localization.

      Thank you to the reviewer for pointing us to these relevant papers. As suggested, we have added a paragraph to our Discussion that mentions this work and discusses the possible implications of inhomogeneous elongation along mRNA sequences. We find this suggestion (and the similar one made by the other reviewer) to explore inhomogeneous elongation particularly encouraging, because we are in the early stages of actively pursuing such work. We feel that beyond discussion, exploring the consequences of inhomogeneous elongation is beyond the scope of this work because significant further experimental work would be needed to quantify the impact of specific sequences on translation progress.

      To our Discussion, we have added the following paragraph.

      "In this work our quantitative model assumed uniform ribosome elongation rates along mRNA transcripts. In the presence of ribosome interactions, such dynamics can lead to both uniform and non-uniform ribosome densities and effective elongation rates along the transcript (MacDonald et al., 1968; Duc et al., 2018). With these uniform ribosome elongation rates, previous theoretical results suggest that collisions will be rare (Duc et al., 2018). However, elongation may not be homogeneous along an mRNA transcript, due to factors such as tRNA availability (Varenne et al., 1984), boundaries between protein regions (Thanaraj and Argos, 1996), amino acid charge (Charneski and Hurst, 2013), and short peptide sequences related to ribosome stalling (Sabi and Tuller, 2017). We have found that slow (homogeneous) elongation facilitates mitochondrial mRNA localization, by providing time for MTS maturation, diffusive search, and to maintain binding-competent MTS-mediated mRNA binding to mitochondria. We expect that inhomogeneities in elongation rate along mRNA could either enhance or reduce mitochondrial mRNA localization, controlled by whether slower elongation is in regions that favor longer MTS exposure. For example, a ribosome stall site following full MTS translation could provide more time for MTS maturation and facilitate mitochondrial localization. Future experimental work could identify such stalling sequences and point towards how modeling can improve understanding of sequence impact on localization."
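
      The effect of a stall site on ribosome density can be sketched with a toy TASEP simulation. This is not the authors' model: it is a minimal random-sequential-update lattice in which each ribosome occupies a single site (real ribosomes cover several codons), and the function name `simulate_tasep`, the parameters `alpha`, `beta`, and `hop_probs`, and all numerical values are assumptions chosen for illustration.

```python
import random

def simulate_tasep(n_sites, alpha, beta, hop_probs, n_steps, seed=0):
    """Random-sequential-update TASEP; returns the time-averaged occupancy per site.

    alpha: initiation probability per attempt at the first site.
    beta: termination probability per attempt at the last site.
    hop_probs[i]: probability per attempt of stepping from site i to site i + 1.
    """
    rng = random.Random(seed)
    lattice = [0] * n_sites
    n_particles = 0
    occupancy_sum = 0
    for _ in range(n_steps):
        i = rng.randrange(-1, n_sites)  # -1 stands for an initiation attempt
        if i == -1:
            if lattice[0] == 0 and rng.random() < alpha:
                lattice[0] = 1
                n_particles += 1
        elif i == n_sites - 1:
            if lattice[i] == 1 and rng.random() < beta:
                lattice[i] = 0
                n_particles -= 1
        elif lattice[i] == 1 and lattice[i + 1] == 0 and rng.random() < hop_probs[i]:
            # Exclusion: a ribosome only advances if the next site is empty.
            lattice[i], lattice[i + 1] = 0, 1
        occupancy_sum += n_particles
    return occupancy_sum / (n_steps * n_sites)

N = 50
uniform = [1.0] * (N - 1)   # homogeneous elongation
stalled = list(uniform)
stalled[N // 2] = 0.05      # hypothetical stall site in the middle of the transcript

density_uniform = simulate_tasep(N, alpha=0.1, beta=1.0, hop_probs=uniform, n_steps=200_000)
density_stalled = simulate_tasep(N, alpha=0.1, beta=1.0, hop_probs=stalled, n_steps=200_000)
print(density_uniform, density_stalled)
```

      With slow initiation and uniform hopping, ribosomes rarely meet, in line with the quoted expectation that collisions are rare. A slow site lets a queue build up behind it, raising the average ribosome density on the transcript.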

      Ribosome occupancy data from Arava used to infer translation parameters. But there are more recent data sets based on ribosome profiling. Any reason for not using the more recent data?

      We thank the reviewer for bringing up this important point. Our text describing the origin of data for ribosome occupancy in the inset of Figure 2A lacked a citation to the dataset used, and we agree that more recent ribosome occupancy datasets are more appropriate. For the cumulative distributions of ribosome occupancy shown in the inset of Figure 2A, we used the ribosome occupancy data from Zid and O'Shea from 2014. The Arava data from 2003 was used for the cumulative distributions of Figure S1, to show that the similarity between conditional and constitutive genes in the inset of Figure 2A was present in more than a single dataset.

      We have clarified the origin of the ribosome occupancy data in the text.

      In the text description of the inset of Figure 2A, we now include a direct citation of Zid and O'Shea from 2014.

      "These measurements (Zid and O'Shea, 2014) indicate that conditional and constitutive genes have similar distributions of ribosome occupancy (Fig. 2A, inset; see Fig. S1 for similar distributions of conditional and constitutive gene ribosome occupancy derived from (Arava et al., 2003))."

      We also added a citation of Zid and O'Shea to the caption describing the inset of Figure 2A.

      "Inset is cumulative distribution of ribosome occupancy (Zid and O’Shea, 2014), showing ribosome occupancy and β have similar distributions. "

      To determine the translation parameters in our quantitative model, we applied the datasets of Couvillion et al from 2016 for relative protein per mRNA measurements and Zid and O'Shea from 2014 for ribosome occupancy measurements, combined with individual measurements from Morgenstern et al from 2016 and Riba et al from 2019. How these datasets and measurements are used is described in the Methods subsection “Calculation of translation rates”. In addition to the citations in the methods, we have added citations to the briefer description in the Results section.

      "Using protein per mRNA and ribosome occupancy data (Couvillion et al., 2016; Morgenstern et al., 2017; Zid and O’Shea, 2014; Riba et al., 2019), we estimated the gene specific initiation rate k_init and elongation rate k_elong for 52 conditional and 70 constitutive genes (see Methods)."

      The effect of the mitochondrial volume fraction on mRNA localization is investigated with a diffusive model. However, the authors make a two dimensional Ansatz for the cell and mitochondrion while it would seem more natural to assume diffusion in three spatial dimensions, as the cell and mitochondria are both three dimensional objects and diffusion strongly depends on the number of dimensions it occurs in. Why was that Ansatz made and why is it justified?

      Our diffusion model is in fact three-dimensional, rather than two dimensional. Specifically, we treat the search process as occurring in a three-dimensional cylinder, whose cross-section is shown in Figure 1D. We have added to Figure 1D to further describe how three-dimensional cylinders represent the mitochondrial proximity in the cell.

      In the Results, we now write:

      “Specifically, we treat the geometry as a sequence of concentric three-dimensional cylinders, each representing an effective region surrounding a tubule of the mitochondrial network. Figure 1D shows a two-dimensional cross-sectional view of these cylinders. The innermost cylinder represents a mitochondrial tubule…”

      We have also clarified the caption of Figure 1D to include:

      "Schematic of mRNA diffusion in spatial model, shown in cross-section. The cytoplasmic space is treated as a cylinder centered on a mitochondrial cylinder: the three dimensional volume extends along the cylinder axis (not shown)."

      The range of variability in the localized fraction +/- CHX is smaller in the experiment compared to the model (Fig. 4B, C). What could be the rationale?

      We agree that the variability in localized fraction from applying CHX is smaller in the experiment (Figure 4C) in comparison to the model (Figure 4B). Our model uses translation parameters (initiation and elongation rates) that are derived from experimental measurements that are expected to be quite noisy. We expect that this noise in the model parameters will expand the range of localization changes predicted by the model for CHX application.

      In l. 417, the authors remark that "constitutively localized mRNAs are on average longer [...] than conditionally localized mRNAs." Yet constitutively localized mRNAs seem to have higher localized fraction than conditionally localized mRNAs. This is somewhat surprising. While it's clear that a higher diffusivity would be compatible with a faster response time of shorter, conditionally-localized mRNAs, it is not clear how the longer, less diffusive mRNAs would have a higher localization fraction. Perhaps the authors can clarify this point.

      The reviewer is correct that experimental measurements show that constitutively-localized genes are, on average, longer than conditionally-localized genes. In our quantitative model, we assume the mRNA of all genes have the same diffusivity. We have used the same diffusivity for different genes because experimental measurements suggest that mRNA length and the number of translating ribosomes on an mRNA do not substantially impact mRNA diffusivity. In our Methods section, we have added citations to papers indicating lack of dependence of mRNA diffusivity on mRNA length.

      "Simulated mRNA have a diffusivity of 0.1 μm²/s. This diffusivity remains constant across genes and mRNA states, consistent with experimental measurements showing little dependence of mRNA diffusivity on mRNA length (Calderwood et al., 2016) or number of translating ribosomes (Wang et al., 2016)."

      We have additionally clarified the part of our Discussion where we explain the distinction of our results from proposals based on differential mRNA diffusion speed.

      "Lower occupancy was proposed to drive mRNA localization through increased mRNA mobility of a poorly loaded mRNA (Poulsen et al., 2019), as more mobile mRNA could more quickly find mitochondria when binding competent, increasing the localization of these mRNA. By contrast, our results imply an alternate prediction – that translational kinetics lead to enhanced localization of longer mRNAs, due to the increased number of loaded ribosomes bearing a binding-competent MTS. Indeed, constitutively localized mRNAs are on average longer than conditionally localized mRNAs."

      Minor formal changes would be:

      Setting the expressions of the fraction in the binding-competent state in l. 118 and the fraction of the mRNA-accessible volume in l. 123 in normal math-environments instead of the inline-environment since they are of key importance to the following discussion.

      These two equations (now equations (1) and (2)) are set as distinct equations that are now referred to by their equation numbers later in the manuscript.

      l. 414 contains the verb "vary" twice

      Thank you to the reviewer for pointing out this redundancy. The sentence now reads

      "Translation kinetics can widely vary between genes ... "

      l. 438 lacks an "h" in the word mitochondria

      Thank you to the reviewer for pointing this out; this spelling error has been corrected. The sentence now reads "all mRNA transcripts studied would be highly localized to mitochondria in all conditions."

      Reviewer #1 (Significance (Required)):

      All in all, this is a strong manuscript that contains solid, simple but meaningful and by no means oversimplified models with impactful consequences on the understanding of mitochondrial mRNA localization. Furthermore, it is likely that the approach applies to other cellular compartments like the ER. The research is explained in a remarkably clear and focussed style which makes it easy to follow and meanwhile succeeds in not omitting any details.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Arceo et al. have developed a stochastic, quantitative model of mitochondrial targeting sequence (MTS)-mediated mRNA localization to mitochondria in yeast. They use this model to investigate the role of translation and diffusion kinetics in controlling mitochondrial mRNA localization of conditional as well as constitutive genes.

      Most importantly, they find that neither mRNA diffusivity nor ribosome density alone are sufficient to account for the differences in localization that were experimentally observed for the two types of genes. Therefore, they implement an MTS maturation time into their model and find that they can now predict gene specific localization rates. Based on these observations, the authors conclude that yeast cells can regulate the localization of mRNAs to mitochondria through (controlling mitochondrial volume fractions and) differences in translation kinetics, which adjust the exposure time and numbers of mature MTSs that are presented on the mRNP and convey binding-competence.

      Major comments:

      Overall, the manuscript is well written and the conclusions are convincing. The underlying assumptions of the model make sense, but I have no background in modelling and can therefore only comment on the RNA biology aspects and general comprehensibility of the work.

      • The authors calculate gene-specific translation initiation and elongation rates to model localization on different transcript classes. In this context,

      (i) They use a single decay rate to estimate trajectory lifetime and this decay rate is such (1 nt / 600 s) that it would take the average yeast mRNA (~ 1400 nt; Smith et al., JCB, 2015) 10 days to be turned over. This is not consistent with physiological decay rates and as a consequence, they are essentially not accounting for mRNA turnover. This should be explained in the Methods.

      The reviewer has highlighted a lack of clarity in our model description. The mRNA decay rate in the model is (1/600) inverse seconds per entire mRNA molecule, rather than (1/600) inverse seconds per nucleotide. This leads the typical mRNA lifetime to be 600 seconds. The sentence in the Methods section describing the decay timescale now reads "The mRNA decay rate is set to k_decay = 0.0017 s⁻¹ per mRNA molecule, such that the typical decay time for an mRNA molecule is 600 s. This decay time is consistent with measured average yeast mRNA decay times ranging from 4.8 minutes (Chan et al., 2018) to 22 minutes (Chia and McLaughlin, 1979)."
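
      The two readings of the decay rate can be compared with a short calculation. The sketch below assumes simple first-order (exponential) decay; the variable and function names are chosen for illustration, and the 1400 nt transcript length is the figure quoted by the reviewer.

```python
import math

SECONDS_PER_DAY = 86_400

# Authors' reading: 1/600 per second for the whole mRNA molecule.
k_decay = 1.0 / 600.0              # s^-1, rounds to 0.0017 s^-1
mean_lifetime_s = 1.0 / k_decay    # mean lifetime under first-order decay

# Reviewer's reading: 600 s per nucleotide for a ~1400 nt transcript.
misread_lifetime_days = 1400 * 600 / SECONDS_PER_DAY

def surviving_fraction(t_seconds, k=k_decay):
    """Fraction of mRNA molecules not yet decayed after t_seconds."""
    return math.exp(-k * t_seconds)

print(mean_lifetime_s / 60)        # lifetime in minutes
print(misread_lifetime_days)       # roughly ten days
print(surviving_fraction(600.0))   # fraction left after one mean lifetime
```

      The per-molecule reading gives a lifetime of 10 minutes, which lies inside the 4.8 to 22 minute range cited by the authors, whereas the per-nucleotide reading reproduces the reviewer's estimate of about 10 days.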

      (ii) Translation and decay are intrinsically linked and translation machinery also recruits decay enzymes. What is more, decay rates differ greatly for different mRNA transcripts. I cannot judge how feasible this is, but it might benefit the model if variable decay rates (i.e. modelled based on translation efficiency?) could be included.

      We appreciate this suggestion from the reviewer. We have added a supplemental figure (Figure S4) to explore how mRNA decay rate can impact mitochondrial localization of mRNA. While lower decay rates have little impact on localization, if the decay rate is sufficiently high, the mRNA will have limited opportunity for translation to initiate and a binding-competent MTS to develop, substantially reducing localization. This analysis does not consider how the mRNA lifetime might be coupled with translational effects (such as ribosome stalling). Accounting for the impact of such more complex decay mechanisms would require substantial expansion of the model and extensive additional experiments to parameterize the coupling effects; we believe this extension would be beyond the scope of this manuscript.

      To our Discussion, we have added

      "While we have focused on how variation in translational kinetics between genes can impact mitochondrial mRNA localization, there is also significant variation in mRNA decay timescales (Chia and McLaughlin, 1979; Chan et al., 2018). Our model suggests (see Fig. S4) that the mRNA decay timescale has a limited effect on mitochondrial mRNA localization, unless the decay time is sufficiently short to compete with the timescale for a newly-synthesized mRNA to first gain binding competence. We leave specific factors thought to modulate mRNA decay, such as ribosome stalling (Mishima et al., 2022), as a topic of future study."

      (iii) Along the same lines: Rare codons as well as specific stalling sequences, are known to slow down translation elongation on many transcripts (and will effectively increase MTS exposure time). Can the authors identify transcripts with such signal sequences (on a global scale, apart from TIM50) and incorporate in their model?

      We find this suggestion (and the similar one made by the other reviewer) to explore stalling sequences particularly encouraging, because we are in the early stages of actively pursuing such work. We feel that beyond discussion, exploring the consequences of inhomogeneous elongation is beyond the scope of this work because significant further experimental work would be needed to quantify the impact of specific sequences on translation progress.

      To our Discussion, we have added the following paragraph.

      "In this work our quantitative model has applied uniform ribosome elongation rates along mRNA transcripts, which with ribosome interactions can lead to both uniform and non-uniform ribosome densities and effective elongation rates along the transcript (MacDonald et al., 1968; Duc et al., 2018). With these uniform ribosome elongation rates, previous theoretical results suggest that collisions will be rare (Duc et al., 2018). However, elongation may not be homogeneous along an mRNA transcript, due to factors such as tRNA availability (Varenne et al., 1984), boundaries between protein regions (Thanaraj and Argos, 1996), amino acid charge (Charneski and Hurst, 2013), and short peptide sequences related to ribosome stalling (Sabi and Tuller, 2017). We have found that slow (homogeneous) elongation facilitates mitochondrial mRNA localization, by providing time for MTS maturation, diffusive search, and maintains a binding-competent MTS-mediated mRNA binding to mitochondria. We expect that inhomogeneities in elongation rate along mRNA could either enhance or reduce mitochondrial mRNA localization, controlled by whether slower elongation is in regions that favor longer MTS exposure. For example, a ribosome stall site after the MTS is fully translated could provide more time for MTS maturation and facilitate mitochondrial localization. Future experimental work could identify such stalling sequences and point towards how modeling can improve understanding of sequence impact on localization."

      • Reduced mature MTS exposure time is presented as one of the determining factors that regulate mitochondrial localization of conditionally localized transcripts. For my background, the underlying mechanisms that determine MTS maturation are insufficiently explained. I understand how chaperone recruitment can contribute to MTS maturation. However, it is not obvious to me how receptor binding would account for such long maturation times as the 40 s used here (Fig. 3, 4). I would appreciate if the authors could elaborate and possibly point to directions that their model could be used to study those.

      We agree with the reviewer that the diffusive search time for a chaperone to find a newly-synthesized MTS would be very short (a small fraction of the proposed 40-second MTS maturation time), and we expect that this maturation period is largely controlled by chaperone and co-chaperone interaction timescales. There is a wide range of timescales for newly-synthesized (or misfolded) proteins to productively interact with a chaperone, and the literature provides examples of timescales comparable to 40 seconds, which we now cite.

      To our Discussion, we have added

      "While the diffusive search for a newly-synthesized MTS by chaperones is expected to be very fast ( 100 seconds for human chaperone-mediated folding (Wu et al., 2020)."

      We feel that modeling chaperone facilitation of MTS folding, to determine the timescale of this process, is very distinct from the topics covered in our manuscript, and thus beyond the scope of this work.

      • One of the two main conclusions (at least according to the abstract) from the work is that yeast cells modulate mitochondrial volume fractions to regulate mRNA localization to mitochondria. This is a fact, not a novel finding. The other main conclusion, which is that cells use different translation dynamics to control mRNA localization, is intriguing and deserves more attention. It would be great if the authors could suggest/discuss an experimental approach (i.e. a single mRNA imaging experiment quantifying mitochondrial co-localization and translation kinetics of different reporter constructs) to test this hypothesis.

      We appreciate the reviewer raising the point that yeast cells modulate mitochondrial volume fraction to regulate mitochondrial mRNA localization. While we previously showed this relationship between mitochondrial volume fraction and localization, we used experimental techniques (mutations, nutrient sources) that changed many other factors beyond mitochondrial volume fraction. In this work we have used a quantitative model, lacking those extraneous factors, to demonstrate that a change to mitochondrial volume fraction alone can lead to a change in mitochondrial mRNA localization. This work supports our interpretation of those previous experimental results.

      To our Discussion we have added the sentence

      "Previous experimental work suggested that changing mitochondrial volume fraction could control mitochondrial mRNA localization (Tsuboi et al., 2020) --- our quantitative modeling work provides further support for this mechanism of regulating mRNA localization."

      The reviewer also requests a discussion of an experimental approach to test how cells use translational dynamics to control mRNA localization. With the advent of combined mRNA imaging and live translational imaging it would be interesting to directly measure translation in live cells to correlate localization with a time delay. Unfortunately there are currently no published live translational imaging studies in yeast, and thus such a measurement would require the development of the technique in yeast.

      To our Discussion, we have added

      "Experimentally testing our proposal for translation-controlled localization would involve using combined mRNA and live translational imaging (as yet undeveloped in yeast), to directly measure translation and correlate localization with a time delay, presenting a fruitful pathway for future study."

      Minor comments:

      • Figure 1: X axis labels between panel E and F are not consistent. Inset in panel F is mainly and first discussed in text. Please do not show data as tiny inset but as separate panel.

      We have changed the axis label of Figure 1E to match the axis label of Figure 1G (previously Figure 1F). The inset of the old Figure 1F is now the new Figure 1F, and the old Figure 1F is now the new Figure 1G. We have adjusted the Figure 1 caption and the text description of Figure 1 to match these changes.

      Elongation rates of 250 aa per second are not physiological. In mammalian cells elongation has been quantified to proceed between 1 and app. 20 aa per second (Wang et al, 2016; Wu et al., 2016; Yan et al., 2016; Morisaki et al., 2016).

      The reviewer is correct that the elongation rates of 50/s and 250/s are too large to be physiological. These large values were deliberately selected to probe the nonequilibrium behavior of the quantitative model and to test the prediction of the simpler four-state model, rather than to represent physiological behavior.

      To the text in the Results section discussing Figure 1F, we have added the following sentence.

      "We include unphysiologically high elongation rates to compare to the expected behavior from the 4-state model."

      Panel E: elongation rate range does not match Fig 1F nor median in Fig 3A.

      The reviewer is correct that the elongation rate parameter range of Figure 1E does not match the elongation rates of Figure 1F or the median in Figure 3A. In Figure 1E, we aimed to show that the physiological range of translation parameters can produce a wide range of both MTSs per mRNA and mRNA binding competence for mitochondria.

      We have expanded the description of Figure 1E in the text.

      "By exploring the physiological range of translation parameters, many orders of magnitude of the mean number of translated MTSs per mRNA (β, see Eq. 5) are covered, which also covers the full range of mRNA binding competence (Fig.1E). We find that, for any set of physiological translation parameters, the number of binding-competent MTS sequences (β) is predictive of the fraction of time (fs) that each mRNA spends in the binding competent state (Fig.1E)."

      • Figure 2A and S1: Please explain how ribosome occupancy is defined here and why it is so different between figures

      We have inserted a citation for Zid 2014 to make clear that the ribosome occupancy measurements in Figure 2A (Zid and O’Shea) and Figure S1 (Arava et al) come from two different techniques. Zid and O’Shea used ribosome profiling to obtain a relative, rather than absolute, measurement. Arava et al fractionated mRNAs across 14 fractions of a sucrose gradient based on the absolute number of ribosomes loaded, and measured the relative amount of mRNA in each fraction by microarray. So while ribosome occupancy was calculated in a very distinct manner in each paper, both measurements show a very similar trend in the comparison between conditional and constitutively localized mRNAs, with no significant difference in ribosome occupancy between these two classes of mRNAs.

      To the caption of Figure S1, we have added

      "These ribosome occupancy values cover a distinct range, in comparison to those of Fig. 2A, due to distinct experimental measurement techniques."

      • Figure 2C: please show experimental data along with model prediction (in the same graph) so that conclusion becomes immediately apparent from figure not just main text. Label clearly (in figure) when experimental and when model data is shown (maybe by using consistent color scheme?)

      We have added experimental data to Figure 2C. Throughout the manuscript, we have kept a consistent color scheme for data for mitochondrial localization for ATP3, TIM50, conditional, and constitutive mRNA, whether from model or experimental data. We have applied distinct line types (e.g. solid for model vs. dot-dashed with circles for experimental).

      • Figure 4B and C: clearly indicate in figure which are experimental and which are modelled data

      In Figures 4B and 4C, we have clarified which data is experimental and which is modeled by adding to the labels for each violin plot. Violin plot labels for model data now read "Model Conditional" or "Model Constitutive" and labels for experimental data now read "Expt Conditional" or "Expt Constitutive".

      • Figure 4D: show experimental vs. model data in same graph (at same axis scaling) for comparability

      We have added the experimental data, previously in the inset of Figure 4D, to the main part of Figure 4D.

      • Line 305: "constitutive" mRNA

      Thank you to the reviewer for pointing out this redundancy, the sentence now reads

      "Figure 3C shows how the localization for the prototypical conditional and constitutive mRNA varies with the maturation time."

      • Line 334: "other changes, such as diffusivity, are unable to separate the two gene groups" - what other changes? The authors only show diffusivity (Fig S3).

      Thank you to the reviewer for pointing this out. We have revised this sentence to only refer to diffusivity changes.

      "While introduction of this maturation time distinguishes the mitochondrial localization of conditional and constitutive gene groups (Fig. 4A vs Fig. 2B), changes to diffusivity are unable to separate the two gene groups (Fig. S3)."

      • Line 403-405: maybe useful to argue against lower ribosome occupancies as drivers of nascent chain complex mobilities: Wang et al., Cell, 2016; single translation site imaging experiments indicating that ribosome occupancy is not the main determinant of mRNP mobility.

      We thank the reviewer for the direction to this paper, which indeed indicates that ribosome occupancy has limited impact on mRNA diffusivity.

      We now cite this paper in our Methods section.

      "Simulated mRNA have a diffusivity of 0.1𝜇m2/s. This diffusivity remains constant across genes and mRNA states, consistent with experimental measurements showing little dependence of mRNA diffusivity on mRNA length (Calderwood et al., 2016) or number of translating ribosomes (Wang et al., 2016)."

      • Line 601-607: include experimental references to explain how measures (25 nm vs 250 nm) were determined/selected.

      The reviewer raises a valuable point, as it is important to motivate these lengthscales used in the model.

      Microscopy with visible light has a lateral resolution limit of approximately 250 nm, often known as the Abbe limit. Accordingly, we assume that mRNA within 250 nm of mitochondria will be measured as adjacent to mitochondria. To the Methods section, we now include a short explanation and a citation.

      Unlike the 250-nm diffraction limit, there is no widely-used reaction range for mRNA binding to intracellular substrates, nor a measurement of the required proximity for an MTS-bearing mRNA to bind to mitochondria. We estimate the 25-nm distance for mRNA binding to mitochondria from the following contributions:

      • The yeast ribosome is 25 - 28 nm in diameter, or 13 - 14 nm in radius.
      • Yeast MTSs have a length of up to 70 amino acids, with 20 estimated yeast MTS lengths having a mean of 31 amino acids. The MTS forms an amphipathic helix (an alpha helix), which has a pitch of 0.54 nm and 3.6 amino acids per turn, so the 31 amino acids will be approximately 5 nm long.
      • The MTS will be attached to the ribosome/mRNA by other peptide regions, expected to typically be a few nanometers in length.

      So overall we estimate a 25 nm range for an MTS-bearing mRNA to bind to mitochondria.
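
      The arithmetic behind this estimate can be checked with a minimal sketch. The helix pitch, residues per turn, MTS lengths, and ribosome radius are the values quoted in the response; the function names and the linker length are illustrative assumptions (the response only says "a few nanometers"), so the total is a rough figure rather than the authors' calculation.

```python
# Values quoted in the response above.
ALPHA_HELIX_PITCH_NM = 0.54   # rise per helical turn
RESIDUES_PER_TURN = 3.6       # amino acids per helical turn


def helix_length_nm(n_residues):
    """Length of an ideal alpha helix with the given number of residues."""
    turns = n_residues / RESIDUES_PER_TURN
    return turns * ALPHA_HELIX_PITCH_NM


def binding_range_nm(ribosome_radius_nm, n_residues, linker_nm):
    """Sum of the three contributions: ribosome radius, MTS helix, linker.

    linker_nm is a hypothetical parameter standing in for the
    "few nanometers" of bridging peptide; it is not given in the source.
    """
    return ribosome_radius_nm + helix_length_nm(n_residues) + linker_nm


mean_mts = helix_length_nm(31)   # mean yeast MTS length, 31 residues
max_mts = helix_length_nm(70)    # longest yeast MTS, 70 residues

# Assumed linker of 5 nm, chosen only for illustration.
low = binding_range_nm(13.0, 31, 5.0)
high = binding_range_nm(14.0, 31, 5.0)
print(f"mean MTS helix: {mean_mts:.2f} nm, longest: {max_mts:.2f} nm")
print(f"estimated range with a 5 nm linker: {low:.1f} to {high:.1f} nm")
```

      With the quoted helix geometry each residue contributes 0.15 nm, so the helix term is small compared with the ribosome radius; the total is most sensitive to the assumed linker length.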

      To our methods, we have added this reasoning and accompanying citations.

      "We estimate the 25-nm binding distance by combining several contributions. The yeast ribosome has a radius of 13 - 14 nm (Verschoor et al, 1998). The MTS region, up to 70 amino acids long, forms an amphipathic helix (Bacman et al., 2020) a form of alpha helix. With an alpha helical pitch of 0.54 nm and 3.6 amino acids per turn, a 31 amino acid MTS (the mean of 20 yeast MTS lengths (Dong et al., 2021)) is approximately 5 nm in length. An additional few nanometers of other peptide regions bridging the MTS to the ribosome provides an estimate of 25 nm for the range of an MTS-bearing mRNA to bind mitochondria. The 250-nm imaging distance is based on the Abbe limit to resolution with visible light (Georgiades et al., 2016)."

      Reviewer #2 (Significance (Required)):

      My field of expertise is the development of single mRNA imaging methods to quantify translation/decay dynamics in living mammalians systems. Thus, I cannot judge the significance of this work with respect to the modelling that is presented here.

      However, I do appreciate that one of the main conclusions of this work, which is that cells might use different translation dynamics to control mRNA localization, is truly exciting and could be applied to other types of transcripts (this is exactly what SRP does for ER-targeted mRNAs) as well. Because mechanisms that regulate translation in a transcript-specific manner and in different subcellular localizations have only been described for a handful of cases, I think that this observation is worth following up on and should be appreciated by a broad scientific audience.

    1. Reviewer #3 (Public Review):

      In their study, the authors set out to challenge the long-held claim that cortical remapping in the somatosensory cortex in hand deprived cortical territories follows somatotopic proximity (the hand region gets invaded by cortical neighbors) as classically assumed. In contrast to this claim, the authors suggest that remapping may not follow cortical proximity but instead functional rules as to how the effector is used. Their data indeed suggest that the deprived hand area is not invaded by the forehead, which is the cortical neighbor, but instead by the lips, which may compensate for hand loss in manipulating objects. Interestingly the authors suggest this is mostly the case for one-handers but not in amputees, for whom the reorganization seems more limited in general (but see my comments below on this last point).

      This is a remarkably ambitious study that has been skilfully executed on a strong number of participants in each group. The complementarity of state-of-the-art uni- and multi-variate analyses is in the service of the research question, and the paper is clearly written. The main contribution of this paper, relative to previous studies including those of the same group, resides in the mapping of multiple face parts all at once in the three groups.

      In the winner takes all approach, the authors only include 3 face parts but exclude from the analyses the nose and the thumb. I am not fully convinced by the rationale for not including nose in univariate analyses - because it does not trigger reliable activity - while keeping it for representational similarity analyses. I think it would be better to include the nose in all analyses or demonstrate this condition is indeed "noisy" and then remove it from all the analyses. Indeed, if the activity triggered by nose movement is unreliable, it should also affect multivariate.

      The rationale for not including the hand is maybe more convincing as it seems to induce activity in both controls and amputees but not in one-handers. First, it would be great to visualize this effect, at least as supplemental material to support the decision. Then, this brings the interesting possibility that enhanced invasion of hand territory by lips in one-handers might link to the possibility to observe hand-related activity in the presupposed hand region in this population. Maybe the authors may consider linking these.

      The use of the geodesic distance between the center of gravity in the Winner Take All (WTA) maps between each movement and a predefined cortical anchor is clever. However, how the Center Of Gravity (COG) was computed on spatially disparate regions deserves more explanation. Moreover, imagine that for some reason the forehead region extends both dorsally and ventrally in a specific population (eg amputees): the COG would stay unaffected but the overlap between hand and forehead would increase. The analyses on the surface area within hand ROI for lips and forehead nicely complement the WTA analyses and suggest higher overlap for lips and lower overlap for forehead but none of the maps or graphs presented clearly show those results - maybe the authors could consider adding a figure clearly highlighting that there is indeed more lip activity IN the hand region.

      In addition to overlap analyses between hand and other body parts, the authors may also want to consider doing some Jaccard similarity analyses between the maps of the 3 groups to support the idea that amputees are more similar to controls than one-handers are in their topographic activity, which again does not appear clear from the figures.
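
      For readers unfamiliar with the measure, the Jaccard similarity suggested here is the size of the intersection of two maps divided by the size of their union. A minimal sketch follows, assuming each map is represented as a set of cortical vertex indices assigned to one body part; the function name and the example indices are hypothetical and are not data from the study.

```python
def jaccard_similarity(map_a, map_b):
    """Jaccard index between two maps given as collections of vertex indices.

    Returns |A intersect B| / |A union B|, a value between 0 (no shared
    vertices) and 1 (identical maps). Two empty maps are treated as
    identical by convention.
    """
    a, b = set(map_a), set(map_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Made-up vertex sets, purely to show the call pattern.
controls_lips = {101, 102, 103, 104}
amputees_lips = {102, 103, 104, 105}
one_handers_lips = {104, 105, 106, 107, 108}

print(jaccard_similarity(controls_lips, amputees_lips))
print(jaccard_similarity(controls_lips, one_handers_lips))
```

      Unlike a centre-of-gravity distance, this index changes when a region grows symmetrically around its centre, which is the scenario raised in the comment above.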

      This brings me to another concern, related to the claim that the change in cortical organization is mostly observed in one-handers. It seems that most of this conclusion relies on the fact that some effects are observed in one-handers but not in amputees when compared to controls. However, no direct comparisons are done between amputees and one-handers, so this may be an erroneous inference about an interaction that is not actually tested (Nieuwenhuis, 11). For instance, the shift of the forehead away from the hand/face border is also (mildly) significant in amputees (as observed more strongly in one-handers), so the conclusion (eg from the subtitle of the results section) that it is specific to one-handers might not be fully supported by the data. The same holds for the invasion of the hand territory by the lips, which is significant in amputees in terms of surface area. Altogether this calls for toning down the idea that plasticity is restricted to congenital deprivation (eg last sentence of the abstract). Even if the effect is numerically stronger, if I am not wrong, there are no stats showing remapping is indeed stronger in one-handers than in amputees, and amputees actually show significant effects when compared to controls along the same lines as those shown (even if more strongly) in one-handers. Also, maybe the authors could explore whether there is actually a link between the number of years without a hand and the remapping effects.
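
      To make the statistical point concrete: showing that group A differs from controls while group B does not is not the same as showing that A differs from B. One way to test the latter directly is a two-sample permutation test on the remapping measure. The sketch below is only an illustration of that idea, not the analysis used in the paper; the function name, the choice of a permutation test, and the example numbers are all assumptions.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the observed difference (mean of A minus mean of B) and a
    p-value estimated by randomly reassigning group labels n_perm times.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Invented values standing in for a per-participant remapping measure.
one_handers = [0.9, 1.1, 1.4, 1.2, 1.0]
amputees = [0.6, 0.8, 1.0, 0.7, 0.9]
difference, p_value = permutation_test(one_handers, amputees)
print(difference, p_value)
```

      Only a comparison of this kind, one-handers against amputees, supports the claim that remapping is stronger in one group than in the other.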

      One hypothesis generated by the data is that lips remap in the deprived hand area because lips serve compensatory functions. Actually, also in controls, lips and hands can be used to manipulate objects, in contrast to the forehead. One may thus wonder if the preferential presence of lips in the hand region is not latent even in controls as they both link in functions?

    1. The biggest mistake—and one I’ve made myself—is linking with categories. In other words, it’s adding links like we would with tags. When we link this way we’re more focused on grouping rather than connecting. As a result, we have notes that contain many connections with little to no relevance. Additionally, we add clutter to our links which makes it difficult to find useful links when adding links. That being said, there are times when we might want to group some things. In these cases, use tags or folders.

      Most people born since the advent of the filing cabinet and the computer have spent a lifetime using a hierarchical folder-based mental model for their knowledge. For greater value and efficiency one needs to get away from this model and move toward linking individual ideas together in ways that they can more easily be re-used.

      To accomplish this many people use an index-based method that uses topical or subject headings which can be useful. However after even a few years of utilizing a generic tag (science for example) it may become overwhelmed and generally useless in a broad search. Even switching to narrower sub-headings (physics, biology, chemistry) may show the same effect. As a result one will increasingly need to spend time and effort to maintain and work at this sort of taxonomical system.

      The better option is to directly link related ideas to each other. Each atomic idea will have a much more limited set of links to other ideas which will create a much more valuable set of interlinks for later use. Limiting your links at this level will be incredibly more useful over time.

      One of the biggest benefits of the physical system used by Niklas Luhmann was that each card was required to be placed next to at least one card in a branching tree of knowledge (or a whole new branch had to be created.) Though he often noted links to other atomic ideas there was at least a minimum link of one on every idea in the system.

      For those who have difficulty deciding where to place a new idea within their system, it can certainly be helpful to add a few broad keywords of the type one might put into an index. This may help you in linking your individual ideas as you can do a search of one or more of your keywords to narrow down the existing ones within your collection. This may help you link your new idea to one or more of those already in your system. This method may be even more useful and helpful for those who are starting out and have fewer than 500-1000 notes in their system and have even less to link their new atomic ideas to.

      For those who have graphical systems, it may be helpful to look for one or two individual "tags" in a graph structure to visually see the number of first degree notes that link to them as a means of creating links between atomic ideas.

      To have a better idea of a hierarchy of value within these ideas, it may help to have some names and delineate this hierarchy of potential links. Perhaps we might borrow some ideas from library and information science to guide us? There's a system in library science that uses a hierarchical setup using the phrases: "broader terms", "narrower terms", "related terms", and "used for" (think alias or also known as) for cataloging books and related materials.

      We might try using tags or index-like links in each of these levels to become more specific, but let's append "connected atomic ideas" to the bottom of the list.

      Here's an example:

      • broader terms (BT): [[physics]]
      • narrower terms (NT): [[mechanics]], [[dynamics]]
      • related terms (RT): [[acceleration]], [[velocity]]
      • used for (UF) or aliases:
      • connected atomic ideas: [[force = mass * acceleration]], [[$$v^2=v_0^2+2a\Delta x$$]]

      Chances are that within a particular text, one's notes may connect and interrelate to each other quite easily, but it's important to also link those ideas to other ideas that are already in your pre-existing body of knowledge.


      See also: Thesaurus for Graphic Materials I: Subject Terms (TGM I) https://www.loc.gov/rr/print/tgm1/ic.html

  3. Apr 2022
    1. Author Response

      Reviewer #1 (Public Review):

      Kwon, Huxlin and Mitchell compared motion perception and oculomotor responses in eight patients with post-stroke lesions in the primary visual cortex (V1). Motion perception was measured as peripheral motion discrimination thresholds (NDR) separately in the affected and the intact visual field. Due to restoration training, the NDR thresholds were below chance even in the affected visual field, indicating that some residual motion discrimination was possible. Oculomotor responses were measured as the gain of eye drifts (PFR) after saccades to dot patterns that are coherently drifting inside peripheral, stationary apertures. The authors distinguish between a predictive, open-loop component up to 100 ms after the saccade that is entirely based on presaccadic motion processing in the peripheral visual field and a visually-driven component from 100 ms after the saccade that is based on postsaccadic motion processing in the fovea. While the PFR gain of patients in the intact field was comparable to the data of healthy control subjects from a previous study (Kwon et al., 2019), the predictive, open-loop PFR gain of patients in the affected field was close to zero. This was not the case for the visually-driven PFR. The authors interpret their findings in terms of a dissociation between residual motion perception and absent predictive oculomotor control in patients with V1 lesions.

      Strengths:
      The study contains a rare and valuable set of perceptual and oculomotor data from eight patients with lesions in V1, who underwent restoration training. The direct comparison between peripheral motion discrimination and predictive oculomotor responses is interesting and innovative. Also, the distinction between the predictive, open-loop and the closed-loop component of PFR is important. A potential dissociation between motion perception and oculomotor control would be very relevant for the understanding of different pathways of motion processing for perception and oculomotor control and also for the understanding of the effects of restoration trainings after lesions of V1.

      Weaknesses:
      The dissociation between perception and oculomotor control in the affected field is primarily based on two results: First, the combination of low PFR gain (Figure 4A) on the one hand and low to medium NDR thresholds (Table 1) on the other hand. Second, the absence of a correlation between NDR thresholds and PFR gain (Figure 4B). However, the data are not as clear-cut. The regression of PFR gain on NDR thresholds in the intact-field predicts that there should be a substantial PFR gain only at NDR thresholds below about 0.3. For the affected field this applies only to three data points of which one shows a substantial PFR and is fully compatible with the data in the intact-field. Hence, the evidence of a dissociation between motion perception and oculomotor control is based on a very small number of data points. This also allows for a different interpretation: instead of assuming separate pathways for motion perception and oculomotor control in patients, the results might also be explained by a different read-out of the same motion signal for perception and oculomotor control, where oculomotor control applies a more conservative threshold and requires a higher internal signal strength than the motion perception.

      The comparison of the patients' data to the data in the previous study (Kwon et al., 2019) is not very informative. First, the patients were considerably older than the participants in the previous study, and an age-matched control group would be favourable. That being said, the fact that the PFR gain was comparable for the intact-field of the patients and the previous study renders age-effects rather unlikely.

      Second, there is no control data for the motion discrimination task, so we don't know what the NDR thresholds and even more importantly what the relationship between NDR thresholds and PFR gain in healthy observers would be.

      We thank the reviewer for their evaluation. We have attempted to address concerns about sufficient sampling from blind-fields with recovery that reached the normal range by collecting additional data, doubling our sample size within that range. This is discussed above in “Essential revisions”, along with the alternative interpretation that perception and oculomotor control might rely on a different threshold in readout. The role of age differences was considered in the original manuscript, but this remains an unlikely factor, as the reviewer notes. With regard to normative NDR threshold data, surprisingly, this has not been published in visually-intact controls in a manner that is identical to that in the present study. However, prior work has established that performance in CB patients’ intact visual fields is normal across a wide range of behavioral measures that include luminance contrast sensitivity, processing of form, color and motion, as well as spatial and temporal frequencies (e.g. Barbur et al., 1980; Morland et al., 1999; Sahraie et al., 2006; Huxlin et al., 2009; Das et al., 2014; Levi et al., 2015). In the present study, we have thus used the intact-field as an internal control for blind-field performance in the same participant, as is standard in the field, expecting that intact-field NDR thresholds should be within the normal range. Verifying this is outside the scope of the present paper, but is now planned for our subsequent studies. Other detailed, point-by-point responses to the reviewer’s “Recommendations for authors” appear below.

      Reviewer #2 (Public Review):

      This study addresses the oculomotor behaviour of cortically-blind patients (with lesions in V1) who are instructed to perform a saccade toward a cued target placed either in their intact or in the blind visual field. The saccadic target consists of an aperture containing random-dot motion at 75% direction discrimination threshold ("NDR"), and is presented with iso-eccentric similar distractor apertures: with this kind of stimulus, the gaze of normally-sighted participants drifts smoothly in the direction of the target random dot motion immediately after the end of the saccade. Importantly, for some patients, a perceptual training had led to a good recovery of perceptual performance in the blind-field, as documented by the reduction of motion direction discrimination threshold to levels similar to the control healthy participants. Cortically-blind (CB) patients are shown to perform very similarly to control participants in terms of saccade accuracy, but they have longer latency. As for the postsaccadic ocular following response ("PFR"), the eye velocity component projected on the random-dot motion direction is comparable to controls when the saccade was directed to the intact field, but the mean PFR is significantly lower for saccades directed toward the blind-field. The authors conclude that V1 lesions result in a previously ignored selective impairment of the automatic transaccadic transmission of visual information that drive the ocular following response. In the supplementary information, it is also shown that the shift of saccadic landing position which is induced by the presaccadic target motion is strongly reduced (yet different from zero) for saccades to the blind-field locations in CB patients.

      The manuscript is very well written and illustrated, and the addressed question is novel and highly interesting. The inclusion in the experiment of locations of the patients' blind-field for which some perceptual abilities had been recovered is particularly interesting. However, some major weaknesses weaken part of the results and undermine their interpretation (see below). I also list a series of other minor issues to be clarified or improved.

      Main weaknesses:
      1) Unfortunately, the present data do not strongly support the conclusion that the reduced PFR gain in patients is decorrelated from the motion discrimination performance. As a matter of fact, in Figure 4B the function describing the relation between PFR gain and NDR is reasonably linear in a very limited interval of NDR values (say <0.3), and it should rather be described as a decreasing exponential, or similar, approaching 0 already for NDR~0.3. On the other hand, it is presumably hard to appropriately fit a similar exponential function to the blind-field datapoints, as the majority of the latter lie in the range of NDR threshold (say > 0.4) where the PFR gain would in any case be flat and close to 0. In other terms, in my view there aren't enough blind-field datapoints with low NDR threshold to assess a quantitative difference in the relation between PFR and NDR between CB patients and Control participants.

      Finally, and probably just a misunderstanding of mine, shouldn't the empty circles in Figure 4A and 4B have the same y-coordinate (the PFR gain value)? It does not seem so when looking at these figures.

      2) A second weak point, in my opinion, concerns the interpretation of the results and in particular the exclusion of a role for presaccadic attentional mechanisms. The authors claim (lines 356-358): "That the FEF and its projections to area MT are intact in V1-stroke patients suggests preservation of presaccadic planning and attention selection for the saccade target even when visual input is weak or abnormal in a blind-field" and this is definitely a valuable point. However a number of other physiological mechanisms involving V1 could play a role in the spatially-selective processing of motion and the argument that (lines 368 and ff) "other aspects of saccade pre-planning related to perceptual shifts in the position of motion targets, remain in the blind-field" is not very robust here, considering that the reduction in the angular deviation is very strong in the blind-field (Supplementary Figure 2).

      Here is a speculative alternative interpretation: V1-lesioned patients suffer, among other things, from a specific impairment for spatially-selective motion processing. Unfortunately, the training in peripheral motion discrimination does not test this particular possibility, if I understand correctly, as there was no other distractor aperture containing distracting motion information (see Fig 2A). In contrast, in the main experiment, a lack of spatial selectivity for motion integration may have strongly affected the presaccadic motion discrimination (being more global than local) as well as PFR and postsaccadic landing position shift (although the latter was partly spared). According to this possibility, a simple prediction is that depending on the (randomly determined) motion direction in the distracting apertures, the PFR (the true eye movement, not the projection according to the stimulus motion axis) should be deviated in different directions, consistent with a global integration of motion. Do the available data allow to verify this possibility? In general, I think that it would be interesting to analyse post-saccadic smooth eye velocity beyond the "projected" velocity.

      We thank the reviewer for their evaluation, several parts of which overlap with Reviewers 1 and 3. In particular, the concerns about sufficient sampling from blind-fields that recover motion integration (NDR < 0.35) have been addressed by collecting additional data and performing new analyses, and we have also addressed possible impairments to spatial attention (see above in “Essential revisions”). The discrepancy noted in the y-ordinate between 4A and B is related to those analyses being by subject (4A) versus by visual field location (4B), which we already addressed above, in response to Reviewer 1. Other detailed responses appear below.

      Reviewer #3 (Public Review):

      The human visual system comprises a tangle of neural pathways that subserve different perceptual, cognitive, and motor functions. Unfortunate cases of brain damage can reveal surprising dissociations between the functions of damaged and spared tissue. Perhaps the most famous example is blindsight, when damage to visual regions of occipital cortex leads to subjective blindness in parts of the visual field while sparing some visually-guided actions. Kwon, Huxlin and Mitchell had a rare opportunity to study eight individuals with that type of cortical blindness due to stroke, and put them through a carefully designed regimen of visual training and oculomotor testing.

      The main focus was a particular oculomotor behavior that they term the "post-saccadic following response": when a neurotypical person makes a saccade to an object moving in the periphery, their eyes immediately begin smoothly following the stimulus motion, due to an oculomotor plan made before the saccade began. In this case, the stroke patients were able to regain their ability to discriminate stimulus motion in the "blind" parts of the visual field, but upon saccading to those stimuli they did not show the immediate post-saccadic following response. This surprising result shows yet another splintering dissociation between perception and action, demonstrating that the effects of stroke can be very specific to certain motor actions.

      Strengths:
      - The authors masterfully combined several techniques in a rare and carefully chosen sample of participants: neuropsychiatric evaluations, rehabilitation training, psychophysics and eye-movement analyses.
      - The analyses that link all those measures together, while complicated and precise, are elegantly and clearly presented.
      - The study provides a twist on blindsight that is interesting philosophically, while also constraining our models of neural circuitry and informing approaches to rehabilitation after stroke.

      Weaknesses:
      - The unique nature of this study is a strength but also potentially limits its impact: the authors studied one particular type of eye movement with a complicated, unnatural stimulus arrangement. For example, the stimuli were groups of random moving dots windowed through static apertures. These stimuli, which move but also don't, are quite different from real moving objects that people track with their eyes (flying birds, for example). A related issue, which the authors briefly acknowledge, is that the training was specifically directed towards explicit perceptual reports. We therefore don't know if the oculomotor behavior (the PFR) could also be trained.
      - The authors rely on traditional null-hypothesis tests (t-tests and correlations) to make binary judgements of whether each effect or difference is "significant" (p<0.05). Some of the conclusions would be more convincing if supplemented with power analyses, bootstrapped confidence intervals, and Bayes factors to evaluate the strength of evidence.

      We thank Reviewer 3 for their evaluation. The choice of stimuli/task and their “naturalness” is addressed in our point by point responses to the “Recommendations for authors” below. We have also revised the manuscript to include boot-strapped confidence intervals, along with other statistics suggested by other reviewers, as noted under “Essential revisions for authors”. Other detailed responses appear below point by point.
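
      To illustrate the kind of bootstrapped confidence interval discussed in this exchange, here is a minimal sketch of a percentile bootstrap. It is not the authors' analysis code: the function name `bootstrap_ci`, its parameters, and the example values are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample).

    Hypothetical helper for illustration; resamples with replacement
    and reads the interval off the sorted resampled statistics.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up per-participant values (NOT data from the study).
example_values = [0.12, 0.30, 0.25, 0.08, 0.19, 0.27, 0.15, 0.22]
low, high = bootstrap_ci(example_values)
print(f"95% bootstrap CI for the mean: [{low:.3f}, {high:.3f}]")
```

      With a sample as small as eight participants, the percentile interval can be unstable, which is one reason a reviewer might ask for it to be reported alongside, rather than instead of, other measures of evidence.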

    1. Author Response

      Reviewer #3 (Public Review):

      Phillips and colleagues present results obtained by generating loss-of-function mutations in the YAP/TAZ ortholog of the unicellular holozoan Capsaspora owczarzaki. In previous work published collaboratively by the Pan and Ruiz-Trillo labs, the authors had shown that Capsaspora has orthologs of yorkie (yki) and hippo (hpo) and that when these genes were expressed in Drosophila they functioned in a way that was consistent with the well-characterized function of the Hippo pathway in regulating cell proliferation.

      Characterizing the role of the pathway in Capsaspora required the ability to manipulate gene expression in that organism. In this manuscript, the authors describe remarkable progress in that area. They generate lines that stably express fluorescent proteins. Excitingly, they are able to use CRISPR/Cas9 and generate loss-of-function alleles using a donor-template strategy. These accomplishments pave the way for the study of Capsaspora using molecular tools.

      The authors then use these technologies to generate biallelic loss-of-function mutations in Capsaspora. They find no evidence of defects in cell proliferation either when these cells are cultured by themselves or when they are mixed with wild-type cells. However, they do find evidence of abnormalities in the cytoskeleton. They find that the cells themselves, and the multicellular aggregates that they form, are more irregular in shape. The cells appear to adhere to substrates better than wild-type cells. They show surface blebbing and changes in the cell cortex, with evidence for altered actin dynamics.

      From these experiments, the authors conclude that the ancestral function of the Hippo pathway is to regulate the cytoskeleton and that its ability to regulate cell proliferation was acquired more recently in evolution.

      The technical achievements are impressive, the experiments are well designed and executed, and are presented clearly. I have no issues with them. However, I feel that two of the main conclusions that the authors make are not justified by the results.

      1) The authors seem convinced that CoYki functions as a transcriptional regulator. They seem to suggest that it is primarily a regulator of cytoskeletal genes. There is a body of work from the Fehon laboratory that Yki has a function at the cell cortex in Drosophila that is independent of its function as a transcriptional regulator. See the work by Xu et al. 2018; PMID30032991 (not cited in this paper). In the absence of data that shows the localization of CoYki, I don't see how the authors can tell where it is working (in the nucleus or at the cell cortex) to regulate the cytoskeleton.

      To provide support for asserting that coYki is a transcriptional regulator, we have done the following:

      • We have cited previous results showing that coYki and its binding partner coSd can, when expressed together in the Drosophila eye, induce transcription of Hippo pathway genes, indicating a role for coYki in transcriptional regulation

      • We have examined the localization of fluorescent fusions of coYki and of a coYki mutant (coYki 4SA) predicted to be nonphosphorylatable by upstream Hippo pathway kinases. Enrichment of coYki at the cell cortex was not detected. However, the 4SA mutant showed increased localization in the nucleus relative to the WT coYki protein, arguing for a nuclear function of coYki.

      These data are therefore consistent with the prevailing view of Yki/YAP/TAZ as a transcriptional regulator in other species. Nevertheless, we cannot formally exclude the possibility that coYki may also affect the cytoskeleton in a non-transcriptional manner as described by Xu et al., which we have now stated in the Results section of our manuscript.

      2) Capsaspora and animals such as ourselves are equally separated by time from our last common ancestor. There is no reason to think that the function of signaling pathways in the Capsaspora lineage has been frozen in time while ours have evolved. Indeed, the amazing diversity of protists is consistent with lots of evolution in every lineage. One could easily argue from the same data that the ancestral function of the Hippo pathway was to regulate cell proliferation and that this was lost in the lineage that led to Capsaspora. As we learn more about the function of the Hippo pathway in diverse organisms, we will be in a better position to guess what the ancestral function was.

      We agree that the function of signaling pathways in modern protists and their ancestors may not necessarily be identical, and that studies of Hippo signaling in other organisms, especially unicellular holozoans, may clarify which functions may have been ancestral, as we make a point to state at the end of our discussion. However, given that in animals Hippo signaling regulates the cytoskeleton and proliferation, and we find that in Capsaspora coYki affects the cytoskeleton but apparently not proliferation, it seems reasonable to us to suggest a model where cytoskeletal regulation was an ancient function, and the pathway was later co-opted for regulation of proliferation. We have added a section in the Discussion pointing out that we cannot, from our results, definitively conclude an ancestral Hippo pathway function.

      In summary, this manuscript describes technological innovations that will have a big impact on those who want to study this organism. They also provide convincing data to show that the Capsaspora Yorkie ortholog regulates cytoskeletal dynamics and not cell proliferation. However, as described above, the authors would need to tone down some of their conclusions.

    1. Attribution Theory: Attribution theory (a process theory of motivation holding that people are motivated according to what they believe underlies other people’s actions and attitudes) holds that people’s behavior is motivated by how they interpret the behavior of others around them. For instance, we may think that what’s causing others to act as they do is a combination of internal, personal factors. On the other hand, we may think that their behavior is a product of environmental variables.

      impacts

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the referees for their valuable suggestions. We have revised the text accordingly and already conducted most of the requested experiments.

      Reviewer #1

        1. The authors state that addition of mannan increases the length of Birbeck granules; however, no data are presented. This would be more convincing if the length were compared between conditions with and without mannan (as shown in Fig 4, where the condition without mannan is lacking).

      Reply: Thank you for pointing out the missing data. We added an EM image of Birbeck granules and quantification of Birbeck granule formation in the absence of mannan (Figure 4A-D).

      • Supp, fig 1B perhaps as a panel in main figure as this is an important control to show that Birbeck granules are isolated.

      Reply: We moved the supplemental figure 1B to main figure 1D.

        1. Only the (total) length of Birbeck granules is taken into account, but not the number of Birbeck granules. Is it possible to quantify the number of Birbeck granules?

      Reply: We added Figure 4D to show the number of Birbeck granules. Note that the difference in the number of Birbeck granules was less significant than that of total length because there were numerous short fragments in the mutant specimen.

      • Fig 5. Only the condition (ARGK) where there is virtually no Birbeck granule formation is included. However, is virus still internalized in the other conditions (MRGD or MRGK), where Birbeck granule formation was less effective but still present? It would be interesting to include those mutants. A more specific quantification would be by p24 ELISA. Is there a reason why immunoblotting was chosen? In the supernatant condition, explain why the virus p24 seems lower in the control condition, whereas one would expect the maximum concentration in that condition.

      Reply: Thank you for suggesting the use of ELISA. We chose immunoblotting because of its higher sensitivity and lower cost, but ELISA is advantageous when comparing a large number of samples. We performed p24 ELISA and quantified virus internalization in all the available mutants (Figure 5C). As you pointed out, the transfer efficiency of the immunoblot in Figure 5A was not uniform across the membrane; Pr55 bands became denser toward the right, while p24 bands had a gradient in the opposite direction. The immunoblots and ELISA showed that about 1% of the viruses were attached or internalized and about 99% did not interact with the cells. Thus, the attached/internalized viruses did not affect the amount of virus in the supernatant. The ELISA results also showed that the amounts of virus in the supernatant were nearly equal among the samples (Figure S3B).

      • Abstract First sentence: not mucosal tissue but mucosal epithelium Last sentence: Virual should be viral

      Reply: We corrected the typo. Thank you.

      • Discussion The last section comparing DC-SIGN and langerin is not clear and some overstatements are made. "Considering that DC-SIGN serves as an attachment receptor for viruses but not as an entry receptor, the possible structural coupling of lateral ligand binding and internalization implies that langerin functions as a more efficient entry receptor for viruses than DC-SIGN or other C-type lectins." It is not correct that langerin but not DC-SIGN can function as an entry receptor. DC-SIGN has been shown to facilitate infection by different viruses such as DENV and ZIKV. In contrast, langerin can restrict viruses such as HIV-1 but also facilitate infection by, for example, Influenza A and DENV. So attachment or entry is more likely a consequence of the internalization and dependence on pH changes for fusion, as some viruses such as DENV fuse in acidic vesicles. This needs to be discussed more clearly.

      Reply: Thank you for pointing out our incorrect statement. We replaced it with a weaker one, as below:

      Page 13, line 213: “The difference in the ligand-binding manner between langerin and DC-SIGN may contribute to their different carbohydrate recognition preferences (Valverde et al., 2020; Takahara et al., 2004).”

      Reviewer #2

      1) Langerin can exist on the cell surface and in Birbeck granules. They should examine langerin cell surface expression in the 3 states, wildtype, mutated and lectin - . Do the mutations change cell surface expression?

      Reply: We performed surface labeling experiments and showed that those mutations did not affect surface expression of langerin (Figure S3A).

      2) Birbeck granules are present in the absence of mannan and pathogens (see Pena-Cruz JCI 2018, PMID: 29723162). Thus, this suggests that Birbeck granules are present even without langerin clathrin coated pit internalization from the cell surface. How does their model account for this observation?

      Reply: We think there are two possibilities:

      1. Birbeck granules were shown to stem from the endoplasmic reticulum (Valladeau et al Immunity 2000; Lenormand et al PlosONE 2013). Since the rER is the site of glycosylation, langerin is likely to capture the oligo-mannose-glycosylated proteins within the rER and form Birbeck granules.
      2. Blood plasma proteins such as immunoglobulin D, immunoglobulin E, and apolipoprotein B-100 are reported to carry high-mannose glycans (Clerc et al Glycoconj J. 2016). Those glycoproteins in the cell culture media can induce Birbeck granule formation.

        3) Different cell types can have varied Langerin levels (see Pena-Cruz JCI 2018, PMID: 29723162). Does Birbeck granule formation depend on a certain level of langerin expression? Do Birbeck granules form when Langerin is present at low as compared to high levels?

      Reply: In the course of the experiments, we isolated a cell line stably expressing langerin. However, the langerin-expressing cells proliferated extremely slowly and their expression levels were low. To answer this question, we recovered this “failed” stable cell line and found that the low langerin-expressing cells can form Birbeck granules, but with lower efficiency (Figure S3C-E).

      4) Authors use immunoblots to show that HIV is present in intra-cellular Langerin structures. It would be ideal to visualize HIV with presumably internal Birbeck granules using imaging techniques such as cryo-electron micrography or another form of high resolution imaging.

      Reply: We are currently working on ultra-thin section electron microscopy of HIV-infected langerin-expressing cells. Visualization of HIV-containing Birbeck granules using cryo-electron microscopy is highly challenging because the current precision of the cryo-FIB-SEM milling technique is too low to target a specific intracellular structure. We believe conventional electron microscopy will provide sufficiently convincing evidence that HIV is present within Birbeck granules.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We are grateful for the referees' rigorous review of our manuscript and for their overall positive reception of our work. We have pasted below the entirety of the reviewers’ comments, interleaved with our responses.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Gama et al. use a biophysical assay DAmFRET, structural analysis, and optogenetic tools to uncover the nucleation mechanism of CBM signalosome. They performed experiments first in yeast cells that lack death folds or related signaling networks, then confirmed their discoveries in human cells. The results presented here are clear and convincing. The paper is very well presented and clearly written.

      They found it is the CARD domain of BCL10 that acts as a molecular switch that drives all-or-none activation of NF-kB. Monomeric BCL10 possesses an unfavorable conformation and serves as a nucleation barrier, keeping BCL10 in a supersaturated inactive state that allows for binary activation upon stimulation.

      They also characterized CARD9 CARD domain and a coiled-coil region. They reasoned that CARD9CARD functions as a polymer seed to nucleate BCL10, and that the coiled-coil region has multimerization ability to facilitate nucleation. Furthermore, they characterized that MALT1 activation doesn't depend on BCL10 polymers but its own proximity. And MALT1 induces graded NF-kB activation, thus further demonstrating the binary activation is conferred by BCL10.

      Major comments:

      1. Fig S1D and E, the authors used TNF-a to activate NF-kB independent of CBM signalosome and found the activation in each cell increased with dose. In contrast, CBM activation led to bimodal cell activation. The authors claim that this is evidence of positive feedback upstream of NF-kB. We do not believe this claim can be made from this comparative experiment alone. We agree that positive feedback is important for activating an NF-kB response, but the comparison between CBM and TNFa is inaccurate and glosses over published data. Specifically, there are published data showing that TNF-a does activate a 'switch-like' or digital response, as defined by the translocation of p65 (see (Tay et al. 2010) among other studies that have examined p65 translocation at the single-cell level). The difference in T-sapphire expression between CBM and TNF activation is most likely due to TNFa-induced oscillations of p65 translocation (although this is speculation on our part). Therefore we suggest to the authors that the TNF-a data (Fig S1D and E) should be omitted, as the claim of switch or not-switch as pertains to TNF signaling is more complex and nuanced than presented here. We believe omitting these data will strengthen the manuscript and avoid confusion in the field. The bimodal expression of the T-sapphire NF-kB reporter driven by the CBM signalosome activation is sufficient to claim an all-or-none response.

      We thank the reviewer for this suggestion. We acknowledge that the activation of NF-κB by TNF-ɑ is more complex than we had presented, and agree that the differences in T-Sapphire reporter output could be attributed to p65 oscillations. We had not previously considered this interesting possibility, which is not addressed by the present data, and believe it is worth future investigation. As suggested by the reviewer, we have now omitted the TNF-a data, and agree that this change does not impact the overall claims of the paper.

      Fig 3B, the authors introduced CARD9CARD-µNS as a stable condensed seed for BCL10. However, considering CARD9CARD can form polymers at high concentration (Fig 3B and S3D), are these high expression levels of CARD9CARD able to induce BCL10-mEos3.1 assembly (as measured by DAmFRET in yeast cells)? Can the authors examine BCL10 FRET at these high expression levels of CARD9CARD? We assume that BCL10 will be assembled in these cells. This would provide a valuable control experiment and support the authors' conclusions.

      Indeed, this question is amenable to DAmFRET. Accordingly, we have now performed DAmFRET of yeast cells expressing BCL10-mEos3.1 in the presence of either CARD9CARD-mCardinal or mCardinal itself (see new Fig S6A and B, and associated results section). We confirmed that cells with high CARD9CARD-mCardinal expression had higher FRET on average than cells with low expression. Importantly, cells expressing high or low levels of mCardinal itself had the same FRET level (Fig S6).

      Fig 3C, the text said "Whereas WT CARD9CARD assembled into polymers at high concentration, the pathogenic mutants R18W, R35Q, R57H, and G72S failed to do so (Fig 3C and S7B,C), explaining why they cannot nucleate BCL10". This claim that these mutants can not nucleate BCL10 does not have a figure call out or a reference. The authors then show the results in Fig 3E which supports this claim. Even though they were done in the context of full-length CARD, all proteins contain the I107E mutation that releases autoinhibition. For clarity, the authors should consider rearranging the text to avoid explaining a phenomenon and making conclusions before showing the results.

      We have now rearranged this section to match the figures and claims.

      Fig 4D, E and Video 1, the authors showed the nucleation of BCL10 into puncta within live cells is followed by p65 translocation to the nucleus. The authors claim that 'this result suggests that BCL10 is indeed supersaturated prior to stimulation' (paragraph 2 of the section titled 'BCL10 is endogenously supersaturated'). We fail to understand how this live-cell experiment leads to the conclusion that BCL10 is supersaturated before stimulation. We think this statement should be deleted from the text, or put into context with the DAmFRET data that lead the authors to make this claim. It would be interesting for the authors to define in the discussion what the golden criteria are for claiming that a protein exists in a supersaturated state within live cells (by microscopy or other methods). Adaptor protein assembly into puncta and the subsequent nuclear translocation of transcription factors is a common phenomenon across signalling pathways. Not all these pathways rely on signaling adaptors existing in a supersaturated state. The field of cell signaling (and cell biology in general) would benefit from a detailed definition of how these physical-chemical definitions of proteins are supported by experimental data. We believe that this paper will become a seminal paper in the field, and future work will benefit from a clear definition of how a claim of supersaturation is derived from the data.

      We appreciate that the concept of supersaturation will be foreign to many biologists, and welcome this opportunity to elaborate. We have now rephrased the corresponding results section for figure 4D, E, and have added new evidence to support our claim that BCL10 is supersaturated, as had been requested by reviewer 2 (see below in response to point 1). Supersaturation, as we (correctly) use the term, occurs when the concentration of a protein in solution exceeds its equilibrium solubility for the given conditions. The term is also sometimes used to describe global protein “concentrations” in excess of the solubility limit, even if a dense phase has already formed and potentially depleted the effective concentration (in solution) to the solubility limit. This is a key distinction, as only the former implies a high-energy out-of-equilibrium scenario that predetermines a future change -- release of the excess energy via phase separation.
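
      The definition above can be written compactly. The following is a sketch in standard nucleation-theory notation, assuming ideal-solution behavior; the symbols $c$, $c_{\mathrm{sat}}$, $S$, and $\Delta\mu$ are introduced here for illustration and do not come from the manuscript or the response.

```latex
S = \frac{c}{c_{\mathrm{sat}}}, \qquad \Delta\mu = k_{B} T \ln S
```

      Here $c$ is the concentration of protein remaining in solution, $c_{\mathrm{sat}}$ is its equilibrium solubility under the given conditions, and $\Delta\mu$ is the free energy released per molecule that joins the dense phase. Supersaturation corresponds to $S > 1$ (so $\Delta\mu > 0$). Once a dense phase has formed and depleted the solution to $c = c_{\mathrm{sat}}$, $S$ returns to 1 even if the global concentration still exceeds $c_{\mathrm{sat}}$, which is the distinction drawn in the paragraph above.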

      How does one experimentally determine if a protein is supersaturated? In theory, one may conclude that a protein is supersaturated if its assembly causes a net loss of energy from the system (i.e. exothermic). Unfortunately, it is likely not yet possible to perform such measurements with sufficient sensitivity inside a living cell. However, it is possible to infer that a protein is supersaturated if assembly can be shown to occur without a net input of energy to the system, i.e. without any change in thermodynamic control parameters such as temperature, pH, post-translational modifications, concentration of the protein, or concentration of any interacting factor. To do this, one introduces a substoichiometric amount of pre-assembled protein to the system. This manipulation will trigger assembly if the protein is supersaturated. If the protein is instead subsaturated, assembly will not occur and the exogenously added assemblies will simply dissolve. This phenomenon, known as “seeding” in the prion field, is considered a golden criterion sufficient to conclude that a protein has prion behavior. However, because bona fide prions additionally require a means for dissemination between cells, seeding analyzed at the cellular rather than population level is more appropriately considered a sufficient criterion for supersaturation (which is a prerequisite for classical prion behavior (Khan et al. 2018)). Our CARD9CARD-Cry2 experiment was designed to test this criterion. Specifically, it allowed us to introduce a seed independently of receptor activation, thereby precluding any orthogonal cellular response that might lower Bcl10 solubility through e.g. a post-translational change. That the seeds were substoichiometric is evidenced by the fact that Bcl10 polymerized homotypically following stimulation (i.e. it didn’t just bind to the CARD9CARD puncta, but went on to deposit onto itself).

      How does assembly under this scenario differ in principle from the many examples of puncta formed by other signaling proteins that occur upon stimulation of their respective pathways? Puncta formation that is induced by a thermodynamic change in the cell cannot be said to have resulted from pre-existing supersaturation. Rather, the stimulus may have caused some change that either increases the effective concentration of the protein (e.g. upregulates its expression, induces a post-translational change that activates it, or releases an inhibitory factor) or reduces solvent activity (e.g. change in pH).

      An additional requirement (necessary but not sufficient) is that the assembly must be regular with respect to some order parameter. That is to say, it must be a bona fide “phase”. At a minimum, this implies a uniform density. Additionally, for supersaturation to persist over biological timescales under physiological conditions and confinement volumes, the assembly (once formed) must also have structural repetition in at least two dimensions, i.e. crystallinity (Rodríguez Gama et al. 2021; Zhang and Schmit 2016). We know this to be true for Bcl10.

      Rodríguez Gama A, Miller T, Halfmann R. 2021. Mechanics of a molecular mousetrap-nucleation-limited innate immune signaling. Biophys J 120:1150–1160. doi:10.1016/j.bpj.2021.01.007

      Khan T, Kandola TS, Wu J, Venkatesan S, Ketter E, Lange JJ, Rodríguez Gama A, Box A, Unruh JR, Cook M, et al. 2018. Quantifying nucleation in vivo reveals the physical basis of prion-like phase behavior. Mol Cell 71:155–168.e7.

      Zhang L, Schmit JD. 2016. Pseudo-one-dimensional nucleation in dilute polymer solutions. Phys Rev E 93:060401. doi:10.1103/PhysRevE.93.060401

      Regarding the supersaturated state of BCL10, the authors convincingly use optogenetics to show how transient assemblies of CARD-Cry2 can template BCL10 assembly. This is a convincing experiment that shows templated nucleation of BCL10. To strengthen the claim that BCL10 is supersaturated endogenously, we suggest the authors quantify the expression of BCL10-mScarlet and CARD-Cry2 and ideally show that this phenomenon can be observed at expression levels equivalent to endogenous levels.

      As stated above, that BCL10-mScarlet formed polymers that we observed to elongate homotypically off of the CARD9CARD seeds indicates that the protein was supersaturated under the conditions of the experiment. The concentration of CARD9 is not a relevant parameter in this case. We had already compared the expression of BCL10-mScarlet to endogenous BCL10 in 293T, THP-1, and human fibroblast cells by quantitative immunodetection (Fig. S10D), revealing that the expression level of our BCL10-mScarlet constructs matched that of endogenous BCL10, which was approximately the same in all cell lines. We also compared the distribution of expression levels of BCL10-mScarlet versus that of endogenous BCL10 using antibody staining followed by flow cytometry, which confirmed that the range of expression levels of BCL10-mScarlet falls within that of endogenous BCL10 in 293T cells (Fig. S10F). Hence, we believe our data suffice to conclude that Bcl10 is supersaturated at endogenous levels of expression.

      Minor comments:

      1. Special character "delta" is not displayed in the text (instead only a space).

      This error occurred upon exporting the manuscript from our text editor to a PDF. We now have made sure all special characters are present in the PDF version.

      Several cell lines including mouse, human, and yeast lines were used across this manuscript. It would be clearer and more helpful if the exact cell type of the line could be indicated. Such as, "BCL10-mEos3.1 yeast cells" instead of "BCL10-mEos3.1 cells", "BCL10-mScarlet HEK293T cells" instead of "BCL10-mScarlet cells".

      We have now modified all instances to indicate the origin of the cell lines tested.

      Fig 5B, the authors indicated that BCL10 colocalized with CARD9CARD, then please show the merged image as well.

      We have now included the merged image to indicate colocalization in the inset images.

      Fig 6E, authors claimed that cells were stimulated with blue light for the indicated durations. The longest duration is 12 hours. Please specify if it was continuous exposure or several rounds of exposure in the indicated durations.

      We have now specified in the figure legends, text, and methods section, that this specific experiment used a continuous exposure of blue light.

      Reviewer #1 (Significance (Required)):

      This work used a combination of FRET and optogenetic tools to engineer CBM signaling and visualize the effects. They incorporated knowledge from structural biology, together with their results from mutations and truncations, dissected the significance of each protein in the CBM signalosome, and demonstrated in detail how higher-order assemblies make all-or-none cellular decisions. We believe this paper will be a seminal paper in the field of cell signalling and cytoplasmic organization. It defines a new paradigm in which the macromolecular assembly of signalling complexes depends on proteins existing in a supersaturated state. Importantly, this paper opens up new questions regarding macromolecular signaling complexes (found in many innate immune signaling pathways): How is protein supersaturation maintained and used throughout evolution to construct biochemical signalling switches?

      This paper will be of particular interest to scientists working on immunity and cell signalling, especially in the field of higher-order assemblies. However, we feel the impact of this paper goes beyond these fields, and we believe this manuscript will be of broad interest to the cell biology and biophysics communities. For reference, our expertise is in innate immunity and cell biology.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript entitled "A nucleation barrier springloads..." Rodriguez-Gama et al. dissect the assembly mechanism of the signalosome, composed of the proteins CARD9, BCL10 and MALT1, using a novel in-cell biophysical approach (DAmFRET). They first overexpressed fluorescently tagged versions of the proteins to promote their assembly in yeast and mammalian cells, finding that CARD9 forms higher order assemblies across a wide range of concentrations with no discontinuity in the DAmFRET profile. In contrast, the DAmFRET profile of BCL10 showed a clear separation between monomers and higher order assemblies, which started to form spontaneously only at higher BCL10 concentrations. Furthermore, the two states of the protein co-exist at all concentrations. These observations imply that there is a nucleation barrier to forming BCL10 assemblies. MALT1 showed no change in FRET regardless of its expression level. These observations, alongside fluorescence microscopy of the assemblies, and previous structural studies, suggest that BCL10 forms self-templating polymers that act as a switch for an all-or-nothing immune response, assayed in this case by monitoring the nuclear translocation of the NF-kB subunit p65. The authors also assessed the effects of known disease-causing mutations on the nucleation barrier, showing that changes in the strength of the nucleation barrier can have major effects on signalosome function. Finally, they used optogenetic methods to trigger assembly of individual signalosome components, providing insight into the minimal components/conditions required for signalosomes to work.

      Major comments

      Overall, the experiments by Rodriguez-Gama et al. offer convincing evidence that there is a nucleation barrier to BCL10 polymerisation, and that a CARD9 template is sufficient to overcome the barrier. Although the existence of a nucleation barrier had already been postulated, based on structural and other studies (referenced by the authors), it had lacked a rigorous demonstration. This work provides that demonstration, which is important for the signalosome field and more broadly applicable to researchers studying cellular decision making. The study further demonstrates that DAmFRET is an excellent tool to study protein assembly processes in their native environment, allowing the authors to tackle a question that would have been technically very difficult to address otherwise. The optogenetic experiments are a nice sufficiency test for their ideas.

      We feel there are a few key points to address before publication.

      1) One of the main conclusions is that spring-loading the nucleation barrier with high super-saturating BCL10 concentrations allows a decisive response. Although much of the data strongly imply this conclusion, the dependence of the immune response on BCL10 concentration was not tested directly. A key prediction of the nucleation barrier is that at concentrations below saturation, BCL10 should not be able to induce an all-or-nothing response when stimulated. At saturated/super-saturated concentrations BCL10 should be able to induce a response. At deeply super-saturated concentrations the response should start to be activated spontaneously in the absence of an external stimulus. These predictions could be tested using the doxycycline-inducible BCL10 system (Figure S2D), without establishing major new experimental avenues. We feel that such an experiment would strengthen the main conclusion. It might also help to shed light on whether being highly supersaturated enables a more decisive response than being just saturated.

      This is a great idea. As the reviewer suggested, our Doxycycline-inducible BCL10 system enables us to induce and track the state of BCL10 over time. We have now performed the requested experiments (Fig. S9D, E) and incorporated the results into the relevant section of the text. In short, our new analyses show that BCL10 indeed has a concentration threshold for activation by stimulation, and that it can also nucleate spontaneously when overexpressed. Note that our original analyses in Fig. 4B and C also demonstrate spontaneous BCL10 activation at high concentrations. With this new evidence and the orthogonal approaches used in Fig. 5, we believe our data definitively support our conclusion that BCL10 is supersaturated.

      2) Intuitively, readers might expect that if BCL10 is supersaturated then, once nucleated, it would rapidly assemble at the nucleation sites. In Figure 5B, CARD9CARD-miRFP670nano-Cry2 assemblies are optically induced throughout the cell. However, BCL10 appears to nucleate at just a few sites with a few minutes delay. More widespread nucleation and growth of BCL10 polymers seems to take longer (20-40 minutes, Figures 5B and 5C), after CARD9CARD-miRFP670nano-Cry2 has disassembled. Furthermore, in Figures 4D and 4E, very few BCL10 assemblies are visible/quantifiable after 70 minutes PMA exposure, but p65 has clearly entered the nucleus. It looks like BCL10 assembly slightly lags behind p65 nuclear entry. Can the authors provide a more detailed explanation of these kinetics?

      We do note that the number of CARD9CARD clusters formed upon opto-stimulation exceeds the apparent number of BCL10 nucleation sites. We believe this is consistent with nucleation-limited kinetics, where the clustering of CARD9-CARD increases the local probability of nucleation. As nuclei form and grow, they lower the probability of subsequent nucleation elsewhere in the cell. Additionally, it is possible that our artificial seeds do not perfectly mimic the native CARD9 seeds that form upon natural stimulation (e.g. due to potential steric interference from the fluorophore and Cry2). We also acknowledge that there is a slight delay in the visible appearance of BCL10 polymers relative to p65 nuclear translocation. We expect that MALT1 activates already when the polymers are still too small to see (sub-resolution), whereas the polymers only become microscopically visible once they’ve grown quite a bit more.
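
      The statement that growing nuclei lower the probability of further nucleation can be illustrated with a generic toy calculation. The sketch below assumes the textbook classical-nucleation-theory form, in which the relative rate scales as exp(-B / ln(S)^2) for supersaturation S > 1. This form, the function name, and the value of `barrier_scale` are illustrative assumptions; they are not fitted to BCL10 and are not taken from the manuscript.

```python
import math


def relative_nucleation_rate(s: float, barrier_scale: float = 5.0) -> float:
    """Relative nucleation rate J/A = exp(-B / ln(S)^2).

    s             -- supersaturation (soluble concentration / saturating concentration)
    barrier_scale -- dimensionless barrier parameter B (hypothetical value)
    Returns 0.0 at or below saturation, where the assembled phase is not favored.
    """
    if s <= 1.0:
        return 0.0
    return math.exp(-barrier_scale / math.log(s) ** 2)


# As polymers grow they consume soluble protein, so S falls over time.
# Print how steeply the relative rate drops along such a trajectory.
for s in (8.0, 4.0, 2.0, 1.5, 1.0):
    print(f"S = {s:3.1f} -> relative rate = {relative_nucleation_rate(s):.3e}")
```

      Under these assumptions, a modest fall in supersaturation caused by the first few growing polymers suppresses nucleation elsewhere by orders of magnitude, which is the qualitative behavior described above.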

      3) Related to point 2 above, in Figure 5D, the leftmost cell in the field of view clearly contains CARD9CARD assemblies but there are no BCL10 assemblies and p65 is not imported into the nucleus (in contrast to the central cell in the field of view). How often does CARD9CARD optogenetic assembly lead to BCL10 assembly? In other words, can the authors quantify the cell-to-cell variability in this experiment?

      Throughout our experiments, whether analyzing BCL10 puncta formation, NF-kB transcriptional activity, or p65 translocation, we observed a persistent nonresponsive fraction of cells even at saturating levels of stimulation. Specifically, approximately 30% of THP-1 cells failed to acquire T-Sapphire fluorescence or form BCL10-mEos3.2 puncta when stimulated with high levels of β-glucan (Fig 1B and E, respectively), and approximately 25% of 293T cells failed to acquire T-Sapphire fluorescence or exhibit p65 nuclear translocation when stimulated with high levels of PMA (Fig 1C and Fig 4E, respectively). Because these numbers did not depend on whether BCL10 was endogenously or exogenously expressed, we know that the underlying cell-to-cell heterogeneity involves factors upstream of BCL10. Indeed, the fraction of recalcitrant cells drops to 10% in our optogenetic experiments that bypass upstream factors (Fig S11E). Possible sources of heterogeneity include different physiological states of the cells or fluctuations in the expression levels of any upstream factor in the signaling pathway. We believe that this phenomenon is not unique to the CBM signalosome, as we (unpublished) and others (Fernandes-Alnemri T et al, 2009, Dick M et al, 2016) have similarly observed a fraction of non-responding cells upon activation of the inflammasome, which involves nucleation-limited polymerization of the adaptor protein ASC. While this phenomenon is interesting and may be important to our understanding of the full complexity of signalosomes in vivo, we believe that identifying the source of heterogeneity would be outside the scope of the present manuscript. We now describe this phenomenon in the final paragraph of the “Endogenous BCL10 is constitutively supersaturated” section.

      Fernandes-Alnemri, T., Yu, JW., Datta, P. et al. AIM2 activates the inflammasome and cell death in response to cytoplasmic DNA. Nature 458, 509–513 (2009). https://doi.org/10.1038/nature07710

      Dick, M., Sborgi, L., Rühl, S. et al. ASC filament formation serves as a signal amplification mechanism for inflammasomes. Nat Commun 7, 11929 (2016). https://doi.org/10.1038/ncomms11929

      Minor comments

      While the work is scientifically well done, the text reads as though it is meant for experts rather than a broad audience. This is a pity because it risks alienating readers. We suggest that some adjustments to the text (mainly additional explanations and not ruling out alternative interpretations of the data) would widen the audience and increase the impact of this important study. Below are some suggestions that might help.

      1. In the first results section, the authors write: 'This suggests that Bcl10 but not CARD9 assembly occurs in a highly cooperative fashion that could, in principle (Koch, 2020), underlie the feed forward mechanism.' It isn't obvious how Figure 1 leads to this statement. Could the authors give a more detailed explanation?

      We have now revised the text to elaborate on this interpretation.

      One limitation of DAmFRET is that it can only detect a nucleation barrier where there is a difference in FRET between the monomer and the assembled form of the protein. However, it can't necessarily detect when there is not a nucleation barrier i.e. if there's no difference in FRET. The text seems to suggest that CARD9 and MALT1 don't have nucleation barriers to their assembly. While this might not be intentional, it would be helpful to explicitly state that CARD9 and MALT1 could also possess such barriers that are not detectable by this method. This wouldn't detract from the finding that BCL10 has a barrier that plays an important function.

      The reviewer is correct that DAmFRET would not be able to detect a nucleation barrier if the assembled phase does not condense the fluorophore to a sufficiently high density for FRET to occur. In our experience, this is only a concern for very large proteins whose bulk “dilutes” the fluorophores within the assembly. Death domains, on the other hand, are only ~ 3 nm in diameter, and FRET occurs within a range of ~10 nm; hence we think it very unlikely that the death domains could be forming cryptic polymers that escape our detection. In any case, when assembly does produce a change in FRET, we can with confidence determine how strongly that form of assembly is governed by concentration. Hence, for CARD9, which does produce a FRET signal upon assembly, we can say that assembly has a smaller intrinsic nucleation barrier than that of BCL10. We further eliminated the possibility of multi-step nucleation (which would reduce the apparent nucleation barrier relative to the one-step ideal case) for CARD9 by showing that artificial condensates of the protein expressed in trans do not influence the concentration-dependence of FRET (Fig. 4 B). Finally, under all conditions where CARD9 lacked FRET, it also lacked signaling activity, suggesting there is not a cryptic functional assembly that evades our assay. Likewise MALT1, which lacked FRET at all concentrations, was entirely unable to activate NF-kB upon overexpression (Fig. S8 A and B), suggesting that it too is not forming a cryptic functional assembly that evades our assay. We therefore feel confident in our conclusion that CARD9 and MALT1 lack nucleation barriers of a magnitude comparable to that of BCL10. Note that our claim is not that they entirely lack a nucleation barrier (CARD9 after all does form a multi-dimensionally ordered polymer), but rather that we fail to observe a nucleation barrier and hence any barrier that may exist is insufficient to manifest at the cellular level.
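
      The distance argument can be made concrete with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The sketch below is illustrative only: the Förster radius of 5 nm is an assumed, typical order of magnitude for fluorescent-protein pairs and is not a value reported in the manuscript, and the function name is hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)^6).

    r_nm  -- donor-acceptor separation in nanometres
    r0_nm -- Förster radius in nanometres (assumed value, not from the manuscript)
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Compare a separation on the scale of a single death domain (~3 nm)
# with larger separations up to and beyond the ~10 nm FRET range.
for r in (3.0, 5.0, 10.0, 20.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

      With the assumed R0, efficiency is 0.5 at 5 nm by definition and falls to 1/65 (about 1.5%) at 10 nm, so fluorophores held a few nanometres apart by adjacent death domains would be expected to transfer energy efficiently.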

      In the final results section, the idea that MALT1 activation doesn't depend on BCL10 polymer structure doesn't necessarily follow from the data. An alternative interpretation is that optogenetic clustering of MALT1 causes it to recruit BCL10 and form BCL10-MALT1 filaments (structure solved by Schlauderer et al., 2018). Also, the optogenetic clustering of MALT1 may mimic some structure found in the BCL10 cluster. Therefore, we are neither convinced that the data unambiguously show that MALT1 activation strictly depends on multi-valency rather than an ordered structure of BCL10 polymers nor that this conclusion is truly necessary for the paper.

      We agree that the reviewer’s alternative interpretation of this experiment is possible. However, we consider it unlikely because we performed the experiment with MALT1 lacking its Death Domain (residues 126-824), which mediates its interaction with BCL10 (Schlauderer et al., 2018). Our experiments then suggest that MALT1 clustering is sufficient for activation independent of any structuring mediated by BCL10. Nevertheless, we have now performed an additional control in which we treated these cells with PMA to induce BCL10 polymerization. As expected, the NF-kB transcriptional reporter utterly failed to activate in this condition, indicating that MALT1 does not interact with BCL10 polymers when it lacks its death domain. This aspect has been further elaborated in our response to reviewer 3 point 5.

      What optical density do the yeast cells reach during the 16h induction in galactose? If they are in stationary phase, this could affect the assembly status of the proteins being expressed, as the cytoplasm becomes glassy when cells are starved, and this coincides with widespread protein aggregation/assembly (Joyner et al., 2016; Munder et al., 2016).

      In our DAmFRET strategy, we first dilute an overnight culture and regrow the cells to log phase prior to resuspending them in galactose media. Our strain is engineered to undergo cell cycle arrest upon protein induction, hence exponential growth is prevented and the cells do not deplete galactose during the 16 hr induction. We have also performed many time courses of DAmFRET following induction and generally find no qualitative difference between early and late times (unpublished). Early time points simply have lower expression and correspondingly fewer cells in the high FRET state. Importantly, all comparisons between proteins are made with the same 16 hr induction.

      Although these experiments show that thermodynamically lowering the BCL10 nucleation barrier (e.g. by post-translational modifications or protein expression levels) isn't required for a response, they don't rule it out. It would be good to state this in the discussion, as cells may have multiple mechanisms of switching on the signalosome.

      We thank the reviewer for this suggestion and have now explicitly stated in the discussion that our experiments do not argue against possible thermodynamic tuning of the nucleation barrier.

      The discussion compares signalosomes with condensates formed by liquid-liquid phase separation. This is an interesting comparison but it suggests that disordered assemblies would not be capable of performing signalosome-like functions. This needs to be explained more clearly. For example, non-amyloid prions seem to form gel-like assemblies with a high nucleation barrier that are capable of driving heritable traits, likely through self-templating (Chakravarty et al., 2020). Such examples could represent disordered assemblies with signalosome switch-like behaviour. Furthermore, there are examples of condensates that are induced by environmental changes e.g. Pab1 and Ded1 condensates (Riback et al., 2017; Iserman et al., 2020). This potentially allows the proteins to reach high concentrations and remain un-condensed until a change in heat or pH overcomes a nucleation barrier required for condensate formation. Although the condensates aren't self-templating, they seem to require energy for their disassembly. Combined, this also allows switch-like behaviour, where the switch is flipped back to the uncondensed off state once conditions return to normal. In general, crossing a phase boundary can represent a switch-like response. Finally, recent electron-tomography experiments show that ASC puncta comprise clusters of filaments (Liu et al., 2021, biorxiv). CARD9/BCL10 assemblies may have similar ultrastructures and liquid-liquid phase separation may well play a role in their assembly.

      Indeed, we explicitly maintain that liquid phases cannot themselves perform signalosome-like functions. Chakravarty et al. 2020 did not observe amyloids associated with their phenomena, but the relevant experiments were not designed to exhaustively exclude an underlying ordered phase. To the extent that gelation is involved, their observations are fully consistent with ours. IUPAC defines a “gel” as a colloidal network involving a solid phase and a dispersed phase. The existence of a solid phase necessarily implies an underlying disorder-to-order transition, even if limited to small length scales. In the case of gelation associated with liquid-liquid phase separation, nucleation of the ordered phase simply occurs in two steps (first condensation, then ordering). Note also that a liquid phase could in principle give rise to a heritable phenotype if it activates a positive feedback in a molecular biological process involving the protein of interest (e.g. upregulation of its expression or a change in interacting factors). Chakravarty et al. did not exclude such phenomena (it would be very difficult to do so); hence it cannot be concluded that phase separation is responsible for the sustained phenotypic changes.

      We do not fully follow the reviewer’s logic concerning the relevance of Pab1 and Ded1 condensates. These proteins only condense when their respective phase boundaries fall below the endogenous protein concentration, as upon thermal stress. The proteins are not supersaturated in the absence of such conditions (for example, they cannot be seeded), and it is incorrect to characterize the change in heat or pH as overcoming a pre-existing nucleation barrier. The concept of a nucleation barrier only applies under conditions where a phase is thermodynamically favored. It is also misleading to state that the Ded1 and Pab1 condensates require energy for disassembly. Rather, they require energy to disassemble rapidly. Unless the assemblies have accessed a more ordered phase as described above (two step nucleation), involving a lower phase boundary, they will inevitably dissolve after the conditions return to normal.

      We have much prior experience with ASC. Although it has not been explicitly shown, the fact that ASC forms ordered polymers and can behave as a prionoid in vivo suggests that it very likely operates the same way as BCL10 (i.e. is physiologically supersaturated). That full-length ASC forms clusters of filaments is not relevant (in our view) to the mechanism shown here, which only requires that filaments are indeed formed. Formally, the size of the relevant nucleus determines the minimum length scale at which ordering must manifest in our mechanism. Based on the structure of death domain filaments, this could be as small as tetramers or hexamers (a minimal but structurally complete “polymer”).

      As stated above, and now elaborated in the discussion, our data do not exclude a role of thermodynamic regulation, as could lead to liquid-liquid phase separation, in tuning the nucleation barrier of BCL10. What they do exclude is that such changes are required for BCL10 to activate in the first place.

      Can the authors comment on the loss of BCL10 in Echinodermata, Arthropoda, Nematoda? Is there another protein that plays a similar role? Could a CARD or PCASP protein possess self-templating properties? Could other methods of control be at play e.g. protein expression?

      This is a very interesting question! We think the reviewer’s suggested explanations for the loss of BCL10 in those lineages are valid and worthy of future exploration. Nematodes such as C. elegans have lost multiple components of innate immunity. They have very few pathogen recognition receptors and also lack NF-kB! They do, however, have other adaptor proteins that the literature and our unpublished data suggest may have self-templating ability, such as TIR-1. Drosophila also encodes multiple TIR-containing proteins that are essential for innate immunity. In short, it is possible that other proteins have acquired the hypothetically essential role of supersaturation and nucleation-limited signaling in these organisms.

      Figures 1B/1C: Can the authors comment on why the active cells plateau at about 70-75%? This is a striking feature of the plots, but the explanation may not be obvious to readers.

      See our response to major point 3, above.

      Figures 1D/1E: What was the concentration of B-glucan used in this experiment? This could be included in the figure legend. If greater than 1 μg/ml this means that the % of active cells in Figure 1B matches the % of cells with BCL10 assemblies in Figures 1D/1E, which is potentially an important point.

      We thank the reviewer for bringing this point to our attention. We have now indicated in the figure legend the concentration of B-glucan used in this experiment (10 μg/ml). That the percentage of active cells in Fig. 1B matches that of cells containing BCL10 polymers in Fig. 1D and E indeed strengthens the stated relationship between BCL10 assembly and NF-kB activation in THP-1 cells subjected to a relatively physiological stimulus. Additionally, we have performed experiments to measure the levels of p65 translocation in THP-1 cells treated with B-glucan that express BCL10-mEos3.2. This data is shown in Figs. S1D and E in response to reviewer 3.

      Use of both 'BCL10' and 'Bcl10' when referring to the protein.

      We have now replaced all instances where Bcl10 was used to follow guidelines for gene and protein name conventions.

      Bruford EA, Braschi B, Denny P, Jones TEM, Seal RL, Tweedie S. Guidelines for human gene nomenclature. Nat Genet. 2020;52(8):754-758. doi:10.1038/s41588-020-0669-3

      In the supplementary figures there are some formatting problems/missing words in the figure legends. In Figure S11 there is a black box covering the lower part of the figure.

      We have now fixed these instances.

      References used in this review

      Chakravarty, A.K. et al. (2020) "A Non-amyloid Prion Particle that Activates a Heritable Gene Expression Program," Molecular Cell, 77(2), pp. 251-265.e9. doi:10.1016/j.molcel.2019.10.028.

      Iserman, C. et al. (2020) "Condensation of Ded1p Promotes a Translational Switch from Housekeeping to Stress Protein Production," Cell, 181, pp. 818-831.e19. doi:10.1016/j.cell.2020.04.009.

      Joyner, R.P. et al. (2016) "A glucose-starvation response regulates the diffusion of macromolecules," eLife, 5. doi:10.7554/eLife.09376.

      Munder, M.C. et al. (2016) "A pH-driven transition of the cytoplasm from a fluid- to a solid-like state promotes entry into dormancy," eLife, 5. doi:10.7554/eLife.09347.

      Riback, J.A. et al. (2017) "Stress-Triggered Phase Separation Is an Adaptive, Evolutionarily Tuned Response," Cell, 168(6), pp. 1028-1040.e19. doi:10.1016/j.cell.2017.02.027.

      Schlauderer, F. et al. (2018) "Molecular architecture and regulation of BCL10-MALT1 filaments," Nature Communications, 9(1), pp. 1-12. doi:10.1038/s41467-018-06573-8.

      Reviewer #2 (Significance (Required)):

      Although the existence of a nucleation barrier had already been postulated, based on structural and other studies (referenced by the authors), it had lacked a rigorous demonstration. This work provides that demonstration, which is important for the signalosome field and more broadly applicable to researchers studying cellular decision making. The study further demonstrates that DAmFRET is an excellent tool to study protein assembly processes in their native environment, allowing the authors to tackle a question that would have been technically very difficult to address otherwise.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The study by Rodriguez Gama et al. addresses the molecular function of CBM complex-forming proteins CARD9, BCL10 and MALT1 in the activation of myeloid cells, using optogenetic tools, transcriptional reporters and biochemical approaches. It is known from previous studies that Bcl10 oligomerizes into filamentous oligomeric structures incorporating Malt1, and that these structures are nucleated by receptor-induced activation of CARD proteins such as CARD11 (in lymphocytes) or CARD9 (in myeloid cells), but the mechanism underlying the assembly of the resulting CBM complexes remains incompletely understood.

      The authors develop beautiful optogenetic tools to address this question, and convincingly demonstrate that CARD9-mediated nucleation of BCL10 triggers a binary cellular NF-kB response in a spring-load-like fashion, and identify mutants of BCL10 and CARD9 that impact this capacity. Unfortunately, however, the authors do not do a good job of simplifying this complex problem so it can be easily understood. In particular, the choices of mutants, models and experiments are not consistent between figures, and some data seem to be arbitrarily added or omitted. Complex hybrid constructs are also used, without assessing whether these are indeed functional in the corresponding KO cells. The paper would therefore benefit from a major overhaul. We also noticed that the literature is often not cited adequately and have included a (non-exhaustive) list of examples of wrong, incomplete, or erroneous citations below.

      1. The initial observations of binary signaling are derived from a reporter system. Although there are controls to show that the reporter used does not function intrinsically cooperatively, it would be nice to see additional data to show that cooperativity occurs also at the level of endogenous response systems, for instance by qPCR-based assessment of a natural NF-kB target gene (induced for example by TNFa versus B-glucan in THP-1 cells, and by TNFa versus PMA in 293T cells).

      As detailed in the introduction, NF-kB has been shown by multiple labs to activate in a binary fashion. Our manuscript shows that NF-kB activation occurs in a binary fashion both at the level of transcription and at the level of nuclear translocation (upstream of any transcriptional output). While we do agree that additional data could further illustrate the biological significance of our findings, we do not feel it is necessary for our conclusions. Note also that because NF-kB activation occurs in a binary fashion per cell, a simple qPCR experiment would not suffice to extend our findings to the broader NF-kB regulon. Instead, one would have to use e.g. RNA-FISH or single cell RNA-seq, nontrivial experiments that would take months to complete.
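
      The point that a population-averaged readout cannot reveal a per-cell binary response can be shown with a small numerical sketch. All values below are hypothetical and chosen only for illustration; the function names are not from the manuscript.

```python
from statistics import mean


def bulk_signal(per_cell_levels):
    """Population-averaged readout, as a bulk assay such as qPCR would report."""
    return mean(per_cell_levels)


def fraction_responding(per_cell_levels, threshold=0.75):
    """Fraction of cells above a threshold, as a single-cell assay would report."""
    return sum(level > threshold for level in per_cell_levels) / len(per_cell_levels)


# Hypothetical populations of 100 cells each with the same average expression.
binary = [1.0] * 50 + [0.0] * 50   # half the cells fully on, half fully off
graded = [0.5] * 100               # every cell half on

print(bulk_signal(binary), bulk_signal(graded))                  # identical bulk averages
print(fraction_responding(binary), fraction_responding(graded))  # differ at the single-cell level
```

      Both populations give the same bulk average, and only the per-cell measurement separates them, which is why single-cell methods would be needed to test for binary activation of endogenous target genes.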

      The cell lines in Figures 1D-E (and also some of the BCL10 mutants used later on) would have been better run in the assays in the early parts of Figure 1. The final conclusion prior to the section "The adaptor protein BCL10 is a nucleation-mediated switch" is otherwise not justified. This is a central tenet of the paper that is referred to again, with some other ancillary data to support it. These mutants reappear later in the paper, but it would have been better, and easier, to make rescue lines of BCL10 KO in Figure 1; otherwise the logic is lost, and the models seem chosen arbitrarily.

      The choice of experiments in different panels of Fig. 1 resulted from a chronological progression of reagent construction as the project evolved. We do appreciate that switching between the assays may lead readers to doubt one or the other. Therefore, we have now immunostained for endogenous p65 in the same experiment as for Fig. 1D and confirmed that p65 translocated to the nucleus only in THP-1 BCL10-KO cells that have been reconstituted with WT BCL10-mEos3.2, but not E53R. We think this additional evidence along with our orthogonal measurements in other reporter systems confirms our findings that BCL10 nucleation determines NF-kB activity.

      Expression with microNS is not well controlled and gives little real evidence for what is occurring. It is unclear what the concentration of the protein expressed was, but certainly the relative expression of the CARD9(CARD) and the microNS version should be assessed.

      We believe these concerns result from a misunderstanding. We assume the reviewer is referring to the experiment in Fig. 3B. Expression of muNS on its own has no effect on the DAmFRET of other proteins, and we have previously used it in exactly the same way as here (Holliday M et al. 2019 and Kandola T et al. 2021). Please note that muNS fusion proteins in our experiment have an orthogonal fluorescent protein whose spectra do not significantly overlap with those of mEos3.1. The experiment evaluates a protein’s ability, when condensed via its fusion to muNS, to nucleate an mEos3.1-fused protein that is expressed in trans. Fusion of proteins to muNS does not affect their expression levels, as we now show for CARD9CARD-muNS-mCardinal versus CARD9CARD-mCardinal (Fig. S6D).

      Also, the AmFRET profile of CARD9CARD looks very weird, it cannot be compared to BCL10.

      We are unsure in what way the AmFRET profile of CARD9CARD is “weird”. It is fully consistent with expectations and has been thoroughly explained in the text. We suspect the reviewer was bothered by the sharp acquisition of FRET at approximately 100 uM. As explained in the text, this represents the phase boundary, also known as the solubility line, for CARD9CARD polymers, which we previously showed in vitro (Holliday M et al. 2019). Above this concentration, the protein self-assembles without a nucleation barrier, hence the sharp but continuous change in FRET. BCL10 plots, in contrast, show a discontinuous acquisition of FRET, which indicates a nucleation barrier. In order to highlight that the CARD9CARD transition is understood and expected, we have also now added a line to the plot to demarcate the phase boundary.

      We are not convinced of the usefulness of the introduction of a slew of disease-causing CARD9 mutations that may or may not be relevant to the authors' point. The fact that they do or do not function in a specific sub portion of an assay that may or may not be relevant to biological activity seems to be of interest but without biochemical understanding, little is clear.

      While several reports have shown the clinical importance of these CARD9 mutations on susceptibility to fungal infections, little was known about the molecular mechanism underlying their effects. The inclusion of the disease-causing mutants to this paper is justified for the following reasons. First, they demonstrate the relevance of our work to disease. Second, they build off our findings to provide an otherwise unknown molecular mechanism of these mutants. We showed using independent methods that CARD9CARD mutations disrupt the ability to nucleate BCL10, via two different mechanisms. Finally, validating the disease-causing mutations allowed us to use them as controls for subsequent experiments demonstrating that BCL10 is supersaturated.

      The Optogenetic experiments are interesting, but difficult to interpret without evidence that these MALT1 constructs are indeed still functional when expressed in MALT1-deficient THP-1 cells. We do not therefore think that this experiment shows a necessity for clustering to signal, just a sufficiency, and in a highly artificial construct.

      We welcome the opportunity to elaborate on the optogenetic experiments. Since BCL10 and MALT1 are expressed ubiquitously across cell types, the validity of our findings should not depend on the cell type used. Indeed, much of what we already know about innate immunity signalosomes comes from work in HEK293T cells. Our optogenetic experiments using MALT1 were performed in 293T MALT1-KO cells in Figures 6E and F, and employed two distinct functional assays (p65 nuclear translocation and a transcriptional reporter). While our approach employs light to control clustering, similar approaches using (no less-artificial) chemically induced dimerization domains have been used to study caspase activation (Oberst A et al, 2010, Boucher D et al, 2018). Our use of light affords higher specificity, reversibility, and spatial and temporal control over MALT1 assembly than does chemically induced dimerization.

      To demonstrate the necessity of clustering, we have now performed an experiment with MALT1(126-824)-miRFP670-Cry2 expressed in 293T MALT1 KO cells that contain a transcriptional reporter of NF-kB, as in Figures 6E and F. We added PMA to the cells and found that it failed to activate NF-kB (Fig. 6), confirming that the interaction of MALT1 (via its death domain) with polymerized BCL10 is required for activation. Note that MALT1 and BCL10 exist as a soluble heterodimer prior to BCL10 polymerization; hence it is polymerization, rather than the interaction itself, that activates MALT1. That artificial clustering rescues this defect strongly suggests that the effect of polymerization can be attributed to increased proximity rather than some allosteric effect communicated from BCL10 polymers through the MALT1 DD to its caspase-like domain.

      Oberst, A., Pop, C., Tremblay, A.G., Blais, V., Denault, J.-B., Salvesen, G.S., and Green, D.R. (2010). Inducible dimerization and inducible cleavage reveal a requirement for both processes in caspase-8 activation. J. Biol. Chem. 285, 16632–16642.

      Boucher, D., Monteleone, M., Coll, R.C., Chen, K.W., Ross, C.M., Teo, J.L., Gomez, G.A., Holley, C.L., Bierschenk, D., Stacey, K.J., et al. (2018). Caspase-1 self-cleavage is an intrinsic mechanism to terminate inflammasome activity. J. Exp. Med. 215, 827–840.

      In the introduction and other parts of the paper, there are numerous instances where the previous literature in the field is not adequately cited. Examples include:

      • In the introduction, it is weird to cite one original paper (a MALT1 ko study by Ruland et al., 2001; there are several other studies of ko papers for CBM components that would merit being cited along with this study) together with two reviews on that topic (Ruland and Hartjes 2019 and Gehring et al. 2018)

      • In the introduction, the original study by Wang et al., 2002 should be cited together with Rebeaud et al., 2002; the two studies on the same topic were published back-to-back

      • In the introduction, the statement "CARD10 and CARD14 are expressed in nonhematopoietic cells including intestinal and skin epithelia, respectively" should be supported by citations.

      • Still in the introduction, the 2 references for the statement "... CARD14 gain of function mutations cause psoriasis (Howes et al., 2016; Jordan et al., 2012)" are not appropriate. There are several reports of patients with CARD14 mutations (the study by Jordan et al is only one of them) and several CARD14 mouse models that provoke a psoriasis-like phenotype, which would merit being cited.

      • In the following sentence: "Point mutations and translocations involving BCL10 and MALT1 cause immunodeficiencies (Ruland and Hartjes, 2019), testicular cancer (Kuper-Hommel et al., 2013), and lymphomas (Zhang et al., 1999).", the citation style also seems completely random, combining the citation of a single original paper for lymphomas (Zhang et al. 1999) (there are several other important original studies on that topic or recent reviews that could be cited instead), together with a review on immunodeficiencies (Ruland and Hartjes, 2019) and then another single example for a role of BCL10 and MALT1 in carcinoma (the study by Kuper-Hommel et al. is one, but several other original publications exist on the latter topic, showing for example a role in breast carcinoma or glioblastoma).

      • In the first section of the results, the reference cited for endogenous CARD10 expression in 293T cells (Ruland et al., 2001) is wrong, no endogenous CARD10 expression was assessed in that study

      We have now revised the citations mentioned above and other instances to ensure adequate citations in each case.

      Reviewer #3 (Significance (Required)):

      The paper deals with a complex question, namely how the CBM signalosome assembles and functions to stimulate NF-kB signaling. This question is important to the understanding of pro-inflammatory immune responses and basic life sciences in general. As the focal point of the paper is complex, and tools to study such phenomena are at the limit of technical capabilities, this further increases the potential impact of the work.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The characterization of open-ended signalosomes in a number of innate-immunity and cell-death pathways, in particular formed by domains from the death-fold family, has led to the suggestions that these complexes allow a switch-like signalling response suitable for these pathways. It appears that this has been widely accepted. However, these suggestions are based largely on indirect observations and speculation.

      Rodriguez-Gama and coworkers have decided to test these suggestions more directly. Their results confirm the suggestions. Based on my own experience, papers that validate widely adopted suggestions are often not considered seriously by top journals, who are looking for hot topics/paradigm-changing/surprising type results. I would urge the editors to consider seriously work such as in this paper, which directly tests important suggestions and does so at a technically high standard. The authors use a range of ingenious approaches, both with recombinant proteins and in cells, and including proteins from organisms from different parts of the evolutionary tree, to support their interpretations, so it is an extensive and high-quality study. I am impressed that so many different fusion proteins with fluorescent tags continued to function as expected, but I guess the authors controlled for this as much as they could.

      Having said all this, I do get the feeling the authors are "over-selling" the nucleation barrier aspect of these signalling mechanisms. It is clearly an important and critical aspect of signalling in many systems, but then it is not the only important aspect; a number of other regulatory inputs play a role in different systems. So the statement "Our findings introduce a novel structure-function paradigm" in my view is overstretching things somewhat. Further in the Discussion section, the authors state "Existing explanations for the preponderance of ordered polymers in immune cell signalosomes have centered on the functions of multivalency at steady state, such as scaffolding and sensitivity enhancement resulting from the cooperativity of homo-oligomerization". They cite a small (and non-exhaustive) number of papers discussing this topic; all these include "seeding" or "nucleation" as an important part of the proposed mechanism. So I suggest the authors provide a more balanced discussion of this aspect. Different pathways appear to display a different level of switch-like behaviour, and one thing that the current version of the manuscript is missing is more discussion of other death fold-based systems and how the results on the CBM signalosome apply to these, and also other systems such as TIR domain-based ones, which currently get no mention whatsoever. In the CBM system, there seems to be one main nucleation barrier; can there be more than one in others?

      We appreciate the reviewer’s perspective and have now acknowledged in the introduction and discussion additional prior literature that has paved the way for our study. Nevertheless, we maintain -- as now stated in the abstract -- that “our results defy the usual protein structure/function paradigm, and demonstrate that protein structure can evolve via selection for energetic maxima in addition to minima”. We have elaborated in the introduction and discussion how immune signaling provides the functional context in which such a paradigm can evolve, and how our findings uniquely support the paradigm.

      One other aspect I need to express some criticism about is attention to detail - especially with a paper focusing on the physics behind biological processes, I would expect a higher standard of getting the terminology and units correct - see specific examples below. This can obviously be fixed easily.

      Specific points are listed below. No page or line numbers are provided so I have done my best to make it clear what the comments refer to.

      1. Abstract line 6 and throughout: in "NF-kB", the "k" is supposed to be "kappa" (Greek letter) - it stands for "nuclear factor kappa-light-chain-enhancer of activated B cells", not fully defined in the manuscript as far as I can see. Occasionally, small k is also used instead of the small cap K or whatever the authors used most of the time, but I don't think any of them use the Greek letter.

      We had indeed used a version of the small “kappa” κ. We have now fixed the cases where we mistakenly used k instead of κ.

      Page 2 (Introduction) paragraph 2 line 9: period missing at the end of sentence. Same Page 4 (Results: Assembly) paragraph 4 line 3.

      This is now fixed.

      Page 2 (Introduction) paragraph 2 line 15 and throughout: in long sentences, more commas can help readability, for example before "leading" here. Similar page 15 paragraph 2 line 3 after "Additionally", paragraph 4 line 2 before "which".

      We have now included more commas and tried to improve readability throughout.

      Page 4 (Results: Assembly) paragraph 2 line 2: is "positive feedback" different from "cooperativity"? Is it a broader term that includes cooperativity, nucleation and other mechanisms? It may be useful to introduce some of these terms to avoid confusion by the readers.

      “Positive feedback” is the broadest term as it is agnostic to mechanism. “Nucleation” refers to the initiation of a first order phase transition, which is one mechanism of positive feedback. Nucleation involves “cooperativity”, in that a higher order species is more stable than smaller species. However, cooperativity can occur for oligomers of finite size, whereas nucleation is reserved for phase transitions to species of infinite size. We appreciate that the use of so many related terms may have created more confusion than necessary. Hence, we have now revised the text to omit the more general terms -- “positive feedback” and “cooperativity” where possible.

      Page 4 (Results: Assembly) paragraph 2 line 3: please define "TNF".

      We have now fixed this and other acronyms.

      Page 4 (Results: Assembly) paragraph 3 line 2: the use of size-exclusion chromatography to follow the size of complexes would assume that they are irreversible or very stable. It appears this may be the case here, but some discussion may be warranted.

      We have now explained that SEC is appropriate for this experiment because large nucleation barriers generally imply stable assemblies.

      Page 4 (Results: Assembly) paragraph 3 line 4 and throughout: the symbol for "kilodalton" is "kDa".

      We have now fixed this mistake.

      Page 4 (Results: Assembly) paragraph 3: I am not sure how the results discussed in this paragraph demonstrate that assembly occurs in cooperative fashion - just that there is a change in oligomeric states upon stimulation.

      Cooperativity is implied by the absence of oligomer sizes between monomer and the large assembly. Nevertheless, we realized this can only be concluded in the case of homotypic assembly, which we cannot yet assume at this point in the paper. Therefore, we have revised this paragraph to say that the distribution is “consistent with” an underlying phase transition (which we then go on to prove).

      Page 4 (Results: Assembly) paragraph 4 line 2: "WT" is not defined. Wild-type what? I presume "protein"?

      We refer here to the wild-type protein. We have now fixed this mistake.

      Page 4 (Results: Assembly) paragraph 4: it may be worth pointing out here the wild-type and mutant proteins expressed at similar levels; clearly the outcomes will depend on protein concentration in the cell. I believe the supplementary figure shows this to a large extent.

      Indeed, our supplementary figure shows that the WT and mutant protein express to comparable levels. We have now pointed this out in the text.

      Page 4 (Results: The adaptor) paragraph 1 line 4: "CARD domain" would stand for "caspase activation and recruitment domain domain". Please check throughout (including Supplementary Material).

      We have fixed this mistake.

      Page 4 (Results: The adaptor) paragraph 1 line 9: "expressed over a range of concentrations in cells" - this would imply the authors controlled expression - please rephrase to explain what exactly was done.

      We have now rephrased this sentence to indicate that the range of expression results from the use of a genetic construct with cell-to-cell variation in copy number.

      Page 5 (Results: The adaptor) paragraph 2 line 3 and throughout (including Supplementary Material): please use the Greek letter rather than "u" for micro.

      We have now fixed this mistake.

      Page 5 (Results: The adaptor) paragraph 3: this analysis is rather simplistic, it is not just the RMSD value, it is the nature of conformational change that is important? Please elaborate, I would think the papers presenting structural work have already discussed this to some extent?

      The reviewer is correct; it is the nature of the conformational change that is most important. We are unsure how to accurately estimate the energy barrier separating the two conformations for each protein. However, we have now undertaken a collaboration to attempt to do so via FAST molecular simulations (Zimmerman and Bowman 2015). In lieu of the results of these ongoing studies, we have modified the text to acknowledge that RMSD does not necessarily relate to nucleation barriers.

      Maxwell I. Zimmerman and Gregory R. Bowman. Journal of Chemical Theory and Computation, 2015, 11 (12), 5747-5757 DOI: 10.1021/acs.jctc.5b00737

      Page 5 (Results: The adaptor) paragraph 4 line 5 and further in this section: some symbol(s) do not show in the pdf - before "(delta)", next page line 3-5 after "higher" and "both".

      We have fixed this issue that resulted from exporting to a PDF file from our text editor.

      Page 6 (Results: The adaptor) paragraph 4: interface IIa and IIIb are not introduced, and there is not even any reference provided here.

      We have now added a reference for these mutations and elaborated on the interfaces IIa and IIIb.

      Page 6 (Results: Pathogenic) paragraph 1 line 12: "FL" is not introduced.

      We have now fixed this mistake.

      Page 8 (Results: Pathogenic) paragraph 7: the text "absent the pathogenic mutations" is missing something.

      We have now reworded this section.

      Page 10 (Results: BCL10) paragraph 3: why does CARD9 CARD clustering peak and then disassemble (I guess "clustering" doesn't disassemble, please rewrite as well).

      We have now fixed this mistake.

      Page 11 (Results: MALT1) paragraph 1: I presume dimerization doesn't achieve the same level of proximity as higher-order multimerization?

      Our interpretation here is that for MALT1, activation requires close proximity of more than two molecules. Although our dimerization module did not activate the caspase-like domain of MALT1, we know that it achieves close enough proximity to activate the caspase domain of CASP8. Hence we believe the MALT1 mechanism has a stoichiometry requirement in addition to a proximity requirement. This is, of course, consistent with the fact that activation normally occurs in the context of polymers rather than dimers.

      Page 11 (Results: Ancient) paragraph 1 line 4: is this AlphaFold2?

      That is correct, we used AlphaFold2. We have added that detail.

      Page 12 (Discussion) paragraph 4: not sure if "molecular examples of evolutionary spandrels" will be clear to most readers.

      We have now explained what evolutionary spandrels are, and elaborated on the relationship to our findings.

      Page 14 (Materials: Plasmid) line 2 and throughout: "Golden Gate" is usually capitalized. Similar for "Gibson" further in the paragraph. The English in this paragraph is not up to standard in general; for example "Then placing..." is not a complete sentence, and a number of sentences ending with "via gibson" need to be rewritten.

      We have now rewritten this paragraph.

      Page 16 (Materials: Cell) line 4 and throughout: "2" in "CO2" should be subscripted.

      This is now fixed.

      Page 16 (Materials: Transient) line 6 and throughout (including Supplementary Material): please use a space between number and unit ("35 mm").

      This is now fixed.

      Page 16 (Materials: Generation) line 4 and throughout: to distinguish from "gram", please italicize "g" and/or use "x g".

      We have now fixed this.

      Page 17 (Materials: Yeast) line 3: please specify which table is "table X".

      We have now fixed this mistake.

      Page 17 (Materials: Mammalian) line 1: please provide full reference. Same next paragraph line 2.

      We have now fixed this.

      Page 17 (Materials: DAmFRET) line 3: "SSC" and "FSC" are not defined.

      We have now fixed this.

      Page 18 (Materials: Fluorescence) line 10: "Coefficient" does not have to be capitalized. It does not have to be defined again in the next paragraph.

      We have now fixed this.

      Page 19 (Materials: Optogenetic) line 1: "performed" rather than "made"?

      We have now fixed this.

      Page 19 (Materials: Protein) line 12: the Compass software doesn't have a reference?

      We have now added the reference to the software.

      References: please make format consistent: article titles in sentence or title case.

      We have now formatted all references to be consistent.

      Legend to Fig. 1: I suggest "Schematic diagram"; and "h" rather than "hrs"; please check throughout (including Supplementary Material).

      We agree with this suggestion.

      Legend to Fig. S1: is "TNF-a" supposed to be "TNF-alpha"?

      We have fixed this.

      Legend to Fig. S7: please capitalize "Figure 2H".

      We have fixed this.

      Legend to Fig. S10F: please move "Dox" behind the concentration.

      We have fixed this.

      Fig. S14B: the colours in the superposition make it difficult to see the differences.

      We have used a different color now.

      Legend to Fig. S14: I suggest "structure...predicted by AlphaFold" (2?) and include the reference.

      We agree with this suggestion.

      Reviewer #4 (Significance (Required)):

      As argued above, the significance of this paper is that it tests directly important hypotheses proposed or assumed previously, and does so at a technically high standard. No published report has done so to a similar extent.

      The paper should be of interest to a broad audience from cell biologists and immunologists to biochemists, biophysicists and structural biologists.

      My expertise is in structural biology of systems similar to the one studied here.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      The study by Rodriguez Gama et al. addresses the molecular function of CBM complex-forming proteins CARD9, BCL10 and MALT1 in the activation of myeloid cells, using optogenetic tools, transcriptional reporters and biochemical approaches. It is known from previous studies that Bcl10 oligomerizes into filamentous oligomeric structures incorporating Malt1, and that these structures are nucleated by receptor-induced activation of CARD proteins such as CARD11 (in lymphocytes) or CARD9 (in myeloid cells), but the mechanism underlying the assembly of the resulting CBM complexes remains incompletely understood.

      The authors develop beautiful optogenetic tools to address this question, and convincingly demonstrate that CARD9-mediated nucleation of BCL10 triggers a binary cellular NF-kB response in a spring-load-like fashion, and identify mutants of BCL10 and CARD9 that impact this capacity. Unfortunately, however, the authors do not do a good job of simplifying this complex problem so it can be easily understood. In particular, the choices of mutants, models and experiments are not consistent between figures, and some data seem to be arbitrarily added or omitted. Complex hybrid constructs are also used, without assessing whether these are indeed functional in the corresponding ko cells. The paper would therefore benefit from a major overhaul. We also noticed that the literature is often not cited adequately and have included a (non-exhaustive) list of examples of wrong, incomplete, or erroneous citations below.

      1) The initial observations of binary signaling are derived from a reporter system. Although there are controls to show that the reporter used does not function intrinsically cooperatively, it would be nice to see additional data to show that cooperativity occurs also at the level of endogenous response systems, for instance by qPCR-based assessment of a natural NF-kB target gene (induced for example by TNFa versus B-glucan in THP-1 cells, and by TNFa versus PMA in 293T cells).

      2) The cell lines in Figures 1D-E (and also some of the BCL10 mutants used later on) would have been better run in the assays in the early parts of Figure 1. The final conclusion prior to the section "The adaptor protein BCL10 is a nucleation-mediated switch" is otherwise not justified. This is a central tenet of the paper, that is referred to again, with some other ancillary data to support it. These mutants reappear later in the paper, but it would have been better, and easier, to make rescue lines of BCL10 KO in Figure 1; otherwise the logic is lost, and the models seem chosen arbitrarily.

      3) Expression with microNS is not well controlled and gives little real evidence for what is occurring. It is unclear what the concentration of the protein expressed was, but certainly the relative expression of the CARD9(CARD) and the microNS version should be assessed. Also, the AmFRET profile of CARD9CARD looks very weird, it cannot be compared to BCL10.

      4) We are not convinced of the usefulness of the introduction of a slew of disease-causing CARD9 mutations that may or may not be relevant to the authors' point. The fact that they do or do not function in a specific sub portion of an assay that may or may not be relevant to biological activity seems to be of interest but without biochemical understanding, little is clear.

      5) The Optogenetic experiments are interesting, but difficult to interpret without evidence that these MALT1 constructs are indeed still functional when expressed in MALT1-deficient THP-1 cells. We do not therefore think that this experiment shows a necessity for clustering to signal, just a sufficiency, and in a highly artificial construct.

      6) In the introduction and other parts of the paper, there are numerous instances where the previous literature in the field is not adequately cited. Examples include:

      • In the introduction, it is weird to cite one original paper (a MALT1 ko study by Ruland et al., 2001; there are several other studies of ko papers for CBM components that would merit being cited along with this study) together with two reviews on that topic (Ruland and Hartjes 2019 and Gehring et al. 2018)
      • In the introduction, the original study by Wang et al., 2002 should be cited together with Rebeaud et al., 2002; the two studies on the same topic were published back-to-back
      • In the introduction, the statement "CARD10 and CARD14 are expressed in nonhematopoietic cells including intestinal and skin epithelia, respectively" should be supported by citations.
      • Still in the introduction, the 2 references for the statement "... CARD14 gain of function mutations cause psoriasis (Howes et al., 2016; Jordan et al., 2012)" are not appropriate. There are several reports of patients with CARD14 mutations (the study by Jordan et al is only one of them) and several CARD14 mouse models that provoke a psoriasis-like phenotype, which would merit being cited.
      • In the following sentence: "Point mutations and translocations involving BCL10 and MALT1 cause immunodeficiencies (Ruland and Hartjes, 2019), testicular cancer (Kuper-Hommel et al., 2013), and lymphomas (Zhang et al., 1999).", the citation style also seems completely random, combining the citation of a single original paper for lymphomas (Zhang et al. 1999) (there are several other important original studies on that topic or recent reviews that could be cited instead), together with a review on immunodeficiencies (Ruland and Hartjes, 2019) and then another single example for a role of BCL10 and MALT1 in carcinoma (the study by Kuper-Hommel et al. is one, but several other original publications exist on the latter topic, showing for example a role in breast carcinoma or glioblastoma).
      • In the first section of the results, the reference cited for endogenous CARD10 expression in 293T cells (Ruland et al., 2001) is wrong, no endogenous CARD10 expression was assessed in that study

      Significance

      The paper deals with a complex question, namely how the CBM signalosome assembles and functions to stimulate NF-kB signaling. This question is important to the understanding of pro-inflammatory immune responses and basic life sciences in general. As the focal point of the paper is complex, and tools to study such phenomena are at the limit of technical capabilities, this further increases the potential impact of the work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Two reviewers commented on the smeared appearance of Tae1 bands in our Western blot analyses (Figure 4F and 5B) and asked us to improve their technical quality.

      -We agree and will repeat these experiments with more careful attention to lysate preparation, using a higher percentage SDS gel for better separation of low molecular weight proteins as suggested.

      Reviewer 2 requested that we assess how Tae1 variants impact interbacterial competition outcomes.

      -We agree that this would be interesting to take a look at. While this will not be feasible for every variant we examine in the paper, we can conduct comparative interbacterial assays between P. aeruginosa and E. coli using P. aeruginosa strains with a tae1 point mutation for C110S. Given that our biochemical experiments show that this hyperactive variant evades inhibition by the cognate immunity protein, we expect that this may decrease P. aeruginosa fitness, even in the context of competition.

      More generally, we think that examining Tae1 variants in the context of interbacterial competitions would be a critical orthogonal approach to validate that the DMS results have any bearing on competition outcomes. However, we feel that the major focus of this paper is on the more molecular and biophysical insights that our approach can offer. Our study tests our assumptions about the kinds of features and surfaces that are important for proteins that engage with non-canonical complex substrates. It is, of course, interesting to think about the implications of this for physiological phenotypes and the drivers of toxin evolution. It is also exciting to imagine how this kind of information could be used to one day engineer certain interbacterial outcomes. We hope that others in the field will push our efforts in these directions, but we do not feel that these directions are essential for our conclusions. However, our conclusions on the molecular and biophysical aspects have helped generate interesting hypotheses in microbial ecology that could be largely followed up on by others.

      In order to conduct well-controlled P. aeruginosa:E. coli competition assays for more Tae1 variants, we would need to generate a significant number of new P. aeruginosa strains encoding point mutations for each of our variants across several genetic backgrounds. The competitions themselves also require a considerable amount of work to optimize and quantify. We are able to do this for one of the variants as previously mentioned (C110S). It’s important to note that the first author of this paper, who was the primary driver of this work, is no longer in my lab or in academia. As for myself, I am also in the middle of a transition out of academia and am actively ramping down my lab at UCSF. I no longer have the space or appropriate set-up to support this longer-term effort.

      Reviewer 2 asked that we examine Tae1 (WT and C110S) expression levels in vivo to more precisely examine whether increased self-intoxication by Tae1C110S in P. aeruginosa was due to differences in toxin activity or toxin levels.

      We agree with this suggestion and will look at toxin protein levels by Western blot analysis in the context of P. aeruginosa cells grown 1) alone on solid media and 2) together with E. coli on solid media during interbacterial competition using conditions that match our other competition assays.

      All 3 reviewers asked us to provide more experimental evidence addressing the hypothesis that differential peptidoglycan (PG) affinity across Tae1 variants could explain variation in toxic activity.

      -We agree that this is an interesting point to follow up on further. To be clear, we also do not know whether this hypothesis is true at this stage, and the answer is not necessarily critical for our central advance, but we would like to give it a try! We have devised an approach to ask the question experimentally across a subset of our deep mutational scanning (DMS) variants.

      Reviewer 1 suggested that we quantify in vitro binding affinities for PG using isothermal titration calorimetry (ITC). However, given that ITC requires high concentrations of well-defined homogeneous substrates, which we are not able to generate for more complex higher order structures of cell wall PG, we propose a pull-down based approach.

      Briefly, we plan to conduct pull-downs using insoluble, purified cell wall sacculi from E. coli grown under the two conditions as bait for recombinant Tae1 proteins. Given that intact sacculi are inherently insoluble, we can simply collect bound Tae1 through centrifugation of sacculi pellets and examine the amount of associated Tae1 by Western blot analysis. These analyses will need to be conducted across a titration of Tae1 concentrations and also with catalytic activity inhibited to avoid solubilization of sacculi. We will block Tae1 hydrolysis by carrying out pull-downs in the presence of a general, commercially available cysteine hydrolase inhibitor, E64. If there is indeed differential affinity for PG underlying lytic differences across Tae1 variants, we would expect to see greater relative association of Tae1 variants with the type of cell wall sacculi that they more effectively lyse in our DMS screen. We would expect the reverse trend to also be true (lower affinity for less active variants).
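
      For illustration only, a titration of this kind could be summarized by an apparent dissociation constant per variant. The sketch below is not the analysis proposed in the reply; it assumes a single-site saturation model, treats quantified Western blot band intensity as proportional to bound Tae1, and uses hypothetical function names, made-up concentrations in arbitrary units, and synthetic data. Because sacculi are heterogeneous, any fitted value would be an apparent affinity at best.

```python
def bound_signal(conc: float, kd: float, bmax: float = 1.0) -> float:
    """Single-site saturation model: signal = bmax * conc / (kd + conc)."""
    return bmax * conc / (kd + conc)


def fit_kd(concs, signals, kd_grid):
    """Least-squares fit of (kd, bmax) by scanning candidate kd values.

    For each candidate kd the model is linear in bmax, so bmax is solved
    in closed form and the kd with the smallest squared error is kept.
    """
    best = None
    for kd in kd_grid:
        basis = [c / (kd + c) for c in concs]
        bmax = sum(b * s for b, s in zip(basis, signals)) / sum(b * b for b in basis)
        sse = sum((s - bmax * b) ** 2 for b, s in zip(basis, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free titrations for two hypothetical variants.
    concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
    grid = [0.1 * i for i in range(1, 201)]
    for name, true_kd in (("variant A", 2.0), ("variant B", 8.0)):
        signals = [bound_signal(c, true_kd) for c in concs]
        kd, bmax = fit_kd(concs, signals, grid)
        print(f"{name}: apparent Kd ~ {kd:.1f}, Bmax ~ {bmax:.2f}")
```

      Under these assumptions, a variant with a lower apparent Kd on one type of sacculi than on the other would be consistent with the differential-affinity hypothesis; real blot data would also need replicates and a check that the signal stays in the linear detection range.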

      Reviewer 1 would like to know if we have done lysis experiments with any E. coli mutants that only impact PG density but not PG polymer structure. If we haven’t tested any E. coli mutants, have we done lysis experiments using drugs that have a similar impact on PG? Even if we don’t include these data in the paper, the reviewer would like us to comment on the trends we have observed.

      We have not done experiments in any mutants or chemical backgrounds known to only impact PG density but not polymer structure. We think this would be a very interesting angle! But unfortunately this is outside the scope of this study. It would require that we first confirm experimentally, using a variety of techniques including microscopy, chemical analyses, and biophysical probing of sacculi, that the effect is restricted to density alone.

      Reviewer 1 asked for additional DMS screens in more conditions

      We love this idea! In fact, we hope that others are motivated to adopt our workflow to run many more DMS screens for T6S toxins, as we believe these screens provide a lot of useful and sometimes surprising insights that could be of great interest to others. However, we believe that the primary goal of this paper is to establish this methodology as a compelling approach for studying toxins and, more generally, proteins with complex cellular substrates. It does not necessarily fall within the scope of this paper to fully assess the mechanistic implications of cell wall diversity across a wide range of conditions.

      In our experience, rigorously conducting DMS screens requires a significant amount of effort and resources to establish consistent experimental conditions. Also, a non-trivial number of costly sequencing-based experiments are required across controls and variables for the results to be statistically sound and meaningful. Furthermore, experimental validation of results is ultimately important for our ability to confidently generate hypotheses stemming from these datasets. As stated above, the first author of this paper, who was the primary driver of this work, is no longer in my lab or in academia. As for myself, I am in the middle of a transition out of academia and am actively ramping down my lab at UCSF. I no longer have the space or appropriate set-up to support this longer-term effort.

    1. Design is hope made visible. You can live your life as the result of history and what came before, or you can live your life as the cause of what’s to come. You choose. When talent doesn’t hustle, hustle beats talent. But when talent hustles, watch out. When you work only for money, without any love for what you do in and of itself, your work will lack energy. People will feel that. So give every project everything you’ve got, at every moment, every time. A good philosopher will say: “Know thyself.” A good shopkeeper will say: “Know thy customer.” A good designer will say: “Know both.” Listen for when someone is dismissing your ambitions. Only the petty do that. Avoid them. Instead, seek out those much better than you; they’ll make you feel that you can achieve your dreams, as theirs are probably even larger. They’ll wave you on to the finish line. A brand is always answering two questions. The first one internally facing: What do we believe? The second, externally: How do we behave? You must remain authentic to yourself, your core values, and what you stand for. If you’re not, people will sniff you out. But your brand must maintain cultural congruence — remaining relevant to the times, always evolving to inspire people at large. The answers to these two curiosities must always be aligned. Find a way to connect every project to something much bigger: a higher order value, a truth, a courageous goal, or a larger question. Then, if your efforts start to lag or feel mundane, return to that larger ideal that inspired you in the first place. It works. Put this over your desk: “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.” Buckminster Fuller knew stuff. A good designer will help a company get to where they want to go. A great designer will push a company to where they should go. Are you going to tell a story? Then tell a big story. An enormous story. An epic story. 
Or tell no story at all. The role of creative leadership is to create more leaders — not more followers. This view is more uncommon than I’d like. I’ve learned that there are only two kinds of people: 1.) People who do exactly what they say they’ll do. 2.) People who are full of shit. Form follows fantasy. Every good idea comes from a spark of imagination, not pragmatism. Facts are important. But possibility creates futures. Never take an unpaid internship. Ever. It is unethical to be offered one, and in many places, it is illegal. But more importantly, what kind of people would refuse to pay you? Oh yeah, really shitty people. If you lose the desire to be silly, the power to laugh, and the ability to poke fun at yourself, you will lose the power to think. All work and no play makes Jack a dull boy for one reason: It kills off his imagination. Stuck on a problem you can’t solve? Go bigger. Expand it. Make it giant. Do not try to contain it, or simplify it, or reduce it. Make it so large that you can begin to see a new pattern. Solve the larger problem and the smaller one will get solved along the way. Always begin in mythology. It’s good fuel. Fables and fantasy don’t age or grow stale for one reason: They are a step into a dimension beyond the reach of time itself. Build with them. When I turned 35, I shifted my desire to be happy to a desire to be useful. It made all the difference. There are only two kinds of leaders: 1.) Those in the engine room helping the crew shovel the coal. 2.) Those who sit on top of the train and wave at the crowds as they pass by. Learn from ad agencies. They say yes to everything, even when they can’t do it. But they try. Designers say no all too often: “Oh, no. We don’t do that!” That’s shortsighted. Instead, say yes to everything. But always add “yes…if.” Then define your terms. I was on a board with the esteemed educator Sir Ken Robinson. 
At one meeting where a pompous guest was droning on, he turned to me and whispered, “What we do for ourselves dies with us when we leave this planet. What we do for other people can live on forever.” The opposite of courage is not cowardice. The opposite of courage is conformity. Ubiquity = Invisibility. What we’re overly familiar with, what becomes common, we stop seeing. One function of design is to restore our perception, renew our understanding, and invite us to be more alert. Seek simplicity only on the far side of complexity. Do the work, the research, the understanding, and discover the unseen, surprising, unanticipated insight before you start crafting your solution. A celebrated designer I admire once said “Style = Fart.” I disagree. I believe “Style = Accuracy.” It gives focus and timely relevance to ideas. If you want to make people like things, work in advertising. If you want to make things people like, work in design. Both are valid ways to build a brand, but the second way pays off better in the long run. You can always pull a good story out of a successful product or service. You can’t always pull a good product out of a story. Hire gifted people your clients would never let in their front door. Give them influence. Clear the runway. Provide sandwiches. And stand back. When designers get overwhelmed we can retreat into passivity. We pull back. This gives us an illusion of control. The less we try, the less our chances to fail. We make it look like we’re not responsible for what happens to us. But never give up. Move in closer, instead. Try. Make a mistake. Apologize quickly. And keep trying. Never be boring. Be ridiculous. Absurd. But never be boring. (Yes, this rule will get you in trouble.) Push. Push harder. The goal is to make the complicated simple — not the other way around. The best ideas are often expressed as simple ideas. They’ll have power because they’ll feel inevitable. 
Looking backward from the end of a project, it will have the appearance of inevitability. But when you began, you had no idea you’d end up there. What dullards suggest at this point is dangerous: “This creative process is too messy and too complicated. It needs efficiency since this solution was so logical. We should apply more logic throughout the process!” That’s the beginning of the end of creativity. Resist this urge. It destroys spontaneity, originality, serendipity, and unintentionality, which is where the biggest ideas are waiting for you. Do you find yourself surrounded by people who whine that “clients don’t understand what we do”? Those people will never have good clients. A designer’s first job is to articulate the tangible value we bring to every situation. It’s not the clients’ job to try to guess it. Average designers hit the brakes when they feel fear. But when the talented get frightened, they hit the pedal, accelerate, and drive headlong into the unknown. I’ve taught students for 20 years. In that time I’ve seen self-confidence, persistence, and desire play a much larger role in growth and achievement than talent. Passive? Whining? Waiting for orders? You won’t get off the ground. Energized? Enthused? Curious? The sky’s your limit. If you want to teach design, first read “Teaching to Transgress” by bell hooks. Your whole mindset will change. If it doesn’t, please do not teach. Seeking mastery in design means being comfortable with making your own path. Forge the new road. Others will question it and doubt it. But that path will eventually come to fit your soul. It will not only lead you into deeper parts of your craft, but to hidden parts of yourself. There may come a time when someone publicly attacks you or your work. If that happens, remember this: Those who attack are the ones who fear you the most. They’ll suspect that your talents might be greater than theirs. They, in fact, become your most sincere believers. 
It’s a proof point when they start showing up. Watch for them. Then thank them when they arrive. “Always think with your stick forward.” Amelia Earhart painted that on her plane. She meant, I imagine, to seize the moment when it arrives. Refuel as necessary. Don’t wait for any damn kind of “inspiration.” Punch the throttle. Get back in the air. Keep flying. Are you at an agency that habitually recruits outside industry hotshots to lead instead of promoting potential hotshots from the ranks? Run. Now. It will never become what it wants to become. Separate talkers from doers. For someone to score an interview, I suggest a good book — on anything — to read in advance. “After you finish it, call me, and we’ll schedule some time.” 90% drop off. There are exceptions, but I hire from the remaining 10%. Be careful of doing too much work that copies the people you admire. Start out that way to see what feels right. But aim to seek what they were seeking instead of doing what they were doing. Stay away from people who confuse pomposity for profundity. Articulate incompetency is contagious. When you’re out-gunned, out-staffed, and out-equipped in a competition, what are the things you’ve got left to use? Kindness and imagination. When someone disagrees with you, do not defend yourself. Instead, listen. Ask them to explain, validate their concern, expand on it, and affirm their point of view. Only then will anyone listen to anything you have to say. I wish someone had told me this in my teens. We don’t create fantasy worlds to escape reality. We create them so we can better see, understand, and reshape reality. Seek ambition. Hire character. Train talent. When I hear the word “iterate” more than three times in three minutes, I fear there will be a Post-It® fiesta within three minutes. Fair warning. A story is not just a tale of conflict. It can be a well of shared values. 
If you shift the story people tell about themselves and their communities, you can not only shift those people, you can shift an entire culture. Build a library for yourself, and read John Milton. He had profound respect for books and human thought. “For books are not absolutely dead things, but do contain a potency of life in them to be as active as that soul whose progeny they are; nay, they do preserve as in a vial the purest efficacy and extraction of that living intellect that bred them.” A better definition about the sanctity of books was never written. Notice someone doing something cruel for the first time? Never wait for a second time. Address it fast, or cut them out. Either way, do not “wait and see.” It leaves you and your team vulnerable. What they showed you is who they are. Move fast. Mastery is not gained from intellect. Mastery is not gained from talent. Mastery is not gained from ambition. Mastery is only gained from time and focus applied to your craft over many, many years. Do not conflate it with fame. Try absolutely everything. Then try it all again. And then, one more time. Accept compliments gracefully. Treat flatterers with suspicion. Listen to your complainers and cynics — not because you might learn from them, but because they secretly care. Design ain’t what the thing looks like. Design is what the thing does. Smartphoning has supplanted daydreaming. Fixated on our little, lit-up screens, dusty old thoughts no longer slip out of our brains as easily, so no new, silly, absurd thoughts slip back in. And all good ideas start out as silly, absurd thoughts. Turn off your phone. Daydream. Fart around. Ponder. Let something odd fly in that’s floating around, hoping for an open mind to land in. If an idea doesn’t scare you in some way, it’s not really a good idea. A strong, sincere voice is like a clear bell—when rung, it travels far, across fields, mountains, and rivers. Ring it. And teach others to. 
Ignore those who tell you to “only focus on your strengths.” Nonsense. Your strengths never go. Build them, hone them, and add muscle to them. But also focus on what you need to move into new and larger worlds. Become a shocking triple threat, not just a shiny, one-trick pony. Failures are not always mistakes. It just might have been the best you could do at that point. Okay, fine. Apologize quickly. The real failure is to beat yourself up and not take the opportunity to learn. Never hire people for “cultural fit.” What a pernicious term. Instead, hire insanely talented people for their “cultural contribution.” For how unique they are. For why they are different from you. For what they will add that you do not have. People who use the word “lifestyle” don’t have one. Big agency order of importance: Clients –> Work –> People. Ours: People –> Work –> The client’s customers –> Clients. It’s easy. Good people do good work that customers love so clients succeed. T’was ever thus. Don’t work with clients to help them become the best. Work with clients to help them become the only. Hire Tigger. Never Eeyore. Surround yourself with optimists. They will build futures into existence. Read a good book every week. After a year your brain will be fueled like a rocket and your mind will naturally start going to new places, connecting new ideas, and thinking in ways you never have before. Never create and edit at the same time. Get all the sloppy, ugly roughs and first drafts out. Quantity is more important than quality at the start. Mess is more. All ideas are bad ideas. They only become good through craft and love. Clients want you to succeed like crazy. That’s why they hired you. Show them how. That’s your damn job. Do it. We perceive through images. We think in metaphors. We learn by stories. We create with fantasy. When you find yourself on the horns of a dilemma, always do the honest thing. This will shock people. And you’ll come out better, anyway. Perhaps. Maybe. Possibly. 
Someday. These are among the most damaging words a creative person can use. Lose them. Everybody starts out with good intentions. Not everybody finishes with them. This has been the most painful thing I’ve ever learned. People already know what advice they need to hear. They just need to hear it told to them by someone else. There is no such thing as “The Future.” There is only and always “The Futures“—and they are all in competition with each other, fighting for dominance. Which future will you feed? When asked for a definition of “brand,” I use this: A brand is a promise performed consistently over time. It’s held up for a while now. Brands are mentors of things to come. The best ones anticipate, create, and move us into tomorrow. Companies are no longer in competition with each other. They are—we all are—in competition with the future itself. The era of human-centered design is now gone. Our existence was never human-centered, anyway. Covid-19 proved that to be nonsense. It’s time for environment-centered. Not sustainability. Regeneration Design where we create not apart from Nature but as a part of Nature. It is never about winning. It is never about losing. It is only about contributing. It is only about learning. I’m tired of talks from “designers” who never design anything beyond their keynotes. I’m tired of talks from “entrepreneurs” who never build anything beyond themselves. I’m tired of talks from “thought leaders” who lead nothing but the perpetuation of their own fame. When you submit a fee for your work, someone will always ask, “Is this negotiable?” Answer with this: “Yes. Up.” In the end, it’ll not be what you took. It’ll be what you gave away. Do not worry about your competition. You’re not in competition with them anymore. You’re only in competition with the future itself. So don’t look over your shoulder. Look two, three, five years down the road and invent backward from there. 
Design is the bridge that gets us from where we are to where we should be. It is future-making. And it’s our job to get our clients into the best futures for themselves as quickly and effectively as possible. Skip the whole “Minimal Viable Product” thing. It leads to incrementalism. Try “Maximum Fucking Love.” It leads to something that someone else might actually care about. Be aware that every choice you make comes down to two options: Feeding grievance or creating hope. In the end, it is that simple. The era of problem-solving is gone. It’s too reactive in a world where the future arrives too fast. Designers must now be problem seekers, finding and anticipating problems before they arrive on our desks, because at that point, it’s already too late. We must now all build bridges, not walls. The rest is detail. Design Thinking gives a definition of romantic expression as found in timely historical contexts. Design says: “I’ll be upstairs.” In first creative presentations, to ensure your creative work has time and space to land, ban all of the devil’s advocates from the room before you show a thing. Then say this: “We are here to create something new. New ideas can be fragile because they are unfamiliar. You may not like something you see here, but you are not allowed to say that for now. We’ll have to edit and remove some of this work later, but for now, everything will be in play. So find something, anything, you like in every idea. A color. A word. An image. A sentence. Anything. In the end, we find what we look for. And today we are going to look for the new.” In the end, there are only two key questions the world asks of us: 1.) Who are you? 2.) Where are you going? These questions are the same ones we ask our clients. The first is about authenticity; the second is about relevance. Asking them will keep the world wide open in front of you. 
Whether you like it or not, your brand’s story already exists, so you should manage it as you would any other powerful company asset. After your product, your means to deliver it, and your audience, your story will be the most potent tool you have to build with. Be very. For a very long time, it took a very long time for anything to change. If you found an answer that worked, you could count on it being the answer for ages. But those days are over. Being an answer is not the answer. Or even an option. Unless, of course, you’re very curious. Or very focused. Very gay. Very straight. Very caring. Very prickly. Very visual. Very verbal. Very brash. Very funny. Very heady. Very anything. Everyone at COLLINS is very something. If I took any lessons from Ogilvy, it was these two: 1) Think bigger. And then, think bigger still. 2) Take every chance while you can. Grab them. And go all in. You never know if they’ll ever come again. Experience. Don’t observe. Inhale. Don’t read. Transfigure. Don’t shift. Advocate. Don’t ponder. Prove. Don’t promise. Encourage. Don’t cut. Imagine. Don’t worry. Do. Don’t analyze. Hear. Don’t listen. Show. Don’t tell. Give. Don’t take. Design is not what we make. Design is what we make possible.

      Some great design principles and wisdom.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to investigate the diet of the early fossil bird Jeholornis and its implications for bird-plant interactions in early bird evolution.

      Major strengths were: 1) an exquisite near-complete cranial reconstruction of the early fossil bird Jeholornis from the Early Cretaceous of China, 2) a large sample of extant bird skulls (160) for the geometric morphometric analysis, and, 3) qualitative description of alimentary contents of extant birds.

      Major weaknesses were: 1) restriction of diet consideration to only granivory and frugivory, 2) under-detailed comparisons between the extant and extinct alimentary contents, 3) unclear explanation of the connection between early fossil birds and seed dispersal.

      Thanks for the summary of our work! To briefly reply to the weaknesses mentioned here (more details are provided in the following reply to the reviewer’s comments and suggestions):

      1) We have added supplementary analyses according to the reviewer’s suggestions, so this should have been addressed now. Our morphometric analyses attempt to explain the presence of seeds in the gut contents of some individuals of Jeholornis. We believe there are only two possible explanations of the presence of these seeds: granivory or frugivory. Therefore, we were initially motivated by the need to rigorously rule out a granivorous explanation of the presence of seeds in the gut of Jeholornis, which then would demonstrate the partially frugivorous diet of Jeholornis - it doesn’t have to be a specialist frugivore and its supplementary diet components don’t influence the inference that the presence of seeds results from fruit-consumption. Fruit-consumption is the key mechanism that we provide evidence of for the first time in early birds, and is central to the potential for mutualisms between plants and early birds. However, our supplementary geometric morphometric analyses do indicate some clues about its supplementary diets that are useful. In particular, they rule out some other diets e.g. piscivory or a probing diet.

      2) Our work is the first we know of to provide comparative data on the seed-containing gut contents of extant birds, as a tool to interpret fossil gut contents. For granivores and frugivores, we have done detailed 3D comparisons among several species. We think this is important, and we have done our best to document them clearly. However, for now, we have further clarified the images that we have presented, in response to a comment by referee 3 (see below). We hope that this also addresses the concerns of referee 1 here.

      3) By providing direct evidence of fruit-consumption in early birds, we provided evidence of the mechanism for potential bird-plant co-evolutionary mutualism during the Early Cretaceous. We are not showing direct evidence of the mutualism, although note that plants invest energy in fruit production specifically to attract fruit-eating animals to act as seed dispersers. Therefore, the inference of mutualism is not far-fetched and is very likely, even if direct evidence is almost impossible to preserve in fossils, which is why we have toned down this statement rather than making it too strong. More detailed analyses based on new fossil discoveries in the future are expected to further explore the role of birds in the Cretaceous Terrestrial Revolution. However, our study is the first step toward providing evidence for and discussing this ecological topic, and the furthest we could go based on the current fossil discoveries. Nevertheless, this seems important and will be the basis of future studies.

      The authors did not yet achieve their full aims because their methods limited the scope of their conclusions. Specifically, a third hypothesis that Jeholornis was neither granivorous nor frugivorous was not addressed in the study. This is especially poignant as the PCA data show overlap between the granivory and frugivory data points and the 'other diet' data points. If it is assumed that Jeholornis must be a granivore or a frugivore, then the results support frugivory over granivory for Jeholornis. However, as explained above, this assumption is not supported by the data provided so the third hypothesis needs to be tested.

      Thank you very much for stating the concern of our study. It seems that there is some misunderstanding here about our study. Our analyses attempt to explain how seeds entered the gut content of Jeholornis, not to predict diet in the absence of evidence from gut content. That is why we tested between just two alternative explanations of the gut contents in our original analyses: (1) That seeds entered the gut through granivory (seed-consumption); and (2) That seeds entered the gut through frugivory (fruit-consumption). Based on this combined evidence of seeds in the gut, comparative study of the gut contents of extant birds, plus morphometrics of the skull and mandible, we claimed partial (possibly seasonal) frugivory - a form of facultative frugivory for this lineage. Therefore, we are not claiming specialised frugivory in Jeholornis as the reviewer might think. However, we acknowledge that the word 'frugivorous' might be misleading to some readers, who could interpret it as meaning 'specialised frugivorous'. To avoid this misunderstanding, we did consistently use adjectives such as 'partial', 'seasonal' and 'opportunistic' in our initial submission. And we have tried to reinforce this in our revised manuscript. For example, we converted some instances of ‘frugivory’ to ‘fruit-consumption’ to indicate the act of consuming fruit rather than a perceived idea of specialised frugivory.

      We should also emphasize here that studies of seed dispersal and frugivore ecology in modern taxa show that, for most frugivores, fleshy fruits are a non-exclusive food resource, which is supplemented with other foods like animal prey and plants (Howe, 1986; Corlett, 1998; Jordano, 2000; Wilman et al., 2014). In addition, plants usually bear fruits only in certain seasons rather than throughout the year, which makes strictly specialized frugivores very rare. Therefore, avian frugivores occupy a wide range of diet space that is highly overlapping with some other diets. However, to reply to the comment from the reviewer and also make this clearer to other readers, we conducted supplemental analyses by dividing 'other diets' further to test which diets Jeholornis could or could not have had as supplements to frugivory. The results are now shown in Figure 2 - figure supplements 3, Figure 2 - figure supplements 4 and Figure 2 - figure supplements 5. We revised the manuscript and added the following text to describe the supplemental analyses:

      “Our main analysis is intended to test why seeds entered the gut of Jeholornis by distinguishing between two hypotheses, either (i) fruit consumption or (ii) seed consumption (Figure 2, Figure 2 - figure supplements 2).”

      “Our supplemental analysis includes a further split of “Other diets”, separating the “Other diets” category into: (1) Probing for invertebrates; (2) Grabbing/pecking for invertebrates (Figure 2 - figure supplements 3); (3) Piscivores; (4) Animal-dominated omnivores; (5) Carnivores (Figure 2 - figure supplements 4); (6) Nectarivores; (7) Omnivores; (8) Plant-dominated omnivores (Figure 2 - figure supplements 5). Our prior expectation is that these analyses will not provide an unambiguous classification of the diet of Jeholornis on their own, because craniomandibular shape data does not completely differentiate among diets in birds (Navalon et al., 2019), but that they may be capable of ruling out the occurrence of some diets.”

      The results of these supplemental analyses are described in the text we added to the manuscript:

      “Our supplemental analyses exclude Jeholornis from possessing a probing diet, which occupy negative PC1 values (Figure 2 - figure supplements 3), as well as being a piscivore, which occupy positive PC2 values (Figure 2 - figure supplements 4). However, it cannot be distinguished from other diets such as the grabbing/pecking for invertebrates and omnivory (Figure 2 - figure supplements 3, 4, 5). Euclidean distances in the full multivariate shape space suggest that the mandible of Jeholornis is relatively similar to those of various omnivorous (e.g. Podica), seed-grinding (e.g. Calandrella), frugivorous (e.g. Crax), and invertebrate pecking (e.g. Picus) birds (Figure 2 - Source data 3).

      “Similar to the results of the mandible analyses, the results of the supplemental analyses of cranial shape also exclude Jeholornis from possessing a probing diet, which occupy negative PC1 values (Figure 2 - figure supplements 3), as well as being a piscivore, which occupy positive PC2 values (Figure 2 - figure supplements 4). The other diets are also undistinguishable in the supplemental analyses of cranial shape (Figure 2 - figure supplements 3, 4, 5). Euclidean distances in the multivariate shape space, excluding PC3 (which describes the large-scale differences between stem- and crown-group birds) suggest that the cranium of Jeholornis is relatively similar to those of various frugivorous (e.g. Manucodia), seed-grinding (e.g. Pedionomus) and invertebrate pecking (e.g. Hymenops) birds (Figure 2 - Source data 4).”
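
      The nearest-neighbour comparison described in the quoted passages (Euclidean distances in principal-component shape space, optionally leaving out one component such as PC3) can be sketched roughly as follows. This is only an illustration on random toy data, not the pipeline used in the study: the function names `pca_scores` and `nearest_specimens` are hypothetical, and the sketch assumes the landmark coordinates have already been aligned and flattened into one row per specimen.

```python
import numpy as np

def pca_scores(shapes):
    """Project flattened, pre-aligned landmark data onto principal components."""
    centered = shapes - shapes.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T  # one row of PC scores per specimen

def nearest_specimens(scores, target_index, exclude_pcs=()):
    """Rank the other specimens by Euclidean distance to the target in PC space."""
    keep = [i for i in range(scores.shape[1]) if i not in exclude_pcs]
    sub = scores[:, keep]
    dist = np.linalg.norm(sub - sub[target_index], axis=1)
    order = np.argsort(dist)
    return [(int(i), float(dist[i])) for i in order if i != target_index]

# Toy data (assumption): 6 specimens, 8 shape variables; row 0 plays the fossil.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(6, 8))
scores = pca_scores(shapes)

# Exclude PC3 (zero-based index 2) before measuring distances.
ranking = nearest_specimens(scores, target_index=0, exclude_pcs=(2,))
```

      Here `ranking` lists the remaining specimens from most to least similar to the target. Dropping a component before computing distances removes whatever variation that axis captures, which in the quoted text is the large-scale difference between stem- and crown-group birds.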

      These results are briefly merged into the discussion part:

      “Mandibular and cranial shape excludes Jeholornis from having a probing/piscivorous diet, and is consistent with omnivory, grabbing/pecking for invertebrates, or processing foliage (using the gastric mill).”

      The existing main morphometric analyses show that a seed-cracking diet can be ruled out as an explanation of the presence of seeds in the gut of Jeholornis, which is their primary goal. In addition, the intention of this study is to show evidence for at least seasonal fruit consumption in some of the earliest birds (not specialised frugivory), which all three reviewers seem to agree is a well-founded conclusion, and the bigger picture insights of our paper arise from that. With the new supplementary analyses inspired by the reviewer, the diet of Jeholornis is described in more detail in our study, which may interest more readers concerned with the diet components of early birds.

      The cranial reconstruction of Jeholornis and the alimentary content data for extant birds would be invaluable to the community. The geometric morphometric data are presented in a way that obscures how much overlap there is between dietary categories (non-frugivore and non-granivore diets are grouped as 'other diets'), so the utility of these data is unclear. This aspect has hampered the ability of the authors to reconstruct diet in Jeholornis and, thus, the bigger picture insights that can be drawn from these results, limiting the likely impact of the work.

      Thank you very much for the positive comments about our cranial reconstruction of Jeholornis and the alimentary content data for extant birds.

      It was not our intention to obscure the overlaps between the mandible/cranial shape of frugivorous birds and those of other birds. In fact, we believed that it was clear from the plots, and from the way we described the results in the text, that various birds with ‘other diets’ could have similar mandible/cranial shape to Jeholornis. This degree of overlap is also expected based on recent studies that found evidence for only quite diffuse relationships between cranial form and diet in birds (Navalón et al., 2019). However, we also see the point that some readers might be curious about the nature of particular datapoints and that it would be useful to clarify this. We therefore added supplementary analyses following the reviewer’s suggestion, dividing the 'other diets' category into several much more detailed categories, so the reviewer’s concern that “the non-frugivore and non-granivore diets are grouped as 'other diets'” should now be addressed.

      Jeholornis is one of the earliest fossil birds, so understanding its diet and ecological role is important for understanding Mesozoic ecosystems and the emergence of modern ones.

      Thank you very much for this clear explanation of the importance of this study, which is also what we believed when we wrote the manuscript. We hope that the referee will be satisfied with the efforts we made to address their initial comments and that our paper on the ecology and morphology of Jeholornis can be published in an appropriate venue.

      Reviewer #3 (Public Review):

      Hu et al. reported on a new specimen of the early bird Jeholornis, including a nearly complete skull. Using geometric morphometrics data collected from 3D and 2D retro-deformed reconstructions of its skull, the authors convincingly dismiss a seed-cracking feeding strategy for the taxon. They then use comparisons of 3D reconstructions of ingested seeds to extant birds with known feeding strategies to convincingly argue that Jeholornis was likely at least partially frugivorous. As such, this study provides the strongest evidence yet that early birds such as Jeholornis may have played a role in bird-mediated seed dispersal strategies in the Mesozoic.

      Generally, the data presented in this paper support the authors' interpretations. The specimen at the core of this study is truly spectacular, and the authors' retro-deformation of its skull is skilled. The results of the authors' geometric morphometric analyses support their inference that Jeholornis was likely not a seed-cracker. Their comparisons of ingested seed shapes also convincingly supported a partially frugivorous diet. I especially applaud the authors' detailed description of their process of retro-deformation of the fossil skull (an example many should follow, including myself) as well as making both their raw data and their reconstructed surfaces available online.

      Thank you very much for the summary of our work!

      However, there are a few major and several minor issues that I believe need to be addressed.

      1. The implications for possible bird-mediated seed dispersal are clear in this study, but they are not conclusive. Rather, the authors (convincingly) demonstrate that Jeholornis was at least partially frugivorous -- a necessary component of such a mutualistic interaction. The authors do not demonstrate that such an interaction actually occurs. These results are nonetheless exciting and important, but I think certain statements in the paper are too strong. A notable example is the title - "Earliest evidence for frugivory and seed dispersal by birds." I would strongly urge the addition of a single word to better reflect the data presented: "Earliest evidence for frugivory and possible seed dispersal by birds." Similarly, in lines 328-329 -- "Strong indications for at least seasonal frugivory in Jeholornis provides direct evidence of [specialised seed-dispersal by animals during the Early Cretaceous] for the first time" -- is not true. This paper does not provide direct evidence for this, but does provide a mechanism consistent with this. There are a handful of other statements in the paper that I think should be toned down to account for this.

      Thanks for the helpful suggestions! We have revised the title to “Earliest evidence for frugivory and potential seed dispersal by birds”, and revised this sentence to read “Evidence for at least seasonal frugivory in Jeholornis provides direct evidence of fruit-consumption by early birds, long before the origin of the bird crown-group. This provides an important indication of the likelihood that birds were recruited by plants for seed-dispersal very early in their evolutionary history, during the Early Cretaceous”. We also went through the manuscript to tone down similar statements about seed dispersal, such as “…indicating that birds may have been recruited for seed dispersal during the earliest stages of the avian radiation.”

      1. Much more information should be given about the new Jeholornis specimen. In the supplement, the authors state that "a few post cranial elements" (p. 17, line 352) are preserved along with the skull -- which elements? They should be figured and briefly described in the supplement. This is of relevance to the core assumption of the paper, namely that this individual belonged to Jeholornis -- the taxonomic assignment is based partially on the tail morphology -- which I assume means that, minimally, a complete tail is preserved. The authors also mention the pelvic morphology of the new specimen, so I assume at least some part of the pelvis is preserved. These should all be figured. Most anatomical discussion is limited to the skull (and especially the palate), which is understandable, given the focus of the paper. However, with that in mind, more attention should be paid to the retro-deformation of the skull. Figure 1 is quite attractive, but I'm confused by the differences in depicted preservation between the 3D (Fig. 1C, D) and 2D (Fig. 1E, F) reconstructions. For example, the braincase is not shown in panel C but is in panel E -- why? Is its shape inferred from other specimens for panel E? Again, I very much appreciate the inclusion of near step-by-step description of how the rostrum was retro-deformed. Minimally, a few comments on what isn't preserved would be useful.

      1) We added a photograph of the whole slab of Jeholornis STM 3-8 as Figure 1 - figure supplements 1 (the eLife format for supplementary figures), and revised this sentence to read “…and a few postcranial elements including the vertebral column, the pelvic girdle and fragmentary hindlimbs.” As you can see from the photograph, very little useful information can be extracted from the incompletely preserved postcranial elements. Considering that this paper focuses on the skull, we only mentioned the relatively better-preserved tail and pelvis in the taxonomic part.

      2) We added “Dashed-lines indicate the elements not preserved but suspected to exist.” to the legend of Figure 1, and added details of the reconstruction of unpreserved elements at the end of the “CT scans and digital reconstructions” part of the Materials and Methods: “However, since the braincase is too flattened to be used as the reference for 3D retrodeformation, it was omitted in Figure 1C and reconstructed according to its common shape in early birds in Figure 1E. The ectopterygoid is not preserved but suspected to exist as discussed in the Cranial Anatomy part, therefore it was reconstructed according to the shape of this element among other stem birds e.g. Archaeopteryx and Sapeornis (Elzanowski and Wellnhofer, 1996; Hu et al., 2019).”

      1. The figures are visually attractive but I found some of them confusing or unclear. See my comments above regarding Figure 1. Despite the red arrows in Figure 4 and the supplemental figure, I was hard pressed to understand precisely what set the indicated seeds apart from the rest. In some cases I could see slight "dents" where one or two of the arrows indicated, but it was hard for me to see, even when I zoomed in on my screen. I think inset panels featuring zoom-ins on the indicated regions would be very useful in making the point the authors intend. Also, I don't know if the supplemental image naming/number scheme was imposed by the journal or is a choice by the authors, but I found it baffling. Something more traditional (like "Fig. S1" or "Supplemental Figure 1") would be much more efficient.

      1) We have clarified the confusing aspects of Figure 1 as suggested. For Figure 4 and the related supplementary figures, the 3D reconstructed seeds are quite clear, such as the broken ones in Figure 4B. The broken seeds in the scanning slices are more difficult to observe, as the reviewer said, because the seed husks are very thin and therefore only slightly brighter; that is why we added red arrows indicating the breakages. To help readers observe them more easily, we have now added zoom-in panels and line drawings for representative examples (not all of them, since there would otherwise be too many), as suggested by the reviewer;

      2) The supplementary image naming/number scheme was imposed by the journal, and it will be clearer when the paper is digitally published, since these supplementary images will be connected to links in the legends of the main figures.

    1. Author Response

      Reviewer #1 (Public Review):

      Zhu et al. found that human participants could plan routes almost optimally in virtual mazes with varying complexity. They further used eye movements as a window to reveal the cognitive computations that may underlie such close-to-optimal performance. Participants’ eye movement patterns included: (1) Gazes were attracted to the most task-relevant transitions (effectively the bottleneck transitions) as well as to the goal, with the share of the former increasing with maze complexity; (2) Backward sweeps (gazes moving from goal to start) and forward sweeps (gazes from start to goal) respectively dominated the pre-movement and movement periods, especially in more complex mazes. The authors explained the first pattern as the consequence of efficient strategies of information collection (i.e., active sensing) and connected the second pattern to neural replays that relate to planning.

      The authors have provided a comprehensive analysis of the eye movement patterns associated with efficient navigation and route planning, which offers novel insights for the area through both their findings and methodology. Overall, the technical quality of the study is high. The "toggling" analysis, the characterization of forward and backward sweeps, and the modeling of observers with different gaze strategies are beautiful. The writing of the manuscript is also elegant.

      I do not see any weaknesses that cannot be addressed by extended data analysis or modeling. The following are two major concerns that I hope could be addressed.

      We thank the reviewer for their positive assessment of our work!

      First, the current eye movement analysis does not seem to have touched the core of planning, namely evaluating alternative trajectories to the goal. Instead, planning-focused analyses such as forward and backward sweeps were all about the actually executed trajectory. What may participants’ eye movements tell us about their evaluation of alternative trajectories?

      This is an important point that we previously overlooked because our experimental design did not incorporate mutually exclusive alternative trajectories. Nonetheless, there are many trials in which participants had access to several possible trajectories to the goal. Some of those alternatives may be trivially suboptimal (e.g. highly convoluted trajectory, taking a slightly curved instead of straight trajectory, or setting out on the wrong path and then turning back). Using two simple constraints described in the Methods (no cyclic paths, limited amount of overlap between alternatives), we algorithmically identified the number of non-trivial alternative trajectories (or options) on each trial that were comparable in length to the chosen trajectory (within about 1 standard deviation). A few examples are shown below for the reviewer.
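As a purely illustrative sketch of how two such constraints could be applied, the Python below enumerates acyclic paths on a small graph and keeps those that are comparable in length to the chosen trajectory and overlap with it only to a limited degree. The graph representation, the function names, the edge-overlap measure, the 0.5 overlap threshold and the step-count tolerance are all assumptions made for this example; the actual criteria are those described in the Methods.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Graph = Dict[Node, List[Node]]


def simple_paths(graph: Graph, start: Node, goal: Node, max_len: int) -> List[List[Node]]:
    """Enumerate acyclic paths from start to goal with at most max_len steps."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in graph[node]:
            if nxt not in path:  # constraint 1: no cyclic paths
                stack.append((nxt, path + [nxt]))
    return paths


def overlap_fraction(a: List[Node], b: List[Node]) -> float:
    """Fraction of the edges of path a that are also traversed by path b."""
    edges_a = {frozenset(e) for e in zip(a, a[1:])}
    edges_b = {frozenset(e) for e in zip(b, b[1:])}
    return len(edges_a & edges_b) / max(len(edges_a), 1)


def nontrivial_alternatives(graph: Graph, start: Node, goal: Node,
                            chosen: List[Node], length_tol: int,
                            max_overlap: float = 0.5) -> List[List[Node]]:
    """Keep paths of comparable length that overlap little with the chosen
    path and with alternatives already kept (hypothetical thresholds)."""
    chosen_len = len(chosen) - 1
    kept: List[List[Node]] = []
    for path in simple_paths(graph, start, goal, chosen_len + length_tol):
        if path == chosen:
            continue
        if abs((len(path) - 1) - chosen_len) > length_tol:
            continue
        if overlap_fraction(path, chosen) > max_overlap:
            continue  # constraint 2: limited overlap with the chosen path
        if any(overlap_fraction(path, other) > max_overlap for other in kept):
            continue  # avoid counting near-duplicate alternatives twice
        kept.append(path)
    return kept
```

In this toy version the length tolerance is expressed in graph steps; a tolerance of about one standard deviation of path length, as in the text, would replace `length_tol` in a realistic setting.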

      The more plausible trajectory options there were, the more time participants spent gazing upon these alternatives during both pre-movement and movement (Figure 4 – figure supplement 1D – left). This is not a trivial effect resulting from the increase in surface area comprising the alternative paths because the time spent looking at the chosen trajectory also increased with the number of alternatives (Figure S8D – middle). Instead, this suggests that participants might be deliberating between comparable options.

      Consistent with this, the likelihood of gazing at alternative trajectories peaked early on during pre-movement and well before performing sweeping eye movements (Figure 5D). During movement, the probability of gazing upon alternatives increased immediately before participants made a turn, suggesting that certain aspects of deliberation may also be carried out on the fly just before approaching choice points. Critically, during both pre-movement and movement epochs, the fraction of time spent looking at the goal location decreased with the number of alternatives (Figure 4 – figure supplement 1D – right), revealing a potential trade-off between deliberative processing and looking at the reward location. Future studies with more structured arena designs are needed to better understand the factors that lead to the selection of a particular trajectory among alternatives, and we mention this in the discussion (line 445):

      "Value-based decisions are known to involve lengthy deliberation between similar alternatives. Participants exhibited a greater tendency to deliberate between viable alternative trajectories at the expense of looking at the reward location. Likelihood of deliberation was especially high when approaching a turn, suggesting that some aspects of path planning could also be performed on the fly. More structured arena designs with carefully incorporated trajectory options could help shed light on how participants discover a near-optimal path among alternatives. However, we emphasize that deliberative processing accounted for less than one-fifth of the spatial variability in eye movements, such that planning largely involved searching for a viable trajectory."

      Second, what cognitive computations may underlie the observed patterns of eye movements has not received a thorough theoretical treatment. In particular, to explain why participants tended to fixate the bottleneck transitions, the authors hypothesized active sensing, that is, participants were collecting extra visual information to correct their internal model about the maze. Though active sensing is a possible explanation (as demonstrated by the authors’ modeling of "smart" observers), it is not necessarily the only or most parsimonious explanation. It is possible that their peripheral vision allowed participants to form a good-enough model about the maze and their eye movements solely reflect planning. In fact, that replays occur more often at bottleneck states is an emergent property of Mattar & Daw’s (2018) normative theory of neural replay. Forward and backward replays are also emergent properties of their theory. It might be possible to explain all the eye movement patterns (fixating the goal and the bottleneck transitions, and the forward and backward replays) based on Mattar & Daw’s theory in the framework of reinforcement learning. Of course, some additional assumptions that specify eye movements and their functional roles in reinforcement learning (e.g., fixating a location is similar to staying at the corresponding state) would be needed, analogous to those in the authors’ "smart" observer models. This unifying explanation may not only be more parsimonious than the authors’ active sensing plus planning account, but also be more consistent with the data than the latter. After all, if participants had used fixations to correct their internal model of the maze, they should not have shown so little improvement across trials in the same maze.

      We thank the reviewer for this reference. We note the strong parallels between our eye movement results and that study in the discussion, in addition to proposing experimental variations that will help crystallize the link. Below, we included our response that was incorporated into the Discussion section (beginning at line 462).

      "In [a] highly relevant theoretical work, Mattar and Daw proposed that path planning and structure learning are variants of the same operation, namely the spatiotemporal propagation of memory. The authors show that prioritization of reactivating memories about reward encounters and imminent choices depends upon its utility for future task performance. Through this formulation, the authors provided a normative explanation for the idiosyncrasies of forward and backward replay, the overrepresentation of reward locations and turning points in replayed trajectories, and many other experimental findings in the hippocampus literature. Given the parallels between eye movements and patterns of hippocampal activity, it is conceivable that gaze patterns can be parsimoniously explained as an outcome of such a prioritization scheme. But interpreting eye movements observed in our task in the context of the prioritization theory requires a few assumptions. First, we must assume that traversing a state space using vision yields information that has the same effect on the computation of utility as does information acquired through physical navigation. Second, peripheral vision allows participants to form a good model of the arena such that there is little need for active sensing. In other words, eye movements merely reflect memory access and have no computational role. Finally, long-term statistics of sweeps gradually evolve with exposure, similar to hippocampal replays. These assumptions can be tested in future studies by titrating the precise amount of visual information available to the participants, and by titrating their experience and characterizing gaze over longer exposures. We suspect that a pure prioritization-based account might be sufficient to explain eye movements in relatively uncluttered environments, whereas navigation in complex environments would engage mechanisms involving active inference. Developing an integrative model that features both prioritized memory-access as well as active sensing to refine the contents of memory, would facilitate further understanding of computations underlying sequential decision-making in the presence of uncertainty."

      In the original manuscript, we referred to active sensing and planning in order to ground our interpretation in terminology that has been established in previous works by other groups, which had investigated them in isolation. Although the role of active sensing could be limited, we are unable to conclude that eye movements solely reflect planning. Even if peripheral vision is sufficient to obtain a good-enough model of the environment, eye movements can further reduce uncertainty about the environment structure, especially in cluttered environments such as the complex arena used in this study. This reduction in uncertainty is not inconsistent with a lack of performance improvement across trials, because the lack of improvement could be explained by a failure to consolidate the information gathered by eye movements and propagate it across trials, an interpretation that would also explain why planning duration is stable across trials (Figure 2 – figure supplement 2B). Furthermore, participants gaze at alternative trajectories more frequently when more options are presented to them. However, we acknowledge that this is a fundamental question, and we have identified it as an important topic for follow-up studies and outlined experiments to delineate the precise extent to which eye movements reflect prioritized memory access vs active sensing. Briefly, we can reduce the contribution of active sensing by manipulating the amount of visual information, ranging from no information (navigating in the dark) to partial information (foveated rendering in a VR headset). Likewise, we can increase the contribution of memory by manipulating the length of the experiment to ensure participants become fully familiar with the arena. Yet another manipulation is to use a fixed reward location for all trials such that experimental conditions would closely match the simulations of the prioritization model. We are excited about performing these follow-up experiments.

      Reviewer #2 (Public Review):

      In this study the authors sought to understand how the patterns of eye-movements that occur during navigation relate to the cognitive demands of navigating the current environment. To achieve this the authors developed a set of mazes with visible layouts that varied in complexity. Participants navigated these environments seated on a chair by moving in immersive virtual reality.

      The question of how eye-movements relate to cognitive demands during navigation is a central and often overlooked aspect of navigating an environment. Studying eye-movements in dynamic scenarios that enable systematic analysis is technically challenging, which is why so few studies have tackled this issue.

      The major strengths of this study are the technical development of the set up for studying, recording and analysing the eye-movements. The analysis is extensive and allows greater insight than most studies exploring eye-movements would provide. The manuscript is also well written and argued.

      A current weakness of the manuscript is that several other factors have not been considered that may relate to the eye-movements. More consideration of these would be important.

      We thank the reviewer for their positive assessment of the innovative aspects of this study. We have tried to address the weaknesses by performing additional analyses described below.

      1. In the experimental design it appears possible to separate the length of the optimal path from the complexity of the maze. But that appears not to have been done in this design. It would be useful for the authors to comment on this, as these two parameters seem critically important to the interpretation of the role of eye-movements - e.g. a lot of scanning might be required for an obvious, but long path, or a lot of scanning might be required to uncover a short path through a complex maze.

      This is a great point. We added a comment to the Discussion at line 489 to address this:

      "Future work could focus on designing more structured arenas to experimentally separate the effects of path length, number of subgoals, and environmental complexity on participants’ eye movement patterns."

      To make the most of our current design, we performed two analyses. First, we regressed trial-specific variables simultaneously against path length and arena complexity. This analysis revealed that the effect of complexity on behavior persists even after accounting for path length differences across arenas (Figure 4 – figure supplement 3). Second, path length is but one of many variables that collectively determine the complexity of the maze. Therefore, we also analyzed the effects of multiple trial-specific variables (number of turns, length of the optimal path, and the degree to which participants are expected to turn back from the initial direction of heading to reach the goal, regardless of arena complexity) on eye movements. This revealed fine-grained insights into which task demands most influenced each of the eye movement measures described. More complex arenas posed, on average, greater challenges in terms of longer and more winding trajectories, such that eye movement measures which increased with arena complexity also generally increased with specific measures of trial difficulty, albeit to varying degrees. We added new plots to the main/supplementary figures and described these analyses under a new heading (“Linear mixed effects models”) in the Methods section.
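For readers less familiar with this kind of analysis, a regression of a trial-specific variable on path length and arena complexity with participant-level random effects can be written schematically as below. This is a generic form with random intercepts only, given here as an assumption for illustration; the exact fixed and random effects used are those specified in the Methods.

```latex
y_{ij} = \beta_0 + \beta_1\,\ell_{ij} + \beta_2\,c_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the dependent variable on trial $i$ of participant $j$, $\ell_{ij}$ is the path length, $c_{ij}$ is the arena complexity, and $u_j$ is a participant-specific intercept. In this notation, an effect of complexity that persists after accounting for path length corresponds to $\beta_2 \neq 0$ with $\ell_{ij}$ included in the model.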

      2. Similarly, it was not clear how the number of alternative plausible paths was considered in the analysis. It seems possible to have a very complex maze with no actual required choices that would involve a lot of scanning to determine this, or a very simple maze with just two very similar choices but which would involve significant scanning to weigh up which was indeed the shortest.

      Thank you for the suggestion. In conjunction with our response to the first comment from Reviewer #1, we used some constraints to identify non-trivial alternative trajectories – trajectories that pass through different locations in the arena but are roughly similar in length (within about 1 SD of the chosen trajectory). In alignment with your intuition, the most complex maze, as well as the completely open arena, did not have non-trivial alternative trajectories. For the three arenas of medium complexity, the more open arenas had more non-trivial alternative trajectories.

      When we analyzed the relative effect of the number of alternative trajectories on eye movements, we found that both possibilities you suggested are true. On trials with many comparable alternatives, participants indeed spent more time scanning the alternatives and less time looking at the goal (Figure S8D). Likewise, in the most complex maze, where there are no alternatives, participants still spent much more time (than in simpler mazes) learning about the arena structure at the expense of looking at the goal (Figure 3E-F). This analysis yielded interesting new insights into how participants solved the task and opens the door for investigating this trade-off in future work. More generally, because both deliberation and structure learning appear to drive eye movements, they must be factored into studies of human planning.

      3. Can the affordances linked to turning biases and momentum explain the error patterns? For example, paths that require turning back on the current trajectory direction to reach the goal will be more likely to cause errors, and patterns of eye-movements that might be related to such errors.

      Thank you for this question. In conjunction with the trial-specific analyses on the effect of the length of the trajectory (Point #1) on errors and eye movement patterns, we also looked into how the number of turns and the relative bearing (angle between the direction of initial heading and the direction of target approach) affected participants’ behavior. Turns and momentum do not affect the relative error (distance of the stopping location to the target) as much as the trajectory length does, which was unexpected (Figure 1 – figure supplement 1F). This supports the idea that errors were primarily caused by forgetting the target location, and that this memory leak gets worse with distance (or time). However, turns have an influence on eye movements in general. For example, more turns generally result in an increase in the fraction of time that participants spend gazing upon the trajectory (Figure 4 – figure supplement 1A) and sweeping (Figure 4D). Furthermore, a greater number of turns decreased the fraction of time participants spent gazing at the target during movement (Figure 2D).
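The relative bearing defined above (angle between the initial heading and the direction of target approach) can be illustrated with a short Python sketch. The function name, the 2D vector inputs and the choice to report an unsigned angle in degrees are assumptions made for this example and are not taken from the analysis code.

```python
from math import atan2, degrees
from typing import Tuple

Vec = Tuple[float, float]


def relative_bearing(initial_heading: Vec, approach_direction: Vec) -> float:
    """Unsigned angle in degrees (0 to 180) between two 2D direction vectors."""
    hx, hy = initial_heading
    ax, ay = approach_direction
    dot = hx * ax + hy * ay      # proportional to the cosine of the angle
    cross = hx * ay - hy * ax    # proportional to the sine of the angle
    return abs(degrees(atan2(cross, dot)))
```

With this definition, a value near 0 means that the participant approaches the target in roughly the direction they initially faced, and a value near 180 means that they have to double back.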

      4. Why were half the obstacle transitions misremembered for the blind agent? This seems a rather arbitrary choice. More information to justify this would be useful.

      We tested out different percentages and found qualitatively similar results. The objective was to determine the patterns of eye movements that would be most beneficial when participants have an intermediate level of knowledge about the arena configuration (rather than near-zero or near-perfect), because during most trials, participants can also use peripheral vision to assess the rough layout, but they do not precisely remember the location of the obstacles. We added this explanation to Appendix 1, where the simulation details have been made in response to a suggestion by another reviewer.

      5. The description of some of the results could usefully be explained in more simple terms at various points to aid readers not so familiar with the RL formulation of the task. For example, a key result reported is that participants skew looking at the transition function in complex environments rather than the reward function. It would be useful to relate this to everyday scenarios, in this case broadly to looking more at the junctions in the maze than at the goal, or near the goal, when the maze is complex.

      This is a great suggestion. We added an everyday analogy when describing the trade-off on line 258.

      "The trade-off reported here is roughly analogous to the trade-off between looking ahead towards where you’re going and having to pay attention to signposts or traffic lights. One could get away with the former strategy while driving on rural highways whereas city streets would warrant paying attention to many other aspects of the environment to get to the destination."

      6. The authors should comment on their low participant sample size. The sample seems reasonable given the reproducibility of the patterns, but it is much lower than most comparable virtual navigation tasks.

      Thank you for the recommendation. We had some difficulties recruiting human participants who were willing to wear a headset which had been worn by other participants during COVID-19, and some participants dropped out of the study due to motion sickness. To mitigate the low sample size, we collected data on four more participants and performed analyses to confirm that the major findings are observed in most individual participants. Participant-specific effects are included in the new plots made in response to Points #1-3, and the number of participants with a significant result for each figure/panel has been included as Appendix 2 – table 3.

      Reviewer #3 (Public Review):

      In this article, Zhu and colleagues studied the role of eye movements in planning in complex environments using virtual reality technology. The main findings are that 1) humans can navigate near-optimally in complex environments; 2) gaze data revealed that humans tend to look at the goal location in simple environments, but spend more time on task-relevant structures in more complex tasks; 3) human participants show backward and forward sweeping mostly during planning (pre-movement) and execution (movement), respectively.

      I think this is a very interesting study with a timely question and is relevant to many areas within cognitive neuroscience, notably decision making, navigation. The virtual reality technology is also quite new for studying planning. The manuscript has been written clearly. This study helps with understanding computational principles of planning. I enjoyed reading this work. I have only one major comment about statistical analyses that I hope authors can address.

      We thank the reviewer for the accurate description and positive assessment of our work.

      Number of subjects included in analyses in the study is only nine. This is a very small sample size for most human studies. What was the motivation behind it? I believe that most findings are quite robust, but still 9 subjects seems too low. Perhaps authors can replicate their finding in another sample? Alternatively, they might be able to provide statistics per individual and only report those that are significant in all subjects (of course, this only works if reported effects are super robust. But only in such a case 9 subjects are sufficient.)

      Thank you for the suggested alternatives. Due to the pandemic, we had some difficulties recruiting human participants who were willing to wear a headset which had been worn by other participants. We collected data on four more participants and included them in the analyses, and also confirmed that the major findings are observed in most individuals. The number of participants with a significant result for each analysis has been included in Figure 1 – figure supplement 3 and Appendix 2 – table 3.

      Somewhat related to the previous point, it seems to me that authors have pooled data from all subjects (basically treating them as 1 super-subject?) I am saying this based on the sentence written on page 5, line 130: "Because we are interested in principles that are conserved across subjects, we pooled subjects for all subsequent analyses." If this is not the case, please clarify that (and also add a section on "statistical analyses" in Methods.) But if this is the case, it is very problematic, because it means that statistical analyses are all done based on a fixed-effect approach. The fixed effect approach is infamous for inflated type I error.

      Your interpretation is correct and we acknowledge your concern about pooling participants. We had done this after observing that our results were consistent across participants but this was not demonstrated. We have now performed analyses sensitive to participant-specific effects and find that all major results hold for most participants, and we included additional main and supplementary bar plots (and tables in Appendix 2) showing per-participant data. The new plots/table show the effect of independent variables (mainly trial/arena difficulty) on dependent variables for each participant, as well as general effects conserved across participants. A new paragraph was added to the Methods section to describe the “Linear mixed effects models” which we used.

      Again, quite related to the last two points: please include degrees of freedom for every statistical test (i.e. every reported p-value).

      Degrees of freedom (df) are now included along with each p-value.

    1. Author Response:

      Reviewer #2 (Public Review):

      This is an interesting and scientifically rigorous report documenting atypical, dendritic locations for the emerging axon of pyramidal neurons. This is not an entirely new observation (the authors cite relevant publications, including Kole and Brette, 2018 and Mendizabal-Zubiaga et al., 2007), but still important, as a relatively overlooked fact with functional implications. A main feature of the present report is an exceptionally thorough cross-species survey, from which the authors conclude that, as compared with non-primates, the macaque and human brains have a lower proportion of neocortical pyramidal neurons with axon carrying dendrites. The results might be further supported by additional experiments, especially ultrastructural data, or by including more extensive developmental data. There is a section on Development, but there is hardly any Discussion. However, these matters are raised and adequately treated by reference to the existing literature.

      We cannot do EM with frozen material or DEPEX-cleared sections. The developmental aspects have been more extensively discussed now, but we refrained from speculating too much, since we do not have physiological data.

      Reviewer #3 (Public Review):

      The authors used neuroanatomical techniques to study neocortical pyramidal neurons from several different mammalian species. Their message is that primate neocortex differs from that of other mammals in having substantially fewer cells with axons emanating from dendrites, rather than the canonical route from the soma. The authors employed a range of standard methods, ranging from tracer injection to Golgi impregnation to immunocytochemistry. The feature the authors report is undeniable; there clearly are axons that emanate from dendrites of neocortical pyramidal neurons. Prior studies have reported that these axons are more excitable, thus leading to the intriguing possibility of a fundamental architectural (and thus presumably functional) feature in how primate neocortex operates.

      This is a provocative narrative, that leads to a number of interesting questions. However, I have reservations that the authors must address before I believe the claim that primates are really fundamentally different from other mammals in this respect. A strength but also a central limitation of this study is that different species were compared using different methods, and different areas were studied in different species. The authors make the implicit assumption that the prominence of this feature does not differ among cortical areas.

      We initially considered it a strength of the study – looking into many areas with many methods in many species. However, it seemed a bit like cherry-picking, and we have now enlarged the data sets for a more systematic analysis. Please note, we assessed archived material. We are bound to what we have available. We have now delivered areal comparisons, and I am afraid the answer is NO: there are no remarkable differences in the areas that we assessed in monkey and cat.

      However, it is entirely plausible that the proportion of neurons with axon-carrying dendrites does differ among cortical areas. The authors also group neurons into 2 large populations: infra- and supragranular. But again, layers 2 and 3 differ from one another (as do layers 5 and 6) in the specific populations of pyramidal cells they contain (morphological and neurochemical types, inputs and outputs, etc.). Certainly many studies do group neurons into these broad populations, but for this kind of comparison relevant differences or similarities could have been lost. Comparisons among species ideally would have all been in the same layer and area.

      As said, we are bound to what we have available. And this is more than what has ever been published on this question so far. The graph and the Tables to Figure 3B allow species to be compared across the layers.

      We are aware that pyramidal cells in the layers can differ. Looking into RNA seq papers, up to 19 types exist in mouse. How many could potentially then exist in human? There is no way of pulverizing our kind of analysis down to the level of 19 pyramidal cell types differing by some unexplained RNA signatures which so far exist only for mouse. The SMI-32 staining already “selects” for one subtype in that it stains preferentially so-called type 1 pyramidal cells (Molnar et al., 2006).

      Another limitation is that the same method was not employed in different species. The reader needs to know that different methods reveal the same proportion of axon-carrying dendrites in a given area of a certain species. This should have been stated more clearly and earlier in the text; it took examination of the data tables to see this. The tables show that measurements were made in several different cortical areas. Can the authors provide any evidence that the proportion of neurons with axon-carrying dendrites does not differ in any one species among cortical areas?

      We now provide areal comparisons for 5 fields in monkey (new Figure 4A) and visual fields in cat (new Fig. 4B), both with the same methods. We can even provide a within-individual comparison of brain areas and of methods. Another three areal values for the infant macaque have been plotted in Figure 3B.

      Figure 3 description and/or legend needs to state clearly that different species' neocortex was studied in different areas (and if all Fig3 samples shown are from same layers).

      Figure 3A is total cortex, Figure 3B is by layers. Counting strategies are now described in detail in Methods.

      Supplementary Excel file suggests that for humans Golgi-Kopsch reveals fewer infragranular AcD-cells than Golgi-Cox (4.43 vs 1.39), while for adult macaques Golgi-Kopsch revealed fewer than biocytin injection or SMI-32/BetaIV-spectrin immunofluorescence (13.34 vs 7.98 vs 6.29). Since the human data relies on Golgi methods, the authors must reassure the readers that the comparison of species is validated by direct comparison of different methods.

      The message that primates have fewer cells with axon-carrying dendrites than other mammals might therefore certainly be interesting but far less compelling. The message might be that primate neocortex is not qualitatively different from that of other species; instead they simply have somewhat fewer AcD-bearing neurons than other mammalian species. But even that more modest conclusion is suggested but not fully proven by the data here.

      The referee was right on this point. Having doubled our data sets with more human data, we now agree: the Golgi method underestimates the AcD neurons simply because of optical limitations. We now extensively discuss the issue and we no longer do statistical analysis on human. The issue needs further investigation with more methods.

      I was puzzled by Fig 4 not including primate tissue. If the message is that spine density does not differ in dendrites with and without axons, surely it would be important to include primate tissue in this comparison; the comparison between primates and non-primates is after all the core message of this study. I also do not think the values for each species for non-AcD and shared root should be connected by a line; I suggest instead there should simply be a scatter of values for each group with a large symbol indicating mean or median value of each group. This would facilitate comparison.

      First to the graph on spines, now Figure 6. You have to connect the individual neurons by line, otherwise the major point can no longer be seen: the dendrites differ in spine counts, sometimes the AcD is higher than the other basals of the very same neuron, in the next cell the AcD had a lower count. Statistics did not even suggest a trend. We agree that things may differ in immature neurons. Possibly, during early development the AcD gains advantages by means of its higher excitability.

      Please read the methods part on this point: eligible neurons had to fulfil a number of criteria. We fully exploited the available material of rat and ferret; there were no more eligible neurons. We indeed tried the same in macaque. Section thickness was 50 µm. We found exactly two neurons which fulfilled the criteria. We had no chance with this material given the enormous dimension of the pyramidal cell dendritic trees in monkey. They were simply cut. For this type of classical tracing studies, non-alternating section series were prepared and submitted to different types of staining. Section spacing was several hundred µm in each individual. There was no chance to “reconstruct” dendrites from adjacent sections, since there were no adjacent sections.

      The core message of the study is still valid, also without the spine analysis in monkey.

    1. solo thinking is rooted in our lifelong experience of social interaction; linguists and cognitive scientists theorize that the constant patter we carry on in our heads is a kind of internalized conversation. Our brains evolved to think with people: to teach them, to argue with them, to exchange stories with them. Human thought is exquisitely sensitive to context, and one of the most powerful contexts of all is the presence of other people. As a consequence, when we think socially, we think differently—and often better—than when we think non-socially.

      People have evolved as social animals and this extends to thinking and interacting. We think better when we think socially (in groups) as opposed to thinking alone.

      This in part may be why solo reading and annotating improves one's thinking because it is a form of social annotation between the lone annotator and the author. Actual social annotation amongst groups may add additional power to this method.

      I personally annotate alone, though I typically do so in a publicly discoverable fashion within Hypothes.is. While the audience of my annotations may be exceedingly low, there is at least a perceived public for my output. Thus my thinking, though done alone, is accelerated and improved by the potential social context in which it's done. (Hello, dear reader! 🥰) I can artificially take advantage of the social learning effects even if the social circle may mathematically approach the limit of an audience of one (me).

    2. Humans’ tendency to “overimitate”—to reproduce even the gratuitous elements of another’s behavior—may operate on a copy now, understand later basis. After all, there might be good reasons for such steps that the novice does not yet grasp, especially since so many human tools and practices are “cognitively opaque”: not self-explanatory on their face. Even if there doesn’t turn out to be a functional rationale for the actions taken, imitating the customs of one’s culture is a smart move for a highly social species like our own.

      Is this responsible for some of the "group think" seen in the Republican party and the political right? Imitation of bad or counter-intuitive actions outweighs scientifically proven better actions? Examples: anti-vaxxers and coronavirus no-masker behaviors? (Some of this may also be about or even entangled with George Lakoff's (?) tribal identity theories relating to "people like me".)

      Explore this area more deeply.

      Another contributing factor for this effect may be the small-town effect as most Republican party members are in the countryside (as opposed to the larger cities which tend to be more Democratic). City dwellers are more likely to be more insular in their interpersonal relations whereas country dwellers may have more social ties to other people and groups, which may therefore make them more tribal in their social interrelationships. Can I find data to back up this claim?

      How does this link to the thesis put forward by Joseph Henrich in The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous? Does Henrich have data about city dwellers to back up my claim above?

      What does this tension have to do with the increasing (and potentially evolutionary) propensity of humans to live in ever larger and denser cities versus maintaining the smaller numbers typical of the pre-agricultural time period?

      What are the biological effects on human evolution as a result of these cultural pressures? Certainly our cultural evolution is affecting our biological evolution?

      What about the effects of communication media on our cultural and biological evolution? Memes, orality versus literacy, film, radio, television, etc.? Can we tease out these effects within the socio-politico-cultural sphere on the greater span of humanity? Can we find breaks, signs, or symptoms at the border of mass agriculture?


      total aside, though related to evolution: link hypercycles to evolution spirals?

    1. Author Response

      Reviewer #1 (Public Review):

      In the present study, the authors first analyzed simultaneously recorded human EEG-fMRI data and found the fMRI signatures of burst-suppression. Then, they reported such burst-suppression fMRI signatures in the other three species examined: macaques, marmosets, and rats. Interestingly, their results indicated an inter-species difference: the entire neocortex engaged in burst-suppression in rats, whereas most of the sensory cortices were excluded in primates. The fMRI signatures of burst-suppression were confirmed in several species, suggesting that such signature is a robust phenomenon across animals. These findings warrant further investigation into its neural mechanisms and functional implications.

      Major Issues

      1) One of the major findings is that burst-suppression in primates appeared to largely spare sensory cortices, especially V1. However, as seen in the tSNR map for macaques and marmosets (Figure 3 & 4–figure supplement 4), the tSNR around the primary visual cortex was much weaker than other cortices. Moreover, in marmosets, the EPI slices did not cover the entire brain and actually left most of the V1 uncovered as seen in Figure 4. If so, the authors should draw their conclusions very carefully when talking about the differences in V1 across species. It would be better to analyze and discuss how the tSNR differences affect their findings. For example, the authors may consider including the tSNR as a covariate in their map analysis.

      The tSNR in the occipital cortex—especially in the macaque V1—is indeed lower than in more anterior parts of the brain. The higher noise in V1 may have obscured the burst-suppression signal and hindered its detection. That said, we think that burst-suppression would still be detectable at such low tSNR values. We base this claim on our analysis of another macaque brain region—area TE of the inferior temporal cortex (see our additions to Figure 3–figure supplement 4). The tSNR in areas TE and V1 is comparably low, and yet TE is significantly correlated with asymmetric PCs while V1 is not. Therefore, if the burst-suppression fluctuation was present in V1 we should have still detected it.

      Regarding the marmoset data, part of V1 was indeed left out of our field of view, as explicitly shown in our figures (Figure 4 and Figure 4–figure supplement 3). Though we cannot exclude the possibility that the omitted posterior V1 engages in burst-suppression, we think that it is unlikely to behave any differently to more anterior visual areas. We sought more support for this view by obtaining full-brain fMRI data in one additional marmoset. We present this analysis in a new paragraph of the relevant Results section and in the new Figure 4–figure supplement 5. The asymmetric PC map in this individual showed widespread correlation across the neocortex, extending slightly further caudally compared with the group map presented in Figure 4. However, nearly all of V1—including the occipital pole—was still uncorrelated. Considering both the new full-brain marmoset data and the results from area TE in macaques, we think that our conclusion about the uncoupling of primate V1 during burst-suppression is still justified. That said, we have now explicitly included the relevant concerns in the manuscript text.

      2) To confirm their findings, it would be great to look into the EEG signals around the sensory cortex (e.g., V1) to see whether the findings in fMRI could be also confirmed with EEG.

      EEG signals around V1 were already examined during the previous analysis of the human dataset (Golkowski et al., 2017). As reported there, the EEG signal of the occipital electrodes did contain bursts, which could not be differentiated from bursts detected by more anterior electrodes in terms of onset timing, duration, or spectral content. This might mean that the BOLD signal in V1 is truly uncoupled from electrical activity. However, we should also consider that EEG may lack the spatial resolution to detect a different activity originating from V1. As seen in the human map (Figure 3), the external cortical surface is almost exclusively covered with areas engaging in burst-suppression, whereas the ‘uncoupled’ V1 represents a small patch by comparison. Therefore, EEG cannot safely determine the nature of electrical activity in V1. We have added the above arguments to the last section of Results. We expect a conclusive answer to come from future electrophysiological recordings in nonhuman primates. The larger proportional size of visual areas in macaques and marmosets as well as the possibility of invasive intra-cranial recordings make these animals attractive models for addressing this question.

      3) As seen in Figure 2-figure supplement 2, there was a significant anticorrelation with burst-suppression at the ventricular borders. It is unclear whether the authors have done physiological or white matter/CSF/global nuisance regression as most of the rest-fMRI studies did. Please make it clear. If not, please explain why and discuss whether it would affect their results.

      We chose to analyze the data without CSF or global signal regression. CSF regression typically requires extracting the signal of a few voxels within the ventricles. Accurately placing such voxels is feasible in the human brain but challenging in small animal brains, especially in rodents. Rodent ventricles are very thin, making it difficult to place a CSF voxel that will not overlap with surrounding brain tissue. Since we had prioritized making the analysis as similar as possible across species, we decided to also forgo CSF regression in humans. While this was our original motivation for omitting CSF regression, we later came across an even more important concern. As we show in Figure 2–figure supplement 2, the CSF signal is not ‘noise’; rather, it is directly related to burst-suppression, and most likely caused by it. Regressing it out would remove much of the variance explained by burst suppression. The coherence between neural, hemodynamic, and CSF oscillations that we see in burst-suppression likely also occurs in other states characterized by global synchrony, as has been shown for non-rapid eye movement sleep (Fultz et al., 2019).

      We think that global signal regression makes no sense in our case, given that our goal was to study a nearly global signal fluctuation. Global signal regression relies on the assumption that neuronal activity is variable across brain regions while many non-neuronal sources contribute globally to the brain signal (Murphy and Fox, 2017). This assumption does not hold true in cases where the neuronal activity itself is global.

      4) Three different concentrations of the anesthetic sevoflurane were chosen for human participants. The authors found that the high concentration (3.9-4.6%) induced burst-suppression much better than the other two lower concentrations as expected. However, in rats, almost all asymmetric PCs were found at an intermediate concentration (2%) of isoflurane, and fewer at the low (1.5%) or high (2.5%) concentration in Rat 1. At the same time, all fMRI runs from Rat 2 with a 1.3% concentration of isoflurane had a prominent asymmetric PC. That is, it seems that only the high concentration of isoflurane could not induce burst-suppression well in rats, which was opposite to those findings in humans. The authors may explain what reasons may cause such differences and whether such differences may affect the major findings in differences between primates and rodents.

      The three sevoflurane concentrations (‘high’, ‘intermediate’, ‘low’) used in humans do not necessarily correspond to the three isoflurane concentrations used in rats (2.5%, 2.0%, 1.5%). Comparing anesthetic concentrations across our datasets is challenging, since anesthetic potency is expected to vary depending on the drug (sevoflurane or isoflurane), animal species, age, and the co-administration of other drugs. Nevertheless, we may estimate equivalent concentrations across species by expressing them as multiples of the minimum alveolar concentration (MAC), i.e. the concentration that produces immobility in 50% of subjects undergoing a standard surgical stimulus.

      For humans, we can use available age-related MAC charts (Nickalls and Mapleson, 2003) to express the three sevoflurane levels as follows: ~1 MAC (2%), 1.5 MAC (3%), 2.2–2.3 MAC (3.9–4.6%). For rats, we can rely on the previously reported isoflurane MAC value of 1.35% (Criado et al., 2000) to derive the following levels: 1.2 MAC (1.5%), 1.6 MAC (2%), 1.9 MAC (2.5 %), and ~1 MAC (1.3%, Rat 2 dataset). According to these conversions, fMRI-detectable burst-suppression occurred in humans at ~2 MAC (with some cases at 1.5 MAC), in the Rat 1 dataset at 1.2–1.6 MAC, and in the Rat 2 dataset at 1 MAC. There seems to be a difference between rats and humans as well as a discrepancy between the two rat datasets. The latter discrepancy could have arisen from differences in the calibration of isoflurane vaporizers at the two research sites (direct measurements of end-tidal anesthetic concentration were not obtained in rats).
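      The conversion described above is a simple ratio of the delivered concentration to the species-specific MAC. The following is a minimal sketch of that arithmetic, not the authors' code: the function name `mac_multiple` and the variable names are hypothetical, and only the rat isoflurane MAC of 1.35% and the four rat concentrations are taken from the response. The plain ratio need not reproduce the quoted multiples exactly, since any rounding or adjustment applied in the response is not stated here.

```python
def mac_multiple(concentration_pct, mac_pct):
    """Express an anesthetic concentration (in %) as a multiple of MAC (in %)."""
    if mac_pct <= 0:
        raise ValueError("MAC must be a positive percentage")
    return concentration_pct / mac_pct


# Rat isoflurane MAC as cited in the response (Criado et al., 2000).
RAT_ISOFLURANE_MAC = 1.35

# Isoflurane levels used in the two rat datasets (Rat 1: 1.5, 2.0, 2.5; Rat 2: 1.3).
rat_levels_pct = [1.5, 2.0, 2.5, 1.3]

# Map each delivered concentration to its multiple of MAC, rounded to two decimals.
rat_multiples = {
    level: round(mac_multiple(level, RAT_ISOFLURANE_MAC), 2)
    for level in rat_levels_pct
}

for level, multiple in rat_multiples.items():
    print(f"{level:.1f}% isoflurane is about {multiple:.2f} MAC")
```

      The same function would apply to any species for which a MAC value is available, which is precisely what is missing for marmosets.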

      In order to better interpret the observed human-rat difference we tried to also compute the multiples of MAC values for our nonhuman primate data, but this proved to be hard. For common marmosets, we are not aware of any published isoflurane MAC values. For long-tailed macaques, a value of 1.28% has been reported (Tinker et al., 1977), which gives a range of 0.7 – 1.2 MAC for our macaque dataset. However, that probably underestimates the actual depth of anesthesia in our experiments, since many of our macaques were old and MAC is known to decrease with age (Nickalls and Mapleson, 2003). Moreover, the administration of medetomidine during anesthesia induction may have further reduced the MAC (Ewing et al., 1993). Consequently, we cannot provide good MAC estimates for the nonhuman primate data and thus have no reference for comparison with other species.

      Even if we knew the correct MAC value in all cases, it may be an inappropriate means of standardizing anesthetic concentrations for burst-suppression. The endpoint measured by MAC—immobility—is mainly mediated by anesthetic effects on the spinal cord and may not be a good predictor for effects on the brain (Rampil et al., 1993). In fact, burst-suppression itself has been proposed as a more appropriate endpoint for measuring anesthetic potency. The proposed metric (MACBS) is defined as the concentration that produces suppressions longer than 1 s in 50% of subjects and is not linearly related to MAC (Pilge et al., 2014).

      In conclusion, if we reference anesthetic concentrations against the MAC, humans and rats indeed seem to exhibit burst-suppression at different concentration ranges. We are unable to perform the same referencing for non-human primates, due to lack of accurate MAC values. Moreover, it is unclear whether MAC is the appropriate reference to begin with. Discussing all these nuances would make the manuscript too long. That said, we have now added a new paragraph to the Discussion section, drawing attention to the fact that anesthetic concentrations are not standardized across species.

      Reviewer #2 (Public Review):

      The strong point in their manuscript is the originality of their results. Using the fMRI's spatial resolution, they can successfully reveal that not all brain areas are synchronized during the burst suppression. Furthermore, they can find that the difference is the most obvious when comparing primates with rats, which makes sense considering the distance on the phylogenetic tree. As far as I know, this manuscript first reports these points.

      On the other hand, there is a weak point in their method. As they've already discussed this point, they needed to use arbitrary thresholds to evaluate whether there is burst suppression or not. Furthermore, this study cannot reject the possibility of spatial inhomogeneity and/or anesthesia-specific modulation in hemodynamic response. If there is such a mechanism, one can find different results from those obtained through electrical measurements.

      1) The authors found that some sensory areas in primates are excluded from those highly synchronized during the burst suppression. While it is true, I wonder if each voxel in such areas shows burst suppression-like activity that is not synchronized with others. If this is the case, burst suppression can still be a global phenomenon. Though authors seem to investigate this point, they used in-ROI averaged time-series so that it cannot reject the possibility that each voxel inside the ROI is not synchronized but shows burst suppression in its manner. I recommend the authors look into each voxel if this is the case or not.

      The reviewer raises an interesting point by proposing that it is possible for sub-regions within the excluded areas—e.g. within V1—to exhibit burst-suppression out-of-phase with each other, thus cancelling out in the mean V1 BOLD signal. We do not think this is the case, for several reasons. Firstly, we can exclude the possibility that any part of V1 exhibits burst-suppression in-phase with the rest of the cortex. The original first-level GLM analysis was a voxel-based univariate analysis. If any voxels within V1 were correlated with the global burst-suppression pattern, we would have seen it on the maps. We saw no such effect, except for some subjects in which a subset of V1 voxels was anti-correlated with the asymmetric PC (the effect was not significant in our group analysis). This anticorrelation was mostly located close to the ventral horns of the two lateral ventricles, and thus could have arisen by the same cycle of ventricular shrinkage-expansion that we describe in Figure 2–figure supplement 2. Secondly, no large clusters of V1 voxels exhibited burst-suppression out-of-phase with the dominant asymmetric PC. If this was the case, we would have seen a phase-shifted version of the fluctuation on the carpet plots. This still leaves the theoretical possibility that individual V1 voxels (or a few at a time) exhibit transitions between burst and suppression epochs out-of-phase with each other. In our response to the next point, we will explain why there is no way of detecting this with fMRI and we discuss whether such a possibility would even fit the label of burst-suppression.

      2) The other but similar point is about their way to detect burst suppression. Why did they use the principal component? By definition, burst suppression should be defined by the existence of burst and suppressed periods. I cannot understand why they did not simply use this definition to check whether each voxel shows such an intermittent activity to evaluate whether it is a global phenomenon or not.

      Burst-suppression on EEG is characterized by quasi-periodic suppressions of activity, during which the EEG signal drops close to being isoelectric. We cannot apply the same definition to fMRI, because the BOLD signal only represents relative changes and thus has no natural baseline equivalent to isoelectricity. Hence there is no way of telling whether a BOLD signal decrease corresponds to a complete activity cessation (suppression) or simply a relative decline. Therefore, we instead decided to rely on another defining feature of burst-suppression—synchrony. We knew that burst-suppression appears simultaneously across EEG electrodes, which means that large parts of the cortex (the major contributor to EEG signal) would have to be synchronized. Moreover, we knew that transitions between burst and suppression epochs occur on a very slow timescale and would be resolvable at a TR of 2 s. PCA allowed us to isolate the large slow synchronous component in the cortical BOLD signal, though this is hardly the only approach that would work. We chose PCA because it is a simple, deterministic, and easily interpretable algorithm.
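      The idea of isolating a large, slow, synchronous component with PCA can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the authors' pipeline: the function name `first_pc_correlations`, the square-wave "burst-suppression" signal with a 120 s period, the noise level, and the split into 80 coupled and 20 uncoupled voxels are all made up for illustration. Only the TR of 2 s comes from the response. It extracts the first principal component of a voxels-by-time matrix and correlates every voxel with it.

```python
import numpy as np


def first_pc_correlations(bold):
    """Return the first temporal PC and each voxel's Pearson r with it.

    bold: array of shape (n_voxels, n_timepoints).
    """
    # Centre each voxel's time series (voxels are the PCA features).
    centered = bold - bold.mean(axis=1, keepdims=True)
    # With voxels in rows, the right singular vectors are temporal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    pc1_centered = pc1 - pc1.mean()
    # Pearson correlation between every voxel and the first component.
    numerator = centered @ pc1_centered
    denominator = np.linalg.norm(centered, axis=1) * np.linalg.norm(pc1_centered)
    return pc1, numerator / denominator


# Synthetic example (all values hypothetical).
rng = np.random.default_rng(0)
n_timepoints = 300
time_s = np.arange(n_timepoints) * 2.0             # TR of 2 s
slow = np.sign(np.sin(2 * np.pi * time_s / 120.0))  # slow on/off fluctuation

coupled = slow + 0.5 * rng.standard_normal((80, n_timepoints))  # share the fluctuation
uncoupled = 0.5 * rng.standard_normal((20, n_timepoints))       # noise only
bold = np.vstack([coupled, uncoupled])

pc1, r = first_pc_correlations(bold)
# The sign of a principal component is arbitrary, so compare magnitudes.
print("mean |r|, coupled voxels:  ", np.abs(r[:80]).mean())
print("mean |r|, uncoupled voxels:", np.abs(r[80:]).mean())
```

      In this toy case the voxels that share the slow fluctuation correlate strongly with the first component while the noise-only voxels do not, which mirrors the logic of mapping which regions take part in the global fluctuation.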

      On a related note, even if we could identify complete cessation of activity in the BOLD signal of a single voxel, it is unclear whether that would qualify as burst-suppression per the EEG definition. EEG electrodes pick up activity from areas much larger than a voxel, and thus the very presence of an EEG fluctuation presupposes synchrony on a larger spatial scale. If individual voxel-sized brain areas engaged in burst-suppression out-of-phase, that would probably not register as burst-suppression on an EEG electrode.

      3) Why is there no synchronization during the slow-wave states under light anesthesia? During the slow-wave sleep, it is shown that the entire cortical network is decomposed into a modular-like network structure. Is there synchronization inside each module while no synchrony between modules?

      We do not claim that there is no synchrony in the slow-wave state. We simply state that this state lacks the nearly global cortex-wide fluctuation that is produced by the abrupt transitions between burst and suppression epochs. In fact, the very presence of slow waves on EEG requires synchrony. However, this slow-wave synchrony occurs at a timescale too fast for fMRI to capture, and thus would not directly translate into a global BOLD fluctuation, as burst-suppression does.

      Though the slow-wave state lacks global synchrony on fMRI, it may well exhibit within-module synchrony, as the reviewer suggests. Modules resembling the resting-state networks of wakefulness and sleep have been detected during isoflurane anesthesia in primates (Hori et al., 2020; Hutchison et al., 2011). These experiments were presumably conducted during the slow-wave state: burst-suppression would generate a global network, while the isoelectric state would erase any modular structure. We suspect that functional networks during the anesthetized slow-wave state resemble those present in slow-wave sleep. However, we have not assessed that in our study, since our primary goal was to map burst-suppression.

      Reviewer #3 (Public Review):

      The authors present a multicenter, multimodal rs-fMRI study of the spatial signature of burst suppression in the brain of humans, non-human primates and rats. They have used EEG to identify burst suppression activity in human data from simultaneous EEG-rs-fMRI measurements of subjects under sevoflurane anesthesia. After having identified a (neurovascular) rs-fMRI representation of burst activity, the authors show that bursts can equally be identified from MR data alone. After a principal component analysis, bursts and their spatial signature were identified by an asymmetry of the correlation coefficients. Across species the authors identified similar spatial signatures, which were conserved for all (investigated) primates, but differed for rats. While rats showed a pan-cortical involvement, signatures in primates were more complex, e.g., not including the visual cortex.

      In this study, the authors have presented a novel purely MR-based method to identify burst suppression and its spatial signature. Their method may be used to readily identify burst suppression in fMRI data. However, no general threshold for the median of the cortex-wide correlation could be identified. The authors also establish a conserved signature of burst suppression for primates and reveal subtle but important differences to rodents. Both achievements are novel and represent a major advance in the field of neuroimaging.

      The study was well designed, including important control data to rule out artefacts as source of the observed burst suppression patterns. The particular strengths of this study are: (1) including multicentre data (although only rats were scanned at two different sites); and (2) including four species from humans to rats.

      The manuscript was very carefully and well written (I did not even notice a single typo) and the figures were carefully devised, comprehensively illustrating the large amount of data. The authors further provide a comprehensive account of the relevant literature. Towards the end of their discussion they also clarify the difference in terminology used for burst suppression in some recent rodent studies.

      The only (and in my opinion notable) weakness is the lack of a general threshold for the asymmetry of the median of the cortex-wide correlation coefficients. With such a threshold, rs-fMRI could be readily used to automatically detect burst suppression across species. However, the authors clearly state this shortcoming and openly discuss its implications. I do not think that an altered experimental design or additional data could provide further remedy.

      To conclude: This very comprehensive study was very well designed, extremely carefully performed, presents a novel tool for identification of burst suppression, and provides insight across species. It clearly has translational potential, which, however, is limited by the lack of a general threshold for burst suppression detection.

      I congratulate the authors for this very nice piece of work, and the most typo-free manuscript I have ever read.

      We thank the reviewer for the positive and detailed feedback.

    1. Author Response

      Reviewer #1 (Public Review):

      When theta phase precession was discovered (O'Keefe & Recce, 1993; place cell firing shifting from late to early theta phases as the rat moves through the firing field, averaged over many runs), it was realized that, correspondingly, firing moves from cells with firing fields that have been run through (early phase) to those whose fields are being entered (late phase), with the consequence that a broader range of cells will be firing at this late phase (Skaggs et al., 1996; Burgess et al., 1993; see also Chadwick et al., 2015). Thus, these sweeps could represent the distribution of possible future trajectories, with the broadening distribution representing greater uncertainty in the future trajectory.

      Using data from Pfeiffer and Foster (2013), they examine how neurons could encode the distribution of future locations, including its breadth (i.e. uncertainty), testing a couple of proposed methods and suggesting one of their own. The results show that decoded location has increasing variability at later phases (corresponding to locations further ahead), and greater deviation from the actual trajectory. Further results (when testing the models below) include that population firing rate increased from early to late phases; decoding uncertainty does not change within-cycle, and the cycle-by-cycle variability (CCV) increases from early to late phases more rapidly than the trajectory encoding error (TEE).

      They then use synthetic data to test ideas about neural coding of the location probability distribution, i.e. that: a) place cell firing corresponds to the tuning functions on the mean future trajectory (w/o uncertainty); b) the distribution is represented in the immediate population firing as the product of the tuning functions of active cells or c) (DDC) the distribution is represented by its overlap with the tuning curves of individual neurons; d) (their suggestion) that different possible trajectories are sampled from the target distribution in different theta cycles.

      The product scheme has decreasing uncertainty with population firing rate, so would have to have maximal firing at early phases (corresponding to locations behind the rat), contradicting what was observed in the data, so this scheme is discarded.

      The DDC scheme has an increased diversity of cells firing as the target distribution gets wider within each cycle, whereas the mean and sampling schemes do not have increasing variance within-cycle (representing a single trajectory throughout). The decoding uncertainty in the data did not vary within-cycle, so the DDC scheme was discarded.

      The mean and sampling schemes are distinguished by the increase in CCV vs TEE with phase, which is consistent with the sampling scheme.

      The analyses are well done and the results with synthetic data (assuming future trajectories are randomly sampled from the average distribution) and real data match nicely, although there is excess variability in the real data. Overall, this paper provides the most thorough analyses so far of place cell theta sweeps in open fields.

      We thank the Reviewer for the accurate summary and the encouragement.

      I found the framing of the paper confusing in a way that made it harder to understand the actual contribution made here. As noted in the discussion, the field has moved on from the 1990s and cycle-by-cycle decoding of theta sweeps has consistently shown that they correspond to specific trajectories moving from the current trajectory to potential future trajectories, consistent with continuous attractor-based models (in which the width of the activity bump cannot change, e.g. Hopfield, 2010). Thus it seems odd to use theta sweeps to test models of encoding uncertainty - since Johnson & Redish (2007) we know that they seem to encode specific trajectories (e.g. either going one way or the other at a choice point) rather than an average direction with variance covering the possible alternatives.

      We thank the reviewer for emphasising the connections to earlier work on theta sweeps during decision making, which suggests that alternative options before a decision point are assessed individually by hippocampal neuron populations in a simple maze. However, as also noted by the reviewer below, previous analyses of theta sweeps in the hippocampus were limited to discrete decisions in a linear maze, which only permits a limited exploration of the alternative hypotheses an animal might experience in a planning situation.

      In particular, the dominant source of future uncertainty in a binary decision task is the chosen option (left or right), providing a distinctly bimodal predictive distribution. Bimodal distributions cannot be easily approximated by variational methods (which include the DDC and product schemes) but can be efficiently approximated by sampling. In contrast, in an open field the available options (changes in direction and speed) are not restricted by the geometry of the environment, and the predictive distribution is relatively similar to a Gaussian distribution, which can be efficiently approximated by all of the investigated encoding schemes.

      Moreover, it has been widely reported that the hippocampal spatial code has somewhat different properties in linear tracks, where the physical movement of the animal is restricted by the geometry of the environment, than in open field navigation. Specifically, in linear tracks most neurons develop unidirectional place fields and the hippocampal population uses different maps to represent the two opposite running directions, whereas a single map and omnidirectional place fields are used in open fields (Buzsaki, 2005). In terms of representing future alternatives, it remains an open question whether the scheme that is compatible with planning in a 1D environment generalises to 2D environments. Our detailed comparison of the alternative encoding schemes provides an opportunity to demonstrate that a sampling scheme can be applied as a general computational algorithm to represent quantities necessary for probabilistic planning, while also demonstrating that alternative schemes are incompatible with it.

      Moreover, these previous studies did not rule out the possibility that, in addition to alternating between discrete options, specific features of the population activity might also represent uncertainty (conditional to the chosen option) instantaneously as in the product or the DDC schemes.

      We added a new paragraph (lines 74-88) to the introduction to clarify that one of the novel contributions of the paper is the generalisation of previous intuitions, largely based on work on binary decision tasks in mazes, to unrestricted open field environments.

      The point that schemes that assume varying-width activity distribution might be unfit for modelling hippocampal theta activity is an interesting insight. Let us note that new results have pointed out that the fixed-width activity bump is not a necessary feature of attractor networks. It has recently been shown that in continuous attractors (modelling head direction cells in the fly) the amplitude of the bump can change and the changes can be consistent with the represented uncertainty (Kutschireiter et al., 2021 bioRxiv; https://doi.org/10.1101/2021.12.17.473253). We believe that similar principles also apply to higher-dimensional continuous attractor networks and therefore it is entirely possible to represent uncertainty via the amplitude of the bump (equivalent to the population gain) in the hippocampus.

      Thus, the main outcomes of the simulations could reasonably be predicted in advance, and the possibility of alternative neural models of uncertainty explaining firing data remains: in situations where it is more reasonable to believe that the brain is in fact encoding uncertainty as the breadth of a distribution.

      Having said that, most previous examples of trajectory decoding of theta sweeps have not been for navigation in open fields, and the analysis of Pfeiffer and Foster (2013; in open fields) was restricted to sequential 'replay' during sharp-wave ripples rather than theta sweeps. This paper provides the nicest decoding analyses so far of place cell theta sweeps in open field data. However, there are already examples of theta sweeps in entorhinal cortex in open fields (Gardner et al., 2019) showing the same alternating left/right sweeps as seen on mazes (Kay et al., 2020). Such alternation could explain the additional cycle-by-cycle variability observed (cf random sampling).

      We thank the reviewer for encouraging us to more directly test the idea that alternating left/right sweeps could explain the increased cycle-to-cycle variability in the data. We thoroughly analysed the data (see our answer to essential revisions 1.) and found that trajectories at subsequent theta cycles are strongly anticorrelated (Fig. 7, Fig. S11, lines 375-415).

      Reviewer #2 (Public Review):

      This study investigates how uncertainty about spatial position is represented in hippocampal theta sequences. Understanding the neural coding of uncertainty is an important issue in general, because computational and theoretical work clearly demonstrates the advantages of tracking uncertainty to support decision-making, behavioural work in many domains shows that animals and humans are sensitive to it in myriad ways, and signatures of the neural representations of uncertainty have been demonstrated in many different systems/circuits.

      We thank the reviewer for the comment.

      However, whether and how uncertainty is signalled in the hippocampus has remained understudied. The question of how spatial uncertainty is represented is already interesting, but recent interest in interpreting hippocampal sequences as important for planning and decision-making provides additional motivation.

      A variety of experimental paradigms such as recordings in light vs. darkness, dual rotation experiments in which different cues are placed in conflict with another, "morph" and "teleportation" experiments and so on, all speak to this issue in some sense (and as I note below, could nicely complement the present study); and a number of computational models of the hippocampus have included some representation of uncertainty (e.g. Penny et al. PLoS Comp Biol 2013, Barron et al. Prog Neurobiol 2020). However, the present study fills an important gap in that it connects a theory-driven approach of when and how uncertainty could be represented in principle, with experimental data to determine which is the most likely scheme.

      The analyses rely on the fundamental insight that states/positions further into the future are associated with higher uncertainty than those closer to the present. In support of this idea, the authors first show that in the data (navigation in a square environment, using the wonderful data from Pfeiffer & Foster 2013), decoding error increases within a theta sequence, even after correcting for the optimal time shift.

      The authors then lay out the leading theoretical proposals of how uncertainty can be represented in principle in populations of neurons, and apply them to hippocampal place cells. They show that for all of these schemes, the same overall pattern results. The key advance of the paper seems to be enabled by a sophisticated generative model that produces realistic probability distributions to be encoded (that take into account the animal's uncertainty about its own position). Using this model, the authors show that each uncertainty coding scheme is associated with distinct neural signatures that they then test against the data. They find that the intuitive and commonly employed "product" and "DDC" schemes are not consistent with the data, but the "sampling" scheme is.

      The final conclusion that the sampling scheme is most consistent with the data is perhaps not surprising, because similar conclusions have been reached from showing alternating representation of left and right at choice points cited by the authors (Johnson and Redish 2007; Kay et al. 2020; Tang et al. 2021) and "flickering" from one theta cycle to the next (Jezek et al. 2011). So, the most novel parts of the work to me are the rigorous ruling out of the alternative "product" and "DDC" schemes.

      We thank the reviewer for helping us to clarify the main novelty of our work compared to previous studies. We have updated the introduction (lines ~74–88) to state more clearly how our analysis extends previous work largely restricted to binary decision tasks in mazes and not explicitly considering alternative probabilistic representations.

      Overall I am very enthusiastic about this work. It addresses an important open question, and the structure of the paper is very satisfying, moving from principles of uncertainty encoding to simulated data to identifying signatures in actual data. In this structure, the generative model that produces the synthetic data is clearly playing an important role, and intuitively, it seems the conclusions of the paper depend on how well this testbed maps onto the actual data. I think this model is a real strength of the paper and moves the field forward in both its conceptual sophistication (taking into account the agent's uncertainty) and in how carefully it is compared to the actual data (Figures S2, S3).

      We thank the reviewer for the encouraging words.

      I have two overall concerns that can be addressed with further analyses.

      First, I think the authors should test which of the components of this model are necessary for their results. For instance, if the authors simply took the successor representation (distribution of expected future state occupancy given current location) and compressed it into theta timescale, and took that as the probability distribution to be encoded under the various schemes, would the same predictions result? Figuring out which elements of the model are necessary for the schemes to become distinguishable seems important for future empirical work inspired by this paper.

      The crucial part of our generative model is its probabilistic nature. Explicit formulation of the generative model under different coding schemes enables us to quantitatively account for the different factors contributing to the variability in the data. Specifically, when we compared sampling and mean codes, we partitioned variability of the represented locations across theta cycles into specific factors related to 1) decoding error; 2) difference between the true position of the animal and its own location estimate; 3) the animal’s own uncertainty about its spatial location; 4) updating this estimate in each theta cycle. This enabled us to derive quantities (CCV, TEE and EVindex) that can discriminate between sampling and mean schemes, and that could be directly measured experimentally. This would not be possible in a simpler model lacking an explicit representation of the animal’s internal uncertainty.
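
      To make the distinction between the mean and sampling schemes concrete, here is a deliberately simplified one-dimensional toy simulation. It is not the generative model of the manuscript: it keeps only decoding noise and a calibrated posterior of fixed width `sigma`, holds the posterior mean fixed across cycles, and uses plain mean squared distances as stand-ins for CCV and TEE. The function name `simulate` and all parameter values are assumptions for illustration.

```python
import numpy as np

def simulate(scheme, sigma, decode_sd=1.0, n_cycles=20000, seed=0):
    """Toy 1D comparison of 'mean' vs 'sampling' encoding at one theta phase.

    sigma     : width of the (assumed Gaussian, calibrated) posterior at this phase
    decode_sd : standard deviation of additive decoding noise
    Returns simplified stand-ins for (CCV, TEE); these are not the paper's estimators.
    """
    rng = np.random.default_rng(seed)
    mu = 0.0                                          # posterior mean, held fixed for simplicity
    true_future = rng.normal(mu, sigma, n_cycles)     # truth is distributed as the posterior
    if scheme == "mean":
        encoded = np.full(n_cycles, mu)               # every cycle encodes the posterior mean
    elif scheme == "sampling":
        encoded = rng.normal(mu, sigma, n_cycles)     # every cycle encodes a fresh sample
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    decoded = encoded + rng.normal(0.0, decode_sd, n_cycles)
    ccv = np.mean((decoded[1:] - decoded[:-1]) ** 2)  # variability between consecutive cycles
    tee = np.mean((decoded - true_future) ** 2)       # error relative to the true position
    return ccv, tee

ccv_mean, tee_mean = simulate("mean", sigma=3.0)
ccv_samp, tee_samp = simulate("sampling", sigma=3.0)
print(f"mean scheme:     CCV={ccv_mean:.1f}, TEE={tee_mean:.1f}")
print(f"sampling scheme: CCV={ccv_samp:.1f}, TEE={tee_samp:.1f}")
```

      In this toy setting the cycle-to-cycle variability of the mean scheme reflects decoding noise only, whereas under sampling it also grows with the posterior width. The additional factors listed above (mismatch between true and estimated position, and updating of the estimate in each cycle) are deliberately left out.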

      We believe that the assumptions of the model are rather general and those do not limit the scope of the model. Here we list the specific features of the model for clarity (Fig S1a):

      1) Planned position (Fig S1a, left): the planned position is required to guide movements in the model. The specific way we generated the planned position was not essential for the simulations but we tuned the movement parameters to generate trajectories matching the real movement of the animal. It is defined as a random walk process for velocity which is the simplest model for smooth trajectories.

      2) The inference part (Fig S1a, middle) is crucial for the model since we believe that hippocampal population activity is driven by the animal’s own beliefs about its position, which tells our approach apart from earlier studies (see paragraph around line 466). If the animal represents its predictions optimally then the predictions should be consistent with its movement within the environment. Thus, the consistency of the inference is a critical statistical property of the model, which can be guaranteed if the predictions are generated by the same model that is used for inferring the animal’s position. The simplest model that can be used for inference and predictions is the Kalman filter, which we opted for in our simulations.

      3) The assumptions of the encoding model (Fig S1a, right and Fig 1b) are solely determined by the representational scheme being tested. All of the schemes rely on encoding the result of inference in population activity during theta cycles and the scheme determines how this encoding happens. This part of the model is clearly necessary for the analysis.
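
      The inference component described in point 2 can be sketched with a textbook Kalman filter. The example below is a one-dimensional simplification with a random-walk velocity and noisy position observations; the time step, noise variances, and function names (`kalman_step`, `predict_ahead`) are assumptions chosen for illustration and are not the parameters of the model in the manuscript. It also shows that running the same model forward without observations yields predictions whose uncertainty grows with look-ahead.

```python
import numpy as np

def kalman_step(mu, P, y, F, Q, H, R):
    """One predict-and-update step of a linear Kalman filter."""
    mu_pred = F @ mu
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    mu_new = mu_pred + K @ (y - H @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ H) @ P_pred
    return mu_new, P_new

def predict_ahead(mu, P, F, Q, n_steps):
    """Propagate the belief forward without observations; return position variances."""
    variances = []
    for _ in range(n_steps):
        mu = F @ mu
        P = F @ P @ F.T + Q
        variances.append(P[0, 0])
    return np.array(variances)

# Assumed parameters: state = [position, velocity], velocity follows a random walk.
dt, q, r = 0.1, 0.05, 0.5
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = np.diag([0.0, q])                             # process noise enters through velocity only
H = np.array([[1.0, 0.0]])                        # only position is observed
R = np.array([[r]])

rng = np.random.default_rng(1)
state = np.zeros(2)
mu, P = np.zeros(2), np.eye(2)
for _ in range(200):
    state = F @ state + np.array([0.0, rng.normal(0.0, np.sqrt(q))])
    y = H @ state + rng.normal(0.0, np.sqrt(r), size=1)
    mu, P = kalman_step(mu, P, y, F, Q, H, R)

lookahead_var = predict_ahead(mu, P, F, Q, n_steps=10)
print(np.round(lookahead_var, 3))                 # position variance for 1..10 steps ahead
```

      Using the same matrices `F` and `Q` for filtering and for prediction is what keeps the predictions statistically consistent with the inferred position, which is the property emphasised in point 2.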

      Alternatively, we could use the above mentioned successor representation (SR) framework (Dayan 1993) to represent possible trajectories and their associated uncertainty in our models of hippocampal population activity. However, this option introduces extra challenges: First, in the SR framework (Stachenfeld et al., 2017) neuronal firing rates are proportional to the discounted expected future number of times a particular location is going to be visited given the current policy and position. Thus, the SR does sum over all possible future visits and does not specify when exactly a particular state might be reached in the future, which is inconsistent with the idea that trajectories are represented during theta sequences. Second, the SR represents the probability of occupying all future states in parallel without providing possible trajectories defining specific combinations of future state visits. This property is consistent with the product and the DDC encoding schemes but not with the other two. These two properties of the SR imply that this framework per se does not provide a fine-scale temporal description of how expected future state probabilities are related to the dynamics of the hippocampal population activity during theta oscillation.
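
      For reference, the standard definition of the SR makes the first point explicit. The notation below is an assumption introduced here (it does not appear in the manuscript): $T$ is the state-transition matrix under the current policy and $\gamma \in [0, 1)$ is the discount factor.

```latex
M(s, s') \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}[s_t = s'] \;\middle|\; s_0 = s \right],
\qquad
M \;=\; \sum_{t=0}^{\infty} \gamma^{t}\, T^{t} \;=\; (I - \gamma T)^{-1}.
```

      Because the sum runs over all future time steps $t$, the entry $M(s, s')$ retains how often $s'$ is expected to be visited but not when, which is the loss of temporal information referred to above.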

      Taken together, implementing theta time-scale dynamics using the SR framework would also require several additional model choices to generate consistent temporal trajectories from the expected future state occupancies, and even in this case the subjective uncertainty of the animal would not be consistently represented in the simulated data. Representing the animal’s subjective uncertainty in our model was an important component in contributing to the EV-index and had profound implications on the signatures of generative cycling in a two dimensional arena.

      We have to note that on a slower time scale (calculating the average firing rate over multiple theta cycles) all of our encoding schemes are consistent with the SR framework (line 548).

      Second, the analyses are generally very carefully and rigorously performed, and I particularly appreciated how the authors addressed bias resulting from noisy estimation of tuning curves (Figure S7). However, the conclusion that the "sampling" scheme is correct relies on there being additional variance in the spiking data. This is reminiscent of the discussions about overdispersion and how "multiple maps" account for it (Jackson & Redish Hippocampus 2007, Kelemen & Fenton PLoS Biol 2010), and the authors should test if this kind of explanation is also consistent with their data. In particular, the task has two distinct behavioral contexts, when animals are searching for the (not yet known) "away" location compared to returning to the known home location, which extrapolating from Jackson & Redish, could be associated with distinct (rate) maps leading to excess variance.

      We thank the reviewer for this constructive comment. We note that the signature of the sampling scheme is variability in the decoded trajectory across subsequent theta cycles while overdispersion is usually defined as the supra-Poisson variability in the spiking of individual neurons evaluated across multiple runs or trials. Nevertheless, we tested the existence of multiple maps corresponding to the two distinct task phases and found that the maps representing the two task phases are very similar (Fig S11).

      Such an analysis could also potentially speak to an overall limitation of the work (not a criticism, more of a question of scope) which is that there are no experimental manipulations/conditions of different amounts of uncertainty that are analyzed. Comparing random search (high uncertainty, I assume) to planning a path to a known goal (low uncertainty) could be one way to address this and further bolster the authors' conclusions.

      We agree with the reviewer that the proposed framework provides additional insights into the way the population activity should change with specific experimental manipulations and can therefore inspire further experiments. In particular, a hallmark of probabilistic computations is that experimental manipulations that control the uncertainty of the animal should be reflected in population responses. In visual processing such manipulations are indeed reflected in changing response variability, as predicted by sampling (Orban et al, Neuron 2016). In the current experimental paradigm there was no direct manipulation of uncertainty (we discuss this around lines 573-576). While one might argue that there are differences in the planning strategy in trials where the animal was heading for away reward and in those heading for home, this is not a very explicit test of the question. Still, to check if we can find traces of changes in uncertainty in the two conditions, we analysed the EV-index separately on home and away trials (Fig. S11e). We did not find systematic differences in the EV-index across these trial types.

      Reviewer #3 (Public Review):

      Summary of the goals:

      The authors set out to test the hypothesis that neural activity in hippocampus reflects probabilistic computations during navigation and planning. They did so by assuming that neural activity during theta waves represents the animal's location, and that uncertainty about this location should grow along the path from the recent past to the future. They next generated empirical signatures for each of the main four proposals for how probabilities may be encoded in neural responses (PPC, DDC, Sampling) and contrasted them with each other and a non-probabilistic representation (scalar estimate of location). Finally, the authors compared their predictions to previously published neural activity and concluded that a sampling-based representation best explained neural activity.

      Impact & Significance: This manuscript can make a significant impact on many fields in neuroscience from hippocampal research studying the functions and neural coding in hippocampus, through theoretical works linking the representation of uncertainty to neural codes, to modeling experimental paradigms using navigation tasks. The manuscript provides the following novel contribution to cognitive neuroscience:

      • It exploits the inherent change in uncertainty about a parsimonious internal variable over time during planning to test hypotheses about probabilistic computations.
      • A full model comparison of competing hypotheses for the neural implementation of probabilistic beliefs. This is a topic of wide interest and direct comparisons using data have been elusive.
      • The study presents substantial empirical evidence for a sampling-based neural representation of the probability distribution over trajectories in the hippocampus, a finding with potential implications for other parts of neural processing.

      Strengths:
      • Creative exploitation of a naturally occurring change in uncertainty over a parsimonious latent variable (location).
      • Derivation of three empirical signatures using a combination of analytical and numerical work.
      • Novel computational modelling & linking it to neural coding using 4 existing implementational models
      • Comprehensive and rigorous data analysis of a large and high-quality neural dataset, with supplemental analyses of a second dataset
      • Mostly very clear and high quality presentation

      We thank the Reviewer for the summary and for the positive feedback on the manuscript.

      Weaknesses:
      • It is unclear to what degree the "signatures" depend on the details of the numerical simulation used by the authors to generate them. At least two of them (gain for the product scheme and excess variability for the sampling scheme) appear very general, but the degree of robustness should be discussed for all three signatures.

      The generality of the signatures follows from the fact that we derived them from the fundamental properties of the encoding schemes. We tested their robustness using both idealised test data (Fig. S6c-d, Fig. S7b) and our simulated hippocampal model (Fig. 4c, Fig. 5b-c, Fig. 6b-g).

      The reviewer is right that sensitivity and robustness are potential issues. These schemes were originally proposed to encode static distributions, i.e., the neuronal activity was supposed to encode a specific probability distribution for an extended period of time. Therefore, when we test the signatures we make the simplifying assumption that a static distribution is encoded in the three separate phases of the theta cycle. It is currently unknown whether during theta sequences the trajectories are represented via discrete jumps in positions or as continuously changing locations. Therefore we used our numerical simulations to test whether the proposed signatures are sufficiently sensitive to discriminate the encoding schemes using the limited amount of data available and in the face of biological noise, but also robust to the parameter choices and modelling assumptions.

      Regarding the product code, the inverse relationship between the gain and the variance has been previously derived analytically for special cases (Ma et al., 2006). In the manuscript we show numerically that the same relationship holds for general tuning curve shapes (Fig. S6d). Finally we demonstrate that the gain is a robust signature that changes systematically along the theta cycles in the case of a product coding scheme.
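
      The special case mentioned above can be written out as a short worked derivation. The assumptions are introduced here for illustration: independent Poisson neurons with Gaussian tuning curves $f_i(x) = g \exp\!\big(-(x - c_i)^2 / 2\sigma_{tc}^2\big)$ of common width $\sigma_{tc}$ and gain $g$, centres $c_i$ tiling space densely so that $\sum_i f_i(x)$ is approximately constant, and a flat prior.

```latex
\log p(x \mid \mathbf{r})
  \;=\; \sum_i r_i \log f_i(x) \;-\; \sum_i f_i(x) \;+\; \text{const}
  \;=\; -\sum_i r_i\, \frac{(x - c_i)^2}{2\sigma_{tc}^2} \;+\; \text{const},
\qquad\Rightarrow\qquad
\hat{x} \;=\; \frac{\sum_i r_i c_i}{\sum_i r_i},
\quad
\operatorname{Var}[x \mid \mathbf{r}] \;=\; \frac{\sigma_{tc}^2}{\sum_i r_i}.
```

      Since the expected total spike count $\sum_i r_i$ scales with the gain $g$, the encoded variance is inversely proportional to the population gain in this special case.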

      Second, in the case of the DDC code we used the decoded variance of the posterior as the signature. Since the DDC code relies on the overlap between the target distribution and the neuronal basis functions, potentially the most important source of error is if we overestimate the size of the encoding basis functions. To control for this factor, we first explored this effect in an idealised setting (Fig S7) and found that the decoded variance correlates with the encoded uncertainty both if we used the estimated basis functions or the empirical tuning curves for decoding. Next we performed the analysis in our simulated dataset in 4 different ways - either using empirical tuning curves (Fig 5c-d) or the estimated basis functions (Fig S8a-b), focusing on high spike count theta cycles or including all theta cycles. The fact that all these analyses led to similar results confirms the robustness of this signature.
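
      The dependence of the DDC signature on the basis function width can be seen in a small numerical sketch. It assumes a one-dimensional Gaussian target distribution and Gaussian basis functions with a common width; the grid, the centres, and the function names (`ddc_rates`, `profile_sd`) are hypothetical choices for illustration and do not come from the manuscript.

```python
import numpy as np

def ddc_rates(mu, sd, centers, width, grid):
    """Expected value of each Gaussian basis function under a Gaussian target distribution."""
    p = np.exp(-0.5 * ((grid - mu) / sd) ** 2)
    p /= p.sum()                                   # discretised target distribution
    phi = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / width) ** 2)
    return phi @ p                                 # overlap of each basis function with p

def profile_sd(rates, centers):
    """Spatial spread (standard deviation) of the population activity profile."""
    w = rates / rates.sum()
    m = np.sum(w * centers)
    return np.sqrt(np.sum(w * (centers - m) ** 2))

grid = np.linspace(-50.0, 50.0, 2001)
centers = np.linspace(-40.0, 40.0, 81)             # assumed basis function centres
width = 5.0                                        # assumed basis function width

narrow = profile_sd(ddc_rates(0.0, 2.0, centers, width, grid), centers)
wide = profile_sd(ddc_rates(0.0, 8.0, centers, width, grid), centers)
print(f"activity spread for target sd 2: {narrow:.2f}")
print(f"activity spread for target sd 8: {wide:.2f}")
```

      In this Gaussian setting the spread of the activity profile is approximately the square root of the sum of the squared basis function width and the squared target width, so a wider target distribution recruits a more diverse set of cells, and recovering the target width from the activity requires a good estimate of the basis function width.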

      Our third measure, the EV-index, measures the variability of the encoded trajectories across theta cycles. The cycle-to-cycle variability is also affected by factors independent of whether a randomly sampled trajectory or the posterior mean is encoded. In particular, the encoded trajectory can start at different distances in the past and can be played at different speeds in different theta cycles. These factors are probably present in the data and all inflate the CCV. Another factor is the start and end time of the trajectories, which we may not be able to accurately find in the real data; confusing the end of a previous trajectory with the start of a new one can also inflate CCV. In our simulations we tested how these potential errors influence our analysis, and found that the EV index is surprisingly robust to such changes (Fig 6f-g). An additional factor that the EV-index is sensitive to is the specific sampling algorithm used to sample the posterior: an algorithm that produces correlated samples is hard to distinguish from the MAP scheme. Our newly introduced analysis (Fig 7b) demonstrates this and explores the level of correlation between subsequent trajectories, providing evidence that trajectories decoded during exploration reflect the properties of anticorrelated samples, also a signature of efficient inference.

      • The claims about "efficiency" lack a definition of what exactly is meant by that, and empirical support.

      We thank the reviewer for pointing out this inconsistency in our terminology. What we generally meant by efficiency was a claim that pertains to the computational level, according to Marr’s classification, i.e., that computations are probabilistic: the representation in the hippocampus takes into account uncertainty by representing a full posterior distribution. We performed an additional test, which concerns the algorithmic-level efficiency of the computations. We explored the efficiency of the sampling process by assessing a signature of efficient sampling, the expected number of sampled trajectories required to represent the distribution of possible future locations. We found that subsequent samples tended to be anti-correlated, which is a signature of efficient sampling algorithms (Fig 7). In the revised manuscript we thus use the word efficient solely when we refer to the anticorrelated samples.

    1. Author Response:

      Reviewer #2:

      The authors investigated changes in the unstressed and stressed oligomeric states of the mammalian endoplasmic reticulum (ER) stress sensor, IRE1a. Previous biochemical and microscopy studies in mammalian cells and studies of the related protein Ire1 in yeast, describe an increase in oligomerization of the stress sensor upon treatment of cells with chemical agents that impair the ER protein folding environment. The general view has been that IRE1 in unstressed cells is a monomer and varying degrees of misfolded protein stress stimulate dimerization, activation, and higher order oligomerization. Distinguishing between monomers and dimers, as well as tetramers or other small oligomers is technically challenging, especially for integral membrane proteins. To address this challenge, the authors turned to single particle tracking fluorescence microscopy of Halo-tagged endogenous IRE1. Using a clever combination of random labeling with two fluorescent dyes and oblique angle illumination to visualize single molecules, as well as dimers, the authors surprisingly find that their endogenous IRE1 reporter appears to be dimeric in homeostatic cells. This observation challenges the predominant model in which IRE1 is monomeric in unstressed cells and that even dimerization represents a switch into an active state. The authors claim to detect evidence for higher order oligomers following treatment with stressors. The authors then use a series of IRE1 mutants to identify how oligomerization is regulated and present a new model to reconcile the different models of IRE1 activation in the literature.

      The authors have extensively characterized their novel experimental system in terms of protein expression levels, functionality, and ability to distinguish monomers and dimers. The data are well presented and the authors are clearly familiar with the arguments that have surrounded the IRE1 oligomer question. That the authors observe the characteristic XBP1 mRNA splicing activity in the absence of visible large IRE1 clusters may suggest that the large clusters reported by others may have distinct roles, perhaps in more permissive mRNA cleavage.

      The present study is undermined by two major weaknesses. First, while the authors persuasively demonstrate that they can detect IRE1a dimers, a major claim of the manuscript rests upon detection of tetramers and possibly higher order oligomers. Unfortunately, the authors provide no independent controls to show what tetramer or higher order oligomer data would look like. Thus, the authors can only infer that higher order oligomers are detected, based on modest shifts in the percent of correlated particle trajectories observed in some cells. More robust evidence is needed to make claims of oligomerization. Tools have been developed by others that can induce reversible oligomerization of proteins. Application of these tools would provide powerful controls for tetramers or even higher order oligomers in this study.

      The second, deeper concern, is the discrepancy between the Halo Tag clustering results in this study and studies by this lab and several other labs that report a distinct stress phenotype. In mammalian cells and yeast, IRE1 and Ire1, tagged with different fluorescent proteins or even a small HA peptide epitope tag, undergo quantitative visible formation of puncta or clusters upon treatment with stressors. The small number of bright clusters that form effectively deplete the rest of the ER of IRE1 signal. In the present study, the authors observe no visible change in IRE1-Halo localization in stress cells. The authors do not investigate the cause of this difference. While one might argue that the presence of stress-inducible IRE1 activity is sufficient to argue that the reporter in this study is functional, IRE1 reporters (that do cluster) described in previous studies by the Walter lab and other groups are also demonstrably functional. Does IRE1 normally cluster? Is it cell-type dependent? Tag-dependent? Notably, the Pincus et al. PLoS Biology paper from the Walter lab used two different fluorescent protein tags that do not heterozygously dimerize. Robust colocalization and FRET signals were detected upon treatment of cells with stressors and clustering was subsequently observed. A 2007 Journal of Cell Biology study from Kimata et al. reported clustering in yeast with an Ire1 tagged with an HA epitope peptide. The HA peptide seems unlikely to be prone to any oligomerization propensities that GFP tagged reporters might experience. Importantly, a 2020 PNAS paper from the Walter lab (Belyy et al.) studied clustering of a robustly monomeric mNeonGreen-tagged IRE1 in U2-OS cells and mouse embryonic fibroblasts and this construct readily clustered following stress induction.

      When evaluated against the backdrop of the extensive literature describing the visual behavior of IRE1a in live cells, the absence of stress-induced clustering is both puzzling and disconcerting. Given the focus of this study is to use visual techniques to study IRE1a interactions, the burden of proof is on the authors to resolve this significant discrepancy with the rest of the IRE1a literature. One can easily imagine that incorporation of the majority of the pool of IRE1a into 10-100 clusters could produce very different correlated trajectory behavior. Until the authors can determine why their reporters behave differently from other IRE1a reporters and establish which version accurately reflects physiologic IRE1a behavior, the potential impact of the findings of this manuscript remains unknown.

      We thank the reviewer for this detailed assessment of our work. We agree that the question of apparent discrepancy in the formation of observable IRE1 clusters between this manuscript and earlier work is important. We have now addressed this issue both in the revised version of the manuscript and in specific point-by-point responses to reviewers’ comments. As a brief summary, we addressed the reviewer’s first concern (lack of controls larger than dimers) by cloning and validating a tetrameric HaloTag construct, the measurements from which were entirely consistent with the model we presented in the original version of the manuscript. To address the reviewer’s second concern, we present several lines of evidence showing that the discrepancy between the formation of microscopically visible IRE1 clusters in earlier studies and the absence of such clusters in the present work almost certainly results from differences in expression levels. First, our IRE1-HaloTag construct is perfectly capable of forming stress-induced clusters, as we show in the new Figure 1 – Figure Supplement 3. Second, we point to a parallel study by Gómez-Puerta et al., who demonstrate that a more “conventional” IRE1-GFP construct does not form visible stress-dependent puncta when it is expressed at a low level comparable to that of untagged IRE1 in HeLa cells, despite being fully active. Third, our earlier work in the 2020 PNAS paper referenced by the reviewer actually showed that even in the overexpression context, IRE1-mNeonGreen only forms visible puncta in just over half of all cells, despite the fact that XBP1 processing is nearly 100% effective in bulk assays. Furthermore, in the same paper we show that, rather than all IRE1 molecules being sequestered in clusters, only a small fraction (~5%) of IRE1-mNeonGreen assembles into large puncta while the remaining 95% of IRE1 stays uniformly distributed throughout the ER.

      Taken together, we believe that IRE1 does have the propensity to assemble into larger clusters when its expression levels are high (regardless of the tag used), but that these clusters are not strictly required for its activation. We have made significant changes to the discussion section of the manuscript to clarify the above points and directly address the apparent discrepancy between the present work and earlier studies.

      Reviewer #3:

      In this paper, the authors' aim was to test how IRE1's oligomerization state relates to its activation status without relying on ectopic overexpression. The principle underlying the work is a rather simple one, which is that, if the population of IRE1 can be labeled stochastically with either of two different fluorescent probes, then if the protein dimerizes, presuming single molecules can be visualized, correlated migration of a spot of each fluorophore should be observed for some of those dimers. Any correlated migration, maintained for long enough, will by necessity be some sort of dimer or multimer. In principle, if my math is right, the correlation should be 50% of spots of each color, assuming all the molecules are in a dimer, all molecules are labeled with one fluorophore or the other, and the koff of the fluorophores is very low. In practice, the correlation appears closer to 10%, which the authors establish using a control molecule that should not dimerize except by chance, and another for which pseudo-dimerization is enforced due to the two HALO domains used to bind the fluorophores being conjugated to the same molecule in cis. Much of the paper is devoted to establishing the fundamentals of the system. For these experiments, the authors replaced endogenous IRE1 with the HALO-tagged version to generate near-normal expression and show that the IRE1-HALO behaves similarly to endogenous. They also show that correlated migration is observed in the dimer control to a much greater extent than in the monomer.
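
      The 50% figure quoted above can be checked with a short idealised calculation. The sketch below is an illustration only, not the authors' mathematical model: it assumes every protomer carries exactly one dye, that each protomer is labeled independently, and that a spot counts as correlated if at least one partner in the same oligomer carries the other color. The function name and its parameters are hypothetical.

```python
def ideal_correlated_fraction(n_protomers: int, other_color_fraction: float = 0.5) -> float:
    """Idealised fraction of spots of one color that co-migrate with the other color.

    Assumes complete, independent labeling: each of the (n_protomers - 1)
    partners carries the other color with probability `other_color_fraction`.
    """
    if n_protomers < 1:
        raise ValueError("n_protomers must be at least 1")
    if not 0.0 <= other_color_fraction <= 1.0:
        raise ValueError("other_color_fraction must lie in [0, 1]")
    # Probability that none of the partners carries the other color.
    p_no_partner = (1.0 - other_color_fraction) ** (n_protomers - 1)
    return 1.0 - p_no_partner


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, ideal_correlated_fraction(n))
```

      Under these assumptions a monomer gives 0, a dimer gives 0.5 and a tetramer gives 0.875 with equal labeling of the two colors. Incomplete labeling, bleaching and tracking losses, none of which are modeled here, would all lower the observed value.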

      Using these findings, they demonstrate, in my mind quite conclusively, that IRE1 exists as a dimer even in the unstimulated state. During ER stress, the authors observe a state that is more highly ordered. Mathematical modeling suggests a transition from predominantly dimers to a mix of dimers and something more highly ordered, with tetramers being the simplest explanation. Satisfyingly, a mutation that breaks the known dimer interface causes the protein to exist solely in monomers, as does deletion of the IRE1 lumenal domain, while disrupting the oligomerization interface keeps the protein as dimers. Mutation or deletion of the kinase and RNase domains does not affect higher order status, suggesting that activation of these domains is not a prerequisite for assembly. It is clear from this that the central claims of the paper, which is that IRE1 exists in a dimer in the basal state and transitions to a higher ordered structure in the activated state, are supported. Moreover, the general approach is likely to be appealing to the study of other molecules activated by multimerization.

      We thank the reviewer for this thoughtful and helpful analysis of our work.

      The principal advance of the paper is the technological approach for tracking IRE1 (and, presumably, other molecules whose activity is regulated by dimerization). The approach is quite elegant for that purpose. Its impact in terms of conclusions about IRE1 is perhaps less clear. The authors rationalize their endogenous-replacement approach by describing how their previous efforts and those of others relied on ectopic overexpression of GFP-tagged IRE1. The authors take great pains to claim that the observed multimerization status of the IRE1-HALO constructs is not a function of expression level, which would imply then that expression level alone is not responsible for the previously observed IRE1 oligomeric puncta. It is not clear why exactly the authors' results differ from this group's previous studies on the topic nor where the truth lies, including whether something inherent to the GFP-tagged overexpression approach favors non-physiologic structures, whether the difference is fundamentally one of cell type, or whether multimerization and activation are correlated but not causally related, with multimer-breaking mutations killing IRE1 by some other mechanism.

      The question of reconciling our present data with earlier work (including work from our group) is clearly and understandably a central question for all three reviewers. As we detailed above in our responses to reviewers 1 and 2, we are convinced that the formation of large IRE1 clusters is largely dependent on expression level rather than the differences between fluorescent protein tags and the HaloTag. We added new supplementary figures and substantially revised the text of the manuscript to address this question directly.

      Interpreting the data is also complicated by the fact that, while the authors point out that the percent of correlated trajectories (i.e., the measurement of multimerization state) does not itself correlate with expression level (using trajectories-per-movie as a proxy), the proper conclusion from that lack of correlation is not that variance in expression level does not account for the changes in apparent multimerization status, but instead that it cannot be the only factor. In some sense, the authors are attempting to play the argument both ways, by arguing that expression level matters for IRE1 activation (from previous studies) and that it doesn't (from this study). I think to address this the authors will need to better account, one way or another, for why the findings presented here differ from their previous findings and why these are the more salient (if in fact they are).

      This is a very important point, and we thank the reviewer for raising it. We are not arguing that expression levels do not matter for the formation of oligomers; quite the contrary, as detailed above and in the revised version of the text, we believe that the formation of massive IRE1 oligomers observed in previous studies and in the new Figure 1 – Figure Supplement 3 is mainly a function of elevated concentration. What we do claim is that our approach can reliably pick out oligomeric differences within the relatively narrow range of concentrations used for single-particle tracking experiments in this paper. We are using the very weak truncated CMVd3 promoter in all transient transfection experiments, and we are only analyzing data from cells that have a comparable density of single-molecule spots to the density we observe in endogenously tagged IRE1-HaloTag cells. In fact, the metric of “trajectories per movie” used as a proxy for expression levels in Figure 5 – Figure Supplement 1 is an overestimation of the true variability of expression levels, since each movie only covers a small fraction of each cell’s area and the number of observed molecules varies depending on cell morphology. Practically speaking, all cells that we image have expression levels that are clustered together rather narrowly, roughly within differences of no more than a factor of 3. These levels, in turn, are significantly lower than the expression levels used in earlier papers by our group and others.

      The other somewhat substantial issue is that there is no control for what higher order structures look like. The authors give no sense for the dynamic range of the multimerization assay. I would presume that tetramers would show a higher percentage of correlated trajectories than dimers, and octamers higher still, and that the mathematical model accounts for this theoretical possibility in calculating an average protomer number of 2.7 in the stress condition, but it would be better to see that in practice; at first glance it would seem that engineering a tetrameric and/or higher order control and validating it would be straightforward.

      This is another great point raised by all reviewers. In the revised version of the manuscript, we engineered a new tetrameric control construct (see Figure 2 – Figure Supplement 1), the results from which agree remarkably well with the mathematical model we developed in the original version of the manuscript (see Figure 2 – Figure Supplement 3).

      Lastly, the data analysis lacks statistical justification for its conclusions. I presume given the high number of readings that the observed changes are all statistically significant, but that should be indicated, as in most cases the 95% confidence intervals shown are overlapping.

      This is another excellent point. The reviewer is correct that all relevant conclusions are statistically supported by the data, and our analysis code immediately calculates pairwise p-values for every plot using one of several relevant tests. Our preferred test is the permutation test, since it makes no assumptions about the underlying distributions being compared. To avoid cluttering the main plots, we have included tables of pairwise p-values for each plot in the revised version of the manuscript.
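
      For readers unfamiliar with the permutation test mentioned above, a minimal sketch follows. It is not the authors' analysis code: the test statistic (absolute difference of means), the number of permutations, the function name and the example values are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical per-cell percentages of correlated trajectories.
    unstressed = [9.1, 10.4, 8.7, 11.0, 9.8, 10.2]
    stressed = [12.9, 14.1, 13.3, 12.5, 15.0, 13.8]
    print(permutation_test(unstressed, stressed))
```

      Because the null distribution is built by reshuffling the observed values, no distributional form has to be assumed, which is the property the response highlights.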

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Responses to reviewers’ comments are in blue text, original reviewers’ comments in black text.

      Response to Reviewer 1.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this manuscript Neiro et al. aim to expand our knowledge on the regulation of gene expression in stem cells of the planarian model organism. As a first step the authors used published available data to expand the repertoire of the planaria transcriptome. By combining 183 RNAseq datasets the authors were able to identify thousands of new coding and non-coding transcripts. They then screened for TF motifs in the new annotations, identifying 551 putative TFs, of which 248 were already described in the planarian literature. The most substantial contribution of this work to the field of stem cells and planaria biology is the characterization of new putative enhancers that were identified by performing H3K27ac ChIP-seq and ATAC-seq and combining these data with previously published H3K4me1 ChIPseq dataset.

      We thank the reviewer for their careful assessment of our work. We agree that the identification of likely enhancers genome-wide is a substantial contribution. Equally, the improved annotation of all genes, including the transcription factors we chose to focus on here, is a substantial step forward for the planarian research community.

      By overlapping H3K27ac and H3K4me1, the authors find 5,529 new enhancers, for which they report a higher chromatin accessibility than random points in the genome as assessed by ATAC-seq. By using ATAC-footprints Neiro et al. refined the subset of TFs that have binding motifs in the predicted enhancer-like regions and present a list of 22,489 such factors. The manuscript is well written and organized and overall, the reported data will provide an important resource to study gene expression regulation in planaria's stem cells. However, this manuscript would greatly benefit from some functional validation to support the predicted gene regulatory networks. One option would be to use a CRISPR-dCas9-KRAB system to silence the putative enhancers identified in the manuscript and check by qPCR the expression of nearby genes.

      Currently, mis-expression technologies that would allow enhancer elements to be tested directly for their ability to drive expression are still not available in planarians. This also precludes us from using the suggested silencing system, which is used in mammals and other animals with robust mis-expression tools.

      If this type of experiment is not feasible in planaria (I am not an expert in this model organism) another simple but key experiment would be to perform a knockdown of one (or more) putative enhancer-bound TFs identified in this study followed by RNA-seq. This would allow the authors to verify what are the target genes of the putative enhancer-bound TFs and if they correspond to the predicted gene networks they identified. Simultaneously, this experiment would allow the authors to verify if there are any changes in the expression of differentiation/pluripotency markers as a result of the knockdown of the putative enhancer-bound TF.

      These experiments are possible, but they would be the work of many labs in the future that are expert in studying those TFs and their roles in planarian stem cells and regeneration. However, what we can do is analyze existing RNA-seq data further. There are a number of studies in which TFs have been studied and RNA-seq performed after RNAi. Although these studies were performed in specific experimental regenerative contexts, and not specifically in stem cells, it will be possible to look at expression changes of genes with predicted enhancers bound by these TFs. We propose to execute this analysis and add it to the manuscript, rather than perform further TF RNAi experiments. This analysis is feasible within a 3-month revision time. We would add that currently no genes are implicated in controlling pluripotency in the same way we might consider, for example, OSKM in mammals. Our identification of the TFs enriched in stem cell expression and implicated in binding predicted enhancers suggests future candidates.

      Minor revision: • The authors have mostly focused on the identification of enhancer-bound TFs. However, it would be interesting to look at differential enrichment of TFs in promoters versus enhancers and identify if there are specific factors that are enriched specifically at the planarian newly identified enhancer regions.

      We have not looked at potential TF binding sites near promoters/transcriptional start sites. We will try to add an analysis that considers this in our revision.

      • All tornado plots are missing a colorbar (Fig3 and FigS2)

      We will fix this error

      • There is a typo in the discussion: "the combined use of chip-seq data, RNAi of a histone methyltransferase combines with chip-seq" should be changed to "combined".

      We will fix this and other typographical errors.

      Reviewer #1 (Significance (Required)):

      The manuscript is well written and organized and overall the reported data will provide an important resource to study gene expression regulation in planaria's stem cells.

      We thank the reviewer for their appreciation of our work

      **Referees cross-commenting**

      I agree with the other reviewers that additional functional data should be added to support the author's claims (such as knock down of potential TFs that are identified by computational analyses and assessing the impact on gene expression).

      See response above, with regard to adding further analysis for testing this possibility.

      In addition, as noticed by the third reviewer, all data should be made publicly available to the scientific community.

      We have made all data publicly available and will submit all relevant data to public database repositories in advance of final publication after final peer review.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript aims at identifying enhancers in the planarian Schmidtea mediterranea. The authors start with the integration of transcriptome with genome sequencing data to more precisely annotate the genome of the planarian Schmidtea mediterranea. The second part of the manuscript actually then deals with the identification of potentially active enhancer elements in adult stem cells of this regenerating organism using genomic techniques like ATAC-seq and ChIP-seq of histone marks combined with motif searches and in silico footprint analysis. Using these data, the authors predict regulatory interactions potentially critical for pluripotency and regeneration in planarian adult stem cells.

      MAJOR COMMENTS:

      • Are the key conclusions convincing? 1) The authors claim (already in the abstract) that their study identifies enhancers regulating adult stem cells and regenerative mechanisms. This is an over-statement found throughout the manuscript, as none of these enhancers are functionally tested nor is it shown that target gene expression changes when transcription factors predicted to interact with such enhancers are knocked down.

      We agree, and it was not our intention to overstate our results; this is why we have tried to refer to putative enhancers, enhancer-like elements, etc. in the manuscript from the title onwards. Only once we have demonstrated a set of elements with key conserved and widely supported characteristics do we suggest we have a set of higher confidence enhancers to study. However, we will adjust the manuscript to reflect that our claims await direct testing, as is the case for all enhancers implicated with the approaches used here.

      Another example is at the end of paragraph 1 of section 2.4. Here the authors claim that identifying many fate-specific transcription factor genes in the vicinity of potential enhancers is a further proof that the identified regions represent "real enhancers". It strongly supports this hypothesis, but no evidence for real enhancer activity.

      We agree the total body of evidence strongly supports that we have identified enhancer elements, but as above will adjust the language to suggest further directed functional work will follow from many groups.

      Thus, although the authors state that the regulatory interactions and networks they predict from their data can be studied now in future, they should be more careful with their wording and correct these over-statements. Therefore, the key conclusion is that they identified by various techniques potential enhancers, which are close to genes controlling adult stem cells and potentially controlling these genes, which has to be shown by further analyses.

      We agree

      Thus, also the title needs to be changed.

      We propose changing ‘enhancer-like’ to “predicted enhancers” in the title, and "defines" to "predicts" as well as broadly adjusting the text to caveat that further work will clarify their functions and roles.

      The authors have no proof that the networks are active in planarian adult stem cells, as they do not show that the predicted networks are active in the presented way.

      We agree; see comments above. It was not our intention to claim that we are showing pathways that are definitely active, but rather pathways predicted by our experiments and by our analyses of the resulting data.

      2) Similarly, the identification of TF motifs within these potential motifs strongly suggests but not shows that these factors are binding, even when these sites were found to be bound by a protein using the ATAC-seq footprinting analysis. Thus, the authors need to be careful with their wording. One example is in the second paragraph of section 2.5, where the authors write that "We found that numerous FSTFs were binding to putative intronic enhancers ... ". The motif suggests that these factors bind, however, they have no experimental confirmation that these sequences are indeed bound by the planarian TFs.

      We agree. We will clarify that ATAC footprinting is the only data suggestive of these motifs being bound and that further experiments will be required for more evidence. We will state this in the results section and add this explicitly to the discussion.

      In sum, this manuscript uses existing genomic tools to define potential enhancer regions in the planarian Schmidtea mediterranea. The manuscript is informative yet descriptive, as it presents no functional evidence for any of the predictions. If further toned down, the key conclusions are valid.

      Future functional experiments to test the roles of all TFs and enhancers are now possible due to our work. The combination of data and analyses provides strong support for enhancer element activity in stem cells across the genome.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? The experiments performed are well designed and in line with what is known in the field about enhancer architecture. However, as this model system is not very well characterized on that level and the authors do not provide real experimental evidence that any of the identified regions has really enhancer activity and that any of the identified motifs binds indeed the predicted TF, the authors need to be very careful with their statements. The authors should maybe emphasize even stronger that all the GRNs predicted under section 2.6 are really preliminary and need to be validated.

      Yes, we are happy to be even clearer about this as the reviewer suggests

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. One experiment that could provide more evidence for their predicted regulatory interactions is to knock-down one of the FSTFs for which motifs have been identified in potential enhancer regions and to study expression of associated genes (to confirm that the enhancers potentially bound by these TFs control the expression of associated genes) or by analyzing the chromatin status of selected chromatin regions (by Q-PCR). These experiments would strongly support the claims of the authors. However, it also depends strongly on the journal whether I would consider these experiments essential or "nice to have".

      This suggestion of possible extra experiments is very similar to that of Reviewer 1. We are copying our earlier comment as this also addresses this point.

      “These experiments are possible, but they would be the work of many labs in the future that are expert in studying those TFs and their roles in planarian stem cells and regeneration. However, what we can do is analyze existing RNA-seq data further. There are a number of studies in which TFs have been studied and RNA-seq performed after RNAi. Although these studies were performed in specific experimental regenerative contexts, and not specifically in stem cells, it will be possible to look at expression changes of genes with predicted enhancers bound by these TFs. We propose to execute this analysis and add it to the manuscript, rather than perform further TF RNAi experiments. This analysis is feasible within a 3-month revision time. We would add that currently no genes are implicated in controlling pluripotency in the same way we might consider, for example, OSKM in mammals. Our identification of the TFs enriched in stem cell expression and implicated in binding predicted enhancers suggests future candidates.”

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. This reviewer is not an expert in Schmidtea mediterranea, thus it is hard to judge how time consuming these experiments would be. Cost-wise they should be feasible, as it would include primarily Q-PCR experiments. And some functional back-up of their claims would be very helpful.

      See previous comment regarding additional analysis.

      • Are the data and the methods presented in such a way that they can be reproduced? For the parts I can judge, yes.

      • Are the experiments adequately replicated and statistical analysis adequate? It is not clear from the manuscript how many replicates of the ChIP-seq experiments were done.

      A description of the ChIP-seq replicates will be explicitly added to the Methods.

      MINOR COMMENTS:

      • Specific experimental issues that are easily addressable.

      • Are prior studies referenced appropriately? For the literature I can judge, yes.

      • Are the text and figures clear and accurate? The figures are clear, the text (besides over-statements) is clear. However, the writing can be improved. A few examples: section 2.2 paragraph 1: "... we found 248 to be described in the planarian literature in some way." In which way described?; same paragraph: "... but significantly we could identify new homologs of ..." what does significantly mean? Which test etc? section 2.2, last paragraph: "Most TFs assigned to the X1 and Xins compartments and the least to the X2 compartment", "Very few TFs had expression in X1s and Xins to the exclusion of X2 expression as would be expected by overall lineage relationships"; what do these sentences mean?

      We thank the reviewer for paying careful attention to the language in our manuscript throughout. We will provide clearer explanations of the sentences indicated, and we will better explain terms specific to the planarian model system that are not intuitive to readers outside the field.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No over-statements.

      See previous comments agreeing with the need to carefully adjust our language to avoid this.

      Reviewer #2 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. This manuscript identifies genome-wide potential enhancers in adult planarian stem cells, and thus represents a very valuable resource for the community to study these enhancers and the gene regulatory networks they control in the future.

      • Place the work in the context of the existing literature (provide references, where appropriate). As I am not a planarian scientist, it is hard to judge this part.

      • State what audience might be interested in and influenced by the reported findings. In my opinion, this work will be primarily interesting for people working with planarian. When functional data exist, this might be also interesting for researchers working generally on regeneration.

      Given the nature of our data, we also think all groups working on animal stem cells would be interested in our data and analyses.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. My field of expertise is transcriptional regulation using genomic techniques, however I am not familiar with the model Schmidtea mediterranea.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Neiro et al. capitalize on existing genomic data for the planarian Schmidtea mediterranea and new ChIP-seq and ATAC-seq data to use computational approaches to identify putative enhancers in the planarian genome. They integrate analysis of enhancers with transcription factor binding sites to generate testable hypotheses for the regulatory function of transcription factors active in stem cells or control of cell lineage trajectories. Their work creates an excellent resource for future work to resolve the regulatory logic underpinning stem cell biology and tissue regeneration in planarians.

      We are glad the reviewer likes our research.

      Major: Overall, the work in this manuscript and methodology are well executed and presented. However, the authors should consider the following comments to improve the clarity and accessibility of the data and interpretations.

      1) The new transcriptome does not appear to be publicly accessible. The links to Github resources are broken, and there is nothing on Neiro's Github page. Will the new transcriptome be integrated with Planmine?

      The new annotation has been available for over a year as we wished the community to have access to it ASAP (see Garcia Castro, 2021, Genome Biology https://doi.org/10.1186/s13059-021-02302-5). We tested the links in the paper before depositing our preprint and after review and they seemed to work for us both within and outside our institutional network. We can only apologize if they were broken or have not worked for the reviewer. We are unclear if this new annotation will be included in Planmine, but we will ask the colleagues maintaining this database to consider including it.

      2) Figure 1: Ternary plot in 1F. The legend is not clear or could be explained better. What is the metric? It could be my misunderstanding, but I didn't find the ternary plots insightful or necessary. Perhaps the authors can expand on what they are showing.

      These plots are important in demonstrating the distribution of mRNA expression of all genes across cell-sorted compartments. Given the broad lineage relationship between sorted cell compartments, this analysis allows us to identify genes expressed predominantly in one cell compartment or another, or across a specific transition. For example, genes enriched in X2 cells and Xins, but not X1, are likely to be enriched in post-mitotic differentiating progeny and differentiated cells. In contrast to single-cell data, where expression data can be sparse, this analysis with bulk data allows identification and assignment of lowly expressed genes, like transcription factors. We will provide further explanation of this in the revised text.

      1I is a map of exons, not alternative splicing. So, it isn't clear what the authors intend to show. Are there specific exons that are more likely to be spliced? Is the figure necessary?

      We wish to demonstrate the power of our annotation approach and the richness of the annotation for looking at alternative splicing. We propose to provide a more informative figure that indicates the variety of splice forms. We apologize for this oversight.

      3) Figure 2: 2A labels Xins as irradiation responsive. Is this the case (just making sure)?

      The reviewer is correct, this is wrong! This should read “irresponsive” or “irradiation resistant” in Figure 1A. We thank the reviewer for spotting this error. We will fix this.

      2F-G: Ternary plot in F seems redundant with G, but that could be my lack of understanding. In 2G, what is represented on the plots on the right of the hierarchical clusters?

      The ternary plot (2F) and heatmap of hierarchical clustering (2G) are complementary ways to visualize the proportional expression values of transcription factors. The ternary plot (2F) allows an overview of all the proportional expression values, while the heatmap (2G) shows how the proportional values may be grouped into clusters of similar expression profiles and displays the relative size of these clusters. For example, the heatmap shows that the clusters of X1 and Xins are more prominent than X2, suggesting that there are relatively few X2-specific transcription factors. We will add text to better explain this difference.
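      As a rough illustration of what "proportional expression" means in such a plot, the sketch below normalizes each gene's expression across the three sorted compartments so that the three values sum to 1, which is the coordinate system a ternary plot uses. It is only a minimal sketch under stated assumptions: the function names, the 0.5 dominance threshold, and the "mixed" and "not expressed" labels are hypothetical choices made for illustration and are not taken from the manuscript.

```python
import numpy as np

def proportional_expression(x1, x2, xins):
    """Per-gene proportions across X1, X2 and Xins; each expressed row sums to 1."""
    counts = np.column_stack([x1, x2, xins]).astype(float)
    totals = counts.sum(axis=1, keepdims=True)
    # Genes with zero total expression get NaN proportions instead of a division error.
    with np.errstate(invalid="ignore", divide="ignore"):
        props = np.where(totals > 0, counts / totals, np.nan)
    return props

def dominant_compartment(props, labels=("X1", "X2", "Xins"), threshold=0.5):
    """Label each gene by its dominant compartment (threshold is an assumed cutoff)."""
    calls = []
    for row in props:
        if np.isnan(row).any():
            calls.append("not expressed")
        elif row.max() >= threshold:
            calls.append(labels[int(row.argmax())])
        else:
            calls.append("mixed")
    return calls

# Toy expression values for three genes (made up for illustration only).
props = proportional_expression([8, 1, 0], [1, 1, 0], [1, 1, 0])
calls = dominant_compartment(props)
```

      Points near a corner of the ternary plot correspond to genes whose proportion in one compartment approaches 1, while points near the center correspond to genes expressed at similar levels in all three.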

      4) Figure 3: The heat maps need a legend (i.e., please define the colors). In addition, labeling the figures could help the reader. For example, in G-J, a header about the different experiments above each map, such as "enhancers" and "random," etc., would make the figure more accessible.

      We agree and will label the figures so that they are more easily interpretable, and we will provide an independent scale and legend for the heatmaps.

      5) Figure 5: Although it is in the figure legend, the authors could label the 6th track as "RNA-seq in X1."

      We will add this to the figure.

      6) Section 2.6 second page last sentence of the first paragraph "GRN of asexual reproduction is not active in neoblasts" data in the supplement? Is it not shown?

      We apologize for this poorly written sentence. In line with Reviewer 2's comments, this statement needs to be toned down and clarified. The raw information is included in the general table of enhancers (Supplementary Table 2), but the genomic tracks visually highlighting the motifs at the promoters of lox5b and post2b were not included. We will add these to the Supplementary information and clarify Supplementary Table 2.

      7) Discussion: The discussion about pluripotency factors in planarians could be expanded. The authors could contrast the study's findings with Önal et al. 2012.

      We agree and will expand our discussion to compare with previous studies and also summarize what is available from other animals with pluripotent adult stem cells.

      Minor: The manuscript has no page numbers or line numbers, so I'll provide a general location of the potential issues.

      1) Section 2 - newly identified isoforms are shorter (1656 vs. 1618). Is the order of the median length reversed?

      Yes, we will correct this.

      2) No mention of Figure S1B in the text.

      It is mentioned in the paragraph regarding splicing, but perhaps not in a useful context. We will add a correct reference to this figure in the presentation of transcript diversity.

      3) Figure 1H should be 1I in the text?

      Yes, we will correct this.

      4) The discussion contains some minor typos and grammatical errors.

      We will address these with careful rereading.

      We thank the reviewer for spotting these errors and we will fix them in revision.

      Reviewer #3 (Significance (Required)):

      Neiro et al. provide an excellent resource for the planarian community. The paper is generally very well written and easy to read. The new transcriptome described, which improves the annotation of the planarian genome, should be made readily available. It would be excellent if the transcriptome could be incorporated in Planmine.

      We will ask Planmine and the Rink lab to consider this. The annotation (without broad analysis) has been available since the pre-print for Garcia Castro, 2021, Genome Biology was deposited in BioRxiv.

      Furthermore, the authors provide a comprehensive list of transcription factors in the planarian Schmidtea mediterranea. Their work provides insight into which factors are highly expressed in the stem cell compartment. Their computational identification of transcription factors and putative enhancers will be helpful to the growing community of researchers studying stem cell and regenerative biology using planarians. In addition, the large dataset generated in this study could inform studies in the evolution of regulatory sequences and transcription factor function.

      **Referees cross-commenting**

      The data presented are well supported by previous studies. As noted by the authors, it is not possible to make transgenic planarians, and thus the field needs to rely on indirect methods. The authors focus on using the stem cell population, which can be isolated from the animals. Overall, I don't think additional experiments are necessary. Additional RNAi experiments combined with RNA-seq (using the stem cells) could take 6-12 months to complete. I believe this is a solid contribution that should be framed as a resource paper. The authors should pay close attention to Reviewer #2's suggestions and edit the paper accordingly.

      I have 20 years of experience in the field. It would be unreasonable to ask the authors to do more experiments, especially in this post-pandemic environment. I hope this helps.

      We thank the reviewer for the comments.

    1. Author Response:

      Reviewer #1:

      The authors found a switch between "retrospective", sensory recruitment-like representations in visual regions when a motor response could not be planned in advance, and "prospective" action-like representations in motor regions when a specific button response could be anticipated. The use of classifiers trained on multiple tasks - an independent spatial working memory task, spatial localizer, and a button-pressing task - to decode working memory representations makes this a strong study with straightforward interpretations well-supported by the data. These analyses provide a convincing demonstration that not only are different regions involved when a retrospective code is required (or alternatively when a prospective code can be used), but the retrospective representations resemble those evoked by perceptual input, and the prospective representations resemble those evoked by actual button presses.

      I have just a couple of points that could be elaborated on:

      1. While there is a clear transition from representations in visual cortex to representations in sensorimotor regions when a button press can be planned in advance, the visual cortex representations do not disappear completely (Figs 2B and C). Is the most plausible interpretation that participants just did not follow the cue 100% of the time, or that some degree of sensory recruitment is happening in visual cortex obligatorily (despite being unnecessary for the task) and leading to a more distributed, and potentially more robust code?

      This is a very good point, and indeed could be considered surprising. While previous work suggests that sensory recruitment is not obligatory when an item can be dropped from memory entirely (e.g., Harrison & Tong, 2009; Lewis-Peacock et al., 2012; Sprague et al., 2014; Sprague et al., 2016; Lorenc et al., 2020), other work suggests that an item which might still be relevant later in a trial (i.e., a so-called “unattended memory item”) can still be decoded during the delay (see the re-analyses in Iamshchinina et al., 2021 from the original Christophel et al. 2018 paper). In short, we cannot exclude that in our paradigm there is some low-grade sensory recruitment happening in visual cortex, even when an action-oriented code can theoretically be used. This would be consistent with a more distributed code, which could potentially increase the overall robustness of working memory.

      At the same time, as the reviewer points out, there is a possibility that on some fraction of trials, participants failed to perfectly encode the cue, or forgot the cue, which might mean they were using a sensory-like code even on some trials in the informative cue condition. This is a reasonable possibility given that we used a trial-by-trial interleaved design, where participants needed to pay close attention on each trial in order to know the current condition. Since we averaged decoding performance across all trials, the above-chance decoding accuracy could be driven by a small fraction of trials during which spatial strategies were used despite the informative nature of the preview disk.

      Finally, another factor is the averaging of data across multiple TRs from the delay period. In Figure 2B, the decoding was performed using data that was averaged over several TRs around the middle of the delay period (8-12.8 seconds from trial start). This interval is early enough that the process of re-coding a representation from sensory to motor cortex may not be complete yet, so this might be an explanation for the relatively high decoding accuracy seen in the informative condition in Figure 2B. Indeed, the time-resolved analyses (Figure 2C, Figure 2 – figure supplement 1) show that the decoding accuracy for the informative condition continues to decline later in the delay period, though it does not go entirely to chance (with the possible exception of area V1).

      Of course, our ability to decode spatial position despite participants having the option to use a pure action-oriented code may be due to a combination of all of the above: some amount of low-grade obligatory sensory recruitment, as well as occasional trials with higher-precision spatial memory due to a missed cue. We have added a paragraph to the discussion to now acknowledge these possibilities.

      Finally, although it is conceptually important to consider the reasons why decoding in the informative condition did not drop entirely to chance, we also note that whether the decoding goes to chance in one condition is not critical to the main findings of our paper. The data show a robust difference in spatial decoding accuracy in visual cortex between the two conditions, which indicates that the relative amount of information in visual cortex was modulated by the task condition, regardless of what the absolute information content was in each condition.

      2. To what extent might the prospective code reflect an actual finger movement (even just increased pressure on the button to be pressed) in advance of the button press? For instance, it could be the case that the participant with extremely high button press-trained decoding performance in 4B, especially, was using such a strategy. I know that participants were instructed not to make overt button presses in advance, but I think it would be helpful to elaborate a bit on the evidence that these action-related representations are truly "working memory" representations.

      This is a good point, and we acknowledge the possibility of some amount of preparatory motor activity during the delay period on trials in the informative condition. However, we still interpret the delay-period representations during the informative condition as a signature of working memory, for several reasons.

      First, the participants were explicitly instructed to withhold overt finger movements until the final probe disk was shown. We monitored participants closely during their task training phase, which took place outside the scanner, for early button presses, and ensured that they understood and followed the directive to withhold a button press until the correct time. We also confirmed that participants were not engaging in any noticeable motor rehearsal behaviors, such as tapping their fingers just above the buttons. During the scans, we also monitored participants using a video feed that was positioned in a way that allowed us to see their hands on the response box and confirmed that participants were not making any overt finger movements during the delay period. Additionally, most of our participants were relatively experienced, having participated in at least one other fMRI study with our group in the past, and therefore we expect them to have followed the task instructions accurately.

      The distribution of response times for trials in the informative condition also provides some evidence against the idea that participants were already making a button press ahead of the response window. The earliest presses occurred around 250 ms (see below figure, left panel). This response time is consistent with the typical range of human choice response times observed experimentally (e.g. Luce, 1991), suggesting that participants did not execute a physical response in advance of the probe disk appearance, but waited until the response disk stimulus appeared to begin motor response execution.

      Finally, even if we assume that some amount of low-grade motor preparatory activity was occurring, this is still broadly consistent with the way that working memory has been defined in past literature. Past work has distinguished between retrospective and prospective working memory, with retrospective memory being similar in format to previously encountered sensory stimuli, and prospective memory being more closely aligned with upcoming events or actions (Funahashi, Chafee, & Goldman-Rakic, 1993; Rainer, Rao & D’Esposito, 1999; Curtis, Rao, & D’Esposito, 2004; Rahmati et al., 2018; Nobre & Stokes, 2019). Indeed, the transformation of a memory representation from a retrospective code to prospective memory code is often associated with increased engagement of circuits directly related to motor control (Schneider, Barth, & Wascher, 2017; Myers, Stokes, & Nobre, 2017). According to this framework, covert motor preparation could be considered a representation at the extreme end of the prospective memory continuum. Also consistent with this idea, past work has demonstrated that the selection and manipulation of items in working memory can be accompanied by systematic eye movements biased to the locations at which memoranda were previously presented (Spivey & Geng, 2001; Ferreira et al., 2008; van Ede et al., 2019b; van Ede et al. 2020). These physical eye movements may indeed play a functional role in the retrieval of items from memory (Ferreira et al., 2008; van Ede et al., 2019b). These findings suggest that working memory is tightly linked with both the planning and execution of motor actions, and that the mnemonic representations in our task, even if they include some degree of covert motor preparatory activity, are within the realm of representations that can be defined as working memory.

      We have now included a discussion of this issue in the text of our manuscript.

      Reviewer #2:

      Henderson, Rademaker and Serences use fMRI to arbitrate between theories of visual working memory proposing fixed x flexible loci for maintaining information. By comparing activation patterns in tasks with predictable x unpredictable motor responses, they find different extents of information retrieval in sensory- x motor-related areas, thus arguing that the amount/format of retrospective sensory-related x prospective motor-related information maintained depends on what is strategically beneficial for task performance.

      I share the importance of this fundamental question and the enthusiasm for the conclusions, and I applaud the advanced methodology. I did, however, struggle with some aspects of the experimental design and (therefore) the logic of interpretation. I hope these are easily addressable.

      Conceptual points:

      1. The main informative x non-informative conditions differ more than just in the knowledge about the response. In the informative case, participants could select both the relevant sensory information (light, dark shade) and the corresponding response. In essence, their task was done, and they just needed to wait for a later go signal - the second disk. (The activity in the delay could be considered to be one of purely motor preparation or of holding a decision/response.) In the uninformative condition, the sensory information at the spatial location was not relevant, nor could the response be predicted. Participants had, instead, to hold on to the spatial location to apply it to the second disk. These conditions are more different than the authors propose and therefore it is not straightforward to interpret findings in the framework set up by the authors. A clear demonstration for the question posed would require participants to hold the same working-memory content for different purposes, but here the content that needs to be held differs vastly between conditions. The authors may argue this is, nevertheless, the essence of their point, but this is a weak strawman to combat.

      It is true that the conditions in our task differ in several respects, including the content of the representation that must be stored. The uninformative condition trials required the participant to maintain a high-precision, sensory-like spatial representation of the target stimulus, without the ability to plan a motor response or re-code the representation into a coarser format. In contrast, the informative condition trials allowed the participant to re-code their representation into a more action-oriented format than the representation needed for the uninformative condition trials, and the code is also binary (right or left) rather than continuous.

      However, we do not think these differences present an issue for the interpretation of our study. The primary goal of our study was to demonstrate that the brain regions and representational formats utilized for working memory storage may differ depending on parameters of the task, rather than having fixed loci or a single underlying neural mechanism. To achieve this, we intentionally created conditions that are meant to sit at fairly extreme ends of the continuum of working memory task paradigms employed in past work. Our uninformative condition is similar to past studies of spatial working memory with human participants that encourage high-precision, sensory-like codes (i.e., Bays & Husain, 2008; Sprague et al., 2014; Sprague et al., 2016; Rahmati et al., 2018) and our informative condition is more similar to classic delayed-saccade task studies in non-human primates, which often allowed explicit motor planning (Funahashi et al., 1989; Goldman-Rakic, 1995). By having the same participants perform these distinct task conditions on interleaved trials, we can better understand the relationship between these task paradigms and how they influence the mechanisms of working memory.

      Importantly, it is not trivial or guaranteed that we should have found a difference in neural representations across our task conditions. In particular, an alternative perspective presented in past work is that the memory representations detected in early visual cortex in various tasks are actually not essential to mnemonic storage (Leavitt, Mendoza-Halliday, & Martinez-Trujillo, 2017; Xu, 2020). On this view, if visual cortex representations are not functionally relevant for the task, one might have predicted that our spatial decoding accuracy in early visual areas would have been similar across conditions, with visual cortex engaged in an obligatory manner regardless of the exact format of the representation required. Instead, we found a dramatic difference in decoding accuracy across our task conditions. This finding underscores the functional importance of early visual cortex in working memory maintenance, because its engagement appears to be dependent on the format of the representation required for the current task.

      Relatedly, some past work has also suggested that in the context of an oculomotor delayed response task, the maintenance of action-oriented motor codes can be associated with topographically specific patterns of activation in early visual cortex which resemble those recorded during sensory-like spatial working memory maintenance (Saber et al., 2015; Rahmati et al., 2018). This is true for both prosaccade trials, in which saccade goals are linked to past sensory inputs, and anti-saccade trials, in which motor plans are dissociated from past sensory inputs. These findings indicate that even for task conditions which on the surface would appear to require very different cognitive strategies, there can, at least in some contexts, be a substantial degree of overlap between the neural mechanisms supporting sensory-like and action-oriented working memory. This again highlights the novelty of our findings, in which we demonstrate a robust dissociation between the brain areas and neural coding format that support working memory maintenance for different task conditions, rather than overlapping mechanisms for all types of working memory.

      Additionally, there are important respects in which the task conditions have similarities, rather than being entirely different. As pointed out by Reviewer #1, the decoding of spatial information in early visual cortex regions did not drop entirely to chance in the informative condition, even by the end of the delay period (Figure 2C, Figure 2 – figure supplement 1). As discussed above in our reply to R1, this finding may suggest that the neural code in the informative condition continues to rely on visual cortex activation to some extent, even when an action-oriented coding strategy is available. This possibility of a partially distributed code suggests that while the two conditions in our task appear different in terms of the optimal strategy associated with each one, in practice the neural mechanisms supporting the tasks may be somewhat overlapping (although the different mechanisms are differentially recruited based on task demands, which is our main point).

      Another aspect of our results which suggests a degree of similarity between the task conditions is that the univariate delay period activation in early visual cortex (V1-hV4) was not significantly different between conditions (Figure 1 – figure supplement 1). Thus, it is not simply the case that the participants switched from relying purely on visual cortex to purely on motor cortex – the change in information content instead reflects a much more strategically graded change to the pattern of neural activation. This point is elaborated further in the response to point (2) below.

      2. Given the nature of the manipulation and the fact that the nature of the upcoming trial (informative x uninformative) was cued, how can effects of anticipated difficulty, arousal, or other nuisance variables be discounted? Although pattern-based analyses suggest the effects are not purely related to general effects (authors argue this in the discussion, page 14), general variables can interact with specific aspects of information processing, leading to modulation of specific effects.

      Several aspects of our results suggest that our findings are not due to effects such as anticipated difficulty or general arousal. First, we designed our experiment using a randomly interleaved trial order, such that participants could not anticipate the experimental condition on a trial-by-trial basis. Participants only learned which condition each trial was in when the condition cue (color change at fixation; Figure 1A) appeared, which happened 1.5 seconds into the delay period. Thus, any potential effects of anticipated difficulty could not have influenced the initial encoding of the target stimulus, and would have had to take effect later in the trial. Second, as the reviewer pointed out, we did not observe any statistically significant modulation of the univariate delay period BOLD signal in early visual ROIs V1-hV4 between task conditions (Figure 1D, Figure 1 – figure supplement 1), which argues against the idea that there is a global modulation of early visual cortex induced by arousal or changes in difficulty.

      Additionally, our results demonstrate a dissociation between univariate delay period activation in IPS and sensorimotor cortex ROIs as a function of task condition (Figure 1D, Figure 1 – figure supplement 1). In each IPS subregion (IPS0-IPS3), the average BOLD signal was significantly greater during the uninformative versus the informative condition at several timepoints in the delay period, while in S1, M1, and PMc, average signal was significantly greater for the informative than the uninformative condition at several timepoints. If a global change in mean arousal or anticipated difficulty were a main driving factor in our results, then we would have expected to see an increase in the univariate response throughout the brain for the more difficult task condition (i.e., the uninformative condition). Instead, we observed effects of task condition on univariate BOLD signal that were specific to particular ROIs. This indicates that modulations of neural activation in our task reflect a more fine-grained change in neural processing, rather than a global change in arousal or anticipated difficulty.

      Furthermore, to determine whether the changes in decoding accuracy in early visual cortex were specific to the memory representation or reflected a more general change in signal-to-noise ratio, we provide a new analysis assessing the possibility that processing of incoming sensory information differed between our two conditions. As mentioned above, initial sensory processing of the memory target stimulus was equated across conditions, since participants didn’t know the task condition until the cue was presented 1.5s into the trial. However, because the “preview disk” was presented after the cue, it is possible that the preview disk stimulus was processed differently as a function of task condition. If evidence for differential processing of the preview disk stimulus is present, this might suggest that non-mnemonic factors – such as arousal – might influence the observed differences in decoding accuracy because they should interact with the processing of all stimuli. However, a lack of evidence for differential processing of the preview disk would be consistent with a mnemonic source of differences between task conditions.

      As shown in the new figure below (now Figure 2 – figure supplement 3), we used a linear decoder to measure the representation of the “preview disk” stimulus that was shown to participants early in the delay period, just after the condition cue (Figure 1A). This disk has a light and dark half separated by a linear boundary whose orientation can span a range of 0°-180°. To measure the representation of the disk’s orientation, we binned the data into four bins centered at 0°, 45°, 90°, and 135°, and trained two binary decoders to discriminate the bins that were 90° apart (an adapted version of the approach shown in Figure 2A; similar to Rademaker et al., 2019). Importantly, the orientation of this disk was random with respect to the memorized spatial location, allowing us to run this analysis independently from the spatial-position decoding in the main manuscript text.
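
The binning step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, each bin is assumed to span ±22.5° around its center, and the classifier itself is omitted. It only shows how orientations in the 0°-180° range map to the four bins and how the bins are paired for the two binary decoders.

```python
import math

BIN_CENTERS = (0, 45, 90, 135)          # bin centers in degrees
DECODER_PAIRS = ((0, 90), (45, 135))    # each binary decoder separates bins 90 deg apart


def bin_orientation(theta_deg):
    """Return the nearest bin center for an orientation in degrees.

    Orientation is circular with period 180, so 170 deg falls in the 0 deg bin.
    Assumes each bin spans +/- 22.5 deg around its center (an assumption).
    """
    index = math.floor(((theta_deg % 180) + 22.5) / 45) % 4
    return BIN_CENTERS[index]


def labels_for_decoders(orientations_deg):
    """For each decoder pair, list (trial_index, label) with label 0 or 1."""
    trials = {pair: [] for pair in DECODER_PAIRS}
    for i, theta in enumerate(orientations_deg):
        center = bin_orientation(theta)
        for pair in DECODER_PAIRS:
            if center in pair:
                # label 0 for the first bin of the pair, 1 for the second
                trials[pair].append((i, pair.index(center)))
    return trials
```

Each trial contributes to exactly one of the two decoders, so the two binary problems are trained on disjoint sets of trials.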

      We found that in both conditions, the orientation of the preview disk boundary could be decoded from early visual cortex (all p-values<0.001 for V1-hV4 in both conditions; evaluated using nonparametric statistics as described in Methods), with no significant difference between our two task conditions (all p-values>0.05 for condition difference in V1-hV4). This indicates that in both task conditions, the incoming sensory stimulus (“preview disk”) was represented with similar fidelity in early visual cortex. At the same time, and in the same regions, the representation of the remembered spatial stimulus was significantly stronger in the uninformative condition than the informative condition. Therefore, the difference between task conditions appears to be specific to the quality of the spatial memory representation itself, rather than a change in the overall signal-to-noise ratio of representations in early visual cortex. This suggests that the difference between task conditions in early visual cortex reflects a difference in the brain networks that support memory maintenance in the two conditions, rather than extra processing of the preview disk in one condition over the other, a more general effect of arousal, or anticipated difficulty.

      This result is also relevant to the concerns raised by the reviewer in point (1) regarding the possibility that the selection of relevant sensory information (i.e., the light/dark side of the disk) was different between the two task conditions. Since the decoding accuracy for the preview disk orientation did not differ between task conditions, this argues against the idea that differential processing of the preview disk may have contributed to the difference in memory decoding accuracy that we observed.

      1. I see what the authors mean by retrospective and prospective codes, but in a way all the codes are prospective. Even the sensory codes, when emphasized, are there to guide future discriminations or to add sensory granularity to responses, etc. Perhaps casting this in terms of sensory/perceptual x motor/action may be less problematic.

      This is a good point, and we agree that in some sense all the memory codes could be considered prospective because in both conditions, the participant has some knowledge of the way that their memory will be probed in the future, even when they do not know their exact response yet. We have changed our language in the text to reflect the suggested terms “perceptual” and “action”, which will hopefully also make the difference between the conditions clearer to the reader.

      1. In interpreting the elevated univariate activation in the parietal IPS0-3 area, the authors state "This pattern is consistent with the use of a retrospective spatial code in the uninformative condition and a prospective motor code in the informative condition". (page 6) (Given points 1 and 3 above) Instead, one could think of this as having to hold onto a different type of information (spatial location as opposed to shading) in the uninformative condition, which is prospectively useful for making the necessary decision down the line.

      It is true that a major difference between the two conditions was the type of information that the participants had to retain, with a sensory-like spatial representation being required for the uninformative condition, and a more action-oriented (i.e., left or right finger) representation being required for the informative condition. To clarify, the participant never had to explicitly hold onto the shading (light or dark gray side of the disk), since the shading was always linked to a particular finger, and this mapping was known in advance at the start of each task run (although we did change this mapping across task runs within each participant to counterbalance the mapping of light/dark and the left/right finger – one mapping used in the first scanner session, the other mapping used in the second scanning session). We have clarified this sentence and we have removed the use of the terms “retrospective” and “prospective” as suggested in the previous comment. The sentence now reads: “This pattern is consistent with the use of a spatial code in the uninformative condition and a motor code in the informative condition.”

      Other points to consider:

      1. Opening with the Baddeley and Hitch 1974 reference when defining working memory implicitly implies buying into that particular (multi-compartmental) model. Though Baddeley and Hitch popularised the term, the term was used earlier in more neutral ways or in different models. It may be useful to add a recent more neutral review reference too?

      This is a nice suggestion. We have added a few more references to the beginning of the manuscript, which should together present a more neutral perspective (Atkinson & Shiffrin, 1968; and Jonides, Lacey and Nee, 2005).

      1. The body of literature showing attention-related selection/prioritisation in working memory linked to action preparation is also relevant to the current study. There's a nice review by Heuer, Ohl, Rolfs 2020 in Visual Cognition.

      We thank the reviewer for pointing out this interesting body of work, which is indeed very relevant here. We have added a new paragraph to our discussion which includes a discussion of this paper and its relation to our work.

    1. Reviewer #3 (Public Review): 

      In this paper, Troendle et al investigate changes in alpha oscillations across childhood and adolescence. The main goal of this investigation is to examine how alpha oscillations change across these age ranges, by investigating a large open dataset and adopting new methods that should help to address methodological limitations of many previous analyses. In particular, a key goal is to examine changes in periodic alpha power, and control for potential confounds due to changes in peak frequency and/or aperiodic activity. To do so, they employ a novel spectral parametrization method, and systematically compare measures of isolated periodic alpha activity to conventional measures. Overall, they find that they can replicate the age-related decrease of total alpha power when using conventional methods. However, when explicitly measuring and controlling for aperiodic activity, they find that periodic alpha activity actually increases with age. They suggest this discrepancy can be explained by changes in aperiodic activity, as the aperiodic slope and intercept are found to systematically change across age, in a way that likely drives the finding of decreased total alpha power, while the periodic alpha power actually increases. There are also some follow-up analyses, including relating alpha power to anatomical measures of the thalamus, and to performance on an attention task.

      Strengths of this investigation include that it analyzes multiple, large datasets with well motivated methods. I think the goal of this paper addresses an important question, in terms of seeking to clarify some basic patterns of oscillation changes across development, and doing so in a rigorous way, both in terms of employing methods that are robust to estimating different features of the data, and in terms of using multiple, large datasets, including an internal replication of the main findings. I find the main goal and analysis compelling in terms of examining how alpha activity changes across this age range. 

      I also find some limitations to some aspects of this paper and analysis that could be improved, as they do not always clearly describe the context or support the claims that are made for some of the follow-up analyses, as described in the following. 

      1. Framing and prior literature 

      I find some limitations in the organization of this paper and its relationship to prior work. The paper could better situate its analyses within prior work, in particular in relation to the methodological issues it is addressing, and prior work on aperiodic activity.

      For example, in the abstract it is stated that "simulations in this study show that conventional measures of alpha power are confounded". Despite this statement, simulations are not a core feature of this study. There are a couple of simulated examples in the supplement, which are referred to in lines 89-95; however, it's worth noting that while this section does not include any citations, the described issues, and related simulations, are very similar to points that have been made previously in the literature and that seem like they should be cited here:
      - Donoghue, T., Dominguez, J., & Voytek, B. (2020). Electrophysiological Frequency Band Ratio Measures Conflate Periodic and Aperiodic Neural Activity. eNeuro, 7(6), ENEURO.0192-20.2020. https://doi.org/10.1523/ENEURO.0192-20.2020
      - Donoghue, T., Schaworonkow, N., & Voytek, B. (2021). Methodological considerations for studying neural oscillations. European Journal of Neuroscience, ejn.15361. https://doi.org/10.1111/ejn.15361

      The paper also understates previous work on aperiodic activity, and the degree to which it is known to vary with age, in lines 116-117 stating "there is insufficient evidence for the reported significant association between age and aperiodic signal components". This seems to ignore the large number of studies that have replicated this finding, including (some non-exhaustive examples):
      - Thuwal, K., Banerjee, A., & Roy, D. (2021). Aperiodic and Periodic Components of Ongoing Oscillatory Brain Dynamics Link Distinct Functional Aspects of Cognition across Adult Lifespan. eNeuro, 8(5), ENEURO.0224-21.2021. https://doi.org/10.1523/ENEURO.0224-21.2021
      - Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-Related Changes in 1/f Neural Electrophysiological Noise. Journal of Neuroscience, 35(38), 13257-13265. https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Perhaps this claim is supposed to more specifically reflect the age range analyzed here, in which case recent studies examining this (in relatively large datasets) are also not mentioned here, including, for example:
      - Donoghue, T., Dominguez, J., & Voytek, B. (2020). Electrophysiological Frequency Band Ratio Measures Conflate Periodic and Aperiodic Neural Activity. eNeuro, 7(6), ENEURO.0192-20.2020. https://doi.org/10.1523/ENEURO.0192-20.2020
      - Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/j.dcn.2022.101076

      The notes above do not undermine the utility of examining alpha oscillations in detail, but I think the specific contribution of this work could be better contextualized in terms of other existing work. In the introduction, for example, the following review is an important piece of work that could be cited when introducing aperiodic activity:
      - He, B. J. (2014). Scale-free brain activity: Past, present, and future. Trends in Cognitive Sciences, 18(9), 480-487. https://doi.org/10.1016/j.tics.2014.04.003

      2. Model quality control 

      A limitation to the methods employed in this study is a lack of description of if and how model fit quality was evaluated. For the method of parametrizing neural power spectra that is employed, it is important to validate that models fit the data well, otherwise the estimated parameters may be unreliable. This is especially important in developmental and clinical data, as analyzed here, as this data can be quite noisy, and differences in levels of noise across ages or between clinical groups could plausibly lead to differences in model fit quality. Useful quality checks for this kind of analysis would be to report the average r-squared (or model error) for the parametrized data, and to examine whether model fit quality is significantly related to age, or clinical status. 

      Note that there is also a detailed guide for how best to apply spectral parametrization to developmental datasets, including notes on quality control, that may be useful:
      - Ostlund, B., Donoghue, T., Anaya, B., Gunther, K. E., Karalunas, S. L., Voytek, B., & Pérez-Edgar, K. E. (2022). Spectral parameterization for studying neurodevelopment: How and why. Developmental Cognitive Neuroscience, 54, 101073. https://doi.org/10.1016/j.dcn.2022.101073

      Not reporting any quality control metrics of the model fits also deviates from the analysis of the validation dataset as described in the pre-registered analysis (https://osf.io/7uwy2), which includes the note that the plan is for data to be excluded from the analysis if there is a bad model fit (R-squared < 0.9). It is unclear from the manuscript if this was done at all - and if so, why it was not described, and if not, why this deviates from the pre-registration. Note that though examining and reporting model fit quality is important, it is unclear where the value of 0.9 in the pre-registration came from, and it is unclear if this is an appropriate threshold for these specific datasets. 
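
The quality checks suggested in this point can be sketched as below. This is a hypothetical illustration rather than the authors' pipeline: the function names and input lists are assumptions, and the 0.9 cutoff is simply the value quoted from the pre-registration, not a recommended threshold. The sketch reports the mean R-squared, flags fits below the cutoff, and checks whether fit quality covaries with age.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def summarize_fit_quality(r_squared, ages, threshold=0.9):
    """Summarize model fit quality across participants.

    r_squared: one R-squared value per participant (hypothetical input)
    ages: participant ages, in the same order
    threshold: exclusion cutoff; 0.9 is the pre-registered value
    """
    excluded = [i for i, r2 in enumerate(r_squared) if r2 < threshold]
    return {
        "mean_r2": mean(r_squared),
        "excluded_indices": excluded,
        "r2_age_correlation": pearson_r(ages, r_squared),
    }
```

A nonzero correlation between R-squared and age would suggest that differences in fit quality, rather than differences in the underlying spectra, could contribute to apparent age effects.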

      3. The analysis of the relationship between the aperiodic intercept and aperiodic exponent 

      There is an analysis in this paper that attempts to evaluate whether the change in aperiodic intercept that is observed is more than expected due to the measured change in aperiodic exponent. The approach taken for this analysis is ill-posed, and the interpretations made of this analysis are not supported. The issue is that the degree to which the intercept changes due to a change in exponent depends on the rotation frequency, which is not acknowledged or addressed in the analysis employed here.

      For example, for spectra rotated at 0 Hz, there is no measured change in offset from a change in exponent, whereas for a rotation at 100 Hz, there is a large influence of exponent on the change in offset, with different degrees of impact in between. The results of this analysis are therefore heavily influenced by the rotation frequency that is used. The analysis by the authors uses a rotation frequency of 19 Hz, however, there is no justification provided for this value. It is noted as being the middle point of the analyzed range, however, this itself is unrelated to whether it is an appropriate rotation frequency (since which frequency the spectrum rotates at is unrelated to the experimenter's decision of which frequency range to analyze). 

      In real data, we don't know the rotation frequency a priori; in general it need not be a single point that is consistent across subjects, and it is difficult to measure. To get a sense of what it might be, anecdotally, we can see in Figure 2C that in this particular subset, the rotation point is not at 19 Hz, and appears to be at a higher frequency. If the rotation point is actually higher than 19 Hz, then the analysis employed will systematically under-estimate the impact of the measured exponent change, leading to the conclusion that the intercept is changing over and above the influence of the exponent. However, this conclusion is only valid if the rotation point of 19 Hz is accurate, and we would likely arrive at a different conclusion by picking a different rotation point. This analysis, by itself, is therefore invalid. To be interpretable, such an analysis would need a clear demonstration that the correct rotation frequency had been measured.
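
The dependence on rotation frequency can be made concrete with a small sketch. It assumes the common aperiodic model log10(power) = offset - exponent * log10(frequency); holding power fixed at the rotation frequency then gives a change in offset of delta_exponent * log10(f_rotation). The function name and the exponent change of -0.2 are hypothetical, chosen only for illustration.

```python
import math


def offset_change_from_rotation(delta_exponent, f_rotation_hz):
    """Offset change implied by rotating the aperiodic fit about f_rotation_hz.

    Assumes log10(power) = offset - exponent * log10(frequency); keeping power
    constant at the rotation frequency gives
    delta_offset = delta_exponent * log10(f_rotation_hz).
    """
    return delta_exponent * math.log10(f_rotation_hz)


delta_exponent = -0.2  # hypothetical flattening of the spectrum with age
for f_rot in (19, 30, 100):
    # the implied offset change grows in magnitude with the rotation frequency
    print(f_rot, round(offset_change_from_rotation(delta_exponent, f_rot), 3))
```

Under this assumption, the same exponent change accounts for a larger share of the observed intercept change as the assumed rotation frequency increases, which is why a rotation point above 19 Hz would weaken the conclusion that the intercept changes over and above the exponent.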

      4. Flanker Analysis 

      Also relating to organization (similar to point 1), it is unclear why the analysis of the Flanker task, which is alluded to in the abstract, is only mentioned in the Discussion section. Given that this appears to be a key analysis, it is unclear why it is not presented in detail in the Results. The Flanker task and analysis are also not described in much detail in the Methods. An issue with the Flanker analysis only being mentioned in the Discussion, with a link to a supplemental table, is that the details of the results are somewhat obfuscated from the reader. When looking at these results, two key features seem notable: first, that though there is a significant effect of aperiodic-adjusted alpha power, the beta value is very small (many times smaller than the coefficients for age and gender), and second, that although it doesn't quite pass significance, the estimated beta value for the total alpha power has the same magnitude as for the individualized alpha power. Between these two features, it is not clear if the relationship between aperiodic-adjusted alpha power and the Flanker performance is of sufficient magnitude to interpret that alpha power is related to attentional performance, and it's not clear that aperiodic-adjusted alpha power is more related to attentional performance than total alpha power (since a difference in significance does not necessarily imply a significant difference in the parameters). I think this analysis, as presented, therefore does not clearly support the claim made in the abstract that alpha power is found to relate to improved attentional performance.

    1. Discussion, revision and decision


      Author response


      To: Adam Marcus, co-founder Retraction Watch & Alison Abritis, PhD, researcher at Retraction Watch

      Major Problems: I found serious deficits in both for this article, and thus I have serious concerns as to the usefulness of this article. Therefore, I have not proceeded in a line-by-line, as I consider the overall problems to be grave enough to require attention and revision before getting to lesser items of clarity.

      I would like to point out that the authors show a marvelous attention to their work, and they have much to contribute to the field of retraction studies, and I do honestly look forward to their future work. However, in order for the field to move ahead with accuracy and validity, we must no longer just rely on superficial number crunching, and must start including the complexities of publishing in our analyses, as difficult and labor-intensive as it might be.

      We do not consider that our article presents serious problems nor that it would be useless.

      It is possible that a different view on the subject, some tendency to forbearance (understandable) for the difficult life of the publishing industry, along with some difficulties in understanding the ideas presented in the article, may have led to a series of points of view that we would like to comment on below.

      We would first like to thank the reviewers for their comments, some of which will allow us to improve and nuance, using objective elements, the analysis of this bumpy field represented by the ecosystem of retracted publications. Because we have based our study on data from freely accessible sources of information, we will not insist too much on commenting on this issue.

      The authors stated that they used the search protocol (and therefore presumably the same dataset) as described in Toma & Padureanu, 2021, and do not indicate any process to compensate for its weaknesses. In the referenced study, the authors (same as for this article) utilized a PubMed search using only “Retracted Publication” in Publication Type. This search method is immediately insufficient, as some retracted articles are not bannered or indexed as retracted in PubMed. This issue is well-understood among scholars who search databases for retractions, and by now one would expect that these searches would strive to be more comprehensive.

      A better method, if one insists on restricting the search to PubMed, would have been to use Publication Type to search for “retracted publication,” and then to search for “retraction of publication,” and to compare the output to eliminate duplications. There are even more comprehensive ways to search PubMed, especially since some articles are retitled as “Withdrawn” – Elsevier, for example, uses the term instead of “Retracted” for papers removed within a year of their publication date – but do not come in searches for either publication type. Even better would have been to use databases with more comprehensive indexing of retractions.

      In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.

      Thank you very much for the suggestions on the search strategy. We do not consider that the use of "Retracted Publication [PT]" should be compensated in any way but, if it should be compensated, we wouldn't want to add "Retraction of publication". We consider that using a search protocol more specific to systematic reviews is not very useful in our case: data are added/updated continuously (sometimes late), incorrect indexing can be corrected, the number of retracted articles increases from month to month; the same strategy can give different results at different times regardless of its complexity. Putting extra effort into detecting problematic articles without knowing the benefit but expecting it only highlights issues that can be improved at the publisher/editor (content delivery) and database (indexing) level.

      The dataset analyzed is a snapshot of a particular time interval and nothing more. Even during the analysis we found, in the case of one publisher, the addition of details to the initially incomplete retraction notes. Hence the need for follow-up studies. Therefore in the case of retractions, unlike the reviewer, we prefer an approach based on simple and easily reproducible strategies, widely accessible sources of information, and several steps. The first step in this strategy is the "number crunching" stage which includes this article.

      1. The authors are using the time from publication to retraction based on the notice dates and using them to indicate efficacy of oversight by publishers. However, this approach is seriously problematic. It takes no notice of when the publisher was first informed that the article was potentially compromised. Publishers who respond rapidly to information that affects years/decades old publications will inevitably show worse scores than those who are advised upon an article’s faults immediately upon its publication, but who drag their heels a few months in dealing with the problem.

      Indeed, the article uses the time between publication and retraction (exposure time – ET) as one of the SDTP score components for assessing editorial/publisher performance. Data on when a publisher or editor has been informed of problems with an article, apart from being relatively rare, is not a substitute for a retraction note. Moreover, the use of such information may induce a risk of bias.
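
As an illustration of the exposure time component only, the sketch below computes ET as the number of days between publication and retraction and averages it per publisher. The function names, the record layout, and the choice of days as the unit are assumptions; the other SDTP components and the article's actual scoring are not reproduced here.

```python
from datetime import date
from statistics import mean


def exposure_time_days(published, retracted):
    """Exposure time (ET): days between publication and retraction dates."""
    return (retracted - published).days


def mean_exposure_by_publisher(records):
    """Average ET per publisher.

    records: iterable of (publisher, publication_date, retraction_date)
    tuples; this layout is hypothetical.
    """
    by_publisher = {}
    for publisher, published, retracted in records:
        by_publisher.setdefault(publisher, []).append(
            exposure_time_days(published, retracted)
        )
    return {publisher: mean(values) for publisher, values in by_publisher.items()}
```

Because this measure uses only the two dates, it cannot distinguish a slow response to an early complaint from a fast response to a late one, which is the limitation raised in the review.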

      We mention in the article the need to use reporting standards for retraction notes, and one element that might be useful is, indeed, the date on which the publisher or editor was informed of problems with an article. Unfortunately, as the author of this review knows very well, information precedes investigation; the retraction note contains (or should contain) much more data than the initial information about the quality problems of an article.

      Our article aims to suggest a score for measuring publication performance in the context of retracted articles that would also allow an assessment of the dynamics of the activity of correcting the scientific record and, more importantly, how publishers engage in post-publication quality control. ET is only one component of this score.

      It is quite clear from the data presented in the article that a publisher/journal that emphasizes systematic back-checking will have an increasingly longer average lifespan of retracted articles, logically higher than one that does not do this type of checking. We don't see precisely where the reviewer thinks there is a problem: once the checking is done, the ET will decrease, and a publisher that takes concrete steps to correct the literature will ultimately have a better reputation. This does not mean that a higher ET is laudable, it suggests that there is a post-publication quality control but also that the peer review process has let problematic articles through and that the control of these articles has been carried out late. This is an argument for more active involvement of publishers (as potential generators of editorial policies) in post-publication control.

      Second, there is little consistency in dealing with retractions between publishers, within the same publishers or even within the same journal. Under the same publisher, one journal editor may be highly responsive during their term, while the next editor may not be. Most problems with articles quite often are first addressed by contacting the authors and/or journal editors, and publishers – especially those with hundreds of journals – may not have any idea of the ensuing problem for weeks or months, if at all. Therefore, the larger publishers would be far more likely to show worse scores than publishers with few journals to manage oversight.

      It is exactly this inconsistency that we highlight in the article. Differing policies, attitudes, and responsiveness does not mean that a publisher cannot/should not ask questions about the effectiveness of internal processes and resources used for post-publication quality control or the implementation of uniform measures across journals in its portfolio.

      Third, the dates on retraction notices are not always representative of when an article was watermarked or otherwise indicated as retracted. Elsevier journals often overwrite the html page of the original article with the retraction notice, leaving the original article’s date of publication alone. A separate retraction notice may not be published until days, weeks or even years after the article has been retracted. Springer and Sage have done this as well, as have other publishers – though not to the same extent (yet).

      Historically, The Journal of Biological Chemistry would publish a retraction notice and link it immediately to the original article, but a check of the article’s PDF would show it having been retracted days to weeks earlier. They have recently been acquired by Elsevier, so it is unknown how this trend will play out. And keep in mind, in some ways this is in itself not a bad thing – as it gives the user quicker notice that an article is unsuitable for citation, even while the notice itself is still undergoing revisions. It just makes tracking the time of publication to retraction especially difficult.

      We used the same date for all articles in our study (the one listed in PubMed), thus ensuring a uniform criterion for all publishers. If this date was not in PubMed we used the date from the retraction notes on the journal website but this was for a small number of articles. How different publishers handle retraction processes or the delay with which these are published is primarily related to internal editorial procedures, and these delays are reflected in the ET. In our experience, most articles retracted by Elsevier are available online, supplemented, and not replaced by retraction notes, which we think is an excellent policy.

      1. As best as can be determined, the authors are taking the notices at face value, and that has been repeatedly shown to be flawed. Many notices are written as a cooperative effort between the authors and journal, regardless of who initiated the retraction and under the looming specter of potential litigation.

      Shown to be flawed by whom? Indeed, in our study, we refer to the retraction notes published by the journals. The fact that they are incomplete or formulated under the threat of litigation only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction. The way the retraction note is worded should be an editorial prerogative and should primarily aim at correcting scientific literature, not at appeasing egos, careers, or financial interests.

      Trying to establish who initiated a retraction process strictly by analyzing the notice language is destined to produce faulty conclusions. Looking just at PubPeer comments, questions about the data quality may be raised days/months/years before a retraction, with indications of having contacted the journal or publisher. And yet, an ensuing notice may be that the authors requested the retraction because of concerns about the data/image – where the backstory clearly shows that impetus for the retraction was prompted by a journal’s investigation of outside complaints. As an example, the recent glut of retractions of papers coming from paper mills often suggests the authors are requesting the retraction. This interpretation would be false, however, as those familiar with the backstory are aware that the driving force for many of these retractions was independent investigators contacting the journals/publishers for retraction of these manuscripts.

      Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that officially endorsed by the journals. The retraction notes represent the material available to a researcher doing documentation on a particular topic. The clarity and information contained in the note is the editor's or publisher’s responsibility, reflecting their performance and concern for the integrity of the science. Interpretation of a retraction note/analyzing an article occurs in this context. Not everyone has time for further investigation or to search third-party sites for information that is, with a notable exception, the result of a selection bias.

      Assigning the reason for retraction from only the text of the notice will absolutely skew results. As already stated, in many cases, journal editors and authors work together to produce the language. Thus, the notice may convey an innocuous but unquestionable cause (e.g., results not reproducible) because the fundamental reason (e.g., data/image was fabricated or falsified) is too difficult to prove to a reasonable degree. Even the use of the word “plagiarism” is triggering for authors’ reputations – and notices have been crafted to avoid any suggestion of such, with euphemisms that steer well clear of the “p” word. Furthermore, it has been well-documented that some retractions required by institutional findings of misconduct have used language in the notice indicating simple error or other innocuous reasons as the definitive cause.

      We understand your point of view and the situations presented may be accurate. However, from our point of view, the only valid reference remains the retraction note published on the journal's website. The existence of wording difficulties and various other problems that may arise is more likely to have to do with a tendency of the reviewer to make excuses for journals reluctant to indicate precisely what the reasons for retracting the article are. There are plenty of retraction notes in which the images with problems (including whether they were plagiarized, reused, manipulated, fabricated, etc.) are indicated with great precision, and there are equally plenty of notes in which the word plagiarism is used without hesitation, indicating the sources, how they were informed, and what was plagiarized. No matter how many hesitant publishers/editors there are, it should not be forgotten that there are many journals/publishers who take their role seriously, acknowledge and learn from their mistakes, thus providing a real service to the scientific community.

      The authors also discuss changes in the quality of notices increasing or decreasing in publishers – but without knowing the backstory. Having more words in a notice or giving one or two specific causes cannot in itself be an indicator of the quality (i.e., accuracy) of said notice.

      "Knowing the backstory" is not part of our objectives, and neither is assessing the quality of the retraction notes. This is also very difficult to do due to the lack of an accepted standard format. We are trying to propose a score composed of several parameters resulting from existing (or non-existing) data in the retraction notes so that we can have a picture of retractions at publisher level. Knowing the backstory is not relevant, reading and interpreting the official retraction note is relevant.

      1. The authors tend to infer that the lack of a retraction in a journal implies a degree of superiority over journals with retractions. Although they qualify it a bit (“Are over 90% of journals without a retracted article perfect? It is a question that is quite difficult to answer at this time, but we believe that the opinion that, in reality, there are many more articles that should be retracted (Oransky et al. 2021) is justified and covered by the actual figures.”), the inference is naive. First, they have not looked at the number of corrections within these journals. Even ignoring that these corrections may be disproportionate within different journals and require responsive editorial staff, some journals have gone through what can only be called great contortions to issue corrections rather than retractions.

      We believe that this is a case of reviewer confusion generated either by the insufficiently precise wording of the text or a lack of understanding of our study objectives. We are trying to point out that more than 90% of the journals in the NLM catalogue-PubMed subset have not retracted a single article. We are not trying to say that journals without retracted articles are superior to the others. As explained in the article, we referred to retraction notes, not corrections.

      Second, the lack of retractions in a journal speaks nothing to the quality of the articles therein. Predatory journals generally avoid issuing retractions, even when presented with outright proof of data fabrication or plagiarism. Meanwhile, high-quality journals are likely to have more, and possibly more astute, readers, who could be more adept at spotting errors that require retraction.

      Of course, the quality level of articles in a journal is not determined by the number of articles removed.

      Third, smaller publishers/journals may not have the fiscal resources to deal with the issues that come with a retraction. As an example, even though there was an institutional investigation finding data fabrication, at least one journal declined to issue a retraction for an article by Joachim Boldt (who has more than 160 retractions for misconduct) after his attorneys made threats of litigation.

      Threats of lawsuits are instead a failure of a publisher/journal to adapt to the realities of the publishing business or to the risk of misconduct. This is something that needs to change.

      Simply put, the presence or lack of a retraction in a journal is no longer a reasonable speculation about the quality of the manuscripts or the efficiency of the editorial process.

      We have not attempted to suggest this; we have only analyzed the retracted articles and their associated retraction notes. On the other hand, the way a journal/publisher handles the retraction of problematic articles still reflects, to some extent, the quality/performance of the editorial processes.

      1. I am concerned that the authors appear to have made significant errors in their analysis of publishers. For example, they claim that neither PLOS nor Elsevier retracted papers in 2020 for problematic images. That assertion is demonstrably false.

      This is wrong. In our dataset, there are eleven PLOS articles related to human health with the publication years 2019 and 2020. None of these list images as a retraction reason.

      Regarding the 21 Elsevier articles published in 2020, there is nothing in the retraction notes to indicate that the article was retracted because of the images. In two retraction notes there is mention of the comments made by Dr. Bik (The Tadpole Paper Mill - Science Integrity Digest), but the text of these notes stops at the authors' inability to provide the raw data underlying the article.

      Our study is based only on the content of the retraction notes published and assumed by the journal, not on opinions/comments appearing on other sites, which, for unknown/unmentioned reasons, are not officially assumed in the retraction note. Therefore, we consider the statement in the review to be questionable at best, as the use of material other than the retraction notes has severe implications for the internal and external validity of the study and the suggestion to use such methods is, in our opinion, wrong. We would also like to draw attention to the fact that many retraction notes explicitly mention the request to provide raw images and the authors' inability to provide them.

      Anyway, as far as images are concerned, our article suggested that there are publishers which seem to adopt image analysis technologies faster than others. The numbers are not really relevant in this case but the trend is: it describes the publishing activity complexity better than the numbers.

      Reviewer response

      We appreciate the authors’ zeal in standing by their work.

      In regard to the deficits in the search process, the author states, “We do not consider that the use of ‘Retracted Publication [PT]’ should be compensated in any way but, if it should be compensated, we wouldn't want to add ‘Retraction of publication’”

      There is a lack of appreciation for the complexities of indexing retracted materials in an indexing site such as PubMed. To have a comprehensive search, one should not be choosing to use either “Retracted Publication [PT]” OR “Retraction of Publication [PT].” One would use both, and then filter out the duplicates, because some retractions are indexed by retraction notices, some only have “Retracted” added to the indexed title and the publication type changed to “Retracted Publication.” Use of only one or the other guarantees that the search is far less comprehensive than it should be.

      The authors state, “In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.”

      There is at least one database (http://retractiondatabase.org) that has a far more comprehensive indexing of retractions and is publicly available for use.

      In Item 3, where it is pointed out that retraction notices themselves are inaccurate and cannot be taken at face value as to the reason behind the retraction, the authors responded, “Shown to be flawed by who?” — By an article cited in the manuscript:

      Fang, Ferric C.; Steen, R. Grant; Casadevall, Arturo (2012): Misconduct accounts for the majority of retracted scientific publications. In Proceedings of the National Academy of Sciences of the United States of America 109 (42), pp. 17028–17033. DOI: 10.1073/pnas.1212247109.

      “To understand the reasons for retraction, we consulted reports from the Office of Research Integrity and other published resources (7, 8), in addition to the retraction announcements in scientific journals. Use of these additional sources of information resulted in the reclassification of 118 of 742 (15.9%) retractions in an earlier study (4) from error to fraud.” Followed by “These factors have contributed to the systematic underestimation of the role of misconduct and the overestimation of the role of error in retractions (3, 4), and speak to the need for uniform standards regarding retraction notices (5).”

      The authors then choose to state that it is the “editorial prerogative” – and that when notices “are incomplete or formulated under the threat of litigation [it] only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction.”

      Following our attempt to explain why understanding the real reason behind a retraction is important to study the publication of notices, the authors respond: “Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed.”

      First, yes, we do understand the study. We read a lot of these. Second, the “third-party websites” we prefer include the Office of Research Integrity and the Retraction Watch blog, where background investigations into the causes of retraction notices are described. If the authors are challenging the reference to PubPeer, keep in mind that journals initiate investigations based on comments on that website, and have taken to citing the website in their notices.

      Had the authors not chosen to categorize the reasons for retraction, their reasoning may have had more support – but they did, and in doing so, by just using the notice with no further review, their findings address only the notice itself, with no context.

      We recommend that the manuscript be substantially revised with strong attention to the comments we made in our original review.

    2. Discussion, revision and decision


      Author response


      To: Adam Marcus, co-founder Retraction Watch & Alison Abritis, PhD, researcher at Retraction Watch

      Major Problems: I found serious deficits in both for this article, and thus I have serious concerns as to the usefulness of this article. Therefore, I have not proceeded in a line-by-line, as I consider the overall problems to be grave enough to require attention and revision before getting to lesser items of clarity.

      I would like to point out that the authors show a marvelous attention to their work, and they have much to contribute to the field of retraction studies, and I do honestly look forward to their future work. However, in order for the field to move ahead with accuracy and validity, we must no longer just rely on superficial number crunching, and must start including the complexities of publishing in our analyses, as difficult and labor-intensive as it might be.

      We do not consider that our article presents serious problems nor that it would be useless.

      It is possible that a different view on the subject, some tendency to forbearance (understandable) for the difficult life of the publishing industry, along with some difficulties in understanding the ideas presented in the article, may have led to a series of points of view that we would like to comment on below.

      We would first like to thank the reviewers for their comments, some of which will allow us to improve and nuance, using objective elements, the analysis of this bumpy field represented by the ecosystem of retracted publications. Because we have based our study on data from freely accessible sources of information, we will not insist too much on commenting on this issue.

      The authors stated that they used the search protocol (and therefore presumably the same dataset) as described in Toma & Padureanu, 2021, and do not indicate any process to compensate for its weaknesses. In the referenced study, the authors (same as for this article) utilized a PubMed search using only “Retracted Publication” in Publication Type. This search method is immediately insufficient, as some retracted articles are not bannered or indexed as retracted in PubMed. This issue is well-understood among scholars who search databases for retractions, and by now one would expect that these searches would strive to be more comprehensive.

      A better method, if one insists on restricting the search to PubMed, would have been to use Publication Type to search for “retracted publication,” and then to search for “retraction of publication,” and to compare the output to eliminate duplications. There are even more comprehensive ways to search PubMed, especially since some articles are retitled as “Withdrawn” – Elsevier, for example, uses the term instead of “Retracted” for papers removed within a year of their publication date – but do not come in searches for either publication type. Even better would have been to use databases with more comprehensive indexing of retractions.
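The two-query approach described above can be sketched in a few lines. This is a minimal illustration, not the reviewers' or the authors' actual procedure: the `Record` type and its `retracts_pmid` field are hypothetical stand-ins for whatever link between a retraction notice and its original article the exported data provides, and the sample PMIDs are invented.

```python
from dataclasses import dataclass
from typing import Optional

# The two PubMed publication-type queries named in the text.
RETRACTED_QUERY = '"Retracted Publication"[PT]'
NOTICE_QUERY = '"Retraction of Publication"[PT]'


@dataclass(frozen=True)
class Record:
    pmid: str
    # Hypothetical field: PMID of the article that a retraction notice refers to.
    retracts_pmid: Optional[str] = None


def merge_retractions(retracted, notices):
    """Return the sorted, de-duplicated PMIDs of retracted articles.

    `retracted` holds records found with RETRACTED_QUERY; `notices` holds
    records found with NOTICE_QUERY. Notices without a link to an original
    article are skipped here and would need manual review.
    """
    article_pmids = {r.pmid for r in retracted}
    for notice in notices:
        if notice.retracts_pmid is not None:
            article_pmids.add(notice.retracts_pmid)
    return sorted(article_pmids)


# Invented example: article 222 is found by both routes, 333 only via its notice.
retracted = [Record("111"), Record("222")]
notices = [Record("900", retracts_pmid="222"), Record("901", retracts_pmid="333")]
merged = merge_retractions(retracted, notices)
print(merged)
```

Keying the merged set on the retracted article rather than on the notice is one possible design choice; it means an article reached through both queries is counted once.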

      In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.

      Thank you very much for the suggestions on the search strategy. We do not consider that the use of "Retracted Publication [PT]" should be compensated in any way but, if it should be compensated, we wouldn't want to add "Retraction of publication". We consider that using a search protocol more specific to systematic reviews is not very useful in our case: data are added/updated continuously (sometimes late), incorrect indexing can be corrected, the number of retracted articles increases from month to month; the same strategy can give different results at different times regardless of its complexity. Putting extra effort into detecting problematic articles without knowing the benefit but expecting it only highlights issues that can be improved at the publisher/editor (content delivery) and database (indexing) level.

      The dataset analyzed is a snapshot of a particular time interval and nothing more. Even during the analysis we found, in the case of one publisher, the addition of details to the initially incomplete retraction notes. Hence the need for follow-up studies. Therefore in the case of retractions, unlike the reviewer, we prefer an approach based on simple and easily reproducible strategies, widely accessible sources of information, and several steps. The first step in this strategy is the "number crunching" stage which includes this article.

      1. The authors are using the time from publication to retraction based on the notice dates and using them to indicate efficacy of oversight by publishers. However, this approach is seriously problematic. It takes no notice of when the publisher was first informed that the article was potentially compromised. Publishers who respond rapidly to information that affects years/decades old publications will inevitably show worse scores than those who are advised upon an article’s faults immediately upon its publication, but who drag their heels a few months in dealing with the problem.
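The distinction drawn here, between the time from publication to retraction and the time from notification to retraction, can be made concrete with a small worked example. This is only a sketch under stated assumptions: the two publishers and all dates are invented, the notification date is not a field in the authors' dataset, and measuring in days is an arbitrary choice.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of days from `start` to `end`."""
    return (end - start).days


# Hypothetical publisher A: old article, fast reaction once informed.
a_published, a_informed, a_retracted = date(2005, 1, 1), date(2020, 1, 1), date(2020, 2, 1)
# Hypothetical publisher B: recent article, slow reaction once informed.
b_published, b_informed, b_retracted = date(2019, 1, 1), date(2019, 2, 1), date(2019, 12, 1)

# Publication-to-retraction interval (what the text calls exposure time, ET).
et_a = days_between(a_published, a_retracted)
et_b = days_between(b_published, b_retracted)

# Notification-to-retraction interval (the reviewers' "response" notion).
response_a = days_between(a_informed, a_retracted)
response_b = days_between(b_informed, b_retracted)

print(f"A: ET={et_a} days, response={response_a} days")
print(f"B: ET={et_b} days, response={response_b} days")
```

With these invented dates, publisher A has the longer publication-to-retraction interval but the shorter notification-to-retraction interval, which is the situation the comment above describes.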

      Indeed, the article uses the time between publication and retraction (exposure time – ET) as one of the SDTP score components for assessing editorial/publisher performance. Data on when a publisher or editor has been informed of problems with an article, apart from being relatively rare, is not a substitute for a retraction note. Moreover, the use of such information may induce a risk of bias.

      We mention in the article the need to use reporting standards for retraction notes, and one element that might be useful is, indeed, the date on which the publisher or editor was informed of problems with an article. Unfortunately, as the author of this review knows very well, information precedes investigation; the retraction note contains (or should contain) much more data than the initial information about the quality problems of an article.

      Our article aims to suggest a score for measuring publication performance in the context of retracted articles that would also allow an assessment of the dynamics of the activity of correcting the scientific record and, more importantly, how publishers engage in post-publication quality control. ET is only one component of this score.

      It is quite clear from the data presented in the article that a publisher/journal that emphasizes systematic back-checking will have an increasingly longer average lifespan of retracted articles, logically higher than one that does not do this type of checking. We don't see precisely where the reviewer thinks there is a problem: once the checking is done, the ET will decrease, and a publisher that takes concrete steps to correct the literature will ultimately have a better reputation. This does not mean that a higher ET is laudable, it suggests that there is a post-publication quality control but also that the peer review process has let problematic articles through and that the control of these articles has been carried out late. This is an argument for more active involvement of publishers (as potential generators of editorial policies) in post-publication control.

      Second, there is little consistency in dealing with retractions between publishers, within the same publishers or even within the same journal. Under the same publisher, one journal editor may be highly responsive during their term, while the next editor may not be. Most problems with articles quite often are first addressed by contacting the authors and/or journal editors, and publishers – especially those with hundreds of journals – may not have any idea of the ensuing problem for weeks or months, if at all. Therefore, the larger publishers would be far more likely to show worse scores than publishers with few journals to manage oversight.

      It is exactly this inconsistency that we highlight in the article. Differing policies, attitudes, and responsiveness do not mean that a publisher cannot/should not ask questions about the effectiveness of internal processes and resources used for post-publication quality control or the implementation of uniform measures across journals in its portfolio.

      Third, the dates on retraction notices are not always representative of when an article was watermarked or otherwise indicated as retracted. Elsevier journals often overwrite the html page of the original article with the retraction notice, leaving the original article’s date of publication alone. A separate retraction notice may not be published until days, weeks or even years after the article has been retracted. Springer and Sage have done this as well, as have other publishers – though not to the same extent (yet).

      Historically, The Journal of Biological Chemistry would publish a retraction notice and link it immediately to the original article, but a check of the article’s PDF would show it having been retracted days to weeks earlier. They have recently been acquired by Elsevier, so it is unknown how this trend will play out. And keep in mind, in some ways this is in itself not a bad thing – as it gives the user quicker notice that an article is unsuitable for citation, even while the notice itself is still undergoing revisions. It just makes tracking the time of publication to retraction especially difficult.

      We used the same date for all articles in our study (the one listed in PubMed), thus ensuring a uniform criterion for all publishers. If this date was not in PubMed, we used the date from the retraction note on the journal website, but this was the case for only a small number of articles. How different publishers handle retraction processes or the delay with which these are published is primarily related to internal editorial procedures, and these delays are reflected in the ET. In our experience, most articles retracted by Elsevier are available online, supplemented, and not replaced by retraction notes, which we think is an excellent policy.

      1. As best as can be determined, the authors are taking the notices at face value, and that has been repeatedly shown to be flawed. Many notices are written as a cooperative effort between the authors and journal, regardless of who initiated the retraction and under the looming specter of potential litigation.

      Shown to be flawed by who? Indeed, in our study, we refer to the retraction notes published by the journals. The fact that they are incomplete or formulated under the threat of litigation only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction. The way the retraction note is worded should be an editorial prerogative and should primarily aim at correcting scientific literature, not at appeasing egos, careers, or financial interests.

      Trying to establish who initiated a retraction process strictly by analyzing the notice language is destined to produce faulty conclusions. Looking just at PubPeer comments, questions about the data quality may be raised days/months/years before a retraction, with indications of having contacted the journal or publisher. And yet, an ensuing notice may state that the authors requested the retraction because of concerns about the data/image – where the backstory clearly shows that the retraction was prompted by a journal’s investigation of outside complaints. As an example, the notices in the recent glut of retractions of papers coming from paper mills often suggest the authors are requesting the retraction. This interpretation would be false, however, as those familiar with the backstory are aware that the driving force for many of these retractions was independent investigators contacting the journals/publishers for retraction of these manuscripts.

      Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed. The retraction notes represent the material available to a researcher doing documentation on a particular topic. The clarity and information contained in the note is the editor's or publisher’s responsibility, reflecting their performance and concern for the integrity of the science. Interpretation of a retraction note/analyzing an article occurs in this context. Not everyone has time for further investigation or to search third-party sites for information that is, with a notable exception, the result of a selection bias.

      Assigning the reason for retraction from only the text of the notice will absolutely skew results. As already stated, in many cases, journal editors and authors work together to produce the language. Thus, the notice may convey an innocuous but unquestionable cause (e.g., results not reproducible) because the fundamental reason (e.g., data/image was fabricated or falsified) is too difficult to prove to a reasonable degree. Even the use of the word “plagiarism” is triggering for authors’ reputations – and notices have been crafted to avoid any suggestion of such, with euphemisms that steer well clear of the “p” word. Furthermore, it has been well-documented that some retractions required by institutional findings of misconduct have used language in the notice indicating simple error or other innocuous reasons as the definitive cause.

      We understand your point of view, and the situations presented may be accurate. However, from our point of view, the only valid reference remains the retraction note published on the journal's website. Invoking wording difficulties and various other problems that may arise more likely has to do with a tendency of the reviewer to make excuses for journals reluctant to indicate precisely what the reasons for retracting the article are. There are plenty of retraction notes in which the images with problems (including whether they were plagiarized, reused, manipulated, fabricated, etc.) are indicated with great precision; there are equally plenty of notes in which the word plagiarism is used without hesitation, indicating the sources, how they were informed, and what was plagiarized. No matter how many hesitant publishers/editors there are, it should not be forgotten that there are many journals/publishers who take their role seriously, acknowledge and learn from their mistakes, thus providing a real service to the scientific community.

      The authors also discuss changes in the quality of notices increasing or decreasing in publishers – but without knowing the backstory. Having more words in a notice or giving one or two specific causes cannot in itself be an indicator of the quality (i.e., accuracy) of said notice.

      "Knowing the backstory" is not part of our objectives, and neither is assessing the quality of the retraction notes. This is also very difficult to do due to the lack of an accepted standard format. We are trying to propose a score composed of several parameters resulting from existing (or non-existing) data in the retraction notes so that we can have a picture of retractions at publisher level. Knowing the backstory is not relevant, reading and interpreting the official retraction note is relevant.

      1. The authors tend to infer that the lack of a retraction in a journal implies a degree of superiority over journals with retractions. Although they qualify it a bit (“Are over 90% of journals without a retracted article perfect? It is a question that is quite difficult to answer at this time, but we believe that the opinion that, in reality, there are many more articles that should be retracted (Oransky et al. 2021) is justified and covered by the actual figures.”), the inference is naive. First, they have not looked at the number of corrections within these journals. Even ignoring that these corrections may be disproportionate within different journals and require responsive editorial staff, some journals have gone through what can only be called great contortions to issue corrections rather than retractions.

      We believe that this is a case of reviewer confusion generated either by the insufficiently precise wording of the text or a lack of understanding of our study objectives. We are trying to point out that more than 90% of the journals in the NLM catalogue-PubMed subset have not retracted a single article. We are not trying to say that journals without retracted articles are superior to the others. As explained in the article, we referred to retraction notes, not corrections.

      Second, the lack of retractions in a journal speaks nothing to the quality of the articles therein. Predatory journals generally avoid issuing retractions, even when presented with outright proof of data fabrication or plagiarism. Meanwhile, high-quality journals are likely to have more, and possibly more astute, readers, who could be more adept at spotting errors that require retraction.

      Of course, the quality level of articles in a journal is not determined by the number of articles removed.

      Third, smaller publishers/journals may not have the fiscal resources to deal with the issues that come with a retraction. As an example, even though there was an institutional investigation finding data fabrication, at least one journal declined to issue a retraction for an article by Joachim Boldt (who has more than 160 retractions for misconduct) after his attorneys made threats of litigation.

      Threats of lawsuits are instead a failure of a publisher/journal to adapt to the realities of the publishing business or to the risk of misconduct. This is something that needs to change.

      Simply put, the presence or lack of a retraction in a journal is no longer a reasonable speculation about the quality of the manuscripts or the efficiency of the editorial process.

      We have not attempted to suggest this; we have only analyzed the retracted articles and their associated retraction notes. On the other hand, the way a journal/publisher handles the retraction of problematic articles still reflects, to some extent, the quality/performance of the editorial processes.

      1. I am concerned that the authors appear to have made significant errors in their analysis of publishers. For example, they claim that neither PLOS nor Elsevier retracted papers in 2020 for problematic images. That assertion is demonstrably false.

      This is wrong. In our dataset, there are eleven PLOS articles related to human health with the publication years 2019 and 2020. None of these list images as a retraction reason.

      Regarding the 21 Elsevier articles published in 2020, there is nothing in the retraction notes to indicate that the article was retracted because of the images. In two retraction notes there is mention of the comments made by Dr. Bik (The Tadpole Paper Mill - Science Integrity Digest), but the text of these notes stops at the authors' inability to provide the raw data underlying the article.

      Our study is based only on the content of the retraction notes published and assumed by the journal, not on opinions/comments appearing on other sites, which, for unknown/unmentioned reasons, are not officially assumed in the retraction note. Therefore, we consider the statement in the review to be questionable at best, as the use of material other than the retraction notes has severe implications for the internal and external validity of the study and the suggestion to use such methods is, in our opinion, wrong. We would also like to draw attention to the fact that many retraction notes explicitly mention the request to provide raw images and the authors' inability to provide them.

      Anyway, as far as images are concerned, our article suggested that there are publishers which seem to adopt image analysis technologies faster than others. The numbers are not really relevant in this case but the trend is: it describes the publishing activity complexity better than the numbers.

      Reviewer response

      We appreciate the authors’ zeal in standing by their work.

      In regard to the deficits in the search process, the author states, “We do not consider that the use of ‘Retracted Publication [PT]’ should be compensated in any way but, if it should be compensated, we wouldn't want to add ‘Retraction of publication’”

      There is a lack of appreciation for the complexities of indexing retracted materials in an indexing site such as PubMed. To have a comprehensive search, one should not be choosing to use either “Retracted Publication [PT]” OR “Retraction of Publication [PT].” One would use both, and then filter out the duplicates, because some retractions are indexed by retraction notices, some only have “Retracted” added to the indexed title and the publication type changed to “Retracted Publication.” Use of only one or the other guarantees that the search is far less comprehensive than it should be.

      The authors state, “In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.”

      There is at least one database (http://retractiondatabase.org) that has a far more comprehensive indexing of retractions and is publicly available for use.

      In Item 3, where it is pointed out that retraction notices themselves are inaccurate and cannot be taken at face value as to the reason behind the retraction, the authors responded, “Shown to be flawed by who?” — By an article cited in the manuscript:

      Fang, Ferric C.; Steen, R. Grant; Casadevall, Arturo (2012): Misconduct accounts for the majority of retracted scientific publications. In Proceedings of the National Academy of Sciences of the United States of America 109 (42), pp. 17028–17033. DOI: 10.1073/pnas.1212247109.

      “To understand the reasons for retraction, we consulted reports from the Office of Research Integrity and other published resources (7, 8), in addition to the retraction announcements in scientific journals. Use of these additional sources of information resulted in the reclassification of 118 of 742 (15.9%) retractions in an earlier study (4) from error to fraud.” Followed by “These factors have contributed to the systematic underestimation of the role of misconduct and the overestimation of the role of error in retractions (3, 4), and speak to the need for uniform standards regarding retraction notices (5).”

      The authors then choose to state that it is the “editorial prerogative” – and that when notices “are incomplete or formulated under the threat of litigation [it] only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction.”

      Following our attempt to explain why understanding the real reason behind a retraction is important to study the publication of notices, the authors respond: “Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed.”

      First, yes, we do understand the study. We read a lot of these. Second, the “third-party websites” we prefer include the Office of Research Integrity and the Retraction Watch blog, where background investigations into the causes of retraction notices are described. If the authors are challenging the reference to PubPeer, keep in mind that journals initiate investigations based on comments on that website, and have taken to citing the website in their notices.

      Had the authors not chosen to categorize the reasons for retraction, their reasoning may have had more support – but they did, and in doing so, by just using the notice with no further review, their findings address only the notice itself, with no context.

      We recommend that the manuscript be substantially revised with strong attention to the comments we made in our original review.

    1. Author Response

      Reviewer #1 (Public Review):

      Liu et al investigated the role of the Wnt/β-catenin pathway in the genesis of thermogenic adipocytes. Their study shows that some adipocytes exhibited Wnt/β-catenin signaling ("Wnt+ adipocytes") in interscapular brown adipose tissue (iBAT), inguinal white adipose tissue (iWAT), epididymal WAT (eWAT), and bone marrow (BM). The proportion of Wnt+ adipocytes differed between depots: 17% in iBAT, 6.9% in iWAT, and the lowest, 1.3%, in eWAT. These adipocytes were noted on embryonic day 17.5 and were present in a higher percentage in female mice compared to male mice and in younger mice compared to older mice, which aligns with their observation that Wnt+ adipocytes are thermogenic.

      The authors also noted that Wnt+ adipocytes can differentiate from human stromal cells. In regards to the pathway, Wnt/β-catenin adipocytes are distinct from classical brown adipocytes at molecular and genomic levels. It was noted that Tcf7L2 was largely expressed in Wnt+ adipocytes but other Tcf proteins (Tcf 1, Tcf 3, and Lef1) were not. Wnt- cells showed a reversible delay in maturation with LF3, however, no cell death was noted. Wnt/β-catenin adipocytes seem to depend on AKT/mTOR signaling. It was further shown that insulin is a key factor in mTOR signaling and Wnt+ adipocyte differentiation.

      Upon cold exposure, UCP1+/Wnt- beige fat emerges largely surrounding Wnt+ adipocytes, implying that Wnt+ adipocytes serve as a "beiging initiator" in a paracrine manner. Lastly, mice with implanted Wnt+ adipocytes had significantly better glucose tolerance, which suggests that Wnt+ adipocytes have a beneficial impact on whole-body metabolism. I found no major flaws in the methods, and the data largely support their conclusion that Wnt+ adipocytes have a significant role (at least in part) in thermogenesis/metabolism, which I think is a very impressive and innovative finding.

      Thank you for the outstanding summary of our manuscript. We apologize that the original manuscript did not make it clear that the percentage of Wnt+ adipocytes is higher in male mice than in females.

      Reviewer #2 (Public Review):

      Liu et al present evidence for the surprising finding of Tcf/Lef-active, "Wnt+" mature adipocytes. They report that Wnt+ adipocytes arise during embryogenesis and regulate cold-induced beiging in surrounding adipocytes. Tcf/Lef transcriptional activity in these cells is Wnt-ligand independent and instead appears to be stimulated by insulin-dependent AKT/mTOR signaling. Using a diphtheria toxin inducible depletion mouse model, the authors show that Wnt+ cells play an important role in glucose homeostasis.

      As the authors have acknowledged, proper assignment of adipocyte nuclei is a notoriously difficult histological challenge. Mesenchymal cells sit directly adjacent to the adipocyte plasma membrane and their nuclei are often incorrectly assigned to the adipocyte both in vivo and in vitro. Pparg nuclear co-staining is helpful, however, Pparg is very highly expressed by endothelial cells and Col15a1+ committed preadipocytes, which are intercalated throughout the adipose. The authors have made an impressive attempt to address this concern by generating a Tcf/Lef-CreER mouse line to fluorescently label Wnt+ adipocytes, however, it is not entirely clear if the images presented support the conclusion that mature adipocytes are being labeled. Given that Wnt+ mature adipocytes are the core conclusion of this manuscript, and because this hypothesis runs counter to a large body of literature concluding that Wnt signaling inhibits adipogenesis, the authors have assumed a very high burden of proof that these are indeed Wnt+ mature adipocytes in vivo.

      Thanks for the outstanding summary of our manuscript.

      To address these concerns, the authors could utilize the specificity of in vivo single-nuclei RNA-Seq. Several data resources have been published (https://singlecell.broadinstitute.org/single_cell/study/SCP1376/a-single-cell-atlas-of-human-and-mouse-white-adipose-tissue), and the authors should re-analyze these data for subpopulations of mature adipocytes that express a transcriptional signature of active Tcf/Lef signaling. It is unfortunate that the authors were unable to successfully perform single-nuclei analysis of the Wnt+ adipocytes as this would significantly enhance this manuscript. The physiologic relevance of the single-cell analysis of immortalized, in-vitro differentiated clonal cell lines is questionable.

      We took the advice of Reviewer 2 and intersected our scRNA-seq data on Wnt+ adipocytes with the published single-nucleus sequencing (sNuc-seq) dataset of mouse iWAT (Emont et al., 2022). Because the activation of Tcf/Lef signaling in the Wnt+ adipocytes relies on AKT/mTOR signaling rather than on the conventional Wnt ligands and receptors, traditional downstream markers of Wnt signaling such as Axins were not found to be specifically enriched in the Wnt+ adipocytes. Therefore, the AKT/mTOR-dependent Wnt signaling in Wnt+ adipocytes appears to regulate the expression of genes distinct from those controlled by the conventional Wnt signaling pathway. This conclusion is supported by our recent studies showing that inhibition of this AKT/mTOR-dependent Wnt signaling by LF3 in Wnt+ adipocytes negatively impacts pathways implicated in “PI3K/Akt signaling”, “insulin signaling”, “thermogenesis”, and “fatty acid metabolism”, among others (see below for details). However, we found that one cluster (mAd3) of the sNuc-seq dataset, which is relatively enriched in Tcf7l2, expresses remarkably high levels of Cyp2e1 as well as Cfd, which encodes Adipsin. These genes, regarded as hallmarks of the mAd3 cluster, are also uniquely or highly expressed in Wnt+ adipocytes. Interestingly, the percentage of mAd3 among the total iWAT adipocytes in the chow-fed male group is about 5%, which is very close to that of Wnt+ adipocytes in vivo (~7%). Thus, mAd3 possibly represents Wnt+ adipocytes in iWAT. These analyses are included in the revision.

      Reviewer #3 (Public Review):

      It is becoming increasingly clear that adipocytes are not homogenous, but rather comprise several distinct subtypes with specific physiological functions. The mechanisms that underlie the development and distinct roles of each adipocyte subtype are of great interest for understanding the biology of metabolic regulation and its impairments in metabolic disease. In this manuscript, the authors describe a previously unknown population of adipocytes in mice, which are characterized by a special form of beta-catenin signaling. They perform a comprehensive series of experiments in cultured cells, in mouse models of in-vivo lineage tracing, and transplantation experiments to define the origin and function of these adipocytes. They find that the formation of these Wnt+ adipocytes is dependent on insulin signaling, and find possible roles in thermogenic adipose tissue development. Overall, the conclusions of this study are very convincing in their identification of a subpopulation of adipocytes displaying non-canonical Wnt signaling. The proposed role of these adipocytes as regulators of thermogenesis is more ambiguous, and their physiological function remains unclear.

      Thanks for the good comments. To distinguish this AKT/mTOR-dependent intracellular Wnt signaling in Wnt+ adipocytes from conventional non-canonical Wnt signaling, we feel that it would be appropriate to call it atypical Wnt signaling.

      • The new adipocyte types are identified through expression of a reporter for TCF/Lef signaling. This reporter is classically activated by Wnt/beta-catenin and using both siRNA depletion of beta-catenin as well as an allele lacking its transcriptional activation domain, the authors confirm the reporter expression is dependent on the presence of beta-catenin and TCF7L2, but independent of canonical Wnt signaling.

      • The involvement of TCF7L2 is also probed using a specific inhibitor of the beta-catenin/TCF7L2 interactions, LF3, which inhibited reporter expression. Inhibition of canonical Wnt signaling was without effect.

      • The authors isolate clonal lines of precursor cells that give rise to Wnt+ or Wnt- adipocytes from mouse brown adipose tissue. They find that Wnt+ adipocytes are dependent on the Wnt pathway, as inhibition by LF3 induces cell death.

      • To further probe the nature of Wnt+ and Wnt- adipocytes, the authors perform scRNASeq on cells after 7 days of adipose induction and find 2 distinctive cell populations. The finding of 2 distinct populations is expected, given the a priori separation of cells as a function of GFP expression. It is not clear why scRNASeq was chosen over RNASeq on the population, since the fat content of adipocytes may preclude full characterization of the most differentiated cells.

      With scRNA-seq, it would be more convincing to identify specific subpopulations of cells, as adipocytes are well known to be heterogeneous.

      Overall, this experiment is less informative on the mechanisms by which Wnt+ adipocytes display Wnt signaling dependency for viability, and what their functional role might be.

      Yes, these are major questions to be addressed in our future studies.

      • The non-canonical nature of Wnt signaling in Wnt+ adipocytes prompted the authors to explore the role of the insulin/PI3K/AKT/MTOR pathway. They find enhanced basal activity of this pathway in Wnt+ adipocytes. It was not explored whether this enhanced activity persists under insulin stimulation; this is relevant as feedback mechanisms within the signaling pathway may result in lower signaling under stimulated conditions.

      • To test the relevance of insulin signaling in-vivo on non-canonical Wnt signaling in adipocytes the authors use the Akita mouse, which lacks the insulin-2 gene and find a marked decrease in reporter activity, confirming the requirement for insulin signaling for expression of this non-canonical Wnt pathway.

      • To determine the functional role of Wnt+ adipocytes, the authors explore their relationship to mitochondrial respiratory activity and thermogenesis. They perform experiments to monitor mitochondrial membrane potential and oxygen consumption rate and find higher overall O2 consumption, and lower membrane potential in adipocyte populations vicinal to Wnt+ adipocytes. Overall these results are not fully convincing: The traces are highly variable from cell to cell, and rigorous quantification of uncoupled respiration is limited by the small number of cell lines analyzed; only one cell line of Wnt- and two Wnt+ adipocytes are analyzed. In situ differences in membrane potential would be more convincing if performed on homogenous collections of Wnt- and Wnt+ adipocytes to better understand stochastic variance.

      Thanks for the suggestions. Actually, the results of mitochondrial membrane potential assay on mixed adipocyte culture gave us the initial hint of the potential paracrine effect of Wnt+ adipocytes.

      • To determine the role of Wnt+ adipocytes in-vivo thermogenesis, the authors expose mice to cold temperature and monitor the proportion of UCP1+ adipocytes in relation to Wnt signaling. They find a proportion of Wnt+ adipocytes expressing UCP1. Whether this proportion is higher or lower than that of Wnt- adipocytes is not quantified, so it is unclear whether Wnt+ adipocytes preferentially develop beige characteristics. The authors find that UCP1+, Wnt- adipocytes are topologically close to Wnt+ adipocytes, and hypothesize a paracrine signaling role. However, this correlation may be explained by known topological biases in inguinal fat pad beiging, where adipocytes closer to lymph node preferentially induce UCP1. The Wnt+ adipocyte population may coincidentally be present in this region.

      As shown in Figure 5-figure supplement 1E, while all Wnt+ adipocytes were co-stained with UCP1, the percentage of Wnt+ adipocytes did not increase after cold challenge. As shown in Figure 5-figure supplement 1C, the initial beiging response is closely associated with Wnt+ adipocytes rather than with topological bias.

      • To functionally determine the role of Wnt+ adipocytes in thermogenesis, the authors ablate the Wnt+ lineage through expression of diphtheria toxin using a Fabp4-Flox-DTA mouse crossed to Tcf/Lef-CreERT2 mice. Less than 50% of these mice displayed impaired thermogenesis upon cold exposure. The authors interpret this finding to signify a partial role for Wnt+ adipocyte beiging in thermogenic regulation. This conclusion is not fully supported, as Fabp4 is expressed in many cells other than adipocytes, and therefore the phenotype of the affected mice is not unambiguously attributable to loss of Wnt+ adipocytes. An additional concern is that diphtheria toxin-induced cell death will lead to tissue inflammation, with potential functional effects on thermogenesis. The degree of cell death and inflammation should be measured and reported.

      While Fabp4 is expressed in some SVFs, the Fabp4-Flox-DTA allele is not activated by the Tcf/Lef-CreERT2 allele, as the T/L-GFP reporter is not seen in freshly isolated SVFs of iWAT (Figure 2-figure supplement 1A). To avoid potential side effects of DTA-induced cell death on adipose tissues, we compounded the Tcf/Lef-rtTA allele with TRE-Cre and floxed Pparg alleles (PpargF/F) to prevent the differentiation of Wnt+ adipocytes. These new results are included in the revision as supplemental results (Figure 5-figure supplement 2G).

      • The finding that Akita mice lack Wnt+ adipocytes was used to determine whether these mice are susceptible to cold-induced challenges. The authors report a decrease in cold-induced UCP1 expression in these mice. This conclusion, derived from a single immunofluorescence image, is not fully convincing in the absence of additional metrics.

      Additional analyses are included in the revision, as Figure 5-figure supplement 3.

      • To further explore the role of Wnt+ adipocytes in systemic metabolism, the authors conduct implantation studies of Wnt+ adipocytes and measure effects on glucose tolerance. They show a significant difference in glucose excursions in mice harboring fat pads developed from Wnt+ adipocytes. These results are convincing, but the conclusion may be due to enhanced volume of additional functional fat developing from Wnt+ adipocytes.

      In this experiment, unbiased mBaSVF adipocytes were used in parallel as control.

    1. Author Response

      Reviewer #2 (Public Review):

      1. The manuscript seems to claim that the study shows that S4 is the voltage sensor and S4 moves in KCNQ2. This has been repeated in Abstract, Introduction and Results. However, by this time S4 movements as a voltage sensor are well accepted mechanisms. The importance of the work is actually that it defines parameters of the VSD movement in KCNQ2 such as the stretch of S4 in and out of the membrane, and the relationship between VSD activation and pore opening. These points should be brought out as the rationale and significance of this work, rather than the well-known S4 function.

      We thank Reviewer #2 for this important comment, which was also brought up by Reviewer #3. We apologize for overemphasizing that the fourth transmembrane segment is the voltage sensor and that S4 moves in KCNQ2 channels. This might be the result of the authors’ past struggle to convince earlier reviewers that the fluorescence signals at a given position are not an experimental artifact, but reflect S4 moving during channel opening. We are very happy to learn that this is now a well-accepted mechanism.

      In the revised version, we now state:

      Abstract: “Here, we define parameters of voltage sensor movements in wt-KCNQ2 and channels bearing epilepsy-causing mutations using cysteine accessibility and voltage clamp fluorometry (VCF).”

      Introduction: “Similar to that seen in other Kv channels, the fourth transmembrane segment contains several highly conserved positively charged amino acid residues that move in response to changes in membrane voltages that functions as the voltage sensor(25-28)[…]Although these studies provided insight into S4 rearrangements, they did not define parameters of S4 movement, such as the dynamic relationship between S4 activation and pore opening during voltage-controlled gating of KCNQ2 channels.”

      Results: We deleted: “Collectively, these close correlations in time (Figure 3) and voltage dependence (Figure 2C) of fluorescence and current suggest that the environmental changes around labeled F192C at the outer end of S4 rendered fluorescence signals that seem to report on S4 motion associated with the opening and closing of the channel gate.”

      And simply state: “The close correlations in time (Figure 3) and voltage dependences (Figure 2G) of S4 motion (fluorescence) and activation gate (ionic current) resemble those observed for homologous KCNQ1 (without KCNE1)(42) and KCNQ3 channels(41, 43)”

      We also rewrote in its entirety the subsection: “Disease-causing mutations differentially affect S4 and gate domains” (Pages 10-11).

      2. The closeness of fluorescence and current traces and FV and GV curves led to the conclusion that the movement of a single VSD could trigger channel opening. The rationale for connecting the experimental observations to this conclusion needs to be well explained when the conclusion is first made. References that have made similar arguments, such as Osteen et al PNAS 2010 and Westhoff et al PNAS 2019, should be cited. In addition, as the authors recognized in the Discussion, the same observations can also lead to an alternative conclusion, namely that the movements of the four VSDs are highly cooperative, so that all activate before the pore opens. However, this alternative mechanism is not mentioned until the end of the manuscript, while "the movement of a single VSD opening the pore" is firmly claimed in the Abstract and Results. Some justification needs to be provided for this.

      Thank you for this important observation; the wording we used was clumsy. Since we removed the kinetic model (Figure 6 in the original manuscript), we have also deleted any sentences that discuss concerted or independent S4 movement in the Abstract and Results sections. We now only discuss that these alternatives, concerted or independent S4 movement, might explain our VCF data, which show that both the steady-state voltage dependence of S4 transitions and the kinetics closely follow those of ionic currents. Both references, Osteen et al PNAS 2010 and Westhoff et al PNAS 2019, have also been added as recommended by the reviewer, and we apologize for overlooking them in the original manuscript.

      3. An explanation is needed for how the same covalent MTS modification of N190C at two voltages resulted in different GV relations (Fig 1E).

      Thank you for raising this important point, which was also raised as a concern by Reviewer #1. We have spent a good deal of time since receiving the reviews addressing it. To that end, we have included additional data that support the idea that N190C channels are accessible in both the open and closed states. This is now clearly addressed in Recommendations for the Authors, under the first Specific Suggestion from Reviewer #1 (see our response above, on Pages 2-5).

      In the original submission, we only used the protocols shown in the old Figure 1. We applied MTSET only at +20 mV for the open state and –80 mV for the closed state. We used –100 mV and –120 mV for the closed state of A193C and S199C, respectively, because, compared to the wt channels, these cysteine mutants shifted the GV relationship to negative voltages.

      In the revised version, to further strengthen our conclusions, we have used a new protocol: For each cysteine mutant, we have designed a protocol in which we first apply MTSET at hyperpolarized voltages (closed) before switching to depolarized voltages (open) on the same cell, in a pairwise manner.

      This is now described in the Results subsection “State-dependent external S4 modifications consistent with S4 as voltage sensor”, Pages 6-8 of the revised manuscript, and in the new Figure 1 and Figure 1-figure supplements 3 and 4.

      We also apologize for the lack of clarity in citing reference 40 in the original submission. This reference is deleted in the revised version, in light of our new data on N190C (new Figure 1 and Figure 1-figure supplements 3 and 4), which strengthen our claim that N190C modification occurs in both states (open and closed).

      4. The model in Fig 6F raises several concerns, including that the vertical transitions have the rates of VSD activation and that detailed balance is violated.

      The reviewer raises an important concern about our original Figure 6F (model). Based on the Editor's and reviewers' comments, we have removed Figure 6 from the original manuscript to eliminate any potential misunderstanding about the data presented. In future studies, we will gather additional fluorescence and current data using different protocols and dimer constructs to provide a more in-depth description of KCNQ2 gating.

      5. Discussion. The argument of no intermediate open state based on K/Rb permeability ratio assumes that the pore properties of KCNQ2, such as ion selectivity and permeability, are the same as those of KCNQ1. The evidence for this assumption is not provided or discussed. On the other hand, some evidence suggests that the VSD of KCNQ2 may activate in two steps. For instance, the time course of VSD activation can be fitted with two exponentials, and the fluorescence increases after a plateau at voltages > 0 mV in FV curves (Fig 2C). How these results affect the conclusion should be discussed.

      We agree with the reviewer that the claim of a lack of an intermediate open state in KCNQ2 channels based on the Rb/K data provided in the original submission assumed that the pore properties of KCNQ2 are the same as those seen in KCNQ1 channels. Since we did not show sufficient experimental evidence to prove this point, we have removed Figure 6 (the model) from the revised manuscript. In the future, we will provide more evidence to build stronger support for the potential existence of intermediate and active open states in KCNQ2 channels. Future studies will be performed to refine the KCNQ2 model, including the use of mutations that can lock the S4 in the intermediate or activated states in KCNQ2, as has been performed in the KCNQ1 channel (Zaydman et al; PMID: 25535795). These experiments will provide more conclusive results regarding the different S4 states.

      We have now re-analyzed the data and concluded that while the time course of the fluorescence appeared to have multiple exponentials, our fluorescence data lacked sufficient resolution to reliably estimate the first (fast) component. This might be because of the low signal-to-noise ratio of our VCF and/or because the filtering might have limited the tau-on from the optical signal (shown to be 20 ms in Figure 3C of the original submission).

      As suggested by Reviewer #3, we have removed the kinetics comparison of fluorescence and current in the revised version of Figure 3, and simply state: “There is a close correlation between the time course of fluorescence signals and ionic currents at all the voltages tested (Figure 3B, D). The close correlations in time (Figure 3) and voltage dependences (Figure 2G) of S4 motion (fluorescence) and activation gate (ionic current) resemble those observed for homologous KCNQ1 (without KCNE1)(42) and KCNQ3 channels(41, 43).”

      As for the last part of the reviewer's comments, the apparent increase in fluorescence after a plateau at voltages > 0 mV has now also been revised. We have attempted new VCF at voltages more positive than +40 mV to probe whether a putative second fluorescence component develops after the plateau phase or whether it is just an artifact of the experimental system. To get reliable fluorescence signals, we need very high expression of labeled KCNQ2* channels (often producing currents larger than 100 µA). It is considerably more difficult to properly clamp these high-expressing cells, especially at extreme voltages. This experimental limitation makes it challenging to draw conclusions about the occurrence of a second fluorescent component. It may be possible to perform the cut-open technique coupled with VCF in order to shed light on this issue, but these experiments would require a significant upgrade of the setup, which we currently do not have in place.

      Reviewer #3 (Public Review):

      1. I am convinced that the fluorescence signals reflect the voltage sensor conformation in the system. The authors focus quite a lot of attention on demonstrating that the fluorescence signals are not an experimental artifact, which is fine.

      We thank Reviewer #3 for this observation. We apologize for overemphasizing that the fluorescence signals reflect the voltage sensor conformation in the system. As stated above in response to a similar comment from Reviewer #1, this might be the result of the authors’ past struggle to convince earlier reviewers that the fluorescence signals at a given position are not an experimental artifact, but reflect S4 moving during channel opening. This has been amended in the revised version.

      However, I feel the authors could be more cautious in terms of describing how the mutations or dye conjugation may alter some of the gating properties. A place where this may be very important is in the description or characterization of activation kinetics as lacking sigmoidicity, which is part of the argument that these channels may open with only a fraction of voltage sensors activated. This may be correct in the modified (dye-conjugated) channel recordings, but many other recordings of unmodified channels (Figure 1) or WT KCNQ2 or 3 channels exhibit some sigmoidicity. I wonder if this difference may arise because the dye labeling may prevent complete VSD deactivation or interfere with gating in some other way. I would also add that this comment isn't meant to diminish the importance of the findings, I just think it would be wise to qualify some of the description of data with these possible caveats.

      We thank the reviewer for this suggestion, which we believe improves the flow and description of data considering all possible limitations. The reviewer is right. The mutation F192C on its own accelerates the kinetics of activation and causes a leftward shift in the GV curve of KCNQ2 channels. Moreover, labeling F192C with either fluorophore further shifts the GV towards negative potentials.

      In the revised version, we have rewritten the Results subsection ‘Tracking S4 movement of KCNQ2 channels using voltage-clamp fluorometry (VCF)’ almost in its entirety. In this subsection, we now bring to the forefront the changes in measured gating properties caused by the mutations or dye conjugation, which we agree helps with data interpretation. We made a direct comparison of voltage dependence and kinetics between wt, unlabeled KCNQ2-F192C, and labeled KCNQ2-F192C channels (new Figure 2 and Figure 2-figure supplement 1).

      These differences are also discussed on Pages 12-13 of the revised manuscript. See also below response to Recommendations for the authors:

      2. A brief aside on this point is that a lack of sigmoidicity does not necessarily imply a single transition required for opening - it can also arise if there is a rate-limiting step during a sequence of pre-open transitions.

      Thanks, this is a good point. We will keep this possibility in mind for future studies where the model will be developed.
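
      The reviewer's aside can be illustrated with a toy calculation. The sketch below is not a model of KCNQ2: it assumes a hypothetical irreversible two-step scheme, C1 → C2 → O, with arbitrary rate constants `k1` and `k2` in arbitrary time units, and compares its open probability with a single exponential at the slower rate. When one step is much slower than the other, the time course is nearly mono-exponential and the initial delay (sigmoidicity) is hard to see, even though two transitions precede opening.

```python
import math


def p_open_two_step(t: float, k1: float, k2: float) -> float:
    """Open probability for C1 -> C2 -> O (irreversible, k1 != k2), starting in C1."""
    return 1.0 - (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)


def p_open_one_step(t: float, k: float) -> float:
    """Open probability for a single C -> O transition with rate k."""
    return 1.0 - math.exp(-k * t)


def max_deviation(k1: float, k2: float, t_end: float = 50.0, n: int = 2000) -> float:
    """Largest gap between the two-step time course and a single
    exponential that uses only the slower (rate-limiting) step."""
    k_slow = min(k1, k2)
    times = (i * t_end / n for i in range(n + 1))
    return max(
        abs(p_open_two_step(t, k1, k2) - p_open_one_step(t, k_slow)) for t in times
    )


# One strongly rate-limiting step: nearly indistinguishable from one exponential.
dev_rate_limited = max_deviation(k1=1.0, k2=100.0)
# Two comparable steps: a clear delay, i.e. visible sigmoidicity.
dev_comparable = max_deviation(k1=1.0, k2=1.5)
```

      With these assumed rates, the rate-limited case deviates from a single exponential by about 1% of the maximal open probability at most, whereas the case with comparable rates deviates by roughly 30%.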

      3. The generation of a quantitative model is a useful application of the data. It was not clear to me whether there was a benefit to using multiple-exponential components to fit the fluorescence signals and generate a more complex model. This may add complexity where it may not be necessary, as it is not clear whether the fluorescence signals require multiple components for an adequate fit.

      Thank you for your comment. We agree with the reviewer that our model is underdeveloped and needs additional VCF data to better describe KCNQ2 gating. Based on all three reviewers' concerns and as suggested by the Reviewing Editor in his summary, we removed the kinetic model from this manuscript and will work to refine this model in our future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses the "origins and drivers of Neotropical diversity." The Neotropics have high diversity of plants and animals relative to other global regions. There are also many hotspots of global biodiversity (species richness) within the Neotropics.

      This paper aggregates 150 time-calibrated phylogenies from different groups of plants and animals that occur predominantly in the Neotropics. They analyze the diversification dynamics of these clades over time primarily using the method of Morlon et al. (2011; PNAS) as implemented in RPANDA (Morlon et al. 2016). The authors find that most clades have constant rates of speciation and extinction over time.

      Thank you for having reviewed our study and for your feedback.

      The strength of the paper is that it aggregates many previously published phylogenies of Neotropical organisms. However, it is unclear whether the method used gives meaningful inferences about diversification dynamics over time (e.g. Burin et al. 2019; Syst. Biol.). Therefore, the overall contribution of the study is somewhat questionable.

      This is a legitimate comment, and we understand the skepticism about a study that relies on macroevolutionary models of questionable robustness (e.g. Kubo & Iwasa 1995 - Evolution; Rabosky & Lovette 2008 - Evolution; Crisp & Cook 2009 - Evolution; Quental & Marshall 2010 - TREE; Burin et al. 2019 - Syst. Biol.; Louca & Pennell 2020 - Nature; Pannetier et al. 2021 - Evolution).

      The methodology used here has been thoroughly tested with both simulations (e.g. Morlon et al. 2011 - PNAS; Lewitus & Morlon 2018 - Syst. Biol.; Condamine et al. 2019 - Ecol. Lett.) and empirical cases (e.g. Lewitus et al. 2018 - Nat. Ecol. Evol.; Condamine et al. 2019 - Ecol. Lett.). We cannot claim that such a methodology is fully free from issues; these issues affect all birth-death models and raise the question: are we able to reliably infer the diversification model and identify parameter values of this model (Louca & Pennell 2020 - Nature)? These concerns are not likely to be resolved in the short term, although many studies are making progress in understanding the behavior of diversification rate functions, showing, for example, that equally likely diversification functions (i.e. the congruent parameter space of Louca & Pennell 2020 - Nature) can share common features, with diversification rate patterns being robust despite non-identifiability (Höhna et al., 2022 - bioRxiv; Morlon et al., 2022 - TREE).

      Being aware of these concerns, we also relied on the recently developed Pulled Diversification Rates method (Louca & Pennell 2020 – Nature; Louca et al., 2018 - PNAS) that is supposed to correct for the identifiability issue raised by recent studies. Hence, applying both traditional and pulled birth-death models to all phylogenies, we have shown a good consistency in the inferred models, which suggests that our study can provide meaningful estimates of diversification. Our empirical study is also one of the first to perform such a large-scale methodological comparison in diversification analyses (pulled vs. traditional birth-death models) while addressing a key question in evolutionary biology. We have now emphasized this point in the conclusions of our study: “To the extent possible, these results are based on traditional diversification rates, and on the recently developed Pulled Diversification Rates method that is supposed to correct for the identifiability issue raised by recent studies associated with traditional diversification rates (71). Hence, applying both traditional and pulled birth-death models to all phylogenies, we have shown a good consistency in the inferred models, which suggests that our study can provide meaningful estimates of diversification”.

      The design of the study is also somewhat problematic. There is no comparison to other regions outside the Neotropics, so the study cannot address why the Neotropics are so diverse relative to other continental regions. Similarly, within the Neotropics, the authors do not find significant differences in diversification rates or dynamics among regions. As far as I can tell, they do not attempt to relate patterns of diversification to patterns of species richness among regions within the Neotropics (and presumably they would find no significant patterns if they did).

      We agree with this remark. We are sorry for this confusion. Our study does not aim at addressing why the Neotropics are more diverse than other regions in the world. We simply wanted to establish that the Neotropics are the richest region in the world based on previous studies, and that we are interested in understanding what are the patterns/drivers behind such a diversity. In the Introduction, we state that such diversity is not evenly distributed within the Neotropics, and that some regions are richer (e.g. Andes) than others (e.g. southern cone of South America). Diversity models, from Stebbins (1974), have long been proposed to explain this unbalanced diversity. Our study has then defined different bioregions within the Neotropics in which we have looked for differences in diversification patterns. In other words, we do “attempt to relate patterns of diversification to patterns of species richness among regions within the Neotropics”, although we were not able to explain the observed differences in species richness by differences in diversification dynamics (i.e. diversification dynamics are similar across regions). Please, see our response to the essential revision point 1 addressing this comment.

      In the revised version, we have changed the title of the study as: “Diversification dynamics of plants and tetrapods in the Neotropics through time, clades and biogeographic regions”. We hope you will find this new title better fits the content of the article. In addition, to avoid any confusion in light of your comment, we have deleted the following sentence from the introduction: “But such an assessment is required to understand the origin of Neotropical diversity and why the Neotropics are more diverse than other regions in the world”.

      The authors set up their study by claiming that most previous attempts to explain Neotropical diversity relied on two evolutionary models: cradles vs. museums of diversity. The justification cited for this thinking comes mostly from papers from the last century or before. I do not think that this represents the cutting edge of modern thinking about this topic. Many researchers moved on from this dichotomy long ago.

      Thank you for this interesting comment. You are right. The cradle and museum models of diversity are indeed old definitions (Stebbins 1974 - Flowering Plants: Evolution Above the Species Level), but they were convenient to formulate clear and testable hypotheses on the processes underlying the observed patterns of diversity that Stebbins described. We agree that Stebbins’ view is likely outdated, and that is why we took advantage of these models to draw a series of hypotheses relying on evolutionary processes, which has been argued as a “cutting edge of modern thinking about this topic” (Vasconcelos et al. 2022 - Am. Nat.). In the revised version, we have extended the explanation for our rationale to rely on Stebbins’ models and propose process-based hypotheses to explain diversity patterns. We also cite Vasconcelos et al. (2022 - Am. Nat.). We have modified the introduction as follows: “Although the concepts of cradle and museum have contributed to stimulate numerous macroevolutionary studies, a major interest is now focused on the evolutionary processes at play rather than the diversity patterns themselves (23). Four alternative evolutionary trajectories of diversity dynamics could be hypothesized to explain the Neotropical diversity observed today: …”.

      However, we would also argue that some contemporary studies still rely on the cradle and museum framework to frame their studies, for example: McKenna et al. (2006 - PNAS), Couvreur et al. (2011 - BMC Biol.), Condamine et al. (2012 - BMC Evol. Biol.), Moreau & Bell (2013 - Evolution), Dornburg et al. (2017 - Nat. Ecol. Evol.). A search in Google Scholar with "Neotropic AND cradle AND diversif*" returns 1,700 results since 2010. That is why we would like to emphasize that this framework should be abandoned, because it does not rely on evolutionary processes and does not consider the full spectrum of hypotheses explaining Neotropical diversity. In the revised version, we have qualified our assertion that most studies are based on these models, which we agree is not entirely true. We have modified the corresponding paragraph as follows: “Attempts to explain Neotropical diversity traditionally relied on two evolutionary models. In the first, tropical regions are described as a “cradle of diversity”, [...] Although not mutually exclusive (15), the cradle vs. museum hypotheses primarily assume evolutionary scenarios in which diversity expands through time without limits (16). However, expanding diversity models may be limited in their ability to explain the entirety of the diversification phenomenon in the Neotropics. For example, expanding diversity models cannot explain the occurrence of ancient and species-poor lineages in the Neotropics (17–19) or the decline of diversity observed in the Neotropical fossil record (20–22). Although the concepts of cradle and museum have contributed to stimulate many macroevolutionary studies, the major interest is now focused on the evolutionary processes at play rather than the diversity pattern (23)”. We hope you will find this new paragraph better represents current thinking in the field.

      There are potentially interesting differences in the diversification dynamics of plants and animals, but this depends on whether we can believe the inferences of the diversification dynamics or not.

      Thank you for pointing this out. We understand the concern because of the general (not new) skepticism on macroevolutionary models (e.g. Kubo & Iwasa 1995 - Evolution; Rabosky & Lovette 2008 - Evolution; Burin et al. 2019 - Syst. Biol.; Louca & Pennell 2020 - Nature; Pannetier et al. 2021 - Evolution). Unfortunately, the study of PDR did not help to confirm/reject this particular conclusion.

      We thus remain cautious with our results, and we have acknowledged several caveats that should be kept in mind when interpreting them. Here, the same methodological treatment has been applied to both animals and plants, and yet the results indeed indicate different diversification patterns. In addition, our results remained stable to AIC variations (Figure 5 - figure supplement 1), and regardless of the paleo-temperature curve considered for the analyses. Still, we do not “believe” the inferences made with birth-death models in general are accurate, but as long as these models are applied in a well-defined framework and thoroughly performed with a hypothesis-driven approach, recent studies have shown that one can interpret the results and draw conclusions (Helmstetter et al. 2021 - Syst. Biol.; Morlon et al. 2022 - TREE).

      For this new version of the manuscript, and following the suggestions of reviewer 3, we have conducted new analyses to assess whether the contrasted diversification dynamics found here between plants and tetrapods could be explained by differences in their datasets (i.e. differences in tree size, crown age, or sampling fraction of the phylogenies). We found that the higher proportion of increasing dynamics observed in plants cannot be explained by significant differences in these factors, strengthening our conclusions.

      Reviewer #2 (Public Review):

      In this study, the authors explored the evolution dynamics of Neotropical biodiversity by analyzing a very large data set, 150 phylogenies of seed plants and tetrapods. Furthermore, they compared diversification models with environment-dependent diversification models to seek potential drivers. Lastly, they evaluated the evolutionary scenarios across biogeographic regions and taxonomic groups. They found that most of the clades were supported by the expansion model and fewer were supported by saturation and declining models. The diversity dynamics do not differ across regions but differ substantially across taxa. The data set they compared is impressive and comprehensive, and the analysis is rigorous. The results broadened our understanding of the evolutionary history of the Neotropical biodiversity which is the richest in the world. It will attract broad interest to evolutionary biologists as well as the public interested in biodiversity.

      Thank you very much for your review and the positive input.

      Reviewer #3 (Public Review):

      This manuscript seeks to address a series of questions about lineage diversification in the Neotropics. The authors first fit a range of lineage diversification models to over 150 neotropical seed plant and tetrapod phylogenies to characterize diversification dynamics. Their work indicates that a constant diversification model was most frequently the best fit model, while time-, temperature- and Andean uplift-dependent models were far less frequently favored. The authors then attempted to determine whether distinct biogeographic clusters existed by using clade abundance patterns as a proxy for long-term diversification within regions. They found that while clades were widespread across ecoregions, regional assemblages could be binned into five clusters reflecting clade endemism. Finally, they asked whether diversification dynamics of individual lineages varied by parent clade, by environment (temperature through time, and Andean uplift) and by biogeographic region, finding that diversity trajectories were best explained by environmental drivers and parent clade identity, while no significant association was detected with biogeographic region. I especially appreciated the detailed model-testing procedure, the inclusion of pulled rates, tests for phylogenetic signal in the results, and the acknowledgment of caveats. By using a massive dataset and a battery of cutting-edge analyses, the authors provide new insight into questions that have intrigued biologists for decades.

      Thank you for reviewing our study and for your positive feedback.

      1. The neotropics, as defined here, extends from Tierra del Fuego to Central Florida, rather than from the Tropic of Cancer-Capricorn. I was confused by this broad circumscription, and wondered whether the findings presented here could be biased by the inclusion of these exclusively or primarily extra-tropical regions (such as "elsewhere" and "Chaco+Temperate south America") and lineages.

      Thank you for this comment, which is also in line with the second comment of Reviewer 1. We understand the confusion. The Neotropics, as originally defined by Alfred Wallace, represent a broad region including many types of ecosystems and biomes (not only tropical ones): i.e. the Neotropical realm. It also has a paleobiogeographic significance, as the whole South American continent was isolated for tens of millions of years (Simpson 1983). This definition is well accepted in the field of biogeography and evolutionary biology and we followed it to avoid adding a new definition. A Google Scholar search with keywords “Neotropic AND phylogen AND diversificat*” returns >24,000 hits. Our biogeo-regionalization and clustering results also corroborate the strong connection between South American temperate and tropical biotas: very few clades were restricted or exclusive to a single region, and in most cases, clades comprised species from tropical regions (Cerrado, Caatinga) together with species from the temperate South America zones (Chaco, Temperate South America; Figure 6, Source Data 1).

      That being said, we did not find significant differences in diversification rates (or diversity dynamics) across temperate and tropical regions (indeed, between any regions), even when temperate regions were analyzed separately (Figure-6-figure supplement 2), suggesting that our results would have been similar if we had confined the Neotropics to tropical latitudes, as in a more climatic circumscription. However, had we circumscribed the Neotropics to the tropical latitudes, many of the 150 clades would not have been selected, and our study would offer less insight into the diversification processes explaining Neotropical biodiversity in the broad sense.

      1. Model categories and clade diversification dynamics were also linked to the size and age of the phylogeny, such that small and young clades tended to exhibit constant diversification, while exponential and declining dynamics were linked to more diverse and older clades. As one of the main conclusions is that seed plant diversification is more frequently characterized by constant diversification (relative to that of tetrapods), I cannot help but wonder if seed plant phylogenies tend to also be younger and less diverse than those of tetrapods. Figure S1 shows an overview of the distributions but lacks a formal, statistical comparison.

      This is a very good point. We agree this comparison is relevant to support our conclusions, but it was missing from our results. We have now compared tree size, crown age and sampling fraction across taxonomic groups, and found that the higher proportion of increasing dynamics, characteristic of plants, cannot be explained by significant differences in these factors. As can be seen in the new Figure-2-figure supplement 2 of the manuscript, tree size does not differ among plants, mammals, birds and squamates. Crown age does not differ among plants, mammals and birds. Groups do differ in sampling fraction: plant (p < 0.01) and squamate (p < 0) phylogenies are significantly worse sampled than the phylogenies of other groups. Yet plants show a higher frequency of increasing dynamics than squamates and other tetrapods (Figure 4). Incomplete taxon sampling has the effect of flattening out lineages-through-time plots towards the present, and thus artificially increasing the detection of diversification slowdowns rather than diversification increases (Cusimano & Renner 2010 – Syst. Biol.).

      We have included this important piece of information in the results “In our dataset, amphibian phylogenies are significantly larger than those of other clades (p < 0.05) (Figure 2 - figure supplement 2). Amphibian and squamate phylogenies are also significantly older (p < 0). Groups also differ in sampling fraction: plant (p < 0.01) and squamate (p < 0) phylogenies are significantly worst sampled than phylogenies of other groups.”; and in the discussion section: “Differences in the phylogenetic composition of the plant and tetrapod datasets do not explain this contrasted pattern. On average, plant phylogenies are not significantly younger or species-poorer than tetrapod phylogenies (Figure 2 - figure supplement 2). Yet, the proportion of clades experiencing increasing dynamics is significantly higher for plants (Figure 4). Plant phylogenies are significantly worst sampled than those of most other tetrapods, though, as explained above, incomplete taxon sampling has the opposite effect: flattening out lineages-through-time plots towards the present (83).”

      1. I wondered whether it was possible to disentangle time-dependent decreasing diversification from decreasing temperature in young trees? I raise this because it appears that (generally speaking) most of the clades have diversified over periods in which temperature has generally been declining.

      This is also a very good point. It is common to observe that two different models are equally likely or close in terms of statistical support. Previously, Condamine et al. (2019 - Ecol. Lett.) reported that the ΔAIC between the best and second-best diversification model was often below the threshold of 2, which is typically chosen to statistically distinguish models (see Fig. 3 and Fig. S5 in Condamine et al. 2019). Simulation analyses confirmed that it was not enough to distinguish the best and second-best models with confidence (see Fig. S6 in Condamine et al. 2019). This applies to any kind of clade.
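The model-comparison rule described above can be sketched in a few lines. This is only an illustration, assuming plain AIC scores (the study may use a corrected variant such as AICc) and using made-up numbers; the function names `delta_aic`, `akaike_weights` and `distinguishable` are hypothetical and do not come from RPANDA or from the manuscript.

```python
import math

def delta_aic(aic_values):
    """Return each model's AIC difference from the best (lowest-AIC) model."""
    best = min(aic_values.values())
    return {name: aic - best for name, aic in aic_values.items()}

def akaike_weights(aic_values):
    """Convert AIC differences into relative model weights that sum to 1."""
    deltas = delta_aic(aic_values)
    rel = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

def distinguishable(aic_values, threshold=2.0):
    """True if the second-best model is more than `threshold` AIC units worse than the best."""
    ranked = sorted(aic_values.values())
    return (ranked[1] - ranked[0]) > threshold

# Hypothetical AIC scores for one clade (illustrative numbers, not from the study).
example = {"constant": 100.0, "time_dependent": 101.2, "temperature_dependent": 104.5}
```

With these illustrative scores the best and second-best models differ by only 1.2 AIC units, so `distinguishable(example)` is false, which is the situation the response describes as common.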

      However, in the case of time-dependent decreasing diversification and temperature-dependent decreasing diversification, one can further test the effect of past temperatures by smoothing the temperature curve further, so that its ups and downs are removed. Previously, Condamine et al. (2019 - Ecol. Lett.) found that smoothing strongly decreased the support for temperature-dependent models (Fig. S13a) to the point where it was lost (Fig. S13b), showing that the support for temperature-dependent models was not simply due to a temporal trend in diversification rates potentially unlinked to temperature.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Major comments:

      Are the key conclusions convincing?

      We discuss 4 key conclusions.

      # 1 A PRC of the segmentation clock was constructed.

      Although the authors have produced an interesting phase map, the regulation function F(\phi) of the circle map does not give the phase response curve (PRC) (Hoppensteadt & Keener 1982, Guevara & Glass 1982). This holds only when the system is stimulated with very short pulses (ideally Dirac delta), but the experimental pulses here are a quarter of the intrinsic period.

      There are several definitions of the PRC (Dirac-pulse PRCs, linear PRCs, etc.). We use the general definition from Izhikevich, 2007: “In contrast to the common folklore, the function PRC (θ) can be measured for an arbitrary stimulus, not necessarily weak or brief. The only caveat is that to measure the new phase of oscillation perturbed by a stimulus, we must wait long enough for transients to subside”.

      The corresponding equation from Izhikevich (section 10.1.3) is

      PRC(θ) = θ_new - θ

      which is equivalent to our Equation 1.

      Hence, the key assumption we make is that after perturbing the system, we are back on the limit cycle, as pointed out by Izhikevich. We think this is a reasonable assumption, because the perturbation we impose is relatively weak, despite pulsing for almost one quarter of the intrinsic period. The concentrations of DAPT we used in the current study are just enough to elicit a measurable response, and further lowering the concentration does not result in entrainment within our experiment time (0.5 µM, Figure S7B in the submitted version of the manuscript). Additionally, we previously reported that periodic pulsing with 2 µM DAPT did not change Notch signaling activity relative to control samples (Sonnen et al., 2018). Along similar lines, the DAPT concentrations we used are much lower than those used in previous studies aiming to perturb signaling levels, e.g. 100 µM and 50 µM used in studies of the segmentation clock in zebrafish embryos (Özbudak and Lewis, 2008 and Liao et al., 2016, respectively), and 25 µM used in a study of the segmentation clock in mouse PSM cells (Hubaud et al., 2017). Combined, we reason that we apply weak perturbations that allow us to extract the PRC of the segmentation clock during entrainment. Additional evidence that we have indeed revealed a meaningful PRC is provided below; please see our response to point #3.
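The definition PRC(θ) = θ_new - θ quoted above can be illustrated with a short sketch. It assumes phases are expressed in radians and that the difference is wrapped onto a single cycle, so that a shift across the 0 / 2π boundary is not mistaken for a large jump. The function names and the sign convention (positive means phase advance) are assumptions made for illustration; the manuscript's Equation 1 is not reproduced here.

```python
import math

TWO_PI = 2.0 * math.pi

def wrap_phase(phi):
    """Map any angle (radians) onto the interval [-pi, pi)."""
    return (phi + math.pi) % TWO_PI - math.pi

def phase_response(theta, theta_new):
    """Phase shift theta_new - theta, wrapped to one cycle.

    theta     : phase at which the stimulus arrives
    theta_new : phase measured once transients have subsided,
                referred back to the same time point
    """
    return wrap_phase(theta_new - theta)
```

For example, `phase_response(6.2, 0.1)` gives a small advance of about 0.18 rad rather than a shift of -6.1 rad, because both phases lie close to the same point on the cycle.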

      # 2 Furthermore, in eq. 1 T_ext must be the winding number, and the modulus must be in units of phase, either one or two pi, for the circle map to be correct. Thus, calling the measured response of the system a PRC is not convincing.

      We thank the reviewer for pointing this out. We indeed rescaled everything to express the PRC in units of phase. We made this more explicit and updated equations throughout the text.

      # 3 The system is being entrained. Technically, it would also be easier to get the stroboscopic maps in the quasi-periodic regime since all the points in the circle will be sampled. Since no quasi-periodic response was demonstrated, the claim of entrainment is not convincing.

      While, in principle, the PRC can indeed be obtained from responses in the “quasi-periodic” regime, such an approach is, in practice, challenging due to the intrinsic noise. The closest approximation to this is the phase response after the first pulse, which we reproduce below and compare to our inferred PRC, and where we indeed clearly see a high noise level. Nevertheless, the PRC based on the first pulse also agrees with the PRC we derived from the entrainment data.

      In the entrained regime, one can get a much more reliable estimate of the phase response despite the noise. The level of noise in the stroboscopic map lowers as the samples approach entrainment (Figure S12), and the entrainment phase itself is a reliable statistical quantity that can be used to infer regions of the PRC as the detuning is varied.
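One common way to treat the entrainment phase as a statistical quantity is with circular statistics: a mean direction for the phases measured across samples, and a mean resultant length between 0 and 1 that quantifies how tightly they cluster. The sketch below is a generic illustration under that assumption; it is not taken from the manuscript's analysis pipeline, and the function names are hypothetical.

```python
import math

def circular_mean(phases):
    """Mean direction (radians, in (-pi, pi]) of a list of phases."""
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.atan2(s, c)

def resultant_length(phases):
    """Mean resultant length: close to 1 for tightly clustered phases, close to 0 for spread-out phases."""
    s = sum(math.sin(p) for p in phases) / len(phases)
    c = sum(math.cos(p) for p in phases) / len(phases)
    return math.hypot(s, c)
```

Using the circular mean rather than an arithmetic mean matters when phases straddle the ±π boundary: the arithmetic mean of 3.1 and -3.1 is 0, whereas their circular mean is at π, where the phases actually sit.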

      In addition, and maybe even more importantly, we identify several key features characteristic of entrainment, such as the change of entrainment phase as a function of detuning (Figure 7, Figure S6-S7 in the submitted version of the manuscript) and the dependency of the time to entrainment on the initial phase (Figure 6). While additional features can be linked, in theory, with entrainment, i.e. period-doubling, higher harmonics (Figure 5), quasi-periodicity, we do not agree with the reviewer that all of these need to be, or in fact can be, found in the experimental data, in particular because of the influence of the noise. Conversely, the positive experimental evidence that we provide for the presence of entrainment, combined with the theoretical framework we develop, justifies, in our view, the conclusions we make.

      # 4 The response of the system to external pulses is compatible with a SNIC. This is compatible, but it is equally compatible with other explanations. Assuming that the PRC is the same as the regulation function F(\phi), the PRC in Kotani 2012 (PRL 2012 fig. 3C) would be a similar shape as that shown by the authors. Similar models to that in Kotani et al., have been studied, but a SNIC has not been found (an der Heiden & Mackey 1982). It is relatively straightforward to construct a phenomenological model with a SNIC, but having underlying biological insight is not guaranteed. No argument for choosing a SNIC is given, so this emphasis of the paper is not convincing.

      It is true that the mapping of PRCs to oscillators is undetermined, in the sense that many systems could potentially give rise to similar PRCs. That said, there is value in parsimonious models, which often generalize very well despite their simplicity. This explains why, in neuroscience, constant sign PRCs are generally associated with SNIC. There is a mathematical reason for this: 1-D oscillators with resetting (such as the quadratic integrate-and-fire model) are the simplest models displaying constant sign PRCs, and are the “normal” form for SNICs. In other words, SNIC bifurcations are among the simplest ones compatible with constant sign PRCs, and we think it is informative to point this out. In our manuscript, we go one step further by actually fitting the experimental PRC with a simple, analytical model that allows us to compute Arnold tongues for any value of the perturbation (contrary to more complex models).
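To make the link between a constant sign PRC and entrainment concrete, here is a minimal sketch of a stroboscopic (pulse-to-pulse) phase map. It assumes a purely hypothetical PRC of the form A(1 - cos φ), which is the shape of the infinitesimal PRC of the SNIC normal form, and it assumes the notation T_zeit / T_0 for the ratio of zeitgeber period to natural period. It is not the fitted model from the manuscript; the amplitude, names and sign convention (advances only) are all illustrative choices.

```python
import math

TWO_PI = 2.0 * math.pi

def prc(phi, amplitude):
    """Hypothetical constant sign PRC: advances only, zero at phi = 0."""
    return amplitude * (1.0 - math.cos(phi))

def strobe_step(phi, period_ratio, amplitude):
    """Phase at the next pulse; period_ratio = T_zeit / T_0 (assumed notation)."""
    return (phi + TWO_PI * period_ratio + prc(phi, amplitude)) % TWO_PI

def is_entrained(period_ratio, amplitude, phi0=1.0, n_iter=2000, tol=1e-9):
    """True if the pulse-to-pulse phase settles onto a fixed point (1:1 locking)."""
    phi = phi0
    for _ in range(n_iter):
        phi = strobe_step(phi, period_ratio, amplitude)
    nxt = strobe_step(phi, period_ratio, amplitude)
    diff = (nxt - phi + math.pi) % TWO_PI - math.pi  # wrapped phase change per pulse
    return abs(diff) < tol

def locking_range(amplitude):
    """Analytical 1:1 range of T_zeit / T_0 for this PRC (valid for amplitude < 2)."""
    return (1.0 - amplitude / math.pi, 1.0)
```

Because this PRC can only advance the phase, the 1:1 locking region is one-sided: with amplitude 0.5 it spans period ratios from about 0.84 to 1, so `is_entrained(0.92, 0.5)` holds while ratios of 1.1 or 0.7 do not lock. Repeating the check over a grid of amplitudes and period ratios would trace out the corresponding Arnold tongue; convergence is slow close to the tongue edges, so points near the boundary need more iterations than the default.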

      Other models such as Kotani 2012 can display similar PRC shapes, but they are of mathematically higher complexity, and furthermore it is not clear how such systems might behave when entrained. For instance, that model in particular uses delay differential equations, and as such contains long-term couplings, so that a perturbation might have effects over many cycles, which is not consistent with the hypothesis we make here of a relatively rapid return to the limit cycle. Furthermore, for more complex models, PRCs are analytical only in the linear regime, while our model is analytical for all perturbations. That said, we agree that other types of oscillators can be associated with constant sign PRCs, and we have given more details in this part. In particular, we better emphasize the Class I vs Class II oscillators as a way to broaden our discussion on PRCs, and emphasize the “infinite period” bifurcation category, which is more intuitive and further includes saddle node homoclinic bifurcations.

      # 5 The work demonstrates coarse graining of complex systems.

      This conclusion is correct, but coarse graining theory-driven analysis and control of dynamical systems has been established for many years. What is new here is that it is applied specifically to the in vitro culture system of the mouse segmentation clock.

      We agree it is new to successfully apply coarse-graining analysis and, importantly, control, to the in vitro culture system of the mouse segmentation clock. We also agree that such an approach has been pioneered and established for many years, especially in (theoretical) physics, but indeed, the key question is whether and how this can be applied to complex biological systems. Insights coming from theoretical considerations on idealized physical systems might not necessarily apply to biology, as already pointed out by Winfree.

      There are still very few examples in biology with coarse graining similar to what we do here. We think there is immense value in demonstrating that quantitative insights, and control of biological systems, can be obtained without precise knowledge of molecular details, which is still counter-intuitive to many biologists. In this sense, we think our report will be of interest both to colleagues within the field of the segmentation clock and to anyone interested in the question of how theory- and physics-guided approaches can enable novel insight into biological complexity.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Following on the points above, each of these needs to be corrected or re-done, and/or the conclusions need to be modified accordingly.

      We have modified the manuscript in response to all those points.

      # 6 Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. If the authors wish to make the strong claim of determining a true PRC, Dirac delta-like perturbation needs to be applied, or approximated by short time duration pulses compared to the intrinsic period.

      Please refer to our responses to points #1 and #3.

      # 7 Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      It's not clear to this reviewer if it is feasible to deliver a very short pulse and record a response. But this may not be relevant, see above.

      Please refer to our responses to points #1 and #3.

      Are the data and the methods presented in such a way that they can be reproduced?

      Yes.

      Are the experiments adequately replicated and statistical analysis adequate?

      Yes.

      Minor comments:

      Specific experimental issues that are easily addressable.

      No issues.

      Are prior studies referenced appropriately?

      Yes.

      # 8 Are the text and figures clear and accurate?

      Figure 1D illustrates how a PRC should be obtained, but doesn't show the experimental protocol applied in the paper.

      Figure 1D is a general introduction on the phase description of oscillators and phase response. It demonstrates how a perturbation can change the phase and is not supposed to represent the experimental protocol. We describe how data are analyzed and how phases are extracted in Supplementary Note 1.

      # 9 In Figure 5B, 10 uM DAPT, the traces are already synchronized before the pulse train starts, which makes the subsequent behavior difficult to interpret.

      It appears here that, by chance, the samples were already almost synchronized. We note, however, that the establishment of a stable rhythm with the pulses (which here is not a multiple of the natural period) supports entrainment, and is already evident when looking at the timeseries with respect to the perturbation. The temporal evolution of the instantaneous period further confirms this, showing a change in period close to ½ the zeitgeber period (which is very different from the natural period of ~140 mins). This also relates to point #35. In reply to both comments, we have further expanded this figure to better show the 2:1 entrainment, adding statistics on the measured period and period evolution for a zeitgeber period of 300 mins.

      # 10 Do you have suggestions that would help the authors improve the presentation of their data and conclusions? The text includes several paragraphs reviewing broad principles of coarse graining and making general conclusions. This is confusing, because, as mentioned above, there is no new general advance in this paper. The interesting contributions here are specific to the applications to the segmentation clock, and the text should be focused on this aspect.

      As commented above for #3, we respectfully disagree that there is no “new general advance” in this paper. It is far from obvious that a complex ensemble of coupled oscillators implicated in embryonic development would be amenable to such coarse-graining theory. Of note, we still do not have a full understanding of either the core oscillators in individual cells or what slows these down and eventually stops the oscillations, and multiple recent works suggest that both phenomena are under transient nonlinear control (e.g. our own work in Lauschke 2013). It is remarkable that despite this lack of detailed mechanistic insight, general entrainment theory can be applied to the segmentation process at the tissue level. We further show that classical entrainment theory alone is not sufficient to account for the experimental findings. Specifically, we need to account for a period change that we interpret as an internal feedback, an insight that would be impossible without our coarse-graining approach. While the results might of course be specific to the segmentation process, we think our approach, motivated by coarse-graining theory and leading to new insights into the process, is of general interest. We tried to make these points explicit in our conclusion.

      Reviewer #1 (Significance (Required)):

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Description of the complex mouse segmentation clock in terms of a simple model and its PRC is an interesting, original and non-trivial result. The proposal that the segmentation clock is close to a SNIC bifurcation provides a consistent dynamical explanation of slowing behavior that has been recognized for some time, but not fully understood. This proposal also raises a hypothesis about the behavior of the underlying molecular regulatory networks, which may be tested in the future. The increase or decrease of the intrinsic period due to the zeitgeber period is not expected from theory, pointing to structures in internal biochemical feedback loops, an idea which again may be tested in the future. Also surprising from a theoretical perspective, the spatial gradient of period in the system persisted after entrainment. Although the categorization of the generic behavior is interesting, by its nature there is little from this that might give a typical developmental biologist any conclusions about pathways or molecules. The successes and limits of the theoretical description do nevertheless focus future attention on interesting behaviors.

      # 11 Place the work in the context of the existing literature (provide references, where appropriate).

      Such an analysis of the segmentation clock is based strongly on the experimental system and results in Sonnen et al., 2018, and goes well beyond it in terms of the dynamical analysis. It provisionally categorizes the mouse segmentation clock as a Class I excitable system, allowing its dynamics at a coarse-grained level to be compared to other oscillatory systems. In this aspect of simplification, it is similar to the approach of Riedel-Kruse et al., 2007, who used a mean-field model of oscillator coupling to explain the synchrony dynamics observed in the zebrafish segmentation clock in response to blockade of coupling pathways, thereby allowing a high-level comparison to other synchronizing systems.

      It is interesting that the reviewer sees similarities with the work of Riedel-Kruse et al., which uses a mean-field variable Z that corresponds to a classical approach, as described in Pikovsky’s textbook, to quantify synchronization of oscillators. In our view, while of course we work in the same context of coupled oscillators in the PSM, our approach based on perturbing and monitoring the system’s PRC in real-time provides a novel strategy to gain insight. This is evidenced by the fact that our quantifications of synchronization and insight into the PRC are the basis for exerting precise control of the pace and rhythm of segmentation.

      State what audience might be interested in and influenced by the reported findings.

      Developmental biologists, biophysicists

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Developmental biology, somitogenesis, dynamical systems theory, biophysics, cell signaling


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This is a beautifully elegant study that tests how previously published theoretical predictions about entraining nonlinear oscillators apply to a biological oscillator, the segmentation clock. The authors use a combination of state-of-the-art experimental techniques, signal processing and analytical theory to reach a series of interesting and novel conclusions.

      They show that the segmentation clock period can be entrained through Notch inhibitor (DAPT) pulses acting as an external clock (referred to as zeitgeber) using a previously developed and sophisticated microfluidic perfusion system. Pulsing DAPT every 120 to 180min can change the internal clock period while entrainment beyond this range leads to higher order coupling to the zeitgeber period, i.e. entrainment of every other pulse. They then perform entrainment experiments where the concentration of DAPT is changed to elicit a change in the strength of interaction between the internal clock and the external stimulus (referred to as zeitgeber strength); interestingly at low strength response to entrainment is more variable leading to entrainment occurring in some samples while others remain unaffected (Figure 4A); overall, higher concentration leads to faster entrainment (Figure 4C). The experimental data is then analysed using stroboscopic maps to reveal that a stable entrainment phase shift is achieved between the internal clock and the external zeitgeber. Phase response curve (PRC) analysis indicates that the system response is not sinusoidal but predominantly characterised by negative PRC, a behaviour consistent with saddle-node on invariant cycle (SNIC); it also reveals that the intrinsic period changes in a non-linear way and that this effect is reversible when external stimulation stops. Finally, a theoretical model is proposed to represent the segmentation clock as a dynamical system; this is based upon Radial Isochron Cycle with Acceleration (ERICA), an extension motivated by the PRC analysis results which are incompatible with a Radial Isochron Cycle (RIC); this model has predictive capability and could be used to design new control strategies for entrainment of the segmentation clock.

      This study makes a series of key conclusions which are of particular importance in understanding the dynamic response of biological oscillators. Firstly, given the characteristics of the dynamic response to entrainment, the segmentation clock is likely close to a SNIC bifurcation and this can explain the tendency for relaxation of the period over time. Secondly, the clock period was changed in a non-linear way in the direction of the zeitgeber period, a finding which is interpreted to indicate the presence of feedback of the segmentation clock onto itself, potentially via Wnt. This makes an excellent prediction that, if tested experimentally, would greatly improve the impact of the study. It is also noted that the entrainment of the segmentation clock does not abolish spatial periodicity and phase wave emergence, suggesting that single cell oscillators can adjust to periodic perturbation while maintaining emergent properties. This is also a significant result that would need to be followed up with experiments and computation; however, this would be best suited to a separate study.

      Major comments:

      # 12 The coarse graining is a major point that would need to be clarified since the rest of the analysis and theoretical modelling in the paper flow from this. Firstly, the interpretation of the schematic in Figure 1A on experimental data collection is not immediately obvious to the reader, lacks a clear flow between the different panels or steps (which could be numbered for example) and does not have a legend to indicate the different colour mapping.

      We are grateful to the reviewer for this comment. We have implemented in Figure 1A all the changes suggested by the reviewer: we numbered the different steps and have added a colour mapping. In addition we have rephrased the caption of Fig 1A to better connect the experimental steps.

      # 13 Secondly, Figure 2A which explicitly addresses coarse graining is not clear enough. Is the message here that by excluding the inner parts of the sample with a radial ROI, a similar dynamic response is observed over time?

      Yes, indeed this is the point and we have adjusted the figure and text to explain this better. Our goal is to focus on the quantification of segmentation pace and rhythm. This is best captured by reporters such as LuVeLu, which has maximum intensity in regions where segments form, and whose dynamics are known to be strongly correlated with segmentation (Aulehla et al., 2007; Lauschke and Tsiairis et al., 2013). The global ROI is thus expected to precisely capture these segmentation and clock dynamics and we have now included more validation data and have also edited the text to make this very important point clearer:

      “To perform a systematic analysis of entrainment dynamics, we first introduced a single oscillator description of the segmentation clock. We used the segmentation clock reporter LuVeLu, which shows highest signal levels in regions where segments form (Aulehla et al., 2007). Hence, we reasoned that a global ROI quantification, averaging LuVeLu intensities over the entire sample, should faithfully report on the segmentation rate and rhythm, essentially quantifying 'wave arrival' and segment formation in the periphery of the sample.”

      Figure 2A indeed shows that the dynamics (from the timeseries) is very similar when considering the entire field of view (global ROI) or when considering only the periphery of the 2D-assay (excluding central regions). We modified Figure 2A to clarify this point by indicating each measurement as either global ROI or global ROI minus the diameter of the excluded circular region (e.g. global ROI - 50px). We also emphasized in the caption that timeseries are obtained using global ROI, unless otherwise specified. We included a link (https://youtu.be/fRHsHYU_H2Q) in the caption to a movie of 2D-assay subjected to periodic pulses of DAPT (or DMSO) and corresponding timeseries from global ROI.
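
      The comparison between a global ROI and a global ROI with a central disc removed can be sketched as follows. This is only an illustration under stated assumptions: the function names `roi_mean` and `roi_timeseries`, the representation of a frame as a nested list of intensities, and the choice to centre the excluded disc on the frame centre are all hypothetical and are not taken from the manuscript's analysis code.

```python
def roi_mean(frame, exclude_diameter=0.0):
    """Mean intensity of one frame, optionally ignoring a central disc.

    frame: 2D list of pixel intensities (rows of equal length).
    exclude_diameter: diameter (in pixels) of the central disc to drop;
    0 keeps the whole field of view (the "global ROI").
    """
    height, width = len(frame), len(frame[0])
    # Assumption: the excluded disc is centred on the frame centre.
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    radius_sq = (exclude_diameter / 2.0) ** 2
    kept = [
        frame[y][x]
        for y in range(height)
        for x in range(width)
        if exclude_diameter <= 0
        or (y - cy) ** 2 + (x - cx) ** 2 > radius_sq
    ]
    return sum(kept) / len(kept)


def roi_timeseries(movie, exclude_diameter=0.0):
    """Apply roi_mean to every frame of a movie (list of frames)."""
    return [roi_mean(frame, exclude_diameter) for frame in movie]


if __name__ == "__main__":
    # Toy 5x5 frame: uniform background with one bright central pixel.
    frame = [[1.0] * 5 for _ in range(5)]
    frame[2][2] = 100.0
    print(roi_mean(frame))        # global ROI
    print(roi_mean(frame, 2.0))   # global ROI minus a 2 px central disc
```

      Calling `roi_timeseries(movie, 50)` would then correspond to a "global ROI - 50px" style measurement in this toy setting.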

      Since the inner part of the sample corresponds to the posterior side how do we interpret similarities and differences between signals with different ROIs?

      As stated above, the global ROI measurements essentially capture the signal at the periphery where segments form and faithfully mirror segmentation rate and rhythm. We have now included a comparison to the center ROI, also in response to the reviewer’s comments; see our response #34.

      The result shows that the period and PRC in the center matches the one found in the periphery, i.e. global ROI. We have shown previously that center and periphery differ in their oscillation phase by 2pi, i.e., one full cycle (Lauschke et al., 2013). We interpret these findings as confirmation of our analysis strategy, i.e. the global ROI allows a very reproducible, unbiased quantification that reports on segmentation clock and period.

      # 14 A quantitative analysis of essential coarse-grained properties such as period and amplitude should be performed for different ROIs and across multiple samples. As this effectively masks any spatial differences, limitations of this approach should be clearly stated in the Discussion. For example, in lines 466-470 it is difficult to interpret the slowing-down tendency and relate it back to the single-cell level.

      As outlined in our response to comment #13 and also #34, we chose an analysis that allows us to determine the segmentation pace and rhythm, i.e. segment formation, which is well captured by LuVeLu signal and a global ROI analysis. We agree that a spatially resolved analysis of dynamic behaviour is important (and indeed a gradient of amplitude might be relevant in such context), but we think this is beyond the scope of the current study focused on the system-level segmentation clock behaviour. We have revised the discussion as suggested by the reviewer to make this approach and the need for future studies clearer.

      # 15 The functional characterisation of the sample using LFNG, AXIN2 and MESP2 is unclear. The images included in Figure 2D representing expression observed when tissue explants are grown within the microfluidic chip are difficult to interpret and would require a more detailed description of anterior-posterior, pillars etc; it is also difficult to view the bright-field since it is presented as a merged image.

      It is particularly difficult to see the somite boundaries for the same reason. In lines 113-117 the authors state that the global oscillation period matches the periodic boundary formation. How do we reach this conclusion from these images? What is the variability between samples?

      If these two issues would be addressed it would increase confidence in the coarse graining argument and thus would strengthen the importance of the findings in the study.

      We thank the reviewer for this feedback, and we have added more quantifications to address this point directly in the modified Figure 2. Importantly, we added the quantification of the rate of segmentation in multiple samples based on segment boundary formation (new Figure 2D) and compared this to the global ROI quantifications using the reporter lines LuVeLu. This data provides clear evidence that the quantification of global ROI reporter intensities closely matches the rate of morphological segment boundary formation. In addition, we show that segment formation and also Wnt-signaling oscillations (Axin2-Achilles) and the segmentation marker Mesp2 (Mesp2-GFP) are all entrained to the zeitgeber period. We have also revised the text to clarify this important validation of our quantitative approach.

      In addition, we provide, in the revised Figure Suppl. 2, details of entrained samples, focusing on the segmenting regions. The brightfield and reporter channels were separated, emphasizing the segment boundaries and the expression pattern of the reporters. For ease of visualization, these samples were also re-oriented so that the tissue periphery (corresponding to anterior PSM) is at the top while the tissue center (corresponding to the posterior PSM) is at the bottom. This now additionally better shows the localization of the different reporters with respect to the segment boundary. We also included supplementary movies showing timelapse of samples expressing either Axin2-GSAGS-Achilles or Mesp2-GFP that were subjected to periodic DAPT pulses, with their respective controls.

      Several minor points could be addressed to improve the manuscript and are listed below:

      # 16 Figure 1 A the colormap and axes for the oscillatory traces should be defined

      We thank the reviewer, and we have modified the figure accordingly (related to point # 12). A colormap and axes for the illustrated timeseries are now included.

      # 17 Strength of zeitgeber is not defined and there is no analytical expression provided; how does it relate to DAPT concentration? Is the fact that low DAPT concentration corresponds to weak strength expected or is it a result?

      Zeitgeber strength generally refers to the magnitude of the perturbation periodically applied to an oscillator. With DAPT pulses, our expectation was that both the duration of the pulse and the drug concentration could influence the strength. Practically, the pulse duration was kept constant for all experiments and the concentration was varied. We thus expected that DAPT concentration would indeed be correlated with zeitgeber strength. We have discussed multiple lines of evidence supporting this assumption in the main text, and this is indeed a result. In particular, as explained in the section “The pace of segmentation clock can be locked to a wide range of entrainment periods”, higher DAPT concentration gives rise to faster and better entrainment, as expected from classical theory. In the context of Arnold tongues, weaker zeitgeber strength corresponds to a narrower entrainment region, which is experimentally observed (Fig 8F, showing regions where the clock is entrained).

      From a modelling standpoint, zeitgeber strength corresponds to parameter A, which is the amplitude of the perturbation. Possible zeitgeber strength was inferred from the model by matching the experimental entrainment phase with that obtained from the model isophases. As explained in Supplementary Note 2, we tested four concentrations of DAPT (0.5, 1, 2, and 3 uM), respectively corresponding to A values of 0.13, 0.31, 0.43, and 0.55. As we can see, those A values are not linear in DAPT concentration, which is expected since multiple effects (such as saturation) can occur.
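
      The non-linearity can be checked with simple arithmetic on the values quoted above. The short sketch below only divides each inferred A by its DAPT concentration; a linear relation through the origin would give a constant ratio. The variable names are illustrative.

```python
# DAPT concentrations (uM) and inferred zeitgeber strengths A,
# as quoted in the response above.
dapt_um = [0.5, 1.0, 2.0, 3.0]
strength_a = [0.13, 0.31, 0.43, 0.55]

# For a strictly proportional relation A = k * concentration,
# the ratio A / concentration would be the same for every pair.
ratios = [a / c for a, c in zip(strength_a, dapt_um)]

for c, a, r in zip(dapt_um, strength_a, ratios):
    print(f"{c:>4} uM  A = {a:.2f}  A/conc = {r:.3f}")
```

      The ratio falls from roughly 0.3 per uM at 1 uM to below 0.2 per uM at 3 uM, which is the pattern one would expect if the response saturates at higher concentrations.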

      # 18 In some figures it looks like the amplitude of oscillations may change with DAPT concentration and hence zeitgeber strength? Is this expected?

      We have not systematically analyzed the amplitude effect and have, intentionally, focused on the period and phase readout as most robust and faithful parameters to be quantified. Regarding the amplitude of LuVeLu reporter, we are cautious given that it is influenced, potentially, by the (artificial) degradation system that we included in LuVeLu, i.e. a PEST domain. This effect concerns the amplitude, but not the phase and period, explaining our strategy.

      That said, we agree with the referee that DAPT concentrations might change the amplitude of oscillations. Such change could even play a role in the change of intrinsic period (in fact a similar mechanism drives overdrive suppression for cardiac oscillators, Kunysz et al., 1995). But since the change of period can be more easily measured and inferred, we prefer to directly model it instead of introducing a new hypothesis on amplitude/period coupling, at least for this first study of entrainment.

      # 19 Figure 2A including the black area creates confusion and it is unclear which ROI is used in the rest of the study; consider moving this to a supplementary figure perhaps

      We thank the reviewer for this feedback (related to point #13), and we have modified the figure accordingly. As we responded to point # 13: We modified Figure 2A, by indicating each measurement as either global ROI or global ROI minus the diameter of the excluded circular region (e.g. global ROI - 50px). We also emphasized in the caption that timeseries are obtained using global ROI, unless otherwise specified.

      # 20 What type of detrending is used in Figure 2 and throughout (include info in the figure legend)?

      We used sinc-filter detrending, described and validated in detail previously (Mönke et al., 2020), as specified in Supplementary Note 1: Materials and methods > H. Data analysis > Monitoring period-locking and phase-locking: In this workflow, timeseries was first detrended using a sinc filter and then subjected to continuous wavelet transform. We thank the reviewer for pointing out that this detail is lacking in the figure captions, and we have modified the captions accordingly.
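
      For readers unfamiliar with the idea, sinc-filter detrending can be sketched as follows: a windowed-sinc low-pass filter estimates the slow trend, which is then subtracted from the raw signal. This is a minimal stdlib-only illustration and not the implementation of Mönke et al., 2020; the function names, the Blackman window, the reflecting edge treatment and the default kernel width are all assumptions made for this example.

```python
import math


def sinc_kernel(cutoff_period, dt, half_width):
    """Blackman-windowed sinc low-pass kernel, normalised to unit sum.

    cutoff_period and dt share the same time unit (e.g. minutes).
    """
    fc = dt / cutoff_period  # cutoff frequency in cycles per sample
    kernel = []
    for k in range(-half_width, half_width + 1):
        x = 2.0 * fc * k
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Blackman window centred on k = 0; it is zero at k = +/- half_width.
        window = (0.42
                  + 0.5 * math.cos(math.pi * k / half_width)
                  + 0.08 * math.cos(2.0 * math.pi * k / half_width))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [w / total for w in kernel]


def sinc_detrend(signal, dt, cutoff_period, half_width=None):
    """Return (detrended, trend); periods above cutoff_period go to trend."""
    n = len(signal)
    if half_width is None:
        # Assumption: two cutoff periods per side gives a sharp transition.
        half_width = max(1, int(round(2.0 * cutoff_period / dt)))
    kernel = sinc_kernel(cutoff_period, dt, half_width)
    trend = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half_width
            # Reflect indices at the edges, then clamp as a safety net.
            if idx < 0:
                idx = -idx
            if idx >= n:
                idx = 2 * (n - 1) - idx
            idx = min(max(idx, 0), n - 1)
            acc += w * signal[idx]
        trend.append(acc)
    detrended = [s - t for s, t in zip(signal, trend)]
    return detrended, trend


if __name__ == "__main__":
    dt = 10.0  # minutes per frame (illustrative)
    t = [i * dt for i in range(400)]
    raw = [0.01 * x + 5.0 + math.sin(2 * math.pi * x / 140.0) for x in t]
    detrended, trend = sinc_detrend(raw, dt, cutoff_period=300.0)
    print(round(detrended[200], 3), round(trend[200], 3))
```

      As with the wavelet analysis discussed under point #23, values within one kernel half-width of either end of the timeseries depend on the edge treatment and should be interpreted with caution.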

      # 21 Figure 2D merged images are difficult to read/interpret (see major comments)

      We thank the reviewer for this comment, and we have modified the figure accordingly (please see response to related point #15).

      # 22 Kuramoto order parameter is used to quantify the level of synchrony across the different samples, however it is not defined in the text. Is it also possible to assess variability in each sample? For example, how quickly does entrainment occur in each sample? How faithfully do the peaks of expression beyond 80min (to exclude the initial unsynchronised state) match with zeitgeber time? This would help make the point that weak strength leads to a more variable response, which is an interesting finding.

      We have now added a mathematical definition of the Kuramoto parameter in Supplementary Note 1.

      A high order parameter corresponds to coherence between samples, as also elaborated in respective figure captions (e.g. in the caption for polar plots in Figure 4D).
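
      For reference, the standard textbook form of the first Kuramoto order parameter for N samples with phases φ_j(t) is given below. The notation is an assumption for illustration and may differ from the definition given in Supplementary Note 1.

```latex
% Standard first-order Kuramoto order parameter (textbook form).
% R(t) = 1: all N sample phases coincide; R(t) near 0: phases are spread out.
R(t)\, e^{i\Psi(t)} \;=\; \frac{1}{N} \sum_{j=1}^{N} e^{i\phi_j(t)},
\qquad 0 \le R(t) \le 1 .
```

      Here Ψ(t) is the mean phase across samples, so a high R together with a stable Ψ relative to the pulses is what the polar plots summarize.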

      In terms of variability in response to entrainment, we thank the reviewer for the comment, which has prompted us to perform an additional analysis, now included as Figure S13 in the Supplement.

      Briefly, the figures show how different samples become synchronized with the zeitgeber. To do this, we first represent the zeitgeber signal as a continuous, uniformly increasing phase (“zeitgeber time”) with period T_zeit. The initial condition for the zeitgeber phase is chosen so that the zeitgeber phase at the moment of the last pulse matches the experimental entrainment phase for each T_zeit. We plot the phase of each sample (dotted lines) and the zeitgeber phase (magenta line). To quantify how well each sample follows the zeitgeber time, we compute the Kuramoto parameter between the sample phase and the zeitgeber phase. By the end of the experiment most samples reach […], indicating entrainment. Most samples need […] zeitgeber cycles to become entrained. For T_zeit = […] min the entrainment takes much longer (edge of the Arnold tongue). For T_zeit = […] min there is much variability, which can be explained by the horizontal region in the PRC around the entrainment phase. As suggested by the referee, synchronization is faster for higher DAPT concentration. So those dynamics are indeed consistent with the expectation from classical PRC theory.
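
      The construction of a "zeitgeber time" and a per-sample locking measure can be sketched as below. The exact formulas used for Figure S13 are not recoverable from this text, so both functions are assumptions: the zeitgeber phase is taken to increase linearly with period `t_zeit` and to equal `phi_ent` at the last pulse, and the locking measure is taken to be the two-oscillator form of the Kuramoto parameter. All names are hypothetical.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def zeitgeber_phase(t, t_zeit, t_last_pulse, phi_ent):
    """Uniformly increasing phase with period t_zeit.

    Assumption: the phase is pinned so that it equals phi_ent at the
    time of the last pulse (t_last_pulse).
    """
    return (phi_ent + TWO_PI * (t - t_last_pulse) / t_zeit) % TWO_PI


def pair_order_parameter(phi_sample, phi_zeit):
    """Two-oscillator Kuramoto parameter (assumed form).

    Equals 1 when sample and zeitgeber phases coincide and 0 when they
    are in antiphase.
    """
    z = cmath.exp(1j * phi_sample) + cmath.exp(1j * phi_zeit)
    return abs(z) / 2.0


if __name__ == "__main__":
    t_zeit, t_last, phi_ent = 170.0, 1700.0, 1.0  # illustrative values
    for t in (1530.0, 1615.0, 1700.0):
        phi_z = zeitgeber_phase(t, t_zeit, t_last, phi_ent)
        # A perfectly locked sample would have phi_sample == phi_z.
        print(t, round(phi_z, 3), round(pair_order_parameter(phi_z, phi_z), 3))
```

      Plotting this measure against time for each sample would then show how many zeitgeber cycles a sample needs before it stays close to 1.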

      # 23 Do samples change period to Tzeit in similar ways - i.e. patterns over time. It looks like the Kuramoto order parameter and period drop initially - why?

      We do not have a direct answer as to why the Kuramoto first order parameter and the period drop for the condition the reviewer specified. It has to be noted though that because of how wavelet analysis is done (cross-correlation of the timeseries with wavelets), the period and phase determination at the boundaries of the time series are less reliable (edge effects, see Mönke et al., 2020). Because of this, we should be cautious when considering data around the first and last pulses. This was explicitly stated in the generation of stroboscopic maps: “As wavelets only partially overlap the signal at the edges of the timeseries, resulting in deviations from true phase values (Mönke et al., 2020), the first and last pulse pairs were not considered in the generation of stroboscopic maps.”

      # 24 In Figure 4C why is the Kuramoto order parameter already higher in the 2uM DAPT conditions at the start of the experiment?

      Samples can, by chance, start synchronously and this results in a high Kuramoto first order parameter. Because of this likelihood, it is thus important to interpret the entrainment behaviour of multiple samples using various readouts, in addition to a high Kuramoto first order parameter. We investigated entrainment of the samples based on several measures: multiple samples remaining (or becoming more) synchronous (because each sample actively synchronizes with the zeitgeber), period-locking (where the pace of the samples match the pace of the zeitgeber, which can be distinct from natural pace), and phase-locking (where there is an establishment of a stable phase relationship between the samples and the zeitgeber).

      # 25 Figure 3C and Figure S2 require statistical testing between CTRL and DAPT in each condition

      p-values were calculated for the specified conditions and were added in the caption of the figures. These values are enumerated here:

      • Figure 3C
        • 170-min 2uM DAPT (vs DMSO control): p
      • Figure S2
        • 120-min 2uM DAPT (vs DMSO control): p = 0.064
        • 130-min 2uM DAPT (vs DMSO control): p = 0.003
        • 140-min 2uM DAPT (vs DMSO control): p = 0.272
        • 150-min 2uM DAPT (vs DMSO control): p = 0.001
        • 160-min 2uM DAPT (vs DMSO control): p

      To calculate p-values, two-tailed test for absolute difference between medians was done via a randomization method (Goedhart, 2019). This confirms that the period of samples subjected to pulses of DAPT is not equal to the controls, except for the 140-min condition (where the zeitgeber period is equal to the natural period, i.e. 140 mins).
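
      A randomization test on the absolute difference between medians can be sketched as follows. This is a generic permutation-test illustration, not necessarily the exact procedure of Goedhart, 2019; the function name, the number of resamples, the fixed seed and the add-one correction are assumptions, and the periods in the usage example are made-up numbers rather than data from the study.

```python
import random
import statistics


def median_randomization_test(group_a, group_b, n_resamples=10000, seed=0):
    """Two-tailed p-value for the absolute difference between medians.

    Group labels are shuffled n_resamples times; the p-value is the
    fraction of shuffles whose absolute median difference is at least
    as large as the observed one (with an add-one correction).
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(group_a) - statistics.median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:n_a])
                   - statistics.median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical periods in minutes, for illustration only.
    control = [138, 140, 141, 139, 142, 140]
    pulsed = [168, 170, 171, 169, 172, 170]
    print(median_randomization_test(control, pulsed))
```

      Because the test relies only on relabelling the observed values, it makes no assumption about the shape of the period distributions, which suits small sample numbers per condition.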

      # 26 Figure 3A gray shaded area not clearly visible on the graph

      We have decided to remove the interquartile range (IQR) in the specified figure as it does not serve a crucial purpose in this case. By removing it in Figure 3A, the timeseries of individual samples are now clearer.

      # 27 Figure 6C colour mapping of time progression is not clearly visible on the graph; the interpretation of this observation is unclear in the text and the figure

      We agree that the low quality of the image is unfortunate, and it seems that our file was greatly compressed upon submission. We have checked the proper quality of figures in the resubmitted version of the manuscript.

      Regarding the interpretation of Figure 6C, we conclude that in our experiments the entrainment phase is an attractor or stable fixed point, in line with theory (Granada and Herzel, 2009; Granada et al., 2009). We had elaborated this in the text (lines 248-252 of the submitted version of the manuscript): at the same zeitgeber strength and zeitgeber period, faster (or slower) convergence towards this fixed point (i.e. entrainment) was achieved when the initial phase of the endogenous oscillation (φinit) was closer to (or farther from) φent.
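
      The dependence of convergence speed on the initial phase can be illustrated with a toy stroboscopic map. The sketch below is not the fitted model of the manuscript: the sinusoidal `toy_prc`, its amplitude, and the map convention (phase at pulse n+1 equals phase at pulse n, plus the PRC, plus the free-running advance over one zeitgeber period) are assumptions chosen only to show a stable fixed point attracting nearby phases faster than distant ones.

```python
import math

TWO_PI = 2.0 * math.pi


def toy_prc(phi, amplitude=0.5):
    """Hypothetical PRC used for illustration (not the measured one)."""
    return -amplitude * math.sin(phi)


def stroboscopic_step(phi, t_zeit, t_osc, prc=toy_prc):
    """Phase at the next pulse given the phase at the current pulse."""
    return (phi + prc(phi) + TWO_PI * t_zeit / t_osc) % TWO_PI


def circular_distance(a, b):
    """Shortest distance between two phases on the circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def pulses_to_entrain(phi_init, phi_ent, t_zeit, t_osc,
                      tol=0.01, max_pulses=1000):
    """Number of pulses until the phase is within tol of phi_ent."""
    phi = phi_init
    for n in range(max_pulses + 1):
        if circular_distance(phi, phi_ent) < tol:
            return n
        phi = stroboscopic_step(phi, t_zeit, t_osc)
    return None  # did not converge within max_pulses


if __name__ == "__main__":
    # With t_zeit == t_osc the toy map has a stable fixed point at 0.
    near = pulses_to_entrain(0.5, 0.0, 140.0, 140.0)
    far = pulses_to_entrain(2.5, 0.0, 140.0, 140.0)
    print(near, far)
```

      In this toy setting the trajectory started farther from the fixed point needs more pulses to reach it, mirroring the qualitative behaviour described for Figure 6C.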

      # 28 Figure 7A circular spread not clearly visible on the graph

      Similar to point #27, we have provided a high resolution graph for the re-submission and hopefully resolved this issue.

      # 29 Figure S7A difficult to see the difference between colours

      See point #28.

      # 30 Is it possible to compare the PRC and the plots of period over time during entrainment? The PRC is mainly negative (Fig 8A1,A2), in my understanding this means a delay, however the periods seem to decrease over time before entraining to the Tzeit (Fig 3B). Is this reflective of a decrease in Kuramoto parameter and potential de-synchronisation of single cells before re-synchronisation at Tzeit?

      To address this question, we now plot the phase response with colors indicating pulse number in new Supplementary Figure S13. While capturing the entire PRC as a function of time would require many more experiments (in particular to sample the phases far from the entrainment phase), we still clearly see that the PRCs appear to translate vertically as the oscillator is being entrained, i.e. the later time points are shifted up (down) for T_zeit = 120 (170) min, respectively.

      # 31 Fig 8A What is the importance/meaning of the PRC being similar shape between different entrainment periods? Does this reflect that the underlying gene network is the same?

      If one single gene network is responsible for oscillations, we expect from dynamical systems theory that the PRCs are not only of similar shape but actually the same, independent of the entrainment period. What is surprising is that the PRCs for different entrainment periods do not overlap, and the simplest explanation for this is that the intrinsic period changes with entrainment, all things being kept equal (including the underlying gene networks). This relates to the previous point since we indeed observe that the PRC “translates” vertically with the pulse number for longer periods. The change of period might be due to a long-term regulation as detailed in the discussion.
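
      Why a change of intrinsic period shows up as a vertical translation of the inferred PRC can be seen from the stroboscopic map. The derivation below is a sketch under assumed conventions (phase measured at each pulse, free-running advance of 2π T_zeit / T per zeitgeber cycle); the symbols T_0 for the natural period and T_osc for the period during entrainment are introduced here for illustration and may not match the manuscript's notation.

```latex
% Stroboscopic map, phase sampled at successive pulses (mod 2\pi):
\phi_{n+1} = \phi_n + \mathrm{PRC}(\phi_n) + 2\pi\,\frac{T_{zeit}}{T_{osc}}

% PRC inferred from data when the natural period T_0 is assumed:
\mathrm{PRC}_{inferred}(\phi_n) = \phi_{n+1} - \phi_n - 2\pi\,\frac{T_{zeit}}{T_{0}}

% Substituting the first line into the second gives a phase-independent offset:
\mathrm{PRC}_{inferred}(\phi_n) = \mathrm{PRC}(\phi_n)
  + 2\pi\, T_{zeit}\left(\frac{1}{T_{osc}} - \frac{1}{T_{0}}\right)
```

      Under these conventions, an intrinsic period that lengthens toward a long zeitgeber period (T_osc > T_0) shifts the inferred PRC down, and one that shortens toward a short zeitgeber period shifts it up, which is consistent with the direction of the shifts reported for T_zeit = 170 and 120 min in the response to point #30.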

      # 32 The spatial period gradient and wave propagation under DAPT (Figure S8) should be included in the results and not just the discussion.

      We fully agree with the reviewer that both the establishment and the maintenance of a spatial phase gradient is of great interest. However, many more experiments would be required to fully quantify and understand the processes at play here, which we believe to be out of the scope of the current manuscript. To keep the focus of the paper on the global segmentation clock itself, we prefer to keep this figure in Supplement.

      Reviewer #2 (Significance (Required)):

      We currently do not have a detailed understanding of how biological oscillators integrate local signals from their neighbours as well as global external signals to give rise to complex patterning that is important for embryonic development. The main bottlenecks that hinder our understanding are the lack of real-time endogenous dynamic response together with known global inputs, as well as of comprehensive models that can explain emergent behaviour in a variety of tissues.

      This study goes a long way in addressing these bottlenecks in the embryonic tissue responsible for somite formation, a dynamical and oscillatory system also known as the segmentation clock. Firstly, they rely on a state-of-the-art previously developed system to entrain endogenous response in live tissue explants using precise microfluidic control. They test the complete range of exogenous perturbation periods and use an existing live reporter (LuVeLu) to monitor endogenous response. They also identify higher order coupling relationships whereby every other LuVeLu peak is entrained through external stimulation.

      As the stimulation system does not control but rather perturbs the endogenous response, the observations from LuVeLu provide a unique opportunity in understanding input-output relationships and thus describing the dynamic response of the segmentation clock. The authors propose to study dynamic behaviour of the clock using coarse-graining and focus on describing the overall response over time while amalgamating spatial information. Appropriate coarse-graining is an important strategy in addressing complex problems and is widely used. They use sophisticated methodology such as phase response curves and Arnold tongue mapping to make several important observations. For example, the nonlinear shortening and elongation of the period in response to stimulation is particularly interesting since this may indicate a feedback of the clock onto itself, potentially via Wnt. Another key observation is that the spatial periodicity and phase wave activity persist in the perturbed conditions, suggesting that individual single cell oscillators can adjust their behaviour to external input while retaining coordination with their neighbours. Finally, the authors go on to construct a general dynamical model of the segmentation clock and use this to conclude that the intrinsic period of the oscillator is altered and that the oscillator can be considered excitable.

      This work sheds light onto mechanisms of coordination of Notch activity in assemblies of cells observed in living tissue, an area of research that is important not only for somitogenesis but also for understanding gene expression patterning in many other tissues where Notch plays a critical role, for example in the development of the neural system and organs. As a study of a real-world nonlinear oscillator this work is directly of interest to theoreticians and synthetic biology experts interested in understanding complex patterning and emergence.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors studied the system-level responses of the somite segmentation clock by a coarse-grained theoretical-experimental approach, applying the theory of entrainment to understand the phase responses of mouse pre-somitic mesoderm (PSM) tissues in the presence of periodic perturbation by the Notch inhibitor DAPT, generated by a microfluidics technique. It was demonstrated that the segmentation clock is responsive to a diverse range of perturbation periods from 120 to 180 min, can be period- and phase-locked, and that the efficiency depends on the DAPT concentration (input strength). The authors also observed two cycles of the segmentation clock ticking within single cycles of 300 or 350 min period perturbation, suggesting higher order (2:1 mode) entrainment. They also applied stroboscopic maps in their analysis and found that entrainment phases depend on the period of the DAPT pulses, which recapitulates theoretical predictions. The estimation of the phase response curve (PRC) of the segmentation clock revealed that the inferred PRC is an asymmetrical and mainly negative function, which represents characteristic features of oscillators that emerge after a saddle-node on invariant cycle (SNIC) bifurcation. These results also indicated that the segmentation clock changed its intrinsic period during entrainment.

      Major comments:

      # 33 I have major concerns about the relevance of the global time-series analysis proposed in Fig.2 and the conclusion about the changes of the intrinsic period during entrainment. The validity of the global time-series analysis should be carefully analyzed, because it could introduce artifacts in the estimated values of the intrinsic period. The authors concluded (page 3, line 172) that the period calculated by the global analysis is similar to the rate of segment formation, but there is no data quantifying the period of segmentation, such as the frequency of Mesp2 reporter expression.

      We thank the reviewer for this feedback. We have now added the quantification of the period of segment formation (new Figure 2E) and show its strong correspondence to the dynamics of reporters used (Lfng, Axin2, and Mesp2). Please see also our response to point #15 with additional comments regarding the validation of the global time-series analysis.

      # 34 Another related issue is the presence of spatial period gradient as mentioned (page 13, line 524).

      One possible approach to circumvent this issue would be "local" time-series analysis; for instance, just focusing on the "putative posterior" regions that are close to source-positions of waves. Authors can re-compute and estimate PRCs by using such a method.

      We thank the reviewer for this suggestion and have accordingly now included the analysis of a localized ROI at the center (center ROI) of the 2D-assays (new Figures S5-S6). We also computed the PRC from center ROIs as shown below. We note strong correspondence between the global ROI and the center ROI.

      # 35 I have another major concern about the evidence of higher order entrainment shown in Fig.5. If the 1:2 entrainment is successful, we can expect the observed period to be close to half the period of the pulses. However, the period shown in Fig.5B looks like 185 min, longer than half of 350 min. Is this gap due to the temporal accuracy of the time-lapse movies?

      We do not think the discrepancy comes from a problem of temporal accuracy, as the temporal accuracy is the same for all movies and there is no reason why there would be a specific issue for this set of experiments. In addition, we have re-analyzed the data to calculate the period from the stroboscopic maps. Mathematically speaking, we take the stroboscopic map as (see PDF) and use this to estimate the period of oscillation in entrained samples; in particular, inverting the formula for 1:2 entrainment, we have: (see PDF).

      The advantage of this method is that it gives a more "instantaneous" estimation of the period.

      The results are as follows:

      350 10uM: 187 +- 8 min (average across entrained samples from the last zeitgeber period)

      350 5uM: 193 +- 13 min (average across entrained samples from the last zeitgeber period)

      300 2uM: 148 +- 8 min (averaged across entrained samples and from two last periods)

      This additional analysis is in agreement with the wavelet analysis.

      The reviewer is right that for 350 minutes, entrained samples show an observed period that is higher than expected, also based on this new additional analysis. The reason for this is not known. One explanation is the relatively short observation time, especially considering that pulses are separated by as much as 350 minutes, i.e. only 3 pulses are applied. [We notice that for 300-minute pulses, the period converges to 150 mins between the 3rd and the 4th pulse.] We have adjusted the text in the results section to reflect that for 350-min entrained samples, the observed period ‘approaches’ the predicted value, while for 300-min entrained samples, the observed period is very close to it, i.e. 147 mins. In addition, we comment that the phase distribution narrows with time, another indication supporting higher order entrainment.
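
      The comparison discussed in this response can be made explicit with a few lines of arithmetic. The sketch below is illustrative only: it assumes that under 1:2 locking the expected clock period is simply half the zeitgeber period, and it uses the stroboscopic-map estimates quoted above. The function and variable names are hypothetical and do not come from the authors' analysis code.

```python
def expected_locked_period(t_zeit_min, cycles_per_pulse=2):
    """Expected clock period (min) if the clock completes
    `cycles_per_pulse` cycles per zeitgeber period (assumed 1:2 locking)."""
    return t_zeit_min / cycles_per_pulse


# (condition label, zeitgeber period in min, reported mean period in min),
# copied from the stroboscopic-map estimates listed in the response.
reported = [
    ("350 min, 10uM", 350, 187.0),
    ("350 min, 5uM", 350, 193.0),
    ("300 min, 2uM", 300, 148.0),
]


def deviations(rows):
    """Return (label, expected period, observed minus expected) per condition."""
    out = []
    for label, t_zeit, observed in rows:
        expected = expected_locked_period(t_zeit)
        out.append((label, expected, observed - expected))
    return out


for label, expected, delta in deviations(reported):
    print(f"{label}: expected {expected:.0f} min, observed differs by {delta:+.0f} min")
```

      On these numbers, the 300-min condition sits 2 min below the expected 150 min, within the quoted ±8 min, whereas the two 350-min conditions sit 12 and 18 min above the expected 175 min, which is larger than the quoted ±8 and ±13 min.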

      # 36 Also, authors showed the period evolution towards 1:2 locking with just one condition (350 min).

      The authors can show the data for multiple conditions as in Fig. 3D, at least for 300 min and 325 min pulses, and add the data about the final entrained period with statistical analysis that supports the difference between the entrained period and the natural period (140 min).

      We thank the reviewer for this feedback and have modified the figure accordingly. In particular, in Figure 5A, we have added the period evolution plot for samples subjected to 300-min periodic pulses of 2uM DAPT (or DMSO for control). Additionally, we have added Figure 5D, which plots the average period in the 300-min and 350-min conditions. We summarize the median average period here with computed p-values:

      • 300-min pulses of 2uM DAPT (or DMSO for control): p-value = 0.191
        • CTRL: 130.39 mins
        • DAPT: 146.45 mins
      • 350-min pulses of 5uM DAPT (or DMSO for control): p-value = 0.049
        • CTRL: 127 mins
        • DAPT: 174.86 mins
      • 350-min pulses of 10uM DAPT (or DMSO for control): p-value = 0.016
        • CTRL: 142.82 mins
        • DAPT: 185.12 mins

      Minor comments:

      # 37 The authors can draw vertical lines indicating the T_zeit in Fig.3B, Fig.4B and Fig.5B in order to help comparisons between T_zeit and patterns of period (solid lines).

      We thank the reviewer for this comment. We have accordingly added a horizontal line indicating Tzeit in Figures 3B, 4B, S4A, and S5A (figure panel numbers based on the submitted version of the manuscript). We similarly added a horizontal line indicating 0.5Tzeit in the period evolution plots of 300-min and 350-min conditions in Figures 5A and 5B, respectively.

      # 38 In Fig.5A, the authors can show period evolution in the case of 300 min DAPT-pulses as shown in Fig.5B.

      We thank the reviewer for this feedback (related to point #36), and we have modified the figure accordingly.

      # 39 In Fig.6B DAPT panel, the authors can draw the points of phi_ent as shown in Fig.7A.

      We thank the reviewer for this comment, and we have modified the figure accordingly.

      # 40 In Fig. 8F, authors can put the information about DAPT concentration at the right y-axis.

      This is a similar comment to point #17, see above. In brief, we do not know the precise relation between the strength of the perturbation in our model and DAPT concentration; zeitgeber strength was inferred from the model by matching the experimental entrainment phase with that obtained from the model isophases.

      # 41 In Fig. 8G, the PRC in the panel "170 mins" does not have any fixed point (cross sections with horizontal lines of "0" phase response). If entrainment is successful, there should be stable and unstable fixed points, but those are absent, although 170 min pulses succeeded in the entrainment as shown in Fig.3D. Authors can explain where the fixed points are.

      The fixed points are indeed defined by the intersection with a horizontal line, but not with the ‘0’ line. They are found where the phase response compensates for the detuning/period mismatch, not at ‘0’ phase response. (See PDF for more details).

      Note however on Fig 8G that we further observe a vertical shift of the PRC, which prompted us to propose a change of the intrinsic period with (as explained in the text when we introduce Figs 8A1-2).

      Another way to visualize fixed points is offered in Fig 16 D-E, where we plot the inferred corrected PTC and the stroboscopic maps: there, fixed points correspond to intersections with the diagonal.
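
      The fixed-point condition described in this response can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes a generic stroboscopic phase map phi_next = phi + PRC(phi) + 2*pi*T_zeit/T_0 (mod 2*pi), so that 1:1 locking requires PRC(phi*) = -2*pi*(T_zeit - T_0)/T_0, and it uses a hypothetical, purely negative PRC shape chosen only for illustration. The sign convention, the PRC, and all function names are assumptions; T_0 = 140 min and T_zeit = 170 min are the values mentioned in the review.

```python
import math


def required_shift(t_zeit, t_intrinsic):
    """Phase response (radians) needed per zeitgeber cycle to cancel the
    detuning, under the assumed phase-map convention."""
    return -2.0 * math.pi * (t_zeit - t_intrinsic) / t_intrinsic


def toy_prc(phi, amplitude=1.0):
    """Hypothetical, purely negative PRC; NOT the curve inferred in the paper."""
    return -amplitude * (1.0 - math.cos(phi))


def fixed_points(prc, target, n_grid=3600):
    """Phases in [0, 2*pi) where prc(phi) equals target, found by scanning
    a grid for sign changes and refining each bracket by bisection."""
    roots = []
    step = 2.0 * math.pi / n_grid
    for i in range(n_grid):
        a, b = i * step, (i + 1) * step
        fa, fb = prc(a) - target, prc(b) - target
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            for _ in range(60):
                m = 0.5 * (a + b)
                fm = prc(m) - target
                if fa * fm <= 0.0:
                    b, fb = m, fm
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
    return roots


def is_stable(prc, phi, h=1e-6):
    """A fixed point of the map is stable when |1 + PRC'(phi)| < 1."""
    slope = (prc(phi + h) - prc(phi - h)) / (2.0 * h)
    return abs(1.0 + slope) < 1.0


target = required_shift(170.0, 140.0)
for phi in fixed_points(toy_prc, target):
    kind = "stable" if is_stable(toy_prc, phi) else "unstable"
    print(f"fixed point at phase {phi:.3f} rad ({kind})")
```

      With this toy curve the horizontal line sits at a negative phase response rather than at zero, and it crosses the PRC twice, giving one stable and one unstable fixed point, which is the qualitative picture the response describes.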

      Reviewer #3 (Significance (Required)):

      Although phase analysis has been widely applied to various biological systems, such as circadian clocks, cardiac tissues and neurons, this paper represents the first detailed experimental analysis of the segmentation clock based on the theory of phase dynamics. The major results are in line with theoretical predictions, whereas the suggestion about the SNIC bifurcation is attractive not only to theoretical researchers but also to experimental biologists; it has been believed that the segmentation clock consists of a negative-feedback oscillator that emerges by Hopf bifurcation, whereas this paper proposes another possibility for the molecular network structure of the clockwork. This issue is related to a recently proposed hypothesis about the excitable system in the segmentation clock based on Yap signaling (Hubaud et al. Cell 171, 668 (2017)). However, unfortunately, discussion about detailed molecular networks is not abundant.

      # 42 Thus, maybe the main readers are computational biologists and systems biologists.

      We thank the reviewer for his/her significance comment. We have added comments on the bifurcation structure of the segmentation clock and on excitable systems in the discussion. While our focus is on coarse-graining so that we do not and cannot infer precise molecular details, we can still infer some properties of the underlying networks. In particular we now cite several papers explaining how systems with tunable periods/excitable are indicative of the interplay between positive and negative feedbacks. We think those considerations are of interest to a broad range of biologists interested in connecting experiments to theory.

    1. SAMSON CARRASCO

      Samson is extremely important to Don Quixote. At first glance we think of him as the antagonist; however, as the story progresses we find that he is trying to help the Don. "[He is] a key figure that fulfills a double function: to cheer up Don Quijote so that he may go out for the third time and also to induce him to return home." This makes him a pivotal part of this story.

      Presence and Sense of Sanson Carrasco | Request PDF. https://www.researchgate.net/publication/298984686_Presence_and_sense_of_Sanson_Carrasco.

    1. “My reasons for marrying are, first, that I think it a right thing for every clergyman in easy circumstances (like myself) to set the example of matrimony in his parish; secondly, that I am convinced that it will add very greatly to my happiness; and thirdly—which perhaps I ought to have mentioned earlier, that it is the particular advice and recommendation of the very noble lady whom I have the honour of calling patroness. Twice has she condescended to give me her opinion (unasked too!) on this subject; and it was but the very Saturday night before I left Hunsford—between our pools at quadrille, while Mrs. Jenkinson was arranging Miss de Bourgh’s footstool, that she said, ‘Mr. Collins, you must marry. A clergyman like you must marry. Choose properly, choose a gentlewoman for my sake; and for your own, let her be an active, useful sort of person, not brought up high, but able to make a small income go a good way. This is my advice. Find such a woman as soon as you can, bring her to Hunsford, and I will visit her.’ Allow me, by the way, to observe, my fair cousin, that I do not reckon the notice and kindness of Lady Catherine de Bourgh as among the least of the advantages in my power to offer. You will find her manners beyond anything I can describe; and your wit and vivacity, I think, must be acceptable to her, especially when tempered with the silence and respect which her rank will inevitably excite. Thus much for my general intention in favour of matrimony; it remains to be told why my views were directed towards Longbourn instead of my own neighbourhood, where I can assure you there are many amiable young women. 
But the fact is, that being, as I am, to inherit this estate after the death of your honoured father (who, however, may live many years longer), I could not satisfy myself without resolving to choose a wife from among his daughters, that the loss to them might be as little as possible, when the melancholy event takes place—which, however, as I have already said, may not be for several years. This has been my motive, my fair cousin, and I flatter myself it will not sink me in your esteem. And now nothing remains for me but to assure you in the most animated language of the violence of my affection. To fortune I am perfectly indifferent, and shall make no demand of that nature on your father, since I am well aware that it could not be complied with; and that one thousand pounds in the four per cents, which will not be yours till after your mother’s decease, is all that you may ever be entitled to. On that head, therefore, I shall be uniformly silent; and you may assure yourself that no ungenerous reproach shall ever pass my lips when we are married.”

      this seems unnecessary

    1. It’s not such a difficult process . . . to start with. . . . If they [Latinos] really wanted to do it, they would just go out and fill out the application and ask the teacher for details. . .

      I don't think it is as easy as she makes it sound. As we learned in last week's article, there are some kids who don't even understand the order of taking pre-algebra before algebra simply because they don't have people in their lives to explain that to them. It may seem easy to her because it is a path a lot of people in her family have taken, or because her family places a strong value on school, so she knows more about it.

    1. Within the field of instructional design, we have sometimes observed a hesitation to dwell on visual aesthetics (Parrish, 2009). This hesitation may stem from concern that artistically-approached designs will lack the ability to be replicated (Merrill & Wilson, 2006) or that the artistic elements will serve merely as window dressing—or worse, distraction—that provides no educational benefit to the learner.

      I find this to be true in my experience. I have worked with some professors who baulk at the idea of spending time creating or searching for a course banner image. There are other examples related to this, but I personally think that something as simple as finding or creating a course banner image can excite students. Or, if it's a corporate training hosted through Rise or Storyline, this may just be little visual elements and images that add a little something to the visual experience.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides data suggesting that tonic presynaptic a7 nicotinic receptor activity enhances corticostriatal input-mediated excitation of striatal medium spiny neurons; the data also suggest that tonic a4b2 nicotinic receptor activity on PV-fast spiking GABA interneurons inhibits striatal medium spiny neurons. These data advance our understanding about the complex cholinergic regulation of striatal neuronal circuits.

      The presented data are generally clean and high quality; but there are some problems that require the authors' attention.

      We thank the Reviewer for their insightful comments. We have addressed each point below with additional data and/or text. We believe these revisions have made the manuscript significantly stronger.

      1. In this study, ADP is a key parameter manipulated by several pharmacological treatments. But it is not clearly defined. The authors indicate EPSP and ADP are distinct by stating "LED pulse of increasing intensity generates excitatory postsynaptic potentials (EPSPs), or an AP followed by an after depolarization (ADP)." But the data (e.g. Fig. 1B) indicates that much of the ADP is probably EPSP. Please clarify. If much of the ADP is indeed EPSP, how are the data interpretation and the overall conclusion affected?

      We apologize for the oversight. The main focus of our study is on how tonic nAChR activation controls the timing of striatal output; our justification for including the ADP in our experimental analysis was simply corroborative, in that it represents an additional, easily measured parameter of the postsynaptic response to convergent cortical stimulation that 1) can be modulated by similar local inhibitory circuits that we show to mediate the effect of tonic nAChR activation and 2) is positioned (as opposed to EPSPs) to influence subsequent spiking, should the appropriate synaptic cues be present (which are deliberately omitted in our study). That said, under our experimental conditions EPSPs and ADPs were similar in both their kinetics and modulation by mecamylamine, suggesting that they represent mechanistically similar responses to cortical afferents. The defining difference (besides ADPs exhibiting larger amplitudes) is that they appear either in the absence of or following a spike. For these reasons we ultimately decided that reporting changes in both ADPs and EPSPs would be redundant, and limited our analyses to ADPs. Text has been added to the first paragraph of the results section to address these points.

      In Fig. 1F, ADP is absent. Why? Please clarify.

      Figure 1F shows an example of an SPN held at a mimicked ‘up-state’, achieved by injecting positive somatic current to produce a ‘resting’ membrane potential of -55 to -50 mV. In this scenario, the ‘up-state’ membrane potential is higher than what would be reached during most ADPs evoked from Vrest, preventing the observation of ADPs in many trials. Text has been added to the end of the first paragraph in the results section to clarify this point.

      If ADP is distinct from EPSP here in MSNs, has it been reported in the literature, and how is it generated?

      Under our experimental conditions, we do not see any major differences between EPSPs and what we term ADPs (other than amplitude), at least in terms of kinetics and modulation by mecamylamine. That said, we have added text to the first paragraph of the results section that references previous work (Flores-Barrera et al.) describing suprathreshold depolarizations proceeding SPN spikes, which shaped our reasoning for including this measure in our study.

      1. In Fig. 1F, the holding potential for mecamylamine is a few mV more negative than the control, but the spike latency is shorter under mecamylamine. This is hard to understand because membrane potential (current-injection-induced depolarization + EPSP) determines spike firing and latency. If the holding potential is the same, then it's easy to understand (larger EPSP under mecamylamine).

      Thanks for pointing this out! We agree that this might seem counter-intuitive in terms of Vrest and EPSP amplitude only. Given that mecamylamine reduces GABAergic inputs to SPNs, the reduction in spike latency in this case is consistent with a reduction of GABA receptor mediated shunting. We have added this point to the text in the 3rd paragraph of the results section, which we think strengthens our justification to look at GINs as the potential mediators of mecamylamine’s effect on spike latency.

      1. Data in Fig. 2D, E are weak. The spiking ability of whole-cell recorded neurons often declines over time (evidence: the AP duration for the red trace is longer); recovery/partial recovery from MLA is needed for the data to be reliable. Fig. 2E shows 8 cells: 6 had no response, 2 increased. Sample size needs to increase.

      We appreciate this comment. Our initial justification for this experiment was from previous reports that alpha-7 nAChRs reduce corticostriatal glutamate release probability. We have now added additional data (Figure 2 supplemental data) showing that blockade of tonically activated alpha-7 nAChRs with the more specific antagonist MLA was not sufficient to change corticostriatal synaptic strength or release probability. In parallel, as we began increasing the sample size of the experiment testing the effect of MLA on spike latency, we noticed that the effect size became smaller than what we initially reported, which was already modest. Given the modest effect size of MLA on spike latency (with no presynaptic mechanism to offer), we reason that it would likely have minimal impact compared to the larger effect of mecamylamine. For this reason, we have backed off our conclusion that TONIC activation of presynaptic alpha-7 nAChRs on corticostriatal axon terminals will have a meaningful physiological impact on SPN spike timing. Accordingly, we removed previous figure 2D/E, but supplemented Figure 2A/B/C with new data (figure 2 supplement) demonstrating the lack of effect of tonic nAChR activation on corticostriatal synapse release probability. The title of the manuscript has been altered to reflect this.

      1. Fig. 7: the data on DhbE increasing AP duration is not convincing: no effect in 4 neurons, increase in 4 other neurons, and decrease in other neurons. Data is more important than p<0.05. How do you interpret DhbE increasing AP duration?

      Point taken. We shouldn’t let a statistical calculation dominate the interpretation of a mostly mixed population result. Furthermore, upon revisiting this figure we realized that the main points pertinent to our conclusions (mecamylamine hyperpolarizes PV-FSI Vrest) were obscured by data that were of limited relevance. We have re-focused this figure to highlight data that are directly pertinent to our interpretation. This included removing the AP duration data set in question, which does not add to or inform our conclusions. We have further strengthened our conclusion that PV-FSIs are a primary mediator of the effect of tonic nAChR activation on spike latency by adding new data showing that pharmacologically blocking cortical activation of PV-FSIs occludes the effect of mecamylamine (new figure 8, see comments to Reviewer 2).

      Fig. 7F shows AP duration for PV-FSI is around 1.75 ms (some are over 2 ms, recorded at 35 C). This is unusually long. Also, the AP rise time is around 1.4 ms, very long. 1.75 ms total rise time vs. 1.4 ms for just rise: they do not add up?

      Please see our response to the above point.

      Reviewer #2 (Public Review):

      This manuscript examines one aspect of how acetylcholine influences striatal microcircuit function. While striatal cholinergic interneurons are known to be engaged in key events and tasks related to the basal ganglia in vivo, and pharmacological studies indicate cholinergic signaling is complex and critical to striatal function, the mechanistic details by which acetylcholine regulates individual cell types within the striatum, as well as how these integrate to shape striatal output, remain largely unknown. This work thus addresses an important problem in the basal ganglia field, with likely relevance to both normal function and disease-related dysfunction. The authors used a brain slice preparation in which a large number of excitatory cortical inputs to the striatum are activated, and they could measure the resulting activation of striatal projection neurons (SPNs). Their primary finding was that in this preparation, blocking nicotinic acetylcholine signaling resulted in more rapid activation of SPNs. They then explored some of the potential mechanisms for this phenomenon, and conclude that in their preparation, cholinergic interneurons are engaged both tonically and phasically, resulting in recruitment of local GABAergic interneurons that provide feedforward inhibition onto SPNs. They show that one striatal GABAergic interneuron subclass, PV-FSI, are modestly excited by tonic nicotinic signaling, and suggest this may be one contributor to their primary finding.

      Strengths of the study include the focus on cholinergic signaling across multiple striatal cell types, careful and clearly displayed slice electrophysiology, good writing, and a methodical approach to pharmacology.

      Weaknesses include reliance on the Thy1-ChR2 line to activate excitatory cortical inputs to the striatum (this line may be less specific to cortical pyramidal neurons than a specific Cre recombinase mouse line used with Cre-dependent ChR2, and thus have unintended influences on the results), and despite a strong start, a fairly weak mechanistic exploration of what GABAergic neuron subclasses might contribute to their original phenomenon.

      We thank the Reviewer for their thoughtful and constructive comments. The Reviewer identified two weaknesses of our study, as presented. The first weakness was our reliance on a transgenic mouse line (Thy1-ChR2) to activate cortical inputs to the striatum, specifically how a potential lack of specificity/ectopic expression of ChR2 in non-glutamatergic cortical neurons may impact our interpretation of the data. The second is that we did not make an effort to identify the specific subclass(es) of GINs that contribute to the phenomenon we describe. We have addressed both of these comments with new experiments, which we will describe individually below.

      1) Specificity of corticostriatal afferent activation in Thy1-ChR2 mice. As the Reviewer keenly points out, although Thy1-ChR2 mice are often used as a tool to specifically activate excitatory corticostriatal nerve terminals with optogenetic stimuli, there is concern that ChR2 expression is not exclusively limited to glutamatergic cortical neurons. If present, direct optogenetic activation of non-cortical striatal afferents would influence our results and impact our interpretation. We have addressed this issue experimentally by adding two new types of experiments (and related text, pages 7-8).

      We have added new data using immunohistochemical staining to survey for ectopic expression of ChR2 in the cortex. Staining for GAD, to broadly identify GABAergic neurons, displayed no overlap with ChR2-expressing cortical neurons in Thy1-ChR2 mice. Since a population of GABAergic somatostatin-expressing cortical neurons (particularly in the auditory cortex) has been shown to directly innervate the striatum (Rock et al., 2016), we also show that we found no evidence for somatostatin-ChR2 colocalization in our mice. Furthermore, we report no evidence for somatic expression of ChR2 in the striatum. We do report somatic expression of ChR2 in a population of globus pallidus soma, and add text to describe the above data (figure 3 supplement) as well as published data identifying ChR2 in axons of the substantia nigra. Together, these data suggest that cortical expression of ChR2 is limited to non-GABAergic neurons, though they do not eliminate the possibility of a direct monosynaptic GABAergic input to the striatum from non-cortical (and extrastriatal) brain regions. We describe newly added experimental data below to address this possibility.

      We have added new data to directly test if the optogenetic stimulation protocol used in this study induces a monosynaptic GABAergic current in SPNs (figure 3 supplement). We report that an optogenetically-evoked monosynaptic GABAergic current is indeed detected in SPNs, though it is unlikely to affect our results or interpretations for two reasons. First, based on the newly added histological data, the source of this GABAergic current is non-cortical and extrastriatal. Second, and more importantly, this input is insensitive to mecamylamine (new data, figure 3 supplement) and as such would not be modulated by the key manipulations presented in this study. Finally, experiments described below – instructed by a suggestion made by Reviewer 2 (see below) – show that blocking glutamatergic synaptic activation of a class of striatal GINs eliminates the effect of mecamylamine on SPN spike latency, ruling out the involvement of a monosynaptic GABAergic input in mediating the phenomenon.

      2) Identification of the key GIN subclass that mediates the phenomenon. Our initial manuscript included data demonstrating the feasibility of PV-FSIs in participating in the phenomenon we described, but we agree with the Reviewer that we stopped well short of identifying the class of GINs that are actually involved. We have added two new data sets to the manuscript that now corroborate both the involvement and necessity of PV-FSIs in mediating this phenomenon. First, we have added data showing that striatal SOM+ interneurons respond to mecamylamine differently than PV-FSIs do: while mecamylamine hyperpolarizes PV-FSIs, it depolarizes the average membrane potential of SOM+ interneurons and has no effect on their spontaneous firing frequency, making them unlikely candidates to mediate the phenomenon we describe. Second, we have added data showing that pharmacologically preventing cortical activation of PV-FSIs both mimics and occludes the effect of mecamylamine on spike latency and ADP amplitude (new figure 8). This data also rules out the involvement of certain other classes of GINs, such as PLTS interneurons, as the pharmacological manipulation we performed (blockade of calcium-permeable GluA2-lacking AMPA receptors) does not affect their response to cortical inputs (Gittis et al., 2010).

      Reviewer #3 (Public Review):

      The manuscript by Matityahu et al. investigated the role of tonic activation of AChRs on the spike timing of striatal spiny projection neurons (SPNs) in acute striatal slices. By selective activation of corticostriatal projections using optogenetic tools (ChR2), they find that pharmacological blockade of presynaptic α7 nAChRs delays SPN spikes, whereas blockade of α4β2 nAChRs on GABAergic interneurons advances SPN spikes. The work is carefully done with proper control experiments, and the main conclusions are mostly well supported by data.

      Although they only constitute ~1% of the total striatal neurons in rodents and humans, cholinergic interneurons (ChINs) are gatekeepers of striatal circuitry because of their extensively arborized axons and varicosities which tonically release ACh. Whereas the role of muscarinic AChRs (mAChRs) in modulating striatal output has been well established, the role of nAChRs (especially the tonic activation) remains to be elucidated. The study is solid and the results are new and convincing. The data suggest that tonic activation of nAChRs may place a "brake" on SPN activity, and the lift of this brake during pauses of ChIN firing in response to salient stimuli may be critical for striatal information processing and learning. The findings from this study will enhance our understanding of the role of tonic nAChR activation in controlling SPNs and striatal output.

      We thank the reviewer for their careful reading of our manuscript and for their kind words and helpful suggestions.

      Unjustified Conclusions and Suggestions:

      1) The change of the SPN spike timing by AChR modulation is on a few milliseconds time scale. To make the current study more significant, the authors should design and perform additional experiments to demonstrate the functional consequence in controlling striatal output and learning. For example, will activation or blockade of nAChRs have effects on striatal STDP?

      We too would be thrilled to see the results of such experiments. Unfortunately our early attempts to perform such tests (e.g., crossing Thy1-ChR2 mice with ChAT-Cre mice to selectively express halorhodopsin in CINs, and combine cortical excitation with silencing of CINs) have been plagued by technical challenges, and would require time and resources that we feel are pragmatically beyond the scope of this study. That said, we’ve included new text (particularly, page 15) discussing how our results may fit with a newly published study on the role of CINs in corticostriatal LTP (Reynolds et al., 2022).

      2) Modulation of striatal circuitry is complex. The addition of a diagram illustrating the hypothesis and key results would help.

      Excellent suggestion. We have added a summary diagram, which is now figure 9.

    1. Rintze December 5, 2011
       With regard to broken translators, do the Zotero clients phone home any details on save failures? (there is a preference checkbox "Report broken site translators" which suggests they do)
       I don't mind fixing up a few more translators, but it would be nice to know which translators fail most often.

       ajlyon December 5, 2011
       It does phone home, but I'm afraid those reports are going into a black hole for now; I've noticed the requests in various logs, but I've never been notified of a failing translator by the Zotero team. It'd be great if the translator list / status page integrated explicit tests and such error reports.

       adamsmith December 5, 2011
       there is, of course, also a good number of translators who don't trigger any errors, because they don't detect.

       Rintze December 5, 2011
       Yes, but I would argue that non-detecting translators are less frustrating to users.

       dstillman December 7, 2011
       Here's a start: https://repo.zotero.org/errors
       The actual error reports aren't public for privacy reasons (and we're not displaying absolute numbers), but we can provide example error strings and URLs on request. We also might be able to have this automatically display error strings that show up across many reports (e.g., "TypeError: scisig is null" for Google Scholar), since short of major site breakages it will probably be hard to debug many of these without examples.
       Note that the Google Scholar results are greatly skewed by Retrieve Metadata attempts, and DOI is also showing mostly "could not find DOI" errors. I'm hoping detection can be tightened on those (e.g., to remove the folder icon on a Google Scholar search with no results), which would allow this to better show actual error frequency.

       ajlyon December 7, 2011
       I'll try to work on detection. Automatic display of common error strings would be very useful, as well as some general idea of how many errors we're talking about-- for something like ScienceDirect, are we talking about 10 errors? 100? 1000?
       Also, does this filter out data from clients with out-of-date translators or Zotero versions?
       Thanks for putting this up! It's sure to be useful in the coming weeks and years.

       Rintze December 7, 2011
       Like ajlyon, I think some indication of the number of errors per translator would be very useful. And could the list be expanded to show more than the top 10 translators (say the top 50)?
       Also, would it be possible to create somewhat comprehensive reports with, say, 10 error strings and URLs for each translator to send to ajlyon, adamsmith and me, so we don't have to submit individual requests per translator? I'd hope we have established ourselves as at least somewhat trustworthy (and I assume all three of us would be more than willing to sign any privacy agreement).

       ajlyon December 8, 2011
       Thanks for upping the number visible.
       What's going on with the outdated translators? There are people out there with three different ScienceDirects, two DOIs... Is that just people with updating off? Or something else?

       dstillman December 8, 2011
       OK, updated again with absolute numbers and per-error breakdowns. Hover over each segment for error details. I don't think any page data will make it into the errors, but to be safe I'm displaying only errors coming from at least three addresses that don't include the string "http" in them—the rest get lumped together at the end in blue. If you notice anything that shouldn't be in there, let me know.
       We might be able to display URLs that show up across enough addresses, though there may not be enough of those.
       "What's going on with the outdated translators?"
       Those are all <2.1.9. Not much we can do for those folks.
      • ABOUT property "Report broken translators"
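      The display rule described in the last post of the thread (show an error string only if it was reported from at least three addresses and does not contain "http", and lump everything else together) can be sketched as follows. This is a minimal illustration, not Zotero's actual server code: the function name `summarize_errors`, the `(address, error_string)` input shape and the `min_addresses` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def summarize_errors(reports, min_addresses=3):
    """Split error reports into individually shown strings and a lumped remainder.

    `reports` is an iterable of (address, error_string) pairs; this input
    shape is hypothetical and chosen only for illustration.
    Returns (shown, other): `shown` maps each displayable error string to its
    report count, `other` is the number of reports lumped together.
    """
    addresses_by_error = defaultdict(set)
    count_by_error = defaultdict(int)
    for address, error in reports:
        addresses_by_error[error].add(address)
        count_by_error[error] += 1

    shown = {}
    other = 0
    for error, count in count_by_error.items():
        enough_sources = len(addresses_by_error[error]) >= min_addresses
        # Strings containing "http" might leak page data, so they are never shown.
        if enough_sources and "http" not in error:
            shown[error] = count
        else:
            other += count
    return shown, other


if __name__ == "__main__":
    example = [
        ("addr-a", "TypeError: scisig is null"),
        ("addr-b", "TypeError: scisig is null"),
        ("addr-c", "TypeError: scisig is null"),
        ("addr-a", "failed at http://example.org/page"),
        ("addr-b", "rare error"),
    ]
    print(summarize_errors(example))
```

      Counting distinct addresses rather than raw reports keeps a single noisy client from pushing a private-looking string over the threshold.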
    1. It is not just that trans women are not really women; even females who self-identify as women are not really women.

      I think that Barnes will not agree with this characterization. Barnes's idea is simply that there is no single group corresponding to the term "woman". Instead, there are multiple groups that may be the semantic value of "woman". Some of them are much more gerrymandered. I think the idea does not imply that no one is really a woman. Instead, the upshot is simply that when we consider whether one is really a woman, we must attend to the meaning of "woman".

    1. Author Response:

      We largely agree with the assessment of the Reviewers. Indeed, as noted by Reviewer #2, under the urgent conditions of our experiment, the onset of the cue modulates competing saccade plans that are already ongoing. The reviewer is correct in considering that the initial motor plans are endogenously generated, as they favor one location or the other based simply on the subject's internal bias or preference. We would just note that the endogenous signal that we focus on refers to a later modulation which, based on the perceived cue location and the task rules, directs the motor plans to the correct target location. According to our findings, this endogenous modulation occurs after the exogenous response and acts in the opposite way, boosting the anti-saccade plan and curtailing the activity that would otherwise trigger an erroneous pro-saccade. Thus, three things may happen in each trial: (1) initial, uninformed motor plans are endogenously generated, (2) the cue onset exogenously reinforces the plan toward the cue, and (3) an informed endogenous signal suppresses the plan toward the cue and boosts the plan toward the anti location. We think the novelty here is in being able to characterize these distinct events, which unfold within a few tens of milliseconds of each other.

      Reviewer #3 considered our conclusion that the exogenous response "is entirely insensitive to behavioral context" too strong, and that is a fair point. Conclusions apply to the degree that experimental conditions are valid in general, and furthermore, the deviations from the idealized predictions were small but not zero. However, we do not consider the assumption noted by the reviewer, that saccade-related neural activity ramps up before the saccade goal is known, as a weakness. We have, in fact, recorded such activity in several oculomotor areas using similar urgent-choice designs (Stanford et al., Nat Neurosci 13:379, 2010; Costello et al., J Neurosci 33:16394, 2013; Costello et al., J Neurophysiol 115:581, 2016; Scerra et al., Curr Biol 29:294, 2019; Seideman et al., bioRxiv, 2021, https://doi.org/10.1101/2021.02.16.431470), and the responses in the frontal eye field (FEF) in particular conform quite closely with those assumed by the model (Stanford et al., Nat Neurosci 13:379, 2010; Costello et al., 2013; Salinas et al., Front Comput Neurosci 4:153, 2010). Rather than a potential liability, we think the early ramping activity is a key constraint for any model of urgent choice performance.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, Miettinen and colleagues exploit the suspended microchannel resonator developed in their lab and optimize the method to be able to record single live mammalian cells for very long periods of time, across several cell division cycles, while performing a double measure of their buoyant mass in media of different densities (H2O and D2O). Because water exchanges fast enough inside the cell, it allows them to define a dry mass and a dry volume, and thus a density of dry material for single cells along the entire cell division cycle. These measures lead them to confirm and clarify some points from previous studies from their lab and others, such as exponential growth also in dry mass and the fact that buoyant mass and this new dry mass are the same thing in interphase cells. They then find that this is not true during mitosis, mostly because dry mass density increases in early mitosis (dry mass decreases and dry volume decreases even more, suggesting that there is a loss of material of density lower than the average dry mass density). The authors rule out a number of potential mechanisms and give evidence for a role of exocytosis, more precisely exocytosis of lysosomal content. Blocking this phenomenon prevents the change in dry mass density but does not affect cell division. They propose some potential function for this phenomenon, including the interesting hypothesis that this helps clean the lysosomal content, which might contain some toxic components, so that daughter cells are born with 'clean' lysosomes. Cool idea! It is also quite amazing that the precision of their method allows them to detect this event.

      The main question I have concerns the definition of dry mass and dry volume. The authors should discuss in more detail what these represent physically. Technically, they are defined by their equation 1, which relates their measure of buoyant mass to a dry mass and a volume of water as parameters to fit from the buoyant mass data. One gets to this equation by writing the definition of buoyant mass as the mass of the cell minus the mass of the equivalent volume of the surrounding medium. But then, to get what the authors find, one has to write that the cell mass is the sum of the dry mass and the mass of water contained in the cell (which makes the dry mass easy to understand) and then to write that the cell volume is the sum of a volume of water and of a volume of dry material. This then defines a dry volume, as the difference between the volume of the cell and the volume of the water contained in the cell (which is the parameter Vwater in the equation 1). At least this is how I got to this equation. The question I asked myself then is: what is this dry volume? Is it really the volume occupied by the dry mass in the cell? This is probably not the case, since dry mass is solvated in the cell. One can estimate this solvated volume using the van't Hoff/Ponder relation, which can be found by changing the osmolarity of the external medium. It defines an excluded volume, which is the total volume excluded by macromolecules (like for a van der Waals gas) - it is usually between 25 and 30% of the cell volume. This volume contains the dry mass plus a certain fraction of the water, so it is not exactly the dry mass volume as defined here by the authors. I am worried that this dry mass volume, which is mathematically defined here and calculated from the fit of the equation, is not a standard physical quantity and so it is not easy to relate it to standard biophysical theories (e.g. equations of state), and its behavior could be very unintuitive even for simple systems. This makes the variation in this quantity not easy to interpret, and thus also the variation in dry mass density is not easy to interpret in physical terms.

      That being said, it is still clear that whatever this is, it changes in early mitosis, and it seems to be related to exocytosis, so I am not saying that the authors are wrong here. They potentially indeed detect this increase of exocytosis. But they should discuss more what they think this quantity is, either in the methods or in the discussion of the article. In particular, the sentence at the bottom of page 5, line 104, is not clear ('We are not aware of any other single cell methods capable of quantifying this biophysical feature of a cell'), since this measure is not really clearly a biophysical feature of a cell, but is defined a bit artificially from the equation which defines the dry mass volume from the measures of buoyant mass.

      Thank you for the detailed and very constructive feedback. As stated above in the Essential Revisions section, we have now clarified the terminology we use and made the terminology more consistent with existing literature. We have also better defined the concept behind our method. Our updated Measurement Method section now states (page 3) that: “In our approach, we consider the buoyant mass of a cell to be dependent on two distinct physical “sections” of the cell, the dry content and the water content. To measure the cell’s dry content independently of the water content, we measure the cell’s buoyant mass in H2O and D2O-based solutions. Under these conditions, the influence of the water content on buoyant mass can be excluded, because the intracellular water is exchanged with extracellular water, making the intracellular water content neutrally buoyant with extracellular solution. This allows us to detect the cell’s dry mass (i.e. total mass – water mass), dry volume (i.e. total volume – water volume) and dry mass density (i.e. dry mass / dry volume).”
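      The measurement logic described here can be illustrated with a small numerical sketch. It assumes the simplest reading of the text: once intracellular water is neutrally buoyant, the buoyant mass in each fluid is m_b = m_dry - rho_fluid * V_dry, so two measurements give two linear equations in the two unknowns. The function name, the default fluid densities and the example numbers are illustrative assumptions, not values or code from the manuscript, and the manuscript's equation 1 may be written differently.

```python
def dry_properties(mb_h2o, mb_d2o, rho_h2o=1.00, rho_d2o=1.10):
    """Estimate dry mass, dry volume and dry mass density from two buoyant masses.

    Assumed model (hypothetical simplification): m_b = m_dry - rho_fluid * V_dry
    in each fluid, i.e. the water content contributes nothing to buoyant mass.
    Units: buoyant masses in pg, densities in g/mL (numerically equal to pg/fL),
    so the returned volume is in fL. The default densities are placeholders.
    """
    if rho_d2o == rho_h2o:
        raise ValueError("the two fluids must differ in density")
    # Subtracting the two equations eliminates the dry mass.
    dry_volume = (mb_h2o - mb_d2o) / (rho_d2o - rho_h2o)
    # Back-substitute into the H2O equation.
    dry_mass = mb_h2o + rho_h2o * dry_volume
    dry_mass_density = dry_mass / dry_volume
    return dry_mass, dry_volume, dry_mass_density


if __name__ == "__main__":
    # Made-up buoyant masses, chosen only to exercise the arithmetic.
    mass, volume, density = dry_properties(mb_h2o=30.0, mb_d2o=23.0)
    print(f"dry mass {mass:.1f} pg, dry volume {volume:.1f} fL, "
          f"dry mass density {density:.3f} g/mL")
```

      Because the dry volume comes from the difference of two buoyant masses divided by a small density difference, measurement noise in either buoyant mass is amplified in the result, which is one reason the precision of the underlying measurement matters.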

      The reviewer is also correct that our method measures a dry volume which is, by our model’s definition, the volume occupied by the dry mass independently of water. In other words, our method & measurement model assumes that the intracellular water exchange is 100% complete. The reviewer is correct that some water may be retained, and we cannot directly measure the amount of H2O left inside the cell after immersion in D2O-based media. However, our results indicate that our dry volume measurements are not limited by the water exchange time that the cell experiences (Figure 1–figure supplement 2). In other words, in our measurements, cells exchange all the water they can exchange, be that 100% or 98%. This is further supported by our new estimations of the time needed to transport all water in and out of the cell (see above, other comments section #1, and our updated manuscript page 5). Note that, as our method only exchanges H2O to D2O instead of removing all water from the cell, dry mass will always remain solvated in either H2O or D2O, which makes it plausible that 100% of the water content is exchanged.

      As the reviewer keenly points out, our measured dry volume is biophysically distinct from the more classically measured excluded cell volume (or dehydrated cell volume), which still includes some water in the excluded cell volume quantifications. Consistently, our method measures dry volumes that are smaller (~15%) than excluded volumes typically are (~25-30%). We do not consider this a limitation of our method, but rather an opportunity for new measurements. That being said, we completely agree with the reviewer that this may confuse readers. To address this point, our Measurement Method section now states (page 4) that: “Importantly, our approach assumes that all water within the cell is exchangeable between H2O and D2O. Accordingly, our dry volume measurement is distinct from the excluded cell volume detected by measuring cell volume following strong hyperosmotic shocks, which does not remove all water from the intracellular space.”

      Finally, we have also changed the sentence “We are not aware of any other single cell methods capable of quantifying this biophysical feature of a cell” (page 5) so that it only refers to a metric, which hasn’t been quantified before on a single-cell level. We believe that this minor change will avoid the suggestion that dry volume is of biophysical importance on its own.

      Reviewer #2 (Public Review):

      The new suspended microchannel resonator (SMR)-based method described in this paper enables high precision and high temporal resolution single-cell measurements of key physical properties: cell dry mass and the density of cell dry mass, which depends on the macromolecular composition of the cell. The validity of the method is rigorously tested with several convincing control experiments. This method will be useful for future studies investigating cell size and growth regulation and the coordination of mass, volume and density in animal cells.

      Using their method, the authors report two important results. First, they confirm that buoyant mass measurement is a valid proxy for cell mass in interphase, an important finding given that SMR measurements have been one of the best and most productive approaches to investigating cell mass growth regulation. Second, they provide evidence that some cell types lose dry mass during metaphase by a mechanism that involves exocytosis, emphasizing how mass, volume, and density dynamics are more complex than during the rest of the cell cycle.

      While this paper presents very interesting results, it would benefit significantly from two main improvements. First, the different physical variables studied here (dry mass, dry density, dry mass density, dry volume) should be better defined, and the terminology revised to provide a more straightforward and intuitive description of their biological meaning. Several sections of the paper (especially the introduction and the discussion of Fig. 2-4) should be re-written to help the reader understand the message. Second, some of the drug treatments require more replicates to provide more conclusive answers.

      Thank you for this constructive feedback. As stated above in the Essential Revisions section, we have now changed our terminology to increase clarity. Our new density measurement in this manuscript (dry mass divided by dry volume) is now defined as ‘dry mass density’. This change has been applied throughout our manuscript, including our manuscript title. In addition, we have added clearer definitions of each term to our Introduction and Measurement Method sections. Furthermore, we have minimized the use of the term ‘dry composition’ throughout our manuscript, as we now realize this may cause confusion to some readers.

      More specifically, our introduction (page 3) now states: “Here, we introduce a new approach for monitoring single cell’s dry mass (i.e. total mass – water mass), dry volume (i.e. total volume – water volume), and density of the dry mass (i.e. dry mass / dry volume), which we will refer to as dry mass density.” These definitions are also repeated in our Measurement Method section (page 4), as many readers may look for the definitions in that section. We have also done many other minor modifications to our main text throughout the manuscript to help the readers understand our message.

      In addition, as detailed above in the Essential Revisions section 3, we have adjusted the writing of our manuscript to avoid overly strong claims where our replicate numbers are insufficient. More specifically, we now avoid conclusions where we claim that inhibition of cytokinesis has no influence on dry mass and dry mass density changes in mitosis.

      Reviewer #3 (Public Review):

      In this manuscript, the authors extend the Manalis lab's vibrating cantilever approach by adding the ability to rapidly exchange media with heavy water. This allows the authors to measure dry mass and its density in growing and proliferating cells. This resolves a previous discrepancy between the cantilever approach and quantitative phase imaging and shows that cells in early mitosis likely increase lysosomal exocytosis. This is an interesting piece of work.

      The authors report that: "On average, the FUCCI L1210 cells lost ~4% of dry mass and increased dry density by ~2.5%, and these changes took place in approximately 15 minutes (Figure 3C). In extreme cases, cells lost ~8% of their dry mass while increasing dry density by ~4%". Although these changes may sound small, I believe they would require significant changes to the cell composition. I.e., to increase the overall dry mass density by 4% while losing 8% of the cell's dry mass, the cell would need to lose almost exclusively low-density components, which may not be typical for exocytosis. Moreover, even if all of those lost 8% of cell dry mass are exclusively lipids (or other low-density components), it is not intuitively obvious that such a loss would be sufficient to cause a 4% change to the dry density. To make this more convincing, the authors should provide a simple mathematical model that would roughly estimate how the cell composition (e.g., the contents of lipids vs proteins) needs to change and what the composition of the lost (secreted) components needs to be to provide the observed changes to the dry mass and density, given the existing information on average cell composition and the densities of different biomolecules (lipids, sugars, proteins, etc).

      Thank you for this comment. The reviewer is correct that significant changes to the cell composition are needed to explain the phenotypes we observe. As stated above in the Essential Revisions section, we fully agree that such calculations could be very useful in interpreting our results. Our manuscript now contains a new paragraph (discussion section, page 13), where we state: “The magnitude of dry mass density increase in mitosis was large. We have previously observed similar magnitude changes in dry mass density when perturbing proliferation in mammalian cells (Feijo Delgado et al., 2013). To provide some rough estimates of what kind of compositional changes would be required to achieve the dry mass loss and dry mass density increase, we carried out back-of-the-envelope calculations. Assuming a typical mammalian cell composition and typical macromolecule dry mass densities (Alberts, 2008; Feijo Delgado et al., 2013), we calculated the degree of lipid loss needed to increase dry mass density by 2.5%. This suggested that cells would have to secrete ~1/3 of their lipid content in early mitosis. This could be achieved via lysosomal exocytosis of lipids. Lipid droplets, the main lipid storages inside cells, are frequently trafficked into and degraded in lysosomes (Singh et al., 2009), and lipid droplets can also be secreted via lysosomal exocytosis (Minami et al., 2022). However, it seems likely that the mitotic dry mass density increase also involves secretion of other low dry mass density components (e.g. lipoproteins, specific metabolites) and/or a minor, transient increase in high dry mass density components (e.g. RNAs, specific proteins) in early mitosis. Indeed, CDK1 activity has been suggested to drive a transient increase in protein and RNA content in early mitosis (Asfaha et al., 2022; Clemm von Hohenberg et al., 2022; Miettinen et al., 2019; Shuda et al., 2015).”
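      The kind of back-of-the-envelope calculation described in this reply can be sketched as below. The sketch assumes that component volumes are additive (dry volume is the sum of mass divided by density over components) and that only lipid is lost; the function names, the two-component composition and the density values are placeholders chosen for illustration, not the composition or densities the authors used, so the printed fraction should not be read as reproducing their ~1/3 estimate.

```python
def mixture_density(masses, densities):
    """Density of a mixture whose component volumes add up (assumed ideal mixing)."""
    total_mass = sum(masses.values())
    total_volume = sum(mass / densities[name] for name, mass in masses.items())
    return total_mass / total_volume


def lipid_loss_for_density_gain(masses, densities, gain, lipid="lipid"):
    """Fraction of the lipid mass that must be lost to raise density by `gain`.

    Solves (M - f*mL) / (V - f*mL/rhoL) = (1 + gain) * M / V for f, where M and V
    are the initial dry mass and dry volume, mL the lipid mass and rhoL its density.
    """
    total_mass = sum(masses.values())
    total_volume = sum(mass / densities[name] for name, mass in masses.items())
    lipid_mass = masses[lipid]
    denominator = lipid_mass * ((1 + gain) * total_mass / densities[lipid] - total_volume)
    if denominator <= 0:
        raise ValueError("losing this component cannot raise the density by that much")
    fraction = gain * total_mass * total_volume / denominator
    if fraction > 1:
        raise ValueError("required loss exceeds the available lipid")
    return fraction


if __name__ == "__main__":
    # Placeholder composition (mass fractions) and densities in g/mL.
    masses = {"lipid": 0.2, "other": 0.8}
    densities = {"lipid": 1.0, "other": 1.6}
    f = lipid_loss_for_density_gain(masses, densities, gain=0.025)
    print(f"lipid fraction to lose: {f:.3f}; dry mass lost: {f * masses['lipid']:.3f}")
```

      With a realistic multi-component composition, the same two functions also show how sensitive the answer is to the assumed lipid fraction and lipid density, which is the reviewer's underlying concern.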

    1. The illustrations below (pp. 224 ff.) show the course of the reaction time in hysterical individuals. The light cross-hatched columns denote the locations where the test person was unable to react (so-called failures). The first thing that strikes us is the fact that many test persons show a marked prolongation of the reaction time. This would make us think at first of intellectual difficulties, - wrongly, however, as we are often dealing with very intelligent persons of fluent speech. The explanation lies rather in the emotions.

      This makes sense. Some words may lead someone to recall a certain incident, time, or place, which slows their otherwise quick responses. They are distracted and drawn back to the thought associated with that word.

    1. Author Response:

      Reviewer #1:

      This study reports on the inference of the evolutionary trajectory of two specialist species that evolved from one generalist species. The process of speciation is explained as an adaptive process and the changing genetic architecture of the process is analyzed in great detail. The genomic dataset is big and the inference from it solid. The authors reach the conclusion that introgression and de novo mutations, but not standing genetic variation, are the main players in this adaptive process.

      I would avoid the term adaptive radiation for the group of fish studied here. It is misleading. It is generally accepted to use the term adaptive radiation when a fairly large number of new species originate from a common ancestor (cichlids in big African lakes, gammarids in Lake Baikal, etc). Here there are only 2 new lines that evolved from a common ancestor. Furthermore, I do not see much parallel between the ideas and concepts used when people study real adaptive radiations and the one studied here. I actually believe that the term adaptive radiation even distracts from the beauty of the current study.

      We would like to acknowledge that the usage of the term “adaptive radiation” has a long, rich history of debate in the literature over how it should be applied to empirical systems. Some example definitions of adaptive radiation are listed below:

      1) “The evolution of ecological and phenotypic diversity within a rapidly multiplying lineage” - Schluter, 2001 (The ecology of adaptive radiation). This definition implies that abundant ecological and morphological diversity that arose in a single lineage over a short time are the hallmarks of adaptive radiation and has been frequently applied to stickleback species pairs. The pupfishes of San Salvador Island meet these criteria (two trophic specialists arose from a generalist ancestor within 10,000 years). Importantly, please note that in this foundational textbook on adaptive radiation, no statement is made about the number of species necessary to be considered an adaptive radiation.

      2) “The evolutionary divergence of members of a clade to adapt to the environment in a variety of different ways.” – Losos, 2009 (Lizards in an evolutionary tree: Ecology and adaptive radiation of Anoles). Here again, the pupfish system described meets the definition. Unlike the previous definition, no statement about the rate of diversification (species or morphological/ecological) is made.

      3) “The rise of a diversity of ecological roles and attendant adaptations in different species within a lineage” – Givnish, 1997 (Adaptive plant evolution on islands: classical patterns, molecular data, new insights. Evolution on islands). As with the previous definition, no qualification is made with respect to rates of diversification. The pupfishes again meet the definition.

      As discussed by Givnish in 2015 (“Adaptive radiation versus ‘radiation’ and ‘explosive diversification’: why conceptual distinctions are fundamental to understanding evolution” – New Phytologist), few of the early definitions of adaptive radiations contained any reference to the rapidity of speciation – Simpson (1953) perhaps being the only notable exception. However, despite this, no definition states that the application of “adaptive radiation” to a given system is contingent upon a given number of species having arisen by the present day.

      The pupfishes of San Salvador Island meet all definitions of adaptive radiation: exceptional rates of morphological diversification and ecological diversification, as well as truly exceptional rates of speciation. Focusing just on the three species here, two species have arisen within the last 10,000 years, which roughly translates to a speciation rate of 200 species per million years. While this pace is highly unlikely to be maintained, we feel that every line of evidence points towards the pupfishes of San Salvador Island as an adaptive radiation at the earliest stages of the process. We disagree that an adaptive radiation must be ‘complete’ or nearly so, for it to be deemed as such.

      Finally, we have also discovered a fourth pupfish species on the island (Richards and Martin 2016; Richards et al. 2021), and even more undiscovered species may exist there. Thus, this is an adaptive radiation of four sympatric species, not two as suggested.

      The "Result and discussion" section has rather little discussion. There is not much about other systems or studies, neither in concepts nor in biology. The results are not linked to the bigger questions and the larger field. The same is true for the conclusion, which is very strongly centered on the study reported here. What can we learn from this study for other systems? Is there a generalizable take-home message? How do the findings relate to commonly held ideas/theory on how adaptive speciation works? Without this, it reads like a report of a case study, disconnected from the larger field. To achieve this aim, it may be good to split the main section into a result and a discussion section, but this is only a suggestion.

      We followed this helpful suggestion and have split the results and discussion section and significantly expanded and revised our discussion section. We now relate our findings to the broader fitness landscape theory literature and emphasize how our findings inform the process of speciation. We conclude by emphasizing that our findings point to a process in which adaptive introgression and de novo mutation not only provide diversity that is useful in reaching novel fitness peaks on a static landscape but alter the shape of the landscape itself.

      Reviewer #2:

      This is a really interesting and challenging question the authors are addressing here. I enjoyed reading the manuscript and a few comments below:

      One major concern I have is the analysis of the two treatments (low and high density, l. 411). I believe that the two treatments should be analyzed separately, as the authors are estimating two different fitness landscapes. When conducting their analysis, experiment is treated as a single factor. Yet, in Martin and Wainwright (2013), it was established that the fitness landscapes were quite different between the two treatments (Figure S7 of said paper), meaning that different phenotypes (and therefore genotypes) were affected differently. I do not think that the complex effect described there can be captured by a single factor as done here.

      We examined this concern further and now include new analyses of only data from the second field experiment to address these concerns (described in more detail below), resulting in qualitatively similar conclusions to those conducted using all samples.

      Please also note that only the high-density treatments from the 2013 study were included in the current study due to the low sample sizes of the original low-density treatments. In the 2020 fitness landscape study, we found no evidence of a treatment effect (frequency-manipulation) on the curvature of the fitness landscape. In all our analyses, we do include the effect of lake accounting for environmental differences between lake replicates.

      While the two high-density treatments in Martin and Wainwright 2013 were analyzed and visualized in some cases as distinct adaptive landscapes as pointed out by the reviewer, many aspects of stabilizing and disruptive selection were comparable between the lake environments and detected in similar regions of morphospace as described in Table 1 in that paper. All statistical analyses of the second field experiment (e.g. Figure 5A of Martin & Gould 2020 Evol. Letters) indicated no effect of the frequency treatment between the two field enclosures in each lake; accounting for treatment did not improve model fit to the data. In the second field experiment, the authors found that the two frequency treatments in each lake could in fact be summarized by a single fitness landscape accounting for lake-specific effects, which was the best-fitting GAM model. This surface bore remarkable similarities to the high-density fitness surfaces of the 2013 study in the placement of fitness peaks and valleys on the morphospace (Martin and Gould 2020). Thus, we tend to view the fitness landscape of interest to us as a single landscape connecting the fitness of different species phenotypes while treating lake-specific environmental effects on this landscape as background noise.

      Unfortunately, we do not have sufficient resequenced samples to analyze only data from the first experiment alone (Martin and Wainwright 2013); fewer than half of our samples come from the 2013 study – the remainder come from the second field experiment. Therefore, we now include a second set of analyses focused on just the subset of resequenced fish from the second field experiment (Figure 5—figure supplement 1-2, Appendix 1—table 18-19). Our primary goal was to assess whether our major findings held within a single field experiment by focusing on the latter, more data-rich experiment.

      Because we believe the most significant analyses from our paper are those pertaining to genotypic fitness landscapes and accessibility, using the subset of data from the second field experiment we performed 1) analyses of models fit between ancestry proportion and fitness (i.e. Figure 1—figure supplement 3), and 2) analyses estimating accessibility between generalists and either trophic specialist (reported in Appendix 1—table 19).

      Overall, we found qualitatively similar results between analyses conducted using either all samples or only those in the second experiment. As a result, we report results for all samples in the main text while referencing the analyses of the second field experiment alone which are presented in the supplementary material.

      A second major concern I have is in the use of the Admixture software (Figure 1 and l. 152). The generalist type is assumed to be the ancestral type. Yet, a unique group was not assigned to it. This is a known problem for Admixture (Lawson et al. 2018). Groups that are under-sampled are far more likely to be considered a mixture of different ancestry groups even when this is impossible (Rasmussen et al 2010, Skoglund et al 2012). While this in itself is not problematic, I am concerned about the use the authors are making of these ancestry proportions (l. 156-165). The authors analyzed how ancestry of scale-eater or molluscivore affects survival probability, growth, or the hybrid composite fitness. However, the ancestry values are partly generated by an artefact, so I wonder how modelling the ancestral type as a group, and therefore acknowledging some amount of shared ancestry between the three species, may further affect this analysis.

      We agree that the ancestries estimated for the generalists by our unsupervised admixture analyses appear to be confounded and we briefly allude to this in the text. In our original submission, we focused exclusively on molluscivore and scale-eater ancestry, which appear less biased by this artifact. To address this concern, we ran new admixture analyses using a supervised analysis, a priori assigning generalists, molluscivores, and scale-eaters to one of three populations. Ancestry proportions of hybrids were then inferred for each of three clusters. We now include new analyses of fitness by ancestry associations using these admixture proportions and found qualitatively similar results. We report these new analyses in the results and supplemental material.

      We also conducted analyses using only samples from the second field experiment (related to the first concern raised by the reviewer). In all, we now include the following analyses of the extent to which the three fitness measures are associated with each of the three ancestry proportions using:

      1) an unsupervised admixture analysis (Appendix 1—table 2), 2) all samples using a supervised admixture analysis (i.e. model is informed a priori which samples are known to belong to one of the three assumed populations/parental species: Appendix 1—table 3), 3) only samples from the second field experiment (Martin & Gould 2020) in which lake was not found to significantly affect fitness using an unsupervised analysis (Appendix 1—table 4).

      Importantly, results are qualitatively the same; ancestry proportions do not strongly influence fitness in this system. There is one exception – generalist ancestry appears to positively predict growth when modeled using all samples and the supervised admixture analysis (Appendix 1—table 3). However, the inconsistency of this result across the three analyses leads us to cautiously interpret this exception.

      I understand the need to use subsets of a network, due to impossibly large dimension size of the network in the first place. However, subsetting said network may give the wrong impression of the whole network (Fragata et al 2019). I wish this point was further discussed here.

      We have followed this suggestion. In our now-expanded and significantly revised discussion, we include discussion of this limitation, citing Fragata et al (2019) as well as related works. We also discuss how estimation of combinatorially complete fitness landscapes may be misleading, as their topography is determined in part by epistasis that occurs among loci that are not segregating in natural populations. We also suggest that the ‘realized epistasis’ that occurs among only those loci that are naturally segregating in a population may be why the shape of the fitness landscape, and thus accessibility of fitness peaks, changes upon the appearance of adaptive introgression and de novo mutations.

      L 294-295: I wonder whether the results here could be used to discuss the geometry of the different fitness peaks. The small number of steps within molluscivores suggest a rather narrow peak, while the rather large ones within the generalist suggest a rather flat fitness peak. The shape of the peak can be linked to the amount of genetic variation that can be maintained within populations, as well as the mutational load of said populations.

      This is an excellent suggestion and led us to consider the ruggedness of our fitness landscapes as an additional factor affecting evolutionary accessibility. We now interrogate the geometry of the fitness landscape further, asking for each specialist, how many local peaks exist on their respective landscapes (i.e. the ruggedness), how far specialists are from these peaks, and how accessible these peaks are to specialists. We elaborate on these findings in the discussion as recommended.

      These expanded analyses further led us to similarly investigate the influence of each source of genetic variation on the ruggedness of the fitness landscape. Consequently, we now discuss in more detail the interplay between fitness landscape ruggedness and accessibility of interspecific genotypic paths, in the context of what sources of genetic variation are available. We show that the presence of adaptive introgression and de novo mutations both increase the accessibility of interspecific genotypic paths, while decreasing fitness landscape ruggedness. We now discuss how this finding makes sense in light of epistasis; changes to the pool of segregating genetic variation alters the ‘realized epistasis’ in natural populations, thus altering the shape of the fitness landscapes and ultimately the evolutionary outcomes favored by natural selection.

      L74-75 I would suggest being more cautious in the phrasing here. While this is true within Fisher's geometric model, where populations are assumed monomorphic and infinite, this is not true in general. Deleterious mutations can fix within populations, especially when drift is non-negligible. Crossing fitness valleys has been quite widely investigated (see Weissman et al 2010 for example). Even the authors themselves mention it later (l 108).

      We tempered these statements as recommended and expanded our references to include Weissman et al. 2010 and additional references describing these caveats.

      Lastly, I would be more cautious about the conclusion. Line 373-374, the authors mentioned that "de novo mutations may enable the crossing of a large fitness valley". Given that the authors focus only on adaptive walks (fitness always has to increase between each mutational step), there is no crossing of fitness valleys. Switching from one fitness peak to another is simply a matter of walking along a (very) narrow ridge.

      We revised our language as recommended, emphasizing that our results support an interpretation in which apparent phenotypic fitness valleys are crossed along narrow fitness ridges, which are not observed in a three dimensional morphospace, to reach new fitness optima.
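
      The distinction between crossing a valley and walking along a ridge can be illustrated with a minimal sketch. It is not the authors' pipeline: genotypes are simplified here to binary strings, the fitness values are invented, and the function names are hypothetical. A step is allowed only between genotypes that differ at one locus and only if fitness strictly increases, as in the adaptive walks described above.

```python
from collections import deque

def neighbors(genotype):
    """Yield every genotype that differs from `genotype` at exactly one locus."""
    for i, allele in enumerate(genotype):
        flipped = "1" if allele == "0" else "0"
        yield genotype[:i] + flipped + genotype[i + 1:]

def accessible_path_length(fitness, start, target):
    """Fewest mutational steps from start to target along which fitness
    strictly increases at every step; None if no such path exists.

    `fitness` maps observed genotypes (binary strings) to fitness values;
    unobserved genotypes are treated as absent from the network.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        genotype, steps = queue.popleft()
        if genotype == target:
            return steps
        for nb in neighbors(genotype):
            if nb in fitness and nb not in seen and fitness[nb] > fitness[genotype]:
                seen.add(nb)
                queue.append((nb, steps + 1))
    return None

# Toy landscape (invented values): a ridge runs 00 -> 10 -> 11.
ridge = {"00": 1.0, "01": 0.5, "10": 1.2, "11": 1.5}
# Same endpoints, but both intermediates are less fit than the start: a valley.
valley = {"00": 1.0, "01": 0.5, "10": 0.8, "11": 1.5}

ridge_steps = accessible_path_length(ridge, "00", "11")
valley_steps = accessible_path_length(valley, "00", "11")
```

      In the first toy landscape the higher peak is reachable in two steps without any decrease in fitness; in the second, no accessible path exists because every route passes through a less fit intermediate.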

      Reviewer #3:

      This paper uses sophisticated regression methods and numerical experiments to produce a genotype-fitness relationship for three closely related sympatric pupfish species, forming an adaptive radiation. In addition to providing insights into the genetic targets of selection, this paper goes further in attempting to tease out what types of genetic variation were most likely to have played key roles in this radiation.

      Strengths:

      The idea behind this study is excellent, and clearly a large amount of thought and effort went into collecting the underlying data. The attention paid to linking evolutionary dynamics with the fitness results is laudable. The system is extremely exciting and I think an experiment and analysis of this sort could potentially be interesting to a broad audience within evolutionary biology.

      Weaknesses:

      The claim that this is the first genotypic fitness network in a vertebrate needs additional qualifiers: as far as I can tell, the claim to novelty is based on the inclusion of multiple species, the number of alleles, and measuring fitness in the field. I can't fully assess this claim but I would urge the authors to avoid staking a stronger claim to priority than is really needed, as it might be a lightning rod for criticism and hair-splitting that would distract from the contents of the paper.

      We tempered this claim as suggested, removing it from the title, and de-emphasizing or removing this claim elsewhere throughout the manuscript.

      One of my major questions while reading this was whether these three species were better or worse adapted to subenvironments within the lakes. This is partially answered in a few places in the manuscript, but I think that resolving this point more precisely would help interpret if positioning all three species on the same fitness landscape is fair.

      We have added more description and discussion of the ecological differences between species to the manuscript, particularly their habitats within the lake. We now point out that all three species coexist within the benthic littoral zone of each lake. No habitat segregation among these species has been observed in 13 years of field studies, suggesting that it is reasonable to position all three species within the same fitness landscape. Their foraging also occurs within the same benthic microhabitat throughout each lake; indeed, the scale-eaters target their generalist neighbors for scale attacks. This thinking also underlies much of the theory of speciation and adaptive radiation. We now include these qualifiers in the text as well.

      I find it a little hard to follow the construction of the landscapes in Fig. 2 B and C. I am not clear why the landscapes don't cover the location of the molluscivore population.

      We now include a brief statement that estimated values of fitness are only plotted for samples within the observed morphospace in the hybrids. That is, because none of the hybrid phenotypes were morphologically similar to the most divergent molluscivore phenotypes, we could not measure fitness values for this region of morphospace. However, there were hybrid phenotypes that fell within the 95% confidence ellipse of the lab-reared molluscivore population, suggesting that we have good power to detect adaptive walks to this region of the morphospace.

      I think the fitnesses predicted for the main bulk of the generalists and scale-eaters are the same across the two landscapes (as I expect they would be), but this is obscured by the differing fitness ranges of the two landscapes. I would suggest using a single color-fitness relationship for the two panels to aid cross-comparison.

      We re-plotted these landscapes using a uniform color scheme across panels as recommended.

      Also, two salient features of the landscape-the major peak at the top center and the deep pit at the bottom center-seem to be supported by few fish in each case. I would imagine that something like boot-strapping could be done for fitness landscapes, where the support for each feature of the landscape could be judged by how often it appears in subsets of the data (or in inferred models with nearly as high support as the best model), but I acknowledge that might be very hard to do. Still, I think some statement of uncertainty should be prominently included.

      We followed this suggestion and now more explicitly address uncertainty in our estimation of three-dimensional fitness landscapes, with particular focus on the landscape we devote the most attention to (Fig. 2c-d – composite fitness + genotypes).

      To quantify uncertainty, we conducted a bootstrap procedure as suggested in which we resampled hybrids with replacement, re-estimated the fitness landscape, and compared the topology of the predicted fitness landscapes to that of the observed fitness landscape (Figure 2—figure supplement 7). Even across the bootstrap replicates, we still recovered the same general features – a peak localized near generalists, a fitness valley near scale-eaters, and a fitness ridge/modest peak near molluscivores.
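
      The logic of such a bootstrap can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: a simple Gaussian-kernel smoother stands in for their fitness landscape model, the data are invented, and the function names and the `bandwidth` parameter are hypothetical. Individuals are resampled with replacement, the surface is re-estimated, and support for a feature (here, the location of the highest point) is the fraction of replicates that recover it.

```python
import numpy as np

def smooth_fitness(x, y, w, grid, bandwidth=0.5):
    """Gaussian-kernel estimate of fitness `w` at each (x, y) point in `grid`."""
    d2 = (grid[:, None, 0] - x[None, :]) ** 2 + (grid[:, None, 1] - y[None, :]) ** 2
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    return (k @ w) / k.sum(axis=1)

def bootstrap_peak_support(x, y, w, grid, n_boot=200, seed=0):
    """Return the index of the observed peak on `grid` and the fraction of
    bootstrap replicates in which the peak falls at the same grid point."""
    rng = np.random.default_rng(seed)
    observed_peak = int(np.argmax(smooth_fitness(x, y, w, grid)))
    n = len(w)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        peak = int(np.argmax(smooth_fitness(x[idx], y[idx], w[idx], grid)))
        hits += int(peak == observed_peak)
    return observed_peak, hits / n_boot

# Invented data: three clusters along one morphological axis,
# with the middle cluster having the highest fitness.
x = np.repeat([0.0, 2.0, 4.0], 10)
y = np.zeros(30)
w = np.repeat([0.2, 1.0, 0.2], 10)
grid = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])

peak_index, support = bootstrap_peak_support(x, y, w, grid)
```

      A feature supported by only a few individuals would be expected to shift or vanish in many replicates, giving a low support value; a well-supported feature recurs in most of them.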

      Furthermore, we emphasize more strongly in the revised manuscript our point that three-dimensional representations of the fitness landscape may in fact mislead interpretations of how evolution proceeds. In that respect, even though we recover the same features of the landscape when accounting for uncertainty, we articulate that these inferred peaks and valleys separating populations may be bridged in multidimensional genotype space.

      More generally, the landscapes reconstructed in Fig. 2 do not show very clear evidence that the M or S types are separated by valleys from the G type. Close inspection of the figure suggests a very shallow valley might be present between G and M, but the overall trend is declining fitness; between G and S, fitness appears to simply decline. While peaks may occur within the landscapes composed of limited sets of loci, the overall pattern seen in Fig. 2 doesn't seem conducive to analyzing how adaptive evolution in generalists crossed valleys to reach the putatively higher peaks of the two specialists. As such, I find the connection between these phenotypic-fitness landscapes and the later genotypic fitness landscapes quite confusing.

      We thank the reviewer for this comment. The apparent disconnect noted by the reviewer is in fact a point that we would like to draw more attention to. Thus, we have revised much of the discussion of these results to address this.

      As discussed in our response to the reviewer’s previous comment, the three dimensional landscape contrasts with our inferences from genotypic fitness landscapes. This incongruence demonstrates, through example, how three-dimensional fitness landscapes may in fact mislead our intuition about how evolution proceeds.

      As has been discussed extensively in the fitness landscape literature (e.g. Kaplan et al. 2008; Gavrilets 2010; Fragata et al. 2019), reduction of the fitness landscape, which is inherently highly multidimensional (as originally recognized by Wright), to only three dimensions can mask viable evolutionary trajectories, underestimate the number of peaks, and oversimplify our understanding of how populations evolve. We now attempt to better clarify and discuss this in the revised manuscript.

      I also had trouble understanding the role of fitness in the analysis of mutational distances in a subset of loci between the three species (lines 282-296). While the illustration in Fig. 3C uses directed edges to capture fitness data, this framework doesn't seem to be applied in Fig. 3d or the resulting analyses in 3e. As such, I don't see how this section is about genotypic fitness landscapes at all.

      We followed this suggestion and have rearranged our figures and their constituent panels to provide a more coherent illustration of our results and analyses. Figure 3 now serves to describe 1) the focal loci used to construct genotypic networks and 2) the general structure of genotypic networks constructed using loci sampled across all three species. What is now Figure 4 is dedicated explicitly to the investigation of genotypic fitness landscapes, describing how we incorporated fitness measures into these networks to identify accessible paths. This figure also serves to describe the fitness landscapes for each specialist, quantifying accessibility of interspecific genotypic trajectories and landscape ruggedness. Our discussion of these sections similarly attempts to distinguish their respective focus, emphasizing that investigation of the general isolation of each species on genotypic networks will help provide context for our later focused investigation of fitness landscapes.

      The final part of the conclusion sketches a story in which de novo and introgressed alleles reduce the accessibility of reverse evolution, back to a generalist. I think this is conceptually confusing because we don't expect evolution to favor paths toward lower fitness, even if those paths do not pass through a valley. Again, the framing here-that generalists are less fit than either specialist-is hard to square with the facts that generalists seem to be coexisting with the specialists, and much closer to the hypothesized fitness peak than is either specialist.

      We agree and have completely rewritten this section and removed this framing. We omitted this part of the conclusion entirely, as we felt it too speculative and, as noted by the reviewer, difficult to square with some of the rest of our findings. Instead, we now devote more focus to other aspects and implications of our findings in a new discussion section as requested by reviewer 1.

      This is a complicated and ambitious paper, on an exciting system and aiming at important questions. I think the main results about genotypic-fitness networks are hard to relate back to the other major analyses in the paper due to the points raised above. Moreover, using fitness measurements of three coexisting species to infer how they evolved faces a major obstacle: if fitnesses are frequency-dependent, then the actual trajectory of an initially rare variant will be completely obscured post-invasion. This possibility, as well as the potential issue that data on reproductive success might change these findings, need to be discussed, especially in light of the puzzling fact that the specialists appear less fit than their ancestor in at least one of the paper's major analyses.

      We now emphasize the apparent disconnect between three-dimensional fitness landscapes and the highly dimensional genotypic fitness landscapes as noted by the reviewer (see above). We hope to demonstrate through example how highly dimensional genotypic fitness landscapes may harbor numerous viable evolutionary trajectories (e.g. fitness ridges) on rugged fitness landscapes that are unobservable on low-dimensional representations. Additionally, we expand our discussion of the caveats in our analyses pertaining to the use of data on contemporary species to infer historical dynamics on the fitness landscape as recommended by the reviewer.

      We also now note that no evidence for frequency-dependent selection has been found in this system (Martin and Gould 2020; Martin 2016). We previously explicitly manipulated the frequency of rare phenotypes between treatments and found no effect of treatment across lake populations. Rather, these fitness peaks and valleys appear surprisingly stable across lakes, treatments, and years.

      Regardless, we now include in the discussion that we necessarily have taken a ‘birds-eye view’ of evolution here, describing the influences of different sources of genetic variation on the fitness landscape, after these have already undergone selective sweeps. Likewise, we acknowledge that it is impossible to quantify reproductive success in this system using field enclosures due to the very small size of newly hatched fry and continuous egg-laying life history of pupfishes. This is a limitation of our system. We take this opportunity to emphasize that other experimental or simulation studies would be invaluable to quantify the changing influence of these different sources of genetic variation on the fitness landscape as a function of time, during the process of selective sweeps.

    2. Reviewer #3 (Public Review): 

      This paper uses sophisticated regression methods and numerical experiments to produce a genotype-fitness relationship for three closely related sympatric pupfish species, forming an adaptive radiation. In addition to providing insights into the genetic targets of selection, this paper goes further in attempting to tease out what types of genetic variation were most likely to have played key roles in this radiation. 

      Strengths: 

      The idea behind this study is excellent, and clearly a large amount of thought and effort went into collecting the underlying data. The attention paid to linking evolutionary dynamics with the fitness results is laudable. The system is extremely exciting and I think an experiment and analysis of this sort could potentially be interesting to a broad audience within evolutionary biology. 

      Weaknesses: 

      The claim that this is the first genotypic fitness network in a vertebrate needs additional qualifiers: as far as I can tell, the claim to novelty is based on the inclusion of multiple species, the number of alleles, and measuring fitness in the field. I can't fully assess this claim but I would urge the authors to avoid staking a stronger claim to priority than is really needed, as it might be a lightning rod for criticism and hair-splitting that would distract from the contents of the paper.

      One of my major questions while reading this was whether these three species were better or worse adapted to subenvironments within the lakes. This is partially answered in a few places in the manuscript, but I think that resolving this point more precisely would help interpret if positioning all three species on the same fitness landscape is fair. 

      I find it a little hard to follow the construction of the landscapes in Fig. 2 B and C. I am not clear why the landscapes don't cover the location of the molluscivore population. I think the fitnesses predicted for the main bulk of the generalists and scale-eaters are the same across the two landscapes (as I expect they would be), but this is obscured by the differing fitness ranges of the two landscapes. I would suggest using a single color-fitness relationship for the two panels to aid cross-comparison. Also, two salient features of the landscape-the major peak at the top center and the deep pit at the bottom center-seem to be supported by few fish in each case. I would imagine that something like boot-strapping could be done for fitness landscapes, where the support for each feature of the landscape could be judged by how often it appears in subsets of the data (or in inferred models with nearly as high support as the best model), but I acknowledge that might be very hard to do. Still, I think some statement of uncertainty should be prominently included. 

      More generally, the landscapes reconstructed in Fig. 2 do not show very clear evidence that the M or S types are separated by valleys from the G type. Close inspection of the figure suggests a very shallow valley might be present between G and M, but the overall trend is declining fitness; between G and S, fitness appears to simply decline. While peaks may occur within the landscapes composed of limited sets of loci, the overall pattern seen in Fig. 2 doesn't seem conducive to analyzing how adaptive evolution in generalists crossed valleys to reach the putatively higher peaks of the two specialists. As such, I find the connection between these phenotypic-fitness landscapes and the later genotypic fitness landscapes quite confusing. 

      I also had trouble understanding the role of fitness in the analysis of mutational distances in a subset of loci between the three species (lines 282-296). While the illustration in Fig. 3C uses directed edges to capture fitness data, this framework doesn't seem to be applied in Fig. 3d or the resulting analyses in 3e. As such, I don't see how this section is about genotypic fitness landscapes at all. 

      The final part of the conclusion sketches a story in which de novo and introgressed alleles reduce the accessibility of reverse evolution, back to a generalist. I think this is conceptually confusing because we don't expect evolution to favor paths toward lower fitness, even if those paths do not pass through a valley. Again, the framing here-that generalists are less fit than either specialist-is hard to square with the facts that generalists seem to be coexisting with the specialists, and much closer to the hypothesized fitness peak than is either specialist. 

      This is a complicated and ambitious paper, on an exciting system and aiming at important questions. I think the main results about genotypic-fitness networks are hard to relate back to the other major analyses in the paper due to the points raised above. Moreover, using fitness measurements of three coexisting species to infer how they evolved faces a major obstacle: if fitnesses are frequency-dependent, then the actual trajectory of an initially rare variant will be completely obscured post-invasion. This possibility, as well as the potential issue that data on reproductive success might change these findings, need to be discussed, especially in light of the puzzling fact that the specialists appear less fit than their ancestor in at least one of the paper's major analyses.

    1. Author Response:

      Reviewer #1 (Public Review):

      The observation that the cells are able to steadily move along the light axis but perpendicular to their long axis is very interesting considering the T4P appear to be bipolarly localized. There is some discussion on the micro-optic effect in single cells but it does not include the observation that the negative phototaxis to green light occurs no matter where the direction of blue light comes from or the micro-optic effect in a microcolony.

      We have added the following sentences in the Discussion part (p16 L363-372) in the Related Manuscript File: “The focused green light would excite yet unknown photosensory molecules to induce spatially localized signalling, whereas the position of the focused blue light is not crucial for directional switching. As we showed, the direction of blue light illumination did not influence directionality of movement, because cells do not move in random orientation (Figure 2 – figure supplement 6). Thus, blue light does not control the directional light-sensing capability, instead it provides the signal for the switch between positive and negative phototaxis. This is very similar to the situation in Synechocystis where the blue light receptor PixD controls the switch between negative and positive phototaxis independently of the position of the blue-light source (Sugimoto et al., 2017).”

      Reviewer #2 (Public Review):

      I- The authors attribute the defect of negative phototaxis observed in the SesA mutant to the level of c-di-GMP in the cell, mainly because a SesA mutant shows a two-fold decrease in c-di-GMP concentration upon blue light treatment. However, this measurement was made in a batch culture and normalised to dry cell mass. In contrast, the negative phototaxis observed at the single-cell level occurs within less than a minute (Figure 2). It would therefore be important for the authors to strengthen the implication of c-di-GMP in the phototaxis regulation. For example, the authors could ectopically modulate the level of c-di-GMP in the cell, via the ectopic expression of a diguanylate cyclase or phosphodiesterase enzyme, and observe its effect on phototaxis.

      We highly appreciate your evaluation and comments. As we pointed out in our response to reviewer 1, utilizing heterologous expression systems in T. vulcanus is challenging, maybe due to the cultivation of cells at 45°C. However, we were lucky in isolating a spontaneous mutant (named WT_N) that shows constitutive negative phototaxis under lateral light illumination. By comparative genomics, we identified the frameshift mutation that confers an increase of the intracellular concentration of c-di-GMP and which was accompanied by negative phototaxis under the condition where the WT cells showed positive phototaxis (Figure 4). We have added a paragraph in the Results part for these experiments on p9-10 (L201-219). See also our comments to the other reviewers and the editor concerning these new experiments, which support the role of c-di-GMP in directional switching. In addition, the figure formerly assigned as Figure 3 – figure supplement 1 was moved to the main manuscript as Figure 3C, because we think that the data of the intracellular concentration of c-di-GMP are very important to support our conclusions.

      II- The authors used fluorescent beads to visualize T4P dynamics. As previously described, the authors show that the bead movement is specific to T4P activity and can also reveal T4P retraction. The authors then used this method to convincingly show that cells that move perpendicular to the light source have active pili at only one half of both cell poles (Fig6). It is an interesting observation but again it is short on details.

      -The manuscript would definitely benefit from a more general analysis of T4P dynamics during phototaxis, for example during the switch from positive to negative phototaxis. What are the behaviours (T4P pole activation) of cells parallel to the light source?

      -Besides, as suggested by the authors in the discussion, having the intracellular localisation of the ATPase PilB would definitely be a plus.

      -Moreover, in the discussion section the authors proposed the existence of "a specific signalling system with high spatial resolution" to explain the asymmetric polar T4P activation. Why could it not be a molecular mechanism similar to the one observed in round cells such as Synechocystis, where the light receptor PixD regulates T4P function at some part of the cell according to the direction of the light?

      In order to get more direct insights into T4P dynamics, we have performed additional experiments, which are summarized in Figure 8 and Movies S17-20. Importantly, we succeeded in visualizing T4P filaments by PilA1 labelling using live cells. The T4P filaments were bipolarly localized and showed dynamics of assembly and retraction at both cell poles. When the cells moved perpendicular to their long axis, the T4P filaments at both poles showed biased distribution towards the same direction of cellular movement. These results support our idea that T4P are asymmetrically activated within a single cell pole. This asymmetric activation can rely on the localization of PilB ATPase. We would like to address how a molecular machinery such as PilB governs directional switching events. However, GFP-tagging has not been established in thermophilic cyanobacteria so far. We have added a chapter in the Results part for these experiments p13-14 (L296-322) in the Related Manuscript File. Please, also pay attention to our answers to similar comments of the other reviewers.

      Our results suggest that the T. vulcanus cell can actuate the spatially resolved signaling even within a cell pole to activate the pilus activity at only one side of a cell pole to enable biased cellular movements. This finding means that the cell harnesses "a specific signalling system with high spatial resolution" compared to other rod-shaped bacteria showing pole-to-pole regulation of cell polarity. We do not exclude that a system which works similar to the PixD/PixE complex in Synechocystis contributes to the asymmetric localization of the pili in Thermosynechococcus motility. Thermosynechococcus encodes a PixD protein but no PixE homolog. For Synechocystis, it was shown very recently that PATAN domain response regulators (including PixE) bind PilB1 and PilC and can switch the direction of movement (Han et al. Mol. Microbiol. 2021). Thermosynechococcus encodes homologs of such PATAN-domain response regulators, but at the moment, we do not know whether they have a similar function in both cyanobacteria.

      III- The link between the c-di-GMP concentration and T4P dynamics during the switch from positive to negative phototaxis is absent. The authors proposed in the discussion a potential binding of c-di-GMP to PilB as previously shown for some T4P. Could it be tested here by the authors since they seem to be able to handle c-di-GMP?

      The experimental verification of the binding of c-di-GMP to PilB is ongoing work, but it seems that direct binding of c-di-GMP to PilB is either very weak or does not happen in our setup. Thus, detailed molecular events of c-di-GMP signaling are out of the scope of the current study. However, we do show in the revised version of the manuscript that pilus extension and retraction dynamics are not different between positive and negative phototaxis (Figure 7 − figure supplement 2), suggesting that c-di-GMP most probably does not affect the activity of the PilB protein. Therefore, we have modified the sentence about the binding of c-di-GMP to PilB in the Discussion part as follows. See p17 L391-394: “Since we did not observe a change in pilus dynamics under green and green/blue light illumination (Figure 7 − figure supplement 2), the T4P regulation in T. vulcanus may not be explained simply by a specific activation of PilB (Floyd et al., 2020, Hendrick et al., 2017).”

      In addition, we have performed experiments to show additional data that the c-di-GMP levels switch the direction of T4P-dependent phototaxis (new Figure 4). We also performed additional experiments to visualize T4P dynamics by PilA labeling (new Figure 8), which suggest asymmetric activation of pili and most probably of the motor ATPases as well.

    1. Reviewer #1 (Public Review):

      This is an interesting manuscript providing important new information on the mechanism of action of EROS in the generation of superoxide by the NADPH oxidase of neutrophils. The authors have shown in previous publications that EROS deficiency results in defective NOX2 activity and thus represents a hitherto unrecognised, rare form of chronic granulomatous disease. They now show how EROS is involved in oligosaccharide transfer during the maturation of gp91phox and also extend what is known about the role for EROS in regulating expression of the P2x7 ion channel.

      The results presented in the manuscript are supported by findings from a variety of techniques and for the most part, are convincing and well presented. However, I do have queries about certain aspects of the manuscript.

      1. Figure 1. The much lower EROS expression when gp91phox is expressed warrants a comment. Fig 1G: Please explain what fold change represents. From F, zero time expression appears much more than the 1.5 fold higher shown in G for the EROS-expressing cells. This needs explaining. With the very high error bars (presumably for the EROS sample although this is not clear) overlapping zero I find it hard to conclude anything from this figure.

      2. P 9 line 9 states that Fig 1H shows that cycloheximide increases expression. Yet it appears from the legend that cycloheximide is present in all samples and it is EROS that increases expression. Please clarify.

      3. Fig 3A&B and p12 1st para. The identification of OST as a binding partner is interesting and a significant novel finding. However, the presentation of this information appears to me to be unduly complex and more information is required. Not all the readers will be familiar with the details of SAINTexpress methodology and more explanation of what is being shown would be helpful. At the least, a supplementary Table of the 59 identified proteins would be helpful, plus information on controls to establish selective pull down by EROS and on how the blue spots in A relate to the proteins. Also please make it clearer which of the proteins in B were identified and the relevance of showing all the steps in the pathway.

      4. Figure 6. This contains a large amount of information. Although interesting, I am concerned that the authors may be trying to include too much at the expense of the necessary detail for some of the experiments. For example, the EROS -/- +ATP scattergram on the left of Fig 6E does not seem to agree with the right hand graph. I would also like to see the mean values for the 5 experiments in Fig 6G shown. Most importantly, insufficient information is given for Fig 6H. I don't think I missed it but I could find no details about the experiment in the Methods section. We need to know more about exactly how many animals were in each group (death of 1 animal appears to equate to 5% of total - how does this relate to >10 in total), how signs of illness were monitored and related to death, and generally more about the conditions of the experiment. Alternatively, this may be better left to a more detailed study.

  4. docdrop.org
    1. Thus the strongest research evidence appears to indicate that money matters, in a variety of ways, for children's long-term success in school

      Money implies resources: things kids can obtain, chances they can have, and people they can meet. A former roommate once said that we come from different hierarchies because her family's income is more than 10 times mine. I do see some differences between us, but I think they are not as large as the differences between the poor and the rich. Middle-class families can basically make sure that their children get enough resources. Richer families may have better resources, but that gap is somehow not that big. The problem for now is how to help kids from poor families get resources; however good those resources are, I hope that at least their basic needs can be met.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Liu et al investigates how MRI can be used to detect the earliest stages of CNS infections and how MRI can also be used as a surrogate readout for treatment efficacy. Authors demonstrate convincingly that microbleeds, as evidenced by unusual dark spots in the brain of mice infected with a virus that infects the brain, occurred at the earliest stages of viral infection. Authors also convincingly demonstrate that the infusion of virus-specific immune cells, when delivered at the right time and at the right dose, could reduce these microbleeds. Importantly, authors showed that the wrong dose could be detrimental.

      The authors cast this study as a method for improving research and discovery in immunotherapy context and the study is convincing in its conclusions regarding imaging microbleeds and the immunotherapy tested herein. While authors do not directly suggest so, these findings extend the significance of this work beyond research and development of immunotherapies by providing a potential early detection mechanism for viral infection in the brain. This may be feasible as the MRI methodologies for detecting these phenomena are generally translatable to clinical imaging scenarios, though the imaging resolution may not.

      Weaknesses in the report revolve around the value of and the ability to image magnetically labeled T cells in the presence of microbleeds.

      1) Authors developed a magnetic particle coated with fluorescent molecules and antibodies specific for CD8+ T cells. They labeled these T cells with particles for detection by MRI. They then wanted to follow the accumulation of these cells in the brain following infusion and viral infection by performing MRI using parameters that amplify the signal of the attached label. The rationale for these experiments was to determine if immune cell infiltration preceded vascular compromise. This suggests the expectation for active chemotactic migration or other signaled accumulation rather than leakage. When authors tested their magnetically labeled T cells for functional impairment due to the presence of attached magnetic particles, they did not test for deficits to migratory capabilities, such as in standard transwell migration assays. Others have shown (see https://doi.org/10.1038/nm.2198 for example) that T cell migration is very sensitive to the type of attached nanoparticle as well as the surface coverage. Perhaps authors should temper their claims that magnetically labeling of T cells does not alter T cell function without at least an assay of this critical function. Further, the fluorescence microscopy shown in Figure 7D is of insufficient resolution to claim that MPIOs are inside cells. Electron microscopy should be used to determine this.

      We thank this Reviewer for the comments. In this Revision, we added EM data to confirm the cellular location of MPIOs (Fig 7D and S7D). The EM experiment also added another layer of information for improving our cell isolation method. We improved our FACS experiment by narrowing down the MPIO positive gating to exclude the T cell population that was labeled with high numbers of MPIO particles, which may affect T cell functions, and some crosslinked MPIO particles that formed during conjugation (Fig 7B and S7A). The yield of FACS of MPIO-labeled T cells is ~8.3%. As quantified from EM images, 91% of MPIOs were localized intracellularly (Fig 7E). We agree that labeling T cells with nanoparticles might alter key T cell functions. We have improved the manuscript by adding this caution and reference. We also added T cell migration assay results (Fig 7G). Labeling CD8 T cells with MPIO did not affect T cell migration. This adds to our other in-vitro assays showing that T cell function is not significantly affected. There is in-vivo evidence as well that labeled T cells are functional. In Fig 8E-I, MPIO-labeled T cells were found in the brain, which showed that labeled T cells can migrate into the brain. In addition, a key phenotype of virus specific CD8 T cells in this model is the therapeutic function described in the manuscript. Labeling virus specific CD8 T cells with MPIO did not affect their therapeutic function. Quantification of bleeding in the OB and brain on day 6 and 11 verified the therapeutic effects of MPIO-labeled OT-I T cells (Fig 1E and 2C vs Fig S9C and D). We added discussion of these points in this Revision.

      2) Regarding the use of imaging the accumulation of magnetically labeled T cells, authors show evidence that magnetically labeled T cells accumulate in areas of the brain that as yet do not present with microbleeds but do have the histological hallmarks of vascular inflammation. This corroboration is intriguing but only provable with a serial imaging study in the same animal, which was not performed. Authors are also encouraged to report on the frequency in which a magnetically labeled T cell was present in a pre-vascular compromised inflammatory environment. The bulk of the results on imaging magnetically labeled T cells essentially show that the accumulation of magnetically labeled T cells enhances the ability to detect microbleeds that otherwise were perhaps too small to detect (Sup Fig 8). Given the lack of data supporting the retained migratory capacity of magnetically labeled T cells, one wonders then, whether magnetically labeled T cells are indeed trafficking to the brain or are passively arriving in the brain, and might some vascular magnetic particle accumulate in an early inflammation or leak into the microbleed on its own and similarly enhance the ability to detect the otherwise undetectable microbleed. A series of controls would be useful to answer these questions, perhaps testing the administration of magnetic particles alone, and/or magnetically labeled non-CD8+ T cells. Authors are also encouraged to report on the frequency in which a magnetically labeled T cell was present in a pre-vascular compromised inflammatory environment versus in the microbleed, as measured by MRI and histology.

      Distinguishing bleeding from T cells is a key challenge for doing a serial MRI study in the same animal. In the new Fig 8I and Fig S8, we did a study using time-lapse MRI on the same mouse from 20 to 24 hr post-infection. We observed the appearance of hypointensities at the center of the bulb at 22 hr, which is prior to bleeding in this area. Bleeds were observed at the GL, but not at the center of the bulb by IHC. Thus, we were able to time the entrance of T cells in this area of the brain. We were not able to find migration tracks of T cells from the outer GL layer into the center of the bulb. This is consistent with the idea that T cells infiltrate directly into areas with virus prior to vessel breakdown and microbleeds. We didn’t observe a very significant change in the location of T cells from 22 to 24 hr on the distance scale of MRI. There are two possibilities to explain our inability to detect T cell movement over a 2 hr time interval: 1.) the T cells under investigation may have been attached to the blood vessel surface due to inflammation and required more time to extravasate, or 2.) although T cell velocities in the CNS have been clocked at ~10 µm/min (Herz et al., 2015), their paths are often tortuous and influenced by antigen presenting cells displaying cognate peptide MHC as well as local chemokine gradients. Thus, upon entering a site of viral infection, the labeled T cells may not have traveled far enough in 2 hrs for us to detect their movement by MRI. We did not image mice beyond 24 hrs post-infection due to the possibility of bleeding. We added this discussion. Quantification of the frequency in which a MPIO labeled T cell was present in a region where no bleeding was detected versus in a region with a microbleed was added in Fig 8H. In the ONL/GL, 85% of MPIO-labeled T cells were in the region with microbleeds and 15% were in a region where no tissue bleeding was detected. In the MCL/GCL areas, no evidence for bleeding was detected. Magnetic labeling of CD8 T cells doesn’t reduce their migratory capacity in an in-vitro migration assay (Fig 7G). This adds to other in-vitro assays showing that the labeled T cells are functioning. Labeled T cells had therapeutic efficacy like unlabeled T cells, and labeled T cells were found at the center of the bulb (Fig 8F-I) with no bleeds as well as in other brain regions. Based on these observations, we think that MPIO-labeled T cells are functioning and trafficking in the brain. A previous study showed that non-CD8 T cells, such as monocytes/macrophages, CD4 T cells, and neutrophils also migrate into the OB and are involved in the immune responses in this model [(Moseman et al., 2020), Fig 2E].

      Reviewer #2 (Public Review):

      [...]

      Weaknesses:

      • Individuals with systemic infections or other underlying conditions may have microbleeds due to inflammation or hypertension. The etiology of microbleeds is thus not necessarily tied to CNS infections. Investigation of potential cerebrovascular microbleeds following systemic or respiratory infections not affecting the CNS may shed light on this possibility, which may also provide an alternative interpretation of neurological symptoms associated with non-CNS-invasive infections.

      This is an important issue. Prior work has shown that virus in this model is cleared quickly (2 to 3 days) from the periphery (Ramsburg et al., 2005; Roberts et al., 1999). This is likely due to the fact the virus is inoculated through the nose. It is clear in this model that virus infects the brain, that bleeding corresponds to sites of high viral load, and bleeding can be modulated by blocking immune infiltration into the brain. However, the quantitative role of peripheral influences such as high blood pressure could be important and will be checked as this work proceeds.

      • Representative colocalization of virus infected endothelial cells with red blood cells (RBCs) is shown in Fig 4. However, a more quantitative assessment indicating how many areas or hypointensities were evaluated for virus-localization with RBCs, and how many of these revealed colocalization versus virus or RBC only would strengthen interpretation.

      Fig 4 shows that VSV can infect vascular endothelial cells and cause bleeding. Hypointensities were not measured in this Figure. We quantified the numbers of VSV infected vessels, colocalizing and not colocalizing with bleeds. Fig 4D was added with this new data.

      • A limitation clearly acknowledged by the authors is that hypointensity spots detected by MRI cannot distinguish microbleeds from MPIO-labeled T cells.

      As in our response to Reviewer 1, this is a critical next step since bleeding so often occurs with immune cell infiltration in the brain. We have discussed potential approaches and have added the idea that development of more sensitive MRI contrast agents and quantitative T2* analysis especially at different magnetic field strengths may be approaches to accomplish this. It will be crucial for MRI cell tracking under the condition of bleeding, which is one common pathology associated with many diseases.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors exploit retinal cell proliferation and neurogenesis in zebrafish to study banp, a gene that is essential in humans and whose loss is embryonic lethal in mice. The authors performed large-scale mutagenesis and identified a mutant known as "rw337"; compared to WT, the mutant zebrafish have smaller eyes and optic tectum. They found that the retinas of these mutants accumulate mitotic-like round cells, indicating mitotic arrest. Sequencing of these mutants identified that the rw337 mutant gene encodes a truncated banp protein. Expression of WT Banp occurs primarily in retinal and neuronal cells in zebrafish. Interestingly, rw337 showed a significant decrease in retinal photoreceptor number, and neuronal formation within the OPL and IPL was morphologically disrupted, with fewer cells. The authors found that rw337 cells have increased numbers of DSBs in the retina over time (via TUNEL assays). They found that mitotic defects and apoptosis occur in spatially and temporally distinct regions of the retina: prolonged phosphorylation of histone H3, which indicates a problem with exit from mitosis, occurred at the apical surface of the neural retina, whereas apoptosis occurred in retinal progenitor cells (via Caspase 3 staining). The authors then went on to examine the role of replication stress regulators like p53, atm, and atr and showed that their protein and RNA levels were increased in banprw337 mutants. As p53 binds banp in zebrafish, it was not surprising that regulators of p53 were enhanced in banprw337 mutants. Intriguingly, the authors found that two genes which are essential for chromatin segregation, cenpt and ncapg, were downregulated in banprw337 mutants and banp morphants as a result of decreased chromatin accessibility near their TSSs, resulting in decreased transcriptional activity of these genes. Finally, the authors temporally monitored mitosis in banprw337 mutants and found that chromosomal segregation is abnormal and takes longer. The authors have performed a thorough analysis of the impact of the banp gene on retinal biology and its importance in regulating the replication stress response and cenpt and ncapg expression. This paper is important to the retinal biology, genome stability, and replication stress response fields and requires minor revision.

      Strengths:
      • These studies exploit zebrafish retinal development and its cell-cycle regulation, as Banp/SMAR1 is an essential gene in human cells and its knockout is embryonic lethal in mice.
      • The authors show that this gene is involved in replication stress responses involving p53, atm, and atr signaling.
      • The authors show that banp is required for chromatin segregation factors and chromatin accessibility by binding to banp sequences (TCTCGCGAGA) upstream of specifically cenpt and ncapg. Interestingly, the mutant rw337 had decreased chromatin accessibility near the transcript start sites of these genes. This is an elegant study of how a gene is regulating the transcription of two genes essential for chromatin segregation.

      Weaknesses:
      • The authors could highlight the protein names of both zebrafish and humans throughout the text using standard nomenclature, with human proteins all capitalized etc. This will enable the reader to understand their findings in the context of fascinating biology and human disease/cancer.

      We have revised nomenclature of genes and proteins throughout the text, consistent with nomenclature conventions as follows.

      species / gene / protein
      zebrafish / banp / Banp
      mouse / Banp / BANP
      human / BANP / BANP

      In the revised manuscript, we have used human/mouse/zebrafish nomenclature in sentences relating findings that were achieved using human/mouse/zebrafish samples, respectively.

      • As banprw337 mutants show such severe morphological disruption a discussion on the impact of this work for the vision community could strengthen the importance of understanding how this gene functions.

      We appreciate this suggestion. In response to comments from the editor and reviewer #2, we have revised the Introduction to mention that vertebrate retina is an excellent model system to dissect mechanisms of cell-cycle regulation and DNA damage response-mediated neuronal cell death. We believe that our banp paper will have an impact on the retinal community. Furthermore, in addition to the role of Banp in cell-cycle regulation, most photoreceptors fail to differentiate in banp mutants, whose phenotypes are more severe than other retinal cell-types. Nuclear architecture, especially heterochromatin and euchromatin patterns, are quite differently organized in photoreceptor neurons and dynamically changed during rod photoreceptor differentiation, so we suspect that Banp may be important for photoreceptor differentiation through regulation of its nuclear organization. In the future, we will investigate this underlying mechanism. There are very interesting perspectives on retinal phenotypes in banp mutants, which may attract retinal and vision community researchers. However, these are diverse topics. So, in the current manuscript, we have limited the discussion to within cell-cycle regulation.

      • Gamma H2AX phosphorylation is a global marker of DSBs and stalled forks. The authors did not note that H2AX phosphorylation is present and is a marker of stalled replication forks.
      o PMID: 11673449, PMID: 20053681, doi:10.1101/gad.2053211, https://doi.org/10.1016/j.cell.2013.10.043 etc.

      We appreciate this suggestion. We have added a statement on gamma-H2AX and cited appropriate references.

      • As gamma H2AX phosphorylation recruits DNA repair factors like BRCA2, speculation of importance of these genes may be of interest to the DNA repair community.

      We agree that to clarify which step or steps of DNA replication stress and the DNA repair mechanism are direct targets of Banp, it is important to consider how DNA repair factors are affected in banp mutants. Among Banp transcriptional target genes, we found that wrnip1 mRNA expression is significantly reduced in banp mutants. We have added these data to a new Figure 6-figure supplement 2. wrnip1 protects stalled replication forks from degradation and promotes fork restart during replication stress by cooperating with BRCA2. It was recently reported that WRNIP1 functions in translesion synthesis (TLS) and template switching (TS) at stalled forks, and also interstrand crosslink repair (ICR). It is possible that the loss of Wrnip1 causes defects in fork stabilization for restart, and ICR, leading to genomic instability. We have added this material to the Discussion and have revised a summary figure (Figure 7).

      Reviewer #2 (Public Review):

      Babu et al report the role of the zebrafish banp gene in the developing retina. They find that banp is required for faithful S-phase as well as mitosis.

      Manuscript strengths: 1- The authors performed a large-scale mutagenesis screen and successfully identified a causative banp gene mutation from these efforts, which represent a significant amount of work. 2- The authors provide a substantial amount of cellular-level analysis of a host of cell cycle-related phenotypes in the banp mutant retina. The data are of high technical quality and the experiments are well-executed. For the most part, the data support the conclusions.

      We are grateful for the reviewer’s high estimation of our work.

      Manuscript weaknesses: 1- Banp mutants have numerous defects, and perhaps this is not unexpected for a nuclear matrix protein. I'm left wondering what insights are gained from the study beyond that the nuclear matrix is required for numerous cell cycle events?

      As we mentioned in the Introduction, BANP was originally identified as a nuclear protein that binds matrix-associated regions (MARs). MARs are regulatory DNA sequences mostly present upstream of various promoters. MAR-binding proteins interact with numerous chromatin-modifying factors and regulate gene transcription. In addition, it was reported that BANP suppresses tumor growth, and that loss of BANP heterozygosity is associated with several cancers in humans. So, before we started this banp mutant analysis, we expected that loss of Banp might cause defects in the cell cycle. However, because the majority of prior studies on BANP have been done using in vitro systems, its physiological function was still ambiguous. Very recently, it was reported that BANP functions as a transcription factor that binds to Banp motifs and regulates essential metabolic genes. In this study, rather than focusing on the MAR domain, we used this Banp motif to search for direct transcriptional targets of Banp that may function in cell proliferation and differentiation in zebrafish retina. Our study provides the first in vivo evidence that Banp serves as an essential transcription activator of cell cycle genes, including cenpt, ncapg, and wrnip1 via Banp motifs. We believe that such a list of Banp direct target genes provides a new research avenue to discover more precisely how Banp functions in tumor suppression and that it will contribute to medical research on cancer therapy.

      Our study did not investigate how the nuclear matrix itself is involved in Banp mutant phenotypes. However, since it is likely that the interaction between MAR domains and nuclear matrix may influence chromatin organization in the nucleus, BANP functions must depend on nuclear matrix configuration. So, while this question is interesting, we think it is beyond the scope of our current study. In addition, we are afraid that the term “matrix-associated nuclear protein” might mislead people to think that Banp is a regulator of nuclear matrix. To better clarify the relationship between Banp and nuclear matrix, we have revised “nuclear matrix-associated protein” -> “nuclear matrix associated region-binding protein” in the text.

      2- Why did the authors focus on the eye? It is unclear whether this study revealed a sensitivity to eye development regarding nuclear matrix function specifically, or it was just a convenient place in the animal to look.

      Historically, molecular and cellular mechanisms that regulate cell proliferation and differentiation in the nervous system have been intensively studied using the vertebrate retina, because retinal neuronal cell types are fewer than those of other brain regions and its neural circuits are also simpler than those of other brain regions. Furthermore, many research groups, including ours, have identified zebrafish retinal mutants, including mutants that show defects in cell-cycle regulation and DNA damage response. Indeed, our group has investigated this topic using retinal apoptotic mutants for the last 20 years. Thus, we focus on the zebrafish retina, because the retina is an excellent in vivo model system to dissect mechanisms of cell-cycle regulation and DNA damage response. To emphasize the importance of this excellent in vivo model system to researchers beyond the retinal community, we have revised the Introduction as follows. "The developing retina is a highly proliferating tissue, in which a spatiotemporal pattern of neurogenesis is tightly coordinated by cell-cycle regulation. So, vertebrate retina provides a great model for studying how cell-cycle regulation, including DNA damage response ensures neurogenesis and subsequent cell differentiation."

      3- I found the conclusions regarding mitosis to be contradictory. The authors at first emphasize mitotic arrest, but then characterize chromosome segregation defects. How can chromosomes segregate if cells are arrested in mitosis?

      We apologize for the confusion due to our incorrect usage of the term “mitotic arrest.” Mitotic arrest was one of possibilities that we considered when first examining banp mutant phenotypes, in which we just observed accumulation of mitotic (pH3+) cells. However, when we examined mitosis in Banp morphants using live imaging, we found that mitosis duration is significantly prolonged because of chromosome segregation defects in Banp morphants, but that all 28 mitoses we examined eventually completed cytokinesis. Thus, we finally concluded that mitotic cells are not permanently arrested in M phase, but that mitosis is prolonged. To prevent confusion, we have changed “mitotic arrest” to “mitotic cell accumulation” or simply “mitotic defects” in the Results section on banp mutant phenotype analysis (shown in Figures 2 and 4).

      4- It would be important to know whether the authors can rule out that S-phase defects cause the M phase defects, or vice versa. Could there be a primary defect, rather than multiple independent defects as the authors conclude?

      We thank reviewer #2 for this suggestion. Interdependence between S phase defects and M phase defects is important to correctly interpret the data on cell-cycle regulation, especially cell-cycle checkpoint and DNA damage response. Indeed, there are interesting reports using in vitro cell culture systems indicating that replication stress induces mitotic death through specific pathways (for example, Masamsetti et al., 2019, Nat. Comm. 10.4224). However, this topic is still challenging to dissect in vivo. In terms of our findings on Banp functions in zebrafish, we found that two chromosome segregation regulators, ncapg and cenpt, are direct transcription targets of Banp, and that it is likely that loss of Banp causes mitotic defects through downregulation of cenpt and ncapg. From this point, we conclude that mitotic defects are primary effects of the loss of Banp. The next question is how the loss of Banp stalls DNA replication forks and causes subsequent cell death. To address this question, we examined whether Banp direct targets include cell-cycle regulators, especially in S phase. We found that wrnip1 is an interesting candidate, because Wrnip1 reportedly protects stalled replication forks and promotes fork restart after DNA replication stress. In addition, Wrnip1 functions in interstrand crosslink repair (ICR). We found that the mRNA expression level of wrnip1 is markedly decreased in banp mutants, suggesting the possibility that DNA replication stress may be caused by reduction of wrnip1 expression in banp mutants. We present these data in new Figure 6-figure supplement 2. We have revised the possible role of Banp in cell-cycle regulation in new Figure 7. Under this scenario, we consider it likely that loss of Banp may cause DNA replication stress through downregulation of S phase regulators, independent of mitotic defects. However, we cannot exclude the possibility that DNA replication stress causes mitotic defects in banp mutants. Masamsetti et al. (2019, Nat. Comm. 10.4224) revealed that replication stress induces spindle assembly checkpoint (SAC)-dependent mitotic arrest and subsequent mitotic death when tp53 activity is inhibited. We showed that cell death in zebrafish banp mutant retinas was fully suppressed by tp53-MO at 48 hpf, but still occurred at 72 hpf, although there was no significant difference between wildtype and banp mutants (Figure 3GH). In the manuscript, we mentioned the possibility that some tp53-independent mechanism induces retinal apoptosis in banp mutants after 48 hpf. An alternative possibility is that most cell death in banp mutants depends on tp53; however, replication stress persisting in banp mutants injected with MO-tp53 may cause SAC-mediated mitotic death, as reported by Masamsetti et al., 2019. Future studies will be necessary to clarify this possibility.

      Reviewer #3 (Public Review):

      Babu and colleagues demonstrate that banp is expressed in the retina progenitor cells among other locations, and mutational loss of it results in increased mitosis, increased apoptosis, increased DNA damage, and the failure to differentiate photoreceptors. Importantly, these phenotypes are seen at a time period when retina progenitors undergo rapid cell cycles and differentiate into multiple cell types that make up the fully developed retina. Rescue with the wild type and phenocopy with another mutant allele provide strong support that the phenotypes result from loss of banp. Mutant animals show elevated p53 protein and reduction of p53 delays the onset of apoptosis by 24 hours. Mutant animals show an altered transcriptional profile, with increased p53 expression and decreased expression of two genes that encode proteins needed for chromosome segregation. The authors propose that loss of banp results in defective DNA replication and DNA damage as well as mitotic chromosome segregation failures, all of which contribute to p53-dependent apoptosis to reduce cell number and cause developmental defects.

      Banp is a very interesting protein. Also known as Scaffold/matrix attachment region binding protein 1, it is known to regulate the transcription of a number of genes including those important in oncogenesis. In vivo function of Banp, especially in the context of normal development, remains to be better understood. The current study fills this knowledge gap but I have some concerns about the interpretation of the data, the presentation and the potential impact. Specifically:

      We are very pleased that reviewer #3 understood and appreciated the significance of our study.

      Increased expression of atm and atr is observed and the authors suggest that replication stress and DNA damage activate the checkpoints to cause cell cycle arrest. There are several problems with this conclusion, which is depicted in Fig. 4G. Checkpoint activation occurs via phosphorylation changes in ATM/ATR and not through their transcriptional upregulation, which would take too long for a response that occurs within minutes.

      We agree with the referee that upregulation of ATR/ATM mRNA expression may represent chronic activation of the DNA replication stress and DNA damage responses. In addition to ATR/ATM mRNA upregulation, RNA-seq analysis revealed that exo5 is one of the top 15 upregulated genes in banp mutants (Fig. 3B). exo5 plays a critical role in ATR-dependent replication restart (Hambarde et al., 2021), suggesting that chronic replication stress occurs in banp mutants. We have mentioned exo5 upregulation in the Results section. As Referee 1 suggested, phosphorylation of H2AX is induced by ATR prior to DSBs, indicating that gamma-H2AX is a marker of DNA replication stalling as well as of DSBs. We showed that gamma-H2AX+ cells are more numerous in banp mutants (Figure 4CF) and morphants (Figure 4-figure supplement 1AB) and in S phase banp mutant cells (Figure 4-figure supplement 1CDEFF’), suggesting that DNA replication stress and subsequent DNA damage linked to fork breakage are induced in banp mutants. We have revised the text by adding this statement in the Results section. In addition, we have revised Fig. 4G and its legend, in order to more clearly show the role of ATR and ATM in DNA replication fork repair and HR-mediated DNA repair in response to DSBs, and tp53-mediated regulation of cell survival and death.

      ATM/ATR-dependent checkpoints arrest cells in G1 or G2 so you would expect reduced S and M phases. Yet, the authors saw increased M and no change in S.

      It is puzzling that BrdU+ cell number does not change because if cells are indeed arrested in mitosis, they should be prevented from going into S phase and BrdU+ cell numbers should decrease.

      There is no significant difference in the BrdU+ fraction of total retinal cells between wild-type and banp mutants at 48 hpf (Fig. 2-figure supplement 1AC), suggesting that cell-cycle arrest in S phase does not occur at significant levels in banp mutants at 48 hpf. At present, we have no good tool to detect G1 phase in zebrafish developing retina, because the Cdt1 fluorescent protein of the FUCCI zebrafish line cannot be stably driven in highly proliferating tissues such as zebrafish retina due to its very short G1 duration. Thus, we cannot determine whether G1 arrest occurs in banp mutant retina. However, we found that mRNA expression of p21 cdk inhibitor is upregulated in banp mutants, using bulk RNA-seq (Figure 3AB) and RT-PCR (Figure C), so it is still possible that banp mutant retinal cells are (probably partially) arrested in G1 phase. We have added this possibility to the Discussion. Further study is necessary to evaluate this point.

      It is not addressed whether cenpt and ncapg are expressed in the retina and whether their expression is decreased in banp mutants. The RNAseq data is from whole animals.

      RNA-seq data (Fig 3AB) were obtained from embryonic heads, not whole bodies (see Materials and Methods). In accordance with this suggestion, to examine whether cenpt and ncapg mRNAs are expressed in the retina, we performed in situ hybridization. We confirmed that these mRNAs are expressed in proliferative cells in zebrafish retina and have added these data to new Figure 5-figure supplement 1. In addition, we also confirmed that cenpt and ncapg mRNA expression is absent in banp mutants (see panels at 48 hpf in Fig. 5-figure supplement 1).

      The rescue by banp-EGFP in Fig.1G is very nice. But it looks like there is partial rescue also with EGFP-banp(rw337) in the same panel. The defects in the last panel do not seem as severe as in non-injected controls. There are fewer pyknotic nuclei and the cell layers lack gaps. Quantification of the extent or reproducibility of the rescue is lacking.

      We conducted acridine orange (AO) staining of retinas of wild-type, banp mutants, and banp mutants injected with banp(wt)EGFP and with EGFP-banp(rw337). We confirmed that banp(wt)EGFP significantly suppressed apoptosis in banp mutant retinas, whereas EGFP-banp(rw337) did not. We have added these data to new Figure 1-figure supplement 5. So, there is no partial rescue by EGFP-banp(rw337).

      Some of the conclusions lack supporting data. For example, line 99: "Thus, Banp is required for integrity of DNA replication and DNA damage repair." There are no data for the integrity (meaning 'fidelity'?) of DNA replication and there are no DNA repair assays.

      We are grateful for this suggestion. We understand that the term “integrity” could be too strong and have changed it to “regulation.”

      In another example, non-overlap of pH3 (M phase) and caspase+ cells is interpreted to mean that cells are dying in S phase (Figure 2 supplement 1). But the data are equally consistent with cells dying in G1 and G2.

      In addition to non-overlap of the pH3+ and caspase+ areas along the apico-basal axis of the retina (Fig.2-figure supplement 1DG), we did not observe mitotic death in our live imaging of mitosis in banp morphant retinas. Considering the very short G2 phase of retinal cells in zebrafish, we conclude that apoptosis occurs mostly in retinal progenitor cells undergoing G1 or S phase, or differentiating neurons. However, we cannot exclude the possibility that apoptosis occurs in G2 phase. So, we have revised the text. Furthermore, caspase 3+ cells were mostly located in the intermediate zone of the neural retina along the apico-basal axis, whereas pH3+ cells were localized at the apical surface of the neural retina (Fig. 2-figure supplement 1G), suggesting that apoptosis occurs mostly in retinal progenitor cells during G1, S or G2 phase, or in differentiating neurons. Accordingly, we have revised Fig. 2-figure supplement 1L, to suggest that apoptosis may be induced in G1, S, or G2 phase.

      The model in Figure 7 includes components without accompanying supportive data. For example, the arrow from Banp to DNA repair that indicates a direct role and the arrow from tp53 to delta113 tp53 that indicates direct activation.

      We appreciate this suggestion. We have revised Figure 7 and its legend. In the new Figure 7, we use solid arrows for regulatory pathways confirmed by us or previously by other groups, and dotted arrows for proposed regulatory pathways. We already cite a reference (Chen et al., 2009) indicating direct activation of ∆113 tp53 by FL tp53.

      The data that together support a single point are often split up among figures. For example, increased pH3+ cells are shown in Fig. 2 and interpreted as mitotic arrest. But it is equally possible that cells are undergoing extra divisions (and then dying). Support for mitotic arrest is provided by live imaging of mitosis, which is not presented until the last figure (Fig. 6). There are many such instances in the manuscript.

      A similar concern was raised by reviewer #2. Please see our response.

      Banp is already known for roles in p53-dependent transcription and in apoptosis (e.g. Sinha et al papers cited in the manuscript). Banp is also known to bind to the promoter regions of cenpt and ncapg (Grand et al and Mathai et al papers cited in the manuscript). These genes are known to be involved in mitosis in zebrafish (Hung et al and Seipold et al papers cited in the manuscript). In terms of what is new about banp function in this report, the requirement for banp in a critical phase of retina development and spontaneous induction of DNA damage come to mind. Unfortunately, how loss of banp leads to this defect remains to be addressed.

      A related concern was raised by the editors and also by reviewer #2. Please see our responses. We found that wrnip1 mRNA expression is drastically reduced in banp mutants, which may cause DNA replication stalling and abnormal phenotypes.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this article, Bollmann and colleagues demonstrated both theoretically and experimentally that blood vessels could be targeted at the mesoscopic scale with time-of-flight magnetic resonance imaging (TOF-MRI). With a mathematical model that includes partial voluming effects explicitly, they outline how small voxels reduce the dependency of blood dwell time, a key parameter of the TOF sequence, on blood velocity. Through several experiments on three human subjects, they show that increasing resolution improves contrast and evaluate additional issues such as vessel displacement artifacts and the separation of veins and arteries.

      The overall presentation of the main finding, that small voxels are beneficial for mesoscopic pial vessels, is clear and well discussed, although difficult to grasp fully without a good prior understanding of the underlying TOF-MRI sequence principles. Results are convincing, and some of the data both raw and processed have been provided publicly. Visual inspection and comparisons of different scans are provided, although no quantification or statistical comparison of the results are included.

      Potential applications of the study are varied, from modeling more precisely functional MRI signals to assessing the health of small vessels. Overall, this article reopens a window on studying the vasculature of the human brain in great detail, for which studies have been surprisingly limited until recently.

      In summary, this article provides a clear demonstration that small pial vessels can indeed be imaged successfully with extremely high voxel resolution. There are however several concerns with the current manuscript, hopefully addressable within the study.

      Thank you very much for this encouraging review. While smaller voxel sizes theoretically benefit all blood vessels, we are specifically targeting the (small) pial arteries here, as the inflow-effect in veins is unreliable and susceptibility-based contrasts are much more suited for this part of the vasculature. (We have clarified this in the revised manuscript by substituting ‘vessel’ with ‘artery’ wherever appropriate.) Using a partial-volume model and a relative contrast formulation, we find that the blood delivery time is not the limiting factor when imaging pial arteries, but the voxel size is. Taking into account the comparatively fast blood velocities even in pial arteries with diameters ≤ 200 µm (using t_delivery=l_voxel/v_blood), we find that blood dwell times are sufficiently long for the small voxel sizes considered here to employ the simpler formulation of the flow-related enhancement effect. In other words, small voxels eliminate blood dwell time as a consideration for the blood velocities expected for pial arteries.
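
      As a rough illustration of the expression l_voxel/v_blood quoted above, the following minimal sketch computes how long blood takes to cross one voxel edge for the isotropic voxel sizes mentioned in this response. The function name and the blood velocity are assumptions made for illustration only; the velocity is a hypothetical placeholder, not a value taken from the manuscript.

```python
def voxel_transit_time_ms(voxel_size_mm: float, blood_velocity_mm_per_s: float) -> float:
    """Time for blood to cross one voxel edge, l_voxel / v_blood, in milliseconds."""
    if voxel_size_mm <= 0 or blood_velocity_mm_per_s <= 0:
        raise ValueError("voxel size and blood velocity must be positive")
    return 1000.0 * voxel_size_mm / blood_velocity_mm_per_s


# Isotropic voxel sizes quoted in this response (mm).
VOXEL_SIZES_MM = [0.8, 0.5, 0.4, 0.3, 0.16, 0.14]

# Hypothetical velocity, chosen only to make the arithmetic concrete.
ASSUMED_VELOCITY_MM_PER_S = 10.0

transit_times_ms = {
    size: voxel_transit_time_ms(size, ASSUMED_VELOCITY_MM_PER_S)
    for size in VOXEL_SIZES_MM
}

for size, t in transit_times_ms.items():
    print(f"{size:.2f} mm voxel: {t:.1f} ms")
```

      Under this assumption the transit time scales linearly with voxel edge length, so the smallest voxels are crossed in the shortest time for any given velocity.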

      We have extended the description of the TOF-MRA sequence in the revised manuscript, and all data and simulations/analyses presented in this manuscript are now publicly available at https://osf.io/nr6gc/ and https://gitlab.com/SaskiaB/pialvesseltof.git, respectively. This includes additional quantifications of the FRE effect for large vessels (adding to the assessment for small vessels already included), and the effect of voxel size on vessel segmentations.

      Main points:

      1) The manuscript needs clarifying through some additional background information for a readership wider than expert MR physicists. The TOF-MRA sequence and its underlying principles should be introduced first thing, even before discussing vascular anatomy, as it is the key to understanding what aspects of blood physiology and MRI parameters matter here. MR physics shorthand terms should be avoided or defined, as 'spins' or 'relaxation' are not obvious to everybody. The relationship between delivery time and slab thickness should be made clear as well.

      Thank you for this valuable comment that the Theory section is perhaps not accessible for all readers. We have adapted the manuscript in several locations to provide more background information and details on time-of-flight contrast. We found, however, that there is no concise way to first present the MR physics part and then introduce the pial arterial vasculature, as the optimization presented therein is targeted towards this structure. To address this comment, we have therefore opted to provide a brief introduction to TOF-MRA first in the Introduction, and then a more in-depth description in the Theory section.

      Introduction section:

      "Recent studies have shown the potential of time-of-flight (TOF) based magnetic resonance angiography (MRA) at 7 Tesla (T) in subcortical areas (Bouvy et al., 2016, 2014; Ladd, 2007; Mattern et al., 2018; Schulz et al., 2016; von Morze et al., 2007). In brief, TOF-MRA uses the high signal intensity caused by inflowing water protons in the blood to generate contrast, rather than an exogenous contrast agent. By adjusting the imaging parameters of a gradient-recalled echo (GRE) sequence, namely the repetition time (T_R) and flip angle, the signal from static tissue in the background can be suppressed, and high image intensities are only present in blood vessels freshly filled with non-saturated inflowing blood. As the blood flows through the vasculature within the imaging volume, its signal intensity slowly decreases. (For a comprehensive introduction to the principles of MRA, see for example Carr and Carroll (2012)). At ultra-high field, the increased signal-to-noise ratio (SNR), the longer T_1 relaxation times of blood and grey matter, and the potential for higher resolution are key benefits (von Morze et al., 2007)."

      Theory section:

      "Flow-related enhancement

      Before discussing the effects of vessel size, we briefly revisit the fundamental theory of the flow-related enhancement effect used in TOF-MRA. Taking into account the specific properties of pial arteries, we will then extend the classical description to this new regime. In general, TOF-MRA creates high signal intensities in arteries using inflowing blood as an endogenous contrast agent. The object magnetization—created through the interaction between the quantum mechanical spins of water protons and the magnetic field—provides the signal source (or magnetization) accessed via excitation with radiofrequency (RF) waves (called RF pulses) and the reception of ‘echo’ signals emitted by the sample around the same frequency. The T1-contrast in TOF-MRA is based on the difference in the steady-state magnetization of static tissue, which is continuously saturated by RF pulses during the imaging, and the increased or enhanced longitudinal magnetization of inflowing blood water spins, which have experienced no or few RF pulses. In other words, in TOF-MRA we see enhancement for blood that flows into the imaging volume."

      "Since the coverage or slab thickness in TOF-MRA is usually kept small to minimize blood delivery time by shortening the path-length of the vessel contained within the slab (Parker et al., 1991), and because we are focused here on the pial vasculature, we have limited our considerations to a maximum blood delivery time of 1000 ms, with values of few hundreds of milliseconds being more likely."

      2) The main discussion of higher resolution leading to improvements rather than loss presented here seems a bit one-sided: for a more objective understanding of the differences it would be worth explicitly deriving the 'classical' treatment and showing how it leads to different conclusions than the present one. In particular, the link made in the discussion between using relative magnetization and modeling partial voluming seems unclear, as both are unrelated. One could also argue that in theory higher resolution imaging is always better, but of course there are practical considerations in play: SNR, dynamics of the measured effect vs speed of acquisition, motion, etc. These issues are not really integrated into the model, even though they provide strong constraints on what can be done. It would be good to at least discuss the constraints that 140 or 160 microns resolution imposes on what is achievable at present.

      Thank you for this excellent suggestion. We found it instructive to illustrate the different effects separately, i.e. relative vs. absolute FRE, and then partial volume vs. no-partial volume effects. In response to comment R2.8 of Reviewer 2, we also clarified the derivation of the relative FRE vs the ‘classical’ absolute FRE (please see R2.8). Accordingly, the manuscript now includes the theoretical derivation in the Theory section and an explicit demonstration of how the classical treatment leads to different conclusions in the Supplementary Material. The important insight gained in our work is that only when considering relative FRE and partial-volume effects together, can we conclude that smaller voxels are advantageous. We have added the following section in the Supplementary Material:

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect employed in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the implications of these two effects, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm or 2 000 µm (i.e. no partial-volume effects at the centre of the vessel). The absolute FRE expression explicitly takes the voxel volume into account, and so instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      "Note that the division by M_zS^tissue⋅l_voxel^3 to obtain the relative FRE from this expression removes the contribution of the total voxel volume (l_voxel^3). Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 µm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of the artery considered here.
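
      The qualitative behaviour described for Supplementary Figure 2 can be sketched numerically. The code below is not the manuscript's equations (which are not reproduced in this response); it is a reconstruction that assumes a straight cylindrical artery running through the centre of a cubic voxel, a crude clipping of the blood volume at the voxel volume when the artery is wider than the voxel, and arbitrary magnetization values. The absolute FRE is taken as the total voxel signal minus the tissue-only signal, and the relative FRE as that difference divided by M_zS^tissue⋅l_voxel^3, following the sentence quoted above.

```python
import math

# Arbitrary steady-state magnetizations (assumed, unitless).
M_BLOOD = 1.0
M_TISSUE = 0.25


def blood_volume_mm3(vessel_diameter_mm: float, voxel_size_mm: float) -> float:
    """Blood volume inside a cubic voxel for a centred cylindrical vessel.

    Simplification: the cylinder volume is clipped at the voxel volume, which
    ignores the exact circle/square overlap for diameters just above the
    voxel edge length.
    """
    cylinder = math.pi * (vessel_diameter_mm / 2.0) ** 2 * voxel_size_mm
    return min(cylinder, voxel_size_mm ** 3)


def absolute_fre(vessel_diameter_mm: float, voxel_size_mm: float) -> float:
    """Total voxel signal minus the signal of a voxel containing only tissue."""
    voxel_volume = voxel_size_mm ** 3
    v_blood = blood_volume_mm3(vessel_diameter_mm, voxel_size_mm)
    total = v_blood * M_BLOOD + (voxel_volume - v_blood) * M_TISSUE
    return total - M_TISSUE * voxel_volume


def relative_fre(vessel_diameter_mm: float, voxel_size_mm: float) -> float:
    """Absolute FRE divided by the tissue-only voxel signal."""
    return absolute_fre(vessel_diameter_mm, voxel_size_mm) / (
        M_TISSUE * voxel_size_mm ** 3
    )


for diameter in (0.2, 2.0):  # 200 µm and 2 000 µm arteries
    for voxel in (0.2, 0.4, 0.8):
        print(
            f"d={diameter} mm, l={voxel} mm: "
            f"relative={relative_fre(diameter, voxel):.3f}, "
            f"absolute={absolute_fre(diameter, voxel):.5f}"
        )
```

      With these assumptions, the relative FRE of the 200 µm artery falls as the voxel grows beyond the vessel diameter, the relative FRE of the 2 000 µm artery stays constant, and the absolute FRE grows with voxel size in both cases.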

      In addition, we have also clarified the contribution of the two definitions and their interaction in the Discussion section. Following the suggestion of Reviewer 2, we have extended our interpretation of relative FRE. In brief, absolute FRE is closely related to the physical origin of the contrast, whereas relative FRE is much more concerned with the “segmentability” of a vessel (please see R2.8 for more details):

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 2). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      Note that our formulation of the FRE—even without considering SNR—does not suggest that higher resolution is always better, but instead should be matched to the size of the target arteries:

      "Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      Further, we have also extended the concluding paragraph of the Imaging limitation section to also include a practical perspective:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and, as we reduced the voxel size in the experiments, we (partially) compensated for the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is no preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and/or larger acquisition volumes to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      3) The article seems to imply that TOF-MRA is the only adequate technique to image brain vasculature, while T2 mapping, UHF T1 mapping (see e.g. Choi et al., https://doi.org/10.1016/j.neuroimage.2020.117259), phase (e.g. Fan et al., doi:10.1038/jcbfm.2014.187), QSM (see e.g. Huck et al., https://doi.org/10.1007/s00429-019-01919-4), or a combination (Bernier et al., https://doi.org/10.1002/hbm.24337, Ward et al., https://doi.org/10.1016/j.neuroimage.2017.10.049) all depict some level of vascular detail. It would be worth quickly reviewing the different effects of blood on MRI contrast and how those have been used in different approaches to measure vasculature. This would in particular help clarify the experiment combining TOF with T2 mapping used to separate arteries from veins (more on this question below).

      We apologize if we inadvertently created the impression that TOF-MRA is a suitable technique to image the complete brain vasculature, and we agree that susceptibility-based methods are much more suitable for venous structures. As outlined above, we have revised the manuscript in various sections to indicate that it is the pial arterial vasculature we are targeting. We have added a statement on imaging the venous vasculature in the Discussion section. Please see our response below regarding the use of T2* to separate arteries and veins.

      "The advantages of imaging the pial arterial vasculature using TOF-MRA without an exogenous contrast agent lie in its non-invasiveness and the potential to combine these data with various other structural and functional image contrasts provided by MRI. One common application is to acquire a velocity-encoded contrast such as phase-contrast MRA (Arts et al., 2021; Bouvy et al., 2016). Another interesting approach utilises the inherent time-of-flight contrast in magnetization-prepared two rapid acquisition gradient echo (MP2RAGE) images acquired at ultra-high field that simultaneously acquires vasculature and structural data, albeit at lower achievable resolution and lower FRE compared to the TOF-MRA data in our study (Choi et al., 2020). In summary, we expect high-resolution TOF-MRA to be applicable also for group studies to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. In addition, imaging of the pial venous vasculature, using susceptibility-based contrasts such as T2*-weighted magnitude (Gulban et al., 2021) or phase imaging (Fan et al., 2015), susceptibility-weighted imaging (SWI) (Eckstein et al., 2021; Reichenbach et al., 1997) or quantitative susceptibility mapping (QSM) (Bernier et al., 2018; Huck et al., 2019; Mattern et al., 2019; Ward et al., 2018), would enable a comprehensive assessment of the complete cortical vasculature and how both arteries and veins shape brain hemodynamics."

      4) The results, while very impressive, are mostly qualitative. This seems a missed opportunity to strengthen the points of the paper: given the segmentations already made, the amount/density of detected vessels could be compared across scans for the data of Fig. 5 and 7. The minimum distance between vessels could be measured in Fig. 8 to show a 2D distribution and/or a spatial map of the displacement. The number of vessels labeled as veins instead of arteries in Fig. 9 could be given.

      We fully agree that estimating these quantitative measures would be very interesting; however, this would require the development of a comprehensive analysis framework, which would considerably shift the focus of this paper from data acquisition and flow-related enhancement to data analysis. As noted in the discussion section Challenges for vessel segmentation algorithms, ‘The vessel segmentations presented here were performed to illustrate the sensitivity of the image acquisition to small pial arteries’, because the smallest arteries tend to be concealed in the maximum intensity projections. Further, the interpretation of these measures is not straightforward. For example, the number of detected vessels for the artery depicted in Figure 5 does not change across resolutions, but their length does. We have therefore estimated the relative increase in skeleton length across resolutions for Figures 5 and 7. However, these estimates are not only a function of the voxel size but also of the underlying vasculature, i.e. the number of arteries with a certain diameter present, and may thus not generalise well to enable quantitative predictions of the improvement expected from increased resolutions. We have added an illustration of these analyses in the Supplementary Material, and the following additions in the Methods, Results and Discussion sections.

      "For vessel segmentation, a semi-automatic segmentation pipeline was implemented in Matlab R2020a (The MathWorks, Natick, MA) using the UniQC toolbox (Frässle et al., 2021): First, a brain mask was created through thresholding which was then manually corrected in ITK-SNAP (http://www.itksnap.org/) (Yushkevich et al., 2006) such that pial vessels were included. For the high-resolution TOF data (Figures 6 and 7, Supplementary Figure 4), denoising to remove high frequency noise was performed using the implementation of an adaptive non-local means denoising algorithm (Manjón et al., 2010) provided in DenoiseImage within the ANTs toolbox, with the search radius for the denoising set to 5 voxels and noise type set to Rician. Next, the brain mask was applied to the bias corrected and denoised data (if applicable). Then, a vessel mask was created based on a manually defined threshold, and clusters with fewer than 10 or 5 voxels for the high- and low-resolution acquisitions, respectively, were removed from the vessel mask. Finally, an iterative region-growing procedure starting at each voxel of the initial vessel mask was applied that successively included additional voxels into the vessel mask if they were connected to a voxel which was already included and above a manually defined threshold (which was slightly lower than the previous threshold). Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in our GitLab repository (https://gitlab.com/SaskiaB/pialvesseltof.git). To assess the data quality, maximum intensity projections (MIPs) were created and the outlines of the segmentation MIPs were added as an overlay. 
To estimate the increased detection of vessels with higher resolutions, we computed the relative increase in the length of the segmented vessels for the data presented in Figure 5 (0.8 mm, 0.5 mm, 0.4 mm and 0.3 mm isotropic voxel size) and Figure 7 (0.16 mm and 0.14 mm isotropic voxel size) by computing the skeleton using the bwskel Matlab function and then calculating the skeleton length as the number of voxels in the skeleton multiplied by the voxel size."
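
      The thresholding, cluster removal and region-growing steps described in the quoted Methods text can be sketched as follows. This is not the authors' Matlab implementation: it is a simplified pure-Python illustration on a 2D grid, and the function names, the 4-connected neighbourhood and the toy thresholds are all assumptions made for the example.

```python
from collections import deque

# Assumed 4-connected neighbourhood in 2D (the original data are 3D).
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def connected_clusters(mask):
    """Return the connected clusters of True pixels as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cluster)
    return clusters


def segment_vessels(image, seed_threshold, grow_threshold, min_cluster_size):
    """Threshold, drop small clusters, then grow with a lower threshold."""
    rows, cols = len(image), len(image[0])
    seed = [[value > seed_threshold for value in row] for row in image]

    # Keep only seed clusters that reach the minimum size.
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    for cluster in connected_clusters(seed):
        if len(cluster) >= min_cluster_size:
            for y, x in cluster:
                mask[y][x] = True
                queue.append((y, x))

    # Region growing: add neighbours of included pixels above the lower threshold.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if (0 <= ny < rows and 0 <= nx < cols
                    and not mask[ny][nx] and image[ny][nx] > grow_threshold):
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask


# Toy image: one bright two-pixel vessel with a dimmer continuation,
# one isolated bright pixel, and one isolated dim pixel.
example_image = [
    [0, 0, 0, 0, 0],
    [9, 9, 5, 5, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 5],
]
example_mask = segment_vessels(
    example_image, seed_threshold=8, grow_threshold=4, min_cluster_size=2
)
```

      In the toy example the dimmer continuation of the vessel is recovered by region growing, while the isolated bright pixel is discarded as a too-small cluster and the isolated dim pixel is never reached.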

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, as long as the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE does not change with resolution (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to detect smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not the blood delivery time, which determines whether vessels can be resolved."

      "Indeed, the reduction in voxel volume by 33 % revealed additional small branches connected to larger arteries (see also Supplementary Figure 8). For this example, we found an overall increase in skeleton length of 14 % (see also Supplementary Figure 9)."

      "We therefore expect this strategy to enable an efficient image acquisition without the need for additional venous suppression RF pulses. Once these challenges for vessel segmentation algorithms are addressed, a thorough quantification of the arterial vasculature can be performed. For example, the skeletonization procedure used to estimate the increase of the total length of the segmented vasculature (Supplementary Figure 9) exhibits errors particularly in the unwanted sinuses and large veins. While they are consistently present across voxel sizes, and thus may have less impact on relative change in skeleton length, they need to be addressed when estimating the absolute length of the vasculature, or other higher-order features such as number of new branches. (Note that we have also performed the skeletonization procedure on the maximum intensity projections to reduce the number of artefacts and obtained comparable results: reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 % (3D) vs 37 % (2D), reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 % (3D) vs 26 % (2D), reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 % (3D) vs 16 % (2D), and reducing the voxel size from 0.16 mm to 0.14 mm isotropic increases the skeleton length by 14 % (3D) vs 24 % (2D).)"

      Supplementary Figure 9: Increase of vessel skeleton length with voxel size reduction. Axial maximum intensity projections for data acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm (TOP) (corresponding to Figure 5) and 0.16 mm to 0.14 mm isotropic (corresponding to Figure 7) are shown. Vessel skeletons derived from segmentations performed for each resolution are overlaid in red. A reduction in voxel size is accompanied by a corresponding increase in vessel skeleton length.

      Regarding further quantification of the vessel displacement presented in Figure 8, we have estimated the displacement using the Horn-Schunck optical flow estimator (Horn and Schunck, 1981; Mustafa, 2016) (https://github.com/Mustafa3946/Horn-Schunck-3D-Optical-Flow). However, the results are dominated by the larger arteries, whereas we are mostly interested in the displacement of the smallest arteries; this quantification may therefore not be helpful.

      Because the theoretical relationship between vessel displacement and blood velocity is well known (Eq. 7), and we have also outlined the expected blood velocity as a function of arterial diameter in Figure 2, which provided estimates of displacements that matched what was found in our data (as reported in our original submission), we believe that the new quantification in this form does not add value to the manuscript. What would be interesting would be to explore the use of this displacement artefact as a measure of blood velocities. This, however, would require more substantial analyses in particular for estimation of the arterial diameter and additional validation data (e.g. phase-contrast MRA). We have outlined this avenue in the Discussion section. What is relevant to the main aim of this study, namely imaging of small pial arteries, is the insight that blood velocities are indeed sufficiently fast to cause displacement artefacts even in smaller arteries. We have clarified this in the Results section:

      "Note that correction techniques exist to remove displaced vessels from the image (Gulban et al., 2021), but they cannot revert the vessels to their original location. Alternatively, this artefact could also potentially be utilised as a rough measure of blood velocity."

      "At a delay time of 10 ms between phase encoding and echo time, the observed displacement of approximately 2 mm in some of the larger vessels would correspond to a blood velocity of 200 mm/s, which is well within the expected range (Figure 2). For the smallest arteries, a displacement of one voxel (0.4 mm) can be observed, indicative of blood velocities of 40 mm/s. Note that the vessel displacement can be observed in all vessels visible at this resolution, indicating high blood velocities throughout much of the pial arterial vasculature. Thus, assuming a blood velocity of 40 mm/s (Figure 2) and a delay time of 5 ms for the high-resolution acquisitions (Figure 6), vessel displacements of 0.2 mm are possible, representing a shift of 1–2 voxels."

      Regarding the number of vessels labelled as veins, please see our response below to R1.5.

      In the main quantification given, the estimation of FRE increase with resolution, it would make more sense to perform the segmentation independently for each scan and estimate the corresponding FRE: using the mask from the highest resolution scan only biases the results. It is unclear also if the background tissue measurement one voxel outside took partial voluming into account (by leaving a one voxel free interface between vessel and background). In this analysis, it would also be interesting to estimate SNR, so you can compare SNR and FRE across resolutions, also helpful for the discussion on SNR.

      The FRE serves as an indicator of the potential performance of any segmentation algorithm (including manual segmentation) (also see our discussion on the interpretation of FRE in our response to R1.2). If we were to segment each scan individually, we would, in the ideal case, always obtain the same FRE estimate, as FRE influences the performance of the segmentation algorithm. In practice, this simply means that it is not possible to segment the vessel in the low-resolution image to its full extent that is visible in the high-resolution image, because the FRE is too low for small vessels. However, we agree with the core point that the reviewer is making, and so to help address this, a valuable addition would be to compare the FRE for the section of a vessel that is visible at all resolutions, where we found—within the accuracy of the transformations and resampling across such vastly different resolutions—that the FRE does not increase any further with higher resolution if the vessel is larger than the voxel size (page 18 and Figure 5). As stated in the Methods section, and as noted by the reviewer, we used the voxels immediately next to the vessel mask to define the background tissue signal level. Any resulting potential partial-volume effects in these background voxels would affect all voxel sizes, introducing a consistent bias that would not impact our comparison. However, inspection of the image data in Figure 5 showed partial-volume effects predominantly within those voxels intersecting the vessel, rather than voxels surrounding the vessel, in agreement with our model of FRE.

      "All imaging data were slab-wise bias-field corrected using the N4BiasFieldCorrection (Tustison et al., 2010) tool in ANTs (Avants et al., 2009) with the default parameters. To compare the empirical FRE across the four different resolutions (Figure 5), manual masks were first created for the smallest part of the vessel in the image with the highest resolution and for the largest part of the vessel in the image with the lowest resolution. Then, rigid-body transformation parameters from the low-resolution to the high-resolution (and the high-resolution to the low-resolution) images were estimated using coregister in SPM (https://www.fil.ion.ucl.ac.uk/spm/), and their inverse was applied to the vessel mask using SPM’s reslice. To calculate the empirical FRE (Eq. (3)), the mean of the intensity values within the vessel mask was used to approximate the blood magnetization, and the mean of the intensity values one voxel outside of the vessel mask was used as the tissue magnetization."

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, if the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE remains constant across resolutions (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not blood delivery time, which determines whether vessels can be resolved."

      Figure 5: Effect of voxel size on flow-related vessel enhancement. Thin axial maximum intensity projections containing a small artery acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic are shown. The FRE is estimated using the mean intensity value within the vessel masks depicted on the left, and the mean intensity values of the surrounding tissue. The small insert shows a section of the artery as it lies within a single slice. A reduction in voxel size is accompanied by a corresponding increase in FRE (red mask), whereas no further increase is obtained once the voxel size is equal to or smaller than the vessel size (blue mask).
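The empirical FRE estimate described here can be sketched in a few lines of Python. Eq. (3) is not reproduced in this response, so the relative-contrast form (blood signal minus tissue signal, divided by tissue signal) is an assumption, as are the function names and the intensity values, which are placeholders rather than measured data.

```python
from statistics import mean


def empirical_fre(vessel_intensities, background_intensities):
    """Relative contrast between mean vessel and mean background intensity.

    Assumed form: (S_blood - S_tissue) / S_tissue.
    """
    s_blood = mean(vessel_intensities)
    s_tissue = mean(background_intensities)
    return (s_blood - s_tissue) / s_tissue


def percent_increase(fre_low_res, fre_high_res):
    """Percentage increase in FRE from the coarser to the finer voxel size."""
    return 100.0 * (fre_high_res - fre_low_res) / fre_low_res


# Placeholder intensities (hypothetical): voxels inside the vessel mask and
# voxels one voxel outside of it.
vessel = [150.0, 170.0, 160.0]
background = [100.0, 100.0, 100.0]
print(empirical_fre(vessel, background))
```

Using the same background definition at every resolution keeps any partial-volume bias in the background voxels consistent across the comparison, which is the argument made in the response.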

      After many internal discussions, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data due to the complex relationship between voxel size and other imaging parameters in practice. In detail, we have reduced the voxel size but at the same time increased the acquisition time by increasing the number of encoding steps, which we have now also highlighted in the manuscript. We have, however, added additional considerations about balancing SNR and segmentation performance. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful (since it only reduces SNR without any gain in contrast) and may hinder segmentation performance, and thus become counterproductive.

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      5) The separation of arterial and venous components is a bit puzzling, partly because the methodology used is not fully explained, but also partly because the reasons invoked (flow artefact in large pial veins) do not match the results (many small vessels are included as veins). This question of separating both types of vessels is quite important for applications, so the whole procedure should be explained in detail. The use of short T2* seemed also sub-optimal, as both arteries and veins result in shorter T2* compared to most brain tissues: wouldn't a susceptibility-based measure (SWI or better QSM) provide a better separation? Finally, since the T2* map and the regular TOF map are at different resolutions, masking out the vessels labeled as veins will likely result in the smaller veins being left out.

      We agree that while the technical details of this approach were provided in the Data analysis section, the rationale behind it was only briefly mentioned. We have therefore included an additional section Inflow-artefacts in sinuses and pial veins in the Theory section of the manuscript. We have also extended the discussion of the advantages and disadvantages of the different susceptibility-based contrasts, namely T2*, SWI and QSM. While in theory both T2* and QSM should allow the reliable differentiation of arterial and venous blood, we found T2* to perform more robustly. QSM can fail in many places, e.g., due to the strong susceptibility sources within superior sagittal and transversal sinuses and pial veins and their proximity to the brain surface, so that dedicated processing is required (Stewart et al., 2022). Further, we have also elaborated in the Discussion section why the interpretation of Figure 9 regarding the absence or presence of small veins is challenging. Namely, the intensity-based segmentation used here provides only an incomplete segmentation even of the larger sinuses, because the overall lower intensity found in veins combined with the heterogeneity of the intensities in veins violates the assumptions made by most vascular segmentation approaches of homogeneous, high image intensities within vessels, which are satisfied in arteries (page 29f) (see also the illustration below). Accordingly, quantifying the number of vessels labelled as veins (R1.4a) would provide misleading results, as often only small subsets of the same sinus or vein are segmented.

      "Inflow-artefacts in sinuses and pial veins

      Inflow in large pial veins and the sagittal and transverse sinuses can cause flow-related enhancement in these non-arterial vessels. One common strategy to remove this unwanted signal enhancement is to apply venous suppression pulses during the data acquisition, which saturate blood spins outside the imaging slab. Disadvantages of this technique are the technical challenges of applying these pulses at ultra-high field due to constraints of the specific absorption rate (SAR) and the necessary increase in acquisition time (Conolly et al., 1988; Heverhagen et al., 2008; Johst et al., 2012; Maderwald et al., 2008; Schmitter et al., 2012; Zhang et al., 2015). In addition, optimal positioning of the saturation slab in the case of pial arteries requires further investigation, and in particular suppressing signal from the superior sagittal sinus without interfering with the imaging of the pial arterial vasculature at the top of the cortex might prove challenging. Furthermore, this venous saturation strategy is based on the assumption that arterial blood is traveling head-wards while venous blood is drained foot-wards. For the complex and convoluted trajectory of pial vessels this directionality-based saturation might be oversimplified, particularly when considering the higher-order branches of the pial arteries and veins on the cortical surface. Inspired by techniques to simultaneously acquire a TOF image for angiography and a susceptibility-weighted image for venography (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008), we set out to explore the possibility of removing unwanted venous structures from the segmentation of the pial arterial vasculature during data postprocessing. 
Because arteries filled with oxygenated blood have T2*-values similar to tissue, while veins have much shorter T2*-values due to the presence of deoxygenated blood (Pauling and Coryell, 1936; Peters et al., 2007; Uludağ et al., 2009; Zhao et al., 2007), we used this criterion to remove vessels with short T2* values from the segmentation (see Data Analysis for details). In addition, we also explored whether unwanted venous structures in the high-resolution TOF images, where a two-echo acquisition is not feasible due to the longer readout, can be removed based on detecting them in a lower-resolution image."

      "Removal of pial veins

      Inflow in large pial veins and the superior sagittal and transverse sinuses can cause a flow-related enhancement in these non-arterial vessels (Figure 9, left). The higher concentration of deoxygenated haemoglobin in these vessels leads to shorter T2* values (Pauling and Coryell, 1936), which can be estimated using a two-echo TOF acquisition (see also Inflow-artefacts in sinuses and pial veins). These vessels can be identified in the segmentation based on their T2* values (Figure 9, left), and removed from the angiogram (Figure 9, right) (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008). In particular, the superior and inferior sagittal and the transversal sinuses and large veins which exhibited an inhomogeneous intensity profile and a steep loss of intensity at the slab boundary were identified as non-arterial (Figure 9, left). Further, we also explored the option of removing unwanted venous vessels from the high-resolution TOF image (Figure 7) using a low-resolution two-echo TOF (not shown). This indeed allowed us to remove the strong signal enhancement in the sagittal sinuses and numerous larger veins, although some small veins, which are characterised by inhomogeneous intensity profiles and can be detected visually by experienced raters, remain."

      Figure 9: Removal of non-arterial vessels in time-of-flight imaging. LEFT: Segmentation of arteries (red) and veins (blue) using T2* estimates. RIGHT: Time-of-flight angiogram after vein removal.

      Our approach also assumes that the unwanted veins are large enough that they are also resolved in the low-resolution image. If we consider the source of the FRE effect, it might indeed be exclusively large veins that are present in TOF-MRA data, which would suggest that our assumption is valid. Fundamentally, the FRE depends on the inflow of un-saturated spins into the imaging slab. However, small veins drain capillary beds in the local tissue, i.e. the tissue within the slab. (Note that due to the slice oversampling implemented in our acquisition, spins just above or below the slab will also be excited.) Thus, small veins only contain blood water spins that have experienced a large number of RF pulses due to the long transit time through the pial arterial vasculature, the capillaries and the intracortical venules. Hence, their longitudinal magnetization would be similar to that of stationary tissue. To generate an FRE effect in veins, “pass-through” venous blood from outside the imaging slab is required. This is only available in veins that are passing through the imaging slab, which have much larger diameters. These theoretical considerations are corroborated by the findings in Figure 9, where large disconnected vessels with varying intensity profiles were identified as non-arterial. Due to the heterogenous intensity profiles in large veins and the sagittal and transversal sinuses, the intensity-based segmentation applied here may only label a subset of the vessel lumen, creating the impression of many small veins. This is particularly the case for the straight and inferior sagittal sinus in the bottom slab of Figure 9. 
Nevertheless, future studies potentially combining anatomical prior knowledge, advanced segmentation algorithms and susceptibility measures would be capable of removing these unwanted veins in post-processing to enable an efficient TOF-MRA image acquisition dedicated to optimally detecting small arteries without the need for additional venous suppression RF pulses.

      6) A more general question also is why this imaging method is limited to pial vessels: at 140 microns, the larger intra-cortical vessels should be appearing (group 6 in Duvernoy, 1981: diameters between 50 and 240 microns). Are there other reasons these vessels are not detected? Similarly, it seems there is no arterial vasculature detected in the white matter here: is it due to the rather superior location of the imaging slab, or a limitation of the method? Likewise, all three results focus on a rather homogeneous region of cerebral cortex, in terms of vascularisation. It would be interesting for applications to demonstrate the capabilities of the method in more complex regions, e.g. the densely vascularised cerebellum, or more heterogeneous regions like the midbrain. Finally, it is notable that all three subjects appear to have rather different densities of vessels, from sparse (participant II) to dense (participant I), with some inhomogeneities in density (frontal region in participant III) and inconsistencies in detection (sinuses absent in participant II). All these points should be discussed.

      While we are aware that the diameter of intracortical arteries has been suggested to be up to 240 µm (Duvernoy et al., 1981), it remains unclear how prevalent intracortical arteries of this size are. For example, note that in a different context in the Duvernoy study (in the revised manuscript), the following values are mentioned (which we followed in Figure 1):

      “Central arteries of the lobule always have a large diameter of 260 µ to 280 µ, at their origin. Peripheral arteries have an average diameter of 150 µ to 180 µ. At the cortex surface, all arterioles of 50 µ or less, penetrate the cortex or form anastomoses. The diameter of most of these penetrating arteries is approximately 40 µ.”

      Further, the examinations by Hirsch et al. (2012) (albeit in the macaque brain), showed one (exemplary) intracortical artery belonging to group 6 (Figure 1B), whose diameter appears to be below 100 µm. Given these discrepancies and the fact that intracortical arteries in group 5 only reach 75 µm, we suspect that intracortical arteries with diameters > 140 µm are a very rare occurrence, which we might not have encountered in this data set.

      Similarly, arteries in white matter (Nonaka et al., 2003) and the cerebellum (Duvernoy et al., 1983) are beyond our resolution at the moment. The midbrain is an interesting suggestion, although we believe that the cortical areas chosen here, with their gradual reduction in diameter along the vascular tree, provide a better illustration of the effect of voxel size than the rather abrupt reduction in vascular diameter found in the midbrain. We have added the even higher resolution requirements in the discussion section:

      "In summary, we expect high-resolution TOF-MRA to be applicable also for group studies, to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. Notably, we have focused on imaging pial arteries of the human cerebrum; however, other brain structures such as the cerebellum, subcortex and white matter are of course also of interest. While the same theoretical considerations apply, imaging the arterial vasculature in these structures will require even smaller voxel sizes due to their smaller arterial diameters (Duvernoy et al., 1983, 1981; Nonaka et al., 2003)."

      Regarding the apparent sparsity of results from participant II, this is mostly driven by the much smaller coverage in this subject (19.6 mm in Participant II vs. 50 mm and 58 mm in Participant I and III, respectively). The reduction in density in the frontal regions might indeed constitute a difference in anatomy or might be driven by the presence of more false-positive veins in Participant I than Participant III in these areas. Following the depiction in Duvernoy et al. (1981), one would not expect large arteries in frontal areas, but large veins are common. Thus, the additional vessels in Participant I in the frontal areas might well be false-positive veins, and their removal would result in similar densities for both participants. Indeed, as pointed out in section Future directions, we would expect a lower arterial density in frontal and posterior areas than in middle areas. The sinuses (and other large false-positive veins) in Participant II have been removed as outlined and discussed in sections Removal of pial veins and Challenges for vessel segmentation algorithms, respectively.

      7) One of the main practical limitations of the proposed method is the use of a very small imaging slab. It is mentioned in the discussion that thicker slabs are not only possible, but beneficial both in terms of SNR and acceleration possibilities. What are the limitations that prevented their use in the present study? With the current approach, what would be the estimated time needed to acquire the vascular map of an entire brain? It would also be good to indicate whether specific processing was needed to stitch together the multiple slab images in Fig. 6-9, S2.

      Time-of-flight acquisitions are commonly performed with thin acquisition slabs, following initial investigations by Parker et al. (1991) to maximise vessel sensitivity and minimize noise. We therefore followed this practice for our initial investigations but wanted to point out in the discussion that thicker slabs might provide several advantages that need to be evaluated in future studies. This would include theoretical and empirical evaluations balancing SNR gains from larger excitation volumes and SNR losses due to more acceleration. For this study, we have chosen the slab thickness so as to keep the acquisition time reasonable and minimize motion artefacts (as outlined in the Discussion). In addition, due to the extreme matrix sizes, in particular for the 0.14 mm acquisition, we were also limited in the number of data points per image that can be indexed. Overcoming this limit would require even more substantial changes to the sequence than what we have already performed. With 16 slabs, assuming optimal FOV orientation, full-brain coverage including the cerebellum of 95 % of the population (Mennes et al., 2014) could be achieved with an acquisition time of 16 × 11 min 42 s = 3 h 7 min 12 s at 0.16 mm isotropic voxel size. No stitching of the individual slabs was performed, as subject motion was minimal. We have added a corresponding comment in the Data Analysis.
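The whole-brain acquisition time quoted above is the per-slab time multiplied by the number of slabs. A short Python check, with a hypothetical function name, uses only the values stated in the response:

```python
from datetime import timedelta


def total_acquisition_time(n_slabs, time_per_slab):
    """Total scan time for a number of equally long slab acquisitions."""
    return n_slabs * time_per_slab


per_slab = timedelta(minutes=11, seconds=42)
total = total_acquisition_time(16, per_slab)
print(total)  # prints 3:07:12
```

The estimate covers acquisition only; it does not include repositioning, shimming or any other time between slabs.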

      "Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied as subject motion was minimal. The Matlab code describing the segmentation algorithm as well es the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in the github repository (https://gitlab.com/SaskiaB/pialvesseltof.git)."

      8) Some researchers and clinicians will argue that you can attain best results with anisotropic voxels, combining higher SNR and higher resolution. It would be good to briefly mention why isotropic voxels are preferred here, and whether anisotropic voxels would make sense at all in this context.

      Anisotropic voxels can be advantageous if the underlying object is anisotropic, e.g. an artery running straight through the slab, which would have a certain diameter (imaged using the high-resolution plane) and an ‘infinite’ elongation (in the low-resolution direction). However, the vessels targeted here can have any orientation and curvature; an anisotropic acquisition could therefore introduce a bias favouring vessels with a particular orientation relative to the voxel grid. Note that the same argument applies when answering the question why a further reduction in slab thickness would eventually result in less increase in FRE (section Introducing a partial-volume model). We have added a corresponding comment in our discussion on practical imaging considerations:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and—as we reduced the voxel size in the experiments—we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is not preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and a larger field-of-view to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      Reviewer #2 (Public Review):

      Overview

      This paper explores the use of inflow contrast MRI for imaging the pial arteries. The paper begins by providing a thorough background description of pial arteries, including past studies investigating the velocity and diameter. Following this, the authors consider this information to optimize the contrast between pial arteries and background tissue. This analysis reveals spatial resolution to be a strong factor influencing the contrast of the pial arteries. Finally, experiments are performed on a 7T MRI to investigate the effect of spatial resolution by acquiring images at multiple resolutions, the feasibility of acquiring ultrahigh resolution 3D TOF, the effect of displacement artifacts, and the prospect of using T2* to remove venous voxels.

      Impression

      There is certainly interest in tools to improve our understanding of the architecture of the small vessels of the brain and this work does address this. The background description of the pial arteries is very complete and the manuscript is very well prepared. The images are also extremely impressive, likely benefiting from motion correction, 7T, and a very long scan time. The authors also commit to open science and provide the data in an open platform. Given this, I do feel the manuscript to be of value to the community; however, there are concerns with the methods for optimization, the qualitative nature of the experiments, and conclusions drawn from some of the experiments.

      Specific Comments :

      1) Figure 3 and Theory surrounding. The optimization shown in Figure 3 is based on fixing the flip angle or the TR. As is well described in the literature, there is a strong interdependency of flip angle and TR. This is all well described in literature dating back to the early 90s. While I think it reasonable to consider these effects in optimization, the language needs to include this interdependency or simply reference past work and specify how the flip angle was chosen. The human experiments do not include any investigation of flip angle or TR optimization.

      We thank the reviewer for raising this valuable point, and we fully agree that there is an interdependency between these two parameters. To simplify our optimization, we did fix one parameter value at a time, but in the revised manuscript we clarified that both parameters can be optimized simultaneously. Importantly, a large range of parameter values will result in a similar FRE in the small artery regime, which is illustrated in the optimization provided in the main text. We have therefore chosen the repetition time based on encoding efficiency and then set a corresponding excitation flip angle. In addition, we have also provided additional simulations in the supplementary material outlining the interdependency for the case of pial arteries.

      "Optimization of repetition time and excitation flip angle

      As the main goal of the optimisation here was to start within an already established parameter range for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007), we only needed to then further tailor these for small arteries by considering a third parameter, namely the blood delivery time. From a practical perspective, a TR of 20 ms as a reference point was favourable, as it offered a time-efficient readout minimizing wait times between excitations but allowing low encoding bandwidths to maximize SNR. Due to the interdependency of flip angle and repetition time, for any one blood delivery time any FRE could (in theory) be achieved. For example, a similar FRE curve at 18 ° flip angle and 5 ms TR can also be achieved at 28 ° flip angle and 20 ms TR; or the FRE curve at 18 ° flip angle and 30 ms TR is comparable to the FRE curve at 8 ° flip angle and 5 ms TR (Supplementary Figure 3 TOP). In addition, the difference between optimal parameter settings diminishes for long blood delivery times, such that at a blood delivery time of 500 ms (Supplementary Figure 3 BOTTOM), the optimal flip angle at a TR of 15 ms, 20 ms or 25 ms would be 14 °, 16 ° and 18 °, respectively. This is in contrast to a blood delivery time of 100 ms, where the optimal flip angles would be 32 °, 37 ° and 41 °. In conclusion, in the regime of small arteries, long TR values in combination with low flip angles ensure flow-related enhancement at blood delivery times of 200 ms and above, and within this regime there are marginal gains by further optimizing parameter values and the optimal values are all similar."

      Supplementary Figure 3: Optimal imaging parameters for small arteries. This assessment follows the simulations presented in Figure 3, but in addition shows the interdependency for the corresponding third parameter (either flip angle or repetition time). TOP: Flip angles close to the Ernst angle show only a marginal flow-related enhancement; however, the influence of the blood delivery time decreases further (LEFT). As the flip angle increases well above the values used in this study, the flow-related enhancement in the small artery regime remains low even for the longer repetition times considered here (RIGHT). BOTTOM: The optimal excitation flip angle shows reduced variability across repetition times in the small artery regime compared to shorter blood delivery times.

      "Based on these equations, optimal T_R and excitation flip angle values (θ) can be calculated for the blood delivery times under consideration (Figure 3). To better illustrate the regime of small arteries, we have illustrated the effect of either flip angle or T_R while keeping the other parameter values fixed to the value that was ultimately used in the experiments; although both parameters can also be optimized simultaneously (Haacke et al., 1990). Supplementary Figure 3 further delineates the interdependency between flip angle and T_R within a parameter range commonly used for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007). Note how longer T_R values still provide an FRE effect even at very long blood delivery times, whereas using shorter T_R values can suppress the FRE effect (Figure 3, left). Similarly, at lower flip angles the FRE effect is still present for long blood delivery times, but it is not available anymore at larger flip angles, which, however, would give maximum FRE for shorter blood delivery times (Figure 3, right). Due to the non-linear relationships of both blood delivery time and flip angle with FRE, the optimal imaging parameters deviate considerably when comparing blood delivery times of 100 ms and 300 ms, but the differences between 300 ms and 1000 ms are less pronounced. In the following simulations and measurements, we have thus used a T_R value of 20 ms, i.e. a value only slightly longer than the readout of the high-resolution TOF acquisitions, which allowed time-efficient data acquisition, and a nominal excitation flip angle of 18°. From a practical standpoint, these values are also favorable as the low flip angle reduces the specific absorption rate (Fiedler et al., 2018) and the long T_R value decreases the potential for peripheral nerve stimulation (Mansfield and Harvey, 1993)."

      2) Figure 4 and Theory surrounding. A major limitation of this analysis is the lack of inclusion of noise in the analysis. I believe the results to be obvious that the FRE will be modulated by partial volume effects, here described quadratically by assuming the vessel to pass through the voxel. This would substantially modify the analysis, with a shift towards higher voxel volumes (scan time being equal). The authors suggest the FRE to be the dominant factor affecting segmentation; however, segmentation is limited by noise as much as contrast.

      We of course agree with the reviewer that contrast-to-noise ratio is a key factor that determines the detection of vessels and the quality of the segmentation; however, there are subtleties regarding the exact inter-relationship between CNR, resolution, and segmentation performance.

      The main purpose of Figure 4 is not to provide a trade-off between flow-related enhancement and signal-to-noise ratio—in particular as SNR is modulated by many more factors than voxel size alone, e.g. acquisition time, coil geometry and instrumentation—but to decide whether the limiting factor for imaging pial arteries is the reduction in flow-related enhancement due to long blood delivery times (which is the explanation often found in the literature (Chen et al., 2018; Haacke et al., 1990; Masaryk et al., 1989; Mut et al., 2014; Park et al., 2020; Parker et al., 1991; Wilms et al., 2001; Wright et al., 2013)) or due to partial volume effects. Furthermore, when reducing voxel size one will also likely increase the number of encoding steps to maintain the imaging coverage (i.e., the field-of-view) and so the relationship between voxel size and SNR in practice is not straightforward. Therefore, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data due to the complex relationship between voxel size and other imaging parameters. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study, namely that it provides an expression for how relative FRE contrast changes with voxel size with some assumptions that apply for imaging pial arteries.

      Further, depending on the definition of FRE and whether partial-volume effects are included (see also our response to R2.8), larger voxel volumes have been found to be theoretically advantageous even when only considering contrast (Du et al., 1996; Venkatesan and Haacke, 1997), which is not in line with empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007).

      The notion that vessel segmentation algorithms perform well on noisy data but poorly on low-contrast data was mainly driven by our own experiences. However, we still believe that the assumption that (all) segmentation algorithms are linearly dependent on contrast and noise (which the formulation of a contrast-to-noise ratio presumes) is similarly not warranted. Indeed, the necessary trade-off between FRE and SNR might be specific to the particular segmentation algorithm being used rather than a general property of the acquisition. Please also note that our analysis of the FRE does not suggest that an arbitrarily high resolution is needed. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful (it only reduces SNR without any gain in contrast) and may hinder segmentation performance, thus becoming counterproductive. We take the reviewer’s point and acknowledge that these intricacies need to be mentioned; we have therefore rephrased the statement in the discussion in the following way:

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      3) Page 11, Line 225. "only a fraction of the blood is replaced" I think the language should be reworded. There are certainly water molecules in blood which have experienced more excitation B1 pulses due to the parabolic flow upstream and the temporal variation in flow. There is magnetization diffusion which reduces the discrepancy; however, it seems pertinent to just say the authors assume the signal is represented by the average arrival time. This analysis is never verified and is only approximate anyway. The "blood dwell time" is also an average since voxels near the wall will travel more slowly. Overall, I recommend reducing the conjecture in this section.

      We fully agree that our treatment of the blood dwell time does not account for the much more complex flow patterns found in cortical arteries. However, our aim was not to comment on these complex patterns, but to help establish whether, in the simplest scenario assuming plug flow, the often-mentioned slow blood flow requires multiple velocity compartments to describe the FRE (as is commonly done for 2D MRA (Brown et al., 2014a; Carr and Carroll, 2012)). We did not intend to comment on the effects of laminar flow or even more complex flow patterns, which would require a more in-depth treatment. However, as the small arteries targeted here are often just one voxel thick, all signals are indeed integrated within that voxel (i.e. there is no voxel near the wall that travels more slowly), which may average out more complex effects. We have clarified the purpose and scope of this section in the following way:

      "In classical descriptions of the FRE effect (Brown et al., 2014a; Carr and Carroll, 2012), significant emphasis is placed on the effect of multiple “velocity segments” within a slice in the 2D imaging case. Using the simplified plug-flow model, where the cross-sectional profile of blood velocity within the vessel is constant and effects such as drag along the vessel wall are not considered, these segments can be described as ‘disks’ of blood that do not completely traverse through the full slice within one T_R, and, thus, only a fraction of the blood in the slice is replaced. Consequently, estimation of the FRE effect would then need to accommodate contribution from multiple ‘disks’ that have experienced 1 to k RF pulses. In the case of 3D imaging as employed here, multiple velocity segments within one voxel are generally not considered, as the voxel sizes in 3D are often smaller than the slice thickness in 2D imaging and it is assumed that the blood completely traverses through a voxel each T_R. However, the question arises whether this assumption holds for pial arteries, where blood velocity is considerably lower than in intracranial vessels (Figure 2). To answer this question, we have computed the blood dwell time , i.e. the average time it takes the blood to traverse a voxel, as a function of blood velocity and voxel size (Figure 2). For reference, the blood velocity estimates from the three studies mentioned above (Bouvy et al., 2016; Kobari et al., 1984; Nagaoka and Yoshida, 2006) have been added in this plot as horizontal white lines. For the voxel sizes of interest here, i.e. 50–300 μm, blood dwell times are, for all but the slowest flows, well below commonly used repetition times (Brown et al., 2014a; Carr and Carroll, 2012; Ladd, 2007; von Morze et al., 2007). 
Thus, in a first approximation using the plug-flow model, it is not necessary to include several velocity segments for the voxel sizes of interest when considering pial arteries, as one might expect from classical treatments, and the FRE effect can be described by equations (1) – (3), simplifying our characterization of FRE for these vessels. When considering the effect of more complex flow patterns, it is important to bear in mind that the arteries targeted here are only one-voxel thick, and signals are integrated across the whole artery."
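
      Under the plug-flow assumption, the blood dwell time discussed above is simply the voxel edge length divided by the blood velocity. The following minimal sketch compares it with a T_R of 20 ms; the velocities are hypothetical example values, not the estimates from the cited studies, and the function name is illustrative.

```python
def blood_dwell_time_ms(voxel_size_um: float, velocity_mm_per_s: float) -> float:
    """Average time in ms for plug-flow blood to traverse one isotropic voxel."""
    voxel_size_mm = voxel_size_um / 1000.0
    return voxel_size_mm / velocity_mm_per_s * 1000.0

TR_MS = 20.0  # repetition time used in the study

for voxel_um in (50, 100, 200, 300):
    for velocity in (2.0, 10.0, 40.0):  # hypothetical velocities in mm/s
        dwell = blood_dwell_time_ms(voxel_um, velocity)
        # If the dwell time is below T_R, the blood in the voxel is fully replaced each T_R.
        status = "replaced each T_R" if dwell < TR_MS else "spans several T_R"
        print(f"{voxel_um:>3} um, {velocity:>4} mm/s: {dwell:6.1f} ms ({status})")
```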

      4) Page 13, Line 260. "two-compartment modelling" I think this section is better labeled "Extension to consider partial volume effects" The compartments are not interacting in any sense in this work.

      Thank you for this suggestion. We have replaced the heading with Introducing a partial-volume model (page 14) and replaced all instances of ‘two-compartment model’ with ‘partial-volume model’.

      5) Page 14, Line 284. "In practice, a reduction in slab …." "reducing the voxel size is a much more promising avenue" There is a fair amount of conjecture here which is not supported by experiments. While this may be true, the authors also use a classical approach with quite thin slabs.

      The slab thickness used in our experiments was mainly limited by the acquisition time and the participant’s ability to lie still. We indeed performed one measurement with a very experienced participant with a thicker slab, but found that with over 20 minutes acquisition time, motion artefacts were unavoidable. The data presented in Figure 5 were acquired with similar slab thickness, supporting the statement that reducing the voxel size is a promising avenue for imaging small pial arteries. However, we indeed have not provided an empirical comparison of the effect of slab thickness. Nevertheless, we believe it remains useful to make the theoretical argument that due to the convoluted nature of the pial arterial vascular geometry, a reduction in slab thickness may not reduce the acquisition time if no reduction in intra-slab vessel length can be achieved, i.e. if the majority of the artery is still contained in the smaller slab. We have clarified the statement and removed the direct comparison (‘much more’ promising) in the following way:

      "In theory, a reduction in blood delivery time increases the FRE in both regimes, and—if the vessel is smaller than the voxel—so would a reduction in voxel size. In practice, a reduction in slab thickness―which is the default strategy in classical TOF-MRA to reduce blood delivery time―might not provide substantial FRE increases for pial arteries. This is due to their convoluted geometry (see section Anatomical architecture of the pial arterial vasculature), where a reduction in slab thickness may not necessarily reduce the vessel segment length if the majority of the artery is still contained within the smaller slab. Thus, given the small arterial diameter, reducing the voxel size is a promising avenue when imaging the pial arterial vasculature."

      6) Figure 5. These image differences are highly exaggerated by the lack of zero filling (or any interpolation) and the fact that the scan times are wildly different. The interpolation should be addressed, and the scan time discrepancy listed as a limitation.

      We have extended the discussion around zero-filling by including additional considerations based on the imaging parameters in Figure 5 and highlighted the substantial differences in voxel volume. Our choice not to perform zero-filling was driven by the open question of what an ‘optimal’ zero-filling factor would be. We have also highlighted the substantial differences in acquisition time when describing the results.

      Changes made to the results section:

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result."

      Changes made to the discussion section:

      "Nevertheless, slight qualitative improvements in image appearance have been reported for higher zero-filling factors (Du et al., 1994), presumably owing to a smoother representation of the vessels (Bartholdi and Ernst, 1973). In contrast, Mattern et al. (2018) reported no improvement in vessel contrast for their high-resolution data. Ultimately, for each application, e.g. visual evaluation vs. automatic segmentation, the optimal zero-filling factor needs to be determined, balancing image appearance (Du et al., 1994; Zhu et al., 2013) with loss in statistical independence of the image noise across voxels. For example, in Figure 5, when comparing across different voxel sizes, the visual impression might improve with zero-filling. However, it remains unclear whether the same zero-filling factor should be applied for each voxel size, which means that the overall difference in resolution remains, namely a nearly 20-fold reduction in voxel volume when moving from 0.8-mm isotropic to 0.3-mm isotropic voxel size. Alternatively, the same ’zero-filled’ voxel sizes could be used for evaluation, although then nearly 94 % of the samples used to reconstruct the image with 0.8-mm voxel size would be zero-valued for a 0.3-mm isotropic resolution. Consequently, all data presented in this study were reconstructed without zero-filling."

      7) Figure 7. Given the limited nature of experiment may it not also be possible the subject moved more, had differing brain blood flow, etc. Were these lengthy scans acquired in the same session? Many of these differences could be attributed to other differences than the small difference in spatial resolution.

      The scans were acquired in the same session using the same prospective motion correction procedure. Note that the acquisition time of the images with 0.16 mm isotropic voxel size was comparatively short, taking just under 12 minutes. Although the difference in spatial resolution may seem small, it still amounts to a 33% reduction in voxel volume. For comparison, reducing the voxel size from 0.4 mm to 0.3 mm also ‘only’ reduces the voxel volume by 58 %—not even twice as much. Overall, we fully agree that additional validation and optimisation of the imaging parameters for pial arteries are beneficial and have added a corresponding statement to the Discussion section.
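
      The percentages above follow from the cubic dependence of voxel volume on the isotropic edge length. A minimal check, with an illustrative function name:

```python
def volume_reduction_percent(old_mm: float, new_mm: float) -> float:
    """Percentage reduction in isotropic voxel volume when the edge length shrinks."""
    return (1.0 - (new_mm / old_mm) ** 3) * 100.0

print(round(volume_reduction_percent(0.16, 0.14)))  # 33
print(round(volume_reduction_percent(0.4, 0.3)))    # 58
```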

      Changes made to the results section (also in response to Reviewer 1 (R1.22))

      "We have also acquired one single slab with an isotropic voxel size of 0.16 mm with prospective motion correction for this participant in the same session to compare to the acquisition with 0.14 mm isotropic voxel size and to test whether any gains in FRE are still possible at this level of the vascular tree."

      Changes made to the discussion section:

      "Acquiring these data at even higher field strengths would boost SNR (Edelstein et al., 1986; Pohmann et al., 2016) to partially compensate for SNR losses due to acceleration and may enable faster imaging and/or smaller voxel sizes. This could facilitate the identification of the ultimate limit of the flow-related enhancement effect and identify at which stage of the vascular tree does the blood delivery time become the limiting factor. While Figure 7 indicates the potential for voxel sizes below 0.16 mm, the singular nature of this comparison warrants further investigations."

      8) Page 22, Line 395. Would the analysis be any different with an absolute difference? The FRE (Eq 6) divides by a constant value. Clearly there is value in the difference as other subtractive inflow imaging would have infinite FRE (not considering noise as the authors do).

      Absolutely; using an absolute FRE would result in the highest FRE for the largest voxel size, whereas in our data small vessels are more easily detected with the smallest voxel size. We also note that relative FRE would indeed become infinite if the value in the denominator representing the tissue signal was zero, but this special case highlights how relative FRE can help characterize “segmentability”: a vessel with any intensity surrounded by tissue with an intensity of zero is trivially/infinitely segmentable. We have added this point to the revised manuscript as indicated below.

      Following the suggestion of Reviewer 1 (R1.2), we have included additional simulations to clarify the effects of relative FRE definition and partial-volume model, in which we show that only when considering both together are smaller voxel sizes advantageous (Supplementary Material).

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the effect of these two definitions, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm and 2 000 µm (i.e. no partial-volume effects). The absolute FRE explicitly takes the voxel volume into account, i.e. instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      Note that the division by

      to obtain the relative FRE removes the contribution of the total voxel volume

      "Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of the artery considered here.
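
      The opposite voxel-size dependence of the two definitions can be made explicit with a short derivation. The sketch below is a reconstruction under stated assumptions rather than the manuscript's own equations: the vessel is a centred cylinder with diameter $d_{vessel} \le l_{voxel}$, the relative FRE is the partial-volume signal difference divided by the tissue signal, and the absolute FRE is that difference scaled by the voxel volume $l_{voxel}^3$.

```latex
% Blood volume fraction of a centred cylinder in a cubic voxel (d_vessel <= l_voxel)
V_{rel}^{blood} = \frac{\pi \, (d_{vessel}/2)^2 \, l_{voxel}}{l_{voxel}^3}
                = \frac{\pi \, d_{vessel}^2}{4 \, l_{voxel}^2}

% Signal difference in the partial-volume model
M_z^{total} - M_{zS}^{tissue}
  = V_{rel}^{blood} \left( M_z^{blood} - M_{zS}^{tissue} \right)

% Relative FRE: falls with the square of the voxel size
FRE_{rel} = \frac{\pi \, d_{vessel}^2}{4 \, l_{voxel}^2} \cdot
            \frac{M_z^{blood} - M_{zS}^{tissue}}{M_{zS}^{tissue}}

% Absolute FRE (assumed scaling by voxel volume): grows linearly with voxel size
FRE_{abs} = l_{voxel}^3 \left( M_z^{total} - M_{zS}^{tissue} \right)
          = \frac{\pi \, d_{vessel}^2 \, l_{voxel}}{4}
            \left( M_z^{blood} - M_{zS}^{tissue} \right)
```

      Without partial-volume effects ($V_{rel}^{blood} = 1$), the relative FRE no longer depends on $l_{voxel}$, whereas the absolute FRE scales with $l_{voxel}^3$. This matches the behaviour described for panels B and D.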

      Following the established literature (Brown et al., 2014a; Carr and Carroll, 2012; Haacke et al., 1990) and because we would ultimately derive a relative measure, we have omitted the effect of voxel volume on the longitudinal magnetization in our derivations, which makes it appear as if we are dividing by a constant in Eq. 6, as the effect of total voxel volume cancels out for the relative FRE. We have now made this more explicit in our derivation of the partial volume model.

      "Introducing a partial-volume model

      To account for the effect of voxel volume on the FRE, the total longitudinal magnetization M_z needs to also consider the number of spins contained within a voxel (Du et al., 1996; Venkatesan and Haacke, 1997). A simple approximation can be obtained by scaling the longitudinal magnetization with the voxel volume (Venkatesan and Haacke, 1997). To then include partial volume effects, the total longitudinal magnetization in a voxel M_z^total becomes the sum of the contributions from the stationary tissue M_zS^tissue and the inflowing blood M_z^blood, weighted by their respective volume fractions V_rel:"

      M_z^total = V_rel^blood · M_z^blood + V_rel^tissue · M_zS^tissue      Eq. (4)

      For simplicity, we assume a single vessel is located at the center of the voxel and approximate it to be a cylinder with diameter d_vessel and length l_voxel of an assumed isotropic voxel along one side. The relative volume fraction of blood V_rel^blood is the ratio of vessel volume within the voxel to total voxel volume (see section Estimation of vessel-volume fraction in the Supplementary Material), and the tissue volume fraction V_rel^tissue is the remainder that is not filled with blood, or

      V_rel^tissue = 1 − V_rel^blood      Eq. (5)

      We can now replace the blood magnetization in equation Eq. (3) with the total longitudinal magnetization of the voxel to compute the FRE as a function of vessel-volume fraction:

      FRE = (M_z^total − M_zS^tissue) / M_zS^tissue      Eq. (6)
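
      The partial-volume model described above can be sketched numerically as follows. The blood volume fraction of a centred cylinder is clamped at 1 once the closed-form expression exceeds 1, which is a simplification of the geometry treated in the Supplementary Material. The magnetization values are hypothetical placeholders, and the function names are illustrative.

```python
import math

def blood_volume_fraction(d_vessel_um: float, l_voxel_um: float) -> float:
    """Volume fraction of a centred cylindrical vessel in a cubic voxel.
    Clamped at 1 for vessels wider than the voxel (simplifying assumption)."""
    return min(1.0, math.pi * d_vessel_um ** 2 / (4.0 * l_voxel_um ** 2))

def fre_partial_volume(mz_blood: float, mz_tissue: float,
                       d_vessel_um: float, l_voxel_um: float) -> float:
    """Relative FRE of a voxel containing both blood and stationary tissue."""
    v_blood = blood_volume_fraction(d_vessel_um, l_voxel_um)
    v_tissue = 1.0 - v_blood
    mz_total = v_blood * mz_blood + v_tissue * mz_tissue  # volume-weighted sum
    return (mz_total - mz_tissue) / mz_tissue

# Hypothetical magnetizations (fractions of M0) for blood and tissue.
MZ_BLOOD, MZ_TISSUE = 0.62, 0.17

for l_voxel in (800, 500, 400, 300, 200):
    fre = fre_partial_volume(MZ_BLOOD, MZ_TISSUE, 200, l_voxel)
    print(f"{l_voxel} um voxel, 200 um artery: relative FRE = {fre:.2f}")
```

      For a 200 µm artery the relative FRE rises steadily as the voxel shrinks from 800 µm towards the vessel diameter, which is the behaviour the model is meant to capture.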

      Based on your suggestion, we have also extended our interpretation of relative and absolute FRE. Indeed, a subtractive flow technique where no signal in the background remains and only intensities in the object are present would have infinite relative FRE, as this basically constitutes a perfect segmentation (bar a simple thresholding step).

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 9). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."
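
      As a rough numerical illustration of the distinction drawn above (a sketch under stated assumptions, not the simulation used in the manuscript), the snippet below evaluates both definitions for a straight cylindrical vessel passing through a cubic voxel. The geometry, the function names and the magnetization values are hypothetical and chosen only to make the two trends visible.

```python
import math


def vessel_fraction(diameter_mm, voxel_mm):
    """Fraction of a cubic voxel filled by a straight cylinder running
    through it along one axis (assumed geometry), capped at 1."""
    return min(1.0, math.pi * diameter_mm ** 2 / (4.0 * voxel_mm ** 2))


def fre(diameter_mm, voxel_mm, m_blood=1.0, m_tissue=0.2):
    """Return (absolute, relative) FRE for one voxel.

    m_blood and m_tissue are hypothetical magnetizations per unit volume.
    """
    f = vessel_fraction(diameter_mm, voxel_mm)
    # Partial-volume mixing: the voxel signal per unit volume is
    # f * m_blood + (1 - f) * m_tissue, so its excess over pure tissue is:
    excess = f * (m_blood - m_tissue)
    # Absolute FRE: total magnetization difference, scales with voxel volume.
    absolute = voxel_mm ** 3 * excess
    # Relative FRE: excess normalised by the tissue signal.
    if m_tissue == 0:
        relative = math.inf if excess > 0 else 0.0
    else:
        relative = excess / m_tissue
    return absolute, relative


for d in (0.2, 3.0):              # small pial artery vs. large artery (mm)
    for a in (0.8, 0.4, 0.2):     # isotropic voxel edge length (mm)
        absolute, relative = fre(d, a)
        print(f"d={d} mm, voxel={a} mm: abs={absolute:.4f}, rel={relative:.2f}")
```

      Under these assumptions, the small vessel shows a rising relative FRE and a falling absolute FRE as the voxel shrinks, whereas for the large vessel the relative FRE stays constant and only the absolute FRE changes with voxel size.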

      9) Page 22, Line 400. "The appropriateness of " This also ignores noise. The absolute enhancement is the inherent magnetization available. The results in Figure 5, 6, 7 don't readily support a ratio over an absolute difference accounting for partial volume effects.

      We hope that the additional explanations of the relative FRE definition in combination with a partial-volume model, the interpretation of relative FRE provided in the previous response (R2.8), and the observation that Figures 5, 6 and 7 show smaller arteries for smaller voxels clarify our argument that only relative FRE in combination with a partial-volume model can explain why smaller voxel sizes are advantageous for depicting small arteries.

      While we appreciate that there exists a fundamental relationship between SNR and voxel volume in MR (Brown et al., 2014b), this relationship is also modulated by many more factors (as we have argued in our responses to R2.2 and R1.4b).

      We hope that the additional derivations and simulations provided in the previous response have clarified why a relative FRE model in combination with a partial-volume model helps to explain the enhanced detectability of small vessels with small voxels.

      10) Page 24, Line 453. "strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact" These do observe flow related distortions as well, just not typically called displacement.

      Yes, this is a helpful point, as these methods will also experience a degradation of spatial accuracy due to flow effects, which will propagate into errors in the segmentation.

      As the reviewer suggests, flow-related artefacts in radial and spiral acquisitions usually manifest as a slight blur, and less as the prominent displacement found in Cartesian sampling schemes. We have added a corresponding clarification to the Discussion section:

      "Other encoding strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact because phase and frequency encoding take place in the same instant; although a slight blur might be observed instead (Nishimura et al., 1995, 1991). However, both trajectories pose engineering challenges and much higher demands on hardware and reconstruction algorithms than the Cartesian readouts employed here (Kasper et al., 2018; Shu et al., 2016); particularly to achieve 3D acquisitions with 160 µm isotropic resolution."

      11) Page 24, Line 272. "although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated" This is certainly a potential source of bias in the comparisons.

      We apologize if this section was written in a misleading way. For the comparison presented in Figure 7, we acquired one additional slab in the same session at 0.16 mm voxel size using the same prospective motion correction procedure as for the 0.14 mm data. For the images shown in Figure 6 and Supplementary Figure 4 at 0.16 mm voxel size, we did not use a motion correction system and, thus, had to discard a portion of the data. We have clarified in the Discussion section that prospective motion correction was used for both resolutions in the comparison of the high-resolution data:

      "This allowed for the successful correction of head motion of approximately 1 mm over the 60-minute scan session, showing the utility of prospective motion correction at these very high resolutions. Note that for the comparison in Figure 7, one slab with 0.16 mm voxel size was acquired in the same session also using the prospective motion correction system. However, for the data shown in Figure 6 and Supplementary Figure 4, no prospective motion correction was used, and we instead relied on the experienced participants who contributed to this study. We found that the acquisition of TOF data with 0.16 mm isotropic voxel size in under 12 minutes acquisition time per slab is possible without discernible motion artifacts, although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated."

      12) Page 25, Line 489. "then need to include the effects of various analog and digital filters" While the analysis may benefit from some of this, most is not at all required for analysis based on optimization of the imaging parameters.

      We have included all four correction factors for completeness, given the unique acquisition parameter and contrast space our time-of-flight acquisition occupies, e.g. very low bandwidth of only 100 Hz, very large matrix sizes > 1024 samples, ideally zero SNR in the background (fully suppressed tissue signal). However, we agree that probably the most important factor is the non-central chi distribution of the noise in magnitude images from multiple-channel coil arrays, and have added this qualification in the text:

      "Accordingly, SNR predictions then need to include the effects of various analog and digital filters, the number of acquired samples, the noise covariance correction factor, and—most importantly—the non-central chi distribution of the noise statistics of the final magnitude image (Triantafyllou et al., 2011)."
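
      To illustrate why the noise statistics of the magnitude image matter when the background is ideally signal-free, the following sketch simulates pure complex Gaussian noise in a multi-channel array and combines the channels by root-sum-of-squares. The combination method, the channel count and the noise level are assumptions made for this example only; without signal, the non-central chi distribution reduces to a central chi distribution whose mean is clearly non-zero.

```python
import math
import random


def background_magnitude(n_channels, sigma, rng):
    """Root-sum-of-squares magnitude of pure complex Gaussian noise
    across n_channels receive channels (no signal present)."""
    total = 0.0
    for _ in range(n_channels):
        re = rng.gauss(0.0, sigma)
        im = rng.gauss(0.0, sigma)
        total += re * re + im * im
    return math.sqrt(total)


def expected_background(n_channels, sigma):
    """Mean of a central chi distribution with 2 * n_channels degrees
    of freedom, scaled by sigma."""
    n = n_channels
    # lgamma avoids overflow of the gamma function for large channel counts.
    ratio = math.exp(math.lgamma(n + 0.5) - math.lgamma(n))
    return sigma * math.sqrt(2.0) * ratio


rng = random.Random(0)            # fixed seed for reproducibility
n_channels, sigma = 32, 1.0       # hypothetical array size and noise level
samples = [background_magnitude(n_channels, sigma, rng) for _ in range(5000)]
simulated_mean = sum(samples) / len(samples)
print(simulated_mean, expected_background(n_channels, sigma))
```

      The simulated and analytical background means agree closely and sit far above zero, which is why a naive zero-mean Gaussian noise assumption would misestimate SNR in a magnitude image with a fully suppressed background.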

      Al-Kwifi, O., Emery, D.J., Wilman, A.H., 2002. Vessel contrast at three Tesla in time-of-flight magnetic resonance angiography of the intracranial and carotid arteries. Magnetic Resonance Imaging 20, 181–187. https://doi.org/10.1016/S0730-725X(02)00486-1

      Arts, T., Meijs, T.A., Grotenhuis, H., Voskuil, M., Siero, J., Biessels, G.J., Zwanenburg, J., 2021. Velocity and Pulsatility Measures in the Perforating Arteries of the Basal Ganglia at 3T MRI in Reference to 7T MRI. Frontiers in Neuroscience 15.

      Avants, B.B., Tustison, N., Song, G., 2009. Advanced normalization tools (ANTS). Insight j 2, 1–35.

      Bae, K.T., Park, S.-H., Moon, C.-H., Kim, J.-H., Kaya, D., Zhao, T., 2010. Dual-echo arteriovenography imaging with 7T MRI: CODEA with 7T. J. Magn. Reson. Imaging 31, 255–261. https://doi.org/10.1002/jmri.22019

      Bartholdi, E., Ernst, R.R., 1973. Fourier spectroscopy and the causality principle. Journal of Magnetic Resonance (1969) 11, 9–19. https://doi.org/10.1016/0022-2364(73)90076-0

      Bernier, M., Cunnane, S.C., Whittingstall, K., 2018. The morphology of the human cerebrovascular system. Human Brain Mapping 39, 4962–4975. https://doi.org/10.1002/hbm.24337

      Bouvy, W.H., Biessels, G.J., Kuijf, H.J., Kappelle, L.J., Luijten, P.R., Zwanenburg, J.J.M., 2014. Visualization of Perivascular Spaces and Perforating Arteries With 7 T Magnetic Resonance Imaging. Investigative Radiology 49, 307–313. https://doi.org/10.1097/RLI.0000000000000027

      Bouvy, W.H., Geurts, L.J., Kuijf, H.J., Luijten, P.R., Kappelle, L.J., Biessels, G.J., Zwanenburg, J.J.M., 2016. Assessment of blood flow velocity and pulsatility in cerebral perforating arteries with 7-T quantitative flow MRI: Blood Flow Velocity And Pulsatility In Cerebral Perforating Arteries. NMR Biomed. 29, 1295–1304. https://doi.org/10.1002/nbm.3306

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014a. Chapter 24 - MR Angiography and Flow Quantification, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 701–737. https://doi.org/10.1002/9781118633953.ch24

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014b. Chapter 15 - Signal, Contrast, and Noise, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 325–373. https://doi.org/10.1002/9781118633953.ch15

      Carr, J.C., Carroll, T.J., 2012. Magnetic resonance angiography: principles and applications. Springer, New York.

      Cassot, F., Lauwers, F., Fouard, C., Prohaska, S., Lauwers-Cances, V., 2006. A Novel Three-Dimensional Computer-Assisted Method for a Quantitative Study of Microvascular Networks of the Human Cerebral Cortex. Microcirculation 13, 1–18. https://doi.org/10.1080/10739680500383407

      Chen, L., Mossa-Basha, M., Balu, N., Canton, G., Sun, J., Pimentel, K., Hatsukami, T.S., Hwang, J.-N., Yuan, C., 2018. Development of a quantitative intracranial vascular features extraction tool on 3DMRA using semiautomated open-curve active contour vessel tracing: Comprehensive Artery Features Extraction From 3D MRA. Magn. Reson. Med 79, 3229–3238. https://doi.org/10.1002/mrm.26961

      Choi, U.-S., Kawaguchi, H., Kida, I., 2020. Cerebral artery segmentation based on magnetization-prepared two rapid acquisition gradient echo multi-contrast images in 7 Tesla magnetic resonance imaging. NeuroImage 222, 117259. https://doi.org/10.1016/j.neuroimage.2020.117259

      Conolly, S., Nishimura, D., Macovski, A., Glover, G., 1988. Variable-rate selective excitation. Journal of Magnetic Resonance (1969) 78, 440–458. https://doi.org/10.1016/0022-2364(88)90131-X

      Deistung, A., Dittrich, E., Sedlacik, J., Rauscher, A., Reichenbach, J.R., 2009. ToF-SWI: Simultaneous time of flight and fully flow compensated susceptibility weighted imaging. J. Magn. Reson. Imaging 29, 1478–1484. https://doi.org/10.1002/jmri.21673

      Detre, J.A., Leigh, J.S., Williams, D.S., Koretsky, A.P., 1992. Perfusion imaging. Magnetic Resonance in Medicine 23, 37–45. https://doi.org/10.1002/mrm.1910230106

      Du, Y., Parker, D.L., Davis, W.L., Blatter, D.D., 1993. Contrast-to-Noise-Ratio Measurements in Three-Dimensional Magnetic Resonance Angiography. Investigative Radiology 28, 1004–1009.

      Du, Y.P., Jin, Z., 2008. Simultaneous acquisition of MR angiography and venography (MRAV). Magn. Reson. Med. 59, 954–958. https://doi.org/10.1002/mrm.21581

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., 1994. Reduction of partial-volume artifacts with zero-filled interpolation in three-dimensional MR angiography. J. Magn. Reson. Imaging 4, 733–741. https://doi.org/10.1002/jmri.1880040517

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., Buswell, H.R., Goodrich, K.C., 1996. Experimental and theoretical studies of vessel contrast-to-noise ratio in intracranial time-of-flight MR angiography. Journal of Magnetic Resonance Imaging 6, 99–108. https://doi.org/10.1002/jmri.1880060120

      Duvernoy, H., Delon, S., Vannson, J.L., 1983. The Vascularization of The Human Cerebellar Cortex. Brain Research Bulletin 11, 419–480.

      Duvernoy, H.M., Delon, S., Vannson, J.L., 1981. Cortical blood vessels of the human brain. Brain Research Bulletin 7, 519–579. https://doi.org/10.1016/0361-9230(81)90007-1

      Eckstein, K., Bachrata, B., Hangel, G., Widhalm, G., Enzinger, C., Barth, M., Trattnig, S., Robinson, S.D., 2021. Improved susceptibility weighted imaging at ultra-high field using bipolar multi-echo acquisition and optimized image processing: CLEAR-SWI. NeuroImage 237, 118175. https://doi.org/10.1016/j.neuroimage.2021.118175

      Edelstein, W.A., Glover, G.H., Hardy, C.J., Redington, R.W., 1986. The intrinsic signal-to-noise ratio in NMR imaging. Magn. Reson. Med. 3, 604–618. https://doi.org/10.1002/mrm.1910030413

      Fan, A.P., Govindarajan, S.T., Kinkel, R.P., Madigan, N.K., Nielsen, A.S., Benner, T., Tinelli, E., Rosen, B.R., Adalsteinsson, E., Mainero, C., 2015. Quantitative oxygen extraction fraction from 7-Tesla MRI phase: reproducibility and application in multiple sclerosis. J Cereb Blood Flow Metab 35, 131–139. https://doi.org/10.1038/jcbfm.2014.187

      Fiedler, T.M., Ladd, M.E., Bitz, A.K., 2018. SAR Simulations & Safety. NeuroImage 168, 33–58. https://doi.org/10.1016/j.neuroimage.2017.03.035

      Frässle, S., Aponte, E.A., Bollmann, S., Brodersen, K.H., Do, C.T., Harrison, O.K., Harrison, S.J., Heinzle, J., Iglesias, S., Kasper, L., Lomakina, E.I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F.H., Raman, S., Schöbi, D., Toussaint, B., Weber, L.A., Yao, Y., Stephan, K.E., 2021. TAPAS: An Open-Source Software Package for Translational Neuromodeling and Computational Psychiatry. Front. Psychiatry 12. https://doi.org/10.3389/fpsyt.2021.680811

      Gulban, O.F., Bollmann, S., Huber, R., Wagstyl, K., Goebel, R., Poser, B.A., Kay, K., Ivanov, D., 2021. Mesoscopic Quantification of Cortical Architecture in the Living Human Brain. https://doi.org/10.1101/2021.11.25.470023

      Haacke, E.M., Masaryk, T.J., Wielopolski, P.A., Zypman, F.R., Tkach, J.A., Amartur, S., Mitchell, J., Clampitt, M., Paschal, C., 1990. Optimizing blood vessel contrast in fast three-dimensional MRI. Magn. Reson. Med. 14, 202–221. https://doi.org/10.1002/mrm.1910140207

      Helthuis, J.H.G., van Doormaal, T.P.C., Hillen, B., Bleys, R.L.A.W., Harteveld, A.A., Hendrikse, J., van der Toorn, A., Brozici, M., Zwanenburg, J.J.M., van der Zwan, A., 2019. Branching Pattern of the Cerebral Arterial Tree. Anat Rec 302, 1434–1446. https://doi.org/10.1002/ar.23994

      Heverhagen, J.T., Bourekas, E., Sammet, S., Knopp, M.V., Schmalbrock, P., 2008. Time-of-Flight Magnetic Resonance Angiography at 7 Tesla. Investigative Radiology 43, 568–573. https://doi.org/10.1097/RLI.0b013e31817e9b2c

      Hirsch, S., Reichold, J., Schneider, M., Székely, G., Weber, B., 2012. Topology and Hemodynamics of the Cortical Cerebrovascular System. J Cereb Blood Flow Metab 32, 952–967. https://doi.org/10.1038/jcbfm.2012.39

      Horn, B.K.P., Schunck, B.G., 1981. Determining optical flow. Artificial Intelligence 17, 185–203. https://doi.org/10.1016/0004-3702(81)90024-2

      Huck, J., Wanner, Y., Fan, A.P., Jäger, A.-T., Grahl, S., Schneider, U., Villringer, A., Steele, C.J., Tardif, C.L., Bazin, P.-L., Gauthier, C.J., 2019. High resolution atlas of the venous brain vasculature from 7 T quantitative susceptibility maps. Brain Struct Funct 224, 2467–2485. https://doi.org/10.1007/s00429-019-01919-4

      Johst, S., Wrede, K.H., Ladd, M.E., Maderwald, S., 2012. Time-of-Flight Magnetic Resonance Angiography at 7 T Using Venous Saturation Pulses With Reduced Flip Angles. Investigative Radiology 47, 445–450. https://doi.org/10.1097/RLI.0b013e31824ef21f

      Kang, C.-K., Park, C.-A., Kim, K.-N., Hong, S.-M., Park, C.-W., Kim, Y.-B., Cho, Z.-H., 2010. Non-invasive visualization of basilar artery perforators with 7T MR angiography. Journal of Magnetic Resonance Imaging 32, 544–550. https://doi.org/10.1002/jmri.22250

      Kasper, L., Engel, M., Barmet, C., Haeberlin, M., Wilm, B.J., Dietrich, B.E., Schmid, T., Gross, S., Brunner, D.O., Stephan, K.E., Pruessmann, K.P., 2018. Rapid anatomical brain imaging using spiral acquisition and an expanded signal model. NeuroImage 168, 88–100. https://doi.org/10.1016/j.neuroimage.2017.07.062

      Klepaczko, A., Szczypiński, P., Deistung, A., Reichenbach, J.R., Materka, A., 2016. Simulation of MR angiography imaging for validation of cerebral arteries segmentation algorithms. Computer Methods and Programs in Biomedicine 137, 293–309. https://doi.org/10.1016/j.cmpb.2016.09.020

      Kobari, M., Gotoh, F., Fukuuchi, Y., Tanaka, K., Suzuki, N., Uematsu, D., 1984. Blood Flow Velocity in the Pial Arteries of Cats, with Particular Reference to the Vessel Diameter. J Cereb Blood Flow Metab 4, 110–114. https://doi.org/10.1038/jcbfm.1984.15

      Ladd, M.E., 2007. High-Field-Strength Magnetic Resonance: Potential and Limits. Top Magn Reson Imaging 18, 139–152.

      Lesage, D., Angelini, E.D., Bloch, I., Funka-Lea, G., 2009. A review of 3D vessel lumen segmentation techniques: Models, features and extraction schemes. Medical Image Analysis 13, 819–845. https://doi.org/10.1016/j.media.2009.07.011

      Maderwald, S., Ladd, S.C., Gizewski, E.R., Kraff, O., Theysohn, J.M., Wicklow, K., Moenninghoff, C., Wanke, I., Ladd, M.E., Quick, H.H., 2008. To TOF or not to TOF: strategies for non-contrast-enhanced intracranial MRA at 7 T. Magn Reson Mater Phy 21, 159. https://doi.org/10.1007/s10334-007-0096-9

      Manjón, J.V., Coupé, P., Martí‐Bonmatí, L., Collins, D.L., Robles, M., 2010. Adaptive non-local means denoising of MR images with spatially varying noise levels. Journal of Magnetic Resonance Imaging 31, 192–203. https://doi.org/10.1002/jmri.22003

      Mansfield, P., Harvey, P.R., 1993. Limits to neural stimulation in echo-planar imaging. Magn. Reson. Med. 29, 746–758. https://doi.org/10.1002/mrm.1910290606

      Masaryk, T.J., Modic, M.T., Ross, J.S., Ruggieri, P.M., Laub, G.A., Lenz, G.W., Haacke, E.M., Selman, W.R., Wiznitzer, M., Harik, S.I., 1989. Intracranial circulation: preliminary clinical results with three-dimensional (volume) MR angiography. Radiology 171, 793–799. https://doi.org/10.1148/radiology.171.3.2717754

      Mattern, H., Sciarra, A., Godenschweger, F., Stucht, D., Lüsebrink, F., Rose, G., Speck, O., 2018. Prospective motion correction enables highest resolution time-of-flight angiography at 7T: Prospectively Motion-Corrected TOF Angiography at 7T. Magn. Reson. Med 80, 248–258. https://doi.org/10.1002/mrm.27033

      Mattern, H., Sciarra, A., Lüsebrink, F., Acosta‐Cabronero, J., Speck, O., 2019. Prospective motion correction improves high‐resolution quantitative susceptibility mapping at 7T. Magn. Reson. Med 81, 1605–1619. https://doi.org/10.1002/mrm.27509

      Mennes, M., Jenkinson, M., Valabregue, R., Buitelaar, J.K., Beckmann, C., Smith, S., 2014. Optimizing full-brain coverage in human brain MRI through population distributions of brain size. NeuroImage 98, 513–520. https://doi.org/10.1016/j.neuroimage.2014.04.030

      Moccia, S., De Momi, E., El Hadji, S., Mattos, L.S., 2018. Blood vessel segmentation algorithms — Review of methods, datasets and evaluation metrics. Computer Methods and Programs in Biomedicine 158, 71–91. https://doi.org/10.1016/j.cmpb.2018.02.001

      Mustafa, M.A.R., 2016. A data-driven learning approach to image registration.

      Mut, F., Wright, S., Ascoli, G.A., Cebral, J.R., 2014. Morphometric, geographic, and territorial characterization of brain arterial trees. International Journal for Numerical Methods in Biomedical Engineering 30, 755–766. https://doi.org/10.1002/cnm.2627

      Nagaoka, T., Yoshida, A., 2006. Noninvasive Evaluation of Wall Shear Stress on Retinal Microcirculation in Humans. Invest. Ophthalmol. Vis. Sci. 47, 1113. https://doi.org/10.1167/iovs.05-0218

      Nishimura, D.G., Irarrazabal, P., Meyer, C.H., 1995. A Velocity k-Space Analysis of Flow Effects in Echo-Planar and Spiral Imaging. Magnetic Resonance in Medicine 33, 549–556. https://doi.org/10.1002/mrm.1910330414

      Nishimura, D.G., Jackson, J.I., Pauly, J.M., 1991. On the nature and reduction of the displacement artifact in flow images. Magnetic Resonance in Medicine 22, 481–492. https://doi.org/10.1002/mrm.1910220255

      Nonaka, H., Akima, M., Hatori, T., Nagayama, T., Zhang, Z., Ihara, F., 2003. Microvasculature of the human cerebral white matter: Arteries of the deep white matter. Neuropathology 23, 111–118. https://doi.org/10.1046/j.1440-1789.2003.00486.x

      North, D.O., 1963. An Analysis of the factors which determine signal/noise discrimination in pulsed-carrier systems. Proceedings of the IEEE 51, 1016–1027. https://doi.org/10.1109/PROC.1963.2383

      Park, C.S., Hartung, G., Alaraj, A., Du, X., Charbel, F.T., Linninger, A.A., 2020. Quantification of blood flow patterns in the cerebral arterial circulation of individual (human) subjects. Int J Numer Meth Biomed Engng 36. https://doi.org/10.1002/cnm.3288

      Parker, D.L., Goodrich, K.C., Roberts, J.A., Chapman, B.E., Jeong, E.-K., Kim, S.-E., Tsuruda, J.S., Katzman, G.L., 2003. The need for phase-encoding flow compensation in high-resolution intracranial magnetic resonance angiography. J. Magn. Reson. Imaging 18, 121–127. https://doi.org/10.1002/jmri.10322

      Parker, D.L., Yuan, C., Blatter, D.D., 1991. MR angiography by multiple thin slab 3D acquisition. Magn. Reson. Med. 17, 434–451. https://doi.org/10.1002/mrm.1910170215

      Pauling, L., Coryell, C.D., 1936. The magnetic properties and structure of hemoglobin, oxyhemoglobin and carbonmonoxyhemoglobin. Proceedings of the National Academy of Sciences 22, 210–216. https://doi.org/10.1073/pnas.22.4.210

      Payne, S.J., 2017. Cerebral Blood Flow And Metabolism: A Quantitative Approach. World Scientific.

      Peters, A.M., Brookes, M.J., Hoogenraad, F.G., Gowland, P.A., Francis, S.T., Morris, P.G., Bowtell, R., 2007. T2* measurements in human brain at 1.5, 3 and 7 T. Magnetic Resonance Imaging 25, 748–753. https://doi.org/10.1016/j.mri.2007.02.014

      Pfeifer, R.A., 1930. Grundlegende Untersuchungen für die Angioarchitektonik des menschlichen Gehirns. Berlin: Julius Springer.

      Phellan, R., Forkert, N.D., 2017. Comparison of vessel enhancement algorithms applied to time-of-flight MRA images for cerebrovascular segmentation. Medical Physics 44, 5901–5915. https://doi.org/10.1002/mp.12560

      Pohmann, R., Speck, O., Scheffler, K., 2016. Signal-to-Noise Ratio and MR Tissue Parameters in Human Brain Imaging at 3, 7, and 9.4 Tesla Using Current Receive Coil Arrays. Magn. Reson. Med. 75, 801–809. https://doi.org/10.1002/mrm.25677

      Reichenbach, J.R., Venkatesan, R., Schillinger, D.J., Kido, D.K., Haacke, E.M., 1997. Small vessels in the human brain: MR venography with deoxyhemoglobin as an intrinsic contrast agent. Radiology 204, 272–277. https://doi.org/10.1148/radiology.204.1.9205259

      Schmid, F., Barrett, M.J.P., Jenny, P., Weber, B., 2019. Vascular density and distribution in neocortex. NeuroImage 197, 792–805. https://doi.org/10.1016/j.neuroimage.2017.06.046

      Schmitter, S., Bock, M., Johst, S., Auerbach, E.J., Uğurbil, K., Moortele, P.-F.V. de, 2012. Contrast enhancement in TOF cerebral angiography at 7 T using saturation and MT pulses under SAR constraints: Impact of VERSE and sparse pulses. Magnetic Resonance in Medicine 68, 188–197. https://doi.org/10.1002/mrm.23226

      Schulz, J., Boyacioglu, R., Norris, D.G., 2016. Multiband multislab 3D time-of-flight magnetic resonance angiography for reduced acquisition time and improved sensitivity. Magn Reson Med 75, 1662–8. https://doi.org/10.1002/mrm.25774

      Shu, C.Y., Sanganahalli, B.G., Coman, D., Herman, P., Hyder, F., 2016. New horizons in neurometabolic and neurovascular coupling from calibrated fMRI, in: Progress in Brain Research. Elsevier, pp. 99–122. https://doi.org/10.1016/bs.pbr.2016.02.003

      Stamm, A.C., Wright, C.L., Knopp, M.V., Schmalbrock, P., Heverhagen, J.T., 2013. Phase contrast and time-of-flight magnetic resonance angiography of the intracerebral arteries at 1.5, 3 and 7 T. Magnetic Resonance Imaging 31, 545–549. https://doi.org/10.1016/j.mri.2012.10.023

      Stewart, A.W., Robinson, S.D., O’Brien, K., Jin, J., Widhalm, G., Hangel, G., Walls, A., Goodwin, J., Eckstein, K., Tourell, M., Morgan, C., Narayanan, A., Barth, M., Bollmann, S., 2022. QSMxT: Robust masking and artifact reduction for quantitative susceptibility mapping. Magnetic Resonance in Medicine 87, 1289–1300. https://doi.org/10.1002/mrm.29048

      Stucht, D., Danishad, K.A., Schulze, P., Godenschweger, F., Zaitsev, M., Speck, O., 2015. Highest Resolution In Vivo Human Brain MRI Using Prospective Motion Correction. PLoS ONE 10, e0133921. https://doi.org/10.1371/journal.pone.0133921

      Szikla, G., Bouvier, G., Hori, T., Petrov, V., 1977. Angiography of the Human Brain Cortex. Springer Berlin Heidelberg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-81145-6

      Triantafyllou, C., Polimeni, J.R., Wald, L.L., 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55, 597–606. https://doi.org/10.1016/j.neuroimage.2010.11.084

      Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C., 2010. N4ITK: Improved N3 Bias Correction. IEEE Transactions on Medical Imaging 29, 1310–1320. https://doi.org/10.1109/TMI.2010.2046908

      Uludağ, K., Müller-Bierl, B., Uğurbil, K., 2009. An integrative model for neuronal activity-induced signal changes for gradient and spin echo functional imaging. NeuroImage 48, 150–165. https://doi.org/10.1016/j.neuroimage.2009.05.051

      Venkatesan, R., Haacke, E.M., 1997. Role of high resolution in magnetic resonance (MR) imaging: Applications to MR angiography, intracranial T1-weighted imaging, and image interpolation. International Journal of Imaging Systems and Technology 8, 529–543. https://doi.org/10.1002/(SICI)1098-1098(1997)8:6<529::AID-IMA5>3.0.CO;2-C

      von Morze, C., Xu, D., Purcell, D.D., Hess, C.P., Mukherjee, P., Saloner, D., Kelley, D.A.C., Vigneron, D.B., 2007. Intracranial time-of-flight MR angiography at 7T with comparison to 3T. J. Magn. Reson. Imaging 26, 900–904. https://doi.org/10.1002/jmri.21097

      Ward, P.G.D., Ferris, N.J., Raniga, P., Dowe, D.L., Ng, A.C.L., Barnes, D.G., Egan, G.F., 2018. Combining images and anatomical knowledge to improve automated vein segmentation in MRI. NeuroImage 165, 294–305. https://doi.org/10.1016/j.neuroimage.2017.10.049

      Wilms, G., Bosmans, H., Demaerel, Ph., Marchal, G., 2001. Magnetic resonance angiography of the intracranial vessels. European Journal of Radiology 38, 10–18. https://doi.org/10.1016/S0720-048X(01)00285-6

      Wright, S.N., Kochunov, P., Mut, F., Bergamino, M., Brown, K.M., Mazziotta, J.C., Toga, A.W., Cebral, J.R., Ascoli, G.A., 2013. Digital reconstruction and morphometric analysis of human brain arterial vasculature from magnetic resonance angiography. NeuroImage 82, 170–181. https://doi.org/10.1016/j.neuroimage.2013.05.089

      Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage 31, 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015

      Zhang, Z., Deng, X., Weng, D., An, J., Zuo, Z., Wang, B., Wei, N., Zhao, J., Xue, R., 2015. Segmented TOF at 7T MRI: Technique and clinical applications. Magnetic Resonance Imaging 33, 1043–1050. https://doi.org/10.1016/j.mri.2015.07.002

      Zhao, J.M., Clingman, C.S., Närväinen, M.J., Kauppinen, R.A., van Zijl, P.C.M., 2007. Oxygenation and hematocrit dependence of transverse relaxation rates of blood at 3T. Magn. Reson. Med. 58, 592–597. https://doi.org/10.1002/mrm.21342

      Zhu, X., Tomanek, B., Sharp, J., 2013. A pixel is an artifact: On the necessity of zero-filling in fourier imaging. Concepts Magn. Reson. 42A, 32–44. https://doi.org/10.1002/cmr.a.21256

    1. Author Response:

      Reviewer #3 (Public Review):

      Two cell types in the parasubthalamic nucleus (a region of the posterior hypothalamus) are activated following food intake. The authors determine that the Tac1 expressing population is sufficient to suppress food intake and the Crh population does not influence food intake. Further, the authors demonstrate that only the Tac1 population projects to the PBN. The Tac1 neurons are transiently activated following food presentation or satiation hormones (for about 1 minute). This transient change in activity is interesting and fits into a lot of other recently published work showing transient neural activity changes that are involved in longer term behavior. Longer term activation of these neurons reduces food intake and the authors begin to explore the circuits/networks that these neurons influence. Overall, the work is well done and the experiments support the conclusions. Some minor clarifications could enhance the manuscript and could be addressed through further analysis or adding in text.

      1. What % of the overall PSTN neurons are tac1/crh (i.e., how many other cell types are there)? Or what % of the vglut2 neurons do they make up? This just requires further analysis of the current dataset. And, are there any GABAergic cells (like are the PV GABAergic)?

      We thank the Reviewer for suggesting this analysis because it is interesting and other readers are likely to ask the same questions. In our original submission we were hesitant to report these values because they ultimately represent an approximation. Because the neurons that surround the PSTN are also glutamatergic (including the subthalamic nucleus and the lateral hypothalamic area), it is impossible to precisely delineate the border of the PSTN using Slc17a6 as a marker. However, this is an important question and we feel that reporting these values while qualifying them as an estimation will be impactful. Therefore, in the revised manuscript, we now include the following statement:

      “Although it is impossible to delineate a precise border for the PSTN using Slc17a6 because adjacent regions are also glutamatergic, we estimate that ~22% of Slc17a6-expressing neurons within the PSTN region do not express either Tac1 or Crh, indicating the presence of glutamatergic PSTN cell types that may express other unique genetic markers.”

      We did not examine GABAergic expression in the PSTN because the Allen Brain Atlas and recent RNA-Seq studies (e.g., Wallén-Mackenzie et al., 2020) found an almost complete absence of Gad1- and Gad2-expressing cells in the PSTN region. We report this previous finding within the Results:

      “Expression of the GABAergic markers Gad1 and Gad2 is notably absent from the PSTN region (Shah et al., 2022).”

      2. The 60 second increase in tac1 neuron activity is interesting. In the discussion, the authors present some plausible arguments for how that may affect feeding for hours. Additionally, it would be nice to point out that this is a recurring theme. This occurs in other neuron populations that influence food intake. Although this is seemingly counterintuitive, I think it is good to mention as these short-term neural activity changes are clearly having large effects on behavior and it is important for everyone to realize this.

      This point is an excellent observation and we agree that we could highlight other studies showing transient activation of neural activity controlling food intake. Therefore, we added to our Discussion:

      “Indeed, many other neural populations that regulate food intake behavior also show a transient increase in neural activity on the timescale of seconds (Berrios et al., 2021; Luskin et al., 2021; Mohammad et al., 2021; Wu et al., 2022).”

      3. Something a little strange with the meal frequency. I thought CCK reduced meal size not frequency. Why does the rescue then increase frequency? Could it be that the rescue to the CCK is by a different means than just blocking the effect of CCK? Adding some language to the discussion about how to interpret the satiation peptide data would be useful.

      We thank the Reviewer for bringing up this interesting point. Previous studies do indicate that CCK (and also amylin, to a large extent) reduces meal size and does not have much of an effect on meal frequency. We therefore added a paragraph to the Discussion to note and discuss this point:

      “It is also noteworthy that chemogenetic inhibition of PSTN^Tac1 neurons attenuates the effects of amylin, CCK, and PYY by decreasing the frequency of meals as opposed to meal size or meal duration (Figure 5). Previous studies of these anorexigenic hormones, especially amylin and CCK, indicate that they affect food intake primarily by decreasing meal size as opposed to meal frequency (Drazen and Woods, 2003; Lutz et al., 1995; West et al., 1987). Therefore, inhibition of PSTN^Tac1 neurons might attenuate the effects of these hormones indirectly, perhaps by reducing activity in downstream populations such as the NTS or PBN. In this model, infusion of anorexigenic hormones activates PSTN^Tac1 neurons that, in turn, cause sustained activation of downstream populations. Without this sustained activity, downstream populations may not have sufficient activity to cause a reduction in the intermeal interval, leading to increased bouts of feeding. The mechanism by which anorexigenic hormones activate PSTN^Tac1 neurons, as well as how decreases in PSTN^Tac1 neuronal activity affect downstream populations, are important topics for future investigation.”

      4. The axonal stimulation data needs qualification - as axons could project to multiple target regions (like the projections to the PVT could also have a collateral to the CEA). For this type of experiment, I prefer to use the phrase "neurons with a projection to region X do behavior Y". Otherwise, the implication in reading the results is that the particular projection is mediating the behavior. Also, the collateral issue, which is qualified in the discussion, should be mentioned here.

      We see the Reviewer’s point and have revised the language to highlight this important qualification of our results. Specifically, we added text in the Results section in regard to Figure 8:

      “Because it is unknown whether PSTN neurons send collateral projections to multiple brain regions, it is possible that stimulation in a single projection target causes antidromic activation to one or more other target areas. Therefore, these results indicate that PSTN^Tac1 neurons with projections to the CeA, PVT, PBN, and NTS can suppress food intake, although the exact functional role of each downstream target region on food intake behavior remains undetermined.”

  5. Mar 2022
    1. Author Response:

      Reviewer #2 (Public Review):

      In their supplementary section A.3-1.5 the authors perform QTL simulations to assess the performance of their analysis methods. Of particular interest is the performance of their cross-validated stepwise forward search methodology, which was used to identify all the QTL. However, a major limitation of their simulations was their choice of genetic architectures. In their simulations, all variants have a mean effect of 1% and a random sign. They also simulated 15, 50, or 150 QTL, which spans a range of sparse architectures, but not highly polygenic ones. It was unclear how the results would change as a function of different trait heritability. The simulations should explore a wider range of genetic architectures, with effect sizes sampled from normal or exponential distributions, as is more commonly done in the field.

      As suggested, we have expanded the range of simulations we explore in the revised manuscript. We note that the original simulations discussed in the manuscript involve exponentially distributed effect sizes (with a mean of 1% and random sign) at multiple different heritability values. These are described in Figures A3-4 and A3-5. We also simulated epistatic terms (Figure A3-3.3). In the revision, we have broadened the simulations to add more ‘highly polygenic’ architectures (1000 QTL). We find that the algorithm still performs well, though worse than when 150 QTL are simulated. The forward search behaves in a fairly intuitive way: QTLs get added when the contribution of a true QTL to the explained phenotypic variance overcomes the model bias and variance. QTLs are only missed if their effect size is too low to contribute significantly to phenotypic variance, or if they are in strong linkage and thus their independent discovery barely increases the variance explained (which is all finally controlled by the trait heritability). At much higher polygenicity, composite QTL can be detected as a single QTL when their sum contributes to phenotypic variance, and get broken up if and only if independent sums also contribute significantly to phenotypic variance. Of course, there are many ways to break up composite QTL, but the algorithm proceeds in a greedy fashion focusing on unexplained variance. We have also explored cases with multiple QTL of the same effect, and with different mean effects or different numbers of epistatic terms, but we found these results were largely redundant. To summarize these conclusions, we have added the following discussion at the end of the results section: “The behavior of this approach is simple and intuitive: the algorithm greedily adds QTL if their expected contribution to the total phenotypic variance exceeds the bias and increasing variance of the forward search procedure, which is greatly reduced at large sample size. Thus, it may fail to identify very small effect size variants and may fail to break up composite QTL in extremely strong linkage.”
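
      The greedy behavior described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unlinked markers coded as -1/+1, a single train/test split instead of full cross-validation, and it fits each new marker to the current residual rather than re-optimizing all effects. The function names, parameters, and simulated effect sizes are hypothetical.

```python
import random


def simulate(n_spores, n_snps, effects, noise_sd, seed=0):
    """Simulate unlinked -1/+1 genotypes and an additive phenotype.

    `effects` maps SNP index -> effect size (hypothetical values).
    """
    rng = random.Random(seed)
    genotypes = [[rng.choice((-1, 1)) for _ in range(n_snps)]
                 for _ in range(n_spores)]
    phenotypes = [sum(b * row[j] for j, b in effects.items())
                  + rng.gauss(0.0, noise_sd) for row in genotypes]
    return genotypes, phenotypes


def forward_search(genotypes, phenotypes, train_frac=0.8, max_qtl=50, seed=0):
    """Greedily add SNPs until the held-out error stops improving."""
    rng = random.Random(seed)
    idx = list(range(len(phenotypes)))
    rng.shuffle(idx)
    n_train = int(train_frac * len(idx))
    train, test = idx[:n_train], idx[n_train:]

    # Residuals start as phenotypes centred on the training mean.
    mean = sum(phenotypes[i] for i in train) / len(train)
    resid = [y - mean for y in phenotypes]

    def test_mse(r):
        return sum(r[i] ** 2 for i in test) / len(test)

    selected = []
    best_test = test_mse(resid)
    n_snps = len(genotypes[0])
    while len(selected) < max_qtl:
        # With -1/+1 coding, x.x equals the number of training spores, so the
        # training error drops by (x.r)^2 / n_train when SNP x is added.
        # Pick the SNP with the largest |x.r|.
        best_j, best_dot = None, 0.0
        for j in range(n_snps):
            if j in selected:
                continue
            dot = sum(genotypes[i][j] * resid[i] for i in train)
            if abs(dot) > abs(best_dot):
                best_j, best_dot = j, dot
        if best_j is None:
            break
        beta = best_dot / len(train)  # least-squares effect on the residual
        candidate = [r - beta * row[best_j] for r, row in zip(resid, genotypes)]
        new_test = test_mse(candidate)
        if new_test >= best_test:
            break  # the extra SNP does not help on held-out spores
        selected.append(best_j)
        resid, best_test = candidate, new_test
    return selected
```

      In this toy setting, large-effect markers are picked first, and the search stops once a candidate no longer reduces held-out error. Markers whose contribution is smaller than the sampling noise of the split can be missed or spuriously added, which mirrors the limitation noted in the response.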

      We have also added additional clarification in the Appendix: “These results allow us to gain some intuition for how our cross-validated forward search operates. […] However, while our panel of spores is very large, it remains underpowered in several cases: 1) when QTL have very low effect size, therefore not contributing significantly to the phenotypic variance, and 2) when composite QTL are in strong linkage and few spores have recombination between the QTL, then the individual identification of QTL only contributes marginally to the explained variance and the forward search may also miss them.”

      In this simulation section, the authors show that the lasso model overestimates the number of causal variants by a factor of 2-10, and that the model underestimates the number of QTL except in the case of a very sparse genetic architecture of 15 QTL and heritability > 0.8. This indicates that the experimental study is underpowered if there are >50 causal variants, and that the detected QTL do not necessarily correspond to real underlying genetic effects, as revealed by the model similarity scores shown in A3-4. This limitation should be factored into the discussion of the ability of the study to break up "composite" QTL, and more generally, detect QTL of small effect.

      We agree with some aspects of this comment, but the details are a bit subtle. First, we note that the definition of underpowered depends on the specifics of the QTL assumed in the simulation. In addition, many of the simulations were performed at 10,000 segregants, not at 100,000, with no effort to enforce a minimum effect size, or minimum distance between QTL. For example, if 100 QTL are all evenly spaced (in recombination space) and all have the same effect such that they all contribute the same to the phenotypic variance, then the algorithm is in principle maximally powered to detect these. This is why our algorithm is capable of finding >100 QTL per environment. On the other hand, just 2 QTL in complete linkage cannot be distinguished and no panel size will be able to detect these.

      However, we do agree with the general need to discuss the limitations in more detail and have clarified these concerns in the ‘Polygenicity’ result section. We have also reiterated the limitations of the LASSO approach within the simulation section. The motivation for an L0 normalization in this data was first discussed in the section A3-1.3: “Unfortunately, a harsh condition for model consistency is the lack of strong collinearity between true and spurious predictors (Zhao & Yu, 2006). This is always violated in QTL mapping studies if recombination frequencies between nearby SNPs are low. In these cases, the LASSO will almost always choose multiple correlated predictors and distribute the true QTL effect amongst them.”

      In section A3-2.3, the authors develop a model similarity score presented in A3-4 for the simulations. The measure is similar to R^2 in that it ranges from 0 to 1, but beyond that it is not clear how to interpret what constitutes a "good" score. The authors should provide some guidance on interpreting this novel metric. It might also be helpful to see the causal and lead QTL SNPs compared directly on chromosome plots.

      We agree that this was unclear, and have added additional discussion in the main text describing how to interpret the model similarity score. Essentially, the score is a Pearson’s correlation coefficient on the model coefficients (as defined in section A3-2.3, after equation A3-28). However, given a single QTL that spans two SNPs in close linkage, a pure Pearson’s correlation coefficient would have high variance, as subtle noise in the data could lead to one SNP being called the lead SNP vs the other, and two models that call the same QTL might have either 100% correlation, or 0% correlation. Instead, our model similarity score ‘aligns’ these predicted QTL before obtaining the correlation coefficient. The degree to which QTL are aligned is based on penalties with respect to collinearity (or linkage) between the SNPs, and the maximum possible score is obtained by dynamic programming. Similar to sequence alignments between two completely unrelated sequences, a score of 0 is unlikely to occur on sufficiently large models as at least a few QTL can usually be paired (erroneously). We have also added a mention in the main text referring to Figures A3-3, A3-7, A3-8, A3-9, which show the causal and lead QTL SNP directly on the chromosome plots.
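
      The motivation for aligning QTL before correlating can be seen in a small sketch. It does not implement the alignment or the dynamic programming step; it only shows, for two hypothetical ten-SNP coefficient vectors, how an unaligned Pearson correlation collapses when the same QTL is assigned to adjacent SNPs. The vectors and the effect size of 0.05 are assumptions for illustration.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation between two equal-length coefficient vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Two hypothetical models that detect the same QTL with the same effect,
# but call neighbouring SNPs (index 4 vs index 5) as the lead SNP.
model_a = [0.0] * 10
model_a[4] = 0.05
model_b = [0.0] * 10
model_b[5] = 0.05

same_call = pearson(model_a, model_a)      # identical lead SNP: correlation of 1
shifted_call = pearson(model_a, model_b)   # adjacent lead SNP: slightly negative
```

      Without alignment, a one-SNP shift in the lead position moves the correlation from 1 to roughly zero (slightly negative after mean-centring), even though both models describe the same underlying QTL.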

      The authors performed validation experiments for 6 individual SNPs and 9 pairs of RM SNPs engineered onto the BY background. It was promising that the experiments showed a positive correlation between the predicted and measured fitness effects; however, the authors did not perform power calculations, which makes it hard to evaluate the success of each individual experiment. The main text also does not make clear why these SNPs were chosen over others: was this done according to their effect sizes, or was other prior information incorporated in the choice to validate these particular variants? The authors chose to focus mostly on epistatic interactions in the validation experiments, but given their limited power to detect such interactions, it would probably be more informative to perform validation for a larger number of individual SNPs in order to test the ability of the study to detect causal variants across a range of effect sizes. The authors should perform some power calculations for their validation experiments, and describe in detail the process they employed to select these particular SNPs for validation.

      We agree with the thrust of the comment, but some of the suggestions are impossible to implement because of practical constraints on the experimental methods (and to a lesser extent on the model inference). First, we chose the SNPs to reconstruct based on three main factors: (a) to ensure that we are validating the right locus, the model must have a confident prediction that that specific SNP is causal, (b) the predicted effect must be large enough in at least one environment that we would expect to reliably measure it given the detection limits of our experimental fitness measurements, and (c) the SNP must be in a location that is amenable to CRISPR-Cas9 or Delitto Perfetto reconstruction. In practice, this means that it is impossible to validate SNPs across a wide range of effect sizes, as smaller-effect SNPs have wider confidence intervals around the lead SNP (violating condition a) and have effects that are harder to measure experimentally (violating condition b). In addition, because the cloning constraints mentioned in (c) require experimental testing for each SNP we analyze, it is much easier to construct combinations of a smaller set of SNPs than a larger set of individual SNPs. Together, these considerations motivated our choice of specific SNPs and of the overall structure of the validation experiments (6 individual and 9 pairs, rather than a broader set of individual SNPs).

      In the revised manuscript, we have added a more detailed discussion of these motivations for selecting particular SNPs for validation, and mention the inherent limitations imposed by the practical constraints involved. We have also added a description of the power and resolution of the experimental fitness measurements of the reconstructed genotypes (we can detect approximately 0.5% fitness differences in most conditions). We are unsure if there are any other types of power calculations the reviewer is referring to, but we are only attempting to note an overall positive correlation between predicted and measured effects, not making any claims about the success of any individual validation (these can fail for a variety of reasons including experimental artifacts with reconstructions, model errors in identifying the correct causal SNP, unresolved higher-order epistasis, and noise in our fitness measurements, among others).

      In section A3-1.4, the authors describe their fine-mapping methodology, but as presented it is difficult to understand. Was the fine-mapping performed using a model that includes all the other QTL effects, or was the range of the credible set only constrained to fall between the lead SNPs of the nearest QTL or the ends of the chromosome, whichever is closest to the QTL under investigation? The methodology presented on its face looks similar to the approximate Bayes credible interval described in Manichaikul et al. (PMID: 16783000). The authors should cite the relevant literature, and expand this section so that it is easier to understand exactly what was done.

      We have attempted to clarify section A3-1.4. As the reviewer correctly points out, the fine mapping for a QTL is performed by scanning an interval between neighboring detected QTL (on either side) and using a model that includes all other QTL. For example, if a detected QTL is a SNP found in a closed interval of 12 SNPs produced by its two neighboring QTL, 10 independent likelihoods are obtained (re-optimizing all effect sizes for each), and a posterior probability is obtained for each of the ten possible positions. We have cited the recommended paper, as our approach is indeed based on an approximate Bayes credible interval similar to the one described in that study (using all SNPs instead of markers). We have added the following sentence to the A3-1.4 section at the end of the second paragraph (similar to the analogous paragraph in Manichaikul et al): “[…] as above by obtaining the maximum likelihood of the data given that a single QTL is found at each possible SNP position between its neighboring QTL and given all detected other QTL (thus obtaining a likelihood profile for the considered positions of the QTL). We then used a uniform prior on the location of the QTL to derive a posterior distribution, from which one can derive an interval that exceeds 0.95.” Some typos referring to a ‘confidence’ interval were also changed to ‘credible’ interval.
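
      The step from a likelihood profile to a credible set can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it takes as given one maximized log-likelihood per candidate SNP position (the re-optimization of all other QTL effects is not shown), applies a uniform prior, and returns the smallest set of positions whose posterior mass reaches 0.95. The function name and inputs are hypothetical.

```python
import math


def credible_set(log_likelihoods, mass=0.95):
    """Posterior over candidate positions under a uniform prior, plus the
    smallest set of positions whose posterior probabilities sum to >= mass."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_likelihoods)
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    posterior = [w / total for w in weights]

    # Add positions from most to least probable until the mass is reached.
    order = sorted(range(len(posterior)), key=lambda i: posterior[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += posterior[i]
        if cumulative >= mass:
            break
    return posterior, sorted(chosen)
```

      For the example in the response, `log_likelihoods` would hold the ten values obtained for the ten candidate positions between the neighboring QTL. If a contiguous interval is wanted rather than a set, one option is to report the span from the smallest to the largest chosen position.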

      The text explicitly describes an issue with the HMM employed for genotyping: "we find that the genotyping is accurate, with detectable error only very near recombination breakpoints". The genotypes near recombination breakpoints are precisely what is used to localize and fine-map QTL, and it is therefore important to discuss in the text whether the authors think this source of error impacts their results.

      This is a good point; we have added a reference in the main text to the Appendix section (A1-1.4) that has an extensive discussion and analysis of the effect of recombination breakpoint uncertainties on fine-mapping.

      The use of a count-based HMM to infer genotypes has been previously described in the literature (PMID: 29487138), and this should be included in the references.

      We now also add this citation to our text on the count-based HMM.

    1. Author Response:

      Reviewer #1 (Public Review):

      Major Comments

      I am concerned that a lot of these studies had relatively low n numbers (n=5 in some cases) and that some of the studies may have been underpowered. Given the variability with in vivo studies, some endpoints may have been significant with more numbers. Along these lines, what is the justification for using the (parametric) ANOVA test? I'm not a statistician but I thought that the rule of thumb was that non-parametric tests should be used if n<12 since you cannot verify that the data is normally distributed. In this case, I would recommend having a statistician look at it and/or increasing some of the N's, or using the non-parametric Kruskal-Wallis test. Indeed, in some cases, the variation is quite large (i.e., Fig 6, 7). Whilst I do not think that the low N's change the ultimate conclusions, more rigor (i.e., more N's) would help solidify the paper given that it will likely be of great interest and scrutinized by the scientific community.

      We conducted power analyses prior to the start of the studies to identify the number of animals per group to use, based on our past studies of inflammatory changes induced by inhalants, infections and asthma. We set the target number of mice (n) at that time, such that these studies would be powered to detect a 25% change in cytokine expression. We reviewed all of the data with our biostatisticians and came to the conclusion that it would not be statistically appropriate to run more mice to increase the n when our primary outcome remains the same. We double-checked that the ANOVAs with corrections for multiple comparisons were correct for each particular experiment. Discussion with our statistician confirmed that ANOVA is correct as long as the data passed normality testing, which was done. An additional point, and most relevant to this specific recommendation: JUUL Mint and JUUL Mango flavors are no longer on the market, such that extensive further studies are not feasible. While these two flavors are not available anymore, they were composed of an array of chemicals commonly found in other flavors (but in different combinations), such that we believe that these data are most likely relevant to other vapes. In particular, JUUL Mint shares chemical features with JUUL Menthol, which took its place as one of the most popular JUUL flavors. The discontinuation of these flavors has been added as a limitation within the Discussion.
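
      For readers who want to see how a target effect size translates into a group size, here is a minimal sketch of an a priori sample-size calculation. It is not the authors' power analysis: it assumes a two-group comparison of means with a two-sided test and uses the normal approximation, which tends to underestimate the required n when groups are small. The standard deviation passed in is a hypothetical input; the response does not report the variability that was assumed.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate animals per group needed to detect a mean difference
    `delta` given a common standard deviation `sd` (same units as delta)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_power = z.inv_cdf(power)            # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)
```

      With `delta=0.25` (a 25% change relative to control), the result depends strongly on the assumed `sd`: doubling the variability roughly quadruples the required group size, which is why the reviewer's concern about large variation and small n is relevant to how such a calculation is set up.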

      Fig S3. For the lung histology, please quantify the mean linear intercept per ATS guidelines and show representative BAL images.

      We have conducted the mean linear intercept (MLI) measurements on e-cigarette aerosol exposed lungs and controls per ATS guidelines and have added these data to the manuscript (new Appendix 1 – Figure 4M). We paired these data with the original histology images (Appendix 1 – Figure 4A-4L). We have added appropriate methods (pages 21-22) and results (page 9) as well. Of note, the MLI data match our original physiologic assessments of lung function (Appendix 1 – Figure 2A-2J), including elastance and compliance, which are known to change in the setting of emphysema. MLI, lung elastance and compliance were no different across inhalant groups and controls. Further, we have taken representative images of Giemsa Wright stained BAL samples, and have added these to the manuscript (new Appendix 1 – Figure 3E-3J and 3O-3T) paired with BAL cell count data.

      One of the most novel conclusions from this paper is increased inflammation in the brain which the authors speculate could lead to altered moods and or change the addiction threshold. I would tend to agree with this conclusion, but could the authors perform additional mouse psychological tests to confirm this? Also, were there observable physiological responses in the vaped mice that could be reported which may correlate this conclusion, ie changes in grooming, fur ruffling or other behavioral changes?

      We are thrilled that the Reviewer is as interested in these implications as we are, because we believe the neuroinflammation detected is quite frightening, particularly because it is likely to impact both behavior and mood. We have added further discussion regarding the potential consequences of inflammation in each of the organs (pages 13-19), with an emphasis on the effects of neuroinflammation on behavior and psychology. We have subdivided the Discussion section to highlight potential effects on each distinct organ.

      While we are not a behavioral lab, and thus running behavioral studies in mice is beyond the scope of both our lab and this manuscript, we agree that the neuroinflammation is of great interest and further studies are needed to best assess potential psychological and behavioral changes. Of note, we did not observe any overt behavioral changes: we closely observe the mice both during and after exposures and make notes regarding grouping, fur, and activity level, none of which were changed by the different vaping exposures. We have added the lack of dedicated behavioral and psychological evaluations as a limitation of this work and as an opportunity for discovery in future studies (pages 19-20).

      Minor comments

      Change title to state "in mouse". That this study was performed in rodents should be apparent from the outset.

      Actually, our original title does contain “in mice” at the end. Apologies if these words were cut off on your end. We do agree that the title should make it apparent that the study was conducted in mice. We wanted to make the title even clearer, so we replaced the brand name JUUL with the type of e-device. The title is as follows: “Effects of Mango and Mint pod-based e-cigarette aerosol inhalation on inflammatory states of the brain, lung, heart and colon in mice”

      No changes in collagen deposition were detected using basic histology. Have the authors considered performing immunohistochemistry and staining for alpha-smooth muscle actin, which may be a more sensitive assay?

      We agree with the reviewer that there are more sensitive tools that can be used. We believe that, in our system, and at 3 months of exposure, JUUL Mint and Mango are not very likely to induce fibrosis, since our data on inflammatory markers and fibrosis-associated genes (in homeostatic conditions, Figure 3) show no significant differences, and for some markers, JUUL Mint and Mango exposed mouse lungs even show less inflammation than Air controls. In addition, we also showed that no differences were obtained in physiological assessments (heart rate, heart rate variability or blood pressure, Appendix 1 – Figure 1). Thus, we do not expect to find significant differences even with additional assays. We are planning on challenging mice with bleomycin in the future, as it may be possible to detect differences in fibrosis in the setting of this pro-fibrotic challenge.

      "Thus long term exposure to Juul does not lead to significant changes...". I would argue that 1-3 months is not long term. Indeed, other researchers have performed 6-12 month ecigarette exposures and it takes a lifetime in humans to develop lung disease after smoking. Since you can detect pro-inflammatory changes but no altered physiology, it may be that alterations in airway physiology are only just beginning.... The authors should modify this sentence and maybe not call their studies "long term".

      We agree with the reviewer and have modified the sentence as follows for a more accurate interpretation of our results (page 9): “Thus, 1 and 3 month exposure to JUUL Mint and Mango aerosols may not cause significant changes in airway physiology, but this does not preclude the possibility that changes may occur with longer exposures, such as 6-12 months.” We have also gone through the entire manuscript to focus on describing our exposure in terms of months instead of the descriptive terms acute / sub-acute / chronic, and we have removed the word chronic from the title.

      "Differences in LPS induced cytokine levels were no longer observed after 3 month JUUL exposure versus Air control groups". As per the major comments, this might be a power issue - there is certainly a trend for some cytokines.

      It has been seen in prior studies that chronic inhalant use (including and most notably cigarette smoke) can lead to proinflammatory changes in the first days to weeks, but opposite effects thereafter. For example, cigarette smoke inhalation leads to inflammatory changes at 4 weeks that resolve by 12 weeks. Thus, we feel that some of the cytokine findings are not unusual or surprising versus other patterns of inhalant use. However, we agree with the reviewer that IL-1b in cardiac tissue trends in the same direction at 3 months in both JUUL Mint and JUUL Mango exposed mice (Figure 8C and 8D). As per one reviewer’s comments, we combined 1 and 3 month data for merged graphs (Appendix 1 – Figure 4) and when analyzed together (data passed normality testing) further differences at 3 months were identified (see IL-1b in Appendix 1 – Figure 4 panel 4B). We have included these additional figures for each dataset in the Appendix 1 files.

      Of note, because some JUUL flavors are no longer on the market, including JUUL Mint and JUUL Mango, we are unable to run additional studies with these flavors. We are running new studies of the impact of JUUL Tobacco and JUUL Menthol, the two remaining JUUL flavors on the market. However, these studies will take an additional 1-2 years and thus are beyond the scope of this manuscript. We have expanded the limitation section within the discussion with regards to power, in order to clarify to the readers that some findings are limited by the number of subjects.

      Reviewer #2 (Public Review):

      Under homeostasis conditions, the authors observed signs of inflammatory responses in the brain, the heart and the colon, while no inflammation was detected in the broncho-alveolar lavage fluid of the mice following exposures to JUUL aerosols. Also, JUUL aerosol exposures mediated airway inflammatory responses in the acute lung injury model (LPS). Further, this infection affected the inflammatory responses in the cardiac tissue. Most of the biological adverse effects induced by JUUL aerosols were flavor-specific.

      Strengths include evaluating inflammation in multiple organs, as well as assessing the physiological responses in the lungs (lung function) and cardiovascular system (heart rate, blood pressure), following exposures to JUUL aerosols. Weaknesses include the fact that only female mice were used in this study. Further, the daily exposures to either air or to the JUUL aerosols lasted only 20 min per day. It is unclear how a 20-min exposure is representative of human vaping product use. Also, although daily exposures were conducted for a duration of both 1 and 3 months, time-course effects associated with JUUL aerosols are barely addressed.

      We would like to thank the reviewer for their positive comments on our manuscript. We apologize for our error; in reality we exposed mice for 20 minutes three times daily, so one hour in total per day. We have corrected this error within our Methods. We designed the exposures this way to better mimic human e-cigarette use throughout the day (instead of one intense vaping session per day, which is not the norm). We agree that using only female mice is a limitation of the study in case there are sex-dependent effects, which is definitely an interesting question. We typically start with one sex of mice and then run repeat experiments with the other sex. Unfortunately, this study faced problems beyond our control that prevented us from performing further experiments. In late 2019 the FDA was moving to ban specific flavors for pod devices, which include those for Mint and Mango. In anticipation of the new regulations, JUUL ultimately decided to discontinue JUUL Mint and Mango, and soon they were off the market. The same process occurred with the other popular JUUL flavors such as Crème Brûlée and Cucumber. We have expanded the limitation section within the Discussion, and have pointed out that because these studies were conducted in female mice alone, the results may not represent effects in males.

      Although there are a few limitations related to this study, which should be included in the manuscript, overall, the authors' claims and conclusions are based on the data that is presented through multiple figures.

      We appreciate the Reviewer’s comments and have added limitations about the study size, power, lack of male subjects, etc. to the discussion section.

      Reviewer #3 (Public Review):

      Weaknesses

      1. The authors observed neuroinflammation in brain regions responsible for behavior modification, drug reward and formation of anxious or depressive behaviors after exposure to JUUL. The importance of the neuroinflammation is still unclear. It would help demonstrate the pathogenic role of the neuroinflammation by testing animal behaviors. Similar issue for other organ inflammation.

      We are an immunology, inflammation, and lung physiology lab, thus, behavioral studies are beyond the scope of both our lab and this manuscript. However, we agree that the neuroinflammation is of great interest and is highly likely to impact behavior and mood. Further studies are needed to best assess potential psychological and behavioral changes. We believe this work is important to share such that dedicated behavioral science labs can undertake these important studies. We have added these important limitations to the discussion.

      2. The majority of the data are inflammatory cytokine mRNA expression. Other methods would be needed to confirm their expression.

      Of note, in the original submission, we included protein quantification data for both the brain and the lung. We have taken the reviewer’s comments to heart and have conducted protein-level assays on the cardiac tissues as well, yielding additional data (new Figure 4) that have been added to the methods, results, figures and discussion. Unfortunately, we do not have any additional colonic tissue for protein-level assessments, as all of the tissue was used for the gene transcription and histologic studies. But to take a step back, these studies were originally intended to examine the broad-reaching impact of e-cigarette aerosols across the body. This work, and thus this manuscript, was designed to highlight changes at the gene expression level, to demonstrate that e-cigarette use is not benign and does have broad-reaching effects on gene expression. We agree that more work is needed to fully define the impact of e-cigarette use at the protein, cellular, and organ level, but the majority of that work is beyond the scope of this manuscript. To bring the focus back to gene expression, we have conducted RNAseq on the lungs of JUUL exposed mice, and have included those data herein to highlight the effects of e-cigarette aerosols on gene expression in the lung, with a particular focus on differences between Mint and Mango flavors (the most popular JUUL flavors at the time of this study). These new data (new Figure 6) support the hypothesis that e-cigarette aerosol inhalation fundamentally alters the lung, which raises the specter of downstream health effects.

      3. The authors seem to assume that the difference between JUUL Mango and JUUL Mint is flavor, and then conclude that several inflammatory responses show flavor-dependent changes. Evidence is needed to support this assumption.

      Although the formulation of JUUL e-liquids is proprietary, their website claims simplicity (https://www.juul.com/learn/pods) in that they use pharmaceutical grade propylene glycol and glycerol (which make up the majority of their e-liquids), in order to form an aerosol which carries pharmaceutical grade nicotine and benzoic acid (which, when combined, create a nicotine salt), and flavors (which can be a mixture of natural and artificial ingredients). Thus, according to their website the only difference among the different JUUL pods would be the flavoring components. Hence, we concluded that differences observed in our study between Mint vs Mango are most likely due to flavor-dependent effects, since the base components should be the same. To support this flavor-dependent effect, a study from Omaiye et al in 2019 (PMID: 30896936) showed the variety of different flavoring chemicals in all JUUL flavors and how the different JUUL vapors induce different levels of cytotoxicity in BEAS-2B cells in vitro based on their flavor. We have added relevant discussion to the manuscript.

      4. In most cases, the change of inflammatory cytokines is mild ~2 fold. The author should demonstrate how these marginal changes could affect pathophysiology.

      We agree with the reviewer that the majority of changes in cytokines were relatively small. However, the fact that multiple cytokines are changing in concert indicates a significant shift in immunophenotype across organs. We are most concerned about how these shifts in the inflammatory state will alter an e-cigarette vaper's response to common clinical challenges. In Dr. Kheradmand’s recent work, mice exposed to e-cigarette aerosols with and without nicotine were much more susceptible to acute lung injury in the setting of viral pneumonia. In our work, we utilized the LPS model of acute lung injury to take a first look at the potential impact of JUUL inhalation in particular on susceptibility to lung inflammation. Further work is needed to truly define how the subtle, broad shifts in the cytokine milieu across organs will impact the health of e-cigarette vapers. We have added relevant discussion to the manuscript.

      1. To fully evaluate the health impact of evolving cigarettes, it would be informative to include other tobacco or vaping devices as controls.

      We agree that such comparisons are likely to provide insight into the differences between devices and formulations and versus cigarette smoke, and thus will be incredibly important for the field. However, these comparisons were beyond the scope of this study, whose main goal was to assess the inflammatory and physiological aspects of JUUL in particular. We believe this to be important because JUUL e-cigarettes are the most popular of all e-cigarette devices, and many young users do not use other e-devices or conventional tobacco. Thus, the primary objective of this work was to assess the safety or risk of this device in particular (versus not using any inhalant at all). However, because we have run parallel studies in the past with vape pens, box mods, and conventional tobacco, we are hopeful to start combining data to look for trends and differences across inhalant exposures. For example, we recently published our work on differences in metabolites in the circulation of mice exposed to a wide variety of e-cigarette-based inhalants (Moshensky et al. Vaping induced metabolomic signatures in the circulation of mice are driven by device type, eliquid, exposure duration and sex. ERJ Open. July 2021 PMID: 34262972). This study is one of the few studies that have employed animal models to test JUUL devices and the only one assessing their effects in different organs, and although we agree that comparisons with other devices are important, they were not the goal of this study.

      1. The longest exposure in the study is 3 months. It is not convincing to come up with conclusions regarding chronic exposure. Some organs showing no difference may be due to the timing.

      We have altered the wording throughout the manuscript to clarify that the 3-month duration is equivalent to 10 to 20 years of inhalant use versus 40 to 50 years for a 6 to 12 month model. We have also removed many instances of the descriptive terms acute, sub-acute and chronic across the manuscript, and focused on using the absolute duration of exposure instead, to avoid accidental extrapolation to longer exposures. Because we utilized cellular and molecular based assays, we were not relying on identifying organ level pathology such as fibrosis, emphysema, and organ dysfunction, all of which would require longer exposures.

    1. Mauro's solicitation

      I think this website is a great way for educators to communicate and share their work and ideas. However, it is important to note that all information may not be as reliable as we expect!

    1. Author Response:

      Reviewer #1 (Public Review):

      1. There was little comment on the strategy/mechanism that enabled subjects to readily attain Target I (MU 1 active alone), and then Target II (MU1 and MU2 active to the same relative degree). To accomplish this, it would seem that the peak firing rate of MU1 during pursuit of Target II could not exceed that during Target I despite an increased neural drive needed to recruit MU2. The most plausible explanation for this absence of additional rate coding in MU1 would be that associated with firing rate saturation (e.g., Fuglevand et al. (2015) Distinguishing intrinsic from extrinsic factors underlying firing rate saturation in human motor units. Journal of Neurophysiology 113, 1310-1322). It would be helpful if the authors might comment on whether firing rate saturation, or other mechanism, seemed to be at play that allowed subjects to attain both targets I and II.

      To place the cursor inside TII, both MU1 and MU2 must discharge action potentials at their corresponding average discharge rate during 10% MVC (± 10% due to the target radius and neglecting the additional gain set manually in each direction). Therefore, subjects could simply exert a force of 10% MVC to reach TII and would successfully place the cursor inside TII. However, to get to TI, MU1 must discharge action potentials at the same rate as during TII hits (i.e. average discharge rate at 10% MVC) while keeping MU2 silent. Based on the performance analysis in Fig 3D, subjects had difficulties moving the cursor towards TI when the difference in recruitment threshold between MU1 and MU2 was small (≤ 1% MVC). In this case, the average discharge rate of MU1 during 10% MVC could not be reached without activating MU2. As could be expected, reaching towards TI became more successful when the difference in recruitment threshold between MU1 and MU2 was relatively large (≥3% MVC). In this case, subjects were able to let MU1 discharge action potentials at its average discharge rate at 10% MVC without triggering activation of MU2 (it seems the discharge rate of MU1 saturated before the onset of MU2). Such behaviour can be observed in Fig. 2A. MUs with a lower recruitment threshold saturate their discharge rate before the force reaches 10% MVC. We adapted the Discussion accordingly to describe this behaviour in more detail.

      1. Figure 4 (and associated Figure 6) is nice, and the discovery of the strategy used by subjects to attain Target III is very interesting. One mechanism that might partially account for this behavior that was not directly addressed is the role inhibition may have played. The size principle also operates for inhibitory inputs. As such, small, low threshold motor neurons will tend to respond to a given amount of inhibitory synaptic current with a greater hyperpolarization than high threshold units. Consequently, once both units were recruited, subsequent gradual augmentation of synaptic inhibition (concurrent with excitation and broadly distributed) could have led to the situation where the low threshold unit was deactivated (because of the higher magnitude hyperpolarization), leaving MU2 discharging in isolation. This possibility might be discussed.

      We agree with the reviewer’s comment that inhibition might have played a critical role in succeeding to reach TIII. Hence, we have added this concept to our discussion.

      1. In a similar vein as for point 2 (above), the argument that PICs may have been the key mechanism enabling the attainment of target III, while reasonable, also seems a little hand wavy. The problem with the argument is that it depends on differential influences of PICs on motor neurons that are 1) low threshold, and 2) have similar recruitment thresholds. This seems somewhat unlikely given the broad influence of neuromodulatory inputs across populations of motor neurons.

      We agree with the reviewer’s point and reasoning that a mixture of neuromodulation and inhibition likely introduced the variability in MU activity we observed in this study. This comment is addressed in the answer to comment 3.

      Reviewer #2 (Public Review):

      [...]

      1. Some subjects seemed to hit TIII by repeatedly "pumping" the force up and down to increase the excitability of MU2 (this appears to happen in TIII trials 2-6 in Fig. 4 - c.f. p18 l30ff). It would be useful to see single-trial time series plots of MU1, MU2, and force for more example trials and sessions, to get a sense for the diversity of strategies subjects used. The authors might also consider providing additional analyses to test whether multiple "pumps" increased MU2 excitability, and if so, whether this increase was usually larger for MU2 than MU1. For example, they might plot the ratio of MU2 (and MU1) activation to force (or, better, the residual discharge rate after subtracting predicted discharge based on a nonlinear fit to the ramp data) over the course of the trial. Is there a reason to think, based on the data or previous work, that units with comparatively higher thresholds (out of a sample selected in the low range of <10% MVC) would have larger increases in excitability?


      We added a supplementary figure (Supplement 4) that visualizes additional trials from different conditions and subjects for TIII-instructed trials and noted this in the text.

      MU excitability might indeed be pronounced during repeated activations within a couple of seconds (see, for example, M. Gorassini, J. F. Yang, M. Siu, and D. J. Bennett, “Intrinsic Activation of Human Motoneurons: Reduction of Motor Unit Recruitment Thresholds by Repeated Contractions,” J. Neurophysiol., vol. 87, no. 4, pp. 1859–1866, 2002.). Such an effect, however, seems to be equally distributed to all active MUs. Moreover, we are not aware of any recent studies suggesting that MUs, within the narrow range of 0-10% MVC, may be excited differently by such a mechanism. Supplement 4C and D illustrate trials in which subjects performed multiple “pumps”. Visually, we could not find changes in the excitability specific to any of the two MUs nor that subjects explored repeated activation of MUs as a strategy to reach TIII. It seems subjects instead tried to find the precise force level which would allow them to keep MU2 active after the offset of MU1. We further discussed that PICs act very broadly on all MUs. The observed discharge patterns when successfully reaching TIII may likely be due to an interplay of broadly distributed neuromodulation and locally acting synaptic inhibition.

      1. I am somewhat surprised that subjects were able to reach TIII at all when the de-recruitment threshold for MU1 was lower than the de-recruitment threshold for MU2. It would be useful to see (A) performance data, as in Fig. 3D or 5A, conditioned on the difference in de-recruitment thresholds, rather than recruitment thresholds, and (B) a scatterplot of the difference in de-recruitment vs the difference in recruitment thresholds for all pairs.


      We agree that comparing the difference in de-recruitment threshold with the performance of reaching each target might provide valuable insights into the strategies used to perform the tasks. Hence, we added this comparison to Figure 4E at p. 16, l. 1. A scatterplot of the difference in de-recruitment threshold and the difference in recruitment threshold has been added to Supplement 3A. The Results section was modified in line with the above changes.

      1. Using MU1 / MU2 rates to directly control cursor position makes sense for testing for independent control over the two MUs. However, one might imagine that there could exist a different decoding scheme (using more than two units, nonlinearities, delay coordinates, or control of velocity instead of position) that would allow subjects to generate smooth trajectories towards all three targets. Because the authors set their study in a BCI context, they may wish to comment on whether more complicated decoding schemes might be able to exploit single-unit EMG for BCI control or, alternatively, to argue that a single degree of freedom in input fundamentally limits the utility of such schemes.


      This study aimed to assess whether humans can learn to decorrelate the activity between two MUs coming from the same functional MU pool during constrained isometric conditions. The biofeedback was chosen to encourage subjects to perform this non-intuitive and unnatural task. Transferring biofeedback on single MUs into an application, for example, BCI control, could include more advanced pre-processing steps. Not all subjects were able to navigate the cursor along both axes consistently (always hitting TI and TIII). However, the performance metric (Figure 4C) indicated that subjects became better over time in diverging from the diagonal and thus increased their moving range inside the 2D space for various combinations of MU pairs. Hence, a weighted linear combination of the activity of both MUs (for example, along the two principal components based on the cursor distribution) may enable subjects to navigate a cursor from one axis to another. Similarly, coadaptation methods or different types of biofeedback (auditory or haptic) may help subjects. Furthermore, using only two MUs to drive a cursor inside a 2-D space is prone to interference. Including multiple MUs in the control scheme may improve the performance even in the presence of noise. We have shown that the activation of a single MU pool exposed to a common drive does not necessarily obey rigid control. State-dependent flexible control due to variable intrinsic properties of single MUs may be exploited for specific applications, such as BCI. However, further research is necessary to understand the potential and limits of such a control scheme.
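As an illustration of the weighted linear combination mentioned above, the following is a minimal sketch, not the authors' decoder. It assumes cursor samples are available as (MU1, MU2) pairs of normalised discharge rates, estimates the two principal axes of that 2-D distribution in closed form, and re-expresses a cursor sample along those axes. The function names `principal_axes` and `remap` are hypothetical.

```python
import math

def principal_axes(points):
    """Return (mean, pc1, pc2) for a list of 2-D points.

    pc1 is the unit vector along the direction of largest variance,
    pc2 is the unit vector orthogonal to it.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pc1, pc2

def remap(point, mean, pc1, pc2):
    """Express a cursor sample as weighted sums of MU1 and MU2 activity."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (dx * pc1[0] + dy * pc1[1], dx * pc2[0] + dy * pc2[1])

# Hypothetical cursor samples lying along the diagonal (both MUs co-active).
samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
mean, pc1, pc2 = principal_axes(samples)
new_xy = remap((2.0, 2.0), mean, pc1, pc2)
```

In such a scheme, deviations from the diagonal would map onto the second coordinate, so small differences between the two MUs could be amplified by a gain on that axis; whether this helps subjects in practice is an open question.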

      1. The conclusions of the present work contrast somewhat with those of Marshall et al. (ref. 24), who claim (for shoulder and proximal arm muscles in the macaque) that (A) violations of the "common drive" hypothesis were relatively common when force profiles of different frequencies were compared, and that (B) microstimulation of different M1 sites could independently activate either MU in a pair at rest. Here, the authors provide a useful discussion of (A) on p19 l11ff, emphasizing that independent inputs and changes in intrinsic excitability cannot be conclusively distinguished once the MU has been recruited. They may wish to provide additional context for synthesizing their results with Marshall et al., including possible differences between upper / lower limb and proximal / distal muscles, task structure, and species.

      The work by Marshall, Churchland and colleagues shows that when stimulating focally at specific sites in M1, single MUs can be activated, which may suggest a direct pathway from cortical neurons to single motor neurons within a pool. However, it remains to be shown whether humans can learn to leverage such potential pathways or whether the observations are limited to the artificially induced stimulus. The tibialis anterior receives a strong and direct cortical projection. Thus, we think that this muscle may be well suited to study whether subjects can explore such specific pathways to activate single MUs independently. It may very well be that the control of upper limbs shows more flexibility than that of lower ones. However, we are not aware of any study that may provide evidence for a critical mismatch in the control of upper and lower limb MU pools. We have added this discussion to the manuscript.

      Reviewer #3 (Public Review):

      [...]

      Even if the online decomposition of motor units were performed perfectly, the visual display provided to subject smooths the extracted motor unit discharge rates over a very wide time window: 1625 msec. This window is significantly larger than the differences in recruitment times in many of the motor unit pairs being used to control the interface. So while it's clear that the subjects are learning to perform the task successfully, it's not clear to me that subjects could have used the provided visual information to receive feedback about or learn to control motor unit recruitment, even if individuated control of motor unit recruitment by the nervous system is possible. I am therefore not convinced that these experiments were a fair test of subjects' ability to control the recruitment of individual motor units.

      Regarding the validation of motor unit isolation in the conditions analysed in this study, we have added a full new set of measurements with concomitant surface and intramuscular recordings during recruitment/de-recruitment of motor units at variable recruitment speed. This provides a strong validation of the approach and of the accuracy of the online decomposition used in this study. Subjects received visual feedback on the activity of the selected MU pair, i.e. the discharge behaviour of both MUs and the resulting cursor movement. This information was not clear from the initial submission and hence, we annotated the current version to clarify the biofeedback modalities. To further clarify the decoding of incoming MU1/MU2 discharge rates into cursor movement, we included Supplement 2. We also included a video that shows that the smoothing window on the cursor position does not affect the immediate cursor movement due to incoming spiking activity. For example, as shown in Supplement 2, for the initial offset of 0 ms, the cursor starts moving along the axis corresponding to a sole activation of MU1 and immediately diverges from this axis when MU2 starts to discharge action potentials. We, therefore, think that the biofeedback provided to the subjects does allow exploration of single MU control.
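The point about the smoothing window can be illustrated with a small sketch. This is not the decoder from Supplement 2; it assumes, purely for illustration, that each cursor coordinate is the spike count of one MU inside a trailing 1625 ms window, divided by the window length and by a reference rate (for example the mean discharge rate at 10% MVC). The names `windowed_rate`, `cursor_position` and the reference rates are hypothetical.

```python
WINDOW_S = 1.625  # smoothing window quoted by the reviewer (1625 ms)

def windowed_rate(spike_times, t, window=WINDOW_S):
    """Mean discharge rate (spikes/s) over the trailing window ending at t."""
    count = sum(1 for s in spike_times if t - window < s <= t)
    return count / window

def cursor_position(mu1_spikes, mu2_spikes, t, ref_rate1, ref_rate2):
    """Cursor (x, y): windowed rates normalised by assumed reference rates."""
    x = windowed_rate(mu1_spikes, t) / ref_rate1
    y = windowed_rate(mu2_spikes, t) / ref_rate2
    return x, y

# Hypothetical spike trains: MU1 fires first, MU2 is recruited at 0.45 s.
mu1 = [0.1, 0.2, 0.3, 0.4, 0.5]
mu2 = [0.45]
before = cursor_position(mu1, mu2, 0.40, ref_rate1=10.0, ref_rate2=10.0)
after = cursor_position(mu1, mu2, 0.45, ref_rate1=10.0, ref_rate2=10.0)
```

Under these assumptions the cursor stays on the MU1 axis until the first MU2 spike arrives and leaves it at that instant, even though the window is long; the window limits how fast the cursor can travel, not when it starts to move.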

      Along similar lines, it seems likely to me that subjects are using some other strategy to learn the task, quite possibly one based on control over overall force at the ankle and/or voluntary recruitment of other leg/foot muscles. Each of these variables will presumably be correlated with the activity of the recorded motor units and the movement of the cursor on the screen. Moreover, because these variables likely change on a similar (or slower) timescale than differences in motor unit recruitment or de-recruitment, it seems to me that using such strategies, which do not reflect or require individuated motor unit recruitment, is a highly effective way to successfully complete the task given the particular experimental setup.

      In addition to being seated and restricted by an ankle dynamometer, subjects were instructed to only perform dorsiflexion of the ankle. Further, none of the subjects reported compensatory movements as a strategy to reach any of the targets. In addition, to be successfully utilised, such compensatory movements would need to influence the various combinations of MUs tested in this study equally, even when they differ in size. Nevertheless, we acknowledge, as pointed out by the reviewer, that our setup has limitations. We only measured force in a single direction (i.e. ankle dorsiflexion) and did not track toe, hip or knee movements. Even though an instructor supervised leg movement throughout the experiment, it may be that very subtle, unintentional compensatory movements have influenced the activity of the selected MUs. Hence, we updated the limitations section in the Discussion.

      To summarize my above two points, it seems like the authors' argument is that absence of evidence (subjects do not perform individuated MU recruitment in this particular task) constitutes evidence of absence (i.e. is evidence that individuated recruitment is not possible for the nervous system or for the control of brain-machine interfaces). Therefore, given the above-described issues regarding the real-time feedback provided to subjects in the paper, it is not clear to me that any strong conclusions can be drawn about the nervous system's ability or inability to achieve individuated motor unit recruitment.

      We hope that the above changes clarify the biofeedback modalities and their potential to provide subjects with the necessary information for exploring independent MU control. Our experiments aimed to investigate whether subjects can learn under constrained isometric conditions to decorrelate the activity between two MUs coming from the same functional pool. While it seemed that MU activity could be decorrelated, this almost exclusively happened (TIII-instructed trials) within a state-dependent framework, i.e. both MUs must be activated first before the lower threshold one is switched off. We did not observe flexible MU control based exclusively on a selective input to individual MUs (MU2 activated before MU1 during initial recruitment). That does not mean that such control is impossible. However, all successful control strategies that were voluntarily explored by the subjects to achieve flexible control were based on a common input and history-dependent activation of MUs. We have added these concepts to the discussion section.

      Second, to support the claims based on their data the authors must explain their online spike-sorting method and provide evidence that it can successfully discriminate distinct motor unit onset/offset times at the low latency that would be required to test their claims. In the current manuscript, authors do not address this at all beyond referring to their recent IEEE paper (ref [25]). However, although that earlier paper is exciting and has many strengths (including simultaneous recordings from intramuscular and surface EMGs), the IEEE paper does not attempt to evaluate the performance metrics that are essential to the current project. For example, the key metric in ref 25 is "rate-of-agreement" (RoA), which measures differences in the total number of motor unit action potentials sorted from, for example, surface and intramuscular EMG. However, there is no evaluation of whether there is agreement in recruitment or de-recruitment times (the key variable in the present study) for motor units measured both from the surface and intramuscularly. This important technical point must be addressed if any conclusions are to be drawn from the present data.

      We have taken this comment into careful consideration, and we have performed a validation based on concomitant intramuscular and surface EMG decomposition in the exact experimental conditions of this study, including variations in the speed of recruitment and de-recruitment. This new validation fully supports the accuracy of the methods used when detecting recruitment and de-recruitment of motor units.

      My final concern is that the authors' key conclusion - that the nervous system cannot or does not control motor units in an individuated fashion - is based on the assumption that the robust differences in de-recruitment time that subjects display cannot be due to differences in descending control, and instead must be due to changes in intrinsic motor unit excitability within the spinal cord. The authors simply assert/assume that "[derecruitment] results from the relative intrinsic excitability of the motor neurons which override the sole impact of the receive synaptic input". This may well be true, but the authors do not provide any evidence for this in the present paper, and to me it seems equally plausible that the reverse is true - that de-recruitment might be influenced by descending control. This line of argumentation therefore seems somewhat circular.

      When subjects were asked to reach TIII, which required the sole activation of a higher threshold MU, subjects almost exclusively chose to activate both MUs first before switching off the lower threshold MU. It may be that the lower de-recruitment threshold of MU2 was determined by descending inputs changing the excitability of either MU1 or MU2 (for example, see J. Nielsen, C. Crone, T. Sinkjær, E. Toft, and H. Hultborn, “Central control of reciprocal inhibition during fictive dorsiflexion in man,” Exp. brain Res., vol. 104, no. 1, pp. 99–106, Apr. 1995 or E. Jankowska, “Interneuronal relay in spinal pathways from proprioceptors,” Prog. Neurobiol., vol. 38, no. 4, pp. 335–378, Apr. 1992). Even if that is the case, it remains unknown why such a command channel that potentially changes the excitability of a single MU was not voluntarily utilized at the initial recruitment to allow for direct movement towards TIII (as direct movement was preferred for TI and TII). We cannot rule out that de-recruitment was affected by selective descending commands. However, our results match observations made in previous studies on intrinsic changes of MU excitability after MU recruitment. Therefore, even if descending pathways were utilized throughout the experiment to change, for example, MU excitability, subjects were not able to explore such pathways to change initial recruitment and achieve general flexible control over MUs. The updated discussion explains this line of reasoning.

      Reviewer #4 (Public Review):

      [...]

      1. Figure 6a nicely demonstrates the strategy used by subjects to hit target TIII. In this example, MU2 was both recruited and de-recruited after MU1 (which is the opposite of what one would expect based on the standard textbook description). The authors state (page 17, line 15-17) that even in the reverse case (when MU2 is de-recruited before MU1) the strategy still leads to successful performance. I am not sure how this would be done. For clarity, the authors could add a panel similar to panel A to this figure but for the case where the MU pairs have the opposite order of de-recruitment.

      We have added more examples of successful TIII-instructed trials in Supplement 4. Supplement 4C and D illustrate examples of subjects navigating the cursor inside TIII even when MU2 was de-recruited before MU1. As exemplarily shown, subjects also used the three-stage approach discussed in the manuscript. In contrast to successful trials in which MU2 was de-recruited after MU1 (for example, Supplement 4B), subjects required multiple attempts until finding a precise force level that allowed a continuous firing of MU2 while MU1 remained silent. We have added a possible explanation for such behaviour in the Discussion.

      1. The authors discuss a possible type of flexible control which is not evident in the recruitment order of MUs (page 19, line 27-28). This reasoning was not entirely clear to me. Specifically, I was not sure which of the results presented here needs to be explained by such mechanism.

      We have shown that subjects can decorrelate the discharge activity of MU1 and MU2 once both MUs are active (e.g. reaching TIII). Thus, flexible control of the MU pair was possible after the initial recruitment. Therefore, this kind of control seems strongly linked to a specific activation state of both MUs. We further elaborated on which potential mechanisms may contribute to this state-dependent control.

      1. The authors argue that using a well-controlled task is necessary for understanding the ability to control the descending input to MUs. They thus applied a dorsi-flexion paradigm and MU recordings from TA muscles. However, it is not clear to what extent the results obtained in this study can be extrapolated to the upper limb. Controlling the MUs of the upper limb could be more flexible and more accessible to voluntary control than the control of lower limb muscles. This point is crucial since the authors compare their results to other studies (Formento et al., bioRxiv 2021 and Marshall et al., bioRxiv 2021) which concluded in favor of the flexible control of MU recruitment. Since both studies used the MUs of upper limb muscles, a fair comparison would involve using a constrained task design but for upper limb muscles.

      We agree with the reviewer that our work differs from previous approaches, which also studied flexible MU control. We, therefore, added a paragraph to the limitation section of the Discussion.

      1. The authors devote a long paragraph in the discussion to account for the variability in the de-recruitment order. They mostly rely on PIC, but there is no clear evidence that this is indeed the case. Is it at all possible that the flexibility in control over MUs was over their recruitment threshold? Was there any change in de-recruitment of the MUs during learning (in a given recording session)?

      The de-recruitment threshold did not critically change when compared before and after the experiment on each day (difference in de-recruitment threshold before and after the experiment: -0.16 ± 2.28% MVC; we have now added this result to the Results section). Deviations from the classical recruitment order may be achieved by temporary (short-lived) changes in the intrinsic excitability of single MUs. We, therefore, extended our discussion on potential mechanisms that may explain the observed variability given that all MUs receive the same common input.

      1. The need for a complicated performance measure (defined on page 5, line 3-6) is not entirely clear to me. What is the correlation between this parameter and other, more conventional measures such as total-movement time or maximal deviation from the straight trajectory? In addition, the normalization process is difficult to follow. The best performance was measured across subjects. Does this mean that single subject data could be either down or up-regulated based on the relative performance of the specific subject? Why not normalize the single-subject data and then compare these data across subjects?

      We employed this performance metric to overcome shortcomings of traditional measures such as target hit count, time-to-target or deviation from the straight trajectory. Such problems are described in the illustration below for TIII-instructed trials (blue target). A: the duration of the trial is the same in both examples (left and right); however, on the left, the subject manages to keep the cursor close to the target-of-interest while on the right, the cursor is far away from the target centre of TIII. B: In both images the cursor has the same distance d to the target centre of TIII. However, on the left, the subject manages to switch off MU1 while keeping MU2 active, while on the right, both MUs are active. C: On the left, the subject manages to move the cursor inside the TIII before the maximum trial time was reached, while on the right, the subject moved the cursor up and down, not diverging from the ideal trajectory to the target centre but fails to place the cursor inside TIII within the duration of the trial. In all examples, using only one conventional measure fails to account for a higher performance value in the left scenario than in the right. Our performance metric combines several performance metrics such as time-to-target, distance from the target centre, and the discharge rate ratio between MU1 and MU2 via the angle 𝜑 and thus allows a more detailed analysis of the performance than conventional measures. The normalisation of the performance value was done to allow for a comparison across subjects. The best and worst performance was estimated using synthetic data mimicking ideal movement towards each target (i.e. immediate start from the target origin to the centre of the target, while the normalised discharge rate of the corresponding MU is set to 1). 
      Since the target space is normalised for all subjects in the same manner (mean discharge rate of the corresponding MUs at 10% MVC), this allows us to compare the performance between subjects, conditions and targets.

      1. Figure 3C appears to indicate that there was only moderate learning across days for target TI and TII. Even for target TIII there was some improvement but the peak performance in later days was quite poor. The fact that the MUs were different each day may have affected the subjects' ability to learn the task efficiently. It would be interesting to measure the learning obtained on single days.

      We have added an analysis that estimated the learning within a session per subject and target (Supplement 3C). In order to evaluate the strength of learning within-session, the Spearman correlation coefficient between target-specific performance and consecutive trials was calculated and averaged across conditions and days. The results suggest that there was little learning within sessions and no significant difference between targets. These results have now been added to the manuscript.
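The within-session analysis described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: it assumes each session is available as a list of performance values in trial order, computes the Spearman rank correlation between trial index and performance for each session, and averages the coefficients. The helper names and the choice to return 0.0 for constant input are assumptions.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def within_session_learning(sessions):
    """Average Spearman coefficient between trial index and performance."""
    coeffs = [spearman(list(range(len(p))), p) for p in sessions if len(p) > 1]
    return sum(coeffs) / len(coeffs)

# Hypothetical data: one improving and one deteriorating session.
example = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]
average_rho = within_session_learning(example)
```

A coefficient near zero, as in this toy example, would indicate no consistent monotonic trend in performance over consecutive trials.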

      1. On page 16 line 12-13, the authors describe the rare cases where subjects moved directly towards TIII. These cases apparently occurred when the recruitment threshold of MU2 was lower. What is the probable source of this lower recruitment level in these specific trials? Was this incidental (i.e., the trial was only successful when the MU threshold randomly decreased) or was there volitional control over the recruitment threshold? Did the authors test how the MU threshold changed (in percentages) over the course of the training day?

      We did not track the recruitment threshold throughout the session but only at the beginning and end. We could not identify any critical changes in the recruitment order (see Results section). However, our analysis indicated that during direct movements towards TIII, MU2 (higher threshold MU) was recruited at a lower force level during the initial ramp and thus had a temporary effective recruitment threshold below MU1. It is important to note that these direct movements towards TIII only occurred for pairs of MUs with a similar recruitment threshold (see Figure 6). One possible explanation for this temporary change in recruitment threshold could be altered excitability due to neuromodulatory effects such as PICs (see Discussion). We have added an analysis that shows that direct movements towards TIII occurred in most cases (>90%) after a preceding TII- or TIII-instructed trial. Both of these targets-of-interest require activation of MU2. Thus, direct movement towards TIII was likely not the result of specific descending control. Instead, this analysis suggests that the PIC effect triggered at the preceding trial was not entirely extinguished when a trial ending in direct movement towards TIII started. Alternatively, the rare scenarios in which direct movements happened could be entirely random. Similar observations were made in previous biofeedback studies [31]. To clarify these points, we altered the manuscript.

    1. United States, researchers have long found that echo chambers are smaller and less prevalent than commonly assumed

      research continues to show that echo chambers are not as prevalent or important as we may think

      • reminds me of how facebook is known for this (Zucked)
  6. tandfbis.s3.amazonaws.com
    1. Cognitive flexibility is the ability to change how we think about something—to see things from another person’s point of view, consider multiple options, think of several ways to respond, and seek information that may not be readily available

      I think cognitive flexibility is a very interesting concept. It requires personal effort, because it asks us to change our mentality about something, and it is important because we need the ability to think differently. It helps us to think of new ideas. This skill is very useful in academic and work environments, as it allows us to think while keeping another person's point of view in mind.

    1. Author Response:

      Reviewer #1

      1: “A major weakness was that the simulation algorithm was both highly complex, but insufficiently explained. As a consequence, it was not clear what the underlying assumptions of the simulations were and how these assumptions were based on and/or constrained by the experiments.”

      We have revised the section related to the simulation algorithm. This reviewer also raised a similar issue and suggested adding pseudocode or explaining it in plain language. We have therefore included two sections, “Cell-fate simulation algorithm” and “Cell-fate simulation options with Operation data”, as well as Figure 7, Figure 8 and Supplementary Figure 9.

      In our previous version of the manuscript, we named the data used for the simulation as “Source data”. However, we realize that this journal uses this term for other purposes. We have therefore changed “Source data” to “Operation data” to avoid confusion.

      1. “The single-cell analysis, including measuring lineages, by itself is not cutting-edge and has been done before, and so the novelty should be in the analysis.”

      We agree that single-cell tracking per se is not a new technology, and was carried out as early as 1989 using 16 mm film. However, it has not been used frequently in the field of cell biology because of its extremely laborious nature. Our focus was thus on the development of a single-cell tracking technique that could be used routinely in cell biological research. We therefore computerized the analysis (preprint, BioRxiv 508705; doi: https://doi.org/10.1101/508705 (2018)) to allow the generation of large amounts of single-cell tracking data for bioinformatics analysis. We have mentioned this in the Results (“System to investigate the functional implications of maintaining low levels of p53 in unstressed cells”).

      1. “However, in many cases, the resulting data is presented in a manner that does not rely on the single-cell tracking (e.g. total cell number vs time in Fig. 2, average frequency of events in Fig. 4).”

      We realize that we did not adequately explain the data relating to Figure 2. Counting experiments were performed to validate the results of single-cell tracking data, because such verification has not previously been performed. We therefore intended to produce a figure including both the actual counting data and single-cell tracking data together, to allow the readers to compare the results obtained by the different approaches. Although this reviewer commented that some data did “not rely on the single-cell tracking”, we would like to stress that the counting data were only used for the purpose of comparison. We have thus rewritten the “Effect of silencing the low levels of p53 on cell population expansion” in the Results, to clarify this.

      1. “The impact of p53 was only assessed on level of differences between experimental conditions (p53 siRNA or not), but p53 levels themselves were not measured and therefore not incorporated in the single-cell analysis.”

      To the best of our knowledge, there are currently no techniques that allow the expression levels of proteins or genes of interest to be determined in individual live cells that are being tracked, and which could thus be used to generate data for bioinformatics analysis. It may be possible to use cells expressing a fluorescence-tagged protein, but as noted by this reviewer, frequent excitation of fluorophores in cells could affect cell growth (phototoxicity). We have thus been searching for a suitable technique that could be combined with single-cell tracking since 2012. If it becomes possible to perform an experiment similar to that suggested by this reviewer, it could potentially reveal many unknown cellular characteristics. We have revised the Discussion to consider this matter.

      1. “In general, differences between wild-type and p53 siRNA data were small, while cell-to-cell variability in p53 knock-down appears high (as judged by Supplementary Fig. 4). This leaves open whether the relatively minor difference between wild-type and p53 siRNA cells reflects variability in p53 knockdown between cells, which is currently not directly assessed.”

      With regard to the comment that “differences between wild-type and p53 siRNA data were small”, we would like to comment on why the difference is small. In a typical study of p53, a lethal dose of an agent that kills the majority of growing cells within e.g. 24-48 hrs is used to detect a difference from control cells. One reason for using lethal doses is to make the status of the cells homogeneous, so that any alteration of interest can be detected using average-based techniques, which report alterations that occur in the majority of cells. On the other hand, when lower doses of agents are used, cell-to-cell heterogeneity has to be taken into account, as only a certain group of cells in a cell population may respond to the agents. In this case, average-based analyses may detect only a small difference, or none, if only a small number of cells in the population respond. In contrast to average-based analysis, single-cell tracking allows quantitative analysis of alterations that occur in individual cells within a cell population. By Western blotting, which is an average-based assay (Supplementary Fig. 4), the level of p53 in unstressed cells was reduced to 30%. As the levels of p53 in unstressed cells are already low, a 70% reduction in the amount of p53 may be considered small. However, at the level of individual cells, it was sufficient to increase cell death, multipolar cell division, and cell fusion (Fig. 4). Thus, analysis at the single-cell level can yield information that is difficult to obtain by average-based analysis.

      The comment related to “reflects variability”, however, made an important point. It is currently technically difficult to determine the expression levels of p53 or other proteins in individual live cells that are being tracked by long-term live-cell imaging. We therefore assumed that silencing reduced the levels of p53 in all the tracked cells. However, it is reasonable to expect variations in the silencing levels of p53 among individual cells, and it may be possible that cells in which p53 levels were reduced, e.g. to 0%, underwent cell death, while cells in which expression was only reduced to 50% underwent cell fusion, etc. Information on the levels of silencing in each cell would allow us to evaluate the relationship between p53 levels and the type of induced events. However, this analysis is currently technically difficult, as explained above. Nevertheless, the fact that silencing induced changes in cell fate suggested that the low background levels of p53 may have some functions. We have revised “Silencing of p53 and single-cell tracking” in the Results.

      Reviewer #2

      “The study's main weakness is the lack of empirical evidence from the simulation predictions of biology, and that the cellular consequences of p53 function were predictable and mostly confirmatory.”

      We appreciate these interesting comments regarding the similarities and differences of the empirical and simulation approaches. In empirical studies, a model or hypothesis is often based on the results of an analysis that aims to reveal characteristics of interest e.g. of cells. However, such a model or hypothesis generally needs to be confirmed or tested independently. We therefore considered simulation as a tool to build a model or hypothesis, which also needed to be confirmed or tested.

      Simulation could thus be considered as an additional tool, e.g. in addition to western blotting and DNA sequencing, which could generate different types of data than other existing techniques. We therefore think that such simulations could provide new options for cell biological studies. Regarding its “confirmatory” use, we think that simulation can be used to confirm existing models, but may also be used as a discovery tool. For example, p53-knockout cells are known to produce tetraploid cells, but how such cells are formed remains unclear. Single-cell tracking analysis can be used to fill the gap between the loss of p53 and tetraploid cell formation, and simulation can then be used to simulate the fate of cells generated by this loss.

      Although we focused on describing our approach using single-cell tracking and cell-fate simulation in our manuscript, we believe these methods could be used in combination with empirical studies, to widen the cell biological research options.

      We have discussed these issues in “Cell fate simulation and its applications” in the Discussion.

      Reviewer #3

      "Yet it is unclear how these results can be generalized because the authors only studied one cell line."

      The current work focused on addressing a biological question using single-cell tracking and cell-fate simulation; however, it will also be interesting to see if the proposed models can be generalized. Given that HeLa cells, in which p53 function is neutralized by papillomavirus E6 protein, also frequently undergo cell fusion followed by multipolar cell division and cell death (Sato, Rancourt, Sato and Satoh Sci Rep (2016) 6:23328), we believe that the low levels of p53 may also play a similar role in suppressing those events in many other types of cells.

      "The results are not compared to other cell lines or primary cells, in terms of baseline expression of p53. "

      We agree that it will be interesting to apply the methods in various types of cells and primary cell lines. However, there are significant variations in growth profiles among cell types. We have created live-cell imaging videos for > 30 cell lines, and found that each cell type showed unique characteristics in terms of growth patterns, frequencies of cell death, cell fusion, and multipolar cell division, and in the degree of cell-to-cell heterogeneity, implying that each cell type must be characterized using single-cell tracking analysis before moving on to studies using those cells, given that no such data are currently available. We believe that establishing a public data archive of single-cell tracking data will be useful for cell biological research, as well as for testing the current model.

      "In addition, it is unclear how this model is superior to testing homeostatic p53 compares to models that use mutated p53.”

      Most cancer cells carrying p53 gene mutations still express mutant p53 in the cytoplasm, and mutant p53 is suggested to confer gain-of-function in cancer cells. The characteristics of the cells used in the current study were related to the p53 null phenotype, but it will be interesting to determine if cancer cells carrying mutant p53 have a null+gain-of-function phenotype, or if gain-of-function alters the null phenotype, in order to further understand the role of p53 in tumorigenesis. Such a study will require a large amount of work, but is probably feasible.

      In addition to our responses, we would like to take this opportunity to discuss the cell biological meaning of “generality”. For example, if a response is detected in cell types A, B, and C by e.g. enzymatic assay, quantitation of protein expression levels, and staining of cells, it is often concluded that the response is commonly induced in those cells (generalized). However, as noted by this reviewer, the levels of responses may vary among cells, and commonly induced responses may thus only occur in a specific group of cells in the A, B, and C cell populations. In this case, such responses may not be generally induced in cell types A, B, and C, but only in certain subpopulations of these cell populations. In the current study, cell death etc. were induced in the A549 cell population following p53 silencing, but not in the majority of A549 cells, indicating that this might not be “general” for A549 cells, according to the definition of “generality” used for classical experimental approaches. We have thus been considering the meaning of the term “general”. Each cell in a cell population may have a different status, and without knowing the context affecting the status of each cell, it is not possible to establish “generality”. Information regarding the context of each cell in various types of cell populations is currently lacking, and we do not know how many contexts exist. In the current study, we described one context related to A549 cells, but there will be many other contexts, which may be similar to or distinct from A549 cells. We therefore consider that we are still at the stage of revealing such contexts, e.g. contexts for cancer cells carrying p53 mutation and for metastatic cells, and some commonality may begin to emerge after more contexts have been revealed. However, revealing these contexts will require extensive work, and we hope that other groups will also show an interest in this type of study.

      We have addressed some of these points in the revised Discussion.

      “The tools described, including the DIC tracking software and the simulation algorithms would be useful additions to the biologist's toolkit. The direct visualization of siRNA transfection agents through DIC, and its integration with western blotting is novel, and the authors may consider preparing a protocol or methods paper that describes this in more detail, as it may be useful for trouble-shooting when encountering difficulties with siRNA transfections. ”

      We appreciate the encouraging comments and would be happy to publish a protocol.

      “The use of white-light imaging is refreshing, as many of us in the field default to fluorescence imaging, which has the potential to interfere with cell proliferation. Overall, the approach is innovative by extracting the most information possible from optical imaging data sets, in the less invasive way possible.”

      We have been working on live-cell imaging since 2000 and had difficulty maintaining cell viability using fluorescent imaging. We therefore tried various light sources and found that near-infrared light (not white light) was less toxic to the cells, allowing us to maintain cell cultures for at least a month on a microscope stage. We mentioned that near-infrared was used in the current study (“System to investigate the functional implications of maintaining low levels of p53 in unstressed cells” in the Results).

    1. g. 8) . The power of the photographs Spiegelman includes in Maus lies not in their evocation of memory, in the connection they can establish between present and past, but in their status as fragmen

      Indeed, the power of photographs lies in the fragments of history that we cannot take in. On December 16, 2014, the Taliban stormed a children's school in Peshawar, where more than 100 children were killed. The photographs of the bloodbath and massacre in the school still evoke the most horrible memory our city, Peshawar, has ever witnessed. It is that fragment of history we cannot take in. In contrast, photographs of those young children dressed in their uniforms, kept in memory of those who died in the attack, evoke a different kind of meaning. A photograph freezes the moment between life and death. In that very moment, when a child was posing in his school uniform, he was very much alive, unaware of what would happen to him. Today, when their parents hold up these photographs while protesting on the roads for justice, it hurts even more. After reading this, I think there is a need for similar work that emphasizes that those killed in wars were human too. For instance, the pictures could be placed in a seminar project where people could come and see who died in drone strikes or military operations. It may evoke anti-war sentiment by showing that those killed were not just numbers or stats but human beings.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors present further investigation of the Sox transcription factors in the model cnidarian Hydractinia. They showcase Hydractinia as a now relatively technically advanced model system to study animal stem cells, regeneration and the control of differentiation in animal cells. In this study they characterise the neural cells in Hydractinia using FACS and single-cell transcriptome sequencing, investigate the sequential expression of SoxB genes in the i-cells and presumptive lineage giving rise to i-cells and investigate the neuronal regeneration making good use of transgenic rules. Finally, they investigate the role of SoxB genes in embryonic neurogenesis.

      There are no major or minor issues affecting the conclusions

      Reviewer #1 (Significance):

      This study helps to confirm that the role of an important group of transcription factors is conserved across metazoans, as well as showcasing an exciting model organism for regeneration and stem cell biology. This will be of interest to a broad audience of developmental biologists.

      My own research is in the same field, using a different model system

      Referees cross-commenting

      I agree with the comments from the other reviewers, and am sure the authors can address these adequately with further explanation.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      Chrysostomou et al. investigate the role of three putative SoxB genes in embryonic neurogenesis in the colonial hydrozoan Hydractinia. They show that SoxB1 is co-expressed with Piwi in the multipotent i-cells and, using transgenics, they show that these Piwi/SoxB1 cells become neurons and gametes, consistent with the cell types that differentiate from i-cells. They further suggest that SoxB2 and SoxB3 are expressed downstream of SoxB1 in the progeny of the i-cells and, using shRNAs, investigate the role of SoxB genes on embryonic neurogenesis. The primary conclusions center on the similarity between neural differentiation in humans and Hydractinia as both systems pattern neurons using sequential expression of SoxB genes during the differentiation of neurons. The manuscript presents a large and diverse set of data derived from analysis of transgenic animals, single-cell sequencing, and investigation of gene function; despite this, the conclusions are either not particularly novel or not well-supported. The co-expression of SoxB1 in Piwi-expressing i-cells appears to be both novel and significant but the implications are not clearly indicated. Additional specific concerns are detailed below.

      Major comments

      1. SoxB genes act sequentially

      Knockdown of SoxB2 has already been shown to result in the loss of SoxB3, so the sequential action of SoxB genes in this animal does not seem to be a terribly novel conclusion.

      Sequential expression of Soxb1-Soxb2 has not been demonstrated previously. Flici et al. did show some data on Soxb1 expression but these were not detailed. Furthermore, they have not shown in vivo transition to Soxb2. Our new single-molecule fluorescence in situ hybridization, and the transgenic reporter animals have been developed to address these issues.

      While this manuscript does appear to report the most comprehensive analysis of SoxB1 expression, the evidence for sequential activation of SoxB1 and then SoxB2 in the same lineage (Figure 4) is a bit troubling. Panel A of this figure appears to show complete overlap between SoxB1 and SoxB2, suggesting all the cells in this field are synchronously passing through the transition point from SoxB1 to SoxB2 expression. While this may reflect reality, it would be more convincing to see adjacent cells expressing SoxB1 only or SoxB2 only, reflecting the dynamic progression of cell type specification along the main body axis.

      As shown in Figure 1, Soxb1 is expressed by i-cells (together with Piwi1) in the lower body column of feeding polyps and in germ cells in sexual polyps. These cells do not express Soxb2. Figure 2 shows that Soxb2 is expressed more orally in a population of putative i-cell progeny as they migrate towards the head. These cells still express Soxb1. In the upper part of the body column, just under the tentacle line, there are Soxb2+ cells that do not express Soxb1. Therefore, cells expressing Soxb1 but not Soxb2 are present in the basal part of the polyp, Soxb1+/Soxb2+ double positive cells in the mid body region (i.e., the interface between the two domains where Soxb1+ cells start to express Soxb2 and downregulate Soxb1), and cells expressing Soxb2 but not Soxb1 in the upper part of the polyp, just under the tentacle line. In Figure 4, we show the interface between these two domains using in vivo imaging of double transgenic reporter animals to visualize the Soxb1 to Soxb2 transition. Indeed, in the mid body area, most Soxb1+ cells also express Soxb2 (Figure 2). Hence, Figure 4 should be viewed with the data in Figure 2 in mind. At the mRNA level, the overlap between the Soxb1 and Soxb2 domains is smaller (Figure 2) than the one shown in Figure 4 because the latter constitutes a lineage tracing, showing fluorescent proteins with a long half-life. Therefore, when i-cells downregulate Soxb1 while starting to express Soxb2, the long half-life of tdTomato results in red fluorescence persisting longer than the mRNA encoding it. We have added cartoons to Figure 4 to indicate the positions along the main body axis that are depicted.

      Panel B is more concerning; while the authors have highlighted a cell that does appear to transition from SoxB1+ to SoxB1+/SoxB2+, there are several cells in the background that appear to gain SoxB2 expression without first expressing SoxB1. Do these cells constitute a fundamentally different, SoxB1-independent, lineage of SoxB2+ cells? This would be noteworthy but is not mentioned or characterized.

      The panels included in Figure 4 constitute selected confocal slices of stacks acquired in vivo. During imaging, cells move in three dimensions, making them appear and disappear in given optical planes over time. In other words, the individual time frames shown (T0-T5) were not always found in the same plane due to cell migration in the Z dimension. The cells that appear to gain Soxb2 expression without having expressed Soxb1 first are an example of such cells. They are probably Soxb2+ cells that had already downregulated Soxb1 and migrated into the respective imaging plane. We have added this explanation to the legend of Figure 4.

      Figure 7 shows the effect of SoxB1 knockdown (by shRNA) on the number of Piwi-expressing cells, nematocytes, etc but why not show that SoxB2 and SoxB3 are also knocked down in these experiments? Figure S11 shows no effect of SoxB2 and SoxB3 knockdown on SoxB1 expression but why wasn't the reciprocal experiment performed? If SoxB2 and SoxB3 are really downstream of SoxB1, the authors should demonstrate that with the shRNA experiments.

      Our data show that Soxb1 is expressed in i-cells and its KD reduces the number of these stem cells (assessed by expression of Piwi1, an i-cell marker). Because i-cells give rise to all Hydractinia somatic lineages (and to germ cells), focusing specifically on Soxb2+ cells would provide no further insight because all cell types are expected to be affected. Indeed, injection of shRNA targeting Soxb1 resulted in smaller animals with multiple defects, including but not limited to the neural lineage.

      1. Knockdown of SoxB genes resulted in complex defects in embryonic neurogenesis

      The manuscript aims to detail the roles of SoxB1, SoxB2, and SoxB3 in embryogenesis but only one of the main figures even shows pre-polyp life stages (Figure 7) and the results presented in this figure are confusing. The authors suggest that knockdown of SoxB3 had no effect on embryonic neurogenesis but another interpretation of these data is that the SoxB3 shRNA simply did not work. The authors should provide additional support to show that this reagent is working as expected.

      This information is included in Figure S11. Using mRNA in situ hybridization, we show that injection of shRNA targeting Soxb3 causes transcriptional downregulation of Soxb3 but not of Soxb2. The figure also shows the specificities of the shRNAs targeting Soxb1 and Soxb2.

      Further, the results for SoxB1 and SoxB2 knockdown do not support the previous investigation of the role of SoxB2 in neurogenesis (Flici et al 2017). If SoxB1 is upstream of SoxB2, how does knockdown of SoxB1 have such a dramatic effect on RFamide neurons and nematocytes but knockdown of SoxB2 has an effect only on RFamide neurons? Is it possible the SoxB2 shRNA also wasn't working as expected? Can the results of the Flici et al 2017 paper showing SoxB2 knockdown in polyps be recapitulated using these shRNAs? If the point is to argue that embryos and adults (polyps) use fundamentally different mechanisms to drive neurogenesis, then the results presented in Figures 1-6 (which investigate SoxB genes in polyps) can't really be used to make inferences about embryonic neurogenesis. I think the authors have more work to do to demonstrate that embryonic and adult neurogenesis fundamentally differ.

      The Soxb2 shRNA specificity is shown in Figure S11 (i.e., it knocks down Soxb2 but not Soxb1). We were equally surprised to discover that Soxb2 KD resulted in somewhat different phenotypes than the ones obtained by Flici et al. (2017) in polyps. At this stage, we cannot explain the difference. However, one could speculate that it resulted from slightly different regulation logic between embryonic and adult neurogenesis. More specifically, we propose different priorities for generating neural subtypes as an explanation. Unfortunately, shRNAs work only with embryos, and long dsRNA-mediated KD works only with polyps. CRISPR/Cas9-mediated KO is feasible in Hydractinia, but knocking out developmental genes, such as these Sox genes, would likely cause embryonic lethality. Other conditional KO/KD approaches are not available for Hydractinia. We believe we have made all possible efforts to clarify the roles of these genes using currently available techniques. Neurogenesis is a complex process that is only partially conserved among different animals and poorly studied in non-bilaterians. Furthermore, it is not possible to answer all questions in one study. Like many studies before it, our work contributes to the understanding of neurogenesis but also raises new questions. Addressing them is a matter for future research. We have toned down the statement in the last sentence of the results and in the discussion and do not claim that embryonic and adult neurogenesis are fundamentally different.

      Minor comments

      Methods: A large bit of data from this manuscript relies on quantitative analysis of cell number but there's not enough information in the methods to understand how quantification was performed. How many slices from the z-stack were analyzed? Were counts made relative to the total tissue area in the X/Y dimension or relative to the number of total nuclei in the same section? How many individuals were examined for each analysis?

      All cell counting analysis was performed using ImageJ/Fiji software. Counts were made relative to the total tissue area in the X/Y dimension (for the shRNA experiments). A Z-stack covering the whole depth of each larva was obtained. Counting was performed on cells positive for the respective cell type marker based on antibody staining and numbers were compared between shControl and shSoxb1/2/3 animals. At least 4 animals were counted per condition.
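The area-normalised comparison described above can be illustrated with a short sketch. It is not the authors' ImageJ/Fiji workflow: the function names, the `(count, area)` input layout, and the numbers in the usage note are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean

def cell_density(count, area):
    """Marker-positive cells per unit of tissue area in the X/Y dimension."""
    if area <= 0:
        raise ValueError("tissue area must be positive")
    return count / area

def mean_density(animals):
    """Mean density over animals; each animal is a (count, area) pair."""
    return mean(cell_density(count, area) for count, area in animals)

def relative_change(control, knockdown):
    """Fractional change in mean density of a knockdown group vs. control."""
    base = mean_density(control)
    return (mean_density(knockdown) - base) / base
```

For example, with invented values, `relative_change([(40, 100.0), (60, 100.0)], [(20, 100.0), (30, 100.0)])` gives a change of about -0.5, i.e. a halving of marker-positive cell density in the knockdown group.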

      Page 11 - "Piwi2low cells, which are presumably i-cell progeny" - how were "high" and "low" quantified?

      “High” and “low” were not quantified. This is because i-cells progressively downregulate Piwi genes (i.e., Piwi1 and Piwi2) as they differentiate, but this is a continuous process. Hence, it is difficult to set a threshold of Piwi1/Piwi2 protein level below which a cell ceases to be an i-cell and becomes committed progeny. This is similar to a process that is well documented in other animals, where stemness markers are gradually downregulated during differentiation.

      Page 13 - "a role in maintaining stemness" - this comment is not totally clear to me. Why would the number of EdU+ cells increase if the role of SoxB1 is to maintain stemness? Wouldn't SoxB1 knockdown then force stem cells to exit their program, resulting in early differentiation of i-cell progeny? This should be clarified.

      KD of Soxb1 resulted in a decrease in the number of i-cells (i.e., Piwi1+ ones), suggesting that the gene is required for stemness maintenance. The increase in the numbers of cells in S-phase in this context was not related to i-cells because most of them were Piwi1-negative (Figure 7B). The identity of the cells in S-phase remains unknown, but a plausible explanation is that i-cell progeny (e.g., nematoblasts; see also next comment) increase their proliferative activity when i-cell numbers are low as a compensatory mechanism. This is merely speculation. We have rephrased the paragraph to increase clarity.

      Page 13 - "if progenitors are limiting" - if progenitors are limited why would there be an increase in nematocytes?

      We do not have a definitive answer to this question but speculate that nematoblasts (i.e., stinging cell progenitors) account, at least in part, for the excessive proliferation seen under Soxb1 KD. This may constitute a mechanism allowing a depleted i-cell population to recover by self-renewal (instead of differentiation), temporarily shifting the proliferation task to committed progeny (e.g., nematoblasts) until i-cell numbers return to normal. However, in the absence of evidence we refrain from expanding on this in the text.

      Figures 1 and 2 claim to show "partial overlap" but they look perfectly overlapping to me. This makes the situation in Figure 4B difficult to interpret.

      Figure 1 shows full overlap between Piwi1 and Soxb1 expression and this is reflected in the text. Figure 2 shows no overlap between Soxb1 and Soxb2 in the lower body column (where only Soxb1 is expressed), overlap in the mid body region, and cells expressing only Soxb2 in the upper part of the body, just under the tentacle line. Similarly, the figure shows overlap between Soxb2 and Soxb3 under the tentacle line, and predominantly Soxb3 above it in the head region. The small cartoons at the left side of each panel indicate its position along the oral-aboral axis. See also our reply to the second part of comment #1.

      Figure 4 - No indication of which part of the animal or which stage is shown in these images.

      We have added cartoons to indicate the area in the polyp from where the images were taken.

      Figure 5 - No indication of where these dissociated cells came from - polyps? Larvae?

      All tissue samples were taken from feeding polyps; this is now mentioned in the Materials and Methods section.

      Panel D is a bit perplexing - what are the "progeny" of Piwi+ cells if not SoxB2+ cells and their derivatives?

      In Panel D, we show three cell fractions. One constitutes i-cells, based on high Piwi1 expression (green fluorescence of the Piwi1::GFP reporter transgene) and morphology; one fraction includes nematocytes, based on the characteristic nematocyst capsule, and one constitutes a mixture of other i-cell progeny. The latter includes different cell types, given that i-cells are thought to contribute to all lineages. They have only dim GFP fluorescence because the Piwi1 promoter-driven GFP shuts down upon i-cell differentiation. Soxb2+ cells are also among them but are not the only i-cell progeny.

      Why are nematocytes but not neurons indicated?

      Neurons are shown on Panels E & F. See also next comment.

      Piwi seems to be maintained in Ncol-expressing cells but not in SoxB2- or RFamide-expressing cells? Does this suggest that Piwi is turned on in i-cells, off in SoxB2-expressing cells, and on again in terminally differentiating nematocytes? This would be quite surprising and should be verified with antibody labeling/imaging in Piwi transgenics to confirm the result. The resolution for Panel M is too low to evaluate this part of the figure.

      The Piwi1 gene is downregulated upon i-cell differentiation. In the Piwi1::GFP reporter animal, residual GFP fluorescence persists post differentiation due to GFP's long half-life; its brightness depends on the time elapsed since differentiation. Because nematocytes are short-lived cells with high turnover, most nematocytes have recently differentiated and are therefore relatively bright green in the Piwi1::GFP animal. Neuron turnover is lower, making most neurons in the same transgenic animal appear dim. The resolution of the imaging flow cytometer is limited because the machine images 1000s of cells per second through all optical channels. However, it is high enough to allow the identification of features such as cell shape, some organelles (e.g., nematocysts), nuclear size and shape, and fluorescence intensity.

      Figure 7 - the low magnification images provide nice overall context but the authors should also provide high magnification panels for the same images. Without them it is not possible to assess "defects in ciliation" or to determine if there are defects in GLWamide neurons from these knockdowns (e.g., neurite vs cell body defects). There's no mention of the fact that SoxB1 knockdown resulted in complete loss of RFamide cells, which is strange. Are there SoxB2-independent populations of RFamide? Panel B could be interpreted multiple ways - downregulation of Piwi in SoxB1 shRNA or upregulation in SoxB2/B3. The authors should provide an image of control shRNA-injected larvae with the same co-labeling of Piwi/EdU for context. From the images, it's not clear that there were differential effects of SoxB2 and SoxB3 on nematocytes.

      The resolution of the images is, in fact, high, allowing them to be enlarged on the screen. Even higher magnification of ciliation can be seen in Figure S12. KD of Soxb1 resulted in complete or nearly complete loss of RFamide+ neurons. We have added this statement to the text as requested. Panel B shows the relative difference in Piwi1+ and S-phase cells between shSoxb1, shSoxb2, and shSoxb3-treated animals. The quantification relative to the control is presented in Figure 7C.

      Figures 6 and S9 - why piwi2 and not piwi1?

      In Figure 6, we co-stained the regenerates with two antibodies: one was a rabbit anti-GFP (to visualize the RFamide+ neurons), and the other was a guinea pig anti-Piwi2 (to visualize i-cells). The anti-Piwi1 antibody that was used in other images to visualize i-cells was raised in rabbit and could not be used in conjunction with the anti-GFP one.

      Figure S1 - Kayal et al 2018 is the most recent phylogeny of cnidarians and should probably be cited in place of Zapata throughout the manuscript. Independent of this, the polytomy in Figure S1 panel A is not supported by either Zapata or Kayal and should be fixed.

      We have cited Kayal et al. 2018 and revised the tree in Figure S1 as pointed out.

      Figure S3 - is this mRNA? Protein? Panels E-G are too small to interpret. Please provide stage/time for cartoons in panel H.

      As per the legend, Panels A, B, D, E, F refer to protein; C is lectin staining (DSA), and G is EdU. The resolution of Panels E-G is actually high, allowing blowing up of the images on the screen to view the details. The stages of the cartoon in Panel H are now provided in the figure legend.

      Figure S11 - please provide images of whole larvae as shown for Piwi knockdown in Fig S9 and some additional support (e.g., qPCR) to demonstrate the shRNAs are actually working.

      Figure S9 represents immunostaining using the anti-Piwi1 antibody. In Figure S11, we show the specificity of the shRNA treatments; we used highly sensitive single-molecule mRNA in situ hybridization. Whole animal imaging is not informative due to the punctate nature of the single-molecule staining.

      Figure S12 - it's not clear what ciliary "defects" are being shown.

      In the control, cilia are uniformly distributed along the oral-aboral axis whereas in the shSoxb1-injected animals, the pattern is patchy. Additionally, shSoxb1-injected larvae could not swim (planulae swim by coordinated cilia beat).

      Reviewer #2 (Significance):

      Generally, the results are either equivocal or the conclusions are not well supported by the results (as detailed above). The significance of this work to vertebrate neurobiology is somewhat weak. (Especially considering the orthology of these genes to bilaterian SoxB genes is not well supported.) Why not compare these results to other cnidarians - the expression patterns of SoxB1 and SoxB2 in corals and sea anemones seem to differ quite a lot (Shinzato et al 2008; Magie et al 2005), suggesting these genes are almost certainly not behaving in the same way across cnidarians. This is exciting! What's happening in Hydra? Seems like it should be possible to mine the single-cell data set from Siebert et al to test these hypothesized relationships between the Sox genes in another hydrozoan which constantly makes new neurons.

      We have modified the concluding section in the discussion, in line with this comment. See also comment to Reviewer #3.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This paper characterizes the role of Soxb genes in neurogenesis in Hydractinia. The authors use cutting edge approaches including FISH, transgenics, image flow cytometry, FACS and shRNA knock downs to characterize SoxB in Hydractinia. The images are beautiful, the data is sound and the interpretation of the data is appropriate.

      I have only minor suggestions, listed by section below:

      Abstract

      • The abstract and introduction should make clear that this is a colonial animal and that the cell migration occurs from the aboral to the oral end of the polyp (not the animal, as there are many oral ends). This is relevant to the interpretation of the data, as the polyps do not act in isolation: they are interconnected and may communicate via the stolonal network that connects the polyps in the colony.

      We have added a section to the Introduction to address the reviewer's comment. The Abstract, however, is too short to include this explanation.

      • The human disease justification is a relatively weak one and does not need to be included. Using Hydractinia to understand the role of SoxB in the evolution of neurogenesis in animals is enough justification for the study.

      We have adopted the reviewer's comment and modified the statement in the discussion (see also comment to Reviewer #2).

      Introduction

      • Instead of "Sox phylogenies" (the term phylogeny is more appropriate for species trees), consider substituting "Sox gene trees". And instead of "phylogenetic relation", use the term "orthology".

      This has been done.

      • The number of sentences with the sentiment "....remain unknown.", "....little is known..", "...unclear...", "....difficult to establish....", etc. is distracting and detracts from what IS known about these genes. It is not necessary to continually justify the study throughout the introduction. Instead, a clearer description of the background, setting up the question/hypothesis of SoxB paralog subfunctionalization in space and time, would be more informative to the reader.

      We have reduced the number of occasions as recommended.

      • The authors state that there are three SoxB genes in the Hydractinia genome. What genome? For several years, multiple papers published by subsets of these authors have used unpublished genome data, but the complete genome has yet to be released to the public. This is especially egregious because they cite their NSF-funded EDGE proposal to CEF and UF, which is supposed to develop tools for the community, and yet the community at large doesn't have access to the genome. If these data came from the genome, then the genome should be released. If these data came from a previously published transcriptome, as in the previous SoxB paper, then this should be stated explicitly.

      The Hydractinia genome assembly, annotation, RNA-seq data, and genome browser are now available in the Hydractinia genome project portal at the National Human Genome Research Institute (NIH) website (https://research.nhgri.nih.gov/hydractinia/). The raw data have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA807936. This information has been added to the 'Resource availability' section.

      Results

      • I assume there was no expression of Soxb2 and Soxb3 in the reproductive polyps? This should be stated explicitly.

      Soxb2 expression in sexual polyps was consistent with the nervous system and with maternal deposition in oocytes. It was not detected in male germ cells. We have added a new in situ hybridization image of Soxb2 to Figure 12.

      • The word "progeny" is used throughout to describe terminally differentiated cells. However, progeny implies offspring, whereas these are actually later stages of differentiation in a cell's ontogeny; thus the term should be changed to "differentiated cells".

      We used "progeny" to indicate that the corresponding cells derived from a specific progenitor cell type. We did try replacing it with "differentiated cells" but this completely changes the meaning of the sentence: first, it does not include the cell of origin info and second, not all progeny are already fully differentiated.

      • Typo on page 11 "This predictable generation of many new neurons provides an opportunity to study neurogenesis in [a ]regeneration." - Remove the "a"

      Corrected.

      • While the regeneration study is interesting, nothing is revealed about the role of Soxb and not much new information is revealed about regeneration. Authors should better justify this section or consider omitting it.

      These sections demonstrate de novo neurogenesis in head regeneration. This was not known in this animal before.

      Discussion

      • The authors assume that in the transgenic lineage, the fluorescent marker in differentiated cells is due to retention of fluorescence, but it is unclear if they can rule out that Soxb2 is still being expressed in those cells. Please clarify.

      We conclude this by comparing the mRNA expression (Figures 1 & 2) with the fluorescent proteins (Figure 3).

      • How did the authors determine that the shSoxb3 knockdown worked? Please discuss relevant controls and validation (either in discussion or methods). This is particularly important given that it didn't have an apparent phenotypic effect.

      The efficacy of all shRNAs was determined by in situ hybridization, showing that each shRNA downregulates its own target mRNA but not the others (Figure S11).

      • Again, the connection to human health is a bit of a stretch. Instead, what is most interesting is the similarity of Soxb paralogs acting sequentially as has been found in vertebrates. This suggests a highly conserved mechanism of subfunctionalization following gene duplication at the base of animals.

      We agree. This is now also better highlighted in the discussion.

      Figures

      • It's very hard to distinguish the overall abundance of Soxb2 and Soxb3 expression along the polyp body axis from the panels in Figure 2. A lower magnification or larger area in each region would be helpful.

      In Figure 2, we performed single-molecule in situ hybridization. While highly sensitive, this method generates spotty images because it highlights single molecules and is not coupled to an enzymatic reaction as in other methods. Such images mostly look poor at low magnification. Because a previous study (Flici et al. 2017) has already shown the general expression pattern, we aimed at providing the details of the transition.

      • Figure 4 - either the figure is upside down or the text is upside down. It is also difficult to see the double staining (if any).

      The figure is oriented to position the oral end up. The resolution of the panels is high, enabling blowing-up on the screen. The quality of in vivo time lapse images cannot match that of fixed and antibody stained ones, or of single in vivo images. This is because the animals are imaged for many hours during which they tend to bleach.

      • Figure 5M is difficult to read due to the small print. Consider enlarging and moving it to Supplementary Material

      The size of the text is small but the resolution is very high, enabling the image to be enlarged on the screen. We thought that the information was important enough to be presented in the main text and, given that most readers would use the electronic version, we preferred this option over adding another supplemental figure on top of the 12 we already have.

      Reviewer #3 (Significance):

      This is an interesting and important study because although it is well known that SoxB genes function in neurogenesis in animals, it is unclear how and if subfunctionalization occurs outside of vertebrates. Hydractinia is an excellent model to study SoxB genes because of its colonial organization and continuous development of nerve cells throughout the life of the animal. In addition, it is part of the early diverging cnidarian lineage and thus can provide insight into the relative conservation of SoxB genes across animals.

    1. Author Response:

      Reviewer #1 (Public Review):

      Overview

      This is a well-conducted study and speaks to an interesting finding in an important topic, whether ethological validity causes co-variation in gamma above and beyond the already present ethological differences present in systemic stimulus sensitivity.

      I like the fact that while this finding (seeing red = ethologically valid = more gamma) seems to favor views the PI has argued for, the paper comes to a much simpler and more mechanistic conclusion. In short, it's good science.

      I think they missed a key logical point of analysis, in failing to dive into ERF <----> gamma relationships. In contrast to the modeled assumption that they have succeeded in color matching to create matched LGN output, the ERF and its distinct features are metrics of afferent drive in their own data. And their data seem to suggest these two variables are not tightly correlated, so at the very least it is a topic that needs treatment and clarity as discussed below.

      Further ERF analyses are detailed below.

      Minor concerns

      In general, very well motivated and described; a few terms need more precision (speedily and staircased are too inaccurate given their precise psychophysical goals).

      We have revised the results to clarify:

      "For colored disks, the change was a small decrement in color contrast, for gratings a small decrement in luminance contrast. In both cases, the decrement was continuously QUEST-staircased (Watson and Pelli, 1983) per participant and color/grating to 85% correct detection performance. Subjects then reported the side of the contrast decrement relative to the fixation spot as fast as possible (max. 1 s), using a button press."

      The resulting reaction times are reported slightly later in the results section.

      I got confused some about the across-group gamma analysis:

      "The induced change spectra were fit per participant and stimulus with the sum of a linear slope and up to two Gaussians." What is the linear slope?

      The slope is used as the null model – we only regarded gamma peaks as significant if they explained spectrum variance beyond any linear offsets in the change spectra. We have clarified in the Results:

      "To test for the existence of gamma peaks, we fit the per-participant, per-stimulus change spectra with three models: a) the sum of two gaussians and a linear slope, b) the sum of one Gaussian and a linear slope and c) only a linear slope (without any peaks) and chose the best-fitting model using adjusted R2-values."

      To me, a few other analysis approaches would have been intuitive. First, before averaging peak-aligned data, one might consider log-transforming the data, and might consider computing averages with measures that don't confound peak height and frequency spread (e.g., using the FWHM/peak power as your shape for each, then averaging).

      The reviewer comments on averaging peak-aligned data. This had been done specifically in Fig. 3C. Correspondingly, we understood the reviewer’s suggestion as a modification of that analysis that we now undertook, with the following steps: 1) Log-transform the power-change values; we did this by transforming into dB; 2) Derive FWHM and peak power values per participant, and then average those; we did this by a) fitting Gaussians to the per-participant, per-stimulus power change spectra, b) quantifying FWHM as the Gaussian’s standard deviation, and the peak power as the Gaussian’s amplitude; 3) Average those parameters over subjects, and display the resulting Gaussians. The resulting Gaussians are now shown in the new panel A in Figure 3-figure supplement 1.

      (A) Per-participant, the induced gamma power change peak in dB was fitted with a Gaussian added to an offset (for full description, see Methods). Plotted is the resulting Gaussian, with peak power and variance averaged over participants.

      Results seem to be broadly consistent with Fig. 3C.
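
      For readers who want to follow the transformation steps, a minimal sketch is given below. It assumes that the power change is expressed as a ratio of stimulation to baseline power, which the response does not state explicitly, and the function names are hypothetical. The second function gives the standard conversion between a Gaussian's standard deviation and its full width at half maximum, since the two differ by a constant factor.

```python
import math


def power_change_db(stim_power, baseline_power):
    """Power change in decibels, assuming a stimulation/baseline power ratio."""
    return 10.0 * math.log10(stim_power / baseline_power)


def fwhm_from_sd(sd):
    """Full width at half maximum of a Gaussian with standard deviation sd."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sd


def average_gaussian_params(fits):
    """Average (amplitude, sd) pairs over participants.

    fits is a list of (amplitude_db, sd_hz) tuples, one per participant.
    """
    n = len(fits)
    mean_amp = sum(amp for amp, _ in fits) / n
    mean_sd = sum(sd for _, sd in fits) / n
    return mean_amp, mean_sd
```

      Averaging the fitted parameters rather than the spectra themselves keeps peak height and spectral width separate, which is the point of the reviewer's suggestion.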

      Moderate

      I. I would like to see a more precise treatment of ERF and gamma power. The initial slope of the ERF should, by typical convention, correlate strongly with input strength, and the peak should similarly be a predictor of such drive, albeit a weaker one. Figure 4C looks good, but I'm totally confused about what this is showing. If drive = gamma in color space, then these ERF features and gamma power should (by Occam's sledgehammer…) be correlated. I invoke the sledgehammer not the razor because I could easily be wrong, but if you could unpack this relationship convincingly, this would be a far stronger foundation for the 'equalized for drive, gamma doesn't change across colors' argument…(see also IIB below)…

      …and, in my own squinting, there is a difference (~25%) in the evoked dipole amplitudes for the vertically aligned opponent pairs of red and green (along the L-M axis, Fig 2C) on which much hinges in this paper, but no difference in gamma power for these pairs. How is that possible? This logic doesn't support the main prediction that drive-matched differences = matched gamma…Again, I'm happy to be wrong, but I would like to see this analyzed and explained intuitively.

      As suggested by the reviewer, we have delved deeper into ERF analyses. Firstly, we overhauled our ERF analysis to extract per-color ERF shape measures (such as timing and slope), and added them as panels A and B in Figure 2-figure supplement 1:

      Figure 2-figure supplement 1. ERF and reaction time results: (A) Average pre-peak slope of the N70 ERF component (extracted from 2-12 ms before per-color, per-participant peak time) for all colors. (B) Average peak time of the N70 ERF component for all colors. […]. For panels A-C, error bars represent 95% CIs over participants, bar orientation represents stimulus orientation in DKL space. The length of the scale bar corresponds to the distance from the edge of the hexagon to the outer ring.

      We have revised the results to report those analyses:

      "The initial ERF slope is sometimes used to estimate feedforward drive. We extracted the per-participant, per-color N70 initial slope and found significant differences over hues (F(4.89, 141.68) = 7.53, pGG < 410 6). Specifically, it was shallower for blue hues compared to all other hues except for green and green-blue (all pHolm < 710-4), while it was not significantly different between all other stimulus hue pairs (all pHolm > 0.07, Figure 2-figure supplement 1A), demonstrating that stimulus drive (as estimated by ERF slope) was approximately equalized over all hues but blue.

      The peak time of the N70 component was significantly later for blue stimuli (Mean = 88.6 ms, CI95% = [84.9 ms, 92.1 ms]) compared to all (all pHolm < 0.02) but yellow, green and green-yellow stimuli, for yellow (Mean = 84.4 ms, CI95% = [81.6 ms, 87.6 ms]) compared to red and red-blue stimuli (all pHolm < 0.03), and fastest for red stimuli (Mean = 77.9 ms, CI95% = [74.5 ms, 81.1 ms]) showing a general pattern of slower N70 peaks for stimuli on the S-(L+M) axis, especially for blue (Figure 2-figure supplement 1B)."

      We also checked if our main findings (equivalence of drive-controlled red and green stimuli, weaker responses for S+ stimuli) are robust when controlled for differences in ERF parameters and added in the Results:

      "To attempt to control for potential remaining differences in input drive that the DKL normalization missed, we regressed out per-participant, per-color, the N70 slope and amplitude from the induced gamma power. Results remained equivalent along the L-M axis: The induced gamma power change residuals were not statistically different between red and green stimuli (Red: 8.22, CI95% = [-0.42, 16.85], Green: 12.09, CI95% = [5.44, 18.75], t(29) = 1.35, pHolm = 1.0, BF01 = 3.00).

      As we found differences in initial ERF slope especially for blue stimuli, we checked if this was sufficient to explain weaker induced gamma power for blue stimuli. While blue stimuli still showed weaker gamma-power change residuals than yellow stimuli (Blue: -11.23, CI95% = [-16.89, -5.57], Yellow: -6.35, CI95% = [-11.20, -1.50]), this difference did not reach significance when regressing out changes in N70 slope and amplitude (t(29) = 1.65, pHolm = 0.88). This suggests that lower levels of input drive generated by equicontrast blue versus yellow stimuli might explain the weaker gamma oscillations induced by them."
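
      The "regressing out" step in the two paragraphs above amounts to taking the residuals of a linear regression of induced gamma power on the N70 covariates. The sketch below illustrates that idea with ordinary least squares in plain Python; it is an assumption about the general technique, not the authors' implementation, and the function name `ols_residuals` is hypothetical.

```python
def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus intercept.

    covariates is a list of equal-length lists, e.g. [n70_slope, n70_amplitude].
    """
    n = len(y)
    # Design matrix columns: intercept first, then each covariate.
    cols = [[1.0] * n] + [[float(v) for v in c] for c in covariates]
    k = len(cols)
    # Augmented normal equations: (X^T X | X^T y).
    a = [
        [sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
        + [sum(ci[t] * y[t] for t in range(n))]
        for ci in cols
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(n)]
    return [yt - ft for yt, ft in zip(y, fitted)]
```

      The residuals returned here would then be compared between stimulus colors in place of the raw gamma power values.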

      We added accordingly in the Discussion:

      "The fact that controlling for N70 amplitude and slope strongly diminished the recorded differences in induced gamma power between S+ and S- stimuli supports the idea that the recorded differences in induced gamma power over the S-(L+M) axis might be due to pure S+ stimuli generating weaker input drive to V1 compared to DKL-equicontrast S- stimuli, even when cone contrasts are equalized.."

      Additionally, we made the correlation between ERF amplitude and induced gamma power clearer to read by correlating them directly. Accordingly, the relevant paragraph in the results now reads:

      "In addition, there were significant correlations between the N70 ERF component and induced gamma power: The extracted N70 amplitude was correlated across colors with the induced gamma power change within participants with on average r = -0.38 (CI95% = [-0.49, -0.28], pWilcoxon < 4*10-6). This correlation was specific to the gamma band and the N70 component: Across colors, there were significant correlation clusters between V1 dipole moment 68-79 ms post-stimulus onset and induced power between 28 54 Hz and 72 Hz (Figure 4C, rmax = 0.30, pTmax < 0.05, corrected for multiple comparisons across time and frequency)."

      II. As indicated above, the paper rests on accurate modeling of human LGN recruitment, based in fact on human cone recruitment. However, the exact details of how such matching was obtained were only briefly discussed. This technical detail is much more than just a detail in a study on color matching: I am not against the logic, nor do I know of a flaw, but it's the hinge of the paper and is dealt with glancingly.

      A. Some discussion of model limitations

      B. Why it's valid to assume LGN matching has been achieved using data from the periphery: To my knowledge, nobody has ever recorded single units in human LGN with these color stimuli…in contrast, the ERF is 'in their hands' and could be directly related (or not) to gamma and to the color matching predictions of their model.

      We have revised the respective paragraph of the introduction to read:

      "Earlier work has established in the non-human primate that LGN responses to color stimuli can be well explained by measuring retinal cone absorption spectra and constructing the following cone-contrast axes: L+M (capturing luminance), L-M (capturing redness vs. greenness), and S-(L+M) (capturing S-cone activation, which correspond to violet vs. yellow hues). These axes span a color space referred to as DKL space (Derrington, Krauskopf, and Lennie, 1984). This insight can be translated to humans (for recent examples, see Olkkonen et al., 2008; Witzel and Gegenfurtner, 2018), if one assumes that human LGN responses have a similar dependence on human cone responses. Recordings of human LGN single units to colored stimuli are not available (to our knowledge). Yet, sensitivity spectra of human retinal cones have been determined by a number of approaches, including ex-vivo retinal unit recordings (Schnapf et al., 1987), and psychophysical color matching (Stockman and Sharpe, 2000). These human cone sensitivity spectra, together with the mentioned assumption, allow to determine a DKL space for human observers. To show color stimuli in coordinates that model LGN activation (and thereby V1 input), monitor light emission spectra for colored stimuli can be measured to define the strength of S-, M-, and L-cone excitation they induce. Then, stimuli and stimulus background can be picked from an equiluminance plane in DKL space. "

      Reviewer #2 (Public Review):

      The major strengths of this study are the use of MEG measurements to obtain spatially resolved estimates of gamma rhythms from a large(ish) sample of human participants, during presentation of stimuli that are generally well matched for cone contrast. Responses were obtained using a 10deg diameter uniform field presented in and around the centre of gaze. The authors find that stimuli with equivalent cone contrast in the L-M axis generated equivalent gamma, i.e., that 'red' (+L-M) stimuli do not generate stronger responses than 'green' (-L+M). The MEG measurements are carefully made and participants performed a decrement-detection task away from the centre of gaze (but within the stimulus), allowing measurements of perceptual performance and in addition controlling attention.

      There are a number of additional observations that make clear that the color and contrast of stimuli are important in understanding gamma. Psychophysical performance was worst for stimuli modulated along the +S-(L+M) direction, and these directions also evoked weakest evoked potentials and induced gamma. There also appear to be additional physiological asymmetries along non-cardinal color directions (e.g. Fig 2C, Fig 3E). The asymmetries between non-cardinal stimuli may parallel those seen in other physiological and perceptual studies and could be drawn out (e.g. Danilova and Mollon, Journal of Vision 2010; Goddard et al., Journal of Vision 2010; Lafer-Sousa et al., JOSA 2012).

      We thank the reviewer for the pointers to relevant literature and have added in the Discussion:

      "Concerning off-axis colors (red-blue, green-blue, green-yellow and red-yellow), we found stronger gamma power and ERF N70 responses to stimuli along the green-yellow/red-blue axis (which has been called lime-magenta in previous studies) compared to stimuli along the red-yellow/green-blue axis (orange-cyan). In human studies varying color contrast along these axes, lime-magenta has also been found to induce stronger fMRI responses (Goddard et al., 2010; but see Lafer-Sousa et al., 2012), and psychophysical work has proposed a cortical color channel along this axis (Danilova and Mollon, 2010; but see Witzel and Gegenfurtner, 2013)."

      Similarly, the asymmetry between +S and -S modulation is striking and needs better explanation within the model (that thalamic input strength predicts gamma strength) given that +S inputs to cortex appear to be, if anything, stronger than -S inputs (e.g. DeValois et al. PNAS 2000).

      We followed the reviewer’s suggestion and modified the Discussion to read:

      "Contrary to the unified pathway for L-M activation, stimuli high and low on the S-(L+M) axis (S+ and S ) each target different cell populations in the LGN, and different cortical layers within V1 (Chatterjee and Callaway, 2003; De Valois et al., 2000), whereby the S+ pathway shows higher LGN neuron and V1 afferent input numbers (Chatterjee and Callaway, 2003). Other metrics of V1 activation, such as ERPs/ERFs, reveal that these more numerous S+ inputs result in a weaker evoked potential that also shows a longer latency (our data; Nunez et al., 2021). The origin of this dissociation might lie in different input timing or less cortical amplification, but remains unclear so far. Interestingly, our results suggest that cortical gamma is more closely related to the processes reflected in the ERP/ERF: Stimuli inducing stronger ERF induced stronger gamma; and controlling for ERF-based measures of input drives abolished differences between S+ and S- stimuli in our data."

      Given that this asymmetry presents a potential exception to the direct association between LGN drive and V1 gamma power, we have toned down claims of a direct input drive to gamma power relationship in the Title and text and have refocused instead on L-M contrast.

      My only real concern is that the authors use a precomputed DKL color space for all observers. The problem with this approach is that the isoluminant plane of DKL color space is predicated on a particular balance of L- and M-cones to Vlambda, and individuals can show substantial variability of the angle of the isoluminant plane in DKL space (e.g. He, Cruz and Eskew, Journal of Vision 2020). There is a non-negligible chance that all the responses to colored stimuli may therefore be predicted by projection of the stimuli onto each individual's idiosyncratic Vlambda (that is, the residual luminance contrast in the stimulus). While this would be exhaustive to assess in the MEG measurements, it may be possible to assess perceptually as in the He paper above or by similar methods. Regardless, the authors should consider the implications - this is important because, for example, it may suggest the importance of signals from the magnocellular pathway, which are thought to be important for Vlambda.

      We followed the suggestion of the reviewer, performed additional analyses and report the new results in the following Results text:

      "When perceptual (instead of neuronal) definitions of equiluminance are used, there is substantial between-subject variability in the ratio of relative L- and M-cone contributions to perceived luminance, with a mean ratio of L/M luminance contributions of 1.5-2.3 (He et al., 2020). Our perceptual results are consistent with that: We had determined the color-contrast change-detection threshold per color; We used the inverse of this threshold as a metric of color change-detection performance; The ratio of this performance metric between red and green (L divided by M) had an average value of 1.48, with substantial variability over subjects (CI95% = [1.33, 1.66]).

      If such variability also affected the neuronal ERF and gamma power measures reported here, L/M-ratios in color-contrast change-detection thresholds should be correlated across subjects with L/M-ratios in ERF amplitude and induced gamma power. This was not the case: Change-detection threshold red/green ratios were neither correlated with ERF N70 amplitude red/green ratios (ρ = 0.09, p = 0.65), nor with induced gamma power red/green ratios (ρ = -0.17, p = 0.38)."
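      The across-subject comparison described above is a rank correlation between two per-subject ratios. A minimal sketch of a Spearman correlation, assuming average ranks for ties, is given here for illustration; the function names are hypothetical, and this is not the authors' analysis code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant (zero rank variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

      In this setting, x would hold each subject's red/green ratio of change-detection performance and y the same subject's red/green ratio of ERF amplitude or gamma power.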

      Reviewer #3 (Public Review):

      This is an interesting article studying human color perception using MEG. The specific aim was to study differences in color perception related to different S-, M-, and L-cone excitation levels and especially whether red color is perceived differentially to other colors. To my knowledge, this is the first study of its kind and as such very interesting. The methods are excellent and the manuscript is well written, as expected from this lab. However, the illustration of the results is not optimal and could be enhanced.

      Major

      The results presented in the manuscript are very interesting, but not presented comprehensively enough to evaluate the validity of the results. The main results of the manuscript are that the gamma-band responses to stimuli with absolute L-M contrast i.e. green and red stimuli do not differ, but they differ for stimuli on the S-(L+M) (blue vs red-green) axis and gamma-band responses for blue stimuli are smaller. These data are presented in figure 3, but in its current form, these results are not well conveyed by the figure. The main results are illustrated in figures 3BC, which show the average waveforms for grating and for different color stimuli. While there are confidence limits for the gamma-band responses for the grating stimuli, there are no confidence limits for the responses to different color stimuli. Therefore, the main results of the similarities / differences between the responses to different colors can't be evaluated based on the figure and hence confidence limits should be added to these data.

      Figure 3E reports the gamma-power change values after alignment to the individual peak gamma frequencies, i.e. the values used for statistics, and does report confidence intervals. Yet, we see the point of the reviewer that confidence intervals are also helpful in the non-aligned/complete spectra. We found that inclusion of confidence intervals into Figure 3B,C, with the many overlapping spectra, renders those panels un-readable. Therefore, we included the new panel Figure 3-figure supplement 2A, showing each color’s spectrum separately:

      (A) Per-color average induced power change spectra. Banding shows 95% confidence intervals over participants. Note that the y-axis varies between colors.

      It is also not clear from the figure legend, from which time-window data is averaged for the waveforms.

      We have added in the legend:

      "All panels show power change 0.3 s to 1.3 s after stimulus onset, relative to baseline."

      The time-resolved profile of gamma-power changes is illustrated in Fig. 3D. This figure would be a perfect place to illustrate the main results. However, of all color stimuli, these TFRs are shown only for the green stimuli, not for the red-green differences nor for blue stimuli for which responses were smaller. Why are these TFRs not shown for all color stimuli and for their differences?

      Figure 3-figure supplement 3. Per-color time-frequency responses: Average stimulus-induced power change in V1 as a function of time and frequency, plotted for each frequency.

      We agree with the reviewer that TFR plots can be very informative. We followed their request and included TFRs for each color as Figure 3-Figure supplement 3.

      Regarding the suggestion to also include TFRs for the differences between colors, we note that this would amount to 28 TFRs, one each for all color combinations. Furthermore, while gamma peaks were often clear, their peak frequencies varied substantially across subjects and colors. Therefore, we based our statistical analysis on the power at the peak frequencies, corresponding to peak-aligned spectra (Fig. 3C). A comparison of Figure 3C with Figure 3B shows that the shape of non-aligned average spectra is strongly affected by inter-subject peak-frequency variability and thereby hard to interpret. Therefore, we refrained from showing TFRs for differences between colors, which would also lack the required peak alignment.

  7. learn-us-east-1-prod-fleet02-xythos.content.blackboardcdn.com
    1. It is an insurrection. It may be that in this presentation of a dreadful event we will sometimes speak of rioting, but merely to describe what was happening on the surface and always maintaining the distinction between form and essence, riot and insurrection. In the sudden outbreak and grim suppression of this 1832 uprising there was so much grandeur that even those who see it as mere riot cannot speak of it without respect.

      Hugo places a certain scrutiny onto riots, and here argues that, even those who think that the June 1832 insurrection was a riot, "cannot speak of it without respect." I understand Hugo's distinction between riots and insurrections, but is a riot against injustice not a noble act to him? Why must an uprising be an insurrection in order to gain our respect? Yes, riots are violent and seemingly random, but they also serve a purpose in society and are one viable option for the oppressed.

    1. I saw many technologies used in unequal ways

      I haven't read on yet, but I wonder if there are any biases that we are unaware of that contribute to this mistreatment. We did a study similar to this, on race and age using the tool "Implicit Association Test", which is said to unveil hidden biases that we may have. I have added the link if you want to try this out for yourself. Personally, I believe that there are many things that are unconscious to us, and we try to avoid negative biases, but sometimes they can be a part of our nature based on the way we were raised, and the environment we are exposed to.

      https://implicit.harvard.edu/implicit/selectatest.html

      It is really sad to hear the data on this as we think of teachers to be loving caretakers.

    1. Author Response:

      Reviewer #1 (Public Review):

      This article focuses on a quantitative description of airineme morphology and its consequences for contact and communication between cells via these long narrow projections. The primary conclusions are

      1) Airineme shapes are consistent with a persistent random walk model (analogous to a wormlike polymer chain), unhindered by the presence of other cells.

      The authors convincingly demonstrate, using analysis of the mean-squared-displacement along the airineme contour, that the structures cannot be described by a diffusive growth process (ie: a Gaussian chain) as would be expected if there were no directional correlations between consecutive steps. Furthermore, by observing the airineme growth and looking at the distribution of step-sizes, they show that these steps do not exhibit the expected long-tail distributions that would imply a Levy-walk behavior. The persistent random walk (PRW) is presented as an alternative that is not inconsistent with the data. However, given the high level of noise due to low sampling, the claimed scaling behavior of the MSD at long lengths is not fully convincing. Nevertheless, the PRW provides a plausible potential description of the airineme shapes.

      To reiterate the comment: the MSD analysis allows us to reject the simple random walk model, and it is consistent but alone is not strongly supportive of the PRW model, especially at high time of around 15 minutes (long lengths of around 65 microns). As the Reviewer points out, this is due to low numbers of long airinemes.
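      For reference, the crossover at issue can be written down under the textbook form of a two-dimensional persistent random walk with constant speed v and rotational diffusion coefficient D_theta, for which MSD(t) = (2 v^2 / D_theta^2) (D_theta t - 1 + exp(-D_theta t)). The sketch below assumes that form, which may differ from the manuscript's exact parametrization; the function name and all numerical values are illustrative only.

```python
from math import expm1


def prw_msd(t, v, d_theta):
    """Mean-squared displacement of a 2D persistent random walk.

    Assumes constant speed v and heading diffusion d_theta, so the heading
    autocorrelation decays as exp(-d_theta * t).
    """
    if d_theta == 0.0:
        return (v * t) ** 2  # purely ballistic limit
    x = d_theta * t
    # x - 1 + exp(-x), written with expm1 to limit round-off at small x.
    return 2.0 * v ** 2 / d_theta ** 2 * (x + expm1(-x))


# Short times (t << 1/d_theta): ballistic, MSD ~ (v t)^2.
short = prw_msd(1e-3, v=2.0, d_theta=0.5)
# Long times (t >> 1/d_theta): diffusive, MSD ~ 2 v^2 t / d_theta.
long_ = prw_msd(1e4, v=2.0, d_theta=0.5)
```

      On a log-log plot this gives a slope of 2 at short times and a slope of 1 at long times, which is the scaling whose long-time part the sparse data cannot confirm on their own.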

      This prompted us to investigate the long-length data using multiple analysis approaches. In the new manuscript, new Fig 2B, we took all airinemes whose growth time was greater than 15 min, and plotted their final angle, i.e., the angle between the tangent vector at their point of emergence from the source cell and the tangent vector at their tip. At long times (>1/D_theta), the PRW model predicts that the angular distribution should become isotropic.

      In new 2B, we find that the angular distribution is uniform, i.e., isotropic, using a Kolmogorov-Smirnov test (p-value 0.37, N=26).

      Since there are relatively few data points, we repeated this analysis under various airineme selection criteria, and in all cases found the final angular distribution to be consistent with uniformity (new Supplemental Data Figure 1). For example, if we set the threshold at 10min, which includes N=49 airinemes, the Kolmogorov-Smirnov test against a uniform angular distribution gives a p-value of 0.32.
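      The uniformity check can be sketched as a one-sample Kolmogorov-Smirnov test of the final angles against a uniform distribution on (-pi, pi). The version below is a minimal illustration, not the authors' code: the function name is hypothetical, and the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, so it is approximate.

```python
from math import exp, pi, sqrt


def ks_uniform_angles(angles, lo=-pi, hi=pi):
    """One-sample KS test of angles against Uniform(lo, hi).

    Returns (D, p). D is the largest gap between the empirical and the
    uniform CDF; p is an asymptotic approximation, not an exact value.
    """
    n = len(angles)
    if n == 0:
        raise ValueError("need at least one angle")
    d = 0.0
    for i, x in enumerate(sorted(angles)):
        cdf = min(max((x - lo) / (hi - lo), 0.0), 1.0)
        # Compare with the empirical CDF just below and at the data point.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the survival function is indistinguishable from 1 here
    p = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

      A large p-value, as reported above, means the test does not reject uniformity; it does not by itself prove isotropy. For circular data the KS statistic also depends on where the circle is cut, which is one reason to repeat the check under several selection criteria.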

      We here add a few additional notes

      ● Note that there is significantly less data used in this test than in the MSD analysis or the autocorrelation function maximum likelihood analysis. In order to perform a hypothesis test, we wanted to be sure that the data points are independent, so we take only one from each airineme (unlike MSD and autocorrelation analyses, for which we take every interval of a particular length, whether in the same airineme or not.)

      ● Finally, although the >10min KS test has more data than the >15min KS test (N=49 compared to N=26), we have chosen to present the >15min KS test in the Main Text. As we mentioned above, the conclusion is unchanged for >10min (see Supporting Data). The reason is that >15min is the first test we ran to check angular distribution against a uniform (-pi,pi) distribution, and we did not want to bias our testing.

      Taken together, the data are even more strongly supportive of the PRW model. We are grateful to the Reviewer for encouraging us to further explore the high-time data.

      2) The flexibility (ie: persistence length) of the airineme shapes is one that maximizes the probability of a given airineme (of fixed length) contacting the target cell.

      This optimum arises due to the balance between straight-line paths that reach far from the source but cover a narrow region of space and diffusive paths that compactly explore space but do not reach far from the starting point. Such optimization has previously been noted in unrelated contexts both for search processes of moving particles and for semiflexible chains that need to contact a target. The authors present a compelling case (Fig 4B) that the measured angular diffusion of the airinemes falls close to the predicted optimum. Furthermore, the measured probability of hitting the target cell also lies close to the model prediction, providing a strong test of the applicability of their model.

      3) Airineme flexibility engenders a tradeoff between contact probability and directional information (ie: the extent to which the target cell can determine the position of the source).

      This calculation proposes an alternative utility metric for communication via airinemes. The observed flexibility is shown to be at a Pareto optimum, where changes in either direction would decrease either the probability of contact or the directional information. Again the absolute value of the metric (Fisher information for angular distribution) is within the predicted order of magnitude from the model. Thus, while the importance of maximizing this metric remains speculative, its quantitative value provides an additional test for the applicability of the PRW model.

      Overall, this paper provides an interesting exploration of optimization problems for communication by long thin projections. A particular strength is the quantitative match to experimental data -- indicating not just that the experimental parameters fall along a putative optimum but also that the metrics being optimized are well-predicted by the model. Defining an optimization problem and showing that some parameter sits at the optimum is a common approach to generating insight in biophysical modeling, albeit invariably suffering from the fact that it is difficult to know which optimization criteria actually matter in a particular cellular system. The authors do an excellent job of exploring multiple optimization criteria, quantifying the balance between them, and pointing out inherent limitations in knowing which is most relevant.

      A minor weakness of the manuscript is its focus on a very narrowly defined cellular system, with the general applicability of the results not being highlighted for clarity. For example, the fact that the same flexibility optimizes contact probability and the balance between contact and directional information is an interesting conclusion of the paper. Is this true in general? Is it applicable to other systems involving a semiflexible structure reaching for a target or a moving agent executing a PRW?

      The Reviewer’s question is an excellent question: Is the trade-off between contact and directional information a general property of searchers that obey persistent random walks? To address this question, we now include the analysis previously contained in Figure 5D, but for a full parameter space exploration. This is done in new Figure 5 Supplemental Figure 1. In doing so, we found fascinating behavior that sheds some light on the loop in Fig 5D.

      At low d_targ, the trade-off is amplified, and the parametric curve resembles bull's horns with two tips representing the smallest and largest D_theta in our explored range, pointing outward so the shape is concave-up. Intuitively, we understand this as follows: since the target is fairly close (relative to l_max), contact is easy. The only way to get directional specification is by increasing D_theta to be very large, effectively shrinking the search range so it only reaches (with significant probability) the target at the near side (“3-o-clock'' in Fig. 5A). At low d_targ, the parametric curve is concave-up, and there is no Pareto optimum.

      At high d_targ, the searcher either barely reaches (when D_theta is high), and does so at 3-o-clock, therefore providing high directional information, or D_theta is low, and the searcher fails to reach, and therefore also fails to provide directional information. So, at high d_targ, there is no trade-off.

      At intermediate d_targ, the curve transitions from concave-up bull's horn to the no-tradeoff line. To our surprise, it does so by bending forward, forming a loop, and closing the loop as the low-D_theta tip moves towards the origin. At these intermediate d_targ values, the loop offers a concave-down region with a Pareto optimum.

      So, to answer the specific question of the Reviewers: No, the Pareto optimum is not a general feature of persistent random walk searchers. It only exists in a particular parameter regime, sandwiched between a regime where there is a strict trade-off with no Pareto optimum and a regime in which there is no trade-off.

      All of these results are now discussed in the main text.

      (Note that although we do not explicitly explore lmax, since these plots have not been nondimensionalized, the parametric curve for a different lmax can be obtained by rescaling the results).

      Reviewer #2 (Public Review):

      Signalling filopodia are essential in disseminating chemical signals in development and tissue homeostasis. These signalling filopodia can be defined as nanotubes, cytonemes, or the recently discovered airinemes. Airinemes are protrusions established between pigment cells due to the help of macrophages. Macrophages take up a small vesicle from one pigment cell and carry it over to the neighbouring pigment cell to induce signalling. However, the vesicle maintains contact with the source cell due to a thin protrusion - the airineme. In support of these data, the authors find that the extension progress of the airinemes fits an "unobstructed persistent random walk model" as described for other macrophages or neutrophils.

      The authors describe the characteristics of an airineme as it would be a signalling filopodia, e.g. a nanotube or a cytoneme, which sends out to target a cell. An airineme, however, is fundamentally different. Here, a macrophage approaches a pigment cell binds to the airineme vesicle. Then, the macrophage approaches a target pigment cell and hands over the airineme vesicle. During this process, the airineme vesicle maintains a connection to the source pigment cell by a thin protrusion. Then, the macrophage leaves the target cell, but the airineme vesicle, including the protrusion, is stabilized at the surface and activates signalling. Indeed nearly all airinemes observed have been associated with macrophages (Eom et al., 2017).

      Therefore, it is essential to focus on the "search-and-find" walk of the macrophage and not the passively dragged airineme. In the light of this discussion, I am not sure if statements like "allow the airineme to hit the target cell" are helpful as it would point towards an actively expanding protrusion like a filopodium.

      We have added a new paragraph in the Introduction emphasizing the role of the macrophage, and we have changed the language. In particular, we want to remove agency from the airineme, since it is indeed moving with the macrophage. In the mathematical sections, we opt for the phrase “search process”.

      We have also clarified that, in the biological system, the details of contact are unclear (e.g., what mechanism in the macrophage-airineme-vesicle is responsible for distinguishing the target cell). Therefore, in the model, we have clarified that contact is declared when the airineme tip arrives at a distance r_targ from the center of the target cell, and this critical distance might be larger than the size of the target cell, since it might include part or all of the macrophage.
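      The contact rule described above can be illustrated with a small Monte Carlo sketch: a tip performs a two-dimensional persistent random walk from the source, and contact is declared once it comes within r_targ of the target centre, before the contour length exceeds l_max. The function names, the time step dt, and the fixed-speed discretization are assumptions made for illustration, not the authors' implementation.

```python
import random
from math import cos, hypot, pi, sin, sqrt


def prw_search_hits(d_targ, r_targ, l_max, v, d_theta,
                    theta0=None, dt=0.01, rng=None):
    """Simulate one persistent random walk starting at the origin.

    Returns True if the tip comes within r_targ of the target centre at
    (d_targ, 0) before its contour length exceeds l_max.
    """
    if rng is None:
        rng = random.Random()
    # Unless specified, the initial heading is uniform on (-pi, pi).
    theta = rng.uniform(-pi, pi) if theta0 is None else theta0
    x = y = length = 0.0
    step = v * dt
    while length < l_max:
        if hypot(x - d_targ, y) <= r_targ:
            return True
        x += step * cos(theta)
        y += step * sin(theta)
        length += step
        # The heading diffuses with coefficient d_theta.
        theta += sqrt(2.0 * d_theta * dt) * rng.gauss(0.0, 1.0)
    return hypot(x - d_targ, y) <= r_targ


def contact_probability(n_trials, seed=0, **params):
    """Fraction of simulated searches that reach the target."""
    rng = random.Random(seed)
    hits = sum(prw_search_hits(rng=rng, **params) for _ in range(n_trials))
    return hits / n_trials
```

      Enlarging r_targ to include part or all of the macrophage simply widens the capture disc in this picture, which raises the estimated contact probability for otherwise identical parameters.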

      Reviewer #3 (Public Review):

      This paper studies statistical aspects of the role of long-range cellular protrusions called airinemes as means of intracellular communication. The mean square distance of an airineme tip is found to follow a persistent random walk with a given velocity and angular diffusion. It is argued that this distribution with these parameters is the one that optimises the probability of contact with the target cell. The authors then evaluate the directional information (where in space did the airineme come from) and found that, again, the measured diffusion coefficient optimises the trade-off between high directional information (small diffusion) and large encounter probability.

      I found this paper well written and clear, and addressing an interesting problem (long-range intracellular communication) using rigorous quantitative tools. This is a very useful approach, which appears to have been appropriately done, that in itself makes this paper worthy of interest.

      1) The main conclusion of this paper is that the airineme properties optimise something that has to do with their function. Although rather appealing, I find this kind of conclusion often questionable considering the large uncertainty surrounding many parameters.

      We agree that conclusions about optimality need to be expressed carefully, to avoid teleological statements and overstating our knowledge about the constraints and variability faced by the living system. In the revised manuscript, we strive to use language to point out that the parameter extracted from data (an average) and the parameter predicted to be optimal (on average) are approximately equal, and avoid speculation about the evolutionary process that may have led to these parameters.

      Here, optimality is shown from a practical perspective, using measured parameters. For instance, the optimal diffusion coefficient for hitting the target varies by 2 orders of magnitude when the distance between cells is varied (Fig.3A). The measured coefficient is optimal for cells about 25 µm distant. Does this reflect anything about the physiological situation in which these airinemes operate?

      To find the physiological regime in which the airinemes operate, we extracted distance-to-target measurements from imaging data, and found an average distance of 51 microns (note possible typo in referee comment), with a range of 33 µm to 84 µm (N = 70). We report this in updated Table 1. The optimum we find is in the average number of attempts before success (so, a single instance of an airineme may either succeed or fail, stochastically), when the distance to the target is 50 microns. These are both averages, across an entire fish epithelium (which contains ~10^5 source cells). So, for a particular cell generating airinemes, there may be different optimal parameters given a priori knowledge of its environment, but, across the whole fish epithelium, we assume the overall success corresponds to the average single-cell success we simulate.

      Another rather puzzling claim is that the diffusion coefficient is optimised both for finding the target, AND for finding the best compromise between finding the target and providing directional information, while the latter must necessarily require weaker diffusion. Hence the last paragraph of p.6 ("the data is consistent with either conclusion that the curvature is optimized for search, or it is optimized to balance search and directional information"), although quite honest, gives the feeling that the conclusions are not very robust. I would welcome a discussion of these points.

      We have clarified the result about directional information in the new manuscript.

      First, it is not optimized for maximal directional information, in the sense that there are other parameters that would give more directional information – we apologize for the lack of clarity. Rather, the parameters observed are such that changing them would either reduce search success or directional information. In the study of multiple optimization, this property is called “Pareto optimality”.

      Second, the Reviewer’s intuition is that weaker diffusion (straighter airinemes) would provide more directional information. This was indeed our intuition as well, prior to this study. To our surprise, we found that very weak diffusion or very strong diffusion both give local maxima of directional information. The intuitive explanation is that the searchers are finite-length, and high diffusion leads to a smaller search extent which only reaches the target cell at its very nearest region. We provide this intuitive explanation (which was indeed a surprise to us) in the Results section.

      Third, the Reviewer asks about the generality of the result about directional information. This is an excellent question. The comment, and similar comments from other Reviewers, prompted us to perform a parameter exploration study. This is contained in a new Supplemental Figure and new paragraphs in the Results section.
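      The notion of Pareto optimality used above can be made concrete with a short sketch. Given a set of candidate parameter values, each summarized by a pair (contact probability, directional information), a candidate is Pareto optimal if no other candidate is at least as good in both objectives and strictly better in one. The function name and the example numbers are hypothetical and chosen only for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (contact_probability, directional_information); larger is
    better for both objectives.
    """
    front = []
    for i, (p1, i1) in enumerate(points):
        dominated = any(
            (p2 >= p1 and i2 >= i1) and (p2 > p1 or i2 > i1)
            for j, (p2, i2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((p1, i1))
    return front


# Illustrative values only: (0.4, 0.4) is dominated by (0.5, 0.5).
candidates = [(0.1, 0.9), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1)]
front = pareto_front(candidates)
```

      Moving away from a point on the front necessarily lowers at least one of the two objectives, which is the property claimed for the observed parameters.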

      The Reviewer’s question is an excellent question: Is the trade-off between contact and directional information a general property of searchers that obey persistent random walks? To address this question, we now include the analysis previously contained in Figure 5D, but for a full parameter space exploration. This is done in new Figure 5 Supplemental Figure 1. In doing so, we found fascinating behavior that sheds some light on the loop in Fig 5D.

      At low d_targ, the trade-off is amplified, and the parametric curve resembles bull's horns with two tips representing the smallest and largest D_theta in our explored range, pointing outward so the shape is concave-up. Intuitively, we understand this as follows: since the target is fairly close (relative to l_max), contact is easy. The only way to get directional specification is by increasing D_theta to be very large, effectively shrinking the search range so it only reaches (with significant probability) the target at the near side (“3-o-clock'' in Fig. 5A). At low d_targ, the parametric curve is concave-up, and there is no Pareto optimum.

      At high d_targ, the searcher either barely reaches (when D_theta is high), and does so at 3-o-clock, therefore providing high directional information, or D_theta is low, and the searcher fails to reach, and therefore also fails to provide directional information. So, at high d_targ, there is no trade-off.

      At intermediate d_targ, the curve transitions from concave-up bull's horn to the no-tradeoff line. To our surprise, it does so by bending forward, forming a loop, and closing the loop as the low-D_theta tip moves towards the origin. At these intermediate d_targ values, the loop offers a concave-down region with a Pareto optimum.

      So, to answer the specific question of the Reviewers: No, the Pareto optimum is not a general feature of persistent random walk searchers. It only exists in a particular parameter regime, sandwiched between a regime where there is a strict trade-off with no Pareto optimum and a regime in which there is no trade-off.

      All of these results are now discussed in the main text.

      (Note that although we do not explicitly explore lmax, since these plots have not been nondimensionalized, the parametric curve for a different lmax can be obtained by rescaling the results).

      2) on p.4: "the airineme tips (which are transported by macrophages [30]) appear unrestricted in their motion". I don't understand what it means that the airineme tips are transported by macrophage, and I missed the explanation in the cited article. Is airineme dynamics internally generated (i.e. by actin/microtubule polymerisation) or does it reflect the motility of cells dragging the airineme along? This is discussed in passing in the Discussion, but I think that this should be explained in more detail right from the start. Also, if a cell is indeed directing the tip, what does contact mean? Does it mean that the driving macrophage must contact the target cell and somehow attach the airineme to it? If yes, that means that the airineme tip has a large spatial extent, which will certainly affect the contact probability.

      These are very good questions. Airinemes have been characterized in a few studies since their discovery in 2015. We are saddened (and excited) to say that: the answers to all of these questions are currently unknown. To paraphrase the Reviewer, the questions are: First, what is the force generation mechanism that leads to airineme extension (additionally, if there are multiple coordinated force generators, e.g., the airineme’s internal cytoskeleton and the macrophage, how are these forces coordinated)? And second, what are the molecular details of airineme tip contact establishment upon arrival at a target cell?

      We present an extended biological background discussion addressing these questions, including what is known and what remains unknown. We have incorporated a shortened version of this as a new paragraph in the introduction.

      Airinemes are produced by xanthophore cells (also called yellow pigment cells) and play a role in the spatial organization of pigment cells that produce the patterns on zebrafish skin. Xanthophores have bleb-like structures at their membrane, and those blebs are the origin of the airineme vesicles at the tip. Those blebs express phosphatidylserine (PtdSer), an evolutionarily conserved ‘eat-me’ signal for macrophages. Macrophages recognize the blebs, then ‘nibble’ and ‘drag’ them as they migrate around the tissue, with the filaments trailing and extending behind. Airineme lengths have a maximum, regardless of whether they reach their target. If the airineme reaches a target before this length, the airineme tip complex recognizes target cells (melanophores) and the macrophage and airineme tip disconnect.

      The airineme tip contains the receptor Delta-C, which activates Notch signaling in the target cell. The mechanism by which a macrophage hands off the airineme tip is still mysterious, due to temporal and spatial resolution limits. It is also unknown what other signals, if any, are carried by the airineme. If no target cell is found by the maximum length, the macrophage and airineme disconnect, and the airineme extension switches to retraction. Thus, macrophages do not keep dragging the airineme vesicles until they find the target melanophores. However, how macrophages determine when to engulf the untargeted airineme vesicles is not understood.

      Regarding the Reviewer’s specific question about the implications for the macrophage on how we model contact establishment: This would indeed change the interpretation of the model parameter r_targ. Specifically, contact is declared when the airineme tip arrives at a distance r_targ from the center of the target cell, and this critical distance might be larger than the size of the target cell, since it might include part or all of the macrophage. We have added this to the first part of Results, when the parameter is introduced.

      3) Fig. 2A shows the airineme MSD and the fit using the PRW model. I don't find the agreement so good. The power law t^2 seems good almost up to 10 minutes, and the scaling above that, if there is one, is clearly larger than linear. So I would say that the apparent agreement with the PRW model reflects the fact that there is a crossover from a ballistic motion to something else, but that this something else is not a random walk. The MSD does look quite strange at long time, where it apparently decays. This made me wonder whether there might be a statistical bias in the data, for instance, the longest living airinemes are those who didn't find their target and hence those who travel less far, on average. I tried to get more information on the data from the ref.[29,30], but could not find anything. The authors should discuss these data and possible bias in more detail. For instance, do the data mix successful and unsuccessful airinemes? This is somewhat touched upon in Fig.s$, but I did not gain any useful information from it, except that the authors find the agreement "good" while it does not look so good to me.

      To reiterate the comment, which is closely related to comments from other Reviewers: the MSD analysis allows us to reject the simple random walk model. It is consistent with the PRW model but, on its own, does not strongly support it, especially at high tau of around 15 minutes (long lengths of around 65 microns). As the Reviewer points out, this is due to low numbers of long airinemes.
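      For reference, the ballistic-to-diffusive crossover under discussion has a standard closed form. The expression below is a sketch under stated assumptions, not a formula quoted from the manuscript: the tip moves in two dimensions at constant speed v (a symbol introduced here for illustration), and its heading diffuses with mean-squared angular change 2 D_theta tau.

```latex
\mathrm{MSD}(\tau) = \frac{2 v^{2}}{D_\theta^{2}}
  \left( D_\theta \tau - 1 + e^{-D_\theta \tau} \right),
\qquad
\mathrm{MSD}(\tau) \approx
\begin{cases}
  v^{2}\tau^{2}, & \tau \ll 1/D_\theta \quad \text{(ballistic)} \\[4pt]
  2 v^{2}\tau / D_\theta, & \tau \gg 1/D_\theta \quad \text{(diffusive)}
\end{cases}
```

      Under these assumptions, growth faster than linear at long tau would be expected only while tau remains comparable to 1/D_theta.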

      We agree, and have performed new analysis. The following is repeated here for convenience:

      This prompted us to investigate the long-length data using multiple analysis approaches. In the new manuscript, new Fig 2B, we took all airinemes whose growth time was greater than 15 min, and plotted their final angle, i.e., the angle between the tangent vector at their point of emergence from the source cell and the tangent vector at their tip. The PRW model predicts that, for long times >1/D_theta, the angular distribution should become isotropic. In new 2B, we find that the angular distribution is consistent with a uniform, i.e., isotropic, distribution, using a Kolmogorov-Smirnov test (p-value 0.37, N=26).

      Since there are relatively few data points, we repeated this analysis under various airineme selection criteria, and in all cases found the final angular distribution to be consistent with uniformity (new Supplemental Data Figure 1). For example, if we set the threshold at 10min, which includes up to N=49 airinemes, the Kolmogorov-Smirnov test against a uniform angular distribution gives a p-value of 0.32.
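      The kind of test described above can be sketched as follows. This is an illustrative, standard-library-only implementation and not the authors' analysis code: the function name is hypothetical, and the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, so it is approximate for samples as small as N=26.

```python
import math


def ks_uniform(angles, lo=-math.pi, hi=math.pi):
    """One-sample KS test of `angles` against Uniform(lo, hi).

    Returns (D, p), where D is the KS statistic and p an approximate
    p-value from the asymptotic Kolmogorov distribution.
    """
    n = len(angles)
    # Map angles to [0, 1], where the uniform CDF is F(u) = u.
    u = sorted((a - lo) / (hi - lo) for a in angles)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    d = max(d_plus, d_minus)
    # Small-sample correction to the effective argument.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The series converges poorly here; the true p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

      A large p-value, as reported above, means the test does not reject uniformity; it is not by itself proof that the distribution is uniform. Taking one final angle per airineme keeps the samples independent, which the test assumes.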

      We add a few additional notes here:

      ● Note that there is significantly less data used in this test than in the MSD analysis or the autocorrelation function maximum likelihood analysis. In order to perform a hypothesis test, we wanted to be sure that the data points are independent, so we take only one from each airineme (unlike MSD and autocorrelation analyses, for which we take every interval of a particular length, whether in the same airineme or not.)

      ● Finally, although the >10min KS test has more data than the >15min KS test (N=49 compared to N=26), we have chosen to present the >15min KS test in the Main Text. As we mentioned above, the conclusion is unchanged for >10min (see Supporting Data). The reason is that >15min is the first test we ran to check angular distribution against a uniform (-pi,pi) distribution, and we did not want to bias our testing.

      Taken together, the data are even more strongly supportive of the PRW model. We are grateful to the Reviewer for encouraging us to further explore the high-time data.

      4) Regarding the directionality discussion, some aspects are a bit vague so that we are left to guess the assumptions made. For instance, the source cell is placed at \theta=0 "without loss of generality" (p.6). Apparently (sketch Fig.5A) this also means that the airineme starting point from the source is at \theta=0, which clearly involves loss of generality, since the airineme could start from anywhere, its path could be hindered by the body of the source cell, and its contact angle would then be much less likely to be close to 0. It might be that in practice, only those airinemes starting close to \theta=0 do in fact make contact, but this should be discussed more thoroughly. Also, why are there two maxima in the Fisher information (Fig.5C) for very high and very low diffusion coefficient at short distance?

      The sketch was indeed not clear about generality, so we have edited it so that the angles are no longer perpendicular. We now also clarify in the Main Text that, in all simulations (both measuring contact probability and directional sensing), the airineme begins at a specified point in an orientation uniformly random in (-pi,pi). We apologize that this was not clear in the previous sketch.

      Regarding hindrance by the source cell: While the tissue surface is crowded, the airineme tips appear unrestricted in their motion on the 2d surface, passing over or under other cells unimpeded (Eom et al., 2015, Eom and Parichy, 2017). We therefore do not consider obstacles in our model. This includes the source cell, i.e., we allow the search process to overlie the source cell. We now state this explicitly in the Main Text.

      Regarding two maxima in Figure 5C (which was a surprise to us): We understand it with the following intuitive picture. For low D_theta, i.e., for very straight airinemes, the allowed contact locations are in a narrow range (by analogy, imagine the day-side of the planet Earth, as accessible by straight rays of sunlight), resulting in high directional information. For high D_theta, i.e., for very random airinemes, we initially expected low and decreasing directional information, since there is more randomness. However, these are finite-length searches, and the range of the search process shrinks as D_\theta increases. This results in a situation where the tip barely reaches only the closest point on the target cell, resulting again in high directional information. We have added this intuitive reasoning in the Main Text.

    1. In constructing personas, we had to be cognizant of inadvertently creating stereotypes as humans naturally stereotype as a way of categorizing conceptions of others

      In addition to this inadvertent tendency to create stereotypes, I think that in only making a couple of personas to represent the learning audience you may fall into stereotyping just by lack of a sufficient sampling. How do you determine how many personas would be a representative enough sample? If you are looking at a diverse group of learners, you need more personas, and you need to have instructional materials that cover diverse needs. In larger groups would you break the group into sections to better address individual needs? Or have additional instructors?

    2. One way to enhance the socio-technical design of learning environment is by espousing a human-computer interaction perspective, which allows us to not only consider what the s/he is learning, but the unique interactions that impact their learning process.

      This is such an important point! I think that often designers, SMEs, coders...everyone involved in the design team can become so enthralled with and focused on their design, that they lose sight of the learner experience. It seems to me that true LXD requires that the design team set their egos aside and be flexible and open to change in order to provide the best and most effective learning experience for the learner. If the design itself induces frustration, the learner may give up and never get to the actual learning process. Designers need to strive for ease of use and provide design with limited barriers for the learner

      I have seen the role of designer-ego play out in the real world. In my role as a virtual math teacher, my colleagues and I regularly reached out to the curriculum team to request a change to the virtual book, activities, or assessments in order to enhance our students' experiences. Too often, we were told no, with no regard for the learner.

      My frustration with this led me to want to move into curriculum so that the learner's point of view would be better understood and represented. That is part of what led me to my current role in Gifted and to the ID program at UF.

    3. We put ourselves in his or her shoes.

      This week really put into perspective the idea of empathy vs. sympathy. It's much easier to sympathize with someone because you're still creating understanding from your personal perspective, while empathy requires what Baaki & Maddrell suggest: putting ourselves in someone else's shoes. However, I think that's a lot more difficult than these activities suggest. In Dr. Schmidt's example of creating a course for parents dealing with a child's diagnosis, while we may sympathize with them as instructional designers and human beings, it's far more difficult to truly understand the depths and nuances of their experiences. While empathy interviews certainly help, they can never replace the experience. I saw this as a mother whose son recently received a diagnosis and who is dealing with the feelings and thoughts associated with it. I don't know if there's truly an empathy interview that an instructional designer can use to gauge and truly understand and feel what I do.

    1. Accommodations alone are not enough to achieve inclusion; when we go beyond accommodations, we create paths that help and support many learners, not just those who need or want accommodations.

      I think this idea is so important! Creating accommodations or new accessibility features is not just helpful for people with disabilities. They can serve as useful tools for anyone regardless of their abilities. If the accessible features can make everyone's time using a tool easier, why is there a lack of emphasis on creating these features? Tool creators may not prioritize them to begin with because they do not value people with disabilities as much as they should, but they need to realize these features can help everyone.

    2. They require constant reevaluation of the design choices we make in order to recognize how each choice can open up new forms of exclusion and barriers for learners.

      This is something that I think is very important to recognize. In regards to inclusion, it is essential that we are constantly aware of how we may be excluding others, and whom, even if we do not realize it. Nobody can be perfect 100% of the time, but as long as we are making an effort to respect others, that's all that we can ask for.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this detailed study the authors show that in isolated islets the polarity of the secretory apparatus is largely lost while it is preserved in slices where the capillary network remains intact. The authors then go on to show that the integrin/FAK pathway appears to be responsible for inducing and maintaining polarity, which involves concentration of active zone proteins and calcium channels at the contact sites and a higher sensitivity and potency of insulin secretion to glucose stimulation.

      Generally, the data appear to be of high quality, being carried out with state-of-the-art technology, and the manuscript is lavishly illustrated. Since as a neuroscientist I am not sufficiently familiar with the field of the cell biology of insulin release it is difficult for me to judge whether there is sufficient advance in knowledge. A higher degree of organization of release sites including a role of active zone proteins was previously demonstrated from other endocrine organs involving the release of large dense-core vesicles such as chromaffin cells. Thus, the differences between the highly organized and rapidly responding exocytotic sites in neurons and the slower reacting release sites of peptide/protein containing granules are not fundamental but rather gradual, despite the principal cell biological differences between the biogenesis and recycling pathways of the secretory organelles.

      In summary, the work adds new aspects to the understanding of the regulation of exocytosis in pancreatic beta cells. Aside from corrections of figure descriptions and experimental details, my only major comment relates to the data shown in Fig. 4. It appears that the difference in the time-to-peak between the two preparations is mainly caused by a (rather variable?) delay between glucose addition and the onset of the rise since the rate of increase is apparently not different between the preparations. Is this due to a delay in depolarization, i.e. a delay in the closure of the ATP-K channels? This should be clarified. Also, the authors should show a comparative histogram of the delay times (between glucose addition and the inflection point at the onset of the rise).

      The delay observed is due to a slower response in islets vs slices, which given the potentiating effects we show of the KATP channel drugs (diazoxide and now glibenclamide) is likely explained by a delay in KATP closure. However, since we are measuring the Ca2+ response we cannot directly prove this. We feel this is adequately discussed with reference to glucose-dependent triggering (where the KATP channel is a key component). In direct response to the referee’s comment about variability, we have re-expressed the data to show frequency histogram comparisons of the delay to peak (new Fig 4J).

      Reviewer #2 (Public Review):

      1) The authors present an investigation of subcellular distribution and dynamics of known presynaptic proteins in a relatively new approach, pancreatic slices, mastered by a limited number of laboratories, and which is currently the best method to largely preserve capillary networks. They demonstrate the advantage of this method by detailed cellular and subcellular optical analysis comparing isolated islets, islets in pancreatic slices, isolated islet cells and isolated islet cells on ECM (laminin) covered surfaces. This work provides good proof that preservation of capillary networks and corresponding distribution of proteins (laminin, liprin, integrin beta1 etc) is required for insulin secretion at the apical surface of islet cells. Moreover, in these pancreatic slices they observe a restriction of exocytotic sites at the vascular surfaces. The role of the extracellular matrix is also well investigated here by experiments on dispersed or single beta cells attached either to a glass-BSA interface or to a glass-laminin interface. However, the authors have already previously published in 2014 a restricted polarized insulin secretion in cultured islets as well as the preservation of localized liprin and laminin distribution (as well as RIM2 and piccolo; DOI 10.1007/s00125-014-3252-6). It is not clear why these data cannot be reproduced now again in isolated islets (see Fig. 1 and 2) .

      We thank the referee for their comments. To clarify the specific issue around our past work: all our live sub-cellular resolution experiments have previously been performed with isolated islets; until recently, we have not been able to reliably get the slice to work. In contrast, our work with immunofluorescence of active zone proteins has been performed with fixed slices (including DOI 10.1007/s00125-014-3252-6, Low et al 2014).

      2) The authors try to gain insight into which mechanisms control this specific spatial restriction and they provide evidence that Focal Adhesion kinase activity is implicated in glucose-induced calcium fluxes and insulin secretion by the use of a small molecule antagonist and the use of a purified monoclonal antibody. They conclude that FAK is a master regulator of glucose induced insulin secretion that controls positioning of presynaptic scaffold proteins and the functioning of calcium channels. Although FAK may be a regulator, the claim that FAK controls functioning of calcium channels can certainly not be made. Ratio measurements of cellular calcium levels do not suffice for that (patch or sharp would be required). Moreover, the fact that KCl-induced insulin secretion (which bypasses nutrient metabolism and leads directly to opening of voltage-dependent calcium channels) is not altered by the FAK antagonist strongly argues against a role of FAK in calcium channel regulation. Indeed, the presented data suggest that FAK may intervene far more upstream from exocytosis such as in nutrient metabolism or granule mobility/maturation.

      Our data clearly shows that integrin/FAK activation is part of the glucose dependent control of Ca2+ and insulin secretion. It is not relevant to this conclusion how we measure Ca2+ responses – they are obviously affected by all manipulations of integrin/FAK. We note that the referee is specifically correct in saying that we do not have evidence that Ca2+ channel function is a direct target of integrins/FAK and we have reworded the text to make this clear.

      Further, our work does not define where in the glucose pathway integrin/FAK are acting. The referee is correct in saying the KCl data suggests it is upstream of the final stages of Ca2+ channel and exocytosis. Consistent with this, we see effects of integrin/FAK manipulation on ELKS and liprin positioning (Figs 7 and 8) and, given the published data showing that ELKS enhances Ca2+ channel current (Ohara-Imaizumi et al 2019), we think it is plausible that integrin/FAK intersect with this pathway to regulate Ca2+ channel activity. With reference to the high K responses, KCl rapidly depolarises the cells to recruit Ca2+ channels; in contrast, glucose slowly depolarises cells. This difference will affect Ca2+ channel behaviour, and altered CaV1.2 function, such as a lowered voltage threshold, might be apparent only in the glucose responses.

      3) The authors present data that islets in pancreatic slices are considerably more sensitive to glucose, inducing a response already at basal glucose levels (2.8 mM). In the same vein the authors observe a considerably shortened delay between stimulus and response (this delay is generally due to nutrient metabolism and initial filling of intracellular calcium stores). The authors take these phenomena as evidence for a superior and more physiological quality of their islet slices as compared to conventional purified islets.

      However, contrary to their interpretation, these observations considerably question whether the slice preparation used here in this work has physiological qualities. Indeed, the authors observe considerable activity of islet beta-cells already far below the set-point of around 6 or 7 mM in rodents, very well characterized through a number of studies in-vivo, in-vitro and even in-situ (10.1113/jphysiol.1995.sp020804), and their preparations reach almost full activity around the set-point. This is also surprising as such a hypersensitivity has not been reported by several other groups using the same preparation, i.e. pancreatic slices (10.1152/ajpendo.00043.2021; 10.1371/journal.pone.0054638; 10.3389/fphys.2019.00869; 10.1371/journal.pcbi.1009002; 10.1038/nprot.2014.195) even using patch clamp (10.3390/s151127393). Moreover, even human islets, known for a lower set-point, are inactive in slices at 3 mM (10.1038/s41467-020-17040-8) in line with the physiological requirement to avoid insulin secretion in low glucose states so as to avoid life-threatening hypoglycaemia. The same applies for the shortened delay between application of a stimulus (glucose) and start of the response, which has also not been observed by other groups in pancreatic slices (refs see above).

      We are cognisant that our data challenges the dogma and talked around this point in the discussion. Evidence that our findings might be correct includes the responses seen by Henquin to glucose concentrations below 6 mM (Gembal et al 1992) and the long-standing evidence of heterogeneous responses in isolated cells that show responses to very low glucose concentrations (Van Schravendijk et al 1992). As such, our data is not as unusual as it might initially appear. Furthermore, as discussed in detail below, the findings from others using the slice preparation are not directly or easily compared to our work.

      In general, such an increased glucose sensitivity is observed in prediabetic states or experiments mimicking such a condition. To the best of my recollection such an apparently increased sensitivity can also be observed in brain slices due to leakage. Unfortunately, no independent measures of islet quality in slices are provided.

      We have previously characterised increased insulin secretion in “prediabetes” in mice and demonstrated a clear effect on the mechanisms of granule fusion such as an increase in compound exocytosis (Do et al 2016). We do not think this is relevant to this slice preparation where normal mice were used for both the slice and the islet experiments and our data in slices and islets both show normal granule fusion and not compound exocytosis.

      Within the same vein the comparison between slices and islets (Fig 5) is not in favour of a more physiological aspect of slices and the different cell morphology and small number of observations shed more doubt, especially in view of the well known normal beta-cell heterogeneity (which may explain differences and may have been missed here due to a small sample size).

      We acknowledge that beta cell heterogeneity is a potential confounding factor. However, our sample sizes are not small, in each islet or slice we record Ca2+ responses from ~10 cells (see Fig 3) and have repeated preparations from each mouse with the total dataset from >3 mice. It is true that the sample size for Ca2+ waves is small for the isolated islets, but this is because these are such rare events which is explained by the fragmented capillaries and compromised cell structure (eg Fig 1) in isolated islets.

      In a larger context this glucose supersensitivity may also shed doubts on the proposed important role of FAK as its role may be far less preponderant in preparations corresponding to physiological criteria.

      We agree that the relative importance of FAK might be different in different in vitro models. But it is clear that FAK plays an important role in vivo and the data from FAK KO mice show both defective glucose homeostasis and lower insulin secretion (Cai et al 2012) directly demonstrating physiological relevance.

    1. Author Response:

      Evaluation Summary:

      This study investigates the mechanisms by which distributed systems control rhythmic movements of different speeds. The authors train an artificial recurrent neural network to produce the muscle activity patterns that monkeys generate when performing an arm cycling task at different speeds. The dominant patterns in the neural network do not directly reflect muscle activity and these dominant patterns do a better job than muscle activity at capturing key features of neural activity recorded from the monkey motor cortex in the same task. The manuscript is easy to read and the data and modelling are intriguing and well done.

      We thank the editor and reviewers for this accurate summary and for the kind words.

      Further work should better explain some of the neural network assumptions and how these assumptions relate to the treatment of the empirical data and its interpretation.

      The manuscript has been revised along these lines.

      Reviewer #1 (Public Review):

      In this manuscript, Saxena, Russo et al. study the principles through which networks of interacting elements control rhythmic movements of different speeds. Typically, changes in speed cannot be achieved by temporally compressing or extending a fixed pattern of muscle activation, but require a complex pattern of changes in amplitude, phase, and duty cycle across many muscles. The authors train an artificial recurrent neural network (RNN) to predict muscle activity measured in monkeys performing an arm cycling task at different speeds. The dominant patterns of activity in the network do not directly reflect muscle activity. Instead, these patterns are smooth, elliptical, and robust to noise, and they shift continuously with speed. The authors then ask whether neural population activity recorded in motor cortex during the cycling task closely resembles muscle activity, or instead captures key features of the low-dimensional RNN dynamics. Firing rates of individual cortical neurons are better predicted by RNN than by muscle activity, and at the population level, cortical activity recapitulates the structure observed in the RNN: smooth ellipses that shift continuously with speed. The authors conclude that this common dynamical structure observed in the RNN and motor cortex may reflect a general solution to the problem of adjusting the speed of a complex rhythmic pattern. This study provides a compelling use of artificial networks to generate a hypothesis on neural population dynamics, then tests the hypothesis using neurophysiological data and modern analysis methods. The experiments are of high quality, the results are explained clearly, the conclusions are justified by the data, and the discussion is nuanced and helpful. I have several suggestions for improving the manuscript, described below.

      This is a thorough and accurate summary, and we appreciate the kind comments.

      It would be useful for the authors to elaborate further on the implications of the study for motor cortical function. For example, do the authors interpret the results as evidence that motor cortex acts more like a central pattern generator - that is, a neural circuit that transforms constant input into rhythmic output - and less like a low-level controller in this task?

      This is a great question. We certainly suspect that motor cortex participates in all three key components: rhythm generation, pattern generation, and feedback control. The revised manuscript clarifies how the simulated networks perform both rhythm generation and muscle-pattern generation using different dimensions (see response to Essential Revisions 1a). Thus, the stacked-elliptical solution is consistent with a solution that performs both of these key functions.

      We are less able to experimentally probe the topic of feedback control (we did not deliver perturbations), but agree it is important. We have thus included new simulations in which networks receive (predictable) sensory feedback. These illustrate that the stacked-elliptical solution is certainly compatible with feedback impacting the dynamics. We also now discuss that the stacked-elliptical structure is likely compatible with the need for flexible responses to unpredictable perturbations / errors:

      "We did not attempt to simulate feedback control that takes into account unpredictable sensory inputs and produces appropriate corrections (Stavisky et al. 2017; Pruszynski and Scott 2012; Pruszynski et al. 2011; Pruszynski, Omrani, and Scott 2014). However, there is no conflict between the need for such control and the general form of the solution observed in both networks and cortex. Consider an arbitrary feedback control policy: 𝑧 = 𝑔 𝑐 (𝑡, 𝑢 𝑓 ) where 𝑢 is time-varying sensory input arriving in cortex and is a vector of outgoing commands. The networks we 𝑓 𝑧 trained all embody special cases of the control policy where 𝑢 is either zero (most simulations) or predictable (Figure 𝑓 9) and the particulars of 𝑧 vary with monkey and cycling direction. The stacked-elliptical structure was appropriate in all these cases. Stacked-elliptical structure would likely continue to be an appropriate scaffolding for control policies with greater realism, although this remains to be explored."

      The observation that cortical activity looks more like the pattern-generating modes in the RNN than the EMG seem to be consistent with this interpretation. On the other hand, speed-dependent shifts for motor cortical activity in walking cats (where the pattern generator survives the removal of cortex and is known to be spinal) seems qualitatively similar to the speed modulation reported here, at least at the level of single neurons (e.g., Armstrong & Drew, J. Physiol. 1984; Beloozerova & Sirota, J. Physiol. 1993). More generally, the authors may wish to contextualize their work within the broader literature on mammalian central pattern generators.

      We agree our discussion of this topic was thin. We have expanded the relevant section of the Discussion. Interestingly, Armstrong 1984 and Beloozerova 1993 both report quite modest changes in cortical activity with speed during locomotion (very modest in the case of Armstrong). The Foster et al. study agrees with those earlier studies, although the result is more implicit (things are stacked, but separation is quite small). Thus, there does seem to be an intriguing difference between what is observed in cortex during cycling (where cortex presumably participates heavily in rhythm/pattern generation) and during locomotion (where it likely does not, and concerns itself more with alterations of gait). This is now discussed:

      "Such considerations may explain why (Foster et al. 2014), studying cortical activity during locomotion at different speeds, observed stacked-elliptical structure with far less trajectory separation; the ‘stacking’ axis captured <1% of the population variance, which is unlikely to provide enough separation to minimize tangling. This agrees with the finding that speed-based modulation of motor cortex activity during locomotion is minimal (Armstrong and Drew 1984) or modest (Beloozerova and Sirota 1993). The difference between cycling and locomotion may reflect cortex playing a less-central role in the latter. Cortex is very active during locomotion, but that may reflect cortex being ‘informed’ of the spinally generated locomotor rhythm for the purpose of generating gait corrections if necessary (Drew and Marigold 2015; Beloozerova and Sirota 1993). If so, there would be no need for trajectories to be offset between speeds because they are input-driven, and need not display low tangling."

      For instance, some conclusions of this study seem to parallel experimental work on the locomotor CPG, where a constant input (electrical or optogenetic stimulation of the MLR at a frequency well above the stepping rate) drives walking, and changes in this input smoothly modulate step frequency.

      We now mention this briefly when introducing the simulated networks and the modeling choices that we made:

      "Speed was instructed by the magnitude of a simple static input. This choice was made both for simplicity and by rough analogy to the locomotor system; spinal pattern generation can be modulated by constant inputs from supraspinal areas (Grillner, S. 1997). Of course, cycling is very unlike locomotion and little is known regarding the source or nature of the commanding inputs. We thus explore other possible input choices below."

      If the input to the RNN were rhythmic, the network dynamics would likely be qualitatively different. The use of a constant input is reasonable, but it would be useful for the authors to elaborate on this choice and its implications for network dynamics and control. For example, one might expect high tangling to present less of a problem for a periodically forced system than a time-invariant system. This issue is raised in line 210ff, but could be developed a bit further.

      To investigate, we trained networks (many, each with a different weight initialization) to perform the same task but with a periodic forcing input. The stacked-elliptical solution often occurred, but other solutions were also common. The non-stacking solutions relied strongly on the ‘tilt’ strategy, where trajectories tilt into different dimensions as speed changes. There is of course nothing wrong with the ‘tilting’ strategy; it is a perfectly good way to keep tangling low. And of course it was also used (in addition to stacking) by both the empirical data and by graded-input networks (see section titled ‘Trajectories separate into different dimensions’). This is now described in the text (and shown in Figure 3 - figure supplement 2):

      "We also explored another plausible input type: simple rhythmic commands (two sinusoids in quadrature) to which networks had to phase-lock their output. Clear orderly stacking with speed was prominent in some networks but not others (Figure 3 - figure supplement 2a,b). A likely reason for the variability of solutions is that rhythmic-input-receiving networks had at least two “choices”. First, they could use the same stacked-elliptical solution, and simply phase-lock that solution to their inputs. Second, they could adopt solutions with less-prominent stacking (e.g., they could rely primarily on ‘tilting’ into new dimensions, a strategy we discuss further in a subsequent section)."

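      As a minimal sketch of the two input types contrasted here, the snippet below builds a static graded input and a pair of sinusoids in quadrature. The function names, time step, and example values are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
import numpy as np

def static_speed_input(speed_level, n_steps):
    """Constant (graded) input: its magnitude instructs speed, not phase."""
    return np.full((n_steps, 1), float(speed_level))

def quadrature_input(freq_hz, n_steps, dt=0.001):
    """Two sinusoids 90 degrees apart; together they specify phase unambiguously."""
    t = np.arange(n_steps) * dt
    phase = 2 * np.pi * freq_hz * t
    return np.stack([np.sin(phase), np.cos(phase)], axis=1)

# Hypothetical example values (assumptions, not the study's parameters).
static = static_speed_input(speed_level=4, n_steps=2000)
rhythmic = quadrature_input(freq_hz=1.5, n_steps=2000)
```

      With the static input the network must generate the rhythm itself, whereas with the quadrature pair the rhythm is supplied and the network only needs to phase-lock to it. This is one way to see why the second input type leaves more room for alternative solutions.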
      This addition is clarifying: knowing that there are other reasonable solutions (e.g., pure tilt with little stacking) makes it more interesting that the stacked-elliptical solution was observed empirically. At the same time, the lesson to be drawn from the periodically forced networks isn’t 100% clear. They sometimes produced solutions with realistic stacking, so they are clearly compatible with the data. On the other hand, they didn’t do so consistently, so perhaps this makes them a bit less appealing as a hypothesis. Potentially more appealing is the hypothesis that both input types (a static, graded input instructing speed and periodic inputs instructing phase) are used. We strongly suspect this could produce consistently realistic solutions. However, in the end we decided we didn’t want to delve too much into this, because neither our data nor our models can strongly constrain the space of likely network inputs. This is noted in the Discussion:

      "The desirability of low tangling holds across a broad range of situations (Russo et al. 2018). Consistent with this, we observed stacked-elliptical structure in networks that received only static commands, and in many of the networks that received rhythmic forcing inputs. Thus, the empirical population response is consistent with motor cortex receiving a variety of possible input commands from higher motor areas: a graded speed-specifying command, phase-instructing rhythmic commands, or both."

      The use of a constant input should also be discussed in the context of cortical physiology, as motor cortex will receive rhythmic (e.g., sensory) input during the task. The argument that time-varying input to cortex will itself be driven by cortical output (475ff) is plausible, but the underlying assumption that cortex is the principal controller for this movement should be spelled out. Furthermore, this argument would suggest that the RNN dynamics might reflect, in part, the dynamics of the arm itself, in addition to those of the brain regions discussed in line 462ff. This could be unpacked a bit in the Discussion.


      We agree this is an important topic and worthy of greater discussion. We have also added simulations that directly address this topic. These are shown in the new Figure 9 and described in the new section ‘Generality of the network solution’:

      "Given that stacked-elliptical structure can instantiate a wide variety of input-output relationships, a reasonable question is whether networks continue to adopt the stacked-elliptical solution if, like motor cortex, they receive continuously evolving sensory feedback. We found that they did. Networks exhibited the stacked-elliptical structure for a variety of forms of feedback (Figure 9b,c, top rows), consistent with prior results (Sussillo et al. 2015). This relates to the observation that “expected” sensory feedback (i.e., feedback that is consistent across trials) simply becomes part of the overall network dynamics (M. G. Perich et al. 2020). Network solutions remained realistic so long as feedback was not so strong that it dominated network activity. If feedback was too strong (Figure 9b,c, bottom rows), network activity effectively became a representation of sensory variables and was no longer realistic."

      We agree that the observed dynamics may “reflect, in part, the dynamics of the arm itself, in addition to those of the brain regions discussed”, as the reviewer says. At the same time, it seems to us quite unlikely that they primarily reflect the dynamics of the arm. We have added the following to the Discussion to outline what we think is most likely:

      "This second observation highlights an important subtlety. The dynamics shaping motor cortex population trajectories are widely presumed to reflect multiple forms of recurrence (Churchland et al. 2012): intracortical, multi-area (Middleton and Strick 2000; Wang et al. 2018; Guo et al. 2017; Sauerbrei et al. 2020) and sensory reafference (Lillicrap and Scott 2013; Pruszynski and Scott 2012). Both conceptually (M. G. Perich et al. 2020) and in network models (Sussillo et al. 2015), predictable sensory feedback becomes one component supporting the overall dynamics. Taken to an extreme, this might suggest that sensory feedback is the primary source of dynamics. Perhaps what appear to be “neural dynamics” merely reflect incoming sensory feedback mixed with outgoing commands. A purely feedforward network could convert the former into the latter, and might appear to have rich dynamics simply because the arm does (Kalidindi et al. 2021). While plausible, this hypothesis strikes us as unlikely. It requires sensory feedback, on its own, to create low-tangled solutions across a broad range of tasks. Yet there exists no established property of sensory signals that can be counted on to do so. If anything the opposite is true: trajectory tangling during cycling is relatively high in somatosensory cortex even at a single speed (Russo et al. 2018). The hypothesis of purely sensory-feedback-based dynamics is also unlikely because population dynamics begin unfolding well before movement begins (Churchland et al. 2012). To us, the most likely possibility is that internal neural recurrence (intra- and inter-area) is adjusted during learning to ensure that the overall dynamics (which will incorporate sensory feedback) provide good low-tangled solutions for each task. This would mirror what we observed in networks: sensory feedback influenced dynamics but did not create its dominant structure. 
Instead, the stacked-elliptical solution emerged because it was a ‘good’ solution that optimization found by shaping recurrent connectivity."

      As the reviewer says, our interpretation does indeed assume M1 is central to movement control. But of course this needn’t (and probably doesn’t) imply dynamics are only due to intra-M1 recurrence. What is necessarily assumed by our perspective is that M1 is central enough that most of the key signals are reflected there. If that is true, tangling should be low in M1. To clarify this reasoning, we have restructured the section of the Discussion that begins with ‘Even when low tangling is desirable’.

      The low tangling in the dominant dimensions of the RNN is interpreted as a signature of robust pattern generation in these dimensions (lines 207ff, 291). Presumably, dimensions related to muscle activity have higher tangling. If these muscle-related dimensions transform the smooth, rhythmic pattern into muscle activity, but are not involved in the generation of this smooth pattern, one might expect that recurrent dynamics are weaker in these muscle-related dimensions than in the first three principal components. That is, changes along the dominant, pattern-generating dimensions might have a strong influence on muscle-related dimensions, while changes along muscle-related dimensions have little impact on the dominant dimensions. Is this the case?


      A great question and indeed it is the case. We have added perturbation analyses of the model showing this (Figure 3f). The results are very clear and exactly as the reviewer intuited.

      It would be useful to have more information on the global dynamics of the RNN; from the figures, it is difficult to determine the flow in principal component space far from the limit cycle. In Fig. 3E (right), perturbations are small (around half the distance to the limit cycle for the next speed); if the speed is set to eight, would trajectories initialized near the bottom of the panel converge to the red limit cycle? Visualization of the vector field on a grid covering the full plotting region in Fig. 3D-E with different speeds in different subpanels would provide a strong intuition for the global dynamics and how they change with speed.


      We agree that both panels in Figure 3e were hard to visually parse. We have improved it, but fundamentally it is a two-dimensional projection of a flow-field that exists in many dimensions. It is thus inevitable that it is hard to follow the details of the flow-field, and we accept that. What is clear is that the system is stable: none of the perturbations cause the population state to depart in some odd direction, or fall into some other attractor or limit cycle. This is the main point of this panel and the text has been revised to clarify this point:

      "When the network state was initialized off a cycle, the network trajectory converged to that cycle. For example, in Figure 3e (left) perturbations never caused the trajectory to depart in some new direction or fall into some other limit cycle; each blue trajectory traces the return to the stable limit cycle (black).

      Network input determined which limit cycle was stable (Figure 3e, right)."

      One could of course try and determine more about the flow-fields local to the trajectories. E.g., how quickly do they return activity to the stable orbit? We now explore some aspects of this in the new Figure 3f, which gets at a property that is fundamental to the elliptical solution. At the same time, we stress that some other details will be network specific. For example, networks trained in the presence of noise will likely have a stronger ‘pull’ back to the canonical trajectory. We wish to avoid most of these details to allow us to concentrate on features of the solution that 1) were preserved across networks and 2) could be compared with data.

      What was the goodness-of-fit of the RNN model for individual muscles, and how was the mean-squared error for the EMG principal components normalized (line 138)? It would be useful to see predicted muscle activity in a similar format as the observed activity (Fig. 2D-F), ideally over two or three consecutive movement cycles.

      The revision clarifies that the normalization is just the usual one we are all used to when computing the R^2 (normalization by total variance). We have improved this paragraph:

      "Success was defined as <0.01 normalized mean-squared error between outputs and targets (i.e., an R^2 > 0.99). Because 6 PCs captured ~95% of the total variance in the muscle population (94.6 and 94.8% for monkey C and D), linear readouts of network activity yielded the activity of all recorded muscles with high fidelity."

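      To make the stated equivalence between the error threshold and R^2 concrete, here is a small sketch. It assumes, as described above, that the mean-squared error is normalized by the total variance of the targets about their mean. The synthetic targets and noise level are illustrative assumptions, not the recorded muscle data.

```python
import numpy as np

def normalized_mse(outputs, targets):
    """Sum of squared errors divided by the total variance of the targets."""
    resid = np.sum((outputs - targets) ** 2)
    total = np.sum((targets - targets.mean(axis=0)) ** 2)
    return resid / total

def r_squared(outputs, targets):
    """R^2 is one minus the normalized error, so NMSE < 0.01 means R^2 > 0.99."""
    return 1.0 - normalized_mse(outputs, targets)

# Synthetic stand-in for 6 target PCs over one cycle (assumption for illustration).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
phases = np.linspace(0, np.pi, 6)
targets = np.sin(t[:, None] + phases[None, :])
outputs = targets + 0.05 * rng.standard_normal(targets.shape)

nmse = normalized_mse(outputs, targets)
r2 = r_squared(outputs, targets)
```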
      Given this accuracy, plotting network outputs would be redundant with plotting muscle activity, as they would look nearly identical (and small differences would of course be different for every network).

      A related issue is whether the solutions are periodic for each individual node in the 50-dimensional network at each speed (as is the case for the first few RNN principal components and activity in individual cortical neurons and the muscles). If so, this would seem to guarantee that muscle decoding performance does not degrade over many movement cycles. Some additional plots or analysis might be helpful on this point: for example, a heatmap of all dimensions of v(t) for several consecutive cycles at the same speed, and recurrence plots for all nodes. Finally, does the period of the limit cycle in the dominant dimensions match the corresponding movement duration for each speed?


      These are good questions; it is indeed possible to obtain ‘degenerate’ non-periodic solutions if one is not careful during training. For example, if during training one always asks for 3 cycles, it becomes possible for the network to produce a periodic output based on non-periodic internal activity. To ensure this did not happen, we trained networks with a variable number of cycles. Inspection confirmed this was successful: all neurons (and the ellipse that summarizes their activity) showed periodic activity. These points are now made in the text:

      "Networks were trained across many simulated “trials”, each of which had an unpredictable number of cycles. This discouraged non-periodic solutions, which would be likely if the number of cycles were fixed and small.

      Elliptical network trajectories formed stable limit cycles with a period matching that of the muscle activity at each speed."

      We also revised the relevant section of the Methods to clarify how we avoided degenerate solutions, see section beginning with:

      “One concern, during training, is that networks may learn overly specific solutions if the number of cycles is small and stereotyped”.

      How does the network respond to continuous changes in input, particularly near zero? If a constant input of 0 is followed by a slowly ramping input from 0-1, does the solution look like a spring, as might be expected based on the individual solutions for each speed? Ramping inputs are mentioned in the Results (line 226) and Methods (line 805), but I was unable to find this in the figures. Does the network have a stable fixed point when the input is zero?


      For ramping inputs within the trained range, it is exactly as the reviewer suggests. The figure below shows a slowly ramping input (over many seconds) and the resulting network trajectory. That trajectory traces a spiral (black) that traverses the ‘static’ solutions (colored orbits).

      It is also true that activity returns to baseline levels when the input is turned off and network output ceases. For example, the input becomes zero at time zero in the plot below.

      The text now notes the stability when stopping:

      "When the input was returned to zero, the elliptical trajectory was no longer stable; the state returned close to baseline (not shown) and network output ceased."

      The text related to the ability to alter speed ‘on the fly’ has also been expanded:

      "Similarly, a ramping input produced trajectories that steadily shifted, and steadily increased in speed, as the input ramped (not shown). Thus, networks could adjust their speed anywhere within the trained range, and could even do so on the fly."

      The Discussion now notes that this ramping of speed results in a helical structure. The Discussion also now notes, informally, that we have observed this helical structure in motor cortex. However, we don’t want to delve into that topic further (e.g., with direct comparisons) as those are different data from a different animal, performing a somewhat different task (point-to-point cycling).

      As one might expect, network performance outside the trained range of speeds (e.g., when the input is between zero and the slowest trained speed) is likely to be unpredictable and network-specific. There is likely a ‘minimum speed’ below which networks can’t cycle. This appeared to also be true of the monkeys; below ~0.5 Hz their cycling became non-smooth and they tended to stop at the bottom. (This is why our minimum speed is 0.8 Hz.) However, it is very unclear whether there is any connection between these phenomena and we thus avoid speculating.

      Why were separate networks trained for forward and backward rotations? Is it possible to train a network on movements in both directions with inputs of {-8, …, 8} representing angular velocity? If not, the authors should discuss this limitation and its implications.


      Yes, networks can readily be trained to perform movements in both directions, each at a range of speeds. This is now stated:

      "Each network was trained to produce muscle activity for one cycling direction. Networks could readily be trained to produce muscle activity for both cycling directions by providing separate forward- and backward-commanding inputs (each structured as in Figure 3a). This simply yielded separate solutions for forward and backward, each similar to that seen when training only that direction. For simplicity, and because all analyses of data involve within-direction comparisons, we thus consider networks trained to produce muscle activity for one direction at a time."

      As noted, networks simply found independent solutions for forward and backward. This is consistent with prior work where the angle between forward and backward trajectories in state space is sizable (Russo et al. 2018) and sometimes approaches orthogonality (Schroeder et al. 2022).

      It is somewhat difficult to assess the stability of the limit cycle and speed of convergence from the plots in Fig. 3E. A plot of the data in this figure as a time series, with sweeps from different initial conditions overlaid (and offset in time so trajectories are aligned once they're near the limit cycle), would aid visualization. Ideally, initial conditions much farther from the limit cycle (especially in the vertical direction) would be used, though this might require "cutting and pasting" the x-axis if convergence is slow. It might also be useful to know the eigenvalues of the linearized Poincaré map (choosing a specific phase of the movement) at the fixed point, if this is computationally feasible.

      See response to comment 4 above. The new Figure 3f now shows, as a time series, the return to the stable orbit after two types of perturbations. This specific analysis was suggested by the reviewer above, and we really like it because it gets at how the solution works. One could of course go further and try to ascertain other aspects of stability. However, we want to caution that this is a tricky and uncertain path. We found that the overall stacked-elliptical solution was remarkably consistent among networks (it was shown by all networks that received a graded speed-specifying input). The properties documented in Figure 3f are a consistent part of that consistent solution. However, other detailed properties of the flow field likely won’t be. For example, some networks were trained in the presence of noise, and likely have a much more rapid return to the limit cycle. We thus want to avoid getting too much into those specifics, as we have no way to compare with data and determine which solutions mimic that of the brain.

      Reviewer #2 (Public Review):

      The study from Saxena et al "Motor cortex activity across movement speeds is predicted by network-level strategies for generating muscle activity" expands on an exciting set of observations about neural population dynamics in monkey motor cortex during well-trained, cyclical arm movements. Their key findings are that as movement speed varies, population dynamics maintain detangled trajectories through stacked ellipses in state space. The neural observations resemble those generated by in silico RNNs trained to generate muscle activity patterns measured during the same cycling movements produced by the monkeys, suggesting a population mechanism for maintaining continuity of movement across speeds. The manuscript was a pleasure to read and the data convincing and intriguing. I note below ideas on how the study could be improved by better articulating the assumptions behind interpretations, the defense of its novelty, and its implications, noting that the study is already strong and will be of general interest.

      We thank the reviewer for the kind words and nice summary of our results.

      Primary concerns/suggestions:

      1 Novelty: Several of the observations seem an incremental change from previously published conclusions. First, that neural trajectories are detangled and muscle trajectories tangled was a key conclusion of a previous study from Russo et al 2018. The current study emphasizes the same point with the minor addition of speed variance. Better argument of the novelty of the present conclusions is warranted. Second, the observations that motor cortical activity is heterogeneous are not new. That single neuronal activity in motor cortex is well accounted for in RNNs as opposed to muscle-like command patterns or kinematic tuning was a key conclusion of Sussillo et al 2015 and has been expanded upon by numerous other studies, but is also emphasized here seemingly as a new result. Again, the study would benefit from the authors more clearly delineating the novel aspects of the observations presented here.

      The extensive revisions of the manuscript included multiple large and small changes to address these points. The revisions help clarify that our goal is not to introduce a new framework or hypothesis, but to test an existing hypothesis and see whether it makes sense of the data. The key prior work includes not only Russo and Sussillo but also much of the recent work of Jazayeri, who found a similar stacked-elliptical solution in a very different (cognitive) context. We agree that if one fully digested Russo et al. 2018 and fully accepted its conclusions, then many (but certainly not all) of the present results are expected/predicted in their broad strokes. (Similarly, if one fully digested Sussillo et al. 2015, much of Russo et al. is expected in its broad strokes.) However, we see this as a virtue rather than a shortcoming. One really wants to take a conceptual framework and test its limits. And we know we will eventually find those limits, so it is important to see how much can be explained before we get there. This is also important because there have been recent arguments against the explanatory utility of network dynamics and the style of network modeling we use to generate predictions. It has been argued that cortical dynamics during reaching simply reflect sequence-like bursts, or arm dynamics conveyed via feedback, or kinematic variables that are derivatives of one another, or even randomly evolving data. We don’t want to engage in direct tests of all these competing hypotheses (some are more credible than others) but we do think it is very important to keep adding careful characterizations of cortical activity across a range of behaviors, as this constrains the set of plausible hypotheses. The present results are quite successful in that regard, especially given the consistency of network predictions. 
Given the presence of competing conceptual frameworks, it is far from trivial that the empirical data are remarkably well-predicted and explained by the dynamical perspective. Indeed, even for some of the most straightforward predictions, we can’t help but remain impressed by their success. For example, in Figure 4 the elliptical shape of neural trajectories is remarkably stable even as the muscle trajectories take on a variety of shapes. This finding also relates to the ‘are kinematics represented’ debate. Jackson’s preview of Russo et al. 2018 correctly pointed out that the data were potentially compatible with a ‘position versus velocity’ code (he also wisely noted this is a rather unsatisfying and post hoc explanation). Observing neural activity across speeds reveals that the kinematic explanation isn’t just post hoc, it flat out doesn’t work. That hypothesis would predict large (~3-fold) changes in ellipse eccentricity, which we don’t observe. This is now noted briefly (while avoiding getting dragged too far into this rabbit hole):

      "Ellipse eccentricity changed modestly across speeds but there was no strong or systematic tendency to elongate at higher speeds (for comparison, a ~threefold elongation would be expected if one axis encoded cartesian velocity)."

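      The expected threefold elongation can be checked with a short calculation. The sketch below assumes a purely kinematic code in which one axis is proportional to position, sin(2πft), and the other to its time derivative, 2πf·cos(2πft), so the ratio of the two axis lengths scales linearly with cycling frequency f. The two frequencies are illustrative values spanning a threefold range, not the speeds used in the study.

```python
import numpy as np

def axis_ratio(freq_hz, n=1000):
    """Ratio of velocity-axis extent to position-axis extent over one cycle."""
    t = np.linspace(0, 1.0 / freq_hz, n, endpoint=False)
    pos = np.sin(2 * np.pi * freq_hz * t)
    vel = 2 * np.pi * freq_hz * np.cos(2 * np.pi * freq_hz * t)  # d(pos)/dt
    return (vel.max() - vel.min()) / (pos.max() - pos.min())

# Hypothetical slow and fast cycling frequencies, three-fold apart.
slow_hz, fast_hz = 0.8, 2.4
elongation = axis_ratio(fast_hz) / axis_ratio(slow_hz)
```

      Under this assumption the ellipse elongates in direct proportion to speed, which is the prediction the empirical trajectories do not follow.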
      Another result that was predicted, but certainly didn’t have to be true, was the continuity of solutions across speeds. Trajectories could have changed dramatically (e.g., tilted into completely different dimensions) as speed changed. Instead, the translation and tilt are large enough to keep tangling low, while still small enough that solutions are related across the ~3-fold range of speeds tested. While reasonable, this is not trivial; we have observed other situations where disjoint solutions are used (e.g., Trautmann et al. COSYNE 2022). We have added a paragraph on this topic:

      "Yet while the separation across individual-speed trajectories was sufficient to maintain low tangling, it was modest enough to allow solutions to remain related. For example, the top PCs defined during the fastest speed still captured considerable variance at the slowest speed, despite the roughly threefold difference in angular velocity. Network simulations (see above) show both that this is a reasonable strategy and also that it isn’t inevitable; for some types of inputs, solutions can switch to completely different dimensions even for somewhat similar speeds. The presence of modest tilting likely reflects a balance between tilting enough to alter the computation while still maintaining continuity of solutions."

      As the reviewer notes, the strategy of simulating networks and comparing with data owes much to Sussillo et al. and other studies since then. At the same time, there are aspects of the present circumstances that allow greater predictive power. In Sussillo, there was already a set of well-characterized properties that needed explaining. And explaining those properties was challenging, because networks exhibited those properties only if properly regularized. In the present circumstance it is much easier to make predictions because all networks (or more precisely, all networks of our ‘original’ type) adopted an essentially identical solution. This is now highlighted better:

      "In principle, networks did not have to find this unified solution, but in practice training on eight speeds was sufficient to always produce it. This is not necessarily expected; e.g., in (Sussillo et al. 2015), solutions were realistic only when multiple regularization terms encouraged dynamical smoothness. In contrast, for the present task, the stacked-elliptical structure consistently emerged regardless of whether we applied implicit regularization by training with noise."

      It is also worth noting that Foster et al. (2014) actually found very minimal stacking during monkey locomotion at different speeds, and related findings exist in cats. This likely reflects where the relevant dynamics are most strongly reflected. The discussion of this has been expanded:

      "Such considerations may explain why (Foster et al. 2014), studying cortical activity during locomotion at different speeds, observed stacked-elliptical structure with far less trajectory separation; the ‘stacking’ axis captured <1% of the population variance, which is unlikely to provide enough separation to minimize tangling. This agrees with the finding that speed-based modulation of locomotion is minimal (Armstrong and Drew 1984) or modest (Beloozerova and Sirota 1993) in motor cortex. The difference between cycling and locomotion may be due to cortex playing a less-central role in the latter. Cortex is very active during locomotion, but that likely reflects cortex being ‘informed’ of the spinally generated locomotor rhythm for the purpose of generating gait corrections if necessary (Drew and Marigold 2015; Beloozerova and Sirota 1993). If so, there would be no need for trajectories to be offset between speeds because they are input-driven, and need not display low tangling."

      2 Technical constraints on conclusions: It would be nice for the authors to comment on whether the inherent differences in dimensionality between structures with single cell resolution (the brain) and structures with only summed population activity resolution (muscles) might contribute to the observed results of tangling in muscle state space and detangling in neural state spaces. Since whole muscle EMG activity is a readout of higher-dimensional control signals in the motor neurons, are results influenced by the lack of dimensional resolution at the muscle level compared to the brain? Another way to put this might be, if the authors only had LFP data and motor neuron data, would the same effects be expected to be observed/would they be observable? (Here I am assuming that dimensionality is approximately related to the number of recorded units * time unit and that the nature of the recorded units and signals differs vastly, as it does between neuronal populations (many neurons, spikes) and muscles (few muscles with compound electrical myogram signals).) It would be impactful were the authors to address this potential confound by discussing it directly and speculating on whether detangling metrics in muscles might be higher if, rather than whole muscle EMG, single motor unit recordings were made.

      We have added the following to the text to address the broad issue of whether there is a link between dimensionality and tangling:

      "Neural trajectory tangling was thus much lower than muscle trajectory tangling. This was true for every condition and both monkeys (paired, one-tailed t-test; p<0.001 for every comparison). This difference relates straightforwardly to the dominant structure visible in the top two PCs; the result is present when analyzing only those two PCs and remains similar when more PCs are considered (Figure 4 - figure supplement 1). We have previously shown that there is no straightforward relationship between high versus low trajectory tangling and high versus low dimensionality. Instead, whether tangling is low depends mostly on the structure of trajectories in the high-variance dimensions (the top PCs) as those account for most of the separation amongst neural states."

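      For readers who want to see what the tangling comparison involves, here is a minimal sketch. It assumes the metric from Russo et al. 2018, taken here to be Q(t) = max over t' of ||x'(t) − x'(t')||² / (||x(t) − x(t')||² + ε), where x is the population state and x' its time derivative. The choice of ε as 0.1 times the total variance of the state, the function name, and the toy trajectories are assumptions made for illustration.

```python
import numpy as np

def tangling(X, dt, eps_scale=0.1):
    """Trajectory tangling Q(t) for a state trajectory X of shape (T, D)."""
    dX = np.gradient(X, dt, axis=0)                   # numerical time derivative
    eps = eps_scale * np.sum(np.var(X, axis=0))       # assumed softening constant
    state_d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    deriv_d2 = np.sum((dX[:, None, :] - dX[None, :, :]) ** 2, axis=-1)
    # High Q: similar states are followed by very different derivatives.
    return np.max(deriv_d2 / (state_d2 + eps), axis=1)

t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
dt = t[1] - t[0]
circle = np.stack([np.cos(t), np.sin(t)], axis=1)            # never crosses itself
figure_eight = np.stack([np.sin(t), np.sin(2 * t)], axis=1)  # crosses at the origin

Q_circle = tangling(circle, dt)
Q_eight = tangling(figure_eight, dt)
```

      The circle yields uniformly low values, while the figure-eight yields a large peak at its crossing point, where the same state is left in two different directions. Both toy trajectories are two-dimensional, which illustrates the point that tangling depends on trajectory structure rather than on dimensionality.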
      As the reviewer notes, the data in the present study can’t yet address the more specific question of whether EMG tangling might be different at the level of single motor units. However, we have made extensive motor unit recordings in a different task (the pacman task). It remains true that neural trajectory tangling is much lower than muscle trajectory tangling. This is true even though the comparison is fully apples-to-apples (in both cases one is analyzing a population of spiking neurons). A manuscript is being prepared on this topic.

      3 Terminology and implications: A: what do the authors mean by a "muscle-like command"? What would it look like and not look like? A rubric is necessary given the centrality of the idea to the study.

      We have completely removed this term from the manuscript (see above).

      B: if the network dynamics represent the controlled variables, why is it considered categorically different to think about control of dynamics vs control of the variables they control? That the dynamical systems perspective better accounts for the wide array of single neuronal activity patterns is supportive of the hypothesis that dynamics are controlling the variables but not that they are unrelated. These ideas are raised in the introduction, around lines 39-43, taking on 'representational perspective' which could be more egalitarian to different levels of representational codes (populations vs single neurons), and related to conclusions mentioned later on: It is therefore interesting that the authors arrive at a conclusion line 457: 'discriminating amongst models may require examining less-dominant features that are harder to visualize and quantify'. I would be curious to hear the authors expand a bit on this point to whether looping back to 'tuning' of neural trajectories (rather than single neurons) might usher a way out of the conundrum they describe. Clearly using population activity and dynamical systems as a lens through which to understand cortical activity has been transformative, but I fail to see how the low dimensional structure rules out representational (population trajectory) codes in higher dimensions.

      We agree. As Paul Cisek once wrote: the job of the motor system is to produce movement, not describe it. Yet to produce it, there must of course be signals within the network that represent the output. We have lightly rephrased a number of sentences in the Introduction to respect this point. We have also added the following text:

      "This ‘network-dynamics’ perspective seeks to explain activity in terms of the underlying computational mechanisms that generate outgoing commands. Based on observations in simulated networks, it is hypothesized that the dominant aspects of neural activity are shaped largely by the needs of the computation, with representational signals (e.g., outgoing commands) typically being small enough that few neurons show activity that mirrors network outputs. The network-dynamics perspective explains multiple response features that are difficult to account for from a purely representational perspective (Churchland et al. 2012; Sussillo et al. 2015; Russo et al. 2018; Michaels, Dann, and Scherberger 2016)."

      As requested, we have also expanded upon the point about it being fair to consider there to be representational codes in higher dimensions:

      "In our networks, each muscle has a corresponding network dimension where activity closely matches that muscle’s activity. These small output-encoding signals are ‘representational’ in the sense that they have a consistent relationship with a concrete decodable quantity. In contrast, the dominant stacked-elliptical structure exists to ensure a low-tangled scaffold and has no straightforward representational interpretation."

      4 Is there a deeper observation to be made about how the dynamics constrain behavior? The authors posit that the stacked elliptical neural trajectories may confer the ability to change speed fluidly, but this is not a scenario analyzed in the behavioral data. Given that the authors do not consider multi-paced single movements it would be nice to include speculation on what would happen if a movement changes cadence mid cycle, aside from just sliding up the spiral. Do initial conditions lead to predictions from the geometry about where within cycles speed may change the most fluidly or are there any constraints on behavior implied by the neural trajectories?

      These are good questions but we don’t yet feel comfortable speculating too much. We have only lightly explored how our networks handle smoothly changing speeds. They do seem to mostly just ‘slide up the spiral’ as the reviewer says. However, we would also not be surprised if some moments within the cycle are more natural places to change cadence. We do have a bit of data that speaks to this: one of the monkeys in a different study (with a somewhat different task) did naturally speed up over the course of a seven cycle point-to-point cycling bout. The speeding-up appears continuous at the neural level – e.g., the trajectory was a spiral, just as one would predict. This is now briefly mentioned in the Discussion in the context of a comparison with SMA (as suggested by this reviewer, see below). However, we can’t really say much more than this, and we would definitely not want to rule out the hypothesis that speed might be more fluidly adjusted at certain points in the cycle.

      5 Could the authors comment more clearly if they think that state space trajectories are representational and if so, whether the conceptual distinction between the single-neuron view of motor representation/control and the population view are diametrically opposed?

      See response to comment 3B above. In most situations the dynamical network perspective makes very different predictions from the traditional pure representational perspective. So in some ways the perspectives are opposed. Yet we agree that networks do contain representations – it is just that they usually aren’t the dominant signals. The text has been revised to make this point.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript documents a very thorough biophysical, structural and functional dissection of interactions between the RNA-binding protein Rrm4 and the endosomal adaptor Upa1 in the filamentous fungus Ustilago maydis. It has been shown previously that the Rrm4-Upa1 interaction is critical for mRNA transport in this system as mRNAs hitchhike on motor-associated endosomes. Here, the authors reveal using modelling that Rrm4 has three MLLE domains, including a cryptic one that had not been identified previously. They then report the crystal structure of MLLE2 and analyze the distribution and arrangement of the MLLE domains in the protein using SAXS. They then show using pulldowns and isothermal titration calorimetry that MLLE3 is critical for the Upa1 interaction (via the PAM2L domains of Upa1) and that MLLE2 contributes to Rrm4 localization in vivo when the MLLE3-Upa1 interaction is partially impaired. The study suggests that Rrm4 has a platform of MLLE domains for orchestrating Rrm4 function. Overall, this is technically a high quality study. However, a number of points (mostly minor) should be addressed.

      Major comments:

      A key part of the study is the in vivo work illustrating a role for MLLE2 in regulating Rrm4 localization when the system is sensitized. Some aspects of this part of the work need clarifying.

      a) The authors should show that the aberrant staining is indeed microtubule-related with the benomyl experiment that they used in Jankowski et al. 2019.

      We included this important control in Figure EV5F, demonstrating that the aberrant staining is no longer visible after treatment with the microtubule inhibitor benomyl.

      b) The authors claim from these experiments that MLLE2 contributes to endosomal targeting (as there is ectopic protein on other structures (presumptive microtubules)). However, to make this claim, the authors would need to measure the intensity of the mutant Rrm4 protein on endosomes and/or the colocalization of these Rrm4 variants with endosomes, as they do in other experiments in this paper. Otherwise, it is possible that the MLLE2 deletion has another effect, e.g. increasing protein stability, and thus increasing the likelihood of binding to structures other than endosomes. If available, data on the relative abundance in the cell of the protein expressed from the wild-type control (rrm4-kat) and MLLE2 deletion constructs (e.g. rrm4-m1,2delta-kat) should be provided.

      As indicated by the reviewer, a critical point is identifying a function of MLLE2. Surprisingly, the domain is conserved in evolution, but we do not see a mutant phenotype under optimal culture conditions. Therefore, we challenged the system and observed the mislocalisation of Rrm4 if the MLLE2 domain is deleted. However, the overall amount of shuttling Rrm4-positive endosomes was not strongly affected according to our kymograph experiments. We observe aberrant staining, which is not seen with the Rrm4 wild-type protein. Thus, under challenging conditions, we do see a function of MLLE2.

      To address the valid point of the reviewer, we quantified the signal intensities in kymographs of the most important Rrm4 variants. As indicated in Figure 5E, we observed that the maximum fluorescence intensity in kymograph signals was reduced when Rrm4 variants are mislocalised to microtubules while the minimum intensities were comparable in all strains. This underlines that a subset of Rrm4 molecules are no longer shuttling through the cell and most likely are attached to microtubules (to prove the involvement of microtubules, we did benomyl treatment which is now shown in Figure EV5F). We also included a Western Blot experiment (Figure EV5G) demonstrating that neither MLLE1 nor MLLE2 deletion impacts the total protein amount of Rrm4. These data support the notion that MLLE2 contributes to endosomal targeting.

      c) Was the data in Figure 5D scored blind of the identity of the samples? Given that the classification has to be done manually, it is important to confirm the phenotypes are robust to blinding (at least for the key comparisons).

      We agree entirely that manual evaluation of microscopic images has to be carried out with utmost care. The phenotype of aberrant microtubule staining is not easily detectable, and it needs an experienced person to quantify this. The data were analyzed by a second experimentalist with experience in evaluating microscopy images to validate the system’s robustness. Notably, the key findings were confirmed in both cases: aberrant microtubule staining was only observed when the MLLE domain was mutated. However, the second person reported difficulties in differentiating a bundle of Rrm4 signals from stained microtubules. Therefore, this person, having less experience with Rrm4 movement, scored higher values. In essence, we can rely on the key findings. We included the information in the section “Materials and methods” and gave the comparison in Figure EV5H.

      If points b and c are addressed, it should be possible to draw an arrow between the gray question mark protein in Figure 6 and the endosome surface, which is what I assume the authors believe to be the case based on their discussion.

      Having addressed both points, we have also improved the model. To this end, we added a second unknown protein component (grey oval with a question mark) that interacts with MLLE2 and the endosomal surface. Thereby the hierarchical order with the accessory role of MLLE2 during endosomal attachment is stressed.

      Minor comments:

      1. The first line of the abstract is quite bold. It is hard to quantify the role of transport vs RNA stability for example, so I suggest this sentence is toned down.

      Correct, the first line now reads, “Spatiotemporal expression can be achieved by transport and translation of mRNAs at defined subcellular sites”.

      Line 269: change "amount of motile Rrm4-M12delta-Kat positive signals" to "number of motile Rrm4-M12delta-Kat positive signals".

      Changed as mentioned above.

      Figure 3 legend: Insert "Variant" before "amino acids of the FxP and FxxP..." to indicate what is labeled in gray. Change "fond" to "font" in the same sentence.

      Corrected as mentioned above.

      The cartoons of the different protein variants are very helpful but I had problems spotting the Upa1-Pam2L deletions due to the similar gray to the background of the protein. This would perhaps be clearer if the gray used for the background was lighter than it currently is.

      We improved the contrast by reducing the background of Upa1 to a lighter grey tone in all the corresponding figures.

      The residual motility of wild-type Rrm4 when PAM2L1 and PAM2L2 are both mutated (Figure 5C) is reminiscent of what is seen in a complete Upa1 deletion in the group's previous work. It would be helpful to point this out to the reader, as well as the implication that other proteins are contributing to Rrm4's linkage to endosomes. After all, some of these other adaptors might contact MLLE2 of Rrm4.

      We addressed this point by referring to our previous publication with the following sentence: “Comparable to previous reports, we observed residual motility of Rrm4-Kat on shuttling the endosomes if both PAM2L motifs are mutated or if upa1 is deleted. This indicates that additional proteins besides Upa1 are involved in the endosomal attachment of Rrm4 (Pohlmann et al., 2015).”

      Some of the y-axes of the charts should be more descriptive so that the reader can understand the plots even before they consult the legends. For example, in Figure EV4A and EV5D and E, which protein is being referred to in each 'number of signals' plot should be included. In Figure 5D, 'Hyphae [%]' would be clearer as 'Hyphae with MT staining of Rrm4 [%]'.

      We improved this in Figures EV4, 5D and EV5.

      Figure EV5 legend title: this could be misleading as the authors are seeing ectopic MT localization rather than a deficit in microtubule association.

      Corrected to “Deletion of MLLE1Rrm4 and -2 cause aberrant staining of microtubules”.

      Reviewer #1 (Significance (Required)):

      The Feldbrugge group has previously mapped interactions between Upa1 and Rrm4 (Pohlmann et al., 2015) and some conclusions are corroborated in the paper by Boehm et al. The paper under review is, however, a significant advance due to the identification of the third MLLE domain, detailed biophysical characterization of the interactions, the structural insights, and evidence of a subsidiary role of MLLE2. The work would of course be stronger if the target of MLLE2 had been identified but I think this is beyond the scope of this initial work. To my knowledge, this is one of the most extensive analyses of the interactions mediated by MLLE and PAM domains and will be of interest to others working on these protein features. The work will also appeal to those interested in the links of localizing mRNAs with motor-associated membranes, which is an emerging field.

      Reviewer expertise: I have a long-standing interest in molecular analysis of mRNA trafficking mechanisms. I do not have experience in fungal genetics.

      **Referee Cross-commenting**

      It seems that we are in agreement that this is solid work and that biochemical and biophysical analysis of the MLLE-PAM interactions will be of significant interest to those working on those domains (or proteins containing those domains). I agree with the comments of the other reviewers and there are clearly some essential minor revisions needed to strengthen the evidence for their conclusions and some clarifications. I think it is a long shot that RNA binding to the RRMs will affect the MLLE-PAM interactions and would require quite a lot of work to show this conclusively. The study would, however, be more impactful if this was shown to be the case, or the target of MLLE2 was found. Nonetheless, I would not say these new avenues of research are necessary to find a home in one of the Review Commons journals.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Devan, Schott-Verdugo et al.

      Summary

      In this study the putative MLLE RNA-binding motifs of the endosomal RNA-binding protein, Rrm4, from Ustilago maydis were examined using structural and genetic analyses. MLLE motifs are conserved in polyA-binding proteins (Pab1/PABPC1) and found also in Rrm4, which was shown to reside on motile endosomes and deliver septin mRNAs for endosome-localized translation during polarized growth. Upa1 on the endosome interacts with Rrm4 via its PAM2L domain that itself interacts with the MLLE domains of proteins like Pab1. Mutations in the known MLLE domain of Rrm4 were earlier shown to affect localization to endosomes. Here, the C-terminal domain of Rrm4 was revealed to have three divergent MLLE motifs using comparative modeling; only two of which were previously predicted. Crystallization and X-ray diffraction analysis of a truncated version of bacterially produced Rrm4, showed MLLE2 is most similar to that of PABPC1 and UBR5, although MLLE1 and 2 are somewhat divergent in the key region of PAM2 binding. Small angle X-ray scattering of recombinant full-length or truncated Rrm4 revealed that the MLLE domains might form a platform that could allow for multiple contacts with different binding partners. In vitro binding studies with different N-terminal GST-tagged versions of the Rrm4 were used to examine for interactions with PAM2 sequences of Upa1 using N-terminal hexa-histidine-SUMO fusions. It was found that Pab1-MLLE interacts with the PAM2, but not PAM2L, domain of Upa1. In contrast, the complete Rrm4 MLLE region (G-Rrm4-NT4) interacted with the PAM2L domain, but not the PAM2 of Upa1. Notably, the interaction with PAM2L required the third MLLE and neither MLLE1 nor MLLE2, nor both. No significant differences in affinity were observed and were similar to that of the Pab1 MLLE. The results also show that the MLLE3 has a higher affinity for the PAM2L2 than PAM2L1 of Upa1.

      To examine the biological role of the Rrm4 MLLEs, U. maydis strains bearing deletions in the domains of Rrm4 were examined for hyphal growth and endosomal transport (latter using Upa1-GFP and Rrm4-mKate2). Only the loss of the MLLE3 domain inhibited polarized growth (as seen with the full deletion of RRM4) and not the deletion of either MLLE1 or 2. Similar results were obtained regarding endosome shuttling. Thus, in line with the biochemical experiments performed the MLLE3 domain alone (of the three identified) is necessary for the biological actions of Rrm4. This suggested the MLLE1 and 2 are not necessary for function under these conditions.

      To examine this further, Upa1 carrying mutations in the PAM2L1 or PAM2L2 domains were examined. It was found that the deletion of both PAM2L domains affected unipolar growth resulting in bipolar growth similar to the deletion of UPA1 alone. This phenotype was observed even upon the deletion of Rrm4 MLLE1 and 2 in the same background as the PAM2L mutants. The mutation of both PAM2L domains led to a reduction in Rrm4-labeled shuttling endosomes, which suggests that these domains help anchor Rrm4 to endosomes. When only the PAM2L1 domain is present in Upa1 there was a larger increase in hyphae with aberrant microtubule staining than upon the loss of PAM2L1. The authors suggest that this indicates PAM2L2 is more important and prescribes an accessory role for MLLE2 in endosome association.

      Comments: Overall, the study seems well conducted. We cannot comment on the structural aspect of the work since this is not our field of expertise. That said, the biochemical and genetic/functional studies appear solid, well thought-out, and clearly presented. No new experiments are necessary to support the general claims of the paper, however, experiments suggested below might make it more revealing with regards to the connection between RNA binding and MLLE-PAM2L interactions (i.e. endosome localization and RNA binding functions).

      1. Line 286 - It reads: "Next, we investigated the association of Rrm4-M12D-Kat in strains expressing PAM2L1. Thus, the endosomal attachment was solely dependent on the interaction of MLLE3 with the PAM2L2 sequence of Upa1." Unclear - wouldn't lacking PAM2L1 (and not expressing) fit the logic of the sentence?

      We corrected this with the sentence, “Next, we investigated the association of Rrm4-M1,2D-Kat in strains expressing Upa1 with mutated PAM2L1”.

      Several questions regarding the specificity of PAM2 vs. PAM2L domains. What happens when you switch/replace the PAM2L1 or 2 of Upa1 with Upa1 PAM2 domains? Are they exclusive? What happens when the MLLE3 of Rrm4 is switched with that of Pab1? And if one does both - does that restore functionality to Rrm4?

      These are very interesting suggestions. Previously, we have shown that a single PAM2L1 or PAM2L2 sequence of Upa1 is sufficient for unipolar growth and recruitment of Rrm4 to endosomes. Please note that Upa1 with mutated PAM2L1 and L2 still contains a PAM2 motif. Furthermore, mutating the PAM2 motif of Upa1 did not affect Rrm4 shuttling or unipolar growth. Thus, switching the domains would mostly address whether the precise location within Upa1 would be important. This is interesting but, unfortunately, very labour-intensive and beyond the manuscript’s current scope.

      Switching MLLE3 with MLLE of PAB1 is an interesting approach. One might expect that Rrm4 can be recruited to endosomes again. However, Rrm4 would also interact with numerous other proteins containing PAM2 motifs like deadenylase Not4. Here it would compete with the MLLE of Pab1. Thus, it would be expected that Rrm4 is on the surface, but the protein will be mistargeted to other proteins causing pleiotropic alterations. It will be difficult to judge whether Rrm4 functionality is restored or whether other processes are disturbed. In essence, these are stimulating ideas, but we believe that these experiments are beyond the scope of the current study. In the future, we might address this point by using a heterologous peptide-binding pocket or tethering approach.

      Likewise, what happens if Upa1 only has PAM2L2 instead of only PAM2L1 domains? Does that alter function - perhaps now one can observe a contribution of MLLE1? If it's there, it's likely to have a function. Anything known about the post-translational modification of these MLLE or PAM domains? Does it change during unipolar vs. bipolar growth? Perhaps the different MLLE domains are regulated in such a fashion?

      Again, these are very valid points. Upa1 with two PAM2L2 motifs might interact more strongly. The problem is that one PAM2L motif is sufficient for interaction, and we do not see a strong phenotype.

      Currently, we do not know if post-translational modifications regulate the MLLE domains. This could alter the binding affinity or specificity, and by expressing fungal proteins in E. coli, we might have missed this type of regulation. However, we addressed the function of MLLE1 and MLLE2 in U. maydis using a genetic approach. We deleted the corresponding domains and interfered with potential regulation by posttranslational modification. Thus, we cannot exclude post-translational modification, but it appears to be not essential for function. We will address the posttranslational regulation of Rrm4 in more detail in the future.

      Can the authors show whether the binding of mRNA cargo (e.g. Cdc3 mRNA) to the RRM motifs of Rrm4 affects the interaction between any of the MLLE-PAM2L pairs, or vice versa (i.e. does the MLLE-PAM2L interaction affect mRNA binding)?

      In previous studies, we have investigated a version of Rrm4 carrying a mutation in the first RRM motif of Rrm4. According to RNA live imaging, the respective strains exhibit a loss of function phenotype and mRNA transport is strongly affected. However, the endosomal association of Rrm4-mR1-Gfp is not affected, indicating no direct cross-talk between RNA-binding via RRM1 and endosomal attachment via MLLE3. Also, a version of Rrm4 carrying a deletion of all three RRM domains is still shuttling on endosomes. The two functions, i.e. RNA binding and endosomal binding, appear to be carried out by two independent platforms, i.e. three RRMs and three MLLEs, respectively. The overall structure of the protein also reflects this. The RRM domains are structurally clearly separated from the flexible MLLE domains.

      Discussion line 311 It is written that the three MLLE domains "collaborate for optimal functionality..." Perhaps there's a misunderstanding here, but the authors show that MLLE3 domain alone is necessary & sufficient for function, so where is the collaboration? MLLE2 may have an accessory role according to the authors, but we do not know if it is in collaboration with MLLE3 or independent thereof. Since the KD of MLLE3 is not affected by the presence or absence of MLLE1,2 in vitro at least, it may be that they have independent, and not collaborative, roles.

      Correct, we rephrased this more carefully. We omitted the collaboration aspect. It now reads, ”but a sophisticated binding platform consisting of three MLLE domains with MLLE2 and MLLE3 functioning in linking the key RNA transporter to endosomes.”

      Reviewer #2 (Significance (Required)):

      This paper concerns functional domains found in an endosome-localized RNA binding protein, U. maydis Rrm4, which is necessary for localized translation on endosomes and subsequent unipolar growth. Here the authors show using structural, biochemical, and genetic studies that instead of one or two MLLE protein-protein interacting domains in Rrm4 there are three, although one (MLLE3) is necessary and sufficient for full function. This work is for an audience interested in RNA trafficking and its role in cell physiology, which is our expertise. The work is interesting, but it could be made more so especially if a connection was established between the RNA-binding function of the RRM domains and the MLLE-PAM2L interaction(s). At this point it is solid technical work and could be published after minor revisions.

      **Referee Cross-commenting**

      I concur with the comments of the other reviewers in that the work is solid and necessitates minor revisions in order to be published. Clearly, establishing a connection between the RNA-binding function and the MLLE-PAM interactions of Rrm4 would be an interesting and worthy pursuit that might enhance the novelty of the work, but I agree that it could belong to future studies.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: Long-distance subcellular transport of mRNAs is achieved through selective and dynamic interaction with the transport machinery. Using the highly polarized hyphae of Ustilago maydis, the authors previously showed i- that mRNAs can hitchhike on actively transported endosomes for proper distribution, and ii- that the connection between mRNAs and endosomes is mediated by the interaction between a C-terminal MademoiseLLE (MLE) domain of the RNA binding protein Rrm4 and the Upa1 adapter protein. In this study, the authors aimed at more precisely characterizing the structural and molecular bases underlying the Rrm4-Upa1 interaction. Combining structural modeling and X-ray analyses, they discovered a non-canonical, and previously missed, MLE domain (MLE1) in Rrm4, and characterized the structure of the second MLE domain (MLE2) of Rrm4. Through binding assays, they showed that the three MLE domains exhibit different binding properties, and that MLE3 is the only domain capable of binding to the PAM2 domain of Upa1. Consistent with this finding, functional assays performed in U. maydis revealed that MLE3 is the main domain involved in interaction with endosomes and trafficking, MLE1 and 2 having either no or minor functions in this process.

      The manuscript is very well written, the data are of high quality and clearly presented. A wide range of complementary approaches has been used to molecularly and functionally characterize the different MLE domains of Rrm4. From an "RNA transport" perspective, this manuscript falls short of a main novel finding as the domains characterized in this study (MLE1 and 2) don't have a clear function in connecting mRNAs to the transport machinery. From an "MLE domain" perspective, this work however provides interesting information about non-canonical domains and structures, and about binding and function specificity. As described below, my major concern relates to the role played by the ML2 domain of Rrm4, a role referred to as "accessory" by the authors.

      Major comments:

      The authors conclude from their results that ML2 has an accessory role in promoting association with endosomes.

      1- This conclusion is made based on in vivo experiments showing that a form of Rrm4 lacking the M2 domain, in contrast to wild-type Rrm4, aberrantly attached to MTs in a context where the Rrm4-Upa1 interaction mediated by MLE3Rrm4 has been weakened (Upa1-pl2m). Although the results are convincing, their interpretation is less so. The authors, indeed, claim that the observed phenotype results from "the static accumulation of Rrm4" due to reduced interaction with endosomes. Why then don't they see a decrease in the motility/transport properties of Rrm4-M2Δ in this context? Also, do the authors see a decrease in the co-localization of Rrm4-M2Δ with endosomes (which would be expected if the interaction is decreased)? Can the authors perform IP or co-sedimentation experiments to strengthen their hypothesis?

      This is a fair criticism that was also raised by reviewer 1. In the improved version of the manuscript, we now include important control experiments demonstrating that (i) the aberrant localisation is microtubule-dependent (Fig. EV5F), (ii) the mutations do not cause differences in protein amounts of Rrm4 (Fig. EV5G), (iii) the key findings of the aberrant microtubule staining, which were scored manually in microscopic images, were verified independently by two persons (Fig. EV5H), and (iv) most importantly, Rrm4 signal intensity is decreased in processive signals of our kymograph analysis (Fig. 5E). We firmly believe that this set of experiments strengthens our conclusion that MLLE2 plays an accessory role in the endosomal attachment (Fig. 6).

      2- Whether MLE2Rrm4 mediates interaction with endosomes through association with Upa1 is unclear, as the binding assays performed in Figure 3 test for association of Rrm4 variants with single isolated domains of Upa1, not with the full-length protein. Assessing the binding of Rrm4-M2Δ variants with Upa1-PL2m would help interpreting the phenotypes described in Figure 5.

      Unfortunately, it is difficult to express full-length Upa1 protein in E. coli due to the presence of extended unstructured regions. To overcome this limitation, we performed yeast two-hybrid experiments with full-length proteins of Rrm4 and Upa1. We were able to recapitulate qualitatively the results observed in vitro using the individual domains.

      Notably, the Rrm4 version carrying a deletion in MLLE1 and MLLE2 interacted with Upa1 versions carrying mutations in PAM2L1 or PAM2L2 (Fig. EV3C), suggesting that both MLLE domains of Rrm4 are dispensable for interaction with Upa1. MLLE3 is sufficient to interact with a single PAM2L sequence of Upa1. This suggests the presence of additional interaction partners for MLLE1 and MLLE2 and is entirely consistent with our genetic and cell biological analysis described in Fig. 5.

      Minor comments:

      1- The authors have previously characterized the effect of a C-terminal deletion of Rrm4 on Rrm4 motility and binding to Upa1 (Becht et al., 2006; Pohlmann et al., 2015). How their previously-described construct compares to the Rrm4-M3Δ used in this study is unclear (is it the same?).

      It is the identical mutation to allele rrm4GPD from Becht et al. 2006. We indicate the information in the text “(Fig. 4B-C; mutation identical to allele rrm4GPD in Becht et al., 2006).”

      2- page 6, line 141: refer to Fig. 1B rather than Fig. EV1A ?

      We included the reference to Fig. 1B.

      3- page 10, line 274: "Rrm4-Kat was found"

      We corrected this.

      4- page 11, line 286: "in strains expressing Upa1-PAM2L1", replace by "in strains expressing Upa1 with mutated PAM2L1"?

      We corrected this.

      5- The Figures and accompanying legends are overall very clear and detailed. In Figures EV4A and EV5D-E, it would however help if the authors would indicate on the Figure itself, to the left of each panel, which marker/signal is being analyzed (e.g. Rrm4-Kat (top) and Upa1-GFP (bottom) for Figure EV4).

      We clarified this.

      Reviewer #3 (Significance (Required)):

      Active transport of mRNAs along microtubule tracks has been shown to play a key role in the spatio-temporal control of gene expression in various cell types and species. How specific mRNAs mechanistically connect to molecular motors for their transport to their subcellular destination has, however, long remained largely unclear. Recent work, including work from the authors, has uncovered that RNAs can hitchhike on membranous organelles through adapter proteins linking mRNAs and RNA binding proteins with trafficking membrane-bound organelles.

      This study aimed at investigating the structural and molecular bases underlying the interaction between RNA binding proteins and endosomes. While their identification and characterization of the MLE1 and MLE2 domains of Rrm4 did not provide significant new insight into the mechanisms involved in the endosome-mediated transport of mRNAs, it uncovered interesting new properties of MLE domains, including structural variations, selective binding and functional specificity. This work should thus be of interest for structural biologists and researchers interested in protein-protein interaction platforms.

      **Referee Cross-commenting**

      Our comments all converge to the idea that this study is solid as it is and requires only minor revision work to support the authors' conclusions. Although characterizing further MLE/PAM2 binding specificity and MLE2 interactors would be of great interest and indeed provide a more complete understanding of interaction networks at play, I feel that this is beyond expected revision work.

    1. Author Response:

      Reviewer #1:

      Hu and colleagues employ computed-tomography methods and provide a detailed description of and inferences about the dental system in three early-diverging ceratopsian dinosaur genera represented by rare specimens from China. Their study identifies nuanced tooth replacement rates and patterns. Furthermore, combined with the analysis of dental wear patterns, their study not only elucidates ontogenetic aspects of these early ceratopsians but also explores the implication of such patterns for dietary adaptations among these taxa. The manuscript, therefore, provides unique insights into the anatomical and ecological contexts of ceratopsians in such deep time.

      The manuscript is rich in data that are summarized in multiple tables and figures. It is also well-written and easy to follow. The inference and conclusions made are also overall well supported by the data presented.

      Thank you for your positive comments!

      The only main comment I have concerns the inference made about the dietary adaptation of Yinlong, which is inferred to be characterized by "feeding strategies other than only grinding food with their teeth." I think that this could be expanded a bit more to incorporate dietary breadth as an additional possible explanation, particularly given the lack of conclusive evidence for the predominance of a single plant species. As it stands, the inference (made across lines 475 through 485) may only imply processing the same food resource using non-chewing methods (e.g., gastroliths to triturate fern). Could the incorporation of other, less abrasive plant foods--in addition to the fibrous ferns--in the diet of Yinlong be a possible, additional explanation for the relatively slow tooth replacement and lack of a heavy tooth wear from chewing-related stress?

      We have provided more explanation and discussion of feeding strategies, based on an analysis of environmental conditions and internal features. First, we analyzed the flora of the Shishugou Formation and the environment in which Yinlong lived. Its feeding strategy can then be inferred from its body size and tooth characters. The relatively small body length implies that Yinlong likely fed on low-growing plants. The morphology of the dentition, the primitive jaw morphology, and the low tooth replacement rate suggest that Yinlong is unlikely to have ground tough foods as derived ceratopsians did. Yinlong possibly had other feeding strategies, such as processing foodstuffs with gastroliths, which have been found in some other dinosaurs. We have added more comparisons with other dinosaurs (i.e., an armoured dinosaur preserved with stomach contents and gastroliths). We suggest that ferns such as Angiopteris, Osmunda, and Coniopteris were suitable food choices for Yinlong. Low, tender leaves and other less abrasive plant foods are also possible.

      Reviewer #2:

      The authors of the present work aimed to describe tooth replacement in early ceratopsian species from the Upper Jurassic of China, and with this novel information, discuss new hypotheses of successive changes in jaw evolution that led to the highly specialized replacement and jaw function of derived ceratopsids. Major strengths of this study include not only the use of microCT-scans and 3D reconstructions to address tooth replacement in three different species of early ceratopsians (Yinlong, Hualianceratops, and Chaoyangsaurus), but also the observation of wear development, pulp cavity development, zahnreihen, and z-spacing and replacement rate to compare between taxa and address the succession of mandibular and replacement changes in the phylogeny of ceratopsian dinosaurs. The aims were achieved and the conclusions are strongly supported by the evidence discussed and the cited bibliography. Figures are clear and captions are concise. The presented information gives evidence for the comparison and discussion of the order of acquisition of different craniomandibular adaptations that lead to a specialized herbivorous diet, useful not only for ceratopsians and ornithischians, but also for other lineages of dinosaurs in the Mesozoic, and further for comparing with extant and extinct lineages of mammals. Dinosaurs not only were fantastic creatures from the past but also achieved different morphologic, physiologic, and behavioral traits unknown to any other creature, even mammals. For ceratopsians, the appearance of dental batteries corresponds to a unique trait only functionally similar to that in hadrosaurs and some sauropods, and understanding the steps that led to that specialized structure allows us to also understand the drivers that later guided their diversification during the Late Cretaceous.

      Thank you for your positive comments!

      Reviewer #3:

      The major strengths of the paper are its thorough level of detail, rich dataset, and easy readability. The figures are excellent and clear.

      One shortcoming of the paper is the lack of measurements -- a table of measurement for each functional and replacement tooth's length, mesiodistal width, and linguolabial width should be provided.

      We thank the reviewer for pointing this out. We have provided the total height, maximum mesiodistal width, and maximum labiolingual width of each functional and replacement tooth for all specimens in Table S1. These data help to support our conclusions.

      Unfortunately the manuscript is not publishable in its current form because the conclusions are not testable based on the limited data provided. The authors stated "All data generated or analysed during this study are included in the manuscript and supporting file." This is not true. Only the 3D models derived from segmentations are provided, not the raw scans. Segmentation-derived models are interpretations, akin to publishing a drawing of a fossil instead of a photograph, which is not generally acceptable under today's publishing standards (drawings can be published alongside photographs). Please upload the raw scans to an appropriate repository such as Morphosource, Dryad, or Morphobank. Scans can be cropped to the dentigerous regions only, so long as scaling information is preserved.

      We have added raw micro-CT scans of all scanned specimens (all cropped to the dentigerous regions) in Dryad as .TIF or .BMP file format. The file object details are also provided in a TXT file ‘README_file.txt’ saved in Dryad, at https://doi.org/10.5061/dryad.9ghx3ffk0.

    1. This behavioral data is fed to machine learning systems that provide predictions about what people will do in the future. She documents how surveillance capitalists have gained immense wealth through the trading of “prediction products,” as companies profit from laying accurate bets on people’s future behaviors. These systems tend to reward the privileged while entrapping the underprivileged, whose choices are particularly constrained.

      Indeed. The machine learning system will tend to learn the most from people's initial wealth. I think we could add other factors as input (such as educational background, occupation, etc.) to the machine learning systems to weaken the effect of initial wealth.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1: General comments:

      Fujimoto and collaborators use Nanopore-based cDNA sequencing for genome-wide transcriptome analysis of a collection of hepatocellular carcinomas (HCCs) and matched normal liver tissues. To improve detection of alternatively spliced isoforms and hybrid transcripts potentially deriving from genomic rearrangements, they develop a dedicated pipeline SPLICE, which they benchmark against available software used for the same analysis. Besides having dual functionality (calls both alternative transcripts and fused transcripts), SPLICE seems to outperform previous software in calling alternative/fused transcripts and accuracy. They use the SPLICE pipeline to call isoforms and gene fusions in normal liver cells and HCCs and perform basic functional validations on novel fusions identified. The manuscript is well written, and the analyses are well performed. Perhaps the benchmarking of the SPLICE pipeline could have been more extensive (i.e., performed on additional independent datasets).

      Major points: 1. Line 149-150: "We compared the results of mapping to the reference genome and the reference transcriptome sequences, and removed candidates if both were inconsistent (removal of mapping errors). " Please specify what "both were inconsistent" means.

      Our reply; Thank you for this comment. The accuracy of fusion gene detection is influenced by mapping errors. To remove possible mapping errors, SPLICE aligned reads to the reference genome and the reference transcriptome sequences and compared the results. If the results are inconsistent (for example, GeneA-GeneB in the reference genome and GeneA-GeneB in the transcriptome genome, or GeneA-GeneB in the reference genome and GeneA in the transcriptome genome), SPLICE considers the candidates as false positive and removes them from the analysis.

      We changed the sentence “We compared the results of mapping to the reference genome and the reference transcriptome sequences, and removed candidates if both were inconsistent (removal of mapping errors).” to “we compared the results of mapping to the reference genome and the reference transcriptome sequences, and removed candidates if both results did not detect same fusion genes (removal of mapping errors).” (line 150-152).
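
      The consistency filter described in this reply can be illustrated with a minimal sketch. It assumes a hypothetical data model in which each read ID maps to the set of gene names it hits in each mapping; the function name, the read-level granularity, and the placeholder gene names are illustrative assumptions, not SPLICE's actual implementation.

```python
from typing import Dict, FrozenSet, List

# Hypothetical input format: read ID -> set of gene names hit by that read
# in one mapping. This is an illustrative data model, not SPLICE's own.
Calls = Dict[str, FrozenSet[str]]


def consistent_fusion_reads(genome: Calls, transcriptome: Calls) -> List[str]:
    """Return read IDs whose two mappings report the same two-gene fusion."""
    kept = []
    for read_id, genome_genes in genome.items():
        transcriptome_genes = transcriptome.get(read_id)
        # A candidate survives only if both mappings name the same gene pair.
        if len(genome_genes) == 2 and genome_genes == transcriptome_genes:
            kept.append(read_id)
    return sorted(kept)


# Placeholder gene names; every read looks like a GeneA-GeneB fusion on the genome.
genome_calls = {
    "read1": frozenset({"GeneA", "GeneB"}),
    "read2": frozenset({"GeneA", "GeneB"}),
    "read3": frozenset({"GeneA", "GeneB"}),
}
transcriptome_calls = {
    "read1": frozenset({"GeneA", "GeneB"}),  # same pair: kept
    "read2": frozenset({"GeneA", "GeneC"}),  # different partner: removed
    "read3": frozenset({"GeneA"}),           # single gene only: removed
}
print(consistent_fusion_reads(genome_calls, transcriptome_calls))
```

      In this sketch a read missing from the transcriptome mapping is also treated as inconsistent, which is one possible reading of the rule rather than a documented behavior.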
      
      • Concerning TE-derived novel exons, in principle, this may lead to altered expression of the TE-transcript (as the Authors report for L1-MET) or to altered splicing of the transcript (i.e., other exon/introns could be retained or excluded). Can the Authors assess whether the inclusion of the TE in a transcript enhances its expression or affects the splicing of the "parental" transcript? If so, can they verify if the position of the insertion of the TE has any effect on expression and splicing?

      Our reply; Thank you very much for this important comment. As the reviewer mentioned, exonization of TE may affect the splicing patterns and gene expression levels of transcripts. To determine the effect of TE on expression levels, we compared the expression levels of transcripts with TE-derived novel exons with those of known transcripts of the gene. We found that the expression levels of transcripts with TE-derived novel exon were lower than those of known transcripts (Figure 1 in the reply). Since the same results were observed in all novel transcripts (Fig. 1E,F), most TE exonization would not affect the expression level of transcripts.

      To analyze the effect of TEs on splicing, we then compared the numbers of novel splicing junctions between transcripts with TE-derived novel exons and other transcripts in each gene. The proportions of genes with novel splicing junctions were not significantly different between the two groups (transcripts with TE-derived novel exons, 9.1%; others, 11.9%) (Figure 2 in the reply). As observed for L1-*MET* and L2-*RHR1*, transposons can affect the expression levels and structures of transcripts; however, their effect would be limited to a subset of genes.
      

      Figure 1

      Comparison of expression levels of transcripts with TE-derived novel exons and known transcripts. Only transcripts derived from genes with TE-derived novel exons were compared. The total number of transcripts is shown below the plot. Transcript abundance was measured in reads per million reads (RPM), and log10-converted RPM values are shown in the violin plot. P-values were calculated by Wilcoxon rank-sum test.

      Figure 2

      Comparison of the percentage of novel splicing junction in transcripts with novel TE-derived exon and other transcripts. The total number of genes are shown below the plot. Transcripts with TE-derived novel exons and other transcripts were compared. P-value was calculated by Fisher’s exact test.

      • Can the Authors explain why the NBEAL1-RPL12 was not detected by SPLICE?

      Our reply; Thank you for this comment. Although the NBEAL1-RPL12 fusion was detected by SPLICE, its mapping results to the reference genome and the reference transcriptome were inconsistent, so it was removed from the final result. As NBEAL1-RPL12 was not validated by PCR (Supplemental Fig. S4B) (line 183-184), we consider this fusion gene a false positive that the SPLICE filtering successfully removed.

      • Line 332: Can the Authors explain how the total amount of HBV mRNA was determined in each sample? Is it a relative amount calculated from the sequencing data? If so, it should be made clear in the text that this is a fractional measure.

      Our reply; Thank you very much for this comment. Expression levels were calculated by log10 converted reads per million reads (log10(RPM)) for each sample. We added the following sentences to the "Expression from HBV" subsection in the Results (line 337-338); “Expression levels were estimated by log10 converted support reads per million reads (log10(RPM)) for each sample.”.
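
      As a minimal sketch of the measure named in this reply, log10(RPM) can be computed from a support-read count and the total read count of the sample. The function name and the handling of zero counts are assumptions; the reply does not say whether a pseudocount was used, so this sketch simply rejects zero counts.

```python
import math


def log10_rpm(support_reads: int, total_reads: int) -> float:
    """log10 of reads per million reads (RPM) for one sample."""
    if support_reads <= 0 or total_reads <= 0:
        # Assumption: no pseudocount is applied, so zero counts are undefined.
        raise ValueError("support_reads and total_reads must be positive")
    # RPM scales the count by library size, so it is a relative measure.
    rpm = support_reads * 1_000_000 / total_reads
    return math.log10(rpm)


# Hypothetical counts: 50 support reads in a library of 2,000,000 reads.
print(log10_rpm(50, 2_000_000))
```

      Because the value is normalized by the total number of reads in each sample, it is a fractional measure of the kind the reviewer asks about, not an absolute amount of mRNA.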

      • Fig4a: please specify if the y-axis "number of support reads" reports library normalized values.

      Our reply; Thank you for this comment. The values of the y-axis are raw read counts. We added the following sentences to the Figure legend (line 348); “Y-axis shows the total number of support reads (raw counts).”.

      • HCCs have more HBV-human genome fusion transcripts than normal liver. Could the authors clarify whether these HCC transcripts are selectively found in tumors, or whether they are also expressed in normal liver samples? The paragraph starting from line 356 is confusing, and it is difficult to retrieve the above information for both HBs and HBx fusions.

      Our reply; We apologize for the confusing description. All HBV-human genome fusion transcripts were selectively expressed in tumor or normal liver. We added the following sentence to the "Expression from HBV" subsection in the Results (line 365-366); “All of these HBV-human genome fusion transcripts were selectively expressed in the HCCs and the livers.”.

      • Figure 4C: what was the control used to calculate the relative viability in these analyses?

      Our reply; Thank you for this comment. Fig. 4C shows the number of HBV-human fusion transcripts in the six categories. If this comment refers to Fig. 4H, cell lines transfected with the empty vector (pIRES2-AcGFP1-Nuc) were used as controls. This has been described in the "Gene overexpression" subsection of Methods (line 716-717).

      • MYT1L: the Authors report the identification of a novel MYT1L transcript downregulated in HCC, and argue it may have a potential tumor-suppressive function. For the sake of clarity, it will be advisable to show also the differential expression (HCC vs. Liver) of the other transcripts expressed from the same locus.

      Our reply; Thank you for this important comment. In HCCs and normal livers, only the novel MYT1L transcript was expressed from this locus, and no known transcript of MYT1L was expressed. We changed the sentence “In the MYT1L gene, a highly-conserved novel exon was detected (Fig. 2E), and this transcript was significantly down-regulated in the HCCs” to “In the MYT1L gene, a highly-conserved novel exon was detected (Fig. 2E), and only a transcript with the novel exon was expressed.” (line 471-472).


      Minor points: 1. Table S4: there is a typo, correct “secific” in “specific”

      Our reply; Thank you very much for this comment. We corrected the typo of Table S4.


      Reviewer #2: General comments:

      Summary: This is both a presentation of a pipeline for analysis of Nanopore RNA-seq data, as well as an analysis of a cohort of 44 hepatocellular carcinomas against matched-normal liver tissue. It presents a number of quite intriguing results from the long-read RNA analysis, and suggests potential new targets for study in HCC. It is also worth noting that the current version of guppy (6) has functionality to detect primer sequences in the middle of reads and split those reads, which may obviate one of the steps in SPLICE.

      Major comments:

      1) The work done in this study used data that was basecalled using guppy 3.0.3. Since that version, I am aware of at least two major upgrades to the base caller accuracy, which would likely also improve the accuracy of isoform resolution. Given that the data is relatively low-coverage and that you have an automated workflow for the analysis, I would recommend re-basecalling using an updated basecaller and re-running your analysis using that. This is especially important given your comments in the paper about splice site misalignment.

      Our reply; Thank you very much for this important comment. We performed basecalling of MCF-7 sequence data using the latest guppy v6.0.6 and compared the result with that from guppy v3.0.3. We randomly extracted 1M reads from the MCF-7 reads that passed qscore filtering in the guppy basecaller. The same reads were extracted and basecalled by guppy v3.0.3. These two datasets were analyzed by SPLICE.

      The average error rate was 4.6 % for v6.0.6 and 6.8 % for v3.0.3. The number of transcripts was 9,674 for v6.0.6 and 9,329 for v3.0.3. Of these, the number of novel transcripts was 446 and 410, respectively. The number of fusion genes was 2 (BCAS3-BCAS4, and BCAS3-ATXN7) by v6.0.6 and one (BCAS3-BCAS4) by v3.0.3. As the reviewer mentioned, we found that using the latest version of guppy improved the accuracy and detected a larger number of transcripts.

      We added the results to Supplemental Table S12. We also changed the sentences from “Second, our analysis removed the change of splicing sites within 5 bp to remove alignment errors (Fig. 1B). We consider that this cutoff value is necessary due to currently available high-error reads (Supplemental Data S2). However, sequencing technologies and basecallers are improving, and in the near future, we should be able to use a smaller cutoff value and identify larger numbers of splicing changes.” to “Second, the accuracy of the analysis depends on the sequencing error rate. Although several filters are used for currently available high-error reads (Fig. 1B and Supplemental Fig. S1), sequencing errors would affect the accuracy of the result. Sequencing technologies and basecallers are improving, and in the near future, we should be able to identify larger numbers of splicing changes with high accuracy (Supplemental Table S10).” (line 538-542).

      2) You have compared your software to another tool for isoform analysis on Nanopore sequencing data, TALON. But a number of other tools exist for this purpose, including stringtie2, flair and bambu. My own testing has shown that stringtie2 outperforms TALON in terms of concordance with Illumina RNA-seq. It is quite important that you perform a complete comparison of your software to the state of the art for this purpose.

      Our reply; Thank you very much for this important comment. We compared our tool with four tools (TALON, FLAIR, StringTie, and bambu). For this comparison, we used sequence data of MCF-7 and HCC (RK107C). We randomly extracted 1 M reads from MCF-7 and HCC (RK107C) sequence data using Seqtk (v1.3) (params: sample -s1 1000000). Reads were mapped to the reference genome sequence (hg38) with minimap2 (v2.17) (params: -ax splice --MD), and the output SAM files were converted to BAM files and sorted with samtools (v1.7) (Li et al. 2009).
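
      The random extraction of a fixed number of reads with a fixed seed, as described above, can be illustrated with a seeded reservoir sampler. This is a generic sketch of reproducible subsampling from a stream, not a description of how Seqtk is implemented; the function name and the toy read IDs are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int) -> List[T]:
    """Pick k items uniformly at random from a stream, reproducibly."""
    rng = random.Random(seed)  # fixed seed gives the same sample every run
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stand-in for read IDs; a real run would stream FASTQ records instead.
reads = [f"read{i}" for i in range(1000)]
subset = reservoir_sample(reads, k=10, seed=1)
print(len(subset))
```

      Using the same seed on the same input yields the same subset, which is what allows the identical reads to be compared across tools or basecaller versions.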

      For benchmarking of TALON (v5.0), we corrected aligned reads with TranscriptClean (v2.0.3) (Wyman and Mortazavi 2018). Next, we ran the talon_label_reads module to flag reads for internal priming (params: --ar 20). The TALON database was initialized by running the talon_initialize_database module (params: --l o --5p 500 --3p 300). Then, we ran the talon module to annotate the reads (params: --cov 0.8 --identity 0.8). To output transcript abundance, we first obtained a whitelist using the talon_filter_transcripts module (params: --maxFracA 0.5 --minCount 5), and then quantified transcripts using the talon_abundance module based on the whitelist. For FLAIR (v1.5), the sorted BAM file was converted to BED12 using bin/bam2Bed12.py. We then corrected misaligned splice sites with the flair-correct module. High-confidence isoforms were defined from the corrected reads using the flair-collapse module (params: -s 3 --generate_map). For benchmarking of StringTie (v2.2.1), StringTie was run with input files consisting of the long-read alignment and reference annotation (params: -L -c 3). For benchmarking of bambu (v2.0.0), bambu was run with input files consisting of the long-read alignment, reference annotation and reference genome (hg38) (params: min.readCount = 3). Candidates with low expression levels (support reads As a result, SPLICE identified the third-highest number of transcripts followed by FLAIR and StringTie (Supplemental Fig. S3A). In MCF-7 the concordance rate with IsoSeq MCF-7 transcriptome data was the highest in SPLICE for known transcripts and the second highest in SPLICE for novel transcripts (Supplemental Fig. S3B). These results indicate that SPLICE has sufficient accuracy for analyzing transcript aberrations.

      We added the text to the "Comparison of SPLICE method with other tools" subsection of the Results (line 165-177) and the "Benchmarking" subsection of the Methods (line 640-679). We added the results to Supplemental Fig. S3.

      3) Likewise, for fusion detection, you compare to LongGF. You should also compare to (and cite) JAFFAL.

      Our reply; Thank you very much for this important comment. We compared our tool with the two tools (LongGF and JAFFAL). We used 1 M reads randomly extracted from MCF-7 and HCC (RK107C) sequence data as described above.

      For benchmarking of LongGF (v0.1.2), reads were mapped to the reference genome sequence (hg38) with minimap2 (v2.17) (params: -ax splice --MD), and the output SAM files were converted to BAM files and sorted with samtools (v1.7). We then ran the *longgf* module and obtained the list of fusion genes (params: min-overlap-len 100 bin_size 50 min-map-len 200 pseudogene 0 secondary_alignment 0 min_sup_read 3). For benchmarking of JAFFAL (v2.2), we ran the *JAFFAL.groovy* module with zipped fastq files.

      In this comparison, close gene pairs (We added the text to the "Comparison of SPLICE method with other tools" subsection in the Results (line 178-186) and the "Benchmarking" subsection in the Methods (line 667-679). We showed the results in Supplemental Fig. 4.
      

      4) In terms of the source code, I have questions. Why did you use BASH to run the Python code, instead of making this into a Python package? Why did you not use the functionality already available in BioPython for a number of basic sequence data handling tasks? Why is there not even a single function defined anywhere, let alone classes?

      At some level, if it works, it works. But I have serious concerns about the long-term maintainability of the code in its current state.

      Our reply; Thank you very much for this critical comment. As the reviewer mentioned, we think it is better to make a python package and use BioPython for maintenance and long-term maintainability of the code. We have been building our analysis pipeline by trial and error, and at this stage, the current scripts are convenient for us (our group may need to learn software development). We provided a Docker package (see the reply to comment 5)), and this would promote usability.

      5) Also related to the code, it is generally the standard now to create a BioConda package or Docker container for a bioinformatics package. BioConda has the advantage that the BioContainers project automatically generate Docker and Singularity containers from it. Please provide one of these.

      Our reply; Thank you very much for this critical comment. We created a Dockerfile and made it available on our GitHub page. It is described in the "Installation and usage via Docker" section.

      6) There is some quite nice functional validation work done on some of the DE transcripts that would have been hidden in a gene-level analysis. There is also some nice work on detecting HBV fusion genes. These both contain important results which are not mentioned at all in the abstract. I feel like the abstract as it stands is selling the paper short.

      Our reply; Thank you very much for this important comment. We added the following sentences to the abstract; “Comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including LINE1-MET transcript, were not found by the gene-level analysis. We also found that fusion transcripts of transposable elements and hepatitis B virus (HBV) were overexpressed in HCCs. In vitro experiments on DETs showed that LINE1-MET and HBV-human transposable elements promoted cell growth.”.

      7) Fig 5C shows a Venn diagram of fusions detected by short-read vs long-read sequencing, in which there is quite low overlap between these. You make the statement in the paper that "a combination of short- and long-reads can detect more fusion genes". I find it more likely that the short-read ICGC data had much greater depth of coverage than the MinION data you produced, which allowed for the detection of fusions that were expressed at much lower levels. This could be easily tested by downsampling the ICGC data to the same amount of sequence data as was generated on the MinION, and re-creating the Venn diagram with the fusions detected that way.

      Our reply; Thank you very much for this very important comment. We compared the amount of data between our long-reads and the previous short-reads, and the amounts did not differ greatly (Supplemental Fig. S14A). Therefore, differences in depth are unlikely to be the cause of the low overlap. We consider that two possibilities could explain the low overlap. First, most of the fusion genes missed by short-reads had very low expression levels, less than 1 read per million reads (RPM) (Supplemental Fig. S14B); there are therefore many fusion genes with low expression levels that are difficult to detect. Second, 28.9 % of transcripts in long-reads lacked the 5' region (Supplemental Fig. S5 and Supplemental Fig. S14C,D). Fusion genes whose breakpoints are located in the 5' region were therefore difficult to detect by long-reads.

      We added the following sentences to the "Fusion genes" subsection in the Results (line 400-405); “We considered that two possibilities could explain the low overlap. Since the most of the fusion genes missed by short-reads had very low expression levels (Supplemental Fig. S14B), many fusion-genes with low expression levels would be missed by a single approach. In addition, 28.9 % of transcripts in long-reads lacked 5' region (Supplemental Fig. S5 and Supplemental Fig. S14C, D). Therefore fusion-genes whose breakpoints are located in the 5' region would be difficult to detect by long-read.”. We also added a figure on the amount of data to Supplemental Information (Supplemental Fig. S14A).

      8) Figure 5D is very interesting. What do you conclude from that result? Please comment in the manuscript.

      Our reply; Thank you very much for this important comment. We used samples that had been used for whole-genome sequencing in our previous study, so a list of SVs is available. We classified fusion genes into those supported by SVs (SV detected fusion-genes) and others (no SV detected fusion-genes), and compared their expression levels (Figure 5D).

      Whole-genome sequencing can accurately identify clonal (high frequency) SVs but would miss sub-clonal (low frequency) SVs. Therefore, we considered that the no SV detected fusion-genes were generated by sub-clonal SVs. This result suggests that there are many sub-clonal fusion genes, and that their expression levels are lower than those of clonal fusion genes. Although the functional importance of sub-clonal fusion genes is currently unknown, deeper RNA sequencing would detect a larger number of fusion genes.

      We added the following sentences to the “Fusion genes” subsection in the Results (line 410-412); “This result suggests that there are a lot of sub-clonal fusion genes, and their expression levels are lower than clonal fusion genes. Although the functional importance of sub-clonal fusion genes is currently unknown, deeper RNA sequencing would detect a larger number of fusion genes.”.
      

      Minor comments:

      1) The manuscript has many small errors in English grammar, spelling and style. I would strongly recommend sending it for copy editing before submitting it to a journal.

      Our reply; Thank you very much for this comment. Due to time limitations, the current version has not been proofread by a native English speaker. We plan to have the English reviewed by a native English speaker.

      2) Neither the results section nor the methods section describing the sequencing that was performed specify whether it was done on a MinION or PromethION (or flongle). While this is implied elsewhere in the paper, it should definitely be specified in the methods at a minimum.

      Our reply; Thank you for this comment. We used a MinION for sequencing. We added the following sentences to the Method section (line 579-580); “Libraries were sequenced on a SpotON FlowCell MKⅠ(R9.4) (Oxford Nanopore), using the MinION sequencer (Oxford Nanopore)”.

      3) You also write in the introduction that your method, SPLICE, was developed for the MinION specifically. Please comment on its applicability to data generated on the PromethION and flongle Nanopore sequencers.

      Our reply; Thank you very much for this comment. We consider that our method is applicable to data from MinION, PromethION, and flongle. We added the following sentence to the Methods section (line 592-593); “In the present study, we analyzed sequence data from MinION. We consider that our method is applicable to data from MinION, PromethION, and flongle.”.

      4) The volcano plot in Fig 3A is missing its dots.

      Our reply; Thank you very much for this comment. We modified Fig. 3A.

      Reviewer #3: General comments:

      Summary: In this manuscript, Kiyose et al have developed and tested a novel methodology for identifying splicing alterations, and fusions, from full-length transcript or long read sequencing data. They apply this approach to liver cancer and paired, non-cancerous liver tissue from a prior publication, and use wet-lab/experimental methods to validate their in silico findings. They conclude that their new methodology, SPLICE, outperforms one existing method, and is uniquely suitable to identifying fusion genes.

      Major Comments: 1) Figure 1B shows a schematic of common error patterns from MinION cDNA sequencing, and the text of the manuscript describes how the authors' new approach (SPLICE), overcomes several of these, e.g. sequencing errors, artificial chimeras, and mapping errors of highly homologous genes. However, there is a fundamental disconnect between the text and the graphic in Figure 1B. This should either be revised for clarity, or an additional graphic or flowchart placed in the supplementary materials to clearly show *how* SPLICE overcomes each of these limitations.

      Our reply; We apologize for the insufficient explanation in Figure 1. We showed a detailed explanation of the data analysis procedure in Supplemental Fig. S1.

      2) Why was TALON the only alternative approach chosen for validation of SPLICE performance? There are a number of other, more advanced pipelines such as SUPPA2, and IsoformSwitchAnalyzeR. It would strengthen the manuscript, and its conclusions, to incorporate at least one of these methods as a second comparator. This is particularly true for IsoformSwitchAnalyzeR, since Kiyose et al identify a number of differentially expressed transcripts (DETs) for genes that are not differentially expressed.

      Our reply; Thank you very much for this important comment. Another reviewer also requested additional benchmarking, so we performed an additional performance comparison for the revised manuscript. Because SUPPA2 and IsoformSwitchAnalyzeR are used to analyze annotated output GTF files, direct comparison with SPLICE is difficult. Since IsoformSwitchAnalyzeR recommends StringTie as annotation software, we used StringTie for the comparison instead.

      We compared the performance of SPLICE with that of four other methods (TALON, FLAIR, StringTie and Bambu) for splicing variant detection. SPLICE identified the third-highest number of transcripts followed by FLAIR and StringTie (Supplemental Fig. S3A). In MCF-7 the concordance rate with IsoSeq MCF-7 transcriptome data was the highest in SPLICE for known transcripts and the second highest in SPLICE for novel transcripts (Supplemental Fig. S3B).

      We added the text to the "Comparison of SPLICE method with other tools" subsection of the Results (lines 165-177) and the "Benchmarking" subsection of the Methods (lines 640-665). We added the results to Supplemental Fig. S3.

      3) The Venn diagram in Figure 5C appears to show that conventional short read sequencing identifies 46 fusion genes that are not also detected by long read sequencing. However, this result, and its implications are never addressed in the text.

      Our reply; Thank you very much for this important comment. We apologize for the insufficient explanation. We considered two possible explanations for the low overlap. First, most of the fusion genes missed by short reads had very low expression levels, less than 1 read per million reads (RPM) (Supplemental Fig. S14B); many fusion genes are therefore expressed at low levels and are difficult to detect. Second, 28.9% of transcripts in long reads lacked the 5' region (Supplemental Fig. S5 and Supplemental Fig. S14C,D). Fusion genes whose breakpoints are located in the 5' region were therefore difficult to detect by long-read sequencing.

      We added the following sentences to the "Fusion genes" subsection in the Results (lines 400-405): “We considered that two possibilities could explain the low overlap. The most of the fusion genes missed by short-reads had very low expression levels (Supplemental Fig. S14B). This result suggests that there are many missed fusion-genes with low expression levels. In addition, 28.9 % of transcripts in long-reads lacked 5' region (Supplemental Fig. S5 and Supplemental Fig. S14C, D). Therefore fusion-genes whose breakpoints are located in the 5' region would be difficult to detect by long-read.” We also added a figure on the amount of data to Supplemental Information (Supplemental Fig. S14A).
      

      Minor Comments: 1) On pages 20-21, the language used to describe the HBV and/or HCV positive vs negative materials is very confusing. Please clarify that by "HBV- and HCV-related tissues" you in fact mean "HBV- and HCV-infected samples."

      Our reply; We apologize for the confusing wording. We changed "HBV and HCV-related tissues" to "HBV and HCV-infected samples" in the manuscript.

    1. There is no one way of practicing CSP — this would go against the very idea of sustaining students’ cultures! — but there are ways to understand what a CSP approach may require from a teacher

      I believe multilingualism and multiculturalism are what define today's societies. Being able to speak more than one language is a necessity, since there is so much language contact around us. I believe that as teachers we have to recognize, respect, and protect the different cultures present in our classroom. Some examples that I can think of are: reading about a legend or myth from different cultures, learning about a holiday from different countries, and having students share with each other their country's traditional food, games, music, etc. Finally, I believe CSP practices are about creating a welcoming and safe space for all students.

    1. As we may think

      Consider a future device... in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with extraordinary speed and flexibility. It is an enlarged intimate supplement to his memory.

    1. The spread of misinformation online is a global problem that requires global solutions. To that end, we conducted an experiment in 16 countries across 6 continents (N = 33,480) to investigate predictors of susceptibility to misinformation and interventions to combat misinformation. In every country, participants with a more analytic cognitive style and stronger accuracy-related motivations were better at discerning truth from falsehood; valuing democracy was also associated with greater truth discernment whereas political conservatism was negatively associated with truth discernment in most countries. Subtly prompting people to think about accuracy was broadly effective at improving the veracity of news that people were willing to share, as were minimal digital literacy tips. Finally, crowdsourced accuracy evaluation was able to differentiate true from false headlines with high accuracy in all countries. The consistent patterns we observe suggest that the psychological factors underlying the misinformation challenge are similar across the globe, and that similar solutions may be broadly effective.
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements [optional]

      Overall we were elated to have received such positive comments on the manuscript, with requests for only minor changes. We have made all suggested changes to clarify or tone down the language as suggested.

      We would like to thank each of the three reviewers for their assessment of our work. We note that all three reviewers agreed the phylogenetic analysis was interesting and convincing. Two of the three reviewers felt the study sufficiently demonstrated roles for Baramicin in the nervous system. We have responded to comments from Reviewer 2 to draw attention to some aspects of the data that may have been overlooked, which we hope reassures them that our proposal of BaraB and BaraC involvement in the nervous system is robust, coming from different approaches that show consistent results.

      Reviewer 1 and Reviewer 3 compliment the study as being very worthwhile, and for suggesting concrete routes for how an AMP evolved non-immune functions. Both compliment its comprehensiveness, and describe the study as having striking findings that should have broad appeal to audiences interested in the crosstalk between the nervous system and the innate immune system.

      2. Point-by-point description of the revisions

      In the revised manuscript file, we have highlighted all text where changes were made.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors provide convincing evidence for an evolutionary scenario in which duplications of an AMP gene with ancestral immune function led to paralogs specialised for neural functions. They focus on the Baramicin genes, coding for major Toll signalling targets in the context of antifungal defence. Their study uses infection experiments in several Drosophila species, a careful annotation of the Baramicin genes of D. melanogaster, the demonstration of neural expression of BaraB and BaraC, the KD analysis of BaraB revealing lethality and neurological phenotypes, a reconstruction of the evolutionary history of Baramicin genes in Drosophilids and an analysis of the sequence evolution of the IM24 domain providing the neural functions. In general the paper is well written. There are a few places in the manuscript where the language can be improved and one point which needs clarification:

      - line 297: ...,which did not present with...
      - line 314/315: ...to just 14% that of...to 63% that of
      - line 459: ..., we this motif...
      - line 518: What does "... genomic relatedness (by speciation and locus)..." mean?
      - line 527/528: ...drive behaviour or disease through interactions...
      - line 532: ... ancestrally encodes distinct peptides involved with either the nervous system or the immune response...
      - line 535: ...with either the nervous system (IM24) or....

      Do the data provide enough evidence suggesting that IM24 had a neural function in the ancestor? Ideally the authors should look at neural expression of the Baramicin gene in the outgroup, S. lebanonensis. The authors later (line 571) admit that they cannot rule out that IM24 is also antimicrobial.

      We thank reviewer #1 for drawing attention to these points. We have made changes to each line to be more concise, clarify our meaning, or fix typos.

      Reviewer #1 (Significance (Required)):

      This is a very comprehensive study, which, to my knowledge for the first time, suggests concrete routes of how an AMP evolved non-immune functions. One of the striking findings of this paper is that duplications and subsequent truncations of the ancestral Baramicin locus linked to specialisation for neural functions occurred independently in different Drosophila lineages.

      We thank reviewer #1 for their very positive comments. We also agree with all suggested changes, including more careful phrasing to emphasize that we have not described a mechanism, just an involvement in the nervous system. For instance, lines 556-568 are reworked to soften the language and explicitly state that the ancestral function of IM24 is unknown, and that our suggestion that IM24 could underlie Dmel\BaraA interactions with the nervous system is speculation that should be tested.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Hanson and Lemaitre present a genomic and phylogenetic characterization of the Baramicin family of antimicrobial peptide genes in different species. They discover new Baramicin paralogs, united by the presence of an IM24 domain at the N-terminus. They show that among Baramicins, those that are not inducible by infection (which they improperly call non-immune since a protein can be non-inducible by infection and have very important immune functions), are truncated. They propose that an ancestor peptide with immune functions evolved into a neuronal regulator/effector via truncation.

      Although the hypothesis is interesting, the data do not really support it. This manuscript is rather descriptive at this point. The demonstration that IM24 is necessary for neural function is very tenuous. For example, in the paragraphs titled Dmel\BaraB is required in the nervous system during development and Baramicin B plays an important role in the nervous system, I did not find convincing data demonstrating that BaraB is required in the nervous system. The only data that links BaraB to the nervous system is a weak locomotion defect observed in the BaraB mutant. But how many genes, when inactivated, give a locomotion defect? This remains totally unexplained at the molecular level. The authors also mentioned that BaraB is expressed in a subset of mechanosensory neuron cells in the wing. What is the link between this expression and the nubbin phenotype? The authors also mention that data in the literature indicate that BaraC is expressed in glial cells but also in other tissues. Finally, we have no idea what role, if any, these peptides have in the nervous system.

      While the characterization of the Baramicin gene family and its evolution across species is convincing, the link between these AMPs and the nervous system is really too preliminary to be convincing. The manuscript would greatly benefit from being more concise.

      Reviewer #2 (Significance (Required)):

      see above

      We thank reviewer #2 for their fair assessment. We have made edits to soften our phrasing, and to emphasize that we have not described a mechanism, just an involvement, in the nervous system.

      Examples:

      line 270: “integral development role” -> “important for development”

      line 277: “Baramicin B plays an important role in the nervous system“ -> “Baramicin B suppression in the nervous system mimics mutant phenotypes”

      line 532: “Here we demonstrate that the Baramicin antimicrobial peptide gene of Drosophila ancestrally encodes distinct peptides involved with either the nervous system or the immune response.“ -> “Here we demonstrate that the Baramicin antimicrobial peptide gene of Drosophila ancestrally encodes distinct peptides that may interact with either the nervous system (IM24) or invading pathogens (IM10-like, IM22).”

      line 562 new text: “Thus while our results suggest that IM24 of different Baramicin genes might underlie Baramicin interactions with the nervous system, we cannot exclude the possibility that IM24 is also antimicrobial, or even that antimicrobial activity is IM24’s ancestral purpose. Future studies could use tagged IM24 transgenes or synthetic peptides to determine the host binding partner(s) of secreted IM24 from the immune-induced Dmel\BaraA, and/or to see if IM24 binds to microbial membranes.”

      We have also changed all instances of “non-immune Baramicins” to “Baramicins lacking immune induction” or something to that effect (e.g. new lines 25, 464, 469, 478-82).

      We also made some small changes to be more concise (e.g. line 387, 447, cut lines 492-495 from previous version, cut lines 506-507 from previous version).

      We have responded below in the reviewer-to-reviewer comments for a few of the specific points raised there, which we hope further assuage some of Reviewer 2’s concerns.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Antimicrobial peptides are main effectors in (insect) immune defenses. It is becoming more and more clear, that AMPs can have pleiotropic effects or even acquire new functions. In the present paper, the authors investigate Baramicin, an antifungal AMP that they described first in publication last year. Here they show that in Drosophila melanogaster Baramicin A, which they described before, has paralogs, that are not immune-inducible. They then show that these paralogs, named BarB and BarC, which are truncated versions of BarA, are expressed in the head and neural tissues. That they have neural functions is supported by targeted gene-silencing experiments. They go on to show, using a comparative approach across Drosophila, that Baramicin A with its antimicrobial function constitutes the ancestral state. Moreover, Baramicin is also enriched in head samples of some of the other Drosophila species they study. This manuscript, which according to the acknowledgements has already been seen by reviewers, is in a very good shape.

      I have only a number of minor points, that might help to clarify the presentation.

      Lines 34-36: I would delete this sentence and replace it with a statement based on the main findings of the manuscript

      We now conclude the abstract with “As many AMP genes encode polypeptides, a full understanding of how immune effectors interact with the nervous system will require consideration of all their peptide products.”

      Lines 56-60. May be tone down a bit. Anti-inflammatory activities of AMPs have been known for a long time. I think the next paragraph makes a very good case what is already known and is hence a nice motivation for the current study.

      Toned down. This part now reads: “However AMPs and AMP-like genes in many species have recently been implicated in non-immune roles in flies, nematodes, and humans, suggesting non-immune functions might help explain AMP evolutionary patterns.”

      Line 125: classical instead of classically

      done

      Line 200: what is a 'novel' time course? I would just describe what has been done.

      Now reads: “We next measured Baramicin expression over development from egg to adult.”

      Line 268: hypomorph, I guess in the literature usually hypomorphic is used.

      done

      Line 279: I would suggest to tone this headline down. This is not a criticism of the paper, but the actual mechanisms of the roles in the nervous system are not studied here.

      Done. Now reads: “Baramicin B suppression in the nervous system mimics mutant phenotypes”

      Line 505: what does not really become clear is whether IM24 plays an important role in the nervous system of fly species that only have BarA.

      Edits from lines 556-568 now help highlight this question.

      Line 540-549. This comparison I find a bit far-fetched, or maybe it needs clarification how doublesex expression is related to Baramicins.

      Being completely honest: the doublesex discussion was requested during previous review at another journal. We agree that it is a bit of a tangent, and so we have removed these sentences.

      Line 584-585. I think that this has been known for much longer from studies in frogs and beetles.

      Our use of “in vivo” might have been a bit squishy here. We have edited this to reflect endogenous loss-of-function study, rather than simply “in vivo,” to clarify our intended sentiment.

      Reviewer #3 (Significance (Required)):

      Overall, I think that this is a very worthwhile and convincing story about the evolution of AMPs and how they can acquire new functions. All the main statements are supported by careful experiments and data analysis. The paper does not go into any detail of how the neurological role of BarB and BarC is achieved, but I think this is beyond the scope of the current manuscript. In short, this is a very worthwhile contribution to the growing literature on the role of AMPs in the nervous system. The authors provide the context of the main published papers in the area in the introduction. As opposed to most papers on this so far, the current manuscript also provides very interesting data on the evolutionary history of the Baramicin genes, both within the main study species, and within other Drosophila species. This paper should appeal to a rather broad audience of researchers interested in innate defenses, AMPs and the crosstalk between the nervous system and the innate immune system.

      My background is insect immunology with a focus on AMPs and evolutionary approach.

      We thank reviewer #3 for their very positive comments. We agree with all suggested changes.

      **Referees cross-commenting**

      This session contains the comments of all reviewers

      Reviewer 3

      Reviewer 2 and I share the view, that the evidence for the effects of BarB and C on the nervous system is rather limited. But I still think, that the paper provides enough new and interesting data that make it a very useful contribution. Though not a neurobiologist, I would assume that providing functional evidence for the role of BarA and B in the nervous system would justify a paper on its own. I agree though, that the relevant sections should be toned down.

      Reviewer 2

      As I mentioned in my review, I found the genomic and phylogenetic analysis interesting and convincing. I therefore totally agree with reviewers 2 and 3 on that. Whether BarA and B are playing a role in the nervous system, and how, remains speculative. BaraB mutants show locomotion defects. But mutants in mitochondrial genes have locomotion defects. Can we conclude that mitochondria play a role in the nervous system? If I understand correctly, downregulating Bara in neurons only (with Elav-Gal4 driver) does not show the locomotion phenotype. It induces early lethality. How many genes when inactivated in neurons will give rise to such a phenotype? A lot. I really think that the implication of Bara in the nervous system should be seriously toned down and presented more as a hypothesis than a validated fact.

      We would like to note for Reviewer 2 here that it is specifically elav>BaraB-IR that results in lethality, and in weaker gene silencing experiments, adult elav>BaraB-IR flies emerge, and they do suffer locomotor defects. Often, they got stuck in the food shortly after emerging, or would move haphazardly (which was common in flies with nubbin-like wings). We have added explicit mention that elav>BaraB-IR also results in locomotor defects (lines 288-289).

      Our private speculation is that the reason flies fail to emerge from their pupae is because they are so uncoordinated that they sometimes cannot wriggle out of the pupal case before their cuticle hardens. In some instances, both using mutants and RNAi, we observed fully developed adults with mature abdominal pigmentation that died trapped inside their pupal cases.

      We’d also like to emphasize here that despite testing many other Gal4 drivers, including mef2-Gal4 (muscle/myocytes), nubbin-like wings and lethality were only found using elav-Gal4. A role interacting with mitochondria would likely have been revealed using mef2-Gal4, given the importance of mitochondrial function in muscle.

      For BaraC: expression in other tissues (like the rectal pad) could nevertheless be from e.g. nerves innervating the muscles controlling the sphincter. Or it could indeed be entirely unrelated to the nervous system. However we feel the nearly perfect overlap with Repo-expressing cells is a strong argument for a neural role. We also made an effort using RNAi to validate this pattern suggested by scRNAseq, which confirmed a strong knockdown of BaraC-IR with Repo-Gal4 (Fig. 3, Fig. S4).

      We hope these comments clarify for Reviewer 2 why we feel confident in proposing a role for Baramicins in the nervous system, even if we do not investigate a mechanism in this study.

      Reviewer 1

      I agree with reviewer 3 that the main message of the paper providing a concrete scenario of how non-immune functions of AMPs may evolve is an important contribution. A deep investigation of the neural function is definitely going beyond the scope of the paper. Indeed this might be quite tricky. But it would help if the authors could clarify their idea about the ancestral condition. Is there the possibility that IM24 had ancestrally already non-immune function? They are not really clear about this point.

      Reviewer 2

      I agree with the other reviewers that determining the exact role of Bara peptides could be complicated. I just ask that the authors limit themselves to proposing that the peptides have lost their immune function. I stress that this argument is not very strong. It relies solely on the lack of inducibility of these peptides following infection. I still think that the demonstration of the role of Bara in the nervous system is not provided.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Antimicrobial peptides are main effectors in (insect) immune defenses. It is becoming more and more clear, that AMPs can have pleiotropic effects or even acquire new functions. In the present paper, the authors investigate Baramicin, an antifungal AMP that they described first in publication last year. Here they show that in Drosophila melanogaster Baramicin A, which they described before, has paralogs, that are not immune-inducible. They then show that these paralogs, named BarB and BarC, which are truncated versions of BarA, are expressed in the head and neural tissues. That they have neural functions is supported by targeted gene-silencing experiments. They go on to show, using a comparative approach across Drosophila, that Baramicin A with its antimicrobial function constitutes the ancestral state. Moreover, Baramicin is also enriched in head samples of some of the other Drosophila species they study. This manuscript, which according to the acknowledgements has already been seen by reviewers, is in a very good shape.

      I have only a number of minor points, that might help to clarify the presentation.

      Lines 34-36: I would delete this sentence and replace it with a statement based on the main findings of the manuscript

      Lines 56-60. May be tone down a bit. Anti-inflammatory activities of AMPs have been known for a long time. I think the next paragraph makes a very good case what is already known and is hence a nice motivation for the current study.

      Line 125: classical instead of classically

      Line 200: what is a 'novel' time course? I would just describe what has been done.

      Line 268: hypomorph, I guess in the literature usually hypomorphic is used.

      Line 279: I would suggest to tone this headline down. This is not a criticism of the paper, but the actual mechanisms of the roles in the nervous system are not studied here.

      Line 505: what does not really become clear is whether IM24 plays an important role in the nervous system of fly species that only have BarA.

      Line 540-549. This comparison I find a bit far-fetched, or maybe it needs clarification how doublesex expression is related to Baramicins.

      Line 584-585. I think that this has been known for much longer from studies in frogs and beetles.

      Significance

      Overall, I think that this is a very worthwhile and convincing story about the evolution of AMPs and how they can acquire new functions. All the main statements are supported by careful experiments and data analysis. The paper does not go into any detail of how the neurological role of BarB and BarC is achieved, but I think this is beyond the scope of the current manuscript.

      In short, this is a very worthwhile contribution to the growing literature of the role of AMPs in the nervous system. The authors provide the context of the main published papers in the area in the introduction. As opposed to most papers on this so far, the current manuscript also provides very interesting data on the evolutionary history of the Baramicin genes, both within the main study species, and within other Drosophila species.

      This paper should appeal to a rather broad audience of researchers interested in innate defenses, AMPs and the crosstalk between the nervous system and the innate immune system.

      My background is insect immunology with a focus on AMPs and evolutionary approach.

      Referees cross-commenting

      This session contains the comments of all reviewers

      Reviewer 3

      Reviewer 2 and I share the view, that the evidence for the effects of BarB and C on the nervous system is rather limited. But I still think, that the paper provides enough new and interesting data that make it a very useful contribution. Though not a neurobiologist, I would assume that providing functional evidence for the role of BarA and B in the nervous system would justify a paper on its own. I agree though, that the relevant sections should be toned down.

      Reviewer 2

      As I mentioned in my review, I found the genomic and phylogenetic analysis interesting and convincing. I therefore totally agree with reviewers 2 and 3 on that. Whether BarA and B are playing a role in the nervous system, and how, remains speculative. BaraB mutants show locomotion defects. But mutants in mitochondrial genes have locomotion defects. Can we conclude that mitochondria play a role in the nervous system? If I understand correctly, downregulating Bara in neurons only (with Elav-Gal4 driver) does not show the locomotion phenotype. It induces early lethality. How many genes when inactivated in neurons will give rise to such a phenotype? A lot. I really think that the implication of Bara in the nervous system should be seriously toned down and presented more as a hypothesis than a validated fact.

      Reviewer 1

      I agree with reviewer 3 that the main message of the paper providing a concrete scenario of how non-immune functions of AMPs may evolve is an important contribution. A deep investigation of the neural function is definitely going beyond the scope of the paper. Indeed this might be quite tricky. But it would help if the authors could clarify their idea about the ancestral condition. Is there the possibility that IM24 had ancestrally already non-immune function? They are not really clear about this point.

      Reviewer 2

      I agree with the other reviewers that determining the exact role of Bara peptides could be complicated. I just ask that the authors limit themselves to proposing that the peptides have lost their immune function. I stress that this argument is not very strong. It relies solely on the lack of inducibility of these peptides following infection. I still think that the demonstration of the role of Bara in the nervous system is not provided.

    1. Before we start talking about how to choose search terms and where to search for sources, it can help to get a sense of what we’re hoping to get out of the research. We might think that in order to support a thesis we should only look for sources that prove an idea we want to promote. But since writing academic papers is about joining a conversation, what we really need is to gather the sources that will help us situate our ideas within that ongoing conversation. What we should look for first is not support but the conversation itself: who is saying what about our topic? The sources that make up the conversation may have various kinds of points to make and ultimately may play very different roles in our paper. After all, as we have seen in Chapter 2, an argument can involve not just evidence for a claim but limits, counterarguments, and rebuttals. Sometimes we will want to cite a research finding that provides strong evidence for a point; at other times, we will summarize someone else’s ideas in order to explain how our own opinion differs or to note how someone else’s concept applies to a new situation.  As you find sources on a topic, look for points of connection, similarity and difference between them. In your paper, you will need to show not just what each one says, but how they relate to each other in a conversation.  Describing this conversation can be the springboard for your own original point.

      Arguments involve not only evidence for a claim but also limits, counterarguments, and rebuttals.

    1. First I menaced thee with a feigned one, and hurt thee not for the covenant that we made in the first night, and which thou didst hold truly. All the gain didst thou give me as a true man should. The other feint I proffered thee for the morrow: my fair wife kissed thee, and thou didst give me her kisses–for both those days I gave thee two blows without scathe–true man, true return. But the third time thou didst fail, and therefore hadst thou that blow. For ’tis my weed thou wearest, that same woven girdle, my own wife wrought it, that do I wot for sooth. Now know I well thy kisses, and thy conversation, and the wooing of my wife, for ’twas mine own doing. I sent her to try thee, and in sooth I think thou art the most faultless knight that ever trode earth. As a pearl among white peas is of more worth than they, so is Gawain, i’ faith, by other knights. But thou didst lack a little, Sir Knight, and wast wanting in loyalty, yet that was for no evil work, nor for wooing neither, but because thou lovedst thy life–therefore I blame thee the less.”

      The Green Knight is informing Gawain that none of the strikes were due to the covenant. Instead, he explains that he pretended to strike Gawain the first two times because "Gawain gave him the gifts he received from the lady" (Sparknotes summary part 4 page 1). He then says that he hurt Gawain on the third strike because Gawain was dishonest about the girdle from Bertilak's wife. However, the Green Knight adds that he did not kill Gawain because Gawain valued his life, which the Green Knight understood.

      "Sir Gawain and the Green Knight," Sparknotes.com. www.sparknotes.com/lit/gawain/section4/ <accessed 18 May 2022>

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for carefully reading our manuscript. We found their comments to be incredibly thoughtful and constructive and greatly appreciate their feedback. We are confident that addressing the reviewers’ concerns has strengthened our manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Camuglia, Chanet and Martin investigate the mechanisms that control cell division orientation in vivo, using the mitotic domains (MDs) in the head of the Drosophila embryo as their main model system. They find that cells in the head mitotic domains rotate and align their spindles within 30 degrees of the anterior-posterior axis of the embryo. The Pins protein, implicated in spindle orientation in other systems, is planar polarized in mitotic cells. Pins polarization precedes spindle rotation and is correlated with the division angle (but cell shape is not, violating Hertwig's rule). Overexpression of myristoylated Pins results in uniform Pins distribution on the membrane and affects spindle orientation. alpha-catenin RNAi (but not canoe RNAi) disrupts Pins polarity and spindle orientation in MDs 1, 3 and 5. Low dose CytoD injections (which should disrupt force transmission) also result in defective Pins polarity and spindle orientations. Finally, mechanical isolation by laser ablation also disrupts spindle orientation. The authors find that preventing mesoderm invagination by snail dsRNA disrupts Pins polarity and spindle orientation in the head. MAJOR 1. Is there a certain chirality in the rotation of the spindles? From Movie 1, it seems like in MDs 1 and 3 at least, a majority of spindles on the right side of the embryo rotate clockwise, while spindles on the left side rotate counter-clockwise? Is that so, and in that case, are there geometric/molecular considerations that could explain that chirality?

      We thank the reviewer for pointing this out. They are correct that spindle orientation is tilted relative to the AP axis. To illustrate this tilt, we performed our spindle analysis separately on the right and left sides of MD1 and found that spindles on the left side align with an average division angle of about 30° from the AP axis, whereas spindles on the right side align with an average division angle of -30° from the AP axis. We also tested whether spindles on either side rotated with a certain chirality and found no preference for rotating clockwise or counterclockwise on either side (on the left side of MD1, 53% of measured spindles rotated counterclockwise and 47% rotated clockwise; on the right side, 46% rotated counterclockwise and 54% clockwise). We have added these data as Fig. 1I-J and discussed them in the Results (lines 134-145).
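      The clockwise versus counterclockwise tally described in this reply can be illustrated with a short sketch. It is not the authors' analysis code. It assumes angles are measured in degrees, counterclockwise-positive, and that a spindle is an undirected axis, so angles are equivalent modulo 180. The function names and the 1-degree tolerance are hypothetical.

```python
def signed_axial_rotation(start_deg: float, end_deg: float) -> float:
    """Smallest signed rotation (degrees) that carries one undirected axis onto another.

    A spindle is treated as an axis rather than an arrow, so angles are
    equivalent modulo 180. The result lies in [-90, 90); positive values
    mean counterclockwise under the assumed angle convention.
    """
    return (end_deg - start_deg + 90.0) % 180.0 - 90.0


def rotation_direction(start_deg: float, end_deg: float, tol_deg: float = 1.0) -> str:
    """Label a rotation as clockwise, counterclockwise, or none (within tolerance)."""
    delta = signed_axial_rotation(start_deg, end_deg)
    if abs(delta) <= tol_deg:
        return "none"
    return "counterclockwise" if delta > 0 else "clockwise"


def fraction_counterclockwise(pairs) -> float:
    """Fraction of rotating spindles that turned counterclockwise.

    pairs: iterable of (start_deg, end_deg) tuples, one per spindle.
    Spindles labeled "none" are excluded from the denominator.
    """
    labels = [rotation_direction(s, e) for s, e in pairs]
    rotating = [label for label in labels if label != "none"]
    if not rotating:
        return 0.0
    return sum(label == "counterclockwise" for label in rotating) / len(rotating)
```

      Whether a positive angle appears counterclockwise on screen depends on the image coordinate system (many imaging tools put the origin at the top left with y pointing down), so the sign convention would need to be checked against the actual data.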

      1. The authors are experts in mesoderm invagination, and understandably concentrate on the role that forces from that process may have in the orientation of head MD divisions. However, the cephalic furrow forms much closer to the head MDs, and in an orientation that might also explain the alignment of spindles in the head. Is cephalic furrow formation important for Pins polarity and spindle orientation in the head MDs?

      This was certainly a possibility, but our experimental results strongly argue that mesoderm invagination is most relevant.

      1) Perturbing the ventral furrow (e.g. by Snail depletion) does not block the cephalic furrow (Vincent et al., 1997; Leptin and Grunewald, 1990), but does block mesoderm invagination. Snail depletion strikingly disrupted spindle orientation and Pins localization, which suggests mesoderm is most important.

      2) In addition, depletion of α-catenin blocks ventral furrow invagination but not cephalic furrow formation. We see a disruption in spindle orientation and Pins localization in α-catenin RNAi, which suggests that the cephalic furrow itself cannot orient spindles.

      3) Furthermore, light sheet imaging of the Drosophila embryo has shown that the head region of the embryo undergoes tissue movement in the direction of the cell division and that this is associated with mesoderm invagination (Streichan et al., 2018; Stern et al., 2022).

      See movies here: https://www.youtube.com/watch?v=kC11Upr30JY

      To further test the importance of mesoderm invagination, we will perform additional ablation experiments trying to disrupt forces transmitted to the mitotic domains from distinct directions. Once we get this experimental result we will include language in the Discussion that will summarize the experimental results and the weight of the evidence for the roles of either ventral or cephalic furrow.

      1. Does expression of myristoylated Pins affect mesoderm invagination (or cephalic furrow formation)? From Table S1 it seems that a maternal Gal4 driver was used to express myristoylated Pins, which could affect other tissues in the embryo. So it is in principle possible that effects of myristoylated Pins on mesoderm internalization/cephalic furrow formation could affect cell division orientation much like sna loss of function does, but in a mechanism that does not depend on Pins polarity. There is definitely an effect on mesoderm invagination in alpha-catenin RNAi (but not in canoe RNAi) embryos, so I wonder if the effect could be consistently through defects in mesoderm invagination (or cephalic furrow formation), and Pins polarity is really dispensable for spindle orientation. Are there head-specific Gal4 drivers that could be used to drive myristoylated Pins exclusively in the head?

      We apologize that we did not clarify this in the text. Maternal overexpression of myr-Pins does not obviously disrupt mesoderm internalization/cephalic furrow formation. But, we do see that targeted disruption of mesoderm internalization via a Snail depletion affects the orientation of division. Note that our paper demonstrates the effect of force transmission on Pins polarity and division orientation, which is new and the main conclusion. The role of these divisions in morphogenesis is more complicated and is beyond the scope of this study.

      In response to this comment we: 1) added language in the Results that states that gastrulation proceeds in myr-Pins expressing embryos (lines 206-208), 2) Added to the Discussion of the role of these oriented divisions to morphogenesis (lines 443-449), and 3) will add a figure showing ventral furrow and cephalic furrow formation in embryos ectopically expressing the myr-Pins.

      1. Related to the previous point, does mechanical isolation by laser ablation (Figure 6I-N) affect Pins polarity? This experiment could alleviate some of my concerns above, as it certainly does not (should not?) disrupt either mesoderm invagination or cephalic furrow formation.

      We agree that it would be useful to look at Pins polarity in laser ablated embryos. Currently, we have been unable to analyze Pins polarity after laser ablation, because the ablation to fully isolate the mitotic domain has bleached our Pins::GFP signal. Also, we have shown that Pins polarity is disrupted by 1) alpha-catenin-RNAi, 2) low dose CytoD injection, and 3) Snail depletion, all of which are expected to disrupt force generation and transmission through tissues.

      In response to the reviewer comment, we will determine if Pins::GFP can be analyzed in less aggressive (directional) laser ablations. Again, remember that myr-Pins does not affect mesoderm internalization and that Snail depletion affects Pins polarity.

      MINOR 1. Figure S5: I am a bit confused about the role of Toll 2, 6, 8 in orienting spindle orientation. In Figure S5D it seems that dsRNA treatment against these genes does not disrupt spindle orientation, but Figure S5F shows quite a significant (p=0.0057) effect in triple mutants. The authors favor the idea that Toll receptors do not affect spindle orientation, but the difference with the mutant should be addressed. Furthermore, what happens in MDs 3, 5 and 14 (if the germband extension defect does not affect those divisions)? Is there a difference between dsRNA and triple mutant embryos in these other MDs?

      We think this is a great point. We stated in the text that TLRs are not solely responsible (line 247) for spindle orientation as they do not recapitulate the random pattern of division seen in the myr-Pins expression condition. We acknowledge the differences between the dsRNA injection and TLR triple mutant in the manuscript (lines 242-247), but our data show a greater importance for the role of force transmission. We favor the idea that other mechanisms contribute to spindle orientation because of the small effect of mutating all three Tolls and the dramatic effects of depleting AJs, inhibiting actin (with CytoD), laser ablation, and blocking mesoderm invagination. The planned laser ablation experiments (described above) will also contribute to addressing this point.

      1. No statistical analysis is provided for any of the differences in polarity between Pins and Gap43, and this should be done to demonstrate the significance of the polarization of Pins. Also, particularly for MD14, they should compare anterior vs. posterior polarity, as based on the images in Figure 2H it is not clear that there is a difference between the anterior and posterior side of cells.

      We thank the reviewer for this point. We have added the statistical comparison.

      1. Figure 2A-D: the authors propose that Pins localizes preferentially to the posterior end of cells (instead of both anterior and posterior ends) in MDs 1, 3 and 14 (and anterior in MD 5). How is the asymmetry in the distribution of Pins along the AP axis accomplished, and is there any significance to it? This should be discussed in a bit more detail (currently no potential mechanisms provided in the discussion, just an acknowledgment of the question).

      We agree the localization of Pins to the posterior end of cells in MDs 1, 3, and 14 and anterior end in MD 5 is of great interest. The details and further mechanism of this preferential localization are beyond the scope of this paper, but we have added an acknowledgment of the question and discuss possible models that could explain the result (lines 458-460).

      TYPOS 1. Line 49: "one daughter cells" should be "one daughter cell". 2. Line 193: "rotation. (Figure 3E-F)." should be "rotation (Figure 3E-F)." 3. Lines 232-237: please review. 4. Line 238: "epithelia cells" should be "epithelial cells".

      We thank the reviewers for carefully reading our manuscript. We have fixed the typos mentioned.

      Reviewer #1 (Significance (Required)): This is the first study to my knowledge that demonstrates the role of mechanical forces in polarizing Pins, and provides a nice model to further investigate how mechanical forces generated in one tissue may affect cell division orientation in distant ones. The paper is clear, well written, and quantitative analysis is present for most results. I have some issues with the statistics (or lack thereof) for a couple of results, and potential alternative interpretations for some experiments that in my opinion should be addressed prior to publication. Specifically, it is not clear to me if Pins polarity is at all necessary for spindle orientation in any of the examined MDs.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Overview: In this manuscript, Camuglia et al. show Pins/LGN, which is understood to drive spindle orientation, can localize asymmetrically (with respect to the tissue plane) in the Drosophila embryo. Experimental work (including drug treatments, laser ablation, and knockdowns) leads the authors to propose that this asymmetry is driven by tissue-level tension. The findings are quite interesting and the manuscript is well-written overall. Major Comments: • The authors propose that localization is driven by tissue-level tension, but the direction of the tension isn't clear from the experimental work. For example, the laser ablation experiments cut around the entire perimeter of the mitotic domain, rather than along just one tension axis. Similarly, the finding that disruption of the ventral furrow (by Snail RNAi) interferes with spindle orientation in the head is very puzzling; the furrow A) is outside the embryonic head and B) runs parallel to the divisions considered. The authors need to address the directionality of tension experimentally.

      We thank the reviewer for this comment and agree that better defining the direction of tension would strengthen our manuscript. We showed that blocking mesoderm invagination with Snail depletion disrupts spindle orientation, despite Snail not being required for cephalic furrow formation (refs). Recent light sheet data has shown that mesoderm invagination is associated with global movements throughout the embryo. Furthermore, the ventral furrow extends into the head region just past the anterior of MD5. To address the reviewer’s comments, we plan to: 1) Perform directional laser ablations to determine the directionality of the tension that orients the spindle, 2) Analyze strain rates in the mitotic domains prior to and during division, and 3) Add to our Discussion more about what is said in the literature about the movements that occur in the head during mesoderm invagination.

      • As acknowledged in the text, the asymmetric enrichment of Pins in MD14 is fairly weak. Since the cells being examined here border a divot in the tissue, and might therefore be curving relative to the focal plane, it would be good to rule out the possibility that some of the asymmetry in Pins intensity is just a consequence of cell/tissue geometry. One way this could be achieved is by showing multiple focal planes.

      Good point. We do not think that the asymmetric Pins enrichment in MD14 is due to tissue geometry or junction tilt. 1) MD14 divides ~10-15 minutes after mesoderm invagination is completed, so the cells do not border a divot (as seen with Gap43::mCh, Fig. 2I). The cells do round up, which can be seen as gaps between cells (Fig. 3E). 2) We compare Pins to GapCh and only see an enrichment with Pins (Fig. 2H-K). If the enrichment was due to tissue curvature or junction orientation relative to imaging axis, we would see the same enrichment in GapCh. 3) Expression of myr-Pins randomizes spindle orientation in MD14 (Fig. 3M, N).

      • In Figure 3I (and 3M?), it appears that there are fewer cell divisions in the presence of myr-Pins. Is this the case? Since cell shapes change during division, and cell shapes influence tissue tension, an increase in cell divisions could lead to a change in tissue tension. This would be important to address, since tissue tension plays an important role in the proposed model.

      These images were not taken at the same point of the MD1 division ‘wave’; the number of divisions is the same in each condition. These mitotic domains exhibit a ‘wave’ of cell division (Di Talia and Wieschaus, 2012), so the number of divisions in each image reflects the time at which we captured the image. Quantifications involved divisions throughout this wave, but we have chosen images for figures that are most representative of what we see. We will add this to the text in the final version of the manuscript.

      • The alpha-catenin and Canoe results are a bit confusing: - The rose plot in Figure 4D doesn't show a random distribution of spindle angles, but rather a modest change; most spindles still orient in the normal range. The p value in the figure legend (0.0012) is very different from the one in the figure (5.8284e-04). - Alpha-catenin is the strongest way to disrupt AJs, but A) the epithelium appears to be intact in the knockdown condition and B) spindle orientation is impacted but not randomized. Does this mean that the knockdown is incomplete? Or is Cadherin-mediated adhesion (in which alpha-catenin participates) only partially responsible for force transduction?

      We acknowledge that perturbation using alpha-cat RNAi does not recapitulate the complete disruption of division orientation seen in embryos expressing myr-Pins. This is likely due to the variability in the strength of RNAi knockdown, which is observed for most RNAi lines that we use. To address the reviewer’s comment, we have added rose plots for individual embryos showing extremes in the severity of division orientation disruption (Fig. 4E and F). For the main plot (Fig. 4D), we have included all the data that we took because we did not want to pick and choose which embryos were used for analysis. Fig. 4D therefore includes all the variability.

      • Given that previous studies implicate Canoe in Pins localization, it seems important to lock down the question of whether Canoe is participating in the mechanism described in this paper. How do the authors know the extent of Canoe knockdown? As suggested by the alpha-catenin results (described above), is it possible that Canoe knockdown is simply not strong enough to impact spindle orientation? Aren't there genetic nulls available?

      We thank the reviewer for bringing these points to our attention. There are certainly genetic nulls available (Sawyer et al., 2009), but the experiment suggested by the reviewer would not establish the necessity of Canoe in mitotic domain cells. This is because Canoe nulls severely disrupt mesoderm invagination (Sawyer et al., 2009; Jodoin et al., 2015), as well as affecting junctions in the ectoderm during germband extension (Sawyer et al., 2011). Therefore, using a null mutation, we would not be able to distinguish which effect of Canoe loss is responsible for any change in spindle orientation. Instead, we used more targeted experiments: 1) a mutant that specifically compromises mesoderm invagination (snail), 2) laser isolation to show the importance of external force transmission in orienting mitotic domain divisions, and 3) RNAi to deplete Canoe so that mesoderm invagination initiates and pulls on the ectoderm, but where Canoe function is clearly compromised. This treatment did not affect spindle orientation, arguing against a role for Canoe in this case. In response to the reviewer's comment, we added language to the Results to indicate that it is possible that the Canoe knockdown is not strong enough and to give our rationale for not performing the experiment in a Canoe null (lines 279-282).

      Minor Comments:

      • It can be difficult to interpret some of the spindle orientation data since the AP axis is vertical in the diagrams but horizontal in the rose plots. Can one of these be flipped so they go together?

      We thank the reviewer for this suggestion and have flipped the rose plots so they match the images. Note that because of the large size of the figures, we have had to consistently orient anterior towards the top, which we establish at the beginning of the Results.

      • Figure S3 is important information for the reader and should be ideally moved into the main paper. - Protein localizations referred to in text should be annotated on images, as they can be hard to see.

      We disagree that S3 should be included in the main paper. The myr-Pins reagent has been used previously so the information in S3 is not new (Chanet et al., 2017).

      • There are some discrepancies between figures, legends and text. - p-values differ between figures, legends, and/or text. - Fluorescent markers are labelled differently in figures and legend (CLIP170 in Figure 1) - Graphs appear to show that MD3 polarizes on posterior side, but figure legend says anterior in Figure S1. Vice versa for MD5.

      We thank the reviewer for catching these typos. We have fixed these issues.

      • Ideally, multichannel image overlays should be shown along with individual channels (b/w). However, it is appreciated that the fluorescent signals are exceptionally weak in this study, presenting a challenge to presentation and to quantification.

      We agree the overlays would be nice. However, the Pins::GFP signal is weak compared to the tubulin and Gap43 signals, the merge does not provide more clarity, and the figures are already quite large. Therefore, we have included only the separated images.

      • Graph axes depicting spindle orientation would be more clear if shown in degrees, instead of normalized or in radians.

      We thank the reviewer for this suggestion. We have changed the graph axes to be in degrees.

      Reviewer #2 (Significance (Required)): Several recent studies have demonstrated that division orientation (in the tissue plane) is governed by tissue level tension. Remarkably, it appears that diverse mechanisms link tension with spindle orientation. Here the authors provide the first in vivo evidence connecting tension to the asymmetric localization of Pins, an important and evolutionarily conserved spindle orientation factor.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): This beautiful manuscript uncovers a role for planar polarized PINS/LGN in orienting the mitotic spindle in Drosophila epithelia. In response to morphogenetic forces acting on adherens junctions, PINS/LGN localises to junctions in a planar polarized fashion to orient the spindle, and de-polarization of PINS/LGN prevents planar spindle orientation. The experiments are very well performed and the findings are robust. The conclusions are well supported by the data. Reviewer #3 (Significance (Required)): These important findings mirror previous work in human cell culture, but crucially reveal that the same phenomenon occurs in vivo in the Drosophila embryo. Thus, the findings underscore the highly conserved nature and in vivo relevance of this phenomenon.

      We thank this reviewer for reading the manuscript and their encouraging words.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper is of potential interest to researchers performing animal behavioral quantification with computer vision tools. The manuscript introduces 'BehaviorDEPOT', a MATLAB application and GUI intended to facilitate quantification and analysis of freezing behavior from behavior movies, along with several other classifiers based on movement statistics calculated from animal pose data. The paper describes how the tool can be applied to several specific types of experiments, and emphasizes the ease of use - particularly for groups without experience in coding or behavioral quantification. While these aims are laudable, and the software is relatively easy to use, further improvements to make the tool more automated would substantially broaden the likely user base.

      In this manuscript, the authors introduce a new piece of software, BehaviorDEPOT, that aims to serve as an open source classifier in service of standard lab-based behavioral assays. The key arguments the authors make are that 1) the open source code allows for freely available access, 2) the code doesn't require any coding knowledge to build new classifiers, 3) it is generalizable to other behaviors than freezing and other species (although this latter point is not shown) 4) that it uses posture-based tracking that allows for higher resolution than centroid-based methods, and 5) that it is possible to isolate features used in the classifiers. While these aims are laudable, and the software is indeed relatively easy to use, I am not convinced that the method represents a large conceptual advance or would be highly used outside the rodent freezing community.

      Major points:

      1) I'm not convinced by one of the key arguments the authors make - that the limb tracking produces qualitatively/quantitatively better results than centroid/orientation tracking alone for the tasks they measure. For example, angular velocities could be used to identify head movements. It would be good to test this with their data (could you build a classifier using only the position/velocity/angular velocities of the main axis of the body?).

      2) This brings me to the point that the previous state-of-the-art open-source methodology, JAABA, is barely mentioned, and I think that a more direct comparison is warranted, especially since this method has been widely used/cited and is also aimed at a non-coding audience.

      Here we address points 1 and 2 together. JAABA has been widely adopted by the Drosophila community with great success. However, we noticed that fewer studies use JAABA to study rodents. The ones that did typically examined social behaviors or gross locomotion, usually in an empty arena such as an open field or a standard homecage. In a study of mice performing reaching/grasping tasks against complex backgrounds, investigators modified the inner workings of JAABA to classify behavior (Sauerbrei et al., 2020), an approach that is largely inaccessible to inexperienced coders. This suggested to us that it may be challenging to implement JAABA for many rodent behavioral assays.

      We directly compared BehaviorDEPOT to JAABA and determined that BehaviorDEPOT outperforms JAABA in several ways. First, we used MoTr and Ctrax (the open-source centroid tracking software packages that are typically used with JAABA) to track animals in videos we had recorded previously. Both MoTr and Ctrax could fit ellipses to mice in an open field, in which the mouse is small relative to the environment and runs against a clean white background. However, consistent with previous reports (Geuther et al., Comm. Bio, 2019), MoTr and Ctrax performed poorly when rodents were in fear conditioning chambers, which have high-contrast bars on the floor (Fig. 10A–C). These tracking-related hurdles may explain, at least in part, why relatively few rodent studies have employed JAABA.

      We next tried to import our DeepLabCut (DLC) tracking data into JAABA. The JAABA website instructs users to employ Animal Part Tracker (https://kristinbranson.github.io/APT/) to convert DLC outputs into a format that is compatible with JAABA. We discovered that APT was not compatible with the current version of DLC, an insurmountable hurdle for labs with limited coding expertise. We wrote our own code to estimate a centroid from DLC keypoints and fed the data into JAABA to train a freezing classifier. Even when we gave JAABA more training data than we used to develop BehaviorDEPOT classifiers (6 videos vs. 3 videos), BehaviorDEPOT achieved higher Recall and F1 scores (Fig. 10D).
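      The centroid-estimation step mentioned above could look roughly like the following. This is a minimal sketch, not the authors' code. It assumes each frame is available as a list of (x, y, likelihood) triples, one per tracked body part, which is the information DeepLabCut reports per keypoint. The function name and the 0.9 likelihood cutoff are hypothetical.

```python
from statistics import fmean


def estimate_centroid(keypoints, min_likelihood=0.9):
    """Estimate an animal's centroid for one frame from tracked keypoints.

    keypoints: iterable of (x, y, likelihood) triples, one per body part.
    Keypoints below min_likelihood (an assumed cutoff) are ignored so that
    poorly tracked parts do not drag the estimate away from the animal.
    Returns (x, y), or None when no keypoint is reliable in this frame.
    """
    kept = [(x, y) for x, y, p in keypoints if p >= min_likelihood]
    if not kept:
        return None
    return (fmean(x for x, _ in kept), fmean(y for _, y in kept))
```

      Frames that return None would need to be interpolated or dropped before the centroid track is passed to a downstream tool.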

      In response to point 1, we also trained a VTE classifier with JAABA. When we tested its performance on a separate set of test videos, JAABA could not distinguish VTE vs. non-VTE trials. It labeled every trial as containing VTE (Fig. 10E), indicating that a fitted ellipse is not sufficient to detect fine angular head movements. JAABA has additional limitations as well. For instance, JAABA reports the occurrence of behavior in a video timeseries but does not allow researchers to analyze the results of experiments. BehaviorDEPOT shares features of programs like Ethovision or ANYmaze in that it can classify behaviors and also report their occurrence with reference to spatial and temporal cues. These direct comparisons address some of the key concerns centered around the advances BehaviorDEPOT offers beyond JAABA. They also highlight the need for new behavioral analysis software targeted towards a noncoding audience, particularly in the rodent domain.

      3) Remaining on JAABA: while the authors' classification approach appeared to depend mostly on a relatively small number of features, JAABA uses boosting to build a very good classifier out of many not-so-good classifiers. This approach is well-worn in machine learning and has been used to good effect in high-throughput behavioral data. I would like the authors to comment on why they decided on the classification strategy they have.

      We built algorithmic classifiers around keypoint tracking because of the accuracy, flexibility, and speed it affords. Like many behavior classification programs, JAABA relies on tracking algorithms that use background subtraction (MoTr) or pattern classifiers (Ctrax) to segment animals from the environment and then abstract their position to an ellipse. These methods are highly sensitive to changes in the experimental arena and cannot resolve fine movement of individual body parts (Geuther et al., Comm. Bio, 2019; Pennington et al., Sci. Rep. 2019; Fig. 10A). Keypoint tracking is more accurate and less sensitive to environmental changes. Models can be trained to detect animals in any environment, so researchers can analyze videos they have already collected. Any set of body parts can be tracked and fine movements such as head turns can be easily resolved (Fig. 10E).

      Keypoint tracking can be used to simultaneously track the location of animals and classify a wide range of behaviors. Integrated spatial-behavioral analysis is relevant to many assays including fear conditioning, avoidance, T-mazes (decision making), Y-mazes (working memory), open field (anxiety, locomotion), elevated plus maze (anxiety), novel object exploration, and social memory. Quantifying behaviors in these assays requires analysis of fine movements (we now show Novel Object Exploration, Fig. 5 and VTE, Fig. 6 as examples). These behaviors have been carefully defined by expert researchers. Algorithmic classifiers can be created quickly and intuitively based on small amounts of video data (Table 4) and easily tweaked for out of sample data (Fig. 9). Additional rounds of machine learning are time consuming, computationally intensive, and unnecessary, and we show in Figure 10 that JAABA classifiers have higher error rates than BehaviorDEPOT classifiers, even when provided with a larger set of training data. Moreover, while JAABA reports behaviors in video timeseries, BehaviorDEPOT has integrated features that report behavior occurring at the intersection of spatial and temporal cues (e.g. ROIs, optogenetics, conditioned cues), so it can also analyze the results of experiments. The automated, intuitive, and flexible way in which BehaviorDEPOT classifies and quantifies behavior will propel new discoveries by allowing even inexperienced coders to capitalize on the richness of their data.

      Thank you for raising these questions. We did an extensive rewrite of the intro and discussion to ensure these important points are clear.

      4) I would also like more details on the classifiers the authors used. There is some detail in the main text, but a specific section in the Methods section is warranted, I believe, for transparency. The same goes for all of the DLC post-processing steps.

      Apologies for the lack of detail. We included much more detail in both the results and methods sections that describe how each classifier works, how they were developed and validated, and how the DLC post-processing steps work.

      5) It would be good for the authors to compare the Inter-Rater Module to the methods described in the MARS paper (reference 12 here).

      We included some discussion of how the BehaviorDEPOT Inter-Rater Module compares to MARS.

      6) More quantitative discussion about the effect of tracking errors on the classifier would be ideal. No tracking is perfect, so an end-user will need to know "how good" they need to get the tracking to get the results presented here.

      We included a table detailing the specs of our DLC models and the videos that we used for validating our classifiers (Table 4). We also added a paragraph about designing video ‘training’ and test sets to the methods.

      Reviewer #2 (Public Review):

      BehaviorDEPOT is a Matlab-based user interface aimed at helping users interact with animal pose data without significant coding experience. It is composed of several tools for analysis of animal tracking data, as well as a data collection module that can interface via Arduino to control experimental hardware. The data analysis tools are designed for post-processing of DeepLabCut pose estimates and manual pose annotations, and includes four modules: 1) a Data Exploration module for visualizing spatiotemporal features computed from animal pose (such as velocity and acceleration), 2) a Classifier Optimization module for creating hand-fit classifiers to detect behaviors by applying windowing to spatiotemporal features, 3) a Validation module for evaluating performance of classifiers, and 4) an Inter-Rater Agreement module for comparing annotations by different individuals.

      A strength of BehaviorDEPOT is its combination of many broadly useful data visualization and evaluation modules within a single interface. The four experimental use cases in the paper nicely showcase various features of the tool, working the user from the simplest example (detecting optogenetically induced freezing) to a more sophisticated decision-making example in which BehaviorDEPOT is used to segment behavioral recordings into trials, and within trials to count head turns per trial to detect deliberative behavior (vicarious trial and error, or VTE.) The authors also demonstrate the application of their software using several different animal pose formats (including from 4 to 9 tracked body parts) from multiple camera types and framerates.

      1) One point that confused me when reading the paper was whether BehaviorDEPOT was using a single, fixed freezing classifier, or whether the freezing classifier was being tuned to each new setting (the latter is the case.) The abstract, introduction, and "Development of the BehaviorDEPOT Freezing Classifier" sections all make the freezing classifier sound like a fixed object that can be run "out-of-the-box" on any dataset. However, the subsequent "Analysis Module" section says it implements "hard-coded classifiers with adjustable parameters", which makes it clear that the freezing classifier is not a fixed object, but rather it has a set of parameters that can (must?) be tuned by the user to achieve desired performance. It is important to note that the freezing classifier performances reported in the paper should therefore be read with the understanding that these values are specific to the particular parameter configuration found (rather than reflecting performance a user could get out of the box.)

      Our classifier does work quite well “out of the box”. We developed our freezing classifier based on a small number of videos recorded with a FLIR Chameleon3 camera at 50 fps (Fig. 2F). We then demonstrated its high accuracy in three separately acquired data sets (webcam, FLIR+optogenetics, and Minicam+Miniscope, Fig. 2–4, Table 4). The same classifier also had excellent performance in mice and rats from external labs. With minor tweaks to the threshold values, we were able to classify freezing with F1>0.9 (Fig. 9). This means that the predictive value of the metrics we chose (head angular velocity and back velocity) generalizes across experimental setups.

      Popular freezing detection software including FreezeFrame, VideoFreeze as well as the newly created ezTrack also allow users to adjust freezing classifier thresholds. Allowing users to adjust thresholds ensures that the BehaviorDEPOT freezing classifier can be applied to videos that have already been recorded with different resolutions, lighting conditions, rodent species, etc. Indeed, the ability to easily adjust classifier thresholds for out-of-sample data represents one of the main advantages of hand-fitting classifiers. Yet BehaviorDEPOT offers additional advantages above FreezeFrame, VideoFreeze, and ezTrack. For one, it adds a level of rigor to the optimization step by quantifying classifier performance over a range of threshold values, helping users select the best ones. Also, it is free, it can quantify behavior with reference to user-defined spatiotemporal filters, and it can classify and analyze behaviors beyond freezing. We updated the results and discussions sections to make these points clear.
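
      As an illustration of the threshold optimization step described above, the following is a minimal Python sketch, not BehaviorDEPOT's own (MATLAB) code. It assumes a single per-frame velocity metric and framewise reference annotations, treats a frame as freezing when the velocity falls below a candidate threshold, and reports the candidate with the highest framewise F1. The function names `f1_score` and `sweep_thresholds` and all values are hypothetical.

```python
def f1_score(predicted, reference):
    """Framewise F1 between two equal-length sequences of booleans."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # true positives
    fp = sum(1 for p, r in pairs if p and not r)    # false positives
    fn = sum(1 for p, r in pairs if r and not p)    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep_thresholds(velocity, reference, thresholds):
    """Return (best_threshold, best_f1) over the candidate thresholds.

    Assumption for this sketch: a frame is labeled 'freezing' when its
    velocity is strictly below the threshold.
    """
    best_threshold, best_f1 = None, -1.0
    for t in thresholds:
        predicted = [v < t for v in velocity]
        score = f1_score(predicted, reference)
        if score > best_f1:
            best_threshold, best_f1 = t, score
    return best_threshold, best_f1


# Toy data: low velocities coincide with annotated freezing frames.
velocity = [0.1, 0.2, 0.9, 1.5, 0.15, 2.0]
reference = [True, True, False, False, True, False]
best = sweep_thresholds(velocity, reference, [0.05, 0.5, 1.0, 3.0])
```

      In practice the sweep would cover two or more metrics jointly and would be scored on annotated frames from videos set aside for classifier development.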

      2) This points to a central component of BehaviorDEPOT's design that makes its classifiers different from those produced by previously published behavior detection software such as JAABA or SimBA. So far as I can tell, BehaviorDEPOT includes no automated classifier fitting, instead relying on the users to come up with which features to use and which thresholds to assign to those features. Given that the classifier optimization module still requires manual annotations (to calculate classifier performance, Fig 7A), I'm unsure whether hand selection of features offers any kind of advantage over a standard supervised classifier training approach. That doesn't mean an advantage doesn't exist- maybe the hand-fit classifiers require less annotation data than a supervised classifier, or maybe humans are better at picking "appropriate" features based on their understanding of the behavior they want to study.

      See response to reviewer 1, point 3 above for an extensive discussion of the rationale for our classification method. See response to reviewer 2 point 3 below for an extensive discussion of the capabilities of the data exploration module, including new features we have added in response to Reviewer 2’s comments.

      3) There is something to be said for helping users hand-create behavior classifiers: it's easier to interpret the output of those classifiers, and they could prove easier to fine-tune to fix performance when given out-of-sample data. Still, I think it's a major shortcoming that BehaviorDEPOT only allows users to use up to two parameters to create behavior classifiers, and cannot create thresholds that depend on linear or nonlinear combinations of parameters (eg, Figure 6D indicates that the best classifier would take a weighted sum of head velocity and change in head angle.) Because of these limitations on classifier complexity, I worry that it will be difficult to use BehaviorDEPOT to detect many more complex behaviors.

      To clarify, users can combine as many parameters as they like to create behavior classifiers. However, the reviewer raises a good point and we have now expanded the functions of the Data Exploration Module. Now, users can choose ‘focused mode’ or ‘broad mode’ to explore their data. In focused mode, researchers use their intuition about behaviors to select the metrics to examine. The user chooses two metrics at a time and the Data Exploration Module compares values between frames where behavior is present or absent and provides summary data and visual representations in the form of boxplots and histograms. A generalized linear model (GLM) also estimates the likelihood that the behavior is present in a frame across a range of threshold values for both selected metrics (Fig. 8A), allowing users to optimize parameters in combination. This process can be repeated for as many metrics as desired.

      In broad mode, the module uses all available keypoint metrics to generate a GLM that can predict behavior. It also rank-orders metrics based on their predictive weights. Poorly predictive metrics are removed from the model if their weight is sufficiently small. Users also have the option to manually remove individual metrics from the model. Once suitable metrics and thresholds have been identified using either mode, users can plug any number and combination of metrics into a classifier template script that we provide and incorporate their new classifier into the Analysis Module. Detailed instructions for integrating new classifiers are available in our GitHub repository (https://github.com/DeNardoLab/BehaviorDEPOT/wiki/Customizing-BehaviorDEPOT).

      MoSeq, JAABA, MARS, SimBA, B-SOiD, DANNCE, and DeepEthogram are among a group of excellent open-source software packages that already do a great job detecting complex behaviors. They use supervised or unsupervised machine learning to detect behaviors that are difficult to see by eye including social interactions and fine-scale grooming behaviors. Instead of trying to improve upon these packages, BehaviorDEPOT is targeting unmet needs of a large group of researchers that study human-defined behaviors and need a fast and easy way to automate their analysis. As examples, we created a classifier to detect vicarious trial and error (VTE), defined by sweeps of the head (Fig. 9). Our revised manuscript also describes our new novel object exploration classifier (Fig. 5). Both behaviors are defined based on animal location and the presence of fine movements that may not be accurately detected by algorithms like MoTr and Ctrax (Fig. 10). As discussed in response to reviewer 1, point 3, additional rounds of machine learning are laborious (humans must label frames as input), computationally intensive, harder to adjust for out-of-sample videos, and are not necessary to quantify these kinds of behaviors.

      4) Finally, I have some concerns about how performance of classifiers is reported. For example, the authors describe a "validation" set of videos used to assess freezing classifier performance, but they are very vague about how the detector was trained in the first place, stating "we empirically determined that thresholding the velocity of a weighted average of 3-6 body parts ... and the angle of head movements produced the best-performing freezing classifier." What videos were used to come to this conclusion? It is imperative that when performance values are reported in the paper, they are calculated on a separate set of validation videos, ideally from different animals, that were never referenced while setting the parameters of the classifier. Otherwise, there is a substantial risk of overfitting, leading to overestimation of classifier performance. Similarly, Figure 7 shows the manual fitting of classifiers to rat and mouse data; the fitting process in 7A is shown to include updating parameters and recalculating performance iteratively. This approach is fine, however I want to confirm that the classifier performances in panels 7F-G were computed on videos not used during fitting.

      Thank you for pointing this out. We have included detailed descriptions of the classifier development and validation in the results (149–204) and methods (789–820) sections and added a table that describes videos used to validate each classifier (Table 4).

      To develop the freezing classifier, we explored linear and angular velocity metrics for various keypoints, finding that angular velocity of the head and linear velocity of a back point tracked best with freezing. Common errors in our classifiers were identified as short sequences of frames at the beginning or end of a behavior bout. This may reflect failures in human detection. Other common errors were sequences of false positive or false negative frames that were shorter than a typical behavior bout. We included the convolution algorithm to correct these short error sequences.

      When developing classifiers (including adjusting the parameters for the external videos), videos were randomly assigned to classifier development (i.e. ‘training’) and test sets. Dividing up the dataset by video rather than by frame ensures that highly correlated temporally adjacent frames are not sorted into training and test sets, which can cause overestimation of classifier accuracy. Since the videos in the test set were separate from those used to develop the algorithms, our validation data reflects the accuracy levels users can expect from BehaviorDEPOT.
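
      The video-level split described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure's actual implementation: the function name `split_by_video`, the test fraction, and the seed are hypothetical. The point it shows is that whole videos, rather than individual frames, are assigned to the development or test set, so adjacent frames from one video never end up on both sides.

```python
import random


def split_by_video(video_ids, test_fraction=0.3, seed=0):
    """Assign whole videos to development ('training') or test sets.

    video_ids: iterable with one video identifier per annotated frame.
    Returns (train_videos, test_videos) as two disjoint sets.
    """
    videos = sorted(set(video_ids))      # unique videos, in a stable order
    rng = random.Random(seed)            # seeded for a reproducible split
    rng.shuffle(videos)
    n_test = max(1, round(len(videos) * test_fraction))
    return set(videos[n_test:]), set(videos[:n_test])


# One entry per frame; every frame of a video follows its video's assignment.
frame_video_ids = ["a", "a", "b", "c", "c", "d"]
train_videos, test_videos = split_by_video(frame_video_ids, test_fraction=0.25, seed=1)
train_frames = [i for i, v in enumerate(frame_video_ids) if v in train_videos]
test_frames = [i for i, v in enumerate(frame_video_ids) if v in test_videos]
```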

      5) Overall, I like the user-friendly interface of this software, its interaction with experimental hardware, and its support for hand-crafted behavior classification. However, I feel that more work could be done to support incorporation of additional features and feature combinations as classifier input- it would be great if BehaviorDEPOT could at least partially automate the classifier fitting process, eg by automatically fitting thresholds to user-selected features, or by suggesting features that are most correlated with a user's provided annotations. Finally, the validation of classifier performance should be addressed.

      Thank you for the positive feedback on the interface. We addressed these comments in response to points 3 and 4. To recap, we updated the Data Exploration Module to include Generalized Linear Models that can suggest features with the highest predictive value. We also generated template scripts that simplify the process of creating new classifiers and incorporating them into the Analysis Module. We also included all the details of the videos we used to validate classifier performance, which were separate from the videos that we used to determine the parameters (Table 4).

      Reviewer #3 (Public Review): There is a need for standardized pipelines that allow for repeatable robust analysis of behavioral data, and this toolkit provides several helpful modules that researchers will find useful. There are, however, several weaknesses in the current presentation of this work.

      1) It is unclear what the major advance is that sets BehaviorDEPOT apart from other tools mentioned (ezTrack, JAABA, SimBA, MARS, DeepEthogram, etc). A comparison against other commonly used classifiers would speak to the motivation for BehaviorDEPOT - especially if this software is simpler to use and equally efficient at classification.

      We also address this in response to reviewer 1, points 1–3. To summarize, we added direct comparisons with JAABA to a revised manuscript. In Fig. 10, we show that BehaviorDEPOT outperforms JAABA in several ways. First, DLC is better at tracking rodents in complex environments than MoTr and Ctrax, which are the most used JAABA companion software packages for centroid tracking. Second, we show that even when we use DLC to approximate centroids and use this data to train classifiers with JAABA, the BehaviorDEPOT classifiers perform better than JAABA’s.

      In a revised manuscript, we included more discussion of what sets BehaviorDEPOT apart from other software, focusing on these main points:

      BehaviorDEPOT vs. commercially available packages (Ethovision, ANYmaze, FreezeFrame, VideoFreeze)

      1) Ethovision, ANYmaze, FreezeFrame, VideoFreeze cost thousands of dollars per license while BehaviorDEPOT is free.

      2) The BehaviorDEPOT freezing classifier performs robustly even when animals are wearing a tethered patch cord, while VideoFreeze and FreezeFrame often fail under these conditions.

      3) Keypoint tracking is more accurate, flexible, and can resolve more detail compared to those that use background subtraction or pixel change detection algorithms combined with center of mass or fitted ellipses.

      BehaviorDEPOT vs. packages targeted at non-coding audiences (JAABA, ezTrack)

      1) DLC keypoint tracking performs better than MoTr and Ctrax in complex environments. As a result, JAABA has not been widely used in the rodent community. Built around keypoint tracking, BehaviorDEPOT will enable researchers to analyze videos in any type of arena, including videos they have already collected. Keypoint tracking also allows for detection of finer movements, which is essential for behaviors like VTE and object exploration.

      2) Hand-fit classifiers can be created quickly and intuitively for well-defined laboratory behaviors. Compared to machine learning-derived classifiers, they are easier to interpret and easier to fine-tune to optimize performance when given out-of-sample data.

      3) Even when using DLC as the input to JAABA, BehaviorDEPOT classifiers perform better (Figure 10).

      4) BehaviorDEPOT integrates behavioral classification, spatial tracking, and quantitative analysis of behavior and position with reference to spatial ROIs and temporal cues of interest. It is flexible and can accommodate varied experimental designs. In ezTrack, spatial tracking is decoupled from behavioral classification. In JAABA, spatial ROIs can be incorporated into machine learning algorithms, but users cannot quantify behavior with reference to spatial ROIs after classification has occurred. Neither JAABA nor ezTrack provide a way to quantify behavior with reference to temporal events (e.g. optogenetic stimuli, conditioned cues).

      5) BehaviorDEPOT includes analysis and visualization tools, providing many features of the costly commercial software packages for free.

      BehaviorDEPOT vs. packages based on keypoint tracking (SimBA, MARS, B-SOiD)

      Other software packages based on keypoint tracking use supervised or unsupervised methods to classify behavior from animal poses. These software packages target researchers studying complex behaviors that are difficult to see by eye including social interactions and fine-scale grooming behaviors, whereas BehaviorDEPOT targets a large group of researchers that study human-defined behaviors and need a fast and easy way to automate their analysis. Many behaviors of interest will require spatial tracking in combination with detection of specific movements (e.g. VTE, NOE). Additional rounds of machine learning are laborious (humans must label frames as input), computationally intensive, and are not necessary to quantify these kinds of behaviors.

      2) While the idea might be that joint-level tracking should simplify the classification process, the number of markers used in some of the examples is limited to small regions on the body and might not justify using these markers as input data. The functionality of the tool seems to rely on a single type of input data (a small number of keypoints labeled using DeepLabCut) and throws away a large amount of information in the keypoint labeling step. If the main goal is to build a robust freezing detector then why not incorporate image data (particularly when the best set of key points does not include any limb markers)?

      While one main goal was to build a robust freezing detector, BehaviorDEPOT is a general-purpose software. BehaviorDEPOT can classify behaviors from video timeseries and can analyze the results of experiments similar to Ethovision or FreezeFrame. BehaviorDEPOT is particularly useful for assays in which behavioral classification is integrated with spatial location, including avoidance, decision making (T maze), and novel object memory/recognition. While image data is useful for classifying behavior, it cannot combine spatial tracking with behavioral classification. However, DLC keypoint tracking is well-suited for this purpose. We find that tracking 4–8 points is sufficient to hand-fit high performing classifiers for freezing, avoidance, reward choice in a T-maze, VTE, and novel object recognition. Of course, users always have the option to track more points because BehaviorDEPOT simply imports the X-Y coordinates and likelihood scores of any keypoints of interest.

      3) Need a better justification of this classification method

      See response to reviewer 1, points 1–3 above.

      4) Are the thresholds chosen for smoothing and convolution adjusted based on agreement to a user-defined behavior?

      Yes. We added more details in the text. Briefly, users can change the thresholds used in both smoothing and convolution in the GUI and can optimize the values using the Classifier Optimization Module. Smoothing is performed once at the beginning of a session and has an adjustable span for the smoothing window. The convolution is a feature of each classifier, and thus can be adjusted when adjusting the classifier. When developing the freezing classifier, we started with a smoothing window that had the largest value that did not exceed the rate of motion of the animal and then fine-tuned the value to optimize smoothing. In the classifiers we have developed, window widths that are the length of the smallest bout of ‘real’ behavior and count thresholds approximately 1/3 the window width yielded the best results.
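
      To make the roles of the window width and count threshold concrete, here is a minimal Python sketch of a sliding-window correction applied to framewise classifier output. It is an assumption-laden illustration, not BehaviorDEPOT's MATLAB implementation: the function name `smooth_labels`, the centered window, and the parameter values are all hypothetical. Each frame is kept as positive only if enough frames within its surrounding window were positive, which removes isolated false positives and fills short gaps inside a bout.

```python
def smooth_labels(raw, window_width, count_threshold):
    """Sliding-window correction of framewise binary labels.

    raw: sequence of 0/1 (or bool) labels, one per frame.
    A frame is positive in the output if at least `count_threshold`
    frames inside the window centered on it are positive in `raw`.
    Windows are truncated at the start and end of the recording.
    """
    n = len(raw)
    half = window_width // 2
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window_width)
        count = sum(1 for x in raw[lo:hi] if x)
        smoothed.append(count >= count_threshold)
    return smoothed


# Frame 2 is an isolated positive; frame 8 is a one-frame gap inside a bout.
raw = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
smoothed = smooth_labels(raw, window_width=3, count_threshold=2)
```

      The window width and count threshold used here are illustrative only; suitable values depend on the frame rate and on the shortest bout of real behavior.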

      5) Jitter is mentioned as a limiting factor in freezing classifier performance - does this affect human scoring as well?

      We were referring to jitter in terms of point location estimates by DeepLabCut. In other words, networks that are tailored to the specific recording conditions have lower error rates in the estimates of keypoint positions. Human scoring is an independent process that is not affected by this jitter. We changed the wording in the text to avoid any confusion.

      6) The use of a weighted average of body part velocities again throws away information - if one had a very high-quality video setup with more markers would optimal classification be done differently? What if the input instead consisted of 3D data, whether from multi-camera triangulation or other 3D pose estimation? Multianimal data?

      From reviewer 2, point 3: MARS, SimBA, and B-SOiD are excellent open-source software packages that are also based on keypoint tracking. They use supervised or unsupervised methods to classify complex behaviors that are difficult to see by eye including social interactions and fine-scale grooming behaviors. Instead of trying to improve upon these packages, which are already great, BehaviorDEPOT is targeting unmet needs of a large group of researchers that study human-defined behaviors and need a fast and easy way to automate their analysis. Additional rounds of machine learning are laborious (humans must label frames as input), computationally intensive, and are not necessary to quantify these kinds of behaviors. However, keypoint tracking offers accuracy, precision and flexibility that is superior to behavioral classification programs that estimate movement based on background subtraction, center of mass, ellipse fitting, etc.

      7) It is unclear where the manual annotation of behavior is used in the tool as currently stands. Is the validation module used to simply say that the freezing detector is as good as a human annotator? One might expect that algorithms which use optic flow or pixel-based metrics might be superior to a human annotator, is it possible to benchmark against one of these? For behaviors other than freezing, a tool to compare human labels seems useful. The procedure described for converging on a behavioral definition is interesting and an example of this in a behavior other than freezing, especially where users may disagree, would be informative. It appears that manual annotation doesn't actually happen in the GUI and a user must create this themselves - this seems unnecessarily complicated.

      Manual annotation of behavior is used in the four classifier development modules: inter-rater, data exploration, optimization, and validation. The inter-rater module can be used as a tool to refine ground-truth behavioral definitions. It imports annotations from any number of raters and generates graphical and text-based statistical reports about overlap, disagreement, etc. Users can use this tool to iteratively refine annotations until they converge maximally. The inter-rater module can be used to compare human labels (or any reference set of annotations) for any behavior. To ensure this is clear to the readers, we added more details to the text and a second demonstration of the inter-rater module for novel object exploration annotations (Fig. 7). The validation module imports reference annotations, which can be produced by a human or another program, and benchmarks classifier performance against the reference. We added more details to this section as well.

      Freezing is a straightforward behavior that is easy to detect by eye. Rather than benchmark against an optic flow algorithm, we benchmarked against JAABA, another user-friendly behavioral classification software that uses machine learning algorithms. We find that BehaviorDEPOT is easier to use and labels freezing more accurately than JAABA. We also made a second freezing classifier that uses a changepoint algorithm to identify transitions from movement to freezing that may accommodate a wider range of video framerates and resolutions.

      We plan to incorporate an annotation feature into the GUI, but in the interest of disseminating our work soon, we argue that this is not necessary for inclusion now. There are many free or cheap programs that allow framewise annotation of behavior including FIJI, Quicktime, VLC, and MATLAB. In fact, users may already have manual annotations or annotations produced by a different software and BehaviorDEPOT can import these directly. While machine learning classifiers like JAABA require human annotations to be entered into their GUI, allowing people to import annotations they collected previously saves time and effort.

      8) A major benefit of BehaviorDEPOT seems to be the ability to run experiments, but the ease of programming specific experiments is not readily apparent. The examples provided use different recording methods and networks for each experimental context as well as different presentations of data - it is not clear which analyses are done automatically in BehaviorDEPOT and which require customizing code or depend on the MiniCAM platform and hardware. For example - how does synchronization with neural or stimulus data occur? Overall it is difficult to judge how these examples would be implemented without some visual documentation.

      We added visual documentation of the experimental module graphical interface to Figure 1 and added more detail to the results, methods and to our GitHub repository (https://github.com/DeNardoLab/Fear-Conditioning-Experiment-Designer). Synchronization with stimulus data can occur within the Experiment Module (designed for fear conditioning experiments) or stimuli timestamps can be easily imported into the Analysis Module. Synchronization with neural data occurs post hoc using the data structures produced by the BehaviorDEPOT Analysis Module. We include our code for aligning behavior to Miniscope on our GitHub repository (https://github.com/DeNardoLab/caAnalyze).

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In their manuscript, Hattori et al. put forward evidence that the knock-out of CD38 expression in astrocytes at approximately post-natal day 10 (referred to as CD38 AS-cKO P10) leads to a specific deficit in social memory in adult mice, while other types of memory remain unaltered. Using immunohistochemistry (IHC), the authors found a reduced number of excitatory synapses in the medial prefrontal cortex (mPFC) of CD38 AS-cKO P10 mice. Switching to in vitro primary cell culture models, the authors identify the astrocyte secreted protein SPARCL1 as a relevant synaptogenic factor. Using pharmacological dissection of relevant signaling pathways, Hattori et al. propose that cADPR formation and calcium release from intracellular stores are essential for SPARCL1 secretion from astrocytes. Finally, the authors analyzed the transcriptome of primary CD38 KO astrocytes using bulk mRNA sequencing, and found that genes related to calcium signaling were downregulated in these cells.

      Major comments:

      • Are the key conclusions convincing?
        1. From a global perspective, the multiple lines of evidence provided by the authors strongly suggest that expression of CD38 in astrocytes is important for synaptogenesis in the mPFC of P10 mice, with ablation of CD38 and reduced synapse formation leading to social memory deficits at P70. However, the data concerning the role of astrocyte-secreted SPARCL1 is not particularly strong: further experiments are needed to support this claim (see below).
      • Are the claims preliminary or speculative?
        1. As it stands, there is no proof that the claimed astrocyte-specific deletion of CD38 is actually astrocyte specific. This evidence is crucial: without it the reported effects could be due to non-specific CD38 knock-out in other CNS cells. In this respect, the Western Blot in Supplementary Figure 1A does not provide information on astrocyte-specific deletion, merely that CD38 was globally reduced in the mPFC. Interestingly, the authors have previously published data (Hattori et al., 2017, 10.1002/glia.23139) showing that CD38 expression is mostly astrocyte-specific, peaking at P14, which coincides with the peak period of synaptogenesis. The degree of CD38 heterogeneity is also an issue that I think the authors need to consider. Do they have information on this? Is CD38 expressed in every astrocyte of the CNS, or are there some astrocytes that are CD38 negative at P14? Is the mPFC a region specifically enriched in CD38 positive astrocytes and does this explain the observed behavioral deficit? I think if this is known, the authors should mention it in the "Introduction" or "Discussion". If this is not known, maybe the authors could provide data addressing the issue.
        2. I think the authors should take more caution in claiming that SPARCL1 is the main factor secreted through the CD38 signaling pathway and responsible for increased synaptogenesis. This is for several reasons, all centered on data displayed in Figure 4 and Supplementary Figure 6:
          • a) Western Blot (WB) data: The "Materials and Methods" section for WB does not indicate how protein loading and transfer efficiency were controlled for. Normalizing to β-Actin levels is an acceptable way to control for loading and transfer efficiency when using cell lysates. However, in the absence of such an abundant structural protein in conditioned media it is unclear how loading and transfer were controlled for under these conditions. Did the authors normalize the CD38 KO AS ACM data by expressing protein levels relative to those from WT AS ACM? Is BDNF being used as a control, based on proteomics data? If so, why is proteomics data not given in the manuscript and why is this control not shown for all ACM blots? I realize that (quantitative) blotting using ACM is difficult, but I am also not convinced that the methodology used is sufficiently rigorous. Simple steps to give confidence would be Coomassie staining of gels both before and after membrane transfer, to show that i) the total protein amount loaded was the same in each lane of the gel and ii) the transfer to the nitrocellulose membrane was complete. In addition, Ponceau S staining of the nitrocellulose membrane should also have been performed and displayed, to show (roughly) equal amounts of protein were transferred for each lane. In summary, the WB data quantification needs to be better controlled. The values of the Y axis in these graphs (and throughout the manuscript) are simply too small to be read properly. Finally, I want to highlight the general lack of precision regarding the nature of the replication unit (the "n"). For example, the legend of Figure 4C-D states "n = 6", but we have no idea if these are 6 independent primary cultures originating from 6 mice, 6 independent cultures from the same mouse, 6 repeats of the Western Blot using the same sample etc. This issue is valid for the whole manuscript: in my opinion, the authors should be much more careful when it comes to these crucial elements of scientific reporting.
          • b) While the data hint at an important role of SPARCL1 in synapse formation, when the authors tested if ACM from CD38 KO astrocytes supplemented with exogenous SPARCL1 could rescue synapse formation, the effect was incomplete, with only a trend to an increase in synapse number (Figure 4J-K). Perhaps the authors simply forgot to indicate the statistical significance of differences between the experimental groups (Figure 4K)? However, if there really were no statistically significant differences observed, the authors should reduce the strength of their conclusions regarding SPARCL1. This protein may well be pro-synaptogenic but, as it stands, other factors could well be in play. Perhaps the authors should have tried higher concentrations of SPARCL1 to further boost synaptogenesis? In this respect, the SPARCL1 knockdown (KD) experiment in Supplementary Figure 6B-D is an important addition, but should be supplemented by rescue with an siRNA-resistant recombinant SPARCL1? If SPARCL1 is a major player in synaptogenesis, the prediction is that synapse numbers would be close to wild type levels with this approach.
          • c) In my opinion, there are also issues with the data displayed in Figure 4H-I. The authors want to convince the reader that SPARCL1 is mostly an astrocytic protein using immunohistochemistry on mouse mPFC sections, co-labelled with antibodies against neuronal and astrocytic markers. In these panels, we are presented with images showing a few cells, in which it seems SPARCL1 is absent from NeuN positive cells, present in WT astrocytes and reduced in CD38 AS-cKO P10 astrocytes. However, the small number of cells counted and the lack of quantification severely weaken this conclusion. In my opinion, the authors should have quantified their IHC data by counting cells and establishing the ratios of SPARCL1 positive over NeuN or S100β positive cells, in both control and CD38 AS-cKO P10 animals. This experiment would provide critical information that the conditional gene targeting strategy is robust. The authors should also consider quantifying the intensity of the SPARCL1 signal in astrocytes. This is recommended as the image displayed in Figure 4I for the CD38 AS-cKO is problematic: are the authors really claiming that the reduction in SPARCL1 expression following cKO of CD38 in astrocytes is at best only partial? Is 11 days between the first tamoxifen injection and tissue fixation actually sufficient to allow for CD38 turnover? With low levels of protein turnover, the possibility exists that residual levels of CD38 are still sufficient to impact SPARCL1 levels. What would happen if there is a greater interval between tamoxifen administration and tissue recovery? Would levels of synaptogenesis be further reduced? Is this an issue of production versus secretion or a combination of factors?
        3. The heatmap (Figure 5E-F) is simply too small to interpret. The color choice is also not accessible for colorblind readers. The authors might consider displaying this heatmap in a separate figure. The authors should also provide a supplementary table where all the genes detected are listed along with their respective counts. Furthermore, it is surprising that the authors only found genes being downregulated in CD 38 KO astrocytes. Were they really no genes up-regulated? The authors might also want to indicate the genes belong to each of the ontological categories listed in Figure 5F. On p. 11, Figure 5E: The authors should indicate in the main text they performed bulk RNA-sequencing and not another type of RNA sequencing (like single cell RNA sequencing for instance). The authors indicate n = 2 but we have no indications of the nature of the replicate (also see earlier comments). Please amend.
      • Are additional experiments necessary? I think supplementary experiments are essential to support the claims of the paper. Most are described in the section above, but to summarize:
        1. Show data to prove that the CD38 AS-cKO P10 model is astrocyte-specific and leads to a total loss of CD38 in these cells.
        2. WB data: The issue of protein loading and transfer efficiency should be dealt with. Quantifications should be revisited.
        3. The authors should quantitatively analyze the different IHC performed in Figure 4H-I.
        4. The authors should provide more information on their RNA sequencing data: list of genes detected with their FPKM values etc. The authors should display the RNA sequencing data in a separate figure, allowing the heatmap to be enlarged.
        5. LC-MS/MS data: the authors should provide the list of all the proteins they identified in their LC-MS/MS experiment, for instance as a supplementary table. The majority of these experiments should be able to be performed with pre-existing samples/tissue slices. If not, the necessary experimental pipeline exists and these supporting experiments should not be too burdensome.
      • Data and methods presentation Methods: The authors need to work on this aspect of the manuscript. Most of the important details are already described, but some crucial ones are missing, while the phrasing used to describe methods is sometimes misleading. I will give some examples here, but this is not an exhaustive list. The fact that the manuscript is riddled with small mistakes, inconsistencies and/or oversights makes it difficult to read and creates a negative impression. The whole manuscript would benefit from a thorough proof-reading, preferably by a native speaker.
        1. in the "Immunohistochemistry and Synaptic Puncta Analysis" section on p. 21-22, we have no indication of which antibodies against "GFAP, NDRG2, VGlut1, PSD95, S100β, NenN(?) and SPARCL1" were used. It is standard practice to indicate the company, product number and lot number. The authors must also indicate the dilution at which they use these antibodies. On p.22, the authors write the cells were incubated with "Alexa- or Cy3-conjugated secondary antibodies". The excitation wavelengths of the Alexa dyes used need to be given.
        2. The authors need to provide more details on the microscope they used. Merely writing "using a 63× lens on a fluorescence microscope" (p.23) is insufficient.
        3. In the "LC-MS/MS" method the authors wrote: "Briefly, these proteins were reduced, alkylated, and digested by trypsin". I think that in the reduction and alkylation steps, chemicals other than trypsin were actually used. This sentence should be modified to reflect this.
        4. p.19: "uM" is written when the authors very likely mean "µM". Please check the whole manuscript for repeat examples. I know this is often lab "short-hand", but it should be avoided in scientific publications.
        5. The authors should be careful when describing their data to always indicate whether they are referring to experiments performed using cultured astrocytes or not. As it stands, the text is confusing: for instance, when describing RNA-sequencing data in Figure 5, the main text appears to indicate that these astrocytes were acutely isolated from adult mice, when in fact they were obtained from primary cultures. Given concerns in the literature about potential differences between acutely isolated and cultured astrocytes (Foo et al., Neuron, 2011), this is essential. Data presentation: The figures appear to have been produced in a rush - and almost have a "screenshot" feel to them. This is not a scientific issue per se, but does impact on the overall impression given by the manuscript. The following is a non-exhaustive list of issues with the figures. I list the major ones that the authors should correct.
        6. Almost all Y axis labels are too small. The authors should comply with the basic journal requirements in terms of font sizes. Some axes do not end on a tick (e.g. Figure 3R). This is not dramatic, but should be corrected. Globally, the authors need to display bigger bar plots - most of them are extremely hard to read. Labeling should also be checked: in Figure 4K, the Y axis label indicates that the values displayed are in %, when I think the axis graduation displays ratio values. Some of the IHC pictures are also too small to be easily interpreted.
        7. The heatmap in Figure 5E is impossible to read and, as such, has little or no value for the manuscript.
        8. Scale bars: where is the scale bar in Figure 2A? Figure 3A-H: Is the scale bar really representing 10 millimeters? Supplementary Figure 3A: scale bar is missing. Please check for similar issues throughout the manuscript.
        9. Figure Legends are problematic, and often contain incorrect or incomplete information. Examples include: Supplementary Figure 1: The description of panels J, L and N appears to be missing. Please also use the Greek letter beta and not 'b' for S100β. Supplementary Figure 5: I think the term "KO" is missing after CD 38 in the legend title. Figure 3: why state that nuclei were counterstained with DAPI in Figure 3P,Q, when this precision is not given for panels Figure 3A-H? Figure 3A-H: If the authors choose to explicitly state PSD95 is a post-synaptic marker, why not indicate that VGlut1 is a pre-synaptic marker? Same issue in Supplementary Figure 4.
        10. There are multiple instances of panels being wrongly referred to in the main text. On p.10, Figure 4H is referenced, when I think the authors mean Figure 4I; on p.10, Figure 4I-J are referred to when the authors clearly describe data found in Figure 4J-K. These types of mistakes are problematic and recur throughout the manuscript.
      • Statistical analysis As mentioned above, the exact nature of the replicates is often not stated, when the "n" number is indicated. The authors must correct this issue and give the information either at the appropriate point in the main text or in the figure legend.

      The authors should also be more consistent in the way they indicate which statistical tests were performed. This should also be indicated either at the appropriate point in the main text or in the figure legend. Furthermore, care should be taken to ensure statistics are presented in an appropriate manner: at the end of legend for Figure 4, it is indicated #p < 0.05 vs. CD38 KO ACM. This hashtag symbol is completely absent from the figure. In Figure 4F-G, the lack of statistical symbols seems to indicate no statistical tests were performed on these data, when the legend covering these panels states "*p < 0.05 versus P70", indicating some tests were done. We cannot interpret this panel without knowing which comparisons were done exactly and which were significant.

      In the "Materials and Methods", the authors give no indication that the assumptions of the statistical test they used were met (normality of data distribution for t-tests, homogeneity of variances for ANOVA...). This needs to be checked, and if not met, appropriate non-parametric tests should be used instead.
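As one illustration of the kind of assumption-light analysis requested above, the sketch below runs an exact two-sided permutation test on the difference of group means, which requires neither normally distributed data nor equal variances. It is a minimal example only: the function name `permutation_test` and the two small data sets are hypothetical and are not taken from the manuscript under review.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the fraction of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding does not hide ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical synapse counts per field of view (made-up values).
wild_type = [12.0, 14.0, 13.0, 15.0]
knockout = [8.0, 9.0, 7.0, 10.0]
p_value = permutation_test(wild_type, knockout)
print(f"exact permutation p-value: {p_value:.4f}")
```

Exhaustive enumeration is only practical for small groups; for larger samples a random subset of permutations, or a rank-based test, would be the usual substitute.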

      Minor comments:

      • Specific experimental issues that are easily addressable. Most of the experimental issues that need to be addressed are given in previous sections and should be easily addressable.
      • Citation of previous studies? Adequate
      • Clarity and accuracy of text and figures There are issues with the clarity and accuracy of text and figures - which are described above. The text is also often problematic in its phrasing and other, more fundamental aspects. For instance, the authors spent a considerable amount of time speaking about the role of oxytocin, when they only performed one measurement of oxytocin levels in mice.
      • Suggestions to improve the presentation of data and conclusions? All my suggestions to improve the presentation of data can be found in previous sections. As for improving the authors' presentation of their conclusions, the authors should make a considerable re-drafting effort, particularly for the "Discussion", which lacks clarity in how supporting arguments are built and presented. For example, on p.13, I am confused by the argument made by the authors. Their data are focused on synapses onto pyramidal neurons of the mPFC, but here the discussion states that the behavioral phenotype they observed in CD38 AS-cKO P10 might be explained by a lack of mPFC neurons synapsing onto neurons in the Nucleus Accumbens (assuming that "NAc" really refers to this brain region, as the definition is missing from the text). I think the authors should make it clear if this is their interpretation of their own result, which essentially renders their focus on mPFC pointless, or a speculation on possible other mechanisms that could also explain their behavioral results. Personally, given the data shown, I believe the authors should focus on explaining how their data in mPFC might explain the behavioral output observed. The authors could also provide perspectives on how the hypothesis laid down in this paragraph would be tested. When the authors write on p.14 "We identified SPARCL1 as a potential molecule for synapse formation in cortical neurons" why use the word "potential"? Does this mean the authors consider their data on SPARCL1 (one of the key messages of the paper) invalid? If the authors themselves think the role of SPARCL1 is ambiguous based on their own data, they should perform further experiments. On p. 13, the authors write: "Moreover, many studies have shown that astrocyte-specific molecules, including extracellular molecules such as IL-6, are involved in memory function"; Interleukin 6 (IL-6, abbreviation not defined in the manuscript) is definitely not an astrocyte-specific molecule (see, for example, Erta et al., 2021 10.7150/ijbs.4679).

      Significance

      NATURE AND SIGNIFICANCE OF THE ADVANCE: I think that despite the issues described above, this manuscript, once revised, could have a strong impact in the field. It would fuel the current paradigm shift which puts astrocytes at the forefront of neuronal circuit wiring during development with links to adult behavior. By identifying clear molecular targets involved in astrocyte-driven synaptogenesis, this article could help the clinical field to find new druggable targets, which may help reverse aging-related cognitive decline.

      COMPARISON TO EXISTING PUBLISHED KNOWLEDGE: This work adds new data to the specific and growing line of research that studies how astrocytes control synaptogenesis. Recent reviews have summarized advances in this field (Shan et al., 2021, 10.3389/fcell.2021.680301; Baldwin et al., 2021, 10.1016/j.conb.2017.05.006).

      AUDIENCE: Neuroscientists in general, clinicians interested in cellular and molecular causes of neurodevelopmental disorders leading to social dysfunctions.

      REVIEWER EXPERTISE: Astrocyte biology; Astrocyte-neuron interactions and synapse assembly; Neuronal circuit formation and plasticity

      Referees cross-commenting

      After careful reading of the other comments, I feel that there is considerable agreement/overlap between the reviewers on the main issues with this manuscript. Perhaps the major difference relates to the amount of further work necessary for the manuscript to be publication ready.

      As Reviewer 3 rightly points out, this is always a moot point: how much is it reasonable for reviewers to ask authors to do? While I agree with all of Reviewer 1's comments regarding the rigour of the mass-spec/western blot analysis, it seems to me that from a molecular/cell biological point of view, the key issue is whether Sparcl1 is a synaptogenic factor released from astrocytes following CD38/cADPR/calcium signaling (irrespective of whether other factors may be in play); and whether raising Sparcl1 levels is sufficient to recover spine morphology and synapse numbers. Of course, if these experiments were performed in vivo using AAV-mediated overexpression of Sparcl1, it is also reasonable to think that the deficit in social memory may be reversed on testing.

      The issue of whether there is a difference in observable behavioral phenotypes between the astrocyte-specific and constitutive CD38 knock-outs is an interesting one, as is why there is only a deficit in social memory seen following astrocyte-specific CD38 ablation. These issues should at least be discussed.

    1. Author Response:

      Evaluation Summary:

      This study adds to the considerable, but often conflicting, work on how neurotransmitter systems contribute to auditory processing dysfunction. The paper details a thorough and careful analysis of an important hypothesis from the point of view of schizophrenia research: do muscarinic and dopaminergic receptors contribute to mismatch negativity effects? The answers could be useful for future treatment allocation in psychosis. The analysis was pre-registered and departures from the planned analysis were well-motivated and clearly described.

      Thank you for this positive statement. We would like to make sure that the nature of our pre-registration is fully understood: we did not formally pre-register our study (i.e., there was no independent peer review). Instead, we defined an analysis plan ex ante (i.e., before beginning the data analysis for examining drug effects), and time-stamped and uploaded this plan on our institutional Git repository, prior to the unblinding of the analysing researcher. This a priori analysis plan is publicly available as well as our analysis code, and we report any departures from the analysis plan in our manuscript.

      Reviewer #1 (Public Review):

      The reduced amplitude of the mismatch negativity (MMN) in schizophrenic patients has been associated with NMDA receptor malfunction. Weber and colleagues adjusted the systemic levels of two neurotransmitters (acetylcholine and dopamine) that are known to modulate NMDA receptor function, and examined the effects on mismatch-related ERPs. They examined mismatch-related ERPs elicited during a novel passive auditory oddball paradigm where the probability of hearing a particular tone was either constant for at least 100 trials (stable phases) or changed every 25-60 trials (volatile phases). Using impressive statistical testing the authors find that mismatch responses are selectively affected by reduced cholinergic function, particularly during stable phases of the paradigm, but not by reduced dopamine function. Interestingly, neither enhanced cholinergic nor enhanced dopamine function affected mismatch responses at all. While the presented data support the main conclusions mentioned above, there are some claims in the abstract and text that are not supported by the results.

      1) The authors state in the abstract that "biperiden reduced and/or delayed mismatch responses......", while the results (Figure 2) support the statement that biperiden delayed mismatch responses, the claim that biperiden reduced mismatch responses is misleading as on P13 the authors actually report that "mismatch signals were stronger in the biperiden group compared to the placebo group at right central and centro-parietal sensors" around 200ms. This is close both in time and spatially to the traditional temporal and spatial locations of the MMN component. If one were to only read the abstract they would take away the result that the muscarinic acetylcholine receptor antagonist biperiden has an attenuative effect on MMN which is not what the results show.

      Thank you for this comment. We agree that the description in the abstract might be misleading and have changed our wording there. We now say (in the overall shortened abstract):

      “We found a significant drug x mismatch interaction: while the muscarinic acetylcholine receptor antagonist biperiden delayed and topographically shifted mismatch responses, particularly during high stability, this effect could not be detected for amisulpride, a dopamine D2/D3 receptor antagonist.”

      2) The conclusion that biperiden reduced mismatch responses may be due to the finding that at pre-frontal sensors mismatch responses were significantly smaller in the biperiden group than in the amisulpride (a dopaminergic receptor antagonist) group (P9) around 164ms. However, it is difficult to interpret if this is a meaningful result as amisulpride was found not to significantly alter mismatch responses in any way compared to placebo. It would be more convincing if the significant difference here were between biperiden and placebo groups. Or are we to think of amisulpride as being comparable to a placebo?

      We agree with your previous point and have adjusted our wording in the abstract accordingly (see response to previous comment).

      Furthermore, we have included an additional section in the Discussion in which we address the points you raise:

      "One might wonder whether the early difference between the biperiden and the amisulpride group at pre-frontal sensors is difficult to interpret, given the lack of differences of either drug group compared to placebo. However, given our research question – i.e., whether auditory mismatch signals are differentially susceptible to muscarinic versus dopaminergic receptor status – showing a significant difference between biperiden and amisulpride is critical.

      Clearly, such a differential effect would be even more compelling if biperiden differed significantly from amisulpride and placebo at the same time (and in the same sensor locations). While we do not find this in our main analysis, we do see it for the analysis using the alternative pre-processing pipeline and the trial definition (Figure 2—figure supplement 3) that was also specified a priori in our analysis plan. In this alternative analysis, mismatch responses under biperiden did differ significantly from both placebo and amisulpride."

      We suspect this difference in results between the analysis pipelines might partly be due to the different re-referencing. Compared to the average reference used in the main analysis, the linked mastoid reference in the alternative pre-processing pipeline subtracts the effects at sensors which show positive mismatch signals from those at fronto-central channels (with opposite sign), effectively enhancing the signal at the fronto-central channels (for evidence of this effect see also current Figure 3—figure supplement 1) but weakening it at temporal and pre-frontal sensors.
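The effect of the reference choice described above can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the channel names, the single-sample amplitudes (in µV) and the helper `rereference` are all hypothetical, chosen only so that the mastoids carry a mismatch signal of opposite sign to the fronto-central channels.

```python
def rereference(data, ref_channels):
    """Subtract the mean of the reference channels from every channel.

    data maps channel name -> list of samples; all lists have equal length.
    """
    n_samples = len(next(iter(data.values())))
    ref = [
        sum(data[ch][t] for ch in ref_channels) / len(ref_channels)
        for t in range(n_samples)
    ]
    return {ch: [v - r for v, r in zip(sig, ref)] for ch, sig in data.items()}


# Toy deviant-minus-standard amplitudes at one latency (made-up values):
# negative at fronto-central sites, positive at mastoids and a pre-frontal site.
data = {
    "Fz": [-2.0],
    "Cz": [-1.5],
    "Fp1": [0.5],
    "M1": [1.0],
    "M2": [1.0],
}

average_ref = rereference(data, list(data))    # average reference
linked_mastoid = rereference(data, ["M1", "M2"])  # linked-mastoid reference

for ch in data:
    print(ch, round(average_ref[ch][0], 2), round(linked_mastoid[ch][0], 2))
```

With these toy numbers the fronto-central deflection grows under the linked-mastoid reference, the mastoid channels are zeroed by construction, and the pre-frontal effect shrinks in magnitude, which is the qualitative pattern the response describes.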

      We now discuss the question of sensitivity of both our paradigm and processing strategy in the discussion.

      3) The authors use the words mismatch negativity (MMN) and mismatch responses interchangeably however in some cases it is clearly mismatch responses being described and not the classical MMN ERP component. This occurs especially in the Introduction where the authors describe the study and that they plan to focus on the MMN but in the results section, since the initial analysis focuses on all sensors, other mismatch responses are consistently discussed. These differences in wording need to be precisely defined and used consistently in the text.

      We agree that it is important to use precise definitions of the terms and be consistent in their use. The dipole source signal of mismatch detection shows up with different signs across different sensor locations, and “MMN” traditionally refers to the effect in fronto-central channels, where it is a deviant-induced negativity. However, even when we constrain the use of “MMN” to the (difference in) negative deflection at fronto-central channels between 100 and 250ms (or similar) there remains some ambiguity due to the choice of reference. A common choice in MMN research is a linked mastoid reference. Because the mismatch signal shows up at the mastoids with opposite sign to fronto-central channels, this reference maximizes the observed difference at fronto-central channels (see also our Figure 3—figure supplement 1 and our reply to the previous comment) and minimizes it elsewhere, effectively forcing all (drug or other) effects to show up at frontocentral channels. This demonstrates that we typically think of the effects at different sensor locations as (caused by) one and the same (dipole source) signal. In our average referenced data (our main analysis), we observe some effects at fronto-polar sensors, where they are expressed as a modulation of a positive deflection, however, we think of these as being part of what is typically referred to as “MMN” for the above reasons.

      However, to avoid any confusion that this may cause, we have adapted the wording in our manuscript everywhere and mention this distinction in the methods section:

      “To avoid confusion, we will only use the term “MMN” when we talk about effects in the classical time window (100-200ms) and sensor locations (frontocentral sensors) for the MMN, and use “mismatch responses” for all other effects.”

      4) A weakness of the paper would be that the authors offer no prediction in the Introduction about what the expected effects of these specific neurotransmitter modulations would be on mismatch responses.

      Thank you for this suggestion and apologies for this oversight. We have now added a sentence to the Introduction, describing the effects we expected based on previous literature.

      “Based on previous literature, one would expect mismatch responses in our paradigm to be sensitive to (1) volatility, with larger mismatch amplitudes during more stable phases (Dzafic et al., 2020; Todd et al., 2014; Weber et al., 2020), and (2) cholinergic manipulations, with galantamine increasing and biperiden reducing mismatch amplitudes (Moran et al., 2013; Schöbi et al., 2021). Furthermore, we expected a differential effect of cholinergic (muscarinic) and dopaminergic receptor status on mismatch responses, as postulated by initial work on MMN-based computational assays (Stephan et al., 2006). Our results suggest that muscarinic receptors play a critical role for the generation of mismatch responses and their dependence on environmental volatility, whereas no such evidence was found for dopamine receptors.”

      5) A nice aspect of this paper is that the authors re-analyzed their data using pre-processing settings identical to those used in comparable research papers examining the effect of cholinergic modulation on MMN. The main findings did not differ following this re-analysis.

      Reviewer #2 (Public Review):

      The authors found that Biperiden (M1 antagonist) delayed and altered the topography of MMN responses, particularly in the stable condition. Amisulpride did not do so, and neither did Galantamine or L-DOPA. The analysis using an ideal Bayesian observer (the HGF) detailed in the Appendix showed that Biperiden reduced the representation of lower-level prediction errors and increased that of higher-level prediction errors (about volatility).

      The methods were rigorous (including obtaining drug plasma levels and detailing alternative preprocessing techniques) and I have no suggestions for improvement from that point of view.

      I only have one main comment that I think could be discussed. I'm not an expert on this but as I understand it, Olanzapine is most selective for M2 receptors rather than M1 (https://www.nature.com/articles/1395486), although Clozapine metabolites do have some M1 selectivity (https://www.pnas.org/content/100/23/13674) - I'm not sure about Clozapine itself. So Biperiden (very M1 selective) might not be the ideal drug to use to explore a treatment allocation paradigm, at least for Olanzapine? I suspect the options are quite limited but it would probably be worth commenting on this.

      Thank you for pointing this out, this is indeed an important point for the discussion.

      First, clarifying the pharmacodynamics of psychopharmacological drugs and their relative affinity to different receptor subtypes is notoriously difficult as this depends on many methodological factors. The seminal paper on the binding profile of olanzapine (which, at the same time, also examined clozapine) is (Bymaster et al., 1996). Using in vitro assays, this study found that both olanzapine and clozapine showed by far the greatest affinity for the M1 receptor (see their Table 5). By contrast, using SPECT data from seven patients with schizophrenia treated with olanzapine, the paper you mentioned (Raedler et al., 2000) estimated the affinity of olanzapine to the M2 receptor as being roughly twice as high as to the M1 receptor. Both studies have methodological pros and cons (as discussed by (Raedler et al., 2000)). From our view, an important limitation of the study by (Raedler et al., 2000) is that they used the ligand [I-123]IQNB which is not selective and "does not allow discrimination between the different subtypes of the muscarinic receptors" (Raedler, Knable, Jones, Urbina, Gorey, et al., 2003). Instead, the M1/M2 comparison by (Raedler et al., 2000) rested on conclusions from a mathematical approximation – under various assumptions and with only 7 data points available. We note that subsequent studies by the same group on muscarinic receptors in schizophrenia (Raedler, Knable, Jones, Urbina, Egan, et al., 2003; Raedler, Knable, Jones, Urbina, Gorey, et al., 2003) no longer used this approach and refrained from making statements about relative selectivity of olanzapine and clozapine with regard to M1/M2 receptors. Furthermore, the results by (Raedler et al., 2000) are potentially confounded by the fact that they were not obtained from healthy controls, but from patients with schizophrenia. This is potentially problematic: if schizophrenia is characterised by an aberration related to M1 receptors (see below), this would affect the interpretability of the results by (Raedler et al., 2000). Overall, the relative affinity of olanzapine and clozapine to M1/M2 receptors remains a matter of debate, but it seems safe to say that both drugs affect both receptors.

      Second, we would like to explain that we think of biperiden as a model of a (potential) impairment, rather than a treatment. A series of studies have provided compelling evidence for a role of muscarinic (M1) receptor dysfunction in the pathophysiology of schizophrenia. In particular, there is compelling evidence for a subgroup of patients with markedly decreased M1 availability in the prefrontal cortex ((E. Scarr et al., 2009); see also (Gibbons et al., 2013) and (Elizabeth Scarr et al., 2018)). Moreover, multiple studies have found antipsychotic effects of xanomeline, an M1/M4 agonist (Bodick et al., 1997; Shekhar et al., 2008).

      Against this background, clozapine and olanzapine may seem counterintuitive as treatment options since they antagonize muscarinic receptors. However, the muscarinic system is complex, and the mechanisms by which muscarinic receptors are involved in the therapeutic effects of clozapine and olanzapine are far from being understood. One interesting observation is that both clozapine and olanzapine have been found to elevate extracellular acetylcholine concentrations in cortical regions (Ichikawa et al., 2002; Shirazi-Southall et al., 2002), potentially by blocking muscarinic autoreceptors (Johnson et al., 2005), although this is debated (Tzavara et al., 2006). There is clinical evidence that clozapine or its metabolites may exert their pro-cognitive effects by increasing the release of acetylcholine (Weiner et al., 2004), and preclinical evidence that clozapine is able to normalize M1 receptor availability in cortex (Malkoff et al., 2008).

      Irrespective of the exact mechanism by which clozapine and olanzapine exert their antipsychotic effects, their much higher affinity to muscarinic cholinergic receptors compared to dopaminergic receptors sets them apart from other antipsychotics. If a functional readout of the relative contribution of cholinergic versus dopaminergic deficits could be obtained in individual patients, this might be predictive of whether this patient would profit from clozapine, olanzapine, or, in the future, potential new treatments targeting the muscarinic system specifically.

      Given the above considerations, we have amended the relevant paragraph in the discussion to state this rationale more clearly.

      Notably, there is compelling evidence for a subgroup of patients with markedly decreased M1 availability in the prefrontal cortex ((E. Scarr et al., 2009); see also (Gibbons et al., 2013) and (Elizabeth Scarr et al., 2018)). This is consistent with the possibility that a key pathophysiological dimension of the heterogeneity of schizophrenia derives from a differential impairment of cholinergic versus dopaminergic modulation of NMDAR function (Stephan et al., 2006, 2009). Distinguishing these potential subtypes of schizophrenia could be highly relevant for treatment selection, as some of the most effective neuroleptic drugs (e.g., clozapine, olanzapine) differ from other atypical antipsychotics (e.g., amisulpride) in their binding affinity to muscarinic cholinergic receptors. The exact mechanisms by which muscarinic receptors are involved in the therapeutic effects of clozapine and olanzapine are still under debate and include, for example, elevation of extracellular levels of acetylcholine in cortex (Ichikawa et al., 2002; Shirazi-Southall et al., 2002; Weiner et al., 2004), possibly via blocking presynaptic muscarinic autoreceptors (see (Johnson et al., 2005; Tzavara et al., 2006) for conflicting data), and normalization of M1 receptor availability in cortex (Malkoff et al., 2008). Irrespective of the exact mechanism by which clozapine and olanzapine exert their antipsychotic effects, their much higher affinity to muscarinic cholinergic receptors compared to dopaminergic receptors sets them apart from other antipsychotics. If a functional readout of the relative contribution of cholinergic versus dopaminergic deficits could be obtained in individual patients, this might be predictive of whether this patient would profit from clozapine, olanzapine, or, in the future, potential new treatments targeting the muscarinic system specifically. Indeed, muscarinic receptors have become an important target of drug development for schizophrenia (Yohn & Conn, 2018).

    1. Author Response:

      Reviewer #2 (Public Review):

      The manuscript addresses an important question regarding sensory processing related to self-motion. The main experiment is clearly described and demonstrates that neurons display a diversity of responses, from purely reflecting vestibular input (head-in-space motion) to predominantly body motion, and any combination in between. Of particular interest is that the responses of the Purkinje cells are profoundly different from those of their downstream targets, the fastigial neurons, which signal only head-in-space or body motion. This substantive difference in neural representations between these two connected brain regions is surprising.

      The manuscript also provides a simple population model to show that fastigial responses could be generated from Purkinje cell activity, but only from combining at least 40 neurons. While the model provides some insight on the potential interaction between Purkinje cells and fastigial neurons, I think the model assumes no other input to the fastigial neurons. However, I would assume that there is likely a strong input from mossy fibers onto the fastigial neurons that also target the Purkinje cells. This mossy fiber input will certainly provide vestibular and neck proprioceptive input to the fastigial nucleus. Thus, the Purkinje cell input may be essential for countering the mossy fiber input leading to separate representations for head and body motion in the fastigial nucleus.

      We agree this is an important point. To address the reviewer’s concern, we performed additional modeling in order to consider the influence of mossy fiber inputs. Specifically, following the reviewer’s suggestion below, mossy fiber input was modeled using random patterns of vestibular and neck proprioceptive input. Prior studies have shown that the dynamics of vestibular nuclei neuron responses strongly resemble those of unimodal fastigial neurons in rhesus monkeys (i.e., they encode vestibular input and are insensitive to neck proprioceptive inputs, Roy & Cullen, 2001). In contrast, the responses of reticular formation neurons to such yaw head and/or neck rotations have not yet been described. We therefore simulated mossy fiber input first as a summation of vestibular and neck proprioceptive inputs, for which the gains and phases were randomly drawn from a distribution comparable to that previously reported (Mitchell et al. 2017) in the vestibular nuclei (Fig. 7-figure supplement 3). We then further explored the effect of systematically altering this simulated mossy fiber input - relative to the reference distribution of mossy fiber inputs - by i) doubling the gain, ii) reducing the gain by half, iii) doubling the phase, and iv) reducing the phase by half (Fig. 7-figure supplement 4). Overall, we found that the addition of such simulated mossy fiber input did not dramatically alter our estimate of the Purkinje cell population size required to generate rFN neuron responses (~50 versus 40; Fig. 7-figure supplement 3&4).
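
      The simulated mossy fiber input described above can be illustrated with a minimal sketch. It assumes that each response to sinusoidal rotation is summarized by a gain and a phase, and that the vestibular and neck proprioceptive components add linearly as phasors. The distribution parameters and the names `simulate_mossy_fiber`, `gain_scale`, and `phase_scale` are hypothetical and are not taken from the manuscript.

```python
import cmath
import math
import random


def phasor(gain, phase_deg):
    """Represent a sinusoidal response (gain, phase in degrees) as a complex number."""
    return cmath.rect(gain, math.radians(phase_deg))


def simulate_mossy_fiber(rng, gain_scale=1.0, phase_scale=1.0,
                         gain_mean=0.5, gain_sd=0.2,
                         phase_mean=10.0, phase_sd=15.0):
    """Sum one vestibular and one neck proprioceptive component.

    All distribution parameters are placeholders, not measured values.
    Returns (gain, phase in degrees) of the summed input.
    """
    total = 0j
    for _ in ("vestibular", "neck"):
        gain = max(0.0, rng.gauss(gain_mean, gain_sd)) * gain_scale
        phase = rng.gauss(phase_mean, phase_sd) * phase_scale
        total += phasor(gain, phase)
    return abs(total), math.degrees(cmath.phase(total))


# Same random draws, once with the reference gain and once with the gain doubled.
reference = simulate_mossy_fiber(random.Random(0))
doubled = simulate_mossy_fiber(random.Random(0), gain_scale=2.0)
```

      In this sketch, doubling the gain scales the summed phasor without changing its phase, whereas scaling the phase changes how the two components interfere.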

      Another issue is the limited number of neurons recorded in the secondary experiment, with only 12 bimodal neurons and 5 unimodal (although there appear to be only 4 neurons in Figure 5C). Such a small sample impacts the estimated tuning properties of Purkinje neurons in Figure 5D and the results from the population model. This needs to be clearly recognized.

      We have revised the RESULTS to clarify the numbers of Purkinje cells that were tested (13 bimodal and 4 unimodal Purkinje cells). For comparison, in our Brooks and Cullen study, tuning curves were computed for 10 bimodal and 12 unimodal rFN. We note that i) unimodal Purkinje cells make up a relatively small percentage of anterior vermis Purkinje cells and ii) similar to unimodal rFN, our small sample of unimodal Purkinje cells did not demonstrate significant tuning. In contrast, all bimodal Purkinje cells in our sample demonstrated significant tuning. To simulate responses for the bimodal Purkinje cells that were not held long enough to test during the gain-field paradigm (i.e., Fig 5), we generated tuning curves drawn from a normal distribution estimated from 13 bimodal Purkinje cells. We appreciate this was not clear in the original submission and have revised the METHODS section to clarify our approach. Overall, while we recognize that our sample size is small, we nevertheless found it interesting that including our results from this protocol did not increase the estimated population size relative to that estimated using our other dynamic protocols.

      Reviewer #3 (Public Review):

      In this study, the authors characterize the simple spike discharges of Purkinje cells in the anterior vermis of the macaque during passive vestibular and neck proprioceptive stimulation. The activity of most Purkinje cells encoded both vestibular (whole-body rotation) and proprioceptive (body-under-head rotation) stimuli. Although the vestibular and proprioceptive responses were, on average, antagonistic in the preferred direction, consistent with a partial transformation from head to body coordinates, response properties for both modalities were highly variable across neurons. Most cells responded under combined vestibular and proprioceptive stimulation (head-on-body rotation), and these responses were well-approximated by the average of the responses to each modality individually. Vestibular responses exhibited gain-field-like tuning with changes in head-on-body position, though these changes were significantly smaller than the shifts observed for neurons downstream in the rostral fastigial nucleus. Finally, a weighted average of the responses of approximately 40 Purkinje cells provided a good fit to the responses of postsynaptic fastigial neurons.

      Overall, these results provide important and novel insights into the implementation of coordinate transformations by cerebellar circuitry. The experiments are well-designed, the data high quality, the analyses reasonable, and the conclusions justified by the data. The manuscript is clear and well-written, and will be of interest to a broad neuroscientific audience. I have no major concerns. I have a few minor suggestions for improving this manuscript, described below.

      1 - The authors may wish to discuss earlier work in the decerebrate cat by Denoth et al. (1979, Pflügers Archiv), which provided evidence that the responses of Purkinje cells in the anterior vermis to head-on-body tilt are relatively well-approximated by averaging the responses to neck and macular stimulation alone.

      We thank the reviewer for bringing this reference to our attention and have revised the INTRODUCTION and DISCUSSION to include the early work of Denoth et al., 1979.

      2 - To better convey the heterogeneity of responses across the sample of Purkinje cells, two additional supplemental figure panels might be useful: (1) the vestibular, proprioceptive, summed, and combined sensitivities in each direction (as in the Fig. 3C insets) for each individual neuron (perhaps as a series of subpanels), and (2) scatterplots of response phase for proprioceptive vs vestibular stimulation for bimodal neurons (with separate panels for preferred and non-preferred directions).

      We agree that this is a useful way to emphasize the heterogeneity of bimodal Purkinje cell responses and have added the requested response phase scatterplots for proprioceptive vs vestibular stimulation (Fig 2 - figure supplement 2C&D). We have also made a figure showing the summation model for each individual neuron. However, because our Purkinje cell population included 73 neurons, this figure includes a corresponding 73 × 2 = 146 polar plots (i.e., two plots per cell, one each for ipsilateral and contralateral motion). Given the immense size of this figure, we elected not to include it in the supplementary material of the revised manuscript.

      3 - Can the authors provide additional information on the approximate location of the recorded neurons (lobule and zone or mediolateral position)? Is it possible that some project to the vestibular nuclei, rather than the rFN? This consideration seems especially relevant for the interpretation of the pooling analysis in Fig. 6, which seems to assume that Purkinje cells are sampled from a sagittal zone with overlapping projections in the rFN (or, at least, that the response properties of the sampled neurons are representative of the properties in a corticonuclear zone). Some additional discussion on this point would be helpful.

      The recorded neurons were located in lobules II-V of the anterior vermis, ~0 to 2 mm from the midline. We now include this information in the revised METHODS. As noted by the reviewer, Purkinje cells in this region of the anterior vermis project to the vestibular nuclei as well as to the rFN (Voogd et al. 1991). Nevertheless, using comparable stimulation protocols, we have previously shown that the responses of vestibular nuclei neurons are comparable to those of unimodal rFN neurons (Brooks et al., 2015). Specifically, both vestibular nuclei and unimodal rFN neurons are insensitive to proprioceptive stimulation and demonstrate comparable responses to vestibular stimulation. Thus, our present modeling results regarding the population convergence required to account for unimodal rFN neurons can be directly applied to vestibular nuclei neurons. We have revised the DISCUSSION to consider this point.

      4 - When weighted averages of Purkinje cell responses are used to model rFN responses, my intuition would be that w_i is near zero for v-shaped and rectifying Purkinje cells. That is, the model would mostly ignore them, as data from both directions appear to be included. Is this the case? A more detailed description of the fitting procedure would also be helpful.

      To address the reviewers’ concerns regarding the Purkinje cell weights, we have added a new inset to Fig 7C. As can be seen, model weights are well distributed across different Purkinje cells. Further, to confirm that the weights of Purkinje cell inputs are distributed over different classes of Purkinje cells, we now illustrate the weight distributions for (a) linear vs. v-shaped vs. rectifying Purkinje cells, (b) bimodal vs. unimodal Purkinje cells, (c) Type I vs. Type II Purkinje cells and (d) Purkinje cells with agonistic vs. antagonistic vestibular and proprioceptive sensitivities. These results are shown in Fig 7-figure supplements 1&2. Overall, we found that the distribution of the weights was not biased towards linear cells; rather, the weights were similarly distributed across all three groups. This was true for our modeling of both bimodal and unimodal rFN cells (compare Fig 7-figure supplement 1 vs. Fig 7-figure supplement 2). As can be seen in these figures, we likewise found comparable results for the weights of Type I vs. Type II Purkinje cells, unimodal vs. bimodal Purkinje cells, and vestibular/proprioceptive agonist vs. antagonist bimodal neurons. Finally, as detailed above in our response to the reviewers’ consensus comments, we have also revised the METHODS section to provide a more detailed description of the linear regression method.

      5 - Another potential interpretive issue in the averaging analysis concerns the presence of noise on single trials. The authors could briefly comment on whether more Purkinje cells might be needed to predict rFN responses on a single trial in real time.

      This is an interesting question; we have revised the DISCUSSION to consider this point.

    1. Author Response:

      Reviewer #1:

      The aim of this paper, to reveal the mechanisms that establish the Wnt gradient by combining a mathematical model and experiments, is of general importance. The results of computer simulations and biological experiments are interesting because they consider multiple extracellular components. They successfully demonstrated that the ligand/receptor feedback and the other extracellular components shape the morphogen gradient of Wnt ligand so that the fine patterning found in heart development can be explained. However, I feel that quantification of the experimental data, explanation of the mathematical model and discussion of the results are not sufficient in the current manuscript.

      Major points:

      1. Experimental validation of the results of computer simulations is very important in this study. However, many of the experimental data were not properly quantified or statistically tested. The authors would need to quantify the experimental results when appropriate and perform statistical tests (e.g. Figs. 1E, 2A, 4A-B, Supplemental Figs. 6, 7).

      We are sorry for the lack of quantitative and statistical analyses in many experiments. We revised all the points (graphs and statistical analyses in Figs 1, 2, 4; Figure 1-figure supplement 1; Figure 3-figure supplement 7; Figure 4-figure supplement 1, 2).

      2. Design of the mathematical model is not sufficiently explained in the main text. Besides details in the method section, the basic design of the model and simulation should be briefly explained. For example, initial distribution of Fzd7, regions that produce Wnt6 and sFRP1, and interpretation of the simulation results should be added for Fig. 3 (page 10, line 11-16).

      We are sorry for the inconvenience. In this revision, we wrote the basic design of the model and simulation in the main text.

      As an interpretation of the simulation results, we added an explanation as follows:

      The Wnt signaling gradient became steeper with increased feedback strength. Considering a threshold of signal activation (Fig. 3A, dashed line), feedback results in restriction of the Wnt-activated region.

      3. The authors demonstrated the roles of Wnt6/Fzd7 feedback and sFRP/Heparan sulfate binding. Typical simulation data showing the roles of sFRP and Heparan sulfate would need to be shown in the main figure.

      Thank you for your suggestions. We moved a typical result of sFRP/HS simulation from the original supplemental figure to a main figure (Fig. 4G).

      Unfortunately, they did not sufficiently discuss their actions using the mathematical model. They would need to at least qualitatively discuss these points. How do they control the Wnt gradient? What are the roles of these two mechanisms? What are the differences? How do they influence each other? Simplified models may be necessary to reveal the relationship between these two mechanisms and to gain mechanistic insights.

      Thank you for pointing out these critical points.

      For Wnt gradients, receptor feedback, sFRP, and HS act synergistically to restrict the signal-activated region (steep gradient).

      However, there are some differences. Receptor feedback can overcome variation in Wnt production, but sFRP1 and HS cannot, because sFRP1 expression is inhibited by Wnt, which forms a positive feedback loop for Wnt signaling (Gibb et al., 2013). Thus, sFRP1/HS cannot buffer the variation of Wnt production.

      In this revision, we added these explanations.

      [They will influence each other] Because sFRP1 inhibits Wnt signaling, sFRP1 reduces fzd7 expression. This occurs mainly on the right side (because sFRP1 is expressed on the right side), resulting in a short-range activation of Wnt signaling.

      Deeply considering your comments, we recognized that we did not describe sFRP1/HS function in the title of the previous version. We revised it as follows:

      Previous) Positive Feedback Regulation of fzd7 Expression Robustly Shapes Wnt Signaling Range in Early Heart Development

      Current) Positive feedback regulation of fzd7 expression robustly shapes a steep Wnt gradient in early heart development, together with sFRP1 and heparan sulfate

      Additionally, the situation studied in this paper would need to be compared with the other examples of ligand/receptor feedback, and the similarity and difference should also be discussed (e.g. Hedgehog/Patched and Wingless/Frizzled2 in the fly wing).

      Thank you for your helpful comments.

      As you mentioned, the gene regulatory circuit of our Wnt6/Fzd7 is similar to that of Hedgehog (Hh)/Patched (Ptc): both morphogens undergo self-enhanced degradation via induction of receptor expression (Eldar et al., 2003; Hh induces Ptc expression, and this increases Hh degradation). In the case of Wingless/Frizzled2, the gene regulatory circuit is different from that of Wnt6/Fzd7: Wingless undergoes self-enhanced degradation via repression of receptor expression. Wingless inhibits Fzd2 expression, and Fzd2 inhibits Wingless degradation. Both gene regulatory circuits function as systems that are robust to morphogen variations (Alon, 2006).

      There is also a small difference between Wnt6/Fzd7 and Hh/Ptc. In the Hh system, the receptor Ptc inhibits downstream signaling. Thus, the Hh network restricts the ligand distribution, as is the case with Wnt, but the signal activity is not as steep as for Wnt (high Ptc expression inhibits the signaling).

      We added these explanations.

      Reviewer #2:

      In this work, the authors tried to understand the effect of receptor and diffusible inhibitors on the Wnt morphogen gradient during heart development by combining experiment and computational modeling. The experimental part seems to be a solid contribution to this academic field, and I appreciate the interdisciplinary attempt to combine the results with the computational model. However, their results may be interpreted more clearly using classical mathematical models.

      First of all, we greatly thank you for evaluating our manuscript. And thank you very much for explaining classical models in detail.

      1. Classical models may be enough.

      Previous mathematical models provided stronger predictions than numerical simulations, and I am not sure the numerical results provided by the authors give us new insights. For example, Eldar et al. (2003) have provided analytical results on why the concentration becomes robust. In the normal SDD model

      u'(x,t) = -d_1 u(x,t) + d_u \Delta u(x,t),

      the steady-state solution is an exponential function,

      u_s(x) = u_0 \exp(-\sqrt{d_1/d_u} x)

      , and the amount of morphogen production at the boundary critically affects the result (If the production becomes 1/2, the concentration becomes 1/2 everywhere). On the other hand, if the degradation is promoted by the morphogen itself (in this case, by the upregulation of the receptor expression), the governing equation becomes

      u'(x,t) = -d_2 u(x,t)^2 + d_u \Delta u(x,t),

      the solution is

      u_s(x) =A/(x+x_b)^2

      ($A$ and $x_b$ are constants determined by $d_u$ and $d_2$). It converges to

      u_s(x) =A/x^2

      and the morphogen gradient profile does not change much when the morphogen production is relatively high (that means there is a condition to be robust).

      Similarly, a linear approximation is enough to understand the diffusion length change - the diffusion length of the morphogen gradient (the length over which the morphogen concentration falls to 1/e) is in general $\sqrt{d_u /d_1}$, and the feedback mechanism should increase $d_1$ in first-order estimation, hence decreasing the diffusion length. Binding to HSPG may have a similar effect (in the case of FGF, HSPG is necessary for the binding of FGFR, and the situation is very different).
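
      The reviewer's argument can be checked numerically with a minimal sketch. It assumes a fixed concentration $u(0) = u_0$ at the source and decay to zero far from it; under that assumption $A = 6 d_u / d_2$ and $x_b = \sqrt{A/u_0}$, so in this sketch $x_b$ also depends on the source level. The function names and parameter values are illustrative only.

```python
import math


def linear_profile(x, u0, d1, du):
    """Steady state of u' = -d1*u + du*Δu with u(0) = u0."""
    return u0 * math.exp(-math.sqrt(d1 / du) * x)


def quadratic_profile(x, u0, d2, du):
    """Steady state of u' = -d2*u**2 + du*Δu with u(0) = u0."""
    a = 6.0 * du / d2
    x_b = math.sqrt(a / u0)
    return a / (x + x_b) ** 2


def boundary_linear(theta, u0, d1, du):
    """Position where the linear-degradation profile crosses threshold theta."""
    return math.sqrt(du / d1) * math.log(u0 / theta)


def boundary_quadratic(theta, u0, d2, du):
    """Position where the self-enhanced-degradation profile crosses theta."""
    a = 6.0 * du / d2
    return math.sqrt(a / theta) - math.sqrt(a / u0)


def residual_quadratic(x, u0, d2, du, h=1e-3):
    """du*u'' - d2*u**2 by central differences; close to zero for a solution."""
    def f(s):
        return quadratic_profile(s, u0, d2, du)
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2
    return du * second - d2 * f(x) ** 2


def halving_shift(boundary, theta, u0, rate, du):
    """How far the threshold position moves when production u0 is halved."""
    return boundary(theta, u0, rate, du) - boundary(theta, u0 / 2.0, rate, du)


# Illustrative values: threshold 0.01, source level 100, unit rate constants.
shift_linear = halving_shift(boundary_linear, 0.01, 100.0, 1.0, 1.0)
shift_quadratic = halving_shift(boundary_quadratic, 0.01, 100.0, 1.0, 1.0)
```

      Under these assumptions, halving $u_0$ moves the threshold position of the exponential profile by $\sqrt{d_u/d_1}\,\ln 2$ regardless of $u_0$, whereas the shift for the self-enhanced-degradation profile is $\sqrt{A}(\sqrt{2}-1)/\sqrt{u_0}$ and therefore shrinks as production increases.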

      Thank you again for your explanations. Our explanations in the previous manuscript were not enough.

      –Difference of our computational simulation and the classical analysis:

      We think we need numerical simulation to consider points not addressed with previous analytical methods. The following two points are the new points that are too complicated to handle with analytical methods.

      1. Transient state is considered, which is hard to analyze without computer simulation.

      Considering the in vivo situation, we cannot determine whether fate determination takes place at a transient or a steady state (as described on page 7, line 14). We therefore did not limit our simulation analysis to the steady state but also included the transient state.

      2. The receptor has multiple functions in its interactions with multiple molecular species: it (i) binds to the ligand and restricts ligand spreading, (ii) activates intracellular signaling, and (iii) degrades the ligand (new Supplementary Fig. 1A). We would like to include these different functions separately in the simulation. In addition, we considered sFRP1 and N-acetyl-rich HS. Thus, we need a multivariate nonlinear reaction-diffusion equation, which is hard to handle without computer simulation.

      To clarify these points, we added an explanation of the multiple receptor functions with a schematic figure (Supplementary Fig. 1A).
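
      To illustrate why a transient analysis calls for numerical simulation, the following minimal sketch integrates a single-species toy equation with self-enhanced degradation by explicit finite differences. It is not the authors' multivariate model; the domain length, grid, boundary conditions, and rate constants are assumptions chosen only for illustration.

```python
def simulate(t_end, u0=6.0, d2=1.0, du=1.0, length=10.0, n=51, dt=0.01):
    """Integrate u' = -d2*u**2 + du*Δu on [0, length], starting from an empty field.

    Assumed boundary conditions: u = u0 at the source (x = 0) and u = 0 at
    the far end. The explicit scheme needs du*dt/dx**2 <= 0.5 for stability.
    """
    dx = length / (n - 1)
    r = du * dt / dx ** 2
    if r > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    u = [0.0] * n
    u[0] = u0
    for _ in range(int(round(t_end / dt))):
        new = u[:]
        for i in range(1, n - 1):
            diffusion = r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
            new[i] = u[i] + diffusion - dt * d2 * u[i] ** 2
        u = new
    return u


early = simulate(1.0)   # transient profile shortly after onset
late = simulate(20.0)   # profile at a much later time
```

      Comparing `early` and `late` shows how the position at which the profile crosses a given threshold depends on when it is read out, which is the kind of question a steady-state formula alone cannot answer.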

      –Importance/significance of our simulation:

      We first confirmed that our simulation reached a similar conclusion as the classical simulation at a certain time point (~ 1 day after the onset of simulation): the network was robust against variation of Wnt production. In addition, examining the time change of activation level, we have found that this network is robust against changes in speed of the differentiation. We added these explanations.

      2. Biological example of Wnt fluctuation

      The authors examine the effect of Wnt production fluctuation, but their motivation is not clear. Eldar et al. (2003) is motivated by the fact that the Shh heterozygote knockout has no phenotype, although the amount of mRNA is halved. Theoretically, it should have a major effect on the organs utilizing the Shh morphogen gradient (actually, haploinsufficiency is observed, but the phenotype is mild). The authors would need to provide some argument why they are interested in the robustness to the Wnt expression fluctuation.

      We all agree with your opinion. Compared with Eldar et al. (2003), our motivation for setting a 50% variation in ligand production was not clear.

      It is generally accepted that gene expression is different between individuals. In contrast, the proportions of the patterned tissues are almost the same among individuals.

      We examine this general question in our specific example of Wnt production. Here we focused on an extreme example (50% increase) among the possible magnitudes of variation in gene expression.

      We added a phrase “as an extreme case” to clarify that it is an example in the revised manuscript.

      3. Wnt signal distribution

      It is difficult for general readers to understand why the Wnt signal distribution in the simulations (0 around 0-10 µm, sudden disappearance at 40 µm) is appropriate. The authors can provide the profile plot of the actual measurement, which corresponds to the modeling result.

      Sorry for this inconvenience. As indicated in Figure 1—figure supplement 1B, Fzd7 shows limited expression in the pericardium. Fzd7 expression was not detected in the epidermis (Figure 1—figure supplement 1B), which is the Wnt source (Lavery et al., 2008), indicating that the sudden increase of Fzd7 expression near the Wnt source (at x = 10 μm) is reasonable (because the amount of Wnt at x = 10 μm is considered to be above the threshold for Fzd7 expression). In the prospective myocardium region, Fzd7 expression also disappeared suddenly (Figure 1—figure supplement 1B), suggesting that the activity of Wnt signaling also disappears suddenly in the region. We added these explanations.

      In addition to the indirect estimation of Wnt signaling from Fzd7 expression, we tried the following three approaches to directly confirm the “sudden disappearance” of Wnt signaling, but all of them failed. We examined (i) a transgenic reporter line of Wnt signaling (TCF-promoter-driven GFP), (ii) immunohistochemistry (IHC) of beta-catenin (nuclear localization of beta-catenin is an indicator of the activation of Wnt signaling), and (iii) IHC of active beta-catenin (which detects only the active form of beta-catenin), expecting a more gradual signal distribution than the readout of Fzd7 expression, which may have a threshold for expression. However, (i) the background signal was high in the transgenic line; (ii) the background signal was also high with IHC, possibly because beta-catenin is also abundant in the cytoplasm in the heart region; and (iii) the signal of active beta-catenin was not changed by Wnt addition in Xenopus.

      In addition, about the width of wnt6 and fzd7 expression, we measured the actual size of the fzd7-expressed region (Figure 1—figure supplement 1B), which was around 32 μm. It was almost the same as that in the model (30 μm). The width of Wnt6-expressed region was set to be 10 μm following a previous report (Lavery et al., 2008). We added explanations for the width of the expressions.

      4. Variable "Wnt signal"

      It is not clear what the variable "Wnt signal" means. As far as I understand, the signal inside the cell changes quickly (in the case of FGF, the ERK phosphorylation state changes within a minute). The authors should provide a concrete example of this "Wnt signal" (maybe mRNA expression of some marker gene?).

      We agree with your opinion. As an indicator of Wnt signal activation, we consider the translocation of β-catenin (a transcriptional regulator) into the nucleus. Indeed, the translocation is observed within 15 min, and transcription of the target gene is observed concurrently (Kafri et al., 2016), suggesting that this translocation (the activation of the signal in the cells) is sufficiently recognized by the cells within a minute. We added this explanation.

      5. Use of BMP measurement values.

      In addition, I am not sure whether using BMP values for the estimate of Wnt dynamics is appropriate. I have an impression that BMP is a fast-diffusing molecule that has a lower binding affinity to ECM compared to FGFs. Although I have not dealt with Wnts, they are reported to bind strongly to ECM.

      Thank you for the comments. In this revision, we used all of the reported Wnt values. Following this parameter change, we performed the computer simulation again. None of the conclusions changed.

      Reviewer #3:

      A summary of the study and the strengths of this manuscript: The authors found several new molecular interactions that may be essential for understanding the mechanism of steep gradient formation of Wnt ligands in the prospective cardiac field.

      One of the new findings is that expression of a Wnt receptor, Frizzled7, in the prospective heart field is activated by Wnt/b-catenin signaling, as well as by Wnt6 ligands, which is involved in the patterning of this field. They also found that the diffusing Wnt6 ligand is trapped at the surface of cells in which Frizzled7 is ectopically expressed. It seems reasonable that the combination of signal-dependent receptor expression and receptor-dependent ligand capture would result in a steep gradient of morphogen molecules. In fact, this idea is supported by mathematical modeling. In addition, this modeling suggests that the receptor feedback mechanism provides robustness to morphogen-mediated patterning against fluctuations in morphogen production.

      Another highlight of their study is that the soluble Wnt antagonist, sFRP1, specifically binds to N-acetyl HS, and this modification of HS is specifically detected in the outer part of the cardiogenic field. The localized N-acetyl HS may also be involved in Wnt gradient formation by inhibiting Wnt signaling around the myocardium region.

      The weaknesses of this manuscript: Although the issue they address in this manuscript is very important for understanding the mechanism of morphogen-based tissue patterning, most of the experimental data presented in this manuscript are preliminary.

      We added and revised many experiments (including computational analysis) in this revision. In particular, in Figs 1, 2, 4; Figure 1-figure supplement 1; Figure 3-figure supplement 7; Figure 4-figure supplement 1, 2.

      Therefore, interpretations other than the ones they have argued for in this manuscript are quite possible.

      For example, the authors argue that receptor feedback is essential for the formation of steep Wnt gradients (lines 8-9 in the abstract), but their model does not rule out an alternative possibility that high levels of receptor expression in the cardiogenic field form steep gradients.

      We agree.

      As you mentioned, high levels of receptor expression can form steep gradients. In a case where the distributions are similar with and without feedback, the change in the boundary position in response to a change in Wnt production appeared smaller with feedback than without (Fig. 3B), raising the possibility that feedback confers greater robustness to the variation.

      These explanations were poor in the previous version. We added an explanation.

      In addition, it would be a waste of energy because too much receptor expression is needed. If the initial expression of receptor is critical for the patterning (not the receptor feedback), the amount and the area should be tightly controlled by an additional mechanism.

      We added these explanations to the result and discussion sections.

      Furthermore, they have not succeeded in directly examining the effect of receptor feedback on Wnt6 gradient formation. Although the data shown in Supplementary Figure 6E appear to support the contribution of feedback mechanisms to patterning, the results do not exclude another interpretation that an increase in Wnt trapper molecules simply inhibits the receptor-mediated clearance of Wnt ligands from the extracellular space in the pericardial region, resulting in an increase of extracellular Wnt ligands and their long-range transport.

      Thank you for your comment. As you mentioned, the Wnt trapper inhibits clearance. However, at the same time as it inhibits clearance, it also inhibits diffusion of Wnt. These two inhibitions happen simultaneously for the same duration. Thus, the trapper will not promote long-range transport via competitive inhibition of the Wnt clearance.

      Thus, from the results using the trapper, we can conclude that the receptor expressed after the activation of Wnt signal (not the initial amount of receptor) is critical for determining the range of Wnt signaling (e.g. the width of the resulting pericardium).

      We added these explanations in the new text.

      With regard to the restriction of sFRP1 diffusion, no evidence has been presented to show that N-acetyl modification of HS is actually involved in the restriction of sFRP1 diffusion, the formation of Wnt gradient, and the patterning of prospective cardiac fields. This lack of data significantly undermines the credibility of the conclusions presented in this paper.

      We performed a new experiment.

      We overexpressed the Ndst1 enzyme, which modifies N-acetyl to N-sulfo HS, to eliminate N-acetyl HS, and analyzed whether heart patterning is changed. We found that Ndst1 expression results in a reduced pericardium but an increased myocardium region, suggesting that N-acetyl HS promotes pericardium differentiation and inhibits myocardium differentiation.

      We added these explanations and figures (Fig. 4F; Figure 4-figure supplement 2A-C).

    1. A list of all the questions that Vannevar Bush poses in the piece:

      • What are the scientists to do next?
      • Of what lasting benefit has been man's use of science and of the new instruments which his research brought into existence?
      • Is this all fantastic?
      • Will there be dry photography?
      • What would it cost to print a million copies?
      • The preparation of the original copy?
      • To consider the first stage of the procedure, will the author of the future cease writing by hand or typewriter and talk directly to the record?
      • Is it not possible that some day the path may be established more directly?
      • Might not these currents be intercepted, either in the original form in which information is conveyed to the brain, or in the marvelously metamorphosed form in which they then proceed to the hand?
      • Is it not possible that we may learn to introduce them without the present cumbersomeness of first transforming electrical vibrations to mechanical ones, which the human mechanism promptly transforms back to the electrical form?
      • True, the record is unintelligible, except as it points out certain gross misfunctioning of the cerebral mechanism; but who would now place bounds on where such a thing may lead?
      • Must we always transform to mechanical movements in order to proceed from one electrical phenomenon to another?
    1. Author Response:

      Evaluation Summary:

      This work provides new insights into how surface-exposed lipoproteins of Gram-negative bacteria reach their destination in the outer membrane. Authors find that the outer membrane protein complex Slam serves as a translocon for the lipoproteins and the periplasmic chaperone Skp mediates their targeting to Slam. This work may contribute to the elucidation of host invasion mechanisms by pathogenic bacteria, in which surface lipoproteins play an important role.

      Reviewer #1 (Public Review):

      Previously, using rigorous genetic, bioinformatic and cell-based biochemical analyses, the same group discovered SLAM1, an outer membrane protein in Neisseria spp., which mediates the membrane translocation of surface lipoproteins (SLPs) (Hooda et al. 2016 Nature Microbiology 1, 16009). Here, authors reconstituted this system in proteoliposomes using minimal purified components including the translocon Slam1 and the client lipoprotein TbpB. Authors further coupled the system to TbpB-expressing E. coli spheroplasts and LolA, the Slam1-specific periplasmic shuttle system. Using the digestion pattern of TbpB by Proteinase K as a readout, authors confirmed that Slam1 indeed served as a translocon for SLPs. As a step forward, authors found that Skp, a periplasmic chaperone (holdase), was critical to the membrane-assembly and translocation of TbpB. Strengths: Overall, this is a solid biochemical study that demonstrates the role of Slam1 as a translocon for SLPs. The experimental design is neat and straightforward. The specific role of Skp in SLP translocation is interesting. This reconstituted system will serve as a novel platform for further elucidation of the Slam1-mediated SLP translocation mechanisms. The manuscript is overall well written. Weakness: There are several major concerns, however. 1) It is not fully convincing whether these findings are novel and significantly advance the field. Identification of minimal components in a biological process and their reconstitution are always challenging and thus, this study is an achievement. Nonetheless, I am not sure whether we have learned novel molecular insights besides the confirmation of the group's previous discovery. The specific role of Skp in translocation is interesting but not surprising, considering that periplasmic holdases are already known to be extensively involved in the biogenesis of periplasmic and outer membrane proteins.

      We thank the reviewer for their time and thorough review of the manuscript. In the previous paper (Hooda et al. 2016 Nature Microbiology 1, 16009), we discovered that the outer membrane protein Slam is “important/responsible” for the surface display of SLPs (TbpB, LbpB, fHbp). In this mechanism-focused manuscript, we were able to demonstrate Slam’s role as an outer membrane translocon. One of the achievements of this paper is to demonstrate that Slam is an autonomous translocon; importantly, this is unlike the two-partner secretion systems, as it does not require the Bam complex for the translocation of TbpB.

      2) Although authors developed nice assays (Figs. 1 and 2), it was not verified whether TbpB protected from Proteinase K digestion had "correct" conformation and membrane-topology. Authors performed a functional assay on TbpB (Fig. 5a), but this result was obtained from a cell-based assay, not from the reconstituted system.

      We have performed a pulldown assay, using human transferrin-conjugated beads, on the TbpB that has been translocated into Slam-proteoliposomes to show that this TbpB protein is correctly folded and functional. Blots and explanations are included in the revised manuscript (see new Figure 2 – figure supplement 2 and lines 197-207). (As addressed in major scientific concerns point 2-i).

      Although the data in Figs. 1 and 2 clearly show that the membrane association of TbpB depends on Slam1, it does not mean that the "translocation" has actually occurred in the proteoliposomes. Probably, more rigorous analysis on the Proteinase K-protected portion of TbpB (for example, mass spec) seems necessary (that is, whether the proteolytic product is expected based on the predicted topology).

      TbpB is flag-tagged at its C-terminus, and the protected band on our blots (detected by α-flag antibody) corresponds to the expected Mw (~75kDa) for the flag-tagged Mcat TbpB protein. Therefore, we believe the band at 75kDa is our full-length processed TbpB. Moreover, we have confirmed that TbpB can be detected at the top of the sucrose gradient with our Slam-proteoliposomes in this assay. This would only occur if TbpB was actually translocated inside the intact liposomes; otherwise we should not see any TbpB in the top layer of the sucrose gradient (Figure 4d). Furthermore, we have performed a pulldown assay for TbpB in proteoliposomes to check for their functional binding to human transferrin beads after translocation. These results are explained in the updated new Figure 2 – figure supplement 2 and lines 197-207.

      3) The manuscript has a couple of missing supporting data. 3a) Lines 87-89: "From our analysis, we found that the Slam1 from Moraxella catarrhalis (or Mcat Slam1) expressed well and the purified protein was more stable than other Slam homologs." I cannot find the expression and stability data of various homologs supporting this sentence.

      In general, what we meant was that we chose Mcat Slam as the target of this study because it is more stable during purification and resulted in a higher yield of protein. We needed higher yields of Slam to be able to reconstitute the protein into the liposomes for the translocation assay. We have purification data for Mcat Slam1, Nme Slam1 and Ngo Slam2, but we think including them in the supplementary material is not necessary. We have changed and rewritten this section dedicated to Mcat Slam1 purification (Figure 1 – figure supplement 1 and 2).

      3b) "Lines 216-219: Furthermore, the processing of TbpB by signal peptidase II and subsequence release from the inner membrane was unaffected suggesting the defect in surface display by Skp occurs after the release of TbpB from the inner membrane (Fig. 4a)." The result supporting this sentence seems missing or this sentence points to a wrong figure.

      Yes, this sentence is misleading. What we meant was that the processed TbpB (TbpB has 2 bands: unprocessed TbpB, the upper band, and signal peptidase processed, lipidated TbpB, the lower band) is similar for all samples, indicating that the knockout of Skp did not affect the expression or processing of the signal peptide of TbpB up until it is ready (processed and lipidated in the periplasm) for translocation by Slam to the surface. We have added an explanation in the figure legend of Figure 4a (lines 267-269).

      4) Some statistical analysis results are not clear, making some conclusions not convincing. 4a) Figure 4a top "Exposure of TbpB on the surface of K12 E. coli" Apparently, all three data points for (Delta_DegP+Slam1+TbpB) are very closely distributed. Accordingly, (WT+Slam1+TbpB) vs (Delta_DegP+Slam1+TbpB) data look significantly different (difference is ~0.2). But the two data were assigned as "Not Significant". Similarly, in the comparable in vitro data (Figure 4b), the intensity for Slam1 (WT+Proteinase K - Triton) looks larger than that for Slam1 (Delta_DegP + Proteinase K - Triton). So, the DegP contribution should not be ignored.

      For Figure 4a, the one-way ANOVA test was performed using Prism with 4 biological replicates (we can include the analysis report in the revised submission if this is requested). We have updated the figure to include data points. In general, both our in vitro liposome translocation assay and in vivo surface exposure assay for TbpB showed that delta-DegP only slightly reduces the translocation of TbpB to the surface, but we could not detect statistically significant differences.

      4b) Figure 5a top "Exposure of TbpB on the surface of N. meningitidis" What is the p-value for WT vs Delta_Skp data? Are the two data significantly different? The p-value range for (*) is not shown.

      We have included the p-value range for (*) in the revised manuscript, figure 5a.

      Reviewer #2 (Public Review):

      The article addresses the function of SLAM, a protein which the authors have shown previously to be involved in the traffic of lipoproteins to the bacterial surface. The authors have performed a series of experiments to assess the impact of SLAM on the delivery into proteoliposomes of the model lipoprotein TbpB either added exogenously or presented by E coli spheroplasts. They identify a periplasmic chaperone, Skp, which enhances transport of TbpB and other lipoproteins to proteoliposomes, and show the contribution of endogenous Skp to lipoprotein transport in Neisseria meningitidis. The authors set up in vitro translocation assays using purified components from different bacteria. This is reasonable as the assays can be challenging to establish and require proteins that can be expressed and are stable. It would be helpful however if the sources of the proteins and how they are tagged (for their detection) were clearly documented in the article and the figures. In keeping with this, the figures describing the assays could be improved (i.e. 1A, 2A, 3A and C). Despite this, the results presented in Fig 1 and 2 clearly demonstrate the role of SLAM as a translocase, and the authors have included appropriate controls for their assays; the translocation of OmpA to demonstrate that the Bam complex is functional in their hands is an important control and should be included in the main figures. Experiments outlined in Figure 3 and Table 1 demonstrate the specific interaction of TbpB and another lipoprotein HpuA with Skp, a previously characterised periplasmic chaperone. This is performed by pull-downs and MS as well as immunoblotting. A critical result is shown in Figure 4 in which SLAM and TbpB are introduced into E coli, and the role of endogenous Skp is assessed. Importantly, the absence of Skp reduces but does not eliminate TbpB surface expression.
The authors could speculate on the nature of Skp-independent surface expression of TbpB, as this result mirrors what they find in a meningococcal strain lacking Skp (Figure 5A). It appears that Skp might be required for the correct insertion/folding of lipoproteins given their result in Figure 5B (currently, this could be changed into 5C) which tests the binding of transferrin to the bacterial surface. Clearly this could be influenced by an effect of Skp on TbpA, which acts as a co-receptor with TbpB. In summary, the authors have used appropriate assays to reach their conclusions about the role of SLAM as a translocase and the contribution of Skp to the localisation of lipoproteins to the surface of bacteria. The findings presented are robust and shed new light on the sorting of proteins in bacteria, an incompletely understood process which is central to microbial physiology, virulence and vaccines.

      Reviewer #3 (Public Review):

      Slam was identified as an outer membrane protein involved in the translocation of certain lipoproteins to the cell surface in Neisseria meningitidis. Slam homologs were also identified in other proteobacteria. However, direct evidence that Slam is an outer membrane translocation device is still missing. In this paper, the authors set up an in vitro translocation assay to probe the role of Slam proteins in the translocation of the lipoprotein TbpB. Although they provide strong data supporting the role of Slam in lipoprotein translocation, further molecular dissection is required to unambiguously establish Slam as a lipoprotein translocator. The work is interesting and the paper clearly written. The authors also discovered a functional link between the periplasmic chaperone Skp and Slam-dependent lipoproteins, which is a novel and interesting finding.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2021-01015

      Corresponding author(s): Jordan, Raff

      1. General Statements [optional]

      We thank the reviewers for their thoughtful and constructive comments and have now revised our manuscript accordingly. We apologise that it has taken so long to send in these revisions, but this is in part because both first authors have now left the lab.

      2. Point-by-point description of the revisions

      Reviewer #1

      This reviewer was generally supportive. They note that it is unfortunate that our data suggests the CP110/Cep97 complex does not play a major part in controlling daughter centriole growth (although we believe that this is an important negative result), but feel that other aspects of our data are interesting. They requested no further experiments, but did comment that it would be interesting to determine when γ-tubulin is incorporated into growing centrioles. Unfortunately, we cannot test this as the centrioles in these embryos recruit large amounts of γ-tubulin to their PCM, so we cannot specifically assay the small amount of protein in the centriolar fraction.

      Reviewer #2

      Major Points:

      Figure 1: The reviewer notes that Sas-4 and CP110 have antagonistic roles in promoting/repressing centriole growth and asks if Sas-4 is involved in promoting centriole elongation and whether it also oscillates. It is unclear if Sas-4 directly promotes centriole elongation in flies. We have previously shown that centriolar Sas-4 levels do oscillate during S-phase, but with a timing that is distinct from CP110/Cep97 (Novak et al., Curr. Biol., 2014). These observations do not shed much light on the potential antagonistic relationship between CP110/Cep97 and Sas-4, so we do not comment on this here.

      Figure S1B: The reviewer requests that we image the centrioles with greater laser intensity to test whether some residual CP110 or Cep97 protein can be recruited in the absence of the other protein. The quantification of this data suggests that some residual CP110 or Cep97 can still be recruited to centrioles in the absence of the other (Graphs, Figure S1B,C), so we do not think it necessary to repeat this experiment at higher laser intensity to further test this point. We now state that the centriolar recruitment of one protein may not be completely dependent on the other (p6, para.2).

      Figure 3: The reviewer questions whether the reduction in CP110/Cep97 levels at the mother centriole that we observe during S-phase could be due to photobleaching. This is an interesting point that we now analyse in more detail (p8, para.2). We do not think the decrease in mother centriolar CP110/Cep97 levels is due to photobleaching as our new analysis (which includes more data points during mitosis) strongly suggests that centriolar levels on the mother rise again at the start of the next cycle (New Figure 3C,D).

      The reviewer asks whether the CP110/Cep97 oscillations occur at the tip of the growing centriole, and whether we can use super-resolution imaging to address this. A large body of evidence indicates that CP110/Cep97 are highly concentrated at centriole distal tips, and all our experiments suggest that it is this fraction that is oscillating. In Figure 3, for example, we use Airy-scan super-resolution imaging to follow the oscillation on Mother and Daughter centrioles in living embryos. Although the resolution in these experiments is not as high as we can achieve using 3D-SIM in fixed cells, it seems reasonable to assume that the dots of fluorescence we observe oscillating on these centrioles (Fig. 3) are the same fluorescent dots we observe localised at the distal tips of the mother and daughter using 3D-SIM in fixed cells (Fig. 1A).

      The reviewer requests additional quantification of the western blots shown in Figure S1 that we use to judge relative expression levels. As we now describe in more detail in the M&M, these ECL blots are very sensitive, but highly non-linear, so we usually estimate relative expression levels by comparing serial dilutions of the different fractions (see, for example, Figure 1B, Franz et al., JCB, 2013). As we now clarify, the key point is not precisely by how much these proteins are over- or under-expressed, but that we observe a similar oscillatory behaviour when they are either over- or under-expressed.

      The reviewer points out that our statement that the CP110/Cep97 oscillation is entrained by the Cdk/Cyclin oscillator (CCO) is too strong as it is based only on a correlation. We agree and apologise for this overstatement. To address this, we have now perturbed the CCO by halving the dose of Cyclin B (New Figure 5E–H). This extends S-phase length and we now show that the period of the CP110/Cep97 oscillation is also extended. This suggests that the CCO directly influences the period of the CP110/Cep97 oscillation.

      The reviewer notes that our conclusion that the centriole cartwheels are longer or shorter when CP110 or Cep97 are absent or overexpressed, respectively, is based only on Sas-6-GFP fluorescence intensity. They ask if this fluorescence intensity perfectly reflects cartwheel length, and if we can confirm these conclusions using EM. Sas-6 is the main structural component of the cartwheel, so the amount of Sas-6 at the centriole should be proportional to cartwheel length, and we have published two papers that support this conclusion and that use the incorporation of Sas-6 as a proxy to measure cartwheel length (Aydogan et al., JCB, 2018; Aydogan et al., Cell, 2020). Importantly, our previous EM studies support our conclusions about the relationship between cartwheel length and CP110/Cep97 levels: the centrioles in wing-disc cells are slightly longer in the absence of CP110 and slightly shorter when CP110 is overexpressed (Franz et al., JCB, 2013). The new findings reported here provide a potential explanation for this EM data, which was puzzling at the time.

      Minor Points:

      Figure 1C: The reviewer noted that our schematic illustrations in this Figure could be misleading. We agree and have now redrawn them.

      Reviewer #3

      Major points:

      The reviewer requested that we clarify our use of the term oscillation, pointing out that oscillations are repetitive variations in levels/activity over time, whereas the “oscillations” we describe here occur during each round of centriole assembly. This is a fair point, and one that is often debated in the oscillation field, with many believing that too many biological processes are termed “oscillations”, when they are not truly driven by the passage of time. To avoid any ambiguity, we now no longer describe the behaviour of CP110/Cep97 as an oscillation (although, for ease of discussion, we still use the term in this letter).

      The reviewer thought that the data we show in Figure 1 was not relevant as we largely analyse centrioles in living embryos whereas the data in Figure 1 is derived from fixed wing-disc cells, and similar fixed-cell data has been shown in previous studies. The reviewer suggests we use super-resolution methods to analyse CP110/Cep97 dynamics in the syncytial embryo, and show this relative to Sas-6 and Plk4. They ask if Plk4 and CP110/Cep97 colocalise at any time. While CP110/Cep97 localisation has been analysed by super-resolution microscopy previously (e.g. Yang et al., Nat. Comm., 2018; LeGuennec et al., Sci. Adv., 2020), CP110/Cep97 was a minor part of these studies and our data is the first to show that this complex sits as a ring on top of the centriole MTs in fly centrioles (that lack the complex distal and sub-distal appendages present in the previously analysed systems). As this localisation is important in thinking about how CP110/Cep97 might influence centriole MT growth, we would like to include it. We cannot show this detail in living embryos as the movement of the centrioles reduces resolution and we cannot observe the ring structure.

      Although we do use Airy-scan super-resolution microscopy to study CP110/Cep97 dynamics in living embryos (Figure 3), we cannot do this in two colours (to compare these dynamics to Sas-6 or Plk4 dynamics) as red-fluorescent proteins bleach too quickly. We now show the relative dynamics of CP110/Cep97 and Plk4 recruitment using standard resolution microscopy (New Figure S2). While it is well established that Plk4 and CP110/Cep97 are concentrated at opposite ends of centrioles, they are all recruited to the nascent site of daughter centriole assembly, effectively “colocalising” at this timepoint. This could provide an opportunity for the crosstalk we observe here, and we now mention this possibility (p17, para.1).

      The Reviewer questioned whether the loading of Sas-6-GFP onto centrioles can be used as a proxy for cartwheel length, pointing out that Sas-6 could load into centrioles in a way that does not change the cartwheel structure, and that EM is required to test this. As described in our response to Reviewer #2, Sas-6 is the main structural component of the cartwheel, and we have published two papers that use the incorporation of Sas-6 into the cartwheel as a proxy to measure cartwheel length (Aydogan et al., JCB, 2018; Aydogan et al., Cell, 2020). While we cannot exclude that Sas-6 might also associate with the cartwheel in a way that does not involve its incorporation into the cartwheel, it is not clear how EM might address this question. Moreover, even if such a fraction existed, it should not affect our conclusions—as long as Sas-6 is binding to the cartwheel in some way, then the amount bound should remain proportional to the length of the cartwheel. Perhaps the reviewer is suggesting that we perform an EM time course of cartwheel growth to back up our conclusions from the Sas-6 incorporation assay? If so, we think this impractical. The changes in cartwheel length shown in Figure 6 are revealed from analysing several thousand images of centrioles compared at precise relative time points. Such an analysis cannot be done in fixed embryos by EM.

      Similar to the point above, the reviewer notes that we use the length of the cartwheel to infer centriole MT length, but we never directly measure MT length. They suggest we perform either an EM analysis or use MT markers to directly measure the kinetics of centriole MT growth. In flies (and many other organisms), the centriole MTs grow to the same length as the centriole cartwheel (Gonzalez, JCS, 1998), so we can be confident that the final length of the cartwheel reflects the final length of the centriole MTs. Moreover, we previously measured the distance between the mother centriole and the GFP-Cep97 cap that sits at the distal tip of the centriole MTs as a proxy for centriole MT length, and found that the inferred kinetics of MT growth were similar to the kinetics of cartwheel growth (inferred from Sas-6 incorporation) (Aydogan et al., 2018). This manual analysis was very time consuming, and we have tried to implement computational analysis methods, but so far without success. For similar reasons to those described in the point above, it is not feasible to accurately measure centriole MT growth kinetics by EM (nobody has been able to do this). Moreover, the centrosomes in these embryos are associated with too much tubulin and the centriole MTs are not yet modified (e.g. by acetylation) as the cycles are so fast—so we cannot directly stain the centriole MTs in fixed embryos. We have now toned down our conclusions about MT length throughout the paper, and we make it clear that we cannot directly measure this.

      All of the experiments shown here are performed in the presence of endogenous untagged proteins, and the reviewer wonders if recruitment dynamics might be influenced by competition for binding from the endogenous protein. We have compared the behaviour of many centriole and centrosome proteins in the presence and absence of the untagged WT protein. In all cases, less tagged-protein binds to centrioles/centrosomes in the presence of untagged protein, presumably due to competition. Apart from this, however, we usually observe no real difference in overall dynamics and in Reviewer Figure 1 (see below) we show that CP110-GFP and GFP-Cep97 both oscillate even in the absence of any endogenous protein. As we feel this result is not very surprising, we do not show it in the manuscript.

      The reviewer correctly noted that our data was not strong enough to conclude that the CP110/Cep97 oscillation is influenced by the CCO. This was also raised by Reviewer #2 and, as described above (p2, para.3 above), we have now performed additional experiments to more directly demonstrate this point (new Figure 5G—H).

      The reviewer requests more discussion of why our conclusion that CP110/Cep97 levels oscillate on the growing daughter centrioles during S-phase is different to that reached by Dobbelaere et al, (Curr. Biol., 2020), who conclude that Cep97-GFP only starts to incorporate into the new daughter centrioles late in S-phase when the daughters are fully grown. We have discussed this discrepancy with these authors and they kindly shared their reagents with us (so our endogenous Cep97-GFP oscillation data comes from the same line they used in their experiments), but we have not come to a clear conclusion on this point. We have shown robust oscillations for CP110 and Cep97 by quantifying many hundreds of centrioles using multiple transgenes (both over- and under-expressed) in multiple backgrounds. Cep97 dynamics were a very minor part of the Dobbelaere et al., study, and they analysed a much smaller number of centrioles. We now briefly mention this discrepancy (p9, para.1), but do not discuss it in detail as we have no definitive explanation for it.

      The reviewer requests more experiments or more discussion to address the mechanism(s) of crosstalk between CP110/Cep97 and Plk4, and they suggest several avenues for further investigations. These are excellent ideas, and we are working hard on these approaches. These are all long-term experiments, however, and we feel it is important that the field be made aware of these surprising findings as soon as possible, as others may be better-placed to provide mechanistic insight into how this system ultimately works. We now briefly mention some of the future directions the reviewer highlights in the Discussion.

      The reviewer thought we should highlight the previous publications showing that Plk4-induced centriole amplification requires CP110 and that Plk4 can phosphorylate CP110. These studies (Kleylein-Sohn et al, Dev. Cell, 2007; Lee et al., Cell Cycle, 2017) were mentioned, but we now discuss them more prominently (p17, para.2).

      Minor Points:

      The reviewer raised a number of minor concerns that we have now addressed: (1) We discuss the model the reviewer suggests; (2) we no longer state that the crosstalk between CP110/Cep97 and Plk4 is unexpected; (3) We have clarified our description of the shift in timing of the peak levels of CP110/Cep97, which we no longer refer to as an oscillation; (4) We define mNG as monomeric Neon Green; (5) We have changed our schematics in Figure 1 as suggested by the reviewer; (6) We have corrected the mistake in the legend to Figure 8.

      Reviewer #4

      Major points:

      1. The reviewer noted that the amplitude of the CP110/Cep97 oscillations depended on protein expression levels, so the oscillations might not reflect the behaviour of the endogenous proteins. They requested that we either repeat our experiments with CRISPR knock-in alleles, or conduct experiments with the lines driven by the endogenous promoters but in their respective mutant backgrounds. We have not generated CRISPR knock-ins for CP110/Cep97, but have done so for many other centriole/centrosome proteins (>8) and found that most such lines are expressed at higher or lower levels than the endogenous allele (and sometimes very significantly so). This is also true for our standard transgenic lines, where genes are expressed from their endogenous promoters, but are randomly integrated into the genome. The blots in Figure 4 show that CP110-GFP and GFP-Cep97 expressed from a ubiquitin (u) promoter or from their endogenous promoters (e) are expressed at ~2-5X higher or ~2-5X lower levels than the endogenous proteins, respectively. As we observe CP110/Cep97 oscillations in all cases, it seems unnecessary to generate new CRISPR knock-ins (that are also likely to be somewhat over- or under-expressed) to show this again. As the reviewer asks, we show that Cep97-GFP and CP110-GFP still oscillate in the absence of the endogenous proteins (Reviewer Figure 1). As this does not seem a surprising result, we do not show this in the main manuscript. In the same point the reviewer requests that we use antibody staining in fixed embryos to show that the untagged proteins also oscillate. Analysing protein dynamics is much harder in fixed embryos, as the levels of fluorescent staining are more variable and we can only approximately infer relative timing, rather than precisely measuring it (as we can in living embryos). Moreover, as both proteins in the CP110/Cep97 complex exhibit a very similar oscillatory behaviour when tagged with either GFP or RFP (e.g. 
Figure 2C), and this behaviour is distinct to that observed with several other GFP- or RFP-tagged centriole proteins (e.g. Novak et al., Curr. Biol., 2014; Conduit et al., eLife, 2015; Aydogan et al., JCB, 2018; Aydogan et al., Cell, 2020) it seems very unlikely that this behaviour is induced by the GFP (or RFP) tag.

      The reviewer also suggests that we show the data with the endogenous promoter before we show the data with the ubiquitin promoter. As we now explain better (and show in Figure 4), this seems unnecessary as the proteins expressed from the ubiquitin promoter are probably actually expressed at levels that are more similar to the endogenous protein.

      The reviewer questions whether the oscillations we observe might be due to the centrioles simply moving up and down in the embryo during the cell cycle, and they suggest we monitor Asl behaviour to rule this out. We have previously shown that Asl-GFP levels do not oscillate; they remain constant throughout the cell cycle on old-mother centrioles, and grow approximately linearly throughout S-phase on new-mother centrioles (see Figure 1D in Novak et al., Curr. Biol., 2014).

      We were not sure we understood this point properly, so we copy the reviewer's comment in full here: "The authors mention (for instance on p. 3) that the inner cartwheel and the surrounding microtubules assemble at opposite ends of the daughter centriole. However, my understanding is that the short centrioles present in the fly embryo have an inner cartwheel that extends throughout the organelle, such that it might be moot to make a distinction between the two ends in this case. Moreover, it is also my understanding that this inner cartwheel is itself surrounded by microtubules, so that microtubule assembly might not be expected to occur strictly at the distal end no matter what." The reviewer is correct that Drosophila centrioles are short (~150nm) and that the cartwheel extends throughout the centriole. We think the reviewer is suggesting that it may not be relevant therefore whether the cartwheel and centriole MTs grow from opposite ends, as the activities that govern their growth may not be spatially separated? However, because cartwheels grow preferentially from the proximal-end (Aydogan et al., JCB 2018) while centriole MTs are assumed to grow preferentially from the distal (plus) end, there is an intrinsic problem in ensuring they grow to the same size, no matter how short or long the centrioles are. The reviewer is correct that one possible solution to this problem is that the centriole MTs actually grow from their minus ends, but this is not widely accepted (or even proposed). We have tried to explain this issue more clearly throughout the revised manuscript.

      The reviewer points out that the schematic illustrations in Figure 1A and 1C are inaccurate and unhelpful. We agree and have now redrawn these.

      The reviewer asks that we provide information about the eccentricities of the centrioles in the different datasets used to calculate the protein distributions shown in Figure 1, particularly as the data for Sas-4-GFP and Sas-6-GFP were obtained previously using a different microscope modality, making comparisons complicated. The point that comparing distance measurements across different datasets is difficult is an important one, and we now state that such comparisons should be treated with caution. However, we have not provided information on the distribution of centriole eccentricities in the different experiments as it wasn’t clear to us how this information could be used to make such comparisons more accurate (presumably the reviewer is suggesting we could apply a correction factor to each dataset?). The very tight overlap in the positioning of CP110/Cep97 fusions (Figure 1C) strongly suggests that any difference in the average centriole eccentricities of the different populations of centrioles analysed, which are already tightly selected for their en-face orientation (i.e. eccentricity

      The reviewer requested that we show the “noisy data” we obtained during mitosis that we excluded from our analysis in Figure 3. As we now explain in more detail (p8, para.2), there are two reasons why the data for mitosis in this experiment is “noisy”: (1) The protein levels on the centrioles are low in mitosis and the centrioles are more mobile, so they are hard to track; (2) The Asl-mCherry marker used to identify the mother centriole starts to incorporate into the daughter (now new mother) centriole during mitosis, making it difficult to unambiguously distinguish mothers and daughters. As a result, we cannot track and assign mother/daughter identity to very many centrioles during mitosis—although we now include some extra data-points during mitosis for the centrioles where we could do this (revised Figure 3C,D). Importantly, it is clear that this “noisy” data hides no surprises: one can see (Figure 3C,D) that the signal on the centrioles is simply low during mitosis and then starts to rise again as the embryos enter the next cycle. This is confirmed in the normal resolution data (Figure 2B,C; Movies S1 and S2) where we can track many more centrioles due to the wider field of view and because we do not have to discard centrioles in mitosis that we cannot unambiguously assign as mothers or daughters.

      The reviewer requests that we conduct a super-resolution Airy-scan analysis of CP110/Cep97 driven from their endogenous promoters (eCP110 or eCep97) to ensure that the oscillations we see with these lines (shown in Figure 4C,D) are also occurring at the daughter centriole—as we already show for the oscillations observed with the uCP110 and uCep97 lines (shown in Figure 4C,D, and analysed at super-resolution on the Airy-scan in Figure 3). This is technically very challenging as super-resolution techniques require a lot of light and the centriole signal in the eCP110/Cep97 embryos is very dim compared to uCP110/Cep97 embryos (Figure 4C,D). We have managed to do this for eCep97-GFP and confirmed that—even in these embryos that express Cep97-GFP at much lower levels than the endogenous protein (Figure 4A)—the “oscillation” is primarily on the daughter (Reviewer Figure 2). As this data is very noisy, and as the ubiquitin uCP110/Cep97 lines express these fusions at levels that are closer to endogenous levels (Figure 4A,B), we do not show this data in the main text.

      The reviewer also asks for clarification as to why we use the Airy-scan for some experiments and 3D-SIM for others. As we now explain (p8, para.1), 3D-SIM has better resolution than the Airy-scan, but it takes more time and requires more light—so we cannot use it to follow these proteins in living embryos. Thus, for tracking CP110/Cep97 throughout S-phase in living embryos we had to use the Airy-scan.

      The reviewer questions why in some experiments we analyse the behaviour of 100s of centrioles, whereas in others the numbers are much smaller (1-14 in Figure 3—note, the reviewer quoted this number as coming from Figure 4, but it actually comes from Figure 3, so we have assumed they mean Figure 3). We apologise for not explaining this properly. The super-resolution experiments in Figure 3 are performed on a Zeiss Airy-scan system, which has a much smaller field of view than the conventional systems we use in other experiments. Thus, we inherently analyse a much smaller number of centrioles in these experiments. In addition, as explained in point 6 above, in these experiments we need to analyse mother and daughter centrioles independently, and in many cases we cannot unambiguously make this assignment, so these centrioles have to be excluded from our analysis.

      The reviewer questions why we selected the 10 brightest centrioles for the analysis shown in Figure S1B,C (note, the reviewer states Figure S2 here, but it is the data shown in Figure S1B,C that is selected from the 10 brightest centrioles, so we assume this is the relevant Figure). We apologise for not explaining this properly. In these mutant embryos very little CP110-GFP localises to centrioles in the absence of Cep97, and vice versa, so we cannot track centrioles using our usual pipeline and instead have to select centrioles using the Asl-mCherry signal. As the difference between the WT and mutant embryos is so striking, we simply selected the brightest 10 centrioles (based on Asl-mCherry levels) in both the WT and mutant embryos for quantification. We could select more centrioles, or select centrioles based on different criteria, but our main conclusion—that the centriolar localisation of one protein is largely dependent on the other—would not change.

      The reviewer also questioned why we performed the analysis shown in Figure S2 (new Figure S3) during S-phase of nuclear cycle 14, when the rest of the manuscript focuses on nuclear cycles 11-13. We apologise for not explaining this properly. In cycles 11-13 centriolar CP110/Cep97 levels rise and fall during S-phase, whereas both proteins reach a sustained plateau during the extended S-phase (~1hr) of nuclear cycle 14—making it easier to analyse CP110/Cep97 levels in embryos when their centriole levels are maximal. We now explain this.

      The reviewer requests that we quantify the western blots shown in Figure 4 in the same way we do in Figure 8. To do this we would need to perform multiple repeats of these blots, and we did not perform these because the blots shown in Figure 4 largely recapitulate already published data (Franz et al., JCB, 2013; Dobbelaere et al., Curr. Biol., 2020). Moreover, as described in our response to Reviewer #2, these ECL blots are very sensitive, but highly non-linear, so we always compare multiple serial dilutions of the different extracts to try to estimate relative levels of protein expression. We now explain this in the M&M.

      The reviewer suggests the data shown in Figure 8 is a “straw man”: we really want to test whether modulating CP110/Cep97 levels modulates centriolar Plk4 levels, but instead we test how they modulate cytoplasmic Plk4 levels. The language here is harsh, as it suggests that our intention was to mislead readers into thinking that we have addressed a relevant question by addressing a different, irrelevant, one. We apologise if we have missed something, but we believe we do perform exactly the experiment that the reviewer thinks we should be doing—quantifying how centriolar Plk4 levels change when we modulate the levels of CP110 or Cep97 (Figure 7). It is clear from this data that modulating the levels of CP110/Cep97 does indeed modulate the centriolar levels of Plk4. In Figure 8 we seek to address whether this change in centriolar Plk4 levels occurs because global Plk4 levels in the embryo are affected—a very reasonable hypothesis, which this experiment addresses quite convincingly (although negatively).

      Minor Points:

      The reviewer highlights a small number of mistakes and omissions, all of which have been corrected.

      Finally, we would like to thank the reviewers again for their detailed comments and suggestions. We hope that you and they will agree that the changes we have made in response to these comments have substantially improved the manuscript and that it is suitable for publication in The Journal of Cell Science.

      Sincerely,

      Jordan Raff

      Reviewer Figure 1. CP110/Cep97 dynamics remain cyclical even when Cep97-GFP and CP110-GFP are expressed from their endogenous promoters in the absence of any endogenous protein. Graphs show how the levels (Mean±SEM) of centriolar CP110/Cep97-GFP change during nuclear cycle 12 in (A) Cep97-/- embryos expressing eCep97-GFP or (B) CP110-/- embryos expressing eCP110-GFP. CS=Centrosome Separation, NEB=Nuclear Envelope Breakdown. N≥11 embryos per group, average of n≥15 centrioles per embryo.

      Reviewer Figure 2. The cyclical recruitment of Cep97-GFP expressed from its endogenous promoter occurs largely at the growing daughter centriole. The graph quantifies the fluorescence intensity (Mean±SD) acquired using Airy-scan microscopy of eCep97-GFP on mother (dark green) and daughter (light green) centrioles in individual embryos over Cycle 12. CS = Centrosome Separation, NEB = Nuclear Envelope Breakdown. Data was averaged from 3 embryos as the number of centriole pairs that could be measured was relatively low (total of 2-8 daughter and mother centrioles per time point; in part due to the much dimmer signal of eCep97-GFP in comparison to uGFP-Cep97).

    1. This work has been peer reviewed in GigaScience (see paper https://doi.org/10.1093/gigascience/giac011), which carries out open, named peer-review.

      These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Gregg Thomas

      This paper presents 17 new insect genomes from the order of caddisflies (Trichoptera). The authors combine these genomes with 9 previously sequenced genomes to analyze genome size evolution across the order. They find that genome size tends to correlate with evolution of repeat elements, specifically expansion of transposable elements (TEs). Interestingly, the authors also notice that TE expansions also correlate with gene copy-number (or gene fragment copy-number), even of highly conserved genes used to assess genome completeness. Overall, I find this paper very well written and easy to follow. The genomic resources and analyses presented provide novel new resources and findings for insects in the order Trichoptera, with potential implications beyond. I have only minor suggestions before publication, outlined below.

      1. Regarding the TE and BUSCO gene fragment associations, while I think this is a really interesting analysis, I found the underlying models a bit difficult to understand. Line 236 reads, "To test whether repetitive fragments were due to TE insertions near or in the BUSCO genes or, conversely, due to the proliferation of 'true' BUSCO protein-coding gene fragments…" Is the idea that a BUSCO gene has been duplicated itself and then one copy is either fragmented by a TE insertion or hitch-hikes with a TE (as mentioned on line 501)? Or are these fragments only of BUSCO genes that didn't match a full BUSCO gene at all, but the fragments that did match had unexpectedly high coverage? I guess I'm just confused as to whether a gene duplication needs to precede the TE insertions/hitch-hiking, which is subsequently pseudogenized either prior to or because of the TE activity, or if these are gene losses. I understand how the TE could inflate the coverage of these fragments, but I guess I'm still not clear on how these fragments arise in the first place. Any clarification would be helpful! Also, if the case is that these are fragments of BUSCO genes that have no full matches in the genome, how might assembly contiguity or quality be affecting these matches?

      2. One thing that I noticed throughout the figures is that branch B1, leading to A. sexmaculata, the branch leading to clade A, and the branch leading to clade B (as labeled in Figures 1 and 2) appear to form a polytomy. I don't find this mentioned in the text and am wondering why this relationship remains unresolved with these data. I don't think this has any bearing on the results, since all analyses are done on the tips of the tree, but I think readers looking at these trees will want to know what is going on at that node.

      3. The authors use custom scripts for their BUSCO-TE correlation analysis and provide a link to a Box folder on line 514. I would request that these scripts be put somewhere more stable and accessible (e.g., github). Not only was I asked to login when clicking the link, but after I had done so that link didn't seem to exist.

      Minor/editorial points

      1. Would the authors be able to report concordance factors for the species tree? I think this should be easy enough with IQ-tree and is something I ask everyone to do. This may also help answer my question about the polytomy.

      2. The authors do a good job of mentioning and citing programs used throughout the manuscript but seem to skip this in the Assembly section (starting on Line 398). "First, we applied a long-read assembly method…" Which one? Same for "de novo hybrid assembly approaches." I see that assembly is covered in detail in the Supplement, but I think naming the main programs used (wtdbg2 and MaSuRCA) should be in the main text.

      3. Line 281-282: I think some of the brackets and parentheses here are mismatched or un-closed.

    1. This work has been peer reviewed in GigaScience (see paper https://doi.org/10.1093/gigascience/giac005), which carries out open, named peer-review.

      These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Paul Stewart

      Fahrner et al have produced a very nice manuscript and corresponding pipeline. They describe a collection of DIA tools in the Galaxy framework for reproducible and version-controlled data processing. These DIA tools are an excellent addition to the growing number of proteomics-centric tools already available in Galaxy. The reviewer could find no major revisions needed and therefore only requests a few minor revisions before this is ready for publication:

      Please include page numbers in the revised manuscript to make referencing the text easier.

      Page 6

      OpenSwath and PyProphet are cited and are also used in the manuscript. Please cite one or two alternatives.

      Please consider citing a tool each time it is used in a new paragraph (e.g. MSstats).

      There is heavy reliance on conjunctive adverbs (However, ...; Thus, ...) on this page and throughout the manuscript. These can make passages a bit hard to read. Please consider rephrasing.

      Page 7

      Why "so-called histories"? Aren't they simply "Histories"?

      Page 14

      'To decrease the analysis time of the semi-supervised learning, the merged OSW results can be first subsampled using the PyProphet subsample tool and subsequently scored using the PyProphet score tool. '

      The reviewer is not familiar with this approach. Can you please give additional justification (maybe under methods?) or provide a citation that this is a reasonable approach?

      Page 15

      Please check your reference software and/or work with the journal to ensure that the web addresses are linked properly. For example, the reviewer tried copying the link "https://training.galaxyproject.org/training- %20material/topics/proteomics/tutorials/DIA_lib_OSW/tutorial.html" but a "%20" (or a space) is inserted into the URL after "training-" so the link as it appears did not work until this was removed. A less technically savvy reader may think the links are broken and will not be able to access the materials.

      Page 16

      'We identified and quantified between 25.000 to 27.000 peptides ...'

      Please be consistent with number formatting (25000 vs 25.000). Other values in the tables did not use this formatting. Please check with journal editor for convention.

      Figures

      Please be consistent with axes labels. Some are upper case and some are lower case.

      Figure 2B

      Please round R2 to 2 or 3 decimals.

      Figure 3

      Please change the red-green color scheme to a more color-blind friendly color scheme (e.g. red blue)

    1. This work has been peer reviewed in GigaScience (see paper https://doi.org/10.1093/gigascience/giac001), which carries out open, named peer-review.

      These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Bo Li

      Single-cell RNA-seq has revolutionized our abilities of investigating cell heterogeneity in complex tissue. Generating a high-quality gene count matrix is a critical first step for single-cell RNA-seq data analysis. Thus, a detailed comparison and benchmarking of available gene-count matrix generation tools, such as the work described in this manuscript, is a pressing need and has the potential to benefit the general community.

      Although this work has a great potential, the benchmarking efforts described in the manuscript are not comprehensive enough to justify its publication at GigaScience unless the authors address my following major and minor concerns.

      Major concerns:

      1) The authors should discuss related benchmarking efforts and the differences between previous work and this manuscript in the Background section instead of the Discussion section. For example, Du et al. 2020 G3: Genes, Genomes, Genetics and Booeshaghi & Pachter bioRxiv 2021 should be mentioned and discussed in the Background section. In addition, the STARsolo manuscript (https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1), which contains a comprehensive comparison of CellRanger, STARsolo, Alevin and Kallisto-Bustools, should be cited and discussed. Zakeri et al. 2021 bioRxiv (https://www.biorxiv.org/content/10.1101/2021.02.10.430656v1) should also be included and discussed in the Background section.

      2) Benchmark with the latest versions of the software. The choice of Cell Ranger, STARsolo, Alevin and Kallisto-BUStools is good because they are four major gene count matrix generation tools. However, I urge the authors to also include CellRanger v6 and Alevin-fry (Alevin_sketch/Alevin_partialdecoy/Alevin_full-decoy, see STARsolo manuscript), which are currently lacking, in their benchmarking efforts. The authors may also consider adding STARsolo_sparseSA to the benchmark. Since single-cell RNA-seq tool development is a fast-evolving field, benchmarking up-to-date versions of tools is critical for a benchmarking paper.

      3) Conclusions. The authors summarized the observed differences between tools based on the benchmarking results. This is good but not very helpful for end-users. I recommend that the authors emphasize their recommendations for end-users more clearly in the discussion/results section. For example, do the authors recommend one tool over the others under certain circumstances? If so, which tool and which circumstance and why? I like Figure 5 a lot and hope the authors can summarize this figure better in the manuscript.

      4) This manuscript concluded that differential expression (DEG) results showed no major differences among the alignment tools (Figure 4). However, the STARsolo manuscript suggested DEG results are strongly influenced by quantification tools (Sec. 2.6, Figure 5). Please explain this discrepancy.

      5) This manuscript suggested simulated data is not as helpful as real data. However, the STARsolo manuscript reported drastic differences between tools using simulated data. Please comment on this discrepancy.

      6) I have big concerns regarding the filtered vs. unfiltered annotation comparison. In particular for pseudogenes, we know that many of them are merely transcribed or lowly transcribed. As a result, many of these pseudogenes would not be captured by the single-cell RNA-seq protocol. At the same time, because these pseudogenes share sequence similarities with functional genes, they would bring trouble for read mapping. This is one of the main reasons for using a carefully filtered annotation. Actually, whether and how to filter annotation is in active debate in big cell atlas consortia such as Human Cell Atlas. Thus, I would be super careful about describing results comparing filtered vs. unfiltered annotation. For example, in Suppl. Figure 8D, there are 6 mitochondrial genes that have 100% sequence similarity to their corresponding pseudogenes. It is impossible to distinguish if a read comes from a gene or a pseudogene for these 6 genes and it is also not necessary --- the transcribed RNA should also be exactly the same. Thus, I encourage the authors remove their pseudogenes from the annotation and I suspect the mouse data results should look similar to the human data in the Suppl. Figure 8A.

      7) The endothelial dataset was only run on CellRanger 3 because the UMI sequence is one base shorter. Could the authors augment the UMI sequence with one constant base and run this dataset through CellRanger 4/5/6?

      8) I think it is more appropriate to call the tools benchmarked as "gene count matrix generation tools" instead of "alignment tools".
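      The padding step suggested in point 7 can be sketched in Python as follows. This is only an illustration under stated assumptions: the read layout (cell barcode followed directly by the UMI at the start of read 1), the lengths `barcode_len=16` and `umi_len=9`, the pad base and quality character, and the function names `pad_umi` and `pad_fastq` are all hypothetical and are not taken from the manuscript. Whether CellRanger 4/5/6 would then accept the padded reads is not tested here.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def pad_umi(record: FastqRecord, barcode_len: int = 16, umi_len: int = 9,
            pad_base: str = "A", pad_qual: str = "I") -> FastqRecord:
    """Insert one constant base (and a matching quality character)
    directly after the UMI. All lengths are hypothetical defaults."""
    header, seq, sep, qual = record
    cut = barcode_len + umi_len  # first position after the assumed UMI
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < cut:
        raise ValueError("read is shorter than barcode + UMI")
    return (header,
            seq[:cut] + pad_base + seq[cut:],
            sep,
            qual[:cut] + pad_qual + qual[cut:])


def pad_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines into 4-line records and pad each one.
    A trailing incomplete record is silently dropped by zip()."""
    it = iter(lines)
    for header, seq, sep, qual in zip(it, it, it, it):
        yield pad_umi((header.rstrip("\n"), seq.rstrip("\n"),
                       sep.rstrip("\n"), qual.rstrip("\n")))


# Toy read 1: 16-base barcode, 9-base UMI, 2 trailing bases.
example = ["@read1\n",
           "ACGTACGTACGTACGT" + "TTTTTTTTT" + "GG\n",
           "+\n",
           "F" * 27 + "\n"]
padded = list(pad_fastq(example))
```

      Every padded UMI shares the same extra base, so the padding adds no information and should not change which UMIs are considered identical; it only changes the read length seen by the downstream tool.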

      Minor concerns:

      1) The Suppl Table 2 mentioned in the main text corresponds to Suppl. Table 3 in the attachment. In addition, there is no reference to Suppl Table 2.

      2) Suppl Table 3 PBMC, why do I see endothelial cell markers in PBMC dataset?

      3) Suppl Figure 7 is never referenced in the main text.

      4) Suppl Figure 8D is never referenced in the main text.

    1. This work has been peer reviewed in GigaScience (see paper https://doi.org/10.1093/gigascience/giab099), which carries out open, named peer-review.

      These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 3: idoia ochoa

      The authors present a novel tool for the compression of collections of bacterial genomes. The authors present sound results that demonstrate the performance gain of their tool, MBGC, with respect to the state-of-the-art. As such, I do not have concerns about the method itself. My main concerns are with respect the description of the tool, and how the results are presented. Next I list some of my suggestions (in no particular order):

      Main Paper:

      • Analysis section: Before naming MBGC specify that it is the proposed tool.

      • Analysis section: Reference for HRCM. Mention here also that other tools such as iDoComp, GDC2, etc. are discussed in the Supplementary (this way the reader knows more tools were analyzed or at least tried on the data).

      • Analysis section: The paragraph "Our experiments with MBGC show that... " is a little misleading, since it seems that the tool has the capacity to compress a collection and just extract a single genome from it. This becomes clear later in the text when it is discussed how the tool could be used to speed up the download of a collection of genomes from a repository. So maybe explain that in more detail here, or mention that it could be used to compress a bunch of genomes prior to download. And then point to the part of the text where this is discussed in more detail.

      • Analysis section: The results talk about the "stronger MBGC mode", the "MBGC max", but in the tables it reads "MBGC default" or "MBGC -c 3". I assume "MBGC -c 3" refers to "MBGC max", but it is not stated anywhere. Maybe better to call it "MBGC default" and "MBGC max".

      • Analysis section: Although the method is explained later in the text, it would be a good idea to give a sense of the difference between the default and max modes of the tool. Or some hints on the trade-off between the two. Also, the parameter "-c 3" is never explained.

      • Analysis section: Figures, it is difficult to see the trade-off between relative size and relative time, can you use colored lines? such that the same color refers to the same set of genomes. Also, in the caption, explain if we want small or high relative size and time. it may be clear, but better to clearly state it.

      • Analysis section: there is a sentence that says "all figures w.r.t. the default mode of MBGC". It would be good also to state that in the caption, so that the reader knows which mode of the tool is being used to generate the presented results, and whether the input files are gzipped or not. For example, for the following paragraph that starts with Fig. 1, it is not clear if the files are gzipped or not.

      • Analysis section: First time GDC2 is mentioned, the first thing that comes to mind is why it was not used for the bacterial experiments. See my previous point on having a couple of sentences about the other tools that were considered, and why they are not included in the main tables/figures.

      • Methods:

      -- Here I am really missing a diagram explaining the main steps of the tool. It seems the paper has been rewritten slightly to fit the format of the journal and some things are not in the correct order. For example, it says the key ideas are already sketched, but i do not think that is true.

      -- (offset, length) i assume refers to the position of the REF where the match begins, and the length of the match, but again, not really explained. A diagram would help. Also, when it is time to compress the pairs, are the offset delta encoded? or encoded as they are with a general compressor?

      -- How are the produced tokens (offset, length, literals, etc.) finally encoded?

      -- First time parameter "k" is mentioned, default value? Also, how can you do a left extension and "swallow" the previous match? Is it because the previous match could have been at another position? Otherwise, if it was in that position it would have been already extended to the right, correct? I mean, it would have generated a longer match.

      -- The "skip margin" idea is not well explained. not sure why the next position after a match is decreased by m. please explain better or use a diagram with an example.

      -- when you mention 1/192, maybe already state that this is controlled by the parameter u. otherwise when you mention the different parameters is difficult to relate them to the explanation of the algorithm.
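      To make the question above about delta-encoded offsets concrete, here is a minimal Python sketch of what delta encoding of (offset, length) pairs would look like. It is an illustration only, not a description of how MBGC encodes its tokens: the interpretation of `offset` as the match start in REF, the function names `delta_encode` and `delta_decode`, and the example values are all assumptions.

```python
from typing import List, Tuple

Match = Tuple[int, int]  # (offset into REF, match length), as assumed above


def delta_encode(matches: List[Match]) -> List[Match]:
    """Replace each offset by its difference from the previous offset.
    Lengths are left unchanged."""
    encoded = []
    prev = 0
    for offset, length in matches:
        encoded.append((offset - prev, length))
        prev = offset
    return encoded


def delta_decode(encoded: List[Match]) -> List[Match]:
    """Invert delta_encode by accumulating the differences."""
    matches = []
    prev = 0
    for delta, length in encoded:
        prev += delta
        matches.append((prev, length))
    return matches


# Hypothetical matches; nearby offsets give small deltas.
matches = [(1000, 40), (1045, 32), (980, 64)]
encoded = delta_encode(matches)
decoded = delta_decode(encoded)
```

      When consecutive matches point to nearby positions in REF, the deltas are small numbers that a general-purpose compressor can usually store more cheaply than the raw offsets, which is why the distinction the reviewer asks about matters for the final compressed size.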

      Availability of supp...

      -- from from (typo)

      Tables

      -- Specify the number of genomes in each collection.

      -- change MBGC -c 3 to MBGC max or something similar. (see my previous comment -c flag is not explained!)

      Supplementary Material

      -- move table 1 after the text for ease of reading

      -- not clear if the tool has random access or not. The percentage of time (w.r.t. decompressing the whole collection, I believe) that it would take to decompress one of the first genomes vs one of the last ones is discussed; this should be better explained. For example, if we decompress the last genome of the collection we will employ 100% of the time, right? Given that previous genomes are part of REF (potentially). Please explain better and discuss this point in the analysis part, not only in the supplementary. It seems like an important aspect of the algorithm.

      -- I assume this is not possible, but should be discussed as well. can you add a genome to an already compressed collection? this together with the random access capabilities will highlight better the main possible uses of the tool.

      -- section 4.3: here HT is used, and then HT is introduced in the next paragraph. please revise the whole text and make sure everything is in the right order.

      -- parameter m, please explain better.

      -- add colors to figures, it will be easier to read them.

      Overall, as I mentioned before, I believe the tool offers significant improvements with respect to the competitors for bacterial genomes, and performs well on non-bacterial genomes as well. What should be improved for publication is the description of the method, since at the end of the day it is the main contribution, and how the text is presented.