    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript has great potential. The study is well designed, performed, and written, with good statistical analyses. On the other hand, it does not have a sufficient experimental basis. The authors investigated whole body immunoglobulin diversity in killifish and found that it decreases with age. This decrease is mostly driven by larger clones, in other words, by the expansion of B cell clones. They further analyzed immunoglobulin diversity in the intestine and found that its decrease is much more pronounced than in the whole body. It was also observed that the transfer of the young gut flora to old fish does not rejuvenate the B cell repertoire. The major novelty of this work is the model organism, killifish. Also, while this study is solid, it is descriptive, without many mechanistic insights.

      We thank the reviewer for their frank assessment of our manuscript, as well as for their helpful suggestions of possible ways to dig deeper into the phenomenon of killifish repertoire ageing. We agree with the assessment that this study is primarily descriptive in nature, and that experimental interventions, including infection challenge studies, would help establish causal mechanisms. Nevertheless, we have provided new data supporting an association between loss of repertoire diversity and immune function (see our response below), which we believe supports the biological relevance of our findings.

      While our initial submission demonstrates that the diversity of the killifish repertoire declines with age, it is true that this does not necessarily imply that this decline is linked to changes in immune functionality. To provide functional insights into the transcriptomic signature associated with different antibody diversity orders, we now include an analysis linking repertoire diversity data in our intestinal cohort to pre-existing intestinal RNA-seq data from the same individuals (Figure 6). The combination of these two data sets allows us to analyse changes in gene expression with respect to intestinal antibody diversity, controlling for age. We find that a number of immune-activity GO terms, including “B cell receptor signaling pathway”, “B cell proliferation”, and “lymphocyte activation”, are significantly positively enriched with respect to repertoire diversity across multiple diversity orders. A decline in intestinal antibody diversity, as seen in ageing, is thus associated with a decline in B-cell immune activity in killifish. We acknowledge that confident demonstration of a causal link between repertoire diversity and immune state will require experimental challenge of host immunity, for example through infection experiments; this is something we will address in the future, but it is beyond the scope of this work. However, we believe these new data are sufficient to demonstrate a significant association between the two, supporting the biological relevance of the age-associated decline in diversity we observe.
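The notion of "diversity orders" above can be illustrated with a short sketch. It assumes that the orders refer to Hill numbers, a common way to summarise repertoire diversity; the function name and the toy clone sizes are hypothetical and are not taken from the authors' pipeline. Low orders weight all clones roughly equally, while higher orders are increasingly dominated by large clones, which is why expansion of a few clones lowers high-order diversity most.

```python
import math

def hill_diversity(clone_sizes, q):
    """Hill number of order q for a list of clone sizes (counts).

    Hypothetical helper for illustration only.
    """
    total = sum(clone_sizes)
    props = [n / total for n in clone_sizes if n > 0]
    if q == 1:
        # The q -> 1 limit is the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

# Toy repertoires with the same number of clones (richness) but different evenness.
even_repertoire = [10, 10, 10, 10]
expanded_repertoire = [37, 1, 1, 1]  # one large, expanded clone

for q in (0, 1, 2):
    print(q, hill_diversity(even_repertoire, q), hill_diversity(expanded_repertoire, q))
```

In this toy case both repertoires have the same order-0 diversity (clone count), but the repertoire with one expanded clone has lower diversity at orders 1 and 2.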

      Some of the following experiments, or other experiments, may help explore mechanisms and make the study more compelling: 1) whole genome sequencing of lymphoid tissues and brain as a control, from the same old fish to determine whether there are clonal somatic mutations. If confirmed, it may be an important finding, as it would mean that clonal expansions emerge as fast as the killifish lifespan, and it would be a great model to study mechanisms of mutation accumulation and clonal selection with age. This WGS data may be further used to reconstruct immunoglobulin repertoires to understand if the whole-body decrease is driven solely by intestine B cells, or it initiates in lymphoid tissues.

      We agree that further investigation of primary repertoire development in killifish lymphoid organs would be a valuable direction for future work, and would help disentangle whole-body from intestine-specific repertoire changes. However, we believe our current analysis is sufficient to demonstrate the presence of clonal somatic mutations in the whole-body repertoire. The pRESTO/Change-O pipeline used in our analysis can distinguish heavy-chain sequences arising from different naive ancestors, and the presence of large clones in the killifish repertoire (see e.g. Supplemental Figure 5A) necessitates rapid clonal expansion. Ongoing work in our group is indeed directed at studying somatic DNA sequence variation across tissues during aging in killifish, including alternative experimental approaches to investigating killifish repertoire aging. We have now added a sentence about these further research directions to the manuscript discussion. However, we feel these further experiments may be beyond the specific scope of the present work, which is focused on high-level changes in killifish antibody repertoire composition with age.

      2) RNA sequencing of intestine samples or spleen from young versus old killifish to obtain insights into possible molecular mechanisms of clonal expansion and diversity loss. Spleen RNA sequencing may be used to reconstruct the immunoglobulin repertoire. The authors used 750 ng of total RNA in the current study, so there should be enough material for RNA sequencing. As an alternative, single cell RNA sequencing may be performed.

      We certainly agree that investigation of repertoire aging in a wider array of immune organs, including spleen, would be highly valuable, and that killifish is a promising model organism in which to carry out these investigations. We have now included analysis of RNA-sequencing data from the killifish gut, which as discussed above supports an association between loss of repertoire diversity and immune function in that organ (see response to A.1). We hope for future work to more comprehensively explore the landscape of organ-specific repertoire ageing in the turquoise killifish; however, we feel that this would be beyond the scope of the present study.

      Reviewer #2 (Public Review):

      This study introduces the killifish as a short-lived vertebrate model for immune aging and immunosenescence and characterizes the changes in the immune-repertoire during aging. The authors convincingly show a decrease in diversity of the large expanded B-cell clones that is greater than small clones and a more pronounced change in the intestinal antibody repertoire with age. A limitation of the current study is its descriptive nature and lack of strong evidence that these animals truly experience functional immunosenescence. The impact of this work could be strengthened by functional data showing a decline in adaptive immunity that goes along with the loss of diversity in the antibody repertoire or citation and discussion of prior literature supporting this relationship. As it is, it is difficult to know the extent to which the observed changes are strongly correlated with changes in immune function, and the manuscript currently somewhat overstates the importance of the observations. It should be explicitly noted that further research is needed to determine whether the changes in immune-repertoire actually reflect immune senescence or simply changes with little or no consequence.

    1. Author Response:

      Thank you for taking the time to review the Digital Brain Bank, and for providing several suggestions to improve both the manuscript and website. We appreciate the positive comments surrounding our new resource, given the considerable effort that has been invested to date. Below, we provide a summary of the key changes that have been made to the Digital Brain Bank manuscript, reflecting the Editors’ and Reviewers’ suggestions.

      Resource Description

      We appreciate from the Reviewers’ comments that the description of the Digital Brain Bank as an “interactive data discovery and release platform” and a “cross-scale, cross-species investigation framework” does not reflect the current underlying functionality of the website. Although considerable effort has been made to enable users to visualise datasets directly on the website, the primary purpose of the Digital Brain Bank is a data release platform. We have adapted our wording to align with this, shifting the emphasis of the Digital Brain Bank as a data resource. We have additionally clarified the scope of datasets available in the resource, alongside the types of data available on the Digital Brain Bank website.

      Context of Resource

      The Reviewers noted that the original manuscript did not frame the Digital Brain Bank in the context of existing resources. In the revised manuscript, we have added a discussion of the Digital Brain Bank in terms of existing neuroimaging resources spanning multiple domains, including histology, transcriptomics, in vivo MRI & post-mortem MRI. We anticipate that the Digital Brain Bank will complement existing open-science initiatives in both human and non-human neuroimaging. We foresee the greatest overlap and integration between the Digital Brain Bank and existing in vivo and post-mortem MRI databases, where common signal-forming mechanisms facilitate comparisons.

      Web-based Image Viewer (Tview)

      Our web-based image viewer, Tview, provides visualisation of multi-scale (e.g. MRI & microscopy) data in a single 2D plane. This functionality was not readily available with existing viewers, requiring careful implementation due to the large size of the high-resolution microscopy datasets. The Reviewers note that Tview is only implemented for certain datasets in the first data release to the Digital Brain Bank. In the new manuscript we motivate this decision. Notably, several of the datasets in the first release are MRI-only. For these datasets, we found that a detailed static image was more suitable for visualisation.

      To further improve visualisation of these datasets, we are in the process of implementing a second web-based viewer to the Digital Brain Bank website, NiiVue. NiiVue is an open-source 3D volume viewer under active development. This will enable users to navigate 3D MRI datasets directly on the website, and supports overlays to localise the histology sampling location. These points are raised in the new manuscript, with an online NiiVue example available at https://niivue.github.io/niivue/features/overlay.multiplanar.html.

      Datasets

      Reviewer 1 raises that the Digital Anatomist and Pathologist categories have relatively few datasets. In the updated manuscript, we emphasise the uniqueness of the data available under these themes, and that the Digital Brain Bank represents one of the most substantial resources of its kind, providing data from 45 brains in total. We additionally provide further details of datasets which are intended for future release to the Digital Brain Bank. These are the Forget-Me-Not developing Human Connectome Project (dHCP) study - providing diffusion MRI datasets acquired in unfixed, post-mortem neonatal brains; BigMac dataset - providing in vivo MRI, post-mortem MRI, PLI and immunohistochemistry in a single, whole macaque brain; a cohort study combining multi-modal MRI and histology to investigate mouse models of ALS; and further primate species, alongside extensions into orders Carnivora and Rodentia.

      Corpus Callosum Analysis

      The corpus callosum analysis in Figure 3 has a small control cohort, and Reviewer 1 raises whether this analysis can produce meaningful results. We agree that the low number of controls and difficulty matching between groups is a major limitation of this analysis. Certainly, one would need to be cautious about interpreting any new observations based on our results. However, the purpose of this analysis was to demonstrate that we can use our data to replicate findings which have been previously reported in ALS (e.g. Chapman et al., 2014). This has been clarified in the new manuscript, alongside text to acknowledge the limitations of our analysis.

      MRI-Microscopy Registrations

      Co-registration between the MRI and microscopy data for the Human ALS MRI-Histology dataset is ongoing. As raised by Reviewer 1, coregistered MRI-microscopy datasets were previously available in only two brains. Since submission of the original preprint, we have additionally coregistered the PLP (myelin) staining data in multiple anatomical regions (5-8 regions per brain) for 13 brains in the Human ALS MRI-Histology dataset. These will now be available through the Digital Brain Bank.

      API, Metadata & Versioning

      Reviewer 3 raises that the resource is not currently designed for programmatic interactions or versioning. In the new manuscript we discuss why these are not yet implemented due to the current ad hoc nature of data access through signing MTAs via email. We have also taken the opportunity to outline our ambitions for incorporating these features in a future iteration of the Digital Brain Bank. Specifically, we intend on developing a new database to streamline data access and enable a programmatic interface. This database will perform user sign-up, authentication, and approval directly on the Digital Brain Bank website. This will enable approved users to access datasets directly on the website, which can readily incorporate stricter standards for linking data and dataset tracking.

    1. Author Response:

      Reviewer #2 (Public Review): Gaffield and Christie trained mice on an interval task of self-initiated bouts of licking to understand how cerebellar activity relates to the organization of well-timed transitions to motor action and inaction during discontinuous, periodically performed movements. By recording and optogenetically stimulating the activity of Purkinje cells, they concluded that the cerebellum encodes and influences the motor transitions, that is, the initiation and termination of discontinuous movements. The conclusion of the paper is very interesting and potentially provides insights on the neural mechanism of the previously proposed principle that the cerebellum controls the timings of discrete movements (Ivry et al. 2002). However, I have concerns about the logic and interpretation leading to the conclusion, which the authors need to address. [Major comments]

      We thank the reviewer for their positive evaluation of our work and their helpful comments. We have substantially altered our manuscript to address their concerns, including an entirely new figure as well as additional supplemental figures.

      First, the activity of Purkinje cells can largely encode each bout of licking movements, in addition to initiation and termination of movements. Figure 2BCEF displays the peak of neural activity around the water time and Figure 2DG indicates the relationship between the neural activity and lick rate. The encoding of the initiation and termination alone cannot explain these observations. Related to this, none of the panels in Figure 2BCEF shows a lead of the onset of neural activity to that of the lick rates (around -5 sec to water time). This looks inconsistent with the lead shown in Figure 3. The authors need to explain why such an inconsistency can happen.

      We agree that Crus I and II PCs encode parameters of licking bouts in addition to movement initiation and termination and deeply apologize for not making this point more clearly. To address this concern, we have extensively edited the text in several sections and have added an additional figure to emphasize the richness of the PC representation of behavioral attributes, beyond just initiation and termination alone. We disagree that there is an inconsistency in the lead-time differences in our datasets. As the reviewer points out, the water-delivery-aligned firing rate z-scores do not seem to lead the licking rate (Fig. 2B-E). However, these data are averaged across trials with a high variance in the timing of lick initiation relative to water delivery; consequently, it is not possible to assess the timing of PC activity relative to lick bout initiation from these panels. When, by contrast, data are aligned to well-defined licking bouts (i.e., bouts with no licking in the preceding 2 s), it becomes clear that PC firing ramps up in advance of the bouts (Fig. 4C-D). We have edited the text, explaining this rationale, as requested by the reviewer.

      Second, the positive sign of neural modulation indicates biased recording sites. So far, many studies have been indicating the increasing firing modulation at the deep cerebellar nuclei in cerebellar timing tasks and motor tasks (e.g. Ten Brinke et al. 2017 eLIFE for the eyeblink conditioning; Ohmae et al. 2017 JNS for a self-initiate timing task; Becker and Person 2019 Neuron). Ramping-up modulation of Purkinje cells is not able to activate the deep cerebellar nuclei. When the motor-driving module generates negative modulation of Purkinje cells, the neighboring modules can generate positive modulation (e.g. Ten Brinke et al. 2017 eLIFE; De Zeeuw 2021 Nat Rev; Ohmae and Medina 2014 Soc. Neurosci. Abstr.). Because the neighboring modules are much wider than the motor-driving module, recording without identifying the driving modules, as in this study, will result in the recording being biased toward the adjacent modules.

      We too were surprised that we did not observe more negatively modulating PCs. However, our craniotomy was relatively large (>2 mm square) exposing an area over Crus I and II that encompassed zebrin bands 7+, 6-, and 6+. We randomly sampled PC activity within this region, so we don’t think our recordings were necessarily “biased”. We are unaware of any definitive experiments showing whether positively and negatively modulated PCs form separate, or convergent, channels of output onto their postsynaptic targets in the cerebellar nuclei. If convergent, then the response of the nuclear neurons will be determined by an ensemble of PCs with time-varying signs of activity, in addition to the integration of the activity from pontine collaterals.

      We thank the reviewer for highlighting the developing idea of motor and non-motor cerebellar modules and the loops formed by their connectivity. We have edited our text to address how our recordings could fit into such an organizational scheme and have cited their recent unpublished preprint on this topic, now available on bioRxiv (Ohmae et al. 2021). However, we believe several considerations suggest that both positive and negative modulation of Purkinje cell firing rates will impact movement. (1) Large regions of the cerebellar cortex are capable of evoking or modulating movements when microstimulation is applied. Similarly, optogenetic suppression of IntA activity increases the outward velocity of reaching movements in mice (Becker & Person 2019). (2) In contrast with delay eyeblink conditioning, in which the motor output is an impulse-like twitch, rhythmic movements of the tongue (or, similarly, the limbs) require alternating recruitment and de-recruitment of muscles. Thus, motor commands will necessarily be multiphasic in time, and will tend to be out of phase for populations controlling antagonistic muscles. (3) Excitation of the DCN by collaterals of mossy fibers will likely modulate, and perhaps override, Purkinje cell inhibition. Therefore, further work will certainly be necessary to decipher exactly how potential antagonistic cerebellar modules participate in organizing complex motor actions.

      Third, the authors used z scores for the unit of spike rate, but it is more appropriate to use spike per second as in Figure 3CD. In particular, I do not understand the meaning of difference of spike rate in the unit of z score in Figure 3E. The spike rate modulation in Figure 4E looks small which should be evaluated in the unit of spike per second as well. For the analysis of the last lick, the spontaneous spike rates should be displayed, instead of (or in addition to) the spike rate in the middle of lick bouts which should be much higher than the spontaneous spike rate according to Figure 2.

      We appreciate the reviewer’s input regarding style, but the current standard in the neurophysiology field is to report firing rate comparisons from a neural population as z-scores. Z-scoring is particularly useful because this metric provides a probability of an individual score occurring within a normal distribution, as well as comparisons of different scores from different normal distributions; it also gives an indication of how far the raw score differs from the mean, information that isn’t available in spike rate comparisons alone. For these reasons, we elect to not change how we represent our data. However, we have modified our figures to report firing rates for traces from individual example cells as z-scoring is not appropriate for this purpose.
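The trade-off between z-scores and raw spike rates discussed here can be made concrete with a minimal sketch. It assumes that each cell is normalised by the mean and standard deviation of its own baseline firing; the baseline window, the function name, and the example rates are hypothetical and are not taken from the manuscript.

```python
import statistics

def zscore_rates(rates, baseline):
    """Express firing rates (spikes/s) as z-scores relative to a baseline period.

    Hypothetical helper: the choice of baseline is an assumption of this sketch.
    """
    mu = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return [(r - mu) / sd for r in rates]

# Two example cells that both increase their firing by 10 spikes/s.
baseline_a = [40, 42, 38, 41, 39]   # steady cell
baseline_b = [80, 90, 70, 85, 75]   # more variable cell

z_a = zscore_rates([50.0], baseline_a)[0]
z_b = zscore_rates([90.0], baseline_b)[0]
print(z_a, z_b)
```

The same absolute change corresponds to a larger z-score in the steadier cell, which is what makes z-scores comparable across a population but also why they hide the absolute rate that the reviewer asks about.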

      Fourth, I did not understand the conclusion for the optogenetic perturbation. In the result section for Figure 7, I think there is a logical gap between the last conclusion sentence and the sentences before it. The suppression of lick bouts in Figure 7D and the rebound induction in Figure 7G can be explained by the cerebellar contribution to each bout of lick movement (shown in Figure 2). I do not understand if these observations indicate the cerebellar contribution to the initiation and termination of a sequence of lick movements. Also, I have a concern about the location of stimulation sites. The stimulation may cover both the motor-driving module and neighboring modules, which makes the observations difficult to interpret because the stimulation is not specific to the positively modulating Purkinje cells.

      A lick bout is composed of a sequence of tongue protrusions and retractions performed at a highly regular rhythm. Apart from the first lick (Bollu et al., 2021), the motor command for this behavior is under the control of central pattern generators in the brainstem. Said another way, a lick bout is a continuous movement rather than a series of discrete actions that are repeatedly started and stopped (they are like stepping during locomotion in some animals). Lick bout initiation and directional control of the bout can be commanded by the cerebral cortex. Given this organization, we do not believe our optogenetic experiment can be interpreted as an effect on the initiation and termination of individual licks because licks are not discrete actions when performed in a consummatory bout. However, based on the reviewer’s recommendation, we investigated how PCs encode information pertinent to individual licks in a bout (Figure 3). Although there was entrainment to individual lick cycles, there were no time-locked responses apparent in their average activity. Instead, there was a continuous mapping of the lick cycle across their population. Notably, licking rhythmicity was disrupted by the optogenetic perturbation, consistent with the influence of PC output on this movement parameter. We have edited the text to address these concerns.

      Fifth, for Figure 8, I had difficulty understanding what kind of Purkinje cell activity can explain the shift of the peak timing of lick rate, because in the result sections of Figures 2-6 I could not find any activity encoding the peak timing of lick rate. For Figure 8EFG, the analysis may not be correct. Because lick onset can be delayed with the photostimulation, in Figure 8E the boundary of onset corresponding to the 1 s in control should be 1+alpha in stimulation trials to correctly pick up the corresponding trials. Because we do not know the exact values of alpha, I think this analysis is not possible.

      PC ramping activity may contribute to the vigor of the ensuing licking response which would dictate peak licking rate timing. In fact, in many individual PCs, we observed correlations between PC firing and lick rate indicating a relationship. However, this was not borne out in the population response, so we did not pursue it further.

    1. Author Response:

      Reviewer #1 (Public Review):

      The underlying data are dominated by data from the UK Biobank, which means that, in effect, only few samples for the 25-50 age group are available. This may not be a big issue in terms of estimating smooth trajectories, but may limit comparisons to the reference model in certain cases (e.g. early disease onset) where this age range may be of particular interest.

      We show per site evaluation metrics, cross validation, and additional transfer examples. These additional analyses show that the model performance is not driven solely by the UKB sample. However, we agree with this comment and have also updated the limitation section (in the Discussion) regarding the overrepresentation of UKB and included a statement regarding the known sampling bias of UKB.

      The manual QC data is somewhat limited as it is based on a predominantly younger cohort (mean age ~30yrs). Furthermore, the number of outcome measures (cortical thickness and subcortical volume) and the number of data modalities (only structural MRI) are limited. However, as the authors also state, these limitations can hopefully be addressed by incorporating new/additional data sets into the reference models as they become available.

      We have added further details regarding the quality checking procedure to the methods section and improved the clarity of directions for implementing the scripts, including an interactive link to view an example of the manual QC environment, on the QC GitHub page to enable others to reproduce our manual QC pipeline.

      Reviewer #2 (Public Review):

      1. The evidence that the model will generalize ("transfer" as per the authors) to new, unseen sites, is very limited. To robustly support the claim that the model generalizes to data from new sites, a cross-validation evaluation with a "leave-one-site-out" (or leave-K-sites-out) folding strategy seems unavoidable, so that at each cross-validation split completely unseen sites are tested (for further justification of this assertion, please refer to Esteban et al., (2017)). The "transferability" of the model is left very weakly supported by figures 3 and 4, whose interpretation is very unclear. This point is further developed below, regarding the overrepresentation of the UK Biobank dataset.

      We thank the reviewers for this suggestion and have addressed the concern regarding generalizability in several ways. First, we ran an additional 10 randomized train/test splits of the data in the full sample. These new analyses show the stability of our models, as there is very little variation in the evaluation metrics across all 10 splits. These results are visualized in Figure 3 – Supplement 2. However, the static Figure 3 – Supplement 2 is challenging to read, simply because there are many brain regions fit into a single plot. Therefore, we also created an online interactive visualization tool that shows the image of the brain region and the explained variance when you hover over a point (see the screenshot of the online tool below). This interactive visualization was created for all supplemental tables for easier exploration and interpretations and we now recommend this tool as the primary method to interrogate our findings interactively. Second, we updated and expanded the transfer data set to include 6 open datasets from OpenNeuro.org (N=546) and we provide this example dataset on our GitHub with the transfer code. This simultaneously provides a more comprehensive evaluation of the performance of our model on unseen data and more comprehensive walk-through for new users applying our models to new data (sites unseen in training). Finally, we added per-site evaluation metrics (Figure 3 – Supplement 3) to demonstrate that performance is relatively stable across sites and not driven by a single large site (i.e., UKB). As site is strongly correlated with age, these visualizations can also be used to approximate model performance at different age ranges (i.e., 9–10-year-old performance can be assessed by looking at ABCD sites evaluation metrics, and 50–80-year-old performance can be assessed by looking at UKB evaluation metrics). 
      Moreover, we would also like to emphasize that we should not expect that all sites achieve the same performance because the sampling of the different sites is highly heterogeneous in that some sites cover a broad age range (e.g., OASIS, UKB) whereas other sites have an extremely narrow age range (e.g., ABCD).
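The evaluation strategy described in this response (repeated randomized train/test splits with metrics reported per site) can be sketched as follows. This is only an illustration under stated assumptions: an ordinary least-squares line on age stands in for the authors' normative models, the data are synthetic, and the site names, function names, and parameters are hypothetical.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """Fraction of variance in y_true accounted for by y_pred."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

def repeated_split_per_site(age, y, site, n_splits=10, test_frac=0.5, seed=0):
    """Refit on random train/test splits and score the test set separately per site."""
    rng = np.random.default_rng(seed)
    n = len(y)
    results = {str(s): [] for s in np.unique(site)}
    for _ in range(n_splits):
        perm = rng.permutation(n)
        n_test = int(n * test_frac)
        test, train = perm[:n_test], perm[n_test:]
        # Stand-in model (assumption): a straight line fitted on the training half.
        slope, intercept = np.polyfit(age[train], y[train], deg=1)
        pred = slope * age[test] + intercept
        for s in results:
            mask = site[test] == s
            if mask.sum() > 1:
                results[s].append(explained_variance(y[test][mask], pred[mask]))
    return results

# Synthetic cohort: one large site with older participants, two smaller sites.
rng = np.random.default_rng(1)
site = np.repeat(np.array(["site_a", "site_b", "site_c"]), [300, 100, 100])
age = np.concatenate([rng.uniform(45, 80, 300),
                      rng.uniform(8, 30, 100),
                      rng.uniform(20, 60, 100)])
y = 3.0 - 0.01 * age + rng.normal(0, 0.1, size=age.size)

results = repeated_split_per_site(age, y, site)
for s, ev in results.items():
    print(s, round(float(np.mean(ev)), 3), round(float(np.std(ev)), 3))
```

A small spread of a site's scores across splits is what "stability" would look like in such a sketch. Sites with a narrow age range have less age-related variance to explain, so their per-site explained variance can be lower even when the fit is equally good, which is in line with the authors' caution about heterogeneous site sampling.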

      2. If I understand the corresponding tables correctly, it seems that UK biobank data account for roughly half of the whole dataset. If the cross-validation approach is not considered, at the very (very) least, more granular analyses of the evaluation on the test set should be provided, for example, plotting the distribution of prediction accuracy per site, to spot whether the model is just overfitted to the UKB sample. For instance, in Figure 4 it would be easy to split row 2 into UKB and "other" sites to ensure both look the same.

      We have addressed this comment in response to Reviewer 1 above.

      3. Beyond the outstanding work of visually assessing thousands of images, the Quality Control areas of the manuscript should be better executed (particularly lines 212-233): 3.a. The overall role of the mQC dataset is unclear. QC implies a destructive process in which subpar examples of a given dataset (or a product) are excluded and dropped out of the pipeline, but that doesn't seem to be the case for the mQC subset, which seems to be a dataset with binary annotations of the quality of the FreeSurfer outcomes and the image.

      We have addressed this in response to Reviewer 1 above. We included the manual QC in this work, because in prior work by our group (https://www.biorxiv.org/content/10.1101/2021.05.28.446120v1.abstract) that leveraged big data and relied on automated QC, reviewers often criticized this approach and claimed our results could be driven by poor quality data. Thus, in this work we wanted to compare the evaluation metrics of a large, automated QC data set with the manual QC dataset to show very similar performance.

      3.b The visual assessment protocol is insufficiently described for any attempt to reproduce: (i) numbers of images rated by author SR and reused from the ABCD's accept/reject ratings; (ii) of those rated by author SR, state how the images were selected (randomly, stratified, etc.) and whether site-provenance, age, etc. were blinded to the rater; (iii) protocol details such as whether the rater navigated through slices, whether that was programmatic or decided per-case by the rater, average time eyeballing an image, etc; (iv) rating range (i.e., accept/reject) and assignment criteria; (v) quality assurance decisions (i.e., how the quality annotations are further used)

      These details have been added to the methods section where we describe the manual QC process. We have also updated the QC GitHub with more detailed instructions for use and included a link to view an example of the manual QC environment.

      3.c Similarly, the integration within the model and/or the training/testing of the automated QC is unclear.

      The responses to Reviewer 1 above and our revisions to the methods section should also clarify this. In brief, QC was performed on the data prior to splitting of the data to assess generalizability.

      Additional comments

      • Repeated individuals: it seems likely that there are repeated individuals, at least within the UKB and perhaps ABCD. This could be more clearly stated, indicating whether this is something that was considered or, conversely, that shouldn't influence the analysis.

      We have clarified in the methods section that no repeated subjects were used in the dataset.

      • Figure 3 - the Y-axis of each column should have a constant range to allow the suggested direct comparison.

      We have changed Figure 3 to have a constant range across all test sets.

      • Tables 5 through 8 are hard to parse - They may be moved to CSV files available somewhere under a CC-BY or similarly open license, and better interpreted with figures that highlight the message distilled from these results.

      We agree with the reviewer about the difficulty in summarizing such a large number of results in an easily digestible manner and that tables are not the optimal format to achieve this. Therefore, we have created interactive visualizations for Tables 5-8 that make exploring the evaluation metrics much easier. All the CSV files are also hosted on our GitHub page in the metrics folder (https://github.com/predictive-clinical-neuroscience/braincharts/tree/master/metrics).

      • Lines 212-214 about the QA/QC problem in neuroimaging are susceptible to misinterpretation. That particular sentence tries to bridge between the dataset description and the justification for the mQC sample and corresponding experiments. However, it fails in that objective (as I noted as a weakness, it's unclear the connection between the model and QC), and also misrepresents the why and how of QC overall.

      We have considerably expanded upon our motivation for using a manual QC approach and the steps this entails, which should address this issue.

      • The fact that the code or data are accessible doesn't mean they are usable. Indeed, the lack of license on two of the linked repositories effectively pre-empts reuse. Please state a license on them.

      We thank the reviewer for this suggestion. We have updated both repositories to include a license file.

      • Figure 1 - caption mentions a panel E) that seems missing in the figure.

      We have corrected this mistake in the caption of Figure 1.

      • There is no comment on the adaptations taken to execute FreeSurfer on the first age range of the sample (2-7 yo.).

      We did not adapt the FreeSurfer pipeline for this age range and have added this to the limitations section.

      • Following up on weakness 3.c, while scaling and centering is a sensible thing to do, it's likely that those pruned outliers actually account for much of the information under investigation. Meaning, EC is a good proxy for manual rating - but Rosen et al. demonstrate this on human, neurotypical, adult brains. Therefore, general application must be dealt with care. For example, elderly and young populations will, on average, show substantially more images with excessive motion. These images will go through FreeSurfer, and often produce an outlier EC, while a few will yield a perfectly standard EC. Typically, these cases with standard ECs are probably less accurate on the IDPs being analyzed, for example, if prior knowledge biased more the output for the hidden properties of this subject. In other words, in these cases, a researcher would be better off actually including the outliers.

      This is an important point to raise. We agree with the reviewer that the Euler Characteristic is likely correlated with pathology in addition to data quality (e.g., due to movement artefacts), and that this is important to consider when modeling clinical populations while also ensuring high-quality data. First, we point out that the inclusion threshold is mostly important for the estimation of the normative model, which in our work – like Rosen et al. – is based on healthy control data. It is easy to repeat predictions for subsequent clinical samples using a more lenient inclusion threshold (or none at all) in cases where this consideration might be operative. Second, in an effort to strike the right balance, we have chosen the EC threshold quite conservatively in that it excludes only subjects that are very far into the tail of the (rescaled and centered) EC histogram. This means that we are likely dropping only subjects with true topological defects. This is also an important motivation for the careful manual QC procedures we describe above. That said, any heuristic is necessarily imperfect, which we acknowledge in the limitations section and in the methods.
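
      The conservative exclusion rule described above can be illustrated with a minimal sketch. It assumes that the EC is rescaled with a square root, centered on the per-site median, and compared against a fixed cutoff; the function name `flag_ec_outliers`, the transform, and the cutoff value are hypothetical choices made for illustration and are not taken from the manuscript.

```python
import math
from statistics import median


def flag_ec_outliers(ec_by_site, cutoff=10.0):
    """Return {site: [bool, ...]}, where True marks a subject to exclude.

    ec_by_site maps a site label to a list of raw Euler Characteristic
    values (more negative values indicate more surface defects).
    The sqrt rescaling, median centering, and cutoff are assumptions.
    """
    flags = {}
    for site, values in ec_by_site.items():
        # Rescale: square root of the magnitude of the raw EC.
        rescaled = [math.sqrt(abs(v)) for v in values]
        # Center on the site median so sites share a common reference.
        center = median(rescaled)
        # Exclude only subjects far into the upper tail of the histogram.
        flags[site] = [(r - center) > cutoff for r in rescaled]
    return flags


# Toy values, not real data.
example = {"site_a": [-20, -30, -25, -900], "site_b": [-40, -45, -50]}
flags = flag_ec_outliers(example)
print(flags)
```

      Passing a larger `cutoff` (or skipping the function entirely) corresponds to the more lenient inclusion that may be preferable for clinical samples.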

      • Title: "high precision" - it is unclear what precision this is qualifying as high. Is it effectively spatial granularity, given the large number of ROIs being modeled, or is it because the spread of the normative charts is narrow along the lifespan as compared to some standard of variability?

      We refer to spatial precision in terms of the granularity of the regions of interest that we estimate models for. We have revised the manuscript throughout to make this more explicit.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this report, Shekhar et al. have profiled developing retinal ganglion cells from embryonic and postnatal mouse retina to explore the diversification of this class of neurons into specific subtypes. In mature retina, scRNAseq and other methods have defined approximately 45 different subtypes of RGCs, and the authors ask whether these arise from a common postmitotic precursor or from many distinct subtypes of precursors. The overall message is that subtype diversification arises as a "gradual, asynchronous fate restriction of postmitotic multipotential precursors". The authors find that over time, clusters of cells become "decoupled" as they split into subclusters. This process of fate decoupling is associated with changes in the expression of specific transcription factors. This allows them to both predict lineage relationships among RGC subtypes and the time during development when these specification events occur. Although this conclusion is based almost entirely on a computational analysis of the relationships among cells sampled at discrete times, the evidence presented supports the overall conclusion. Future experimental validation of the proposed lineage relationships of RGC subtypes will be needed, but this report clearly outlines the overall pattern of diversification in this cell class.

      We thank the reviewer for their thoughtful assessment of our study.

      Reviewer #2 (Public Review):

      The manuscript "Diversification of multipotential postmitotic mouse retinal ganglion cell precursors into discrete types" by Shekhar and colleagues represents an in-depth analysis of an additional transcriptomic dataset of retinal single cells. It explores the progression of retinal ganglion cell diversity during development and describes some aspects of fate acquisition in these postmitotic neurons. Altogether the findings provide another resource on which the neural development community will be able to generate new hypotheses in the field of retinal ganglion cell differentiation. A key point that is made by the authors regards the progression of the number of ganglion cell types in the mouse retina, i.e., how and when neuronal "classes diversify into subclasses and types" (also p. 125). In particular, the authors would like to address whether postmitotic neurons follow either a predetermination or a stepwise progression (Fig. 2a). This is indeed a fascinating question, and the analysis, including the one based on the Waddington-OT method, is conceptually interesting.

      Comments and questions:

      Is the transcriptomic diversity, based on highly variable genes (the number of which is not detailed in the study), a robust proxy to assess cell types? One could argue that early on predetermined cell types are specified by a small set of determinants, both at the proteomic and transcriptomic level, and that it takes several days or weeks to generate the cascade that allows the detection of transcriptional diversity at the level of >100 gene expression levels.

      We tested the dependence of our results on the number of highly variable genes (HVGs) used. This analysis, shown in Figure 2h, demonstrates that results are robust over the range tested – 1244-3003 total HVGs. Since the analysis in the paper employs 2800 HVGs (~800-1500 at each stage), we are confident that we are in comfortable excess of the number at which we would need to worry. We have expanded the discussion to avoid confusion on this point. We also address the possibility that a small set of determinants is sufficient to define cell state in a transcriptomic study. This is a common argument, but we believe it is a tenuous one. We believe that the only way a small number of genes can truly define cell state is if they are expressed at very high levels. If these are expressed at high levels, they should be detected in our data and should drive the clustering. If they are expressed at extremely low levels, then given the nature of molecular fluctuations in cells, they cannot be expected to serve as a stable scaffold for differentiation. Indeed, a small set of determinants (usually transcription factors) may be necessary to specify a cell type. However, sufficiency of specification requires the expression of a usually much larger number of downstream regulators.

      Since there are many RGC subsets (45) that share a great number of their gene expression, is it possible that a given RGC could transition from one subset to another between P5 and P56? Or even responding to a state linked to sustained activity? Was this possibility tested in the model?

      We cannot address the possibility that cells swap types postnatally so that the cells comprising type X at P5 are not the same ones that comprise type X at P56. It does seem pretty unlikely, as the cell types are well-separated in transcriptional space (~250 DE genes on average). Regarding activity, we have made some initial tests by preventing visually evoked activity from birth to P56 in three different ways (dark-rearing and two mutant lines). We find no statistically significant effect on diversification. These results are currently being prepared for publication.

      The authors state that early during development there is less diversity than later. This statement seems obvious, but how much less? Can this be due to differences in differentiation stage? At E16, RGCs are a mix of cells born from E11 to E16, with the latter barely located in the GCL. Does this tend to show a continuum that is probably lost when the analysis is performed on cells isolated a long time after they were born (postnatal stages)? Alternatively, would it be possible to compare RGCs that have been labeled with birth dating methods?

      Regarding the amount of diversification, we quantified this using the Rao diversity index (Figure 2h), which suggests an overall 2-fold increase in transcriptional diversity at P56 compared to the early stages. The continuum is likely because cells at early stages are close to the precursor stage and not very differentiated. Regarding combining RNA-seq with birthdating, although elegant methods now make this combination possible, it falls beyond the scope of this study.
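
      For readers unfamiliar with this kind of index, a minimal sketch follows. It assumes that the Rao diversity index refers to Rao's quadratic entropy, Q = Σ_i Σ_j d_ij p_i p_j, where p_i is the relative frequency of cluster i and d_ij is a dissimilarity between clusters i and j. How the dissimilarities were computed is not stated in this response, so the matrices below are toy values and the function name is hypothetical.

```python
def rao_quadratic_entropy(proportions, distances):
    """Rao's quadratic entropy: sum over i, j of d_ij * p_i * p_j.

    proportions: list of cluster frequencies summing to 1.
    distances:   square matrix (list of lists) of pairwise dissimilarities,
                 with zeros on the diagonal.
    """
    k = len(proportions)
    return sum(
        proportions[i] * proportions[j] * distances[i][j]
        for i in range(k)
        for j in range(k)
    )


# Toy example: two equally frequent clusters versus three.
q_two = rao_quadratic_entropy([0.5, 0.5], [[0, 1], [1, 0]])
q_three = rao_quadratic_entropy(
    [1 / 3, 1 / 3, 1 / 3],
    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],
)
print(q_two, q_three)
```

      In this toy case the index rises when a third, equally distant cluster is added, which is the sense in which a higher value indicates greater diversity.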

      Comparing data produced by different methods can be challenging. Here the authors compared transcriptomic diversity between embryonic datasets produced with 10X Genomics (E13 to P0) and, on the other hand, postnatal P5 data that were produced using a different procedure (Drop-seq). Is it possible to control that the differences observed are not due to the different methods?

      It is correct that most of the P5 data was produced using Drop-seq, but that dataset also includes transcriptomes obtained by the 10X method. The relative frequency of RGC clusters and the average gene expression values obtained using either method were highly correlated (Reviewer Fig. 1). This is now pointed out in the “Methods.”

      Reviewer Fig. 1. Comparison between the relative frequency of types (left) and the average gene expression levels (right) at P5 between 10X data (y-axis) and Drop-seq data (x-axis). R corresponds to the Pearson correlation coefficient. The axes are plotted in the logarithmic scale.
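
      A comparison of this kind can be sketched as follows. The sketch assumes that the Pearson coefficient is computed on log10-transformed values; the caption states only that the axes are logarithmic, so this is an assumption. The function name and the numbers are illustrative and are not the data behind the figure.

```python
import math


def pearson_log(x, y):
    """Pearson correlation of log10-transformed, strictly positive values."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Covariance and standard deviations (the common 1/n factor cancels).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in lx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ly))
    return cov / (sd_x * sd_y)


# Toy cluster frequencies from two hypothetical platforms.
freq_dropseq = [0.01, 0.10, 0.30]
freq_10x = [0.02, 0.20, 0.60]
r = pearson_log(freq_dropseq, freq_10x)
print(round(r, 6))
```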

      It might be important to control the conclusion that diversity is lower at E13 vs P5 when we see that three times fewer cells (5900 vs 18000) were analyzed at the early stage (BrdU, EdU, CFSE...). A simple downsampling prior to the analysis may help.

      Although we collected different numbers of cells at different ages, we noted in the text that these differences do not influence the number of clusters. Regarding P5 specifically, Rheaume et al. (whom we now discuss) obtained very similar results to ours with only 6000 cells (3x fewer).
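
      The downsampling control suggested by the reviewer can be sketched as follows. This is only an illustration of the idea, assuming that cells are drawn uniformly at random without replacement so that every stage matches the smallest one; the function name and the toy cell identifiers are hypothetical, and this is not the procedure used in the study.

```python
import random


def downsample_to_smallest(cells_by_stage, seed=0):
    """Randomly subsample each stage to the size of the smallest stage.

    cells_by_stage maps a stage label to a list of cell identifiers.
    A fixed seed keeps the subsample reproducible.
    """
    rng = random.Random(seed)
    n_min = min(len(cells) for cells in cells_by_stage.values())
    return {
        stage: rng.sample(cells, n_min)
        for stage, cells in cells_by_stage.items()
    }


# Toy identifiers standing in for cell barcodes.
toy = {
    "E13": [f"e13_{i}" for i in range(6)],
    "P5": [f"p5_{i}" for i in range(18)],
}
balanced = downsample_to_smallest(toy)
print({stage: len(cells) for stage, cells in balanced.items()})
```

      Clustering would then be repeated on the balanced sets to check whether the number of clusters changes.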

      Ipsilateral RGC: It is striking that the DEGs between C-RGCs and I-RGCs reflect a strong bias, with cells scored as "ipsi" being immature RGCs while the others ("contra") are much more mature. This bias comes from the way ipsilateral RGCs were "inferred" using non-specific markers. Can the authors try the analysis again by identifying RGCs using more robust markers (e.g., EphB1)? Would it be possible to select I-RGCs and C-RGCs that share the same level of differentiation? Previous studies already identified an I-RGC signature using more specific set-ups (Wang et al., 2016 from retrogradely labelled RGCs; Lo Giudice et al., 2019 with an I-RGC-specific transgenic mouse).

      We are not sure how the reviewer concludes that the putative I-RGCs are more immature than the putative C-RGCs. As discussed earlier, insofar as expression levels of pan-RGC markers are indicative of maturational stage, we found no evidence that clustering is driven by maturation gradients. Thus, we expect our putative I-RGCs and C-RGCs to not differ in differentiation state. Following the reviewer’s suggestion, we now include EphB1 (Ephb1) in our I-RGC signature. The impact of replacing Igfbp5 with Ephb1 on the inferred proportion of I-RGCs within each terminal type was minimal (Reviewer Fig. 2). We would like to note that to assemble our I-RGC/C-RGC signatures we relied on data presented in Wang et al. (2016). Outside of well-established markers (e.g., Zic2 and Isl2), we chose the RNA-seq hits in Wang et al. that had been validated histologically in the same paper or that are correlated with Zic2 expression in our data. This nominated Igfbp5, Zic1, Fgf12, and Igf1.

      Reviewer Fig. 2. Comparison of inferred I-RGC frequency within each terminal type (points) using two I-RGC signatures reported in the paper. For the y-axis we used Zic2 and EphB1.

      It would be important to discuss how their findings differ from the others (including Rheaume et al., 2018). To make a strong point, I-RGCs should be isolated at a stage of final maturation (P5?) and using retrograde labelling, which is a robust method to ensure the ipsilateral identity of postnatal RGCs.

      We cite Rheaume et al. in several places. In fact, there is good transcriptional correspondence between our dataset and theirs (Figure S1i), despite the differences in the number of cells profiled (~6000 vs ~18000) and technologies (10X vs. Drop-seq/10X). We now mention this in the text. Note also that we had compared our P56 data with Rheaume et al.’s P5 data in an earlier publication (Tran et al., 2019) and observed a similar tight correspondence between clusters. Zic1 is expressed in I-RGCs (Wang et al., 2016) at early stages, and in our dataset its expression at E13 and E14 is similar to that of Zic2 (Supplementary Fig. 8); postnatally, however, it marks W3B RGCs (Tran et al., 2019), many of which project contralaterally (Kim et al., J. Neurosci. 2010). Regarding retrograde labeling, as noted above, additional experiments would take a prohibitively long time (up to a year) to complete.

      It is unclear how well Zic1 and Igf1 can be used as I-RGC markers. Can the authors specify how specific to I-RGCs they are? Have they been confirmed as markers using retrograde labelling experiments?

      We have relied on previous work, primarily from the Mason lab, to choose I-RGC and C-RGC markers. Igf1 is a C-RGC marker that is expressed in a complementary fashion with Igfbp5, an I-RGC marker as noted in Wang et al, 2016. They also perform ISH to show that Igf1 is not expressed in the VT crescent, while Igfbp5 is (see Fig. 5 in Wang et al., 2016). Similarly, Zic1 is also cited in Wang et al. as an RNA-seq hit for I-RGCs. Although Zic1 was not validated using ISH, we found its expression pattern to be highly correlated with Zic2 at E13 (Supplementary Fig. 8c).

      The enrichment procedure may deplete the RGC subpopulations that express low levels of Thy1 or L1CAM. A comparison on that point could be done with the other datasets analysed in the study.

      We presume the reviewer is referring to the data of Lo Giudice and Clark/Blackshaw, which we show in comparison to ours in Figure S1. In both of those studies, all retinal cells were analyzed, whereas we enriched RGCs. As noted in the text, RGCs comprise a very small fraction of all retinal cells, so Lo Giudice and Clark/Blackshaw lacked the resolution to resolve RGC diversity at later time points. Indeed, there is no whole retina dataset available in which RGCs are numerous enough for comprehensive subtyping. Our approach to this issue was to collect RGCs with both Thy1 and L1 at E13, E14, E16 and P0, with the idea that the markers might have complementary strengths and weaknesses. In fact, at each age, all clusters are present in both collection types, although frequencies vary. This concordance supports the idea that neither marker excludes particular types. We now stress this point in results and in the Supplementary Fig. 2 legend.

      In supplemental Fig. S1e: why are cells embedded from the "Clark" datasets only clustered on the right side of the UMAP while the others are more evenly distributed?

      Actually, both the Clark et al. and Lo Giudice et al. datasets are predominantly clustered on the right side of the UMAP. This reflects the methodological difference noted above: they profiled the whole retina, whereas we isolated RGCs. Thus, their datasets contain a much higher abundance of RPCs and non-neurogenic precursors compared to ours. The right clusters represent RPCs due to their expression of Fgf15 and other markers, while the left clusters represent RGCs based on their expression of Nefl. Indeed, a main reason for including these plots was to illustrate the relative abundance of RGCs in our data (also see Supplementary Fig. S1h).

      What could explain that the CD90 and L1CAM populations are intermingled at E14, distinct at E16, and then more mixed at P0?

      We believe the reviewer is referring to Supplementary Figs. S2a-c. Given the temporal expression level changes in Thy1 and L1cam (Supplementary Fig. S1c) in RGCs, a likely possibility is that they enrich RGC precursor subsets at different relative frequencies. We now note this in the Supplementary Fig. 2 legend.

      On Fig. 6: the E13 RGCs seem to be segregated into early-born RGCs expressing Eomes and later-born RGCs expressing Neurod2. Thus, fate coupling with P5 seems to suggest that the Eomes population at P5 may have been generated first, and the Neurod2 population generated later. Is that possible?

      That the Eomes RGCs are specified before Neurod2 RGCs is one of our conclusions from the fate decoupling analysis (Figures 6f-h). Whether this is because the former arise from early born cells and the latter arise from later born cells is not clear. There is disagreement in the literature on whether ipRGCs are born at a different time than other RGCs, so we prefer not to make a comment.

      Methods: The Methods section is extensive, and yet it is presented in a rather complex manner so that it is difficult to understand for a broad audience. It would be valuable if the authors could simplify or better explain some parts (the WOT section in particular).

      We believe that the sections on animals, molecular biology and histology are quite straightforward, but agree that the sections describing the computational analysis are hard going. We have modified them in several places as requested. As regards better explanation of the WOT, we now precede that section with an “overview” as a way of making it easier to follow. (We had already included an overview of the clustering procedures.) We have also provided further detail on some of the reviewer’s subsequent questions on this section, including the use of HVGs, the Classifier, and the strategy for inferring I-RGCs (see below). Perhaps most important, we have worked to make the “Results” and “Discussion” sections accessible to a broad audience.

      *Highly variable genes (HVG) used for clustering and dimensionality reduction: how many of them and what are they? Are they the same used for each stage?

      Since clustering was performed at each stage independently, we determined HVGs at each stage separately using a statistical method introduced in one of our previous studies (Pandey et al., Current Biology, 2018). The total numbers of HVGs at each stage were as follows: E13, N=1094; E14, N=834; E16, N=822; P0, N=881; P5, N=1105; P56, N=1510.

      We note that these are not necessarily the same at each stage due to the temporal variation in gene expression. Together these correspond to 2854 unique genes (union of all HVGs). The WOT analysis was done using this full set.
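
      The pooling of per-stage HVG lists into a single set can be sketched as follows. The gene lists here are short toy examples rather than the actual HVG sets, and the function name is hypothetical; the point is only that the union counts each gene once, which is why the pooled total (2854) is smaller than the sum of the per-stage counts.

```python
def union_of_hvgs(hvgs_by_stage):
    """Return the set of genes that are highly variable at any stage."""
    union = set()
    for genes in hvgs_by_stage.values():
        union |= set(genes)
    return union


# Toy per-stage lists; real lists contain roughly 800-1500 genes each.
toy_hvgs = {
    "E13": ["Eomes", "Neurod2", "Zic2"],
    "E14": ["Eomes", "Zic2", "Isl2"],
    "P5": ["Eomes", "Igf1"],
}
all_hvgs = union_of_hvgs(toy_hvgs)
total_listed = sum(len(genes) for genes in toy_hvgs.values())
print(len(all_hvgs), total_listed)
```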

      *In the methods p9: "The common features G = GR ∩ GT are used to train a third classifier ClassR on the reference atlas AR. This ensures that inferred transcriptomic correspondences are based on "core" gene expression programs that underlie cell type identity rather than maturation-associated genes." Could the authors explain the relevance of using a third model and, more importantly, are there any genes eliminated through the procedure that could be important to drive the diversification process? If so, would it be possible to estimate their number and the relative impact?

      The rationale for this was as follows. Our goal is to map cells from one time point to a type at another time point. The naïve way to do this would be to use a classifier trained entirely at either of the time points. However, the features of such a classifier are likely to contain genes that are not expressed at the other time point, and are likely to generate spurious mappings (since the sets of cluster-specific genes are not identical). Therefore, we sought to train a classifier using only genes that are part of conserved transcriptional signatures at both time points, which corresponds to the third model.

      When this filtering was not performed, the temporal correspondences in the supervised classification model were less specific than those reported. In particular, ARI values dropped by about 15% on average. The simple reason for this is that a cluster-specific gene at E13 (for example) may no longer be expressed at E14, and vice versa. Thus, by restricting the features to a common set of cluster-specific genes, we obtained the “best possible” transcriptomic correspondences between clusters at consecutive time points. We note that the correspondences obtained in this way (Figure 3) were recovered through WOT when the results of the latter were collapsed at the cluster level (Supplementary Fig. 5).
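
      The feature-restriction idea can be sketched as follows. The sketch computes the common features G = GR ∩ GT and then trains a classifier on the reference data using only those genes. A nearest-centroid classifier is swapped in purely for illustration, since the classifier type is not specified in this response; all function names, gene lists, and expression values are hypothetical.

```python
def common_features(genes_ref, genes_target):
    """G = GR ∩ GT: cluster-specific genes shared by both time points."""
    return sorted(set(genes_ref) & set(genes_target))


def train_centroids(cells, labels, features):
    """Mean expression per label over the shared features only."""
    sums, counts = {}, {}
    for cell, label in zip(cells, labels):
        vec = [cell.get(g, 0.0) for g in features]
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def classify(cell, centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    vec = [cell.get(g, 0.0) for g in features]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
    )


# Toy data: "LateGene" and "EarlyGene" are each specific to one time point.
features = common_features(
    ["Eomes", "Zic2", "LateGene"], ["Eomes", "Zic2", "EarlyGene"]
)
ref_cells = [
    {"Eomes": 5.0, "Zic2": 0.0, "LateGene": 9.0},
    {"Eomes": 0.0, "Zic2": 4.0, "LateGene": 9.0},
]
centroids = train_centroids(ref_cells, ["A", "B"], features)
predicted = classify({"Eomes": 4.0, "Zic2": 1.0, "EarlyGene": 7.0}, centroids, features)
print(features, predicted)
```

      Because the stage-specific genes are dropped before training, the assignment in this toy case depends only on the shared genes.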

      *Methods page 15: Inference of ipsilaterally-projecting RGC types. Wouldn't it be more valuable to consider more markers to distinguish RGC precursors?

      As indicated before, we used I-RGC genes and C-RGC genes reported in Wang et al., 2016 (Table 2), in addition to the well-known markers Zic2 and Isl2. Here, we prioritized genes that had been histologically validated (Figs. 4 and 5) and that were expressed in our data (Sema3e and Tbx20 were not considered as these were undetectable at E13 in our data). Following the reviewer’s earlier suggestion, we also noted that including Ephb1 in our signature minimally impacts the results.

      Discussion: *Is there some plasticity that allows the RGC subgroups to switch over time? (If we were to record the transcriptome of the same cell over time, would one observe that the cell belongs to another cluster / subgroup?)

      One can only speculate. Other than long-term in vivo imaging combined with vital type-specific markers we know of no way to experimentally address the possibility that cells swap types postnatally so that the cells comprising type x at P5 are not the same ones that comprise type x at P56. It does seem pretty unlikely though.

      *While the data appears technically rigorous, and the number of cells sequenced very high, the results seem redundant with several prior studies and the discrepancies are not sufficiently discussed.

      We are confused by this point, since the reviewer does not cite the papers to which s/he refers. To our knowledge there is no study at present that has described RGC diversification, so it is not clear what would be discrepant.

    1. Author Response:

      Reviewer #1:

      The authors present an interesting concept for the mechanism of rash induction in EGFR inhibitor (EGFRi) treated rats. EGFRi causes production of pro-inflammatory factors in epidermal keratinocytes which may induce dedifferentiation and reduction of the dWAT compartment, presumably mediated via PPAR. Factors produced by dedifferentiated FB then recruit monocytes thereby inducing skin inflammation. This work is aiming to improve targeted cancer therapy efficiency and is therefore of potential clinical relevance.

      However, most of the conclusions drawn by the authors are based on correlations, e.g. between the amount of dWAT and rash intensity. Mechanistic data have been mainly generated in vitro. The exact order of events to formulate a definitive mechanistic proof in vivo for this hypothesis is missing. In particular, it is not clear which cells in the skin, apart from keratinocytes, are specifically targeted by EGFR inhibitors and/or by Rosiglitazone. The authors also do not show EGFR staining in adipocytes and its inhibition by Afa. The effects of Afa and Rosi on monocytes / macrophages are completely ignored by the authors. Additionally, some of the presented results are overinterpreted and not really supporting what is claimed.

      Most importantly, the whole study is based on inhibitor treatments. Afatinib, for example, inhibits not only EGFR but all other erbB family members, and as such it represents a pan-ErbB inhibitor. It is therefore not clear whether the observed effects are induced by inhibition of EGFR or of other erbB receptors, which have also been shown to have effects in the skin. For further specification of the role of EGFR, other, more specific inhibitors should be used to confirm the basic concept, along with genetic proof either in genetically engineered mice or by CRISPR-mediated deletion.

      To further support the hypotheses of the authors, the study needs to be further substantiated by mechanistic experiments and the clinical relevance should be strengthened by performing histologic analysis of skin samples of patients treated with EGFRi and respective analysis of rash and e.g. BMI etc.

      We thank the reviewer for the positive comments on the potential impact of this work for cancer patients suffering from EGFR inhibitor-induced skin rash. We have carefully considered all comments from the reviewer and revised our manuscript accordingly. In the following section, we summarize our responses to each comment. We believe that these responses address the reviewer's concerns.

      We agree with the reviewer’s comment that our research would benefit from more direct mechanistic in vivo studies to build upon our in vitro results. In our research, we have collected evidence from previous studies and used various in vitro and ex vivo experiments to investigate our findings. However, the study was still limited by currently available technologies.

      In the revised version, we supplemented the pEGFR and pERK staining of adipocytes in Figure 3-figure supplement 1C. The levels of phospho-EGFR and ERK in dWAT were significantly decreased after EGFRi treatment.

      This study was inspired by the observation of an unusual dWAT reduction during EGFRi treatment; thus, we focused on the investigation of dermal adipocytes. In addition, mastocytes, monocytes, and macrophages in EGFRi-induced cutaneous toxicity have been regarded as responders to increased expression of cytokines. Local depletion of macrophages and degranulated mastocytes provided only partial resolution, indicating a multifactorial and complicated pathology of cutaneous toxicity induced by anti-EGFR therapy (Lichtenberger et al., 2013; Mascia et al., 2013).

      Regarding some inappropriate descriptions, we agree with the reviewer that they would be more convincing if supported by a direct assessment in genetically engineered mice. For example, we tried to establish the relationship between S. aureus infection and EGFRi-induced rash based on a well-accepted study from Lingjuan Zhang (Zhang et al., 2015). They reported that adipose precursor cells secrete the antimicrobial peptide cathelicidin during differentiation to defend against S. aureus infection. Mice with impaired adipogenesis were more susceptible to S. aureus infection. This conclusion gave us insights into the relationship between S. aureus infection and EGFRi-induced skin inflammation. Unfortunately, the anti-CAMP antibody was made by the author’s lab and there are no mature products that can recognize CAMP in rats. To provide more mechanistic evidence, we conducted qPCR experiments to study the transcriptional level of the Camp gene both in dWAT and in dFB cells isolated from rat skin (Figure 3I and 3J). dWAT in the Afa group showed a lower expression level of Camp compared with the control group. In addition, at different differentiation stages of dFB in vitro, transcriptional levels of Camp were decreased by Afa treatment and increased by Rosi. In summary, the data we collected support, to some extent, a causal relationship between EGFRi-induced dWAT reduction and S. aureus infection. However, technological limitations prevent us from providing further evidence. Thus, in the revised manuscript, we have softened this statement.

      According to the clinical evidence, the rash can also be induced by many specific Erbb1 inhibitors. All three generations of EGFR inhibitors in the clinic have very high incidence rates of cutaneous toxicity (Supplementary file 1). In the revised version, we provide rash models induced by the first-generation EGFRis Erlotinib and Gefitinib and by the third-generation EGFRi Osimertinib. As shown in Figure 1-figure supplement 1D, the rash caused by Erlotinib, Gefitinib, and Osimertinib had the same phenotypes as Afatinib-induced rash.

      In summary, we believe the current evidence supports our findings, although more direct mechanistic studies would strengthen them. We are now seeking collaborations to build a dermal adipocyte knockout mouse model platform and hope to investigate the specific roles of dermal adipocytes in the future. We also plan to collaborate with hospitals to explore the clinical evidence from patients receiving EGFR inhibitors.

      References:

      Lichtenberger BM, Gerber PA, Holcmann M, Buhren BA, Amberg N, Smolle V, Schrumpf H, Boelke E, Ansari P, Mackenzie C, Wollenberg A, Kislat A, Fischer JW, Röck K, Harder J, Schröder JM, Homey B, Sibilia M. 2013. Epidermal EGFR controls cutaneous host defense and prevents inflammation. Sci Transl Med 5.

      Mascia F, Lam G, Keith C, Garber C, Steinberg SM, Kohn E, Yuspa SH. 2013. Genetic ablation of epidermal EGFR reveals the dynamic origin of adverse effects of anti-EGFR therapy. Sci Transl Med 5.

      Zhang L, Guerrero-Juarez CF, Hata T, Bapat SP, Ramos R, Plikus MV, Gallo RL. 2015. Dermal adipocytes protect against invasive Staphylococcus aureus skin infection. Science 347:67–72.

      Reviewer #2:

      Leying Chen et al. investigated the mechanism of EGFR inhibitor-induced rash. They find that atrophy of dermal white adipose tissue (dWAT), a highly plastic adipose tissue with various skin-specific functions, correlates with rash occurrence and exacerbation in a murine model. The data indicate that EGFR inhibition induces the dedifferentiation of dWAT and lipolysis, finally leading to dWAT reduction, which is a hallmark of the pathophysiology of rash. Notably, they demonstrate that stimulating dermal adipocyte expansion with a high-fat diet (HFD) or the pharmacological PPARγ agonist rosiglitazone (Rosi) ameliorated the severity of rash. Therefore, PPARγ agonists may represent a promising new therapeutic strategy in the treatment of EGFRI-related skin disorders, pending confirmation in further studies.

      We greatly appreciate the reviewer for giving the above positive comments.

      The conclusions of this paper are mostly well supported by data, but some results need to be clarified and verified.

      1) PPAR signaling in the pathology of EGFRI-induced skin toxicity. In Figure 2, the results show that Rosi reversed the dedifferentiation of dermal adipocytes induced by Afa. This may be due to PPARγ upregulation, but this is not confirmed in the results. The expression of the relevant genes in dWAT after treatment with Afa and Rosi was not shown.

      We thank the reviewer for suggesting an additional experiment on PPARγ. In the revised version, we collected attached-dWAT after 5 days of Afa or Rosi treatment and measured the transcript level of Pparg. Pparg expression was downregulated by Afa treatment and upregulated by Rosi treatment (Figure 2-figure supplement 1D).

      2) The effect of PPAR signaling on the PDGFRA-PI3K-AKT pathway. The AKT pathway is a key downstream target of EGFR kinase, so it is reasonable to see that p-AKT1 and p-AKT2 levels were decreased by Afa (Figure 3C). However, addition of Rosi to Afa significantly activated both AKT1 and AKT2. What is the underlying mechanism for these results, and is it related to the PPAR signaling pathway?

      Given the importance of the PI3K/AKT pathway in regulating AP and mature adipocyte biology (Jeffery et al., 2015), we used p-AKT to characterize the activation of dFBs. The mechanism by which modulating PPARγ affects AKT is still unknown. One study found that MAPK and PI3K are upregulated and activated by rosiglitazone, which in turn might enhance adipogenesis (Fayyad et al., 2019). In skeletal muscle, PPARγ enhances insulin-stimulated PI3K and Akt activation (Marx et al., 2004). Rosiglitazone has also been reported to have a neuroprotective effect against oxidative stress: the PPARγ-rosiglitazone complex binds to the neurotrophic factor-α1 (NF-α1) promoter and activates transcription of NF-α1 mRNA, which is then translated into protein. NF-α1 binds to a cognate receptor and activates the AKT and ERK pathways (Thouennon et al., 2015). Thus, further studies should be carried out to investigate the effects of rosiglitazone on the PI3K/AKT pathway during adipogenesis.

      3) According to Figures 3F, 3G and 3H, the authors conclude that "a lack of APs and mature dWAT impairs the maintenance of the host defense and hair growth in the skin". In my opinion, no results directly prove this. According to Figure 3H, the impairment of hair growth may be caused by EGFR inhibition in hair follicles.

      We thank the reviewer for raising this important point. We tried to establish the relationship between S. aureus infection and EGFRI-induced rash based on a well-accepted study from Lingjuan Zhang (Zhang et al., 2015). They reported that adipose precursor cells secrete the antimicrobial peptide cathelicidin during differentiation to defend against S. aureus infection, and that mice with impaired adipogenesis were more susceptible to S. aureus infection. This conclusion gave us insights into the relationship between S. aureus infection and EGFRI-induced skin inflammation. Unfortunately, the anti-CAMP antibody was made by that author’s lab, and there are no mature commercial products that can recognize CAMP in rats. To provide more mechanistic evidence, we conducted qPCR experiments to measure the transcript level of the Camp gene in both dWAT and dFB cells isolated from rat skin (Figure 3I and 3J). dWAT in the Afa group showed a lower expression level of Camp than in the control group. In addition, at different differentiation stages of dFB in vitro, Camp transcript levels were decreased by Afa treatment and increased by Rosi. In summary, the data we could collect with current technology support, to some extent, the causal relationship between EGFRI-induced dWAT reduction and S. aureus infection. However, we agree with the reviewer that this conclusion needs more direct evidence. Thus, in the revised manuscript, we have softened this statement.

      Since recent reports have shown that dermal adipocytes have the capacity to support hair regeneration, we used this conclusion to characterize the function of dWAT. However, we agree with the reviewer that more specific and direct experiments are needed to verify the causal link with dWAT. We are seeking collaborations to build a dermal adipocyte knockout mouse model platform and hope to investigate the specific roles of dermal adipocytes in the future. In the revised manuscript, we also adjusted the statements.

      4) EGFRI stimulates keratinocytes (HaCaT cells) to produce lipolytic cytokines (IL-6) (Figure 4G). IL-6 enhanced the lipolysis of differentiated dFB (Figure S4M), and C18 fatty acids were supposed to be released into the cell matrix during lipolysis. In Figure 4H, HaCaT cell supernatants and dFB supernatants were collected. IL-6 was supposed to increase in HaCaT cell supernatants, and this was confirmed in Figures S4K and S4L. However, C18 fatty acids were not directly shown to be present in the dFB supernatants in the study.

      We thank the reviewer for pointing this out. We conducted additional lipidomics of dFB supernatants. However, because the differentiation medium needs to be changed every two days, it is hard to accumulate enough FFAs. We collected supernatants on Day 3, Day 6, and Day 9, and all were below the detection limit of mass spectrometry. We agree with the reviewer that more evidence is needed to prove the correlation between C18 FFAs and lipolysis. Therefore, we performed a mass spectrometry analysis of skin tissues from the Ctrl and Afa groups after 3 days of treatment to confirm the release of C18 FFAs. The result showed a trend toward increased C18:2 and other FFAs in the Afa group (Figure 1 in response letter). However, this increase was not statistically significant, which might be due to interference from sebaceous glands and dermal adipocytes. Consequently, we adjusted the descriptions in the revised manuscript to soften this statement.

      Figure 1. C18 concentrations in skin tissues from Ctrl and Afa groups after 3-day treatment. n=3.

      References:

      Fayyad AM, Khan AA, Abdallah SH, Alomran SS, Bajou K, Khattak MNK. 2019. Rosiglitazone Enhances Browning Adipocytes in Association with MAPK and PI3-K Pathways During the Differentiation of Telomerase-Transformed Mesenchymal Stromal Cells into Adipocytes. Int J Mol Sci 20.

      Jeffery E, Church CD, Holtrup B, Colman L, Rodeheffer MS. 2015. Rapid depot-specific activation of adipocyte precursor cells at the onset of obesity. Nat Cell Biol 17:376–385.

      Marx N, Duez H, Fruchart J-C, Staels B. 2004. Peroxisome proliferator-activated receptors and atherogenesis: regulators of gene expression in vascular cells. Circ Res 94:1168–1178.

      Thouennon E, Cheng Y, Falahatian V, Cawley NX, Loh YP. 2015. Rosiglitazone-activated PPARγ induces neurotrophic factor-α1 transcription contributing to neuroprotection. J Neurochem 134:463–470.

      Zhang L, Guerrero-Juarez CF, Hata T, Bapat SP, Ramos R, Plikus MV, Gallo RL. 2015. Dermal adipocytes protect against invasive Staphylococcus aureus skin infection. Science 347:67–72.

    1. Author Response:

      Reviewer #2 (Public Review):

      This manuscript by Barton and colleagues explores the roles of the conserved Eco1 transacetylase in modulating cohesin function in meiosis in budding yeast. Numerous studies in mitotically dividing cells have shown that the Eco1 family of transacetylases acetylate the Smc3 subunit of cohesin and that this acetylation renders cohesin on chromosomes resistant to removal by the Wapl (Wpl1 in budding yeast) family of proteins. Cohesins play critical roles in both sister chromatid cohesion and chromatin organization (through the formation of intrachromosomal loops). How cohesins are regulated by Eco1 in meiosis to accommodate meiotic chromosome structures such as the synaptonemal complex, chromatin domains around centromeres, repair of programmed meiotic double strand DNA breaks in prophase, and sequential removal of cohesins - first at arms in meiosis I and centromeres at meiosis II - is largely unexplored. Thus, this manuscript is exploring important new areas.

      The authors show that Eco1 persists thru prophase I (longer than it does in vegetative cell cycles), that it is not necessary for cohesin loading at centromeres but is needed to counteract Wpl1 to protect centromeric cohesion, that it is critical for the establishment of chromatin loops on meiotic chromosome arms and that it is critical for protection of the arm cohesin from removal by Wpl1. The authors also provide evidence that, in meiosis, Wpl1 exhibits underappreciated functions in cohesin loading or cohesion establishment in addition to its recognized role in cohesin removal.

      The experiments demonstrate that Eco1 is necessary for sharp cohesin boundaries that flank the centromeres and suggest this might be a replication-independent function of Eco1 (the boundaries form in clb5, clb6 cells with no DNA replication phase) but it is unclear if the detectable, but diminished, boundaries in clb5, clb6 cells were formed in the replication-free meiosis or persist from the S-phase associated loading and cohesion establishment from the preceding mitotic cycle.

      Entry into meiosis occurs from G1 when there is no cohesin on the chromosomes and boundaries are not present, therefore this would only be a concern if there were persistent mitotic cells in G2 (i.e. after DNA replication). Our flow cytometry shows that the cells used in the experiment were unreplicated, so even if mitotic cells were present, they would not have been through S phase.

      Nevertheless, we addressed this point by analysis of pre-S phase meiotic cells (ime1/ime4 block) and by anchoring away Eco1 in unreplicated cells.

      Immunofluorescence imaging assays are used to observe the behavior of sister chromatids in meiosis I and meiosis II as a function of Eco1 activity. In wild-type cells sister chromatids co-orient in meiosis I and move to the same pole of the spindle. In mammalian cells and fission yeast this co-orientation requires cohesin while studies in budding yeast have suggested the co-orientation is cohesin-independent. Here, the authors show that when Eco1 is depleted, the sisters often move to opposite poles at meiosis I, and suggest that cohesin (and Eco1) is indeed required for sister co-orientation. An alternate possibility is that the sisters have lost their association in meiotic prophase (due to cohesin failures) before attaching to microtubules and segregating randomly - often to opposite poles.

      We agree with this point, but would argue that the “alternative possibility” (which our data support) still leads to the conclusion that cohesin and Eco1 are required for sister co-orientation. A prior study (Monje-Casas et al., 2007) had suggested that monopolin could link sister kinetochores even without cohesin. We now show that this is not the case, which we believe to be an important conclusion.

      Our results indicate that establishment of monoorientation requires the cohesin that is localized at centromeres. WPL1 deletion in eco1-aa rescues centromeric cohesion (Figure 2F, Figure 8E), but not chromosome arm cohesion (Figure 2H) or sister chromatid segregation in meiosis II (Figure 8F), indicating that pericentromeric cohesion must still be defective.

      For clarity, please note that the relevant data is not immunofluorescence, but live cell imaging (now shown in Figure 8) so these conclusions are based on observation of single chromosomes in individual live cells from prophase I until anaphase II.

      In summary the authors show that Eco1 has distinct roles on chromosome arms and centromeres and probably in both replication-linked and replication-independent events, acts to modulate cohesin location and function in meiosis.

      Reviewer #3 (Public Review):

      This paper investigates the meiotic roles of two regulators of cohesin, the cohesin destabilizer Wpl1 and the cohesin acetyltransferase Eco1. The authors provide evidence that Eco1 antagonizes Wpl1 to allow stabilization of centromeric cohesin, which is important to establish meiotic chromosome segregation patterns. In addition, Eco1 regulates the stable anchoring of cohesin at boundaries to promote defined chromosome loop formation in meiotic prophase.

      The study uses a combination of calibrated ChIP-seq analysis, and chromosome conformation capture techniques to convincingly show that loop formation is altered in wpl1 depletion and eco1 depletion mutants. Well-established cytological techniques are used to demonstrate different effects on chromosome cohesion along arms and at centromeres, and to show that Eco1 is important for establishing the meiotic segregation pattern. The paper is well written and the data largely support the conclusions. As such, this paper is expected to be of substantial interest to the field.

      One notable weakness is the poor definition of the eco1 anchor-away allele (eco1-aa), on which much of the eco1 phenotypic analysis is based. The presented data indicate that addition of the FRB-GFP tag alone causes most of the phenotypes, regardless of nuclear depletion. It is well possible that the tag creates a meiosis-specific loss-of-function allele, although it is surprising that the tag does not have mitotic defects even though Eco1 presumably has the same substrate (the cohesin subunit Smc3) in both situations. Encouragingly, some of the phenotypes could be confirmed using a non-acetylatable smc3 mutant. However, the tag may also create neomorphic effects that may contribute to the Wpl1-independent effects and the apparent stronger defects of the eco1-aa allele compared to the non-acetylatable smc3 mutant.

      Available evidence suggests that eco1-aa is a loss of function allele.

    1. Author Response:

      Reviewer #3 (Public Review):

      In this study Borg et al. explore the mechanism of PcTx1 inhibition of ASIC1a using TEVC fluorometry. They detected a robust change of a fluorescence signal when PcTx1 was added, and based on this finding, propose that the toxin has three different binding modes: 'loose', 'global' and 'ECDonly'. In addition, using concatamers they conclude that damage of a single PcTx1 binding site out of the three sites present in ASIC1a destabilizes the conformational changes but disruption of two or three binding sites is required to prevent PcTx1-mediated inhibition.

      The main weakness of the study is the lack of additional experiments to confirm that the proposed three PcTx1 binding modes are actually happening.

      We thank the reviewer for the constructive feedback on our work. While PcTx1 modulation of ASIC is no doubt complex and other methods might reveal additional binding modes in the future, we believe that our contribution provides insights that go beyond the knowledge gained from structural efforts and from functional experiments using electrophysiology alone. The strength of the VCF method lies in the simultaneous measurement of function and conformation. Here, we have used VCF to uncover three distinct conformational states, of which only one was previously known. We have now included several new experiments, along with changes to the text that hopefully alleviate some of the concerns regarding the existence of the binding modes. Further, we have included additional text in the discussion to acknowledge that other methods might uncover additional or distinct binding modes in the future.

    1. Author Response:

      Reviewer #1 (Public Review):

      Wang et al. investigated the role of RNA m6A modification in intestinal epithelial cells (IECs) in the context of rotavirus infection. The authors found that mice that specifically lack METTL3 in IECs show resistance to rotavirus infection. They attributed this effect to increased IFN and ISG expression, presumably via IRF7 upregulation. Further genetic IRF7 ablation in IECs led to sensitivity to rotavirus infection. They also found that ALKBH5 is suppressed by a rotaviral protein, although the knockout of ALKBH5 in IECs did not influence viral infection.

      Overall, although the resistance of IEC-specific METTL3-deficient mice upon rotavirus infection via the control of IRF7 is a novel and interesting finding, the proposed model is not fully supported by the findings here. Especially, the following points need to be addressed:

      We are grateful to the reviewer for the complimentary summary of our research. We also appreciate the valuable experiments suggested by the reviewer to improve our manuscript. We have added additional important controls and mechanistic data to further support our conclusions.

      1) The m6A dot blot used in Figure 1 is not a good measurement system of total m6A modification levels, because the antibody used here also detects another RNA modification, m6Am (PMID: 31676230). Therefore, it is unclear if the increase of m6A dot blot intensity is due to the increase of m6A in RNAs mediated by METTL3 in IECs. The authors should investigate the m6A levels in IECs, not BMDMs, under METTL3 deficiency. Ideally, this analysis should be done using mass spectrometry.

      We thank the reviewer for raising a critical point. The antibody we used previously (Synaptic Systems, #202003) has been reported to detect m6Am as well, so we took two approaches to avoid this potential non-specific detection.

      1. We have included dot blot data for m6A modification in Mettl3^△IEC and WT IECs during RV infection using another m6A antibody (Anti-N6-methyladenosine (m6A), Sigma-Aldrich, Cat. No. ABE572-I) (see below and also Fig. 1d, 1e).

      2. We have included mass spectrometry data for m6A modification in IECs during development (see below and also Fig. 1c) or RV infection (see below and also Fig. s3a).

      These data suggest that m6A modifications in IECs are indeed regulated during development or RV infection. We have included the descriptions in the text.

      Figure 1. Rotavirus infection increases global m6A modifications, and Mettl3 deficiency in intestinal epithelial cells results in increased resistance to rotavirus infection. (c) MS analysis of m6A level in ileum tissue from mice of different ages (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05; NS, not significant). (d) WT and Mettl3^△IEC mice were infected with rotavirus EW strain at 8 days post birth. m6A dot blot analysis of total RNA in ileum IECs at 2 dpi. Methylene blue (MB) staining was the loading control. (e) Quantitative analysis of (d) (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, ***P < 0.001; NS, not significant). The quantified m6A signals were normalized to the quantified MB staining signals.

      Figure s3. MS analysis of total m6A level in mouse ileum. (a) WT and Mettl3^△IEC mice were infected with rotavirus EW strain at 8 days post birth. MS analysis of m6A level in ileum tissue from mice at 2 dpi (mean ± SEM). Statistical significance was determined by Student’s t-test (**P < 0.005).

      2) The authors show that Alkbh5 expression is increased when the mice grow up to 3 weeks old. However, the Alkbh5 protein expression changes are missing.

      We thank the reviewer for raising this point. We have included the protein expression of ALKBH5 in the intestine during development (see below and Fig. s1). ALKBH5 protein levels in the intestine increase with age (Fig. s1a, s1b), which is consistent with the changes in ALKBH5 mRNA levels during development (Fig. 1d).

      Figure s1. ALKBH5 regulates total m6A level in intestine. (a) Immunoblotting with antibodies targeting ALKBH5 and TUBULIN in ileum tissues from mice of different ages. (b) Quantitative analysis of (a) (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05; NS, not significant).

      3) The authors claim that m6A declined from 2 to 2 weeks post birth is caused by increased Alkbh5 (Line 110). However, it is not clear if the subtle increase in Alkbh5 mRNA leads to the change in global m6A levels. The author can use ALKBH5-deficient mouse cells to confirm this point.

      We thank the reviewer for raising an important point. We have included ALKBH5 over-expression and knockdown data in a mouse IEC cell line, MODE-K, to test whether the regulation of Alkbh5 mRNA in IECs leads to a change in global m6A levels.

      1. Over-expression of ALKBH5 in MODE-K cells largely reduced the global m6A level (see below and Fig. s1d).

      2. CRISPR-mediated knockdown of ALKBH5 in MODE-K cells augmented the global m6A level, whereas knockdown of another m6A eraser, FTO, did not affect the global m6A level (see below and Fig. s10b).

      Figure s1. ALKBH5 regulates total m6A level in intestine. (d) Immunoblotting with antibodies targeting ALKBH5 and TUBULIN in MODE-K cells transfected with pSIN-EV or pSIN-mAlkbh5-3xFlag for 24h. m6A dot blot analysis of total RNA in indicated samples. Methylene blue (MB) staining was the loading control.

      Figure s10. Alkbh5 is the dominant m6A eraser in intestine. (b) m6A dot blot analysis of total RNA in different MODE-K cells. Methylene blue (MB) staining was the loading control.

      4) The authors should describe the overall phenotype of IEC-specific METTL3-deficient mice at the steady state. It is important to clarify if the augmented expression of ISG upon METTL3 deficiency is dependent on rotavirus infection. Also, the authors should describe any detectable abnormalities or changes without stimulation.

      In collaboration with another group, we found a defect in intestinal stem cells in IEC-specific METTL3-deficient mice. However, because RV normally infects IECs in the villi but not in the crypt, and stem cells are not the major producers of IFN/ISGs (Crawford et al., Nature Reviews Disease Primers, 2017), the defect in intestinal stem cells is unlikely to affect the RV infection phenotype. As this is a separate story that is currently under review, we prefer not to include this part of the data in our manuscript. Moreover, we have crossed Irf7^−/− mice to Mettl3^ΔIEC mice and verified that Irf7-mediated induction of ISGs is critical for the anti-viral phenotype in Mettl3^ΔIEC mice.

      Our bulk RNA-seq data in IECs showed augmented expression of ISGs upon METTL3 deficiency at steady state (Fig. 2a). We also found augmented ISG expression in the intestine of METTL3-deficient mice at steady state or during early RV infection (2d) by qPCR. However, because the RV loads in METTL3-deficient mice during the late infection stage are significantly lower than in WT mice, the inducible ISG expression is consequently lower in the intestine of METTL3-deficient mice than in WT mice at day 4 post infection (Fig. 3f).

      5) The finding that IRF7 is targeted by METTL3 is not convincing. First, the authors performed MeRIP-seq and -qPCR experiments only using RNAs from wild-type IECs, not from METTL3-deficient cells. It is necessary to show that the modification levels on IRF7 mRNA are indeed reduced upon METTL3 deficiency. Second, it is unclear if MeRIP-seq was properly performed or not, because there is no quality checking figure shown. For instance, the authors can generate metagene plots or gene logos of m6A modified sites to see if there is any consistency with previous reports. Third, in Figure 2h, the authors should show that the change in luciferase activity between wild-type and mutant Irf7-3'UTR reporters is dependent on METTL3 activity by performing METTL3 knockdown or knockout. Also, the authors should describe how they mutagenized the sequences for clarification. Fourth, in Figures 2F and 3C, they showed that IRF7 is upregulated in METTL3-deficient IECs, while in Figure 3F, IRF7 is conversely downregulated in METTL3-deficient IECs. These results appear to contradict each other.

      We appreciate the valuable suggestion provided by the reviewer to improve our manuscript.

      1. We performed RIP-qPCR in Mettl3 knockdown and WT MODE-K cells to verify the m6A modification on IRF7 mRNA. The modification levels on IRF7 mRNA are indeed reduced upon METTL3 deficiency (see below and Fig. s5c, s5d). We have added the description of the experiment to the manuscript.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (c) m6A-RIP-qPCR confirms Irf7 as an m6A-modified gene in IECs. Fragmented RNA of sgEV and sgMettl3 MODE-K cells was incubated with an anti-m6A antibody (Sigma-Aldrich ABE572-I). The eluted RNA and input were processed as described in the ‘RT-qPCR’ section, and the data were normalized to the input samples (n=3, mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, **P < 0.005; NS, not significant). Tlr3 and Rps14 were measured with m6A site-specific qPCR primers as positive and negative controls, respectively; Irf7 was measured with qPCR primers specific to the predicted m6A sites. (d) Knockdown efficiency of METTL3 in MODE-K cells.
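
      The legend for Figure s5c states only that the RIP-qPCR data were normalized to the input samples; the exact formula is not given. A minimal sketch of one common convention, the percent-input calculation, is shown here. The function name, the Ct values, and the 10% input fraction are illustrative assumptions, not details taken from the study.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input value for one RIP-qPCR target (assumed convention).

    ct_ip: Ct measured for the immunoprecipitated (eluted) RNA.
    ct_input: Ct measured for the saved input aliquot.
    input_fraction: fraction of starting RNA kept as input (hypothetical, e.g. 0.1).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to the value expected for 100% of the starting material.
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    # Each Ct unit corresponds to a two-fold difference in template amount.
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values for illustration only; not data from the study.
print(round(percent_input(ct_ip=27.0, ct_input=24.0, input_fraction=0.1), 2))
```

      Under this convention, a lower percent-input value for Irf7 in sgMettl3 cells than in sgEV cells would correspond to reduced m6A enrichment on the transcript.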

      2. We have generated metagene plots as suggested. As shown in Figure s5b, the m6A peak is enriched near the stop codon and 3’UTR region, which is consistent with previous studies (Xuan et al., 2018; Dominissini et al., 2012; Yang et al., 2019). We have added the description to the manuscript.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (b) Metagene plots of m6A modified sites.

      3. We performed the luciferase assay in WT and METTL3 knockdown HEK293T cells and found that the increased luciferase activity of the mutant Irf7-3'UTR reporter is dependent on METTL3 activity (see below and Fig. 2h, s5e). We have added the description of the experiment to the manuscript.

      Figure 2. Mettl3 deficiency in intestinal epithelial cells results in decreased m6A deposition on Irf7, and increased interferon responses. (h) Relative luciferase activity of sgEV and sgMettl3 HEK293T cells transfected with pmirGLO-Irf7-3’UTR (Irf7-WT) or pmirGLO-Irf7-3’UTR containing mutated m6A modification sites (Irf7-MUT). The firefly luciferase activity was normalized to Renilla luciferase activity (n=3, mean ± SEM). Statistical significance was determined by Student’s t-tests between genotypes (*P < 0.05; NS, not significant).

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (e) Knockdown efficiency of METTL3 in HEK293T cells used for luciferase assay.

      4. IRF7 is an ISG. Its expression is controlled both by PAMP-induced transcription (for example, by viral components) and by post-transcriptional regulation such as m6A-mediated mRNA decay. At steady state there is no virus, and at the early stage (2d) of rotavirus infection the viral loads are comparable in Mettl3^△IEC and WT mice. IRF7 expression is therefore mainly regulated by m6A and is higher in IECs from Mettl3^△IEC mice than in those from WT mice. During the late infection stage, however, the RV loads in Mettl3^△IEC mice are significantly lower than in WT mice. In this case, IRF7 expression is mainly regulated by viral PAMPs, so the inducible IRF7 expression is lower in the intestine of Mettl3^△IEC mice than in WT mice at day 4 post infection (Fig. 3f).
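
      The normalization described in the Figure 2h legend (firefly signal divided by Renilla signal, summarized as mean ± SEM over n=3) can be sketched as follows. The function names and the raw readings are hypothetical and serve only to illustrate the arithmetic; they are not the authors' analysis code or data.

```python
from math import sqrt
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Firefly/Renilla ratio for each replicate well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla need one reading per replicate")
    return [f / r for f, r in zip(firefly, renilla)]


def mean_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical raw readings for n=3 replicates; not data from the study.
firefly = [1200.0, 1500.0, 1800.0]
renilla = [600.0, 500.0, 600.0]
ratios = relative_luciferase(firefly, renilla)
print(ratios, mean_sem(ratios))
```

      Dividing by the Renilla signal corrects each well for differences in transfection efficiency and cell number before the reporters are compared.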

      6) It is unclear if the augmented expression of IRF7 per se upregulates IFN and ISG expression. Since IRF7 exerts its transcriptional activity upon phosphorylation, the authors should examine IRF7 phosphorylation and total protein levels in METTL3-deficient IECs. Also, it is interesting to see if the phosphorylation of TBK1 is augmented or not.

      We have provided the phosphorylation and total protein levels of IRF7 and TBK1 in MODE-K cells treated with poly I:C. Both total IRF7 and phosphorylated IRF7 are upregulated in Mettl3 knockdown cells compared with control cells (see below and Fig. s5f). However, both total TBK1 and phosphorylated TBK1 remain unchanged (Fig. s5f), suggesting that the augmented ISGs are less likely to be due to activation of the signaling upstream of IFN.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (f) Western blot analysis of sgEV and sgMettl3 MODE-K cells transfected by lipo3000 with 2 μg/ml poly I:C at indicated hours post transfection. At least three replicate experiments were performed.

      7) In Figure 3, the authors utilized METTL3 and IRF7 deficient mice to show the contribution of METTL3-mediated IRF7 regulation in rotavirus infection. However, if IRF7 is totally abrogated, IFN production should be greatly impaired as shown in Figure 3A. Thus, it is not surprising to see that the IFN response is diminished. The authors can use heterozygous IRF7 deficient mice instead to check if upregulation of IRF7 under METTL3 deficiency is critical to control rotavirus infection.

      We thank the reviewer for pointing out an important issue. However, we checked the IRF7 expression levels in IECs from Irf7^+/+ , Irf7^+/- and Irf7^-/- mice and found that there is no difference between IRF7 levels in IECs from Irf7^+/- mice and that in IECs from Irf7^+/+ mice. Thus, it is not feasible to use heterozygous IRF7 deficient mice to test the idea (Supporting Figure 1).

      Supporting Figure 1. WT and Irf7 heterozygous mice show the same IRF7 expression level in IECs. (a) IECs from 2-week-old Irf7^+/+, Irf7^+/-, and Irf7^-/- mice were isolated. Western blot analysis shows the IRF7 expression level in the different mice. (b) Quantitative analysis of (a) (mean ± SEM). Statistical significance was determined by Student’s t-test (***P < 0.001; NS, not significant).

      8) Given no effect of ALKBH5 knockout on rotavirus infection as shown in Figure 4, it is questionable if ALKBH5 has a profound role in the regulation of m6A in IECs. The authors should determine if m6A modification levels are increased in IECs under ALKBH5 deficiency.

      We performed the m6A dot blot assay to detect m6A modification levels in ALKBH5 knockdown MODE-K cells and did find an increase in m6A modification under ALKBH5 deficiency (see above and Fig. s10). The lack of an effect of ALKBH5 knockout on rotavirus infection (Fig. 4c, 4d and 4e) initially puzzled us as well, until we found that RV infection down-regulates ALKBH5 expression in the intestine of WT mice (Fig. 4a).

    1. Author Response:

      Reviewer #1 (Public Review):

      In this paper, the authors describe an MRI-based functional connectivity mode for the striatum and attempt to show that it is related to dopaminergic input from the midbrain. Currently, dopaminergic input can only be assessed in humans with radionuclide imaging modalities (PET and SPECT), which have poor spatial resolution, relatively long acquisition times, and require radioactive tracers. The MRI-based method would provide higher resolution and greater accessibility, and moreover, can be applied retrospectively to data that has already been collected. The authors use multiple lines of study to build the case: comparison to DaT SPECT, which shows the distribution of dopamine transporters; alteration in Parkinson's Disease, where dopaminergic input is known to be reduced; and relation to alcohol and tobacco use in healthy volunteers, where dopamine signalling in the brain's reward processing pathway is altered. The combination of clinical, behavioral and imaging experiments to validate the MRI biomarker of dopamine input is the major strength of this study. Not only is the biomarker altered as expected in each case, but the alterations also exhibit regional specificity that is consistent with prior reports often obtained with invasive measurements. A direct validation of the biomarker would require invasive histology that is clearly impossible in healthy humans, but while any single finding from one modality would be less convincing, taken together, they provide sufficient circumstantial evidence to motivate further use and investigation of the biomarker. The authors use quantitative techniques to characterize the change in the functional connectivity mode and find truly impressive correspondence with the SPECT measurements of DaT at the group level. As expected, the correspondence is weaker at the individual level, but still respectable.
The authors show substantial individual data throughout the manuscript in addition to the group data, which increases confidence in both their results and the potential utility of the biomarker in the clinic. For example, the relationship between symptom severity after L-DOPA and changes in the biomarker at the individual level is very encouraging. The least convincing aspect of the manuscript was the relationship between the connectivity mode and the amount of tobacco use (Fig 6, top) where the line fit looks as if it may have been driven by two very high use points. Given the strength of the other findings, even if the relationship with tobacco use does not completely hold up, it detracts very little from the overall study. The lack of a difference in the biomarker between the left-dominant Parkinson's group and the control group is also a bit surprising. Given the discussion about flooring effects, it may be a power effect, but it definitely warrants more investigation in the future.

      We thank the reviewer for providing feedback to improve this manuscript and for acknowledging the many strengths and importance of our work. To make sure that the relationship between the connectivity mode and the amount of tobacco use (Figure 6) was not driven by outliers, we first determined whether the two high-use points of 175 and 195 cigarettes a week, with TSM (linear X) values of 1.395 and 1.440 respectively, are outliers. The median and interquartile range (IQR) of this distribution are 1.272 and 0.122 respectively. Accordingly, both high-value points fall just outside the Q1-Q3 IQR of 1.150 – 1.394, but the first data point (1.395) is still within 2 standard deviations of the mean (1.268+2*0.0803=1.429) and the second data point (1.440) is still within 3 standard deviations of the mean (1.509). As such, we do not consider these data points extreme outliers that need to be removed from our analysis. We nevertheless repeated the GLM analyses testing for associations between the amount of tobacco use and the second-order connectivity mode without these two subjects, and the association was still significant (χ² = 46.14, p = 0.004). It is also important to keep in mind that our sample is population-based. While the corresponding usage of cigarettes (175 and 195 cigarettes a week, corresponding to 25 and 28 cigarettes a day) is at the high end in this particular population-based sample, this amount of use is not uncommon in regular smokers. As for the lack of findings with respect to left-dominant PD patients, we agree that here we may be suffering from a lack of power. Nevertheless, we feel that this is worth reporting for the sake of completeness, if only to indicate that, as a straightforward hypothesis, it did indeed get tested.
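
The arithmetic behind the standard-deviation check can be retraced in a few lines. This is only an illustrative sketch: the mean, standard deviation, and the two TSM values are the ones quoted in the response, while the variable and function names are hypothetical and not taken from the authors' analysis code.

```python
# Values quoted in the response; all names here are illustrative assumptions.
mean, sd = 1.268, 0.0803
high_use_points = {175: 1.395, 195: 1.440}  # cigarettes/week -> TSM (linear X)


def z_score(value, mean, sd):
    """Distance of a value from the mean, in standard deviations."""
    return (value - mean) / sd


upper_2sd = mean + 2 * sd  # upper bound of the 2-SD band
upper_3sd = mean + 3 * sd  # upper bound of the 3-SD band

for cigarettes_per_week, tsm_value in high_use_points.items():
    z = z_score(tsm_value, mean, sd)
    per_day = cigarettes_per_week / 7
    print(f"{cigarettes_per_week}/week (~{per_day:.0f}/day): z = {z:.2f}")
```

Under these numbers the first point lies inside the 2-SD band and the second lies between the 2-SD and 3-SD bounds, which matches the reasoning given for retaining both subjects.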

      Reviewer #2 (Public Review):

      This is an excellent paper with an outstanding methodology and sequence of steps, which contains many strengths:

      • First, they apply a novel fMRI resting state functional connectivity method, connectopic mapping (CM). This is validated in a large standard data set, the HCP fMRI, in around 800 healthy subjects.
      • Secondly, they use the measurement of a striatal DA transporter, DaT SPECT, in a large number of subjects (around 200) to establish spatial correlation with fMRI connectopic mapping.
      • Thirdly, they measure subjects in whom striatal function is known to be altered: Parkinson's disease (PD) with L-Dopa therapy. This serves to test the direct impact of dopamine deficiency (D2-receptors) and dopamine replacement therapy (L-Dopa) on striatal connectopic mapping.
      • Fourthly, they further support that by scanning people with daily alcohol or nicotine consumption whose degree of substance use correlates with the striatal connectopic mapping.

      We thank the reviewer for providing feedback to improve our manuscript and for acknowledging the excellence of our methodology and the strengths of our work.

      Some weaknesses should be mentioned.

      • I was wondering how their striatal DA connectopic marker stands in relation to others like melatonin-sensitive MRI (Cassidy et al., PNAS, and others). This should at least be discussed. Ideally, they do melatonin scanning in their sample and correlate it with their striatal connectopic marker. This would provide the opportunity to more directly validate their marker.
      • Another issue is the biochemical specificity. The striatum also contains many glutamatergic (medium spiny) and GABAergic neurons, which are key in mediating DA effects as the latter (as far as I know) terminate on the former. Moreover, it is known that rsFC is related to excitation-inhibition balance and thus to glutamate-GABA. How can the authors make sure that their cortical connectopic maps are really related specifically to DA rather than glutamate and/or GABA? This is even more urgent given that we know glutamate changes in alcohol and/or smoking and also in PD to be prevalent.
      • It would be good if this issue of specificity could be addressed. Like in people who receive ketamine (anti-NMDA): if the authors' connectopic marker is specific for striatal DA, it should not be changed under NMDA treatment.
      • Another way is to conduct computational modelling: modulation of glutamate/GABA should ideally not affect the striatal cortical connectopic marker....
      • Some key literature should be cited and discussed: Conio et al. 2020 establishes a model of DA projections and their implications for psychiatric disorders
      • Yet another issue is the question of serotonin. Various papers by Martino/Magioncalda, especially in bipolar disorder, recently established modulation of nigral D2 by raphe-based serotonin. This should be discussed at least: Could the connectopic marker be related to such modulation? How could they make sure that their marker is related exclusively to cortical D2 projections rather than cortical serotonin effects? I am aware that these are tough questions but they should at least be addressed in the discussion...
      • Moreover, the striatum is a complex region with subdivisions like dorsal and ventral which again can be featured by different dopamine systems (D2 vs D1/5) - this should be probed in their data to enhance specificity for nigral-based D2 of their connectopic marker....

      The above points nearly all relate to the specificity of the second-order connectivity mode to dopaminergic projections. We therefore refer the reviewer to our response to comment 1 of the essential revisions. There we conducted additional analyses demonstrating that the mapping of the second-order connectivity mode onto the DaT SPECT scan is far superior to its mapping onto the PET tracers available for other neurotransmitter systems, such as the serotonin and GABA systems. Furthermore, in addition to our response above, our sensitivity analysis does not appear to suggest a strong differentiation of the second connectivity mode relative to D1 or D2 receptor distribution, but instead segregates either of these from the distribution of the DaT. We unfortunately do not have access to melatonin-sensitive MRI data or high-resolution fMRI data of patients. While the reviewer has many excellent suggestions, these therefore need to remain the subject of future studies.

      Reviewer #3 (Public Review):

      The study provides an impressive breadth of analyses, including comparisons to SPECT imaging, Parkinson's patients, drug manipulation and behavior, which build to form a compelling case that the identified patterns of functional connectivity. The surface modeling approach employed provides an interesting alternative to more standard parcellation approaches, which highlights the possibility that organization with the striatum occurs along gradients, rather than within functionally or anatomically circumscribed regions. Importantly, the findings have potentially wide-ranging implications and applications, since striatal dopamine (DA) and cortico-striatal connectivity are of great interest across a wide variety of fields, including their variation across the lifespan, disruption in various clinical populations, and contribution to normative behaviors.

      We thank the reviewer for providing feedback to improve our manuscript and for commenting on the breadth of analyses and potential wide-ranging implications of our work.

      While the surface modeling approach has some appealing features, it is a rather complex approach that is hard to understand intuitively. The difficulty of grasping its nuances limited my ability to follow some of the interpretations provided. For example, an important aspect of the results is that only the second order mode of the functional connectivity profile (and not the 0th or 1st order modes) is associated with dopamine measures and manipulations, but I found it difficult to assess what these different modes are capturing. Are these overlapping modes of distinct aspects of connectivity (each of which is expressed to a different extent), or different characterizations of the same pattern? Do the modes represent the extent to which different striatal regions exhibit the same pattern of cortical connectivity, or is the connectivity pattern also shifting? Some additional clarity on these patterns would have greatly helped me understand the subsequent results. Similarly, in the results of PD patients, it is stated "we can interpret the observed alteration in the connection topography as a decrease in dopaminergic projections to striatum." (l. 242). A decrease in the quadratic term of the TSM would seem to indicate less spatial variability, but not obviously an overall decrease, which would seem instead to be reflected by the 0th order term (if I understand these modes correctly). Some clarification on this interpretation, and more description of the modes in general, would be helpful.

      We acknowledge that our connectopic mapping method and the subsequently applied trend surface modeling (TSM) approach might not be as intuitive and easy to understand as more traditional functional connectivity approaches. This is largely due to classical approaches neglecting the presence of functional multiplicity, i.e., the fact that within brain regions neural computations can contribute to multiple cognitive processes. In short, connectopic mapping yields a set of overlapping, but independent connection topographies or “connectivity modes” that together describe the functional organization of a brain region. In Haak et al 2016, we demonstrated for example that in V1 we can detect separate gradients that reflect sensitivity to orientation and eccentricity, cortical organisations that can also be probed experimentally using retinotopic mapping procedures. Likewise, when applying connectopic mapping to the striatum, the obtained connection topographies indicate how the connectivity profile with the rest of the brain changes across striatum. Voxels that have similar colours in these connectivity modes have similar connectivity patterns with the rest of the brain. Which aspects of functional connectivity these modes are precisely capturing depends on the region of interest investigated and is furthermore difficult to predict beforehand, especially for the higher-order connectivity modes. Regarding the striatum, we showed in previous work (Marquand et al., 2017) that the dominant (zeroth-order) mode represents its basic anatomical subdivisions, while the first-order mode maps onto a ventromedial-to-dorsolateral gradient associated with goal-directed behaviour in cortex that has been described previously on the basis of tract-tracing work in non-human primates. In this manuscript we subsequently provide evidence that the second-order striatal connectivity mode maps onto dopaminergic projections.

      We have now clarified our approach in the legend of Figure 1: “Then similarity between voxels is computed using the η2 coefficient, resulting in matrix S. Manifold learning using Laplacian eigenmaps is then applied to this matrix, yielding a set of overlapping, but independent connection topographies or “connectivity modes” that together describe the functional organization of the striatum. These connection topographies indicate how the connectivity profile with the rest of the brain changes across striatum. Voxels that have similar colours in these connectivity modes have similar connectivity patterns with the rest of the brain.”
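
The two steps named in the legend (η² similarity, then Laplacian eigenmaps) can be illustrated on toy data. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it uses a fully connected graph weighted directly by η², synthetic "connectivity profiles" that blend two random patterns along a one-dimensional gradient, and hypothetical function names (`eta_squared`, `similarity_matrix`, `laplacian_eigenmaps`).

```python
import numpy as np


def eta_squared(a, b):
    """η² similarity between two connectivity profiles (1 means identical)."""
    m = (a + b) / 2.0   # element-wise mean of the two profiles
    grand = m.mean()    # grand mean over both profiles
    ss_within = np.sum((a - m) ** 2 + (b - m) ** 2)
    ss_total = np.sum((a - grand) ** 2 + (b - grand) ** 2)
    return 1.0 - ss_within / ss_total


def similarity_matrix(profiles):
    """Pairwise η² between rows of `profiles` (voxels x targets)."""
    n = profiles.shape[0]
    S = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = eta_squared(profiles[i], profiles[j])
    return S


def laplacian_eigenmaps(S, n_modes=3):
    """Embed voxels using eigenvectors of the normalized graph Laplacian.

    Assumption: S itself is used as the (fully connected) graph weights.
    The trivial constant eigenvector (eigenvalue ~0) is discarded.
    """
    d = S.sum(axis=1)
    L = np.diag(d) - S
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # ascending eigenvalues
    modes = d_inv_sqrt[:, None] * eigvecs[:, 1:n_modes + 1]
    return eigvals, modes


# Toy data: each "voxel" mixes two random target patterns along a gradient.
rng = np.random.default_rng(0)
n_voxels, n_targets = 40, 200
t = np.linspace(0.0, 1.0, n_voxels)[:, None]
p1, p2 = rng.normal(size=(2, n_targets))
profiles = (1 - t) * p1 + t * p2 + 0.05 * rng.normal(size=(n_voxels, n_targets))

S = similarity_matrix(profiles)
eigvals, modes = laplacian_eigenmaps(S, n_modes=3)
```

Each column of `modes` assigns one value per voxel, so voxels with similar values in a column have similar connectivity profiles in this toy setting, which is the sense in which the legend speaks of voxels with similar colours.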

      Further, we have also clarified the trend surface modeling (TSM) approach in the Materials and Methods section:

      “Finally, to enable statistical analysis over these connection topographies, we fitted spatial statistical models to obtain a small number of coefficients summarizing the second-order connectivity mode of each striatal subregion in the X, Y, and Z axes of MNI152 coordinate space. For this, we use ‘trend surface modelling’ (TSM; 27), an approach originally developed in the field of geostatistics, but that has wide ranging applications due to its ability to model the overall distribution of properties throughout space as a simplified surface. Here we use the TSM approach to predict each individual subject’s connection topography by fitting a set of polynomial basis functions defined by the coordinates of each striatal location. …. This criterion strongly favoured a polynomial of degree 2 for the putamen subregion and a polynomial of degree 4 for the caudate-NAcc subregion. This means that the connectivity mode in putamen was modelled with linear and quadratic functions in the X, Y, and Z direction of MNI152 coordinate space (6 TSM coefficients) and the connectivity mode in the caudate-NAcc region with linear, quadratic, cubic and quartic functions in the X, Y and Z direction of MNI152 coordinate space (12 TSM coefficients). The TSM coefficients of the fitted polynomial basis functions describe the rate at which the connectivity modes changes along a given spatial dimension and can be used for statistical analysis.”
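
To make the coefficient counts above concrete, the sketch below fits a polynomial trend surface without cross terms, which yields 3 coefficients per polynomial degree (6 for degree 2, 12 for degree 4). It is an illustration only: ordinary least squares is an assumption made here for simplicity and may differ from the estimation procedure actually used, the coordinates and values are synthetic, and the function names are hypothetical.

```python
import numpy as np


def tsm_design_matrix(coords, degree):
    """Columns x, y, z, x^2, y^2, z^2, ... up to `degree` (no cross terms)."""
    return np.hstack([coords ** p for p in range(1, degree + 1)])


def fit_tsm(coords, mode_values, degree):
    """Least-squares fit; returns (intercept, TSM coefficients)."""
    X = tsm_design_matrix(coords, degree)
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, mode_values, rcond=None)
    return beta[0], beta[1:]


# Synthetic example: noise-free surface generated from known coefficients.
rng = np.random.default_rng(0)
coords = rng.uniform(-1.0, 1.0, size=(200, 3))  # stand-in for voxel coordinates
true_coefs = np.array([0.5, -0.2, 0.1, 1.0, -0.7, 0.3])
values = 0.25 + tsm_design_matrix(coords, 2) @ true_coefs

intercept, coefs = fit_tsm(coords, values, degree=2)
```

In this layout the fourth to sixth entries of `coefs` are the quadratic X, Y, and Z coefficients, the kind of per-subject summary value that can then enter a group-level statistical test.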

      Regarding the following statement: "we can interpret the observed alteration in the connection topography as a decrease in dopaminergic projections to striatum.", we would like to point out that we first used a GLM omnibus test of all TSM coefficients modelling the second-order connectivity mode to investigate whether an association with UPDRS symptom severity was present. Post-hoc Pearson correlations then revealed that this association was driven by the quadratic TSM coefficients modelling the putamen region in the Y and Z direction of MNI space. Next, we plotted the association with UPDRS symptom severity for the quadratic Y coefficient and showed five of the connectivity modes at varying UPDRS symptom severity for visualization and interpretation purposes in Figure 4B. The interpretation above is based on visual inspection of these five connectivity modes shown in Figure 4B (in light of the similarity between the second-order connectivity mode and the DaT SPECT scan shown in Figure 2). We hope that this answer sufficiently clarifies our interpretation.

      Several common confounds for rsFC analyses, especially head motion, are not sufficiently well addressed as to ensure that they do not contribute to the spatial patterns reported. Specifically, the second-order fit would seem to capture some sense of the "sharpness" of the spatial connectivity profile in the striatum. This seems like it could be driven either by neurophysiological features regarding the functional segregation of these regions, or data quality features regarding the smoothness of the data. Since one effect of head motion (in both resting state fMRI and other domains such as PET/SPECT) can be to change the spatial smoothness of data, it would be important to characterize how much of the variance in this measure can be accounted for by head motion (or other confounds). This is especially true since such confounds are known to be greater in, e.g., patient populations, which could affect the analyses performed later.

      We agree that head motion can indeed have a profound impact on resting-state functional connectivity analyses. We have now added several post-hoc sensitivity analyses to the supplementary materials strongly demonstrating that our findings are unlikely to be confounded by head motion. For a more extensive description, we refer to our detailed response to comment 2 in the essential revisions section of this document.

      Finally, the findings are at various points referred to as a potential biomarker for dopamine (dys)function. While this term has been used in a wide range of contexts, such claims generally require a greater burden of proof than the presence of statistically significant associations, e.g., including classification and/or sensitivity/specificity analyses. These assertions do not yet seem well supported by the included statistics, and may need clarification.

      Indeed, to ultimately prove that our connectivity mode can be used as a biomarker for dopamine (dys)function would require invasive histology, which is impossible in healthy humans and in the context of this study. As such, we cautiously refer to our connectivity mode as a ‘potential’ biomarker for dopaminergic (dys)function and also state in the discussion that more research and out-of-sample replication are needed. We believe that, while each of our findings in isolation would be insufficient to claim that the second-order striatal connectivity mode could be used as a potential biomarker, all our findings together provide sufficient circumstantial evidence to motivate the further use and investigation of this connectivity mode as a biomarker. In particular, the direct within-subject mapping of the connectivity mode onto the DaT SPECT scan (acknowledged as being a biomarker) and the finding that our connectivity mode is sensitive to acute dopaminergic modulation suggest specificity to dopaminergic function. Furthermore, we also conducted an additional analysis (see Figure 2–figure supplement 1) comparing the spatial mapping of DaT SPECT to the second-order striatal connectivity mode with that of various other PET-derived neurotransmitter systems. This analysis revealed that the TSM coefficients describing the DaT SPECT scan provide a much better fit to the data than TSM coefficients describing any other PET-derived neurotransmitter system.

    1. Author Response:

      Evaluation Summary:

      This study examines genetic and non-genetic factors influencing immune responses in type 1 diabetes. Key findings are: 1) age and season affect immune cell traits and cytokine production upon stimulation; 2) certain genetic variants that determine susceptibility to T1D significantly affect T cell composition, notably the CCR region that is associated with CCR5+ regulatory T cells; and 3) 15 genetic loci that influence immune responses in T1D, most of which have not been seen previously in healthy populations. The results suggest mechanisms of T1D-specific genetic regulation.

      We thank the reviewer for the appreciation of our data quality and approach. We have tried to bring more focus in the conclusions, partly by taking out the whole non-genetic section.

      Reviewer #1 (Public Review):

      Strengths of the manuscript include the important research question addressed, the robust functional genomics methodology used, the relatively large sample size, and the translational implications of the study findings that pinpointed new potential drug targets in autoimmune diabetes. Weaknesses include the analysis of immune responses at a certain time point that may not represent the dynamic immune phenotype of the disease over time, the testing of immune responses in peripheral blood mononuclear cells (PBMC's) that may not represent the islet infiltrating immune cells that cause autoimmune diabetes, using generic stimulants to activate PBMC's instead of beta-cell autoantigens, and that the QTL analysis may not be relevant to the etiology of autoimmune diabetes as it identified QTLs associated with immune cell proportion and cytokine production, but these do not necessarily influence the development of autoimmune diabetes.

      We thank the reviewer for the fair assessment of our manuscript. We fully agree with the reviewer that studying relevant tissue at different time points could be very important for understanding type 1 diabetes. However, tissue immunity could be partly reflected by changes in the circulating levels of immune cells and cytokine production capacity, since the islet infiltrating immune cells do originate from circulating blood cells. We have modified the manuscript by adding more discussion about this topic.

      “The data presented in our study are generated from PBMC. While these likely reflect overall immune function, some immune cell types may not be captured and overall the findings refer to changes in circulating factors that may not necessarily reflect changes occurring in relevant immune organs, such as pancreatic islets, gut or lymph nodes. Still, islet infiltrating immune cells do originate from circulating blood cells, and circulating chemokines/cytokines are important in activating and recruiting immune cells. Hence, the circulating level of immune cells and cytokine production capacity is probably relevant for local tissue immunity.”

      Reviewer #2 (Public Review):

      This manuscript presents data collected from two cohorts of individuals, one including patients with type 1 diabetes, the other encompassing non-diabetic persons. Of note, the cohorts are not contemporary and samples from the two groups were collected several years apart (2013/14 for controls, 2016/17 for the diabetic group). This is not an issue for any genetic comparisons. However, comparing immune phenotypes in non-contemporary cohorts, particularly with respect to seasonal variations as the authors attempt in some of their analyses, is not useful as it lacks the rigor of collecting samples under identical conditions.

      We thank the reviewer for raising this point. In order to focus on the “genetics part” as suggested by the reviewers, we have taken the non-genetic associations, including seasonal effects and age, completely out of the paper and have rewritten the paper accordingly. Hence one figure was also removed, and the others were renumbered.

      This caveat aside, the overall aim of the study was to compare the function of immune cells, with a focus on the distribution of various cell populations and their cytokine secretion, between individuals with and without type 1 diabetes. Many of the analyses are difficult to interpret because the authors use measures and correlations for which the rationale is not well explained and whose presentation in the rather busy figures lacks detailed descriptions. There is no doubt that the authors amassed a substantial amount of data in what appears to be an ambitious study of hundreds of blood samples. However, the authors do not do their data justice by failing to present it in an easily comprehensible and interpretable way. Much of the description of the results makes the assumption that readers are familiar with the very particular way the authors analyzed the data (e.g. referring to parental and grandparental percentages, where it is entirely unclear what the authors are referring to).

      We thank the reviewer for the suggestions. We have modified the figure legends by adding more information to explain the results and help the readers to acquire a better understanding. We have included a supplementary table S6 describing how parental and grandparental percentages were defined. The immune cell gating method can be found in our previous study (Aguirre-Gamboa et al., 2016) and in Figure 1 - Figure supplement 2.

      Many of the observations presented are trivial and could have been omitted from the manuscript, for example showing that the immune system acquires more memory lymphocytes as people age, with no apparent difference between the groups studied. The fact that our immune system gets more experienced as we age is both unsurprising and a well known phenomenon. Similarly, the correlations between immune cells and cytokine secretion compared between groups yield no discernable differences and this could have been summarized much more succinctly in the interest of clarity. The more interesting data relating to gene variations that appear to impact immune phenotypes could have been given more weight in the overall manuscript to better describe them and discuss possible implications.

      In sum, this is a manuscript with a very large data set whose presentation lacks focus on the key points that would emphasize the novelty of the findings put forward by the authors. As such, it is not very accessible to a general readership.

      We thank the reviewer for the comments and suggestions. We agree with the reviewer that the genetic regulation is probably the most interesting and novel, hence we have modified the manuscript to focus more on genetics.

    1. Author Response:

      Reviewer #1 (Public Review):

      The paper clearly indicates that by using parallel fMRI and ECoG experiments, the authors are able to detail the hierarchy of predictive coding in the cortical and higher subcortical areas of the auditory pathway. The methodology is well detailed and I didn't spot any major concerns.

      The scientific methodology detailed in this paper appears to be sound. Further, the main conclusions appear to be well argued.

      We thank the reviewer for the positive comments.

      The statistical analysis, however, is not reported clearly in the main text. For instance, I'm unsure how multiple comparison correction was addressed. A more detailed primer on the statistical methods used in the results section is warranted.

      In the fMRI analysis, to assess the novelty responses as a function of the different number of sequence types, we performed a within-subject one-way ANOVA design in SPM, where the single-session contrast images corresponding to trial types were introduced as within-subject factor. To directly compare the responses of different novelties, we defined each type of contrast using the pair-wise t-test. We initially observed the results with a threshold of uncorrected p < 0.001 at the voxel level, and then considered the results as significant at p < 0.05 with false discovery rate (FDR) corrected for multiple comparisons across the brain (Cacciaglia et al., 2019; Uhrig et al., 2014). If no voxel survived FDR correction, then a threshold of uncorrected p < 0.001 was used. In the ECoG analysis, we performed the independent-sample t-test for each comparison in TFRs. The multiple comparison problem originates from the fact that ECoG data are multidimensional. For ECoG data from a single electrode, the signals are sampled at multiple frequencies and multiple time points. Therefore, we used a nonparametric cluster-based permutation test for multiple comparisons over frequency and time (Chao et al., 2018; El Karoui et al., 2015). To report the statistical analysis more clearly, we have added the details about statistical methods of fMRI and ECoG analyses, and multiple comparison correction, in the results of the main text. Please see the section of 1st-level (local) novelty (xY sequences).
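
For readers less familiar with FDR correction, the sketch below shows one common procedure, the Benjamini-Hochberg step-up rule, applied to a small vector of p-values. This is an assumption for illustration: the response states only that FDR correction was applied at p < 0.05 and does not name the specific procedure or the software routine, and `benjamini_hochberg` is a hypothetical helper rather than an SPM function.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tests declared significant at FDR level `alpha`."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # i/m * alpha for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank passing the rule
        reject[order[:k + 1]] = True               # reject all up to that rank
    return reject


# Toy p-values standing in for voxel-wise tests.
mask = benjamini_hochberg([0.9, 0.001, 0.5, 0.012, 0.025], alpha=0.05)
```

Because the rule is step-up, every test ranked below the largest passing rank is also rejected, even if its own p-value exceeds its rank-specific threshold.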

      My largest concerns are to do with communication, and language overreach. At one point the term "lower auditory pathways" is used, but the lowest portion investigated in this study is the IC, and this usage was in reference to the thalamus. There's a lot of brain between the IC and cochlea, to say nothing of the thalamus. There are also concerns about both the temporal and spatial resolution of fMRI and ECoG - the text at times implies that the resolution for these techniques is far greater than it is. However, these are communication issues that should be easily addressed.

      We are grateful for the reviewer’s suggestions. The temporal and spatial terms we used in the last version were based on our paradigm and recording methods. With 9.4 T fMRI, the lowest area of the auditory pathway that we can assess is the midbrain. Thus, we limited the observed range of the auditory pathway from midbrain to frontal cortex (as shown in Figure 1C). In the design of our paradigm, the local auditory information focuses on the millisecond timescale, and the global auditory information refers to the second timescale. Therefore, we defined the temporal range from millisecond to second. However, we have realized that the temporal and spatial terms we used were not rigorous and may cause ambiguity. Throughout the revised manuscript, we have rewritten them as lower- and higher-level areas, shorter- and longer-time scales.

      Reviewer #2 (Public Review):

      In this study, Jiang et al. combined whole-brain 9.4 T functional magnetic resonance imaging and large-scale electrocorticography to study brain wide activation patterns in response to different pattern violations in marmosets. The authors confirm previous results of a cortical hierarchy for auditory predictive processing and expand on these results by quantifying subcortical responses in MGB and IC as well as using omission to confirm previous results obtained with mismatches. The results highlight the existence of the two levels of auditory prediction signals in the marmoset brain that can be interpreted in a hierarchical predictive processing framework.

      The paradigm used to assess the hierarchical depth of predictive auditory sequences for processing prediction errors and prediction updates at two distinct timescales is well designed, and presumably based on one of the authors' earlier studies (Chao et al., 2018). Unfortunately, the current study fails to highlight the novelty of this work (as far as we can tell, mainly the omission responses) and give adequate credit to previous work on the topic. However, this can be easily fixed by rephrasing the relevant passages of the manuscript.

      We thank the reviewer for the positive comments. We have now revised the Results and Discussion to provide more details about the omission responses and discuss the contribution and novelty of omission sequences in the hierarchical predictive coding. Please also see the reply to Q1 of the Main concerns.

      Main concerns:

      1) It would be good to clarify what the novelty of the present manuscript is (omission responses) in comparison to the previous work (Chao et al., (2018)). The authors do argue that their higher resolution fMRI, allows them to also study subcortical response - which is correct - but the authors make no use of them in any meaningful way in the manuscript. The emphasis on novelty is likely better placed on the omission responses.

      We thank the reviewer for the constructive suggestion. In the revised Discussion, in comparison to the previous work (Chao et al., 2018), we have specifically emphasized the novelties of the present study.

      -First, the model describing the 1st- and 2nd-level violations (prediction and error) in the present study is novel and more straightforward. Instead of using the partial- or full-global predictions in the Chao et al. model, which is challenging to interpret, we first introduce the sequences with xx and xY as separate internal templates. Similarly, as we mentioned in the discussion, although the local-global paradigm has been intensively studied in humans and macaque monkeys (Chao et al., 2018; Uhrig et al., 2014; Wacongne et al., 2011), most studies tested the global violation by combining xx|xY and xY|xx novelties, which, in fact, contain two different types of predictions. Our study is the first to separate the two novelties and search for their neural representations, respectively. This is important because the xx|xY novelty was only involved in the 2nd-level signal with the xY sequence as the internal sequence template, and the xY|xx novelty was involved in both 1st- and 2nd-level signals (the 1st-level novelty triggers the 2nd-level novelty), where the xx sequence was the internal template (see Discussion).

      -Second, this is the first study to construct the hierarchy of predictive auditory sequences in the marmoset brain using fMRI. Our results extend the hierarchical organization of predictive coding from the cortex to the subcortical regions. To emphasize the importance of this animal model, we have added a section on marmosets as an animal model for auditory sequences to the Discussion.

      -Third, and most importantly, as suggested by the referee, the omission responses are indeed novel. To highlight this, we have expanded the omission results and provided more discussion of their contribution to hierarchical predictive coding.

      2) Figure 3C (and all similar figures). We fear this figure is not interpretable without a substantially improved explanation. Both what the arrows mean (i.e. how they are computed), and what the values indicate that are listed next to the arrows is not explained (arrows appear randomly bi- or unidirectional and the legend at the bottom of the figure is not very helpful).

      We apologize for the missing details in Figure 3C. The color dots in the brain diagrams indicate the electrodes with significant responses found in corresponding comparisons, which were subsequently used in the functional correlation test (see Materials and Methods). Lines represent significant functional correlations between signals from the paired brain diagrams. Labeled values close to lines provide the Pearson correlation coefficient (p-value) of the corresponding correlations. Unidirectional arrows indicate relative temporal orders at which the signals appear, while bidirectional arrows indicate uncertain temporal orders of the signals. Figure 3D, 5C and D, Figure 3-figure supplement 1C and D, Figure 5-figure supplement 1C and D have the same format as Figure 3C. Accordingly, we have revised the legend of Figures 3C and D, 5C and D and added more explanations in the Results.

      References:

      Cacciaglia, R., Costa-Faidella, J., Zarnowiec, K., Grimm, S., & Escera, C. (2019, Feb 1). Auditory predictions shape the neural responses to stimulus repetition and sensory change. Neuroimage, 186, 200-210. https://doi.org/10.1016/j.neuroimage.2018.11.007

      Chao, Z. C., Takaura, K., Wang, L., Fujii, N., & Dehaene, S. (2018, Dec 5). Large-Scale Cortical Networks for Hierarchical Prediction and Prediction Error in the Primate Brain. Neuron, 100(5), 1252-1266.e1253. https://doi.org/10.1016/j.neuron.2018.10.004

      El Karoui, I., King, J. R., Sitt, J., Meyniel, F., Van Gaal, S., Hasboun, D., Adam, C., Navarro, V., Baulac, M., Dehaene, S., Cohen, L., & Naccache, L. (2015, Nov). Event-Related Potential, Time-frequency, and Functional Connectivity Facets of Local and Global Auditory Novelty Processing: An Intracranial Study in Humans. Cereb Cortex, 25(11), 4203-4212. https://doi.org/10.1093/cercor/bhu143

      Uhrig, L., Dehaene, S., & Jarraya, B. (2014, Jan 22). A hierarchy of responses to auditory regularities in the macaque brain. J Neurosci, 34(4), 1127-1132. https://doi.org/10.1523/jneurosci.3165-13.2014

      Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011, Dec 20). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proc Natl Acad Sci U S A, 108(51), 20754-20759. https://doi.org/10.1073/pnas.1117807108

    1. Author Response:

      Reviewer #1 (Public Review):

      This is a clearly written manuscript describing an elegant study that demonstrates how microsaccades are not the triggers of attentional effects, and that attentional modulations can be observed in the absence of microsaccades. This is a very much needed work, especially in the light of the recent debate regarding whether or not microsaccades are the cause of peripheral attentional effects. By explicitly comparing and quantifying the effects of attention on neuronal responses in the presence and in the absence of microsaccades, this work provides important insights on this debate. I think the work is well conducted and the results are solid.

      We thank the reviewer for their supportive comments!

      I only have few comments/suggestions:

      1. Lines 125-126, the authors report that monkeys generated frequent microsaccades but their overall direction was not systematically biased towards the cue location. This seems to be in contrast with what was previously reported in the literature in humans and monkeys. I think this discrepancy should be discussed in the discussion. Is this simply the result of different experimental paradigms (maybe exogenous vs endogenous attention, or the presence of the cue for the entire duration of the trial, etc.)?

      As suggested, we discuss three main factors which may contribute to this discrepancy:

      The first factor is the difference in the time window used for microsaccade analyses. Previous reports focused their analyses of microsaccades on the time window immediately after cue onset. Our analyses instead focused on the ‘delay period’, which is hundreds of milliseconds after the cue and is the time epoch used in most electrophysiology studies of attention.

      A second factor is how the spatial cues were presented. In our paradigm the cue ring appeared in the periphery and then disappeared. In contrast, previous paradigms used a cue presented near fixation that persisted throughout the trial. Our brief peripheral cue provides less of an impetus to generate small saccades directed towards the cue, compared to the case when the cue is continuously near the center of gaze.

      A third factor is that monkeys in our task were trained to release a joystick to report their detection of stimulus events, rather than make a saccade. Because human and monkey subjects tend to make microsaccades in the same direction as their upcoming saccadic choices (Yu et al., 2016), attention tasks using saccade reports will tend to introduce this direction bias on microsaccades. By using a joystick release, we minimized these lateralized effects related to saccade preparation.

      These points are now addressed in the second paragraph of the Discussion.

      1. It is very interesting that microsaccades modulate neural responses for stimuli that are much further away from their landing location. However, the stimulus used in these cueing tasks is also unnatural. Normally we are not fixating on a meaningless dot while all the interesting stimuli are presented in the periphery. In normal conditions the foveal input is rich in detail and it is generally relevant (that's why we are foveating certain stimuli in the first place). I wonder if the authors can comment on whether the modulations reported here would also occur in more natural conditions when an interesting and maybe salient/relevant stimulus is presented at the center of gaze, while subjects are also attending to a peripheral target. Will the neural response be modulated selectively for neurons for which the receptive field is on the peripheral target or will it also affect neurons where the receptive field aligns with the microsaccade target location in the fovea?

      The reviewer raises a very good point. In our study, the relationship between microsaccades and attention-related modulation was examined when monkeys selectively attended a stimulus located in the near peripheral visual field while maintaining central fixation. We agree that under more natural conditions, the monkey would just look directly at the peripheral stimulus. As in many attention studies with this type of design, our experiments hold the system in a state of sustained peripheral attention which would otherwise be much shorter.

      We believe that similar modulation at the peripheral location would be briefly observed if the monkey were allowed to satisfy the natural tendency to look at the stimulus, although this would make it more difficult to examine the relationship with microsaccades. This would be consistent with the documented pre-saccadic modulation of attention (e.g., by the Carrasco lab; Li, Hanning, & Carrasco, 2021).

      Once the attended stimulus is foveated, there is strong behavioral evidence from several recent studies demonstrating that attention can be selectively distributed even within the fovea (Poletti, Rucci, & Carrasco, 2017). Considering the now substantial evidence that the foveal portion of the SC map is activated when the behaviorally relevant location is at the center of the visual field (e.g., during parafoveal smooth pursuit as in Hafed & Krauzlis, 2008), we expect that SC neurons with foveal RFs would display similar attention-related modulation as we found here. However, to the best of our knowledge, there have not yet been studies documenting the attention-related modulation of neurons with foveal RFs and the possible influence of microsaccades.

      We agree with the reviewer that these are interesting points, and have now added a new paragraph to the Discussion (final paragraph) to address them.

      1. The authors do not report behavioral performance. Presumably the task is very easy, but I wonder if reaction times and performance correct was related with the attentional effects and how did it change with respect to microsaccade direction, e.g., were subjects' reaction times shorter at the cued location also when microsaccades were directed at the opposite location? I think this information would be very valuable.

      We agree it is valuable to document the behavioral performance; we had omitted this because this is the same task we have used in previous studies which do include such behavioral documentation.

      To address the reviewer’s comments, we added an analysis and plot documenting the hit and false-alarm rates for each subject in each experimental session. To accommodate this new plot, we have now divided the original Figure 1 (which included task, neuronal data and microsaccades) into a new Figure 1 (task, behavior, and neuronal data) and a new Figure 2 (microsaccades). The new plot showing hit and false-alarm rates is Figure 1b in the revised manuscript.

      The task was not especially easy: we adjusted the amplitude of the color saturation change to be just slightly above the threshold for detection; hence, the hit rates were generally between 75% and 90%. The performance was very consistent across sessions in our well-trained monkeys, and the low rate of false alarms for ‘foil’ changes provides behavioral confirmation that they attended to the correct stimulus location.

      To address the comments about reaction time, we have added a new plot to our new Figure 2 (Figure 2c) showing the monkeys’ hit rates (top) and joystick release times (bottom), subdivided based on whether there were no microsaccades, microsaccades towards, or microsaccades away from the cued location (-50 to 50 ms relative to cued stimulus change onset). These plots show that when there were no microsaccades, behavioral performance was at least as good as with microsaccades. When there were microsaccades, reaction times were slower when microsaccades were directed away from the cued location. As the reviewer may have anticipated, these effects again confirm that differences in attentional state, as evident in task performance, covary with the direction of microsaccades, and we thank them for the suggestions. We have now added a new paragraph to the Results to describe these findings.

      1. Another important difference in the paradigm used in Lowet et al vs the one described in this manuscript is that in Lowet et al monkeys were instructed to saccade toward the target position at some point during the trial after the cue and the target presentation. Hence, monkeys presumably prepared the saccade and held off its execution during the time the cue and the target were presented. This was not the case in the current paradigm, where the monkey is instructed to maintain fixation as in a standard spatial cueing paradigm. I wonder if this difference may explain some discrepancies in the results.

      This is a very good point. As mentioned in our reply to point #1 above, previous studies (Yu et al., 2016) have shown that human and monkey subjects tend to make microsaccades in the same direction as their upcoming saccadic choices. As pointed out by the reviewer, in the Lowet et al. study the directions of microsaccades might be related to the motor preparation of the upcoming choice saccade as well as related to the allocation of attention. In contrast, in our experiments, monkeys reported their choice by releasing the joystick and were prohibited from making larger saccades.

      We agree this can be an important factor for the differences in the results, and we now address these points in the second paragraph of the Discussion.

      Reviewer #2 (Public Review):

      This is a correlative study with the main result that microsaccades do not alter attention-related modulations of neuronal activity. This is an important question, speaking to the origin of one of the mind's most fundamental processes. The experimental manipulations and analyses are well chosen, carefully conducted and visualized. They include critical controls for alternative explanations.

      Thank you for your constructive comments.

      To ascertain their claims, however, it is important that the authors cover their ground. In pursuit of that, a few important analyses are required.

      1. Did the manipulation of attention work? In the present version of the manuscript, the authors do not report behavioral results, which is necessary to confirm that the cue was successful in manipulating attention. That is, the observed modulation in firing (in RF vs outside of RF) should be related to a behavioral advantage in sensitivity to changes at the cued location. To confirm the link of the neural results to attention (rather than, say, just the cue), the behavioral results provide opportunities for critical tests. One way to do this would be to analyze neural firing rates as a function of response rather than cue location (provided subjects made enough errors). Note: A detailed discussion of why the cue cannot be equated to attention can be found in Laubrock et al. (2010, Atten Percept Psychophys; https://doi.org/10.3758/app.72.3.683).

      Yes, the manipulation of attention worked. As suggested, we now document the effectiveness of the attention manipulation by plotting the hit and false-alarm rates for each subject in each experimental session (new Figure 1b). We also confirmed that the SC neuronal attention-related modulation depended on subjects’ behavioral response (new Figure 1d). We also note that these same attention manipulations have been used in previous studies examining the neuronal mechanisms of attention.

      1. Were all microsaccades detected? One of the main results of the study is that attention-related modulations were observed even in the absence of microsaccades. These results hinge on successful detection of all microsaccades, even at a very small scale. Given the video-based eye tracking the authors will have missed a (possibly large) number of smaller microsaccades (Poletti & Rucci, Vision Res, 2016; https://doi.org/10.1016/j.visres.2015.01.018). This concern is exacerbated by the fact that eye tracking was monocular, such that a validation of detected microsaccades based on the signal in the other eye could not be performed.

      We have performed additional microsaccade detection analyses using both more stringent and more lenient thresholds (the "lambda" value of Engbert & Kliegl, 2003). We have verified that our findings are robust over a range of detection thresholds and include a new supplemental figure to demonstrate this point (Figure 4 – figure supplement 2).
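      For readers unfamiliar with the "lambda" threshold, the following is a minimal sketch of a velocity-threshold detector in the style of Engbert & Kliegl (2003). It is an illustration under stated assumptions, not the detection pipeline used in the manuscript: the function name, the default lambda of 6, the minimum duration of 3 samples, and the input units (degrees, Hz) are all hypothetical choices made here. Lower lambda values make detection more lenient; higher values make it more stringent.

```python
import numpy as np

def detect_microsaccades(x, y, fs, lam=6.0, min_samples=3):
    """Illustrative velocity-threshold detector (assumed parameters, not the authors' code).

    x, y: eye position traces in degrees; fs: sampling rate in Hz.
    Returns a list of (start_index, end_index) pairs, both inclusive.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dt = 1.0 / fs

    # Velocity from a 5-sample moving window; the two edge samples stay at zero.
    vx = np.zeros_like(x)
    vy = np.zeros_like(y)
    vx[2:-2] = (x[4:] + x[3:-1] - x[1:-3] - x[:-4]) / (6.0 * dt)
    vy[2:-2] = (y[4:] + y[3:-1] - y[1:-3] - y[:-4]) / (6.0 * dt)

    def robust_sd(v):
        # Median-based spread estimate; the small floor avoids division by zero.
        return np.sqrt(max(np.median(v ** 2) - np.median(v) ** 2, 1e-12))

    # Separate horizontal and vertical thresholds, both scaled by lambda.
    eta_x = lam * robust_sd(vx[2:-2])
    eta_y = lam * robust_sd(vy[2:-2])

    # A sample is a candidate when its velocity falls outside the threshold ellipse.
    above = (vx / eta_x) ** 2 + (vy / eta_y) ** 2 > 1.0

    # Keep only runs of consecutive candidate samples that last long enough.
    events = []
    start = None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:
                events.append((start, i - 1))
            start = None
    if start is not None and len(above) - start >= min_samples:
        events.append((start, len(above) - 1))
    return events
```

      Re-running such a detector over a range of lambda values and repeating the downstream analysis for each is one way to check that a result does not hinge on a single threshold choice.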

      1. Relation to previous claims of causality. Hafed (2013, Neuron) reported perceptual changes in attentional cueing that covaried with the occurrence of microsaccades. Hafed (2013) argued that microsaccades might be underlying the performance changes commonly attributed to covert shifts of attention. This point seems central to the current paper's line of argument and should thus be discussed in detail with respect to the current findings. At present, the paper by Hafed (2013) is not cited in the current manuscript when its conclusions may need reconsideration based on the current results.

      We agree, and a similar point was raised by Reviewer #1. We have expanded the main text based on your recommendations.

    1. Author Response:

      Reviewer #2 (Public Review):

      In this manuscript, Johnson Jr, et al. investigated the potency and selectivity of NBI-921352, a novel Nav1.6 blocker, on different voltage-gated sodium channel (VGSC) isoforms as well as on epileptic Nav1.6 variants. NBI-921352 exhibited exquisite selectivity against Nav1.6 channels, preferentially acting on activated channels, and inhibited tested Nav1.6 variants at similar potency except for the R1617Q, a variant that is proximal to the predicted binding site of NBI-921352. Brain slice recordings revealed that NBI-921352 effectively attenuated AP firing in excitatory pyramidal neurons, but not in inhibitory interneurons. Seizure assays in three rodent models demonstrated the protective effect of NBI-921352 on electrically induced seizures in all three models.

      Nav1.6-selective blockers have been reported before, but their relative selectivity between Nav1.6 and Nav1.2 are not great; NBI-921352 is the first blocker that shows a high Nav1.6 selectivity over Nav1.2, making it a promising candidate for the development of therapeutics of Nav1.6-related disorders including early onset encephalopathies and mental disabilities. The study on epileptic variants of Nav1.6 further supports its potential use for the treatment on SCN8a-related diseases, which was confirmed by the seizure assays. NBI-921352 will also be a valuable pharmacological tool in VGSC-related basic research.

      Despite all the wonderful work the authors have completed, there are some issues that should be addressed.

      First, different protocols were adopted to examine the selectivity of NBI-921352 on different VGSC isoforms. NBI-921352 is a state-dependent inhibitor, holding potential may alter the potency of NBI-921352 by changing channel activation/inactivation state, and therefore, difference in voltage-clamp protocols could introduce bias in the comparison of selectivity among VGSCs.

      Second, a depolarized holding potential (-45 mV) was used in the study to determine IC50 of NBI-921352 on most VGSCs, which is uncommon under physiological conditions. The selectivity of NBI-921352 on Nav1.6 vs other VGSCs under physiological conditions could be different compared to the values reported here. It is better to hold cells at physiologically-relevant membrane potentials or using action potential waveforms derived from real AP recordings in neurons. The authors should discuss these limitations, and possible impact on their assessment of selectivity against other VGSCs in their native cellular backgrounds.

      There are pros and cons to any method of determining selectivity and we acknowledge that none of them are ideal for all purposes. We chose to focus on what we refer to as “molecular selectivity,” the fundamental ability for a compound to bind to the channel and stabilize the high affinity conformation. We accomplish this by choosing voltages that promote the same fraction of channels to be in the high affinity (inactivated) state. This contrasts with “functional selectivity” that may be largely driven by the distinct state-dependence of different isoforms. Our approach avoids assumptions about what the physiologically relevant voltage is since that voltage can vary depending on the tissue or cell type. For any given isoform there may be multiple physiologically relevant voltages.

      Consistent with this philosophy, we bias all the channels to be in their highest affinity state (inactivated) and then use this maximal potency to compare selectivity. At more hyperpolarized voltages, potency for all isoforms will tend to be somewhat lower. We are adding more explanation of our rationale to the text, and we are adding supplemental data giving more insight into the impact of voltage on potency.

      Figure 1-figure supplement 2 shows the potency of NBI-921352 after holding at a membrane potential nearer the physiologic range (-62 mV). Potency at this voltage (IC50 = 53 nM) was similar to that at the fully inactivated potentials evaluated in the primary potency assay shown in Figure 1. For this reason, we anticipate that the selectivity ratios described in the manuscript will be similar to those in physiologic conditions. A note to this effect has been added to the Results section.

      Third, Nav1.6 is highly expressed in Purkinje neurons and motor neurons, and plays important roles in motor system. Did the authors observe any motor impairment in the behavior studies? It would be informative to examine the effect of NBI-921352 on AP firing and resurgent currents in Purkinje neurons.

      Fourth, wrong statistical test was used in the current-clamp study, and there is no description of statistical methods used for seizure assays. Please add a section of statistical analysis in Materials and Methods, and list the statistical analysis method used in each experiment.

      We have reanalyzed the data in Figure 4 using an AUC-based analysis; this is now described in the legend, and the p values are shown in the data transparency file. We have also added statistical analysis methods to the Methods and the figure legends.

    1. Author Response:

      Reviewer #3:

      Weaknesses:

      In utero electroporation as well as other in vivo gene manipulation techniques do not allow fine manipulations of expression gradients. Therefore some conclusions of the paper are not fully supported. Although the data presented in the paper clearly show that Neuropilin1 expression level is important for establishment of homotopic connections, it does not show directly that the gradient of expression indeed is in play, as suggested by the authors. Another weak point is that there is no direct evidence that the Neuropilin1 protein level follows the mRNA expression gradient.

      Therefore it remains an open question, whether it is a gradient of expression or a sharp border of cellular response to higher-lower levels of Neuropilin1 that controls area specific connections within somatosensory cortex.

      Another weak point is that the paper relies on in utero electroporation solely. This technique with all its advantages, has some disadvantages too. One of them is high variability of individual experiments. On the other hand, it targets only subsets of cells, and therefore is not the best to address cell extrinsic mechanisms, especially those that involve expression gradients.

      The reviewer raised interesting comments. We would like to clarify that we never attempted to disrupt the gradient per se but to alter Nrp1 expression in individual cells. This evaluates how their projections are affected by the Nrp1 expression imposed by their location, and helps us understand how the gradient contributes to connectivity. Nevertheless, the reviewer raised a fair point, in that we realized it was important to re-examine the expression of Nrp1 to sharpen our interpretations. In this revised version we have performed in situ hybridization to investigate whether we were dealing with gradients or sharp borders. Surprisingly, we found unexpected patterns of expression. This is now shown in Figure 1. Interestingly, the expression pattern of Nrp1 in the postnatal brain is highly dynamic. At early stages of CC development, expression in the cortex shows a discontinuity in L2/3 neurons of the SS, rather than a gradient. Nrp1 expression is upregulated after P7 in a manner that suggests a gradual activation from lateral to medial SS cortex. At P16, few cells are positive but they are equally distributed throughout the S1 and S2 cortex. Therefore, we have modified the text and avoided referring to the gradient. This gradient was described at embryonic stages and P0 (Zhou et al., 2013). The new version of the manuscript also adds results showing that changes in Nrp1 expression do not detectably modify contralateral innervation at P10 and that the S2 column is not formed at this stage. The quantification method suggested by reviewer #2 allows us to conclude that the reduction in the S2 columns in the shNrp1 condition is statistically significant. Together, the new data provide a better understanding of our phenotypes and explain why the phenotypes of CAG-Nrp1 and shNrp1 are so similar and both block innervation in S2, since they both disrupt the normal transient expression.

    1. Author Response:

      Evaluation Summary:

      In a set of in vitro and in vivo experiments the investigators demonstrated that coating of urinary tract catheters with fibrinogen-degrading substances reduced adhesion and colonization with a broad range of bacteria relevant in the pathogenesis of CAUTI. This approach might, therefore, be interesting for prevention of CAUTI as an alternative to catheters coated with antibiotics.

      We appreciate the summary. However, this coating doesn’t aim to degrade fibrinogen; it simply reduces fibrinogen’s ability to adhere to the catheter. “Fibrinogen anti-fouling” would be a more accurate description. Additionally, this study focused not only on bacteria but also on fungi, and thus “microbial” would be a more accurate description of the scope of this study.

      Reviewer #1 (Public Review):

      The major strengths are a clear hypothesis and the consecutive description of a set of experiments, each time demonstrating the next step in the pathogenetic pathway.

      We thank the Reviewer for their supportive and enthusiastic response.

      The weakness is that the experiments stop where the clinical relevance would start. Are the in vitro and in vivo animal experiments representative of the in-human situation?

      We appreciate the insightful comments provided by the Reviewer. This work has clinical potential: our data show that both our use of urine as a medium to grow our pathogens for in vitro testing and our mouse model of infection recapitulate human CAUTI. Some of our findings are shown in Flores-Mireles, Mbio 2016; Flores-Mireles, J Urol 2016; Flores-Mireles, Nat Rev Microbio 2015; and Flores-Mireles, STM 2014. To emphasize the clinical relevance of this study, we have revised the introduction and discussion.

      Moreover, it does not become clear from the discussion whether this approach of coating is technically feasible. This step towards in-human testing will determine the impact and significance of the work.

      We thank the Reviewer for this feedback. To improve the clarity of the technical feasibility of this coating, we have addressed it in the introduction, results, discussion, and methods.

      Reviewer #2 (Public Review):

      This article provides a detailed account of both in vitro and in vivo experiments that:
      • Establish the role of fibrinogen (Fg) in the etiology of catheter-associated urinary tract infections (CA-UTI)
      • Investigate the prevention of CA-UTI with the use of LIS catheters, containing anti-fouling modifications (liquid infused silicone) to prevent the interaction between Fg and common uropathogens.

      The study follows up on previous (by the investigators) research on the role of Fg on the attachment of uropathogens and the formation of biofilms. It is a comprehensive article that contains a detailed description of the following experiments:

      1. In vivo experiments demonstrate the interaction between Fg and uropathogens in the bladder and the catheter lumen.
      2. The manuscript provides in vitro evidence that Fg-coated silicone catheters enhances the binding of uropathogens, compared to uncoated or bovine serum albumin coated catheters.
      3. The manuscript describes the development of the LIS catheter, in which a catheter is drained in silicone gel. It demonstrates the effects of this process on the catheter weight, length and inner and outer membrane diameter.
      4. The manuscript provides in vitro evidence that the use of a LIS-catheter reduces Fg deposition and uropathogens binding.
      5. Using in vivo mouse experiments, the study provides evidence that when introducing a variety of uropathogens and thereby inducing CA-UTI, the use of LIS-catheters reduces
         o Fg deposition and uropathogens binding on the catheter
         o uropathogens colonization of the kidneys, spleen and heart
      6. Finally, the manuscript demonstrates in mice that the LIS catheter reduces protein deposition on catheters in case of CA-UTI

      The study has a clear structure and there is little to criticize about the study methods. For steps 4 to 6, they used a control group of uncoated catheters, which they compared with a Mann-Whitney U test. The results, although not all statistical significant, provide convincing evidence for the efficacy of LIS catheters within this study. Another strength of the study is the simplicity of the development and (probably) the limited costs of a LIS catheter, so that it can also be applied in the future in less wealthy countries.

      I identified two potential weaknesses of this study. Addressing these would improve the replication of these findings, the set-up of follow-up studies, also outside your study group, and it would help in the translation and implementation of the LIS catheter in humans.

      First, it is insufficiently clear from the methods how the LIS catheter was developed exactly, and specifically the LIS-catheter that was used for the mice experiments. This complicates the understanding and replication of these study findings. It is not exactly clear for me if these catheters were drained in liquid infused silicone or whether liquid infused silicone was infused into the catheter tuber before insertion? For how long were the LIS-catheters that were finally used for the mice experiments incubated in silicone oil?

      We thank the Reviewer for pointing out where our explanation was lacking. This liquid infused modification was made by submerging the silicone tubing into silicone oil for at least 5 days (for in vitro assay catheter materials) or 30 min (for catheters used in in vivo assays). This information has been added to the results section as well as materials and methods section.

      Second, the article demonstrates that the drainage of a catheter in silicone gel increases the weight, length, inner and outer diameter of the mouse catheter. These results seem to stand alone and are not addressed in the discussion.

      We thank the Reviewer for pointing out this deficiency. These results are now discussed.

      What influence this could have on the urinary flow and the introduction/ascent of uropathogens?

      Currently, we are performing an in-depth characterization of the LIS-catheter and its effect on urine flow. This evaluation is outside the scope of this study and will be part of a follow-up publication. Regarding the introduction/ascent of uropathogens, our colonization studies have shown a decrease in colonization of the kidneys, suggesting that ascent of the pathogen to the upper urinary tract is affected. Our data show (Fig. 4) that this modification reduces initial binding of both pathogens and deposition of Fg.

      Could it be that the effect of the silicone gel diminishes over time, which necessitates a catheter change? Do you have evidence on the stability of this polymer?

      We are so excited that the reviewer is thinking about the follow-up steps to this study. Currently, we are investigating the long-term stability in urine conditions in vitro, in the bladder in vivo, and in prolonged CAUTI. However, these analyses are outside the scope of this study and will be part of further publications. A study by Sotiri et al. (2018) has shown that this modification has long-term stability in vitro.

      Would it be possible to infuse silicone oil when the catheter is in situ?

      We appreciate the Reviewer’s comments. Based on the time that is needed to fully infuse the catheter, it will be difficult to do it in situ. This will need further investigation under urine conditions.

    1. Author Response:

      Reviewer #1 (Public Review):

      The investigators' goals were to describe the epidemiology and kinetics of post-acute covid lung sequelae and to determine the risk factors predictive of persistent lung impairment. A major strength of the study is the longitudinal observation through 6 months with protocolized clinical assessments that included patient-reported outcomes, lung function tests, inflammatory marker testing, and computed tomography of the chest, in a reasonably sized cohort that reflects the spectrum of disease severity in the pre-vaccination era. We learn a great deal about the different patterns of recovery in this group of COVID-19 survivors. The primary epidemiologic finding is that 52% of survivors continued to have symptoms at 6 months, while up to 72% of those with severe COVID requiring ICU level care continued to have lung abnormalities by chest imaging. This confirms general observations of "long covid" which also encompasses non-lung effects. While lung disease is less common in those with milder disease, the proportion of patients who were never hospitalized but experienced persistent symptoms is striking (50%), with lung function impairment in 17% at 6 months. As expected, the patients who had the most severe disease (those who needed the ICU) had the highest degree of chest imaging abnormalities. The kinetics of recovery is a significant observation: Figure 3 shows that most of the post-acute recovery in structural lung abnormalities occurs in the first 3 months and slows down thereafter, particularly for the hospitalized non-ICU patients. The investigators then embarked on a sophisticated analysis to determine how to predict persistent lung abnormalities (as detected by chest CT) at 6 months. When analyzed individually, among 50 clinical characteristics or lab values, the strongest unfavorable risk factors were elevated IL-6 (an inflammatory cytokine that is the target of tocilizumab) and CRP (C-reactive protein).
Other variables that were strongly associated with CT abnormalities included immunosuppressive therapy, ICU stay as well as pre-existing conditions. When machine learning techniques were applied, risk factors that correlated with each other could be grouped together, and the patients could be categorized as low, intermediate, and high risk for delayed pulmonary recovery. As expected, known factors for COVID-19 infection (age, male sex, medical comorbidities) and disease severity (need for oxygen therapy, ICU care and antibiotics) were more frequent in the intermediate and high risk groups. These predictive factors at acute COVID and day 60 follow-up mostly held up when tested against part of the cohort that was not used for analysis. Interestingly, lung function impairment as measured by pulmonary function tests was only weakly correlated with persistent and severe chest imaging abnormalities.

      The novelty of this study lies in taking the epidemiology a step further with a machine learning analysis to determine which clinical characteristics and chest imaging features at the onset of acute COVID-19 are predictive of later persistent disease. One limitation of this study, however, is that it was conducted on patients in the early part of the pandemic, prior to the widespread use of remdesivir and corticosteroids/anti-cytokine therapies, that are now considered standard of care. Based on these findings, we can now hypothesize that current treatments are likely to reduce the impact of long-covid.

      We would like to thank the reviewer for the careful study of the manuscript and appreciation of our work. We agree that our longitudinal cohort, and its hospitalized, severe COVID-19 subset in particular, comprises patients for whom the therapeutic armamentarium was limited and far from the options available now. Whether novel anti-viral and anti-inflammatory medication and, in vaccinated patients, the immunization status may accelerate recovery or reduce pulmonary damage is a matter of current research, also in our center. We address this issue in the Discussion section to support a clear interpretation of the data by the interested reader.

      Machine learning (artificial intelligence, AI) is now being increasingly used to answer clinical questions on limited cohorts; the application of machine learning in this study contributes to our conceptual understanding of how clinical characteristics and biological factors cluster together to contribute to long-term COVID outcomes. Namely, the profound inflammation that characterizes severe acute COVID-19 pneumonia and poor early outcomes also contributes to chronic lung damage in survivors. In addition, a robust antiviral immune response (as seen with elevated anti-viral antibodies) without elevated systemic inflammatory markers were associated with less severe chest imaging patterns, also supporting the notion that an individual's immune response to the virus is responsible for the trajectory of disease. As noted, a significant proportion of non-hospitalized patients also suffered from chronic lung impairments. Taken together, the impact of prolonged convalescence on the workforce, healthcare, and individual lives should not be underestimated. These results underscore the paramount need for continued public health measures and vaccinations to prevent COVID-19, particularly for the most vulnerable individuals (older, immunocompromised, and with preexisting health problems). These observations provide additional biologic justification for the use of agents directed at reducing lung inflammation early in the course of disease, and potentially at an early post-recovery time point (i.e 2 months). Machine learning algorithms may one day help clinicians decide which patients should be targeted for additional therapies after the acute phase. With further study, implementation of AI to real world medicine may be on the horizon.

      We agree with the Reviewer that machine learning algorithms can overcome limitations of ‘canonical’, ordinal and generalized regression methods in the multidimensional setting, i.e. when the number of available clinical parameters approaches or exceeds the number of observations/patients. Consequently, machine learning or AI allows for serial screening of medical record data at low cost and supports diagnostic and therapeutic decisions. We discuss those two aspects in the revised manuscript in the context of acute COVID-19 course prediction and long COVID prediction and phenotyping in light of the recent literature [1–4,6].

      Reviewer #2 (Public Review):

      This is a potentially valuable manuscript which links early markers of inflammation with residual abnormalities on chest CT following SARS-CoV-2 infection. Surprisingly, early surveyed symptoms do not predict long term radiologic outcomes (6 months after infection) while inflammatory markers have stronger predictive value. The cohort is well designed and the selected tools for analysis are appropriate.

      We thank the Reviewer for the careful study, critique and appreciation of our work.

      While this finding is potentially of high importance for clinical practice, the endpoints are inconsistently defined, and certain components of the machine learning and clustering analyses are difficult to interpret as presented. It is therefore challenging to understand whether the conclusions are justified by the analysis.

      We apologize for this lack of clarity. In the revised manuscript, we precisely define the analysis endpoints (any radiological lung findings at the 6-month follow-up, radiological lung abnormalities with CT score > 5, lung function impairment and persistent symptoms at the 6-month follow-up); see: Introduction and Methods/Study design. We also indicate the numbers of participants reaching those endpoints in Table 3.

      Several components of the analysis are confusing and would benefit from further elucidation:

      1) The authors do not clearly define "delayed pulmonary recovery". My sense is that they are using several radiologic based definitions rather than their functional definition (defined by FEV1, FEV:FVC & DLCO) of lung function but this is never explicitly stated. Are the functional outcomes and symptomatic recovery considered in any of the analyses other than correlations with radiologic findings in S1?

      As described above in our previous response, the prime focus and primary endpoint of the analysis was the presence of radiological lung abnormalities at the 6-month follow-up. Our motivation to focus on radiological endpoints was to focus on the potential development of persistent structural lung abnormalities, fibrosis and interstitial lung disease following COVID-19, as observed in SARS-CoV-1 patients [7,8]. Of note, lung function parameters were only weak correlates of radiological impairment as shown in Figure 3 – figure supplement 1 – 3 and our previous work [27]. This finding is in line with numerous studies in ILD patients which demonstrate a low sensitivity of lung function testing (especially FEV1 and FVC assessment) in patients with early interstitial lung disease (ILD) [10,11]. In addition, we could not exclude a pre-existing, COVID-19-independent impairment of lung function in a subset of the study participants suffering from pulmonary diseases, obesity and/or cardiovascular diseases (Table 1). Thus, lung function parameters only partially reflect COVID-19 mediated lung injury and convalescence.

      Nevertheless, we agree that clinical and functional endpoints are of great interest for the scientific and clinical community. For this reason, we present additional results of univariable risk modeling for long-term (6-month follow-up) symptom persistence and lung function impairment (Figure 5, Appendix 1 – table 2) and the results of machine learning modeling for those outcomes (Figure 9, Appendix 1 – table 5), and we discuss the findings. We also present the prevalence of such long-term manifestations and lung function impairment in the Low-, Intermediate- and High-Risk clusters of the study participants defined by non-CT and non-lung function clinical features (Figure 8).

      2) To this end, I was surprised that the functional definition and symptomatic recovery were not used as the primary endpoints. The functional definition and resolution of symptoms seem most important for the recovering patient so seems like the more important outcome. However, in Figures 5-7, it is often not clear whether the functional outcome is being considered at all.

      As mentioned above, the focus of the study was the assessment of structural lung impairment following COVID-19, and both lung function parameters and symptom burden correlate only moderately with structural lung damage (Figure 3 – figure supplement 1 – 3), a phenomenon observed previously in SARS-CoV-1 [7,8]. Although the symptom burden and its resolution during follow-up are of major importance for the individual patient during post-acute recovery, these parameters are not good markers of the potential long-term pulmonary outcome. For example, younger patients with moderate to severe lung damage may demonstrate only mild pulmonary symptoms during post-acute recovery, but the structural damage may be associated with severe impairment at long-term follow-up due to progression of lung fibrosis or age-related decrease of functional pulmonary capacity [11]. Still, we agree with the reviewer that the follow-up on symptoms and lung function is of interest for the reader, and we additionally included those outcomes in the univariate and multi-parameter risk modeling. In addition, we present the frequencies of symptom persistence and lung function impairment in the low-, intermediate- and high-risk participant clusters defined solely by non-CT and non-lung function clinical parameters. See the previous response for more details.

      3) For the clustering in figure 5, I am uncertain how CT severity score >5 & CT abnormalities cluster separately, when these 2 outcomes appear to logically overlap. Specifically, does the CT abnormalities outcome include patients with the high severity score outcome? In other words, are patients in the "high severity" group a subset of patients with "CT abnormality"? If not a subset, then the CT abnormality should be labeled "non-severe CT abnormality". This could all be clarified by listing the number of patients in each group and showing with a Venn diagram whether there is any overlap.

      We apologize for the lack of clarity in this matter. As pointed out by the reviewer, the patients with CT severity scores > 5 points were a subset of the participants with any CT abnormalities. The same was true for the GGO-positive subgroup. We agree that the overlap between the radiological outcomes obscures the message of the clustering and modeling results. To overcome this, we removed the GGO outcome variable from the analyses in the revised manuscript. In the revised manuscript, we clearly differentiate between mild (CT severity score ≤ 5) and moderate-to-severe radiological abnormalities (CT severity score > 5) in feature (Figure 6) and participant clustering (Figure 8). Frequencies of mild and moderate-to-severe CT abnormalities in the study cohort stratified by the severity of acute COVID-19 are presented in Figure 3 – figure supplement 3B. Numbers of the study participants with any, mild or moderate-to-severe CT abnormalities at the subsequent follow-up visits are listed in Table 3.

      4) For the same reason, figure 4 is hard to interpret. Are CT severity >5 being compared to those with normal CTs only or those with normal or mild / moderate CTs? Please provide more specific definitions of normal, "CT abnormality" and "severe CT abnormality" and provide the number of people in each category and specify the comparator groups in all analyses.

      We are sorry for the confusion. In Figure 4 of the initial manuscript, any CT abnormalities, GGO-positivity and abnormalities with CT severity score > 5 were analyzed as separate outcome variables. The baseline was specific to the given explanatory variable, e.g. the mild COVID-19 group for ICU stay, or normal serum IL-6 levels for elevated IL-6. In the revised manuscript, we present the modeling results in an abbreviated form for the 5 strongest co-variates each of any CT abnormalities, moderate-to-severe CT abnormalities (CT severity score > 5), persistent symptoms and lung function impairment (Figures 4 – 5). We indicate the baseline and the number of observations (n) in the plots. The complete summary of univariable risk modeling with the requested information is provided in Appendix 1 – table 2.

      5) Similarly, how can GGO @V3 be used a potential explanatory variable for the outcome CT abnormalities @V3 when these 2 variables are clearly non-independent. Inclusion of highly related and likely correlated variables may throw off the overall conclusions of the clustering analysis.

      We agree with the editor and the reviewer that this representation was confusing. For this reason and the reasons described in Response 4, we removed the GGO variable from the revised analysis pipeline and differentiate between mild (CT severity score ≤ 5) and moderate-to-severe (CT severity score > 5) radiological lung abnormalities in modeling and machine learning classification. In addition, we define symptom and participant clusters solely with the non-CT parameters (Figure 6 – 7). To investigate the association of mild and moderate-to-severe CT abnormalities with other non-CT variables (Figure 6, Supplementary Figure S5), the CT features are assigned to the non-CT clusters by a k-NN-based label propagation algorithm, i.e. a semi-supervised procedure [12,13,26] also employed in our recent paper [6].
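
      For readers unfamiliar with k-NN-based label propagation, the idea can be sketched as follows. This is a minimal illustration only, not the analysis pipeline used in the manuscript: the function name `knn_propagate`, the two-dimensional coordinates and the cluster names are all hypothetical. Each unlabeled item receives the majority label of its k nearest already-clustered neighbours.

```python
import math
from collections import Counter


def knn_propagate(labeled, unlabeled, k=3):
    """Give each unlabeled point the majority label of its k nearest labeled points."""
    assigned = {}
    for name, point in unlabeled.items():
        # Rank the labeled points by Euclidean distance to the query point.
        neighbours = sorted(labeled, key=lambda item: math.dist(item[0], point))[:k]
        votes = Counter(label for _, label in neighbours)
        assigned[name] = votes.most_common(1)[0][0]
    return assigned


# Hypothetical, already clustered non-CT features (toy coordinates).
labeled = [
    ((0.0, 0.1), "cluster A"), ((0.2, 0.0), "cluster A"), ((0.1, 0.3), "cluster A"),
    ((2.0, 2.1), "cluster B"), ((2.2, 1.9), "cluster B"), ((1.9, 2.3), "cluster B"),
]
# Hypothetical CT features to be attached to the existing clusters.
unlabeled = {
    "mild CT abnormality": (0.3, 0.2),
    "moderate-to-severe CT abnormality": (2.1, 2.0),
}
result = knn_propagate(labeled, unlabeled)
```

      The clusters themselves are left untouched by this step; only the new features are attached to them, which is why the procedure counts as semi-supervised.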

      6) In Figure 6, the criteria for the low, medium, and high-risk subsets are unclear. Is this high risk for persistent functional abnormality, radiologic abnormality, or both? Why were 3 sub populations selected? Was this done subjectively based on the clustering algorithm?

      This is an important issue. The study subject clusters were named according to the increasing frequency of any radiological lung abnormalities in the respective cluster (Figure 8A). We stress this more clearly in the revised manuscript. In addition, as suggested by the reviewer above, we show the frequency of functional lung impairment and persistent symptoms in the study participant clusters. There are multiple criteria for choice of the optimal clustering algorithm and the optimal number of clusters. In our cohort, two criteria for the choice of optimal clustering algorithm were applied:

      1. High fraction of the data set variance ‘explained’ by the cluster assignment (ratio of between-cluster sum-of-squares to the total sum-of-squares, Figure 6 – figure supplement 1A and Figure 7 – figure supplement 1A)
      2. The relatively highest cluster stability or reproducibility of the clustering structure in 20-fold cross-validation (Figure 6 – figure supplement 1B and Figure 7 – figure supplement 1B) [15]

      The optimal number of clusters of the study participants based on non-CT study variables was determined by the algorithm (SOM + hierarchical clustering algorithm, see Reviewer 2, Issue 4) [17,18], as is usual in the unsupervised or semi-supervised setting. The prime criterion for the optimal cluster number was the bend of the curve of within-cluster sum-of-squares versus cluster number, as presented in Figure 7 – figure supplement 1D. In addition, this decision was supported by a visual analysis of the SOM node dendrogram (Figure 7 – figure supplement 1E) and the curve of the cross-validated stability statistic (classification error) vs cluster number (Figure 7 – figure supplement 1F) [15].
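
      The first criterion, the fraction of variance ‘explained’ by a cluster assignment, can be illustrated with a short sketch. The data points and the two candidate assignments below are invented for illustration, and the function names are hypothetical; the between-cluster sum-of-squares is obtained as the total minus the within-cluster sum-of-squares.

```python
def sum_of_squares(points):
    """Sum of squared Euclidean distances of the points to their centroid."""
    n = len(points)
    centroid = [sum(coords) / n for coords in zip(*points)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in points)


def explained_variance(points, labels):
    """Ratio of between-cluster to total sum-of-squares for one assignment."""
    total = sum_of_squares(points)
    within = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        within += sum_of_squares(members)
    return (total - within) / total


# Toy observations forming two visibly separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
compact = [0, 0, 0, 1, 1, 1]   # assignment that follows the two groups
mixed = [0, 1, 0, 1, 0, 1]     # assignment that ignores them

ratio_compact = explained_variance(points, compact)
ratio_mixed = explained_variance(points, mixed)
```

      An assignment that follows the structure of the data yields a ratio close to 1, whereas an arbitrary assignment yields a ratio close to 0, which is what makes the statistic usable for comparing clustering algorithms.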

      7) The accuracy and sensitivity of the machine learning approaches shown in S5 & S6 are somewhat limited. Please comment on why such highly granular data can only provide limited prediction about degree of lung damage post infection. Are there missing data types that might make the algorithm more predictive?

      This is an important issue that deserves more discussion in the revised manuscript. As expected, each of the machine learning classifiers presented in the previous and the revised version of the manuscript was extremely sensitive and specific at predicting the outcomes in the training data encompassing the entire cohort (Supplementary Figure S11). However, their performance was considerably worse in repeated holdout (previous version) or 20-fold cross-validation (revision, Figure 9), used here as surrogates for checking sensitivity and specificity with ‘unseen’ test data. We believe that there are two prime sources of such suboptimal performance: the size of the training set and the choice of the classifier. To address the first limitation, the following alterations to the analysis pipeline were introduced:

      1. We do not restrict the analysis to the subset of the CovILD study with the complete set of all variables. Instead, the non-missingness criterion is applied to each outcome variable separately (any CT abnormalities: n = 109, moderate-to-severe abnormalities: n = 109, lung function impairment: n = 111, persistent symptoms: n = 133).
      2. We altered the internal validation strategy. Instead of the repeated holdout approach applied to the machine learning classification, which strongly limits the size of the training data set, we switched to 20-fold cross-validation both for the cluster algorithms (Figure 6 – figure supplement 1BD and Figure 7 – figure supplement 1BF) [15] and the machine learning models (Figure 9, Appendix 1 – table 5) [19].

      To address the second issue, the following changes were introduced:

      3. We compare the performance of a broader set of classifiers representing different classes of machine learning algorithms provided by the R package caret [19] (tree model: C5.0 [20], bagged tree model: Random Forests [21], support vector machines with radial kernel [22], shallow neural network: nnet [23], and elastic net regression: glmnet [24]) (Figure 9, Appendix 1 – table 4).
      4. Finally, we developed a model ensemble representing a linear combination of the classifiers presented above, using the elastic net regression algorithm (Figure 9, Figure 9 – figure supplement 2) and tools provided by the caretEnsemble package [25]. This ensemble displayed better performance at predicting any CT abnormalities and persistent symptoms than single classifiers (Figure 9, Appendix 1 – table 5).

      Finally, we agree with the Reviewer that the input variable set, despite its size, was still not complete. We believe that inclusion of other inflammatory markers recorded during acute COVID-19 and at the 60-day follow-up may additionally improve the prediction of the radiological abnormalities at the 6-month follow-up visit. Of note, our data set lacked important readouts of cellular immunity such as neutrophil levels or neutrophil:lymphocyte ratio (NLR) and blood parameters for the mild COVID-19 subset. We discuss this issue in more detail in the revised Discussion section.
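
      The gap between training-set and cross-validated performance described in this response can be reproduced with a deliberately simple sketch. It is an illustration only, not one of the caret models used in the manuscript: the classifier is a hypothetical one-dimensional nearest-neighbour rule, the data are invented, and 5 folds are used instead of 20 to keep the example small.

```python
def predict_1nn(train, x):
    """Return the label of the training observation closest to x."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def accuracy(train, test):
    """Fraction of test observations classified correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


def cross_validated_accuracy(data, k):
    """Pooled accuracy over k folds; observation i is held out in fold i % k."""
    correct = 0
    for fold in range(k):
        test = [d for i, d in enumerate(data) if i % k == fold]
        train = [d for i, d in enumerate(data) if i % k != fold]
        correct += sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(data)


# Invented (value, outcome) pairs with two overlapping classes.
data = [
    (0.0, 0), (1.0, 0), (2.1, 0), (3.3, 0), (5.0, 0),
    (4.2, 1), (6.1, 1), (7.3, 1), (8.6, 1), (9.9, 1),
]

train_accuracy = accuracy(data, data)            # evaluated on the training data
cv_accuracy = cross_validated_accuracy(data, 5)  # evaluated on held-out folds
```

      Because every observation is its own nearest neighbour, the rule reproduces the training data perfectly, while the held-out folds reveal a lower accuracy. The same logic explains why performance with ‘unseen’ data is the relevant benchmark.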

      8) The authors state that "the sole application of a lung function measurement at screening for subjects at risk of delayed lung recovery may bear insufficient sensitivity". I am not sure that I agree with this assessment. From the perspective of a patient, full recovery of lung function with limited or no residual symptoms, even in the presence of residual chest CT abnormalities, seems like a favorable outcome. I would suggest either changing this statement or providing citations that associate residual chest CT abnormalities (in the absence of residual functional lung dysfunction) with adverse long-term outcomes. Do the authors hypothesize that persistent radiologic abnormalities may predate organizing pneumonia which will ultimately become symptomatic?

      We thank the reviewer for this interesting point of discussion. We agree with the reviewer that the functional status and symptom burden are of major importance for the individual patient in the post-acute phase of COVID-19. Still, prioritizing lung function over mild structural lung abnormalities may pose two major problems. First, as previously discussed, lung function testing has a rather low sensitivity to detect early ILD [10,11], is not a good prognostic marker for long-term clinical outcomes and may not correlate well with patients' symptom burden. For instance, a patient with a normal lung function status may still be highly symptomatic (e.g. due to reduced respiratory muscle capacity) [7] and/or demonstrate structural lung abnormalities (e.g. it has been shown for various ILDs that lung function tests such as FVC and FEV1 may be normal even in pronounced disease, and lung function testing is not sufficient to rule out ILD [10]). Second, to date, it is not known whether persistent structural lung abnormalities following COVID-19 (even when mild) are at risk of progressing at long-term follow-up. In particular, sub-clinical structural changes may behave like incidentally detected interstitial lung abnormalities (ILAs) and develop into symptomatic progressive fibrotic interstitial lung disease including IPF [11]. For this reason, we think that further pulmonary follow-up is necessary for patients with structural lung abnormalities due to COVID-19 and that a sole focus on lung function is not sufficient to assess pulmonary COVID-19 outcomes [9].

      9) The authors note selection bias against ordering CT and perhaps inflammatory markers early during infection as a limitation. I would suggest a sensitivity analysis to understand whether this misclassification will impact the model's predictions.

      We now address this issue in more detail. As shown in Figure 1, there was indeed a significant dropout of participants during the study due to missed longitudinal visits and missingness of the longitudinal variable set. This phenomenon was most evident for the mild COVID-19 patients, who most likely lost interest in participation because of subjective complete convalescence. This issue is now discussed as a limitation in the revised manuscript. In the revised manuscript, we investigated highly influential factors for clustering and machine learning classifiers. To determine which variables played the most important role for the clustering of the study individuals, we applied the explanatory variable ‘noising’ procedure initially described by Breiman for the random forest algorithm [21] and compared the ‘explained’ variance (ratio of between-cluster sum-of-squares to the total sum-of-squares) of the initial clustering structure with the clustering structures generated in the datasets with noised variables. Although this algorithm is not free from shortcomings, such as insensitivity to tightly correlated variables, it may provide a coarse measure of the variable’s impact on the cluster formation (Figure 7 – figure supplement 2). For three of the machine learning algorithms tested, importance statistics were extracted from the models: (1) for the C5.0 algorithm, the percentage of variable usage in the decision tree, (2) for the Random Forests algorithm, the delta of Gini index obtained by variable noising [21] and (3) for the elastic net/glmnet procedure, the absolute values of regression coefficients β [24] (Figure 9 – figure supplement 4 – 7). The technical details are provided in Methods; the cluster and model importance data are discussed in the manuscript text.
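
      The variable ‘noising’ idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure implemented in the manuscript: the cluster assignment is kept fixed instead of re-clustering the noised data, the variable names `inflammation` and `unrelated` and all values are invented, and the importance of a variable is taken as the mean drop in ‘explained’ variance after randomly shuffling it.

```python
import random


def explained_variance(columns, labels):
    """Between-cluster / total sum-of-squares for column-wise data."""
    total = between = 0.0
    for col in columns.values():
        mean = sum(col) / len(col)
        total += sum((v - mean) ** 2 for v in col)
        for lab in set(labels):
            members = [v for v, l in zip(col, labels) if l == lab]
            between += len(members) * (sum(members) / len(members) - mean) ** 2
    return between / total


def noising_importance(columns, labels, name, repeats=200, seed=1):
    """Mean drop in explained variance after shuffling one variable."""
    rng = random.Random(seed)
    baseline = explained_variance(columns, labels)
    drops = []
    for _ in range(repeats):
        noised = dict(columns)
        # 'Noise' the variable by permuting its values across observations.
        noised[name] = rng.sample(columns[name], len(columns[name]))
        drops.append(baseline - explained_variance(noised, labels))
    return sum(drops) / repeats


# Invented data: one variable separates the clusters, the other does not.
columns = {
    "inflammation": [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3],
    "unrelated": [1, 2, 3, 4, 1, 2, 3, 4],
}
labels = ["low"] * 4 + ["high"] * 4

importance_informative = noising_importance(columns, labels, "inflammation")
importance_unrelated = noising_importance(columns, labels, "unrelated")
```

      Shuffling the variable that drives the cluster structure lowers the explained variance markedly, while shuffling an unrelated variable leaves it essentially unchanged. Two tightly correlated variables would each appear less important than they jointly are, which is the shortcoming noted above.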

      References

      1. Gutmann C, Takov K, Burnap SA, et al. SARS-CoV-2 RNAemia and proteomic trajectories inform prognostication in COVID-19 patients admitted to intensive care. Nat Commun 2021;12. doi:10.1038/S41467-021-23494-1
      2. Benito-León J, Castillo MD Del, Estirado A, et al. Using Unsupervised Machine Learning to Identify Age- and Sex-Independent Severity Subgroups Among Patients with COVID-19: Observational Longitudinal Study. J Med Internet Res 2021;23. doi:10.2196/25988
      3. Demichev V, Tober-Lau P, Lemke O, et al. A time-resolved proteomic and prognostic map of COVID-19. Cell Syst 2021;12:780. doi:10.1016/J.CELS.2021.05.005
      4. Estiri H, Strasser ZH, Brat GA, et al. Evolving phenotypes of non-hospitalized patients that indicate long COVID. BMC Med 2021;19. doi:10.1186/S12916-021-02115-0
      5. Sudre CH, Murray B, Varsavsky T, et al. Attributes and predictors of long COVID. Nat Med 2021;27. doi:10.1038/s41591-021-01292-y
      6. Sahanic S, Tymoszuk P, Ausserhofer D, et al. Phenotyping of acute and persistent COVID-19 features in the outpatient setting: exploratory analysis of an international cross-sectional online survey. Clin Infect Dis Published Online First: 26 November 2021. doi:10.1093/CID/CIAB978
      7. Hui DS, Wong KT, Ko FW, et al. The 1-Year Impact of Severe Acute Respiratory Syndrome on Pulmonary Function, Exercise Capacity, and Quality of Life in a Cohort of Survivors. Chest 2005;128:2247–61. doi:10.1378/CHEST.128.4.2247
      8. Ng CK, Chan JWM, Kwan TL, et al. Six month radiological and physiological outcomes in severe acute respiratory syndrome (SARS) survivors. Thorax 2004;59:889–91. doi:10.1136/THX.2004.023762
      9. Raghu G, Wilson KC. COVID-19 interstitial pneumonia: monitoring the clinical course in survivors. Lancet Respir. Med. 2020;8:839–42. doi:10.1016/S2213-2600(20)30349-0
      10. Suliman YA, Dobrota R, Huscher D, et al. Pulmonary function tests: High rate of false-negative results in the early detection and screening of scleroderma-related interstitial lung disease. Arthritis Rheumatol 2015;67:3256–61. doi:10.1002/ART.39405/ABSTRACT
      11. Hatabu H, Hunninghake GM, Richeldi L, et al. Interstitial lung abnormalities detected incidentally on CT: a Position Paper from the Fleischner Society. Lancet Respir Med 2020;8:726. doi:10.1016/S2213-2600(20)30168-5
      12. Leng M, Wang J, Cheng J, et al. Adaptive semi-supervised clustering algorithm with label propagation. J Softw Eng 2014;8:14–22. doi:10.3923/JSE.2014.14.22
      13. Lelis L, Sander J. Semi-supervised density-based clustering. Proc - IEEE Int Conf Data Mining, ICDM 2009;:842–7. doi:10.1109/ICDM.2009.143
      14. Huang C, Huang L, Wang Y, et al. 6-month consequences of COVID-19 in patients discharged from hospital: a cohort study. Lancet 2021;397:220–32. doi:10.1016/S0140-6736(20)32656-8
      15. Lange T, Roth V, Braun ML, et al. Stability-Based Validation of Clustering Solutions. Neural Comput 2004;16:1299–323. doi:10.1162/089976604773717621
      16. Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. Appl Stat 1979;28:100. doi:10.2307/2346830
      17. Kohonen T. Self-Organizing Maps. Berlin, Heidelberg: Springer Berlin Heidelberg 1995. doi:10.1007/978-3-642-97610-0
      18. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Networks 2000;11:586–600. doi:10.1109/72.846731
      19. Kuhn M. Building predictive models in R using the caret package. J Stat Softw 2008;28:1–26. doi:10.18637/jss.v028.i05
      20. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 1993. doi:10.5555/152181
      21. Breiman L. Random forests. Mach Learn 2001;45:5–32. doi:10.1023/A:1010933404324
      22. Weston J, Watkins C. Multi-Class Support Vector Machines. 1998.
      23. Ripley BD. Pattern recognition and neural networks. Cambridge University Press 2014. doi:10.1017/CBO9780511812651
      24. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1–22. doi:10.18637/jss.v033.i01
      25. Deane-Mayer ZA, Knowles JE. Ensembles of Caret Models [R package caretEnsemble version 2.0.1]. 2019.https://cran.r-project.org/package=caretEnsemble (accessed 13 Dec 2021).
      26. Glennan T, Leckie C, Erfani SM. Improved Classification of Known and Unknown Network Traffic Flows Using Semi-supervised Machine Learning. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 2016;9723:493–501. doi:10.1007/978-3-319-40367-0_33
      27. Sonnweber T, Sahanic S, Pizzini A, et al. Cardiopulmonary recovery after COVID-19 - an observational prospective multi-center trial. Eur Respir J Published Online First: 10 December 2020. doi:10.1183/13993003.03481-2020
  2. Jan 2022
    1. Author Response:

      Reviewer #1:

      Weaknesses:

      For me, most of the weaknesses of this manuscript are related to the cluster detection:

      1. There is no consensus on the definition of transmission clusters in the field. However, the rationale for taking the union (rather than the intersection) of two different methods (HIV-TRACE and cluster picker) did not become clear to me.

      2. HIV-TRACE defines clusters based on pairwise genetic distances and cluster picker identifies clusters using pairwise genetic distance with the guidance of a phylogenetic tree (and node support / bootstrap values). Given the underlying sample size and that the phylogeny was constructed already, the rationale for the purely distance related criterion of HIV-TRACE did not become clear.

      We thank the reviewer for their comments and are happy to provide additional results that motivate our decision to use the union of clusters detected with HIV-TRACE and Cluster Picker to estimate HIV transmissions within and between demographic sub-groups in the Botswana – Ya Tsie trial population. The primary motivation was that a filtering step was required before applying the “gold standard” of Phyloscanner to detect directed (when possible) transmission pairs, so that time and computational resources were not spent evaluating sequences that were too distantly related. Clustering algorithms combined with a distance threshold provided this filtering. Because we shared what we take to be the reviewer’s concerns about either of the algorithms alone, we sought to maximize the number of transmission pairs that could be identified between participants in the Botswana – Ya Tsie trial with Phyloscanner by using the union of clusters detected with HIV-TRACE and Cluster Picker. This also served as a sensitivity analysis that allowed us to evaluate the extent to which the clustering patterns observed were specific to a single algorithm.

      Furthermore, a previous study done by Rose and colleagues (PMID: 27824249) to compare the number and size of clusters identified with HIV-TRACE and Cluster Picker clustering algorithms revealed that HIV-TRACE generally identified larger but fewer clusters, compared with clusters identified with Cluster Picker that were typically more numerous and mostly small 2-person clusters (Please see Figure 3B below extracted from Rose and colleagues (PMID: 27824249)). This suggested that HIV-TRACE would be helpful in detecting potentially larger transmission chains and Cluster Picker would be valuable in revealing potential transmission events between pairs of individuals.

      Of the 236 genetic clusters detected with the two algorithms, we identified 19 full or partial clusters (including 41 sequences) that included members that were only detected with HIV-TRACE and 122 full or partial clusters (including 242 sequences) that were unique to Cluster Picker. Moreover, of the 82 directed male-female transmission pairs inferred from the sample, 5 were from genetic clusters that were unique to HIV-TRACE, compared with 27 that were from clusters unique to Cluster Picker. Of the five transmission events unique to HIV-TRACE clusters, three occurred in intervention communities originating from control communities. By contrast, four of the twenty-seven transmission events unique to Cluster Picker clusters occurred in intervention communities from control communities.

      In summary, estimates of HIV transmissions in the trial population based on the full overlap of clusters detected with HIV-TRACE and Cluster Picker would have excluded 32 of the 82 male-female pairs used for the primary analysis.
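
      The arithmetic behind this comparison can be sketched with set operations. In this minimal sketch the pair identifiers are hypothetical; only the counts are taken from the figures above (82 pairs in total, 5 and 27 from algorithm-specific clusters, which leaves 50 from clusters found by both algorithms).

```python
# Hypothetical pair identifiers; only the counts come from the response text.
shared = {f"shared_{i}" for i in range(50)}        # pairs from clusters found by both algorithms
trace_only = {f"trace_{i}" for i in range(5)}      # pairs from clusters unique to HIV-TRACE
picker_only = {f"picker_{i}" for i in range(27)}   # pairs from clusters unique to Cluster Picker

hiv_trace = shared | trace_only
cluster_picker = shared | picker_only

union = hiv_trace | cluster_picker          # what the primary analysis used
intersection = hiv_trace & cluster_picker   # the "full overlap" alternative

excluded = len(union) - len(intersection)
print(len(union), len(intersection), excluded)  # prints "82 50 32"
```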

      3. For a phylogeny of this size it is feasible to calculate real bootstrap values instead of using (in my experience more liberal) Shimodaira-Hasegawa support values.

      We value the reviewer's suggestion and agree that real bootstrap values could be ideal. However, the likely benefit of computing the suggested bootstrap values and thereafter repeating the entire analysis inferring transmission pairs with Phyloscanner and estimating transmission flows would be minimal. As noted above, liberality in a filtering step is a virtue (avoiding filtering out pairs of interest) as long as it does not lead to an unfeasibly large computational burden, as this did not.

      4. In Supplementary Note 2.5 it is described how the linkage and direction of transmission score threshold of 57% was chosen. However, the finding that almost half of the accordingly selected probable source-recipient pairs were same-sex and had to be excluded from the analysis questions the reliability of the threshold.

      We apologize for the insufficient clarity in our description and would like the reviewer to kindly note that the threshold in and of itself is insufficient to distinguish between Female-Female pairs separated by a single Male intermediate, but rather by design can distinguish between direct Male-Female pairs and Male-Female pairs separated by several intermediates. Once again, the threshold was meant to be a filter that would allow us to run Phyloscanner on a feasible number of sequences, and thus appropriately should let through some pairs that are rejected by later steps in the pipeline. Also, kindly note that all previous Supplementary Notes are now presented in the methods section in line with the reviewer’s suggestions.

    1. Author Response:

      Reviewer #1 (Public Review):

      The lateral entorhinal cortex (LEC) receives direct inputs from the olfactory bulb (OB), but its odor response properties have not been well characterized despite a recent increase in interest in the role of LEC in olfactory behaviors. In this study, Bitzenhofer and colleagues provide unprecedented details of odor response properties of layer 2 cells in LEC. The authors first show that LEC neurons respond to odors with a rapid burst of activity time-locked to inhalation onset, similarly to the piriform cortex (PCx), but distinct from the OB. Firing rates of LEC ensembles conveyed information about odor identity, whereas the timing of spikes conveyed odor intensity. The authors then examined the difference between two major cell types in LEC layer 2 - fan cells and pyramidal neurons, and found that, on average, fan cells responded earlier than pyramidal neurons, and pyramidal neurons, but not fan cells, changed their peak timing in response to changes in concentrations, providing a basis for temporal coding of odor concentrations. Additionally, the authors show that inactivation of LEC impairs odor discrimination based on either identity or intensity, and demonstrate different cellular properties of fan cells and pyramidal neurons. Finally, the authors also examined the odor response properties of hippocampal CA1 neurons, and showed that odor identity can be decoded by firing rate responses, while decoding of odor concentration depended on spike timing.

      The authors performed a large amount of experiments, and provide an impressive set of data regarding odor response properties of LEC layer 2 neurons in a cell type specific manner. The results reported are very interesting, and will be a point of reference for future studies on odor coding and processing in the LEC. The manuscript is clearly written, and data are well analyzed and presented clearly. I have only relatively minor concerns or suggestions.

      1. The authors infer the time at which "mice could discriminate odors" from the time at which d-prime becomes significantly different between baseline and odor stimulation conditions (line 111 and line 121). However, the statistical test applied to these data does not guarantee that an observer can accurately discriminate odors. For example, a small p-value can be obtained even when discrimination accuracy is only slightly above chance if there are many trials. The statement such as "mice could discriminate two odors by as early as 225 ms after inhalation onset" (line 111) can be misleading because this might sound as if mice can accurately discriminate odors at this timepoint, while this is not necessarily the case (as indicated by the d-prime value).

      We have added plots of performance accuracy over time under control conditions (LED off) to Figure 2-supplement 1. These plots of fraction of correct responses (binned every 50 ms) show that mice (n = 6) are making choices significantly different from chance within 200 ms of odor inhalation. We changed the wording in the Results to now say: “Moreover, by analyzing lick timing, we determined that the discriminability measure d’ became significantly different under control conditions as early as 225 ms after inhalation onset and performance accuracy increased within 200 ms of inhalation (Fig. 2b, Figure 2-supplement 1).”
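
      To illustrate the distinction the reviewer draws between statistical significance and accurate discrimination, the discriminability index can be sketched as follows. This is a minimal sketch, not our analysis code: the function name, the mapping of lick outcomes onto hits and false alarms, and the log-linear correction are all assumptions made for illustration.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    Assumed convention for illustration: a "hit" is a left lick on an
    odor-A trial and a "false alarm" is a left lick on an odor-B trial.
    """
    # Log-linear correction: keeps both rates away from 0 and 1,
    # where the inverse normal CDF would be infinite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 55% correct versus 95% correct over 200 trials.
near_chance = d_prime(55, 45, 45, 55)
accurate = d_prime(95, 5, 5, 95)
print(round(near_chance, 2), round(accurate, 2))
```

      With enough trials the near-chance case can still differ significantly from zero, which is why the accuracy plots are reported alongside d'.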

      2. Optogenetic identification can be a little tricky when identifying excitatory neurons as in this study. Please discuss some rationale or difficulty regarding how to distinguish those that are activated directly by light from those activated indirectly (i.e. synaptically). Do the results hold if the authors use only those that the authors are more confident about identification?

      We only used the cells that were confidently identified using a combination of two criteria. First, tagged cells had to show a significant increase in firing (p_Rate<0.01) during the 5 ms LED illumination period versus 100 randomly selected time windows before LED stimulation. Cells also had to respond with a fixed latency to reduce the chance of including cells recruited by polysynaptic excitation. Further, we used the stimulus associated spike latency test (SALT) as detailed in Kvitsiani et al., 2013. To be judged as tagged, units had to show significantly less spike jitter during the 5 ms LED illumination than 100 randomly selected time windows before LED stimulation (p_SALT<0.01). Only those cells with BOTH p_Rate<0.01 and p_SALT<0.01 were considered as tagged (both methods typically agreed for most cells). Moreover, slice work testing synaptic connections between LEC layer 2 cells found extremely low levels of connectivity between fan and pyramidal cells (Nilssen et al., J. Neuroscience, 2018). This makes it unlikely that LED-induced firing of fan or pyramidal cells would recruit indirectly (synaptically) excited cells.

      3. The authors sort odor response profiles by peak timing, and indicate that odor responses peak at different timing that tiles respiration cycles. However, this analysis does not indicate the reliability of peak timing. Sorting random activity by "peak timing" could generate a similar figure. One way to show the reliability or significance of peaks is to cross-validate. For instance, one can use half of the trials to sort, and plot the rest of the trials. If the peak timing is reliable, the original pattern will be replicated by the other half, and those neurons that are not reliable will lose their peaks. Please use such a method so that we can evaluate the reliability of peaks.

      We analyzed the data as suggested by this reviewer as shown below (Author response image 1). Plotting only the odd trials sorted by the odd trials in the dataset (top) looked identical to the data from all trials used in Figure 1g. More importantly, plotting only the even trials sorted by the odd trials (bottom), though noisier due to trial-by-trial variation, showed the same general structure of tiling throughout the respiration cycle for OB cells.

      Author response image 1
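
      The split-half procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code behind the image: the function names, the bin count, and the noise model are assumptions.

```python
import random

def mean_psth(trials):
    """Average a list of trials (each a list of binned rates) into one PSTH."""
    n = len(trials)
    return [sum(bin_values) / n for bin_values in zip(*trials)]

def peak_bin(psth):
    """Index of the bin with the highest mean rate."""
    return max(range(len(psth)), key=psth.__getitem__)

def cross_validated_sort(cells):
    """Sort cells by peak time in odd trials; return even-trial PSTHs in that order.

    cells maps a cell id to its list of trials (at least two per cell).
    """
    odd = {c: mean_psth(t[0::2]) for c, t in cells.items()}   # trials 1, 3, 5, ...
    even = {c: mean_psth(t[1::2]) for c, t in cells.items()}  # trials 2, 4, 6, ...
    order = sorted(cells, key=lambda c: peak_bin(odd[c]))
    return order, [even[c] for c in order]

# Synthetic cells with a reliable peak in one bin plus small uniform noise.
rng = random.Random(0)
n_bins, n_trials = 10, 20
cells = {}
for cell_id, true_peak in enumerate([7, 2, 5, 0, 9]):
    trials = []
    for _ in range(n_trials):
        trial = [rng.uniform(0.0, 0.2) for _ in range(n_bins)]
        trial[true_peak] += 1.0
        trials.append(trial)
    cells[cell_id] = trials

order, held_out = cross_validated_sort(cells)
held_out_peaks = [peak_bin(p) for p in held_out]
print(order, held_out_peaks)  # prints "[3, 1, 2, 0, 4] [0, 2, 5, 7, 9]"
```

      If the peaks were unreliable, the held-out peaks would no longer follow the order set by the sorting half.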

      Reviewer #2 (Public Review):

      In this study, Bitzenhofer et al recorded odor-evoked activity in the LEC and examined the coding of odor identity and intensity using extracellular recordings in head-fixed mice, and used the standard suite of quantitative tools to interpret these data (decoding analyses, dimensionality reduction, etc). In addition, they performed behavioral experiments to show the necessity of LEC in odor identity and intensity discrimination, and deploy some elegant and straightforward 'circuit-busting' slice physiology experiments to characterize this circuit. Importantly, they performed some of their experiments in Ntng1-cre and Calb-cre mice, which allowed them to differentiate between the two major classes of LEC principal neurons, fan cells and pyramidal cells, respectively. Many of their results are contrasted with what has previously been observed in the piriform cortex (PCx), where odor coding has been studied much more extensively.

      Their major conclusions are:

      Cells in the LEC respond rapidly to odor stimuli. Within the first 300 ms after inhalation, odor identity is encoded by the ensemble of active neurons, while odor intensity (more specifically, responses to different concentrations) is encoded by the timing of the LEC response; specifically, the synchrony of the response. These coding strategies have been described in the PCx by Bolding & Franks. Bolding also found two populations of responses to different concentrations: one population of responses was rapid and barely changed with concentration and the second population of responses had onset latencies that decreased with increasing concentration. Roland et al also found two populations of responses using calcium imaging in anesthetized mice: one population of responses was concentration-dependent and another population was 'concentration-invariant'. However, neither Bolding nor Roland were able to determine whether these populations of responses emerged from distinct populations of cells. Here, the authors elegantly register these two response types in LEC to different cell types: fan cells respond early and stably, and pyramidal cell response latencies decrease with concentration. This is a novel and important finding. They also showed that, unlike PCx or LEC where concentration primarily affects timing rather than rate/number, odor concentration in CA1 is only reflected in the timing of responses.

      Using optogenetic suppression of LEC in a 2AFC task, the authors purport to show that LEC is required for both the discrimination of odor identity and odor intensity. If true, this is an important result, but see below.

      In slice experiments, the authors characterize the differential connectivity of fan and pyramidal cells to direct olfactory bulb input, input from PCx, and inhibitory inputs from SOM and PV cells. This work is elegant, novel, and important, although it is a little out of place in this manuscript. As such, their findings are irrelevant/orthogonal to the rest of the results in this study. But fine.

      The simultaneous recordings from three different stations along the olfactory pathway are impressive.

      Major concern

      My major concern with this manuscript regards the behavioral experiments. The authors show that blue light over the LEC in GAD2-Cre/Ai32 mice completely abolishes (i.e. to chance) the mouse's ability to perform a 2AFC task discriminating between either two different odorants or one odorant at different concentrations. Their interpretation is that LEC is required for rapid odor-driven behavior. The sensory component of the task is so easy, and the effect is so striking that I find this result surprising and almost too good to be true. The authors do control for a blue-light distraction effect by repeating the experiments in mice that don't express ChR2, but do not control for the effect of rapidly shutting down a large part of the sensory/limbic system. If they did this experiment in the bulb I would be impressed with how clean the result was but not conceptually surprised by the outcome. I think a different negative control is needed here to convince me that the LEC is necessary for this simple sensory discrimination task. For example, the authors could activate all the interneurons (i.e. use this protocol) in another part of the brain, ideally in the olfactory pathway not immediately upstream of the LEC, and show that the behavior is not affected.

      This reviewer suggests a negative control experiment for the effects we observe on behavior when optogenetically silencing LEC. However, we disagree that it would be informative to silence other olfactory pathways in search of those that do not affect behavior. Our strong effects on behavior are also in complete agreement with recent findings that muscimol inactivation of LEC abolishes discrimination of learned odor associations (Extended Data Figure 8, Lee et al., Nature, 2021).

      More specifically, both the presentation and the interpretation of the data are confusing. First, there is a lack of detail about the behavioral task. I was not sure exactly when the light comes on and goes off, when the cue was presented, and when the reward was presented. In the manuscript they say (line 108) "…used to suppress activity during odor delivery on a random subset…". There is nothing more about this in the figure legend or Methods. The only clue to this is the dotted line in the 'LED On' example at the bottom of Fig. 2a. The authors also say that (line 660) "Trials were initiated with a 50 ms tone." When exactly was the tone presented? In the absence of any other information, I assume it was presented at odor onset. When was the reward presented? Lines 106-7 say "Mice were free to report their choice (left or right lick) at any time within 2 s of odor onset." Presumably this means the reward was presented to one of the ports for 2 seconds, starting at odor onset.

      The LED is applied during odor delivery, the 50 ms tone immediately precedes odor delivery, and water reward is dispensed after the first lick at the correct lick port during the choice period. The choice period begins with the odor onset and odor delivery is terminated by the first lick at either the correct or incorrect port. If there is no lick at either port, odor delivery lasts 1 s and is followed by an extended choice period (terminated by correct or incorrect lick) lasting 1 s. To clarify the behavior protocol, we have included a schematic of the trial structure in Figure 2-supplement 1.

      These details matter because the authors want to claim that "LEC is essential for rapid odor-driven behavior." The data presented in support of this claim are (1) that mice perform this task at chance levels in LED On trials, presumably based on which port the mouse licked first (this is the 'essential' part), and (2) that in control in LED Off trials, d' becomes statistically different from baseline after ~200 ms (this is the 'rapid' part).

      To further support the argument that LEC is required for rapid odor-driven behavior, we now show a plot of % correct responses over time from first odor inhalation.

      On first reading, these suggested that shutting off LEC makes odor discrimination worse and/or slower. However, the supplementary data clarifies several things. First, the mice never Miss (Fig.2S.2a & c), meaning that they always lick. Second, in LED Off trials (F2S2 & e), the mice make few mistakes, and these only occur immediately after inhalation, presumably meaning the mice occasionally guess, possibly in response to the auditory cue. Thus, the mean time to lick is much shorter for Error trials than Correct trials. To state the obvious, the mice often wait >300 ms before they lick, and when they do wait, they never make mistakes. Now, in the LED On trials, the mice almost always lick within the first 300 ms and perform at chance levels, with the distribution of lick times for Correct and Error trials almost overlapping. In fact, although the authors claim LEC is required for rapid odor discrimination, the mean time to lick on Correct trials appears to decrease in LED On trials. This makes me think that the mice are making ballistic guesses in response to the tone in LED On cases, which doesn't necessarily implicate a dependence on LEC for odor discrimination.

      We do not believe that mice are making ballistic guesses in response to the tone for LED on trials. First, although a 50 ms tone immediately precedes odor delivery, all data in Figure 2-supplement 1 show lick times aligned to the first inhalation of odor. Thus, time 0 ms is not the tone or subsequent odor onset but rather a variable time point coinciding with the first odor inhalation (the delay from odor onset to first inhalation is ~300 ms, the average respiration interval under our conditions). In fact, we excluded trials if mice made premature licks between the time of odor onset and first odor inhalation. We re-analyzed these trials to test the reviewer’s idea that mice were more likely to make fast ballistic guesses when the LEC was silenced. However, we saw no evidence that mice made more premature licks in trials with LED on (Author response image 2).

      Author response image 2

      The authors' interpretation of their data would be more solid if, for example, there were a delay between the auditory cue and odor delivery and/or if the reward was only available with some delay after the odor offset. Here, however, it seems just as likely as not that the mice are making ballistic guesses in response to the tone in LED On cases, which doesn't necessarily involve dependence on LEC for odor discrimination. Here, the divergence of d' from baseline in the control (i.e. LED Off) condition seems mostly because mice take longer to correctly discriminate under control conditions. While this is not formally contradictory to "LEC is essential for rapid odor-driven behavior", it is nevertheless a bit contrived and misleading. An interesting (thought) experiment is what would happen if the authors presented a tone but no odor. I would guess that the mice would continue licking randomly in Light On trials.

      While a delay between odor delivery and reward would have been useful for some aspects of interpreting the behavior, we would have lost the ability to examine the role of LEC in response timing. To address this reviewer’s concern, we have added a section to the Discussion mentioning caveats related to the interpretation of experiments using acute optogenetic silencing to understand behavior.

    1. Author Response:

      Reviewer #1 (Public Review):

      This article by H. Izgi et al. describes interesting work measuring transcriptional changes through development and later aging. The authors broadly conclude that these tissue transcriptomes diverge during development, but re-converge during aging. They name this expression pattern divergence convergence, or DiCo.

      After drawing this conclusion from tissue samples drawn from 16 mice of their own, they look at published mouse and human transcriptomic data and observe similar patterns of change.

      Overall the authors emphasize that both highly mitotic and less mitotic tissues show examples of the DiCo transcriptional pattern, supporting the possibility that this may be a general phenomenon.

      In addition, the authors ask whether the tissue-specific changes they observe might depend on changes in cell composition within tissues, or cell autonomous transcriptional changes within cells, using published single-cell data. They conclude here that both play a role.

      Some of the more specific findings are not surprising and in this support the soundness of parts of the methodology, e.g. that shared developmentally down-regulated genes were enriched in functions such as cell cycle and cell division.

      My largest suggestion centers around an alternative hypothesis that may occur to readers; namely that the convergence or Co part of DiCo could be just regression to a mean due to heteroscedasticity with respect to time (age) caused by increased noise in expression. As the divergence could be imagined to be largely due to tissue differentiation during development, which has been studied extensively previously, the overall novelty of these findings relies much more on the later convergence that the authors have observed. The authors note: "Interestingly, we found no overlap between gene sets with the reversal pattern (up-down or down-up genes) across tissues, relative to random expectation". They also note "Intriguingly, we found that similar cell types (i.e. those with the highest correlations) among tissues become less similar with age (36/54 [67%] of pairwise comparisons, Figure 5-source data 1). On the contrary, the most distinct cell types (i.e. those with the lowest correlations) among tissues become more similar with age (45/54 [83%], Figure 5-source data 1).", which is at first glance consistent with this alternative hypothesis. The authors do directly address previous observations of increased noise with age in their Discussion (Bahar et al. 2006; Martinez-Jimenez et al. 2017; Angelidis et al. 2019; Somel et al. 2006), although I might also suggest perhaps PMID: 20832724, PMID: 8604994, and PMID: 28965763. Their acknowledgment refers to the disagreement of their own findings of inter-tissue correlation distributions being modest and comparable between aging and development in Figure 1c. Their CoV trajectory data in Figure 2, perhaps most relevant here in Figure 2c, may also speak to this issue. Nevertheless, in my opinion it would strengthen the manuscript greatly for many readers if this alternative hypothesis were more explicitly and clearly spelled out, and then perhaps more explicitly ruled out, in the manuscript.

      We thank the reviewer for pointing out this interesting possibility, i.e. that increased expression heterogeneity during ageing (heteroscedasticity) may cause the observed DiCo pattern. Heteroscedasticity can occur at two different levels: inter-individual (Somel et al., 2006) or inter-cellular (Bahar et al., 2006). Here we only have enough power to test whether inter-individual heterogeneity may contribute to DiCo. We used two heteroscedasticity tests. In both cases we compared i) genes with DiCo pattern and ii) genes with DiDi pattern (divergent throughout the lifetime). The hypothesis was that if heteroscedasticity has a role in DiCo, DiCo genes should show stronger heteroscedasticity than DiDi genes.

      In the first approach, we followed the method we used to measure heteroscedasticity in (Işıldak et al., 2020) and (Kedlian et al., 2019). We first fit a linear model between age (log2 scale) and the expression level of each gene. Then, we calculated Spearman’s correlation between the absolute residuals from this model and age. We found that DiCo and DiDi genes are not significantly different in terms of their effect sizes in heteroscedasticity in any of the tissues (two-sided KS test, p>0.05 in all tissues, Figure 2-figure supplement 15a).
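
      The per-gene effect size used in this first approach can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names and the example values are hypothetical, and Spearman's correlation is computed here as the Pearson correlation of average ranks.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def heteroscedasticity_effect(ages, expression):
    """Spearman correlation between age and |residuals| of expression ~ log2(age)."""
    x = [math.log2(a) for a in ages]
    mx, my = sum(x) / len(x), sum(expression) / len(expression)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, expression)) / sxx
    intercept = my - slope * mx
    abs_resid = [abs(b - (intercept + slope * a)) for a, b in zip(x, expression)]
    return pearson(ranks(ages), ranks(abs_resid))

# Hypothetical gene whose scatter around the trend grows with age.
ages = [1, 2, 4, 8, 16, 32, 64, 128]
expression = [5.1, 4.8, 5.3, 4.6, 5.5, 4.4, 5.7, 4.2]
rho = heteroscedasticity_effect(ages, expression)
print(rho)
```

      A positive value indicates that inter-individual scatter increases with age. The distributions of these values for two gene sets could then be compared with a two-sample Kolmogorov-Smirnov test, for example `scipy.stats.ks_2samp`.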

      In the second approach, we used the ‘ncvTest’ function from the ‘car’ package, which performs the Breusch-Pagan test for heteroscedasticity in a linear model. We compared the test statistics of the Breusch-Pagan test, i.e. the measure of heteroscedasticity of each gene in each tissue, between DiCo and DiDi genes. We found that the two gene sets do not significantly differ in heteroscedasticity in three of the four tissues. The only exception was muscle; here, contrary to expectation under the alternative hypothesis, DiDi genes showed slightly higher heteroscedasticity (two-sided KS test, p=0.042, Figure 2-figure supplement 15b).

      We believe that the new results strengthen our conclusions and suggest the observed DiCo pattern is not an artefact of inter-individual heteroscedasticity. We have now updated the text to include these new analysis results and figures (Figure 2-figure supplement 15).

      Meanwhile, the above analysis does not test the possible relationship between heteroscedasticity and DiCo at the cellular level. Inter-cellular expression noise, when coupled with constraints on minimum and maximum expression levels, can theoretically lead to gene expression becoming more similar to the mean levels. In other words, the most cell-type-specific genes with the highest and lowest expression may attain lower or higher expression levels during ageing, simply due to increased expression noise during ageing. Such an effect could theoretically increase correlation among cell types. This model is in essence an alternative description of our “loss of cellular identity” model and elegantly links together two observations, inter-cellular heteroscedasticity and convergence.
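
      For readers unfamiliar with the test, the score-test form of the Breusch-Pagan statistic for a single-predictor model can be sketched as follows. This is a from-scratch illustration rather than a call into the ‘car’ package, and we assume rather than claim that it matches the default behaviour of ‘ncvTest’. The function name and example values are hypothetical.

```python
import math

def breusch_pagan_score(x, y):
    """Score test for non-constant variance in the model y ~ x.

    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    Assumes the fit is not perfect (non-zero residual variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / n        # maximum-likelihood variance
    u = [e * e / sigma2 for e in resid]           # scaled squared residuals
    mu = sum(u) / n
    # Regression sum of squares from regressing u on x; with one predictor
    # this equals regressing on the fitted values when the slope is non-zero.
    sxu = sum((a - mx) * (v - mu) for a, v in zip(x, u))
    statistic = (sxu * sxu / sxx) / 2
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square(1) upper tail
    return statistic, p_value

log2_age = [0, 1, 2, 3, 4, 5, 6, 7]
growing_scatter = [5.1, 4.8, 5.3, 4.6, 5.5, 4.4, 5.7, 4.2]   # hypothetical
constant_scatter = [5.3, 4.7, 5.3, 4.7, 5.3, 4.7, 5.3, 4.7]  # hypothetical
stat_het, p_het = breusch_pagan_score(log2_age, growing_scatter)
stat_hom, p_hom = breusch_pagan_score(log2_age, constant_scatter)
print(stat_het, p_het, stat_hom, p_hom)
```

      A larger statistic indicates stronger evidence that residual variance changes along the predictor, which is the per-gene quantity compared between gene sets.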

      We thank the reviewer also for suggesting new references for increased noise with age. We have now updated the text to add those references.

      Reviewer #2 (Public Review):

      In this manuscript, Izgi et al investigated age-dependent gene expression pattern changes in male mice by analyzing a new bulk RNA-seq data from four different tissues collected at different ages covering post-natal development and aging. Gene expression patterns observed before and after sexual maturation seem to suggest inter-tissue divergence and convergence of gene expression profiles, respectively. The authors name that phenomenon Divergence-Convergence or "DiCo". Analysis of publicly available single cell RNA-seq [scRNAseq] datasets (from the Tabula Muris Senis consortium) suggests that such gene expression pattern changes may be explained by both alterations in tissue cell type composition, as well as by cell-autonomous expression changes. These observations may suggest that aging results in at least a partial loss of tissue identity acquired developmentally.

      Although the authors report an intriguing finding, there are major issues in the manuscript as it stands, notably concerning the clarity and rigor of the data analysis and manuscript. Notably, the authors compare expression levels across samples using the FPKM normalization method, which has been shown to be a problematic metric. There are also inconsistencies in statistical and methodological choices for which there is not a clear rationale explained in the manuscript. Finally, the authors use only male animals, which may not reflect age-related trajectories in female animals, but draw broad cross-species conclusions without raising sex as a caveat to the generalization of the conclusions.

      We thank the reviewer for their careful reading and we are happy to hear that they found the results intriguing. Following the reviewer’s criticism, we carefully re-wrote the Methods section. We hope that the reviewer will now agree that the problem in our first submission was a lack of textual clarity rather than a methodological one. With regard to the comment about normalisation: we do not use only FPKM in our analyses (which, as the reviewer suggests, is an intra-sample normalisation), but we also apply quantile normalisation, which is an inter-sample normalisation method. We have now clarified this aspect in the text. In addition, we repeated the main analyses using VST, which is another inter-sample normalisation that is implemented in the widely used DESeq2 pipeline. This confirmed our main conclusions. In the main text we retained the results based on quantile normalisation, but also report the VST-based analyses for confirmation.

      We also thank the reviewer for their comment on the potential sex-dimorphism of the observed phenomenon, which we had not considered before. Our samples are indeed all-male, whereas the additional dataset from Jonker et al. is composed of only female individuals, and notably, inter-tissue convergence during ageing was also observed in this dataset. Additionally, the GTEx data covers both female and male samples in humans and also suggests a trend towards inter-tissue convergence during ageing. While we observe DiCo in both sexes, it is still possible that the genes and functional pathways that show this pattern might be sex-specific and do not overlap between sexes. A comparison of male and female-specific convergent genes in mice (i.e. those identified in our data and that of Jonker et al.) is not possible at this point, as sex effects would be confounded with laboratory and platform effects.

      Although the human GTEx data contains both males and females, the age distributions of female (n=11) and male (n=36) samples are quite different (and we also lack male individuals at 20-29 and 70-79 age groups, limiting our data only to 30-69 for males). Consequently, we could only test inter-tissue convergence in each sex but could not compare those gene sets. Based on the analysis in the GTEx data, we observed that convergence during ageing was marginally significant in the female sample (⍴_female= -0.58, p_female=0.059) but not in the male sample (⍴_male= -0.052, p_male=0.77) (Figure 2-figure supplement 16). The difference might be driven by missing individuals for the youngest and oldest age groups.

      We have now included these results in the main text, and discuss the importance of addressing sex-specific effects in the future.
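
      For readers unfamiliar with the distinction, quantile normalisation as an inter-sample step can be sketched as follows. This is a minimal illustration of the general technique, not our pipeline code: the function name and the example values are hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(samples):
    """Force every sample to share the same distribution of values.

    samples is a list of equal-length lists, one per sample, with genes
    in the same order in every sample.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=s.__getitem__)
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]  # gene keeps its within-sample rank
        normalized.append(out)
    return normalized

# Hypothetical expression values for four genes in three samples.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
print(normalized)
```

      After this step each sample contains exactly the same set of values, so differences between samples reflect only the ranking of genes within each sample.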

      Reviewer #3 (Public Review):

      In this manuscript Izgi et al. analyzed gene expression time-course data in four tissues during postnatal development and ageing in mice. Authors show that the expression levels of genes often reverse with ageing compared to development. Authors further show that the expression patterns diverge among the tissues during postnatal development and converge among tissues with ageing. This divergence and convergence pattern (called DiCo) is analyzed at both individual gene and genome-wide levels using multiple statistical approaches. Both cellular composition changes and cell autonomous expression changes contribute to the reversal of gene expression pattern during ageing. This study connects expression pattern during postnatal development with ageing, extending previous work on a single tissue.

      Strengths:

      -The expression convergence with age is consistently seen across multiple datasets and species indicating it can be widespread.

      -The datasets generated are unique and would be useful resource for ageing genomic community.

      -The authors go beyond bulk RNA-seq and also analyze available single-cell RNA-seq datasets in mice to assess the contribution of cell composition changes and cell-intrinsic expression changes to DiCo.

      Weaknesses:

      -Many aspects of expression convergence and DiCo pattern have low effect size and some are not significant. It also appears that this pattern is best seen at the genome-wide level.

      -Although there is statistical support for DiCo, there are no consistent functional associations discovered in Gene Ontology enrichment.

      -The mechanism for DiCo and the extent to which the same genes or pathways underlie this across species is unclear.

      We thank the reviewer for their careful reading of our manuscript and for pointing out the strengths and weaknesses in a clear manner. We hope that both the dataset and the insight we gained from this study will be useful for the community and open new directions of research in the future.

      We agree with the reviewer that although we study the convergence of expression at different levels, it is most prominent at the genome-wide level and the effect size is small. We have now included a discussion of this aspect in our limitations paragraph. As the reviewer points out, our analysis was focused on identifying genome-wide patterns and not on particular genes and/or specific functional processes. Still, we do find certain associations between DiCo genes and GO categories related to tissue development and differentiation. In this version, we provide a more in-depth analysis of these categories, together with their profiles of gene expression during development and ageing. Unfortunately, confirmation of the functional consequences through experimental studies is outside the scope of this paper. Thus, the results should be seen as potential links that require further experimental support. We also mention this in our limitations paragraph. Lastly, to address the reviewer’s comment on the mechanisms, we tested whether the DiCo pattern is associated with certain transcription regulators, miRNAs and TFs; however, we did not find any specific regulator. If DiCo is indeed a transcriptome-wide phenomenon caused by loss of expression regulation and cellular identity during ageing, rather than the result of a controlled program, a lack of significant association with specific transcriptional regulators may be expected. This new result and its discussion are also included in the new version.

    1. Author Response:

      Reviewer #1 (Public Review):

      Here, Garner and Theriot investigate the question of leading edge maintenance in migrating cells. They analyze small and dynamic fluctuations of the membrane at the cell front in order to understand how membrane stability emerges from these seemingly random and uncoordinated events. Experimental data enable description of fluctuations at different length scales and their relaxation in a visco-elastic manner.

      To gain knowledge about this system, a stochastic model of branched actin network growth against a membrane is developed, taking into account a number of molecular reactions at play. This model recapitulates correctly the cellular observations, with correct orientation of the filaments and similar membrane fluctuations. Also, addition of Latrunculin B which leads in vivo to increased amplitude of the fluctuations with decreased fluctuation rates is described in the model when nucleation and elongation rates are decreased.

      Changing the different parameters of the model reveals that two features are critically important: a branching reaction occurring solely in proximity to the membrane, and the possibility for filaments to spread laterally. Another important parameter is the Arp2/3 complex branching angle, where a 70-80° geometry is found to be optimal for minimizing actin density fluctuations and leading edge fluctuation amplitudes.

      This work is of excellent quality and its conclusions seem justified. However, it would be important to have more details on the limit of detection of membrane shape fluctuations and network growth by phase contrast microscopy.

      The reviewer raises an important point on the differences in spatial resolution between the experimental and theoretical aspects of our work. We appreciate this opportunity to further clarify which of our conclusions are directly demonstrated by experimental data, and which are theoretical predictions that are grounded in experimental data but not explicitly measured. Our updated manuscript includes an expanded discussion of this topic in the Results and Discussion. We outline the major points below:

      1) In lines 92-99 of the Results, we estimate our experimental spatial resolution for measuring leading edge fluctuations, emphasizing that imaging by phase-contrast is not sufficient to resolve individual filaments or polymerization events. We also clarify our hypothesis that the measured fluctuations are a micron-scale property arising from stochastic monomer addition at the molecular scale, now more directly stating that simultaneous stochastic polymerization of filaments throughout the leading edge might act collectively to generate large scale curvature.

      2) In lines 142-143, we make more clear that we developed the molecular-scale actin network growth model to explore how molecular interactions might lead to the observed larger scale fluctuation behavior.

      3) In lines 151-152, and 284-287 of the Results, we discuss the range of wavelengths over which the experiments and modeling output can be directly compared.

      4) Finally, in the Discussion (lines 317-324) we emphasize that we experimentally measured micron-scale lamellipodial shape dynamics, but inferred nanometer-scale details using a molecular-scale model that correctly predicts this emergent behavior (as well as many other experimentally-measured features of lamellipodial actin networks). We then discuss how our results might inspire new super-resolution experimental approaches to directly test molecular-level predictions of the model.

      Reviewer #2 (Public Review):

      The topic of actin driven cell motility will be of general interest. The authors provide new ideas for the field of research, the modeling methods and model design seem valid and appropriate, and the paper is well written. My main concern is whether the fluctuation spectrum derived from the model corresponds to that of the experimental images.

      Visually (and perhaps mistakenly on my part), the experimental analysis of Fig. 1b seems to show a nearly periodic red-blue curvature pattern with a scale of order 4 microns that persists over 10-15 sec, a time over which the cell advances by a distance of order the size of the lamellipodium. While such a nearly periodic pattern would be expected to lead to peaks at the corresponding periods and wavelength in Fig. 1e and 1g, no clear peaks are observed in those figures.

      However, the autocorrelation functions in Fig. 1e are not plotted over times comparable to 10-15 sec. Further, the analysis of the leading edge contour is done with a background subtraction method that removes fluctuations over 7 microns, a length scale that may be dampening a real peak at ~4 microns in Fig. 1g.

      The feature I am pointing out could be occurring at a length scale in between the shortest length scales (a pixel) and the longest ones (cell size) in the system. Instabilities, a main theme of the paper, frequently get amplified at a characteristic length scale. Here there may be a length scale that is selected by the system that may not be picked up by the analysis or the proposed model.

      We thank the reviewer for drawing our attention to an apparent discrepancy between the curvature kymograph shown in Fig. 1b and the results of the autocorrelation analysis, which we now believe we have reconciled. In our updated manuscript, we demonstrate that (1) the feature the reviewer points out in the kymograph is not indicative of a dominant mode or instability; (2) regardless, the feature in question is not removed by our pre-processing step; and (3) an extension of our analysis to longer length and time scales does not affect our results. These points are summarized in an extended description of the curvature kymograph and autocorrelation analyses in the Results (lines 120-125), Methods (lines 416-423, 443-444, 471-473, 487-503), and in three new supplemental figures (Figs. S3-S5). Our argument is as follows:

      1) The apparent instability in the curvature kymograph (which the reviewer suggested our autocorrelation analysis might not be detecting) can be reproduced in a model in which there are, by definition, no instabilities, dominant wavemodes, or oscillations – that of a membrane freely fluctuating under Brownian motion (Fig. S3). This proves that one cannot interpret the appearance of such an underlying pattern in the kymograph as evidence of an instability. We note that the apparent “dominant wavemode” of ~4 µm in the curvature kymograph might simply reflect the span used to perform the curvature fitting, as it is approximately twice the size of the curve-fitting window. Overall, this control provides a case-in-point for the potential pitfalls in interpreting kymographs and the necessity of Fourier mode autocorrelation analysis as a more comprehensive approach.

      2) The reviewer raised the possibility that baseline-subtracting features above 7 µm might remove the apparent ~ 4 µm instability from our data, but these visual features remain apparent in curvature kymographs generated after the baseline-subtraction is applied (Fig. S3). Therefore our 7 µm cut-off does not remove the features in question.

      3) As suggested by the reviewer, we extended our analysis to longer length and time scales, and found that it did not affect our results. Consistent with what could be observed from the originally-plotted timescales in Fig. 1e, longer timescales show the signal decays to noise (or at least something which cannot be distinguished from noise in any straightforward way) at all length-scales (Fig. S4). Additionally, repeating the analysis using a 10 µm span for background-subtraction of the leading edge shapes (an increase of ~50% compared to the 7 µm span used in the original manuscript, and more than twice the width of the feature of concern to the reviewer), reveals no new features in the data (Fig. S5).
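
      The logic of the Brownian-membrane control in point 1 can be illustrated with a minimal sketch. It assumes, for illustration only, that a single Fourier mode of a freely fluctuating membrane behaves as an Ornstein-Uhlenbeck process with relaxation time `tau`; under that assumption the mode's autocorrelation decays as exp(-t/tau), with no oscillation or preferred period. The function names and all parameter values are hypothetical and are not taken from the manuscript.

```python
import math
import random


def simulate_mode(n_steps, dt, tau, variance, seed=0):
    """Simulate one mode amplitude as a stationary Ornstein-Uhlenbeck process.

    Uses the exact discrete-time update, so the result does not depend on
    dt being small relative to tau.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)
    kick = math.sqrt(variance * (1.0 - decay ** 2))
    x = rng.gauss(0.0, math.sqrt(variance))  # start in the stationary state
    series = []
    for _ in range(n_steps):
        series.append(x)
        x = x * decay + kick * rng.gauss(0.0, 1.0)
    return series


def autocorrelation(series, lag):
    """Normalised sample autocorrelation of a series at an integer lag."""
    n = len(series)
    m = sum(series) / n
    var = sum((v - m) ** 2 for v in series) / n
    cov = sum(
        (series[i] - m) * (series[i + lag] - m) for i in range(n - lag)
    ) / (n - lag)
    return cov / var


dt, tau = 0.1, 1.0  # hypothetical time step and relaxation time (seconds)
amplitude = simulate_mode(n_steps=20000, dt=dt, tau=tau, variance=1.0)

for lag in (0, 5, 10, 20, 40):
    measured = autocorrelation(amplitude, lag)
    expected = math.exp(-lag * dt / tau)
    print(f"t = {lag * dt:4.1f} s  measured = {measured:6.3f}  exp(-t/tau) = {expected:6.3f}")
```

      A real dominant wavemode or instability would instead appear as a peak or oscillation in the mode autocorrelations, which is the signature the Fourier mode analysis is designed to detect.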

    1. Author Response:

      Reviewer #1 (Public Review):

      Dicks et al. in this study characterized electrophysiological properties of mutant and wild-type hiPSC-chondrocytes and the expression of chondrocyte-associated markers during chondrogenic differentiation of the cells, and analyzed the differential expression of global transcriptome between the different chondrocyte groups. They demonstrated TRPV4 mutation-induced changes in calcium signaling, mechanical property of matrix, and transcriptome of hiPSC-chondrocytes and concluded that the V620I and T89I mutations of TRPV4 in chondrocytes delay or inhibit hypertrophy, which may be a potential cause of skeletal dysplasias.

      This study applied a gene-editing tool to creating mutant hiPSCs as a human cell model of the disease in culture to study TRPV4 mutation-induced alteration in cellular activities and molecular regulation. Establishing such an hiPSC model for disease study is novel and considered a major strength. Other strengths of this report include adequate background information, solid data analysis, and well-referenced discussions. The iPSC model established in this study could potentially be used to study pathogenic mechanisms of the diseases and identify molecular targets involved in regulating the mechanisms for the development of disease treatments.

      However, there are two weaknesses identified in this current report, which are described below.

      1. Through comparison, differences in biological response and activities between mutant and wild-type hiPSC-chondrocytes were shown, and molecules and mechanisms of interest were identified as potential regulators involved in the mutation-induced changes. However, critical experiments such as gain- and loss-of-function assays to determine whether and how some or all of the identified molecules or mechanisms (HOXs, TGFB, biomineralization genes …) are regulated by the mutations to alter chondrocyte activities are missing. These experiments are needed to strengthen their conclusions. The discussions about the identified molecules and mechanisms with cited references are inadequate as a support for the conclusions.

      We agree with the reviewer that gain- and loss-of-function experiments would be critical for identifying whether the proposed mechanisms are in fact responsible for the differences caused by the TRPV4 mutations and the disease phenotypes. However, these experiments are out of the scope of this study, and we plan to investigate each of these mechanisms in future studies. In the meantime, we have added additional citations to the discussion to further support these conclusions.

      2. The data currently presented in Figures 1, 5 and 6 are insufficient to justify the claims regarding mutation-induced changes of TRPV4, chondrocyte hypertrophy, and expression levels of the identified molecules.

      To further support the conclusions in Figures 1, 5, and 6, we have added additional data. As suggested, we investigated the role of TRPV4 phosphorylation on channel function and activity. We found V620I had increased expression of PRKCA, the gene encoding for protein kinase C alpha. These data indicate that TRPV4 phosphorylation may be responsible for the increased basal calcium signaling through V620I TRPV4.

      We then performed western blots to investigate production of hypertrophic proteins to validate the gene expression and support the claims that V620I and T89I had delayed hypertrophy in response to BMP4 treatment. Indeed, BMP4 treatment increased ALPL, COL10A1, IHH, and RUNX2 gene and protein expression compared to TGFβ3 controls, and this response was more prominent in WT than mutant lines. These data have been added to the paper to support our conclusions (Fig. 4 – Fig. S1B, Fig. 5, Fig. 5 – Fig. S1).

      Reviewer #2 (Public Review):

      In this manuscript, Dicks et al. generated two human iPSC lines with TRPV4 mutations (mild V620I or lethal T89I) using a CRISPR-Cas9 approach and examined their channel function and differentiation abilities into chondrocytes. While their initial goal is to elucidate the detailed molecular mechanisms underlying how these two mutations lead to strikingly distinct severities of skeletal dysplasias, most of their data found that these two mutations behave in a similar manner. The minor differences they found are: 1) increased basal currents in V620I cells; 2) reduced mechanical properties of cartilage matrix in V620I chondrocytes; 3) some differences in DEGs of RNA-seq data. They also stated that "The severe T89I mutation inhibits chondrocyte hypertrophy more than moderate V620I mutation" (page 16). However, no substantiated data were provided to support this conclusion. While a series of RNA-seq experiments were performed to explore the underlying mechanism, they were not followed by validation experiments to pinpoint the exact pathways or molecular mechanisms. Thus, although using CRISPR-Cas9 and iPSCs is novel and potentially important, this manuscript is overall descriptive with limited mechanistic information.

      We thank the reviewer for the summary of the paper. We have further investigated the differences between WT and the two mutant lines to add to the RNA-seq experiments. As suggested by another reviewer, we looked at protein kinase gene expression, which may be altering TRPV4 phosphorylation and ultimately changes in channel activation. This expression data is consistent with the basal calcium differences we saw, and we believe these warrant further investigation in a follow-up study regarding biochemical changes to the channel structure and activation.

      We also further validated the differences in BMP4-induced hypertrophy by looking at protein production. BMP4 not only increased hypertrophic proteins COL10A1, ALPL, IHH, and RUNX2, but we saw much larger increases in WT compared to mutants. Further, ALPL production was increased in the moderate V620I mutation compared to the severe T89I mutation, indicating a potential player in the differences in disease severity caused by the two mutants.

      Finally, we investigated the DEGs between V620I and T89I to highlight the differences between the two mutations. We believe this study has served as a foundation for identifying potential mechanisms leading to the disease phenotypes of moderate and severe skeletal dysplasias. In future studies, we hope to validate these mechanisms.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Cai and co-authors offer a new and important discovery demonstrating the persistence of a clade of non-caballine equids, Sussemionus, well into the later millennia of the Holocene in northern China. My expertise does not lie with the genomics analysis, so I will not offer detailed comment - but as an outsider, the arguments seemed well-supported and convincing.

      We thank the reviewer for the positive assessment.

      The primary weakness of the article lies in the omission of detailed archaeological context, and in the failure to consider implications for and from human societies. All specimens were taken directly from archaeological sites, but no information is given about the archaeological sites and cultures the specimens were derived from. In early China, ca. 3500 BP, the persistence of wild equid taxa is a very significant finding. This time period was a very dynamic period across northern East Asia, with the first introduction of domestic horses and the first spread of other livestock pastoralism (see Brunson et al, https://www.sciencedirect.com/science/article/abs/pii/S2352409X20300535). And, as summarized in Yuan and Flad (2006), many of the earliest sites speculatively linked with domestic horses that predate the final Shang Dynasty are isolated equid bones from archaeological sites, without definitive archaeological data to determine domestic or wild status. Therefore, the archaeological context of these finds is really important - how were each of the bones originally identified in archaeological reports? Is there associated evidence that the equids were hunted and eaten? The authors must add a section describing the archaeological context in greater detail, and considering the possible implications of the finds. For example, the persistence of sussemione equids through the 2nd millennium BCE implies that researchers must be exceedingly careful in zooarchaeological identifications prior to this period.

      We thank the reviewer for pointing this out. We have provided more details about the archaeological context in the revised manuscript: “Nearly 20,000 square meters of Honghe (47.20°N, 123.62°E) have been excavated, revealing a late Neolithic settlement site dated to approximately ~3,400-4,400 years ago and belonging to a unique, rich fishing and hunting culture characteristic of northeastern China (Figure 1—figure supplement 1). The scale of the moated settlement indicates that there was already social management and relatively high productivity and building technology. The Muzhuzhuliang site (38.83°N, 110.50°E) belongs to the “Longshan Culture” dated to approximately ~3,800-4,300 years ago. It is the most complete moated settlement hitherto excavated in the late Neolithic Age of Northern China, and showed a subsistence economy based on agriculture, animal husbandry and hunting. The Shatangbeiyuan site (35.63°N, 105.11°E) belongs to the early cultural relics of “Qijia culture” in the Neolithic Age, which is dated to approximately ~3,900-4,200 years ago. Millet represented the main crop produced at that time, stone and bone arrowheads have also indicated that hunting was also performed. The rise and decline of these cultures were substantially influenced by the regional environmental conditions. And no traces of domestication but consumption were found in the equine specimens of three sites, indicating that they were hunted for food”.

      And we have added “And given that the persistence of Sussemiones through the second millennium BCE, researchers must be exceedingly careful in zooarchaeological identifications prior to this period.” at the end of the article.

      Moreover, the result might also warrant a discussion about the role of pastoral cultures, or the introduction of domestic horses, in the final extinction of the sussemiones. Without such a summary, it is incomplete to suggest that their final extinction is a result of inbreeding and reduced genetic diversity.

      We agree that this is an interesting point to consider. We have added the sentence “Considering the knowledge of environmental and human archaeology, our results imply that the extinction of this lineage may be affected by the combination of climatic change and human mediation.”

      Reviewer #2 (Public Review):

      Dawei Cai and colleagues present a series of firsts and new discoveries including (1) the first high coverage genome from an equid that is unequivocally an extinct species and (2) demonstrating that Equus (Sussemionus) ovodovi survived into the late Holocene, belonged to a lineage sister to all extant non-caballine equids, and underwent extensive admixture soon after its divergence from non-caballine equids.

      The manuscript is clearly laid out and well written. The analyses are conducted logically and to a high standard, which includes testing the impacts of reference genome choice and DNA misincorporations in nearly all analyses. The conclusions are mostly supported by the data but some methodological clarifications and discussion of conflicting results are required.

      Thanks for your comments.

      Strengths/weaknesses of the five main findings:

      (1) Sussemiones survived into the late Holocene. Strengths: It is remarkable that Sussemiones survived so late into the Holocene, but the authors present radiocarbon evidence from multiple skeletal elements and sites supporting the late survival hypothesis. Combined with the genomic evidence, there is very strong support for this assertion. Weaknesses: The manuscript does not describe the radiocarbon methods, such as which laboratory these analyses were conducted in and whether samples were ultrafiltered or not. A description of the calibration methods and curve version used is also lacking.

      Thank you for this suggestion. We have provided more details about the radiocarbon methods in the revision and Supplementary Table S2. “Radiocarbon dating of the samples was performed at the Beta Analytic Radiocarbon Dating Laboratory, Miami, Florida. Bone or tooth pieces about 2g were sampled in the bone and sent for subsequent dating of collagen (not ultrafiltered). Calibration was carried out using OxCalOnline (https://c14.arch.ox.ac.uk/oxcal.html) and the IntCal20 calibration curve.”

      (2) Equus (Sussemionus) ovodovi is a sister lineage to all extant non-caballine equids. Strengths: The authors construct both exome and candidate neutral loci phylogenies from across the nuclear genome, including testing the impact of two different reference genomes. All analyses support the same placement of E. ovodovi with 100% bootstrap support. The assertion is therefore strongly supported. Weaknesses: No weaknesses identified.

      We thank the reviewer for the positive assessment.

      (3) The early evolution of the lineages leading to the E. ovodovi and the three main extant equid groups was characterised by extensive admixture. Strengths: The authors use three different methods to infer the presence, extent, and/or direction of admixture. Weaknesses: A major weakness here is the incongruence between the TreeMix models and the D-statistics and G-PhoCS analyses (the latter two give a coherent story). Given the large admixture events determined by G-PhoCS, it seems concerning that these events are not recovered as migration edges in the TreeMix analyses.

      We thank the reviewer for the suggestion. As the reviewer notes, the TreeMix models and the G-PhoCS analyses are incongruent, and two factors may explain this. First, the TreeMix models work best when gene flow between populations is restricted to a relatively short time period; situations of continuous migration violate this assumption and lead to unclear results (see Pickrell, Joseph K., & Pritchard, Jonathan K. (2012), https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002967). Second, two different datasets were used in the analyses. The tree topologies and gene flow were recovered by TreeMix using whole-genome SNPs, while the G-PhoCS analyses of selected samples were based on 15,324 candidate ‘neutral’ loci.

      (4) Population size of E. ovodovi over the past 2 Myr. Strengths: The authors correct for differences in genome coverage to allow for the PSMC profiles between four equid taxa to be comparable, allowing for comparison of population size trajectories. Weaknesses: In Figure 4, the presented PSMC profiles are a mix of those with or without transitions (comparing profiles to Figure 4-figure supplement 1). Given that the exclusion of transitions impacts the PSMC profiles, these should be standardized in Figure 4 to give a fair comparison.

      We thank the reviewer for this suggestion as well. To account for the possible misincorporation pattern and high error rate in the four equids, we compared the PSMC analyses performed with and without transitions. A consistent pattern was observed across the two datasets, except for the PSMC bootstrap pseudo-replicates for HH06D, and we therefore only presented PSMC profiles without transitions when considering the ancient HH06D specimen. Meanwhile, we applied a correction based on an empirical uniform false-negative rate for low coverage genomes (<20×). All three Eurasian equine species genomes were rescaled following the same procedure (see L. Orlando et al. (2013), https://www.nature.com/articles/nature12323).

      (5) Inbreeding was a contributing factor to the extinction of E. ovodovi. Strengths: The authors determine heterozygosity and runs-of-homozygosity in E. ovodovi and compare these to all living equids, and find that E. ovodovi had low heterozygosity although not excessive runs-of-homozygosity. Weaknesses: The authors should be more cautious with their interpretation/phrasing on L383-384, given that inbreeding and/or reduced genetic diversity has not been demonstrated as the extinction driver.

      Thanks for the suggestion, and we have now re-written this sentence: “So combined with a degree of inbreeding, the reduced genetic diversity available may have contributed to the subsequent extinction of the lineage”.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Bishop et al. aim to quantify the ventilatory response to hypoxia and hypercapnia in the common marmoset, an increasingly more common primate research model. They also present an unsupervised analysis tool to quantify ventilatory behavior, which is a potentially major contribution to the respiratory field.

      Strengths of this manuscript include the inclusion of male and female animals and the development of an analysis toolkit that may be less impacted by biases that are introduced when hand analyzing respiratory behavior, as is commonly done in the field. This tool could be of tremendous value to the respiratory community. Identification of sniffs, sighs, and apneas are often plagued by the qualitative nature of the analysis.

      We thank the reviewer for taking their time to evaluate our submission and for the overall positive assessment of our submission.

      Limitations of the study relate to the measure of the hypoxic and hypercapnic ventilatory drive. Tidal volume in whole body plethysmography is not accurate unless the plethysmograph and body temperature are taken into account. (See, https://pubmed.ncbi.nlm.nih.gov/25080926/). This is particularly important when the animal's core body temperature changes during hypoxia because of a fall in metabolic rate. The decrease in VCO2 shown here suggests that this is occurring here.

      We thank the reviewer for their comment. We applied acute hypoxia and hypercapnia to perturb breathing behaviors and used our analysis tool to evaluate said disturbed respiratory behaviors. We have addressed the limitations of our studies in the revised submission. In addition, and because of this limitation, we include an arbitrary unit (a.u.) for tidal volume (and other characteristics of breathing derived from tidal volume).

      It is worth pointing out that the fall in VCO2 is not typically observed in humans. So, while the authors conclude that minute ventilation does not increase in the marmoset, it is not necessarily a valid conclusion that the hypoxic ventilatory drive is low because VE should be expressed as a function of VCO2. If VCO2 falls but VE is constant, ventilation per unit metabolism will actually have increased. Ventilation may also be underestimated here because of the fall in core body temp that likely coincides with a lower VCO2.

      We thank the reviewer for this comment. The data on changes of metabolic rate (by measuring VCO2 or VO2) during hypoxia in human subjects are not consistent (for instance see PMID: 2390141, Figure 3 clearly shows a decrease of ~50% in metabolic rate during hypoxia). Therefore, we have softened the language in our submitted revision.

      In addition, in the revised manuscript, we have performed the recommended analysis to express VE as a function of VCO2. However, hypoxia did not increase the ventilation efficiency (VE/VCO2) in marmosets. We have added the new data (Figure 4H) and discussed it in the revised manuscript.
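
      The reviewer's arithmetic point about normalising ventilation to metabolism can be made concrete with a minimal sketch. All numbers are hypothetical arbitrary units chosen for illustration and are not measurements from this study; the function name `ve_per_vco2` is likewise an assumption.

```python
def ve_per_vco2(ve, vco2):
    """Return minute ventilation per unit CO2 production (VE/VCO2)."""
    if vco2 <= 0:
        raise ValueError("VCO2 must be positive")
    return ve / vco2


# Hypothetical baseline values in arbitrary units.
baseline = ve_per_vco2(ve=100.0, vco2=5.0)

# Case A: VCO2 falls by 20% while VE is unchanged, so the ratio rises.
constant_ve = ve_per_vco2(ve=100.0, vco2=4.0)

# Case B: VE falls in proportion to VCO2, so the ratio is unchanged.
proportional_ve = ve_per_vco2(ve=80.0, vco2=4.0)

print(f"baseline VE/VCO2:        {baseline:.1f}")
print(f"constant VE, lower VCO2: {constant_ve:.1f}")
print(f"VE falls with VCO2:      {proportional_ve:.1f}")
```

      Case A is the scenario raised by the reviewer. Case B shows how an unchanged VE/VCO2 can arise when ventilation and metabolic rate fall together.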

      It is also worth noting that the hypoxic ventilatory response is not necessarily linear and the full range of the response is not characterized. For example, 15% O2 in the rat elicits very little response but there is a robust response with 9% O2. It is also worth noting, relevant to the previous points, that this is not an isocapnic ventilatory response, so the hypoxic response is certainly confounded by the changing CO2 which may not mimic situations like sleep apnea.

      We thank the reviewer for this comment. In the revised manuscript, we added that we have applied ‘acute’ hypoxic/hypercapnic challenges and discussed the limitation of our study.

      Reviewer #2 (Public Review):

      I do not see any fundamental flaws in it as such. However, what really compromises the paper is the lack of a "punch line". It is highly descriptive rather than analytical, it reads like a list of mostly predictable outcomes, but what is the question, what is the novelty, why is it important... This does not come out at all. On one hand it is important to have such basic information about marmosets but is it best placed into a non-specialist journal? In addition, the whole point of getting involved with monkeys is because they are closer to humans than rodents, but the authors did not fully explore these similarities/differences or focus on them or try to explain them. One would want to have a clear conclusion in the end, how closely they resemble humans, for what type of experiments they are better than rodents, because of what... But this is not evident. Neither is it clear what is the value of the novel protocol for data analysis which seems to have been a major effort. In the end we are left with the impression that the results you get with it are the same as with the old protocols... What is its value then? Something needs to be done to make this paper attract readers other than those specifically interested in this topic.

      We thank the reviewer for these comments. We acknowledge that the initial submission was not as clear as we had hoped. We have revised the manuscript and added more details about our new analysis tool and further strengthened its applicability by including new analysis from a rodent model. We believe the major contribution of this manuscript to the field is providing a new open-source tool to analyze complex breathing behavior signals in conscious, awake, and active laboratory animals. In this manuscript, we demonstrate the strength of this approach in rapidly expediting analysis of breathing behaviors, which we analyze in the common marmoset and rat, but which could be equally applicable to other animal models.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript addresses a major issue facing consumers of structure-organism pair data: the landscape of databases is very difficult to navigate due to the way data is made available (many resources do not have structured data dumps) and the way data is standardized (many resources' structured data dumps do not standardize their nomenclature or use stable entity identifiers). The solution presented is a carefully constructed pipeline (see Figure 1) for importing data, harmonizing/cleaning it, automating decisions about exclusions, and reducing redundancy. The results are disseminated through Wikidata to enable downstream consumption via SPARQL and other standard access methods as well as through a bespoke website constructed to address the needs of the natural products community. The supplemental section of the manuscript provides a library of excellent example queries for potential users. The authors suggest that users may be motivated to make improvements through manual curations on Wikidata, through semi-automated and automated interaction with Wikidata mediated by bots, or by addition of importer modules to the LOTUS codebase itself.

      Despite the potential impact of the paper and excellent summary of the current landscape of related tools, it suffers from a few omissions and tangents:

      1. It does not cite specific examples of downstream usages of structure-organism pairs, such as an illustration on how this information in both higher quantity and quality is useful for drug discovery, agriculture, artificial intelligence, etc. These would provide a much more satisfying bookend to both the introduction and conclusion.

      Thank you for this remark. We deliberately decided not to insist too heavily on the application examples of the LOTUS outputs. Indeed, we are somewhat biased by our main field of investigation, natural products chemistry, and expect that the dissemination of specialized metabolite occurrences will benefit a wide range of scientific disciplines (ecology, drug discovery, chemical ecology, ethnopharmacology, etc.).

      However, Figure 5 was established to illustrate how the information available through LOTUS is quantitatively (size) and qualitatively (color classes) superior to what is available through single natural products resources.

      As added in the introduction, one of the downstream usages of those pairs is for example to perform taxonomically informed scoring as described in https://doi.org/10.3389/fpls.2019.01329. Obtaining an open database of natural products’ occurrences to fuel such taxonomically informed metabolite annotation tools was the initial impulse for us to build LOTUS. These metabolite annotation strategies, tailored for specialized metabolites, have been shown to offer appreciable performance improvements for current state-of-the-art computational metabolite annotation tools. Since metabolite annotation is still regularly cited as “the major bottleneck” in metabolomics in the scientific literature over the last 15 years (https://europepmc.org/article/med/15663322, https://doi.org/10.1021/acs.analchem.1c00238), any tangible improvement in this field is welcome. With LOTUS we offer a reliable and reusable structures-organisms data source that can be exploited by the community to tackle such issues of importance.

      Other possible usages are suggested in the conclusion, but benchmarking or even exemplifying such uses is clearly out of the scope of this paper, each one of them being an article per se.

      The additional queries are written in our first answer (see “essential revisions”) and demonstrate the impact of LOTUS on accelerating the initial bibliographic survey of chemical structures occurrences over the tree of life.

      This query (https://w.wiki/4VGC) can be compared to a literature review work, such as https://doi.org/10.1016/j.micres.2021.126708. In seconds, it allows retrieving a table listing compounds reported in given taxa and limits the search by years.
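
      As a rough illustration of what such a query can look like, the sketch below assembles a SPARQL query string for the Wikidata Query Service. It is not the query behind the short link above: the function name `build_occurrence_query` and its parameters are hypothetical, and the statement/reference pattern (P703 "found in taxon", referenced through P248 "stated in", dated with P577 "publication date") is an assumption about how the pairs are modelled on Wikidata.

```python
def build_occurrence_query(taxon_qid: str, year_from: int, year_to: int,
                           limit: int = 100) -> str:
    """Build a SPARQL query listing compounds reported in a taxon
    (or any of its descendants), restricted by reference publication year.

    Hypothetical helper for illustration only; the taxon identifier
    must be supplied by the caller.
    """
    if not (taxon_qid.startswith("Q") and taxon_qid[1:].isdigit()):
        raise ValueError("taxon_qid must look like 'Q12345'")
    if year_from > year_to:
        raise ValueError("year_from must not be later than year_to")

    # Doubled braces are literal braces inside the f-string.
    return f"""
SELECT DISTINCT ?compound ?compoundLabel ?taxon ?taxonLabel ?reference ?date WHERE {{
  ?taxon wdt:P171* wd:{taxon_qid} .                    # P171 = parent taxon
  ?compound p:P703 ?statement .                        # P703 = found in taxon
  ?statement ps:P703 ?taxon ;
             prov:wasDerivedFrom/pr:P248 ?reference .  # P248 = stated in
  ?reference wdt:P577 ?date .                          # P577 = publication date
  FILTER(YEAR(?date) >= {year_from} && YEAR(?date) <= {year_to})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
"""
```

      The resulting string can be pasted into the Wikidata Query Service interface; the prefixes used (`wd:`, `wdt:`, `p:`, `ps:`, `pr:`, `prov:`) are predefined there.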

      2. The mentions of recently popular buzzwords FAIR and TRUST should be better qualified and be positioned as a motivation for the work, rather than a box to be checked in the modern publishing climate.

      It is true that the modern publishing system certainly suffers from some drawbacks (also critically mentioned within the paper). However, after consulting all authors, we believe that because LOTUS checks both the FAIR and TRUST boxes, we would rather stick to these two terms. In our view, rules 1 (Don’t reinvent the wheel) and 5 (Put yourself in your user’s shoes) of https://doi.org/10.1371/journal.pcbi.1005128 apply here. Both terms are indeed commonly (mis-)used but we felt that redefining other complicated terms would not help the reader/user.

      3. The current database landscape really is bad, and the authors should feel emboldened to emphasize this in order to accentuate the value of the work, with more specific examples on some of the unmaintained databases.

      We perfectly agree with this statement and it is the central motivation of the LOTUS initiative to improve this landscape. It was a deliberate choice not to emphasize how bad the actual landscape is, but rather to focus on better habits for the future. We do not want to start devaluing other resources and elevate our initiative at the cost of others. We also believe that an attentive look at the complexity of the LOTUS gathering, harmonization, and curation speaks for itself and describes the huge efforts required to access properly formatted natural products occurrence data.

      If the reviewer and editors insist, although not in our scope, we are happy to list a series of specific (but anonymized) examples of badly formatted entries, of wrong structures-organisms associations, or poorly accessible resources.

      4. While the introduction and supplemental tables provide a thorough review of the existing databases, it eschews an important more general discussion about data stewardship and maintenance. Many databases in this list have been abandoned immediately following publication, have been discontinued after a single or limited number of updates, or have been decommissioned/taken down. This happens for a variety of reasons, from the maintainer leaving the original institution, from funding ending, from original plans to just publish then move on, etc. The authors should reflect on this and give more context for why this domain is in this situation, and if it is different from others.

      We do agree with the reviewer and added a “status” column in the table (https://github.com/lotusnprod/lotus-processor/blob/main/docs/dataset.csv). We chose four possible statuses:

      • Maintained (self-explanatory)
      • Unmaintained: the database did not see any update in the last year.
      • Retired: the authors stated they will not maintain the database anymore.
      • Defunct: the database is not accessible anymore
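
      The four statuses can be read as a simple decision rule. The sketch below is only an illustration of that reading: the names `Status` and `classify_status` are hypothetical, the order in which the conditions are checked is an assumption, and nothing here implies that the statuses in the table were assigned programmatically.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    MAINTAINED = "maintained"
    UNMAINTAINED = "unmaintained"
    RETIRED = "retired"
    DEFUNCT = "defunct"


def classify_status(accessible: bool,
                    retired_by_authors: bool,
                    last_update: Optional[date],
                    today: date) -> Status:
    """Map basic facts about a database to one of the four statuses.

    Assumed precedence: defunct > retired > unmaintained > maintained.
    """
    if not accessible:
        # The database cannot be reached anymore.
        return Status.DEFUNCT
    if retired_by_authors:
        # The authors stated they will not maintain it anymore.
        return Status.RETIRED
    if last_update is None or today - last_update > timedelta(days=365):
        # No (known) update within the last year.
        return Status.UNMAINTAINED
    return Status.MAINTAINED
```

      For example, an accessible database whose authors have made no retirement statement and whose last update is two years old would be classified as `Status.UNMAINTAINED`.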

      As for question 3 above, we decided not to focus too heavily on the negative points and instead summarize the current situation in the table mentioned above. The reasons why database publishing is in this situation are multiple, and we think they are well summarized in https://doi.org/10.1371/journal.pcbi.1005128 (Rule 10: Maintain, update, or retire), already cited in the manuscript introduction.

      5. Related to data stewardship: the LOTUS Initiative has ingested several databases that are no longer maintained as well as several databases with either no license or a more restrictive license than the CC0 under which LOTUS and Wikidata are distributed. These facts are misrepresented in Supplementary Table 1 (Data Sources List), which links to notes in one of the version controlled LOTUS repositories that actually describe the license. For example, https://gitlab.com/lotus7/lotus-processor/-/blob/8b60015210ea476350b36a6e734ad6b66f2948bc/docs/licenses/biofacquim.md states that the dataset has no license information. First, the links should be written with exactly what the licenses are, if available, and explicitly state if no license is available. There should be a meaningful and transparent reflection in the manuscript on whether this is legally and/or scientifically okay to do - especially given the light that many of these resources are obviously abandoned.

      This point is a very important one. We did our best to be as transparent as possible in our initial table. Following the reviewer’s suggestion, we updated it to better reflect the licensing status of each resource (https://github.com/lotusnprod/lotus-processor/blob/main/docs/dataset.csv). Specifically, we removed the generic “license” header, which could indeed be misleading, and replaced it with “licensing status”, filled with the attributed license type and a hyperlink to its content. It remains challenging since some resources changed their copyright in the meantime. We remain at the editor and reviewers’ disposal for any further improvement.

      Moreover, as stated in the manuscript, we took care to collect all licenses and contacted the authors of resources whose license was not perfectly explicit to us, thereby performing our due diligence. Additionally, we contacted the legal offices of our university and explained our situation. We did everything we were advised to do.

      1) To the best of our knowledge, the dissemination of the LOTUS initiative data falls under the Right to quote for scientific articles, as we do not share the whole information, but only a very small part.

      2) We do not redistribute original content. What comes out of LOTUS has undergone several curation and validation steps, adding value to the original data. The 500 random test entries, provided in their original form for the sake of reproducibility and testing, are the only exception.

      Many scientific authors forget about the importance of proper licensing. While restricting use might be deliberate, an inappropriate license choice (or omission) is too often due to a lack of information about its implications.

      All authors of the utilized resources can freely benefit from our curation. We are sharing with the community the results of our work, while always citing the original reference.

      Concerning the possible evolution of licensing, it remains a real challenge. While we tried to “freeze” the license status when we accessed the data, some resources have updated their licensing since then. This can be tracked in the git history of the table (https://github.com/lotusnprod/lotus-processor/blob/main/docs/dataset.csv). Discrepancies between our frozen licensing (at the time of gathering) and the current license can therefore occur. Initiatives such as https://archive.org/web could help solve this issue, although they come with other legal challenges.

      6. The order of sections of the manuscript results in several duplicated, but not further substantiated explanations. Most importantly, the methods should be much more specific throughout and the results/discussion should more heavily cross-link to it, as a reader who examines the paper from top to bottom will be left with large holes of misunderstanding throughout.

      As our paper focuses a lot on the methods, the barrier between results & methods becomes thinner. We took into account the reviewers’ suggestions and added some additional cross-links for the reader to be able to quickly access related methods.

      7. The work presented was done in a variety of programming languages across a variety of repositories (and even version control systems), making it difficult to give a proper code review. It could be argued that the most popular language in computational science at the moment is Python, with languages like R, Bash, and in some domains, still, Java maintaining relevance. The usage of more esoteric languages (again, with respect to the domain) such as Kotlin hampers the ability for others to deeply understand the work presented. Further, as the authors suggest additional importers may be implemented in the future, this restricts what external authors may be able to contribute.

      Scientific software has indeed always been written in multiple languages. To this day, scientists have used all kinds of languages adapted both to their needs and their knowledge. Numpy uses Fortran libraries and many projects published in biology and chemistry recently are in Java, R, Python, C#, PHP, Groovy, Scala… We understand that some authors are more comfortable with one language or another. But R syntax is for example much more distant from Python's syntax than Kotlin can be. We needed a highly performant language for some parts of the pipeline and R, Bash, or Python were not sufficient. We decided to use Kotlin as it provides an easier syntax than Java while staying 100% compatible with it.

      The advantage of the way LOTUS is designed is that importers are language-agnostic. As long as the program can produce a file or write to the DB in the accepted format, it can be integrated into the pipeline. This was our goal from the beginning, to have a pipeline that can have its various parts replaced without breaking any of the processes.

      8. As a follow up to the woes of points 4, 5, and 7, the manuscript fails to reflect on the longevity of the LOTUS Initiative. Like many, will the project effectively end upon publication? If not, what institutions will be maintaining it for how long, how actively, and with what funding source? If these things are not clear, it only seems fair to inform the reader and potential user.

      LOTUS is an initiative that aims to improve knowledge management and sharing in natural products research. Our first project, which is the object of the current manuscript, is to provide a free and open resource of natural products occurrences for the scientific community. Its purpose is not to be a database by itself, but instead to provide through Wikidata and associated tools a way to access natural products knowledge. The objective was not to create yet another database (https://doi.org/10.1371/journal.pcbi.1005128), but instead to remove this need and give our community the tools and the power to act on its knowledge. This way, as everything is on Wikidata, the initiative is not “like many”. This also means that this project should not be considered and evaluated exactly like a classical DB. Once the initial curation, harmonization, and dissemination jobs have been done, they should ideally not be run again. The community should switch to Wikidata as a point of access, curation, and addition of data. If viewed with such arguments in mind, yes, LOTUS can live long!

      Wikimedia is a public not-for-profit organization, whose financial development appears to indicate solid health https://en.wikipedia.org/wiki/Wikimedia_Foundation#Finances.

      In terms of funding sources, we would like to refer to https://elifesciences.org/articles/52614#sa2, which stated the following in response to a similar question: "Wikidata is sustained by funding streams that are different from the vast majority of biomedical resources (which are mostly funded by the NIH). Insulation from the 4-5 year funding cycles that are typical of NIH-funded biomedical resources does make Wikidata quite unique." The core of the Wikidata funding streams are donations to the Wikipedia ecosystem. These donations - with a contributor base of millions of donors from almost any country in the world, chipping in at an average order of magnitude of around 10 dollars - are likely to continue as long as that ecosystem is useful to the community of its users. See https://wikimediafoundation.org/about/financial-reports for details.

      9. Overall, there were many opportunities for introspection on the shortcomings of the work (e.g., the stringent validation pipeline could use improvement). Because this work is already quite impactful, I don't think the authors will be opening themselves to unfair criticism by including more thoughtful introspection, at minimum, in the conclusions section.

      We agree with the reviewer and therefore list again the major limitations of our processing pipeline:

      First, our processing pipeline is heavy. It includes many dependencies and takes a lot of time to understand. We are aware of this issue and tried to simplify it as much as possible while keeping what we considered necessary to ensure high data quality. Second, it can sometimes induce errors. Those errors, ranging from unnecessarily discarded correct entries to more problematic ones, can be attributed to various parameters, reflecting the variety of our input. We will therefore try to list them, keeping in mind that the list won’t be exhaustive. For each detected issue, we tried to fix it as best we could, knowing that this will not lead to an ideal result but will hopefully increase data quality gradually.

      ● Compounds

      ○ Sanitization (the three steps below are performed automatically since we observed a higher ratio of incorrect salts, charged or dimerized compounds. However, this also means that true salts, charged or dimeric compounds were erroneously “sanitized”.)

      ■ Salt removals

      ■ Charged molecules

      ■ Dimers

      ○ Translation (both processes below are pretty error-prone)

      ■ Name to structure

      ■ Structure to name

      ● Biological organisms

      ○ Synonymy

      ■ Lotus (https://www.wikidata.org/wiki/Q3645698, https://www.wikidata.org/wiki/Q16528).

      This is also one of the reasons why we decided to call the resource Lotus, as it illustrates part of the problem.

      ■ Iris (https://www.wikidata.org/wiki/Q156901, https://www.wikidata.org/wiki/Q2260419)

      ■ Ficus variegata (https://www.wikidata.org/wiki/Q502030, https://www.wikidata.org/wiki/Q5446649)

      ○ External and internal dictionaries are not exhaustive, impacting translation

      ○ Some botanical names we use might no longer be the accepted ones, because of the tools we use and the pace at which taxa are being renamed.

      ● References

      ○ The tool we favored, Crossref, returns a hit whatever the input. This generates noise and incorrect translations, which is why our filtering rules focus on reference types.

      ● Filtering rules:

      ○ Limited validation set, requires manual validation

      ○ Validates some incorrect entries (False positives)

      ○ Does not validate some correct entries (False negatives)

      Again, our processing pipeline removes entries we do not yet know how to process properly.

      Despite our restrictive filters, our substantial upload of structure-organism pairs to Wikidata should hopefully incentivize the community to contribute by adding its own human-validated data.

      We updated the conclusion part of the manuscript accordingly. See https://github.com/lotusnprod/lotus-manuscript/commit/a866a01bad10dfd8b3af90e2f30bb3ae51dd7b9e.

      Reviewer #2 (Public Review):

      Rutz et al. introduce a new open-source database that links natural products structures with the organisms they are present in (structure-organism pairs). LOTUS contains over 700,000 referenced structure-organism pairs, and their web portal (https://lotus.naturalproducts.net/) provides a powerful platform for mining literature for published data on structure-organism pairs. Lotus is built within the computer-readable Wikidata framework, which allows researchers to easily contribute, edit and reuse data within a clear and open CC0 license. In addition to depositing the database into Wikidata, the authors provide many domain-specific resources, including structure-based database searches and taxon-oriented searches.

      Strengths:

      The Lotus database presented in this study represents a cutting-edge resource that has a lot of potential to benefit the scientific community. Lotus contains more data than previous databases and combines multiple resources into a single resource.

      Moreover, they provide many useful tools for mining the data and visualizing it. The authors were thoughtful in thinking about the ways that researchers could/would use this resource and generating tools to make it easy to use. For example, their inclusion of structure-based searches and multiple taxonomy classification schemes is very useful.

      Overall the authors seem conscientious in designing a resource that is updatable and that can grow as more data become available.

      Weaknesses/Questions:

      1) Overall, I would like to know to what degree LOTUS represents a comprehensive database. LOTUS is clearly the best database to date, but has it reached a point where it is truly comprehensive, and can thus be used for a meta-analysis or as a data source for research questions? Can it truly replace doing a manual literature search/review?

      As highlighted by the reviewer, even if LOTUS might be the most comprehensive natural products occurrences resource at the moment, the true comprehensiveness of such a resource will always be limited by the data available in the literature, and the community is far from fully describing the metabolome of living beings. We however hope that the LOTUS infrastructure will offer a good place to start this ambitious and systematic description process.

      1) Yes, it can serve as a data source for research questions, as exemplified in the query table.

      2) No, it cannot and must not replace manual literature search. Manual literature search is the best option, but it comes at an enormous cost. If the outcome of such a search can be made available to the whole community (e.g. via Wikidata), its value would be even greater. However, LOTUS can expedite a decent part of a manual literature search and free up time to complement this search. See our comment to the editors: “To further showcase the possibilities opened by LOTUS, and also answer the remark on the comprehensiveness of our resource, we established an additional query (https://w.wiki/4VGC). This query is comparable to a literature review work, such as: https://doi.org/10.1016/j.micres.2021.126708. In seconds, it allows retrieving a table listing compounds reported in given taxa and limits the search by years.”

      We added these examples in the manuscript (see https://github.com/lotusnprod/lotus-manuscript/commit/a6ee135b83e56e8e2041d09d7ce2d5b913c1029d)

      2) Data Cleaning & Validation. The manuscript could be improved by adding more details about how and why data were excluded or included in the final upload. Why did only 30% of the initial 2.5 million get uploaded? Was it mostly due to redundant data or does the data mining approach result in lots of missed data?

      The reason for this “low” yield is that we strongly favored quality over quantity (as in the F-score equation, with β equal to 0.5, so that more importance is given to precision than to recall). Of course there is redundancy, but most entries were rejected because their confidence level was too low according to the rules we developed. These data are not fully discarded, as we keep them for further curation (ideally including the community) before uploading to Wikidata. We adapted the text accordingly.
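
      To make the effect of this weighting concrete, here is a minimal sketch of the standard F-score with an adjustable weight. The function name `f_beta` and the precision/recall values in the usage note are illustrative assumptions, not figures from the LOTUS validation.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score.

    With beta < 1, precision weighs more than recall;
    with beta = 1, this reduces to the harmonic mean (F1).
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")

    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

      With β = 0.5, a filter with high precision and modest recall, such as `f_beta(0.9, 0.5)`, scores higher than one with the values swapped, `f_beta(0.5, 0.9)`, which is the behaviour wanted when quality is favored over quantity.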

      3) Similarly, more information about the accuracy of the data mining is needed. The authors report that the test dataset (420 referenced structure-organisms pairs) resulted in 97% true positives, what about false negatives? Also, how do we know that 420 references are sufficiently large to build a model for 2.5M datapoints? Is the training data set sufficiently large to accurately capture the complexities of such a large dataset?

      False negatives are 3%, which is, in our opinion, a fair amount of “loss” given the quality of the data. We actually manually checked 500+ documented pairs, which is more or less the equivalent of a literature review. We were careful to sample the entries in the right proportions, but we cannot state (and did not state) that they are enough. We cannot model it either, since the 2.5M+ points have completely different distributions in terms of databases, quality, etc. The only “hint” is the similar behaviour among all subsets: the 420 + 100 entries were divided among three authors, who obtained similar results.

      4) Data Addition and Evolution: The authors have outlined several mechanisms for how the LOTUS database will evolve in the future. I would like to know if/how their scripts for data mining will be maintained if they will continue to acquire new data for the database. To what extent does the future of LOTUS depend on the larger natural products community being aware of the resource and voluntarily uploading to it? Are there mechanisms in place such as those associated with sequencing data and NCBI?

      Programs have been not only maintained but also updated with new possibilities, for example the addition of a “manual mode” allowing users to run the LOTUS processing pipeline on a set of their own entries and make them Wikidata-ready (https://github.com/lotusnprod/lotus-processor/commit/f49e4e2b3814766d5497f9380bfe141692f13f23). We will of course do our best to keep maintaining them, but no one in academia can state that he/she will maintain programs forever. However, the LOTUS initiative hopefully embraces a new way of considering database dynamics. If the repository and website of the LOTUS initiative shut down tomorrow, all the work done will still be available to anyone on Wikidata. Of course, future data addition strongly relies on community involvement. We have already started to advocate for the community to take part in it, ideally in the form of direct upload to Wikidata. At present, there are no mechanisms in place to push publishing of the pairs on Wikidata (as exist for sequencing and mass spec data), but we are committed to pushing in this direction. The initiative needs stronger involvement of the publishing sector (including reviewers) to help change those habits.

      5) Quality of chemical structure accuracy in the database. I would imagine that one of the largest sources of error in the LOTUS database would be due to variation in the quality of chemical structures available. Are all structure-organism pairs based on fully resolved NMR-based structures, or are they based on mass spectral data with no confirmational information? At what point is a structural annotation accurate enough to be included in the database? More and more metabolomics studies are coming out and many of these contain compound annotations that could be included in the database, but what level (in silico, exact mass database search, or relative to a known standard) is required?

      This is a very interesting point and some databases have this “tag” (NMR, crystal, etc.). We basically rely on original published articles, included in specialized databases. If poorly reported structures have been accepted for publication, labelled as “identified” (and not “annotated”), and the authors publishing the specialized databases overlooked it, we might end up with such structures.

      Here, the Evidence Ontology (http://obofoundry.org/ontology/eco.html) might be a good direction to look at and further characterize the occurrences links in the LOTUS dataset.

      Reviewer #3 (Public Review):

      Due to missing or incomplete documentation of the LOTUS processes and software, a full review could not be completed.

      Some parts of LOTUS were indeed not sufficiently described and we improved both our documentation and accessibility to external users a lot. We thank the reviewer for insisting on this point as it will surely improve the adoption of our tool by the community.

    1. Author Response:

      Reviewer #2 (Public Review):

      Mattis et al have used a hemizygous mutant of the gene Scn1a to study changes underlying the severe epilepsy disorder Dravet syndrome. They describe a change in activation of the dentate gyrus in this mouse model, due to altered excitatory synaptic input. They show that this occurs in the age range after normalization of early inhibitory interneuron dysfunction. This provides an interesting potential mechanism by which neural circuit function is altered even after deficits in inhibition are seemingly corrected. They also report that stimulation of inputs to the dentate gyrus increases seizure susceptibility when body temperature is elevated. Overall these findings indicate a new form of circuit dysfunction that may underlie the etiology of this severe genetic epilepsy disorder.

      These findings are not fully complete, and the manuscript suffers from some flaws in experimental design.

      The most pressing issue is the lack of a counter-balanced design in experiments testing the ictogenicity of DG stimulation. The authors attempt to justify this stating "there is a theoretical concern that seizure threshold on Day 2 (the second consecutive day of stimulation) could be lowered by a seizure 24 hours prior (a "kindling"-like phenomenon)". In the very next sentence, they cite a study in which this phenomenon has been shown (thus the concern is not theoretical). That said, this is not a semantic argument, but a flaw in experimental design. On day 1, the authors perform experiment A. On day 2, they perform experiment A+B. In an attempt to show that performing experiment A on day 1 does not by itself lead to changes in experiment A+B, they use a separate cohort and show that experiment A does not lead to changes in a repetition of experiment A. Unfortunately, this is not an adequate control. Experiment A+B involves a different set of stimuli, to which the response could very well be altered by the day 1 experiment, but this change would not be revealed with the described experimental design. Establishing the effect shown in experiment A+B requires a more rigorous, counter-balanced experimental design in which one group undergoes experiment A followed by experiment A+B, and a second group undergoes experiment A+B followed by experiment A.

      Thank you for this important critique.

      → We agree with these points and have repeated this experiment using an improved experimental design (Figure 6). We now present data from three groups of mice: Scn1a-ChR2 (experimental mice), Scn1a-YFP (photostimulation control), and WT-ChR2 (genotype control), tested on a single day (obviating concerns about day 2).

      → Please note that this revised manuscript includes an additional ictogenicity experiment (Figure 7), in which we employ the proposed counter-balanced experimental design.

      The second major issue is a lack of wild type control groups for several experiments. The experiments presented in Figures 4, 6C and F, and 7 all lack the necessary wild type control measures. Wild type controls were done for Figure 6E, but the data are not presented in the figure.

      This is also an important point.

      → For the Hm1a experiment (Figure 4), we now present wild-type control data for both PV-IN electrophysiology and 2P circuit-level imaging (Figure 4 – figure supplement 1).

      → We have removed the optogenetic imaging data (previously Figure 6C).

      → The entorhinal cortex ictogenicity experiment (Figure 6) has been re-designed, as per above, and includes appropriate controls.

      → For the experiment demonstrating a decrease in circuit activation in response to PV-IN stimulation (now Figure 8), we were not able to perform a wild-type control due to very low levels of wild-type activation under those conditions (see Figure 2 panel A3 – response to 1 pulse in young adult wild-type mice), as noted in the comments in response to the critique of Reviewer #1. In other words, in the wild-type mice, there was essentially no signal to block. In this experiment we in fact conceptualize the SST activation as the control group (for the PV activation), which we clarify in the text.

      Some of the cell physiology experiments presented were not optimally designed to provide a relevant mechanistic follow-up to the major findings. For the first major finding of the paper, Figure 2 shows clear and interesting changes in DG activation in the mouse model, and Figure 5 reveals changes to synaptic excitation and inhibition in these neurons. Figure 3 and 4 present data showing changes to PV-interneuron intrinsic properties that only reveal themselves under very intense stimulation. While these findings are interesting and worthy of follow-up, the changes aren't relevant to the synaptic stimulation used in Figure 2.

      Thank you for this important comment. We now include additional data, as follows:

      → A parallel dataset quantifying intrinsic properties in the early postnatal timepoint (Figure 3 – figure supplement 1; Table 2). We find that the PV-INs are much more profoundly impaired at this younger timepoint, which further argues against PV-IN dysfunction as the cause of the increased DG activation seen in young adult Scn1a mice relative to wild-type; i.e., PV-IN excitability partially normalizes with development in Scn1a+/- mice, whereas the DG hyperactivation becomes more severe.

      → Synaptic data from the early postnatal timepoint (Figure 5 – figure supplement 2), in which we find no genotype difference in the E/I ratio or EPSC magnitude.

      → PPR at both timepoints, showing no genotype difference in the early postnatal mice, but a higher release probability in the young adult mice.

      Finally, Figure 2 has missing data points, seemingly due to cropping of panels. Data visualization is problematic for this vital figure. The fit lines for individual experiments overwhelm the color-filled variance of the mean. Thus, the data in this figure are very difficult to read and interpret. The figure would benefit from including all the individual data points and summary data, but removing the individual fits or putting them into a supplement.

      We appreciate this very helpful feedback. We now present a “cleaner” version of this main Figure (Figure 2), with the individual fit lines shown in a supplemental Figure (Figure 2 – figure supplement 1).

      Reviewer #3 (Public Review):

      The authors tackle an interesting question - whether the dentate gyrus is a locus of pathology in Scn1a+/- mice and uncover a strong phenotype - the granule cells of the dentate gyrus are over-activated and the EC to dentate pathway is prone to seizure genesis. In the discussion, they suggest that their results support the idea that the DG may be a common locus to several different types of epilepsy… an attractive hypothesis! There are several strengths of the paper. The team has done a nice job of presenting 'ground-truth' data that their measurements of dF/F across a large population of granule cells correlates with action potentials in these cells. As the authors point out, this is especially important when working in disease models in which the dF/F-action potential relationship may be altered. Throughout, the authors were also careful about considering the limitations of their various techniques and analyze the data in several ways to account for possible artifacts (e.g. ensuring that differences in activation are not arising because of slicing and consideration of kindling in later in vivo seizure threshold experiments). The experiments were well designed and appropriately interpreted.

      One of the most intriguing results of the work is that PV interneurons in the DG of Scn1a+/- mice show only very minor impairments in young adult animals (they show more spike accommodation than in control animals). Rather, it seems that the GCs receive enhanced excitation from the entorhinal cortex. The authors perform a set of pharmacological experiments to prove that PV interneurons (and more generally inhibition) do not account for the difference in granule cell activation. However, here it would be useful to see the data summarized more consistently. It is difficult to interpret the pharmacological results (both of which are presented as changes in dF/F0) with respect to the initial findings of the manuscript (presented as estimated activation across the entire population).

      We appreciate this helpful suggestion. We agree that the presentation of the calcium imaging data in the initial submission made data interpretation more difficult for the reader. In this revised manuscript we have improved the consistency of presentation of the calcium imaging data. Please note, however, that we conceptualize these imaging data as fitting into two categories, which do require different graphical depiction: 1) Unpaired data in which we analyze responses across a range of stimulation conditions, shown in Figure 2 and associated Figure 2 – figure supplement 1 and Figure 2 – figure supplement 3; and 2) Paired data in which we assess the response within a given imaging field to a manipulation performed at a single stimulation condition (Hm1a data in Figure 4 and Figure 4 – figure supplement 1; PTX data in Figure 5 and Figure 5 – figure supplement 2; PV-IN data in Figure 8).

      A beautiful aspect of this work is that it goes from cells to circuits to intact brain (in vivo). They nicely show that the heightened excitation from the EC to the DG is sufficient to drive seizures in the Scn1a+/- mice, and finally that since PVs are intact, they can be harnessed to balance out the over activation of GC via optogenetic stimulation of PVs.

    1. Author Response:

      We thank the editors and reviewers for their assessment of our manuscript on the instructive role of enhancer activity in the probability of gene allele activation and random monoallelic expression, and the associated helpful comments.

      Concerning the possibility that our findings apply to gene types other than hematopoietic-related genes, we believe the answer is yes. In fact, the first documented examples of enhancers regulating the probability of target gene expression were in the non-hematopoietic cell lines CV-1 and HeLa (Weintraub PNAS 1988 PMID: 3045805; Walters et al. PNAS 1995 PMID: 7624382). Furthermore, the characteristic constitutive accessibility of enhancers at RME loci regardless of expression of the gene, which is suggestive of probabilistic effects, is shared by hematopoietic and non-hematopoietic (neural lineage) cells (Xu et al. Nat. Genet. 2017 PMID: 28112738). Together with our study, the available evidence argues for a unified role of enhancer activity in determining gene expression probabilities across cell types.

      Concerning the thoughtful review from reviewer 1, these dynamics are not limited to genes encoding cell surface receptors. Recent work that we cited in our manuscript showed that a distal enhancer regulates the expression probability of the gene encoding the transcription factor Bcl11b which determines the T cell fate (Ng et al. eLife 2018 PMID: 30457103).

      In summary, while we focused on the NK cell receptor genes and genes encoding other cell surface proteins in various hematopoietic cell types due to experimental tractability, we believe it is unlikely that our findings will be restricted to specific cell types or to receptor genes.

      Reviewer #1:

      This study investigates the role of enhancer activity in the regulation of stable random monoallelic expression (RME) using the Ly49 and Nkg2 receptor gene families expressed in natural killer (NK) cells as models of RME genes. The authors show that, unlike promoters of RME genes, enhancers are accessible on both alleles and display histone marks of active enhancers. Moreover, they show that weakening enhancer activity, via CRISPR-mediated deletion, can lower the frequency of gene expression or lead to variegated expression patterns that are reminiscent of RME. The manuscript is clearly written and the data presented are compelling. This study takes advantage of previously-characterised allele-specific antibodies for various genes expressed in NK cells, a powerful tool allowing the analysis of random monoallelic expression (RME) at the protein and single-cell level within a population. The use of these antibodies allows the investigation of in vivo cell populations and circumvents the analysis at the RNA level, which is limited by expression bursts and transcript levels. The authors also substantiate their model using examples of receptor genes expressed in other cell types from the hematopoietic lineage. One question that remains is whether this model applies to other developmentally regulated stable RME genes that are (1) not expressed at the cell surface (such as transcription factors) and (2) expressed in other cell lineages. It is also unclear what defines the strength of an enhancer upstream of the RME genes studied, e.g. what is the difference between a weak enhancer for Ly49 genes and a strong enhancer? These points should be of broad interest for the readers and could be discussed further in the discussion part of the manuscript.

      We thank Reviewer 1 for the very thoughtful comments.

      First, we would like to address the reviewer’s question concerning whether our findings apply to genes that encode transcription factors or other proteins that are not on the cell surface. This is indeed the case, based on recent work we cited showing that a distal enhancer regulates the expression probability of the gene encoding the transcription factor Bcl11b, which determines the T cell fate (Ng et al. eLife 2018 PMID: 30457103). That our findings also apply to genes expressed in non-hematopoietic cells is addressed in the response above to the evaluation summary.

      We also welcome the opportunity to elaborate on enhancer “strength”, albeit somewhat speculatively. Enhancer activity acting upon a locus varies quantitatively in a context-dependent manner. The strength of enhancer activity is likely a function of several factors including (but not limited to) A) the collective (nonredundant) effects of multiple enhancers in genes that have more than one; B) the concentration of enhancer-binding transcription factors (TFs) in the nucleus; C) the affinity of those factors for the target DNA sequences; D) interactions of the relevant transcription factors with each other and with other components of the transcriptional machinery; E) interactions of the enhancer with the specific promoters; and F) the distance between an enhancer and a promoter. Concerning A), our work suggests that where multiple enhancers are present, elimination of one of them reduces overall enhancer strength/activity, resulting in a lower frequency of gene expression. Relevant to B) is work from several groups showing that Ly49 expression frequencies change when relevant TF expression levels are experimentally altered (Held et al. Immunity 1999 PMID: 10549625; Ohno et al. Int. Immunol. 2008 PMID: 18003603; Bezman et al. J. Exp. Med 2011 PMID: 22124110); those results suggest that one means by which enhancer activity may be increased is by increasing the concentration of available TFs. Relevant to F), enhancer-promoter distance may play a role in determining enhancer “strength”, as recent work has shown a distance-dependent binary effect of enhancers on gene expression in integrated reporters (Rinzema et al. BioRxiv 2021 https://doi.org/10.1101/2021.10.05.463209). Fully fleshing out the definition of enhancer strength in the context of RME gene expression will likely accompany a better understanding of how enhancers work generally, a subject of intense current study in the field. 
      Finally, we do not exclude a role for the promoter, which may possess varying levels of intrinsic “competence” to be activated by the collective enhancer activity acting upon it.

    1. Author Response:

      Joint Public Review:

      Davis et al. parameterize a published, coarse-grained classical density functional theory (DFT) model to describe the free energy landscape of the FG-NTR system. They leverage their previously published experimental data (Zahn et al. eLife, 2016) to develop the model of inter-molecular cohesion calculations, which were tuned to reproduce their previous experimental results. The authors investigate NTR binding behavior to the planar film of FG-nups, first for single NTRs and then for combinations of NTRs. They confirm that a higher concentration of NTRs in the FG-nup films decreases their affinity to the film, which provides one rationale to explain the "transport paradox" of NTRs, which bind specifically to FG-nups but transit the NPC extremely rapidly and at high density. The second result is that increasing the concentration of one of the transport receptors in the film (by increasing its bulk concentration) reduces the adsorbed amount of the other transport receptor (whose concentration is fixed). Last, the authors suggest that within some NTR concentration regimes there emerges a phase separation of the two NTRs such that NTF2 (small NTRs) locate near the surface while importin beta (large NTRs) go to the film/solution interface, implying the existence of separate transport pathways inside the NPC, which has been reported previously in experimental findings.

      There was broad enthusiasm for the model, which was found to be interesting, relevant, and to have successfully delivered testable insights. In general, the conclusions were found to be supported by the model outcomes. The segregation of small and large NTRs to different regions of the film was found to be an interesting result. Some results were found to be less exciting, for example the effect of competition between NTRs as they possess only repulsive interactions in the model.

      While there was some disagreement about the quality of the writing, there was a consensus that the explanation of the motivation, methodology, and impact of the conclusions was not sufficient. In particular, the reviewers felt there was a lack of sufficient context related to prior work in the field in the introduction and discussion and the need to better articulate the impact of the findings in the study. Thus, although the work was found by some to be a meaningful contribution addressing two important questions in the NPC field: how different NTRs are organized within the permeability barrier and if NTR organization and dynamics contribute to the efficient rates of nucleocytoplasmic transport through the crowded environment of the NPC, this point needs to be made clearer. Moreover, more attention is needed to previous theoretical works related to protein adsorption in polymer brushes.

      There was a consensus that the authors could have increased the impact of the work by broadening the study to investigate (or at a minimum discuss) 1) how the combination of NTRs with inert molecules behaves (i.e. does the addition of NTRs influence the exclusion of inert cargo?); and 2) how cargo bound to the NTRs (particularly NTF2, which has a single cargo - Ran) influences the results (e.g. would the importin-beta effect be exacerbated by its coupling to an "inert" cargo?). A related theme was concern over the potential impact that the geometry of the NPC in vivo would have on the model outcomes, which speaks to the biological relevance. While the authors mention this issue in the Discussion, more directly addressing whether they can speculate on how their results will change for a cylindrical geometry and how the calculations would compare in a system with opposing surfaces (i.e., two surfaces modified by polymer brushes) was warranted. The latter system was felt to be a good proxy to understand how the effects of nanoconfinement in a cylindrical geometry may affect the results.

      We thank the reviewers for the thorough and comprehensive scientific evaluation of our manuscript and for the constructive feedback as articulated in their comments.

      Summary of Major Changes:

      We have added 28 individual data plots, in the form of four additional supplemental figures, in response to the comments of the reviewers. Importantly, we have produced an additional main figure containing a qualitative phase diagram that concisely summarizes the essential physical picture resulting from our work. In response to criticism about the explanation of our work, we have made substantial changes to the introduction, methods, results, and discussion sections in line with the feedback from the reviewers. Overall, we believe that the manuscript is much stronger than before in terms of the science, the relation to the existing literature, and the clarity in conveying our major assumptions.

    1. Author Response:

      Reviewer #2 (Public Review):

      I think this is a very interesting and timely contribution to the literature. It combines a dynamical systems perspective and single cell data in a very neat and exciting combination in order to identify aspects of the EMT process and dynamics.

      This is an ambitious and multi-faceted study and draws on a wide range of experimental, data science, and modelling tools and techniques. Overall I really liked the scope and focus of the study. I do believe that there are a few points where the arguments can be tightened and I will focus on those aspects.

      General Comments:

      In order to capture the dynamics the authors should perhaps engage with the arguments in Crauel and Flandoli (J Dynamics Diff Equations), which prove that additive noise destroys a pitchfork bifurcation. Related to this, I think the arguments in PMC3372930 should be considered. They make a case against the pitchfork bifurcation on purely dynamical grounds. In PMID: 27616569 the arguments are not made quite as forcefully, but this is an excellent background reference. Against this background it is probably not surprising that the dynamics are best explained by saddle-node bifurcations.

      One potential concern relates to the construction of the Langevin equation. Additive noise is a very specific choice and needs to be clearly justified. It is convenient, but not based on any physical reasoning in this case. We know that multiplicative noise (e.g. in the chemical Langevin equation, or geometric noise) will qualitatively alter the dynamics compared to the deterministic model. Much of the discussion in lines 250-260 is therefore limited or restricted to the case of additive noise and this needs to be made explicit. If additive noise is chosen because reaction coordinates can only be easily defined in this framework then this limitation should be specified.

      I can see that the simple additive noise makes the integrations in the calculation of the potential (lines 486-499) easier, but again the limitations of this approach should be addressed either by pointing them out, or by considering a model with multiplicative noise.

      The most intriguing result to my mind is the existence of multiple reaction paths. I would like to see to what extent this is robust to e.g. multiplicative noise and other factors in the analysis.

      Thanks for these great points. We would like to clarify one point: in our Langevin formulation we do not assume additive noise, and the corresponding diffusion constant D is position-dependent, as explained in Materials and Methods (1). In the revised manuscript we added the x-dependence of the noise terms to make this clear.
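
      To make the distinction between additive and position-dependent noise concrete, here is a minimal Euler-Maruyama sketch of a one-dimensional Langevin equation dx = f(x) dt + sqrt(2 D(x)) dW. Everything in it is an assumption chosen for illustration: the double-well drift f(x) = r x - x^3, the two forms of D(x), the parameter values, and the Itô interpretation of the noise term. It is not the model or the reaction coordinate used in the manuscript.

```python
import math
import random


def drift(x, r=1.0):
    # Illustrative double-well drift (pitchfork normal form); an assumption.
    return r * x - x ** 3


def constant_diffusion(x, d0=0.05):
    # Additive noise: D does not depend on the state x.
    return d0


def state_dependent_diffusion(x, d0=0.05):
    # Multiplicative noise: hypothetical D(x) that grows with |x|.
    return d0 * (1.0 + x * x)


def euler_maruyama(x0, diffusion, dt=1e-3, n_steps=5000, seed=0):
    """Integrate dx = drift(x) dt + sqrt(2 D(x)) dW in the Ito sense."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # standard normal increment
        x = x + drift(x) * dt + math.sqrt(2.0 * diffusion(x) * dt) * xi
        path.append(x)
    return path


if __name__ == "__main__":
    additive = euler_maruyama(0.5, constant_diffusion)
    multiplicative = euler_maruyama(0.5, state_dependent_diffusion)
    print("additive noise, final x:", additive[-1])
    print("position-dependent noise, final x:", multiplicative[-1])
```

      Passing the diffusion coefficient as a function keeps the integrator identical in both cases, so any difference between the two trajectories comes only from the x-dependence of D.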

      References:

      1. Scheffer M, et al. (2009) Early-warning signals for critical transitions. Nature 461(7260):53-59.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...]

      1. A notable shortcoming of the authors' interpretation is the generalization of their findings to preterm premature rupture of membranes (PPROM). As noted by the authors, term labor is considered a "sterile" process, which is particularly important in terms of the authors' findings since TLR4 in the fetal membranes may be responding to endogenous signals such as danger signals. However, a large proportion of PPROM cases are associated with microbial invasion of the amniotic cavity, and thus in this context TLR4 would be responding to bacterial products.

      To address this reviewer’s concern, along with the potential extrapolation from physiological rupture to pathological rupture in the case of PPROM, we first decided to remove Figure 3C (expression of TLR4 in the presence of LPS of bacterial origin) from the revised version of the manuscript. It is well known that the percentage of PPROM cases associated with microbial invasion varies with the weeks of gestation. In fact, early gestational ages are clearly linked to a high prevalence of microbial-associated intra-amniotic inflammation (64.3% when <25 WGA), whereas this percentage subsequently decreases throughout gestation (Romero et al., 2015), reaching one-third at term, which better matches the gestational stage of the current study. Such observations support the idea that the TLR4 model in physiological rupture could be transposed, at least in part, to sterile PPROM, initiated by the presence of alarmins (e.g., HMGB1) and their binding to this type of receptor. Indeed, TLR4 is now well described as being stimulated by ligands other than LPS, such as HMGB1, a member of the DAMPs (Robertson et al., 2020). Furthermore, TLR4 mRNA and protein expression have already been quantified in PPROM without chorioamnionitis compared with term no labor without chorioamnionitis (Kim et al., 2004), indicating the absence of a clear link between chorioamnionitis and TLR4 expression. Finally, in an animal model of PPROM, an article underlined the importance of TLR4 in preterm labor by using TLR4 mutant mice in a sterile context (Wahid et al., 2015).

      1. It is a well-known concept that TLR4 is expressed by the fetal membranes and is responsive to LPS stimulation, and thus the confirmatory set of experiments performed by the authors do not seem to be as novel. Indeed, given that this study was focused on the "sterile" process of term labor, perhaps the utilization of danger signals that can interact with TLR4 would be more appropriate.

      The choice to use LPS (Figure 3C) was only to confirm that TLR4 leads to proinflammatory activation in the amnion and choriodecidua, demonstrating the functional pathway after TLR4 activation in the fetal membrane environment. We completely agree that these are not novel data; this is why we decided to remove this part of the results in the revised version of the manuscript. Furthermore, we decided not to repeat the use of DAMPs (such as HMGB1) to stimulate the TLR4 pathway in this work because it has already been published in the fetal membrane context (Bredeson et al., 2014). In accordance with your comments, we have modified the end of the results paragraph entitled ‘Combination of transcriptomic and methylomic results in the ZAM zone demonstrate that genes more expressed in the choriodecidua are linked to pregnancy pathologies’ to better justify the choice to focus on TLR4 global transcriptional regulation.

      1. The distinction between the ZAM and ZIM seems to have been lost among the TLR4-focused experiments, and thus it is unclear how these fetal membrane zones fit into the conceptual model proposed by the authors in the final figure.

      The reviewer is correct here, so to avoid confusion between the ZIM and ZAM used, we decided to do the following:

      • Read carefully all the successive paragraphs of the results to check for the presence of ‘ZAM specification’
      • Add ‘ZAM’ in the legend of Figure 4. This information was present in the related text of the article.
      • Update Figure 7 and its legend (model of regulation). We had ‘ZAM zone’ in the discussion part regarding Figure 7.
      1. The study is largely descriptive and would benefit from the addition of fetal membrane tissues from pregnancy complications such as PPROM and/or animal models in which premature rupture of the membranes has been induced.

      We agree that animal models are available. Nevertheless, we considered that such models are far from the human reality. In fact, animal models are often used for fetal membrane studies, but they differ in pregnancy physiology, structure and uterine environment, which hampers their use. We used ‘term’ fetal membranes to decipher the physiological rupture of membranes and demonstrate the importance of TLR4 as an actor. To address this comment and the possible extrapolation from physiological rupture to pathological rupture in the case of PPROM, we decided to remove Figure 3C (expression of TLR4 in the presence of LPS of bacterial origin) to focus more on the physiological rupture of fetal membranes without the involvement of bacterial presence. Previous bibliographic data answer the reviewer’s question: Kim et al. (2004) clearly demonstrated that TLR4 mRNA levels are higher in PPROM (31.2 weeks of gestation) fetal membranes without chorioamnionitis than in term (39.1 weeks of gestation) ones without chorioamnionitis.

      1. The study focuses on the mechanisms of rupture of membranes, but does not provide an explanation as to how the regulation of TLR4 mediates the process of membrane rupture.

      We agree with your comment; however, ‘how the regulation of TLR4 mediates the process of membrane rupture’ is not the topic of the manuscript. In addition, this has already been well established in previous publications. Nevertheless, we added a sentence in the introduction between lines 97 and 100: ‘The mechanisms implying TLR4 in the physiological or pathological rupture of membrane in case of PPROM are well known. Triggering TLR4 will lead to NFκB activation, leading to an increase of the release of proinflammatory cytokine, concentration of matrix metalloprotease and prostaglandin, which are well established actors of fetal membrane rupture (Robertson et al., 2020).’

      Reviewer #2 (Public Review):

      This is a well-conceived and executed paper that adds novel data to improve our understanding of rupture of the human fetal membranes. The new information presented not only addresses gaps in our understanding of normal parturition mechanisms but also the significant issue of preterm birth. The authors highlight the need to understand the understudied human fetal membranes to be able to understand its role in normal parturition but also to lower the rates of preterm birth. They not only establish the need to study this tissue but also to improve our appreciation for regional differences within it, using a comprehensive genetic approach. The authors provide data from a genome wide methylation study and cross reference this with transcriptome data. Using this new knowledge, they then zero in on a specific gene of interest TLR4. This receptor is already established as an extremely important receptor for preterm birth but little is known about its role in normal parturition. Strengths of this paper stem from the comprehensive data set provided, answering both the questions pertaining to the specific aims of this paper but also potentially future questions and providing potential focused targets of study. One example of this may be the common methylated genes that are found in both the ZIM and ZAM, illustrating not regional changes but gestational programming of this tissue.

      We thank the reviewer for the positive and constructive comments regarding the article. Following all the reviewers’ comments, we now have an improved version.

      Reviewer #3 (Public Review):

      The manuscript by Belville et al. describes the significance of epigenetic and transcription-associated changes to TLR4 as a mechanistic event for sterile inflammation associated with fetal membrane weakening, specifically in the zone of altered morphology. This manuscript is timely in an understudied area of research.

      The authors have taken an extensive set of experiments to derive their conclusions.

      However, it is unclear why the focus is on TLR4. Although LPS is a ligand for TLR4, Gram-negative infections are rare in PPROM, where infections mostly involve genital mycoplasmas. The methylome and transcriptome analysis does not necessarily warrant examination of a single marker. A clear rationale would need to be included.

      We would like to thank the reviewer for their comments regarding the article. For the last part of the public review, we would like to underline the following:

      -The choice of focusing on TLR4 is explained in the article text between lines 161 and 165 by the following sentences: ‘Of all the genes classified in these processes, TLR4 was the only one represented in all these biological processes and, therefore, seems to play a central role in parturition at term. To validate this in-silico observation and pave the way for describing TLR4’s importance, immunofluorescence experiments were first conducted to confirm the protein’s presence in the amnion and choriodecidua of the ZAM (Figure 3B)’. Furthermore, this choice arises from the analysis described in Figure 3A, which underlines that the four most represented GO terms have only one gene in common: ‘TLR4’. The combination of two high-scale studies does not permit us to individually characterize how each gene is regulated. Nevertheless, the focus on TLR4 provides an original and interesting hypothesis on how layer-specific regulation between the amnion and choriodecidua could be realised at the cellular level in the ZAM’s weaker zone. Finally, because the high-scale study results are public, this type of analysis could be conducted on other candidate genes.

      -Throughout the text, we changed all instances of ‘E. coli’ to ‘Gram-negative bacteria’. Furthermore, as found in the literature, genital mycoplasmas are considered ‘Gram-negative bacteria’. We focused on the ‘sterile inflammation phenomenon’, and to support the hypothesis concerning the importance of TLR4, we produced a supplementary transcriptome ‘ZAM heatmap’, which confirmed overexpression of DAMPs in the choriodecidua, for example S100A7, A8 and A9, which are well-known ligands of TLR4 (given below as an image).

      Heatmap of genes differentially expressed in the ZAM zone in relation to the sterile inflammation phenomenon.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this interesting paper by Dady and colleagues the nature of human neural progenitor differentiation is evaluated via transplantation studies. The first part of the paper establishes the timing of neural differentiation in human IPSC model systems and in human embryonic spinal cord, showing that the relative timing of neurogenesis and gliogenesis is maintained. In the second part of the paper these human IPSC neural rosettes are transplanted into the chick spinal cord during neurogenic stages (i.e. Isochronic transplantation) and they find that neurons are generated by these transplanted populations. Analysis of transplants at later stages reveals that neurogenesis has "stalled" and is relatively reduced within the transplanted population.

      Overall, this is an interesting paper that uses classic approaches to answer potentially interesting questions; however, there are some issues that limit its potential impact. The first two figures are recursive and show that the authors can implement an existing protocol.

      Our previous work established this differentiation protocol (Verrier et al 2018 Development), but this is its first use to analyse by immunofluorescence the timing of neural differentiation and the appearance of specific neuronal and glial cell types in an in vitro neural rosette assay. None of the data presented is recursive and it provides a quantitative timeline of human dorsal spinal cord differentiation. Moreover, our comparison with the human embryonic spinal cord indicates that neural differentiation progresses more rapidly in vitro.

      The transplantation studies are intriguing but do not offer sufficient new insights. The key finding seems to be that at later stages post-transplantation neuronal differentiation is "stalled". There are many other reasons (besides "stalling") that could explain their results. Even supposing that stalling was indeed occurring, the authors offer no cellular or molecular insights into what regulates these intrinsic differences across species. At the end, it is not sufficiently clear what we have learned about the mechanisms that control the timing (pace seems to be another term for timing) of differentiation in human neural stem cells.

      Our study shows that differentiation of human iPSC neural rosettes transplanted isochronically into the chick embryonic spinal cord initially follows that of neural rosettes cultured in vitro, rather than that of the faster differentiating host chicken embryo. This suggests that the human cells follow an intrinsic differentiation programme. However, after a longer period of culture the transplanted rosettes lag behind their in vitro counterparts, suggesting that intrinsic cues are insufficient and that appropriate extrinsic cues are needed to promote differentiation progression. We discuss potential reasons for this stall of the differentiation programme in the Discussion and agree that some further investigation is of interest. Moreover, revision experiments now further extend our findings.

      Reviewer #2 (Public Review):

      In general, this manuscript provides new significant knowledge by comparing between neural differentiation rate within the same species (human) in vivo and in vitro and between species (human and chick). The quality of the data is excellent, and the combining of the in vivo chick model to compare between grafted and host cells is a fantastic idea, that can only be done in this experimental model. Yet, some controls and more in-depth analysis are missing and are required in my opinion before publication.

      1. In the grafting experiment, non-manipulated embryos serve as controls. Yet, the grafted rosettes are inserted near an injured area where a piece of the neural tube was removed. A better control would be to graft homologous cells from a donor chick embryo (GFP+ chick line is available in the UK) or quail embryo (which has a similar growth rate to chick at E2) and examine whether the injured area causes the grafted cells to differentiate at a different pace compared to the human grafts. This control is necessary to rule out the possibility that the human graft did not accelerate its differentiation rate and later stopped differentiating due to extrinsic signals/lack of signals from the manipulated environment.

      For clarity, we point out that the control is not an unmanipulated embryo, but an embryo subject to the same tissue removal experiment but lacking the graft. We found that these operated-only embryos quickly regenerated lost tissue to such an extent that the neural tube appeared similar to the unoperated contralateral side of the neural tube after 2 days. We further note the many previous studies using quail tissue in place of chick to fate map the embryo (for example the many excellent studies of Nicole Le Douarin), which suggest that such manipulations result in normal differentiation of the grafted tissue and that it is therefore unaffected by placement into an injury site in a developing embryo. However, we appreciate that it would be informative to demonstrate this for our precise experiment and in our hands.

      1. When examining the entire results of the manuscript some important points need to be addressed: On the one hand, the rosettes correspond to their in vitro growth conditions/extrinsic cues and display an accelerated differentiation pace, when compared to their in vivo counterpart human cells. On the other hand, the rosettes do not correspond initially to the chick environment and maintain their own intrinsic tempo.

      The human rosettes were matched for differentiation state with the tissue removed from the chick embryo. Despite this, and despite the local cues that support chicken spinal cord development beyond this point, the human cells retained the differentiation timing of their in vitro counterparts rather than that of the host chick embryo. This is consistent with the slower cell cycle and cell metabolism characteristic of human cells.

      Later, they do change their developmental program and attenuate their differentiation. Therefore, the conclusion that the cells mostly obey intrinsic regulation is confusing. It would be great if the authors could provide better experimental data to confirm their conclusion. Some ideas that the authors may consider are to determine whether there is a time window that sets the tempo of the rosettes that cannot be influenced later by extrinsic cues. Will the grafted cells respond differently if they are grafted at more/less advanced stages and domains? Is there an initial mechanistic explanation for the different behavior of spinal cord progenitors in the three contexts? Is there a possibility somehow to obtain human spinal cord progenitors and grow them in the same in vitro conditions as the rosettes to compare their differentiation rate? I am aware that some of these experiments are very hard to perform and I am not expecting the authors to perform all the suggested ones; yet, some more in-depth analysis would enable this article to explain the presented observations better.

      These are interesting suggestions. Heterochronic grafting of the human neural rosettes, for example into the same site in an older chicken embryo, would further test whether they continue to operate an intrinsic differentiation programme in this temporally distinct embryonic environment.

      Reviewer #3 (Public Review):

      The authors have developed dorsal spinal cord rosette assays from human pluripotent stem cells (hPSCs) and also from human induced pluripotent stem cells (hiPSCs) in a minimal culture medium containing retinoic acid. They define the dorsal spinal cord identity of these cells based on the presence of SOX2, PAX6, SNAI2 and PAX7, and absence of OLIG2 (characteristic of more ventral neural tube). Assessment of markers for migrating neural crest-like cells (HNK1, SOX10 and TFAP2alpha), immature neurons (DCX) and glial progenitors (NFIA) at different time points was used to show that the in vitro model recapitulates sequential differentiation observed in the spinal cord of avian and mouse embryos. Next, by comparing these results with neural differentiation in the human embryo, the authors show that neural differentiation occurs faster in vitro than in vivo. The authors then asked how these hiPSC-derived neural rosettes would respond to the more rapidly developing chicken embryonic environment, by grafting the rosettes into the developing chick neural tube. By assessing expression of various neural markers in the graft-derived cells, authors conclude that after two days of culture, human cells continued differentiation at the rate of the in vitro hiPSCs rather than at the rate of the host chicken cells. After longer culture (5 days), authors say that neurogenesis rate among graft-derived human cells attenuates and that the cells stall in the neural progenitor phase. Authors conclude that while initially an intrinsic differentiation programme is followed by the human cells, appropriate extrinsic inputs are required to maintain the neural differentiation trajectory of human cells.

      However, it is difficult to assess whether all conclusions by the authors for the human-into-chicken graft experiments are supported by their data, as some details of analysis are unclear (1) or experimental design was not conducive to the questions being asked (2). Some aspects of data analysis therefore need to be clarified and extended.

      1. Position of graft derived cells within the chicken host is very important when analysing presence/absence of a marker, but it is not always clear whether this has been taken into account by the authors. It appears that authors are assessing expression of markers in graft derived cells that are present outside OR inside domains in the chick host that would normally express that marker, and are not separating out such analysis. This will confuse interpretation of results and affect conclusions.

      One example where this would affect major conclusions of the manuscript is in the case of Islet-1 expression in human graft derived cells in the chicken host. Authors say that no Islet-1 was found in single graft derived cells in the chick embryo after two days of culture and use this to support their conclusion that the "pace of neural differentiation in the grafted human rosettes is unaltered in a more rapidly differentiating environment". However, Islet-1 expression in the chick is restricted to specific domains, therefore it would be important to know whether the graft-derived cells that the authors were analysing were within these Islet-1 positive host domains. Lack of Islet-1 in graft derived cells within such Islet-1 positive domains in chick would suggest that the graft derived cells have not responded to the host's timing of differentiation, and would support the authors' conclusions. However, lack of Islet-1 in graft derived cells outside of such Islet-1 positive domains could not be used to conclude the same thing as cells would be receiving different signals from the host. It appears that the graft used by the authors to show absence of Islet-1 in Fig 4G is outside of chick Islet-1 positive domains. Therefore, lack of Islet-1 in graft derived cells cannot be used to suggest that pace of human neural differentiation is initially directed by cell intrinsic factors, unless the location of the human cells in the chick is clearly shown to be within Islet-1 expressing domains in the chick.

      Human rosette cells were grafted into the chicken neural tube following removal of the dorsal half of the host neural tube at E2, and grafted cells were assessed in at least 3 sections from grafts in 3 different embryos for each marker analysed (see meta-data tables S1-7 and Methods). The reviewer is correct in that in a subset of sections graft cells were not in precisely the same position as chick endogenous Islet-1 expressing cells. We can provide the data restricted to sections with cells in this precise domain (3 sections from 3 different graft embryos), but also note that none of the sections analysed included human cells expressing ISLET1. This is the only marker analysed where this is an issue; other neuronal markers, such as P27, are expressed throughout the dorsal extent of the neural tube.

      1. Size of the graft used when transplanting human iPSCs into the chick will also affect the interpretation of results, as human cells will be exposed to varying levels of host signal depending on how much of their surface is exposed to host cells. Since the authors are using this experiment to test the effects of the chicken environment on human cells, this is a crucial point. After grafting hiPSC derived neural rosettes into the chick and culturing the chick embryo, authors assess expression of various markers in the graft-derived cells and separate out their analysis of marker expression across three different categories; cells found in 'cell rosettes', 'cell groups' or as 'single cells'. However, it remains unknown for how long these groupings were true during the culture time. For example, while it is known that at the time of grafting the cells were in a rosette structure, it is unknown at what time cells detached to incorporate as single cells (it could have been directly after grafting, or just prior to analysis) and is therefore not consistent across cells being analysed.

      One way to go around this would be not to graft the entire rosettes, but rather to dissociate the rosette and graft single cells/small groups of cells into the chick. With single cells the community effect (Gurdon 1988) would be avoided and the experiment would be testing the influence of only the host environment on this cell (rather than a combined influence of host environment and environment created by neighbouring graft derived cells as is the case in the current manuscript). This is particularly important as the data presented in the manuscript appear to show a difference between marker expression in single cells versus groups of cells and rosettes (plots in Fig 4 and 5).

      Details of rosette graft preparation are provided in the paper and this includes a gentle cell dissociation step, so we grafted human rosette cells that then reformed a rosette structure (which may reflect that human cells have greater affinity for each other), and some single cells were also initially available for insertion into the chicken neuroepithelium. It is likely that this cell mixing takes place early on while the chick dorsal neural tube reforms following the operation. For this reason, we analysed cell type specific markers in the human cells in large cell groups (reformed rosettes), smaller cell groups incorporated into the chick dorsal neural tube and single cells within this chick neuroepithelium. We appreciate that without the ability to monitor single cells throughout the experiment it is not possible to account fully for the environment experienced by a grafted cell. We agree that smaller grafts or other approaches may increase the number of cases of single human cells surrounded by chick neuroepithelial cells. We note the reviewer has taken up our consideration of the Community effect in the paper, which is of course why we have analysed marker expression in the three cell configurations. We also make clear in the paper that the apparent increase in P27 expression in single cells is not statistically significant and that this reflects the small number of single/isolated human cells within the chick neuroepithelium available for analysis (see metadata provided).

    1. Author Response:

      Reviewer #1 (Public Review):

      My main concern relates to the title, which does not appear to be supported by the data. One can't conclude that the reported effects are strictly due to altered glycolysis in cholinergic neurons without directly assessing glucose metabolism in these neurons. Moreover, TIGAR functions by blocking glycolysis and directing the pathway into the pentose phosphate shunt. Therefore, the resulting effect of deleting TIGAR in a neuronal population might be multiple.

      The authors show convincingly that deleting TIGAR from ChAT-expressing neurons, but not adipose or muscle cells, protects mice from cold-induced hypothermia. It is however unclear whether this leads to alteration in energy expenditure per se. This is important considering the first argument of the discussion highlighting how approaches to increase energy expenditure through the development/activation of brown/beige adipose tissue thermogenesis have failed. Moreover, it is unclear if TIGAR also affects heat dissipation considering the impact of its deletion from ChAT-expressing neurons on blood pressure and heart rate, two parameters that will likely influence the tail vasoactivity. Evaluating energy expenditure and heat loss appears to be necessary to support the conclusion that the resistance to hypothermia is exclusively dependent on shivering thermogenesis.

      One key aspect that may deserve discussion is a potential contribution of the sympathetic nervous system to the observed phenotype. The focus of the manuscript is on acetylcholine but one can't disqualify that sympathetic compensations may happen following the deletion of TIGAR in ChAT-expressing neurons.

      There are many data that are not shown but that would be worth including (lines 99, 113, 119, 159, 168, 181, 221,

      1. We have changed the title to better reflect the specific findings in this study.
      2. We now present in the Discussion section the potential roles of other mechanisms besides cholinergic signaling (sympathetic, vascular, behavioral) that could also contribute to temperature regulation in this model system.
      3. We have now included some of the data that was originally indicated as data not shown but have eliminated some of these data from the text as they are superfluous and do not provide important information for any of the conclusions drawn.

      Reviewer #3 (Public Review):

      Strengths: The study is nicely written and presented. The investigation of whole-body TIGAR knockout (TKO) clearly demonstrates resistance to cold exposure, and the authors logically follow potential sources through the obvious tissue candidates.

      Both skeletal muscle and adipose specific TIGAR knockouts were generated, neither of which recapitulated the effect of the TKO. Other obvious candidates, such as UCP1 content in adipose and basal oxidative capacity and contractility of skeletal muscle were ruled out using ex vivo techniques.

      Nevertheless, pharmacological interventions indicated that muscle contraction was necessary for protection from cold exposure and that the loss of TIGAR overcame competitive antagonism of the nicotinic acetylcholine receptor. These data were supportive of a role for skeletal muscle contraction, particularly at the level of cholinergic signaling.

      A cholinergic neuron specific TIGAR knockout was produced. Loss of TIGAR was molecularly confirmed, and this mouse recapitulated the whole-body knockout's resistance to cold exposure.

      Tracer studies are largely compelling and confirm that loss of TIGAR increases substrate dependence on glucose oxidation in a cell model.

      Weaknesses: The TKO mice were not characterized for body weight, body composition or energy expenditure, leaving some room for alternative or additive mechanisms.

      Although the tracer data demonstrate that loss of TIGAR causes the cell model to increase reliance on glycolysis compared to other unlabeled substrates, the data do not necessarily demonstrate an increase in the absolute rate of glycolysis or total acetyl-CoA production as intimated in the discussion. It is also unclear why media glutamate is examined for tracer incorporation rather than tissue glutamate.

      There are some minor weaknesses related to the description of the methods. For example, the 18O studies need clarification. It will be unclear to most readers how this method works.

      1. We now include body composition, food intake, activity and energy expenditure data in new Figures S1D-H.
      2. Following the stable isotope label from 1,2-13C glucose into glutamate was used in these tracer analyses to non-invasively assess the differences in carbon flux between pyruvate carboxylase and pyruvate dehydrogenase, allowing us to use the cells for assessment of acetyl-CoA and acetyl-carnitine in the same experiment. These media tracer data indicate an increase in PDH flux (m1) in TKO cells compared to that in control cells. Together with the corresponding cellular data for acetyl-CoA and acetyl-carnitine levels, all of which are elevated in the TKO SH-SY5Y cells, this is consistent with an increased rate of glycolysis (new Figure S7C and D).
      3. We have further clarified the methods for the use of 18O labeled water.
    1. Author Response:

      Reviewer #2:

      In this manuscript, Ng et al., report on a system where cardiac mesoderm and pulmonary endoderm co-develop from pluripotent stem cells. This is of potential interest, as it could provide an integrated model for the study of human cardiopulmonary development.

      The main weakness lies in the lack of thorough characterization of the resulting cells and tissues. The characterization relies almost entirely on reporter gene expression and PCR for a limited set of markers. The only indication that ATII cells are generated is expression of a SPC-dTomato reporter and SFTPC mRNA. No evidence is given of function, of expression of other markers or direct staining for SPC, or of ultrastructure. No data are provided whether the lung component contains other lung cells. Another outstanding question for the lung component is whether any pulmonary mesenchyme was generated.

      Thank you for the suggestion. In the revised manuscript, we have included further cellular characterization of the 3D µTs. We included additional characterization for alveolar type 2 (AT2) cells, including direct immunofluorescence staining of Pro-SPC and transmission electron microscopy imaging of the lamellar bodies (Fig. 6). Besides AT2 cells, we also identified the emergence of AT1-like cells via the expression of HOPX (Fig. 6b). To characterize cell types beyond the alveolar epithelium, we stained for S100A4, a marker for mesenchyme, and observed positive staining in the µTs (Figure 6-figure supplement 1a). By contrast, we did not detect any proximal airway epithelial cell types, such as ciliated cells (FOXJ1), secretory cells (MUC5AC), and basal cells (p63) (Figure 6-figure supplement 1b-d).

      The same is true for the cardiac component. Which types of cardiac cells are generated: ventricular, atrial, endocardium, epicardium, conducting tissue? No benchmarking was done compared to either human tissues or similar cells generated using more focused differentiation protocols, and functional studies are very limited.

      We agree with the reviewer’s perspective that the present study was primarily focused on progenitor specification. Nonetheless, in the revised manuscript, we have provided additional characterization of the induced cardiac tissues via immunofluorescence staining of Sarcomeric Alpha Actinin. Further, we have included new data on assessing the cardiac contractile function using a calcium channel blocker (Verapamil), showing reduced contractility in response to increasing concentrations of Verapamil (Fig. 6e).

      Another weakness is that there is no characterization of early intermediate developmental stages: primitive streak, mesendoderm, definitive endoderm, cardiac mesoderm, first or second heart field. This type of analysis would be required to validate this complex model as an approach to study human cardiopulmonary development.

      Thank you for pointing this out. In the revised manuscript, we have added a new figure (Figure 1-figure supplement 1) with data characterizing the presence of primitive streak by staining for T (Brachyury) after 2 days of CHIR treatment. We also showed the presence of the mesendodermal marker (MIXL1), endodermal marker (SOX17) and mesodermal marker (NCAM1) during Stage-1 co-differentiation, as indicated by qPCR and immunostaining (Fig. 1).

      There is also no quantification of differentiation efficiency and yield, and neither are data shown to document absence or presence of other endodermal or mesodermal lineages. NKX2.1, for example is also expressed in the forebrain and in the thyroid.

      Thank you for the suggestion. In the revised manuscript, we have included FACS analysis of Day-15 differentiated cells to quantify the percentage of NKX2.1+ lung and NKX2.5+ cardiac progenitor cell populations. To assess the possibility of other related endodermal and neuronal cell populations, we have included new data on characterizing the Day-15 differentiated cells and showed no co-expression of NKX2.1 with TUJ1 (neuronal marker) or PAX8 (thyroid marker), thus, further supporting the observed NKX2.1+ cells representing the lung lineage (Figure 2-figure supplement 1).

      A final limitation is that multiple pluripotent lines should be used.

      In the revised manuscript, we have provided a comprehensive characterization of applying the co-differentiation protocol to another hiPSC line (BU1), including germ layer induction, cardio-pulmonary progenitor induction, 3D organoid formation, and alveolar maturation (Figure 4-figure supplement 5). We have included the data for mesoderm and endoderm induction during Stage-1 (Figure 4-figure supplement 5b) and cardio-pulmonary µT formation from Day-15 progenitor cells. On Day-18, we showed that BU1-derived cardio-pulmonary µTs stained positive for NKX2.1 and NKX2.5, as we observed in BU3. These µTs were also able to further mature into distal lung epithelial cells, as indicated by positive staining of SFTPC and HOPX. Meanwhile, the NKX2.5+ cardiac lineages expressed cTnT and Sarcomeric Alpha Actinin (Figure 4-figure supplement 5e).

      This type of model could be very useful, but it is not clear that the goal of integrated cardiopulmonary development was achieved.

      We thank the reviewer for the comment. The following findings from this study suggest that an in vitro hiPSC-based integration of cardio-pulmonary development is possible. First, we showed that following establishment of a mixture of endoderm and mesoderm, the same set of signaling molecules was capable of inducing endoderm-to-pulmonary and mesoderm-to-cardiac specification in parallel, echoing their close spatial coordinates in embryonic body patterning and their shared requirement for paracrine signaling. Second, we showed that in the presence of cardiac accompaniment, alveolar maturation was expedited, implying inter-lineage crosstalk between the co-developing cardio-pulmonary systems. At the same time, we agree with the overall suggestion from the reviewers that this study focuses primarily on cardio-pulmonary progenitor specification, and that future investigations are needed to further clarify the mechanism and outcome of integrated cardio-pulmonary co-development. We have added this clarification in our revised manuscript.

      Reviewer #3:

      Ng and Johnston et al. reported the successful multilineage co-differentiation of mesoderm-derived cardiac and endoderm-derived lung progenitors from human pluripotent stem cells (hPSCs). The authors achieved their goals through a stepwise strategy built on the knowledge from published cardiac and lung differentiation protocols. The authors first employed WNT activation using GSK3 inhibitor CHIR, an established WNT signaling agonist, at relatively high dosage to induce primitive streak formation from hPSCs maintained in pluripotent medium (days 1-2). This is supported by knowledge from vertebrate development that both mesodermal and endodermal germ layers are patterned by primitive streak. This is also consistent with recent findings by Martyn et al. (PMID 29795348, https://doi.org/10.1038/s41586-018-0150-y) that activation of WNT signaling is sufficient to induce primitive streak from hPSCs. In the subsequent step (days 2-4), the newly formed primitive streak provides a gradient of endogenous WNT, BMP and Nodal/Activin signaling, which allows the co-induction of both mesoderm and definitive endoderm (DE) from the remaining hPSCs in culture in a serum and morphogen free differentiation medium. Consistently, high Nodal (by exogenous Activin A) favors endodermal induction at the expense of mesodermal specification, and medium-high exogenous BMP4 is detrimental to lung endodermal specification but enhances cardiac mesodermal differentiation. The authors then demonstrated that dual TGF and WNT inhibition is efficient to pattern the mesoderm and endoderm simultaneously for future cardiac and lung induction (days 4-8). This agrees with the existing knowledge that lungs derive from anterior foregut endoderm, and cardiac progenitors, the major substance of heart, derive from cranial lateral mesoderm. Mesoderm and DE patterning was followed by lung and heart specification through the activation of WNT and RA signaling exogenously, in the presence of endogenous BMP4 signaling (days 8-15).

      The differentiation strategy developed by the authors follows the lung and cardiac developmental paradigm overall, and the protocol yields efficient lung and heart progenitor specification on the tested hiPSC line. The work provides a new insight into cardiac and lung directed differentiation, and offers a valuable platform to study human heart and lung development in vitro. For cardiac and pulmonary progenitor differentiation (days 4-15), the protocol described in this manuscript relies mainly on the exogenous application of common key developmental signal events shared by heart and lung specification from mesoderm and endoderm, respectively. For progenitor maturation (post day 15), the data show an expedited alveolar maturation process in cardio-pulmonary co-differentiation culture, suggesting that paracrine signal(s) from cardiac cells positively regulate alveolar maturation. The authors did not report any data on whether/how paracrine signal(s) from lung lineages may influence cardiac maturation. The authors achieved their goals, and the results support the conclusion of the paper overall.

      We agree with the reviewer’s point of view that the present study was primarily focused on progenitor cell induction and the maturation of the pulmonary lineage. In response to the reviewer’s comment, we have provided additional discussion suggesting how this model can be used to further investigate how paracrine signals from the lung lineage may influence cardiac maturation. Also, thank you for suggesting the reference (PMID 29795348); we have added it to the manuscript.

      The weaknesses of manuscript are: 1) Lack of evidence/characterization of primitive streak formation at 48 hours of differentiation. 2) Lack of a thorough characterization of the composition of the entire differentiation culture at progenitor stage (day 15): it is very likely that there are pulmonary mesenchymal/mesodermal cells generated in the differentiation culture, besides cardiac mesoderm. The pulmonary mesenchyme may not be abundant in quantity but it plays critical roles in promoting alveolar maturation that the authors observed at day 18 of co-differentiation culture. Before drawing a conclusion, the authors must examine rigorously whether alveolar maturation was promoted by cardiac mesoderm or pulmonary mesoderm.

      We thank the reviewer for bringing this to our attention. In the revised manuscript, we have provided additional data characterizing the primitive streak at 48 hours of differentiation (Figure 1-figure supplement 1), as well as characterizing Day-15 differentiated cells using FACS analysis (Figure 2-figure supplement 1a-c). Further, as the reviewer pointed out, it is possible that the supporting function may be coming from additional mesodermal lineages aside from cardiac mesoderm. It has been demonstrated in the mouse model by Peng et al. that the heart serves as a reservoir of cardiac and pulmonary mesenchymal cells that play a major role in lung development. In the revised manuscript, we have also added staining of S100A4, a marker for mesenchyme, in the 3D µT (Figure 6-figure supplement 1), as well as an additional discussion (line 556-562) on future studies needed to further assess the regulation of alveolar maturation by cardiac mesoderm or pulmonary mesoderm.

      3) The paper can benefit from providing mechanistic insights into whether/how alveolar maturation medium (CDCIK, days 15-18, and KDCI days 18-25) influenced the downstream cardiac lineage fate specification from the cardiac progenitors. Besides contracting/beating cardiac cells, are there any other type(s) of cardiac lineages present in d25 culture? Do the cardiac progenitors generated by this protocol mainly represent cells from primary heart field? Is there any second heart field potential?

      We thank the reviewer for the comments. We agree with the overall comments from the editor and reviewers that the present study was primarily focused on the induction of cardiac and pulmonary progenitors. We also agree with the reviewer that further investigation and understanding of the cellular composition of cardiac-related lineages is needed. Related to this comment, we found that CHIR within the CKDCI was inhibitory for cardiac contraction, which did not initiate until the removal of CHIR. This is consistent with prior studies showing that GSK-3β inhibition promotes expansion of cardiomyocytes but causes disorganized myofibrillar architecture. We have provided additional related discussion in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Recently, pupil dilation was linked to cholinergic and noradrenergic neuromodulation as well as cortical state dynamics in animal research. This work adds substantially to this growing research field by revealing the temporal and spatial dynamics of pupil-linked changes in cortical state in a large sample of human participants.

      The analyses are thorough and well conducted, but some questions remain, especially concerning unbiased ways to account for the temporal lag between neural and pupil changes. Moreover, it should be stressed that the provided evidence is of indirect nature (i.e., resting state pupil dilation as proxy of neuromodulation, with multiple neuromodulatory systems influencing the measure), and the behavioral relevance of the findings cannot be shown in the current study.

      Thank you for your positive feedback and constructive suggestions. We are especially grateful for the numerous pointers to other work relevant to our study.

      1. Concerning the temporal lag: The authors uniformly shift pupil data (but not pupil derivative) in time for their source-space analyses (see above). However, the evidence for the chosen temporal lags (930 ms and 0 ms) is not that firm. For instance, in the cited study by Reimer and colleagues [1], cholinergic activation shows a temporal lag of ~ 0.5 s with regard to pupil dilation - and the authors would like to relate pupil time series primarily to acetylcholine. Moreover, Joshi and colleagues [2] demonstrated that locus coeruleus spikes precede changes in the first derivative of pupil dilation by about 300 ms (and not 0 ms). Finally, in a recent study recording intracranial EEG activity in humans [3], pupil dilation lagged behind neural events with a delay between ~0.5-1.7s. Together, this questions the chosen temporal lags.

      More importantly, Figures 3 and S3 demonstrate variable lags for different frequency bands (also evident for the pupil derivative), which are disregarded in the current source-space analyses. This biases the subsequent analyses. For instance, Figure S3 B shows the strongest correlation effect (Z~5), a negative association between pupil and the alpha-beta band. However, this effect is not evident in the corresponding source analyses (Figure S5), presumably due to the chosen zero-time-lag (the negative association peaked at ~900 ms).

      As the conducted cross-correlations provided direct evidence for the lags for each frequency band, using these for subsequent analyses seems less biased.

      This is an important point and we gladly take the opportunity to clarify this in detail. In essence, choosing one particular lag over others was a decision we took to address the multi-dimensional issue of presenting our results (spectral, spatial and time dimensions) and fix one parameter for the spatial description (see e.g. Figure 4). It is worth pointing out first that our analyses were all based on spectral decompositions that necessarily have limited temporal resolutions. Therefore, any given lag represents the center of a band that we can reasonably attribute to a time range. In fact, Figure 3C shows how spread out the effects are. It also shows that the peaks (troughs) of low and high frequency ranges align with our chosen lag quite well, while effects in the mid-frequency range are not “optimally” captured.

      As picking lags based on maximum effects may be seen as double dipping, we note that we chose 0.93 sec a priori based on the existing literature, and most prominently based on the canonical impulse response of the pupil to arousing stimuli that is known to peak at that latency on average (Hoeks & Levelt, 1993; Wierda et al., 2012; also see Burlingham et al., 2021). This lag further agrees with the results of reference [3] cited by the reviewer as it falls within that time range, and with Reimer et al.’s finding (cited as [1] above), as well as Breton-Provencher et al. (2019) who report a lag of ~900 ms (see their Supplementary Figure S8) between noradrenergic LC activation and pupil dilation. Finally, note that it was not our aim to relate pupil dilations to either ACh or NE in particular as we cannot make this distinction based on our data alone. Instead, we point out and discuss the similarities of our findings with time lags that have been reported for either neurotransmitter before.

      With respect to using different lags, changing the lag to 0 or 500 msec is unlikely to alter the reported effects qualitatively for low- and high-frequency ranges (see Figure 3C), as both the pupil time series and the fluctuations in power are dominated by very slow fluctuations (<< 1 Hz). As a consequence, shifting the signal by 500 msec has very little impact. For comparison, below we provide the reviewer with the results presented in Figure 4 but computed based on zero (Figure R1) and 500-msec (Figure R2) lags. While there are small quantitative differences, qualitatively the results remain mostly identical irrespective of the chosen lag.

      Figure R1. Figure equivalent to main Figure 4, but without shifting the pupil.

      In sum, choosing one common lag a priori (as we did here) does not necessarily impose more of a bias on the presentation of the results than choosing them post-hoc based on the peaks in the cross-correlograms. However, we have taken this point as a motivation to revise the Results and Methods sections where applicable to strengthen the rationale behind our choice. Most importantly, we changed the first paragraph that mentions and justifies the shift as follows, because the original wording may have given the false impression that the cross-correlation results influenced lag choice:

      “Based on previous reports (Hoeks & Levelt, 1993; Joshi et al., 2016; Reimer et al., 2016), we shifted the pupil signal 930 ms forward (with respect to the MEG signal). We introduced this shift to compensate for the lag that had previously been observed between external manipulations of arousal (Hoeks & Levelt, 1993) as well as spontaneous noradrenergic activity (Reimer et al., 2016) and changes in pupil diameter. In our data, this shift also aligned with the lags for low- and high-frequency extrema in the cross-correlation analysis (Figure 3B).”

      Figure R2. Figure equivalent to main Figure 4, but with shifting the pupil with respect to the MEG by 500 ms.

      Related to this aspect: For some parts of the analyses, the pupil time series was shifted with regard to the MEG data (e.g., Figure 4). However, for subsequent analyses pupil and MEG data were analyzed in concurrent 2 s time windows (e.g., Figure 5 and 6), without a preceding shift in time. This complicates comparisons of the results across analyses and the reasoning behind this should be discussed.

      The signal has been shifted for all analyses that relate to pupil diameter (but not pupil derivative). We have added versions of the following statement in the respective Results and Methods section to clarify (example from Results section ‘Nonlinear relations between pupil-linked arousal and band-limited cortical activity’):

      “In keeping with previous analyses, we shifted the pupil time series forward by 930 msec, while applying no shift to the pupil derivative.”
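      The claim that a 0, 500 or 930 msec shift changes little when both signals are dominated by slow fluctuations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sinusoids, the 100 Hz sampling rate, the 0.05 Hz fluctuation frequency, and the convention that "shifting the pupil forward" means pairing the power sample at time t with the pupil sample at t + lag. It is not the analysis pipeline used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def shifted_correlation(neural, pupil, lag_samples):
    """Correlate neural[t] with pupil[t + lag] (pupil shifted forward in time)."""
    if lag_samples == 0:
        return pearson(neural, pupil)
    return pearson(neural[:-lag_samples], pupil[lag_samples:])

FS = 100           # sampling rate in Hz (assumed)
F_SLOW = 0.05      # frequency of the slow fluctuation in Hz (assumed, << 1 Hz)
TRUE_DELAY = 0.93  # seconds by which the toy pupil trails the toy power signal

t = [i / FS for i in range(200 * FS)]
neural = [math.sin(2 * math.pi * F_SLOW * ti) for ti in t]
pupil = [math.sin(2 * math.pi * F_SLOW * (ti - TRUE_DELAY)) for ti in t]

# Correlation between the two toy signals for each candidate pupil shift.
correlations = {
    lag_ms: shifted_correlation(neural, pupil, round(lag_ms / 1000 * FS))
    for lag_ms in (0, 500, 930)
}
for lag_ms, r in correlations.items():
    print(f"lag {lag_ms:4d} ms -> r = {r:.3f}")
```

      In this toy setting the correlation stays high for all three shifts because a sub-second offset is a small fraction of a 20-second cycle. Faster components would be more sensitive to the choice of lag, which is the reviewer's concern for the mid-frequency range.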

      1. The authors refer to simultaneous fMRI-pupil studies in their background section. However, throughout the manuscript, they do not mention recent work linking (task-related) changes in pupil dilation and neural oscillations (e.g., [4-6]) which does seem relevant here, too. This seems especially warranted, as these findings in part appear to disagree with the here-reported observations. For instance, these studies consistently show negative pupil-alpha associations (while the authors mostly show positive associations). Moreover, one of these studies tested for links between pupil dilation and aperiodic EEG activity but did not find a reliable association (again conflicting with the here-reported data). Discussing potential differences between studies could strengthen the manuscript.

      We have added a discussion of the suggested works to our Discussion section. We point out however that a recent study (Podvalny et al., https://doi.org/10.7554/eLife.68265) corroborates our finding while measuring resting-state pupil and MEG simultaneously in a situation very similar to ours. Also, we note that Whitmarsh et al. (2021) (reference [6]) is actually in line with our findings as we find a similar negative relationship between alpha-range activity in somatomotor cortices and pupil size.

      Please also take into account that results from studies of task- or event-related changes in pupil diameter (phasic responses) cannot be straightforwardly compared with the findings reported here (focusing on fluctuations in tonic pupil size), due to the inverse relationship between tonic (or baseline) and phasic pupil response (e.g. Knapen et al., 2016). This means that on trials with larger baseline pupil diameter, phasic pupil dilation will be smaller and vice versa. Hence, a negative relation between the evoked change in pupil diameter and alpha-band power can very well be consistent with the positive correlation between tonic pupil diameter and alpha-band activity that we report here for visual cortex.

      In section ‘Arousal modulates cortical activity across space, time and frequencies’ we have added:

      “Seemingly contradicting the present findings, previous work on task-related EEG and MEG dynamics reported a negative relationship between pupil-linked arousal and alpha-range activity in occipito-parietal sensors during visual processing (Meindertsma et al., 2017) and fear conditioning (Dahl et al., 2020). Note however that results from task-related experiments, that focus on evoked changes in pupil diameter rather than fluctuations in tonic pupil size, cannot be directly compared with our findings. Similar to noradrenergic neurons in locus coeruleus (Aston-Jones & Cohen, 2005), phasic pupil responses exhibit an inverse relationship with tonic pupil size (Knapen et al., 2016). This means that on trials with larger baseline pupil diameter (e.g. during a pre-stimulus period), the evoked (phasic) pupil response will be smaller and vice versa. As a consequence, a negative correlation between alpha-band activity in the visual cortex and task-related phasic pupil responses does not preclude a positive correlation with tonic pupil size during baseline or rest as reported here. In line with this, Whitmarsh et al. (2021) found a negative relationship between alpha-activity and pupil size in the somatosensory cortex that agrees with our finding. Although using an event-related design to study attention to tactile stimuli, this relationship occurred in the baseline, i.e. before observing any task-related phasic effects on pupil-linked arousal or cortical activity.”

      In section ‘Arousal modulation of cortical excitation-inhibition ratio’ we have added: “The absence of this effect in visual cortices may explain why Kosciessa et al. (2021) found no relationship between pupil-linked arousal and spectral slope when investigating phasic pupil dilation in response to a stimulus during visual task performance. However, this behavioral context, associated with different arousal levels, likely also changes E/I in the visual cortex when compared with the resting state (Pfeffer et al., 2018).”

      Finally, in the Conclusion we added (note: ‘they’ = the present results): “Further, they largely agree with similar findings of a recent independent report (Podvalny et al., 2021).”

      Related to this aspect: The authors frequently relate their findings to recent work in rodents. For this it would be good to consider species differences when comparing frequency bands across rodents and primates (cf. [7,8]).

      Throughout our Results section we have mainly remained agnostic with respect to labeling frequency ranges when drawing between-species comparisons, and have only reverted to it as a justification for a dimension reduction for some of the presented analysis. Following your comment however, we have phrased the following section in the Discussion, section ‘Arousal modulates cortical activity across space, time and frequencies’, more carefully:

      “The low-frequency regime referred to in rodent work (2–10 Hz; e.g., McGinley et al., 2015) includes activity that shares characteristics with human alpha rhythms (3–6 Hz; Nestogel and McCormick, 2021; Senzai et al., 2019). The human equivalent however clearly separates from activity in lower frequency bands and, here, showed idiosyncratic relationships with pupil-linked arousal.”

      1. Figure 1 highlights direct neuromodulatory effects in the cortex. However, seminal [9-11] and more recent work [12,13] demonstrates that noradrenaline and acetylcholine also act in the thalamus which seems relevant concerning the interpretation of low frequency effects observed here. Moreover, neural oscillations also influence neuromodulatory activity, thus the one-headed arrows do not seem warranted (panel C) [3,14].

      This is a very good point. First, we would like to note that we have extended on acknowledging thalamic contributions to low-frequency (specifically alpha) effects in response to the Reviewer’s point 11 (‘Recommendations for authors’ section below). Also, we have added a reference to the role of potential top-down (reverse) influences to our Discussion, section ‘Arousal modulates cortical activity across space, time and frequencies’, as follows:

      “Further, we note that our analyses and interpretations focus on arousal-related neuromodulatory influences on cortical activity, whereas recent work also supports a reverse “top-down” route, at least for frontal cortex high-frequency activity on LC spiking activity (Totah et al., 2021).”

      Ultimately, however, we decided to leave the arrows in Figure 1C uni-directional to keep in line with the rationale of our research that stems mostly from rodent work, which also emphasises the indicated directionality. Also, reference [3] is highly interesting for us because it actually aligns with our data: The authors show that a spontaneous peak of high-frequency band activity (>70 Hz) in insular cortex precedes a pupil dilation peak (or plateau) in two of three participants by ~500msec (which mimics a pattern found for task-evoked activity; see their Figure 5b/c). We find a maximum in our cross-correlation between pupil size and high frequency band activity (>64 Hz) that indicates a similar lag (see our Figure 3B). Importantly, both results do not rule out a common source of neuromodulation for the effects. We have added the following to the end of the section ‘An arousal-triggered cascade of activity in the resting human brain’:

      “In fact, Kucyi & Parvizi (2020) found spontaneous peaks of high-frequency band activity (>70 Hz) in the insular cortex of three resting surgically implanted patients that preceded pupil dilation by ~500msec - a time range that is consistent with the lag of our cross-correlation between pupil size and high frequency (>64Hz) activity (see Figure 3B). Importantly, they showed that this sequence mimicked a similar but more pronounced pattern during task performance. Given the purported role of the insula (Menon & Uddin, 2015), this finding lends support to the idea that spontaneous covariations of pupil size and cortical activity signal arousal events related to intermittent 'monitoring sweeps' for behaviourally relevant information.”

      1. In their discussion, the authors propose a pupil-linked temporal cascade of cognitive processes and accompanying power changes. This argument could be strengthened by showing that earlier events in the cascade can predict subsequent ones (e.g., are the earlier low and high frequency effects predictive of the subsequent alpha-beta synchronization?).

      We added this cascade angle as one possible interpretation of the observed effects. We fully agree that this is an interesting question but would argue that this would ideally be tested in follow-up research specifically designed for that purpose. The suggested analysis would add a post-hoc aspect to our exploratory investigation in the absence of a suitable contrast, while also potentially side-tracking the main aim of the study. We have revised the language in this section and added the following changes (bold) to the last paragraph to emphasise the speculative aspect, and to clarify what we think needs to be done to look into this further and with more explanatory power.

      “The three scenarios described here are not mutually exclusive and may explain one and the same phenomenon from different perspectives. Further, it remains possible that the sequence we observe comprises independent effects with specific timings. A pivotal manipulation to test these assumptions will be to contrast the observed sequence with other potential coupling patterns between pupil-linked arousal and cortical activity during different behavioural states.”

    1. Author Response

      Reviewer #1 (Public Review):

      As far as I can tell, the input to the model are raw diffusion data plus a couple of maps extracted from T2 and MT data. While this is ok for the kind of models used here, it means that the networks trained will not generalise to other diffusion protocols (e.g. with different bvecs). This greatly reduces the usefulness of this model and hinders transfer to e.g. human data. Why not use summary measures from the data as an input? There are a number of rotationally invariant summary measures that one can extract. I suspect that the first layers of the network may be performing operations such as averaging that are akin to calculating summary measures, so the authors should consider doing that prior to feeding the network.

      We agree with the reviewer that using summary measures will make the tool less dependent on particular imaging protocols and more translatable than using raw data as inputs. We have experimented using a set of five summary measures (T2, magnetization transfer ratio (MTR), mean diffusivity, mean kurtosis, and fractional anisotropy) as inputs. The prediction based on these summary measures, although less accurate than predictions based on raw data in terms of RMSE and SSIM (Figure 2A), still outperformed polynomial fitting up to 2nd order. The result, while promising, also highlights the need for finding a more comprehensive collection of summary measures that match the information available in the raw data. Further experiments with existing or new summary measures may lead to improved performance.
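      Two of the summary measures named above, mean diffusivity (MD) and fractional anisotropy (FA), can be sketched from their standard diffusion tensor definitions. This is a minimal illustration only: the response does not state how these maps were computed, and the example eigenvalues are assumed, typical-looking values rather than data from the study. Both measures depend only on the tensor eigenvalues, which is why they do not depend on the gradient directions of a particular protocol.

```python
import math

def mean_diffusivity(evals):
    """Mean of the three diffusion tensor eigenvalues."""
    return sum(evals) / 3.0

def fractional_anisotropy(evals):
    """Standard FA definition: 0 for isotropic diffusion, 1 for a single axis."""
    l1, l2, l3 = evals
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0
    return math.sqrt(0.5 * num / den)

# Assumed example eigenvalues in mm^2/s (illustrative, not from the study).
white_matter_like = (1.7e-3, 0.3e-3, 0.3e-3)
isotropic = (0.8e-3, 0.8e-3, 0.8e-3)

print("MD:", mean_diffusivity(white_matter_like))
print("FA (anisotropic):", fractional_anisotropy(white_matter_like))
print("FA (isotropic):", fractional_anisotropy(isotropic))
```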

      The noise sensitivity analysis is misleading. The authors add noise to each channel and examine the output; they do this to find which input is important. They find that T2/MT are more important for the prediction of the AF data. But the majority of the channels are diffusion data, where there is a lot of redundant information across channels. So it is not surprising that these channels are more robust to noise. In general, the authors make the point that they not only predict histology but can also interpret their model, but I am not sure what to make of either the t-SNE plots or the rose plots. I am not sure that these plots are helping with understanding the model and the contribution of the different modalities to the predictions.

      We agree that there is redundant information across channels, especially among diffusion MRI data. In the revised manuscript, we focused on using the information derived from noise-perturbation experiments to rank the inputs in order to accelerate image acquisition instead of interpreting the model. We removed the figure showing t-SNE plots with noisy inputs because it does not provide additional information.
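      The idea of ranking inputs by noise perturbation, and the reviewer's point about redundancy, can be sketched with a toy example. The `toy_model` below is a hypothetical stand-in for the trained network (one unique channel plus the average of several redundant channels); the channel count, noise level and data are all assumed for illustration and do not reproduce the experiments in the manuscript.

```python
import math
import random

def toy_model(channels):
    """Hypothetical stand-in for the network: one unique channel plus the
    average of several mutually redundant channels."""
    unique, redundant = channels[0], channels[1:]
    return unique + sum(redundant) / len(redundant)

def perturbation_sensitivity(model, samples, noise, channel):
    """RMSE between clean and noise-perturbed outputs for one input channel."""
    sq_err = 0.0
    for sample, eps in zip(samples, noise):
        noisy = list(sample)
        noisy[channel] += eps
        sq_err += (model(noisy) - model(sample)) ** 2
    return math.sqrt(sq_err / len(samples))

rng = random.Random(0)
N_CHANNELS = 6  # channel 0 is unique, channels 1-5 are redundant (assumed)
samples = [[rng.random() for _ in range(N_CHANNELS)] for _ in range(200)]
noise = [rng.gauss(0.0, 0.1) for _ in samples]  # same noise reused for every channel

sensitivity = [
    perturbation_sensitivity(toy_model, samples, noise, c) for c in range(N_CHANNELS)
]
# Channels ordered from most to least sensitive to perturbation.
ranking = sorted(range(N_CHANNELS), key=lambda c: sensitivity[c], reverse=True)
print("ranking:", ranking)
```

      In this sketch each redundant channel looks unimportant on its own because the others compensate, so a low perturbation score is better read as "safe to drop from the acquisition" than as "carries no information".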

      Is deep learning really required here? The authors are using a super deep network, mostly doing combinations of modalities. Is the mapping really highly nonlinear? How does it compare with a linear or close to linear mapping (e.g. regression of output onto input and quadratic combinations of input)? How many neurons are actually doing any work and how many are silent (this can happen a lot with ReLU nonlinearities)? In general, not much is done to convince the reader that such a complex model is needed and whether a much simpler regression approach can do the job.

      The deep learning network used in the study is indeed quite deep, and there are two main reasons for choosing it over simpler approaches.

      The primary reason to pick the deep learning approach is to accommodate complex relationships between MRI and histology signals. In the revised Figure 2A-B, we have demonstrated that the network can produce better predictions of tissue auto-fluorescence (AF) signals than 1st and 2nd order polynomial fitting. For example, the predicted AF image based on 5 input MR parameters shared more visual resemblance with the reference AF image than images generated by 1st and 2nd order polynomial fittings, which were confirmed by RMSE and SSIM values. The training curves shown in Fig. R1 below demonstrate that, for learning the relationship between MRI and AF signals, at least 10 residual blocks (~ 24 layers) are needed. Later, when learning the relationship between MRI and Nissl signals, 30 residual blocks (~64 layers) were needed, as the relationship between MRI and Nissl signals appears less straightforward than the relationship between MRI and AF/MBP/NF signals, which have a strong myelin component. In the revised manuscript, we have clarified this point, and the provided toolbox allows users to select the number of residual blocks based on their applications.

      Fig. R1: Training curves of MRH-AF with number of residual blocks ranging from 1 to 30 showing decreasing RMSEs with increasing iterations. The curves in the red rectangular box on the right are enlarged to compare the RMSE values. The training curves of 10 and 30 residual blocks are comparable, both converged with lower RMSE values than the results with 1 and 5 residual blocks.

      In addition, the deep learning approach can better accommodate residual mismatches between co-registered histology and MRI than polynomial fitting. Even after careful co-registration, residual mismatches between histology and MRI data can still be found, which pose a challenge for polynomial fittings. We have tested the effect of mismatch by introducing voxel displacements to perfectly co-registered diffusion MRI datasets and demonstrated that the deep learning network used in this study can handle the mismatches (Figure 1 – figure supplement 1).

      Relatedly, the comparison between the MRH approach and some standard measures such as FA, MD, and MTR is unfair. Their network is trained to match the histology data, but the standard measures are not. How does the MRH approach compare to e.g. simply combining FA/MD/MTR to map to histology? This to me would be a more relevant comparison.

      This is a good idea. We have added maps generated by linear fitting of five MR measures (T2, MTR, FA, MD, and MK) to MBP for a proper comparison. Please see the revised Figure 3A-B. The MRH approach provided better prediction than linear fitting of the five MR measures, as shown by the ROC curves in Figure 3C.
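      The polynomial baselines referred to in this response (1st and 2nd order fitting, compared by RMSE) can be sketched as an ordinary least-squares fit. This is a minimal, dependency-free illustration under stated assumptions: the two synthetic inputs and the synthetic target with an interaction term are invented for the example, whereas the manuscript uses five MR measures and real histology. The helper names `solve`, `features` and `fit_rmse` are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def features(x, order):
    """Design row: constant, linear terms, and optional 2nd-order terms."""
    row = [1.0] + list(x)
    if order == 2:
        row += [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return row

def fit_rmse(inputs, targets, order):
    """Least-squares fit via the normal equations; returns the training RMSE."""
    rows = [features(x, order) for x in inputs]
    n_par = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_par)] for i in range(n_par)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n_par)]
    coef = solve(xtx, xty)
    preds = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))

rng = random.Random(1)
inputs = [[rng.random(), rng.random()] for _ in range(200)]  # synthetic "MR measures"
targets = [0.5 * a + a * b for a, b in inputs]               # synthetic "histology"

rmse_linear = fit_rmse(inputs, targets, order=1)
rmse_quadratic = fit_rmse(inputs, targets, order=2)
print(f"1st order RMSE: {rmse_linear:.4f}, 2nd order RMSE: {rmse_quadratic:.2e}")
```

      Here the 2nd order fit recovers the target because the synthetic relationship is exactly quadratic. The argument in the response is that the real MRI-to-histology relationship is not captured this well by low-order polynomials.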

      • Not clear if there are 64 layers or 64 residual blocks. Also, is the convolution only doing something across channels? i.e. do we get the same performance by simply averaging the 3x3 voxels?

      We have revised the paragraph on the network architecture to clarify this point in the Figure 1 caption as well as the Methods section. We used 30 residual blocks, each consisting of 2 layers. There are 4 additional layers at the input and output ends, so we had 64 layers in total.
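      The layer count follows from simple arithmetic, sketched here with a hypothetical helper `total_layers`. It uses only the numbers stated in this response (2 layers per residual block, 4 additional layers at the input and output ends) and also reproduces the "~24 layers" quoted earlier for 10 residual blocks.

```python
def total_layers(residual_blocks, layers_per_block=2, extra_layers=4):
    """Total depth = layers inside the residual blocks + input/output layers."""
    return residual_blocks * layers_per_block + extra_layers

print(total_layers(30))  # 30 * 2 + 4 = 64
print(total_layers(10))  # 10 * 2 + 4 = 24
```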

      The convolution mostly works across channels, which is what we intended as we are interested in finding the local relationship between multiple MRI contrasts and histology. With inputs from modified 3x3 patches, in which all voxels were assigned the same values as the center voxel, the predictions of MRH-AF did not show apparent loss in sensitivity and specificity, and the voxel-wise correlation with reference AF data remained strong (See Fig. R2 below). We think this is an important piece of information and added it as Figure 1 – figure supplement 3. Averaging the 3x3 voxels in each patch produced similar results.

      Fig. R2: Evaluation of MRH-AF results generated using modified 3x3 patches with 9 voxels assigned the same MR signals as the center voxel as inputs. A: Visual inspection showed no apparent differences between results generated using original patches and those using modified patches. B: ROC analysis showed a slight decrease in AUC for the MRH-AF results generated using modified patches (dashed purple curve) compared to the original (solid black curve). C: Correlation between MRH-AF using modified patches as inputs and reference AF signals (purple open circles) was slightly lower than the original (black open circles).
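      The two patch modifications described above (replicating the center voxel across the 3x3 patch, and averaging the 9 voxels) can be sketched as follows. The data layout, a 3x3 nested list in which each voxel holds a list of channel values, and the function names are assumptions made for illustration; they are not taken from the released toolbox.

```python
def center_replicated_patch(patch):
    """3x3 patch in which every voxel carries the center voxel's channel values."""
    center = patch[1][1]
    return [[list(center) for _ in range(3)] for _ in range(3)]

def averaged_patch(patch):
    """3x3 patch in which every voxel carries the channel-wise mean of all 9 voxels."""
    n_channels = len(patch[0][0])
    mean = [
        sum(patch[i][j][c] for i in range(3) for j in range(3)) / 9.0
        for c in range(n_channels)
    ]
    return [[list(mean) for _ in range(3)] for _ in range(3)]

# Toy patch with 2 channels per voxel (values chosen only for illustration).
patch = [[[float(3 * i + j), 10.0 * (3 * i + j)] for j in range(3)] for i in range(3)]

replicated = center_replicated_patch(patch)
averaged = averaged_patch(patch)
print(replicated[0][0], averaged[0][0])
```

      Both variants remove spatial variation within the patch while keeping all channels, so a network whose predictions barely change under them is relying mainly on cross-channel information.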

      The result in the shiverer mouse is most impressive. Were the shiverer mice data included in the training? If not, this should be mentioned/highlighted as it is very cool.

      Data from shiverer mice and littermate controls were not included in the training. We have clarified this point in the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This work raises the question of how in plane forces generated at the apical surface of an epithelial cell sheet cause out of plane motion, an important morphogenetic motif. To address this question, a new optogenetic dominant negative rho1 tool, based on the cry2-CIBN system is presented. The authors use this tool to analyze the well-studied biophysical process of ventral furrow formation, and dissect the spatiotemporal requirement of rho1 signaling to modulate myosin accumulation. They separate the effect on morphogenesis into an early phase that becomes significantly slowed down by myosin inhibition, and a late phase where the kinetics is comparable to wild type despite treatment. For interpretation of the data, an older model of cell mechanics treating tissue as a purely elastic material is presented. It fails to reproduce the observations. As a modification, in analogy to buckling of a thin beam under load, a compressive stress exerted by the adjacent ectoderm is introduced. Further analysis of cell behaviors in response to various laser mediated tissue manipulations is presented as support of the proposed mechanism.

      Overall, the manuscript addresses an important aspect of morphogenesis. In particular the use of optogenetic tools promises new insights that might be more challenging to achieve with traditional mutant analysis. However, reservations remain with respect to (1) rigor of the analysis, and (2) interpretation and quality of the data in support of the proposed mechanism; this applies in particular to presentation of biophysical observations, including experiment and simulations.

      The manuscript adds valuable quantitative data, in particular the findings described in Fig 2ab. However, insufficient analyses are performed to fully support the claims of the manuscript by the data presented.

      (I) The manuscript proposes an elasticity based model of tissue mechanics, but provides no experimental evidence in support of this assumption. Many rheology studies performed in a wide range of specimen (including the Drosophila embryo) found a separation of time scales, that shows elasticity is a good approximation of tissue mechanics only for time scales short compared to the process studied here.

      We agree with the reviewer that an elasticity-based model of tissue mechanics is a simplification of the actual tissue properties in real embryos. To provide justification for this simplification, in the revised manuscript, we have cited a previous biophysical study measuring tissue viscoelasticity in early Drosophila embryos (Doubrovinski et al., 2017). Using a magnetic tweezers-based approach, Doubrovinski et al. show that the lower bound of the decay time of the elastic response is four minutes (the lower limit on the timescales where tissue behaves elastically). In addition, when history dependence of the response is considered, the decay time increases to nine minutes, which is close to the duration of ventral furrow formation (~ 15 – 20 minutes). Therefore, we consider elasticity a reasonable approximation of tissue mechanics during ventral furrow formation. The elasticity assumption has been widely used in previously published modeling work to simulate ventral furrow formation (Allena et al., 2010; Conte et al., 2009; Gracia et al., 2019; Heer et al., 2017; Hocevar Brezavšček et al., 2012; Muñoz et al., 2007; Rauzi et al., 2015). The modeling framework used in our current study, which was initially described in Polyakov et al. 2014, successfully predicts the intermediate and final furrow morphologies with a minimal set of active and passive forces without prescribing individual cell shape changes. It is therefore advantageous to use this model to explore the main novel aspect of the folding mechanics underlying ventral furrow formation. We show that the model can recapitulate the binary tissue response to acute myosin inhibition. In addition, it accurately predicts the intermediate furrow morphology at the transitional state and several other morphological properties associated with myosin inhibition. We therefore believe that this minimalistic model captures the central aspect of the physical mechanism underlying mesoderm bistability observed in the experiments.

      (II) The manuscript uses a method of micro-dissection to soften cells, but does not provide a clear definition of the concept of softening, provides no rationale for the method's functioning, and does not provide independent validation. The described treatment might affect cells in many alternative ways to the offered interpretation. This data is the central experimental evidence given in support of the proposed ectoderm compression mechanism, and therefore it is essential to provide a precise physical explanation of the method, and validation of measurements that bolster the conclusion.

      We apologize for not explaining the meaning of “softening” clearly in our original manuscript and the rationale for using laser ablation to detect compression. By “softening”, we meant to describe the mechanical status of the cell when the subcellular structures that normally support the mechanical integrity (e.g., cortical actin) are disrupted. We reason that when such a change in mechanical properties happens in a specific region of a tissue that is under compression, the cells in this region should have an impaired ability to resist compression from outside of the region and thereby cause the region to shrink.

      Laser ablation has been widely used to measure tensile stresses in cells and tissues by disruption of cells or subcellular structures. The method we used is adapted from previously described protocols, where a femtosecond near infrared laser is used to disrupt subcellular structures for detection of tissue tension (Rauzi et al., 2015; Rauzi et al., 2008). It has been shown that when laser intensity is properly controlled, the treatment can leave the plasma membrane intact but disrupt subcellular structures associated with the plasma membrane, such as adherens junctions and the cortical actomyosin networks (Rauzi et al., 2015; Rauzi et al., 2008). Using a femtosecond near infrared laser, we were able to ablate embryonic tissues that are under tension and observe tissue recoil after laser ablation, suggesting that our approach has disrupted the cortical cytoskeleton in the laser treated region (e.g., Figure 3 and Authors’ Response Figure 1). In these experiments, the lack of damage on the plasma membrane is indicated by the ready recovery of the plasma membrane signal after laser treatment, as well as the lack of bright burn marks on the tissue.

      As we noted before, we reasoned that if a tissue is under compression, a laser treatment similar to the one that generates tissue recoil in tissues under tension should result in tissue shrinking within the laser-treated region. The data presented in our original manuscript demonstrate that tissue shrinking is not a non-specific response to our laser treatment – we did not observe such a response when we treated the tissue during cellularization or within the first five minutes of gastrulation, although identical experimental conditions were used (Original Figure 4). We have also obtained additional evidence that supports the use of tissue shrinking as a readout of tissue compression. We tested our laser ablation approach in Stage 8 – 9 embryos at regions where cells are actively dividing/proliferating, which would be expected to generate compressive stresses in the tissue. When we performed laser ablation in this region, we observed shrinking of the treated region, which was distinct from the tensile tissue response (Authors’ Response Figure 1). While this preliminary evidence is encouraging, we agree with the reviewer that further independent validations are needed given that the methods for detecting tissue compression have not been well established in the field. Following the editor’s suggestion, we have removed this experiment from the current manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 1: Laser ablation in regions of tissues with active cell proliferation (a) or undergoing apical constriction (b). The movement of tissues is indicated by overlaying membrane signals (Ecadherin-GFP) at T = 0 sec and at T = 10 sec. T = 0 in the “After ablation” panels marks the time immediately after ablation. (a) Stage 8 – 9 embryos. Multiple cells are in the process of cell division, as indicated by mitotic rounding (yellow arrowheads) or the appearance of cleavage furrows (red arrowheads). Immediately after laser ablation, the surrounding cells moved towards the ablated region (cyan arrows). (b) An embryo undergoing ventral furrow formation. Ablation within the constriction domain results in recoil of the surrounding cells away from the ablated region (cyan arrows).

      (III) Mechanical isolation of the mesoderm is a very exciting approach to test the possible involvement of adjacent tissues in folding. Indeed, the authors report a delay of ventral furrow formation. However, there is no evidence provided that (a) the mesoderm is mechanically uncoupled, and (b) that the treatment did not have undesired side effects. For example, a similar procedure (so-called cauterization, see Rauzi 2015) has been used to immobilize cells in the Drosophila embryo. Such an effect could account for the observed delay in furrow formation.

      We agree with the reviewer that “mechanical uncoupling” is merely a prediction based on our observation and has not been directly demonstrated. On the other hand, since the purpose of this experiment is to ask whether the presence of the lateral ectoderm is important for the mesoderm to transition between apical constriction and invagination (and our result shows that it is), whether the approach we used mechanically uncoupled the mesoderm and the ectoderm is no longer an immediately relevant question. We apologize for the imprecise use of the term “mechanically uncoupling” in our original manuscript and we thank the reviewer for pointing this out.

      As for the reviewer’s point (b), we have several pieces of evidence indicating that our approach did not cause anchoring of the tissue to the vitelline membrane. The major difference between the approach we used and that used by Rauzi et al., 2015 is the location of the tissue where the laser treatment was imposed. In order to anchor the tissue to the vitelline membrane, Rauzi et al. targeted the laser to the apical side of the tissue, adjacent to the vitelline membrane. The resulting cauterization of the tissue caused anchoring of the tissue to the vitelline membrane, presumably by fusion of the tissue with the vitelline membrane. In our approach, we used a similar type of laser (femtosecond near-infrared laser) to perform tissue disruption, but instead of targeting the apical side of the tissue, we targeted the basal region of the invaginating cleavage furrows during cellularization, with the goal of blocking cell formation. While the laser intensity we used is high enough to cause cauterization of the tissue, as indicated by the appearance of bright autofluorescence in the laser-treated region, these “burn marks” are not located at the apical side of the cells (Authors’ Response Figure 2a). The lack of “burn marks” on the vitelline membrane in our experiment is in sharp contrast to the result shown in Rauzi et al., 2015 (see Authors’ Response Figure 2b for an example from Rauzi et al. in comparison to our own data in 2a). Because of the difference in the location of cauterization, we do not expect that the tissue would be fused with the vitelline membrane after our treatment. This is further suggested by the observation that the burn marks can move before the onset of gastrulation, which again indicates that the tissue is not anchored to the vitelline membrane (Authors’ Response Figure 2c).

      That being said, we acknowledge that we do not fully understand the impact of the laser treatment on the embryo (e.g., what causes the reduced rate of apical constriction), and more control experiments are required in order to fully describe the tissue response we observed. As suggested by the editor, we decided to remove the ectoderm-ablation experiment from the revised manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 2: Laser disruption of cell formation in the lateral ectodermal region. (a) Cross-section and en face views showing the basal location of the “burn marks” after laser disruption in the lateral ectodermal region. No burn marks are observed at the level of the vitelline membrane. Blue and red curves in the cross-section views indicate the vitelline membrane and the position where the projections were made for the en face views. Magenta arrows: burn marks. (b) Figure 5a from Rauzi et al., 2015, clear bright burn marks can be seen from the apical surface view. (c) Overlay of the signal at T = -10 min and 0 min (onset of gastrulation) showing the movement of burn marks before gastrulation (yellow arrows).

      (IV) Some panels show two distinct molecules tagged with the same or spectrally overlapping fluorophores, which unfortunately localize in similar spatial patterns. This encumbers data validation.

      We agree with the reviewer that having two distinct proteins tagged with the same fluorophore is not ideal for understanding the behavior of the tagged proteins. However, it usually does not affect the evaluation of the cell or tissue morphology, as long as the cell membrane is explicitly labeled. For example, in our original Figure 2 (new Figure 4), although GFP is tagged on both CIBN and Sqh, and mCherry is tagged on both CRY2-Rho1DN and Sqh, the cell and tissue morphology is clearly discernible by these markers, which allowed us to evaluate the progression of ventral furrow formation. In the cases where there was a need to evaluate the behavior of a particular molecule (e.g. Sqh), we always repeated the experiments in a way such that the molecule of interest is tagged with a distinct fluorophore that does not spectrally overlap with other fluorophores – this often requires the use of a plasma membrane-anchored CIBN that is not fluorescently tagged (e.g. Figure 1, Figure 4 – figure supplement 3).

      (V) The physical model is a central part for data interpretation. In its current form it is very challenging to follow. It is also critical the system be studied with proper cell aspect ratio, as the elasticity of thin sheets has a well established non-linear thickness dependence.

      These are valid critiques of our thin layer physical model (original Figure 5). The original purpose of this model was not to recapitulate the furrow morphology or cell shape change observed in the actual embryo, but rather to test the possibility of recapitulating the acceleration in tissue flow during the folding process by combining local constriction and global compression in a spherical (circular in 2D) elastic shell. Developing a dynamic vertex model that has a realistic cell aspect ratio comparable to the actual cells in the embryo while displaying realistic cellular dynamics during the folding process is nontrivial and needs substantial further development of the model. Since the manuscript is now focused on the bistable characteristics of the mesoderm during gastrulation rather than tissue dynamics during the folding process, we decided to leave the dynamic vertex model out of the revised manuscript, as suggested by the editor.

      Reviewer #2 (Public Review):

      Guo and colleagues aim to unravel the mechanisms driving the fast process of mesoderm invagination in the early developing Drosophila embryo. While cell apical constriction is known to drive ventral furrowing (1st phase), it is still not clear if apical constriction is necessary/sufficient to drive mesoderm internalization (2nd phase) and whether other mechanisms cooperate during this process. By using 1ph optogenetics, the authors cannot test specifically the role of apical constriction but can systematically affect the overall actomyosin network in ventral cells in a time-specific fashion (1-minute resolution). In this way, they come to the conclusion that actomyosin contractility is necessary for the 1st phase but not for the 2nd phase of mesoderm invagination. Interestingly, they conclude that the system is bistable. In the second part of this study, the authors test the role of the coupling between mesoderm and ectoderm by using 2D computational modelling and infrared pulsed laser dissection. They propose that the ectoderm can generate compressive forces on the mesoderm facilitating mesoderm internalization (2nd phase).

      This project is of interest since it tackles a key morphogenetic process that is necessary for the development of the embryo. The conclusion of 'bistability' resulting from the RhoDN optogenetic experiments (1st part of this study) is well supported and quite interesting. The IR laser experiments used to tackle the coupling between ectoderm and mesoderm (2nd part of the study) are key to supporting the main conclusions; nevertheless, their experimental design and results are puzzling. It is not clear what the authors are actually doing to the tissues. The experiments performed in the 2nd part of this study need to be revisited and the conclusions eventually softened.

      Major comments:

      1) The 920 nm laser ablation of ectoderm cells is a key experiment in this study to support the ectoderm compression hypothesis. Nevertheless, this experiment is puzzling: the rationale of the experimental design, the effect of the laser on cells and the interpretation of the results are unclear.

      The rationale for the laser ablation experiment designed to test tissue compression is analogous to the widely used laser ablation approach for detecting tissue tension (Rauzi et al., 2015; Rauzi et al., 2008). In typical experiments where laser ablation was used to measure tensile stresses in cells and tissues, ablation of cells or subcellular structures that are under tension results in recoil of surrounding cell/tissue structures. We reasoned that if the tissue is under compression, similar laser treatment should result in shrinking of the laser-treated region, as the cells in the laser-treated region are expected to have an impaired ability to resist compressive stresses from outside of the region.

      In our experiment, we used the reduction of the width of the laser-treated region within the first 10 sec after laser treatment as the measure for tissue shrinking, which we considered an indication of the presence of compressive stresses. This tissue response, albeit mild, is not a non-specific tissue response to our laser treatment – we did not observe tissue shrinking when we treated the tissue during cellularization or within the first five minutes of gastrulation, although identical experimental conditions were used. The rate and magnitude of tissue shrinking after laser treatment are determined by multiple factors, including the level of compressive stresses, the difference in cell rigidity before and after laser treatment, and the overall viscosity of the tissue. We acknowledge that knowledge of these factors is largely lacking, and therefore additional independent validations of our approach are needed to further strengthen our conclusion on the presence of tissue compression. Following the editor’s suggestion, we decided to remove the laser ablation experiment from the current manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      2) The authors propose to use again 920 nm laser ablation but this time to "physically separate" the two ectoderms from the ventral tissue. This is again a key experiment, but it raises some concerns:

      a. "Physical separation" would need to be demonstrated (e.g., EM after laser ablation). From Fig. 6b it is clear that IR laser ablation results in prominent auto-fluorescent zones. This has already been reported in previous work (De Medeiros G. et al. Scientific Reports 2020) showing that high-power and sustained IR fs laser targeting produces auto-fluorescence and highly electron-dense structures in the early developing Drosophila embryo. This process is referred to as laser cauterization and does not induce separation between tissues. These structures eventually displace together with the lateral tissue (also shown in Fig. 6b).

      b. This strong laser "treatment", which should be ectoderm-specific, results in perturbation of other non-ectoderm-related processes (e.g., mesoderm apical constriction as shown by the authors). This can support the idea that many other processes are affected and that in general this laser heating "treatment" has global effects. These results might invalidate the conclusion proposed by the authors.

      These are both valid critiques. As for the reviewer’s point “a”, we agree with the reviewer that a “physical separation” of the mesoderm from the ectoderm has not been rigorously demonstrated in our original manuscript. As detailed in our response to reviewer #1 comment #3, since the purpose of this experiment is to ask whether the presence of the lateral ectoderm is important for the mesoderm to transition between apical constriction and invagination (and our result shows yes), whether the approach we used physically separated the mesoderm and the ectoderm is no longer an immediately relevant question. We apologize for the vague use of “physical separation” in our original manuscript and we thank the reviewer for pointing this out.

      To address the reviewer’s point “b” and to ask whether the laser treatment used in our experiment has a global effect, we performed a control experiment where we treated the yolk region of the embryo with the identical approach. Despite the appearance of burn marks in the treated yolk region, mesoderm invagination proceeded largely normally under this condition, with a mild reduction in the rate of furrow invagination (Authors’ Response Figure 3). Therefore, the prominent delay in the transitional state we observed after disruption of the lateral ectoderm (Original Figure 6) is not likely caused by a non-specific laser heating effect. In addition, in both the yolk-ablation and the ectoderm-ablation experiments, cellularization occurred normally outside of the laser-treated regions, which further supports the lack of a strong non-specific effect from our laser treatment. That being said, we acknowledge that we do not fully understand the impact of the laser treatment on the embryo (e.g., what causes the reduced rate of apical constriction), and more control experiments are required in order to fully describe the tissue response we observed. As suggested by the editor, we decided to remove the ectoderm-ablation experiment from the revised manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 3. Laser treatment in the yolk region of the embryo. (a) Cartoon depicting the position of laser treatment. A laser condition similar to that described in the original Figure 6 was used. Laser ablation was performed during cellularization and the treated embryo was imaged during gastrulation. (b) An example control embryo without laser treatment. (d-e) Two examples showing ventral furrow formation after laser treatment in the yolk region. Only a mild delay in furrow invagination was observed. Red arrowheads indicate the invagination front. Scale bar: 25 μm.

      Reviewer #3 (Public Review):

      The authors address how contractile forces near the apical surface of a cell sheet drive out-of-plane bending of the sheet. To determine whether actomyosin contractility is required throughout the folding process and to identify potential actomyosin-independent contributions to invagination, they develop an optogenetic-mediated inhibition of myosin and show that myosin contractility is critical to prevent tissue relaxation during the early stage of folding but is dispensable for the deepening of the invagination. Their results support the idea that the mesoderm is mechanically bistable during gastrulation. They propose that this mechanical bistability arises from an in-plane compression from the surrounding ectoderm and that mesoderm invagination is achieved through the combination of apical constriction and tissue compression. Regarding the global message of the manuscript, I have two main criticisms. The authors consider their work as the first to prove that there is an additional mechanism to apical constriction leading to invagination. This is not true. First, the idea that the ectoderm could exert a compressive force on the invaginating mesoderm is not new and has been not only proposed, but tested previously (Rauzi and Leptin, 2015). Second, several recent publications demonstrated that on top of apical constriction, lateral forces were also required for the invagination and the authors ignore these data (Gracia et al, 2019 ; John et al, 2021).

      We thank the reviewer for this important comment. In the original Introduction, we have mentioned several previous studies that suggest the presence of additional mechanisms to apical constriction during ventral furrow formation. We stated: “The observation that the maximal rate of apical constriction and the maximal rate of tissue invagination occur at distinct times suggests that apical constriction does not directly cause tissue invagination (Polyakov et al., 2014; Rauzi et al., 2015). A number of computational models also predict that mesoderm invagination requires additional mechanical input, such as “pushing” forces from the surrounding ectodermal tissues, but experimental evidence for this additional mechanical input remains sparse (Munoz et al., 2007; Conte et al., 2009; Allena et al., 2010; Brodland et al., 2010).”

      To address the reviewer’s comment, in the revised manuscript, we expanded this paragraph to further elaborate the previous contributions: “However, accumulating evidence suggests that apical constriction does not directly drive invagination during the shortening phase. First, it has been observed that the maximal rate of apical constriction (or cell lengthening) and the maximal rate of tissue invagination occur at distinct times (Polyakov et al., 2014; Rauzi et al., 2015). Second, it has been previously proposed, and more recently experimentally demonstrated, that myosin accumulated at the lateral membranes of constricting cells (‘lateral myosin’) facilitates furrow invagination by exerting tension along the apical-basal axis of the cell (Brodland et al., 2010; Conte et al., 2012; Gracia et al., 2019; John and Rauzi, 2021). Finally, a number of computational models predict that mesoderm invagination requires additional mechanical input from outside of the mesoderm, such as “pushing” forces from the surrounding ectodermal tissue (Munoz et al., 2007; Conte et al., 2009; Allena et al., 2010; Brodland et al., 2010). These models are in line with the finding that blocking the movement of the lateral ectoderm by laser cauterization inhibits mesoderm invagination (Rauzi et al., 2015). A similar disruption of ventral furrow formation can also be achieved by increasing actomyosin contractility in the lateral ectoderm (Perez-Mockus et al., 2017). While these pioneer studies highlight the importance of cross-tissue coordination during mesoderm invagination, the actual mechanical mechanism that drives the folding of the mesodermal epithelium and the potential role of the surrounding ectodermal tissue remain to be elucidated.”

      One of the motivations for us to develop experimental approaches to detect compression in the ectoderm (original Figure 4) and to disrupt the ectoderm (original Figure 6) is the lack of direct evidence demonstrating the mechanical contribution of the ectoderm to mesoderm invagination. Several studies have shown that manipulations of the ectodermal tissue can impair ventral furrow formation. One study shows that preventing the movement of the lateral ectoderm, by anchoring ectodermal cell apices to the vitelline membrane, blocks ventral furrow invagination (Rauzi et al., 2015). Another study shows that upregulation of apical myosin contractility in the lateral ectodermal tissues can inhibit or even reverse the furrow invagination process (Perez-Mockus et al., 2017). These results indicate that an increase in the resistance to mesoderm movement can impair mesoderm invagination. However, this would be expected even if the ectoderm does not provide active mechanical input to facilitate mesoderm invagination. Therefore, these experiments, while very informative, did not provide direct evidence for a role of ectodermal compression in mesoderm invagination.

      Another motivation for us to examine potential mechanisms outside of the mesoderm is the observation that ventral furrow invagination continues even when both apical myosin and lateral myosin are disrupted after Ttrans (Late Group embryos). This result indicates that factors other than apical or lateral myosin must be responsible for the invagination of the furrow in Late Group embryos. In the revised manuscript, we used a modeling approach to demonstrate that lateral myosin and ectodermal compression may function in parallel to promote the invagination of the ventral furrow (Figure 7). In the revised Discussion, we propose that “ventral furrow formation is mediated through a joint action of multiple mechanical inputs. Apical constriction drives initial indentation of ventral furrow, which primes the tissue for folding, whereas the subsequent rapid folding of the furrow is promoted by bistable characteristic of the mesoderm and by lateral myosin contractions in the constricting cells.”

      They generated an optogenetic tool, "Opto-Rho1DN", to inhibit Rho1 through light-dependent plasma membrane recruitment of a dominant negative form of Rho1 (Rho1DN). The specificity of local inactivation of Myosin was tested on apical myosin before and during invagination. They observed a strong reduction of Myosin II recruitment and a phenotype that mimics Rok inhibition. They found that acute loss of myosin contractility during most of the lengthening phase results in immediate relaxation of the constricted tissue, but similar treatment near or after the lengthening-shortening transition does not impede invagination. They conclude that the second part of furrow invagination is not due to myosin activities at the apical or lateral cortices of the mesodermal cells and that actomyosin contractility is required in the early but not the late phase of furrow formation. This part regarding the temporal requirement of Myosin during invagination brings novelty to the field since it has never been tested before.

      We thank the reviewer for the comment on the novelty of our work.

      They observe that ectodermal cells shorten their apico-basal axis prior to Ttrans, and that compression from the ectoderm is independent of ventral furrow formation since it still occurs even if invagination is inhibited.

      They further develop two types of simulations to test theoretically the importance of compressive stress in the invagination process. The theoretical part would need to be further developed and discussed. They would need to integrate all the different components that have been shown to be essential for the invagination (not only apical constriction) and the dynamic aspect of the vertex model has to be clearly explained.

      We thank the reviewer for the suggestions on the modeling parts. In the energy-based vertex model (the Polyakov model, original Figure 3), two previously identified mechanisms, apical constriction and basal relaxation, have been implemented to drive the lengthening-shortening cell shape change and furrow invagination. Following the reviewer’s suggestions, we have modified the Polyakov model to include additional mechanisms that have been shown to facilitate ventral furrow invagination. In particular, we focused our analysis on the role of lateral myosin in the constricting cells in furrow invagination (Figure 7). Please refer to our response to the combined comments for details (in the section “Additional modeling analysis to test the known mechanisms for mesoderm invagination”).

      As for the dynamic vertex model presented in our original manuscript (original Figure 5), as detailed in our response to Reviewer #1’s comment #5, since the revised manuscript is focused on the bistable characteristics of the mesoderm during gastrulation rather than tissue dynamics during the folding process, we decided to leave this part out of our revised manuscript, as suggested by the editor.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors analyzed several models for predicting the early onset of T2D, where they trained and tested on a UKB based cohort, aged 40 - 69 and suggest two simple logistic regression models: the anthropometric and the five blood tests models in reference to FINDRISC and GDRS models. Their models achieved better auROC, APS, and decile prevalence OR, and better-calibrated predictions.

      Strengths:

      1. The authors have neatly explained their objectives and performed well-justified analyses.

      2. The authors highlight how using both features - HbA1C% measure and reticulocyte count - may provide a better indication of the average blood sugar level during the last two to three months than using just the standard HbA1C% measure.

      3. Further verification showed that the proposed anthropometric-based and five-blood-test-based models, when discriminating within a group of normoglycemic participants and within a group of pre-diabetic participants, outperform the FINDRISC and GDRS-based models.

      Weaknesses:

      1. As the authors point out in the manuscript, these models are suited for the UKB cohort or populations with similar characteristics. This limits the extrapolation of these findings to cohorts from a different background until the models are analyzed on a cohort from another country or continent.

      We agree with this comment, as we indeed pointed out in the paper. We recommend adjusting these models when applying them to populations with distinct characteristics.

      2. In the methods section, an additional explanation of how the T2D prevalence bins were formed would be useful to a reader.

      We thank the reviewer for this note. We added the following explanation in section 4.11: “We considered several potential risk score limits that separate T2D onset probability in each of the scores groups, and we chose boundaries that showed a separation between the risk groups on the validation datasets. Once we decided on the boundaries of the score, we report the prevalence in each risk group on the test set and we report these results.”
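      The binning procedure described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' code: the function name `prevalence_by_risk_group`, the scores, the outcomes, and the boundary values are all hypothetical, and the boundaries are simply assumed to have been fixed beforehand on validation data.

```python
from bisect import bisect_right


def prevalence_by_risk_group(scores, outcomes, boundaries):
    """Return a (n, cases, prevalence) tuple for each risk group.

    `boundaries` must be sorted in ascending order; a score equal to a
    boundary is assigned to the higher group.
    """
    n_groups = len(boundaries) + 1
    counts = [0] * n_groups
    cases = [0] * n_groups
    for score, outcome in zip(scores, outcomes):
        group = bisect_right(boundaries, score)  # index of the risk group
        counts[group] += 1
        cases[group] += int(outcome)
    return [
        (counts[g], cases[g], cases[g] / counts[g] if counts[g] else float("nan"))
        for g in range(n_groups)
    ]


# Hypothetical test-set risk scores and T2D outcomes (1 = developed T2D).
test_scores = [0.02, 0.05, 0.11, 0.18, 0.25, 0.40, 0.55, 0.70]
test_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
# Hypothetical boundaries, assumed to have been chosen on validation data.
boundaries = [0.10, 0.30]

groups = prevalence_by_risk_group(test_scores, test_outcomes, boundaries)
for g, (n, k, prev) in enumerate(groups):
    print(f"group {g}: n={n}, cases={k}, prevalence={prev:.2f}")
```

      Keeping the boundary selection (validation data) separate from the prevalence report (test data) is what prevents the reported separation between groups from being optimistically biased.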

      3. The authors have mentioned that the prevalence of diabetes has been rising more rapidly in low and middle-income countries (LMICs) than in high-income countries and that the objective of the present research was to develop clinically usable models which are easy to use and highly predictive of T2D onset. As lifestyle is also one of the contributory factors for T2D, an additional analysis comparing low-income and high-income subjects within the UKB-based cohort, provided such metadata are available, would help to understand whether the prevalence of T2D differs between such groups.

      We thank the reviewer for this comment. Below we add an analysis that we ran on our data, showing the difference in deprivation index between the sick and healthy populations. The sick population has a higher deprivation index, as expected. Running a Mann-Whitney U test on the data gives a reported p-value of zero; repeating it with a sample of just 1000 participants from each group, we get a p-value of 2.37e-137. This indicates a significant association between deprivation index and the tendency to develop T2D. We have also added this finding to the supplementary material, along with a reference to it.
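      For readers unfamiliar with the test, the comparison can be sketched as below. This is a hedged, standard-library-only illustration, not the analysis that was run on the UKB data: the deprivation values are synthetic draws from two assumed normal distributions, `mann_whitney_u` and `synthetic_groups` are hypothetical helper names, and the p-value uses the large-sample normal approximation with tie correction and no continuity correction.

```python
import math
import random


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i  # size of this block of tied values
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t**3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


# Small hand-checkable case: every x is below every y, so U_x is 0.
u_small, p_small = mann_whitney_u([1, 2, 3], [4, 5, 6])

rng = random.Random(0)


def synthetic_groups(n):
    """Synthetic 'deprivation index' values; the distributions are assumptions."""
    healthy = [rng.gauss(-1.5, 3.0) for _ in range(n)]
    sick = [rng.gauss(0.5, 3.0) for _ in range(n)]
    return sick, healthy


_, p_1000 = mann_whitney_u(*synthetic_groups(1000))
_, p_large = mann_whitney_u(*synthetic_groups(20000))
print(p_small, p_1000, p_large)
```

      With the larger synthetic sample, the approximate p-value falls below the smallest representable double and is returned as exactly 0.0. This is one plausible reason a p-value can be reported as zero on a very large cohort, although we cannot confirm that this is what happened in the analysis above.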

      You can also find below a SHAP diagram showing that a higher Townsend deprivation index pushes the prediction for T2D upwards.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper the authors use a conditional knockout strategy to assess the effects of deletion of the dominant oxygen-sensing hypoxia-inducible factor (HIF) hydroxylase enzyme, prolyl hydroxylase 2 (Phd2) restricted to the regulatory T cell (Treg) lineage. They use a well-established Foxp3-driven Cre recombinase allele. Phd2 is thus silenced in cells that have expressed or continue to express Foxp3 from the time this transcription factor, which is essential for Treg development and function, first occurs. They show that this approach leads to a change in Treg behaviour resulting in loss of some aspects of regulatory function and development of a Th1-like phenotype by the Foxp3 expressing cells. Effects are in general reversed when HIF-2 is silenced alongside Phd2, and may be amplified by simultaneous silencing of the HIF-1 isoform.

      The findings overlap with those reported following generalised silencing of Phd2 and following adoptive transfer of Treg in which Phd2-silencing is induced (Yamamoto et al., 2019) and are broadly compatible with those reported following a similarly Treg-restricted knockout of the von Hippel-Lindau gene (the recognition component of the E2-ubiquitin ligase that targets HIF-alpha chains that have been modified by Phd2) (Lee et al., 2015) but the results reported also differ significantly from these earlier reports in a number of intriguing respects which I feel warrant further discussion and ultimately investigation.

      The Introduction is in general informative and well written but it is a shame that it does not contain more discussion of the current state of knowledge of the interplay between HIF signalling and Treg function. This would provide a platform for a more detailed and scholarly discussion of the similarities and differences between this work and existing literature in the Discussion, where existing papers are currently described rather briefly. The introduction contains the statement 'Further complexity in this pathway has been provided by the identification of additional, non-HIF-related, PHD substrates, suggesting a role of proline hydroxylation in other settings requiring oxygen-dependent regulation', citing a single reference. This does not really represent the complex balance of arguments across the literature about non-HIF substrates for the HIF hydroxylase enzymes.

      The conclusions of this paper are mostly well supported by data, but some aspects need to be clarified and extended.

      We sincerely apologize for our apparent lack of recognition of previous work performed by other colleagues active in this field. We have now modified the Introduction section, to provide a better, yet concise, overview of the current knowledge of hypoxia signalling in regulatory T cell biology.

      A central issue for any conditional knock-out strategy is whether the intended tissue restriction is successfully achieved. The authors acknowledge that some issues have been reported with the Cre-recombinase allele they use. They, however, show the expected restriction to cells of the Treg lineage in two of the lymphoid tissues under investigation (spleen and mesenteric lymph node - Supplementary figure 1b) but do not show similar results for other tissues. Some concerns arise because in Figure 8b YFP (which is expressed alongside the Cre-recombinase) is visible in what appears to be the endothelium of the spleen. Additionally, the spleen sections illustrated show convincing splenomegaly in the Phd2-deficient Treg mice but expansion of the red pulp appears to be at least as prominent as any changes that might have occurred in the white pulp. Furthermore, the gross changes in abdominal appearances described as a 'hemorrhagic abdomen' (Figure 1c) include a more plethoric abdominal wall, prominent intestinal blood vessels and a much darker, and perhaps enlarged, liver compared with the control animal. These appearances might result from increased angiogenesis and / or erythropoiesis, neither of which would be expected to result from Treg lineage restricted Phd2 knockout but are known to occur with Phd2 ablation in other tissues. If there is convincing evidence of haemorrhage it would be nice to see this more obviously displayed macro- or, perhaps better still, microscopically.

      We thank the reviewer for this comment. We have now provided a better description of the haematological status of these mice, in which an elevated haematocrit and increased vascular permeability have been observed (now depicted in supplemental Figure 2). As suggested, we indeed found minimal, yet sizable, expression of the Cre recombinase (as judged by YFP expression) in CD45-negative, non-lymphoid cells in all organs examined (as now depicted in supplemental Figure 9). Finally, none of the organs examined displayed an increased expression of erythropoietin (as judged by a sensitive qPCR assay, data not shown), a likely candidate for the haematological abnormalities observed in these mice. The mechanism underlying the apparent extramedullary erythropoiesis occurring in these mice therefore remains to be established. Notably, however, an additional experiment performed following a suggestion from one of the reviewers (see Figure 3 and our response 23) strongly suggests that PHD2 affects the Treg phenotype in a cell-autonomous fashion. We do, however, acknowledge that the tissue abnormalities preclude any firm conclusion related to the positioning of Tregs within the spleen, and we have therefore deleted this section from the manuscript and adapted our conclusion accordingly.

      Given that the Cre-recombinase allele used is expressed through the endogenous Foxp3 locus which is located on the X-chromosome and thus subject to random inactivation in the cells of females it is important that the sex of animals used in the experiments is specified.

      This has now been done in the Figure legends.

      Experiments show alterations in Phd2-deficient Treg mice compared with control mice in homeostatic proliferation in a lymphopenic environment (Figure 3), the induction of colitis by DSS (Figure 4) and the response to Toxoplasma gondii infection (Figure 4). Given the time courses these effects are likely to be real but interpretation is complicated by the spontaneous effects on the colon of Phd2-deficient Treg mice reported in Figure 1d and e. Given the wide general importance of interferon-gamma in immune / inflammatory responses I am not sure how much weight to place on the observation that concurrent interferon-gamma knockout results in loss of the pro-inflammatory phenotype of the Phd2-deficient Treg mice (Figure S3). No differences are seen in an in vivo model in which inflammation is induced by injection of anti-CD3 antibodies (Figure S2).

      Although the point is well taken, we felt it was important to perform a few experiments to illustrate the specificity of the inflammatory syndrome observed in these mice. We acknowledge the fact that the effect of concurrent loss of interferon-gamma on the phenotype of PHD2ΔTregs could have been anticipated. Additionally, we also think that the fact that these mice retain the same sensitivity to a “Th17-dominated” inflammatory response (also leading to a loss of weight) strengthens one of the messages of the manuscript, i.e. that loss of PHD2 expression affects Treg function in a selective, Th1-oriented fashion.

      An important conceptual difference between the interpretation of results reported here and those reported by Yamamoto et al. is that the 'Phd2-deficient Treg' purified here do not show a change in regulatory function in vitro whereas those used by Yamamoto et al. failed to act normally as regulatory cells. It is unclear whether this is due to differences in the way proliferation was stimulated, the cell purification strategies used (YFP+ in the current work; CD4+;CD25+ in Yamamoto et al.), the silencing of Phd2 (by knockout throughout development here versus through an inducible-shRNA only in mature cells in Yamamoto et al.), some other feature of the experiments (e.g. the use of feeder cells) or whether a difference would be revealed by more extensive titration. The result reported here is somewhat surprising given the presence of a Th1-like immunophenotype in the cells used in these in vitro suppression assays, which at face value might mean that this immunophenotype is not responsible for changes in their regulatory capacity seen in vivo. This may be true, but it is at odds with Bayesian argumentation. It may be a coincidence, but both models in which control Treg and Phd2-deficient Treg behave similarly involve treatment with anti-CD3 antibodies, raising the possibility that these antibodies in some way nullify differences reported with other stimuli, rather than this necessarily being related to the hypothesised difference between Th1 and Th17 responses in the in vivo model.

      We fully agree with the reviewer’s comment, and we were similarly worried that the differences reported in vivo vs in vitro were due to different agonists used. We however attempted to evaluate Treg function in vitro using alternative approaches, including an assay in which allogeneic antigen-presenting cells (including T-cell depleted spleen cells or highly purified dendritic cells) were used as agonists and Interferon-gamma secretion and proliferation as readouts. In another set of experiments, we used in vitro or in vivo derived Th1 cells instead of naïve T cells as responders. In all instances examined to date, PHD2-deficient Tregs displayed an adequate suppressive function in vitro (data not shown).

      Data showing reversal of the Phd2-deficient Treg in vivo phenotype by knockout of HIF-2alpha, but not HIF-1alpha are convincing and support the data of Yamamoto et al. The observation that Treg-specific PHD2-HIF1α double knockout mice were born at sub-mendelian ratios, displayed a marked weight loss during adult life and reduced viability, indicative of a more pronounced pro-inflammatory status is reported but data is not shown. This is certainly of interest and will no doubt receive further attention. The data that Treg-selective HIF1α or HIF2α deficiency does not affect immune homeostasis in naive mice shown in Figure S4 is relevant and compelling. These results are discussed in the context of recent work published by Hsu et al., 2020 which is interesting. Taken together these data highlight the fact that results reported throughout this manuscript arise from a combination of developmental differences with those occurring in the adult animal.

      We thank the reviewer for these positive comments.

      The transcriptomic data presented has not, to date, been made available to reviewers or the public. Importantly, it is reported to show a disconnection between changes in glycolytic gene expression pattern and the immune phenotype. Specifically, whilst loss of Phd2 expression in Treg is associated with alterations in their regulatory function and with induction of glycolytic genes, the change in function, but not the change in glycolytic gene expression, is reversed by simultaneous knockout of HIF-2alpha and conversely the gene expression pattern, but not the change in function, is reversed by simultaneous knockout of HIF-1alpha. This will be of great interest to those working on the hypothesis that the switch between oxidative phosphorylation and glycolysis underlies functional changes in T cells, particularly if the changes in glycolytic gene expression actually convert into changes in glycolytic flux (as observed following HIF-induction in other cell types).

      The transcriptomic data are available to the public on GEO with the code: GSE184581

      The authors propose that a change in CXCR3 expression resulting from a change in STAT1 phosphorylation (but not absolute levels of STAT1) consequent on Phd2- inactivation leads to mal-distribution of Treg (at least in the spleen), and that given the broadly paracrine action of Treg this feature alone might explain the loss of regulatory activity in vivo. This is an intriguing hypothesis based at least in part on associative data rather than a formal proof of causality. Changes in STAT1 phosphorylation following interferon-gamma stimulation are far from 'all-or-nothing' (at the timepoint illustrated many cells have normal pSTAT1 levels even though the mean fluorescence intensity is reduced). Results in Figure 7b show that changes in STAT1 phosphorylation are seen in conventional Foxp3 negative T cells; since Phd2 knockout is restricted to the Treg lineage this change is presumably indirect, raising the possibility that the change seen in Treg is also indirect, rather than truly cell autonomous. Changes in pSTAT1 are acknowledged to affect a huge number of genes / processes so picking any one as the total explanation for any change in behaviour may be an over simplification. The analysis of changes in Treg localisation in the spleen is potentially interesting and may reach the correct conclusion but the methodology used is not clearly explained and in particular it is not clear how splenomegaly / changes in gross splenic architecture have been taken into account.

      We fully agree with the reviewer's comments and have now deleted the final figure of our manuscript dealing with Treg positioning in the spleen. We indeed agree that due to the morphological changes in spleen size and architecture, more detailed work would be required to confirm our initial hypothesis. Unexpectedly, and thanks to a remark from another reviewer, we found that PHD2-deficient Tregs (which are present at high frequencies in the spleen of PHD2ΔTregs mice) are largely outcompeted both in heterozygous PHD-2fl/fl Cre+/- mice (see Figure 3) and upon transfer into WT mice of a 1:1 mix of WT and PHD-2-deficient Tregs, greatly complicating the study of the relative positioning of these cells within lymphoid organs. We do, however, stand by our previous conclusion that STAT1 signaling appears to be affected in PHD2-deficient Tregs. This conclusion is supported not only by the reduced accumulation of pSTAT1 in these cells, as shown in Figure 8, but also by the bioinformatic analysis of transcriptomic data and the confirmation, at the protein level, of the reduced expression of CXCR3, a well-characterized STAT1-dependent chemokine receptor (as shown in Figure 8).

      Overall, this work contains many interesting datasets which need to be taken into account as we build our understanding of the intersection between HIF-signalling and regulatory T cell function, particularly as pharmacological manipulation of HIF signalling may provide a route to immunomodulation through alterations in regulatory T cell function.

      We again thank the reviewer for this positive appreciation of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      The key question addressed by this MEG study is whether speech is represented singly or multiplexed in the human brain in the linguistic hierarchy. The authors used state-of-the-art analyses (multivariate Temporal Response Functions) and probabilistic information-theoretic measures (entropy, surprisal) to test distinct contextual speech processing models at three hierarchical levels. The authors report evidence for the coexistence of local and global predictive speech processing in the linguistic hierarchy.

      The work uses time resolved neuroimaging with state-of-the-art analyses and cognitive (here, linguistic) modeling. The study is very well conducted and draws from very different fields of knowledge in convincing ways. I see one limitation of the current study in that the authors focused on phase-locked responses, and I hope future work could extend to induced activity.

      Overall, the flow in the MS could be streamlined. Some smoothing in the introduction would be helpful to extract the main key messages you wish to convey.

      For instance, in the abstract:

      -Can you explain the two views in a simpler way in the abstract and to a non-linguistic audience? Do you mean to say that classic psycholinguistic models tend to follow a strictly hierarchical integration (analysis only) but an alternative model is hierarchically inferential (analysis by synthesis)?

      -Indicate early on in abstract or intro where the audience is being led with a concise message on how you address the main question. For instance:

      To contrast our working hypotheses A and B, we used a novel information-theoretic modeling approach and associated measures (entropy, surprisal), which make clear predictions on the latency of brain activity in response to speech at three hierarchical contextual levels (sublexical, word and sentence).

      We have revised the Abstract and Introduction to reduce the amount of terminology and add additional explanations. Wherever possible, we now use general terms (“bottom up”, “predictions”, “context”, …) instead of terms associated with specific theories. We hope we found a balance between improving accessibility and retaining the qualities seen by Reviewer 2, who thought the Introduction was clearly written and well connected to the psycholinguistics literature.

      All the models we compare are compatible with an analysis by synthesis approach, as long as the generative models are understood to entail making probabilistic predictions about future input. The generative models in analysis by synthesis, then, are one way in which “to organize internal representations in such a way as to minimize the processing cost of future language input“ (Introduction, first paragraph). We have clarified this in the first paragraph of the Introduction.

      • Why did the authors consider that the evoked response is the proper signal to assess as opposed to oscillatory (or non phase-locked) activity?

      The primary reason for our choice of dependent measure is the prior research we based our design on, showing that the linguistic entropy and surprisal effects are measurable in phase-locked responses (Brodbeck et al., 2018; Donhauser and Baillet, 2020). We have made this more explicit in part of the Introduction where we introduce our approach (“To achieve this, we analyzed …”).

      As to oscillatory dependent measures, we consider them an interesting but parallel research question. We are not aware of specific corresponding effects in non-phase locked activity. Accordingly, analyzing oscillatory responses without a clear prior hypothesis would require additional decisions, such as which bands to analyze, which would entail issues of multiple comparison. An additional caveat is that the temporal resolution of oscillatory activity is often lower than that of phase-locked activity, which might potentially make it harder to distinguish responses based on their latency as we did here, to test whether the latency of different context models differ.

      • Parallel processing with different levels of context (hence temporal granularities) sounds compatible with temporal multiplexing of speech representation proposed by Giraud & Poeppel (2012) or do the authors consider it a separate issue?

      We consider our investigation orthogonal to the model discussed by G&P (2012). G&P’s model is about the organization of acoustic information at different time-scales, and does not discuss the influence of linguistic constructs at the word level and above. On the other hand, the information-theoretic models that form the basis of our analysis track the linguistic information that can be extracted from the acoustic signal. The temporal scales invoked by G&P’s model are also different from the ones used here, defined based on acoustic vs. linguistic units. Thus, the kind of neural entrainment as a mechanism for speech processing hypothesized by G&P is fully compatible with our account, but not at all required by it.

      Methods:

      • Figure 2: please spell out TRFs and clarify the measured response

      We have done both in the Figure legend.

      • The sample size (N=12) is very low by today's standards but the statistical granularity is that of the full MEG recording. Can a power estimate be provided, or a clear justification of the reliability of the statistical measures be described?

      We appreciate and share the reviewers’ concern with statistical power and have made several modifications to better explain and rationalize our choices.

      First, to contextualize our study: The sample size is similar to the most comparable published study, which had 11 participants (Donhauser and Baillet, 2020). Our own previous study (Brodbeck et al., 2018) had more participants (28) but only a fraction of the data per subject (8 minutes of speech in quiet, vs. 47 minutes in the present dataset). We added this consideration to the Methods/Participants section.

      We also added a table with effect sizes for all the main predictors to make that information more accessible (Table 1). This suggests that the most relevant effects have Cohen’s d > 1. With our sample size of 12, we had 94% power to detect an effect with d = 1, and 99% power to detect an effect with d = 1.2. This post-hoc analysis suggests that our sample was adequately powered for the intended purpose.
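
      The power figures quoted above can be roughly checked by simulation. The sketch below is not the authors' calculation: it assumes a one-sample (paired) t-test with a one-tailed alpha of 0.05, which the response does not state, and uses the tabulated critical value 1.796 for 11 degrees of freedom. The function and parameter names are illustrative.

```python
import math
import random

# Tabulated one-tailed critical t value for alpha = 0.05 and df = 11.
# The choice of a one-tailed test is an assumption, not taken from the response.
T_CRIT_DF11 = 1.796


def simulated_power(d, n=12, t_crit=T_CRIT_DF11, n_sim=20000, seed=0):
    """Estimate power by drawing n unit-variance normal samples with mean d
    and counting how often the one-sample t statistic exceeds t_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sample = [rng.gauss(d, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t_stat = mean / math.sqrt(var / n)
        if t_stat > t_crit:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for effect_size in (1.0, 1.2):
        print(effect_size, simulated_power(effect_size))
```

      Under these assumptions the estimates should fall close to the values reported above; a two-tailed test would give somewhat lower power for the same effect size.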

      Finally, all crucial model comparisons are accompanied by swarm-plots that show each subject as a separate dot, thus showing that these comparisons are highly reproducible across participants (note that there rarely are participants with model difference below 0, indicating that the effects are all seen in most subjects).

      • The inclusion of a left-handed participant in speech studies is unusual; please comment on any difference (or lack thereof) for this participant, notably in the lateralization tests.

      We agree that this warrants further comment, in particular given our lateralization findings. We have made several changes to address this concern. At the same time we hope that the reviewers agree with us that, with proper care, inclusion of left-handed participants is desirable (Willems et al., 2014), and indeed is becoming more mainstream, at least for studies of naturalistic language processing (e.g. Shain et al., 2020). First, we now draw attention to the presence of a left-hander where we introduce our sample (first paragraph of the Results section). Second, we repeated all tests of lateralization while excluding the left-hander. Because this did not change any of the conclusions, we decided to keep reporting results for the whole sample. Third, we now mark the left-handed participant in all plots that include single-subject estimates and in the corresponding source data files. Overall, the left-hander indeed shows stronger right-lateralization than the average participant, but is by no means an outlier.

      • The authors state that eyes were kept open or closed. This is again unusual as we know that eye closure affects not only the degree of concentration/fatigue but also directly impacts alpha activity (which in turn affects evoked responses (1-40 Hz then 20 Hz) that are being estimated here). Please explain.

      Previous comparable studies variably asked subjects to keep their eyes closed (e.g. Brodbeck et al., 2018) or open (e.g. Donhauser and Baillet, 2020). Both modes have advantages and disadvantages, none of which are prohibitive for our target analysis (ocular artifacts were removed with ICA and oscillatory alpha activity should, on average, be orthogonal to time-locked responses to the variables of interest). Importantly however, both modes have subjective disadvantages when enforced: deliberately keeping eyes open can lead to eye strain and excessive blinking, whereas closing eyes can exacerbate sleepiness. For this reason we wanted to allow subjects to self-regulate to optimize the performance on the aspects of the task that mattered – processing meaning in the audiobook. We extended the corresponding Methods section to explain this.

      • It would be helpful to clarify the final temporal granularity of analysis. The TRFs time courses are said to be resampled to 1kHz (p22) but MEG time courses are said to be resampled at 100 Hz (p18).

      Thanks for noting this. We clarified in the TRF time-course section: the deconvolution analysis was performed at 100 Hz, and TRFs were then resampled to 1 kHz for visualization and fine-grained peak analysis.

      • The % of variance explained by acoustic attributes is 15- to 20-fold larger than that explained by the linguistic models of interest. Can an SNR measure be evaluated on such observations?

      We appreciate this concern, which is indeed reasonable. In order to better clarify this issue we have added a new paragraph, right after Table 1. In brief, since the statistical analysis looks for generality across subjects, the raw % explained values do not directly speak to the SNR or effect size. Rather, the SNR concerns how much variability is in this value across subjects. The individual subject values in Figure 3-B, and effect sizes now reported in Table 1, show that even though the % variability that is uniquely attributable to information-theoretic quantities is small, it is consistently larger than 0 across subjects.
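
      The distinction drawn here between the raw size of an effect and its consistency across subjects can be illustrated with a one-sample Cohen's d (mean divided by the standard deviation across subjects). The numbers below are hypothetical and are not taken from the study; they only show that a small but consistent gain can yield a large d, while a larger but variable gain yields a small one.

```python
import math


def cohens_d_one_sample(values):
    """One-sample Cohen's d: mean of the per-subject values divided by their
    sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var)


# Hypothetical per-subject gains in % variability explained (12 subjects).
small_but_consistent = [0.04, 0.06, 0.05, 0.03, 0.07, 0.05,
                        0.04, 0.06, 0.05, 0.02, 0.08, 0.05]
large_but_variable = [5.0, -4.0, 9.0, -6.0, 3.0, -2.0,
                      8.0, -7.0, 4.0, -1.0, 6.0, -3.0]

if __name__ == "__main__":
    print("consistent:", cohens_d_one_sample(small_but_consistent))
    print("variable:  ", cohens_d_one_sample(large_but_variable))
```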

      Results and Figures:

      • The current figures do not give enough credit to the depth of analysis being presented. I understand that this is typical for such mTRF approaches, but given the level of abstraction being evaluated in the linguistic inputs, it may be helpful to show an example of what to expect for low vs. high surprisal, for instance from the modeling perspective and over time. For instance, could Figure 1 already illustrate distinct predictions of the local vs. global models?

      Thank you for pointing out this gap. We have added two figures to make the results more approachable:

      First, in Figure 3 we now show an example stimulus excerpt with all predictors we used. This makes the complete set of predictors quickly apparent without readers having to collect the information from the different places in the manuscript. It also gives a better sense of the detail that is modeled in the different stimulus representations. Second, we added Figure 6 to show example predictions from the different context models, and explain better how the mTRF approach can decompose brain responses into components related to different stimulus properties.
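
      The decomposition referred to here rests on the standard linear mTRF forward model, in which the predicted response is the sum of each predictor time series convolved with its own response function. The following is a generic sketch of that idea with hypothetical toy values; it is not the authors' pipeline, and the predictor names and kernel weights are invented for illustration.

```python
def predict_response(predictors, trfs):
    """Forward mTRF model: sum over predictors of the predictor time series
    convolved with its kernel (causal lags 0 .. len(kernel) - 1)."""
    n_times = len(predictors[0])
    predicted = [0.0] * n_times
    for x, kernel in zip(predictors, trfs):
        for t in range(n_times):
            for lag, weight in enumerate(kernel):
                if t - lag >= 0:
                    predicted[t] += weight * x[t - lag]
    return predicted


# Hypothetical impulse-coded predictors: non-zero only at phoneme onsets.
surprisal = [0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
entropy = [0.0, 0.5, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0]

# Hypothetical response functions (one weight per lag sample).
trf_surprisal = [0.0, 1.0, 0.5]
trf_entropy = [0.0, 0.0, -1.0]

response = predict_response([surprisal, entropy], [trf_surprisal, trf_entropy])

if __name__ == "__main__":
    print(response)
```

      Because the model is additive, the contribution of a single predictor can be inspected by calling `predict_response` with only that predictor and its kernel.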

      • Why are visual cortices highlighted in figures?

      Those were darkened to indicate that they are excluded from the analysis. We have added a corresponding explanation to the legend of Figure 3.

      • Figure 2 Fig 2A and B: can the authors quantitatively illustrate "5-gram generally leads to a reduction of word surprisal but its magnitude varies substantially between words" by simply showing the mean surprisal and its variance?

      Added to the Figure legend.

      Fig 2C: please explain the term "partial response"; please indicate for non M/EEGers what the arrow symbolizes.

      Added to the Figure legend.

      • Figure 3:

      p8: the authors state controlling for the "acoustic features" but do not clearly describe how in the methods and this control comes as a (positive) surprise but still a bit unexpected at first read. Perhaps include the two acoustic features in Fig2C and provide a short couple sentences on how these could impair or confound mTRF performance.

      We thank you for pointing out this lack of explanation. We have added a description of all the control predictors to the end of the Introduction, right after explaining the predictors of main interest. We have also added Figure 3 to give an example and make the nature of all the controls explicit.

      Has the same analysis been conducted on a control region a priori not implicated in linguistic processing? This would help to corroborate the current results.

      The analysis has been performed on the whole brain (excluding the insula and the occipital lobe). Figure 4 (previously Figure 3) shows that generally only regions in the temporal lobe exhibit significant contributions from the linguistic models (allowing for some dispersion associated with MEG source localization). Although this is not shown in the figure, regions further away from the significant region generally exhibit a decrease in prediction accuracy from adding linguistic predictors, as is commonly seen with cross-validation when models overfit to irrelevant predictors.

      Fig 3B-C-E: please clearly indicate what single dot or "individual value" represents. Is this average over the full ROI? Was the orientation fixed? Can some measure of variability be provided?

      Explanation of individual dots added to Figure 4-B legend (formerly 3-B). Fixed orientation added to the methods summary in the Figure 2-C legend. To provide more detailed statistics including a measure of variability we added Table 1.

      Fig3E: make bigger / more readable (too many colors: significance bars could be black)

      We have increased the size and made the significance bars black.

      • Figure 4: having to go to the next Fig (Fig5) to understand the time windows is inconvenient and difficult to follow. Please find a workaround or combine the two figures. From which ROI are the time series extracted?

      We have combined the two figures to facilitate comparison, and have added a brief explanation of the ROI to the figure legend.

      Reviewer #3 (Public Review):

      This manuscript presents a neurophysiological investigation of the hierarchical nature of prediction in natural speech comprehension. The authors record MEG data to speech from an audiobook. And they model that MEG using a number of different speech representations in order to explore how context affects the encoding of that speech. In particular, they are interested in testing how the response to phonemes is affected by context at three different levels: sublexical - how the probability of an upcoming phoneme is constrained by previous phonemes; word - how the probability of an upcoming phoneme is affected by its being part of an individual word; sentence - how the probability of an upcoming phoneme is affected by the longer-range context of the speech content. Moreover, the authors are interested in exploring how effects at these different levels might contribute - independently - to explaining the MEG data. In doing so, they argue for parallel contributions to predictive processing from both long-range context and more local context. The authors discuss how this has important implications for how we understand the computational principles underlying natural speech perception, and how it can potentially explain a number of interesting phenomena from the literature.

      Overall, I thought this was a very well written and very interesting manuscript. I thought the authors did a really superb job, in general, of describing their questions against the previous literature, and of discussing their results in the context of that literature. I also thought, in general, that the methods and results were well explained. I have a few comments and queries for the authors too, however, most of which are relatively minor.

      Main comments: 1) One concern I had was about the fact that context effects are estimated using 5-gram models. I appreciate the computational cost involved in modeling more context. But, at the same time, I worry a little that examining the previous 4 phonemes or (especially) words is simply not enough to capture longer-term dependencies that surely exist. The reason I am concerned about this is that the sentence level context you are incorporating here is surely suboptimal. As such, could it be the case that the more local models are performing as well as they are simply because the sentence level context has not been modeled as well as it should be? I appreciate the temporal and spatial patterns appear to differ for the sentence level relative to the other two, so that is good support for the idea that they are genuinely capturing different things. However, I think some discussion of the potential shortcomings of only including 4 tokens of context is worth adding. Particularly when you make strong claims like that on line 252.

      We strongly agree with the reviewer that the 5-gram model is not the ultimate model of human context representations. We have added a section to acknowledge this (Limitations of the sentence context model).

      While we see much potential for future work to investigate context processing by using more advanced language models, a preliminary investigation suggests that it might not be trivial. We compared the ability of a pre-trained LSTM (Gulordava et al., 2018) to predict the brain response to words in our dataset with that of the 5-gram model. The LSTM performed substantially worse than the 5-gram model. An important difference between the two models is that our 5-gram model was trained on the Corpus of Contemporary American English (COCA), whereas the LSTM was trained on Wikipedia. COCA provides a large and highly realistic sample of English, whereas the language in Wikipedia might be a more idiosyncratic subsample. Thus, the LSTM might be worse just because it has been trained on a less representative sample of English. As an initial step we thus ought to train the LSTM on the superior COCA database, but this simple step alone would already be associated with a substantial computational cost, given the size of COCA at more than a billion words (we estimated 3 weeks on 32 GPUs in a computing cluster). Furthermore, while we acknowledge the limitations of the 5-gram model, we consider it very unlikely that its limitations are the reason that the more local models are performing well. In general, as more context is considered, the model’s predictions should become more different from the local model, i.e., a more sophisticated model should be less correlated with the local models, and should thus allow the local models to perform even better.
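
      The limitation under discussion, that an n-gram model conditions only on the previous n - 1 tokens, can be made concrete with a toy count-based model. The sketch below is unsmoothed and uses a bigram on a hypothetical nine-word corpus purely for brevity; it is not the COCA-trained 5-gram model described above, and all names are illustrative.

```python
import math
from collections import Counter


def train_ngram(tokens, n):
    """Count n-grams and their (n - 1)-token contexts in a token list."""
    ngram_counts = Counter()
    context_counts = Counter()
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        ngram_counts[context + (tokens[i],)] += 1
        context_counts[context] += 1
    return ngram_counts, context_counts


def ngram_surprisal(token, history, ngram_counts, context_counts, n):
    """Surprisal in bits of `token`, given only the last n - 1 tokens of history."""
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    count = ngram_counts[context + (token,)]
    if count == 0:
        return math.inf  # unsmoothed: unseen continuations get zero probability
    return -math.log2(count / context_counts[context])


# Hypothetical toy corpus.
corpus = "the cat sat on the mat the cat ate".split()
counts, contexts = train_ngram(corpus, n=2)

if __name__ == "__main__":
    # Anything earlier than the last n - 1 tokens is ignored by construction.
    print(ngram_surprisal("sat", ["the", "cat"], counts, contexts, n=2))
    print(ngram_surprisal("sat", ["a", "hungry", "cat"], counts, contexts, n=2))
```

      Both calls return the same value because the earlier words never enter the context, which is the sense in which a fixed-order model cannot capture longer-range dependencies.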

      2) I found myself confused about what exactly was being modeled on my first reading of pages 4 through 7. I realized then that all of the models are based on estimating a probability distribution based on phonemes (stated on line 167). I think why I found it so confusing was that the previous section talked about using word forms and phonemes as units of representation (lines 118-119; Fig 2A), and I failed to grasp that, in fact, you were not going to be modeling surprisal or entropy at the word level, but always at the phoneme level (just with different context). Anyway, I just thought I would flag that as other readers might also find themselves thinking in one direction as they read pages 4 and 5, only to find themselves confused further down.

      Thank you for pointing out this ambiguity; we now make it explicit that “all our predictors reflect information-theoretic quantities at the rate of phonemes” early on in the Expressing the use of context through information theory section.

      3) I also thought some of the formal explanations of surprisal and entropy on lines 610-617 would be valuable if added to the first paragraph on page 6, which, at the moment, is really quite abstract and not as digestible as it could be, particularly for entropy.

      We appreciate that this needs to be much clearer for readers with different backgrounds. As suggested, we have added the formal definition to the Introduction, and we now also point readers explicitly to the Methods subsection that explains these definitions in more detail.
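
      As a rough illustration of the two quantities discussed here, the sketch below computes surprisal and entropy from a probability distribution over the next phoneme. It assumes the standard Shannon definitions in bits; the distribution, phoneme labels, and function names are hypothetical and are not taken from the manuscript.

```python
import math


def surprisal(p):
    """Surprisal (in bits) of an event that was assigned probability p."""
    return -math.log2(p)


def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical distribution over the next phoneme given the context so far.
next_phoneme = {"k": 0.5, "t": 0.25, "s": 0.125, "m": 0.125}

print(surprisal(next_phoneme["t"]))  # 2.0
print(entropy(next_phoneme))         # 1.75
```

      Surprisal depends on the phoneme that actually occurs, whereas entropy summarizes the uncertainty of the whole distribution before the phoneme is heard. Changing the context model changes the distribution, and therefore both quantities.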

      4) I like the analysis examining the possibility of tradeoffs between context models. I wonder might such tradeoffs exist as conversational environments vary - if the complexity of the speech varies and/or listening conditions vary might there be more reliance on local vs global context then. If that seems plausible, then it might be worth adding a caveat that you found no evidence for any tradeoff, but that your experiment was pretty homogenous in terms of speech content.

      Thank you for this suggestion. We added this idea to the Discussion in the Implications for speech processing section.

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Carrasquilla and colleagues applied Mendelian Randomization (MR) techniques to study causal relationship of physical activity and obesity. Their results support the causal effects of physical activity on obesity, and bi-directional causal effects of sedentary time and obesity. One strength of this work is the use of CAUSE, a recently developed MR method that is robust to common violations of MR assumptions. The conclusion reached could potentially have a large impact on an important public health problem.

      Major comments:

      (1) While the effect of physical activity on obesity is in line with earlier studies, the finding that BMI has a causal effect on sedentary time is somewhat unexpected. In particular, the authors found this effect only with CAUSE, but the evidence from other MR methods does not reach the statistical significance cutoff. The strength of CAUSE is more about the control of false positives than about high power. In general, the power of CAUSE is lower than that of the simple IVW method. This is also the case in this setting, of high power of exposure (BMI) but lower power of outcome (sedentary time) - see Fig. 2B of the CAUSE paper.

      It does not necessarily mean that the results are wrong. It's possible, for example, that by better modeling pleiotropic effects, CAUSE better captures the causal effects and has higher power. Nevertheless, it would be helpful to better understand why CAUSE gives high statistical significance while the other methods do not. Two suggestions here:

      (a) It is useful to visualize the MR analysis with a scatter plot of the effect sizes of variants on the exposure (BMI) and outcome (sedentary time). In the plot, the variants can be colored by their contribution to the CAUSE statistics, see Fig. 4 of the CAUSE paper. This plot would help show, for example, whether there are outlier variants, or whether the results are largely driven by just a small number of variants.

      We agree and have now added a scatter plot of the expected log pointwise posterior density (ELPD) contributions of each variant to BMI and sedentary time, and the contributions of the variants to selecting either the causal model or the shared model (Figure 2-figure supplement 1 panel A). We identified one clear outlier variant (red circle) that we thus decided to remove before re-running the CAUSE analysis (panel B). We found that the causal effect of BMI on sedentary time remained of similar magnitude before and after the removal of this outlier variant (beta=0.13, P=6x10-4 and beta=0.13, P=3x10-5, respectively) (Supplementary File 1 and 2).

      We have added a paragraph in the Results section to describe these new findings:

      Lines 204-210: “We checked for outlier variants by producing a scatter plot of expected log pointwise posterior density (ELPD) contributions of the variants to BMI and sedentary time (Supplementary File 1), identifying one clear outlier variant (rs6567160 in MC4R gene) (Figure 2, Appendix 1—figure 2). However, the causal effect of BMI on sedentary time remained consistent even after removing this outlier variant from the CAUSE analysis (Supplementary File 1 and 2).”

      (b) CAUSE is susceptible to false positives when the value of q, a measure of the proportion of shared variants, is high. The authors stated that q is about 0.2, which is pretty small. However, it is unclear if this is q under the causal model or the sharing model. If q is small under the sharing model, the result would be quite convincing. This needs to be clarified.

      We thank the reviewer for a very relevant question. We have now clarified in the manuscript that all of the reported q values (~0.2) were under the causal model (lines 202-203). We applied the strict parameters for the priors in CAUSE in all of our analyses, which leads to high shared model q values (q=0.7-0.9). To examine whether our bidirectional causal findings for BMI and sedentary time may represent false positive results, we performed a further analysis to identify and exclude outlier variants, as described in our response to Question 7. That is, we produced a scatter plot of expected log pointwise posterior density (ELPD) contributions of each variant to BMI and sedentary time, and the contributions of the variants to selecting either the causal model or the shared model (Supplementary Figure 2 panel A, shown above). We identified one clear outlier variant (red circle) that we thus removed (panel B), but the magnitude of the causal estimates was not affected by the exclusion of the variant (Supplementary File 1 and 2).

      (2) Given the concern above, it may be helpful to strengthen the results using an additional strategy. Note that the biggest worry with the BMI-sedentary time relation is that the two traits are both affected by an unobserved heritable factor. This hidden factor likely affects some behavior component, so it most likely acts through the brain. On the other hand, BMI may involve multiple tissue types, e.g. adipose. So the idea is: suppose we can partition BMI variants into different tissues, those acting via brain or via adipose, say; then we can test MR using only BMI variants in a certain tissue. If there is a causal effect of BMI on sedentary time, we expect to see similar results from MR with different tissues. If the two are affected by the hidden factor, then the MR analysis using BMI variants acting in adipose would not show significant results.

      While I think this strategy is feasible conceptually, I realize that it may be difficult to implement. BMI heritability was found to be primarily enriched in brain regulatory elements [PMID:29632380], so even if there are other tissue components, their contribution may be small. One paper does report that BMI is enriched in CD19 cells [PMID: 28892062], though. A second challenge is to figure out the tissue of origin of GWAS variants. This probably requires fine-mapping analysis to pinpoint causal variants, and overlap with tissue-specific enhancer maps, not a small task. So I'd strongly encourage the authors to pursue some analysis along this line, but it would be understandable if the results of this analysis are negative.

      We thank the reviewer for a very interesting point to address. We cannot exclude the possibility of an unobserved heritable factor acting through the brain, and tissue-specific MR analyses would be one possible way to investigate this possibility. However, we agree with the reviewer that partitioning BMI variants into different tissues is not currently feasible as the causal tissues and cell types of the GWAS variants are not known. Nevertheless, we have now implemented a new analysis where we tried to stratify genetic variants into “brain-enriched” and “adipose tissue-enriched” groups, using a simple method based on the genetic variants’ effect sizes on BMI and body fat percentage.

      Our rationale for stratifying variants by comparing their effect sizes on BMI and body fat percentage is the following:

      BMI is calculated based on body weight and height (kg/m2) and it thus does not distinguish between body fat mass and body lean mass. Body fat percentage is calculated by dividing body fat mass by body weight (fat mass / weight * 100%) and it thus distinguishes body fat mass from body lean mass. Thus, higher BMI may reflect both increased fat mass and increased lean mass, whereas higher body fat percentage reflects that fat mass has increased more than lean mass.

      If a genetic variant influences BMI through the CNS control of energy balance, its effect on body fat mass and body lean mass would be expected to follow the usual correlation between the traits in the population, where higher fat mass is strongly correlated with higher lean mass. In such a scenario, the variant would show a larger standardized effect size on BMI than on body fat percentage. If a genetic variant more specifically affects adipose tissue, the variant would be expected to have a more specific effect on fat mass and less effect on lean mass. In such a scenario, the variant would show a larger standardized effect size on body fat percentage than on BMI.

      We therefore stratified BMI variants into brain-specific and adipose tissue-specific variants by comparing their standardized effect sizes on BMI and body fat percentage. Of the 12,790 variants included in the BMI-sedentary time CAUSE analysis, 12,266 had stronger effects on BMI than on body fat percentage and were thus classified as “brain-specific”. The remaining 524 variants had stronger effects on body fat percentage than on BMI (“adipose tissue-specific”). To assess whether the stratification of the variants led to biologically meaningful groups, we performed DEPICT tissue-enrichment analyses. The analyses showed that the genes expressed near the “brain-specific” variants were enriched in the CNS (figure below, panel A), whereas the genes expressed near the “adipose tissue-specific” variants did not reach significant enrichment in any tissue, but showed the strongest evidence of being linked to adipocytes and adipose tissue (figure below, panel B).
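
      The stratification rule described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the variant identifiers and effect sizes are made up, and both the comparison of absolute standardized effects and the assignment of ties to the adipose group are assumptions.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def body_fat_percentage(fat_mass_kg, weight_kg):
    """Body fat mass as a percentage of body weight."""
    return fat_mass_kg / weight_kg * 100


def stratify(variants):
    """Split variants by which trait shows the larger standardized effect.

    Assumption: magnitudes are compared as absolute values, and ties are
    assigned to the adipose group.
    """
    brain_like, adipose_like = [], []
    for v in variants:
        if abs(v["beta_bmi"]) > abs(v["beta_bfp"]):
            brain_like.append(v["id"])
        else:
            adipose_like.append(v["id"])
    return brain_like, adipose_like


# Hypothetical variants with made-up standardized effect sizes.
variants = [
    {"id": "variant_a", "beta_bmi": 0.030, "beta_bfp": 0.012},
    {"id": "variant_b", "beta_bmi": 0.010, "beta_bfp": 0.025},
    {"id": "variant_c", "beta_bmi": -0.020, "beta_bfp": -0.008},
]

brain_like, adipose_like = stratify(variants)
print(brain_like)    # ['variant_a', 'variant_c']
print(adipose_like)  # ['variant_b']
```

      The two helper functions restate the trait definitions given earlier; the comparison is only meaningful when both effect sizes are expressed on a standardized scale.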

      Figure legend: DEPICT cell, tissue and system enrichment bar plots for BMI-sedentary time analysis.

      Having established that the two groups of genetic variants likely represent tissue-specific groups, we re-estimated the causal relationship between BMI and sedentary time using CAUSE, separately for the two groups of variants. We found that the 12,266 “brain-specific” genetic variants showed a significant causal effect on sedentary time (P=0.003), but the effect was attenuated compared to the CAUSE analysis where all 12,790 variants (i.e. also including the 524 “adipose tissue-specific” variants) were included in the analysis (P=6.3x10-4). The statistical power was much more limited for the “adipose tissue-specific” variants, and we did not find a statistically significant causal relationship between BMI and sedentary time using the 524 “adipose tissue-specific” variants only (P=0.19). However, the direction of the effect suggested the possibility of a causal effect if a stronger genetic instrument were available. Taken together, our analyses suggest that both brain-enriched and adipose tissue-enriched genetic variants are likely to show a causal relationship between BMI and sedentary time, which would suggest that the causal relationship between BMI and sedentary time is unlikely to be driven by an unobserved heritable factor.

      Minor comments

      The term "causally associated" is confusing, e.g. in l32. If it's causal, then use the term "causal".

      We have now changed the term “causally associated” to “causal” throughout the manuscript.

      Reviewer #3 (Public Review):

      Given previous reports of an observational relationship between physical inactivity and obesity, Carrasquilla and colleagues aimed to investigate the causal relationship between these traits and establish the direction of effect using Mendelian Randomization. In doing so, the authors report strong evidence of a bidirectional causal relationship between sedentary time and BMI, where genetic liability for longer sedentary time increases BMI, and genetic liability for higher BMI causally increases sedentary time. The authors also give evidence of higher moderate and vigorous physical activity causally reducing BMI. However they do note that in the reverse direction there was evidence of horizontal pleiotropy where higher BMI causally influences lower levels of physical activity through alternative pathways.

      The authors have used a number of methods to investigate and address potential limiting factors of the study. A major strength of the study is the use of the CAUSE method. This allowed the authors to investigate all exposures of interest, in spite of a low number of suitable genetic instruments (associated SNPs with P-value < 5E-08) being available, which may not have been possible with the use of the more conventional MR methods alone. The authors were also able to overcome sample overlap with this method, and hence obtain strong causal estimates for the study. The authors have compared causal estimates obtained from other MR methods including IVW, MR Egger, the weighted median and weighted mode methods. In doing so, they were able to demonstrate consistent directions of effects for most causal estimates when comparing with those obtained from the CAUSE method. This helps to increase confidence in the results obtained and supports the conclusions made. This study is limited in the fact that the findings are not generalizable across different age-groups or populations - although the authors do state that similar results have been found in childhood studies. As the authors also make reference to, due to the nature of the BMI genetic instruments used, the findings of this study can only inform on the lifetime impact of higher BMI, and not the effect of a short-term intervention.
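
      For readers less familiar with the conventional methods named above, a fixed-effect inverse-variance weighted (IVW) estimate can be sketched as below. The sketch assumes first-order weights based on the outcome standard errors and uses made-up per-variant summary statistics; the function and variable names are illustrative and this is not the authors' analysis code.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Equivalent to a weighted average of the per-variant ratio estimates
    (beta_outcome / beta_exposure) with weights beta_exposure^2 / se_outcome^2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1 / denominator)


# Hypothetical summary statistics for three variants.
beta_exposure = [0.1, 0.2, 0.3]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.01, 0.01, 0.01]

estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(estimate, standard_error)
```

      In this toy input every variant has the same outcome-to-exposure ratio, so the pooled estimate matches that ratio; real data would show scatter around it.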

      The findings of this study will be of interest to those in the field of public health, and support current guidelines for the management of obesity.

      We thank the Reviewer for the valuable feedback and insights. We agree that the lack of generalizability of the findings across age groups and populations is an important limitation. We have now mentioned this in lines 341-342 of the manuscript:

      “The present study is also limited in the fact that the findings are not generalizable across different age-groups or populations.”

    1. Author Response

      Reviewer #2 (Public Review):

      This paper combines neuroimaging, behavioral experiments, and computational modeling to argue that (a) there is a network of brain areas that represent physical stability, (b) these areas do so in a way that generalizes across many kinds of instability (e.g., not only a tower of blocks about to fall over, but also a person about to fall off a ladder), and (c) that this supports a simulation account of physical reasoning, rather than one based on feedforward processing; this last claim arises through a comparison of humans to CNNs, which do an OK job classifying physical instability but not in a way that transfers across these different stability classes. In my opinion, this is a lovely contribution to the literatures on both intuitive physical reasoning and (un)humanlike machine vision. At the same time, I wasn't sure that the broader conclusions followed from the data in the way the authors preferred, and I also had some concerns about some of the methodological choices made here.

      1. The following framing puzzled me a bit, and even seemed to raise an unaddressed confound in the paper: "Here we investigate how the brain makes the most basic prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future".

      Consider the following minor worry, which sets up a more major one: This framing, which connects 'stability' to 'change' and which continues throughout the paper, seems to equivocate on the notion of 'stability'. One meaning of 'stable' is, roughly, 'unchanging'. Another meaning is 'unlikely to fall over'. The above quotation, along with others like it, makes it seem like the authors are investigating the former, since that's the only meaning that makes this quotation make sense. But in fact the experiments are about the latter -- towers falling down, people falling off ladders, etc. But these aren't the same thing! So there's a bit of wordplay happening here, it seemed to me.

      This sets up the more serious worry. As this framing reveals, unstable scenes (in the likely-to-fall-over sense) are, by their nature, scenes where something is likely to change. In that case, how do we know that the brain areas this project has identified aren't representing 'likeliness to change', rather than physical stability? There are, of course, many objects and scenes that might be highly likely to change without being at all physically unstable. Even the first example in the paper ("a dog about to give chase") is about likely changes without any physical instability. But isn't this a confound? All of the examples of physical instability explored here also involve likeliness to change! So these could be 'likely to change' brain areas, not 'physically unstable' brain areas. Right? Or if not, what am I missing?

      The caption of Figure 1 seems to get at this a bit, but in a way I admit I just found a bit confusing. If authors do after all intend "physically unstable" to mean "likely to change", then many classes of scenarios that are unexplored here seem like they would be relevant: a line of sprinters about to dash off in a race, someone about to turn off all the lights in a home, a spectacular chemical reaction about to start, etc. But the authors don't intend those scenarios to fall under the current project, right?

      The reviewer is correct that "stability" has (at least) these two different meanings, and also correct that we are investigating here the situation in which a configuration is not changing now but would be likely to change with just the slightest perturbation. Our hypothesis is that the “Physics Network” will be sensitive to the likelihood that a physical configuration will change for physical (not social) reasons. That is what our data show: we do not find the same univariate and multivariate effects for situations that are likely to change because of the behavior of an animal. This indicates that what we are decoding is not general ‘likeliness to change’ but rather physical instability in particular.

      (Also: Is stability really 'the most basic prediction' we make about the world? Who is to say that stable vs. unstable is a more basic judgment than, say, present vs. absent, or expected vs. unexpected, or safe vs. unsafe, etc? I know this is mostly just trying to get the reader excited about the results, but I stumbled there.)

      We have now modified the sentence to say: “…how the brain makes a fundamental prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future.”

      2. Laying out these issues in terms of feedforward processing vs. simulation felt a bit misleading and/or unfair to those views, given the substance of what this paper is actually doing. In particular, the feedforward view ends up getting assimilated to "what CNNs do"; but these are completely different hypotheses (or at least can be). Note, for example, that many vision researchers who don't think CNNs are good models of human vision nevertheless do think that lots of what human vision does is feedforward; that view could only be coherent if there are kinds of feedforward processing that are un-CNN-like. It would be better not to conflate these two and just say that the pattern of results rules out CNN-like feedforward processing without ruling out feedforward processing in general.

      This is a fair point, and we certainly agree that we cannot rule out all feedforward models. We have tried to be clear about this claim, e.g., here in the Discussion: “Three lines of evidence from the present study indicate that pattern recognition alone – as instantiated in feedforward CNNs and the ventral visual pathway – is unlikely to explain physical inference in humans, at least for the case of physical stability.”

      3a. I wasn't sure how impressed to be by the fact that, say, 60% classification accuracy on one class of stable/unstable scenes doesn't lead to above-chance performance on another class of stable/unstable scenes. Put differently, it seems that the CNNs simply didn't do a great job classifying physical stability in the first place; in that case, how general should we expect their representations to be anyway? Now, on one hand, I could see this worry only further supporting the authors' case, since you could think of this as all the more evidence that CNNs won't have representations of stability in them. But since (a) the claims the authors are making are about feedforward processing in principle, not just in one or two CNNs, and (b) the purpose of this paper is to explore the issue of generality per se, rather than just stability, this seems inadequate. It could be that a CNN that does achieve high accuracy on physical stability judgments (90%?) would actually show this kind of general transfer; but we don't know that from the data presented here, because it's possible that the lack of generality arises from poor performance to begin with.

      You are correct in noting that CNNs don’t do a great job in classifying physical stability, which reinforces our point that pattern recognition systems are not very good at discerning physical stability. In fact, the classification accuracy that we have reported is close to the baseline performance in the literature (Lerer et al 2016). Interestingly, training on the block tower dataset itself could only bring the stability classification accuracy up to 68.8% on the real-world block tower images. While this is true of the current best model of stability detection, we think that CNNs trained on large-scale datasets of stability under varying scenarios may in the future be able to generalize to other natural scenarios. However, to our knowledge no such datasets exist.

      3b. I wasn't sure how to think about whether showing CNNs stable and unstable scenes is a fair test of their ability to represent physical stability. Do we know that stability is all that these images have in common? Maybe the CNN is doing a great job learning some other representation. This sort of thing comes up in some recent discussions of 'shortcuts' and/or the 'fairness' of comparisons between human and machine vision, including some recent theoretical papers (see author recommendations for specific suggestions here).

      If our point were that CNNs do a great job at representing physical stability, we would indeed have to worry about low-level image confounds or “shortcuts” enabling this performance. But our point is that they do badly. If some of their already bad performance is due to image confounds/shortcuts then they are in fact doing even worse, and that only makes our point stronger.

      4a. I didn't really follow this passage, which is relied on to interpret greater activity for unstable vs stable scenes: "we reasoned that if the candidate physics regions are engaged automatically in simulating what will happen next, they should show a higher mean response when viewing physically unstable scenes (because there is more to simulate) than stable scenes (where nothing is predicted to happen)." It seems true enough that, once one knows that a scene is stable, one doesn't then need a dynamically updated representation of its unfolding. But the question that this paper is about is how we determine, in the first place, that a scene is stable or not. The simulations at issue are simulations one runs before one knows their outcome, and so it wasn't clear at all to me that there is always more to simulate in an unstable scene. Stable scenes may well have a lot to simulate, even if we determine after those hefty simulations that the scene is stable after all. And of course unstable scenes might well have very little to simulate, if the scene is simple and the instability is straightforwardly evident. Can the authors say more about why it's easier to determine that a stable scene is stable than that an unstable scene is unstable? They may have a good answer! It would just be better to see it in the paper.

      The idea here is that forward simulation happens in all cases but stops if no change has occurred since the last frame. That stopping both represents the stability of the configuration and produces less activity. This idea is akin to the “sleep state” used for nonmoving objects in a physics engine: they do not need to be re-simulated or re-rendered if they have not moved since the last frame (Ullman et al, 2017 TICS).
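
      The "sleep state" analogy can be made concrete with a toy one-dimensional simulation loop. This is a deliberately simplified sketch under stated assumptions: the body representation, the `eps` rest threshold, and the `step` function are hypothetical, and real physics engines use richer criteria (for example, waking bodies on contact), which are not modeled here.

```python
def step(bodies, dt, eps=1e-6):
    """Advance all awake bodies by one frame and return how many were simulated."""
    simulated = 0
    for body in bodies:
        if body["asleep"]:
            continue  # sleeping bodies cost nothing this frame
        body["position"] += body["velocity"] * dt
        simulated += 1
        # A body that did not move this frame is put to sleep.
        if abs(body["velocity"]) < eps:
            body["asleep"] = True
    return simulated


# One resting (stable) body and one moving (unstable) body.
bodies = [
    {"name": "stable", "position": 0.0, "velocity": 0.0, "asleep": False},
    {"name": "unstable", "position": 0.0, "velocity": 1.0, "asleep": False},
]

work_per_frame = [step(bodies, dt=0.1) for _ in range(3)]
print(work_per_frame)  # [2, 1, 1]
```

      Both bodies are simulated on the first frame, after which only the moving body continues to require computation, mirroring the proposal that stable scenes are simulated briefly and then produce less activity.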

      4b. I was confused a bit by the Animals-People condition, and whether to think of it as a control condition or not. The image of it in Figure 1a makes it seem like it is meant to be interpreted along the usual "physical stability" lines, just like falling towers and people on ladders, and the caption seems to say this too; it also makes intuitive sense since the man in the boat looks like he'll fall if and when the alligator attacks. But then in the main text the authors predict that the representations of stability would not extend to the Animals-People condition, because they are just supposed to be about peril but not stability. Why not? And then the results themselves are equivocal, with some findings generalizing to Animals-People and some not. I don't have much more to say here other than that I found this hard to follow.

      We used the Animals-People condition as a control for peril/instability that is not caused by the physical situation (but rather by another agent). Our hypothesis was that the “Physics Network” would hold information about physical stability, not just any kind of propensity for change for any reason. Hence, we predicted that any brain region responding (only) to physical stability should not respond in a similar way to peril/non-peril conditions in the Animals-People scenario, as they involve an interaction driven by a biological agent. That is what we found.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Pardo et al. describes the identification and characterization of a novel subpopulation of delta cells that normally resides in the zebrafish pancreatic islet. Using two models of beta cell ablation, the authors demonstrate that this delta cell subpopulation efficiently converts into an insulin/somatostatin co-expressing cell population to restore euglycemia. The study includes robust transcriptome data to determine that this delta cell subpopulation is characterized by the expression of sst1.1 (rather than sst2) and expresses many beta cell genes. Furthermore, the resulting insulin/sst1.1 co-expressing cells represent a long-lived population that are sufficiently functional to restore euglycemia. The study goes on to suggest that inhibition of the p53 pathway compromises formation of the bihormonal population; however this data is not as convincing. Overall, this is a novel study that suggests the existence of heterogeneous delta cell populations in the zebrafish islets and supports previous findings related to adult islet cell plasticity.

      Strengths:

      1. Although several studies have identified heterogeneous populations of islet alpha and beta cells, this is one of the first studies to demonstrate two apparently distinct delta cell populations; the study provides sufficient characterization that it should be easy to test whether a similar population exists in mammals.
      2. Demonstration that the induction of Ins/SST bihormonal cells is triggered in two independent models of beta cell ablation.
      3. The use of several different transgenic fish lines to characterize the relative numbers of different islet cell populations in control and ablation conditions.
      4. High quality data, including immunofluorescence images, RNA-seq data and validation studies with appropriate controls.
      5. The extensive use of comparative transcriptome data to validate islet lineage relationships.

      Weaknesses:

      1. Although the data suggests that the newly formed bihormonal cells have sufficient function to rescue the hyperglycemic phenotype, there are no experiments to directly test the functionality of these cells.

      To address the question of the functionality of bihormonal cells, we opted for a complementary approach, i.e., a glucose tolerance test in adults. We showed that bihormonal cells represent the vast majority (95-99%) of all Insulin-expressing cells throughout the pancreas (see Figure 1 and new Figure 4), thereby minimizing the possibility that a putative population of pure beta cells (SST negative) would significantly contribute to regulating glycemia. The glucose tolerance test reveals that the regulation of blood glucose in regenerated fish after glucose injection is identical to that in CTL fish (new Figure 4). These observations strongly support that bihormonal cells are the main source of Insulin in regenerated fish and that they are responsible for blood glucose homeostasis in the near absence of beta cells.

      2. Many of the genes cited as "beta cell specific" are also expressed in delta cells in mouse and human islets - although this could relate to species differences, it causes some confusion and could affect the ultimate interpretation.

      We now include a table (Figure 3-figure supplement 2).

      3. Although it is clear from the images that are presented in the manuscript that a large number of bihormonal cells arise upon beta cell ablation, the relative numbers of bihormonal cells to monohormonal Sst and insulin cells is not clearly indicated. In some cases, it appears that a large percentage convert, while in others there are only a fraction. One can extrapolate this information from the presented data (ie figure D and E), but it would have been more informative if the direct analysis was provided.

      To be more explicit, we now compare the number of bihormonal cells with the number of sst1.1 delta-cells in the revised paper (Discussion lines 433-435). Also, for a better representation of the size of each cell population, we now present the absolute number of cells in Figure 1 and new Figure 6, instead of the % of islet cells shown in the first version. These quantifications reveal that the number of BH cells that are formed after ablation exceeds the number of monohormonal Sst1.1 cells. This indicates a more complex mechanism than simply direct conversion of sst1.1 cells to bihormonal cells, including neogenesis from ducts and proliferation, which we now address directly in the revised manuscript (Figure 4 and Figure 6). See also the explanation in our response to Reviewer 3.

      4. The authors only refer to the fact that Pdx1 is known to be expressed in beta and delta cells in a small paragraph in the discussion; it would have been helpful if this information were introduced in the introduction and in the relevant experimental sections.

      We think that presenting Pdx1 in the Introduction would anticipate the results too much, so we chose to refer to Pdx1 in the Results and Discussion sections.

      1. The authors make the strong conclusion that sst1 cells directly convert into bihormonal cells based on time lapse imaging. Genetic lineage tracing would be needed to absolutely make this conclusion. The time lapse imaging can only suggest that direct conversion might be occurring.

      See our response to point 1 of Essential revisions and explanation and experimental exploration of alternative mechanisms.

      1. The inhibition of p53 appears to only cause a relatively small decrease in the number of bihormonal cells (from ~20 to ~15), somewhat undermining the conclusion that p53 promotes the formation of this cell population.

      To strengthen the data on p53, we now validate activation of the p53 pathway in the islet by in situ hybridization for ccng1 and mdm2 (now shown in Figure 6G), two established p53 target genes that were identified in our transcriptomes. We also explore the cell cycle signatures.

      We decided to remove the experiment with pifithrin alpha. Indeed, when we varied the timing of treatment with the p53 inhibitor pifithrin alpha, we obtained two opposite responses: one that confirms the results shown in the first version of the paper (a decrease of bihormonal cells that is, moreover, paralleled by an increase of sst1.1:GFP cells), and another showing an increase. We think that p53 acts at different levels, possibly in both monohormonal sst1.1 delta cells and bihormonal cells, and understanding these observations would be the focus of another project.

      Reviewer #2 (Public Review):

      This is an interesting and potentially exciting manuscript that reports, based on a series of zebrafish reporter lines, that there exists a subset of delta cells that can rapidly assume partial beta cell-like identity following beta cell ablation. This conversion correlates with the restoration of (near) normal glucose levels within 3 weeks. The major strengths are a series of technically well executed experiments that report an interesting observation of two discernable populations of delta cells. These populations are supported by transcriptome data, which validate the differences between these populations established using FISH or immunofluorescence. Major weaknesses are the lack of lineage tracing of delta cells and questions on the mechanisms underlying the origins of the bihormonal cells reported in this paper. The observation of the rapid appearance of bihormonal cells is potentially exciting and important. However, directionality of the conversion is insufficiently established. The conversion of delta to beta cells needs to be supported by direct lineage tracing. The alternative explanation that these cells are surviving beta cells that turn on somatostatin expression cannot be ruled out on the basis of the current experiments. The authors tend to extrapolate too much from their transcriptome data and subsequent pathway analyses to make claims that would be better supported by additional experiments, or toned down. The authors are right to point out the major differences in zebrafish beta cell regenerative potential and plasticity compared to mammalian models, but this diminishes the credibility of the claims of translational potential. There is value in conducting careful experiments into islet cell plasticity in a zebrafish without having to make a promise of direct translational relevance.

      All these points have been addressed.

      This paper suggests the presence of two sets of delta cells, marked by Sst1.1 and Sst1.2. The Sst1.1 cells are marked by GFP in a Sst1.1:GFP transgenic reporter. This reporter clearly is not selective for Sst1.1 cells only, as a majority of delta cells expresses GFP at dimmer levels and is Sst2 positive. This is in good agreement with the lower - but not absent - Sst1.2 and Sst2 mRNA profiles in Figure 4, but complicates the claim that it is specifically Sst1.1 delta cells that convert into bihormonal cells. An overlay between Sst1:1 and Sst1:2 or Sst2 mRNA to demonstrate that it is specifically the Sst1:1 expressing delta cells that become INS positive (Figure 1B) would help. Formal lineage tracing of the Sst1:1 delta cells is the accepted way to solidify support for this claim, but such data are absent from this paper.

      Unfortunately, we did not succeed in performing genetic lineage tracing of the sst1.1 delta-cells.

      However, we have now explored alternative cellular origins of bihormonal cells, such as the ducts, as well as proliferation (new Figure 4 and Figure 6). We have also toned down our previous conclusion that ruled out beta cells as an origin of bihormonal cells (Figure 2).

      Following the suggestion of Reviewer 3, we now provide a comparison, by double fluorescent ISH, of sst1.1 and sst2 mRNA expression with insulin mRNA expression (performed in larvae, see new Figure 2C). The overlays show that insulin is coexpressed specifically with sst1.1, but not with sst2. This demonstrates that bihormonal cells selectively express the sst1.1 somatostatin gene and supports, though does not prove, the hypothesis that it is specifically the Sst1:1 expressing delta cells that become INS positive.

      The model is presented as a 'beta cell ablation' model, but there are some concerns with the flow of islet cells between islet cell populations immediately following ablation and during recovery that require clarification. The beta cell population size measures between 25-35% of islet cells (Figure 1D/Figure 1Suppl2). If these cells are all ablated acutely, this should immediately lead to a significant increase in the remaining non-beta cell populations, including Sst1:1 delta cells. However, this is not observed as Sst1:1 GFP+ cells are steady as a fraction of total islet cell number (Figure 1F). Instead, the population that is increased at 3 days following ablation is the mCherry-GFP double positive cell population, which accounts for approximately half of the loss of beta cells. The scenario that a portion of beta cells is not actually ablated but is instead converted into a bi-hormonal state is insufficiently explored as detailed below. If the rapid appearance of these cells were indeed attributable to the conversion of GFP cells into co-positive cells, this should have been reflected in the data of Figure 1F. However, the GFP population appears to be neither increasing to reflect the loss of beta cells, nor decreasing in response to the co-expression of mCherry. In Figure 5, a drop in GFPhigh cells specifically is shown, but this reflects only a potential 5% shift of islet cell numbers from GFPhigh to potentially bihormonal cells. The live imaging data in Figure 5B are not helping as there is simply not enough spatial and axial resolution to place the mCherry signal in GFP+ cells. If both processes are balancing each other out to maintain steady numbers of GFP+ delta cells, this implies rapid proliferation of GFP positive delta cells to replenish the delta cells that become bihormonal, or the rapid proliferation of bihormonal cells shortly after they arise. Either of these scenarios should be readily demonstrable.

      This ablation model has been shown to lead to massive destruction of beta cells through apoptosis (Curado et al, 2007; Bergemann et al, 2018). In line with the loss of beta cells, the total number of cells (new Figure 1G) shows a downward shift after ablation. We also quantified islet cells in situ on paraffin sections (Figure 6-figure supplement 1). Because INS and mCherry are difficult to detect after ablation (very low expression in bihormonal cells), we used Pdx1 as a proxy for beta, sst1.1 delta, and bihormonal cells. The decrease in Pdx1+ nuclei that we observed is consistent with the extent of the loss of beta cells.

      Together with the fact that lineage tracing detects few spared beta cells after ablation, these observations support the conclusion that our ablation model is efficient. Despite this efficient ablation, we nevertheless observed some bihormonal cells derived from pre-existing beta cells (Figure 2G and close-up in E’) and now openly discuss this possible cellular source.

      We realized that our initial representation in terms of percentages (“% of cells / islet”) was misleading. For a more accurate representation of population size, we now present in the revised manuscript the absolute number of cells (instead of %) detected for each population (per fish). This reflects the real size of the populations present in the dissected tissue, which contains all cells of the main islet, and makes comparisons between conditions and cell types easier.

      As pointed out by the Reviewer, the respective sizes of the GFP monohormonal, bihormonal and beta cell populations indicate that the flow between islet cells (and potentially non-islet cell types) is too complex to infer the directionality of conversion. While ~3300 beta cells are lost and ~1500 bihormonal cells are gained, there are only ~900 monohormonal sst1.1 delta cells (GFPhigh) before ablation, fewer than the number of BH cells formed after ablation. This suggests multiple origins of bihormonal cells and/or proliferation. In the revised manuscript, we consider the following scenarios: i) the contribution of non-ablated beta cells to bihormonal cells (Figure 2), ii) neogenesis from ducts (Figure 4) and iii) expansion of GFP and/or bihormonal cells by proliferation (Figure 6). We discuss these results in lines 433-447. These mechanisms are not mutually exclusive and are compatible with a “direct” conversion of sst1.1 delta cells.
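
The population balance described above can be written out as simple arithmetic. The following is a minimal sketch in Python that uses only the approximate counts quoted in this response. The function and variable names are hypothetical, and the one-to-one conversion of every pre-existing monohormonal sst1.1 delta cell is an illustrative upper-bound assumption, not a claim from the manuscript.

```python
def bihormonal_balance(beta_lost, bihormonal_gained, sst11_before):
    """Summarize the cell-count balance around beta cell ablation.

    Assumes (for illustration only) that every pre-existing monohormonal
    sst1.1 delta cell converts one-to-one into a bihormonal cell.
    """
    # Bihormonal cells that cannot be explained by direct conversion alone.
    shortfall = bihormonal_gained - sst11_before
    # Upper bound on the share of bihormonal cells from direct conversion.
    max_direct_share = min(sst11_before / bihormonal_gained, 1.0)
    # Share of the lost beta cells that is numerically offset by new bihormonal cells.
    offset_share = bihormonal_gained / beta_lost
    return shortfall, max_direct_share, offset_share


# Approximate counts quoted in the response (cells per fish).
shortfall, max_direct_share, offset_share = bihormonal_balance(3300, 1500, 900)
print(f"Bihormonal cells not explained by direct conversion: {shortfall}")
print(f"Maximum share from direct conversion: {max_direct_share:.0%}")
print(f"Lost beta cells offset by bihormonal cells: {offset_share:.0%}")
```

Even under this most favourable assumption, several hundred bihormonal cells would have to come from other sources, which is why additional origins and proliferation are considered.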

      The presumption is that new beta cells are formed, and this is based in part on lineage tracing data using the zsYellow label in conjunction with an inducible beta cell specific Cre driver strain. It is not clear why this experiment was done in developing embryos instead of during the adult stage where the original observation of the appearance of bihormonal cells that is associated with normalization of glucose levels was made. It appears that in that crucial lineage tracing experiment, the authors are ambiguous about the use of mCherry to detect beta cells after ablation. They describe beta cells as mCherry+ beta cells in the text, while they indicate in the legend and figure labels to have used INS antibody staining to detect these cells. The punctate staining that is different from the mCherry staining elsewhere in the manuscript certainly is compatible with the use of an INS antibody, but raises the question why mCherry was not used to detect beta cells which is what was used throughout the rest of the paper. This is relevant as the lack of zsYellow positivity is interpreted as a sign of beta cell neogenesis. However, these cells might have lost zsYellow precisely because they were killed and have lost their fluorescence lineage markers, including mCherry, but are still detectable by INS immunofluorescence as they have not been cleared from the islet tissue.

      The genetic tracing of beta cells was performed in larvae. The experimental details are now shown in Figure 2D. CRE recombination was induced with 4-OHT at 6 dpf, before ablation at 7 dpf, and the larvae were analysed at 14 dpf. We opted for larval stages since bihormonal cells appear at any stage and young, small animals are more amenable to fast and efficient inducible CRE recombination (Hans et al, PLoS One, 2009; Mosimann et al, Development, 2011).

      We thank the reviewer for highlighting the discrepancy concerning the INS/mCherry antibodies. It is indeed anti-Insulin detection, with its typical punctate staining, that is shown in Figure 2E and quantified in Figure 2F-H, and not anti-Cherry, because of species incompatibility between antibodies in the immunodetection assay (both the Cherry and zsYellow antibodies are from rabbit, while the INS antibody is raised in guinea pig). We have corrected this in the figure and in the corresponding text and legend.

      We think that, even if INS protein were to persist in the ablated islet, its presence specifically in sst1:GFP+ cells is more consistent with our transcriptomic data and with true expression of the insulin gene in bihormonal cells than with persistence of killed beta cells.

      However, we agree that the absence of the zsYellow lineage marker was overinterpreted as a sign of neogenesis. Indeed, we clearly detect some INS+ zsYellow+ cells (5.8 cells, 12% of all INS+ cells; Figure 2E and E’), confirming the persistence of some traced beta cells. In fact, 4 of these 5.8 cells are sst1.1GFP+, indicating that pre-existing beta cells become bihormonal. For this reason, we no longer rule out a beta cell origin of bihormonal cells.

      Although the number of spared beta cells (and beta-derived bihormonal cells) may be underestimated, as some beta cells could have escaped excision of the Lox cassette before ablation (surviving beta cells would then be zsYellow negative), we would like to stress that the ablation efficiency is very good, which argues against, but does not exclude, a large contribution of beta cells to bihormonal cells.
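
The proportions implied by these lineage-tracing counts can be checked with a short calculation. This is a minimal sketch in Python based only on the three numbers quoted above; it assumes they are mean counts from the same set of larvae, and the function and variable names are hypothetical.

```python
def traced_cell_summary(traced_ins, traced_fraction, traced_gfp):
    """Derive implied totals from the quoted lineage-tracing counts.

    traced_ins      -- INS+ zsYellow+ cells (assumed to be a mean count)
    traced_fraction -- fraction of all INS+ cells that are zsYellow+
    traced_gfp      -- INS+ zsYellow+ cells that are also sst1.1GFP+
    """
    # Total INS+ cells implied by "5.8 cells = 12% of all INS+".
    total_ins = traced_ins / traced_fraction
    # Share of traced (pre-existing) beta cells that are bihormonal.
    gfp_share = traced_gfp / traced_ins
    return total_ins, gfp_share


total_ins, gfp_share = traced_cell_summary(5.8, 0.12, 4.0)
print(f"Implied total INS+ cells: {total_ins:.1f}")
print(f"Traced beta cells that are sst1.1GFP+: {gfp_share:.0%}")
```

Read this way, most of the traced beta cells that persist are bihormonal, while traced cells remain a small minority of all INS+ cells.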

      In the revised paper, we tone down our conclusions and consider alternative origins and mechanisms of bihormonal cells.

      The enrichment of Sst1.1 mRNA in bihormonal cells is an important piece of data that should be included instead of 'not shown'. The same is true for the statements that ROS, lack of insulin signaling and hyperglycemia all do not drive INS expression in Sst1.1 cells, which amplifies concerns that the appearance of bihormonal cells is contingent on the administration of beta cell toxins.

      We now include the data previously described as “data not shown”. See new Figure 1-figure supplement 1 and Figure 6-figure supplement 2.

      To relate the interesting observation on bihormonal beta cells in zebrafish to human pancreas biology, the authors point at single cell sequencing data and then claim that 'the occurrence of SST+ and INS+ beta cells in mammals remains largely undocumented'. It strikes me that there must be dozens of papers that show high quality insulin and somatostatin co-labeling in human, primate and rodent pancreas with no evidence of clear colocalization (unless following severe beta cell ablation, see Chera et al., 2014). That actually is clear documentation of their absence.

      We realize that this point was not clear. By referring to scRNAseq data, our goal was to suggest that some Ins+ Sst+ cells could be detected at the mRNA level, while we admit that there is little, if any, evidence of naturally occurring bihormonal cells at the protein level in mammals. This part was too speculative and we have removed it.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report the discovery of a new bacterium, termed HS-3, that displays a novel form of multicellularity consisting of long filamentous structures tightly packed into a two-dimensional structure with characteristics reminiscent of liquid crystals. Motivated by the occasional immersion of the bacterial structures in water due to flooding in their cave environment, laboratory immersion is found to disrupt these structures, which can transform into clusters of coccobacillus daughter cells released by contact with water.

      As a discovery, this paper will certainly trigger great interest in this bacterium for these unusual properties. In particular, biophysicists studying active matter will be fascinated by the liquid crystalline order and topological defects, which are reminiscent of those in motor/microtubule systems studied recently. The observations of filamentous forms remind me of the work of Mendelson many years ago on a mutant of B. subtilis that fails to separate daughter colonies after division, leading to growing filaments. But those were not in a colonial form seen here.

      The paper is, however, rather descriptive, without much physical quantification of the biophysical properties. More importantly, the presentation does not make contact with much recent (and not-so-recent) work on the problem of understanding evolutionary driving forces toward multicellularity, particularly as seen in green algae and choanoflagellates.

      In the Introduction, Discussion, and Figure 1, we now introduce a series of works that address how single-celled organisms could self-organize and maintain their cells in a certain order during the evolutionary process towards multicellularity. Together with our consideration of the environmental setting in the cave as ‘ecological scaffolding’ and of the liquid crystal-like self-organization, this properly contextualizes the finding of HS-3 as a new example of multicellularity. As seen in Mendelson’s pioneering work, as well as in recent work applying hydrodynamics to biology, bacteria have the potential to self-organize their cells. However, as far as we know, there is no extant species that clearly shows a relation between liquid crystal phenomena and the origin of multicellularity. We think the features of HS-3 that we report make it an attractive model of bacterial multicellularity for future studies, including physical analysis and theoretical study.

      Reviewer #2 (Public Review):

      I thought this was a very cool example of bacterial multicellularity, with the description of a newly discovered bacterium that forms a sort of simply differentiated colony- a sheet of cells which then develops to contain a large bolus of small, coccoid cells, which then release into the water column upon submergence. I wasn't totally convinced that this release was developmental, as suggested by the authors- evidence that other colonies released cells at the same time could be due to multiple colonies sharing the same biophysical basis of colony formation that is disrupted by immersion in water (diffusion of extracellular polysaccharides, or even the pressure from being underwater). However, it's notoriously difficult to rigorously test evolutionary hypotheses, and I think that the microbiology here is compelling- it's a form of bacterial multicellularity that I have never seen before.

      My largest issue with the paper is that it does a very poor job of contextualizing how the research affects our understanding of the evolution of multicellularity more broadly. This paper suggests that little is known about the ecological factors selecting for simple multicellularity, but there has actually been quite a bit of work on this topic. This list is far from exhaustive, but prior work has examined a range of selective agents that can favor simple multicellularity- these include predation (Boraas 1998, Herron 2020, Bernardes, 2021), protection from antibiotics (Smukalla 2008), cooperative metabolism (Koschwanez, 2011), dispersal (Smith 2014), syntrophy (Libby and Ratcliff, 2021), resource competition (Heaton 2020), and motility / division of labor (Solari 2006). Indeed, one of the things about the evolution of multicellularity is that there is no one 'route'- there are many different reasons different lineages evolve to be multicellular.

      The paper is focused around the idea that 'group life' is a hypothetical "missing link" to multicellularity (see Figure 1), but this is not an open hypothesis in the field. It's been a universally accepted fact for more than 50 years. Multicellular organisms had to have evolved from simpler social groups of cells- given their phylogenetic nesting in clades of unicellular organisms, there's no other way they could have come into existence. But there has also been a great deal of work examining simple multicellular relatives of complex multicellular lineages, most notably in the volvocine green algae, holozoans (e.g., choanoflagellates and ichthyosporeans), fungi, charophyte algae leading to land plants, and red algae. There is also a body of work using experimental evolution to evolve progressively more complex multicellular lineages (e.g., snowflake yeast). My central problem with this paper is that the 'group phase' they have described is far less compelling than existing work showing a 'group phase' being ancestral to more complex lineages of multicellular organisms, particularly because this multicellular lineage is not contextualized within a clade that has ultimately evolved complex multicellularity.

      In the "recommendations for authors" section, I make suggestions for how to reframe the work to better highlight its novelty, focusing it around a) the discovery of a new form of bacterial multicellularity, and b) the possibility that this reflects ecological scaffolding, a hypothesis for how multicellular organisms could have evolved by developmentally co-opting ecologically-mediated life cycles.

      The manuscript submitted to eLife was actually a different version from the preprint on bioRxiv, and we noted that the comments were based on the preprint version. We apologize for this confusion if we missed a step in the submission procedure. The term ‘group life’ has been amended in the present manuscript; instead, we place the term ‘ecological scaffolding’ at the center of Figure 1, which we think corrects the wrong impression that the evolutionary process follows ‘one route’. We also revised the Introduction to appropriately contextualize HS-3 as a new example of multicellularity among the preceding works, together with references about physiological significance. In the Discussion, we also mention some experimental work on evolution, including ‘snowflake yeast’ (references 48 and 49).

      As for the comment about the release of coccoid cells, we agree that release in water is not in itself a programmed developmental process. The “crowded-out” phenomenon was seen on a solid agar surface (not in water, Figure 4C), but if we consider the natural niche of HS-3, the significance of the formed structure lies in its capability to release coccoid cells upon the trigger of immersion in water.

    1. Author Response:

      Reviewer #1:

      Bandyopadhaya et al have set out to elucidate the immunometabolic mechanisms of monocyte tolerance induced by 2-AA, a quorum-sensing signal that is produced by Pseudomonas aeruginosa. An interesting topic, since elucidating how P. aeruginosa escapes the immune system could be very relevant from a clinical perspective.

      In previous publications, they showed that 2-AA can induce immune tolerance, leading to decreased cytokine production and epigenetic changes mediated via increased HDAC activity. In this follow-up paper, they tried to elucidate what immunometabolic changes are observed in 2-AA tolerized cells (both mouse and human cell lines) and how this can explain the improved intracellular survival of P. aeruginosa.

      The authors must be praised for the effort they put in to prove their point. They have undertaken a tremendous amount of experiments and measurements with so many different cell lines, stimuli, inhibitors and readouts. Unfortunately, the amount of figures and data also makes it very confusing and hard to read and in my opinion, they draw the wrong conclusions from the results of the experiments. Therefore, I cannot agree with some of the important statements, for example that 2-AA induces a Warburg effect. In addition, the methods are written in such a limited way, that it is hard to conclude if their conclusions are correct or to repeat these experiments.

      We thank the reviewer for their constructive comments and for appreciating the complexity of the study. We apologize for the brevity of the Materials and Methods. We hope that, in addition to the data already presented, our revised manuscript will thoroughly address this reviewer’s concern about whether the Pseudomonas aeruginosa MvfR-regulated small molecule, 2-AA, indeed promotes a “Warburg-like” metabolic reprogramming in macrophages. The additional ongoing experiments, including Seahorse studies, and more detailed information in the Materials and Methods section of our manuscript should ease this reviewer’s concerns.

      Reviewer #2:

      In the manuscript "Immunometabolic hijacking of immune cells by a Pseudomonas aeruginosa quorum-sensing signal" the authors studied the mechanism by which the quorum sensing signal 2-aminoacetophenone (2-AA), produced by the pathogen Pseudomonas aeruginosa, enables persistence of this pathogen in host tissues.

      Lactate, the fermentative product of glycolysis, reflects glycolytic fluxes and represses immune signaling activation, decreasing inflammation in macrophages. Therefore, lactate levels reflect the metabolic status of the cells and have consequences for the inflammatory levels of the cells.

      In this study the authors show that 2-AA can affect the metabolic state of macrophages by increasing the glycolytic flux with the consequent increase in lactate levels and decrease in TCA flux. They also show that lactate decreases inflammation by suppressing 2-AA activation of NF-κB signaling and proinflammatory cytokine production.

      Using a murine model they show that addition of 2-AA in mice infected with Pseudomonas aeruginosa results in increased production of lactate and a decrease of ATP in mouse tissue, thus providing evidence for 2-AA-mediated metabolic changes in vivo.

      The study described here is well written and the conclusions are generally well supported by the data. While they tested the direct effect of the 2-AA signal in macrophages, this was not tested in vivo in the absence of infection, and I think it is important to address the direct impact of the signal on the host.

      The study reported here proposes that a quorum sensing signal has an impact on pathogen persistence through immunometabolic reprogramming properties, and provides evidence for a novel mechanism by which bacteria use quorum sensing signals to persist in the host.

      We thank the reviewer for appreciating our work, the experimental strategy, and the conclusions. We agree that the proposed additional 2-AA in vivo experiments in the absence of infection will further strengthen the in vitro studies. Additionally, they will corroborate our previously published in vivo studies on the immune responses triggered by 2-AA in the absence of infection (Bandyopadhaya et al., PLoS Pathogens 2012 & Bandyopadhaya et al., Nat Microbiology, 2016).

      Reviewer #3:

      Tolerance in macrophages involves a global transcriptional shift from a pro-inflammatory response toward one characterized by the expression of anti-inflammatory and pro-resolution factors. In the case of TLR-mediated tolerance, pro-inflammatory cytokines are not universally suppressed in all tolerant cells, but distinct patterns of cytokine expression distinguish TLR-specific tolerance. (10.3389/fimmu.2018.00933, 10.1615/critrevimmunol.2015015495). However, the authors only show differences in TNFα. Thus, I strongly suggest the authors to determine anti-inflammatory cytokines, such as IL-10.

      We appreciate the reviewer’s comment and thank the reviewer for the suggestion to determine the levels of the anti-inflammatory cytokine IL-10. Indeed, we could not detect IL-10 in 2-AA tolerized cells; we will refer to this in the revised manuscript. This is most likely because, as we previously demonstrated, 2-AA-mediated tolerance is markedly different from LPS-mediated tolerance (Bandyopadhaya et al., PLoS Pathogens 2012 & Bandyopadhaya et al., Nat Microbiology, 2016), and TLR-regulated tolerance is primarily LPS mediated. In the previous publication (Bandyopadhaya et al., PLoS Pathogens 2012), we also reported the IFN and anti-inflammatory TGF levels in 2-AA tolerized mouse macrophages, but we could not detect IL-10.

    1. Author Response

      Reviewer #1 (Public Review):

      In Wang et al., the authors investigate issues related to the relative proportion of flux for the enzymatic decarboxylation of pyruvate between PDH (pyruvate dehydrogenase) and PFOR (pyruvate-ferredoxin oxidoreductase) in the model organism Synechocystis. The manuscript provides evidence that PDH becomes increasingly inactivated by a high ratio of NADH:NAD+ as well as evidence to suggest that PFOR is transcribed and remains intact under aerobic conditions. The authors put forward the theory that both PDH and PFOR are functionally active routes for pyruvate decarboxylation under aerobic conditions, whereas PFOR has previously been assumed to be inactive under growth conditions containing oxygen. This distinction is particularly highlighted by conditions where Synechocystis is grown photomixotrophically - and where the NADH:NAD+ pool may be relatively over-reduced because of two parallel inputs of reductant (water-splitting at PSII and catabolism of glucose). The authors examine growth under photoautotrophic and photomixotrophic conditions for a number of relevant mutants including members of the ferredoxin/flavodoxin family, PFOR, and NDH-1 complex subunits.

      The theory put forward in this manuscript is of general interest regarding electron flux through the combined electron transport chain (photosynthetic + respiratory) of cyanobacteria. The authors further broaden the potential audience for the manuscript by elaborating on the potential significance of these results in the context of a switch from PFOR (ancestral) to PDH (oxygenic/modern).

      Comments:

      Generally, theories put forward in this manuscript are intriguing and have a number of potential implications for understanding electron flux and regulation of central metabolic processes in photosynthetic microorganisms. If these theories are supported and become more generally adopted, they would have significant impact on the understanding of the regulation of central carbon metabolism in cyanobacteria. That said (due in no small part to the complexity of some of these pathways), the evidence provided to support the hypotheses is indirect in many instances. In some cases, there is a pairing of indirect data with broad statements that can come across as over-reach. These problems can be somewhat exacerbated by an unclear organization at parts of the Discussion, a lack of succinctly defined claims, and numerous typographical considerations.

      Thank you very much for this point. We have now reorganized and completely overhauled the Discussion. It starts with the aspects that are best supported by our data. We then added two sentences to stress that the lines that follow include hypothetical considerations meant as thought-provoking impulses. We hope that this prevents over-reach.

      Major considerations:

      A major component of the proposed theories in this manuscript rests upon the assumption that PFOR is an active enzyme under highly aerobic conditions: this claim is never directly demonstrated.

      This is true. We could show, though, that PFOR of Synechocystis is, in contrast to most bacterial PFORs, stable in the presence of oxygen. However, as also reported for the oxygen-stable PFOR of the obligate aerobe Sulfolobus acidocaldarius (3) and for PFOR from E. coli, which was recently shown to contribute to metabolism in the presence of oxygen in vivo (1), we had to remove oxygen to measure enzyme activity in vitro. This point is discussed frankly.

      Indirect evidence of altered growth of pfor mutants, increased repression of PDH, and the higher NADH:NAD+ ratio under photomixotrophic conditions is in general alignment with this theory. However, while deletion of pfor does indeed result in altered growth dynamics in Synechocystis under periods of photomixotrophy, the alterations do not entirely align with the idea that this pathway is critical for rapid growth under aerobic conditions. For instance, pfor and most of the highlighted mutants (fdx 3, fdx 9, isiB) presented in Figure 3 show the greatest defects in their OD after reaching stationary phase (more rapid decline in OD on/after Day 6) relative to WT. This doesn't align as nicely with the highest NADH:NAD+ seen in Days 3-5 (which is also specifically called out: e.g., Line 146, Supplemental Figure S8).

      We are very cautious about comparing growth experiments day by day, because the growth behaviour of WT and mutants differs between experiments. We therefore repeat these experiments several times independently, with at least three replicates each, and show the data of typical growth experiments. In the case of the growth behaviour of WT and pfor and the NADH/NAD+ ratios under photoautotrophic and photomixotrophic conditions shown in Figure 1, the NADH/NAD+ ratios were determined in exactly those cultures for which growth data are shown. It is therefore legitimate to compare these results directly, day by day. However, we did not determine the NADH/NAD+ ratios of the cultures shown in Fig. 3. The rise in NADH might have started with a delay here.

      In this context, the deletion of F-GOGAT is much more convincing in its severity and timing, yet for this mutation to have a more severe phenotype is unexpected if PFOR is one of the primary/sole electron donors to the ferredoxin pool from glucose utilization as proposed (i.e., stated differently, F-GOGAT is only one of the enzymes downstream of ferredoxin and might be expected to have a more subtle phenotype in comparison to the KO of PFOR if that is a primary source for electrons to ferredoxin under photoheterotrophic conditions).

      F-GOGAT requires reduced ferredoxin, which can be provided by PFOR and, in addition, by PSI. As electrons from glucose oxidation can be fed via photosynthetic complex I into the PQ-pool, they will eventually arrive at PSI (Fig. 3C), where ferredoxin can be reduced and transfer electrons to F-GOGAT. However, to get a truly complete picture of the situation, several issues will have to be addressed in the future: we do not know which of the low-abundance ferredoxins, or whether the high-abundance ferredoxin 1, interact with PSI, F-GOGAT, PFOR and photosynthetic complex I. It would furthermore be helpful to know the midpoint potentials of all the different ferredoxins. Without this information, it might be too much to ask for a simple interpretation.

      A central tenet of the argument put forward on the evolutionary importance of using either PFOR vs. PDH is the conservation of extra free energy by the former reaction. However, additional information on the ferredoxin paralog(s) that accept electrons from PFOR is necessary to evaluate these claims. Based on the data within these manuscripts, Fdx3, Fdx9, and IsiB have the strongest links to PFOR: though the authors do take care to never state directly that they have evidence that these are the acceptors in vivo. Given the variability in the midpoint potentials of different ferredoxins, some ferredoxin acceptors may better conserve the free energy in pyruvate, while others may actually be more 'wasteful' than NAD+ as the acceptor through PDH. Unfortunately, the midpoint potentials for Fdx3, Fdx9, and IsiB are unknown or not stated in this manuscript. It is therefore unclear what ferredoxin is being used as the reference point for conservation of Gibbs free energy in Figure 4C and referenced multiple times in the text.

      We agree that it would be great if we already knew the redox potentials of all the ferredoxins involved. We are currently working on this issue. All that we know for now is that the redox potentials of ferredoxins lie between -240 mV and -680 mV, whereas the redox potential of NAD(P)H/NAD(P)+ is around -320 mV. Unpublished data that require further validation indicate that the redox potential of Fdx9 is more negative than that of Fdx1 (-412 mV) in Synechocystis and is thereby clearly more negative than -320 mV. However, as these data require further validation, we did not state numbers. In addition, interaction studies on PFOR and low-abundance ferredoxins are planned, and preparations are in progress.
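To make the stakes of these midpoint potentials concrete, the following is a minimal sketch, not the calculation from the manuscript's materials and methods. It assumes the standard relation ΔG = -nFΔE, a two-electron transfer per pyruvate, and uses only the potentials quoted above; the function name and the choice of -320 mV as the reference are illustrative assumptions.

```python
# Illustrative sketch only: assumes dG = -n * F * dE with n = 2 electrons per pyruvate.
# The helper name and default reference potential are hypothetical, not from the manuscript.
FARADAY = 96485.33212  # Faraday constant in C/mol


def extra_energy_kj_per_mol(e_acceptor_v, e_reference_v=-0.320, n_electrons=2):
    """Free energy (kJ/mol) retained by reducing an acceptor with midpoint
    potential e_acceptor_v instead of a reference acceptor at e_reference_v.
    Positive values mean the acceptor conserves more energy than the reference."""
    return n_electrons * FARADAY * (e_reference_v - e_acceptor_v) / 1000.0


# Potentials quoted in the response: Fdx1 and the two ends of the ferredoxin range.
for label, potential_v in [("Fdx1", -0.412), ("range, least negative", -0.240),
                           ("range, most negative", -0.680)]:
    print(f"{label}: {extra_energy_kj_per_mol(potential_v):+.1f} kJ/mol vs. NAD(P)+")
```

Under these assumptions, a ferredoxin with a potential less negative than -320 mV yields a negative value, which corresponds to the reviewer's point that some acceptors could be more 'wasteful' than NAD+.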

      Finally, the measurements of NADH:NAD+ (most prominently used for measurements in Fig 1B) utilized kits that require multiple, long centrifugation steps in the dark prior to assaying this rapidly exchanging pool. While it appears that the authors were able to get reproducible results with these kits, it is difficult to interpret what the increase in relative NADH levels in glucose-fed cells means given that 10+ minutes of incubation in the dark and/or changing temperatures elapsed after the cyanobacteria were removed from the incubator before the NADH:NAD ratio was assessed. While it superficially makes logical sense that the cytosol would be over-reduced when illuminated and under glucose feeding relative to illumination alone, it shouldn't be assumed that these measurements are representative of this rapidly-exchanging pool under the steady-state growth conditions.

      Thank you very much for raising this important point. We are very much aware of the difficulties in determining the redox state of NADH:NAD+ using these kits. However, there is no other method available that properly distinguishes NADH from NADPH. Furthermore, the centrifugation step was done at -9°C, which should minimize metabolic reactions during this step. In addition, we have now added in vivo measurements using the NAD(P)H module available for the PAM, and using the Dual-KLAS/NIR to determine the redox state of ferredoxin (newly added Fig. S4). Both methods show that NAD(P)H as well as ferredoxin are more strongly reduced under photomixotrophic conditions than under photoautotrophic conditions and thus support our previous data.

      Reviewer #2 (Public Review):

      The observation that cyanobacteria can use two alternative pyruvate decarboxylating enzymes using either NAD+ or ferredoxin is interesting and the work is a useful contribution. The authors very nicely characterize the enzymatic properties of the two pyruvate metabolizing enzymes and also are able to connect the ideas of redox balance with a set of ferredoxins. Even though they are not able to definitively characterize the specific ferredoxin which interacts with the enzyme, the analysis is nicely conducted and it's clear that the suggestion they're making regarding the involvement of the minor ferredoxins is compelling. However, the work could be written in a way that might be more useful.

      Specific comments:

      Overall this is an interesting study, but the arguments could be sharpened and better connected with the literature. The introduction needs to be considerably revised in my opinion. It is not obvious whether it is even appropriate to discuss the enzymes as anaerobic enzymes or aerobic enzymes, since this concept is simplistic and perhaps, archaic. Indeed, placing the results of the present study in the context of "anaerobic enzymes versus aerobic enzymes" is a bit of a 'strawman' argument. For example, the counter examples of O2-tolerant enzymes cited seem to suggest that PFORs have been capable of evolving into O2-tolerant enzymes quite readily and that two types of decarboxylase have evolved for quite different reasons than simple replacement for a new environment. Instead, I think a more current and general perspective relates more to the interpretation that the authors are already putting forth. Namely, the enzymes are utilized according to redox balance considerations rather than sensitivity to oxygen.

      Therefore, I think the very long and pedantic introduction is useful for review, but only if it is shortened and also includes the alternative interpretation regarding adaptations to redox potential in the cytoplasm. My guess is that there are plenty of examples of redox balance function arguments in the literature to refer to in contrast to the evolutionary replacement argument used. Certainly, there are good examples regarding glucose toxicity in mutants of Synechocystis that can be considered.

      Thank you very much for this point. The O2-tolerant PFORs mentioned were merely shown to be stable in the presence of oxygen in vitro, which means that they can be isolated under anaerobic conditions. However, all enzymatic in vitro assays required anaerobic conditions. Only one PFOR was shown to be active in the presence of oxygen in vitro. Physiological studies on the importance of these enzymes under aerobic conditions in vivo are completely missing. However, prompted by the requests of the reviewers, we searched the literature intensively again and indeed found a recent report that describes the involvement of PFOR in redox regulation in an aerobic culture of an E. coli mutant in which glucose-6P dehydrogenase (ZWF) was down-regulated (1). We included this study in both our introduction and discussion. It very much supports our own findings, as the E. coli PFOR likewise requires anoxic conditions in in vitro enzyme tests. We agree that the idea that the PDH complex and PFOR are exclusively regulated by oxygen availability might sound simplistic. However, we do not fully agree that this is a strawman argument, as both enzyme systems are still mostly discussed as counterparts for either aerobic respiration (PDH complex) or anaerobic fermentation (PFOR) (4). To the best of our knowledge, the study that is now included and our own data are the very first to clearly put forward the idea that redox control governs the activity of these enzyme systems at the pyruvate node independent of oxygen. However, doubts about the rather simplistic distinction between aerobic and anaerobic enzymes in general have indeed been expressed, even though these studies generally lack physiological in vivo experiments. We therefore included this information in the introduction as well (line 76: There are several reports on the aerobic expression of enzymes that are assigned to anaerobic metabolism in prokaryotes and eukaryotes and therefore challenge the simplistic distinction between aerobic versus anaerobic enzymes (5-7). Their physiological significance and regulation are only partly understood.). This did not result in a shortened introduction, though, as additional information was added. The new introduction thus includes alternative interpretations as requested and is therefore hopefully more balanced.

      Given the interpretation that the alternative forms of the enzyme help cells adjust their redox balance to different conditions, such as photomixotrophic growth, the very nice enzymatic analysis and growth studies of the mutants work would be significantly strengthened by more direct physiological measurements that report intracellular redox states.

      Thank you very much for this important point. Intracellular redox states were shown by measurements of the NADH/NAD+ ratio (Figure 1B) and have now been extended by new in vivo measurements, which show that both the NAD(P)H and the ferredoxin pools are more reduced under photomixotrophic than under photoautotrophic conditions (new Fig. S4).

      Minor comments:

      line 211: Perhaps, "..the deleted alleles failed to segregate, keeping some wild type copies."

      This was changed to: the deleted alleles of fx2 (sll1382) and fx5 (slr0148) failed to segregate, keeping some wild type copies.

      It would be interesting to characterize whether the observed distribution of PFOR correlates with specific physiological features. In other words, PFOR seems to become important upon the addition of an external carbon source in a way that must integrate with autotrophic metabolism (i.e. mixotrophic growth) altering the balance of the oxidized and reduced form of redox cofactors--does the observed distribution correlate at least with the metabolic characteristics of the handful that have been studied in the lab?

      Thank you very much for this suggestion. We checked the lists of cyanobacteria that either possess or do not possess a PFOR in order to search for shared known physiological features. However, the current challenge is that the number of uncharacterized cyanobacteria in our list is too large. It is therefore impossible to find solid correlations. But we fully agree that it would be interesting to find these.

      A more detailed set of calculations that helps explain panel C in figure 4 needs to be included to support the quoted values for redox potential in free energy. I assume these are standard values, and the specific superscripts and subscripts associated with the ΔG nomenclature need to be defined.

      The calculations are shown in the materials and methods section. A corresponding note (for calculations see materials and methods) is now given in the legend of Fig. 4C. Information concerning the nomenclature can likewise be found in the literature cited in the materials and methods section.

      Reviewer #3 (Public Review):

      The manuscript by Wang et al. conclusively demonstrates that the cyanobacterium Synechocystis sp. PCC6803 prefers to use the ferredoxin-reducing enzyme PFOR over the NAD+-reducing PDH-pathway when grown under photomixotrophic conditions while the PDH-route is favored under photoautotrophic conditions. Both the potential physiological meaning of this switch and implications for the evolutionary history of the role of the respective enzymes and their pathways are discussed.

      The main hypothesis of this work considers that PFOR-mediated decarboxylation of pyruvate replaces the PDH-based one when cells shift from photoautotrophic to photomixotrophic growth conditions. This hypothesis is assessed via the comparison of growth curves measured on a host of deletion mutants and via direct detection of expression levels of certain enzymes. The authors' hypothesis is robustly supported by the majority of the reported experiments and the reviewer is fully convinced by these data. However, I would hold that the data shown with respect to phosphorylation of PDH (Fig. S4) are unconvincing. I can't see a clear difference in growth-curves for the incriminated mutants deltaspkB and L which would convincingly exceed the variation observed for the entire dataset.

      We agree that the data on the phosphorylation of the PDH complex, including the kinase mutants, are not very convincing. We were uncertain from the beginning whether it would be a good idea to include these data sets and therefore discussed them very cautiously in the manuscript. As the enzymatic tests with the E3 subunit of the PDH complex at different NADH concentrations show convincingly that high NADH levels have an inhibitory effect on the complex, we have now decided to remove both data sets from the manuscript, as they are not really required for its central statement.

      1) S. Li et al., Dynamic control over feedback regulatory mechanisms improves NADPH flux and xylitol biosynthesis in engineered E. coli. Metab Eng 64, 26-40 (2021).

      2) T. Nakayama, S. Yonekura, S. Yonei, Q. M. Zhang-Akiyama, Escherichia coli pyruvate:flavodoxin oxidoreductase, YdbK - regulation of expression and biological roles in protection against oxidative stress. Genes Genet Syst 88, 175-188 (2013).

      3) A. Witt, R. Pozzi, S. Diesch, O. Hädicke, H. Grammel, New light on ancient enzymes – in vitro CO2 Fixation by Pyruvate Synthase of Desulfovibrio africanus and Sulfolobus acidocaldarius. The FEBS Journal 286, 4494-4508 (2019).

      4) M. Müller et al., Biochemistry and Evolution of Anaerobic Energy Metabolism in Eukaryotes. Microbiology and Molecular Biology Reviews 76, 444 (2012).

      5) S. B. Gould et al., Adaptation to life on land at high O2 via transition from ferredoxin-to NADH-dependent redox balance. Proceedings of the Royal Society B: Biological Sciences 286, 20191491 (2019).

      6) O. Schmitz, J. Gurke, H. Bothe, Molecular evidence for the aerobic expression of nifJ, encoding pyruvate : ferredoxin oxidoreductase, in cyanobacteria. FEMS Microbiol. Lett. 195, 97-102 (2001).

      7) K. Gutekunst et al., LexA regulates the bidirectional hydrogenase in the cyanobacterium Synechocystis sp. PCC 6803 as a transcription activator. Molecular Microbiology 58, 810-823 (2005).

    1. Author Response

      Reviewer #1 (Public Review):

      Kano and co-authors present a very interesting and unique study investigating whether the white sclera, uniquely characteristic of human eyes, contributes to better gaze detection by individuals, a key prediction of the gaze-signaling and cooperative-eye hypotheses. They test both humans and chimpanzees in a well-designed, counterbalanced experiment where they examine both within and cross-species evaluations of gaze from static, controlled images. Overall, they provide compelling evidence that the white sclera not only contributes to better gaze discrimination by both humans and chimpanzees, but that the white sclera also aids gaze discrimination when visibility conditions are poor.

      I found the experiments well designed and carefully thought out. The statistical methods are also appropriately applied in my opinion, although it would be helpful to have the exact R code the authors used as an additional supplement. In general however, the authors should be commended on the transparency with which they describe both the training and testing of individuals for both species.

      One clear weakness of the paper is that the evidence for chimpanzees is limited to only 3 (sometimes 2) individuals, but one can appreciate that this kind of experimental set up and task would have been quite difficult for them. Additionally, although the authors were diligent in selecting a cross-cultural sample of human images, the test subjects were all primarily of one cultural background. Although these weaknesses mean that the generalization of their results needs to be taken with caution, I find the methods and results are compelling and provide a significant contribution to the on-going discussion of the importance of external eye morphology in facilitating cooperation and communication.

      Importantly, they show evidence for both white sclera and eye shape/size enhancing gaze discrimination when visibility is compromised, adding empirical evidence for a critical component to the gaze-signaling and cooperative eye hypotheses. I am confident their experimental approach will be useful to other scholars investigating this topic and will provide a comparative framework with which to test other species or test more individuals from different populations of humans and chimpanzees.

      Thank you very much for your positive evaluation. We added our R formula in Table S1. Although the majority of our participants were from similar cultural backgrounds, we believe that the results from the previous two experimental studies on the same topic (Ricciardelli et al. 2000; Yorzinski and Miller, 2020) complement our results because these previous studies tested participants from other cultures. We also added a paragraph explicitly addressing the limitations of our study, including the small number of chimpanzee participants in our test conditions.

      Reviewer #2 (Public Review):

      The proclaimed goal of Kano et al. is to provide "experimental evidence answering the question of whether the human white sclera serves any communicative function for eye-gaze signaling". This is indeed an important gap in the literature, although it has recently been addressed by e.g., Yorzinski & Miller, J (PLoS ONE 15(2), e0228275 (2020)) in a set-up with human subjects. This study, however, includes the first experimental approach to this issue that is built on an interspecific comparison: The authors tested how well humans and chimpanzees can evaluate eye gaze direction in face pictures deriving from both their own and the other species. Additionally, the human and chimpanzee subjects also had to score manipulated photos, in which irido-scleral colors were inverted.

      The experimental protocol is one of the strengths of the study. The experimental stimuli were thoughtfully crafted to avoid unwanted biases and variable shading and size dimensions of stimulus pictures address relevant perceptual challenges of glance identification in the real world. Minor aspects of stimulus design (e.g. inverting pupil colors) are not justified, though. Research hypotheses are clearly stated and are relevant to the current scientific discourse on the topic. The training procedure for the chimpanzees was made fully transparent, impressively demonstrating the efforts involved in preparing them for the study.

      The results are straight-forward and I have no criticism towards analyses and data presentation in the manuscript, which I believe are all well done. Nevertheless, I want to point out that only two chimpanzee subjects participated in all tests, which limits the conclusiveness of the data. This is particularly true, because several chimpanzees that later dropped out of the training performed better when conspecific rather than human stimuli were presented. This issue should receive more attention in the manuscript.

      In general, I believe that many of the interpretations and a priori assumptions of the authors are problematic, constituting the most important weaknesses of the manuscript. Even key claims of the study are only partially supported by the collected data or by results previously reported in the literature:

      From a methodological perspective, this manuscript simply addresses the question: "Is human eye-gaze more conspicuous than that of chimpanzees?" The authors answer this question positively, which is an expected result and in line with previous research. Nevertheless, the Introduction and Discussion sections of the manuscript prominently discuss the question "Why is the human eye more conspicuous?". For this, an evolutionary perspective needs to be taken into account (see below) and, if an adaptive conjecture is adopted, potential functions need to be proposed.

      The study endorses social drivers behind the depigmentation of the human sclera. However, social functions of eye gaze were not explored in the experiments, as subjects simply needed to extract basic information on glance direction from pictures. It should be expected that increased contrast, as present in the human eye compared to the chimpanzee eye, facilitates the detection of these patterns. I therefore see no new arguments for the idea that scleral color is importantly involved in social cognition and the link between the results and the authors' interpretations remains speculative. It has been demonstrated that reflexive glance following is found in various catarrhine primates, but only humans appear to use glances as referential cues in social situations. The lack of focus on eye orientation in chimpanzee behavior has been strikingly demonstrated by the training results presented herein and strongly supports this dichotomy. At the same time, extensive scleral depigmentation is not rare among monkeys and apes, so that explanations for this phenomenon should be applicable to species other than humans (Caspar et al. Sci Rep 11, 12994 (2021), Perea-García et al. Symmetry, 13(7), 1270 (2021)).

      Thus, it is unfortunate that the very strong conclusive statement "we found that the key function of white sclera is to enhance the eye-gaze signal", is not balanced out by an exploration of alternative hypotheses or caveats to this conclusion. I would argue that such a claim is difficult to defend when a single species pair with very different expressions of eye pigmentation is studied. The authors do not discuss how their interpretations might or might not fit other primates with strongly depigmented sclerae, like Sumatran orangutans. This is an important shortcoming, because such comparisons could potentially back up or damage the hypotheses drawn from the human-chimpanzee pairing.

      Finally, the authors strongly imply that the human condition of scleral pigmentation alone is the derived one and thus requires a peculiar (functional) explanation. On the contrary, the chimpanzee phenotype is discussed as if it would represent an ancestral condition which is deemed representative for nonhuman primates as a whole. However, recent evidence suggests that both humans and chimpanzees show unusual scleral color patterns, with other great apes displaying variable pigmentation with a strong trend towards (at least localized) depigmentation in orangutans, bonobos, and gorillas (Perea García, J. O. J. Lang. Evol. 1 (2), 151-158 (2016), Caspar et al. Sci Rep 11, 12994 (2021)). This is not mentioned in the manuscript and should be added. The uniformly dark chimpanzee sclera is indeed not representative for great apes or most other groups of nonhuman primates.

      All in all, this paper represents a valuable experimental contribution to the debate on the evolution of eye pigmentation in apes. In particular, it demonstrates that eye gaze (and therefore coloration) is negligible for chimpanzee communication. However, a more inclusive and nuanced interpretation of results and a better portrayal of their relevance to hypotheses explored in the literature is required. This includes an improved discussion of the limitations of the study's approach when it comes to deducing evolutionary and socio-cognitive patterns.

      Thank you very much for your helpful comments and expertise in this topic.

      Regarding the previous experimental study on this topic, Yorzinski and Miller (2020) is indeed our predecessor, and we detailed this study in our introduction section. Moreover, we want to point out that Ricciardelli et al. 2000 is our predecessor as well, and in fact, we designed our stimulus manipulation based on this previous study. Ricciardelli et al. 2000 tested human participants in a gaze-discrimination task and found that reversing the contrast polarity of the eye regions in human faces impairs participants' judgment of gaze direction (thus we should have described this stimulus manipulation as “the reversal of eye contrast polarity”, rather than “the inversion of eye colors”; we apologize for our error in word choice). One advantage of the method of Ricciardelli et al. is that we could change only the contrast polarity but not any color differences within each eye image, namely the color differences between the iris and sclera and also between the pupil and iris (thus this manipulation does not change the conspicuousness of the iris or pupil per se). We added Figure S4 to better explain how we made our stimuli. Please also note that the visibility of eye-gaze directions depends on the visibility of both iris and eye-outline edges, not only that of the iris (or pupil). To clarify this aspect, we added Figure S1.

      Regarding the small number of chimpanzee participants, we addressed this limitation more explicitly in our revision.

      Regarding the individual differences in training performance, although we indeed observed some differences between individuals in their performance for the chimpanzee and human stimuli during the training stage, we did not find any relation between these performances and the participants’ particular backgrounds (Figure S3 and Table S3). Most importantly, please note that the key criterion of passing the training stage was learning to reliably discriminate the eye-gaze directions of both human and chimpanzee stimuli.

      Regarding the interpretations and a priori assumptions, we clarify them in this revision by referring to a recent morphological study on great ape eye color (Kano, F., Furuichi, T., Hashimoto, C., Krupenye, C., Leinwand, J. G., Hopper, L. M., . . . Tajima, T. (2021). What is unique about the human eye? Comparative image analysis on the external eye morphology of human and nonhuman great apes. Evolution and Human Behavior, in press, DOI: 10.1016/j.evolhumbehav.2021.12.004). Although the current experimental study was performed independently of this related study, we built our experimental designs partly on it. The results from our experimental study and those from this related study are complementary to one another.

      Regarding the iris-sclera color contrast/difference, Kano et al. (in press) found that the iris-sclera color difference (not the contrast measure of Perea-Garcia et al., 2019 criticized by Caspar et al., 2021) did not differ between the human and chimpanzee eyes. Importantly, we confirmed this same result in our chimpanzee and human stimuli (Figure S2). More importantly, as mentioned above, we did not just swap the iris and sclera colors in our eye images, but reversed the contrast polarity of eye images, without changing any local color difference within the eye images such as the iris-sclera or the iris-pupil color contrast/differences. Thus, please note that we did not ask "Is human eye-gaze more conspicuous than that of chimpanzees?", but asked, “Does the uniformly white sclera (with a darker iris) facilitate the visibility of eye-gaze directions across species?”. We expanded our introduction section to clarify our general aims and rationales/explanations for our experimental manipulations.

      Regarding the social drivers, one sentence in the conclusion paragraph (that you pointed out) was indeed misleading, and we have thus rephrased it as “one function of the uniformly white sclera is to equip eye-gaze signal with robustness against its degradation caused by natural noises (e.g., shading, distancing)”. Indeed, our aim was to test the perceptual advantage of the uniformly white sclera, one key premise of the gaze-signaling hypothesis, but not to test the social drivers of eye-gaze signals.

      Regarding the sclera colors of other great ape species, we also recognized in Kano et al. (in press) that some nonhuman ape individuals have partly unpigmented sclera. However, this related study found that such partly unpigmented sclera is characterized as more graded or patchy color patterns compared to humans’ uniformly unpigmented sclera and that these color patterns more easily blend into adjacent skin/hair colors around the eyes, particularly in visually challenging conditions (e.g., shading, distancing). We thus predict that the same pattern of results would be obtained even when we use partly unpigmented sclera as our stimulus. However, further experimental studies are necessary to confirm this prediction. We clarified these points in both our introduction and discussion sections.

      Thank you once again for your critical but constructive comments.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, the authors developed a new expansion microscopy (ExM) method called Ten-fold Robust Expansion Microscopy (TREx). This method emphasizes one-round sample expansion of cells by systematically optimizing the monomer recipe. Compared to existing ExM methods which expand samples to a similar scale (~10-fold), TREx aims for a robust procedure that can be handled more easily. The reviewer experimentally tested the TREx protocol, and validated that the TREx 10x gel can be made robustly by researchers who have experience with standard ExM.

      We are very pleased that the reviewer tested out our new recipe!

      Specific comments:

      1) The authors claimed in the abstract that "TREx can provide ultrastructural context to subcellular protein localization by combining antibody-stained samples with off-the-shelf small molecule stains for both total protein and membranes". The authors only demonstrated one NHS ester dye, BODIPY-FL NHS dye (lines 151-159), without justifying why this dye was selected. Does BODIPY-FL NHS dye work better than other off-the-shelf NHS dyes? The reviewer recommends that the authors validate a few more widely used dyes with TREx, e.g. Cy3/Cy5, Alexa 488, Alexa 568, to guide readers in choosing appropriate dyes.

      We have added text on this issue, "Sim et al (Sim et al., 2021) have shown that highly hydrophobic NHS ester dyes exhibit strong contrast for cytosolic organelles while highly hydrophilic NHS ester dyes strongly stain the nucleus. The moderate hydrophobicity dyes that we used (BODIPY-FL (Zanetti-Domingues, Tynan, Rolfe, Clarke, & Martin-Fernandez, 2013) and AlexaFluor594 (Hughes, Rawle, & Boxer, 2014)) exhibit both nuclear staining and contrast for cytosolic organelles."

      2) Page 8: The reviewer is happy to see the discussion on the heterogeneous local expansion factors in cells. It is critical for evaluating the expansion isotropy and avoid pitfalls in the applications of TREx. Based on this work and previous work (e.g. U-ExM), organelles with higher protein density may have smaller local expansion factors than the macroscopic expansion factor. The authors discussed the local expansion factor of organelles with different protein density, including centrioles, NPCs, and microtubules. To evaluate the local expansion factors comprehensively, the reviewer asks the authors to add a figure or plot to compare the local expansion factors of different organelles, ideally including centrioles, NPCs, microtubules, clathrin-coated pits, mitochondria, ER, and centromere. The authors have already measured or imaged many of these organelles. For the other organelles, good antibodies are available. Therefore, the additional experiments should be straightforward for the authors. But the comprehensive comparison will make the work much more impactful.

      We address this in our response to essential point 3 and agree that the added comparison over multiple organelles has made the work more impactful.

      3) Line 388: The authors stated "The strong overlap between NHS ester and mCLING stains was not unexpected, given the reactivity of NHS esters towards both unreacted lysines in the mCLING molecule and antibodies." Since AcX (6-((acryloyl)amino)hexanoic Acid, Succinimidyl Ester) at high concentration was added after the mCLING staining, most of the lysines in the mCLING should be reacted by the AcX. Therefore, NHS ester dye staining should not strongly overlap with the mCLING. The authors should re-evaluate and interpret the overlap. The authors can do simple experiments like increasing the concentration of AcX, or use pH 8 for AcX treatment. If the overlap is reduced, it means the overlap was caused by the unreacted lysines in mCLING, and can be reduced. If the overlap is not reduced, there are other mechanisms which need further examination or interpretation.

      The AcX concentration was selected to maximize retention of proteins without hindering gel expansion by cross-linking through multiple AcX modifications on each individual protein. Therefore, it is likely that AcX is not close to saturating the available primary amines. We have explored this further with an AcX competition assay.

    1. Author Response

      Reviewer #1 (Public Review):

      Strength: Excellent statistical methods are employed. Specimens collected from two centers are used.

      Weakness: It is not clear what new knowledge this follow-up study brings to the audience. The critical biomarker, miR150, that they propose for development of a biodosimetry assay was already discovered. There are close to a dozen publications showing the dose response of miR150 in mouse, rats, non-human primates and humans (including two research papers and several reviews from the authors). The dose response shown in 4b is not appreciable. Introduction and discussion talk about clinical utility for triage after nuclear disaster. Is analysis of miRNAs purified from exosome a viable approach for triage and clinical decision making? If so, please provide convincing argument showing practicality.

      We appreciate that the reviewer and the editor believe that “excellent bioinformatics and biostatistical methods are employed”. We apologize for the confusion regarding miR-150 and its utility as a radiation exposure biomarker. Indeed, we and others have shown the importance of miR-150 and other miRNAs in detecting radiation exposure in mice and macaques. We had inferred that the resulting evolutionarily conserved radiation-inducible microRNAs were very likely to translate well to humans due to the high conservation of their promoter regions and transcription factor binding sites. However, in this study validating a microRNA-based test for radiation detection using actual samples, we demonstrate that while most of the predictions grounded in animal models held true, only through the analysis of human data were we able to develop a model that reached clinically useful performance. Most importantly, there are key differences in humans, suggesting that for clinical application the primary source of data has to be human. For example, a key miRNA for radiation detection noted in macaques – miR-133 – was absent in human patient sera. The miR-30 family, important for dose separation in mice, was redundant in the human test. The results from animal studies of miR-150-5p are not directly translatable for use in humans. In animals, particularly isogenic mice, miR-150-5p kinetics enable perfect separation of the irradiated from non-irradiated samples, even after low-dose exposure. The dose response in humans, who have different genetic and clinical backgrounds, is much less appreciable, and therefore a simple single- or two-miRNA-based test is insufficient. To overcome this, we employed artificial neural networks reliant on the expression of 8 miRNAs and 2 normalizers, which assure robustness to differences in sample material content. Therefore, we are bringing significantly new knowledge to the field and providing a template for how miRNA signatures derived from animal models need robust validation in human samples before a human application can even be conceived. The analysis of miRNAs purified from exosomes constitutes an exploratory component of our work and is not part of the proposed diagnostic procedure for triage and clinical decision making. We introduced necessary changes to make the division between the main and exploratory parts of our work more evident (lines 116-127).
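
      As an illustration of why normalizers can make a qPCR-based signature robust to differences in sample material content, the sketch below uses a generic delta-Ct calculation. This is an assumption about one common way normalizers are applied, not a description of the model in the manuscript; the function name `delta_ct`, the placeholder normalizer names and all Ct values are hypothetical.

```python
def delta_ct(ct_values, normalizers):
    """Express each target Ct relative to the mean Ct of the normalizers.

    ct_values: dict mapping assay name -> Ct value (hypothetical data).
    normalizers: names of the assays used as reference (hypothetical names).
    """
    ref = sum(ct_values[name] for name in normalizers) / len(normalizers)
    return {
        name: ct - ref
        for name, ct in ct_values.items()
        if name not in normalizers
    }


# Hypothetical Ct values; "norm-1" and "norm-2" stand in for two normalizers.
sample = {"miR-150-5p": 24.0, "miR-375": 29.5, "norm-1": 22.0, "norm-2": 23.0}

# Idealized case of less input material: every Ct shifts by the same amount.
diluted = {name: ct + 1.5 for name, ct in sample.items()}

normalized_a = delta_ct(sample, ["norm-1", "norm-2"])
normalized_b = delta_ct(diluted, ["norm-1", "norm-2"])
```

      Under the idealized assumption that reduced input shifts all Ct values equally, the normalized values of the two samples coincide, so a downstream classifier would see the same input in both cases.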

      Major comments:

      1. Longitudinal evaluation of specimens from human patients who received TBI is a plus. However, baseline readings in specimens collected from leukemia patients need to be compared with that in healthy humans. Why were several specimens excluded from analysis?

      Since the irradiation of healthy humans would not be ethically acceptable, we cross-referenced the results from patients with leukemia with our earlier results of radiation-responsive miRNAs in healthy mice and non-human primates as a surrogate of healthy humans undergoing TBI. As outlined in the “Preprocessing of profiling data” section of Materials and Methods, we implemented quality control based on the number of detected miRNAs per sample. For the miRNA-seq based experiment, samples with fewer than 350 miRNAs with non-zero reads detected (4A and 7A in Figure 1 – supplementary figure 1) and their respective paired samples were removed from the analysis. Additionally, sample DFCI.13A was an outlier in hierarchical clustering and in Principal Component Analysis (Figure 1 – supplementary figure 2) and therefore this sample, together with paired samples from other timepoints, was excluded from the analysis. We incorporated this information in the main part of the manuscript (lines 146-148).
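
      The detection-based quality control described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the data layout, the function names and the rule for mapping a sample ID to its patient (dropping the trailing timepoint letter, as in "4A") are assumptions made for the example; only the 350-miRNA threshold comes from the text.

```python
MIN_DETECTED = 350  # threshold quoted in the response


def detected_count(counts):
    """Number of miRNAs with non-zero reads in one sample."""
    return sum(1 for reads in counts.values() if reads > 0)


def patient_of(sample_id):
    """Assumed convention: the trailing letter of the ID encodes the timepoint."""
    return sample_id[:-1]


def filter_samples(table, min_detected=MIN_DETECTED):
    """Drop failing samples together with all paired samples of the same patient.

    table: dict mapping sample ID -> {miRNA name: read count} (hypothetical layout).
    """
    failing_patients = {
        patient_of(sample_id)
        for sample_id, counts in table.items()
        if detected_count(counts) < min_detected
    }
    return {
        sample_id: counts
        for sample_id, counts in table.items()
        if patient_of(sample_id) not in failing_patients
    }
```

      Removing every timepoint of a failing patient, rather than only the failing sample, keeps the remaining data strictly paired.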

      1. Dose response noted is moderate. Biodosimetry refers to retrospective evaluation of absorbed dose and the analysis should include validation using specimens of unknown exposure.

      As outlined above, the moderate dose responsiveness of miRNAs used in our proposed signature is the primary reason why we believe that a simple diagnostic procedure based on a single miRNA, e.g. miR-150-5p, will not be feasible for use in humans. The final model was evaluated on an independent group of 12 patients with samples drawn under the same protocol (for which exposure and dose were unknown, to validate the model's diagnostic accuracy).

      1. The authors say that 1 Gy exposure in humans can cause ARS (paragraph 1, introduction). However, their approach does not resolve doses under 4 Gy (around the LD50 value in humans).

      The TBI protocol does not allow for irradiation with doses lower than 2 Gy in a single fraction, which was the reason behind the definition of the low-dose exposure group (2 or 4 Gy) in our study. However, localized irradiation with higher doses provokes a response reflected by changes in miRNA levels in serum (Malachowska et al. Int. J Radiation Oncol Biol Phys), suggesting that the irradiation signature is likely to hold true and identify individuals exposed to smaller doses.

      Reviewer #2 (Public Review):

      The study first compared the profiles of serum miRNA in patients before and after irradiation treatment. Then they selected 8 miRNA markers that showed significant changes in levels for further analysis. Then, they showed that the analysis of these markers by real-time PCR can differentiate the pre- and post-irradiation samples in 12 additional patients. The objective of the study is unclear.

      We rephrased the appropriate sections of the manuscript accordingly to elucidate the objective of the study (lines 105-106 and 131-132).

      The study only demonstrates that the 8 miRNA markers are useful to differentiate serum samples collected before and after irradiation. This information is not useful as the blood picture would be more accurate and cheap to accomplish this task.

      The currently used diagnostic screening tests for radiation exposure, including time to onset of radiation sickness, kinetics of lymphocyte depletion and chromosomal abnormalities analysis, are time-consuming and do not allow definite conclusions, as underscored by the lack of an FDA-approved biodosimeter. The nadirs of peripheral blood cell counts may reflect high dose exposure but do not allow for prediction of the eventual outcome. Moreover, as evidenced in our prior experimental studies, the dynamics of the blood cell counts are significantly slower than those of circulating miRNAs. For example, differences in outcome, that is, the probability of survival of an animal after acute radiation exposure, are not evident from blood counts or other measures for weeks after radiation, yet are predicted by a blood-based microRNA signature with ~90% accuracy assessed 24 hours after radiation exposure (Acharya et al, Science Translational Medicine, 2015). Therefore, although we acknowledge that a blood cell count would be cheaper, we respectfully disagree that it would be more accurate in rapidly providing the information necessary to implement countermeasures against the absorbed radiation dose. Furthermore, qPCR-based assays are also inexpensive and increasingly available, owing to the great need to expand PCR-based testing capabilities that the COVID-19 pandemic gave rise to. We acknowledge that this information was not presented in sufficient detail and we expanded relevant sections of the manuscript (lines 64-76, 401-402).

      The authors also propose that these markers are useful for the identification of subjects exposed to irradiation. As this study has not addressed the specificity of these miRNA markers to irradiation, the claim of having a signature for radiation exposure is not justified.

      We had shown in previous experimental exposure studies performed using animal models (“Serum microRNAs are early indicators of survival after radiation-induced hematopoietic injury”, Science Translational Medicine, 2015 and “Evolutionarily conserved serum microRNAs predict radiation-induced fatality in nonhuman primates”, Science Translational Medicine, 2017) that miRNAs with radiation-dependent alterations of expression show association with bone marrow depletion and correlate with survival in an amifostine rescue experiment, and that miRNA expression changes are suppressed by the use of radiation-mitigating agents like gamma-3-tocotrienol. These arguments support the specificity of the expression patterns towards irradiation as the inciting stimulus. The cross-referencing of results from animal studies and from our miRNA-seq experiment on human samples was aimed to account for this issue, as similar experiments on healthy humans would not be ethical, and to identify high-confidence miRNAs from which a signature could be built. We have now added these explanations (lines 112-115, 164-167, 344-350).

      Although patients with irrevocable damage of bone marrow due to other factors would be an interesting comparative group, we struggle to find an ethically acceptable scenario that would match the TBI in terms of the timeline and repeatability of the bone marrow depletion. A feasible alternative may be high dose chemotherapy conducted in preparation for bone marrow transplant, but the dynamics of that procedure are vastly different making the group more adequate for analyses of bone marrow regeneration rather than a control for TBI-initiated damage.

      The key new experiments in this study are the profiling of the serum miRNA in the patients undergoing total body irradiation. The results on mouse model and macaques have been published previously. The consistency of the changes of the miRNA markers is not surprising.

      The consistency of the radiation-inducible miRNAs between mice, non-human primates and humans was expected, given the high conservation of their promoter regions and transcription factor binding sites, as we showed previously (Fendler et al., 2017). This step was important to assure that the miRNA level changes observed in humans result from radiation exposure, as this could not be determined directly, as mentioned in the response to the previous remark. However, the creation of a clinically applicable test would not be possible without the true study in humans presented in the manuscript. Notably, a miRNA crucial for the radiation exposure models in our macaque model (miR-133b) was surprisingly absent in human sera, and the miR-30 family, important for dose separation in mice, was redundant in the human test. This serves as a cautionary tale for “translational” studies without true validation in humans and underlines the importance of our findings in terms of the first human-specific and adequately validated diagnostic and prognostic test for radiation exposure.

      Reviewer #3 (Public Review):

      1. Appropriate bioinformatics discussions and functional pathway analysis are necessary for the key differentially expressed miRNAs that have been screened out. It is boring to only discuss the differences of miRNA data.

      We appreciate the suggestion to back the results of differential miRNA expression with a more in-depth bioinformatic discussion. We discussed the results of functional enrichment analysis, presented in Fig. 3C, in more detail, and appended the bioinformatic analysis (lines 218-222, 360-364, 546-549). A graph of miRNA-gene interactions, created using miRTargetLink 2.0 for miRNAs differentially expressed in exosomes after high dose irradiation has been added as figure supplement 1 to Figure 3.

      1. In page 5, "We used logistic regression to create such a model in the low-dose setting (N=22 sample pairs). The resulting classifier was based on the expression of miR-150-5p, miR-126-5p and miR-375". Why were the three miRNAs in the low-dose radiation group selected for modeling instead of the seven overlapping miRNAs in the high and low dose radiation groups to classify the irradiated and non-irradiated samples? Please explain in detail.

      The expression of miR-150-5p, miR-126-5p and miR-375 was used in our previous animal studies to determine radiation exposure, and we used a similar approach at this stage of the project to evaluate whether their expression measured using RNA sequencing in human sera can reliably distinguish between the irradiated and non-irradiated samples. We acknowledge that this was not clearly stated. The primary purpose of this analysis was to visually present similarities in radiation-inducible miRNA expression changes across species, and the logistic regression model in question was not used any further. Following the Reviewer's suggestion, we built a model using the seven miRNAs overlapping in the high and low dose radiation comparisons to classify the irradiated and non-irradiated samples, obtaining an AUC of 0.95 (95% CI: 0.89-1.0); however, we believe adding this information to the main part of the manuscript is not necessary.
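
      For readers less familiar with the AUC quoted above, it can be read as the probability that a randomly chosen irradiated sample receives a higher classifier score than a randomly chosen non-irradiated one. The sketch below computes this quantity directly from two lists of scores; the function name `auc` and the example scores are hypothetical and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier outputs (higher means "more likely irradiated").
irradiated = [0.9, 0.8, 0.4]
non_irradiated = [0.5, 0.3, 0.2]
example_auc = auc(irradiated, non_irradiated)
```

      An AUC of 1.0 would mean perfect separation of the two groups and 0.5 would mean no better than chance.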

      1. In page 5, "Therefore, the expression of miR-126-5p, miR-150-5p and miR-375 enabled efficient classification of the irradiated- and non-irradiated samples in both settings (Fig. S6C)";

      In page 6, "Interestingly, a set of 3 miRNAs quantified by qPCR in all of our previous experiments clearly visually distinguished irradiated from non-irradiated samples in the human analysis (Fig. 5A)",

      Which three miRNAs: miR-150-5p, miR-375 and miR-126-5p mentioned before, or miR-150-5p, miR-375 and miR-215-5p? Please clarify.

      Thank you for the suggestion. We rephrased this fragment (lines 289-290).

      1. In page 4, "Since miRNA-containing exosomes.......high dose irradiation". Do you think that the differential expression of serum miRNAs partly results from exosomes? Low dose irradiation is also able to change the exosomal miRNA profile; why is only high dose irradiation taken into account in the paper while low dose irradiation is not?

      We believe that serum miRNA expression results in part from exosomes and, as an exploratory component of our work, aimed to verify whether the magnitude of changes in exosomal miRNA expression exceeded that in serum, improving the potential biomarker specificity to an extent that would justify the development of an arguably more complex and labour-intensive test utilizing exosome isolation. The sequencing of exosomal miRNA content was therefore performed as an exploratory analysis only after high radiation exposure. However, the lower amount of exosomal miRNA than that obtained through the total miRNA extraction protocol offsets any benefit stemming from the higher cellular specificity of the former and, because the results were comparable with those obtained from sera, we decided not to explore this concept further. We added this explanation to our manuscript as this issue was not clarified previously (lines 116-127 and 339-343).

      1. Are there any miRNAs that can clearly distinguish between high and low dose groups? If so, please clarify them in text.

      We now clarified this issue in discussion (lines 415-417).

      1. In page 7, "Importantly, similarities were observed in the level of both individual miRNAs and miRNA families". What part of the results leads to this conclusion? Please explain clearly.

      When describing similarities between human and animal studies, we refer to our previous work describing radiation-responsive miRNAs in mice and non-human primates. These similarities (and differences) are described in detail in Table 1. We added relevant references to Table 1 and to the cited sentence (line 352).

      1. In page 7, "We found that the most common putative tissue sources for differentially expressed miRNAs were hematopoietic and endothelial cells", Which part of result shows this sentence? Please point it out.

      This statement is not validated in our work explicitly but based on the results from references: Ludwig et al., 2016, de Rie et al., 2017 and Landgraf et al., 2007. Since Ludwig et al., de Rie et al. and Landgraf et al. generated excellent data of miRNA expression across human and mouse tissues and cell types that showed overlapping results for the miRNAs of interest, as detailed in Table 1, we did not perform additional confirmatory experiments.

      1. Were the patients suffering from cancer or other diseases? How to ensure that the differential expression of miRNA was caused by radiation exposure rather than their own disease? Please explain.

      As described above, initial experimental studies performed in animal models (mouse and macaque) in preparation for this study showed the specificity of miRNAs (including ones in the signature) towards radiation exposure in different animal models. This was evidenced on multiple layers of validation and rescue experiments. Admittedly, a demonstration that additional diseases with a phenotype similarity with ARS affect study performance is an interesting concept, but it would be extremely unlikely to impair the performance of the test in an individual after radiation exposure. Namely, even if the examined patient has a hematologic malignancy or myelofibrosis potentially affecting the performance of the test, identification of such individuals as potentially irradiated would lead to them being followed up adequately. Failure of the test to detect radiation exposure will likely not be a severe risk, since such individuals will already be severely ill and under proper care with regular monitoring of bone marrow function. We are aware that some unforeseen and not discussed clinical factors may affect some facets of the test, but the built-in robustness derived from having multiple miRNAs mitigates the risk of non-specificity.

  3. Dec 2021
    1. Author Response

      Reviewer #1 (Public Review):

      Giove and colleagues find that a perceptual effect, namely whether a flicker is perceived or unperceived, is reflected in metabolic signals measured with functional MRS, but not in BOLD-fMRI. Specifically, perceived but not unperceived flicker led to an increase in lactate and glutamate in early visual cortex (a combination of V1, V2 and V3). BOLD-fMRI did not increase in this same region, suggesting that we are missing important neural signals by focusing on BOLD-fMRI only. The authors also provide a thorough discussion of the potential physiological mechanisms underlying these metabolic effects. I should note that I have no expertise in fMRS, and my assessment is based on knowledge of BOLD-fMRI and perception.

      Whether or not the flicker was visible was manipulated by changing the frequency of the flicker. Specifically, a low frequency flicker (7.5 Hz) was perceived, but a high frequency flicker (30 Hz) was not. Of course, this means that it is difficult to assess whether the fMRS effects are related to perception itself (visible vs. invisible) or due to the low-level features of the stimulus, e.g. the temporal filtering properties of the visual system. This limitation does not however hinder the main conclusion of the paper, which is that certain neural signals are missed by BOLD-fMRI but can be picked up by fMRS.

      We thank the referee for these constructive comments. In this revision we further stress the importance of the argument suggested by the referee that MRS but not BOLD-fMRI better reflects differences in information processing related to perception. In other words, the metabolic response of V1 can predict whether a visual stimulus is perceived or not. This of course does not necessarily involve causality. We argue that stimulus perception is inextricably linked with low-level features of the stimulus, i.e., perception is equivalent to the filtering of the stimulus, which in turn depends on stimulus characteristics.

      In Figure 2B, it looks like BOLD dynamics may differ between the slow and fast flicker blocks, even if the mean amplitude did not. So perhaps there are some more subtle BOLD differences between conditions that the authors do not explore.

      In this revision, we test for statistical differences in the BOLD time-course, as suggested by the referee. Please see our response in the “Essential Revisions” above.

      The authors themselves also raise a potential partial voluming issue in the fMRS measurement that seems important to consider, given the differential BOLD signal in nearby regions (V2 and V3). Specifically, the volume in which fMRS is measured consists of parts of V1, V2, and V3. There are no significant differences between perceived and unperceived BOLD-fMRI in this volume as a whole, but there are in V2 and V3 in isolation. This raises the possibility that the null effect of BOLD-fMRI in the fMRS volume as a whole is due to it washing out in this larger volume. Could it be that the fMRS effects are also driven by V2 and V3, but are for some reason stronger/more robust, and therefore survive in the larger volume? In other words, I wonder if the BOLD and fMRS effects may actually co-localise, but differ in effect size.

      This is an interesting possibility to consider, but unfortunately, it cannot be really addressed without the help of a tailored study. To attempt a minimally meaningful analysis we would need (at least) to know the partial volume of each individual subject for assessing whether and to what extent the partial volume correlates with the spectroscopic results. As stated above, we did not acquire single-subject retinotopic maps. Even with this piece of information, a reliable identification of multiple and spatially distinct components summing up to the single MRS signal would be problematic. A qualitative reply to the issue raised by the referee is that the metabolic response to visual stimulation measured with FDG-PET (an index of glucose utilization, and by extension, of lactate production) has been proposed to peak in V1 (e.g., Chen et al., HBM 2018 PMID: 30076750). Therefore, it is unlikely that V2/V3 contribute much more than V1 to the stimulation-induced increase in lactate and glutamate concentration. Furthermore, to the best of our knowledge all previously reported increases in lactate concentration during photic stimulation have been assigned to V1.

      In conclusion, the authors demonstrate an intriguing dissociation between BOLD-fMRI and fMRS, which should prompt further research into this topic, and may ultimately change the way we interpret neuroimaging signals.

      The referee has wonderfully summarized our study. Thank you.

      Reviewer #2 (Public Review):

      In this paper the authors investigate differences in metabolic response in primary visual cortex (V1) to perceptible and imperceptible stimuli using proton magnetic resonance spectroscopy (1H-MRS) and fMRI.

      The main strength of this paper is it shows that perceptible stimuli trigger a different metabolic response in V1 than imperceptible stimuli, namely that lactate and glutamate levels both increase for perceptible stimuli but are unchanged for imperceptible stimuli. Weaknesses of the study are that no retinotopic mapping was performed on the subjects so the spectroscopic voxel may contain contributions from early visual cortex outside V1; the assumption that increased BOLD response in V2 is caused by perception is not convincing.

      The differences in concentration of lactate and glutamate are striking, and the only plausible explanation is differences in metabolic response in V1.

      This is the clear and main result of the paper. The argument that an increased activation in V2 is caused by perception is less interesting. More sophisticated experimentation and analysis including connectivity analysis would be required to investigate the interaction between V1 and the rest of the brain.

      This could considerably increase the importance of MRS in cognitive neuroscience. It would be fascinating to use dynamic causal modelling or a similar technique to explore connectivity between regions for perceptible/imperceptible stimuli and to combine this with proton spectroscopic imaging.

      We agree with the referee that our work cannot help in establishing a relationship between perception and activation of visual areas, and that more sophisticated investigations would be necessary for that purpose. We also acknowledge that our study suffers from the limitations mentioned by the referee. In the present revision we include, as limitations, the absence of retinotopic mapping and the lack of causality between perception and BOLD/MRS, as suggested by the referee. The idea of correlating brain connectivity with metabolic imaging (CSI or even FDG-PET) during different stimulation paradigms (or resting-state) is appealing, as recent combined PET/fMRI experiments showed that BOLD and glucose consumption (CMRglc) are dissociated in a region- and task-dependent manner (e.g., Stiernman et al, PNAS 2021).

      Reviewer #3 (Public Review):

      Di Nuzzo et al demonstrate here that perception of visual stimulation is reflected in dissociable neurometabolic, but not neurovascular, responses in human visual cortex. This work uses human neuroimaging to show the effects of perception on neuronal energy demands and is of great importance for the neuroscience community. The authors carefully designed a task that would elicit similar BOLD response in primary visual cortex (V1) for perceived or unperceived visual flickering. They combined fMRI BOLD measurements with functional MRS, to quantify the functional (BOLD) and metabolic (concentration of lactate and glutamate) responses during visual stimulation. While they found no differences in the BOLD response within V1 for perceived vs. unperceived visual flicker, the authors show increased levels of glutamate and lactate in V1 when the flicker is perceived, suggesting increased energy metabolism during perceived visual stimulation.

      We thank the referee for the careful and constructive reviews of our manuscript.

      While BOLD response within V1 does not differ between perceived and unperceived flicker (Figures 3B, 2C, 3C), the authors find enhanced BOLD in the lateral occipital cortex when the flicker is perceived (Figures 3B, 3D).

      The authors consider BOLD in secondary visual areas to be a surrogate measure of V1 output, indicating that stimulus processing during perceived stimulation results in enhanced V1 output. The spatial and temporal resolution more commonly used in human neuroimaging do not facilitate building relationships of input-output neuronal activity in a way analogous to animal neurophysiology. The assumption that BOLD activity in secondary visual areas reflects V1 output is very tightly linked to the unique architecture of the visual system; however, the paper would benefit from including the uncertainty of this assumption in the discussion.

      We agree with the referee, and we now mention the uncertainty of the assumption that the BOLD signal increase we observe in secondary visual areas reflects a rise in the output from V1. We also mention the lack of direct measurement to support such an assumption as a limitation of the study.

      The paper would further benefit from following a more standardised way of reporting preprocessing steps of the fMRI data, as well as a more detailed description of the statistical analyses on the fMRI data.

      We have carefully checked the methods section related to the fMRI data analysis (preprocessing and statistics) and modified the text where appropriate. Thank you for pointing this out.

      Finally, the authors have provided a series of well-chosen controls to ensure that their findings are not driven by differences in levels of attention between perceived and unperceived stimulation (Figure 1). The authors are commended on the quality of their figures, their choice of detailed graphs and the constructive use of additional media.

      We would like to thank the referee for the complimentary comment.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper investigates what functional properties emerge from training an anatomically-constrained neural network on a specific computational task-detection of looming visual stimuli. Several functional models are identified by optimizing a network model for this task, and one of these models matches several properties observed in the fly neurons that perform the task. The approach and results are interesting. I did feel that several aspects of the work could be described more clearly, and that the potential of the model to reveal important aspects of the computation could be probed more thoroughly.

      Inhibitory component of model. The interplay between excitatory and inhibitory components of the model could be explored in more detail. A specific aspect that is interesting is the inclusion of rectification in the inhibitory circuit. Rectification is motivated by the extra neuron in the circuit providing inhibition (lines 155-157), but it is not clear why an additional neuron would require rectification. Are there physiological measurements that indicate that the extra neuron introduces rectification, or is that a speculation? Exploring whether rectification is important would also be interesting - e.g. by removing it from the trained models, and/or training models on circuits in which rectification is absent. Lines 360-362 mention interesting response properties created by inhibition, but do not define what those are. Including some of these extensions of the basic model could highlight the potential of the model to make predictions about specific circuit features that are important for detection of looming stimuli.

      Thanks for this interesting comment. Please see the revision summary and Essential Revisions 4 for more information. In lines 360-362 of the first submission, we were referring to Fig. 10E, F, which show some examples of peripheral inhibition.

      Intuition for second model class. One of the key results in the paper is the existence of two classes of solution to the optimization problem - one of which follows the expectation for a detector based on outward optical flow, and the other of which does not. It is important to explain intuitively how the “inward” model is able to detect looming stimuli, given that it seems sensitive to the wrong optical flow features. This explanation should come early - e.g. around lines 214-216.

      We agree. In the revised manuscript, we have modified some expressions around (former) lines 214-216 to say explicitly that the inward solutions are sensitive to hit stimuli coming from the side of the receptive field rather than the center.

      In general, the results would benefit from developing some arguments in more detail. One example is the paragraph on lines 232-237. The differences in performance in Figures 8C and D stick out to me as a reader, but I am not guided through those differences in the text. Intuition for why you see the change in relative performance of the two solutions (lines 266-268) would similarly be helpful. Another example is lines 290-292. These are several examples in which more explanation would be helpful, but you could look at the results in general with this in mind.

      These are good suggestions. First, we added more explanation of the differences shown in Figure 8C. Figure 8D is essentially a re-plot of the red and orange curves in Figure 8C, and shows the distance dependence of the miss signals. Second, the relative changes in the performance of the two solutions arise because the ROC and PR curves are bounded from above and the loss function is bounded from below (by 0). The better-performing solution (inward in this case) in general has less room to improve than the other one. Third, we moved the comments about the angular-size encoder from the discussion section to the results section, after the sentences starting in (former) lines 290-292.

      The performance of the two classes of solution becomes more similar as the number of neurons increases. A concern is that this reflects saturation of performance rather than actual equivalence of the models. Can you make the task harder, e.g. by adding distracting optical flow? That might help separate performance of the different models and avoid saturation.

      It is correct that the tasks are relatively easy for our model, and both outward and inward models with large enough population size can almost perfectly distinguish hit cases from others. In this revision, we engineered a new set of stimuli with rotational background flows. In this case, both inward and outward solutions are found, and the outward solutions tend to perform better than the inward ones. Though this particular choice of more difficult task seems to favor outward solutions, we find it difficult to interpret, for lack of experimental comparisons. Instead, in the discussion, we interpret this result to show the potentially strong dependence of the solution on the statistics of loom stimuli, which requires characterizing. For more details, see Essential Revisions 3.

      Figure 10: how did you choose the specific outward solution used in this figure? More generally, some measure of the similarity of model components with experiment across all outward models is important. Currently the text reads as if you chose one of many models that happened to have components that looked like those measured. This comes up again on lines 310-311 and 313-315.

      We have answered this question in the section of Essential Revisions 5. With the new, simpler model, it is no longer necessary to pick from among the distribution of solutions.

      Are there animals that detect looming stimuli with fewer loom detectors? If so it would be interesting to see if they have adopted a similar or different computation.

      This is a very interesting question. However, the authors are not sure about the number of loom detectors in other animals, and are also not aware of the existence of the inward solutions in either flies or other animals. One related point to note is that the LPLC2 neuron and its computational structure are not the only way to detect looming events: there are other loom-sensitive neurons and neural circuits that receive very different types of visual signals, such as LC4 in flies and LGMD in locusts, which do not appear to receive directional inputs.

      Reviewer #2 (Public Review):

      The manuscript from Zhou et al. investigates how certain features of looming-detecting neurons can arise from optimizing a shallow neural network to detect imminent collisions. The authors consider architectures that resemble the known anatomy of LPLC2 neurons in Drosophila, with excitatory inputs from the four layers of motion detectors in the lobula plate and inhibitory inputs from the interneurons in those layers. The authors find that some fraction of the trained networks exhibit tuning properties of LPLC2 neurons, including (a) similar response profiles to stimuli that are not present in the training data; (b) similar dependence on the angular size of the looming object as opposed to angular velocity; and (c) similar dependence between peak response time and the ratio of size to speed of the looming object. The authors also find another solution among the trained networks that is very different from the biological circuits. However, they show that this other solution becomes less common as the number of neurons grows, which is the relevant regime for the biological circuit. This paper adds to a body of work that suggests that the structural or functional properties of brain circuits are the solution to an optimization problem implied by the task that they have to perform – in this case, the ability to detect looming motion.

      The conclusions of the paper seem well supported within the class of models that was considered. The choice of class is, however, rather narrow and could be better explained and analyzed.

      1. One potentially confusing aspect of the work is that there are in fact three major types of solutions that are found, not only two as described in the abstract: apart from “outward” (similar to LPLC2) and “inward” (dissimilar to LPLC2) there are also “unstructured” solutions that, as far as I understand, basically fail to perform the task – although their performance isn’t adequately discussed. The authors comment on this in the Discussion, suggesting that the unstructured networks are local optima where the stochastic gradient descent algorithm they use for optimization gets stuck. They argue that evolutionary processes would be unlikely to linger there, implying that it might be fine to ignore these solutions. While reasonable, this claim is difficult to assess without more discussion of these results. These solutions are not a rare occurrence: according to the Methods, over half of the trained networks end up in the “unstructured” pile.

      In our initial submission, the term ‘unstructured solution’ was an unfortunate choice of name for these solutions. In this revision, we call them ‘zero solutions’, since all the elements in the filters are zero (or very close to zero). Please see Essential Revision 2 for a more detailed answer to this comment.

      1. The stimuli used in the paper are very simple: basically rigid, featureless objects moving in a straight line and at constant velocity, or rotating at constant angular velocity. Naturalistic stimuli are likely to be much more complex, which could hurt the training process. This is only briefly touched upon in the Discussion, leaving open the question of how the results of this work would change in more natural settings.

      This is an interesting point. Please see Essential Revision 3 for our responses and changes.

      1. The authors impose a 90-degree rotation symmetry as well as a reflection symmetry on the connection weights to the four layers of motion detectors that are sensitive to the four cardinal directions. Given that the training data that is used also has these symmetries, the question arises whether imposing these symmetries by hand was necessary. This is unfortunately not discussed in the paper.

      The imposed symmetries are not strictly necessary. Please see Essential Revisions 1 for details about how we have addressed this comment.
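
      To make the symmetry constraint discussed in this exchange concrete, the following is a minimal sketch, assuming a square spatial filter parameterized by a single base array. The function names `symmetrize` and `four_direction_filters`, the 5x5 filter size, and the choice of mirror axis are all hypothetical; this is an illustration of weight tying, not the authors' implementation.

```python
import numpy as np

def symmetrize(base):
    """Impose mirror symmetry by averaging the filter with its up-down flip.

    Assumption: the mirror axis is the horizontal midline of the filter.
    """
    return 0.5 * (base + np.flipud(base))

def four_direction_filters(base):
    """Tie the four cardinal-direction filters to one base filter.

    Each filter is the mirror-symmetric base rotated by a multiple of
    90 degrees, so only the base array holds free parameters.
    """
    sym = symmetrize(base)
    return [np.rot90(sym, k) for k in range(4)]

rng = np.random.default_rng(0)
base = rng.normal(size=(5, 5))          # hypothetical 5x5 base filter
filters = four_direction_filters(base)  # list of four tied filters
```

      Under this kind of tying, a gradient step on `base` updates all four direction filters at once, which is one way a symmetry can be imposed by hand rather than learned from symmetric training data.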

      1. One highly confusing aspect is that there is, in fact, an additional symmetry: the same filters are used for all the subunits. The difference between the different subunits seems to be only in the inputs that they receive – i.e., that they are responding to different parts of the visual field. This is only really apparent from the Methods. Given again the rotational symmetry of the inputs, it would be reasonable to assume that this symmetry could be learned, but this isn’t discussed or explained properly.

      Yes, we agree that this symmetry could be learned, but this requires a lot more training data, which is not practical in terms of computational cost. In addition, this imposed across-unit symmetry makes different models with different M’s have the same number of parameters, which is a nice property to have when studying how the population size affects the model performance and trained filters.
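
      The parameter-count argument in this response can be sketched as follows. This is a simplified illustration, assuming a single shared filter plus one bias and a rectified output per unit; `population_response` and `n_parameters` are hypothetical names, and the real model has more components than shown here.

```python
import numpy as np

def population_response(patches, shared_filter, bias=0.0):
    """Apply one shared filter to each unit's own input patch.

    patches: array of shape (M, H, W), one visual-field patch per unit.
    shared_filter: array of shape (H, W), identical for every unit.
    Returns rectified responses of shape (M,).
    """
    drive = np.tensordot(patches, shared_filter, axes=([1, 2], [0, 1])) + bias
    return np.maximum(drive, 0.0)

def n_parameters(shared_filter):
    """Free parameters: the filter weights plus one bias, regardless of M."""
    return shared_filter.size + 1

shared = np.ones((3, 3))
for m in (1, 8, 256):
    responses = population_response(np.ones((m, 3, 3)), shared)
    # The response vector grows with M; the parameter count does not.
    print(m, responses.shape, n_parameters(shared))
```

      Because the units differ only in the inputs they receive, models with different population sizes M can be compared without confounding population size with model capacity.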

      1. The authors say that the “outward” model reproduces biology but I’m not sure that the details of the lobula plate circuitry match this claim. For instance, LPi neurons typically have broad arbors, making location-specific inhibitory inputs unlikely. And is there evidence that the inhibitory inputs are limited to a small region, like in the model?

      The LPi neurons seem to be similar in size to the LPLC2 dendrites in the lobula plate (Klapoetke et al. (Nature, 2017), Figure 5K and Extended Data Figure 9). In our outward models (both linear receptive field and rectified inhibition), the inhibitory components are larger than the excitatory components when the number of units is large, which is at least consistent with potentially larger pooling of inhibitory signals than excitatory ones. Please refer to Essential Revision 4.

      1. Why not test the predictions of the model by analyzing the inputs onto the LPLC2 neurons using connectomics datasets?

      We would have loved to do this. Regrettably, the hemibrain dataset lopped off virtually all of the lobula plate. Our response to Essential Revision 1 expands a bit more on this point.

      Reviewer #3 (Public Review):

      Although collision-detecting neurons have been identified across animals, the computations they perform remain unresolved. Here, Zhou et al. train artificial neural networks to predict collisions across a diverse set of visual stimuli and constrain network geometry using the known anatomy of a Drosophila looming detector cell type, LPLC2. Zhou et al. demonstrate that trained networks converge upon three solution types: an unstructured solution, a solution where inward motion is excitatory, and a solution where outward motion is excitatory. Interestingly, the solution excited by outward motion is also inhibited by inward motion as predicted for LPLC2 computations, and the output of these trained networks is highly similar to measured LPLC2 responses across stimuli.

      1. Strengths: a. The novelty of this study is that the networks are trained to solve a problem (collision detection) instead of being trained on neural data, but as a result are able to reproduce neural data. b. The authors investigate how collision detection solutions change when looming is computed by a single neuron versus a population of neurons. This is particularly interesting because looming detectors have been identified at both population and single neuron levels. These results shed light on why many different collision detection computations have been proposed across neurons and across species, as they may face different anatomical constraints. The results also provide novel computations that can be further investigated in vivo. c. The manuscript is well written, the figures are clear, and the movies are very helpful in understanding the approach and the results.

      2. Limitations: a. The findings could be strengthened by a more thorough characterization across the different solutions. For example, only two of many outward solutions are compared to actual neural data, and there is no explanation for why these two solutions were selected and whether they are representative of the entire category of outward solutions. There is also no metric for evaluating how well these solutions match the neural data.

      For a more detailed response to this comment, please see Essential Revisions 5. In particular, our focus on the linear receptive field model has eliminated this issue with the distribution of solutions in the main presentation of the results. We believe this is overall less confusing than the prior presentation of the more complicated rectified inhibition model.

      b. The inward solutions are dropped from the last section of the paper; however, it would be very interesting to see the output of example inward solutions in comparison to actual neural data.

      Please see Essential Revisions 2. We have added the inward solutions to Figure 10 in the supplemental figures.

      c. Within outward solutions, there are cases where inward inhibition is completely absent which does not follow what is known about LPLC2. The authors should mention this and also provide a comparison between outward solutions with or without inhibition.

      With the simpler, linear RF model, these are no longer the focus of the study. They do still exist in the rectified inhibition model solutions, which have substantial variability.

    1. Author Response

      Reviewer #1 (Public Review):

      This group has examined basement membrane composition using sophisticated technical methods previously. Here they have methodically examined kidney organoids for their resemblance to mammalian foetal kidneys in the temporal expression of membrane proteins. They continue this through to adulthood and use peripheral blood leucocytes to demonstrate the effect of a COL4A5 mutation on the expression of basement membrane components. The manuscript's strengths are its methodical nature and the number of proteins examined, as well as building on previous work. Its weaknesses are that we do not know how good a model the organoid is for Alport syndrome and whether it results in an intact glomerular basement membrane. So far, this manuscript has demonstrated that the organoids are consistent with what we know - but can it also tell us new things? In addition, it has only examined one pathogenic Alport COL4A5 variant and this person also had a COL4A4 variant and thus complicated disease.

      Thank you for reviewing our paper and for highlighting the strengths of our work. Regarding the novelty, we consider the primary advance of our manuscript to be its focus on the assembly and remodelling of extracellular matrix in kidney development. Through this focus we demonstrate that kidney organoids are a valuable human, multicellular system correlating with the matrix changes observed in mammalian kidney development at both gene expression and protein levels. Finally, we have shown that human kidney organoids can be used to study basement membrane assembly in health and disease, using Alport patient-derived organoids. As far as we are aware, this is the first time-course study using organoids to track the intrinsic changes in basement membranes during development. As such it will facilitate further studies into developmental transitions in basement membrane components during kidney development and permit detailed evaluation of the early changes that occur in genetic conditions that affect basement membrane assembly.

      Reviewer #2 (Public Review):

      Morais et al. provide a convincing model for understanding basement membrane (BM) biology and interactions of BM components. The key findings of this paper establish a model that recapitulates the same biology and chemistry that are occurring during equivalent kidney development, primarily in humans. Utilizing kidney organoids, the authors characterize the spatiotemporal relationship of the proteins within kidney organoids as they form distinct basement membrane structures. The kidney is itself a vital system for understanding basement membranes among many different organs/tissues, as the kidney has served as a genesis for discoveries over the last 60 years. Here the authors describe not only the timing of proteins in the development of kidney organoid BMs, but also the spatial relationships. Importantly, as a kidney BM model, the authors recapitulated the disease state of Alport syndrome (AS), a syndrome involving the disruption of the collagen IV α345 network in kidneys, an essential component of kidney BMs. Furthermore, they find that this model of kidney organoids derived from AS patients had the same hallmarks during development as AS in a human patient, including laminin overcompensation as a result of α345 network disruption.

      This manuscript provides an invaluable model for understanding overall BM biology and disease progression, and especially so for kidney BM biology and kidney diseases. The potential for this model to study any number of missense variations within any number of proteins within a tractable and functionally identical BM is worth noting and exploring by other researchers.

      We thank the reviewer for highlighting the strengths of our manuscript.

      In general, the weaknesses of this article are insignificant as this manuscript aims to provide functional proof of concept of kidney organoids as a model for understanding human BM disease. Important, however, is the assumption that kidney BMs might represent all BMs. The diversity of BMs across tissues within humans alone is significant. Amongst different organisms from a broader evolutionary standpoint than just fly, C. elegans, mouse, and human, BMs are very likely exceptionally diverse, from the earliest animal BMs to different human tissue BMs. While this model provides an important model for understanding BM biology, a caveat that a kidney BM will functionally differ from a lens BM should be apparent and noted. However, the open-ended question of how to create tractable models like kidney organoids in other tissue systems will be of use in stimulating the matrix, proteomic, and structural biology fields.

      Thank you for this insightful comment. We agree that BMs are diverse and dynamic both in composition and structure throughout life. We have made alterations throughout the manuscript to highlight this point and to further emphasise the focus of this manuscript on kidney development.

      Reviewer #3 (Public Review):

      The emergence of methods to convert human induced pluripotent stem cells (iPSCs) into cultured kidney organoids that phenocopy the normal progression of embryonic and fetal differentiation represents a major advance in the study of normal and defective renal morphogenesis. This progress has been enriched by the addition of temporal/cell-type specific proteomics.

      The current study largely focusses on the site-specific compositional changes that occur in basement membranes (BM) that form on different abluminal cell surfaces as differentiation advances. A general model of BM assembly from earlier studies provides a foundation upon which to interpret organoid kidney development. Laminins initiate BM assembly by binding to cognate cell-surface receptors, polymerizing, and binding to secreted nidogens, proteoglycans and collagen type IV, the last forming a second stabilizing polymer network. The iPSC differentiation system reveals the assembly and turnover of BM components consistent with the above, but now provides detailed information on the accumulation and turnover of different components in the key cell types through the different steps of differentiation with proteomic correlation. The approach also enables the analysis of the assembly defects and consequences arising from human congenital diseases as was shown with a type IV collagen alpha 5 subunit in organoids derived from Alport cells.

      In combining organoid kidney culturing with laser microdissection and proteomic analysis, the authors have advanced use of the new tool compared to a 2018 study (Hale LJ et al., Nat. Communications), pushing the model from 18 to 25 days of differentiation and focusing more on BM formation during development. Evidence is presented to show that the major cell types, importantly including vascular endothelial cells, appear in the organoids in a temporal sequence. Relevant changes in BM-associated components are also shown. BM staining patterns are shown to change with emergence of laminin alpha5, laminin beta2 and collagen-IV alpha3 (replacing laminin beta1 and collagen-IV alpha1/2) at later stages. Organoids generated from iPSC cells derived from an X-linked missense variant of COL4A5 generated glomeruli containing alpha3/4/5, but with increases in laminin beta2, a known compensatory outcome.

      The evaluation of later renal differentiation stages is particularly critical for the study of the glomerulus in which BM components undergo isoform switches that normally correlate with glomerular vascularization. A limitation of previous glomerular differentiation models has been the inability to show formation of the vascular tuft and the associated morphological changes, as well as to show that podocytes form interdigitations. In that light, the current study could be strengthened by showing the ultrastructure of the day 25 glomeruli with identification of the BMs and different glomerular cell types (noting in particular if vascular endothelial cells are beginning to organize into the morphology of vascular tufts), and revealing the appearance of podocyte processes. It would also benefit the reader to enumerate the strengths as well as limitations of the culture model and how this work compares to previous studies.

      The current submission addresses temporal and tissue-specific BM changes during organoid kidney development. Day 25 kidney organoids contained tubules, stroma, and glomeruli with partial resemblance to (mouse) E19 kidney. Tubular and glomerular BMs are seen to form, the latter showing the expected switch from alpha-1/beta-1 laminins to alpha-5/beta-2 laminins, and alpha1/2 type IV collagens to alpha3-containing type IV collagens required for glomerular maturation.

      Thank you for reviewing our manuscript and for your summary of how our findings relate to other seminal studies in the field of basement membrane assembly.

    1. Author Response:

      Reviewer #2:

      The overall approach of this study is to compare gametocyte related parameters of infected blood samples from asymptomatic children, in some cases followed over time, with matched samples from uncomplicated malaria infections during the transmission season. A variety of parameters are analysed to investigate which mechanisms are used by the parasite to ensure that gametocytes are present in adequate number