  1. Nov 2020
    1. Reviewer #1:

      The importance of host associated microbiomes for health and disease of their hosts cannot be overstated. Fungi tend to feature more prominently in microbiome studies of soil or plants, but microbiome work in animals has mostly focused on bacteria, with fungi having received comparatively less attention. The current study addresses the question whether there is evidence for co-evolution or consistent ecological filtering of fungal communities in the animal gut, similar to what has been reported for bacteria. Such patterns have been termed "phylosymbiosis", even though the ecological interactions that underlie such patterns are largely unknown.

      The strength of the study is the wide range of animals investigated, 49 species from eight different classes of vertebrates and invertebrates. However, this wide sampling also is a weakness, as few groups are well sampled. Members of the same species are found to have relatively similar bacterial and fungal microbiota, and fungal microbiota are found to be somewhat correlated with phylogenetic distance. There is also correlation between bacterial and fungal communities, but whether this is driven by independent effects of the host on both groups, or primarily by interactions between the two microbial groups remains unknown. Some of the other observations, such as the tendency of bacterial diversity to be higher than fungal diversity, are more difficult to parse, since it is not clear what the proper yardstick for diversity comparisons is (i.e., whether functional differences between fungal ASVs are comparable to functional differences between bacterial ASVs). This study provides interesting insight regarding the general characteristics of the fungal microbiome and its relationship to the bacterial communities and the host. It does not directly reveal how these communities might affect the host. As the authors themselves state, "The drivers of phylosymbiosis remain unclear".

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on November 19 2020, follows.

      Summary

      The paper reports the involvement of isoleucine 553 in targeting Drp6 to the cardiolipin-containing nuclear membrane. The data are interesting, but there is no mechanistic understanding of how a single amino acid can target this protein so specifically to cardiolipin-enriched membranes.

      Essential Revisions

      The authors are strongly requested to address the issues that were raised in the previous review. The authors state in their rebuttal that they plan to address them in a timely manner. The additional request of one reviewer that should be addressed is to test the involvement of residues 552 and 554, to highlight the significance of the isoleucine at position 553 in targeting Drp6 to cardiolipin.

    1. Reviewer #3:

      The authors showcase results from an experimental pipeline aiming at demonstrating how evolution of in vitro cancer models can be exploited to identify somatic genomic and structural variants associated with the emergence of drug resistance.

      To this aim, the authors selected 5 widely used chemotherapeutic agents in an unbiased manner, by systematically treating the HAP-1 cell line with 16 different drugs and then choosing those yielding clinically compatible half-maximal effective concentrations. After generating stably resistant clones of the HAP-1 parental cell line across a number of replicates (by culturing in sublethal doses of the selected compounds), the authors whole-exome/whole-genome sequenced their models and compared variants observed in the resistant clones versus those present in the parental line.

      In this way, the authors identified recurrent loss-of-function variants across replicates in the drug-resistant clones for each drug, and they were able to reproduce the increase in drug resistance of the parental line by knocking out the genes found altered in the drug-resistant clones, or to reproduce the same finding by pharmacologically inhibiting genes found to host gain-of-function mutations in the resistant clones, thus highlighting new potential targets for combinatorial cancer therapy and chemosensitization.

      Briefly, this is a nice piece of work showing for the first time that pairing in vitro evolution with whole-genome analysis to identify targets for combinatorial therapy and to elucidate the mechanisms involved in the emergence of drug resistance is practically feasible.

      The experimental pipeline and the follow-up validation experiments are well thought out and designed, and the outcomes convincingly support the authors' final claim. There are no arbitrary or unjustified choices, and the showcased platform seems to be robust.

      I would like to see the following few points addressed/answered:

      1) The authors focused only on chemotherapeutics while composing their initial search basin. Would it be worthwhile to also consider a few targeted therapies, or is it known that none would exert an effect on HAP-1? This should be briefly mentioned.

      2) The title of the manuscript can be improved: the authors are deconvolving genomic alterations whose acquisition is linked to the development of drug resistance, thus potential chemosensitising targets or targets for combinatorial therapies. This could be better reflected by the title. As it stands, it reads as if the main aim were to identify 'innate/intrinsic' targets/cancer dependencies.

      3) Mutagenesis experiments to identify mutations that are linked to the emergence of drug resistance might be mentioned in the introduction, and the following work cited: PMID: 28179366.

      4) When mentioning the 'Genomics of Drug Sensitivity in Cancer' portal (www.cancerrxgene.org), the following two works (describing the online resource) should be cited: PMID: 23180760 and PMID: 27397505.

      5) Figure 1 nicely describes the experimental pipeline presented in this manuscript; however, it should be completed with a final panel (or a couple of panels) illustrating the genomic comparison between parental and drug-resistant clones used to identify SNVs and CNVs associated with drug resistance.

      6) It is not clear what the numbers in the 'funnel' in Figure S3A refer to. Resistant clones for an individual tested drug? Individual resistant clones, or overall cases? This should be specified.

      7) As it is presented, Table 1 is not very informative or clear; I would replace it with a barplot.

    2. Reviewer #2:

      In this manuscript, Jado et al. studied the in vitro evolution of the haploid cell line HAP1 in the presence of five common anticancer agents. The authors exposed the cells to the drugs and then performed whole-exome or whole-genome sequencing (WES or WGS) in order to identify point mutations (SNVs) and copy number changes (CNVs) associated with resistance. In multiple cases, the authors confirmed that shRNA-mediated knockdown of a candidate gene (that is, a gene that was recurrently mutated at high allele fractions, or recurrently lost/gained) indeed conferred resistance to the drug.

      Overall, this is an elegant demonstration that in vitro evolution in cancer cell lines can be useful for the study of chemotherapy resistance. Surprisingly, relatively few studies have attempted to identify resistance mechanisms to anticancer drugs using spontaneous evolution experiments, despite the prevalence of this approach in the study of antibiotic resistance. While the authors were able to identify and validate a few known resistance mechanisms to very commonly used drugs, a major limitation of the current study is that it does not really shed any new light on chemotherapy resistance mechanisms. While I appreciate the time and effort that were required to perform the drug experiments and sequence the various clones, the follow-up studies are rather superficial and do not really extend our knowledge of any of the proposed mechanisms of drug resistance.

      Specific Comments:

      1) The AF threshold of 0.85 seems pretty arbitrary. Can this threshold be determined empirically based on the sequencing depth and noise of each sample? Mutations with AF>0.85 may still be subclonal, whereas mutations with AF<0.85 may still be of interest.
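
      A depth-aware alternative to the fixed 0.85 cutoff could be sketched as follows. Since HAP1 is (near-)haploid, a truly clonal SNV should sit at an allele fraction near 1, and the width of a confidence interval on the observed AF shrinks with depth. This is a minimal sketch, not the authors' method; the expected clonal AF of 0.95 and the Wilson interval are illustrative assumptions:

```python
import math

def wilson_ci(alt_reads, depth, z=1.96):
    """95% Wilson score interval for the true allele fraction,
    given alt-supporting reads out of total read depth at a site."""
    p = alt_reads / depth
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return centre - half, centre + half

def depth_aware_clonal_call(alt_reads, depth, clonal_af=0.95):
    """Call a variant clonal when the AF confidence interval reaches the
    expected clonal AF; deeper sites get tighter intervals, so the
    effective cutoff adapts to coverage instead of being fixed."""
    lo, hi = wilson_ci(alt_reads, depth)
    return hi >= clonal_af

# At 100x coverage, AF 0.95 is compatible with clonality, but AF 0.85 is not:
print(depth_aware_clonal_call(95, 100))  # True
print(depth_aware_clonal_call(85, 100))  # False
```

      Per-sample noise could then be folded in by widening z or by discounting the expected clonal AF with an estimated sequencing error rate.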

      2) While the rationale for performing the initial experiments in HAP1 cells is clear, it is unclear why no validation experiments were performed in additional cancer cell lines. It is imperative to perform the knockdown experiments not only in HAP1 cells but in a panel of additional cancer cell lines, in order to examine whether these are general mechanisms of resistance.

      3) Multiple CRISPR-Cas9 studies have been performed to identify mechanisms of resistance to anticancer drugs. The authors note in the Discussion that these studies "are useful but cannot readily reveal critical gain-of-function, single nucleotide alleles". This makes sense, yet in almost all cases the authors use a simple loss-of-function shRNA assay to confirm their initial sequencing results. This suggests that the added value of the spontaneous evolution approach is rather limited, either because other mechanisms of resistance are much less common or because loss-of-function mechanisms are much easier to identify.

      4) In the gemcitabine resistance experiment, the authors confirmed that RRM1 KD increased the sensitivity of the cells to the drug. A complementary experiment would be to test whether overexpression of RRM1 increases resistance.

      5) In several cases, multiple SNVs or CNVs were identified in the same resistant clone at a clonal (or near-clonal) AF. Other than following up on "immediate suspects", is there a systematic way to tease apart resistance "drivers" from "passengers"? This should be at least discussed.

      6) The manuscript would benefit from language editing, there are quite a few grammatical errors.

    3. Reviewer #1:

      Major Comments:

      The experimental design is inconsistent in at least three ways:

      1) The genomes of 14 resistant clones were analyzed by whole exome sequencing (WES), whereas the genomes of the remaining 21 clones were analyzed using whole genome sequencing (WGS).

      2) The sequencing approach even differed among the six lines evolved in three separate drugs: doxorubicin, paclitaxel, and gemcitabine.

      3) We feel the authors did not adequately explain how the different sequencing methodologies could affect their results and the inferences drawn from them. For example, one is likely to miss information with respect to copy number variants by only sequencing exomes. The authors highlight this fact in their discussion, but they do not explain by how much they could be off in their assessment.

      In some cases the same parental clone was used to found replicate lines subjected to the same selective pressure, and in other cases the same parental clone was used to found replicate lines subjected to different selective pressures.

      Lines were evolved anywhere from seven to thirty weeks, and the length of the evolution experiments does not correlate with the selecting drug (e.g., three replicate lines were evolved in doxorubicin for 9 weeks and three other lines were evolved in this same drug for 12 weeks). Did the authors normalize by generations? Again, the authors do not address this issue in their manuscript.

    1. Reviewer #3:

      This work by Kilroy et al. is a nice study on the role of inactivity in dmd zebrafish and the beneficial impacts of neuromuscular electrical stimulation on muscle structure and function in these fish. The clinical presentation of muscular dystrophies is often variable, which makes it difficult to predict disease severity and progression. The key points of this work are that (1) the same genetic defect can lead to phenotypic and functional variability, (2) inactivity in dmd deficiency worsens disease progression in zebrafish, and (3) neuromuscular electrical stimulation improves muscle structure and function. While this study summarizes these key points in a detailed manner, many of the mechanistic details underlying these observations are missing.

      1) There have been many published natural history studies as well as longitudinal imaging studies performed in human DMD patients. How do the phenotypic data in zebrafish compare with longitudinal phenotypic studies in human patients?

      2) For the data presented in Figure 1: the authors describe the birefringence phenotype in mild mutants as increased degeneration for three days followed by increased regeneration. Could they provide any experimental evidence of the "muscle regeneration" mentioned in this statement? Similarly, they mention that the severe dmd mutants regenerated throughout the study; however, no experimental data are provided to support this statement. As the myotome contains both normal and degenerating myofibers, could the improvement in birefringence be a consequence of growth of the normal myofibers rather than regeneration of sick myofibers? The term regeneration is also used later in the NMES studies and needs to be supplemented with experimental evidence of regeneration.

      3) DMD is caused by damage to the sarcolemma and subsequent myofiber detachment. The authors did not observe any effect on myofiber structure but still found reduced velocity in mutants that were subjected to intermittent inactivity. Could this be due to a slight increase in sarcolemma damage (not examined here) and/or changes in calcium handling in muscle fibers? Similarly, what are the effects of extended inactivity on MTJ structure? While the authors make good observations with their animal model (as also seen previously in humans and other animal models), the mechanistic details underlying these changes are lacking.

      4) The authors show a few transcripts in Figure 10C that were restored to WT levels in mutants upon eNMES treatment. What is the role of these genes in DMD pathology or muscle function? Why do the authors think a change in these 5-6 genes, out of several hundred, is important?

      5) While the authors demonstrate proposed ECM remodeling in response to eNMES, it would be helpful to present changes in ECM structure in response to eNMES treatment (by EM or IF).

      6) Previous studies in humans and in other animal models have also shown that physical exertion or mild forms of exercise exacerbate the decline in muscle function in DMD deficiency. How do these results compare with the previously published studies?

    2. Reviewer #2:

      In this paper, Kilroy et al. assess whether inactivity in dmd zebrafish is deleterious for muscle structure and function. The authors first categorized dmd fish into mild and severe phenotypic groups, but by 8 dpf this phenotypic variability disappears. Next, the authors devised two inactivity regimes, intermittent and extended, and found that only fish undergoing extended inactivity exhibited an improved muscle phenotype followed by rapid deterioration of muscle structure. Furthermore, these fish were more susceptible to contraction-induced injury. Finally, by varying the frequency, amplitude, and pulse of an electrical current, the authors developed four types of neuromuscular electrical stimulation (NMES) aimed at mimicking varying levels of strength training exercise. They found that endurance NMES improved muscle structure, reduced degeneration, and increased fiber regeneration.

      Major Concerns:

      1) For the dmd phenotypic variability: the authors' conclusions that fish with a mild dmd phenotype undergo extensive degeneration for the first three days followed by slight regeneration, while severe dmd fish undergo muscle regeneration throughout the study, merit some caution. The authors should consider degeneration and regeneration rates. Compared to dmd fish exhibiting a mild muscle phenotype, the rate of degeneration in severe fish might exceed that of regeneration early in development, while later in development the rate of degeneration is probably lower than that of regeneration. To confirm that regeneration is the cause of increased muscle brightness over time in fish with a severe muscle phenotype, assays showing degeneration and regeneration (and the eventual failure of regeneration) should be performed.

      2) Intermittent inactivity: zebrafish are diurnal, so it is not surprising that sedating fish at night, when they are naturally at rest, resulted in no major effects on muscle organization. The authors should consider repeating this experiment with daytime sedation and/or alternating between day and night intermittent inactivity. It is also not obvious whether the authors are referring to fish with a mild and/or severe muscle phenotype. This is particularly important because the authors focus their birefringence analysis on 5-8 dpf, the window in which phenotypic variability was reported and the mild and severe phenotypes have not yet converged. Please clarify.

      3) Birefringence is one of the two main assays used throughout the study. Birefringence relies on polarized light reflecting from anisotropic surfaces; because muscle is anisotropic, the assay allows visualization of muscle structure. However, alignment of the fish is critical for this assay: fish that are not aligned with the direction of the polarized light will exhibit reduced and variable birefringence. This might explain the discrepancy between muscle structure (birefringence assay) and muscle function (swimming behavior) when comparing the different NMES paradigms.

      Perhaps a Western blot assay quantifying either a muscle protein or a housekeeping protein at 5, 6, 7, and 8 dpf in wildtype, dmd, and NMES-treated dmd fish might provide a quantitative picture of the degeneration and regeneration cycles based on the protein mass of the fish. That is, if the muscles are degenerating, these fish will have less total protein than their control and treated counterparts.

      4) Although the authors showed that inactive fish are more susceptible to NMES training, NMES was performed only after the inactivity period. Without experiments applying NMES treatment during extended inactivity, it cannot be determined whether NMES could alleviate muscle wasting in relatively inactive fish.

      5) Although the authors found differential gene expression between dmd and wildtype fish that underwent eNMES treatment, they fail to show differential gene expression between dmd and wildtype fish that did not undergo eNMES treatment. This comparison is critical for determining whether the changes in gene expression between the two strains are the result of eNMES.

      6) The authors argue that eNMES improves cell adhesion based on the percentage of fish exhibiting recovery from muscle detachment. The authors should consider staining for ECM proteins in dmd and eNMES-treated dmd fish to determine whether eNMES treatment indeed improved cell adhesion.

    3. Reviewer #1:

      The manuscript by Kilroy and colleagues centers on demonstrating that inactivity is deleterious for DMD zebrafish and that electrical stimulation is highly beneficial in the model. The authors identify a subpopulation of inactive DMD (sapje) zebrafish that progress faster in dystrophic disease muscle breakdown. They use tricaine to restrict movement and show a faster myofiber breakdown in the severe DMD fish cohorts. The authors then use neuromuscular electrical stimulation (NMES) to improve muscle pathologies and overall DMD zebrafish outcomes. The authors go into extensive details in characterizing the consequences of NMES on normal and DMD zebrafish muscle growth, health, and overall function. Transcriptomic analysis reveals fibrotic and regenerative genes are modulated by NMES.

      Overall, this is a strong manuscript on the effects of NMES/electrical stimulation on DMD muscle growth. It lays out several parameters for the evaluation of NMES in the zebrafish model. The manuscript is fairly well written, and most of the experiments are presented in a straightforward manner with clear interpretations. I do have some issues with one or two points that the authors try to extrapolate from their studies. I have significant issues with the description and use of tricaine as an inactivity paradigm in these studies, as there are multiple interpretations of these findings. I also have a few points about the NMES stimulation protocol and the NMJ contribution that should be addressed. This is a good manuscript and can be an important addition to the field if these points are addressed.

      1) The inactivity paradigm (e.g., Figure 2) using tricaine as a means of inducing inactivity has pluses and minuses. There are issues with comparing it to rodent and human inactivity experiments (which usually involve hindlimb/limb immobilization), as the authors here are using chemical inhibition. Tricaine has systemic effects on multiple tissue types and organ systems, including the neurological and respiratory systems. I would be careful to call this an inactivity model; a more appropriate model would be to physically restrain the zebrafish larvae to prevent movement. While technically challenging, this experiment can be done and would likely be more reflective of the consequences of physical inactivity in the dmd fish than tricaine anesthesia. Mdx mice have respiratory consequences due to pulmonary muscle weakness, independent of inactivity (Burns et al., J Physiol, 2017).

      The authors need to rule out whether the consequences of tricaine administration are due to inactivity or to pulmonary/secondary dystrophic pathology (e.g., swim bladder or respiration).

      2) The NMES protocol is more extensively established by the authors and has a clearer interpretation. That being said, the main benefit of NMES is to stimulate muscle force/function in the absence of proper innervation by the NMJ, which is also disrupted in DMD. The authors do an excellent job in demonstrating that the NMJ does not change in morphology via immunofluorescence and anatomical observations. Can/have the authors evaluated the functional output of the NMJ in the NMES-treated DMD zebrafish? Were any electrophysiological measurements performed on the NMES treated DMD fish, independent of any therapeutic experimental protocol?

      3) Hmox1 overexpression has been pursued as a strategy for DMD in mice by the groups of Zoltan Arany and Joseph Dulak, so the findings in Figure 10 are supported. Have the authors evaluated whether or not the entire Hmox1 pathway was affected in the NMES-treated DMD fish?

    1. Reviewer #3:

      Obstructive sleep apnea (OSA) is a common disease associated with intermittent hypoxia (IH) and is linked to health complications. The lung is the first organ to experience IH, and in this study Wu et al. use a mouse model of OSA to identify transcriptional changes in the lung as a whole organ. The authors then also use single-cell RNA sequencing (scRNAseq) to further identify transcriptional changes in different cellular populations of the lung. The authors found changes in circadian and immune pathways, and that endothelial cells in the lung specifically showed the greatest transcriptional changes. The data will be useful as a reference for the field in understanding transcriptional responses in lung cells exposed to IH.

      scRNASeq is an exciting technique that has the potential to identify how different cell populations respond to a stimulus (in this case intermittent hypoxia). However, it provides an enormous amount of data which requires significant processing and interpretation. This paper contains a huge amount of data generated by scRNASeq, yet the actual data section is very short. Given the complexity of information obtained, I think it warrants a more detailed analysis in the results section and discussion. It would be helpful to me if the authors could distil the very large volumes of information into a more extensive discussion of their findings (particularly discussing the figures in more detail). Is the summary finding of this paper that early changes in hypoxia and circadian gene expression drive later disease in the lungs of OSA patients? The abstract seems to focus on hypoxia, circadian and immune changes but the data text section focuses very little on these pathways. More details on the figures shown and tying the figures to the results text would improve this paper and enable further interpretation by readers.

    2. Reviewer #2:

      General assessment of work:

      In contrast to the authors' framing of the study as a model of OSA, the experimental design mostly focused on intermittent hypoxia, neglecting sleep patterns and arterial oxygen levels. The entire study is based on an exploratory approach without any validation or confirmatory experiments. The selection of markers used to cluster cells is not rigorous; this selection method seems to have caused various abnormal biological-process patterns and atypical types and proportions of certain reported cell populations in the lung.

      Summary:

      1) OSA has a complex pathophysiology, and IH is only one aspect of it. It seems that the authors did not measure arterial oxygen pressure upon the induction of IH, and it is also not certain that IH was induced while the animals were actually sleeping. In Figure 6, they should have tested the gene expression of OSA patients to make sure that their model is physiologically relevant. It would therefore be better to avoid OSA in the manuscript; they can mention IH instead.

      Results:

      2) While it is understood that the authors tried to mimic OSA by inducing IH during the inactive phase, what would happen if IH were induced during the active phase? Would the authors expect changes in circadian-rhythm-related genes in that case? As the authors apparently did not monitor sleep patterns, the distinction between "inactive" and "active" phase should not be taken for granted; they should clearly state what the sleep pattern was during the "inactive" (experimental) phase. Since the animals were exposed to IH during the inactive phase, it is not surprising that circadian-rhythm-related pathways emerged. Would circadian gene effects also be found if the experiments were performed during the active phase?

      3) The induction of hypoxia might have disturbed the sleep pattern, and this could have precipitated endogenous stress via the HPA axis. It is well known that HPA-axis activation is linked with a reduction in immune responses, so the authors should check this.

      Figure 1:

      4) Angiogenesis is a compensatory mechanism for hypoxia. Similarly, the other biological processes mentioned in Figure 1B should have some mechanistic relationship to hypoxia, and this should be explained, because some biological processes, such as organ development, carry little meaning in this context.

      5) Though they found alterations in the proportions of different cell types in the lung based on the analysis, these should have been confirmed with other techniques such as flow cytometry. At least a few cell types that showed gross alterations should have been checked. This is very crucial, as most of the story is woven around cell types. BAL should also have been performed to assess the cellular proportions in the airway.

      6) Though it is not surprising to see the changes in endothelial cells, the change in myofibroblasts is interesting, and this should be explained.

      7) It is not clear whether the downregulated genes in immune cells are due to a reduction in cell number. Did the authors normalize to the number of cells? If cell numbers are reduced, what could be the possible reason? Was there any change in apoptosis-related pathways?

      Figure 2:

      8) Almost 60% of airway epithelial cells are non-ciliated, and among these, Clara cells are the predominant type: more than 95% of non-ciliated cells are Clara cells. In fact, Clara cells reside throughout the tracheobronchial and bronchiolar epithelium. Surprisingly, the authors did not find Clara (Club) cells in Figure 2. Smooth muscle cells are also not shown. What could be the reasons for this? How were the markers selected to segregate each cell type? And how is the presence of abundant erythroblasts, which are generally observed in bone marrow, to be explained?

      9) While it is known that single-cell sequencing has indicated the possible presence of new cell types, it should not miss already well-known cell types. It is really surprising to see the predominant presence of endothelial cells; this differs from the available literature based on single-cell-sequencing molecular cell atlases. In general, Sox17, a marker of endoderm, is also expressed by other endoderm-derived tissues such as epithelia (Park et al., Am J Respir Cell Mol Biol. 2006 Feb;34(2):151-7). Please clarify.

      10) Amine oxidase C3 is a relatively new marker of myofibroblasts (Hsia et al., Proc Natl Acad Sci U S A. 2016 Apr 12;113(15):E2162-71). However, this ectoenzyme is also abundantly expressed in adipocytes, endothelial cells, and other cell types. Please clarify.

      11) It is not clear why the authors have not chosen well-established markers to identify the cells.

      12) Figure 3, top panel: the hypoxia images show lungs that appear congested, with relative thickening of the alveolar wall. This is well evident in the HOPX staining, in which clearly higher expression of HOPX can be seen in hypoxic mice. The same is partially true for Pro-SFTPC. These appear to be representative pictures, so morphometry may be required to assess the overall status of each marker.

      Figure 5:

      13) Though it is known that endothelial cells are able to phagocytose cells such as red blood cells in conditions like aging, it is not clearly established that alveolar capillary endothelial cells (capillary aerocytes), whose main function is gas exchange, have a professional phagocytic function. In this context, biological processes derived from software could yield abnormal patterns. Also, how can the decreased "vasculogenesis" and "regulation of angiogenesis" in general capillary cells be explained, when Figure 1B indicates increased angiogenesis?

      14) In a dynamic environment, biological processes inferred from altered gene expression, without actual demonstrative studies, could distort biological understanding. This is also evident in Figure 4 - figure supplement 2, where both upregulation and downregulation are observed in erythroblasts (inflammatory response) and MPhage-DC (apoptotic-process-related genes). A similar dual pattern is observed in Figure 4.

      15) Figure 6: it is worrisome that there is not a single validation or demonstrative experiment.

    3. Reviewer #1:

      Obstructive sleep apnea is an important medical problem, with elevated cardiovascular risk as a common association. Intermittent hypoxic episodes are a good predictor of such risk, so a connection is indeed plausible. Thus the manuscript starts with a good premise, but what limits my enthusiasm is the large number of loose ends in the story, which make it likely that what we are seeing is a small amount of signal with a large amount of noise, limiting potential mechanistic insights that are translatable.

      Major comments:

      1) OSA and intermittent hypoxia are clearly different things. Further, the hypoxia of OSA is much less severe in the lung than in the systemic organs. To illustrate this point, an upper estimate for alveolar CO2 is the venous CO2, or more commonly a 10-15 mm Hg elevation over normal, i.e., about 55 mm Hg. Even at 60 mm Hg CO2, the local oxygen tension in the lungs would be above 80 mm Hg. Systemic desaturation results from widening A-a gradients and physiological/pathophysiological shunts. While severe OSA with prolonged apneas could indeed be worse, the clinical associations are seen even with milder disease. Thus, a priori, it is very unlikely that the model reflects the disease accurately.
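
      The reviewer's estimate can be checked against the standard alveolar gas equation, PAO2 = FiO2 * (Pb - PH2O) - PaCO2/R. The sketch below is illustrative only; the exact value depends on the assumed respiratory quotient R:

```python
def alveolar_po2(paco2, fio2=0.21, pb=760.0, ph2o=47.0, rq=0.8):
    """Alveolar O2 tension (mm Hg) from the alveolar gas equation:
    PAO2 = FiO2 * (Pb - PH2O) - PaCO2 / RQ."""
    return fio2 * (pb - ph2o) - paco2 / rq

# Even with alveolar CO2 elevated to 60 mm Hg, alveolar O2 remains high,
# consistent with the reviewer's point that the lung itself is not hypoxic:
print(round(alveolar_po2(60.0, rq=1.0), 1))  # 89.7
print(round(alveolar_po2(60.0, rq=0.8), 1))  # 74.7
```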

      2) Given the limitations of the model, it is imperative that at least the pathways elicited by intermittent hypoxia be clearly defined, so that even if we do not gain a full understanding of OSA, we may understand the consequences of intermittent hypoxia that may be relevant in other contexts. Here too the manuscript is lacking. The genomic analysis is interesting and indeed data rich. However, more attention could have been paid to exploring a hypothesis, ensuring multiple markers for target cell populations, and building a mechanistic model. In its current form, the work is hypothesis-generating, based on limited markers and analysis, and is extrapolated widely to other pulmonary diseases without a solid rationale.

    1. Reviewer #2:

In this study the authors claim that short-lasting, low-intensity ultrasound stimulation activates many neurons throughout the brain. They further claim that the activation mechanism is via the ASIC1a channel. There are some intriguing results in this paper, but there are also many open questions and methodological issues that should be addressed. The authors use pERK as a surrogate for neuronal activation by a global ultrasound stimulus. Some but not all neurons in the cortex seem to show activation (it seems only large pyramidal cells; why not interneurons?). More analysis is needed here.

This experiment is followed by an in vitro experiment with cultured cortical neurons from neonates (no ages are given for the animals used in this experiment, as far as I can see). These are also not equivalent to the adult cells tested in the in vivo experiment. In the bulk of the experiments, calcium imaging is used as a surrogate to measure neuronal activation. Unfortunately, in none of the Delta F/Fo graphs displayed is there any indication of the number of cells selected and measured. This is critical to evaluate the robustness of the results. In addition, it is normal at the end of the experiment to permeabilize the neurons to calcium by using an ionophore. This allows the assessment of the maximum fluorescence signal when the extracellular calcium concentration equilibrates with the intracellular concentration. This was not done, which means the experiments have no internal calibration.

It is impossible for me to assess the robustness of the calcium imaging experiments when I do not know what each data point corresponds to; take Figure 2I as an example. Are these individual cells or mean values from many cells from individual cultures? Many critical methodological details are indeed missing from the paper.

      The idea that ASIC1a is THE critical mediator of this effect is quite surprising and the more dramatic and implausible the conclusion may seem, the more solid the evidence needed. The authors should use ASIC1a mutant mice both in vivo and in vitro to prove that ASIC1a really is critical. The same applies to the apparent effect on neurogenesis.

The videos show quite large physical effects of the ultrasound on the cultures (cells moving around). This is problematic, as it may be that the calcium signals are purely indicative of cell damage. Controls should be provided to ensure this was not the case.

    2. Reviewer #1:

In the manuscript entitled "ASIC1a is required for neuronal activation via low-intensity ultrasound stimulation in mouse brain", Lim et al. investigate the mechanism underlying the activation of brain neurons by transcranial low-intensity ultrasound stimulation. The authors propose that ultrasound stimuli-induced movements of the extracellular matrix and the cytoskeleton cause mechanical activation of ASIC1a in cortical neurons, which leads to Ca2+ influx and subsequent expression of pERK, which the authors used as a surrogate marker for neuronal activation.

      While I agree that the finding that ultrasound activates neurons via activation of a mechanosensitive ion channel is per se very interesting, I have to say that in my opinion most of the conclusions and claims are not supported by the actual data.

1) The entire study is purely correlative. The authors performed two independent experiments: on the one hand they show that in vivo transcranial ultrasound stimulation induces pERK in various brain regions, and on the other hand they show that ultrasound-evoked Ca2+ influx in cultures of cortical neurons is probably mediated by ASIC1a. From these data they conclude that pERK activation is also mediated by ASIC1a activation. This is, however, pure speculation. The authors must provide additional evidence to support their claim. In my opinion the sole use of PcTx1 is not sufficient to prove that the Ca2+ signals are mediated by ASIC1a. Hence, first, the authors should demonstrate that ASIC1a is indeed activated by ultrasound. This is a very simple experiment. All they would have to do is express ASIC1a in a cell line (e.g. HEK293, CHO, etc.) and show that this expression renders the cells sensitive to ultrasound. Second, I would appreciate it if the authors would show that cortical neurons, especially those that show pERK activation, express ASIC1a in the first place. This would also be quite simple - just co-stain the brain sections with an anti-ASIC1a antibody. Third, if the authors want to maintain their claim (see title) that ASIC1a is required for ultrasound activation of brain neurons, they should examine ultrasound-induced pERK activation in ASIC1a-knockout mice.

2) It is difficult to evaluate the Ca2+ imaging experiments, because the method - especially the ultrasound stimulation - is not very well described. Hence it is unclear to me how close to the cell the ultrasound stimulator was placed. Moreover, the N-numbers of the Ca2+ imaging experiments are rather small (by the way, it would make reading much easier if the N-numbers were indicated in the figure). Most importantly, it is unclear if the inhibitors (gadolinium, GsMTx4, etc. - Figure 2B-H) were applied to the control cells from the same panel or to different cells. In this context it would be important to know how many control cells actually responded to the ultrasound stimulation. Considering the low N-numbers, I wonder if the authors may have had a hard time finding cells that responded, and if this is the reason why the N-numbers are so small. I suggest examining many more control neurons and providing information about the proportion of cells that respond, both for the controls and for the cells treated with the various channel inhibitors.

    1. Reviewer #3:

The manuscript by Dr. Vlachos' group demonstrates many important features as well as mechanisms of RA-induced synaptic plasticity. For example, the authors demonstrate that RA-induced plasticity occurs in human neurons as well as in rodent neurons in vivo; they discover that synaptopodin is a critical mediator of RA plasticity, as well as an RA effect on the size of the spine head, the synaptopodin cluster and the spine apparatus. Moreover, the effect of RA on LTP in vivo is very interesting. The data look solid and support the authors' conclusions.

However, the manuscript could be significantly improved by discussing the results in the context of the literature and by explaining the possible mechanisms underlying them.

1) The RA effect on AMPAR upregulation has been reported to not share the same SNARE mechanisms as electrical LTP (Synt1/7 independent vs. dependent). How does RA have the extra effect on the LTP amplitude? Moreover, RA plasticity is recognized as a form of homeostatic synaptic plasticity, i.e., the effect takes hours to develop, as shown by the authors' many-hour RA incubations in their experiments on human neurons. How does this compare with their RA manipulations in the LTP experiments (Is RA injected shortly before the LTP stimulus? What do the authors think the LTP stimulus does to RA signaling?)?

What about metaplasticity involving RA? Are there any connections to the present study?

2) The authors conclude that RA has effects on spines with or without a spine apparatus; however, the authors' data suggest that RA plasticity is blocked when the spine apparatus is eliminated (with synaptopodin KO). Moreover, there is significant overlap of spine size for spines with or without a spine apparatus. How do the authors interpret their results here? Is the spine apparatus dynamic? Can it move between spines quickly? Is there any literature on this? The authors need to discuss in more detail the possible ways, with supporting literature, in which the spine apparatus could affect RA function.

      In short, a discussion of the above points will add significance to the study.

    2. Reviewer #2:

This paper explores the effect of all-trans retinoic acid (atRA) on synaptic plasticity in human and murine brain slices. The paper builds on previous work showing that atRA plays a key role in various forms of homeostatic and Hebbian plasticity, but extends our understanding in two very significant ways. First, the work convincingly shows that atRA enhances synaptic function in human layer 2/3 pyramidal neurons in intact cortical slices, and like previous studies using murine models and human iPSCs, this is critically dependent on new protein synthesis. Second, the studies show that atRA-mediated synaptic plasticity requires synaptopodin, a protein that is specifically localized to the spine apparatus.

      Overall, the studies have been well-executed and the data are both rigorous and convincing. The paper is very clearly written and the findings are significant. This is a very strong body of work that will be of broad interest.

      Comments:

1) While the authors rightly point out in the introduction that no previous studies have assessed atRA effects in human cortical circuits, the Zhang et al. (2018) paper did elegantly show synaptic plasticity effects in human neurons (derived from iPSCs). This is noted in the discussion, but should also be pointed out in the introduction as it bears directly on the rationale for the studies described in the paper.

      2) Figure 1C illustrates responses of layer 2/3 pyramidal neurons to intracellular current injection. While the passive membrane properties are quantified and similar regardless of atRA exposure, it is not clear if atRA affects intrinsic excitability of these neurons (i.e., the number of spikes elicited by different levels of injected current). These data should be included.

      3) The legend for Figure 1 C-E is too vague and does not describe the specific measures that are shown in the figure.

4) For the mouse studies shown in Figure 3A and 3B, did wild-type littermates serve as controls (the gold standard)? Data from wild-type neurons are described in the text, but it is not clear if these were collected from a different cohort of animals or from the WT littermates of the Synpo-deficient mice. Also, the authors should state whether the deficient allele is null.

      5) The Synpo-deficient mice have basal sEPSC amplitudes that are noticeably larger than WT mice (as reported in the text). Some discussion of this observation is warranted.

      6) The cumulative frequency plots shown throughout the paper show a curious trend where the smallest events appear to be at least 10 pA or larger. This is somewhat atypical, as most studies find a large number of events between 5 and 10 pA (and many lower still). Does this reflect events only larger than 10 pA being included in the analysis? If so, the points to the left of 10 pA should probably be removed from these plots as including them implies that this data range was adequately sampled.

7) The schematic shown in Figure 4B refers to early-phase and late-phase LTP, but the recordings appear to be limited to 60 min post-LTP induction (i.e., well before the late phase). These terms should be replaced with the actual times post-LTP induction.

      8) The discussion is quite on point, but is rather brief. The paper would benefit from a more detailed discussion of the link between the spine-apparatus and translation-dependent forms of synaptic plasticity.

    3. Reviewer #1:

      The study by Lenz et al. explores the acute action of retinoic acid (RA) in adult human cortical neurons. The main findings are:

      1) Consistent with previous findings in mouse neurons, the authors reported enhanced excitatory synaptic transmission in RA-treated cortical layer 2/3 neurons.

      2) Also consistent with previous findings, this enhancement is independent of gene transcription, but requires protein synthesis.

      3) RA's effect on EPSC requires expression of an actin-modulating protein called synaptopodin. In the Synaptopodin deficient mouse mPFC neurons, RA's effect on EPSC is eliminated. Moreover, in synaptopodin deficient hippocampal dentate gyrus neurons, enhancement of LTP by RA is also reversed.

      Overall, this study demonstrates RA-induced synaptic plasticity in acute human cortical neurons, thus expanding the previous findings from mouse neurons and immature human neurons induced from iPS cells to adult human cortical neurons.

      Specific Comments:

      1) Figure 3 shows that in synaptopodin deficient mouse neurons, RA no longer increases sEPSC amplitudes. The rescue experiments are very nice. However, in both WT neurons (stated in main text, not in figure) and rescue neurons (Fig. 3B), the baseline sEPSC amplitudes are significantly smaller than those of the KO neurons. Can the authors speculate why deletion of synaptopodin may lead to enhanced basal excitatory synaptic transmission?

2) The LTP experiments are a bit problematic. First of all, they were done in mouse hippocampal DG neurons, not cortical neurons. The effect of RA may differ between neuronal types, as has been shown in previous mouse studies. It would be nice to examine whether RA changes basal synaptic transmission in these neurons in acute slices. Without knowing the effect on basal transmission, it is hard to interpret the LTP results. Second, why did WT DG show no LTP? Third, previous work by Arendt et al. (2015) showed that RA enhances hippocampal CA1 neuron basal EPSCs and occludes further LTP. The observation here in the DG with RA treatment points in the opposite direction. Can the authors offer some explanation (e.g., does RA alter the LTP threshold through some kind of priming)? Again, knowing the effect of RA on basal transmission specifically in the DG neurons would be informative toward understanding the effect on LTP.

3) The pharmacological treatments (ActD, anisomycin, etc.) in this study are in general very long (6 hr) compared to conventional methods (less than 2 hr). To control for potential toxicity associated with prolonged treatment, a vehicle control should be added in both Figure 5 and Figure 6.

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on August 28 2020, follows.

      Summary

This manuscript presents a new tool to detect and classify mouse ultrasonic vocalizations (USVs). The tool (VocalMat) applies neural network technology to categorize the various USVs into predetermined categories of pup calls. In its submitted form, the paper seems better suited as a methodology paper. Indeed, the authors state that the goal of their work is to: "create a tool with high accuracy for USV detection that allows for the flexible use of any classification method."

The paper is well written and presents a useful tool to identify and classify USVs of mice. However, the reviewers think that the authors did not provide enough supporting evidence to claim that their method is significantly superior to other tools in the literature that have attempted USV classification. For example, Vogel et al. (2019) - https://doi.org/10.1038/s41598-019-44221-3 - reported very similar (85%) accuracy using more mainstream ML approaches than the CNNs attempted in this study.

Moreover, some of the reviewers were not convinced that the comparison to other tools was conducted in an unbiased and completely fair manner, or that the approach described in this paper really represents a significant advantage over other tools. For example, two reviewers note that the authors used DeepSqueak on their dataset without properly training it for this type of data, while their own tool is specifically trained for it. Also, the reviewers expect to see a confusion matrix to assess model performance and establish whether the model does indeed reproduce the classes accurately (or how skewed it is toward dominating classes).
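The reason a confusion matrix is requested, rather than overall accuracy alone, can be illustrated with a minimal Python sketch. The call-type names and label counts below are purely hypothetical toy values (not VocalMat's categories or data); the point is only that a dominant class can inflate accuracy while rare classes are systematically misclassified:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, classes):
    # counts[i][j] = number of calls whose true class is classes[i]
    # and whose predicted class is classes[j]
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]

# Toy labels with three hypothetical USV call types: the dominant class
# ("flat") is always correct, but half of the rare-class calls are
# absorbed into it.
classes = ["flat", "chevron", "step-up"]
y_true = ["flat"] * 8 + ["chevron", "chevron", "step-up", "step-up"]
y_pred = ["flat"] * 8 + ["flat", "chevron", "flat", "step-up"]

cm = confusion_matrix(y_true, y_pred, classes)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
for cls, row in zip(classes, cm):
    print(f"{cls:>8}: {row}")
print(f"overall accuracy = {accuracy:.2f}")  # 0.83, despite 50% error on rare classes
```

The off-diagonal entries in the rare-class rows expose exactly the skew the reviewers are asking about, which a single accuracy number hides.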

Overall, all the reviewers agree that they would like to see a more rigorous attempt to validate the findings presented (ideally also on an external database) and a proper (unbiased) comparison with other similar software, to justify the claim that VocalMat's performance in the classification of USVs is indeed superior and novel compared to the methods already in use.

      If the authors wish to have the manuscript considered as a research paper and not in the form of a methods paper they should change the focus of the paper and provide more data showing a novel biological application of their pup calls classification findings. If not, we will be happy to consider a suitably revised version of the manuscript for the Tools and Resources section of eLife.

    1. Reviewer #3:

      In this manuscript, Sachella et al examine the contributions of the lateral habenula (LHb) to fear conditioning. They use 3 different paradigms: (1) a contextual fear conditioning paradigm, (2) a cued fear conditioning paradigm, (3) a combination paradigm where both context and cues can predict shocks. They also manipulate the LHb in several ways: (1) using muscimol, (2) using inhibitory optogenetics, (3) using excitatory optogenetics. The results are thought-provoking and would represent a novel contribution to the field, but I am left confused about some of the major points. My suggestions for improvement/clarification of the manuscript are as follows:

      Major Comments:

      1) Some important points need to be brought up in the introduction in order to frame the problem the authors are addressing and motivate the study. First, the introduction needs more background on separate circuits controlling cued vs contextual fear conditioning (hippocampus, amygdala). This only comes up in the discussion. Readers also need more background on connections between known structures for fear conditioning and the LHb. There should also be explicit discussion of the well characterized connections between LHb and dopamine neurons, including how LHb inputs help generate reward prediction errors that may be important for fear conditioning. The idea that prediction errors contribute to the authors' observations could be foreshadowed here.

      2) In general, the muscimol experiments are nicely done. However, muscimol is always administered during training. I am left wondering whether LHb activity is required during the initial learning of the association or for consolidation later. It would be ideal to also include a test of muscimol infusion immediately following the FC training, during a memory consolidation period. This is important because the authors at times seem to argue that the LHb is important specifically for memory consolidation, but later in the discussion claim that activity during the training (related to prediction errors) is an explanation for their results.

      3) I'm struggling with the interpretation of the experiments in Figures 3 + 4 using the cue + context FC paradigm and talking about "reconsolidation." These are really key to the paper so making sure the experiments are clear is a must. From the cue + context test, it seems that having both cues + contexts available for memory provides a much stronger memory. I am uncertain about why the authors think this is so and whether the effect is independent of the LHb? For the "reconsolidation" experiment, I can't figure out what's new. The no-reconsolidation group should look like Figure 2 muscimol group, and it mostly does. The reconsolidation group should look like the Figure 3 muscimol group, and it mostly does. So this looks to me more like a replication of Figures 2+3 (with no vehicle control) than anything else. What did we learn that could not be learned from the experiments in Figures 1-3? The suggestion is that "FC training under inactivation of the LHb creates a cued memory whose retrieval depends on contextual information." (lines 154-155). I don't disagree with this interpretation necessarily but it seems vague, and there is no circuit-level insight as to the mechanism.

      4) The ArchT experiments, as the authors already recognize, are problematic because of potential heating and other artifacts. 25s of continuous 10mW green light is a lot. I am not left with much confidence in interpreting these experiments and therefore I am not sure why they are included in the paper. There are other methods of optogenetic inhibition that would be better suited perhaps, or the results could be replicated with chemogenetics, where the authors could ensure DREADD viruses did not spread into the medial habenula.

      5) The oChIEF experiments are interesting, but again very difficult to interpret. There is no data showing what the stimulation does to LHb firing, which is a concern given the very long light stimulation (through the whole experiment). Therefore, it is unclear whether the authors' hypothesis that the light stimulation interferes with normal function is correct. The design here also does not take advantage of the temporal precision of optogenetics.

    2. Reviewer #2:

In this work by Sachella and colleagues, the role of the lateral habenula (LHb) in fear conditioning is investigated during initial encoding and subsequent retrieval in a later setting. This diencephalic nucleus has received a significant amount of attention in the preceding decade, after its connectivity and its regulation of neuromodulatory systems during learning and motivation were discovered. However, much less is known about its function in fear learning and memory. Building on the findings the authors reported in an earlier avoidance setting, the present study deftly employs a series of pharmacological and optogenetic tools to identify a potential time-limited role of the LHb in fear memory. Overall, the findings fit well with their previous work and build upon those observations by adding more contemporary genetic tools to parse these aspects of the task. In particular, I was very enthusiastic about the further exploration of the LHb in an associatively learned fear approach; the strategies that have been highly successful in our understanding of amygdalo-hippocampal fear systems are compellingly applied here to the LHb, which traditionally has been better understood in stress and motivational settings. However, while the studies themselves were carefully conducted, it was not clear that these observations provide a conceptually transformative approach to the understanding of these neurobehavioral processes. Furthermore, some potential limitations in controls and in the isolation of important circuit function limit the impact of these findings. Specific concerns are numbered below:

1) First, while the optogenetic inhibition of the LHb via ArchT selectively during the cue confirms the pharmacological observations in the preceding experiments, the use of that approach did not significantly extend those observations. Other controls, such as a neutral stimulus (CS-) or equivalently applied optical inhibition during the inter-trial interval, might have provided insights into the selectivity of the manipulation on the stability of the fear memory beyond that observed in the pharmacological approach. Adding to this, it would also have been of value (particularly with the optogenetic approaches, where this would be quite straightforward) to explore some of the encoding vs. retrieval vs. expression distinctions that the LHb may contribute to, by providing stimulation/inhibition selectively during memory retrieval/expression on the 24h/Day 7 test days.

      2) The authors comment on the potential circuit-related contributions of LHb to portions of the amygdalo-hippocampal fear system, which would be of tremendous interest, yet without some isolation of these pathways in their approach, the authors are correct that these predictions would be largely speculative.

3) The use of optogenetics in the final study was quite unorthodox, and I am not sure I found it entirely convincing as an approach to understanding contextual representation via chronic optical stimulation. The utility of optogenetics should ideally derive from its temporal specificity, and as such, non-specific pulses applied throughout the session take away from that core strength. Indeed, it seems to me that, were the authors particularly invested in a chronic stimulatory or inhibitory approach intersecting with vector-based targeting, DREADDs would likely present a superior option for these populations. Building on my last comment, this approach would also gain value from being able to target selected populations (e.g., hippocampal or DRN projections) via intersectional strategies.

    3. Reviewer #1:

The manuscript by Sachella examines the role of the lateral habenula (LHb) in learning to associate a context and a cue with an aversive event. The methods use pharmacological and optogenetic modulation of LHb function. The data show that inactivation of the LHb impairs contextual fear conditioning (CFC) as well as cued fear conditioning (when testing occurs in a novel context). The disruption in contextual but not cued FC is also obtained when testing occurs in the context of conditioning (A) 7 days after training, but the deficit in both is evident when testing occurs 21 days after training. Overall, similar results are obtained with cue-specific optogenetic inhibition using ArchT and with more sustained optogenetic excitation across the entire training session with oChIEF. Finally, exposure to the context and tone 24 hr prior to the test rescued cued but not contextual fear.

The present paper provides an interesting set of studies looking at the role of the LHb in fear conditioning. There are many strengths to the paper. The variation in testing and training conditions is great. It allows the authors to examine memory for the conditioning context when it is the only stimulus the animals learn about, as well as memory for the cue when tested in a novel context in the absence of influence from the conditioning context (i.e., cue test in context B), and in the context of conditioning (i.e., context A). This allows the authors to rule out overshadowing as an interpretation. For example, the LHb-inactivated animals do not present an augmented case of overshadowing in the cued and contextual fear training conditions. If that were the case in the CFC-alone experiment, LHb inactivation would not have disrupted learning, but it did. Further, if the LHb had a specific role in the summation of context and cued fear (this could account for the data in Figure 3, as ceiling levels could mask performance differences in 3B), then it would not modulate contextual and cued FC when examined independently (Figures 1 and 2). The authors allude to this briefly in line 226. Other strengths of the manuscript include excellent anatomical controls.

Despite the strengths, there are a number of weaknesses that need to be addressed. The major one, I believe, lies in the need for additional data to support the conclusions. Although a lot of data are presented in the manuscript, together they are not a convincing set that speaks to one interpretation. Specifically, the idea that LHb inactivation/stimulation leads to a weakening of memory strength is interesting, but it requires additional investigation to show that under conditions where CFC is strengthened, LHb inactivation has a less devastating effect. Further, the authors concede on lines 252-253 that more experiments are needed to determine whether LHb inactivation disrupts the associative or representational components of CFC. I agree, but feel this should have been done in the present paper instead of the reconsolidation studies, which are also incomplete. The authors argue that 'under inactivation of the LHb, a cued FC memory is formed whose retrieval depends on the context in which the cue is presented'. However, the disruption of contextual fear makes this interpretation difficult to accept. If the correct context is needed for cued fear to be expressed, then this suggests either a possible generalization-decrement effect that is ameliorated by being placed in the same context, or a context-gating effect. Both require some knowledge of the context where the cued fear learning occurred. Yet this is difficult to reconcile with the consistent disruption in context fear.

The reconsolidation experiments, although interesting, lack clarity and vehicle controls. A systematic investigation of the effect of exposure to the conditioned context or the conditioned cue (in context B) on fear of the conditioned context, the cue, and both would help dissociate how retrieval-based reconsolidation acts in the current preparation. This may warrant an independent investigation/publication.

Some other arguments that I didn't find convincing: the equivalent reduction in exploration in the OF for the vehicle and muscimol animals is argued to suggest that similar contextual representations are formed between the groups, and that therefore the CFC differences are unlikely to be due to deficits in context encoding. The OF data are insufficient to argue this. Many aspects can modulate activity in the OF, from the traditional anxiety argument (here, a similar reduction in anxiety) to a sense of familiarity. There is no evidence for similar contextual encoding.

      Some additional comments:

The way the 24hr and 7d data are presented is a little odd. While the authors justify this, it seems strange from the reader's perspective to see the 7d test data before the 24hr test data. In addition, the 24hr test data are referred to as long-term memory, which can be perceived as odd relative to the longer 7d test. This section just needs to be revised for clarity of presentation.

      Does the difference in cued fear at the 24hr interval persist if conditioning differences are used as a covariate in the analysis and if a difference score is calculated from the baseline difference?

    1. Reviewer #3:

Verhelst and colleagues present an interesting study of fibre-specific laterality of white matter in left- and right-language-dominant people. A new fixel-based approach was used. Two main results are reported. First, extensive areas of significant lateralization were found in white matter; second, a cluster of fixels in the forceps minor showed significant differences between people with left and right language dominance, but no differences were found in other white matter tracts, including the arcuate fasciculus, which is sometimes considered relevant to language lateralization.

      The authors suggest that the lateralization of language functioning and the arcuate fasciculus are driven by independent biases, and that the relationship between forceps minor asymmetry and language dominance could be of interest.

1) Arguments against traditional fiber tractography and DTI-derived metrics. I agree with the authors that it is a great advantage of the fixel-based approach to investigate fiber-specific effects. But some arguments in the current paper seem misleading and are not very convincing. For example, the authors wrote that "it has been established that streamline counts from fibre tractography do not represent an appropriate metric to quantify white matter connectivity (Jones et al., 2013)." In some rare cases this could be correct, but I don't know of robust evidence that could support this absolute statement. No empirical data were found in either the present paper or the cited paper, and the relevant discussions were mainly about the usage of terms (e.g., 'streamline count') and data interpretation. It may still be fair to assume a monotonic relationship between 'streamline counts' and actual white matter connectivity. The authors may want to further clarify this point.

A similar problem exists for the argument against DTI-derived metrics. It reads as though we should never use DTI-derived metrics in future studies because 'crossing fibers' widely exist in the brain, and as though the DTI model could not provide (as) 'reliable and informative results' (as the fixel-based method). My understanding is that these different approaches/metrics could reflect complementary aspects of white matter fibers. I did not find relevant data or discussions, e.g., about the relationship between DTI-derived metrics and the three metrics in the fixel-based analysis (i.e., FD, FC, and FDC). Actually, if the DTI-derived metrics reflect unique aspects of white matter, the non-significant results in FD, FC and FDC (e.g., in the arcuate fasciculus) cannot simply suggest that there are no differences in any aspect of that white matter tract - let alone given that there are many other metrics that describe regional properties of white matter. Even so, the authors repeatedly suggest independent biases in the text based on the non-significant results in the arcuate fasciculus.

      In addition, it reads strangely that, while traditional approaches are dismissed as not useful in the Introduction, in the Discussion the consistency with previous results based on those same approaches is used to support the current findings. This makes me curious what unique information we can get from the fixel-based approach. Each metric has its own advantages and limitations. I agree that the fixel-based approach offers a great advantage in describing fiber-specific effects; a fairer discussion would help readers understand the results.

      2) Arguments against the traditional laterality index. The authors devote several paragraphs to supporting their proposed log-ratio laterality index. Their main point against the traditional index is that it lacks the additivity property. While I agree that the log-ratio is a viable approach for laterality studies, such an additivity property does not seem necessary for a laterality index. The main reference cited is an old paper by Tornqvist et al. (1985), which focused on relative changes rather than laterality. In that reference, a relative change index H is considered additive if and only if H(z/x) = H(y/x) + H(z/y) in a two-stage change x --> y --> z. Laterality does not seem to fit this setting: only the left (i.e., x) and right (i.e., y) quantities are used for characterizing laterality, and there is no third quantity (i.e., z). The additivity property therefore seems meaningless in the context of laterality calculation. Further clarification is needed.
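      The additivity property cited from Tornqvist et al. can be checked numerically. The sketch below is purely illustrative (the values x = 1, y = 2, z = 3 are arbitrary); it verifies that the log measure satisfies H(z/x) = H(y/x) + H(z/y), while the symmetric percent difference, the analogue of (R-L)/((R+L)/2), does not:

```python
import math

def H_log(ratio):
    # Log change measure: additive across chained changes by construction.
    return math.log(ratio)

def H_sym(ratio):
    # Symmetric percent difference, the analogue of (R-L)/((R+L)/2).
    return 2.0 * (ratio - 1.0) / (ratio + 1.0)

x, y, z = 1.0, 2.0, 3.0  # arbitrary two-stage change x -> y -> z
print(math.isclose(H_log(z / x), H_log(y / x) + H_log(z / y)))  # True: additive
print(math.isclose(H_sym(z / x), H_sym(y / x) + H_sym(z / y)))  # False: not additive
```

      As argued above, though, this chaining never arises when only two quantities (left and right) are compared.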

      In addition, the authors mentioned that the traditional laterality index is 'bounded and therefore lacks the additivity property'. The authors may want to further explain the reasoning behind this statement.

      Finally, although a non-linear relationship between the log-ratio index and the traditional index is shown in Appendix X, within the commonly observed range of laterality effect sizes (i.e., from -0.5 to 0.5, based on the results of this paper) the relationship is almost linear (see Figure 5). In particular, for the most widely used formula (R-L)/((R+L)/2), the results are almost identical to the log-ratio values. Based on this, I suspect that if the authors had used this traditional laterality index, they would have obtained essentially the same results.

      The traditional laterality index, e.g., (R-L)/((R+L)/2), is widely used, which also makes results comparable across studies. This further makes me doubt the necessity of promoting a new laterality index when it does not provide additional information. Returning to my opening point, these comments rest on the assumption that the additivity issue is not a problem for laterality studies; the authors may want to clarify this.
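      The near-linearity claim is easy to verify numerically. A minimal sketch (formulas from the review; the sampling grid is my own choice), scanning right/left ratios for which the traditional index spans exactly [-0.5, 0.5]:

```python
import math

def li_traditional(R, L):
    # Widely used index: (R - L) / ((R + L) / 2).
    return (R - L) / ((R + L) / 2.0)

def li_log(R, L):
    # Log-ratio index: ln(R / L).
    return math.log(R / L)

# Ratios r = R/L from 3/5 to 5/3 give a traditional LI of exactly -0.5 to +0.5.
max_diff = 0.0
r = 0.6
while r <= 5.0 / 3.0:
    max_diff = max(max_diff, abs(li_log(r, 1.0) - li_traditional(r, 1.0)))
    r += 0.001

print(round(max_diff, 4))  # largest disagreement over the range: about 0.011
```

      Within this range the two indices differ by at most roughly 0.011 in effect-size units, supporting the expectation that the traditional index would yield essentially the same results.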

    2. Reviewer #2:

      Verhelst et al. used a multishell tractography (b-value: 700/1200/2800) fixel-based analysis, to map white matter lateralisations relevant for language dominance in a sample of left-handed healthy volunteers (n=23 right hemisphere dominant and n=38 left hemisphere dominant as per fMRI word generation task). The authors show "lateralisation" in the anterior corpus callosum as the main white matter difference between their two groups.

      While this manuscript is methodologically sound, the lack of novel anatomical, cognitive or clinically-relevant conclusions limits its scope (i.e. the arcuate finding is not novel and the callosal finding is not explained in the context of language dominance). The authors raise several interesting points about the common practice in the field (e.g. calculation of lateralisation index, clinical lesion flipping) and challenge them in this manuscript. But without further in-depth discussion, the current results will not be impactful in the field of clinical-anatomical studies.

      Overall, this study is data-driven and methodological rather than hypothesis-driven, which leads to a lack of rationale in the manuscript and of comprehensive embedding in the white matter literature. For example, it has previously been shown that there is no direct linear relationship between the lateralisation of the arcuate fasciculus and handedness or language dominance (e.g. PMID: 32707542, PMID: 32723129, PMID: 29666567, PMID: 27029050, PMID: 29688293, amongst others). The dataset available in this manuscript is of interest, however, and further analysis should be conducted to study the extended white matter network of language in more depth, given the ubiquitous findings of alterations mentioned in the results.

      How did the authors determine the fixel clusters as designated white matter tracts (such as the arcuate, uncinate, etc)?

      The authors praise their fixel-based analysis over previous tensor-based models. Some previous studies have also employed advanced tracking algorithms with varying possibilities to map fibre-specific indices or resolve crossing fibres, and their uses have been compared (e.g. PMID: 31106944, PMID: 25682261, PMID: 30113753, amongst others). With the advancement of current algorithms many improvements have been achieved, but this does not categorically negate previous findings, especially when those were shown to be meaningful for cognitive or clinical applications.

      The authors further discuss the "lateralisation" of the forceps minor. I take issue with this terminology, as this is a commissural connection that cannot per se be lateralised. A difference between the hemispheres can, however, possibly be seen in terms of the asymmetry of the callosal projections. This result needs much more explanation and warrants an extensive discussion, especially in the light of language processes.

      Overall, the anatomical descriptions should be clearer. For example, when the authors mention the "anterior part of the arcuate fasciculus" do they mean the anterior segment or any frontal lobe projections of this pathway?

    3. Reviewer #1:

      The paper tackles an important aspect of neuroanatomical and language research: the structural lateralization differences related to functional lateralization of language. No clear-cut results are currently available, and methodological limitations of previous approaches are addressed here with a new type of analysis. Although this new angle on tractography analysis is of interest, differences in the tasks used to assess language lateralization are just as important; they may explain discrepancies among previous studies and with the current one. This aspect seems to be missed in this work.

      Although the letter fluency task implies the use of language, it is commonly considered in neuropsychological assessments to be an executive function task. A more appropriate task would have been a semantic fluency task or, as in previous work (Vernooij et al 2007), a verb generation task. There is a close relationship between executive function and many aspects of language production, no doubt about this, but this does not mean they are the same. Indeed, the forceps minor has been found to be associated with individual differences in executive functions in language (Mamiya et al 2018; Farah et al 2020). This is a limitation of the study and should be acknowledged, since the results may differ with a more purely linguistic task, limiting the scope of the study and its conclusions in terms of language lateralization. I do believe the data are worth publishing and the methodological approach is novel, but the reader should be made clearly aware of the limits on the conclusions that can be drawn: given the task selected, the results may reflect lateralization of executive function for language more than language lateralization per se.

    1. Reviewer #1:

      The work by Pipitone et al. is a very carefully performed and technically sophisticated elucidation of the establishment of the thylakoid membrane system in Arabidopsis chloroplasts upon first illumination of cotyledons. Its charm is the three-dimensional resolution during a time course, which allows one to follow the rapid changes occurring during the short time window in which greening occurs. In addition, the authors included proteomics and lipidomics approaches, complementing the morphological observations with sound molecular data. Altogether, the study provides a very detailed catalogue of the processes that trigger chloroplast biogenesis, which is highly useful for the community as it provides important quantitative measures of size and development.

      Improvements:

      Actually the work has been performed very carefully and there is not much to improve.

      The introduction could contain more references (e.g. lines 77, 83, 90, 93, 98, 131, 132).

      SBF-SEM should be spelled out at first mention (line 146), and a bit more background about the technology would help the reader understand it.

      Line 244: The occurrence of starch granules is of course caused by the continuous illumination. However, it may also have an impact on the final size of the plastid. It would be interesting to know whether chloroplasts at the end of a night phase are smaller than at the end of a light phase. This is not mandatory for the current manuscript, but it is an interesting question to follow up in the future and perhaps to discuss.

      Line 251: The surface area... please define what is meant, since membranes have two sides.

      Lines 256-261: There is another study, done in cell culture, with a similar design (Dubreuil et al ); are the two studies compatible in their conclusions, and if not, what are the differences?

      Lines 549-551: This sentence is not perfectly clear to me. Maybe the authors can explain it in a bit more detail using examples.

      Lines 564-573: I think it is worth noting that the interactions between PSII complexes located in neighbouring thylakoid membranes trigger the stacking of the grana. It is therefore tempting to speculate that stroma lamellae are established first and that these membranes are then stacked after PSII complexes are inserted into the membrane because they provide the adhesion points between them.

    1. Reviewer #2:

      This is a longitudinal aging study of the physiological changes in a specific Drosophila neural circuit that participates in flight and escape responses. To date there have been few examples of longitudinal aging studies looking at the vulnerability or resilience of neurophysiology at the resolution presented in this study. The analyses have revealed different trajectories for individual neural components of the studied behaviors during aging. The study also reveals different sensitivities of neural components to stressors that are known to alter lifespan (temperature, oxidative stress). The study is well-written and the experiments are performed at a high level. A concern is that the study is highly descriptive and provides very little mechanism to explain the differences in the vulnerability or resilience of neural functions observed. In addition, the authors present little evidence other than lifespan to support their interpretation of the effects of the experimental conditions at the cellular level.

      Major Critiques:

      1) Overall, the study is highly descriptive and there is a lack of experiments aimed at understanding the cellular effects of aging on neural function.

      2) There is a lack of supporting data or discussion about the expected cellular mechanisms of the high-temperature manipulation or the SOD mutants. While it is true that both of these manipulations shorten lifespan, their relationship to the natural process of aging remains controversial. Showing that the resilience of the neural components studied can be extended by a manipulation that extends lifespan (e.g. diet, insulin signaling mutants) would be very supportive.

      3) The data from the current study demonstrate that the major effect of the SOD mutation on neural function and mortality is present in newly eclosed animals, suggesting significant developmental issues in SOD mutants. This complicates comparing this condition to the others, or even considering it a manipulation of aging. The authors should also consider showing that the effects of the SOD mutation on neural function are mimicked by other conditions that alter ROS more acutely, such as paraquat exposure, or test mutations in insulin signaling (i.e. chico) which have been shown to increase antioxidant expression.

      4) The authors contend that the changes in neural function, particularly with regard to seizure susceptibility, provide indices of age progression. It is unclear to this reviewer how the neural functions described in this study, including the appearance of seizures, contribute to the lifespan of the flies. One could imagine that changes in flight distance or escape response could affect lifespan in the wild, but do changes in flight, jump response, or seizure susceptibility have any bearing on the lifespan of flies in vials? Why would seizure susceptibility be predictive of mortality? In addition, the assays presented here use experimental conditions (intense whole-head stimulation) that are seemingly non-physiological, so it is unclear what the declines represent in a normally aging fly. The authors need to discuss this.

      5) There are no experiments aimed at understanding the cellular or molecular nature of the functional declines presented.

    2. Reviewer #1:

      The study by Iyengar et al describes age- and temperature-dependent changes in the neurophysiology of the giant fiber (GF) system in adult wild-type and superoxide dismutase 1 mutant (SOD[1]) flies. While the main GF circuit and downstream circuits exhibit little change when flies are reared at 25C, GF inputs and other circuits driving motoneuron activity show age-dependent alterations consistent with earlier studies. Rearing flies at 29C had no additional effects, except that the age-dependent progression of defects was accelerated, as expected from previous studies. In SOD[1] mutants, which are short-lived, changes in the neurophysiology of the GF system differed from those induced by high temperature.

      Overall, this technically challenging and well-executed study provides a nice description of the effects of aging, high activity (induced by higher temperature), and loss of SOD function on the neurophysiology of the GF system. However, most of the described effects have been observed in other systems and are thus not entirely novel. Moreover, the study does not provide any insight into the mechanisms underlying the age-dependent alterations of the examined neurons. Thus, the overall significance of the described findings is limited.

    1. Reviewer #1:

      This manuscript compares the effects of a novel versus a classical augmented acoustic environment protocol on the partial improvement of congenital hearing loss. The new protocol is based on the idea that temporal structure, and in particular auditory gaps, in the augmented environment should improve the perception of temporal features in sounds, in particular of auditory gaps.

      Technically sound, the study describes how the encoding of gaps in the auditory midbrain (inferior colliculus, IC) of a mouse hearing-loss model is affected by the novel temporally enriched paradigm, with respect to control mice and to the classical paradigm. The study clearly confirms that augmented acoustic environments improve spectral tuning and the detection of sound features in IC with respect to control animals. IC neurons also appear to show a more robust increase in sensitivity to amplitude changes (onsets and offsets) when the animals have gone through the temporally augmented sound environment, both in the presence and in the absence of background noise, as compared to the classical paradigm, at least if one considers the magnitude of the effects with respect to control. However, only a few measures show a significant difference when directly tested between the classical and the temporally enriched paradigm. Thus, there is an overall impact of the temporal paradigm, which is worth emphasizing as a small but likely useful increment of the auditory enrichment approach for improving hearing loss. This is a definitely interesting, if somewhat expected, result which could drive further studies on clinical practice; it seems however too specialized for a broader readership. A few things in the presentation of the results could be improved, and behavioral data could reinforce the message, although this is not mandatory to make the results interesting:

      1) A figure of the auditory enrichment setup would be nice, to better understand how it works. Are mice constantly exposed to the sounds? Are control mice in a quieter environment than normally housed mice?

      2) The lack of behavioral data leaves open the question of whether the IC changes actually have an impact on perception. Although it is likely, it would be interesting to measure the magnitude of this impact.

      3) What makes the study interesting is the consistent trend in favor of the temporal paradigm with respect to the classical one. This is, however, rarely significant in one-to-one comparisons for each sensitivity measure. To reinforce their point, the authors could consider a multivariate statistical analysis (e.g. a two-way ANOVA) to show that, across all their measures, there is a significant improvement of temporal over classical.

    1. Reviewer #2:

      The authors show that neonatal LPS (nLPS) treatment is associated with downregulated PFC levels of ATPase phospholipid transporting 8A2 (ATP8A2), which is associated with elevated IFN in serum and PFC and is blocked by an IFN-blocking antibody. Antibody treatment marginally antagonized the effects of nLPS on depressive-like behavior, but was ineffective when females alone were examined.

      This paper adds to a long list of publications reporting alterations in diverse signaling molecules after nLPS treatment. A strength is that it is generally well done, with appropriate attention to experimental design (e.g. litter effects) and statistical treatment. However, while the downregulation of ATP8A2 is indisputable, a major weakness is that no functional relationship is revealed between it and any subsequent behavioral, anatomical or physiological alteration. While the possible role of IFN in causing the increased depressive-like behavior is of some interest, the data here are not convincing. Furthermore, while other work has reported extensively on sex-specific alterations in behavior after nLPS, the behavioral analysis here (FST, TST) is rather limited.

      1) There is little justification for reverting to the non-alpha-corrected LSD test when the Tukey test does not show significance.

      2) The extensive literature on the effects of nLPS is only superficially reviewed.

      3) The direct involvement of ATP8A2 in any behavioral or functional outcomes should be tested.

      4) How does IFN cause down regulation of the ATP8A2?

      5) Other behavioral alterations should be tested such as open field that are less stressful than FST or TST.

    2. Reviewer #1:

      This report makes a logical connection between depressive-like behaviors induced in mice following LPS-injection to mimic bacterial infection and the down regulation of phospholipid transporting enzyme, ATP8A2, in the prefrontal cortex. The intermediary is IFN-gamma. The work is quite convincing that LPS down regulates ATP8A2 by upregulating IFN-gamma and that this has some limited effects on behavior. However, the impact of the findings is limited by several factors.

      1) The use of the FST and TST as measures of depression is increasingly falling out of favor, as they have no face validity for humans. It is understood that these tests have long been in use and were in the past considered the best measures of "depressive-like" behaviors in mice, but the field has moved on to much more relevant constructs such as social defeat, anhedonia, etc. As it stands, the behavioral analysis here is limited and the effects are modest at best.

      2) The use of LPS as a model to induce depression also has limitations. The injection paradigm used is likely to have caused massive inflammation, as evidenced by the increase in cytokines, but what this is modeling is unclear, and how its impact would be specific to depression later in life is equally unclear. Indeed, the references the authors cite for the LPS regime they use offer completely different mechanisms and impacts of the inflammation. This is not to say the current findings aren't important, they are; rather, this pathway may be one among many that is invoked following massive inflammation during early development, which then has many non-specific effects.

      3) There is no functional connection between down regulation of ATP8A2 developmentally and adult neural activity. Clearly a membrane phospholipid transporting enzyme is important, but exactly how it is important here, meaning what enduring impacts there are on neuronal function, is unknown.

      4) The experiments were designed to test the relationship between IFN-gamma and ATP8A2, but the authors then conclude that the behavioral effects are mediated by this connection. There could be many other effects of IFN-gamma that are not considered here but would nonetheless be blocked by the neutralizing antibody approach used. Thus the main conclusions of the manuscript, in terms of the role of ATP8A2 in LPS-induced depression, are not supported.

    3. Summary: Both reviewers felt that the work was well done and quite convincing that LPS down regulates ATP8A2 by upregulating IFN-gamma. This is a novel and interesting finding. But both reviewers also agreed that there is insufficient evidence causally connecting the changes in ATP8A2 to behavior, and that the behavioral tests used are not sufficient to draw rigorous conclusions regarding depression-like behavior. Combined, these weaknesses lessen the impact of the findings for the field.

    1. Reviewer #3:

      This manuscript presents data in support of a model whereby neurons harboring a YAC bearing 128 CAG repeats of the Huntingtin protein show disrupted Ca2+ handling via the endoplasmic reticulum in axons and nerve terminals. Unfortunately, my enthusiasm for the manuscript is relatively low for the following reasons:

      1) It is unclear at this point whether YAC-based models are really appropriate, since they lack the appropriate genomic control of transcription. This may be why, for example, one of the stronger phenotypes, the increase in mEPSC frequency, is greatest at DIV14, diminishes somewhat by DIV18, and is absent by DIV21. This is of course not the same trajectory as the disease impairment itself. The authors speculate that the reversal of the phenomenology in older cultures may stem from degeneration, but there are no data to back up this claim. There seems little reason at this point not to use HD knock-in mice.

      2) The analysis of synapse "density" (Supplement) was only carried out at DIV18, a time point where the impact of the YAC is already diminished. Unfortunately, the high degree of variability associated with measuring all possible puncta on a dendrite is not likely to easily uncover what amounts to a ~30% change in mEPSC frequency. I am therefore not convinced that the data in Figure 1 cannot be explained in part by synapse density.

      3) The underlying physiological perturbations driven by the YAC are deciphered almost entirely using pharmacological approaches, many of which are in themselves ambiguous in interpretation. Ryanodine is a complex drug, as it potentiates the receptors at low doses and blocks them at higher doses. Confounding all of this is the fact that incubation times in the literature span tens of minutes to hours (and are not specified in this manuscript). I was disappointed that the authors did not at least repeat the pharmacology experiments with neurons of different ages (DIV14, 18, 21). If disrupted ER Ca2+ or RyR function lies at the basis of the change in spontaneous exocytosis, the pharmacology experiments should at the very least track this phenomenology. Similarly, high/inhibitory doses of ryanodine should presumably lead to opposite effects, and this at the very minimum should have been done in the control and YAC neurons.

      4) The reported changes in resting Ca2+ are highly suspect. The use of ionomycin should drive the sensor to saturation; then, from the saturated value and knowledge of the dynamic range of the probe, the affinity constant, and the Hill coefficient, one can extrapolate back to the resting concentration. This has been done with GCaMPs in the past and predicts resting values in the 100-150 nM range (in broad agreement with many previous Ca2+ measurements in live cells). In the experiments here, the ionomycin response never convincingly reaches saturation; it merely rises and recovers, making the data uninterpretable.
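      For reference, the back-extrapolation described here can be sketched as follows. This is a minimal illustration, not the authors' analysis; the GCaMP6m parameters (Kd ≈ 167 nM, Hill coefficient ≈ 2.9, dynamic range ≈ 38) are approximate published values, and the fluorescence readings are hypothetical:

```python
def resting_ca(F_rest, F_sat, Kd=167.0, n=2.9, Rf=38.0):
    """Estimate resting [Ca2+] (nM) by inverting the Hill equation,
    given resting fluorescence and an ionomycin-saturated value.
    Kd, n and Rf (dynamic range Fmax/Fmin) are assumed GCaMP6m values."""
    F_min = F_sat / Rf                         # fluorescence at zero Ca2+
    occ = (F_rest - F_min) / (F_sat - F_rest)  # theta / (1 - theta)
    if occ <= 0.0:
        return 0.0
    return Kd * occ ** (1.0 / n)

# Hypothetical readings: resting fluorescence 11 a.u., saturated plateau 38 a.u.
print(round(resting_ca(11.0, 38.0), 1))  # ~119 nM, inside the expected 100-150 nM range
```

      The key point is that without a convincing saturated plateau (F_sat), this inversion is impossible, which is why rising-and-recovering ionomycin responses are uninterpretable.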

      5) The central problem with the approach here is that there is a lot of inference about what happens to ER Ca2+ in the YAC cells, but no direct measurements were made. A number of genetically encoded probes have been used in the last 5 years to examine ER Ca2+ in neurons (CEPIA1er, ER-GCaMP-150, D1ER), and experiments using one of these probes should be done to inform the science here.

      6) The experiments claiming suppression of AP-evoked release are very difficult to interpret as there is no control over the stimulus itself. The authors simply rely on removing TTX to let APs fire randomly, something that will be driven significantly by network density, synaptic connectivity, and the balance of excitatory versus inhibitory drive in the cultures. The authors should simply study evoked release by stimulating the neurons expressing physin-GCaMP6m directly and examining the response sizes in YAC versus control neurons.

      7) iGluSnFR is a potentially powerful tool to assess glutamate release, but to be interpretable it too needs to be treated in a quantitative fashion. The size of the signal will be proportional to the fraction of iGluSnFR present on the cell surface and to the amount of glutamate released. If for some reason expression of the CAG repeat led to a smaller fraction of the expressed sensor reaching the neuronal surface, this would artificially alter the apparent ΔF/F. In order to use this probe in an interpretable fashion, the authors need to carry out experiments that correct for the surface fraction of the probe across experiments.

      As it stands, this manuscript reports phenomenology that is largely hard to interpret, owing to the narrow toolkit applied to the problem (mostly pharmacology and inference).

      Other important details:

      • There is no mention in the methods (or anywhere else) regarding the temperature of the experiments.
      • A more meaningful graphical representation would be showing median +/- IQR rather than mean +/- SD.
      • It would be helpful to show the effects of inhibition of RyR on WT (confirm ability to decrease mEPSC by inhibiting RyR) and YAC128 (additional proof that RyR contributes to YAC128 pathology).
      • The data on single-bouton physin-GCaMP6m need to be extracted for all boutons and then reported as the fraction of boutons showing the fluctuations. As it stands, it is unclear whether there is a selection bias.
      • What was the percentage decrease in the iGluSnFR signal at the last time point?
    2. Reviewer #2:

      In this study, Mackay and colleagues show that resting calcium levels are increased in axons of neurons derived from YAC128 mice, a Huntington Disease model expressing full-length mutant Huntingtin with 128 CAG repeats in a yeast artificial chromosome. This increase in baseline calcium is due to a continuous leak of calcium from the ER, which leads to increased spontaneous neurotransmission and reduced evoked neurotransmission. Overall, the manuscript thoroughly documents a clear example of inverse regulation of spontaneous and evoked glutamate release in a well-established monogenic neurological disease model. Moreover, the authors link this observation to dysregulated calcium release/leak from the presynaptic endoplasmic reticulum. I have some relatively minor comments that may help improve this work.

      1) While the authors nicely document and interrogate the relationship between resting axonal calcium signals and spontaneous release, the impact of dysfunctional ER calcium signaling on evoked release is not causally linked. For instance, it would be nice to show that buffering excess baseline calcium (EGTA-AM?) can equilibrate the difference in evoked release phenotype between wild type and YAC128 neurons.

      2) Figure 7: The authors state that evoked glutamate release is reduced in YAC128 neurons, can they show this? i.e. a bar graph with the absolute values of iGluSnFR amplitudes.

      3) Minor: Figure panels are labeled with small letters in the figures but with capital letters in the main text.

    3. Reviewer #1:

      Mackay et al. present a study on the phenotype of neurons from YAC128 mice, an HD model expressing mHTT with 128 CAG repeats. They show (i) that cultured cortical YAC128 neurons exhibit increased mEPSC rates transiently during development in vitro (i.e. between DIV14-18 but not at DIV7 or DIV21), (ii) that calcium release from the ER by low-dose ryanodine increases mEPSC rates only in WT but not in YAC128 cells, and (iii) that blocking SERCA to deplete ER calcium stores reduces mEPSC rates in YAC128 neurons as compared to WT controls. These data are interpreted to indicate that a presynaptic ER calcium leak increases mEPSC rates in YAC128 neurons. Using rSyph-GCaMP imaging, the authors then show (i) an increase in longer-lasting AP-independent calcium signals in synaptic boutons of YAC128 neurons as compared to WT, (ii) less profound increases in calcium signals upon ionomycin-mediated equilibration to 2 mM extracellular calcium, (iii) less profound increases in calcium signals upon caffeine treatment in YAC128 boutons, and (iv) fewer AP-related calcium events in YAC128 boutons. A final dataset shows that evoked synaptic transmission in the striatum, as assessed by iGluSnFR imaging, is inhibited by ryanodine in WT but not in YAC128 mice. The authors conclude that the overexpression of mHTT with 128 CAG repeats in the YAC128 mutant causes aberrant calcium handling (i.e. calcium leak/release from the ER), which leads to increased cytosolic calcium concentrations, increased AP-independent release events, but reduced AP-evoked glutamate release.

      Comments:

      1) I think the authors show convincingly that (presynaptic) calcium handling is perturbed in YAC128 cortical presynaptic boutons. What is conceptually unclear to me at the outset is whether this specific phenomenon is related to HD pathology. The phenomenon is transient during the development of cortical neurons in culture and gone by DIV21. In contrast, the first subtle behavioural defects of YAC128 mice arise at about 3 months of age, overt behavioural defects at 6 months of age, and striatal and cortical degeneration still later.

      2) The issue discussed above (1) could have been addressed in part with the slice experiments, which were conducted with tissue from 2-3 months old mice, but the corresponding data are too cursory at this point. They indicate a small defect in evoked glutamate release in the YAC128 model, but it is unclear whether mEPSC rates are altered. It seems important to test this as the increased mEPSC rates are proposed to be at the basis of the phenotype described in the present study. Indeed, the authors ultimately conclude that the YAC128 mutation causes increased mEPSC rates at the expense of evoked glutamate release. This is generally unlikely to be true as the mEPSC rates in question are very likely overcompensated by the vesicle priming rate.

      3) The phenomenon of altered calcium handling in YAC128 neurons is shown convincingly. However, this finding is not unexpected given that previous studies indicated such increased calcium release from endoplasmic reticulum in HD models in other subcellular compartments, and it remains unclear how this defect is caused by the mutant HTT.

      4) As already outlined above (2) it remains unexplained how the calcium handling defects increase mEPSC rates but decrease evoked transmission. The corresponding part of the discussion reflects this uncertainty. This is aggravated by the fact that several of the drugs used have complex dose-dependent effects that cannot easily be reduced to specific effects on calcium handling by the ER. For instance, it is unclear whether caffeine effects on adenosine receptors or PDEs have to be taken into consideration. In general, the sole reliance on partly 'multispecific' pharmacological tools is a bit worrisome.

5) There are several other aspects of the paper that are not immediately plausible. For instance, I find it difficult to understand why a calcium transient minutes before ionomycin treatment would affect the calcium signal triggered by ionomycin in the presence of 2 mM extracellular calcium (Figure 4); after all, the example trace shows that the calcium levels return to baseline within seconds. And more generally, in this context: can differences in calcium buffers and the like be excluded? A direct assessment of absolute cytosolic calcium concentrations would be the ultimate solution.

      Overall, the present paper describes a phenomenon in presynaptic boutons of an HD model, key aspects of which (e.g. increased ER calcium handling defects) have been described in other subcellular compartments of HD models. The connection of this phenomenon to HD is unclear as the developmental timelines of the appearance and disappearance of the cellular phenotype and the disease progression do not match. The opposite phenotypes caused at the level of presynaptic boutons on AP-independent and AP-dependent release remain disconnected. The mechanism by which mutant HTT causes these defects remains unexplored. The pharmacological tools used do not always allow unequivocal conclusions regarding the targets affected. I think some more work is needed to generate a clear picture of what exactly happens presynaptically in YAC128 neurons, and to show how this might relate to HD.

    4. Summary: As you can see from the detailed reviews appended below, we acknowledge that a link between aberrant presynaptic ER-calcium handling and HD pathophysiology, as indicated by your data, is clearly interesting. On the other hand we identified a number of critical issues that must be addressed in our view. These include the important conceptual issue of the mismatch between the time courses of disease progression in the YAC128 model on the one hand and of the phenotype development reported in your paper on the other. A more detailed analysis of slices taken from older mice would have helped to resolve this problem. In addition, there are several issues that concern the experimental data and methodology. Among the latter are the following:

      (i) The study almost exclusively relies on pharmacological tools, many of which are multispecific and/or have complex effects that would require additional stringent controls.

      (ii) The key experiment assessing resting calcium levels using GCaMP6-M and ionomycin treatment is problematic as the signal does not saturate in the presence of ionomycin, which prevents a reliable interpretation of the data.

      (iii) Direct measurements of ER calcium are required to support the notion of aberrant presynaptic ER-calcium handling in the HD model.

      (iv) The effect of the YAC128 mutation on AP-evoked transmitter release is difficult to interpret as the corresponding experiments do not involve a direct control over APs. Experiments with direct stimulation of GCaMP6 expressing cells are required, and additional experiments to 'rescue' the mutant effect by buffering calcium would be extremely informative to bolster the general conclusions.

(v) In order to use the iGluSnFR in an interpretable fashion, experiments need to be carried out with a correction for the surface fraction of iGluSnFR across experiments.

    1. Reviewer #3:

      The authors have conducted a very challenging study. The paper is clearly written and the topic of neural function under anesthesia is interesting. However, a significant limitation is that many of the analyses presented here do not provide clear insights into the processes the authors are studying.

      -A key issue is that the authors aim to predict who is more or less sensitive to general anesthesia. However, each individual subject was given a different target plasma concentration of propofol, based on clinical scoring. So any difference in behavior may reflect different dosing rather than different behavioral sensitivity to a particular drug concentration.

      -The interpretation of increased functional connectivity is challenging in the context of anesthesia, which modulates vessel dilation and systemic physiology. These analyses would benefit from additional information about the fMRI signal characteristics, e.g. amplitude and physiological signals.

      -Fig. 3 is used to portray comparisons of wakefulness vs. sedation, implied in the text, but does not include direct statistical tests of the difference between the two conditions, and contrasting p<0.05 with p>0.05 does not indicate a significant difference. The suggestion of reduced cortical responses to auditory stimuli makes sense given that the participants are sedated, but the analysis does not seem to provide information about which aspect of auditory processing is modulated by sedation.
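The statistical point here (that contrasting p<0.05 in one condition with p>0.05 in the other is not itself a test of the difference) can be illustrated with a toy calculation. The effect sizes and standard error below are hypothetical, not the study's data:

```python
import math

def p_two_sided(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Hypothetical effect estimates (arbitrary units) with equal standard errors.
effect_wake, effect_sedation, se = 0.50, 0.30, 0.22

p_wake = p_two_sided(effect_wake / se)          # significant on its own
p_sedation = p_two_sided(effect_sedation / se)  # not significant on its own

# The correct comparison: test the *difference* between conditions directly.
z_diff = (effect_wake - effect_sedation) / math.sqrt(2 * se ** 2)
p_diff = p_two_sided(z_diff)

print(p_wake, p_sedation, p_diff)
```

Here the wakefulness effect clears the 0.05 threshold and the sedation effect does not, yet the direct test of their difference is far from significant, which is exactly why a within-figure contrast between conditions needs its own statistic.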

      -The statements about response time not being mediated by age may reflect an underpowered study, as age is a strong modulator of anesthetic sensitivity and one group has an n=6.

-While many interesting MRI studies can be done with quite small n, depending on the question being asked (e.g. Midnight Scan Club, high-resolution individual studies), this study aims to conduct structure-based predictions of individual differences in behavior. This type of analysis requires more than the n=6 slow responders used for Fig. 5, as there are many other features that likely vary in a group this small. I appreciate that the authors have conducted a very challenging study, and it is not easy to collect more data, but while many interesting analyses can be done on this type of data, this is not an appropriate sample size for assessing GMV-individual differences associations. Larger sample sizes or within-subjects analyses are needed for robust GMV effects.
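The instability of brain-behaviour associations at n=6 can be shown with a small null simulation (pure Python, hypothetical numbers, not the study's data): even when the true correlation is zero, samples of six routinely produce correlations that look "strong".

```python
import math
import random

random.seed(0)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Draw many samples of n=6 from two *independent* Gaussians (true r = 0)
# and count how often the sample correlation looks "strong" anyway.
n, trials = 6, 5000
strong = sum(
    1
    for _ in range(trials)
    if abs(pearson([random.gauss(0, 1) for _ in range(n)],
                   [random.gauss(0, 1) for _ in range(n)])) > 0.7
)
rate = strong / trials
print(f"|r| > 0.7 in {100 * rate:.1f}% of null samples at n=6")
```

Under the null, roughly one in eight samples of six yields |r| > 0.7, which is why apparent GMV-behaviour associations at this sample size carry little evidential weight.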

-The cluster correction method in 'Analyses of fMRI data' should be specified (and checked against Eklund et al.). The precise statistical method used to assess FDR-corrected activity correlations with individual subject response times is not clear; it seems that the ANOVA resulted in non-significant results that are nevertheless being reported as differences using Hedges' d?

-The presented evidence does not sufficiently support the authors' conclusion that they "provided very strong evidence that individual differences in responsiveness under moderate anaesthesia related to inherent differences in brain function and structure within the executive control network, which can be predicted prior to sedation." I would commend the authors on their interesting and challenging experiment, and recommend refocusing the analyses.

    2. Reviewer #2:

      In this study, Deng and colleagues have sought to assess the neural correlates of individual differences in responsiveness variability across wakefulness and moderate levels of propofol-induced anaesthesia. In addition to resting state scanning and an auditory story task, the participants underwent behavioural assessments including memory recall and a target detection task. Furthermore, the auditory story task was independently rated by a separate group for its "suspensefulness". Focusing their analysis on three major large-scale brain networks, the group-level results first indicated significant differences in the between network interactions of the chosen networks across wakefulness and sedation, specifically in the narrative condition. Furthermore, during the same condition, there was reduced cross-subject correlation between wakefulness versus sedation centred mainly on the sensorimotor brain regions. Moreover, based on the responses in the target detection task, the participants were grouped into fast and slow responders which then showed significant differences in gray matter volume as well as connectivity differences in the wakeful auditory story task condition within the executive control network.

      Overall, this is a well-written manuscript. However, my initial enthusiasm about the question of interest was hampered by major theoretical and methodological concerns related to this study. Below I outline these points in the hopes that they improve this study and its outcomes.

      First and foremost, the authors state that their major interest in this study was to assess individual differences in sedation-induced response variability and its potential brain bases. Despite the attractiveness of this topic, which is undoubtedly of interest both to the academic community and the general public, I do not believe that the current study design would allow the authors to answer this question. First of all, although I completely appreciate the difficulty in recruiting participants to take part in such pharmacological studies, I do not think that a group of 17 participants is enough to be able to assess "individual differences". For this to work, there has to be a large enough sample based on adequate power calculations, keeping in mind all the spurious false positive effects that are generated by pharmacological interventions and their downstream effects on connectivity estimates (e.g. motion, global signal etc.). Second, though it is perfectly valid to carry out the initial within-group connectivity and whole-brain activity analyses for the task/rest (which I believe are the only statistically and experimentally sound sections), following these results, the authors mainly carry out multiple exploratory analyses that aim to infer what happened to 3 non-respondent participants (or 6 slow responders). This to me is closer to a case study rather than an experimental study with proper statistics. Overall this fast/slow responder split only comes as an afterthought and does not seem to be the main intention behind the study. This is at odds with the major goal stated in the introduction that the main aim of the authors was to assess inter-individual differences. As such, I do not think that the analyses highlighted by the authors provide enough evidence to support their claims. More detailed points are provided below:

      • The introduction is well-written, citing as much of the relevant literature on this topic as possible. Having said that, I am not really convinced about the justification for selecting the dorsal attention, executive control and default mode networks as the sole focus of the authors' analysis. Although it is true that there is a strong a priori basis that these associative networks play an important role in maintaining consciousness, the references that the authors refer to are equally biased in focusing their analyses on specific higher-order networks, creating a circular argument. In light of evidence highlighting the importance of sensorimotor networks in this context, as well as the balance in their interactions with associative cortices, I would argue that a whole-brain approach would be better suited. Furthermore, as indicated by the whole-brain analysis during the auditory story task, most alterations were centered on the primary somatosensory regions. This is at odds with the justification of the authors on focusing their connectivity analyses solely on associative brain networks.

      • Given the wide age range (and its potential influence on the obtained results), it would be great for the authors to provide the mean and standard deviation of age within groups, and whether the groups were age-matched (though the range seems similar).

      • The authors state that only the reaction time was measured in the auditory target detection task, but later in the results section they mention "omissions". Given that such omissions might be strongly indicative of unresponsiveness/sleep, it is unclear how one can interpret the observed brain-based effects solely from the perspective of reduced information processing (especially when the data was collected under eyes-closed conditions).

      • The authors provide a thorough description of the sedation administration procedures, which is excellent. Nevertheless, I was wondering whether the blood plasma propofol concentrations could be used to explain some of the results in individual differences or even a nuisance regressor to show that the effects were not simply driven by this factor.

      • I failed to find any information in the methods section as to why/how the authors have decided on a mean-split of the participants to fast/slow responders. Given the already small sample size, further reducing degrees of freedom by a split of 11 versus 6 participants makes it very problematic in terms of any statistical tests that can be carried out.

      • Line 441 - Results should not be reported if it did not reach statistical significance.

• Line 448 - For the two analyses on this page the authors indicate that although in the wakefulness condition there was significant brain activity that correlated with (not "predicted") task stimulus, no significant effects were observed in the sedation condition. This absence of evidence should not then be taken as evidence of absence. In other words, such a lack of evidence can be explained by a variety of factors not attributable to the effect of sedation on brain activity (e.g. simply by the fact that the participants were not paying attention to the story or were falling asleep).

• Line 484 - I do not think it is acceptable/justifiable to carry out post-hoc tests when there was no significant difference in the main ANOVA.

• Line 503 - I am not really sure about the justification behind the assessment of gray matter volume. Besides the issues related to small sample size, the observed differences in functional connectivity may then simply be due to differences in the quality of the data that can be extracted from the defined ROIs in a subset of participants. Was this analysis corrected for age (as a continuous variable)? In any case, as far as I am aware, there is no simple relationship between gray matter volume and functional connectivity (i.e. greater/smaller gray matter volume does not necessarily mean greater/smaller functional connectivity). Hence, one cannot conclude that: "These results lend support to the functional connectivity results above, and together they strongly suggest that connectivity within the ECN, and especially the frontal aspect of this network, underlies individual differences in behavioural responsiveness under moderate anaesthesia."

      • Line 509 - Again, I am not really sure about the justification behind the analysis carried out here. The authors state that the ROIs that were found in the gray matter volume analysis overlapped with a priori ROIs which they suggest explain differences observed in functional connectivity. They then select a subset of these ROIs and again show that there are differences in connectivity. This seems rather circular.

      • The authors state that "Rather, only the functional connectivity within the ECN during the wakeful narrative condition differentiated the participants' responsiveness level, with significantly stronger ECN connectivity in the fast, relative to slow responders." I apologise if I am missing something, but I do not see any evidence for such a strong claim. All that the authors have found was that there were significant functional connectivity differences in the executive control network in the wakefulness condition between fast and slow responders (which was defined and grouped by the authors themselves), with no significant effect of condition or state. I fail to understand why this one result from a multitude of exploratory analyses that were conducted was picked out as the "main finding" when one cannot make any inferences about its direct relation to sedation.

      • Overall, I would urge the authors to re-think their analysis strategy and the corresponding discussion of their results.

    3. Reviewer #1:

      Deng et al. studied the mechanisms underlying the wide propofol effect-site concentration range associated with loss of responsiveness. Data was acquired from two centers (MRI, Canada; Auditory, Ireland). This is a well conducted study. The results could also explain why older patients (with presumably smaller gray matter volume) are more sensitive to propofol. My major concerns relate to precision in language.

1) The authors studied mechanisms underlying why patients lose consciousness at a wide range of propofol effect-site concentrations. This behavioral phenomenon is known and well described (Iwakiri H, Nishihara N, Nagata O, Matsukawa T, Ozaki M, Sessler DI. Individual effect-site concentrations of propofol are similar at loss of consciousness and at awakening. Anesth Analg. 2005;100:107-10). I would suggest that the authors position their paper as such. They did not study general anesthesia per se, and the allusions to awareness under anesthesia may not be relevant.

      2) Per comment 1 above. Please reword the intro and discussion section i.e., " Anaesthesia has been used for over 150 years to reversibly abolish consciousness in clinical medicine, but its effect can vary substantially between individuals." What type of anesthesia are you referring to? Anesthetic vapors? Please provide a reference for this statement or make it propofol specific. Awareness under general anesthesia is related to numerous factors, many of which are iatrogenic as detailed in the NAP 5 study "The incidence of awareness rose from 1 out of 135,000 general anaesthetics to 1 out of 8,200 general anaesthetics when neuromuscular blockers were used" (https://pubmed.ncbi.nlm.nih.gov/25204697/). Further, it is unclear when dreaming occurs (during induction which is reasonable to expect/during emergence which is also reasonable to expect versus during the anesthesia). My suggestion is to qualify your statements by stating that this should be further studied in the context of possible genetic predisposition to awareness (Increased risk of intraoperative awareness in patients with a history of awareness. Anesthesiology 2013;119:1275-83).

      3) The term "moderate anaesthesia" is confusing to me, and would be to most clinicians. Please cite the description of what comprises moderate anesthesia. My interpretation is that the study was about sedation. Did you mean moderate sedation? (https://www.asahq.org/standards-and-guidelines/continuum-of-depth-of-sedation-definition-of-general-anesthesia-and-levels-of-sedationanalgesia).

      4) "the antagonistic relationship between the DMN and the DAN/ECN #and# was reduced during moderate anaesthesia, with a stronger and significant result in the narrative condition relative to the resting state." Anticorrelation?

      5) The suggestion that fMRI can be used to improve the accuracy of awareness monitoring is, in my opinion, not necessary and a stretch.

    4. Summary: There was general enthusiasm for the topic of study and the general approach of using neuroimaging to study brain function under anesthesia. However, the reviewers also shared a number of significant concerns, particularly regarding whether the data which has been collected is sufficient to answer the core questions being asked (for example, whether the number of participants supports a robust individual differences analysis).

      Jonathan E Peelle (Washington University in St. Louis) served as the Reviewing Editor.

    1. Reviewer #3:

Jack and colleagues report that the SARS-CoV-2 N protein interacts with RNA to form phase-separated liquid compartments, similar to P bodies and nucleoli, shown here as blobs. The authors then perturbed the system in numerous ways, showing: i) that different nucleic acids give rise to different blobs; ii) that protein cross-linking and mass spec suggest that the phase-separated N is in a different tertiary or quaternary conformation than the soluble N; iii) that some N domains (e.g., PLD, R2) are important for blob formation, particularly when the protein is phosphorylated (by an unknown kinase); and iv) that some small molecules can affect the number and size of the blobs. Overall, this story is very early-stage phenomenology and lacks a clear demonstration of physiological relevance. Certainly, the claim that "nilotinib disrupts the association of the N protein into higher order structures in vivo and could serve as a potential drug candidate against packaging of SARS-CoV-2 virus [sic] in host cells" ought to be tested - it would be easy enough to do, though I don't think this would complete the story.

      Major comments:

1) Figure 1 is difficult to interpret with the information provided. In panel A, the colors seem to be important, but readers are not given a clue as to what they mean. In panel B, how were the Y axes calculated? What are we really looking at in Figs. 1C and D? Were these on glass slides? Plastic? Was the surface coated, passivated, or otherwise derivatized in any way? What kind of microscope was used? What do the white signals (blobs) come from? Is there a fluorescent label involved? Is this phase contrast? In panel D, please include a buffer-only control (no protein) to demonstrate the blobs are not simply a buffer artefact. Finally, what N:RNA molar ratios were used in this Figure?

      2) For the polymeric RNAs, what were the average chain lengths?

3) In describing Figure 1, the authors state "The shapes of these asymmetric structures were consistent with remodeling of vRNPs into 'beads on a string', as observed by cryoEM." This is wishful thinking. I see blobs of different shapes, but there is no way to know whether these represent N protein "beads" on RNA "strings." Reference 6, cited in the manuscript as showing the "beads on a string" model, has a scale bar of 50 nm = 0.05 µm, and even there the N:RNA complex is very obscure.

4) My greatest concern with this work is that no information was provided about the N protein that was used for the in vitro studies. How pure was it? What steps were taken to remove co-purifying nucleic acids? Was it monodisperse? Aggregated? Please include DLS data and show a silver-stained SDS-PAGE.

      5) Similarly, how did the mutant forms of N (Fig. 3A) behave? Were they properly folded? Did the authors check them by CD or SEC? And what concentrations of mutant proteins were used? Without these data, the rest of Fig. 3 is uninterpretable.

6) Panel B: Could the authors please explain what the numbers on the Y axis are and how they were calculated. Also, their disorder prediction predicts the dimerization regions to be highly disordered; could this indicate a problem with the prediction method?

7) Panels C, D, E: what is the N:RNA molar ratio?

      8) Could the authors please explain the calculation method used to calculate the % surface area covered by droplets?
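One common way such a number is obtained (a hypothetical sketch only; the authors' actual pipeline is not described in the manuscript) is to threshold the fluorescence image and report the fraction of pixels above threshold:

```python
# Toy 2D "image" as a list of lists of intensities (0-255). In practice this
# would come from the microscopy data; the threshold value here is arbitrary
# and would need to be justified (e.g. Otsu's method) in a real analysis.
image = [
    [10, 12, 200, 210, 11],
    [ 9, 190, 220, 15,  8],
    [ 7,  11,  13, 12,  9],
]
threshold = 100  # assumed intensity cutoff separating droplets from background

total_pixels = sum(len(row) for row in image)
droplet_pixels = sum(1 for row in image for px in row if px >= threshold)
pct_area = 100.0 * droplet_pixels / total_pixels
print(f"{pct_area:.1f}% of the field covered by droplets")
```

If the authors did something along these lines, the threshold choice, background correction, and whether droplets below the diffraction limit are counted all affect the reported percentage, which is why the calculation method matters.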

9) Fig. 4A and B: Why is [N] so low? In other experiments the authors usually used 18.5 µM, whereas here the concentration was 7.8 µM, at which blobs are almost invisible in the other figures provided by the authors (and below ksat, or very close to it).

10) Fig. 4C: What is 1.5 M N RNA? [N] is set to 57.6 µM, much higher than in the Fig. 4A-B assays. Is there a reason?

      11) Fig. 4D is missing control cells transfected with GFP only (no N).

    2. Reviewer #2:

      This paper contributes to the large number of papers currently posted on BioRxiv showing that the N protein of SARS CoV2 can undergo liquid-liquid phase separation on its own and in the presence of RNA, and that this behavior can be modulated by phosphorylation. The work here is somewhat different from much of the other work in that the authors have generated the N protein from mammalian cells. The authors have also examined the effects of known drugs on the phase separation process. Given the importance of coronavirus it is imperative to get out information on its biology. But it is also imperative that the information be correct, interpreted with appropriate caution, and of sufficient depth to be valuable to others in the field and not potentially misdirect future research and clinical efforts. In this respect, I think the authors need to clean up some of their experiments and pull back on some of their claims, as I detail below.

      Major comments:

      1) In general, the authors' use of size, number and morphology of droplets to assess the effects of small molecules in figure 4 is problematic. The authors should be measuring the effects of the compounds on the phase separation threshold concentration (of N+RNA or of salt) to see whether the compounds stabilize or destabilize the droplets. Changes in size, number and morphology can be due to many factors, many of which are unlikely to be relevant to viral assembly.

      For example, the authors report that nelfinavir mesylate and LDK378 produced fewer but larger droplets, and conclude that these compounds could disrupt virion assembly. This is problematic for two reasons. Most importantly, it is almost impossible to interpret what fewer larger droplets means. Are they nucleating more slowly and/or growing more rapidly? Are they more viscous and thus less disrupted by handling? Are they denser and thus settling more rapidly? Has the thermodynamic threshold to phase separation changed? Secondarily, because of these uncertainties, it is an overinterpretation to state based on the data that these compounds could act by disrupting virion assembly.

      The class II molecules, which increase both size and number of droplets, are probably more relevant, since concomitant increases in both probably mean that the threshold concentration for LLPS has decreased, and thus the compound has stabilized the droplets.

The changes in morphology induced by the class III molecules are also hard to interpret. Does the change reflect greater adhesion to and spreading on the slide surface (probably irrelevant to drug action)? Or changes in droplet dynamics--slowed fusion or increased viscosity? What does it mean that nilotinib causes the morphology of N+RNA condensates to become filamentous, and could this same effect account for the (modest) decrease in N protein foci in cells upon drug treatment?

I honestly am concerned that the authors conclude the paper by urging use of nilotinib in clinical trials, and by treating the effects of drugs on phase separation as a proxy for vRNP formation, based on these very thin data.

      2) In Figure 1 (and beyond), it is not good practice to use fractional areas of droplets that have settled to a slide surface to quantify droplet formation in LLPS experiments. Droplets fall to the slide surface at different rates depending on their sizes, which in turn depend on many factors, some biochemical (the relative rates of nucleation and growth; density; all of which can vary with buffer conditions) and some technical (exactly how the sample was handled). Turbidity, which also is imperfect, is nevertheless a more reliable measure; so is microscopic assessment of the presence or absence of droplets. The authors should provide at least some additional measure in these initial experiments to show the numbers obtained from the fractional area are qualitatively correct.

      3) In figure 1C, the dissolution with salt is not a measure of liquid-like properties, as claimed at the bottom of page 3. The authors should look for evidence of droplet fusion, spherical shape (for droplets larger than the diffraction limit) and rapid exchange with solvent.

      4) The claims on page 4 that the condensates formed with viral RNA fragments are gel-like should be supported with some measure of dynamics, and not simply the shape of the objects that settle to the slide surface.

      5) In the CLMS experiments, how do the authors know that the changes observed are due to LLPS per se and not to differences in structure induced by differences in salt? It seems like additional controls are warranted to make this claim. Relatedly, the authors should state/examine whether higher salt affects dimerization of the dimerization domain.

      6) The analogy made on page 4 between the asymmetric structures observed upon mixing N and viral RNA fragments to the strings of vRNPs observed by cryoEM is quite a stretch. The vRNPs are 15 nm in diameter. The structures observed here are vastly larger. Such associated but non-fused droplets are often observed for solidifying phase separating systems. The superficial similarity of connected particles between the cellular vRNPs and the structures here is, in my opinion, unlikely to be meaningful.

    3. Reviewer #1:

      This article proposes that the assembly of the Sars-CoV-2 capsid is mediated by liquid-liquid phase separation of the N protein and RNA. The strength of the manuscript is a series of in vitro experiments showing that N protein can undergo liquid-liquid phase separation (LLPS) in a manner enhanced by RNA. The authors also identify nilotinib as a compound that alters the morphology of assemblies consisting of RNA and the N protein. The primary weakness of the manuscript is that there is little data connecting the in vitro observations to intracellular events, or viral assembly. Taken together, I find the experiments interesting but, as detailed below, premature.

      Major comments:

      1) A key issue with any in vitro assembly process such as LLPS is a demonstration that same process occurs in the cell. This is an issue since many molecules can undergo LLPS in vitro in a manner unrelated to their biological function. In this work, the authors show that the N protein can undergo LLPS in vitro in a manner a) stimulated by RNA, b) enhanced by the R2 domain, and c) changed in morphology by nilotinib.

      Their argument that this LLPS is relevant to the viral life cycle rests on: a) the observation that over-expressed N protein forms foci in the cytosol, and b) the number of these foci (but not necessarily their morphology as seen in vitro) is somewhat reduced by nilotinib. In my opinion, this is not a very convincing argument for two main reasons.

      First, it is unclear why the N protein is forming foci in cells. Specifically:

      a) Is it being recruited to P-bodies, or some other existing subcellular assembly? (Which could be examined by staining with other markers).

      b) Is it forming a new assembly with RNA as they have proposed? (Which could be addressed by staining for either specific or generic RNAs, or purifying these assemblies and determining if they contain RNA)

      Second, it is unclear that the foci seen in cells are related to the LLPS they observe in vitro or relevant to the viral life cycle. Specifically:

      c) Is the assembly related to the LLPS they have observed in vitro beyond a poorly understood alteration with nilotinib ? (Which could be addressed by examining if the deletions they observe affect LLPS in vitro also affect the formation of N protein foci in cells).

      d) Is the nature of this assembly relevant to the viral life cycle? (Given the difficulty of working with COVID, this is hard. My suggestion here is at a minimum to discuss the issue, and ideally do an experiment with a related coronavirus to test their hypothesis). Frankly, the idea that coronavirus would trigger a LLPS of multiple viral RNAs would seem to be inhibitory to efficient packaging of individual virions. A discussion of how the virus would benefit from such a mechanism, as opposed to a cooperative coating of a viral genome initiated at a high affinity N protein binding site would be important to put the work in context.

      2) The manuscript would be improved by examining the presence of RNA in each LLPS, and the ability of RNA to undergo self-assembly under the conditions examined in the absence of the N protein. As it stands, in some cases the authors could be studying RNA-based self-assembly that then recruits the N protein to the RNA LLPS through RNA binding (see Van Treeck et al., 2018, PNAS for a specific example of this phenomenon). This may be particularly likely for some of the longer viral RNAs, which can form more stable base-pairs and thereby promote more "tangled" assemblies (e.g. Tauber et al., 2020, Cell).

      3) I found the CLMS to not fit well in this manuscript for two reasons:

      a) As I understood the methods, the CLMS experiment is looking at cross-linking in high and low salt, with some LLPS occurring under low salt. However, since the cross-linking was not limited to the dense phase of the low-salt condition, a significant fraction (perhaps the majority?) of the N proteins will not be in the dense phase. Because of this, the cross-linking is essentially mapping interactions that change between high and low salt. If the authors really want to do this experiment, they should separate the phases and examine the crosslinks forming in the dense and dilute phases under the same salt conditions.

      b) A second issue with this cross-linking experiment is that the regions that dominate the changes in cross-linking are not ones that appear to be important in driving LLPS in vitro based on their deletion analysis. If the authors want to include this data, it should be related to the deletion experiments and connected to the work in a manner that makes it meaningful.

      4) The work would be improved by comparing how alterations that impact LLPS affect specific biochemical interactions of the relevant molecules. In these experiments, the authors are examining assemblies that form through N-N, N-RNA, RNA-RNA interactions. In each case, biochemical assays could be used to examine which of these interactions are altered by deletions or compounds. By understanding the underlying alterations in molecular interactions, a greater understanding of the mechanism of the observed LLPS, and its relevance to the viral life cycle could be revealed.

    4. Summary: Although there is clear interest in SARS-CoV-2 biology and in characterizing the physical properties of its viral proteins, ultimately the reviewers felt that the data were too preliminary and were not linked to physiological relevance, even if the experimental concerns could be addressed. We hope that the reviewers' comments will be useful.

    1. Reviewer #3: (Daniele Marinazzo)

      Dear authors,

      Thanks for the opportunity to read this nice paper. I appreciated the quality of the data analysis, and the quest towards associating electrophysiology and BOLD data through a data-driven transfer function, which can be interpreted as a proxy of the HRF. Also I completely agree with you that we need to move beyond a canonical response.

      There are a few issues I would like to discuss with you. I have done quite some work in this area. On one hand this is good (and I think it is also the reason why I was invited to review this paper); on the other hand, there is always the risk that I have shaped my own goggles over these last years, and that I am projecting onto your work some doubts and issues that I have with my own. In that case I apologize in advance, and I hope that we can have an enriching conversation.

      Please forgive me if I start with my own work; there is always the danger that reviewers try to make authors write the paper that they would write themselves. I will keep this in mind, but on the other hand I think that the best way to convey my thoughts to you is to let them flow as they come.

      So, here's our toolbox: https://www.nitrc.org/projects/rshrf. The idea behind it is that we can take peaks in the BOLD signal and take them as signatures of a pseudo neural event happening some time before at the neural level. This is in line with this work (which could also be relevant with respect to your power law figures):

      Tagliazucchi E, Balenzuela P, Fraiman D, Chialvo DR. Criticality in large-scale brain FMRI dynamics unveiled by a novel point process analysis. Front Physiol. 2012;3:15. Published 2012 Feb 8. doi:10.3389/fphys.2012.00015

      and with the subsequent spatial clustering approach, which has been called coactivation patterns (CAPs):

      Liu X, Zhang N, Chang C, Duyn JH. Co-activation patterns in resting-state fMRI signals. Neuroimage. 2018;180(Pt B):485-494. doi:10.1016/j.neuroimage.2018.01.041

      and innovation CAPs:

      Karahanoğlu FI, Caballero-Gaudes C, Lazeyras F, Van de Ville D. Total activation: fMRI deconvolution through spatio-temporal regularization. Neuroimage. 2013;73:121-134. doi:10.1016/j.neuroimage.2013.01.067

      Karahanoğlu FI, Van De Ville D. Transient brain activity disentangles fMRI resting-state dynamics in terms of spatially and temporally overlapping networks. Nat Commun. 2015;6:7751. Published 2015 Jul 16. doi:10.1038/ncomms8751

      Zoller DM, Bolton TAW, Karahanoglu FI, Eliez S, Schaer M, Van De Ville D. Robust Recovery of Temporal Overlap Between Network Activity Using Transient-Informed Spatio-Temporal Regression. IEEE Trans Med Imaging. 2019;38(1):291-302. doi:10.1109/TMI.2018.2863944

      We then fit these peaks with a GLM, with the time lag as a free parameter. We use several families of basis functions. In the original paper (Wu GR, Liao W, Stramaglia S, Ding JR, Chen H, Marinazzo D. A blind deconvolution approach to recover effective connectivity brain networks from resting state fMRI data. Med Image Anal. 2013;17(3):365-374. doi:10.1016/j.media.2013.01.003) we used canonical HRF and FIR (together with the rBETA, which is basically the portion of the BOLD peak exceeding a certain threshold, as in the Tagliazucchi paper above).
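      For concreteness, the pseudo-event idea above (threshold the BOLD signal, treat supra-threshold peaks as signatures of earlier neural events, and fit a GLM with the time lag as a free parameter) can be sketched as follows. This is a minimal toy illustration, not the rsHRF implementation: the z-threshold, lag grid and double-gamma shape are all my assumptions.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gamma

def canonical_hrf(tr, length=32.0):
    """Double-gamma HRF sampled at the TR (SPM-like shape; illustrative)."""
    t = np.arange(0, length, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.max()

def estimate_lag(bold, tr, lags, z_thresh=1.0):
    """Scan candidate peak-to-event lags (in samples); return the lag whose
    event regressor, convolved with the canonical HRF, best fits the BOLD
    signal (highest R^2), together with that R^2."""
    z = (bold - bold.mean()) / bold.std()
    peaks, _ = find_peaks(z, height=z_thresh)   # supra-threshold BOLD peaks
    hrf = canonical_hrf(tr)
    best_lag, best_r2 = None, -np.inf
    for lag in lags:
        onsets = peaks - lag                    # shift peaks back in time
        onsets = onsets[onsets >= 0]
        events = np.zeros_like(bold)
        events[onsets] = 1.0
        x = np.convolve(events, hrf)[: len(bold)]
        X = np.column_stack([x, np.ones_like(x)])
        beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
        r2 = 1.0 - (bold - X @ beta).var() / bold.var()
        if r2 > best_r2:
            best_lag, best_r2 = lag, r2
    return best_lag, best_r2
```

      On noiseless simulated data this recovers the onset-to-peak delay of the assumed HRF; on real data, of course, the SNR and jitter limits mentioned above apply.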

      We then included a mixture of gamma functions together with other families of basis functions in subsequent versions of the toolbox. Then we set up for validation of the approach with electrophysiological signatures, and that's where the doubts and pain kicked in. Some results on simultaneous EEG-fMRI, reported here (Wu G, Marinazzo D. 2015. Retrieving the Hemodynamic Response Function in resting state fMRI: methodology and applications. PeerJ PrePrints 3:e1317v1 https://doi.org/10.7287/peerj.preprints.1317v1 Wu GR, Deshpande G, Laureys S, Marinazzo D. Retrieving the Hemodynamic Response Function in resting state fMRI: Methodology and application. Conf Proc IEEE Eng Med Biol Soc. 2015;2015:6050-6053. doi:10.1109/EMBC.2015.7319771) were encouraging: for example we saw that the positive correlation between envelope of EEG and BOLD in the occipital cortex becomes more positive when we use instead the deconvolved BOLD and the EEG, while the negative correlation in the thalamus becomes more negative.

      Other things present in the PeerJ preprint were encouraging too (and I mention them since I think that they can be relevant to the validation of your approach): namely the retrieval of a simulated ground truth HRF within certain realistic limits of SNR and jitter, the correlation with cerebral blood flow (even though physiological regressors should always be taken into account, see: Wu GR, Marinazzo D. Sensitivity of the resting-state haemodynamic response function estimation to autonomic nervous system fluctuations. Philos Trans A Math Phys Eng Sci. 2016;374(2067):20150190. doi:10.1098/rsta.2015.0190 and this becomes even more relevant when considering aging and clinical datasets), and some similarity across resting state networks.

      So, the question is: can we really trust that peaks in M/EEG reflect the local pseudo-events that would originate the BOLD signal? Reading work by people who had thoroughly investigated this, e.g.

      Logothetis NK, Pauls J, Augath M, Trinath T, Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature. 2001;412(6843):150-157. doi:10.1038/35084005

      Chen X, Sobczak F, Chen Y, et al. Mapping optogenetically-driven single-vessel fMRI with concurrent neuronal calcium recordings in the rat hippocampus. Nat Commun. 2019;10(1):5239. Published 2019 Nov 20. doi:10.1038/s41467-019-12850-x

      Yu X, He Y, Wang M, et al. Sensory and optogenetically driven single-vessel fMRI. Nat Methods. 2016;13(4):337-340. doi:10.1038/nmeth.3765

      and conversing with them, I got (almost) convinced that it is unlikely that spikes in coarsely recorded or reconstructed M/EEG signals can be mapped one-to-one to the HRF-inducing events that we use in GLMs (calcium, or better yet glutamate, signals could be a better choice).

      Now, I like the way you associated HMM states with hemodynamic ones, thus adopting a more systemic/dynamical view, and taking fractional occupancy as a trigger. Do you think that these triggers can be better markers of BOLD-inducing neural events?
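      To make the fractional-occupancy trigger concrete, here is a minimal numpy sketch; the sliding-window length, the threshold, and the upward-crossing event definition are my assumptions, not necessarily your pipeline.

```python
import numpy as np

def fractional_occupancy(states, n_states, win):
    """Sliding-window fraction of samples spent in each HMM state.

    states: 1-D integer array (e.g. the Viterbi path).
    Returns an (n_windows, n_states) array whose rows sum to 1.
    """
    onehot = np.eye(n_states)[states]                 # (T, n_states)
    kernel = np.ones(win) / win
    return np.column_stack(
        [np.convolve(onehot[:, k], kernel, mode="valid")
         for k in range(n_states)]
    )

def occupancy_events(fo, state, thresh=0.5):
    """Window indices where the occupancy of `state` crosses `thresh`
    upward, i.e. candidate triggers for a BOLD event regressor."""
    above = fo[:, state] >= thresh
    return np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
```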

      Other issues:

      • What to make of events that are very close, and that would thus violate the assumption of linearity of the GLM?

      • Apart from hemodynamic changes, can aging be associated with different electrophysiological spectral features (both periodic and aperiodic), which in turn could influence the HMM analysis?

      • Detection of brain-behavior relationships with a non-huge dataset can be misleading, see for example this recent study:

      Towards Reproducible Brain-Wide Association Studies Scott Marek, Brenden Tervo-Clemmens, Finnegan J. Calabro, David F. Montez, Benjamin P. Kay, Alexander S. Hatoum, Meghan Rose Donohue, William Foran, Ryland L. Miller, Eric Feczko, Oscar Miranda-Dominguez, Alice M. Graham, Eric A. Earl, Anders J. Perrone, Michaela Cordova, Olivia Doyle, Lucille A. Moore, Greg Conan, Johnny Uriarte, Kathy Snider, Angela Tam, Jianzhong Chen, Dillan J. Newbold, Annie Zheng, Nicole A. Seider, Andrew N. Van, Timothy O. Laumann, Wesley K. Thompson, Deanna J. Greene, Steven E. Petersen, Thomas E. Nichols, B.T. Thomas Yeo, Deanna M. Barch, Hugh Garavan, Beatriz Luna, Damien A. Fair, Nico U.F. Dosenbach bioRxiv 2020.08.21.257758; doi: 10.1101/2020.08.21.257758

      • Why the parcellation in 38 regions? How are the results consistent/robust with finer parcellations?

      • You state that the DMN "is susceptible to aging and neurodegenerative disease". That is certainly probable; the thing is that the DMN is possibly sensitive to everything and specific to very few things.

      • Instead of a point-by-point statistical test, you could use the 3dMVM algorithm in AFNI (your reference 20) to test differences in the shape as a whole.

      • You analyse data from older subjects only. How confident can you be that you are observing effects specific to aging?

      Thanks for listening to this review version of "more of a comment than a question".

    2. Reviewer #2:

      General assessment:

      The study investigated transient coupling between EEG and fMRI during resting state in 15 elderly participants using the previously established Hidden Markov Model approach. Key findings include: 1) deviations of the hemodynamic response function (HDR) in higher-order versus sensory brain networks, 2) power-law scaling for the duration and relative frequency of states, 3) associations between state duration and HDR alterations, and 4) cross-sectional associations between HDR alterations, white matter signal anomalies and memory performance.

      The work is rigorously designed and very well presented. The findings are potentially of strong significance to several neuroscience communities.

      Major concerns:

      My enthusiasm was only somewhat dampened by methodological issues related to the sample size for cross-sectional inference and by missed opportunities for a more specific analysis of the EEG.

      1) A statistical power analysis was conducted prior to data collection, which is very laudable. Nevertheless, n=15 is a very small sample for cross-sectional inference and commonly leads to false positives despite large observed effect sizes and small p-values (it can easily take up to 200 samples to detect true zero correlations). On the other hand, the within-subject results are far better posed statistically and hence more strongly supported by the data.

      Recommendations:

      • The issue should be addressed non-defensively in a well-identified section or paragraph of the discussion. The sample size should also be mentioned in the abstract.

      • The authors could put more emphasis on the participants as replication units for the observations. For a theoretical perspective, the work by Smith and Little may be of help here: https://link.springer.com/article/10.3758/s13423-018-1451-8. In terms of methods, more emphasis should be put on demonstrating representativeness, for example using prevalence statistics (see e.g. Donnhäuser, Florin & Baillet https://doi.org/10.1371/journal.pcbi.1005990)

      • Supplements should display the most important findings for each subject to reveal the representativeness of the group averages.

      • For the state duration analysis (boxplots), linear mixed-effects models (varying-slope models) may be an interesting option to inject additional uncertainty into the estimates and to allow for partial pooling through shrinkage of subject-level effects.

      • Show more raw signals / topographies to build some trust in the input data. It could be worthwhile to show topographic displays for the main states reported in characteristic frequencies. See also the next concern.
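      As an illustration of the partial-pooling recommendation above, a closed-form empirical-Bayes shrinkage of per-subject means toward the group mean captures the core behavior of a random-intercept mixed model; this is a toy method-of-moments stand-in for a full REML fit, not a substitute for it.

```python
import numpy as np

def shrunken_subject_means(groups):
    """Empirical-Bayes (random-intercept) shrinkage of per-subject means.

    groups: list of 1-D arrays, one per subject.
    Returns (shrunken means, raw means); subjects with fewer or noisier
    observations are pulled more strongly toward the grand mean.
    """
    raw = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    grand = np.concatenate(groups).mean()
    # pooled within-subject variance
    sigma2 = np.mean([g.var(ddof=1) for g in groups])
    # between-subject variance by method of moments, floored at zero
    tau2 = max(raw.var(ddof=1) - np.mean(sigma2 / ns), 0.0)
    lam = tau2 / (tau2 + sigma2 / ns)   # per-subject shrinkage weight
    return grand + lam * (raw - grand), raw
```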

      2) The authors seem to have missed an important opportunity to pinpoint the characteristic drivers in terms of EEG frequency bands. The current analysis is based on broadband signals between 4 and 30 Hz, which seems atypical and reduces the specificity of the analysis. Analyzing the spectral drivers of the different states would not only enrich the results in terms of EEG but also provide a more nuanced interpretation. Are the VisN and DAN states potentially related to changes in alpha power, potentially induced by spontaneous opening and closing of the eyes? What is the most characteristic spectral signature of the DMN state? ... etc.

      Recommendations:

      • Display the power spectrum indexed by state, ideally for each subject. This would allow inspecting modulation of the power spectra by the state and reveal the characteristic spectral signature without re-analysis.

      • Repeat essential analyses after bandpass filtering in the alpha or beta range. For example, if the main results look very similar after filtering at 8-12 Hz, one can conclude that most observations are related to alpha-band power.

      • While artifacts have been removed using ICA and the network states do not look like source-localized EOG artifacts, some of the spectral changes, e.g. in DAN/VisN, might be attributed to transient visual deprivation. This could be investigated by performing a control analysis regressing the EOG-channel amplitudes against the HMM states. These results could also enhance the discussion regarding activation/deactivation.
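      The suggested alpha-band re-analysis could start from a simple zero-phase Butterworth bandpass; the order and band edges below are illustrative choices, not a prescription.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth bandpass (e.g. lo=8, hi=12 Hz for alpha).

    filtfilt applies the filter forward and backward, so phase is
    preserved and the effective order is doubled.
    """
    b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)
```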

    3. Reviewer #1:

      This manuscript uses simultaneous fMRI-EEG to understand the haemodynamic correlates of electrophysiological markers of brain network dynamics. The approach is both interesting and innovative. Many different analyses are conducted, but the manuscript is in general quite hard to follow. There are grammatical errors throughout, sentences/paragraphs are long and dense, and they often use vague/imprecise language or rely on (often) undefined jargon. For example, sentences such as the following example are very difficult to decipher and are found throughout the manuscript: "if replicated, an association between high positive BOLD responsiveness and a DAN electrophysiological state, characterized by low amplitude (i.e., desynchronized) activity deviating from energetically optimal spontaneous patterns, would be consistent with prior evidence that the DMN and DAN represent alternate regimes of intrinsic brain function". As a result, the reader has to work very hard to follow what has been done and to understand the key messages of the paper.

      Much is made of a potential power-law scaling of lifetime/interval times as being indicative of critical dynamics. A power-law distribution does not guarantee criticality, and could arise through other properties. Moreover, to accurately determine whether the proposed power-law is indicative of a scale-free system, the empirical property must be assessed over several orders of magnitude. It appears that only the 25-250 ms range was considered here.

      The KS statistic is used to quantify the distance between the empirical and power-law distributions, which is then used as a marker of the degree of criticality. It is unclear that this metric is appropriate, given that transitions in and out of criticality can be highly non-linear. Moreover, the physiological significance of having some networks in a critical state while others are not is unclear. Each network is part of a broader system (i.e., the brain). How should one interpret apparent gradations of criticality in different parts of the system?
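      For concreteness, such a check would typically fit the exponent by maximum likelihood and compute the KS distance against the fitted power-law CDF over the observed range, which makes the limited 25-250 ms span explicit. This is a minimal continuous-approximation sketch in the spirit of Clauset et al. (2009); here xmin is taken as given rather than estimated jointly, as the full procedure would do.

```python
import numpy as np

def fit_power_law(x, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    x = x[x >= xmin]
    alpha = 1.0 + len(x) / np.sum(np.log(x / xmin))
    return alpha, x

def ks_distance(x, xmin, alpha):
    """KS distance between the empirical CDF and the fitted power-law CDF."""
    x = np.sort(x[x >= xmin])
    emp = np.arange(1, len(x) + 1) / len(x)
    model = 1.0 - (x / xmin) ** (1.0 - alpha)
    return np.max(np.abs(emp - model))
```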

      The sample size is small. I appreciate the complexity of the experimental paradigm, but the correlations do not appear to be robust. The scatterplots mask this to some extent by overlaying results from different brain regions, but close inspection suggests that the correlations in Fig 6 are driven by 2-3 observations with negative BOLD responses, the correlations in Fig 7A-B are driven by two subjects with positive WMSA volume, and Fig 7B is driven by 3 or so subjects with negative power-law fit values (indeed, x~0 in this plot is associated with a wide range of recall scores). Some correction for multiple comparisons is also required given the number of tests performed.

      Figure 1 - panel labels would make it much easier to understand what is shown in this figure relative to the caption.

      Figure 2- the aDMN does not look like the DMN at all. It is just the frontal lobe. Similarly, the putative DAN is not the DAN, but the lateral and medial parietal cortex, and cuneus.

      P6, Line 11 - please define "simulation testing"

    4. Summary: All reviewers appreciated the technical innovation of the work, but they also shared concerns about the robustness of some of the analyses, results, and content of the manuscript.

    1. Reviewer #3:

      The Aizenman lab has previously demonstrated the utility of the Xenopus tectum as a model to examine neuronal, circuit and behavioral manifestations of VPA treatment, a teratogen associated with autism spectrum disorder in humans. In Gore et al., they demonstrate that the deficits induced by VPA treatment, including enhanced spontaneous and evoked neuronal activity, are blocked by pharmacological or morpholino-based inhibition of MMP9. Inhibition of MMP9 also reverses the effects of VPA treatment on seizure susceptibility and the startle habituation response. Over-expression of MMP9 phenocopies the effect of VPA, and inhibition of MMP9 in single tectal neurons blocks the expression of experience-dependent structural plasticity. The results are convincing and add mechanistic insight into circuit and behavioral dysfunction induced by VPA signaling, as well as an expansion of the repertoire of plasticity mediated by MMP9 signaling.

      Minor points:

      -The time course for the introduction of VPA and MMP9 inhibitors should be reiterated in the results section.

      -Fig 1 Please report the number (or %) of tectal neurons in which MMP9 was over-expressed following whole-brain electroporation.

      -Does MMP9 transfection change the E/I ratio, as previously reported for VPA?

      -Does VPA or MMP9 inhibition change the initial large amplitude/short latency evoked response?

      Figure 2: please report statistics for the total number of barrages or the barrage distribution across experimental groups (the latter also for Fig 3).

      Figs 3 and 5: The presentation of the immunoblots should clarify if raw or normalized (to Ponceau Blue) data were quantified.

      Fig 4: Please report a post hoc comparison following the repeated measures ANOVA

      Fig 5: Total growth and growth rates could also be included in the results section.

      Minor comments:

      -The discussion considers a broad range of potential targets of MMP9, including cell surface receptors, growth factors, adhesive proteins, and extracellular matrix components; many of these are left out of the abstract and introduction.

      -The statement on page 6, "Increased synaptic transmission observed in MMP9 over-expression tectal neurons is consistent with dysfunctional synaptic pruning", appears at odds with a body of literature in mouse hippocampus, including many papers cited in the discussion, demonstrating the role of MMP9 in spine elongation, synaptic potentiation and synapse maturation.

    2. Reviewer #2:

      In the manuscript by Gore et al., the authors show evidence that MMP9 is a key regulator of synaptic and neuronal plasticity in Xenopus tadpoles. Importantly, they demonstrate a role for MMP9 in valproic acid-induced disruptions in the development of synaptic connectivity, a finding that may have particular relevance to autism spectrum disorder (ASD), as prenatal exposure to VPA leads to a higher risk for the disorder. Specifically, the authors show that the hyper-connectivity induced by VPA is mimicked by overexpression of MMP9 and reduced by MMP9 knockdown and pharmacological inhibition, suggesting a causal link. The experiments appear to be well executed, analyzed appropriately, and are beautifully presented. I have only a few suggestions for improvement of the manuscript and list a few points of clarification that the authors should address.

      1) The authors refer to microarray data as the rationale for pursuing the role for MMP9 in VPA-induced hyperconnectivity. How many other MMPs or proteases with documented roles in development are similarly upregulated? The authors should say how other possible candidate genes did or did not change, perhaps presenting the list with data in a table (at least other MMPs and proteases). If others have changed, the authors should discuss their data in that context.

      2) Please cite the microarray study(ies?).

      3) In a related issue, the authors should comment on the specificity of the SB-3CT, particularly with regard to other MMPs or proteases that may/may not have been found to be upregulated in the microarray experiment.

      4) Results, first paragraph: although it is in the methods, please state briefly the timing of the VPA exposure and the age/stage at which the experiments were performed. Within the methods, please give an approximate age in days after hatching for the non-tadpole experts.

      5) The finding regarding the small number of MMP9-overexpressing cells is fascinating. Have the authors stained the tissue for MMP9 after VPA?

      6) Do the authors have data on the intrinsic cell properties (input resistance, capacitance, etc.)? If so, they should include those data either in the Supplemental information or in the text. These factors could absolutely influence hyperconnectivity or measurements of the synaptic properties, so at the least the authors should discuss their findings in the context of the findings of James et al.

      Minor Comments:

      1) Page 15: 'basaly low' may be better worded as 'low at baseline'.

      2) The color-coding is very useful and facilitates communicating the results. The yellow on Figure 5, however, is really too light. Consider another color.

    3. Reviewer #1:

      This study is based on previous work that exposure to valproic acid (VPA), which is used to model autism spectrum disorders, produces excess local synaptic connectivity, increased seizure susceptibility, abnormal social behavior, and increased MMP-9 mRNA expression in Xenopus tadpoles. VPA is an interesting compound that is also used as an antimanic and mood stabilizing agent in the treatment of bipolar disorder, although the therapeutic targets of VPA for its treatment of mania or as a model of neurodevelopmental disorders have remained elusive. The authors validate that VPA exposed tadpoles have increased MMP9 mRNA expression and then test whether the increased levels of MMP9 mediate the effects of VPA in the tadpole model. The authors report that overexpression of MMP-9 increases spontaneous synaptic activity and network connectivity, whereas pharmacological and genetic inhibition with antisense oligos rescues the VPA induced effects, and then tie the findings to experience dependent synaptic reorganization.

      1) What is the exact nature of the "increased connectivity"? Is there an increase in synapse numbers or solely an increase in dendritic complexity coupled with functional plasticity? The authors should document the properties of mEPSCs and mIPSCs recorded in TTX to isolate synaptic properties. Coupling this "mini" analysis to quantification of synapse numbers will address whether the changes are solely due to structural plasticity or also due to a functional potentiation of transmission. These experiments should at least be conducted in the MMP-9 overexpression, VPA treatment and VPA treatment + MMP-9 loss-of-function cases to validate the basic premise that there is increased connectivity.

      2) It is unclear why the authors focused on MMP-9 compared to other genes dysregulated by VPA. This point should be further discussed.

      3) How does VPA alter MMP-9 levels? Is this through an HDAC-dependent mechanism? Granted, VPA has been proposed to work through a variety of mechanisms, including HDAC inhibition.

      4) Does SB-3CT rescue the expression levels of MMP-9?

      5) How does increased MMP-9 produce the synaptic and behavioral effects? What is the downstream target (a specific receptor?) that would produce the broad changes in synaptic and behavioral phenotypes? Or is this a rather non-specific effect of the extracellular matrix? Based on years of data on MMP-9 function, its impact on "structural plasticity" in general terms is not surprising, but further mechanistic details and specific targets would help move this field forward.

    1. Reviewer #3:

      Lang and colleagues used mouse models to address the impact of the light-dark cycle, and of myeloid conditional knockout of BMAL1 and CLOCK, on susceptibility to endotoxemia. As expected, the mortality rate increased in animals housed in constant darkness (DD). The mortality rate remained dependent on circadian time in DD mice and, more intriguingly, independent of myeloid BMAL1 and CLOCK, with persistent circadian cytokine expression but loss of circadian fluctuations in leukocyte counts. The study is mainly descriptive, without mechanistic explanation, which leaves the reader a bit frustrated.

      1) Please revise the result section and the legends (for example legends of Figures 3 and 5) to explicitly mention whether experiments with conditional knockouts were performed with LD or DD mice.

      2) Lines 15 and 80. Saying that DD mice show a "three-fold increased susceptibility to LPS" is true for very specific conditions only, and should not be used as a general statement.

      3) Line 99-. Please be more precise in describing cytokine levels (for example, in LD, TNF peaks at ZT10, IL-18 at ZT14 or ZT22 but not ZT18, and IL-10 but not IL-12 peaks at ZT14).

      4) Line 105-106. Referring to Figure 1E, it is not straightforward for the reader to understand what is meant by "free-running and entrained" conditions.

      5) Figures 2C and 3G. There is substantially decreased mortality in LysM-Cre+/+ versus WT mice. Any explanation?

      6) Figure 5 depicts a protocol with LD and DD mice. Yet, it seems that only DD mice were analyzed. Is that correct? LD mice should be analyzed in parallel as controls.

      7) Figure 5 and Sup Figure 5. There are huge differences in leukocyte counts between LysM-Cre+/+ and WT mice. Without being exhaustive, LysM-Cre+/+ mice display many more macrophages in bone marrow, spleen and lymph nodes, DCs in lymph nodes, and NK cells in spleen and lymph nodes at both CT8 and CT20. This is very puzzling and raises questions about the pertinence of these "control" mice. Additionally, one might expect from these observations that LysM-Cre+/+ mice are more sensitive to endotoxemia, which is not the case (point 5).

      8) Line 257. The effect of IL-18 is not totally surprising, since both detrimental and protective effects of the cytokine have been reported in the literature. This could be briefly mentioned.

      9) Sup Figure 5A. The gating strategy has to be shown for each organ, separately.

      10) Sup Figure 5D. The peritoneal cavity contains not only different macrophage populations with different inflammatory properties, but also different B cell populations, including anti-inflammatory B-1a cells (plus NK cells, DCs...). Considering that LPS is injected i.p., more thorough analyses of the peritoneal cavity should be performed to properly interpret the cytokine and mortality results.

      11) It is not clear whether endotoxemia was addressed with BMAL1 and CLOCK myeloid conditional knockout mice kept in LD. Since time-of-day-dependent differences in mortality were much smaller in DD mice (line 74), we probably expect only marginal differences in DD mice.

    2. Reviewer #2:

      Lang et al. investigate and document the role of myeloid-endogenous circadian cycling in the host response to, and progression of, endotoxemia in the mouse LPS model. As a principal finding, Lang et al. report that disruption of the cell-intrinsic myeloid circadian clock by myeloid-specific knockout of either CLOCK or BMAL1 does not prevent circadian patterns of morbidity and mortality in endotoxemic mice. As a consequence of these and other findings from endotoxemia experiments in mice kept in the dark, and the observation of circadian cytokine production in CLOCK KO animals, the authors conclude that myeloid responses critical to endotoxemia are not governed by their local cell-intrinsic clock. Moreover, they conclude that the source of the circadian timing and pacemaking critical for the host response to endotoxemia must lie outside the myeloid compartment. Finally, the authors also report a general (non-circadian) reduced susceptibility of mice devoid of myeloid CLOCK or BMAL1, which they take as proof that myeloid circadian cycling is important in the host response to endotoxemia, yet does not dictate the circadian pattern in mortality and cytokine responses.

      The paper is well conceived, the experiments are elegant and well carried out, the statistics are appropriate, and the ethics statements are in order. The conclusions of this study, as summarized above, are important and will be of much interest to readers from the circadian field and beyond, including sepsis and inflammation researchers. To me, there is one major flaw in the line of argument of this story: the study relies on the assumption that the systemic cytokine response provided by myeloid cells is paramount and central to the course and intensity of endotoxemia. While this is assumed by many, rigorous proof of this connection and its causality is still lacking (most evidence is correlative in nature). As a matter of fact, there is an increasing body of more recent experimental evidence that argues against a prominent role of myeloid cells in the cytokine storm. Overall, I would like to raise the following points and suggestions.

      Major Points:

      • As mentioned, a weakness of this paper is that it assumes that systemic cytokine levels as produced by myeloid cells are center stage in endotoxemic shock (e.g. see line 164). However, recent evidence has shown that over 90% of most systemically released cytokines in sepsis are produced by non-myeloid cells (as shown, e.g., by the use of humanized mice, which allows one to discriminate (human) cytokines produced by blood cells from (murine) cytokines produced by parenchyma; see e.g. PMID: 31297113). (Interestingly, there is one major exception to that rule, and that is TNFa.) Considering this, it is not surprising that circadian cytokine levels do not change in myeloid CLOCK/BMAL1 KO mice. Also, assuming that myeloid-produced cytokines are not critical drivers, the same applies to the observation that the circadian mortality pattern is preserved in those mice. I recommend that the authors more critically discuss this alternative explanation in the paper. In fact, this line of argument would be in line with the concept that the source of the circadian susceptibility/mortality in endotoxemia resides in a non-myeloid cell compartment, which is essentially the major finding of this manuscript.

      • Intro (lines 51-54): the authors describe a single scenario as the mechanism of sepsis-associated organ failure. This appears too one-sided and absolute to me; many other hypotheses and models exist. It would be good to mention that and/or tone down the wording.

      • Analogous to light/dark cycles, ambient temperature has been shown to have a strong impact on mortality from endotoxemia (e.g. PMID: 31016449). Did the authors keep their animals under thermostatically controlled ambient conditions? Please describe and discuss this in the text.

      • Fig. 2C: The large difference in mortality in the control LysM-Cre line looks somewhat worrying to me. Could this be a consequence of well-known Cre off-target activities? Did the authors check this, e.g. by sequencing myeloid cells or by using control mouse strains?

      • Line 320: Bmal1flox/flox (Bmal-flox) [48] or Clockflox/flox (Clock-flox) [38] mice were bred with LysM-Cre mice to target Bmal1 or Clock in myeloid cells. I suggest showing a prototypical genotyping result, perhaps as a supplemental figure.

      • Line 365: the authors state that mice that did not show signs of disease were sorted out. What proportion of mice (%) did not react to LPS? It would be useful to state this number in the methods section.

      • It is not fully clear to me whether males, females, or both were used for the principal experiments; please specify. If females were used, please describe how the estrous cycle was taken into account.

    3. Reviewer #1:

      This manuscript has novelty in its approach. The authors use an animal model that abolishes the circadian rhythm in mice to study the impact on susceptibility to LPS challenge. The experimental approach involves both wild-type mice subjected to a sudden stop of the light-dark (LD) cycle and mice knocked out for the Clock system (KO). I have some points of concern:

      • The investigators show that LPS becomes more lethal to mice shifted from LD to DD. If this is due to abolition of the circadian rhythm, similar lethality should appear upon challenge of the KO mice. The opposite was found. Please explain.

      • LPS is acting through TLR4 binding. Can the author provide evidence that TLR4 expression is down-regulated in transition from LD to DD? Does the same apply for the expression of SOCS3?

      • TLR4 is a receptor for alarmins with IL-1alpha being one of them. Can the authors comment, based on their IL-1alpha findings, if this may be part of the mechanism?

    1. Reviewer #2:

      In this paper, Fiscella and colleagues report the results of behavioral experiments on auditory perception in healthy participants. The paper is clearly written, and the stimulus manipulations are well thought out and executed.

      In the first experiment, audiovisual speech perception was examined in 15 participants. Participants identified keywords in English sentences while viewing faces that were either dynamic or still, and either upright or rotated. To make the task more difficult, two irrelevant masking streams (one audiobook with a male talker, one audiobook with a female talker) were added to the auditory speech at different signal-to-noise ratios for a total of three simultaneous speech streams.

      The results of the first experiment were that both the visual face and the auditory voice influenced accuracy. Seeing the moving face of the talker resulted in higher accuracy than a static face, while seeing an upright moving face was better than a 90-degree rotated face which was better than an inverted moving face. In the auditory domain, performance was better when the masking streams were less loud.

      In the second experiment, 23 participants identified pitch modulations in auditory speech. The task of the participants was considerably more complicated than in the first experiment. First, participants had to learn an association between visual faces and auditory voices. Then, on each trial, they were presented with a static face which cued them which auditory voice to attend to. Then, both target and distracter voices were presented, and participants searched for pitch modulations only in the target voice. At the same time, audiobook masking streams were presented, for a total of 4 simultaneous speech streams. In addition, participants were assigned a visual task, consisting of searching for a pink dot on the mouth of the visually-presented face. The visual face matched either the target voice or the distracter voice, and the face was either upright or inverted.

      The results of the second experiment were that participants were somewhat more accurate (7%) at identifying pitch modulations when the visual face matched the target voice than when it did not.

      As I understand it, the main claim of the manuscript is as follows: For sentence comprehension in Experiment 1, both face matching (measured as the contrast of dynamic face vs. static face) and face rotation were influential. For pitch modulation in Experiment 2, only face matching (measured as the contrast of target-stream vs. distracter-stream face) was influential. This claim is summarized in the abstract as "Although we replicated previous findings that temporal coherence induces binding, there was no evidence for a role of linguistic cues in binding. Our results suggest that temporal cues improve speech processing through binding and linguistic cues benefit listeners through late integration."

      The claim for Experiment 2 is that face rotation was not influential. However, the authors provide no evidence to support this assertion, other than visual inspection (page 15, line 235): "However, there was no difference in the benefit due to the target face between the upright and inverted condition, and therefore no benefit of the upright face (Figure 2C)."

      In fact, the data provided suggests that the opposite may be true, as the improvement for upright faces (t=6.6) was larger than the improvement for inverted faces (t=3.9). An appropriate analysis to test this assertion would be to construct a linear mixed-effects model with fixed factors of face inversion and face matching, and then examine the interaction between these factors.
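      As a sketch of this recommended analysis (using simulated, hypothetical data — all numbers below are placeholders, not the study's data), the interaction can be tested as a within-subject contrast, which in a balanced 2x2 within-subject design is closely related to the fixed-effect interaction of a mixed model with random participant intercepts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-participant accuracies in a 2x2 within-subject design:
# columns = (upright/match, upright/mismatch, inverted/match, inverted/mismatch)
n_subj = 23
acc = rng.normal(loc=[0.75, 0.68, 0.72, 0.68], scale=0.05, size=(n_subj, 4))

# Face-matching benefit separately for upright and inverted faces
match_benefit_upright = acc[:, 0] - acc[:, 1]
match_benefit_inverted = acc[:, 2] - acc[:, 3]

# Interaction contrast per participant: does the matching benefit shrink
# when the face is inverted?
interaction = match_benefit_upright - match_benefit_inverted

# One-sample t-test on the contrast = test of the inversion x matching
# interaction in this balanced within-subject design
t_stat, p_val = stats.ttest_1samp(interaction, 0.0)
```

      A full mixed-effects fit (e.g. lme4's accuracy ~ inversion * matching + (1|participant), or its statsmodels equivalent) would additionally yield coefficient magnitudes on a common scale.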

      However, even if this analysis were conducted and the interaction was non-significant, that would not necessarily be strong support for the claim. As the adage has it, "absence of evidence is not evidence of absence". The problem here is that the effect is rather small (7% for face matching). Finding significant differences of face inversion within the range of the 7% face-matching effect is difficult but would likely be possible given a larger sample size, assuming that the effect size found with the current sample size holds (t = 6.6 vs. t = 3.9).

      In contrast, in experiment 1, the range is very large (improvement from ~40% for the static face to ~90% for dynamic face) making it much easier to find a significant effect of inversion.

      One null model would be to assume that the proportional difference in accuracy due to inversion is similar for speech perception and pitch modulation (within the face matching effect) and predict the difference. In experiment 1, inverting the face at 0 dB reduced accuracy from ~90% to ~80%, a ~10% decrease. Applying this to the 7% range found in Experiment 2 would predict that inverted accuracy would be ~6.3% vs. 7%. The authors could perform a power calculation to determine the necessary sample size to detect an effect of this magnitude.
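      Such a power calculation could be sketched as follows (a minimal implementation assuming a paired-samples design; the effect size d = 0.8 is an arbitrary placeholder, to be replaced by the standardized version of the predicted ~0.7-point difference):

```python
import numpy as np
from scipy import stats

def paired_t_sample_size(d, alpha=0.05, power=0.80, n_max=10_000):
    """Smallest n for a two-sided paired (one-sample) t-test with
    standardized effect size d to reach the requested power."""
    for n in range(3, n_max):
        df = n - 1
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        nc = d * np.sqrt(n)  # noncentrality parameter under the alternative
        achieved = (1 - stats.nct.cdf(t_crit, df, nc)
                    + stats.nct.cdf(-t_crit, df, nc))
        if achieved >= power:
            return n
    raise ValueError("no n <= n_max reaches the requested power")

# Placeholder effect size; in practice convert the predicted accuracy
# difference and its variability into a standardized d first.
n_needed = paired_t_sample_size(d=0.8)
```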

      Other Comments

      When reporting the results of linear mixed-effects models or other regression models, it is important to report the magnitude of each effect, measured as the actual values of the model coefficients. This allows readers to understand the relative amplitude of different factors on a common scale. For experiment 1, the only values provided are measures of statistical significance, which are not good measures of effect size.

      The duration of the pitch modulations in Experiment 2 are not clear. It would help the reader to provide a supplemental figure showing the speech envelope of the 4 simultaneous speech streams and the location and duration of the pitch modulations in the target and distracter streams.

      If the pitch modulations were brief, it should be possible to calculate reaction time as an additional dependent measure. If the pitch modulations in the target and distracter streams occurred at different times, this would also allow more accurate categorization of the responses as correct or incorrect by creation of a response window. For instance, if a pitch modulation occurred in both streams and the participant responded "yes", then the timing of the pitch modulation and the response could dissociate a false-positive to the distractor stream pitch modulation from the target stream pitch modulation.

      It is not clear from the Methods, but it seems that the results shown are only for trials in which a single distracter was presented in the target stream. A standard analysis would be to use signal detection theory to examine response patterns across all of the different conditions.
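      For reference, the core of such a signal-detection analysis is a d' computation over hits and false alarms (a minimal sketch with hypothetical rates; the log-linear correction used here is one standard choice among several):

```python
from scipy.stats import norm

def dprime(hit_rate, fa_rate, n_trials=None):
    """d' = z(hit rate) - z(false-alarm rate). If n_trials is given,
    apply the log-linear correction so that rates of exactly 0 or 1
    do not produce infinite z-scores."""
    if n_trials is not None:
        hit_rate = (hit_rate * n_trials + 0.5) / (n_trials + 1)
        fa_rate = (fa_rate * n_trials + 0.5) / (n_trials + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# e.g. 84% hits to target-stream modulations vs. 16% false alarms to
# distracter-stream modulations (hypothetical numbers):
sensitivity = dprime(0.84, 0.16)
```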

      In selective attention experiments, the stimulus is usually identical between conditions while only the task instructions vary. The stimulus and task are both different between experiments 1 and 2, making it difficult to claim that "linguistic" vs. "temporal" is the only difference between the experiments.

      At a more conceptual level, it seems problematic to assume that inverting the face dissociates linguistic from temporal processing. For instance, a computer face-recognition algorithm whose only job was to measure the timing of mouth movements (temporal processing) might operate by first identifying the face using eye-nose-mouth in vertical order. Inverting the face would disrupt the algorithm and hence "temporal processing", invalidating the assumption that face inversion is a pure manipulation of "linguistic processing".

    2. Reviewer #1:

      Using two behavioral experiments, the authors partially replicate known effects that rotated faces decrease the benefit of visual speech on auditory speech processing.

      As reported by the authors, Experiment 1 suffers from a design flaw considering that a temporal drift occurred in the course of the experiment. This clearly invalidates the reliability of the results and this experiment should be properly calibrated and redone. There is otherwise well-known literature on the topic.

      Experiment 2 should be discussed in the context of divided attention tasks previously reported by researchers so as to better emphasize how and whether this is a novel observation.

      Additionally:

      -The question being addressed is narrowly construed and ill-defined: numerous authoritative statements in the introduction should reference existing work. For instance, seminal models of Bayesian perception (audiovisual speech processing especially) should be attributed to Dominic Massaro. Statements such as "studies fail to distinguish between binding and late integration" are surprising considering that the fields of multisensory integration and audiovisual speech processing have essentially and traditionally consisted in discussing these specific issues. To name a few researchers in the audiovisual speech domain: the work of Ruth Campbell, Ken Grant, and Jean-Luc Schwartz has largely contributed to refining debates on the implication of attentional resources in audiovisual speech processing using behavioral, neuropsychological, and neuroimaging methods. In light of additional statements of the kind "The importance of temporal coherence for binding has not previously been established for speech", I would highly recommend that the authors do a thorough literature search of their topic (below are some possible references as a start).

      -What the authors understand to be "linguistic cues" should be better defined. For instance, the inverted-face experiment aimed at dissociating whether visemic processing depends on face recognition (i.e. on holistic processing) or whether it depends on featural processing (and it does not, contrary to what the authors suggest, constitute a test of whether viseme recognition is a linguistic process per se).

      Some references:

      -Alsius, A., Möttönen, R., Sams, M. E., Soto-Faraco, S., & Tiippana, K. (2014). Effect of attentional load on audiovisual speech perception: evidence from ERPs. Frontiers in psychology, 5, 727.

      -Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Comput Biol, 5(7), e1000436.

      -Jordan, T. R., & Bevan, K. (1997). Seeing and hearing rotated faces: Influences of facial orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance, 23(2), 388.

      -Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. The Journal of the Acoustical Society of America, 108(3), 1197-1208.

      -Grant, K. W., Van Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (cross-modal) synchrony. Speech Communication, 44(1-4), 43-53.

      -Schwartz, J. L., Berthommier, F., & Savariaux, C. (2002). Audio-visual scene analysis: evidence for a "very-early" integration process in audio-visual speech perception. In Seventh International Conference on Spoken Language Processing.

      -Schwartz, J. L., Berthommier, F., & Savariaux, C. (2004). Seeing to hear better: evidence for early audio-visual interactions in speech identification. Cognition, 93(2), B69-B78.

      -Tiippana, K., Andersen, T. S., & Sams, M. (2004). Visual attention modulates audiovisual speech perception. European Journal of Cognitive Psychology, 16(3), 457-472.

      -van Wassenhove, V. (2013). Speech through ears and eyes: interfacing the senses with the supramodal brain. Frontiers in psychology, 4, 388.

      -Van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45(3), 598-607.

    3. Summary: Seeing a speaker's face enhances speech comprehension. This fascinating observation has nourished decades of research yet the behavioral and neural underpinnings of audiovisual speech integration remain to be elucidated.

      In this study, the authors suggest that speech accuracy is influenced by seeing the real face (moving and upright faces being better than static and rotated or inverted faces, respectively) and speech comprehension may benefit more from matching voices and faces. Both reviewers noted that the work presents no conceptual framing and that the manuscript needs to include a better review of the existing literature to situate the study. Several methodological and statistical concerns were also raised, the majority of which are detailed by Reviewer 2.

    1. Reviewer #2:

      This paper reports on a very interesting and potentially highly important finding - that so-called "sleep learning" does not improve relearning of the same material during wake, but instead paradoxically hinders it. The effect of stimulus presentation during sleep on re-learning was modulated by sleep physiology, namely the number of slow wave peaks that coincide with presentation of the second word in a word pair over repeated presentations. These findings are of theoretical significance for the field of sleep and memory consolidation, as well as of practical importance.

      Concerns and recommendations:

      1) The authors' results suggest that "sleep learning" leads to an impairment in subsequent wake learning. The authors suggest that this result is due to stimulus-driven interference in synaptic downscaling in hippocampal and language-related networks engaged in the learning of semantic associations, which then leads to saturation of the involved neurons and impairment of subsequent learning. Although at first the findings seem counter-intuitive, I find this explanation to be extremely interesting. Given this explanation, it would be interesting to look at the relationship between implicit learning (as measured on the size judgment task) and subsequent explicit wake-relearning. If this proposed mechanism is correct, then at the trial level one would expect that trials with better evidence of implicit learning (i.e. those that were judged "correctly" on the size judgment task) should show poorer explicit relearning and recall. This analysis would make an interesting addition to the paper, and could possibly strengthen the authors' interpretation.

      2) In some cases, a null result is reported and a claim is based on the null result (for example, the finding that wake-learning of new semantic associations in the incongruent condition was not diminished). Where relevant, it would be a good idea to report Bayes factors to quantify evidence for the null.
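      One lightweight option, if a full Bayesian reanalysis is impractical, is the BIC-based approximation of the Bayes factor computed directly from the existing t-statistics (a rough sketch only; a default JZS Bayes factor from packages such as BayesFactor or pingouin would be preferable):

```python
import math

def bf01_from_t(t, n):
    """BIC-based approximation of the Bayes factor in favour of the null
    for a one-sample / paired t-test (Wagenmakers, 2007):
    BF01 = sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2).
    Values > 1 favour the null, > 3 are conventionally 'positive' evidence."""
    return math.sqrt(n) * (1 + t ** 2 / (n - 1)) ** (-n / 2)

# Hypothetical example: a near-zero t with a sample of 23 participants
# yields a BF01 well above 1, i.e. positive evidence for the null.
bf = bf01_from_t(0.1, 23)
```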

      3) The authors report that they "further identified and excluded from all data analyses the two most consistently small-rated and the two most consistently large-rated foreign words in each word lists based on participants' ratings of these words in the baseline condition in the implicit memory test." Although I realize that the same approach was applied in their original 2019 paper, this decision point seems a bit arbitrary, particularly in the context of the current study where the focus is on explicit relearning and recall, rather than implicit size judgments. As a reader, I wonder whether the results hold when all words are included in the analysis.

      4) In the main analysis examining interactions between test run, condition (congruent/incongruent) and number of peak-associated stimulations during sleep (0-1 versus 3-4), baseline trials (i.e. new words that were not presented during sleep) are excluded. As such, the interactions shown in the main results figure (Figure D) are a bit misleading and confusing, as they appear to reflect comparisons relative to the baseline trials (rather than a direct comparison between congruent and incongruent trials, as was done in the analysis). It also looks like the data in the "new" condition is just replicated four times over the four panes of the figure. I recommend reconstructing the figure so that a direct visual comparison can be made between the number of peaks within the congruent and incongruent trials. This change would allow the figure to more accurately reflect the statistical analyses and results that are reported in the manuscript.

      5) In addition to the main analysis, the authors report that they also separately compared the conscious recall of congruent and incongruent pairs that were never or once vs. repeatedly associated with slow-wave peaks with the conscious recall in the baseline condition. Given that four separate analyses were carried out, some correction for multiple comparisons should be done. It is unclear whether this was done as it does not seem to be reported.
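      For four planned comparisons, a Holm-Bonferroni correction would suffice and is simple to apply (a self-contained sketch; the p-values below are placeholders, not values from the manuscript):

```python
def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1), enforcing monotonicity
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Four hypothetical comparisons (congruent/incongruent x never-or-once/repeated)
adjusted = holm_adjust([0.01, 0.04, 0.03, 0.02])
```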

    2. Reviewer #1:

      This work claims to show that learning of word associations during sleep can impair learning of similar material during wakefulness. The effect of sleep on learning depended on whether slow-wave sleep peaks were present during the presentation of that material during sleep. This is an interesting finding, but I have a lot of questions about the methods that temper my enthusiasm.

      1) The proposed mechanism doesn't make sense to me: "synaptic down-scaling of hippocampal and neocortical language-related neurons, which were then too saturated for further potentiation required for the wake-relearning of the same vocabulary". See also lines 105-122. What is 'synaptic down-scaling'? What are 'language related neurons'? How were they 'saturated'? What is 'deficient synaptic renormalization'? How can the authors be sure that there are 'neurons that generated the sleep- and ensuing wake-learning of ... semantic associations'? How can inferences about neuronal mechanisms (i.e. mechanisms within neurons) be drawn from what is a behavioural study?

      2) On line 54 the authors say "Here, we present additional data from a subset of participants of our previous study in whom we investigated how sleep-formed memories interact with wake-learning." It isn't clear what criteria were used to choose this 'subset of participants'. Were they chosen randomly? Why were only a subset chosen, anyway?

      3) The authors do not appear to have checked whether their nappers had explicit memory of the word pairs that had been presented. Why was this not checked, and couldn't explicit memory explain the implicit memory traces described in lines 66-70 (guessing would be above chance if the participants actually remembered the associations).

    3. Summary: This is an interesting topic, and these findings are potentially of theoretical significance for the field of sleep and memory consolidation, as well as potentially of practical importance. However, reviewers raised potential issues with the methods and interpretation. Specifically, reviewers were not confident that the paper reveals major new mechanistic insights that will make a major impact on a broad range of fields.

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on October 28 2020, follows.

      Summary

      This work by Berger et al examined the process of de novo infection by a model gammaherpesvirus, MHV68, using two complementary single-cell approaches - CyTOF and scRNA-seq. Using CyTOF and scRNA-seq, they characterize host and viral expression of protein and RNA during infection by the gammaherpesvirus MHV68. From CyTOF of numerous host proteins and one viral protein, they propose that the DNA damage marker pH2AX, along with the viral protein vRCA, is a more precise indicator of progressive infection than a standard LANA reporter. Using a single viral (ORF18) and host (Actin) RNA, they demonstrate that pH2AX+, vRCA+ cells uniformly express ORF18. To examine viral RNAs more closely, they performed scRNA-seq on infected cells and observe a high level of heterogeneity in viral gene expression.

      The manuscript is very well written and could potentially be a very welcome addition to the growing field of single-cell virology. However, some concerns were raised regarding some of the conclusions and the validation of the results. In particular, the variability in gene expression does not fall into existing models of kinetically regulated waves of viral transcription. This and their previous work convincingly argue that bulk measurements of protein and RNA are insufficient to represent the complexity of de novo MHV68 infection. However, in the absence of functional significance for the many clusters identified, the impact of the conclusions is limited. With regard to validation, the authors must also consider the inherent variability in scRNA-seq technology, which could complicate the accurate measurement of viral RNA. This should be discussed and addressed with additional data and/or experiments (see below).

      Essential Revisions

      1) The reviewers agreed that this article will be a very useful resource for the single cell virology community, but require further validation to realize that potential. As such, this article should be resubmitted as a "Tools and Resources" article. Furthermore, this revision should pay careful attention to the additional essential revisions that follow this point, in particular there are areas that require more data for validation. Ideally, existing data or experiments closely related to those conducted can be used.

      2) One of the more dramatic conclusions from the paper is that while the median infected cell expressed 52 viral genes, this ranges from 12 to 66, with only a handful of genes expressed uniformly. However, there are a number of indications that this may instead be explained by the stochastic failure to detect lowly expressed viral genes: 1) Figure 1A shows a tight distribution of the number of viral genes detected, which would be unlikely if there were multiple classes of infected cells expressing different subsets of viral genes. 2) Figure 1B shows a strong relationship between the average expression level and the frequency of detection, most easily explained by poor capture efficiency or another technical artifact resulting in undersampling. 3) These results fail to recapitulate known kinetic classes or uniform LANA expression. 4) Figure S3 indicates that even among host genes, the median cell had only ~1,000 genes detected, likely too small a fraction of expressed genes to reliably assess viral gene number. These inconsistencies make it difficult to assess whether the observed heterogeneity is a true reflection of the gene expression profiles during infection or a reflection of the inability to detect lowly expressed transcripts by scRNA-seq.

      Given the inherently "noisy" nature of scRNA-seq, it is usually hard to quantify how much of the mRNA expression variability among individual cells is due to technical limitations, and how much is due to biological differences. The authors could settle this question for at least a small number of genes by comparing the variability they see in scRNA-seq to that measured by PrimeFlow and CyTOF (although the latter has the added complication of comparing RNA to protein, it would still be valuable to discuss). If they compare the heterogeneity observed for given proteins in CyTOF with what they observe for the corresponding transcripts in scRNA-seq, they will both validate their finding and be able to estimate how much of the scRNA-seq variability translates to the protein level. They can do the same with their PrimeFlow data, which would be even more informative as both measure transcripts. These approaches would be ideal as the data should be readily available. Alternatively, some of the expression should be correlated by RT-qPCR or Northern blot or, if single-cell resolution is necessary, by in situ hybridization.

      The fact that the data do not pick up the established signatures of early vs. late gene expression goes against the bulk of work on viral gene expression control. More discussion about why this may be, including limitations of scRNAseq for less abundant transcripts is warranted.

      3) In Figure 3A, the authors observe and note both pH2AX+, vRCA- and pH2AX-, vRCA+ cell populations; based on ORF18 or Actin expression, a significant fraction of these cells are infected. The proportion of cells in each gate is not quantified, but it appears that these single-positive cells represent a significant fraction of the total infected cells. However, in Figure 1C there appear to be no major single-positive populations, and the authors note that vRCA and pH2AX levels are highly correlated. This suggests that these cells are missing from the CyTOF analysis (perhaps lying outside of the two gates presented in Figure S1A). These missing cells undercut the value of the dataset and analysis and may lead to incorrect interpretations of pH2AX's value as a marker. Addressing this discrepancy in the PrimeFlow/CyTOF data and some form of validation of the scRNA-seq (either by leveraging the protein data or via independent experiments) will be important for establishing the datasets as a reliable resource.

      Two related issues in the text: Line 217, "demonstrate that pH2AX+ and vRCA+ show progressive infection..": "progression" implies that the study occurs over different time points, but the time parameter is not measured in these studies. It is not clear to me whether these different phenotypes relate to different temporal stages of the infection or whether they are different terminal outcomes. The authors should use another term than "progressive" in this context. Line 423: the use of the word "progression" implies temporal studies, which were not performed in this work. The study is a snapshot of a single time point and "progression" is inferred.

      4) Phenotype variation may be due to variation in cell cycle stage, cell viability and age, and asynchronous infection. To what extent are these variables controlled or considered in the analysis?

      5) In Figure 3, the authors show that ~20% of mock-infected cells are negative for beta-actin RNA. This seems quite odd for a housekeeping gene, and the corresponding PrimeFlow data are not shown. I assume that this has to do with the authors' gating strategy, or some technical issue with PrimeFlow that prevents all RNA molecules from being labeled. In either case, it would be helpful if the authors clarified this point and included the data for the mock cells in the figure.

      6) Could the authors explain their rationale for including the CycKO mutant in the analysis and for combining the wt and KO data into one analysis? A priori, if the mutant has no effect on the current question (de novo infection of fibroblasts), I would suggest excluding it from the paper and only showing the wt data, or presenting the data for the mutant in a supplementary file, stating that similar results were obtained with it. Although the authors state that only five genes were differentially expressed between the wt and mutant, it seems wrong to aggregate the data from the different viruses into a single analysis.

      7) In Figure 6 and the accompanying text, the authors make a distinction between "virus-biased" and "host-biased" cells, based on the % of viral genes expressed in each cell. They go on to claim that "no significant difference in host gene expression among expressed genes" was found between these two groups. The statistical analysis for this result seems to be an ANOVA test, which I believe is not appropriate for this analysis. As the authors are comparing two distributions, something like a Kolmogorov-Smirnov test is needed. Additionally, in the text (line 314), the authors claim that no substantial difference is seen for cell-cycle genes between "virus-biased" and "host-biased" cells (Figure S6A). Looking at the data, it seems to me that G2 cells are highly enriched in the "host-biased" group. A formal quantitative analysis is needed to make this point.
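      A minimal sketch of the suggested two-sample Kolmogorov-Smirnov comparison (with simulated stand-in data, not the authors' values; here the two groups share a mean but differ in spread — exactly the kind of difference an ANOVA on means would miss):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical per-cell host-gene expression summaries for the two groups:
# same mean, different spread.
host_biased = rng.normal(loc=1.0, scale=0.5, size=200)
virus_biased = rng.normal(loc=1.0, scale=1.0, size=200)

# The two-sample KS test is sensitive to any difference between the two
# distributions (location, spread, or shape), not just the means.
ks_stat, ks_p = stats.ks_2samp(host_biased, virus_biased)
```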

      8) In line 316 the authors state that "host-biased cells expressed a number of interferon-response genes (Figure S6 and Table S3), suggesting a potential role in resistance to infection". I think this claim is not fully supported by the data. Since single-cell RNA-sequencing is a "zero-sum" technique, cells with a higher proportion of viral gene expression are bound to show fewer host genes (as the authors have shown in Figure 6), including ISGs. Showing that these cells are indeed expressing more ISGs than the "virus-biased" cells would require sorting the different populations, as well as mock-infected cells, and measuring ISGs (by methods such as qPCR, RNA-seq, PrimeFlow, WB, etc.), or at least an analysis that takes into account the increased drop-out of host genes in cells with high levels of viral genes (something like a permutation test?).

    1. Reviewer #3:

      This work provides a computational model to explain the change of grid cell firing field structure due to changes in environmental features. It starts from a framework in which self-motion information and information related to external sensory cues are integrated for position estimation. To implement this theoretical modeling framework, it examines grid cell firing as a position estimate, which is derived from place cell firing representing sensory inputs and noisy, self-motion inputs. Then, it adapts this model to explain experimental findings in which the environment partially changed. For example, the rescaling of an environment leads to a disruption of this estimation because the sensory cue and self-motion information misalign. Accordingly, the model describes mechanisms through which the grid cell position estimate is updated when self-motion and hippocampal sensory inputs misalign in this situation. The work also suggests that coordinated replay between hippocampal place cells and entorhinal grid cells provides a means to realign the sensory and self-motion cues for accurate position prediction. Probably the strongest achievement of this work is that it developed a biology-based Bayesian inference approach to optimally use both sensory and self-motion information for accurate position estimation. Accordingly, these findings could be useful in related machine learning fields.

      Major comment:

The work seems to provide a significant advance in computational neuroscience, with possible implications for machine learning using brain-derived principles. The major weakness, however, is that it is not written in a way that the majority of neuroscientists (who do not work in this immediate computational field) could benefit from. It often does not explain why/how it came to some conclusions or what those conclusions actually mean - for example, right in the introduction, "This process can also be viewed as an embedding of sensory experience within a low-dimensional manifold (in this case, 2D space), as observed of place cells during sleep". It also does not provide a sufficiently detailed qualitative explanation of the mathematical formulations or of what the model actually does under a given condition. So my recommendation would be to carefully rewrite the work to make it readable for a wider audience. I also fear that the work assumes significant a priori neuroscience knowledge, so people in machine learning fields would not benefit from this work in its current form either.

      It is not clear why place cell input was chosen as sensory input. Place cells also alter their firing with geometry, sensory and contextual changes. Although grid cells require place cell input, place cell firing represents more than just sensory inputs. In fact, they may be more sensitive to non-sensory behavioral, contextual changes than grid cells. Moreover, like grid cells, they are sensitive to self-motion inputs, e.g., speed-sensitivity and, at least in virtual environments, head-direction sensitivity. This point would deserve a detailed discussion.

    2. Reviewer #2:

This paper uses a clever application of the well-known Simultaneous Localization and Mapping (SLAM) model (+ replay) to the neuroscience of navigation. The authors capture aspects of the EC-HPC relationship that are often not captured within one paper/model. Here, online prediction error between the EC/HPC systems in the model triggers offline probabilistic inference, i.e. the fast propagation of traveling waves enabling neural message passing between place and grid cells representing non-local states. The authors thus model how such replay - i.e. fast propagation of offline traveling waves passing messages between EC/HPC - leads to inference, and explains the function of coordinated EC-HPC replay. I enjoyed reading the paper and the supplementary material.

      First, I'd like to say that I am impressed by this paper. Second, I see my job as a reviewer merely to give suggestions to help improve the accessibility and clarity of the present manuscript. This could help the reader appreciate a beautiful application of SLAM to HPC-EC interactions as well as the novelty of the present approach in bringing in a number of HPC-EC properties together in one model.

      1) The introduction is rather brief and lacks citations standard for this field. This is understandable as it may be due to earlier versions having been prepared for NeurIPS. It may be helpful if the authors added a bit more background to the introduction so readers can orient themselves and localize this paper in the larger map of the field. It would be especially helpful to repeat this process not only in the intro but throughout the text even if the authors have already cited papers elsewhere, since the authors are elegantly bringing together various different neuroscientific concepts and findings, such as replay, structures, offline traveling waves, propagation speed, shifter cell, etc. A bigger picture intro will help the reader be prepared for all the relevant pieces that are later gradually unfolded.

      It would be especially helpful to offer an overall summary of the main aspects of HPC-EC literature in relation to navigation that will later appear. This will frontload the larger, and in my opinion clever narrative, of the paper where replay, memory, and probabilistic models meet to capture aspects of the literature not previously addressed.

2) The SLAM (simultaneous localization and mapping) model is used broadly in mobile phones, robotics, automotive applications, and drones. The authors do not introduce SLAM to the reader, and SLAM (even in broad strokes) may not be familiar to potential readers. Even for neuroscientists who may be familiar with SLAM, it may not be clear from the paper which aspects of it are directly similar to existing models and which aspects are novel in terms of capturing HPC/EC findings. I would strongly encourage dedicating an entire section to SLAM, perhaps even with a simple figure or diagram of the broader algorithm. It would be especially helpful if the authors could clarify how their structure replay approach extends existing offline SLAM approaches. This would make the novel approaches in the present paper shine for both bio & ML audiences.

Providing this big picture will make it easier for the reader to connect aspects of SLAM that are known with the clever account of traveling waves and other HPC-EC interactions, which are largely overlooked in contemporary HPC-EC models of space and structures. It is perhaps also worth mentioning RatSLAM, which is another bio-inspired version of SLAM, and the place cell/hippocampus inspiration for SLAM.

      D Ball, S Heath, J Wiles, G Wyeth, P Corke, M Milford, "OpenRatSLAM: an open source brain-based SLAM system", in Autonomous Robots, 34 (3), 149-176, 2013

3) At first glance, it may appear that there are many moving parts in the paper. To the average neuroscience reader, this may be puzzling, or require going back and forth with some working memory overload to put the pieces together. My suggestion is to have a table of biological/neural functions and the equivalent components of the present model. This guide will allow the reader to see the big picture - and the value of the authors' hard work - in one glance, and to look at each section more closely with the bigger picture in mind. I believe this will only increase the clarity and accessibility of the manuscript.

4) The authors could perhaps spend a little more time comparing previous modeling attempts at capturing the HPC-EC phenomena, walking through various models and noting the caveats of previous models as well as the advantages and caveats of their own. This could be in the discussion, or earlier, but would help localize the reader in this space a bit better.

5) Perhaps the authors could briefly clarify where purely Euclidean vs. non-Euclidean representations would be expected of the model, and whether it can accommodate >2D maps, e.g. in bats or in nonspatial interactions of HPC-EC.

      6) The discussion could also be improved by synthesizing the old and the new, the significant contribution of this paper and modifications to SLAM, as well as a big picture summary of the various phenomena that come together in the HPC-EC interactions, e.g. via traveling waves.

    3. Reviewer #1:

In the present manuscript, Evans and Burgess present a computational model of the entorhinal-hippocampal network that enables self-localization by learning the correspondence between stimulus positions in the environment and an internal metric system generated by path integration. Their model is composed of two separate modules, observation and transition, which respectively inform about the relationship between environmental features and the internal metric system, and update the internal metric system between two consecutive positions. The observation module would correspond to the projection from hippocampal place cells (PCs) to entorhinal grid cells (GCs), while the transition module would simply update the GCs based on the animal's movement. The authors suggest that the system can achieve fast and reliable learning by combining online learning (during exploration) and offline learning (when the animal stops or rests). While online learning only updates the observation model, offline learning can update both modules. The authors then test their model on several environmental manipulations. Finally, they discuss how offline learning could correspond to spontaneous replay in the entorhinal-hippocampal network. While the work will certainly be of great interest to the community, the authors should improve the presentation of their manuscript and make sure they clearly define the key concepts of their study.

Online learning is clearly explained in the manuscript (e.g. l.101). Both the environment structure (PC-PC connections) and the observation model (PC->GC synapses) are learned online, and this leads to stable grid cells. Then, the authors suggest that prediction error between the observation and transition models triggers offline inference that can update both models simultaneously. However, it is hard to figure out what exactly offline learning is. The section "Offline inference: The hippocampus as a probabilistic graph" is quite impossible to follow. Before explicitly defining offline learning, the authors introduce a spring model of mutual connections between feature locations, but it is not clearly explained whether this network is optimized online or offline.

      The end of this section is particularly difficult to follow (line 180): "In this context, learning the PC-GC weights (modifying the observation model) during online localization corresponds to forming spatial priors over feature locations which anchor the structure, which would otherwise be translation or rotation invariant (since measurements are relative), learned during offline inference to constant locations on the grid-map.".

What really triggers offline inference is only explained much further in the manuscript, l. 366. Interestingly, this section refers to Fig. 1G for the first time, and should naturally be moved to the beginning of the manuscript (where Fig. 1 is described).

      Along the same lines, the role of offline learning should be made much more explicit in Fig. 2.

The frequent references to the methods section too often break the flow of the paper and make it difficult to follow. The authors should start their manuscript with a clear and simple definition of the core idea and concepts, almost in lay terms and introducing only a few notations, using Fig. 1 (perhaps with some modification and focusing especially on panels A and F) as a visual support, and move mathematical equations such as Eq. 3 to the supplementary information.

      The authors have tested their model on various manipulations that have been previously carried out in freely moving animals, such as change in visual gain and in environmental geometry. These sections are interesting but, again, would be much clearer if presented after a clear explanation of online and offline learning procedures, not in between.

Finally, the authors discuss the relationship between offline inference and neuronal replay, as observed experimentally in vivo (Figs 6&7). This is interesting but would perhaps benefit from some graphical explanation. In particular, the fundamental difference between message passing (Fig. 6A) and simple synaptic propagation of activity among connected PCs in CA3 is not immediately obvious. Fig. 7 is actually a nice illustration of the phenomenon and should perhaps be presented before Fig. 6.

4. Summary: In the present manuscript, the authors apply the well-known Simultaneous Localization and Mapping model (+ replay) to the neuroscience of navigation. Their model is composed of two separate modules, observation and transition. The former informs about the relationship between environmental features and the internal metric system, while the latter updates the internal metric system between two consecutive positions. The observation module would correspond to the projection from hippocampal place cells to entorhinal grid cells, while the transition module would simply update the grid cells based on the animal's movement. The authors suggest that the system can achieve fast and reliable learning by combining online learning (during exploration) and offline learning (when the animal stops or rests). In the model, online prediction error between the entorhinal cortex and the hippocampus triggers offline probabilistic inference, during which replay of place and grid cells represents non-local states. The authors thus suggest a function for the experimentally observed coordinated replay in the entorhinal-hippocampal network.

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on October 27 2020, follows.

      Summary

The referees agree that your work on the reconstitution of Drosophila septin hexamers into filaments on supported bilayers, their characterization, and comparison with the yeast counterparts is interesting and important. However, the referees raise a number of important points, all of which need to be addressed satisfactorily before publication.

      Essential Revisions

1) In all sets of experiments, different amounts of PS, PIP2 and septin concentrations are used. How can the obtained data be discussed in terms of these parameters if they are not really comparable? What is the rationale for using the different conditions? For example: mEGFP-tagged fly septins are crowded with methylcellulose on neutral PC SLBs. Septin concentrations of 100-500 nM were used (Figure 1). In Figure 2, however, 1000 nM septin is used without methylcellulose. AFM and QCM-D data were obtained with 20% PS, no PIP2, and 10 nM septin concentrations. These conditions do not resemble the conditions in the TIRF experiments (as written). In the TIRF experiments 1000 nM septin was used. Cryo-EM data were obtained for 6 mol% PIP2, no PS. In the discussion a model (or several models) is proposed, which appears to be highly speculative. The results are compared to those with yeast septins reported in the literature. As this comparison is the major point made in the manuscript, it would be important to perform at least one of the experiments with yeast septin for direct comparison.

2) The authors should create lipid bilayers on curved surfaces like glass rods to recapitulate the ability of the septins to create annulus structures as in vivo. Indeed, it seems that on vesicles the structure of septin filaments hardly differs from the monolayer case. Adding topographical and geometrical cues to septin assembly could bring new insights into how the filaments assemble into rings or sheets.

3) Examining septin hexamers containing only the short or only the long coiled-coils, or mixing two populations of septin hexamers (wt and ΔCC) to see whether the ΔCC hexamers are excluded from any filament stacks, would be highly recommended to support the final model.

4) Regarding the results in Fig. 2D: A simple calculation explains why you find dense septin packing on SLBs at 10 nM. Assuming a septin hexamer has an area of 4 × 24 nm² and the flow chamber has an area of 5 × 20 mm² and a height of 2 mm, you would need about 1 × 10¹² septin hexamers to cover the SLB. This number of septins in the volume of the flow chamber would correspond to a concentration of about 8.7 nM. It is not clear why the authors did not check lower septin hexamer concentrations, as this would simply require further dilution of the stock solution. These results seem to be also in conflict with the AFM results, where individual septin filaments are observed at 12 nM and 24 nM. The authors should clarify this difference.
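The reviewer's back-of-envelope numbers can be verified directly (using only the figures stated above; Avogadro's constant is the one external input):

```python
AVOGADRO = 6.022e23  # molecules per mole

# Septin hexamer footprint and chamber geometry from the comment above.
hexamer_area_nm2 = 4 * 24                    # 4 nm x 24 nm = 96 nm^2
slb_area_nm2 = (5 * 1e6) * (20 * 1e6)        # 5 mm x 20 mm chamber floor, in nm^2

# Hexamers needed to tile the SLB in a dense monolayer: ~1e12.
n_hexamers = slb_area_nm2 / hexamer_area_nm2

# Chamber volume: 5 x 20 x 2 mm^3 = 200 mm^3 = 200 uL = 2e-4 L.
chamber_volume_l = 5 * 20 * 2 * 1e-6

# Concentration (in nM) that puts exactly that many hexamers in the chamber.
conc_nm = n_hexamers / (AVOGADRO * chamber_volume_l) * 1e9

print(f"{n_hexamers:.2e} hexamers needed, equivalent to {conc_nm:.2f} nM")
```

The computation gives roughly 1.04 × 10¹² hexamers and about 8.6-8.7 nM, consistent with the reviewer's figures, so at 10 nM the chamber indeed contains barely enough protein for one dense layer.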

5) The comments about mechanical stability of the lipid-bound septins are unsurprising and not very conclusive, as they describe GTA-fixed septins. Studies of lateral stability don't have to relate directly to enhanced cortex stability. It would have been more powerful to compare the stability of septin-decorated GUVs. Sorry, but the discussion about septin layer height limitation is very speculative and would be much better founded if the authors had done some more experiments. The claim that the layer is self-limiting is not in line with the TIRF data, which show a steady increase of fluorescence intensity at 500 nM septin. It would have been good to add AFM and QCM data on samples with higher septin concentrations, e.g. 500 nM, to prove that the layer indeed remains within 12-21 nm. It would also be insightful to either test mixtures of ΔCC septins with full-length septins or to generate septins lacking only the long coiled-coils or only the short coiled-coils to support the conclusions of the authors.

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on October 30 2020, follows.

      Summary

      Using a mouse model of melanoma, this report demonstrates the relevance of the CD300a immunoreceptor, specifically in dendritic cells (DCs), in tumor growth. It shows that the absence of CD300a is correlated with a higher number of regulatory T cells (Tregs) within the tumor microenvironment and therefore the tumor grows faster and survival decreases. Based on additional experiments, the authors propose a mechanism by which tumor-derived extracellular vesicles (TEVs) interact with CD300a in DCs, decreasing IFNbeta production which subsequently reduces the number of Tregs. In addition, data from melanoma patients show a correlation between overall survival and higher levels of CD300a expression in the tumor.

      Essential Revisions

1) It is highly recommended to clearly demonstrate the role of IFNbeta in the proposed mechanism. In addition to using an anti-IFNbeta mAb in an in vitro culture (Figure 3D), other experiments must be performed, such as in vivo experiments with the anti-IFNbeta mAb. The authors have used this mAb in their previously published article (Nakahashi-Oda et al., Nature Immunology, 2016). Alternatively, in vivo experiments could also be performed with IFNAR1 (IFN alpha and beta receptor subunit 1) KO animals.

      In addition, is the observed increase in Tregs within the tumor in CD300a-/- animals due only to an increase in IFNbeta production by DCs? Are there other cytokines and/or cell-cell contact that may play a role? At least this should be discussed.

      2) Why are not all the experiments performed on CD300afl/fl Itgax-Cre mice instead of CD300a-/- mice? The experiments in Figures 2a, S2C, 3 and 4 should have been performed on CD300afl/fl Itgax-Cre mice. This is very important to state unequivocally that only CD300a in DCs is involved in the induction of an immune response capable of inhibiting tumor development.

3) The authors found expansion of tumor-infiltrating Tregs in mice deficient in CD300a. However, no increase in Tregs was observed in tumor-draining lymph nodes. Did the authors assess the expression of Treg activation and proliferation molecular markers, such as CD25, CTLA4, GITR, CD39, CD73 or Ki67? If Treg expansion as a result of CD300a deficiency is indeed the cause of enhanced tumor growth, the authors should provide more evidence of a Treg suppressive response. For example, the authors could consider measuring the levels of co-stimulatory molecules (e.g. CD40, CD80 and CD86) on dendritic cells, which generally correlate with Treg activity, and/or the tumoral IL-2 concentration.

4) PD-1 is the only marker analyzed to assess the exhausted status of CD8+ T cells infiltrating tumor lesions of CD300a-/- mice. Additional evidence of this functional status could be provided, such as expression of CTLA4, TIM3 or other immune checkpoints, or low Ki67 levels. Indeed, particularly with reference to the human setting, PD-1 is also a sign of T cell activation, usually expressed in T cells infiltrating highly immunogenic and hot tumors. Hence, it would be useful to have a broader characterization of the immune effectors associated with a progressing tumor microenvironment when CD300a is lost.

      5) Since authors have Foxp3-reporter mice, they should confirm their data in Fig. 3D with natural / freshly isolated Tregs, unless they are suggesting that CD300a mainly prevents in situ conversion of intra-tumoral CD4+Foxp3- Tconv cells into Tregs.

6) Given that the interaction between CD300a and phosphatidylserine (PS) is critical to CD300a activation, PS co-localization with CD300a ought to be included in the confocal microscopy. In addition, the binding of CD300a to PS and PE, which are both upregulated in dead cells, implies that apoptotic bodies could also be shuttling comparable signaling. Can the authors exclude that these particles are present in the EV preparations? Furthermore, does tumor supernatant lose any effect when depleted of EVs? The latter evidence could significantly strengthen the case for the exclusive involvement of exosomes in the process.

      7) Did authors validate the importance of PS in the context that they propose with an anti-PS blocking antibody? There are not many anti-PS blocking antibodies available and they might not block engagement with CD300a (see Nat Commun. 2016 Mar 14;7:10871). Nonetheless, this would be a good assay to demonstrate PS as the ligand that triggers CD300a to inhibit TLR3 and subsequent IFN-β production.

    1. Reviewer #2:

The authors present a survey of the bacterial community in the Cam River (Cambridgeshire, UK) using one of the latest DNA sequencing technologies (Oxford Nanopore) with a targeted sequencing approach. The work consisted of a test of the sequencing and analysis method, benchmarking several programs on mock data to decide which one was best suited for their analysis.

After selecting the best tool, they provide family-level taxonomic profiling of the microbial community along the Cam River over a 4-month time window. In addition to the general and local snapshots of the bacterial composition, they correlate some physicochemical parameters with the abundance shifts of some taxa.

      Finally, they report the presence of 55 potentially pathogenic bacterial genera that were further studied using a phylogenetic analysis.

      Comments:

      Page 6. There is a "data not shown" comment in the text:

      "Benchmarking of the classification tools on one aquatic sample further confirmed Minimap2's reliable performance in a complex bacterial community, although other tools such as SPINGO (Allard, Ryan, Jeffery, & Claesson, 2015), MAPseq (Matias Rodrigues, Schmidt, Tackmann, & von Mering, 2017), or IDTAXA (Murali et al., 2018) also produced highly concordant results despite variations in speed and memory usage (data not shown)."

Nowadays, there is no reason for not showing data. In case the speed and memory usage were not recorded, it is advisable to rerun the analysis and quantify these variables, rather than mentioning them without reporting them.

      Or what are the reasons for not showing the results?

Figure 2 is too dense and crowded. In the end, all panels are too tiny and the message they should deliver is lost. That also makes the legend very long. I would suggest moving some of the figure panels, maybe b), c) and d), to separate supp. figures.

Figure 3 has the same problem. I think there is too much information that could be moved to the supp. mat.

In addition to Figure 4, it would be important to test whether the differences over time in the family contributions to the PCA components are statistically significant. Panel B depicts the most evident difference in variance, but what about other taxa that might not be very abundant but still differ over time? You could use the fitFeatureModel function from the metagenomeSeq R library with an adjusted-P threshold of 0.05 to validate abundance differences in addition to your analysis.

      Page 12-13. In the paragraph:

      "Using multiple sequence alignments between nanopore reads and pathogenic species references, we further resolved the phylogenies of three common potentially pathogenic genera occurring in our river samples, Legionella, Salmonella and Pseudomonas (Figure 7a-c; Material and Methods). While Legionella and Salmonella diversities presented negligible levels of known harmful species, a cluster of reads in downstream sections indicated a low abundance of the opportunistic, environmental pathogen Pseudomonas aeruginosa (Figure 7c). We also found significant variations in relative abundances of the Leptospira genus, which was recently described to be enriched in wastewater effluents in Germany (Numberger et al., 2019) (Figure 7d)."

Here it is important to mention the relative abundance in the sample. Please discuss that the presence of DNA from pathogens in the sample has to be confirmed by other microbiology methodologies, to validate whether viable organisms are present. Finding pathogen DNA is definitely a big warning, but since it is characterized only at the genus level, further investigation using whole-metagenome shotgun sequencing or isolation would be important.

This phrase is used in the abstract, introduction and discussion, although not written exactly the same each time:

      "Using an inexpensive, easily adaptable and scalable framework based on nanopore sequencing..."

I wouldn't use the term "inexpensive" since it is relative. Also, it should be discussed that although the platform is technically convenient in some aspects compared to other sequencers, there are still protocol steps that need certain reagents and equipment similar or identical to those needed for other sequencing platforms. Common bottlenecks such as DNA extraction methods, sample preservation and the presence of inhibitory compounds should probably also be mentioned and stressed.

      Page 15: "This might help to establish this family as an indicator for bacterial community shifts along with water temperature fluctuations."

Temperature might not be the main factor for the shift. There could be other, unmeasured factors that contribute to it. Several parameters related to water quality (COD, organic matter, PO4, etc.) were not measured.

      "A number of experimental intricacies should be addressed towards future nanopore freshwater sequencing studies with our approach, mostly by scrutinising water DNA extraction yields, PCR biases and molar imbalances in barcode multiplexing (Figure 3a; Supplementary Figure 5)."

      Here you could elaborate more on the challenges like those mentioned in my previous comment.

    2. Reviewer #1:

      The authors present a workflow based on targeted Nanopore DNA sequencing, in which they amplify and sequence nearly full-length 16S rRNA genes, to analyze surface water samples from the Cam river in Cambridge. They first identify a taxonomic classification tool, out of twelve studied, that performs best with their data. They detect a core microbiome and temporal gradients in their samples and analyze the presence of potential pathogens, obtaining species level resolution and sewage signals. The manuscript is well written and contains sufficient information for others to carry out a similar analysis with a strategy that the authors claim will be more accessible to users around the world, and particularly useful for freshwater surveillance and tracing of potential pathogens.

The work is sufficiently well-documented and timely in its use of nanopore sequencing to profile environmental microbial communities. However, given that the authors claim to provide a simple, fast and optimized workflow, it would be good to mention how this workflow differs from, or provides faster and better analysis than, previous work using amplicon sequencing with a MinION sequencer.

      Many of the June samples failed to provide sufficient sequence information. Could the authors comment on why these samples failed? While some samples did indeed have low yields, this was not the case for all (supp table 2 and supp figure 5) and it could be interesting to know if they think additional water parameters or extraction conditions could have affected yields and subsequent sequencing depth.

      One of the advantages of nanopore sequencing is that you can obtain species-level information. It would therefore be helpful if the authors could include information on how many of their sequenced 16S amplicons provided species-level identification.

      While the overall analysis of microbial communities is well done, it is not entirely clear how the authors define their core microbiome. Are they reporting mainly the most abundant taxa (dominant core microbiome), and would this change if you look at a taxonomic rank below the family level? How does the core compare, for example, with other studies of this same river?
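As a point of reference for this question, one common operational definition of a core microbiome is prevalence-based: taxa detected above a minimal relative abundance in at least some fraction of samples. A minimal sketch (the thresholds and taxon names are illustrative, not the authors'):

```python
import numpy as np

def core_taxa(abundance, taxa, prevalence=0.9, min_rel_abund=0.001):
    """Prevalence-based core microbiome.

    abundance: samples x taxa matrix of relative abundances.
    A taxon is 'core' if it is detected above min_rel_abund in at least
    `prevalence` (a fraction) of the samples.
    """
    present = abundance >= min_rel_abund          # detection per sample/taxon
    keep = present.mean(axis=0) >= prevalence     # fraction of samples detected in
    return [t for t, k in zip(taxa, keep) if k]
```

With a prevalence threshold of 0.9 this keeps only taxa seen in nearly every sample; a dominance-based definition (most abundant taxa) would give a different core, which is exactly the ambiguity the reviewer is asking the authors to resolve.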

    3. Summary: The authors present a survey of the bacterial community in the Cam River in Cambridge, UK, using Nanopore DNA sequencing, one of the latest DNA sequencing technologies. They profile microbial communities along the river, correlate with physicochemical parameters and identify potential pathogens and sewage signals. The work provides standardized protocols and bioinformatics tools for analysis of bacteria in freshwater samples, with the aim of providing a low-cost and optimized workflow that can be applied for the monitoring of complex aquatic microbiomes.

    1. Reviewer #3:

      The authors combine minimal and detailed models of hippocampal theta rhythm generation to understand the underlying mechanisms at the cellular-network level. In their 3 steps approach, they extend previous minimal models, they compare these minimal models with more detailed models and they use a piece (segment) of the detailed model to compare it to the minimal models.

I have a number of methodological issues with the paper. First, both models should be validated against experimental evidence, given that the experimental results exist. The validation of a "minimal" model with data from another model is circumstantial and useful to link two models, but in no way is it a scientific validation, in my opinion. Second, the reduced model is simply taken as a piece of a large model. This is in no way a systematic reduction, which the authors should provide. In the absence of that, the two models are simply two different models. Third, it is not clear what aspects of the mechanisms cannot be investigated using the larger models and therefore require the reduced models, given that the models do not necessarily match. Fourth, the concept of a minimal model should be clearly explained. The authors used caricature (toy) models (2D quadratic models, aka Izhikevich models) combined with biophysically plausible descriptions of synapses. The model parameters in 2D quadratic models are not biophysical, as the authors acknowledge, but they can be related to biophysical parameters through the specific equations provided in Rotstein (JCNS, 2015) and Turquist & Rotstein (Encyclopedia of Computational Neuroscience, 2018). In fact, they can represent either h-currents or M-currents. I suggest the authors determine this from these references. In this framework, the dynamics would result from a combination of these currents and persistent sodium or fast (transient) sodium activation. Fifth, from the original paper (Ferguson et al., 2017), their minimal model has 500 PV and 10000 PYR cells (I couldn't find the number of PV cells in this paper, but I assumed they were as in the original paper). This is not what I would call a minimal model. It is minimal only in comparison with the more detailed model. While this is a matter of semantics, it should be clarified, since there are other minimal model approaches in the literature (e.g., Kopell group, Erdi group). Related to these models, it is typically assumed that the ratio of PYR to PV cells is 5/1. This is certainly not set in stone, but it seems to have been validated. Here it is 20/1. Is there any reason for that? Sixth, the networks are so big that it is very difficult to gain profound insight. What is it about the large networks and their contribution to the generation of theta activity that cannot be learned from "more minimal" networks?

      Because of these concerns, and given how the paper is developed, I believe the paper amounts to a comparison between two existing models that the authors constructed in the past, together with a parameter exploration of these models.

      I find the paper extremely difficult to read. The problem is not the narrative, but the organization of the results and the lack (or scarcity) of clear statements. I cannot easily extract the principles that emerge from the analysis. There are a large number of cases and data, but what do we get out of them? Perhaps creating "telling titles" for each section/subsection would help, where the main result serves as the title of the section/subsection. I also take issue with the acronyms: one has to keep track of numbers, cases, acronyms (N, B), etc., all of which gets in the way of understanding. I believe figures would help.

      Another confusing issue in the paper is the use of the concept of "building blocks". I am not opposed to the use of these words, on the contrary. But building blocks are typically associated with the model structure (e.g., currents in a neuron, neurons in a network). PIR, SFA and Rheo are a different type of building block, which I would call "functional building blocks". They are building blocks in the functional world of model behavior, but not in the world of modeling components. For example, PIR can be instantiated by different combinations of ionic currents receiving inhibitory inputs. Also, the definitions of the building blocks and how they are quantified should be clearly stated in a separate section or subsection.

      I disagree with the authors' statement in lines 214-216, related to Fig. 4. They claim that "From them, we can say that the PYR cell firing does not specifically occur because of their IPSCs, as spiking can occur before or just after its IPSCs." Figure 4 (top, left panel) suggests the opposite, but instead of a PIR mechanism it is a "building-up" of the "adaptation" current in the PYR cell. (By "adaptation" current I mean the current corresponding to the second variable in the model; if this variable were the gating variable of the h-current, it would be the same type of mechanism suggested in Rotstein et al. (2005) and in the models presented in Stark et al. (2013).) The mechanism operates as follows: the first PV spike (not shown in the figure) causes a rebound that is not strong enough to produce a PYR spike before a new PV spike occurs (the first in the figure); this second PV spike causes a stronger rebound (it is very clear in the figure), which is still not strong enough to produce a PYR spike before the next PV spike arrives; this third PV spike produces a still stronger rebound, which now causes a PYR spike. The fact that this PYR spike occurs before the PV spike does not support the authors' conclusion, but quite the opposite.

      The authors should check whether the mechanistic hypothesis I just described, which is consistent with Fig. 4 (top, left panel), is also consistent with the rest of the panels and, more generally, with their modeling results and the experimental data, and whether it is general and, if not, what the conditions are under which it holds. If my hypothesis is not borne out, then they should come up with an alternative hypothesis. The condition the authors state about the parameter "b" and PIR is not necessarily general; PIR and other phenomena are typically controlled by the combined effect of more than one parameter. As it stands, their basic assumption behind the PRC is not necessarily valid.

      The subsequent hypothesis (about PYR bursting) is called into question in view of the previous comments. The experimental data should be able to provide an answer.

      The authors should provide a more detailed explanation and justification for the presence of an inhibitory "bolus". What would the timescale be? Again, the data should provide evidence of this. In their discussion of the PRC, the authors essentially conclude what they hypothesize, but this conclusion is based on the "bolus" idea; its validity should be revisited.

      The discussion about degeneracy of the theta rhythm generation is interesting. However, because of the size and complexity of the models, this degeneracy is expected, and their minimal modeling approach does not shed any additional light. In addition, the authors do not discuss the intrinsic sources of degeneracy and how they interact with the network-level ones.

      The last two sections were difficult to follow and I found them anecdotal. I was expecting a deeper mechanistic analysis. However, I have to acknowledge that because of my difficulty in following the paper, I might have missed important issues.

      The discussion is extensive, exhaustive and interesting. But it is not clear how the paper results are integrated in this big picture, except for a number of generic statements.

      The proposal that the hippocampus has the circuitry to produce theta oscillations without the need for medial septum input has been made before by Gillies et al. (2003) and by the models in Rotstein et al. (2005) and Orban et al. (2005). But the idea in that work is not that the hippocampus (CA1) is a pacemaker, but rather what we now call a "resonator". To claim that the MS is simply an amplifier of an existing oscillator goes against the existing evidence.

    2. Reviewer #2:

      In this work Chatzikalymniou et al. use models of the hippocampus of different complexities to understand the emergence and robustness of intra-hippocampal theta rhythms. They use a segment of a highly detailed model as a bridge to leverage insights from a minimal model of spiking point neurons to the level of the full hippocampus. This is an interesting approach, as the minimal model is more amenable to analysis and to probing the parameter space, while the detailed model is potentially closer to experiment yet difficult and costly to explore.

      The study of network problems is very demanding, there are no good ways to address robustness of the realistic models and the parameter space makes brute force approaches impractical. The angle of attack proposed here is interesting. While this is surely not the only approach tenable, it is sensible, justified, and actually implemented. The amount of work which entered this project is clear. I essentially accept the proposed reasoning and the hypotheses put forward. The few remarks I have are rather minor, but I think they merit a response.

      1) l. 528-530 "This is particularly noticeable in Figure 9D where theta rhythms are present and can be seen to be due to the PYR cell population firing in bursts of theta frequency. Even more, we notice that the pattern of the input current to the PYR cells isn't theta-paced or periodic (see Figure 10Bi)."

      This is a loose statement. When you look at the raw LFP theta is also not apparent (e.g. Figure 9.Ei or Fi). What happens once you look at the spectrum of the activity shown in 10.Bi? Do you see theta or not?

      2) l. 562 "This implies that the different E-I balances in the segment model that allow LFP theta rhythms to emerge are not all consistent with the experimental data, and by extension, the biological system."

      This is speculative. We do not know how generic the results of Amilhon et al. are. They showed what you can find experimentally, not what you cannot find experimentally. I agree with the statement from l.581, though : "Thus, from the perspective of the experiments of Amilhon et al. (2015) theta rhythm generation via a case a type pathway seems more biologically realistic ..."

      3) There are several problems with access to code and data provided in the manuscript.

      l. 986, 1113 - osf.io does not give access
      l. 1027 - bitbucket of bezaire does not allow access
      l. 1030 - simtracker link is down
      l. 1129, 1141 - the github link does not exist (private repo?)

      4) l. 1017 - Afferent inputs from CA3 and EC are also included in the form of Poisson-distributed spiking units from artificial CA3 and EC cells.

      Not obvious if Poisson is adequate here - did you check on the statistics of inputs? Any references? Different input statistics may induce specific correlations which might affect the size of fluctuations of the input current. I do not think this would be a significant effect here unless the departure from Poisson is highly significant. Any comments might be useful.

      5) l. 909 - "Euler integration method is used to integrate the cell equations with a timestep of 0.1 msec."

      This seems dangerous. Is the computation so costly that more advanced integration is not viable?
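      To illustrate the concern, here is a quick sketch comparing forward Euler with classical RK4 at the same step size on a toy linear ODE with a known solution (this is not the paper's cell equations, just a stand-in to show the accuracy gap at a fixed dt):

```python
import math

# Sketch: forward Euler vs. classical RK4 on dy/dt = -y (exact solution e^-t).
# A toy ODE, not the paper's cell equations; it only illustrates the
# accuracy gap at the same step size.
def euler(f, y, t_end, dt):
    t = 0.0
    while t < t_end - 1e-12:
        y += dt * f(y)
        t += dt
    return y

def rk4(f, y, t_end, dt):
    t = 0.0
    while t < t_end - 1e-12:
        k1 = f(y)
        k2 = f(y + 0.5 * dt * k1)
        k3 = f(y + 0.5 * dt * k2)
        k4 = f(y + dt * k3)
        y += (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y

f = lambda y: -y
exact = math.exp(-1.0)
err_euler = abs(euler(f, 1.0, 1.0, 0.1) - exact)
err_rk4 = abs(rk4(f, 1.0, 1.0, 0.1) - exact)
print(err_euler, err_rk4)   # Euler error ~2e-2, RK4 error ~3e-7
```

      At the same dt = 0.1, the Euler error is several orders of magnitude larger; whether that matters for the paper's conclusions depends on the stiffness of the cell equations, which is exactly what should be checked.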

    3. Reviewer #1:

      This study takes two existing models of hippocampal theta rhythm generation, a reduced one with two populations of Izhikevich neurons, and a detailed one with numerous biophysically detailed neuronal models. The authors do some parameter variation on 3 parameters in the reduced model and ask which are sensitive control parameters. They then examine control of theta frequency through a phase response curve and propose an inhibition-based tuning mechanism. They then map between the reduced and detailed model, and find that connectivity but not synaptic weights are consistent. They take a subset of the detailed model and do a 2 parameter exploration of rhythm generation. They compare phenomenological outcomes of the model with results from an optogenetic experiment to support their interpretation of an inhibition-based tuning mechanism for intrinsic generation of theta rhythm in the hippocampus.

      General comments:

      1) The paper shows the existence of potential rhythm mechanisms, but the approach is illustrative rather than definitive. For example, in a very lengthy section on parameter exploration in the reduced model, the authors find some domains which do and don't exhibit rhythms. Lacking further exploration or analytic results, it is hard to see if their interpretations are conclusive.

      2) The authors present too much detail on too few dimensions of parameters. An exhaustive parameter search would normally go systematically through all parameters, and be digested in an automated manner. For reporting this, a condensed summary would be presented. Here the authors look at 3 parameters for the reduced model and 2 parameters in the detailed one - far fewer than the available parameter set. They discuss the properties of these parameter choices at length, but then pick out a couple of illustrative points in the parameter domain for further pursuit. This leaves the reader rather overwhelmed on the one hand, and is not a convincing thorough exploration of all parameters of the system on the other.

      3) I wonder if the 'minimal' model is minimal enough. Clearly it is well-supplied with free parameters. Is there a simpler mapping to rate models or even dynamical systems that might provide more complete insights, albeit at the risk of further abstraction?

      4) Around line 560 and Fig 12 the authors conclude that only case a) is consistent with experiment. While it is important to match data to experiment, here the match is phenomenological. It misses the opportunity to do a quantitative match which could be done by taking advantage of the biological detail in the model.

      5) The paper is far too long and is a difficult read. Many items of discussion are interspersed in the results, for example around line 335 among many others.

    1. Reviewer #3:

      The study by Jackson et al. characterizes the progression of the degeneration of axons and dendrites, including metrics on the density and dynamics of dendritic spines and terminaux boutons (TBs), in the rTg4510 transgenic mouse model. The authors describe a decrease in the density of both structures, spines and TBs, as well as degeneration of neurites. Repression of the expression of the mutated version of tau partially mitigated some of the negative effects observed in the non-repressed condition. When degeneration of a neuronal process was observed, the loss of a dendritic branch was preceded by a sharp increase in the loss of dendritic spines, while axonal loss was preceded by a long-lasting and progressive loss of TBs. While the findings are interesting, there are several concerns that dampen enthusiasm for the study:

      1) The data obtained with the rTg4510 mouse model must be very carefully interpreted given that the disruption of the endogenous gene Fgf14 that occurs in this mouse model contributes significantly to the neurodegenerative phenotype (Gamache et al., 2019). While the authors acknowledge the possibility that genetic factors other than tau hyperphosphorylation may contribute to the rTg4510 pathology, the results must be put into the perspective of the mouse model rather than into the perspective of the tauopathy exclusively. In this sense, it would be recommended that the caveats of the mouse model be included in the introduction.

      2) The authors mention neither the sex of the animals used in the study nor how many mice of each sex were included in each experimental group. This is an important matter because the rTg4510 mouse model has been reported to show sex differences in the degree of tau accumulation (Yue et al., 2009; Song et al., 2015).

      3) A big concern is the identity of the neurons labeled. The strategy to label cells is very unspecific and no details are given on their identity. Different subtypes of pyramidal neurons with different densities of dendritic spines and axon boutons may be mixed up in different proportions in each group and batch. In fact, the resilience of different neuron subtypes to the pathology may be different too. If the authors cannot pinpoint the identity of the neuron imaged, an elaboration on this issue must be included in the manuscript. In addition, the manuscript must include representative images of the cortex of both genotypes showing the labeling pattern obtained with their approach. It is recommended to the authors to add more information about the vector.

      4) How did the authors estimate the point of divergence between genotypes? The authors mention 30-35 wk and 50 wk as points of divergence - which should be interpreted as the first time points at which the groups differ significantly - in lines 180-183. While the Wald test and the Akaike information criterion indicate that genotype is the factor with the most influence on the model estimates, they do not compute statistical differences between genotypes at a given time point. Regarding the GAMMs, some fits suggest that data at earlier points may be very different between groups (i.e., Fig 2E, 5C, 6C). Is the decrease in density of TBs over time in WT mice significant? How do the authors interpret those fits?

      5) Looking at the data in Figures 1E and 2E, one would expect more negative growth values in Figures 5E and 6E, indicating a larger decrease in density; instead, they are flat. Are these analyses well powered? Are the data in Figures 5E and 6E not representative?

    2. Reviewer #2:

      This manuscript asked the question of how axons vs dendrites are lost by the live-imaging cortex of rTg4510 tau transgenic mice. Overall, this manuscript is well-done and well-written, and confirms previous findings. However, there are a number of key controls missing from the experimental data (please see below). Statistical analyses are satisfactory (with some caveats, please see below).

      Figures 1+2 replicate previous findings in rTg4510 (Crimins et al., 2012; Jackson et al., 2017; Kopeikina et al., 2013), as do Figures 3+4 (Ramsden et al., 2005; SantaCruz et al., 2005; Spires et al., 2006; Crimins et al., 2012; Kopeikina et al., 2013; Helboe et al., 2017; Jackson et al., 2017). The novelty here is the differing patterns of bouton and spine turnover shortly before axons and dendrites, respectively, are lost, a finding uniquely enabled by 2-photon imaging. Thus, the findings in Figs. 5/6 should be highlighted and solidified. Further, the manuscript lacks mechanistic insight.

      It is not clear how the authors ensure that the perceived loss of spines/boutons/dendrites/axons is not due to bleaching or loss of the GFP signal. Please validate loss of spines/boutons and actual synapses using fixed tissue imaging or electron microscopy on a separate cohort of mice.

      Did the authors control for gliosis after the repeated imaging (performed very shortly after viral injection and cranial window implantation at the same site)? Could it be that the repeated imaging itself, on damaged tissue, induces blebbing of the already more vulnerable spines in the tau mice? Iba1 and GFAP staining with and without doxycycline administration should be included in the supplement, along with quantification of the stained area. Transgenic mice without manipulation (viral injection/cranial window/2P imaging) should also serve as a control to ensure no gliosis is observed.

      rTg4510 transgene insertion: Gamache et al. recently showed that the integration sites of both the CaMKIIα-tTA and MAPT-P301L transgenes impact the expression of endogenous mouse genes. The disruption of the Fgf14 gene in particular contributes to the pathological phenotype of these mice, making it difficult to directly ascribe the phenotypes seen in the manuscript to MAPT-P301L transgene overexpression. Although this limitation is acknowledged in the discussion, the T2 mice employed in this paper (Gamache et al., 2019) would be suitable controls to better evaluate the contribution of tauP301L alone on the neuropathology and disease progression observed in the authors' experiments, at least in fixed synapse imaging.

    3. Reviewer #1:

      Studies in mouse models and humans show synapse loss and dysfunction that precede neurodegeneration, raising questions about timing and mechanisms. Using longitudinal in vivo 2-photon imaging, Jackson et al. investigate pre- and post-synaptic changes in rTg4510 mice, a widely used mouse model of tauopathy. Consistent with cross-sectional studies, the authors observed a reduction in the density of presynaptic axons and dendritic spines in layer 1 cortex that relates to degeneration of neurites and dendrites over time. Taking advantage of an inducible model to overexpress tau P301L, they show that reducing expression of tau with DOX early in disease progression ameliorated synapse loss, also consistent with other studies. Interestingly, the authors observed a significant reduction of dendritic spines less than a week before dendrite degeneration. In contrast, they observed plasticity and turnover of presynaptic structures weeks before axonal degeneration, suggesting different mechanisms.

      Overall the results are interesting and largely consistent with previous findings. The new findings shown in Figures 5 and 6 address the timing of pre and postsynaptic loss and structural plasticity and reveal interesting differences; however, the data are highly variable and there are several issues that diminish enthusiasm as outlined below. Moreover, this study does not include new biological or mechanistic insight into the differences in pre- and post-synaptic changes from previous work in the field.

      The main weakness relates to the significance and relevance beyond this specific mouse model and brain region. I appreciate the strengths but also technical challenges of in vivo longitudinal imaging, including a small field of view. Thus, the rationale and choice of model and brain region, and validation of key findings is critical to support conclusions. In this case, the tau model, although used by others, has several caveats relevant to the investigation of synapse loss (see point 2 below) that weaken this study and its impact.

      1) Most of the work on synapse loss and dysfunction in this model (and in other tau and amyloid models) has been carried out in the hippocampus and other regions of cortex. Here the authors focused on layer 1 of (somatosensory) cortex and followed neurites of pyramidal cells labeled with AAV:GFP, an approach that does not enable one to image and track axons and dendrites from large numbers of neurons. They observed divergent dynamics in spines and presynaptic TBs of individual dendrites and axons. Given the small number of neurons sampled and the significant noise in the imaging data, these findings need more validation using other approaches. This is particularly important for the data and conclusions drawn from Figures 5 and 6 (see point 3).

      To estimate the overall effect of genotype, the authors fitted Generalized Additive Mixed Models (GAMMs) to their data, given the variability within animals and genotypes. It would be helpful to less familiar readers to provide more comparisons of the data using additional statistical tests and analyses, along with power calculations.
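      As a sketch of the kind of power calculation being requested, the standard two-sample normal approximation gives a per-group sample size for a target effect size; the effect size of 0.8 below is an arbitrary placeholder (Cohen's "large"), not an estimate from the authors' data:

```python
import math
from statistics import NormalDist

# Normal-approximation sample size per group for a two-sample t-test.
# effect_size d = 0.8 is a placeholder ("large" by Cohen's convention),
# not an estimate derived from the paper under review.
def n_per_group(effect_size, alpha=0.05, power=0.8):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group(0.8))   # ~25 animals (or axons) per group
```

      The real calculation should of course use an effect size estimated from pilot or published data, and a mixed-model-aware method if observations within animals are correlated.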

      2) Major caveat with the inducible tau model rTg4510. While this inducible model has the advantage of controlling the timing of tau overexpression in neurons, a recent study by Gamache et al. (PMID: 31685653) demonstrated that there are issues with the transgene insertion site and that factors other than tau expression are actually driving the phenotype. Thus, differences in synaptic and behavioral phenotypes depend on the mouse line used, and this needs to be carefully controlled. This was not addressed or discussed. See https://pubmed.ncbi.nlm.nih.gov/31171783/ and https://pubmed.ncbi.nlm.nih.gov/30659012/

      3) The interesting new findings presented in Figures 5 and 6, which address the timing of and differences in axonal and dendritic/spine plasticity and loss, need to be validated with more neurons and animals. The sample size is small (i.e., n = 18 axons from 7 animals, and it is not clear how many neurons). Given the significant variability of the data even within animals, these experiments and data are considered preliminary.

      4) How does anesthesia influence these changes in structural plasticity observed? This was not addressed or discussed.

    1. Reviewer #2:

      The paper titled "Brain Network Reconfiguration for Narrative and Argumentative Thought" sought to uncover the common neural processing sequences (time-locked activations and deactivations; inter-subject correlations and inter-subject functional connectivity) underlying narrative and argumentative thought. In particular, the study aimed to provide evidence that would help adjudicate between two current theories: the Content-Dependent Hypothesis (narrative ≠ argumentative) and the Content-Independent Hypothesis (narrative = argumentative). To assess these possibilities, they tested participants in an fMRI scanner as they listened to validated narrative and argumentative texts. Each text condition was directly compared to resting state and to scrambled versions of the texts. Across a range of interesting analyses that focus on how each participant's brain synchronized with other participants' brains throughout the same narrative and argumentative texts, they primarily found support for the content-dependent hypothesis, with a few differences and commonalities across text conditions. Relative to the scrambled conditions, listening to narrative texts was more associated with default mode activity across participants, while listening to argumentative texts activated only a common network of superior fronto-parietal control regions and language regions. Argumentative texts did not differ much from scrambled versions of the same text. These patterns reveal themselves in both the ISC and ISFC data. Overall, I feel this paper is really well written and takes a novel approach to distinguishing the neural processes underlying similar but different types of thought. At times, though, the manuscript loses touch with its primary brain coordination metrics (ISC and ISFC), describing the findings more like a GLM or functional connectivity study.

      Comments:

      Introduction:

      1) The introduction is very clearly written and uses a wonderful variety of sentence structure. Well done!

      2) While the writing is beautiful, a few sentences are harder to comprehend than others. For example, the use of "outstands" in line 36 is a bit difficult to parse on first read. Consider simplifying the language somewhat.

      3) There seems to be an opportunity to discuss this work and its findings in a broad context of narrative or argumentative self-generated internal thought (not based on listening to texts). For instance, I think there could be a few sentences tying this work to studies of autobiographical memory retrieval or mind wandering (for argumentation perhaps studies of the cognitive and neural processes behind complex decision making). This is captured to some extent in the introduction and discussion, but I think it could go further with citations beyond those just associated with listening to various types of text.

      4) Appreciate the thorough discussion of hypotheses and background.

      5) It is not necessary, but it might be interesting to show some basic functional connectivity analyses of the individual participant activations in supplemental analyses (no ISC or ISFC).

      Methods:

      1) Please clarify how the ISFC analysis can be directional in any way? Does unidirectional mean that you're just taking one value for each pairwise connection Cij?
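      For reference, one common leave-one-out formulation of ISFC (Simony et al., 2016) makes the directionality question concrete: C_ij correlates region i in one subject with region j averaged over the remaining subjects, which is asymmetric in general and is often symmetrized to a single value per pair. The sketch below is a guess at what such a pipeline might look like, not the authors' actual code:

```python
import numpy as np

# Sketch of leave-one-out ISFC (after Simony et al., 2016). This is a guess at
# the pipeline, not the authors' actual code. data: subjects x regions x time.
def isfc(data):
    n_sub, n_reg, _ = data.shape
    C = np.zeros((n_reg, n_reg))
    for s in range(n_sub):
        others = np.delete(data, s, axis=0).mean(axis=0)  # leave-one-out mean
        # corrcoef of the stacked (2*n_reg, time) array: block [i, n_reg+j]
        # is corr(subject-s region i, others' region j) -- asymmetric.
        full = np.corrcoef(np.vstack([data[s], others]))
        C += full[:n_reg, n_reg:]
    C /= n_sub
    # One value per pair ("unidirectional"?): average C_ij and C_ji.
    return (C + C.T) / 2

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 4, 200))   # 5 subjects, 4 regions, 200 TRs
C = isfc(data)
print(C.shape)   # (4, 4), symmetric
```

      If the authors instead keep C_ij and C_ji separate, that would be genuinely directional in the sense asked about, and it would help to say so explicitly.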

      Results:

      1) To what extent is there a concern that participants would still try to stitch together the scrambled narratives even if they are less coherent? Was this even possible given the nature of the stimuli?

      2) In line 125 and throughout, the authors should consistently remind the reader that 'engagement' in this case means that there were consistent and correlated increases in the BOLD response across participants. This differs in some ways from task engagement in event-related GLM studies.

      3) The language throughout should reflect consistent involvement across participants at particular time points in each of the narratives vs the argumentative.

      4) It seems like argumentative is more similar to the scrambled in many ways. Might it be that argumentative texts are just less coherent and structured than narrative texts?

      5) It seems clear that the neural processing of argumentative texts (64 distinct edges) were very different from the narrative texts (2348 distinct edges), but that the current contrasts did not clearly and consistently distinguish argumentative thought from the scrambled argument conditions. A discussion of the analyses that might be necessary to better elucidate the dynamics of processing for argumentative thought would be helpful.

      Discussion:

      1) Were there any neural differences between the narrative vs argument scrambled-texts? This might reveal any differences in the processing of the scrambled texts for each condition and might help shine light on features of the scrambled argument condition that contributed to the overall lack of distinction relative to the narrative vs scrambled narrative conditions.

      2) Throughout the results from ISC and ISFC findings are convolved with the findings from univariate or GLM results from prior studies. Please compare and contrast how ISC and ISFC findings might relate to univariate or GLM findings early in the discussion.

      3) Related to point 2 in the introduction, please also cite studies from autobiographical memory retrieval studies that also show the frontoparietal control system working as information is iteratively accumulated and updated over long temporal windows (St. Jacques et al., 2011; Inman et al., 2018; Daselaar et al., 2008).

      4) Please reconsider how the ISC findings are discussed as 'activation'. While the BOLD activity of these areas is certainly coordinated across participants at similar points in the text, I feel the term activation fits best with studies that convolve the brain activity with an HRF. In particular, from what I understand of ISC, a common decrease in BOLD activity across participants at the same time in a read text would also lead to 'activation' of that area in an ISC analysis, which seems counterintuitive. The 2nd paragraph of the discussion describes ISC and ISFC well in terms of what they show across a sample (synchronization of fluctuations in BOLD activity across participants for the same stimuli). "Activity" may capture this, but please consider some more nuanced ways to refer to these ISC and ISFC findings.

      Figures:

      1) Please double-check the box plots in Figure 1a for Scene Construction. Another method of displaying this Likert rating data might be helpful. While I appreciate the attempt to display the individual data points, the main points get somewhat obscured by all of the information in the graph.

      2) Overall, I appreciate the attention to detail in all of the figures and the completeness of the data visualization with several useful supplemental figures.

    2. Reviewer #1:

      Xu and colleagues compared the intersubject correlation (ISC) and intersubject functional connectivity (ISFC) of participants listening to narrative and argumentative texts while undergoing fMRI. Replicating earlier findings, they show that ISC in the DMN was greater when participants listened to an intact narrative than when they listened to a sentence-scrambled version of the same narrative. Listening to a sentence-scrambled argument elicited ISC in language and control regions of the brain, though interestingly, there was no region in the brain where ISC was greater when participants listened to an intact version of the argument. Instead, there was greater ISFC between the IPS and language areas of the brain when participants listened to the intact argument than when they listened to the scrambled argument. The authors interpret their results as suggesting that listening to the intact argument did not recruit additional brain systems, but instead promoted the cooperation between regions that were already involved in processing the argument.

      Most prior work using "naturalistic stimuli" has examined the neural responses to narratives. This manuscript extends this work in an important way by examining how the brain responds to arguments, which comprise a non-trivial proportion of the linguistic content people are exposed to on a daily basis. The ISFC results (Fig. 7) are particularly noteworthy and novel. My main concerns have to do with the possibility that ISC for the scrambled argument seems to be stronger and more extensive than that for the intact argument, and how this might affect the authors' interpretation of their results. Below are some suggestions and comments which I think the paper could benefit from considering further:

      1) I think it would be helpful to run the Scrambled Argument > Intact Argument ISC contrast. Visual inspection of Figure 2 suggests that ISC for the scrambled argument might be stronger than that for the intact argument, especially in control regions. If this is truly the case, I think the authors should discuss what this might imply about what is happening during the scrambled condition and if this affects thinking of the scrambled condition as a control for low-level linguistic features. In particular, the 2.97 out of 5 comprehensibility rating of the scrambled arguments suggests that participants might have understood the scrambled arguments. If participants are actively trying to make sense of the scrambled argument text, it seems like this could then drive observed differences in ISFC between the intact and scrambled arguments as well (e.g., decreased connectivity between control and language regions when trying to make sense of scrambled text, rather than increased connectivity between control and language regions when processing an intact argument).

      2) More broadly, I think the authors need to make sure their effects aren't driven by the scrambled conditions. For example, for Figure 2 - figure supplement 2, the (Intact Narrative - Scrambled Narrative) > (Intact Argument - Scrambled Argument) contrast can be driven by high ISC in the Scrambled Argument condition, which would suggest a different interpretation of the results. My suggestion would be to run the contrast as (Intact Narrative - Scrambled Narrative) > max((Intact Argument - Scrambled Argument),0) to make sure that the contrast isn't driven by a negative value on the right hand side of the inequality.
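
      The clipped contrast suggested above can be sketched directly; the per-vertex ISC values here are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 1000  # hypothetical vertex count

# Hypothetical per-vertex ISC maps for the four conditions
isc = {cond: rng.normal(mu, 0.1, n_vertices)
       for cond, mu in [("intact_narr", 0.30), ("scram_narr", 0.10),
                        ("intact_arg", 0.15), ("scram_arg", 0.20)]}

narrative_diff = isc["intact_narr"] - isc["scram_narr"]
# Clip the argument difference at zero so that vertices where the
# scrambled argument yields higher ISC cannot inflate the interaction
argument_diff = np.maximum(isc["intact_arg"] - isc["scram_arg"], 0.0)
interaction = narrative_diff - argument_diff
```

      By construction, the interaction at any vertex can never exceed the narrative difference alone, which is exactly the safeguard the suggestion is after.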

      3) Point 2 also applies to Figures 6 and 7. Relatedly, the rightmost panel of Figure 6C suggests that the analysis is indeed capturing some edges where the SES of the Scrambled Argument is greater than that of the Intact Argument.

      4) How well do the vertexes identified in Figure 7D overlap with the Intact Argument > Resting map? Given the authors' interpretation that the ISFC results suggest cooperation between areas involved in processing the intact stimulus, I think this should be properly assessed.

      5) Both ISC and ISFC capture only signal that is shared across participants. Most narratives are crafted such that all listeners have a similar interpretation. This is unlike arguments, where different listeners might agree with an argument to a different extent. If listeners had differing interpretations of the argument, ISC/ISFC would miss brain activity/connectivity involved in processing an argument. I think this possibility should be considered and discussed, especially given the null DMN finding for the argumentative texts.

      6) For the t-tests on the behavioral ratings, it looks like the authors collapsed over the two texts within a category. This doesn't seem right, given that the ratings for each text are dependent. A mixed model approach would be more appropriate. I doubt this will change the results, but I think it would be good to follow best practices when possible.
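
      A mixed-model version of the ratings analysis might look like the following sketch with statsmodels, using simulated ratings; the variable names, effect sizes, and random-effects structure are all hypothetical. Fully crossed subject-and-text random effects would be more natural in lme4 in R; MixedLM handles subject-grouped random intercepts directly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for subj in range(20):
    subj_offset = rng.normal(0, 0.3)  # random subject intercept
    for cat, mu in [("argument", 3.0), ("narrative", 3.5)]:
        for text in (f"{cat}_1", f"{cat}_2"):  # two texts per category
            rows.append(dict(subject=subj, category=cat, text=text,
                             rating=mu + subj_offset + rng.normal(0, 0.5)))
df = pd.DataFrame(rows)

# Random intercepts for subject; ratings of the two texts within a
# category are modelled directly rather than collapsed before testing
model = smf.mixedlm("rating ~ category", df, groups=df["subject"])
result = model.fit()
```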

    1. Reviewer #3:

      This paper describes a novel technique for measuring several distinct subcortical components, using naturalistic speech instead of the more typical clicks and tone-pips. The benefits of using extended speech (e.g., stories) include simultaneous measurement of middle- and late-latency components automatically.

      The technique is of great interest with many potential use cases. The manipulation of the acoustics is reasonable (replacing voiced speech with click trains of the same pitch), does not degrade intelligibility, and reduces sound quality only in minor ways. The manipulation is also described clearly for others to implement.

      The authors also investigate several variations and generalizations of the technique, and their tradeoffs, including eliciting responses from specific tonotopic bands and ear-specific responses.

      The reliability of the ABR wave I and V responses is remarkable (especially given the previous results of the senior author using unprocessed speech); wave III is less so. Being able to record P0, Na, Pa, Nb, P1, N1, and P2 simultaneously shows promise for future clinical applications (and basic science). The practical importance of using a lower fundamental frequency (i.e., typical of male speakers) is clearly established.

      The technique has some overlap with the Chirp spEECh of Miller et al., but with enough tangible additional benefits that it should be considered novel.

      The writing is very clear.

      Major Concerns:

      "wave III was clearly identifiable in 16 of the 22 subjects": Figure 1 indicates that the word "clearly" may be somewhat generous. It would be worthwhile to discuss wave III and its identifiability in more detail (perhaps its identifiability/non-universality could be compared with that of another less prominent peak in traditionally obtained ABRs?).

    2. Reviewer #2:

      General assessment:

      This manuscript presents an improved methodology for extracting distinct early auditory evoked potentials from the EEG response to continuous natural speech, including a novel method for obtaining simultaneous responses from different frequency bands. It is a clever approach and the first results are promising, but more rigorous evaluation of the method and critical evaluation of the results are needed. It could provide a valuable tool for investigating the effect of corticofugal modulation of the early auditory pathway during speech processing. However, the claims made about its use for investigating speech encoding or for clinical diagnosis seem too speculative and unspecific.

      General comments:

      1) Despite repeated claims, I don't think a convincing case is made here that this method can provide insight on how speech is processed in the early auditory pathway. The response is essentially a click-like response elicited by the glottal pulses in the stimulus; it averages out information related to dynamic variations in envelope and pitch that are essential for speech perception; at the same time, it is highly sensitive to sound features that do not affect speech perception. What reason is there to assume that these responses contain information that is specific or informative about speech processing?

      2) Similarly, the claim that the methodology can be used as a clinical application is not convincing. It is not made clear what pathology these responses can detect that current ABR methods cannot, or why. As explained in the Discussion, the response size is inherently smaller than standard ABRs because of the higher repetition rate of the glottal pulses, and the response may depend on more complex neural interactions that would be difficult to quantify. Do these features not make them less suitable for clinical use?

      3) It needs to be rigorously confirmed that the earliest responses are not contaminated or influenced by responses from later sources. There seems to be some coherent activity or offset in the baseline (pre 0 ms), in particular with the lower filter cutoff. One way to test this might be to simulate a simple response by filtering and time-shifting the stimulus waveforms, adding these up plus realistic noise, and applying the deconvolution to see whether the input is accurately reproduced. It might be useful to see how the response latencies and amplitudes correlate with those of conventional click responses, and how they depend on stimulus level.

      4) The multiband responses show a variation of latency with frequency band that indicates a degree of cochlear frequency specificity. The latency functions reported here look similar to those obtained by Don et al 1993 for derived band click responses, but the actual numbers for the frequency dependent delays (as estimated by eye from Figures 4, 6 and 7) seem shorter than those reported for wave V at 65 dB SPL (Don et al 1993 table II). The latency function would be better fitted to an exponential, as in Strelcyk et al 2009 (equation 1), than a quadratic function; the fitted exponent could be directly compared to their reported value.
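
      The suggested exponential fit could be set up as follows; the exact parameterisation in Strelcyk et al. 2009 may differ, and the latencies and frequencies below are made-up illustrative values.

```python
import numpy as np
from scipy.optimize import curve_fit

def latency_model(f_khz, a, b, c):
    # Exponential latency-vs-frequency model; one plausible form,
    # not necessarily the exact equation of Strelcyk et al. 2009
    return a + b * np.exp(-f_khz / c)

# Made-up wave V latencies (ms) at derived-band center frequencies (kHz)
f = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
lat = np.array([9.5, 8.4, 7.6, 7.1, 6.8])

popt, _ = curve_fit(latency_model, f, lat, p0=[6.5, 4.0, 1.0])
a_fit, b_fit, c_fit = popt
```

      The fitted decay parameter could then be compared against the published value, rather than eyeballing delays from the figures.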

      5) The fact that differences between narrators lead to changes in the ABR response is to be expected, and was already reported in Maddox and Lee 2018. I don't understand why it needs to be examined and discussed at such length here. The space devoted to discussing the recording time also seems very long. Neither the abstract nor the introduction refers to these topics, and they seem to be side-issues that could be summarised and discussed much more briefly.

      L142-144. Is it possible to apply the pulse train regressor to the unaltered speech response? If so, does this improve the response, i.e. make it look more similar to the peaky speech response? It would be interesting to know whether improvement is due to the changed regressor or the stimulus modification or both.

      L208 -211. What causes the difference between the effect of high-pass filtering and subtracting the common response? If they serve the same purpose, but have different results, this raises the question which is more appropriate.

      L244. This seems a misinterpretation. The similarity between broadband and summated multiband responses indicates that the band filtered components in the multiband stimulus elicited responses that add linearly in the broadband response. It does not imply that the responses to the different bands originate from non-overlapping cochlear frequency regions.

      L339-342. Is this measure of SNR appropriate, when the baseline is artificially constructed by deconvolution and filtering? Perhaps noise level could be assessed by applying the deconvolution to a silent recording instead? It might also be useful to have a measure of the replicability of the response.

    3. Reviewer #1:

      Major issues:

      I have two major comments on the work.

      1) The authors motivate the work from the use of naturalistic speech, and the application of the developed method to investigate, for instance, speech-in-noise deficits. But they do not discuss how comprehensible the peaky speech in fact is. I would therefore like to see behavioural experiments that quantitatively compare speech-in-noise comprehension, for example SRTs, for the unaltered speech and the peaky speech. Without such a quantification, it is impossible to fully judge the usefulness of the reported method for further research and clinical applications.

      2) The neural responses to unaltered speech and to peaky speech are analysed by two different methods. For unaltered speech, the authors use the half-wave rectified waveform as the regressor. For peaky speech, however, the regressor is a series of spikes that are located at the timings of the glottal pulses. Due to this rather different analysis, it is impossible to know to which degree the differences in the neural responses to the two types of speech that the authors report are due to the different speech types, or due to the different analysis techniques. The authors should therefore use the same analysis technique for both types of speech. It might be most sensible to analyse the unaltered speech through a regressor with spikes at the glottal pulses as well. In addition, it would be good to see a comparison, say of a SNR, when the peaky speech is analysed through the half-wave rectified waveform and through the series of spikes. This would also further motivate the usage of the regressor with the series of spikes.
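
      The suggestion to use the same estimator for both speech types can be sketched with a single regularised frequency-domain deconvolution routine. This is a toy simulation (made-up kernel, sampling rate, and regularisation); the authors' actual estimator may differ in regularisation and windowing.

```python
import numpy as np

fs = 10000          # Hz, toy sampling rate
n = 2 * fs
rng = np.random.default_rng(2)

# Toy regressor: unit impulses at regularly spaced "glottal pulse" times
pulses = np.zeros(n)
pulses[np.arange(100, n - 200, 97)] = 1.0   # ~103 Hz pulse rate

# Toy ABR-like kernel and simulated EEG
kernel = np.zeros(200)
kernel[30:60] = np.hanning(30)
eeg = np.convolve(pulses, kernel)[:n] + 0.01 * rng.standard_normal(n)

def deconvolve(regressor, response, n_lags=200, reg=1e-3):
    """Regularised frequency-domain deconvolution; the identical estimator
    can take a spike-train regressor or a half-wave-rectified waveform."""
    R = np.fft.rfft(regressor)
    Y = np.fft.rfft(response)
    H = Y * np.conj(R) / (np.abs(R) ** 2 + reg)
    return np.fft.irfft(H, n=len(regressor))[:n_lags]

w_spikes = deconvolve(pulses, eeg)
# For unaltered speech, the same call with a rectified-waveform regressor
# would make the two analyses directly comparable:
# w_env = deconvolve(np.maximum(speech_waveform, 0), eeg)
```

      This also doubles as the simulation check requested elsewhere: the deconvolved waveform should reproduce the injected kernel.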

    1. Reviewer #3:

      In this study, Higgs et al. apply a systematic and hierarchical approach to testing the enrichment of imprinted gene expression in (mostly) adult tissues, culminating in a survey at the single-cell and neuronal sub-type level, which the authors achieve by exploitation of now extensive single-cell gene expression datasets. Arguably, there are no great surprises in this analysis: it reinforces previous studies showing/suggesting an enrichment for imprinted genes in the brain, with functions in feeding, parental behaviour, etc. But, it is conducted in a rigorous manner and makes highly informed inferences about the expression domains and neuronal subtypes identified. This level of detail is beyond any previous survey, therefore, the study will provide an excellent resource (although the fine details of the specific neuronal sub-populations in which imprinted gene expression is enriched are likely to be of interest to specialists only). Having, at all levels of their analysis, access to two or more single-cell datasets provides an important level of confidence in the analysis and findings, although there are some discrepancies between the enrichments found in comparing any two datasets. Moreover, the findings will give more prominence to neuronal domains that have received less emphasis in functional studies, for example, the enrichment of imprinted genes within the suprachiasmatic nucleus implicating roles in circadian processes.

      Imprinted expression covers a range of allelic biases and we are still some way from really understanding what an allelic skew means in comparison to absolute monoallelic expression: biased expression in all cells in a tissue or a mosaic of mono- and biallelically expressing cells. So finding an imprinted gene expressed in a given cell type without knowing whether its expression is actually imprinted in that cell type is a problem. And certainly a significant proportion of more recently discovered brain-expressed imprinted genes seem to fall into a category of paternal bias rather than full monoallelic expression. The authors do acknowledge this caveat in their discussion (lines 491-499). Is it possible to stratify the analysis according to degree of allelic bias? Ultimately, scRNA-seq using hybrid tissues will be important to resolve such issues. In this context, the authors will need to discuss findings in the very recently published paper from Laukoter et al. (Neuron, 2020), although that study focussed on cortical neurons in which Higgs and colleagues do not find imprinted gene enrichments.

      Another issue that could cloud the analysis, and particularly inference of how PEGs and MEGs could be involved in separate functions, is the issue of complex transcription units. The authors allude to Grb10 in which there are maternally and paternally expressed isoforms largely arising from separate promoters, which also applies to Gnas. There are also cases in which there are imprinted and non-imprinted isoforms. A problem with short-read RNA-seq libraries will be that much of the expression data for a given transcription unit cannot discriminate such differentially imprinted isoforms, as most of the reads mapping to the locus will map to shared exons. This caveat probably also needs to be mentioned in the text.

      The authors give some prominence to Peg3 as an example of the role of imprinted genes in maternal behaviours (e.g., line 508) as reported in the original knock-out (Li et al. 1999). However, this particular Peg3-knock-out associated phenotype has been questioned by a more recent Peg3 knock-out in which it was not observed (Denizot et al. 2016 PMID: 27187722), suggesting that the initial phenotype could be a consequence of the nature of the targeting insertion rather than Peg3 ablation.

      While a general picture that emerges is of imprinted genes acting in concert to influence shared functions (e.g., feeding), the authors also point out cases in which a single imprinted gene contributes to a neuronal function (Ube3a in the case of hippocampal-related learning and memory; line 511-512) but for which they did not find enrichment of imprinted genes in the relevant neuronal population. This poses some problems, but it could indicate that that particular function of the gene is not the function for which imprinting was selected if the gene is active in other domains, but is rather 'tolerated'. Of course, many imprinted genes will have multiple physiological functions, so the convergence on specific functions probably provides the best (but by no means perfect) basis for discerning the evolutionary imperatives.

    2. Reviewer #2:

      General assessment of the work:

      In this manuscript Higgs and colleagues test the hypothesis that imprinted gene expression is enriched in the brain, and that identifying specific brain regions of enrichment will aid in uncovering physiological roles for imprinted pathways. The authors claim that the hypothesis that imprinted genes are enriched in key brain functions has never been formally/systematically tested. Moreover, they suggest that their analysis represents an unbiased systems-biology approach to this question.

      In our assessment the authors fail to meet these criteria on several major grounds. Firstly, there are multiple instances of methodological bias in their analysis (detailed below). Secondly, the authors claim that their findings are validated by similar test results in 'matched' datasets. However, throughout the authors appear to have avoided identifying individual imprinted genes that are enriched in their analysis (they can be found in a minimally annotated supplementary file). Due to this it is impossible to judge to what extent there is agreement between matched datasets and between levels of the analysis. For these reasons the analysis appears arbitrary rather than systematic, and lacks rigor. Consequently we do not feel that the work of Higgs and colleagues goes beyond previous systematic reports of imprinting in the brain (for example, Gregg, 2010, Babak 2015, in ms reference list).

      Numbered summary of substantive concerns:

      1) Imprinted genes that were identified as enriched are not clearly named or listed

      -The authors use two or more independent datasets at each level to "strengthen any conclusions with convergent findings" (p4 ln96). By this the authors mean that both datasets pass the F-test criteria for enrichment. However, they should show which imprinted genes are allocated to each region, and clearly present the overlap. Are the same genes enriched in the two datasets? Similarly, are the genes enriched in, e.g., the hypothalamus the same genes that are enriched in the ARC?

      -The authors discuss how their main aim of identifying expression "hotspots" helps inform imprinted gene function in the brain. An analysis of the actual genes is therefore crucial (and the assumed next step after identifying the location of enrichment).

      -The authors allocate parental expression enrichment to the brain regions but do not state why they do this analysis.

      -Are imprinted genes in the same cluster co-expressed, as might be expected?

      2) Selection of datasets needs to be more clearly explained (i.e., explicit selection criteria are needed)

      -Their reason for selection - "to create a hierarchical sequence of data analysis" - suggests that there could be potential bias in their selection based on previous knowledge of IG action in the brain.

      -Explicit selection criteria would explain the level of similarity between datasets, which is important to establish before datasets are systematically analyzed.

      3) The study is more like a set of independent analyses of individual datasets (rather than one systematic/meta-analysis)

      -Each dataset was individually processed (filtered and normalized) following the original authors' procedure, rather than processing all the raw datasets the same way.

      -"A consistent filter, to keep all genes expressed in at least 20 cells or (when possible) with at least 50 reads" (p7 ln115), our emphasis - which filter was used? This should be consistent throughout.

      -Two different cut-offs were used to identify genes with upregulated expression, making the identification of enriched genes arbitrary (p7 para2).

      -Some datasets contain tissues from various time-points and sexes, but there is no clarification if all the data was included in the analysis. (e.g. the Ximerakis et al. dataset was originally an analysis of young and old mouse brains). This is particularly difficult to interpret when embryonic data is likened to adult data, which is in no way equivalent.

      -The cell-type and tissue-type identities were supplied by the dataset authors, based on their original clustering methods. This can be variable, particularly at the sub-population level.
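
      Regarding the filter criterion quoted above ("at least 20 cells or ... at least 50 reads"): the ambiguity of the "or" can be made concrete with a sketch (a hypothetical function over a genes x cells count matrix; the authors' actual thresholds and implementation may differ).

```python
import numpy as np

def filter_genes(counts, min_cells=20, min_reads=50):
    """Keep a gene if it is detected in >= min_cells cells OR has
    >= min_reads total reads; the 'or' admits genes that pass either
    criterion alone, which is why the choice must be stated."""
    detected = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    return (detected >= min_cells) | (total >= min_reads)

# Tiny example: 3 genes x 5 cells
counts = np.array([[1, 1, 1, 0, 0],    # 3 cells, 3 reads
                   [60, 0, 0, 0, 0],   # 1 cell, 60 reads
                   [0, 0, 1, 0, 0]])   # 1 cell, 1 read
keep = filter_genes(counts, min_cells=3, min_reads=50)
```

      The second gene passes on reads alone despite being detected in a single cell, illustrating how the two criteria select different gene sets.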

      4) These differences make it hard to draw connections between the findings from each dataset

      -In some levels, the authors compare two datasets for a "convergence" of IG over-expression. Yet the above differences between datasets and analyses makes them difficult to compare. (e.g. the comparison of hypothalamic neuronal subtypes with enriched IG expression between two datasets in level 3.a.2 is quite speculative).

      -More generally, the authors draw connections between their findings from each level, but the lack of consistency between analyses may not justify these connections.

      5) Hence, the study does not lead to a definitive set of findings that is new to the field

      -The above reasons suggest that this is not an objective set of data about IG expression in the brain, but rather evidence of certain hotspots for targeted analysis. However, these hotspots were already known.

      -A systematic analysis of raw data using fewer datasets, that then includes and discusses the imprinted genes, may lead to novel findings and a paper with a clearer narrative.

    3. Reviewer #1:

      The authors studied the over-representation of imprinted genes in the mouse brain by using fifteen single-cell RNA sequencing datasets. The analysis was performed at three levels 1) whole-tissue level, 2) brain-region level, and 3) region-specific cell subpopulation level. Based on the over-representation and gene-enrichment analyses, they interpreted hypothalamic neuroendocrine populations and monoaminergic hindbrain neurons as specific hotspots of imprinted gene expression in the brain.

      Objective:

      Though the study is potentially interesting, the expression of imprinted genes in the brain and hypothalamus is already known (Davies W et al., 2005; Shing O et al., 2019; Gregg et al., 2010; among many other studies cited in the paper). However, the authors put forth two objectives, the first being whether imprinted gene expression is actually enriched in the brain compared to other adult tissues, where they did find the brain to be one of the tissues with over-represented imprinted genes. The second is whether imprinted genes are enriched in specific brain regions. The study objectives cannot qualify as completely novel, as the work largely validates what is already known using scRNA-seq datasets.

      Methods and Results

      Pros:

      -15 scRNA-seq datasets were analysed independently, each processed as in its original publication.

      -Two enrichment methods were used to find tissue-specific enrichment of imprinted genes, and appropriate statistics were applied wherever necessary.

      Concerns:

      -It is not clear how the over-representation using Fisher's exact test was calculated. It would be appropriate to include the name of the software or R package, if one was used, in the basic workflow section of the Materials and methods.

      -Why did the authors specifically use Liger in R for the GSEA analysis?

      -The GSEA plots generated using Liger for each analysis in the paper are not, by themselves, informative. For example, in Figure 4 and the other GSEA plots in the paper: i) Which 'score' does the Y-axis represent? Include an x-axis label and mention the corrected GSEA q value either in the legend or the figure. ii) Was the normalized enrichment score (NES) calculated? What genes in the cluster represent maximum enrichment? A heat map of the imprinted genes contributing to the cell cluster would add more clarity to the GSEA plots.

      -Apart from the tissue-specific enrichment of gene sets, a functional GO/pathways enrichment of the group of imprinted genes will strengthen the connection of these genes with feeding, parental behavior and sleep.

      -Are these imprinted genes coexpressed across the analyzed brain structures, as the authors repeatedly stress the functioning of imprinted genes as a group?

      -A basic workflow schematic might be necessary for an easy and quick understanding of the methods.
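
      On the Fisher's exact test point above: the over-representation computation presumably reduces to a 2x2 contingency test along these lines (the counts here are hypothetical, and the authors' actual table construction may differ).

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: imprinted vs all other genes,
# upregulated in a given tissue vs not upregulated
imprinted_up, imprinted_not = 30, 90
other_up, other_not = 2000, 18000

table = [[imprinted_up, imprinted_not],
         [other_up, other_not]]
# One-sided test for over-representation of imprinted genes
odds_ratio, p_value = fisher_exact(table, alternative="greater")
```

      Stating exactly this table construction (and the software used) in the Materials and methods would resolve the ambiguity.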

      Overall, the study gives some insight into the brain regions, and particularly the cell clusters, where imprinted genes could be enriched. However, the study is preliminary in nature and largely validates previous findings. The authors have already highlighted some of the limitations of the study in the discussion.

    1. Reviewer #2:

      The authors describe the development and use of a D-serine sensor based on a periplasmic ligand-binding protein (DalS) from Salmonella enterica in conjunction with a FRET readout between enhanced cyan fluorescent protein and Venus fluorescent protein. They rationally identify point mutations in the binding pocket that make the binding protein somewhat more selective for D-serine over glycine and D-alanine. Ligand docking into the binding site, as well as algorithms for increasing the stability, identified further mutants with higher thermostability and higher affinity for D-serine. The combined computational efforts led to a sensor with higher affinity for D-serine (Kd = ~7 µM), but it also retained affinity for the native ligand D-alanine (Kd = ~13 µM) and for glycine (Kd = ~40 µM). Molecular simulations were then used to explain how remote mutations identified in the thermostability screen could lead to the observed alteration of ligand affinity. Finally, the D-SerFS was tested in 2P-imaging in hippocampal slices and in anesthetized mice using biotin-streptavidin to anchor the exogenously applied purified protein sensor to the brain tissue and pipetting on saturating concentrations of D-serine ligand.

      Although presented as the development of a sensor for biology, this work primarily focuses on the application of existing protein engineering techniques to alter the ligand affinity and specificity of a ligand-binding protein domain. The authors are somewhat successful in improving specificity for the desired ligand, but much context is lacking. For any such engineering effort, the end goals should be laid out as explicitly as possible. What sorts of biological signals do they desire to measure? On what length scale? On what time scale? What is known about the concentrations of the analyte and potential competing factors in the tissue? Since the authors do not demonstrate the imaging of any physiological signals with their sensor and do not discuss in detail the nature of the signals they aim to see, the reader is unable to evaluate what effect (if any) all of their protein engineering work had on their progress toward the goal of imaging D-serine signals in tissue.

      As a paper describing a combination of protein engineering approaches to alter the ligand affinity and specificity of one protein, it is a relatively complete work. In its current form, as an attempt to present a new fluorescent biosensor for imaging biology, it is strongly lacking. I would suggest the authors rework the story to focus exclusively on the protein engineering, or continue to work on the sensor/imaging/etc until they are able to use it to image some biology.

      Additional Major Points:

      1) There is no discussion of why the authors chose to use non-specific chemical labeling of the tissue with NHS-biotin to anchor their sensor vs. genetic techniques to get cell-type specific expression and localization. There is no high-resolution imaging demonstrating that the sensor is localized where they intended.

      2) Why does the fluorescence of both the CFP and the YFP decrease upon addition of ligand (see e.g. Supplementary Figure 2)? Were these samples at the same concentration? Is this really a FRET sensor or more of an intensiometric sensor? Is this also true with 2P excitation? How does the Venus fluorescence change when Venus is excited directly? Perhaps fluorescence lifetime measurements could help inform what is happening.

      3) How reproducible are the spectral differences between LSQED and LSQED-T197Y? Only one trace for each is shown in Supplementary Figure 2 and the differences are very small, but the authors use these data to draw conclusions about the protein open-closed equilibrium.

      4) The first three mutations described are arrived upon by aligning DalS (which is more specific for D-Ala) with the NMDA receptor (which binds D-Ser). The authors then mutate two of the ligand pocket positions of DalS to the same amino acid found in NMDAR, but mutate the third position to glutamine instead of valine. I really can't understand why they don't even test Y148V if their goal is a sensor that hopefully detects D-Ser similar to the native NMDAR. I'm sure most readers will have the same confusion.

    2. Reviewer #1:

      The manuscript "A computationally designed fluorescent biosensor for D-serine" by Vongsouthi et al. reports the engineering of a fluorescent biosensor for D-serine using the D-alanine-specific solute-binding protein from Salmonella enterica (DalS) as a template. The authors engineer a DalS construct that has the enhanced cyan fluorescent protein (ECFP) and the Venus fluorescent protein (Venus) as terminal fusions, which serve as donor and acceptor fluorophores in resonance energy transfer (FRET) experiments. The reporters should monitor a conformational change induced by solute binding through a change of the FRET signal. The authors combine homology-guided rational protein engineering, in-silico ligand docking and computationally guided, stabilizing mutagenesis to transform DalS into a D-serine-specific biosensor applying iterative mutagenesis experiments. Functionality and solute affinity of modified DalS is probed using FRET assays. Vongsouthi et al. assess the applicability of the finally generated D-serine selective biosensor (D-SerFS) in-situ and in-vivo using fluorescence microscopy.

      Ionotropic glutamate receptors are ligand-gated ion channels that are importantly involved in brain development, learning, memory and disease. D-serine is a co-agonist of ionotropic glutamate receptors of the NMDA subtype. The modulation of NMDA signalling in the central nervous system through D-serine is poorly understood. Optical biosensors that can detect D-serine are lacking and the development of such sensors, as proposed in the present study, is an important target in biomedical research.

      The manuscript is well written and the data are clearly presented and discussed. The authors appear to have succeeded in the development of a D-serine-selective fluorescent biosensor. But some questions arose concerning the experimental design. Moreover, not all conclusions are fully supported by the data presented. I have the following comments.

      1) In the homology-guided design two residues in the binding site were mutated to the ones of the D-serine specific homologue NR1 (i.e. F117L and A147S), which led to a significant increase in affinity for D-serine, as desired. The third residue, however, was mutated to glutamine (Y148Q) instead of the homologous valine (V), which resulted in a substantial loss of affinity for D-serine (Table 1). This "bad" mutation was carried through in consecutive optimization steps. Did the authors also try the homologous Y148V mutation? On page 5 the authors argue that Q instead of V would increase the size of the side chain pocket. But the opposite is true: the side chain of Q is bulkier than that of V, which may explain the dramatic loss of affinity for D-serine. Mutation Y148V may be beneficial.

      2) Stabilities of constructs were estimated from melting temperatures (Tm) measured by thermal denaturation monitored via the FRET signal of the ECFP/Venus fusions. I am not sure this methodology is appropriate for determining the thermal stabilities of DalS and mutants thereof. Thermal unfolding of the fluorescent labels ECFP and Venus, and their intrinsic, presumably strongly temperature-dependent fluorescence emission intensities, will interfere. A deconvolution of signals will be difficult. It would be helpful to see raw data from these measurements. All stabilities are reported in terms of deltaTm. What is the absolute Tm of the reference protein DalS? How does the thermal stability of DalS compare to the thermal stabilities of ECFP and Venus? A more reliable probe of thermal stability would be the far-UV circular dichroism (CD) spectroscopic signal of DalS without fusions. DalS is a largely helical domain and will show a strong CD signal.
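
      On the question of absolute Tm: whichever readout is used (FRET ratio or far-UV CD), Tm is typically extracted by fitting a two-state sigmoid to the melt curve. A minimal sketch with simulated data follows; all parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def melt_curve(T, Tm, k, low, high):
    # Two-state sigmoid for a thermal denaturation readout
    return low + (high - low) / (1.0 + np.exp(-(T - Tm) / k))

# Simulated melt data: true Tm = 55 C, plus measurement noise
T = np.linspace(25, 85, 61)
rng = np.random.default_rng(2)
signal = melt_curve(T, 55.0, 2.5, 0.2, 1.0) + 0.01 * rng.standard_normal(T.size)

popt, _ = curve_fit(melt_curve, T, signal, p0=[50.0, 3.0, 0.0, 1.0])
Tm_fit = popt[0]
```

      Reporting absolute Tm values from such fits, alongside the deltaTm values, would make the stability comparisons much easier to judge.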

      3) The final construct D-SerFS has a dynamic range of only 7%, which is a low value. It seems that the FRET signal change caused by ligand binding to the construct is weak. Is it sufficient to reliably measure D-serine levels in-situ and in-vivo? In Figure 5H in-vivo signal changes show large errors and the signal of the positive sample is hardly above error compared to the signal of the control. Figure 5G is unclear. What does the fluorescence image show? Work presented in this manuscript that assesses functionality and applicability of the developed sensor in-situ and in-vivo is limited compared to the work showing its design. For example, control experiments showing FRET signal changes of the wild-type ECFP-DalS-Venus construct in comparison to the designed D-SerFS would be helpful to assess the outcome.

      4) The FRET spectra shown in Supplementary Figure 2, which exemplify the measurement of fluorescence ratios of ECFP/Venus, are confusing. I cannot see a significant change of FRET upon application of ligand. The ratios of the peak fluorescence intensities of ECFP and Venus (scanned from the data shown in Supplementary Figure 2) are the same for the apo states and the ligand-saturated states. Instead, what happens is that the fluorescence emission intensities of both the donor and the acceptor bands are reduced upon application of ligand.

    1. Summary: The work detailed here explores a model of recurrent cortical networks and shows that homeostatic synaptic plasticity must be present in connections both from excitatory (E) to inhibitory (I) neurons and vice versa to produce the known E/I assemblies found in the cortex. There are some interesting findings about the consequences of assemblies formed in this way: there are stronger synapses between neurons that respond to similar stimuli; excitatory neurons show feature-specific suppression after plasticity; and the inhibitory network does not just provide a general untuned inhibitory signal, but instead sculpts excitatory processing. A major claim in the manuscript that argues for the broad impact of the work is that this is one of only a handful of papers to show how a local approximation rule can instantiate feedback (akin to the back-propagation of error used to train neural networks in machine learning) in a biologically plausible way.

      Reviewer #1:

      The manuscript investigates the situations in which stimulus-specific assemblies can emerge in a recurrent network of excitatory (E) and inhibitory (I, presumed parvalbumin-positive) neurons. The authors combine 1) Hebbian plasticity of I->E synapses that is proportional to the difference between the E neuron's firing rate and a homeostatic target and 2) plasticity of E->I synapses that is proportional to the difference between the total excitatory input to the I neuron and a homeostatic target. These are sufficient to produce E/I assemblies in a network in which only the excitatory recurrence exhibits tuning at the initial condition. While the full implementation of the plasticity rules, derived from gradient descent on an objective function, would rely on nonlocal weight information, local approximations of the rules still lead to the desired results.
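
      The two homeostatic rules described above can be summarised in a few lines of update code. This is a sketch only: the rate variables, target values (rho0, g0) and learning rate are placeholders rather than the authors' parameters, and real rates would come from simulating the network dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)
nE, nI = 8, 2
W_IE = rng.uniform(0.1, 0.5, (nE, nI))  # I -> E weights (positive magnitudes)
W_EI = rng.uniform(0.1, 0.5, (nI, nE))  # E -> I weights

rho0 = 5.0   # homeostatic target rate for E cells (placeholder value)
g0 = 3.0     # target total excitatory input per I cell (placeholder value)
eta = 1e-3   # learning rate

r_E = rng.uniform(0, 10, nE)  # stand-in firing rates; in the model these
r_I = rng.uniform(0, 10, nI)  # would follow from the network dynamics

# Rule 1 (I -> E): Hebbian in the presynaptic inhibitory rate, gated by
# the E cell's deviation from its target rate; inhibition strengthens
# onto cells firing above rho0 and weakens onto cells firing below it.
dW_IE = eta * np.outer(r_E - rho0, r_I)

# Rule 2 (E -> I): driven by the deviation of the total excitatory
# input current to each I cell from its target value g0.
g_E = W_EI @ r_E
dW_EI = -eta * np.outer(g_E - g0, r_E)

W_IE += dW_IE
W_EI += dW_EI
```

      Writing the rules out this way makes the point below concrete: Rule 2 requires the synapse to access an input *current* deviation, not a rate, which is the nonstandard feature discussed in comment 3.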

      Overall the results make sense and represent a new unsupervised method for generating cell assemblies consisting of both excitatory and inhibitory neurons. Major concerns are that the proposed rule ends up predicting a rather nonstandard form of plasticity for certain synapses, and that the results could be fleshed out more. Also, the strong claim of novelty could be softened or better contextualized, given that other recent papers have shown how to achieve something like backprop in recurrent neural networks (e.g. Murray eLife 2019).

      Comments:

      1) The main text would benefit from greater exposition of the plasticity rule and the distinction between the full expression and the approximation. While the general idea of backpropagation may be familiar to a good number of readers, here it is being used in a nonstandard way (to implement homeostasis), and this should be described more fully, with a few key equations.

      Additionally, the point that, for a recurrent network, the proposed rules are only related to gradient descent under the assumption that the network adiabatically follows the stimulus, seems important enough to state in the main text.

      2) The paper has a clear and simple message, but not much exploration of that message or elaboration on the results. Figures 2 and 3 do not convey much information, other than the fact that blocking either form of plasticity fails to produce the desired effects. This seems somewhat obvious -- almost by definition one can't have E/I assemblies if E->I or I->E connections are forced to remain random. This point deserves at most one figure, or maybe even just a few panels.

      3) The derived plasticity rule for E->I synapses, which requires modulation of I synapses based on a difference from a target value for the excitatory subcomponent of the input current, does not take a typical form for biologically plausible learning rules (which usually operate on firing rates or voltages, for example). The authors should explore and discuss in more depth this assumption. Is there experimental evidence for it? It seems like it might be a difficult quantity to signal to the synapse in order to guide plasticity. The authors note in the discussion that BCM-type rules fail here -- are there other approaches that would work? What about a more local form of plasticity that involves only the excitatory current local to a dendrite, for example?

      4) Does the initial structure in excitatory recurrence play a role, or is it just there to match the data?

      Reviewer #2:

      In this work, the authors simulated a rate-based recurrent network with 512 excitatory and 64 inhibitory neurons. The authors use this model to investigate which forms of synaptic plasticity are needed to reproduce the stimulus-specific interactions observed between pyramidal neurons and parvalbumin-expressing (PV) interneurons in mouse V1. They showed that when there is homeostatic synaptic plasticity both from excitatory to inhibitory and reciprocally from inhibitory to excitatory neurons in the simulated networks, the emergent E/I assemblies are qualitatively similar to those observed in mouse V1, e.g. there are stronger synapses between neurons responding to similar stimuli. They also identified that synaptic plasticity must be present in both directions (from pyramidal neurons to PV neurons and vice versa) to produce such E/I assemblies. Furthermore, they identified that these E/I assemblies enable the excitatory population in their simulations to show feature-specific suppression. Therefore, the authors claim that they found evidence that these inhibitory circuits do not provide a "blanket of inhibition", but rather a specific, activity-dependent sculpting of the excitatory response. They also claim that the learning rule they developed in this model shows for the first time how a local approximation rule can instantiate feedback alignment in their network, which is a method for achieving an approximation to a backpropagation-like learning rule in realistic neural networks.

      Major points:

      1) The authors claim that their synaptic plasticity rule implements a recurrent variant of feedback alignment. Namely, "When we compare the weight updates the approximate rules perform to the updates that would occur using the gradient rule, the weight updates of the local approximations align to those of the gradient rules over learning". They also claim that this is the first time feedback alignment is demonstrated in a recurrent network. It seems that the weight replacement in this synaptic plasticity rule is uniquely motivated by E/I balance, but the feedback alignment in [Lillicrap et al., 2016] is much more general. Thus, the precise connections between feedback alignment and this work remain a bit unclear.

      It would be good if the following things about this major claim of the manuscript could be expanded and/or clarified:

      i) In Fig S3 (upper, right vs. left), it is surprising that the Pyr->PV knock-out seems to produce a better alignment in PV->Pyr. Comparing the upper right of Fig S3 and the bottom figure of Fig 1g, it seems that the Pyr->PV knock-out performs equally well with a local approximation for the output connections of PV interneurons. Is this a special condition in this model that results in the emergence of the overall feedback alignment?

      ii) In the feedback alignment paper [Lillicrap et al., 2016], those authors introduce a "Random Feedback Weights Support"; this uses a random matrix, B, to replace the transpose of the backpropagation weight matrix. Here, the alignment seems to be based on the intuition that "The excitatory input connections onto the interneurons serve as a proxy for the transpose of the output connections," and "the task of balancing excitation by feedback inhibition favours symmetric connection." It seems synaptic plasticity here is mechanistically different; it is only similar to the feedback alignment [Lillicrap et al., 2016] because both reach a final balanced state. Please clarify how the results here are to be interpreted as an instantiation of feedback alignment - whether it is simply that the end state is similar, or if the mechanism is thought to be more deeply connected.

      iii) The feedback alignment of [Lillicrap et al., 2016] works when the weight matrix has its entries near zero (e^T W B e > 0). Are there any analogous conditions for the synaptic plasticity rule to succeed?
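
      For reference, the [Lillicrap et al., 2016] mechanism can be probed numerically. The following sketch trains a linear two-layer network on a linear teacher, using a fixed random feedback matrix B in place of W2.T in the backward pass; all dimensions, learning rates and the teacher mapping are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 20, 5
W1 = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden
W2 = rng.normal(0, 0.1, (n_out, n_hid))  # hidden -> output
B = rng.normal(0, 0.1, (n_hid, n_out))   # fixed random feedback matrix
W_teacher = rng.normal(0, 1.0, (n_out, n_in))  # linear mapping to learn

X_probe = rng.normal(0, 1, (n_in, 100))  # held-out probe inputs

def probe_error():
    return float(np.mean((W2 @ W1 @ X_probe - W_teacher @ X_probe) ** 2))

err_before = probe_error()
eta = 0.02
for _ in range(2000):
    x = rng.normal(0, 1, (n_in, 1))
    h = W1 @ x
    e = W2 @ h - W_teacher @ x            # output error
    # True backprop would propagate W2.T @ e; feedback alignment uses B @ e
    W2 -= eta * e @ h.T
    W1 -= eta * (B @ e) @ x.T
err_after = probe_error()

# Over learning, W2 tends to align with B.T, which is what makes the
# fixed random feedback pathway carry useful error information
alignment = float(np.sum(B * W2.T) / (np.linalg.norm(B) * np.linalg.norm(W2)))
```

      The question above is essentially whether the authors' rule has an analogous regime (e.g. sufficiently weak or near-zero weights at initialisation) in which such alignment is guaranteed to develop.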

      iv) In the supplementary material, the local approximation rule is developed using a 0th-order truncation of Eq's 15a and 15b. It is noted that "If synapses are sufficiently weak ..., this approximation can be substituted into Eq. 15a and yields an equation that resembles a backpropagation rule in a feedforward network (E -> I -> E) with one hidden layer -- the interneurons." It would be helpful if the authors could discuss how this learning rule works in a general recurrent network, or whether it will work for any network with sufficiently weak synapses.

      v) This synaptic plasticity rule seems to be closely related to other local approximations of backpropagation in recurrent neural networks: e-prop (Bellec et al., 2020, https://www.nature.com/articles/s41467-020-17236-y) and broadcast alignment (Nøkland 2016; Samadi et al., 2017). These previous papers do not consider E/I balance in their approximations, but is E/I balance necessary for successful local approximation to these rules?

      2) In the discussion, it reads as if the BCM rule cannot apply to this recurrent network because of the limited number of interneurons in the simulation ("parts of stimulus space are not represented by any interneurons"). Is this a limitation of the size of the model? Would scaling up the simulation change how applicable the BCM learning rule is? It would be helpful if the authors offer a more detailed discussion on why some forms of plasticity in interneurons fail to produce stimulus specificity.

    1. Summary: Didychuk et al. report crystal and cryo-EM structures of the ORF68 protein from KSHV/HHV-8, plus the cryo-EM structure of its homologue BFLF1 from EBV/HHV-4. These structures, along with biochemical data presented in this paper and the group's previous work, demonstrate convincingly that ORF68 is a DNA-binding protein involved in genome packaging. Importantly, the authors show that the conserved cysteine residues in ORF68 mediate zinc ligation, suggesting that they play a structural role rather than a role in intracellular disulfide bond regulation (as had been hypothesised for the HSV-1/HHV-1 homologue pUL32). The work is methodologically sound and provides a structural framework for probing the function of ORF68 and homologues in virus assembly.

      Reviewer #1:

      The genome packaging machinery of herpesviruses is composed of 6 proteins. The functions of 5 of these have been relatively well characterized, but little is known about the 6th component, the conserved protein termed ORF68 in KSHV. Here, by obtaining a high-resolution structure of ORF68 (and of its homolog from the closely related EBV), the authors show that it forms a pentameric ring with a positively charged pore that could accommodate dsDNA. The authors further show that the basic residues lining the pore are essential for DNA binding, genome packaging, and viral replication. These data for the first time suggest that ORF68 binds the dsDNA genome and may, in some manner, act as an adaptor bringing the genome and the genome-packaging terminase motor to the capsid portal. Structural analysis suggests that all ORF68 homologs share a similar architecture, providing templates for future mechanistic exploration. The study is well executed, and the manuscript was a pleasure to read. The concerns are minor except for the following.

      The functional importance of the basic residues lining the pore leaves little doubt that some sort of quaternary structure with a pore that would accommodate dsDNA is formed in vivo. However, the authors do not formally show that the pentameric assembly observed in vitro is functionally relevant, nor do they consider the possibility that the functionally relevant assembly could be something other than a pentamer. If ORF68 acts as an adaptor that tethers the hexameric terminase motor to the dodecameric capsid portal, it could very well be a hexamer. In principle, it could even form a spiral rather than a ring. Understandably, obtaining additional structures may be beyond the scope of this manuscript, whereas mutagenesis of the pentameric interface would not rule out hexamers (pentameric and hexameric interfaces may be quite similar). Nonetheless, the authors could, at least, acknowledge the possibility of alternative oligomeric states.

      Reviewer #2:

      Limitations of the study are that it does not identify any specific interactions with other members of the terminase/packaging complex, so the exact role of ORF68 and homologues remains enigmatic. However, several compelling hypotheses are presented in Figure 6 and this work will undoubtedly stimulate further investigations to unravel the precise function of ORF68.

      Substantive issues:

      1) The authors assert that ORF68, BFLF1 and UL32 all form pentamers, and that this is the active form of these proteins. While this is supported by the EM analysis of ORF68 and UL32, the assertion that BFLF1 is also most likely active as a pentamer (lines 166-7) is not supported by data. Ideally the authors would use analytical ultracentrifugation or MALS to define the oligomeric state of the particles in solution, but analytical size exclusion chromatography would be sufficient to confirm that ORF68, BFLF1 and UL32 all form similarly sized particles in solution.

      2) The structural work presented in this manuscript shows compellingly that ORF68 and BFLF1 share the same fold, and sequence conservation suggests that this fold will be conserved across the alpha- and beta-herpesvirus homologues, UL32 and UL52 (respectively). However, building a homology model of UL32 and UL52 using ORF68 as a template structure does not provide additional support for this hypothesis - by definition, a homology model will always look similar to its template structure. Figures 3(c,d) and the discussion of the homology models should be removed in favour of a discussion of sequence conservation (Figure S4).

      3) The authors use EMSAs to probe the affinity of ORF68 for 'cognate' (GC-rich) or scrambled DNA. While the similar binding affinity can be easily seen, the estimated dissociation constant (Kd) is likely significantly wrong, because the Langmuir-Hill equation used by the authors does not take into account ligand depletion: the assumption that [ORF68]total equals [ORF68]free is not valid when using nM concentrations of both the fluorescent DNA probe and ORF68. The authors should either quote the effective binding affinity in their assay (EC50) or fit their data to a model that takes ligand depletion into account.
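
      To make the suggested fix concrete, the depletion-aware ('Morrison') quadratic solution can be compared against the naive hyperbola. In the sketch below the probe concentration, titration range and true Kd are invented for illustration; only the functional forms matter.

```python
import numpy as np
from scipy.optimize import curve_fit

L_tot = 10.0  # nM fluorescent probe; concentration invented for illustration

def frac_bound(P_tot, Kd):
    """Quadratic ('Morrison') solution for the fraction of probe bound,
    valid when probe and protein are both comparable to Kd."""
    b = P_tot + L_tot + Kd
    return (b - np.sqrt(b ** 2 - 4.0 * P_tot * L_tot)) / (2.0 * L_tot)

# Simulated titration with a true Kd of 25 nM plus measurement noise
P = np.logspace(-1, 3, 15)  # nM total protein
rng = np.random.default_rng(0)
y = frac_bound(P, 25.0) + rng.normal(0, 0.02, P.size)

# Naive hyperbola assumes [protein]free equals [protein]total
hyperbola = lambda P_tot, Kd: P_tot / (P_tot + Kd)
Kd_naive, _ = curve_fit(hyperbola, P, y, p0=[10.0], bounds=(0, np.inf))

# Depletion-aware fit recovers the underlying constant
Kd_dep, _ = curve_fit(frac_bound, P, y, p0=[10.0], bounds=(0, np.inf))
```

      The naive fit systematically overestimates Kd (roughly by half the probe concentration in this regime), which is exactly the artefact the comment warns about.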

      Reviewer #3:

      This paper by Didychuk et al. focused on determining the structure and possible functions of the proteins encoded by KSHV (ORF68) and EBV (BFLF1) that are required for genome packaging. The cleavage and packaging of herpesvirus genomes involves a number of viral proteins. These homologous proteins form pentameric rings with channels that bind dsDNA. The authors present a number of structural and biochemical studies focused on determining the role of these proteins in the cleavage and packaging of herpesvirus genomes. The work answers questions of significance regarding the novel biochemical activities of the ORF68 protein, and several models are proposed for how these proteins may function in the packaging of herpesvirus genomes. The paper is well written, very concisely presented considering the large amount of data, and will be important to those studying DNA packaging of herpesviruses as well as other DNA viruses. Although there are a large number of experiments, they all contribute to a very extensive analysis of this very interesting protein whose role in DNA packaging has been unknown.

      Specific Points:

      1) p. 18. Lines 335-339. The authors might want to point out that HSV-1 DNA replication produces branched, head-to-tail concatemers of viral genomes that must be cleaved and packaged into capsids as individual, unit-length monomers. PFGE studies have shown that in HSV-infected cells the replicated viral genome produces concatemers that are cleaved only at the UL end of the viral genome (PMID: 9222355). A number of studies with HSV mutants indicated that all of the cleavage-packaging proteins (except UL25), along with capsid proteins, are required for this initial cleavage reaction. Also, the portal protein has been shown to interact with replicating HSV genomes, and the role of UL32 and its homologs may be to facilitate the first cleavage as part of a complex (PMID: 28095497). Also of interest, these studies (iPOND/aniPOND) did not detect a DNA interaction of UL32.

      2) Discussion: In contrast to the KSHV and HSV proteins the EBV BFLF1 protein forms a decameric ring. What might be the significance of this and why would this not be the case for the other two proteins?

    1. Summary: There was general enthusiasm for exploring approaches to semantic relationships in language, and for the quantitative comparison of different modeling approaches. There were questions about the degree to which the current results tie in to the past literature on semantic processing, which it seemed could have been better integrated to make the current theoretical advances clearer. As one example, the overall framing that tries to link computational models and neural processing seemed to be a stretch given the data.

      Reviewer #1:

      In this paper the authors examine neural representations of semantic information based on EEG recordings on 25 subjects on a two-word priming paradigm. The overall topic of how meaning is represented in the brain, and particularly the effort to understand this on a rapid timescale, is an important one. Although presented thoroughly, the analyses did not make a convincing step beyond prior investigations in linking semantic models to neuroscientific theories of meaning representation.

      Linking word embedding / high-dimensional semantic spaces to brain data has been done before in both fMRI and M/EEG (some of these papers are cited here). That is, the potential to link these two types of data has been demonstrated. So an important question is what key advance the current study provides. This seems like it could be either a deep dive into the representational spaces of the language models, or using the models to advance our understanding of semantic representation in the brain. Unfortunately I was not convinced that either of these was realized.

      One important contribution seems to be the use of three word embedding models (i.e. three semantic spaces): CBOW, GloVe, and CWE. Although these are described briefly (L89 and following) the nature of the different predictions was not spelled out, and thus the different (contradictory or complementary) aspects of these models were not immediately clear. In other words, by the end of the paper it wasn't clear whether we learned anything about these models we didn't know before.

      The relationship of the reported ERP findings to contemporary views of semantic memory was lacking. There is a large literature on semantic memory that goes far beyond the N400. I don't mean to imply that the authors need to address ALL of it, but right now it is difficult to get even a sanity check on whether the topographic/neuroanatomical distributions for the models are reasonable. This difficulty also leads to some questions with the methods - for example, averaging model-brain correlations across all channels. Given that some channels are likely to be more informative than others, I'm not convinced the overall average is a good metric. All told, a greater link between the language models and neural responses is needed (i.e. a clearer link to frameworks for semantic memory).

      Reviewer #2:

      Summary and General Assessment:

      25 participants performed a visual primed lexical decision task while EEG was recorded. The authors correlate the EEG-recorded neural activity with three different methods of deriving word embedding vectors. The goal was to investigate semantic processing in the brain, using metrics that have been derived using NLP tools. The main finding is that neural activity during the same time-window (~200-300 ms) that has been associated with semantic processing in classic EEG literature - the so-called N400 component - was significantly modulated by semantic similarity between the prime and target pairs as quantified by the word embeddings. The authors claim, therefore, that brains and machines have similar representations of semantics in their processing.

      My main concern, highlighted below, is that the claims exceed the findings of the paper. I believe that the current results nicely recapitulate the classic N400 literature using a continuous variable rather than a categorical design, but do not significantly contribute to our understanding of semantic processing in AI and humans.

      Major comments:

      1) Magnitude of claims

      My main concern is that the authors are claiming interpretations that are much broader than the experimental design and results can support. The experimental design adapts a classic lexical-decision priming paradigm, using the cosine-similarity in the word-embeddings as the index of semantic similarity between prime and target. They replicate an N400 result using this continuous measure rather than a categorical one. While this is interesting, it does not, in my view, contribute to the discussion of the similarity between brains and AI. Instead, it demonstrates that co-occurrence metrics can be used as proxies for semantic similarity between word pairs.
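
      For readers unfamiliar with the predictor at issue: the trial-wise index is simply the cosine between prime and target embedding vectors (with dissimilarity as one minus this value). A toy sketch, with made-up 4-d vectors standing in for the real CBOW/GloVe/CWE embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors; real embeddings are learned from corpus co-occurrence
doctor = np.array([0.9, 0.1, 0.3, 0.0])
nurse  = np.array([0.8, 0.2, 0.4, 0.1])
table  = np.array([0.0, 0.9, 0.1, 0.8])

sim_related   = cosine_similarity(doctor, nurse)   # high for related pairs
sim_unrelated = cosine_similarity(doctor, table)   # low for unrelated pairs
```

      Because such vectors are built from co-occurrence statistics, a correlation between this measure and the N400 shows sensitivity to those statistics, which is the narrower interpretation argued for above.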

      2) Analytic rigor

      I also have concerns regarding the analysis techniques selected. The authors primarily analyse activity recorded from a single electrode, or average the data across all electrodes. The results across electrodes are shown for visualisation purposes only, with no statistics. I would suggest instead applying a spatio-temporal permutation test to incorporate the spatial dimension.

      Relatedly, even though justification is given for primarily analysing data recorded from channel Cz based on previous N400 studies, it seems that a lot of the analyses are actually applied on Oz (e.g. line 288, and in Figure 4 caption). Is this a typo, or was the analysis indeed applied to Oz?

      The durations of the effects identified by the temporal cluster test are very short, in some cases less than 10 ms. A priori, we would expect meaningful measures of semantic processing to be of much longer duration.

      3) Completeness of description of analysis

      I found the reporting of the statistical results very much under-specified. Although behavioural analyses are sufficiently reported, EEG-analyses are not. I found no report of effect sizes, and specific p-values were missing in many cases.

      Reviewer #3:

      The study analyzed EEG responses to visually presented noun-noun pairs. Priming effects were estimated by subtracting the response to the same noun presented in prime position from the response in target position. These priming effects were then correlated with the cosine distance computed from 3 variations on a word embedding model.

      Semantic distances from word embedding models have been previously shown to predict brain responses (papers cited on line 74, but also work by Stefan Frank, e.g. Frank & Willems, 2017; Frank & Yang, 2018). The main text argues that previous studies, which used whole sentence stimuli, confound semantic composition with semantic representations, and that the innovation of the present study is that it uses a semantic priming paradigm to access "pure" (79) semantic representations.

      My main concern is that the conclusions are not supported by the data (point 1 below). I also have some concerns about the methods. In my view the data and analysis approach could potentially be interesting, but the framing would need to be quite different to emphasize conclusions that are appropriate for the evidence (and probably more modest).

      1) Interpretation of the results

      The main claim of the manuscript is that the correlations imply "Comparable semantic representation in neural and computer systems" (title), repeated as "common semantic representations between [the] two complex systems" (300 ff.) and "human-like computation in computational models" (13). This conclusion is not warranted by the results. The word embedding models are essentially (by design) statistical co-occurrence models. It has also long been known that humans, and N400s specifically, are sensitive to language statistics (e.g., Kutas & Federmeier, 2011). The correlation is thus parsimoniously explained by the fact that both systems are sensitive to lexical co-occurrence statistics. The (implicit) null hypothesis that is rejected is merely that human responses are entirely insensitive to these co-occurrence patterns. The alternative hypothesis does not by itself imply any deeper similarity in the representational format. Similarly, the comparison of correlations with different word embedding models can potentially tell us something about which specific co-occurrence patterns humans are sensitive to, but it does not by itself imply any deeper similarity of the representations.

      2) Methods

      The Methods section leaves open several crucial questions.

      2-A) Data was recorded from multiple subjects. However, the dependent variable was a correlation coefficient between single-trial ERP and trial-wise semantic dissimilarity. How did this model account for the multi-level structure of the data?

      2-B) It is not clear that the results are corrected for multiple comparisons across the 600 time points. The threshold for significance in Figure 4B varies for each time point, whereas a critical feature of classical permutation tests is to aggregate the maximum statistic across the time points to correct for multiple comparisons. The legend also indicates that the test was performed "at each time point" (4) without mentioning correction.
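
      To illustrate the correction being asked for: a sign-flip permutation test that records the maximum |t| over all time points yields a single FWER-controlling threshold for the whole epoch. A sketch on simulated data (the subject and time-point counts mirror the study; the effect placement and sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_times = 25, 600
# Simulated per-subject statistics at each time point, with a genuine
# effect injected at samples 200-250 (placement invented)
data = rng.normal(0, 1, (n_subj, n_times))
data[:, 200:250] += 1.0

def t_stat(x):
    """One-sample t-statistic across subjects at every time point."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

t_obs = t_stat(data)

# Sign-flip permutations: keep the MAXIMUM |t| over all time points,
# so the resulting threshold controls the family-wise error rate for
# the whole epoch rather than separately at each time point
n_perm = 1000
max_t = np.empty(n_perm)
for i in range(n_perm):
    flips = rng.choice([-1.0, 1.0], size=(n_subj, 1))
    max_t[i] = np.abs(t_stat(data * flips)).max()

threshold = np.quantile(max_t, 0.95)  # one threshold for all 600 points
sig = np.abs(t_obs) > threshold
```

      A per-time-point threshold, by contrast, would flag many spurious points across 600 tests, which is why the varying thresholds in Figure 4B are a concern.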

      2-C) The statistical analysis is even less clear when different models are compared (309 ff.). For a significant result, a p-value should be provided and, if possible, some estimate of effect size.

      References

      Frank, S. L., & Willems, R. M. (2017). Word predictability and semantic similarity show distinct patterns of brain activity during language comprehension. Language, Cognition and Neuroscience, 32(9), 1192-1203. https://doi.org/10.1080/23273798.2017.1323109

      Frank, S. L., & Yang, J. (2018). Lexical representation explains cortical entrainment during speech comprehension. PLOS ONE, 13(5), e0197304. https://doi.org/10.1371/journal.pone.0197304

    1. Summary: In this well done manuscript, the authors examine the bHLH transcription factor TWIST1 and its interacting proteins in neural crest cell development using an unbiased screen. Given the important role of neural crest cells in craniofacial and cardiac developmental defects, the data are both useful and important.

      The major problem is the claim that the regulation reported here is important for neural crest specification/induction. This cannot be the case, as Twist1 starts to be expressed in mouse only during the delamination step according to published single-cell data. The premigratory Zic/Msx-positive neural crest shows no expression of Twist1 before EMT markers kick in. The authors need to deal with this. It would be important to show in vivo expression data analysis and bring the conclusions in line with the timing of neural crest development.

      Reviewer #1:

      This excellent study is focused on the mechanisms of action of Twist1 in neural crest cells and on the identification of core components of the Twist1 network. The authors performed an in-depth experimental study and sophisticated analysis to identify Chd7/8 as the key partners of Twist1 during NCC development. This identification and the corresponding predictions later appeared consistent with experimental in vivo data, including single and combinatorial gene knockout mouse models with phenotypes in the cranial neural crest. Overall, this study is important for the field. However, I disagree with some secondary interpretations the authors give to their results. At the same time, the major conclusions stay solid. Below I discuss the most critical points.

      1) Chd7, Chd8 and Whsc1 are ubiquitously expressed. Thus, the specificity of regulation is achieved via interactions with other, more cell type- and stage-specific, factors. This would be good to mention.

      2) The authors suggest: "The phenotypic data so far indicate that the combined activity of TWIST1-chromatin regulators might be required for the establishment of NCC identity. To examine whether TWIST1- chromatin regulators are required for NCC specification from the neuroepithelium and to pinpoint its primary molecular function in early neural differentiation, we performed an integrative analysis of ChIP-seq datasets of the candidates".

      • This is a strange assumption, given that Twist1 is expressed only starting from the NCC delamination stage in mouse cranial neural crest (Soldatov et al., 2019). It does not seem to correlate with premigratory NCC identity and the situation inside the neural tube. The authors conclude: "Therefore, combinatorial binding sites for TWIST1, CHD7 and CHD8 may confer specificity for regulation of patterning genes in the NECs." Or, alternatively, they may confer control of the mesenchymal phenotype, downstream migration, fate biasing etc. I do not think the authors have good arguments to bring up induction or patterning of NCCs at the level of the neural tube.
      • I have a good suggestion for the authors: I would extract the regulons from the Soldatov et al. single-cell data and run the binding-site proximity check for the individual genes belonging to the gene modules/regulons specific to the delamination and early NCC migration stages. I am curious whether the proximity of binding sites of the Twist1-related factors would correlate better with genes from these specific regulons as compared to randomly selected regulons from the entire published single-cell dataset. Randomization/bootstrapping analyses are welcome. So far, while being an excellent study, this paper does not solve the problem of the downstream (of Twist1) gene expression program in the neural crest cells. At the same time, this is what the authors can try to obtain with their DNA binding data in combination with the published single-cell data. Repression of Sox2 and upregulation of Pdgfra (reported in Figure 4) might be a part of this downstream program, being in line with the published single-cell gene expression data (Soldatov et al., 2019).
      • The authors conclude the paragraph: "Therefore, combinatorial binding sites for TWIST1, CHD7 and CHD8 may confer specificity for regulation of patterning genes in the NECs". Again, this is not a plausible explanation given the expression specificity of the suggested patterning genes (or the visualized genes are poorly selected). Additionally, although I believe the obtained results are important and of good quality, I would not call the cells "developmentally equivalent to ectomesenchymal NCCs" or other NCCs, because an in vitro system will never reflect embryonic in vivo development with high accuracy (especially with respect to patterning and positional identity). This might explain why some prominent binding positions, and the interpretations the authors give, do not correspond to the gene expression logic of neural crest development. Besides, Twist1 and Chd7/8 are naturally expressed in many other cell types and might target non-NCC genes (Vegfa?). This does not reduce the value of the data, but it would be good to mention for the community.
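      The randomization test suggested above could be sketched roughly as follows. This is a minimal illustration only: the gene names, TSS coordinates, peak positions, and helper-function names are all hypothetical, and a real analysis would operate per chromosome on the actual ChIP-seq peaks and the regulons extracted from the published single-cell data.

```python
import random

def nearest_peak_distance(tss, peaks):
    """Distance from a gene's TSS to the nearest ChIP-seq peak (same chromosome assumed)."""
    return min(abs(tss - p) for p in peaks)

def mean_proximity(genes, tss_map, peaks):
    """Average TSS-to-nearest-peak distance over a gene set."""
    return sum(nearest_peak_distance(tss_map[g], peaks) for g in genes) / len(genes)

def permutation_p_value(regulon, all_genes, tss_map, peaks, n_perm=1000, seed=0):
    """One-sided p-value that the regulon's genes lie closer to the peaks
    than size-matched random gene sets drawn from the whole dataset."""
    rng = random.Random(seed)
    observed = mean_proximity(regulon, tss_map, peaks)
    null = [mean_proximity(rng.sample(all_genes, len(regulon)), tss_map, peaks)
            for _ in range(n_perm)]
    # add-one correction keeps the p-value strictly positive
    return (1 + sum(d <= observed for d in null)) / (1 + n_perm)
```

      A small p-value for the delamination/migration regulons, but not for randomly selected regulons, would support a specific association between the TWIST1-related binding sites and those gene modules.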

      3) Figure 2: Twist1-/+ Chd8-/+ appears twice in panel B (although the embryos look different); the authors most likely meant to show Twist1-/+ Chd7-/+ in the second case. If so, the authors should also show the phenotype of the Chd7 KO.

      4) The authors write: "Impaired motility in Twist1, Chd8 and Whsc1 knockdowns was accompanied by reduced expression of EMT genes (Pdgfrα, Pcolce, Tcf12, Ddr2, Lamb1 and Snai2) (Figure 6D, S3D) and ectomesenchyme markers (Sox9, Spp1, Gli3, Klf4, Snai1), while 375 genes that are enriched in the sensory neurons located in the dorsal root ganglia (Ishii et al., 2012) were upregulated (Sox2, Sox10, Cdh1, Gap43; Figure 6E)."

      • From the list of genes characterizing EMT, I can agree only on Pdgfra and Snai2; the rest are nonspecific for EMT and appear either ubiquitous or specific to other, non-EMT cell populations.
      • From the list of suggested ectomesenchyme markers, I cannot pick any gene that would be at all specific for ectomesenchyme (within the neural crest lineage) except Snai1. Sox9 is also broadly expressed in the trunk neural crest, Spp1 and Klf4 are not expressed in early mouse ectomesenchyme, and Gli3 is too broad and non-selective. I suggest selecting other gene sets (check the expression with the online PAGODA app from Soldatov et al.): http://pklab.med.harvard.edu/cgi-bin/R/rook/nc.p63-66.85-87.dbc.nc/index.html
      • The choice of DRG genes is also suboptimal: Sox10 is pan-NCC; Sox2 is expressed in the early migrating crest, in satellite glial cells of the DRG, and in Schwann cell precursors; and Gap43 and Cdh1 are not specific enough. These genes rather suggest the beginning of neuro-glial fates or a trunk neural crest bias. To claim sensory neurons specifically, the authors should use proneuronal genes such as the neurogenins, NeuroD, Isl1, Pou4f1, the Ntrk genes, and many others.

      Still, overall, I agree with the authors' main conclusions.

      5) The authors write: "The genomic and embryo phenotypic data collectively suggest a requirement of TWIST1-chromatin regulators in the establishment of NCC identity in heterogeneous neuroepithelial populations". Again, I do not think the authors can claim anything related to the establishment of NCC identity. NCC identity, in the broad sense, includes NCC induction within the neural tube at both trunk and cranial levels. In mice, Twist1 is not expressed in trunk NCCs at all. At the cranial level, Twist1 is expressed too late to be an NCC-inducing or patterning gene; as I mentioned earlier, it comes up during delamination.

      6) Figure 7G only partly corresponds to the positioning of the NCC markers in a mouse embryo. Id1 and Id2 are broadly expressed throughout all phases of NCC development and in the entire dorsal neural tube beyond the NC region. Mentioning Otx2 as an NCC specifier is strange. At the same time, Msx1, Msx2 and Zic1 are excellent genes! Tfap2 is a bit late, but still acceptable. Please keep in mind that Msx1/2 and Zic1 are expressed before Twist1, and thus Twist1 can be downstream of this gene expression program. Also, these genes become downregulated quite soon upon delamination, whereas Twist1/Chd7/8 expression persists (in vivo). The expression pattern of Tfap2a corresponds better to Twist1, although Tfap2a comes up slightly before Twist1 and, besides, is expressed independently of Twist1 in the trunk NCC. Despite such divergence in gene expression, Twist1-based networks might provide positive feedback loops that stabilize the expression of transcriptional programs originally induced by other factors. It would be good to mention this to the readers: this "stabilizing role" of the Twist1 network could be a really important one, and given the incremental and combinatorial nature of the phenotype in vivo, this is most likely the case. I believe these points are important to reflect in the discussion section.

      Reviewer #2:

      This manuscript, by Fan et al., is a comprehensive look into the bHLH protein TWIST1 and its interacting proteins in neural crest cell differentiation. The study employs an unbiased screen in which a TWIST1-BirA fusion is used in conjunction with biotin labeling to collect TWIST1 transcriptional complexes (BioID proximity labeling, TWIST1-CRMs). The work appears carefully done, and the impact of this study is high given that NCCs are key players in craniofacial and cardiac developmental defects. The association of TWIST1 with the chromatin helicases CHD7 and CHD8 is important to understand, as numerous TWIST1 loss-of-function studies indicate that it is clearly required for normal NCC function.

      The NCC line O9-1 is used to collect the data, and genetic interactions between Twist1, Chd7, Chd8 and Whsc1 are tested in genome-edited ESCs. Overall, this is a well-executed, interesting and important study.

      Reviewer #3:

      Using BioID, the authors identified more than 140 proteins that potentially interact with the transcription factor TWIST1 in a neural crest cell line. Most of these ~140 TWIST1 interactors do not overlap with the 56 known TWIST1 binding partners in neural crest cell development (see below). By focusing on several strong TWIST1 binding-partner candidates (particularly the novel candidate CHD8), the authors found:

      1) Twist1 interacts with these proteins via its N-terminal protein domain as demonstrated by co-IP.

      2) Compound heterozygous mutation of Chd8, Chd7 or Whsc1 together with Twist1 produced more severe phenotypes than heterozygous mutation of Twist1 alone, for example a more pronounced reduction in cranial nerve bundle thickness.

      3) ChIP-seq analysis of TWIST1, CHD8 and key histone modifications revealed that the binding of CHD8 strongly correlates with that of TWIST1 at active enhancers that are also marked by H3K4me3 and H3K27ac.

      4) The binding of CHD8 requires the binding of Twist1, but not vice versa.

      5) The TWIST1-CHD8 regulatory module represses neuronal differentiation, promotes neural crest cell migration, and potentially promotes differentiation into non-neuronal cell types.

      The authors use an impressive array of different techniques, both in vitro and in vivo, which yield consistent results. The manuscript is nicely written. The findings are nuanced, but the major conclusions are largely expected.

      Critiques:

      • As the title states, the three key TWIST1-interacting factors on which most of the study focuses are chromatin regulators. However, the consequences of mutating these factors at the epigenetic level were not directly addressed, including the levels of active histone modifications, the accessibility of the TWIST1/CHD co-bound promoters/enhancers, and the positions of nucleosomes.
      • CRISPR-generated ESCs and chimera technology were used effectively to generate mutants. In comparison, the analysis of the phenotypes was rather cursory and would benefit from more in-depth molecular analysis. In particular, the altered genes found in mutant NECs and NCCs in the last section of the study should be validated in the mutants.
      • Throughout the manuscript, there are jumps from NCCs to NECs and back. It will be important to justify why a particular cell type is selected for each analysis, with the biological question at hand in mind.
      • Using BioID, the authors detected 140 different proteins that interact with TWIST1, yet only 4 of them overlap with the 56 known TWIST1 partners (Figure 1A). This result suggests that BioID identified an almost entirely distinct set of TWIST1-interacting proteins compared with the published results. The authors need to discuss this discrepancy and its underlying reasons.
      • The authors show that TWIST1 colocalizes with CHD8 and is required for the binding of CHD8, suggesting that TWIST1 and CHD8 form a regulatory module. Given the degenerate nature of bHLH factor binding motifs, it is likely that the binding of TWIST1, and subsequently that of CHD8, is dictated by other transcription factors. Therefore, a motif enrichment analysis should be performed on the TWIST1/CHD8 co-bound sites, and the resulting motifs compared with those enriched at TWIST1-only and CHD8-only binding sites.
      • The increased expression of DRG neuron genes in the Twist1/Chd8 mutants suggests a possible transition from cranial NC to trunk NC identity. The authors should therefore examine the expression of the corresponding marker genes.
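      The motif comparison suggested above could be illustrated minimally as follows. The peak sequences are hypothetical, and a real analysis would run a dedicated motif-enrichment tool (e.g., HOMER or MEME) on the actual peak sets; here a simple frequency count of one candidate motif (CACCTG, the canonical E-box) stands in for a proper enrichment statistic.

```python
def motif_frequency(sequences, motif):
    """Fraction of peak sequences containing the motif at least once
    (a crude stand-in for a proper motif-enrichment statistic)."""
    return sum(motif in seq for seq in sequences) / len(sequences)

def compare_peak_classes(co_bound, twist1_only, chd8_only, motif):
    """Motif frequency in co-bound versus single-factor peak sets."""
    return {
        "TWIST1+CHD8": motif_frequency(co_bound, motif),
        "TWIST1-only": motif_frequency(twist1_only, motif),
        "CHD8-only": motif_frequency(chd8_only, motif),
    }
```

      A co-factor motif that is over-represented in the co-bound class relative to both single-factor classes would support the idea that additional transcription factors dictate where the TWIST1-CHD8 module binds.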
      Summary: The reviewers felt that the possibility that the pain estimation was simply a magnitude estimation of heat (even heat pain) could not be ruled out. One of the beauties of the pain percept is that the same percept can be reached with a large variety of stimulus modalities, and this was not exploited in this manuscript; there is thus nothing to disabuse one of the idea that what is modeled is heat, or even heat pain, but not pain per se. The reviewers were also concerned that the variability of the included studies and of the individuals therein was ignored: right/left side, stimulus location, and of course the experimental manipulation. Finally, it is not a given that every individual accomplishes pain perception in the same way. Thus, while individual variability may undermine the reliability of the results, it could also reflect a genuine biological possibility, one which the authors do not address.

      Reviewer #1:

      This meta-analysis aims to resolve once and for all the debate surrounding how pain is represented in the brain. The authors take us one step closer, finding that multi-system and whole-brain models outperform modular models (single locales or single networks). They do not see an advantage of the whole-brain model over multi-system possibilities; however, as they explain in the Discussion, this may be due to technical liabilities in the evaluation of whole-brain models.

      A major concern is that all of the studies used thermal stimulation. In contrast to this homogeneity of stimulus, the manipulations varied widely but did not include, for example, vicarious pain. If pain report is the variable to be explained, studies without a somatosensory stimulus would seem particularly informative.

      One other comment: an underlying assumption here is that individuals use the same brain circuits to interpret and report pain. This may not be warranted. Certainly, in reductive systems where this can be, and has been, rigorously studied (e.g., the stomatogastric ganglion), a consistent finding is that different individuals reach the same endpoint using different circuit mechanisms.

      Reviewer #2:

      The authors tackle an important topic, namely the scale at which pain is represented in the human brain, based on fMRI activity collected in 7 studies and more than 300 subjects. The statistical approach seems robust and adequate, as more than 45 different models compete with each other. However, the study lacks any controls, and it remains questionable whether the authors are actually modeling pain or simply magnitude evaluation. The main concerns are expounded below:

      1) The study is based on a convenience data set and as such is not designed to properly address the question.

      2) Although the authors purport to model pain perception, in fact they are simply modeling the evaluation of the magnitude of a stimulus, which may not even be painful in the lowest quartile of the magnitude range to be predicted. The study thus lacks the critical control of a simple magnitude-estimation task. It is quite likely that the extended brain regions and networks identified all relate to magnitude assessment rather than to pain perception.

      3) Additionally, one would need to see a contrast between nociceptive stimuli and at least one other sensory modality, for example touch, to demonstrate that the observed networks are in fact specific to pain rather than common to any sensation.

      4) The diversity of the data sets remains worrisome, as it most likely simply adds unaccounted variance.

      5) The report remains far too technical and does not convince the reader that the authors have properly untangled the complex issue at hand.

      Reviewer #3:

      General assessment:

      The manuscript is well written and the results are clearly presented. The methods section of this study is among the most comprehensive of fMRI MVPA papers, and the statistical procedures taken to ensure the validity of the models will be an extensive guide for similar future studies. However, as the methods section is fairly dense, the narrative of the article can be difficult to follow at times. Overall, the manuscript will be of interest and relevance to readers.

      Concerns:

      1) Despite the large sample size and careful statistical validation, the data preparation step of this study, in particular the decision to average single-trial GLM brain maps within pain-intensity quartiles within individuals, may cast some doubt on the conclusions. While this step was necessary for computational tractability, it effectively reduced each participant's data to four brain maps for model training (Figure 2B-C). As far as I understand, this manipulation is likely to smooth out most effects contributed by non-temperature experimental factors, owing to the trial permutation. In addition, it further reduces the temporal resolution of evoked pain to 'snapshots' of several pain intensities. While the remainder of the study carefully compared modular and multi-system representations of pain, the study seems incomplete without a discussion of how this data manipulation might affect the conclusions, or of how the resulting biases can be acknowledged and mitigated. For example, a modular representation of pain could be the superior representation for a particular cognitive-manipulation paradigm, or for a specific time window or time point during an extended pain experience, and these possibilities cannot be excluded based on the present evidence.
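      The data-reduction step discussed above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the trial counts, map sizes and names are made up, and the real analysis may handle quartile boundaries and tied ratings differently.

```python
import numpy as np

def quartile_average(trial_maps, ratings):
    """Average single-trial beta maps within pain-rating quartiles.

    trial_maps: array of shape (n_trials, n_voxels)
    ratings:    array of shape (n_trials,)
    Returns an array of shape (4, n_voxels), one averaged map per quartile,
    so each subject contributes only four observations to model training.
    """
    edges = np.quantile(ratings, [0.25, 0.5, 0.75])
    labels = np.digitize(ratings, edges)  # quartile label 0..3 per trial
    return np.vstack([trial_maps[labels == q].mean(axis=0) for q in range(4)])
```

      The averaging makes explicit what is lost: any trial-level variation within a quartile, whether driven by the cognitive manipulation or by the time course of the pain experience, is collapsed before model fitting.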

      2) In addition, as the authors mention, between-subject variance is not considered in the present analysis, although it appears to contribute a large amount of the pain-intensity variance (Figure 1B). It would be great if the authors could discuss the implications of the results in this context, and how MVPA methods could be used to study these effects.