10,000 Matching Annotations
  1. Feb 2023
    1. ChatGPT appears to be good at helping programmers spot and fix errors in their code.

      This resonates with my personal experience with ChatGPT: when I was playing around with it, I asked it questions to help me with my homework. It does indeed catch errors, not just in coding but in other fields too. My experience matches what I see in the piece, because I saw firsthand how ChatGPT can find errors and correct work.

    1. If you would like VS Code to remember any ports you have forwarded, check Remote: Restore Forwarded Ports in the Settings editor (Ctrl+,) or set "remote.restoreForwardedPorts": true in settings.json.

      Setting for remembering temporarily added port forwards.

    1. Use of Morse Code as a standardized structure for the language of telegraphic dispatches meant that telegrams, in one sense, were a classless language.

      Does this mean Morse code was the same for all classes? I was not aware that language varied so much between different classes.

    1. The head of the institution sets the agenda and the dates and times of the meetings of the board of governors, taking into account, under any other business, the requests for agenda items sent to them by board members. They send out the notices of meeting, together with the agenda and preparatory documents, at least eight days in advance; this period may be reduced to one day in an emergency.
    1. Here is the template I use for any Zettelkasten-related note:

      ````
      <%*
      const fileName = await tp.system.prompt("File Name");
      const fileType = await tp.system.suggester(
        ["🌱", "🌿", "🌞", "🌲", "🧒", "🗺️"],
        ["seed", "fern", "incubating", "evergreen", "orphan", "moc"]
      );
      await tp.file.rename(fileName);
      let filePath = "100 Zettelkasten/" + fileName;
      let mocQuery = "";
      switch (fileType) {
        case 'moc':
          filePath = "100 Zettelkasten/120 MOC/" + fileName;
          mocQuery = '```dataview\nLIST\nFROM "100 Zettelkasten"\nWHERE contains(Topics,[[' + fileName + ']])\n```';
          break;
        case 'seed':
          filePath = "100 Zettelkasten/110 Zettelkasten Inbox/" + fileName;
          break;
      }
      await tp.file.move(filePath);
      %>
      ---
      aliases:
      tags: zettelkasten/<% fileType %>
      ---
      Topics::
      References::

      # <% fileName %>

      ---
      <% mocQuery %>
      ````

      An interesting bit of code that could let me have a single template to create a note or a project or a MOC. I could replace 3 of my current templates with a single one, and reduce the number of special hotkeys too.
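      For anyone wanting to sanity-check the routing rules outside Obsidian, here is a rough Python mirror of the template's `switch`. It is only a sketch. `route_note`, `BASE`, and `FENCE` are hypothetical names, the folder names are copied from the template, and it does not call Templater (`tp.file.move` and the other `tp.*` calls are left out).

```python
# Hypothetical Python mirror of the Templater routing logic (not Templater itself).
BASE = "100 Zettelkasten"
FENCE = "`" * 3  # built this way so the Dataview fence doesn't clash with Markdown


def route_note(file_name: str, file_type: str) -> tuple[str, str]:
    """Return (file_path, moc_query) for a note of the given type."""
    file_path = f"{BASE}/{file_name}"  # default: fern, incubating, evergreen, orphan
    moc_query = ""
    if file_type == "moc":
        file_path = f"{BASE}/120 MOC/{file_name}"
        # Dataview query listing notes whose Topics field links to this MOC
        moc_query = (
            f'{FENCE}dataview\nLIST\nFROM "{BASE}"\n'
            f"WHERE contains(Topics,[[{file_name}]])\n{FENCE}"
        )
    elif file_type == "seed":
        file_path = f"{BASE}/110 Zettelkasten Inbox/{file_name}"
    return file_path, moc_query


print(route_note("Spaced repetition", "seed")[0])
# 100 Zettelkasten/110 Zettelkasten Inbox/Spaced repetition
```

      Keeping the routing in one function shows why one template can replace three: only the destination folder and the optional MOC query vary by note type.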

    1. ```python
       if x < 10:
           print("a")
           if x < 0:
               print("b")
       else:
           print("c")
       ```

      How does this code work? It has two "if" statements, and only one "else" statement. Where is the "else" statement for the "if" statement that's nested in the other one? Wouldn't it produce an error?
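      The short answer is that Python does not require every `if` to have an `else`. The original indentation was lost, so the sketch below assumes the `else` belongs to the outer `if x < 10`. It wraps the snippet in a hypothetical `classify` function that collects output instead of printing it, which makes the behavior easy to check:

```python
def classify(x):
    """Return what the snippet would print for x, as a list of strings.

    Assumes the `else` pairs with the outer `if x < 10`; the inner
    `if x < 0` simply has no `else`, which is valid Python.
    """
    out = []
    if x < 10:
        out.append("a")
        if x < 0:          # nested if with no else: allowed
            out.append("b")
    else:                  # runs only when x >= 10
        out.append("c")
    return out


for x in (-5, 5, 15):
    print(x, classify(x))
# -5 ['a', 'b']
# 5 ['a']
# 15 ['c']
```

      If the `else` were instead indented under the inner `if`, the output for 5 would become `['a', 'c']`. Either way, there is no error.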

    1. Reviewer #1 (Public Review):

      This manuscript studies the representation by gender and name origin of authors from Nature and Springer Nature articles in Nature News. The representation of author identities is an important step towards equality in science, and the authors found that women are underrepresented in news quotes and mentions with respect to the proportion of women authors.

      Strengths:

      The research is rigorously conducted. It presents relevant questions and compelling answers. The documentation of the data and methods is thoroughly done, and the authors provide the code and data for reproduction.

      Weaknesses:

      The article is not very clearly structured, which makes it hard to follow. Better framing, contextualization, and conceptualization of the analysis would help readers understand the results. Some key concepts are unclearly defined or incorrectly worded.

    2. Reviewer #2 (Public Review):

      This paper set out to investigate disparities in how authors of scientific papers are quoted in the context of science journalism. Quotations, the authors argue, reveal who a science journalist approaches as a source and thus who is considered an expert. At the same time, quotation in the news legitimizes experts and signals the importance of their perspective and opinions. It is therefore important to identify disparities in quotation, both as a matter of justice and to ensure the representation of diverse viewpoints in journalism.

      Here, the authors investigate disparities in quotation based on the gender and national origin of experts. They focus on science journalism in non-research articles published in the journal Nature. Articles are scraped from the Nature website, and the article content is parsed with established NLP tools for quotations and for the names of the scientists being quoted. The gender and national origin of scientists are inferred from their names and from gendered pronouns used in the text. The rates of quotation by gender/national origin are then compared to the demographics of authors (also inferred) of research articles published in Nature; this establishes a baseline to compare who is quoted vs. who is actually doing research. Based on these data, a variety of analyses are presented showing various aspects of bias and disparity in who is quoted in science journalism.
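      The core comparison can be illustrated with a toy calculation. This sketch assumes each quote and each author has already been assigned an inferred group label. The labels and counts below are made up and are not taken from the paper. The function reports the share of quotes minus the share of authors; a positive gap means a group is over-represented in quotations relative to the author baseline.

```python
from collections import Counter


def representation_gap(quoted: list[str], authors: list[str]) -> dict[str, float]:
    """Share of quotes minus share of authors, per inferred group label."""
    q, a = Counter(quoted), Counter(authors)
    groups = set(q) | set(a)
    # Counter returns 0 for labels that appear in only one of the two lists
    return {g: q[g] / len(quoted) - a[g] / len(authors) for g in sorted(groups)}


# Toy data (hypothetical): 7 of 10 quotes vs. 6 of 10 authors inferred as men
quoted = ["man"] * 7 + ["woman"] * 3
authors = ["man"] * 6 + ["woman"] * 4
gap = representation_gap(quoted, authors)
print({g: round(v, 3) for g, v in gap.items()})  # {'man': 0.1, 'woman': -0.1}
```

      Because both sets of shares sum to one, the gaps across groups always sum to zero. Over-representation of one group is mirrored by under-representation elsewhere.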

      From their analysis, the authors make the following claims:

      • Authors inferred as men were over-represented in quotations in journalistic Nature articles relative to their share of first and last authors in Nature.

      • Quotation is sharply trending towards gender parity, with variation by article type.

      • Authors with names inferred as originating from Celtic/English regions were over-represented, whereas authors with names inferred as originating from East Asia were heavily under-represented in quotations.

      • The representation of authors with inferred East Asian names has increased faster among last authors of research articles in Nature than in journalistic quotations.

      Claims 2-4 are solidly supported by the evidence presented in the manuscript. Claim 1 is supported by the evidence, but with some caveats. Support for Claim 1 depends on whether Nature's first or last authors are the most appropriate comparison set; if the last authors are the most appropriate, then Claim 1 only holds for 2005 through 2010. I expand on this point below.

      I praise the manuscript and the authors for their commitment to reproducibility. Supplied with the paper are all the data (where possible) and the code necessary to reproduce the results, as well as a Docker image that ensures it can be re-executed far into the future.

      The analyses conducted are methodologically rigorous. The authors provide bootstrapped confidence intervals for all analyzed values, choose appropriate baselines, and validate their name inference approach. In addition, I found their analysis comprehensive. By this I mean that they sufficiently explored their data to support their claims; nearly every caveat or limitation I could think of while reading was appropriately addressed in either a main or a supplemental figure or table.
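      For readers unfamiliar with the technique, a percentile bootstrap for a proportion can be sketched with the standard library alone. An example of such a proportion is the share of quotes going to a given group. The function name, the 2,000 resamples, and the 30-of-100 data are illustrative assumptions. The manuscript's own procedure may differ in its details.

```python
import random


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 indicators (a proportion)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical: 30 of 100 quotes attributed to a given group
values = [1] * 30 + [0] * 70
lo, hi = bootstrap_ci(values)
print(round(lo, 2), round(hi, 2))
```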

      While this is a good paper, it is not without weaknesses. The paper is generally well-written, and the visualizations do a good job of communicating results. There is, of course, room to improve on both. In some cases, the manuscript lacks consistency in terminology and uses strange word choices (e.g., "enrichment" and "depletion" when discussing representation). While this paper is methodologically rigorous and professional in its presentation, I feel that the authors could have done a better job of interpreting and contextualizing their findings. Specifically, readers should be aware of the caveats regarding Claim 1 (listed above), the limits of generalizing these findings to other areas of science journalism, and a somewhat shallow discussion section that I believe detracts from the study's significance. I outline these points in more detail below.

      Despite these quibbles, the authors find solid support for their claims and achieve their goals. This paper, I believe, will be of general interest to scientists and science communicators, to those interested in science communication as a field, to meta-scientists, and to those aiming to improve diversity and equity in the scientific process.

      Caveats to Claim 1:

      One of the claims made by the authors (Claim 1) is that quotations in the dataset skew towards men. I find this true, but with two related caveats: that it depends on the choice of comparator set, and that it changes over time.

      The authors assess the representation of quotation by comparison to either Nature's first authors or its last authors. However, the authors do not discuss whether one is more appropriate, and what is implied if, say, quotations match the last authors but not the first authors. In most scientific fields, the last author corresponds to the conceptual lead of a paper and is often the corresponding author, who is most likely to be contacted to discuss the paper's significance. First authors, in contrast, often represent the "driver" of the project, basically the person doing most of the actual work, who is usually a student or more junior researcher. This distinction is important because a case could be made for either being the more appropriate comparator: last authors due to their seniority, first authors due to their closeness to the study and (typically) greater diversity.

      The choice of comparator set becomes an issue because, as per Claim 2, the representation of women is increasing over time. Relative to last authors, Claim 1 only holds from 2005 through 2010, and after 2018 women are over-represented in quotations given the demographics of last authors. Relative to first authors, Claim 1 holds through 2017, after which quotations are representative of, or slightly over-represent, women authors.

      So while Claim 1 holds, it does not hold for all comparator sets and all years. I don't think this is a criticism of the paper (the authors do discuss the trend in Claim 2), but interpretation of this claim should take these caveats into account, and readers should consider the important differences between first and last authorship.

      Generalizability to other contexts of science journalism:

      Journalistic articles in Nature may not be representative of all contexts of science journalism. Nature has a unique readership, consisting of scientists from many disciplines who have not only a generalist interest in science but also an interest in aspects of science as a profession. Science journalism as a whole, however, is part of the broader landscape of mainstream media, consisting of outlets such as ABC, BBC, and Scientific American. The audiences for these outlets will be more general, less interested in science as a career, and will likely have a different appetite for direct quotations and for more technical topics.

      This does not make the study bad. On the contrary, the authors' focus on Nature allowed for many interesting analyses, but their findings should still be understood as coming from a specific context. While the authors outline many limitations of their study, they do not grapple with the limits of its generalizability, and what aspects of their analysis might translate to other contexts of science journalism. For example, part of the trend towards gender parity in quotation is explained by the higher representation of women in the "Career Feature" article type. However, this article type will likely not be present in more general-interest contexts, which would affect the representation of women.

      Shallow discussion:

      I feel that the authors missed an opportunity to use their discussion to not only properly contextualize their results, but also explore their significance. In broad terms, there is literature on science journalism, its consequences for science, and the impact on public perceptions, as well as a continuous meta-discourse on journalistic ethics and best practices. The authors pay lip service to some of these themes but do little to actually place their findings in the broader discourse. Below, I provide a few specific points that could be further discussed:

      What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names?

      The authors highlight gender parity in career features, but why exactly is there gender parity in this article format?

      Representation in quotations varies by first and last author, almost certainly as a result of the academic division of labor in the life sciences. However, what does it say about scientific quotation that first authors appear to be quoted more often? Does this mean that the division of labor is changing such that first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or are giving away their chance to comment on a study to the first author?

      Moreover, there are several findings in the study which are notable but don't seem to have been mentioned at all in the discussion.

      Below I highlight a few:

      • According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications.

      • Those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?

    1. Are there symbols for 'supported by' or 'contradicted by' etc. to show not quite formal logical relations in a short hand?

      reply to u/stjeromeslibido at https://www.reddit.com/r/Zettelkasten/comments/10qw4l5/are_there_symbols_for_supported_by_or/

      In addition to the other excellent suggestions: I don't think you'll find anything specific that was used historically for these, but there are certainly lots of old annotation symbols you might be able to co-opt for your personal use.

      Evina Steinová has a great free cheat sheet of annotation symbols: The Most Common Annotation Symbols in Early Medieval Western Manuscripts (a cheat sheet).

      More of this rabbit hole:

      (Nota bene: most of my brief research here only extends to Western traditions, primarily in Latin and Greek. Obviously other languages and eras will have potential ideas as well.)

      Tironian shorthand may have something you could repurpose as well: https://en.wikipedia.org/wiki/Tironian_notes

      Some may find the auxiliary signs of the Universal Decimal Classification useful for some of these sorts of notations for conjoining ideas.


      Given the past history of these sorts of symbols and their uses, perhaps it might be useful for us all to aggregate a list of common ones we all use as a means of re-standardizing some of them in modern contexts? Which ones does everyone use?

      Here are some I commonly use:

      Often for quotations, citations, and provenance of ideas, I'll use Maria Popova and Tina Roth Eisenberg's Curator's Code:

      • ᔥ for "via" to denote a direct quotation/source: something found elsewhere and written with little or no modification or elaboration (reformulation notes)
      • ↬ for "hat tip" to stand for indirect discovery: something for which you got the idea at a source, but modified or elaborated on significantly (inspiration by a source, but which needn't be cited)

      Occasionally I'll use a few nanoformats from the microblogging space, particularly:

      • L: to indicate location

      For mathematical proofs, in addition to their usual meanings, I'll use two symbols to separate the two directions of a biconditional (necessary/sufficient conditions):

      • (⇒) as a heading for the "only if" portion of the proof
      • (⇐) for the "if" portion

      Some historians write 19c to indicate the 19th century; I'll often abbreviate using Roman numerals instead, so "XIX".

      Occasionally, I'll also throw drolleries or other symbols into my margins to indicate idiosyncratic things that may only mean something specifically to me. This follows in the medieval traditions of the ars memoriae, some of which are suggested in Cornwell, Hilarie, and James Cornwell. Saints, Signs, and Symbols: The Symbolic Language of Christian Art, 3rd ed. Church Publishing, Inc., 2009. The modern-day equivalent of this might be the use of emoji with slang meanings or 1337 (leet) speak.

    1. So they greenlight the same old crap, imitations of what's on the list this month, simply to cover their own quivering asses.

      The same happens with everything nowadays. Capitalism destroys innovation for profit. It is especially visible in fields where production takes more people & time, and games are the most obvious one. Considering the drive for success & financial profit, it unfortunately makes sense that creativity takes a back seat. When our Game Studio 2 group was trying to figure out our unique time mechanic, we got lost in so many places, from the code implementation to the animation system. In a full production setting, all of this makes it impossible to estimate & calculate a budget. At the same time, team members need money to pay rent and eat, and making bold creative choices is just scary and risky. I'm very inspired by creators who make what they want without tying themselves to trends (like the German studio Honig, for example), but it's hard and probably requires some kind of governmental support, which is impossible in many countries.

    1. The Razorpay Activation/Operations team checks whether the selected Purpose Code is correct or not.

      This should ideally not be in the "Watch out" callout; it can be part of the 6th step.

    2. How it Works

      The video and the steps don't match: the video talks about how to update a purpose code, while the steps are about how to get onboarded. Split this into 2 sections for clarity. The process is confusing after the 3rd step.

  2. Jan 2023
    1. y[i] = x[i] + y[i];

      How is that supposed to be correct? There is no way of knowing whether i will stay within the bounds of both x and y.

      cudaMallocManaged must be doing some kind of memory space management for this piece of code to be correct.
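      One way to reason about this question is to look at where the bound comes from. In the usual grid-stride pattern, the bound is a loop condition such as `i < n`, not `cudaMallocManaged`. This is an assumption here, since the quoted line does not show the surrounding loop. `cudaMallocManaged` only allocates memory accessible from both host and device and does no bounds checking. The Python sketch below simulates that indexing on the CPU, using a hypothetical `grid_stride_indices` in place of the kernel. It shows that every index in `[0, n)` is touched exactly once and none beyond it.

```python
def grid_stride_indices(n, block_dim, grid_dim):
    """Simulate which indices CUDA-style threads touch in a grid-stride loop.

    Mirrors the common (assumed) kernel shape:
        index  = blockIdx.x * blockDim.x + threadIdx.x
        stride = blockDim.x * gridDim.x
        for (i = index; i < n; i += stride) y[i] = x[i] + y[i];
    """
    touched = []
    stride = block_dim * grid_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            index = block_idx * block_dim + thread_idx
            # range() stops before n, exactly like the `i < n` loop condition;
            # threads with index >= n get an empty range and do nothing
            touched.extend(range(index, n, stride))
    return touched


touched = grid_stride_indices(1000, block_dim=256, grid_dim=2)
print(len(touched), max(touched))  # 1000 999
```

      The same holds when the grid is oversized (for example, 4 blocks of 256 threads for 1,000 elements). The extra threads simply skip the loop.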

    1. The interface should allow users to see and edit a project in a view that shows them what the final product will be. In that case, the students do not need to memorize the complicated codes and can focus on their work

      This is very important in determining whether I will use a program. If it takes a lot of time to figure out the code to make it look pleasing, then I will probably not use it. Google Sites does a pretty good job with this. The assignment where we had to embed code into it was easy and simple to do: what we saw on the screen was how it would be published. This makes things easier and is one less barrier to using something new. Wix is another tool that is pretty good at this. The goal is not to learn coding skills to create a webpage, but to actually create a webpage that you want to use and share.

    1. writing and language too often perpetuate white, Eurocentric ideologies about what it means to write “well” or “effectively”

      This relates back to last semester, when we learned about "code switching" and what it actually means to be literate.

    1. HSN Code: Harmonized System of Nomenclature code of a particular class of item/service under GST. HSN code is 8 characters long. SAC Code: Service Accounting Codes (SAC) of a particular class of item/service under GST. The length of a SAC code can range from 2 to 6 characters.

      India-centric.
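      A minimal length check based only on the rules quoted above might look like the sketch below. It assumes the codes are purely numeric, since the quote only says "characters". `is_valid_length` is a hypothetical helper. Real GST reporting requirements may differ, so treat this as an illustration of the quoted rules rather than a compliance check.

```python
def is_valid_length(code: str, kind: str) -> bool:
    """Check an HSN or SAC code's length against the quoted rules.

    Assumption: codes are numeric strings; surrounding whitespace is ignored.
    """
    code = code.strip()
    if not code.isdigit():
        return False
    if kind == "HSN":
        return len(code) == 8          # quoted: HSN code is 8 characters long
    if kind == "SAC":
        return 2 <= len(code) <= 6     # quoted: SAC code is 2 to 6 characters
    raise ValueError(f"unknown code kind: {kind!r}")


print(is_valid_length("12345678", "HSN"), is_valid_length("1234567", "SAC"))
# True False
```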

    1. Author Response

      Reviewer #1 (Public Review):

      The authors optimize a live cell imaging method based on the detection of FAD/NAD(P)H, adapted from the fast-growing field of live metabolic imaging. They build upon a method described by Kreiß et al. 2020 that used metabolic ratio and collagen fiber second harmonic generation imaging. They follow up by combining metabolic imaging with morphologic measurements to train a machine-learning model that is able to identify cell types accurately. Upon visualization, the authors detected structures hypothesized and then proven to resemble the "goblet cell associated antigen passages" previously studied in intestinal epithelia.

      STRENGTHS

      • The manuscript is succinct, well written, and overall done rigorously.

      • The optimization of the method at multiple levels to the point of identifying both common and rare cell types is impressive.

      • Describes the elegant implementation of a sorely needed method in epithelial biology.

      • Provides an approach to studying the cholinergic response in epithelial cells, a poorly understood phenomenon despite broad clinical use for diagnosis and treatment.

      WEAKNESSES

      A) For what is in large part a methods-development paper, the methods are not explained or shared in a manner that facilitates reproducibility. For example:

      A.1.) The training and validation datasets seem to come from the same sample (or the source is not clearly described). Therefore, it is not clear whether the "96% accuracy" refers to accuracy within the sample measured, or whether it can extrapolate to other samples.

      To avoid any confusion, we clarify that the machine learning training and validation data sets come from the same sample. We split the total data set into 2 separate subsets for this purpose. This is laid out in the text as follows:

      “In order to assess the performance of machine learning algorithms designed to distinguish cell types, we divided our data set into training and testing subsets. We utilized 75% of the total cells (154 cells) for machine learning training, leaving 25% (52 cells) for subsequent validation.”
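      A 75/25 split like the one quoted can be sketched with the standard library. Here 206 stands for the total cell count implied by 154 + 52. The shuffle seed and the `split_cells` name are assumptions. Rounding the validation size up mirrors what scikit-learn's `train_test_split` does for a fractional `test_size`. Note that a split like this stays within one sample, so on its own it does not address the Reviewer's question about other samples.

```python
import math
import random


def split_cells(cell_ids, test_fraction=0.25, seed=0):
    """Shuffle cell IDs and split them into (training, validation) lists.

    The validation size is rounded up, e.g. 0.25 * 206 = 51.5 -> 52 cells.
    """
    ids = list(cell_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_test = math.ceil(test_fraction * len(ids))
    return ids[n_test:], ids[:n_test]


train, test = split_cells(range(206))
print(len(train), len(test))  # 154 52
```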

      A.2.) It is unclear whether the model needs to be re-trained within each new sample measured, or if it's applicable to others. This has implications for method adoption by others. Either way is useful but needs to be clarified.

      This is a very interesting point, and one that we further clarify in the Discussion, noting that in both diseased and non-diseased states the model needs to be re-trained for each particular experimental regime.

      A.3.) Code was only listed in a PDF file, which makes reproducing the analysis very cumbersome.

      We hope that others can use the code written for this methodology, and we have uploaded it to a publicly available GitHub repository:

      https://github.com/vss11/Label-free-autofluorescence

      B) Whereas the optimization to improve cell type detection is very well described, the implementability of the approach could benefit from exploration (using the data already obtained) of the minimal set of measurements needed to identify cell types. For example, is the FAD/NAD(P)H ratio necessary? Or could just morphologic measurements achieve the same goal?

      This is an excellent point, and we appreciate the Reviewer’s suggestion for this analysis. We have added Figure 3 Supplement 5, where we perform modeling without autofluorescence data. This analysis reveals a dramatic reduction in accuracy, with a Matthews correlation coefficient ranging from 0.66 to 0.78. This provides additional justification for the use of autofluorescence for cell type identification: morphologic measurements alone are not sufficient for cell type identification.
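      Since the Matthews correlation coefficient carries this argument, here is a standard-library sketch of its multiclass form (Gorodkin's R_K). To my understanding, this is also what scikit-learn's `matthews_corrcoef` computes. The function and the example labels are illustrative; this is not the authors' code, which is in their GitHub repository. The coefficient ranges from -1 to 1, with 0 meaning no better than chance.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K form).

    Returns 0.0 when the denominator is zero (e.g. only one class predicted).
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
    t_k = Counter(y_true)  # true occurrences of each class
    p_k = Counter(y_pred)  # predictions of each class
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Hypothetical cell-type labels
truth = ["ciliated", "secretory", "basal", "ciliated"]
print(matthews_corrcoef(truth, truth))  # 1.0
```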

      We have also determined the relative contribution of each characteristic to cell type identification by the XGBoost algorithm in Figure 3 Supplement 4, which shows that autofluorescence signatures are among the top contributing characteristics to cell type identification by machine learning.

      C) Whereas the conclusions are overall supported by the data, need small adjustments in some cases:

      C.1.) For example, P3L80: Claims autofluorescence imaging is more specific than "functional markers", however, this is done in the setting of a very specific intervention that massively affects a protein often used as a secretory cell marker (CCSP aka SCGB1A1), which is known to be secreted (and depleted) in secretory cells upon stimulation.

      We agree with the Reviewer that secretory cell identification is a prime example where autofluorescence imaging may be superior to conventional staining, specifically due to the point the Reviewer makes regarding CCSP secretion. We discuss this concept in the Discussion, giving examples of CCSP staining being reduced in asthma, COPD, and smokers. It could be that these cells are missed due to depletion of CCSP. Indeed, we clarify that our methodological approach may be less affected by the loss of specific markers that change with cell state. There are, of course, caveats to using this approach in disease states; we elaborate on this further below and add this point to the Discussion.

      C.2.) Relatedly, it is unclear how the method's accuracy would be affected in conditions that affect redox/metabolic state; the approach may be highly affected in inflammation and injury, for example.

      As suggested by the Reviewer, we re-analyzed the data after treatment with Antimycin A + Rotenone and FCCP to determine whether the autofluorescence ratio is sufficiently different to identify ciliated and secretory cells, and included this data in Figure 2 Supplement 1. This is an example where the redox/metabolic state is indeed altered. Although the autofluorescence ratio is affected, it is still useful for cell type identification after the intervention, as ciliated and secretory cells have statistically different ratios.

      However, different disease states, particularly infection and inflammation, may have a more profound effect on autofluorescence signatures. For instance, previous work by Dilipkumar et al., 2019 found changes in autofluorescence over days in repeated measurements in a mouse model of inflammatory bowel disease. Therefore, the cell type identification methodology will likely need to be re-optimized for different experiments and diseased tissues. We include commentary to this effect in the Discussion.

      D) The data used to describe "SAPs" is very cursory.

      To further elaborate on our description of SAPs we have included the following:

      1) SAP formation occurs in secretory cells in both stimulated and unstimulated conditions. We performed additional analysis of Figure 4C and determined that SAP formation does occur at baseline prior to stimulation in 9% of secretory cells. Methacholine addition results in 78% of secretory cells forming SAPs (Figure 4 Supplement 1). We have added Figure 5C to demonstrate that SAP formation occurs in the absence of stimulation and is enhanced after methacholine stimulation.

      2) We demonstrate that SAPs can take up both FITC-dextran and FITC-ovalbumin in Figure 5E and Figure 5 Supplement 2. We also now show that immune cells (CD11c antigen presenting cells) associate with SAPs containing FITC-dextran and FITC-ovalbumin in Figure 5E and Figure 5 Supplement 2. We have expanded the Discussion of SAPs.

      3) We now show 3 video examples and an XZ optical cross section of ALI that demonstrate uptake and secretion of FITC-dextran in Figure 5 Supplemental Videos 1-3 and Figure 5 Supplement 1.

      D.1.) Unclear if FITC dextran uptake occurs in other cells too, or in secretory cells prior to methacholine stimulation, or induced nonspecifically due to epithelia manipulation. Secretory and goblet cells are very sensitive to stimulation and often considered minimal, for example, see the paper by Abdullah et al DOI:10.1007/978-1-61779-513-8_16 in which extreme care had to be applied to prevent any secretion at all.

      Our autofluorescence methodology revealed “voids” of autofluorescence forming in secretory cells, and we focused our experiments on this phenomenon. Based on the Reviewer's question, we generated Figure 5C to better characterize SAP formation. Figure 5C illustrates that SAP formation occurs in both unstimulated and methacholine-stimulated conditions, but is dramatically increased following methacholine stimulation. This is analogous to the behavior of GAPs in the intestine (Knoop et al., 2015). Furthermore, we have reanalyzed Figure 4C to identify SAPs prior to stimulation and found that these structures are present in 9% of secretory cells. After methacholine stimulation, this percentage increases to 78%.

      D.2.) A single image is provided for the SAP timeline (Figure 5C), which appears to be the same cell shown in the supplementary video.

      We now provide numerous example videos and optical XZ cross section of ALI demonstrating SAP uptake and secretion in Supplementary Videos 1-3 and Figure 5 Supplement 1.

      IMPACT AND UTILITY

      This is well-done work with high potential for widespread adoption within the epithelial biology community, particularly if the methods and code are shared in better detail.

      We indeed hope that this methodology can be utilized by others. We have posted analysis code, raw data, MATLAB algorithm, and other necessary files onto a publicly available GitHub link. https://github.com/vss11/Label-free-autofluorescence

      Reviewer #2 (Public Review):

      Shah and colleagues tackle a significant impediment to exploiting tissue culture systems that enable prospective ex vivo experimentation in real time: namely, the ability to identify and track dynamic and coordinated activities of multiple composite cell types in response to experimental perturbations. They develop a clever label-free approach that collects biologically encoded autofluorescence of epithelial cells by 2-photon imaging of mouse tracheal explant culture over 2 days. They report the ability to distinguish 7 cell types simultaneously, including rare ones, by developing a machine-learning approach using a combination of fluorescence and cytologic features. Their algorithm demonstrates high accuracy by the Matthews correlation coefficient when applied to a test set. Lastly, they show the ability of their approach to visualize the dynamic uptake and expulsion of fluorescently tagged dextran by individual secretory cells. Overall, the results are intriguing and may be very useful for specific applications.

      We thank the reviewers for their assessment and indeed hope that the methodology is useful and that the discovery of the dynamics of SAP formation has important implications for airway mucosal immunology.

    2. Reviewer #1 (Public Review):

      The authors optimize a live cell imaging method based on the detection of FAD/NAD(P)H adopted from the fast-growing field of live metabolic imaging. They build upon a method described by KreiB et al 2020 that used metabolic ratio and collagen fiber second harmonic generation imaging. They follow by combining metabolic imaging with morphologic measurements to train a machine-learning model that is able to identify cell types accurately. Upon visualization, authors detected structures hypothesized and then proven to resemble the "goblet cell associated antigen passages" previously studied in intestinal epithelia.

      STRENGTHS<br /> - The manuscript is succinct, well written, and overall done rigorously.<br /> - The optimization of the method at multiple levels to the point of identifying both common and rare cell types is impressive.<br /> - Describes the elegant implementation of a sorely needed method in epithelial biology.<br /> - Provides an approach to studying the cholinergic response in epithelial cells, a poorly understood phenomenon despite broad clinical use for diagnosis and treatment.

      WEAKNESSES<br /> A) For what is in large part a methods-development paper, the methods are not explained or shared in a manner that facilitates reproducibility. For example:<br /> A.1.) The training and validation datasets seem to come from the same sample (or the source is not clearly described). Therefore, it is not clear whether the "96% accuracy" refers to accuracy within the sample measured, or whether it can extrapolate to other samples.<br /> A.2.) It is unclear whether the model needs to be re-trained within each new sample measured, or if it's applicable to others. This has implications for method adoption by others. Either way is useful but needs to be clarified.<br /> A.3.) Code was only listed in a PDF file, which makes reproducing the analysis very cumbersome.

      B) Whereas the optimization to improve cell type detection is very well described, the approach would be easier to implement if the authors explored (using the data already obtained) the minimal set of measurements needed to identify cell types. For example, is the FAD/NAD(P)H ratio necessary? Or could morphologic measurements alone achieve the same goal?
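
One way to approach point B with existing data is a feature ablation: retrain the same classifier on different feature subsets and compare held-out accuracy. The sketch below swaps in a simple nearest-centroid classifier purely for illustration (the manuscript's model may differ); the feature names `redox_ratio` and `area`, the cell-type labels, and all values are hypothetical, and the features are assumed to be already standardized:

```python
import math
from collections import defaultdict


def train_centroids(rows, labels, features):
    """Mean feature vector per cell type, using only the listed features."""
    sums = defaultdict(lambda: [0.0] * len(features))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        for i, name in enumerate(features):
            sums[label][i] += row[name]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, row, features):
    """Assign the cell type whose centroid is closest in Euclidean distance."""
    x = [row[name] for name in features]
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(train_rows, train_labels, test_rows, test_labels, features):
    """Held-out accuracy of a nearest-centroid classifier restricted to `features`."""
    centroids = train_centroids(train_rows, train_labels, features)
    hits = sum(predict(centroids, row, features) == label
               for row, label in zip(test_rows, test_labels))
    return hits / len(test_labels)


# Toy, pre-standardized values: the two types differ in redox ratio but not in area.
train_rows = [{"redox_ratio": 0.2, "area": 0.0}, {"redox_ratio": 0.3, "area": 0.1},
              {"redox_ratio": 0.8, "area": 0.05}, {"redox_ratio": 0.9, "area": -0.05}]
train_labels = ["ciliated", "ciliated", "secretory", "secretory"]
test_rows = [{"redox_ratio": 0.25, "area": -0.1}, {"redox_ratio": 0.85, "area": 0.1}]
test_labels = ["ciliated", "secretory"]

for subset in (["redox_ratio", "area"], ["redox_ratio"], ["area"]):
    print(subset, accuracy(train_rows, train_labels, test_rows, test_labels, subset))
```

On this contrived data, dropping the redox ratio collapses accuracy; on real data the gap could be large or negligible, which is exactly what such an ablation would measure.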

      C) Whereas the conclusions are overall supported by the data, they need small adjustments in some cases:
      C.1.) For example, P3L80 claims that autofluorescence imaging is more specific than "functional markers"; however, this is done in the setting of a very specific intervention that massively affects a protein often used as a secretory cell marker (CCSP, aka SCGB1A1), which is known to be secreted (and depleted) in secretory cells upon stimulation.
      C.2.) Relatedly, it is unclear how the method's accuracy would be affected in conditions that alter the redox/metabolic state; the approach may be strongly affected in inflammation and injury, for example.

      D) The data used to describe "SAPs" are very cursory.
      D.1.) It is unclear whether FITC-dextran uptake also occurs in other cells, or in secretory cells prior to methacholine stimulation, or whether it is induced nonspecifically by manipulation of the epithelium. Secretory and goblet cells are very sensitive to stimulation and often considered minimal; see, for example, the paper by Abdullah et al. (DOI:10.1007/978-1-61779-513-8_16), in which extreme care had to be taken to prevent any secretion at all.
      D.2.) A single image is provided for the SAP timeline (Figure 5C), and it appears to show the same cell as the supplementary video.

      IMPACT AND UTILITY
      This is well-done work with high potential for widespread adoption within the epithelial biology community, particularly if the methods and code are shared in better detail.

    1. One has to show that one knows how to “play” with language and use a code shared by other users, one that in some respects resembles the code learned at school

      Beyond the shared code, SMS language has also been a trend for several years.

    1. So the special part of this is that you then add a README.md file, which can contain HTML code, Markdown text, or anything you'd like. Some profile READMEs I've seen are really fancy; others are like a mini web page for content. I have mine set up to share some bio information, social media, and some recent blog posts.

      Interesting idea: combining a GitHub profile page with some GitHub Actions to keep the profile updated automatically.
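
A minimal sketch of the idea, assuming the README contains a pair of hypothetical HTML comment markers and that a scheduled GitHub Actions workflow (not shown) runs the script and commits the result; fetching the post list from a blog feed is left out:

```python
import re

# Hypothetical marker comments; any unique pair of HTML comments would do.
START = "<!-- BLOG-POSTS:START -->"
END = "<!-- BLOG-POSTS:END -->"


def update_section(readme: str, posts: list[tuple[str, str]]) -> str:
    """Replace the text between START and END with a Markdown list of (title, url) posts."""
    items = "\n".join(f"- [{title}]({url})" for title, url in posts)
    # DOTALL lets ".*?" span newlines; the lazy match stops at the first END marker.
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("marker comments not found in README")
    # A function replacement stops re from interpreting backslashes in post titles.
    return pattern.sub(lambda _match: f"{START}\n{items}\n{END}", readme, count=1)


readme = "# Hi\n<!-- BLOG-POSTS:START -->\nold list\n<!-- BLOG-POSTS:END -->\nbye"
print(update_section(readme, [("Post A", "https://example.com/a")]))
```

Because HTML comments do not render on GitHub, the markers stay invisible on the profile page while giving the script a stable place to rewrite.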

  3. notebooksharing.space
  4. notebooksharing.space
    1. Group project: the Climate System

      Code OK, text not OK: often no sources, some mistakes, and badly formatted/explained.

      Correctness 3 Quality 1 Originality 1 Total 5 / 10

  5. notebooksharing.space
    1. Group project: the Climate System

      Very nice code and plots (except for the occasionally poor choice of colormaps).

      However, the explanations are below average, and no source is cited!!!

      Correctness 2 Quality 2 Originality 2 Total 6/10

    1. After the discovery of the structure of DNA molecules by Crick and Watson in 1953, Kimura knew that genes are molecules, carrying genetic information in a simple code. His theory applied only to evolution driven by the statistical inheritance of molecules. He called it the Neutral Theory because it introduced Genetic Drift as a driving force of evolution independent of natural selection.

      !- Reason behind the name of the theory: independent of natural selection

    1. Translate Select Language

      The translation selector is very important for the users of this site, as it acknowledges the different cultural and demographic backgrounds of users who live in Toronto and may visit this website to find out more about their nearest hospital. The button in the top right is easily distinguishable, as it is highlighted in blue with bold lettering. I find this a successful implementation of perceivability.

    1. Unfortunately, when these codes of ethics are ignored, it creates an unethical environment for humans involved in a sociological study.

      I think this goes to show just how important codes of ethics are, and how important it is to hold researchers accountable for following them.

    2. Henrietta Lacks: Ironically, this study was conducted at the hospital associated with Johns Hopkins University,

      Having the same hospital both establish the code of ethics and conduct this study feels incredibly backwards.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2022-01758

      Corresponding author(s): Harbison, Susan and Souto-Maior, Caetano

      1. General Statements [optional]

      We thank the reviewers for their time and care in evaluating our manuscript. They raise several important points, which we have addressed, resulting in a greatly improved manuscript. Please note that we numbered the comments from both reviewers for ease of reference, as we cross-referenced comments in some cases. Reviewer comments are in italics; our responses are provided in plain text.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *Summary*:

      *The authors of this work generated a Sleep Advanced Intercross Population from 10 extreme-sleeper lines of the Drosophila Genetic Reference Panel. This new outbred population was subjected to artificial selection with the aim of understanding the genes underlying the differences in sleep duration between three populations: short-sleep, unselected, and long-sleep. Using analysis of variance, the authors identified nearly 400 genes that were significantly selected over the various generations and showed opposite trends for long and short sleep, and are thus potentially relevant for the regulation of sleep duration. Of these, 85 genes were consistent between the male and female subpopulations, suggesting that a small number of genetic divergences may underlie sex-independent mechanisms of sleep.

      Given the time-course nature of the generational data obtained, the authors studied potential correlations and interactions between these 85 candidate genes. Initially, the authors used pairwise Spearman correlation, noting that this method could not filter out most pairwise interactions (around 40% of all possible pairs were significant). To overcome the linear limitations of this approach, the authors implemented a more complex, nonlinear Gaussian process model able to account for pairwise interactions. This new approach identified a smaller number of different, and potentially more informative, correlations between the previously identified candidate genes.

      Lastly, using genetic manipulations, the authors show in vivo that a subset of the candidate genes is causally related to sleep duration, and partially validate some of the correlations identified by their new model.

      The authors conclude that, given the non-linear and complex nature of biological systems, simplistic linear approaches may not suffice to fully capture underlying mechanisms of complex traits such as sleep.

      *Major comments*

      1. Most of the work presented focuses on the computational and statistical analysis of different populations subjected (or not) to artificial selection for short or long sleep duration. As such, testing all of the potentially relevant biological conclusions is largely unfeasible. The authors already present additional experiments to partially support some, though not all, of their findings. Given that the manuscript is written as a methods innovation, these additional experiments illustrate the potential uses of the method described.*

      Our response: The reviewer raises a very important point, one that is at the very impetus of our work. We agree that it is not possible to test all combinations of genes in all contexts to determine whether they influence sleep or not. In contrast to the situation for circadian rhythms, where the core clock is controlled by just four genes, recent work has concluded that sleep is a set of complex traits influenced by large numbers of genes. Robust computational methods are needed to identify the complex interactions among genes. The current manuscript is a first step towards achieving this goal.

      *(OPTIONAL) However, since one of the focuses of this work is identifying potential gene interactions, it would be interesting if the authors could test a "double knockout" and perhaps demonstrate evidence for epistasis between two of the identified genes. Given access to single mutants, this experiment should be realistic. However, I have no hands-on experience working with Drosophila and am unable to estimate accurately the amount of resources and time such an experiment could take. My initial guess is that 3-6 months of work should suffice.*

      Our response: The reviewer makes an interesting proposal. While such an experiment would provide some additional information, our method does not make any prediction about what a double knockout would do, either to the sleep phenotypes or to gene expression.

      2. Gene CG1304 seems to be an important example used throughout the manuscript. It should be carefully re-analyzed, as it was considered in the interaction analyses without showing opposite trends for the short- and long-sleep populations (see minor comments on Figure 2).

      Our response: We are not entirely certain that we understand the reviewer’s point. We note that significant genotype-by-selection-scheme interactions may not manifest as opposite trends, and this is not what is being tested for significance. The likelihood ratio test assesses the effect of including sel × gen coefficients for both the short and long schemes; therefore, GLM significance may mean that either one or both selection schemes are significantly different from controls, not from each other. We could, for instance, apply three different tests: one (i) comparing long and short flies; a second (ii) comparing short flies to controls; and a third (iii) comparing long flies to controls, and find that the first test is significant (i.e., short is different from long) while the other two are not (i.e., neither scheme is found to be different from controls). The opposite could also happen: short and long flies may not differ from each other, while both differ from controls.

      Since we are interested in identifying differences of either scheme from controls, our choice of statistical test is equivalent to performing tests (ii) and (iii) without the need to perform, and correct for, multiple tests. Like all choices, this one has caveats: linear-model-based differential expression analysis has limited ability to pick up arbitrary trends, so it serves as a coarse-grained filter for large shifts, because it is too computationally costly to run the Gaussian process on 50 million pairwise combinations.

      *3. One major comment is that the claim that the Gaussian process method is more sensitive and specific than simpler approaches, though intuitively understandable, does not seem to be fully correct from a strict statistical point of view, given the lack of a gold-standard reference with which to check whether the new method is indeed picking up more true positives/negatives. I would consider rephrasing such statements in the absence of a biologically relevant validation set.*

      Our response: We agree with the reviewer that there is no ‘gold standard’ reference data set with which to compare our findings. We have softened this language a bit in response, where it occurs in both the Abstract and the Results.

      Under Abstract, we changed “Our method not only is considerably more specific than standard correlation metrics but also more sensitive, finding correlations not significant by other methods” to “Our method appears to be not only more specific than standard correlation metrics but also more sensitive, finding correlations not significant by other methods.”

      Under Results, we changed “Therefore, computing correlations between genes using covariance estimates from the Gaussian Processes greatly increases specificity over direct correlations. Furthermore, the Gaussian processes are not only more specific but more sensitive…” to “Therefore, computing correlations between genes using covariance estimates from the Gaussian Processes appears to increase specificity over direct correlations. Furthermore, the Gaussian Processes appear to be more sensitive…”

      *4. Finally, the study appears to be well powered and it is clear that the authors were careful in their explanation of the statistical methods. However, I could not find the copy of the code/script used for the model. Without it, it would be very difficult to fully reproduce the results as both the language used (Stan) and the method itself are not common in the sleep research field. *

      Our response: We thank the reviewer for noticing this, and apologize for this oversight. The code used for analysis has been deposited in GitHub under: https://github.com/caesoma/Multiple-shifts-in-gene-network-interactions-shape-phenotypes-of-Drosophila-melanogaster.

      We have noted the script location in the Data Availability statement. We added a statement that reads “All scripts used for the model have been deposited in GitHub https://github.com/caesoma/Multiple-shifts-in-gene-network-interactions-shape-phenotypes-of-Drosophila-melanogaster.”

      *Minor comments*

      *5. The statistical cut-off used for the gene expression hierarchical GLMM after BH correction was 0.001, which is 50 times stricter than the common 0.05. Could the authors comment on how this choice may impact the results compared with those available in the literature, and on the rationale for choosing such a value?*

      Our response: An FDR of 0.05 would increase the number of genes identified (3,544 for females; 1,136 for males, with 462 overlapping). The FDR of 0.001 is consistent with the lowest threshold typically used for gene expression data collected during other artificial selection experiments (Mackay et al., 2005; Morozova et al., 2007; Edwards et al., 2006), though thresholds as high as 0.20 have been used (Sorensen et al., 2007). We have amended the last statement of the Methods and Materials section under “Generalized Linear Model analysis of expression data” to read “Model p-values were corrected for multiple testing using the Benjamini-Hochberg method (Benjamini and Hochberg, 1995), with significance defined at the 0.001 level, consistent with the lower threshold applied in other artificial selection studies (Mackay et al., 2005; Morozova et al., 2007; Edwards et al., 2006).”
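
For readers comparing thresholds, the Benjamini-Hochberg step-up procedure can be sketched as follows. This is a generic illustration with made-up p-values, not the authors' analysis code. Tightening the FDR level from 0.05 to 0.001 can only shrink the set of rejected hypotheses, which is why fewer genes pass at 0.001:

```python
def benjamini_hochberg(pvalues, alpha):
    """Return one boolean per p-value: True where H0 is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


pvals = [0.0001, 0.0004, 0.002, 0.01, 0.03, 0.2]  # made-up values
print(sum(benjamini_hochberg(pvals, 0.05)), sum(benjamini_hochberg(pvals, 0.001)))  # 5 1
```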

      *6. Heritability calculations are not mentioned in the Methods. Could it be useful to include a short paragraph? Could the authors also comment briefly on the difference in h² between the short-sleep replicates, which is ~10-fold?*

      Our response: We thank the reviewer for noticing this omission and apologize for the oversight. We have added the following statements to the Methods and Materials under “Quantitative genetic analyses of selected and correlated phenotypic responses.”

      “We estimated realized heritability h² using the breeder’s equation:

      h² = ΣR/ΣS

      where ΣR and ΣS are the cumulative selection response and differential, respectively (Falconer and Mackay, 1996). The selection response is computed as the difference between the offspring mean night sleep and the mean night sleep of the parental generation. The selection differential is the difference between the mean night sleep of the selected parents and the mean night sleep of the parental generation.”
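
The definitions above translate directly into code. A minimal sketch, assuming per-generation means are available as hypothetical `(parental_mean, selected_parents_mean, offspring_mean)` tuples of night sleep in minutes; the numbers are invented:

```python
def realized_heritability(generations):
    """h2 = sum(R) / sum(S) from per-generation night-sleep means.

    Each item is (parental_mean, selected_parents_mean, offspring_mean), where
    R = offspring_mean - parental_mean and S = selected_parents_mean - parental_mean.
    """
    total_response = sum(off - par for par, _sel, off in generations)
    total_differential = sum(sel - par for par, sel, _off in generations)
    if total_differential == 0:
        raise ValueError("cumulative selection differential is zero")
    return total_response / total_differential


# Invented two-generation example for a short-sleep line (minutes of night sleep).
print(realized_heritability([(500.0, 400.0, 480.0), (480.0, 380.0, 465.0)]))
```

Here ΣR = -35 and ΣS = -200, giving h² = 0.175; the signs cancel, so the estimate is positive for downward (short-sleep) selection as well.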

      Additionally, we thank the reviewer for noticing the large difference in realized heritability between the short-sleeping replicate populations; the heritability reported for replicate 1 contained a typo and should be 0.169, not 0.0169. Hence, the heritabilities of the two replicate populations are quite similar, i.e., 0.169 for replicate 1 and 0.183 for replicate 2. We have corrected this error in the Results.

      7. Regarding the model implementation, what would be the implications of not enforcing positive semi-definiteness on the covariance matrix, given that covariance matrices are strictly positive semi-definite?

      Our response: All covariance matrices are by definition positive semi-definite (PSD), since otherwise they would not define a valid probability distribution, so it would not be possible to relax that assumption generally. The only choice we could make would be the number of genes (M) included in each multi-channel Gaussian process model, and this in turn would by design enforce positive semi-definiteness on a matrix of size MN × MN (N being the number of generations). As noted in the appendix, “enforcing” positive semi-definiteness on smaller blocks of a larger 2D array of covariances (which is not itself a covariance matrix) does not imply that the latter is PSD, and therefore seems like a softer constraint. In practice, scaling up to a model where M >> 40 is not trivial from a computational and inference point of view, so the choice of a smaller M is in a way imposed on us, and fortunately it is the less limiting one. We provide the appendix as a general clarification of the subtleties of Gaussian processes, but a comprehensive assessment is beyond the multidisciplinary scope of this article and would require a narrower mathematical/statistical description in a standalone methodological article or technical note.
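
The appendix point, that PSD blocks do not make the assembled array PSD, can be checked on a tiny example. This sketch simplifies each block to a 2×2 scalar correlation matrix (in the model each pairwise block would be larger) and tests PSD with the principal-minor form of Sylvester's criterion, which is exponential in size and only practical for small matrices:

```python
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def is_psd(matrix, tol=1e-9):
    """Symmetric PSD test: every principal minor (not only the leading ones) is >= 0."""
    n = len(matrix)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            minor = [[matrix[i][j] for j in idx] for i in idx]
            if det(minor) < -tol:
                return False
    return True


# Every 2x2 block [[1, r], [r, 1]] with |r| <= 1 is PSD, yet the assembled 3x3 array is not.
assembled = [[1.0, 0.9, -0.9],
             [0.9, 1.0, 0.9],
             [-0.9, 0.9, 1.0]]
print(det(assembled), is_psd(assembled))
```

Each pairwise block is a valid covariance on its own, but the three pairwise correlations cannot hold simultaneously, so the full determinant is negative.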

      8. *The Methods mention that PCA projections were performed on the first 3 components; however, only the first two are shown.*

      Our response: PCA was performed on 10 components, although the algorithms will commonly compute all components and return only the selected number. The variance of the third component is smaller than ~5% (that of the second PC). In practice PC1 is by itself enough to show the clear separation of expression per sex with ~65% of the variance; PC2 is in fact only shown to improve visualization. Plots of the remaining components will not show clear separation among samples as the variance explained is so small. We have corrected the Methods to indicate that PCA was performed on 10 components rather than 3.

      *9. Figure 1 refers to the mean night sleep time of the population. Could some measure of variability (SE or SD) be represented to give a general idea of the distribution of the values? Additionally, the standard deviations associated with the CVe estimates are mentioned but not shown explicitly. Could they be added to the text to illustrate how much these deviations were reduced?*

      Our response: We thank the reviewer for this comment. Including either the standard errors or standard deviations on the plot of the response to selection (Figure 1A) makes visualization unwieldy; thus we have added an additional supplemental table, Supplementary Table S15, that contains the mean night sleep, standard deviation, and number of flies measured for each generation in each replicate population. We also added a plot of the standard deviation in night sleep per generation to Supplemental Figure S2 (letter “Q” in the figure) so that the reduction over time in each population can be seen.

      Under “Data Availability,” we added the following: “Night sleep phenotypes per selection scheme/sex/generation/population replicate are listed in Table S15.”

      *10. Figure 2 shows the linear model fits for gene CG1304. I find this gene on the list of significant genes for both sexes (tables S5/6), but it does not seem to be one that shows opposite trend for short- and long-sleep (tables S7/8). Surprisingly, it shows up again on table S10! However, the text introducing the figure reads like this should be one of the 85 sex-independent genes. Would it be best to provide an example of what a significant gene looks like? *

      Our response: As mentioned in our response to comment #2 above, significance in the likelihood-ratio test does not imply opposite trends between the long and short selection schemes. Rather, it indicates a significant difference between a model that includes specific slope coefficients for selection scheme by generation (both long and short) and a reduced model in which the only slope is associated with generation and is therefore independent of selection scheme.
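
As a sketch of the test described above, assuming the full model differs from the reduced model by exactly two coefficients (the short × generation and long × generation slopes), the likelihood-ratio statistic is compared with a χ² distribution with 2 degrees of freedom, whose survival function has the closed form exp(-x/2). The log-likelihoods would come from the fitted GLMs; the values here are placeholders:

```python
import math


def likelihood_ratio_test_df2(loglik_full, loglik_reduced):
    """LR test of nested models differing by two parameters.

    Returns (statistic, p_value) using the chi-square(2) survival function,
    which for two degrees of freedom is exactly exp(-x / 2).
    """
    # Clamp at zero: a nested full model cannot fit worse, apart from numerical noise.
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return statistic, math.exp(-statistic / 2.0)


# Placeholder log-likelihoods for one gene's full and reduced fits.
print(likelihood_ratio_test_df2(-100.0, -103.0))  # statistic 6.0, p ~ 0.0498
```

Note that nothing in the statistic refers to the sign of either slope, which is why a significant result need not correspond to opposite trends.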

      11. *For Figure 3, it would be interesting to show both the GP correlations and the Spearman correlations to illustrate the methodological differences. I would also be curious to see at least one pairwise expression scatter plot, just to see how the genes correlate in one plot.*

      Our response: Table S11 contains all (significant and nonsignificant) GP and Spearman values side by side for comparison. High correlations are likely to conform to the Spearman assumption of a monotonic relationship; nevertheless, this will not be so for the majority of genes, since the numbers of Spearman- and GP-significant genes differ tenfold or more, so it would be misleading to focus on individual gene relationships without taking into consideration the transcriptome-wide results for any method employed.
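
To illustrate the monotonicity assumption, here is a small Spearman correlation sketch (average ranks for ties, then Pearson correlation on the ranks). The toy series are invented: a cubic trend gives ρ = 1, while a U-shaped trend over five generations gives ρ = 0 even though the two series are perfectly dependent, the kind of relationship a nonlinear model can still pick up:

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


x = [-2, -1, 0, 1, 2]  # e.g., centered generation index
print(spearman(x, [v ** 3 for v in x]), spearman(x, [v * v for v in x]))
```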

      We would like to stress that there is nothing particularly special about CG1304 in and of itself; furthermore, there are no “representative” genes or figures in this manuscript. Instead, CG1304 was chosen because its GLM and GP fits illustrate the limitations and capabilities of each model to pick up certain kinds of trends, and especially because it is instructive of how correlations arise from the GP model, which may not be intuitively clear to all readers.

      12. Figures 3S3/4 are described as showing that the single- and multi-channel model fits do not change substantially. Would this be expected, and why?

      Our response: This is not necessarily expected, as scaling up from a single- to a multi-channel model adds parameters as well as constraints, like the positive semi-definiteness mentioned in comment #7 above. If that had a considerable impact on the fits, it could challenge our assumption that the signal variance parameters estimated from the single-channel model are good priors for the same parameters in the two-channel model (although this is not a hard constraint, so in the worst case the result would only be a slight bias).

      *13. Having built different networks of pairwise gene associations (projected onto a unified network as illustrated in Figure 5), it could be interesting to compare the network topologies at a basic level, e.g., node degrees, overlapping subnetworks, or whether they are potentially scale-free as previously described for biological systems.*

      Our response: The reviewer makes an interesting point. Indeed, network summaries could provide useful information about the system-level parameters, which are the main results of this paper. We now include the number of connections (i.e., the degree) of each gene in each of the four networks presented in Figure 5 in a new supplemental table (Table S13). We also plot the distribution of node connectivity below. The distributions do not appear random (i.e., a normal distribution), and appear closer to a power-law or scale-free distribution. However, the small size and low average degree of these networks make a formal test unfeasible, and a recent study suggests that a log-normal distribution is in general more likely than a power-law distribution (Broido et al., Nat Comm, 2019), so we lack the evidence to claim that these networks are scale-free.

      We have added to the Results under “Gaussian Process model analysis uncovers nonlinear trends and specifically identifies covariance in expression between genes”: “Table S13 lists the number of connections (degrees) that each gene has with others in the network. The average number of connections for long-sleeper males was 2.6; the other three networks had average degrees of 2.0 or less (2.0 for long-sleeper females and short-sleeper males; 1.75 for short-sleeper females).”
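The degree counts and average degrees above follow from simple bookkeeping on an undirected edge list: each connection adds one to the degree of both genes, so the average degree is 2|E|/|V|. A minimal Python sketch, using a hypothetical toy edge list with placeholder gene names rather than the study's networks; genes with no connections never appear in an edge list and would need to be added separately.

```python
from collections import Counter

def node_degrees(edges):
    """Degree of each node in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # each edge contributes one to both endpoints
        deg[v] += 1
    return deg

def average_degree(edges):
    """Average degree 2|E|/|V|, counting only nodes that appear in an edge."""
    return 2 * len(edges) / len(node_degrees(edges))

def degree_distribution(edges):
    """Map degree k -> number of nodes with degree k (input for a histogram)."""
    return Counter(node_degrees(edges).values())

# Hypothetical toy network; names are placeholders, not study genes.
edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC"), ("geneC", "geneD")]
print(average_degree(edges))       # 2.0
print(degree_distribution(edges))  # two nodes of degree 2, one of degree 3, one of degree 1
```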

      *14. In Table S6 I noticed that some gene symbols were loaded as dates (1-Dec).*

      Our response: We thank the reviewer for noticing this; the gene symbol should be dec. We have corrected this in Table S6 (now Table S7).
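Spreadsheet software is known to silently convert some gene symbols into dates, which is how dec became 1-Dec. A hypothetical pre-submission check along these lines can flag such entries in a list of symbols; the pattern covers only the day-month and month-day forms and is an assumption, not an exhaustive list of possible conversions.

```python
import re

# Month abbreviations that spreadsheet programs may turn into dates
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(
    rf"^\d{{1,2}}-(?:{MONTHS})$|^(?:{MONTHS})-\d{{1,2}}$", re.IGNORECASE
)

def flag_date_like(symbols):
    """Return the entries that look like converted dates, e.g. '1-Dec'."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]

print(flag_date_like(["dec", "1-Dec", "CG1304", "Sep-7"]))  # ['1-Dec', 'Sep-7']
```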

      15. *In the Results, the phenotypic response to artificial selection is sometimes described in minutes, other times in hours. Though this is a minor point, consistently formatting the values as minutes (hours) would make them easier to compare.*

      Our response: We are unsure what the reviewer is referring to. We only see one sentence in which we used hours, and that was the concluding sentence under Results, “Phenotypic response to artificial selection.” The remainder of the manuscript refers to sleep times in minutes, phenotypes in all of the figures are plotted as minutes, and all of the supplemental material refers to times in minutes.

      16. *Over 99% of chains converged after three runs. Even though the reasons for the lack of convergence of these chains were not investigated, could this be a relevant effect? 1% of 3570 interactions is still 35 potential interactions. Do the non-convergent chains relate to specific genes?*

      Our response: Bayesian MCMC inference is a stochastic algorithm, so there is a finite chance that any given run does not converge. We required all eight parallel chains to converge and mix, as measured by the stringent criterion of the R-hat metric being within 0.05 of unity. Relaxing this interval to 0.1 or 0.2 could still be acceptable, but we chose a stringent threshold to avoid making interpretations on less-than-ideal runs. There is no evidence of any gene-specific problem; usually one out of eight chains would not mix well and throw off the diagnostic metrics. As with relaxing the metrics, accepting a run with 6-7 properly converging chains could be an acceptable approach, but we decided to rerun all chains, accept only 100% convergence, and accept a possible loss of some interactions. Non-converging or non-mixing runs are likely to converge eventually, but since we are running tens of thousands of chains (3570 pairwise combinations × 3 schemes × 8 chains), a massively parallel implementation on an HPC cluster is required. Finally, since the 145 significant pairs are ~4% of the total number of interactions, a naïve expectation would be that only one or two of the ~35 non-converging interactions would have come out significant. While there is a chance that an interesting interaction was missed, the same can be said of potential false negatives computed using the GLM, which is a consequence of working at a high-throughput scale.
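For readers less familiar with the diagnostic, the sketch below computes the classic (non-split) Gelman-Rubin R-hat for one parameter and applies the |R-hat − 1| ≤ 0.05 acceptance rule described above. Stan itself reports a rank-normalized split R-hat, so this simplified form only illustrates the idea; the simulated chains are hypothetical. The last lines reproduce the back-of-the-envelope numbers: 85 genes give 3570 pairs, 3570 × 3 × 8 chains in total, ~35 non-converging pairs at 1%, and 35.7 × 145/3570 ≈ 1.45 expected significant pairs among them.

```python
import numpy as np
from math import comb

def rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for one parameter.

    chains: array of shape (n_chains, n_draws).
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_hat / W))

def converged(chains, tol=0.05):
    """Accept the run only if R-hat lies within tol of 1."""
    return abs(rhat(chains) - 1.0) <= tol

rng = np.random.default_rng(0)
good = rng.normal(size=(8, 1000))   # eight well-mixed chains
bad = good.copy()
bad[0] += 3.0                       # one chain stuck around a different value
print(converged(good), converged(bad))  # True False

# Back-of-the-envelope numbers quoted in the response
n_pairs = comb(85, 2)                          # 3570 gene pairs
n_chains = n_pairs * 3 * 8                     # 85,680 chains
n_unconverged = 0.01 * n_pairs                 # about 35.7 pairs at 1%
expected_hits = n_unconverged * 145 / n_pairs  # about 1.45
```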

      17. The GO terms identified as significantly enriched after p-value correction point to a clear association of the 85 identified genes with serine proteases. Could this be discussed further to highlight biological findings of the work in the context of neuronal function or sleep regulation?

      Our response: The reviewer is correct, nine putative Serine proteases are significantly enriched among the 85 genes. All nine exhibit some expression in neurons and in epithelial cells, and all are expressed at the adult stage. The appearance of these enzymes is interesting given their role in proteolysis.

      We have updated the Discussion to read, “Interestingly, our Gene Ontology analysis identified nine genes from the 85-gene network with predicted Serine endopeptidase/peptidase/hydrolase activity: CG1304, CG10472, CG14990, CG32523, CG9676, grass, Jon65Ai, Jon65Aii, and Jon99Fii. All of these genes are expressed in neurons and epithelial cells, and all genes are expressed at the adult stage (Li et al., 2022). Serine proteases are a large group of proteins (257 in Drosophila) that perform a variety of functions (Cao and Jiang, 2018). Their predicted enzymatic activity suggests a putative role in proteolysis. This is an intriguing observation given pioneering work in mammals which suggested a role for sleep in exchanging interstitial fluid and metabolites between the brain and cerebrospinal fluid (Xie et al., 2013). Recent work demonstrated that a similar function is conserved in flies via vesicular trafficking through the fly blood-brain barrier (Artiushin et al., 2018). It would be interesting to determine whether these genes function in this process.”

      *18. Could the authors discuss the limited overlap between males/females and short/long sleep for the 145 gene pairs identified after the MCMC runs? Similarly, how can the network differences be explained from a biological/evolutionary perspective?*

      Our response: The reviewer asks an interesting question. We did not detect sex-specific responses to artificial selection for long or short sleep in the present experiment. Yet differences in gene expression network pairs between males and females exist, and as the reviewer mentions, we also observed differences in network pairs between long sleepers and short sleepers. These differences reflect an inescapable conclusion: a given sleep duration phenotype can originate from more than one gene expression network configuration.

      19. *In the mutational analyses it is pointed out that CG12560 and Jon65Aii only affect females significantly. However, in the following sentence, the authors claim these two genes had the greatest effect on both sexes, which seems contradictory, at least in the way it is described. *

      Our response: Our wording may have been confusing, given that it came after a comment about Jon65Aii. Our exact statement was “Effects of the Minos insertions on night sleep duration were stronger in females than in males; when sexes were examined separately, only mutations in CG12560 and Jon65Aii affected male night sleep duration.” This was meant to convey that the effects of all Minos insertions were in the same direction for both males and females, but that only the CG12560 and Jon65Aii insertions had statistically significant effects in each sex separately. We have re-worded this sentence to read “All Minos insertions had the same directional effect on night sleep for both males and females, but only the CG12560 and Jon65Aii insertions had statistically significant effects on night sleep in each sex separately.”

      20. *A short comment on how unchanged expression could lead to the observed phenotypic variation could help readers unfamiliar with the method understand how the effects of Minos mutations are biologically mediated. This seems to be the authors' expectation, so could it be non-functional proteins or something else?*

      Our response: The reviewer raises an interesting point. We did not observe changes in gene expression for CG13793, Cyp6a16, or hiw compared to w1118 controls. Thus far, we have examined gene expression relative to the control for a single timepoint, and only in pooled whole flies. Differential gene expression between the Minos mutants and controls might occur at a different timepoint, or in a small set of key neurons that would be undetectable when comparing whole flies.

      We expand on this in Results, under “Mutational analyses confirms the role of candidate genes and interacting expression networks in sleep”: “Potential reasons for the lack of a significant change in gene expression in the remaining lines include: the position of the insertion within the targeted gene, which has variable effects on its expression; the relatively low statistical power of the experiment; confining our observation to a single timepoint during the day; or pooling whole flies, which might obscure gene expression changes occurring at a single-tissue level.”

      *21. The assumption that interacting genes would have their expression ratio changed by the Minos insertion would hold in situations where the affected gene causally interferes with the candidate's expression. As far as I understand, causality cannot be inferred by the proposed method. Thus, in a situation where both genes are co-regulated by a third factor, no change in expression ratio is to be expected. How would the authors re-interpret their final result when considering this direct vs. indirect interaction distinction?*

      Our response: Our method only gives us the hypothesis that two genes interact based on their correlation, and that is what we test using the Minos insertions. We do not as yet have a way to identify a third gene or factor that might be regulating the two. Given the number of genes affecting sleep, it is quite likely that there are such factors, but we can only report and test what we’ve observed. Any interpretation based on an arbitrary third factor would be purely speculative.

      **Referees cross-commenting**

      22. *I agree with Reviewer #2 comments which, to me, reads as generally pointing out the lack of biological interpretation of the results (and thus connecting this study with previous literature). Adding this component would make the manuscript well-rounded and attractive to a wider audience. *

      Our response: We agree with both reviewers that additional biological interpretation of the results would make the manuscript more attractive to a wider audience. Accordingly, we have added the following paragraph to the Discussion: “The genes we identify herein overlap and extend previous work. Of the 1,140 genes implicated in the generalized linear model, 151 (13.2 percent) overlapped with previous candidate gene, random mutagenesis, gene expression, and genome-wide association studies of sleep and circadian behavior in flies (Pegoraro et al., 2022; Dissel et al., 2015; Seugnet et al., 2017; Shalaby et al., 2018; Thimgan et al., 2010; Thimgan et al., 2018; He et al., 2013; Mallon et al., 2014; Roessingh et al., 2019; Feng et al., 2018; Lee et al., 2021; Khoury et al., 2020; Wu et al., 2018; Harbison et al., 2013; Harbison et al., 2009; Harbison et al., 2017; Harbison et al., 2019). Notably, previous studies identified the genes CG17574, cry, dro, mip120, Mtk, NPFR1, pdgy, PGRP-LC, Shal, and vari as affecting sleep duration (Feng et al., 2018; Dissel et al., 2015; Pegoraro et al., 2022; Thimgan et al., 2018; Mallon et al., 2014; He et al., 2013; Khoury et al., 2020; Harbison et al., 2013). Two genes, ringer and mip120, overlapped with our previous study of DNA sequence variation in flies selected for long and short sleep (Harbison et al., 2017). In that study we identified a polymorphism in an intron of ringer that changed in allele frequency with selection, with increases in the population frequency of the ‘G’ allele with increasing sleep, while the frequency of the ‘A’ allele increased with decreasing sleep. When the selective breeding procedure was relaxed, the frequency of the ‘G’ allele increased in short-sleeping populations, paralleling an increase in sleep (Souto-Maior et al., 2020). One possibility is that this polymorphism contributes to the changes in gene expression in ringer that we observed in the present study.
Of the 85 genes common to both sexes that we used in the gene interaction networks, 11 (13 percent) appear in other studies of sleep: CG10444, CG2003, CG5142, CG6785, CG9114, CG9676, CR42646, hiw, NPFR1, Tie, and wb (He et al., 2013; Seugnet et al., 2017; Wu et al., 2018; Harbison et al., 2013). Thus, our study corroborates genes known to affect sleep, and identifies new candidate genes for sleep as well.”

      Reviewer #1 (Significance (Required)):

      *This study proposes the application of advanced non-linear methods to study complex traits such as sleep. As implemented, Gaussian Processes are able to identify non-linear correlations between two biological features (e.g. transcripts) over time (e.g. generations), representing an attempt to push the available analytical methods beyond the single-gene paradigm. As such, rather than on the relevance of the biological results themselves, the authors focus on explaining and illustrating the methodological advances obtained and their relevance for a better understanding of biological systems.

      However, the mathematical principles required to understand the implemented method are not trivial and require advanced knowledge of machine learning and statistics. This is a potential barrier, though not an impediment, to its quick and wide adoption by the community. In addition, even if demonstrated to be a valid method when working with Drosophila, the resolution required to perform such a study may be difficult to obtain with other model systems, which would likely require further refinement of the statistical approach.

      The main audience interested in this work would be basic sleep researchers. However, this work is also related to understanding gene selection over an artificial evolutionary process, so evolutionary and developmental biologists may also be interested. The methodology itself, already used in other fields of study, is a general statistical tool that could be adopted by a broad range of researchers for a diversity of topics. As such, I believe that with this work the authors will be able to stimulate the development and/or rethinking of the available analytical methods to study complex biological systems, though this would likely be done either in collaboration with the authors themselves or by a specific subset of researchers who regularly work with advanced mathematical, statistical and computational principles.

      (disclaimer) My mathematical training does not reach the PhD-level expertise that may be required to fully understand the methodology described. I have never personally worked with D. melanogaster or used Gaussian Processes in a professional setting. As such, I may not be able to fully evaluate/appreciate the more detailed technical aspects of this work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Souto-Maior et al. report phenotypic and genotypic effects of artificially selecting for short and long sleep in flies. They generated an impressive time-series dataset in which one can examine genetic and phenotypic changes across time (13 generations in total) in response to the selection pressure. The authors explored the relationships between pairs of genes in addition to identifying potential candidate genes involved in the regulation of the amount of sleep.

      Major points:

      1. *Harbison et al 2017: This study seems to be a continuation of Harbison et al., 2017. The text (introduction?) needs to explain more clearly how this study advances the findings of Harbison et al., 2017. Do the two studies use the same selection lines? If not, how are they different? If they are not different, what might cause the phenotypes to evolve differently? For example, day sleep and day bout number do not respond to the selection pressure similarly in both studies, etc.*

      Our response: We would like to emphasize that this study is not a continuation of the Harbison et al., PLoS Genetics, 2017 paper, where we examined the changes in DNA sequence during artificial selection, and it does not use the same selection lines. That the two studies are different can be seen by comparing Figure 1A of the current study with Figure 1A of the Harbison et al., 2017 study: the trajectories of each population across generations are very different. For convenience, we used the same nomenclature to refer to the populations in both studies (L1, L2, S1, S2, etc.), and apologize if this is the source of the confusion. Both studies do originate from the same outbred population, however. To get to the broader question that the reviewer is asking: should one expect to see the same correlated responses to selection for night sleep among selection lines originating from the same outbred population? The answer is no, not unless the selected trait and the responding trait have a genetic correlation of 1.0. We previously estimated the genetic correlation between day sleep and night sleep to be between 0.29 and 0.38, and between day bout number and night sleep to be -0.05 (Harbison et al., 2013; Harbison et al., 2009). In the Harbison et al., 2017 study we noted that day sleep and day bout number had correlated responses to selection for night sleep, but neither has a correlated response in the current study. The relatively low genetic correlations between these two measures and night sleep explain why we do not see a consistent correlated response among studies.

      We didn’t really elaborate on these observations in the manuscript, and so have added to the Results under “Correlated response of other sleep traits to selection for night sleep” the following: “These correlated responses concur with previous observations we made in selected populations originating from the same outbred population for night sleep and night average bout length, and night sleep and sleep latency (Harbison et al., 2017). However, unlike the previous study, we did not see a correlated response between night sleep and day sleep, and night sleep and day bout number (Harbison et al., 2017). The lack of correlated response reflects the relatively low genetic correlations these two traits have with night sleep (Harbison et al., 2013; Harbison et al., 2009).”
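The argument about correlated responses can be made quantitative with the standard quantitative-genetics expression for the expected correlated response of trait Y to selection on trait X, CR_Y = i · h_X · h_Y · r_A · σ_P(Y), where i is the selection intensity, h_X and h_Y the square roots of the heritabilities, r_A the additive genetic correlation and σ_P(Y) the phenotypic standard deviation of Y. The sketch assumes this infinitesimal-model approximation and ignores drift, which also differs between replicate lines; all input values below are hypothetical.

```python
import math

def correlated_response(i, h2_x, h2_y, r_a, sd_p_y):
    """Expected correlated response of trait Y per generation of selection on X.

    i: selection intensity; h2_x, h2_y: narrow-sense heritabilities;
    r_a: additive genetic correlation; sd_p_y: phenotypic SD of Y.
    """
    return i * math.sqrt(h2_x) * math.sqrt(h2_y) * r_a * sd_p_y

# Hypothetical inputs: only the genetic correlation changes between calls.
for r_a in (1.0, 0.35, -0.05):
    print(r_a, round(correlated_response(1.4, 0.2, 0.2, r_a, 100.0), 2))
```

Because the expected response is linear in r_A, a trait with r_A near 0.3 is expected to move only about a third as much as a perfectly correlated one, and a trait with r_A near zero hardly at all, which leaves room for replicate experiments to disagree.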

      2. Zeitgeber Time (ZT) for RNA collection: It is puzzling that the study reports that the RNA was collected at 12 PM. I do not understand what this information means, especially in a project where one is working with sleep. The authors might want to report ZT. Also, why a particular ZT was chosen should be discussed. These genes are potential sleep-relevant genes, hence it is not too esoteric to think that the ZT of data collection matters a lot, as some of them might be cycling. To get a more appropriate picture, multiple time points of data collection might be even better. The authors seem to have ignored this crucial aspect of a clock/sleep study: the time of data collection and how it might shape the findings.

      Our response: We agree with the reviewer that it would be better to have multiple timepoints for collection, but this is difficult to implement in practice as it would require an additional 5,280 flies per generation (4 pools of 10 flies per sex per population) for 12 timepoints as recommended by Hughes et al., JBR, 2017. We mention collection time in the Methods and Materials because we are aware of the changes in gene expression over the circadian day. 12 PM is the midpoint between the start of the lights-on and lights-off period (i.e., ZT6), and was chosen arbitrarily. We have added the ZT notation to the Methods and Materials for clarity.
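Two bits of arithmetic in this response can be checked directly. Zeitgeber time counts hours from lights-on; since noon is stated to be ZT6, the sketch assumes lights-on at 06:00. The fly count assumes six selection populations (for example, two replicates each of long-sleep, short-sleep and control) and 11 timepoints beyond the one already sampled; both are inferences, not figures stated in the response.

```python
def zeitgeber_time(clock_hour, lights_on_hour=6):
    """Hours elapsed since lights-on (ZT0), wrapped to a 24 h day."""
    return (clock_hour - lights_on_hour) % 24

print(zeitgeber_time(12))   # 6  -> noon is ZT6 if lights come on at 06:00
print(zeitgeber_time(3))    # 21 -> 03:00 falls late in the dark phase

# Assumed design: 4 pools x 10 flies x 2 sexes x 6 populations per timepoint
flies_per_timepoint = 4 * 10 * 2 * 6          # 480
extra_timepoints = 12 - 1                     # 12 timepoints, one already sampled
extra_flies = flies_per_timepoint * extra_timepoints
print(extra_flies)          # 5280
```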

      3. Short-sleeping flies: Are there reports of flies sleeping this little? "We found 2,830 interactions; 8 of these were one of the 3,570 between the 85 genes, but none of them overlapped with the 145 gene pairs found to be different from controls. The gene interactions we observed may therefore be unique to extreme sleep." What is extreme sleep? How does this study then claim to have identified the evolution of potentially sleep-relevant gene expression for normal, physiologically relevant sleep?

      Our response: Our statement was not very well worded, and we thank the reviewer for noticing this. What we intended to say was that the lack of overlap between our data and a known protein-protein interaction database may be due to the interactions being unique to sleep as opposed to some other complex trait. We have re-worded this statement to say “The gene interactions we observed may therefore be unique to sleep.”

      *Minor points:

      4. The article uses an unnecessarily defensive tone to establish its approach to understanding the underlying mechanisms of sleep as 'better' than those of others (in both the introduction and discussion): "In spite the large amount of studies and data generated for many systems, identifying underlying processes is still very rare; this is clear indication that better methods are needed to obtain understanding of biological processes from data." The 'still very rare' part is just factually incorrect and misleading as far as sleep is concerned. Even if we only consider Drosophila studies on sleep, huge progress is being made in terms of genes, neurons and circuits relevant for sleep, both in terms of baseline sleep as an output of the circadian clock and rebound/homeostatic sleep. Most, if not all, of these elegant and pioneering studies from multiple, independent groups took approaches that did not require artificial selection regimes. Instead of this defense, the authors might present their findings in the context of the existing knowledge of sleep in flies. For example, what about genes already implicated in sleep? Do they show up in their analysis? Examples include Sleepless, DATfmn, Sandman, AstA, AstA-receptor, Wide-awake, etc. This could really help the manuscript.*

      Our response: We certainly did not intend for this statement to suggest that no progress had been made in the identification of genes and circuits for sleep, and we agree that elegant and pioneering approaches have made significant progress in our understanding of the phenomenon. Rather, we were thinking more in terms of fully described biochemical networks. To avoid this interpretation by other readers, we have altered the “still very rare” sentence in the Introduction to read: “Despite the large amount of studies and data generated for many systems, a full understanding of underlying processes has not yet been achieved…”

      We also agree with the reviewer that it would be helpful to put our work in the context of what is already known in flies. We have added the following paragraph to the Discussion to relate the work with previous work on sleep in flies: “The genes we identify herein overlap and extend previous work. Of the 1,140 genes implicated in the generalized linear model, 151 (13.2 percent) overlapped with previous candidate gene, random mutagenesis, gene expression, and genome-wide association studies of sleep and circadian behavior in flies (Pegoraro et al., 2022; Dissel et al., 2015; Seugnet et al., 2017; Shalaby et al., 2018; Thimgan et al., 2010; Thimgan et al., 2018; He et al., 2013; Mallon et al., 2014; Roessingh et al., 2019; Feng et al., 2018; Lee et al., 2021; Khoury et al., 2020; Wu et al., 2018; Harbison et al., 2013; Harbison et al., 2009; Harbison et al., 2017; Harbison et al., 2019). Notably, previous studies identified the genes CG17574, cry, dro, mip120, Mtk, NPFR1, pdgy, PGRP-LC, Shal, and vari as affecting sleep duration (Feng et al., 2018; Dissel et al., 2015; Pegoraro et al., 2022; Thimgan et al., 2018; Mallon et al., 2014; He et al., 2013; Khoury et al., 2020; Harbison et al., 2013). Two genes, ringer and mip120, overlapped with our previous study of DNA sequence variation in flies selected for long and short sleep (Harbison et al., 2017). In that study we identified a polymorphism in an intron of ringer that changed in allele frequency with selection, with increases in the population frequency of the ‘G’ allele with increasing sleep, while the frequency of the ‘A’ allele increased with decreasing sleep. When the selective breeding procedure was relaxed, the frequency of the ‘G’ allele increased in short-sleeping populations, paralleling an increase in sleep (Souto-Maior et al., 2020). One possibility is that this polymorphism contributes to the changes in gene expression in ringer that we observed in the present study.
Of the 85 genes common to both sexes that we used in the gene interaction networks, 11 (13 percent) appear in other studies of sleep: CG10444, CG2003, CG5142, CG6785, CG9114, CG9676, CR42646, hiw, NPFR1, Tie, and wb (He et al., 2013; Seugnet et al., 2017; Wu et al., 2018; Harbison et al., 2013). Thus, our study corroborates genes known to affect sleep, and identifies new candidate genes for sleep as well.”

      Reviewer #2 (Significance (Required)):

      5. I believe that the authors should attempt to put this study in the context of what is already known about sleep in flies and how this study advances that knowledge, and how the knowledge generated by this study would help other sleep researchers who, for obvious reasons, would like to employ techniques other than artificial selection and big data. The data is elegant. The work seems to be extremely laborious. Nonetheless, as it stands now, this manuscript is very specific to an audience that works with artificial selection to understand the underlying genetics of behavior. In fact, even within the fly sleep field, most people might not find this manuscript very useful.

      Our response: The reviewer may not have considered the wider application of this work. This framework is applicable to any data set of gene expression sampled across time, whether sampled across generations, as we did, across the 24-hour circadian day, or at other time intervals. We have added a statement to the Discussion to stress this fact: “The Gaussian Processes we apply herein have broad applications to other experimental designs, such as gene expression measured at varying time intervals over the circadian day, or time-based sampling of gene expression responses to drug administration.”
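As an illustration of that generality, the following is a minimal single-channel Gaussian Process regression with a fixed squared-exponential kernel, applied to hypothetical, centred expression values sampled every 4 hours over a circadian day instead of across generations. The kernel, the fixed hyperparameters and the noise level are assumptions; the authors' models infer hyperparameters in Stan and include a two-channel version, which this sketch does not attempt.

```python
import numpy as np

def rbf(a, b, variance=1.0, lengthscale=2.0):
    """Squared-exponential covariance between time points a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(t_obs, y_obs, t_new, noise=0.1, **kern):
    """Posterior mean and variance of a zero-mean GP at t_new."""
    K = rbf(t_obs, t_obs, **kern) + noise**2 * np.eye(len(t_obs))
    K_s = rbf(t_obs, t_new, **kern)
    L = np.linalg.cholesky(K)                                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(t_new, t_new, **kern)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical, centred expression values sampled every 4 h over one day
t_obs = np.arange(0.0, 24.0, 4.0)
y_obs = np.sin(2 * np.pi * t_obs / 24)
t_new = np.linspace(0.0, 20.0, 41)
mean, var = gp_posterior(t_obs, y_obs, t_new, noise=0.1, lengthscale=3.0)
```

Only the vector of sampling times changes between a generational and a circadian design; the kernel decides how smoothly expression is assumed to vary between samples.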

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors of this work generated a Sleep Advanced Intercross Population from 10 extreme-sleeping lines of the Drosophila Genetic Reference Panel. This new outbred population was subjected to artificial selection with the aim of understanding the genes underlying the sleep duration differences between three populations: short-sleep, unselected, and long-sleep. Using analysis of variance, the authors identified up to nearly 400 genes that were significantly selected over the generations and showed opposite trends for long and short sleep, and are thus potentially relevant for the regulation of sleep duration. 85 of these genes were consistent between male and female sub-populations, suggesting that a small number of genetic divergences may underlie sex-independent mechanisms of sleep.

      Given the time-course nature of the generational data, the authors studied potential correlations and interactions between these 85 candidate genes. Initially, the authors used pairwise Spearman correlation, noting that this method could not filter out most pairwise interactions (around 40% of all possibilities were significant). To overcome the linear limitations of this approach, the authors implemented a more complex, non-linear Gaussian process model able to account for pairwise interactions. This new approach identified a smaller number of different, and potentially more informative, correlations between the previously identified candidate genes.
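One plausible reason pairwise Spearman correlation flags so many pairs is that rank correlation only measures shared monotonic ordering: any two genes that both drift up (or down) across generations receive |ρ| close to 1, whatever the shape of their trajectories. A minimal numpy sketch with hypothetical, noise-free trajectories; real expression data are noisy, so observed values would be smaller.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

gen = np.arange(13)                # 13 generations
gene_a = np.log1p(gen)             # slow, saturating rise
gene_b = 0.1 * gen**2 + 5.0        # accelerating rise with a different shape
gene_c = -np.sqrt(gen)             # monotone decline
print(spearman_rho(gene_a, gene_b))   # about 1.0
print(spearman_rho(gene_a, gene_c))   # about -1.0
```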

      Lastly, with genetic manipulations, the authors show in vivo that a subset of the candidate genes is causally related to sleep duration, and partially validate some of the correlations identified by their new model.

      The authors conclude that, given the non-linear and complex nature of biological systems, simplistic linear approaches may not suffice to fully capture underlying mechanisms of complex traits such as sleep.

      Major comments

      Most of the work presented focuses on the computational and statistical analysis of different populations submitted (or not) to a process of artificial selection for short or long sleep duration. As such, testing all of the potentially relevant biological conclusions is largely infeasible. The authors already present additional experiments to partially support some, though not all, of their findings. Given that the manuscript is written as a methodological innovation, these additional experiments illustrate the potential uses of the method described.

      (OPTIONAL) However, since one of the focuses of this work is identifying potential gene interactions, it would be interesting if the authors could test a "double knockout" and perhaps demonstrate evidence for epistasis between two of the identified genes. Since the single mutants are available, this experiment should be realistic. However, I have no hands-on experience working with Drosophila and am unable to accurately estimate the resources and time such an experiment could take. My initial guess is that 3-6 months of work should suffice.
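For the suggested double-mutant experiment, epistasis is usually assessed as a departure from additivity: the double mutant's deviation from control is compared with the sum of the two single-mutant deviations, which corresponds to the interaction term of a two-way (genotype A × genotype B) ANOVA. The sketch computes this contrast from hypothetical cell means; whether additivity is judged on the raw or a log scale is a modelling choice, and a formal test would use the replicate-level data.

```python
def interaction_contrast(wt, a, b, ab):
    """Deviation of the double mutant from additivity on the measured scale.

    Zero means the two single-mutant effects simply add; a non-zero value
    suggests epistasis between the two genes.
    """
    return (ab - wt) - ((a - wt) + (b - wt))

# Hypothetical night-sleep means in minutes (illustrative numbers only)
print(interaction_contrast(wt=600, a=560, b=570, ab=530))  # 0 -> additive
print(interaction_contrast(wt=600, a=560, b=570, ab=480))  # -50 -> epistasis
```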

      Regarding the gene CG1304, which is used as an important example throughout the manuscript: it should be carefully re-analyzed, as it was considered for the interaction analyses despite not showing opposite trends for short- and long-sleep populations (see minor comment on Figure 2).

      One major comment is that the claim that the Gaussian process method is more sensitive and specific than simpler approaches, though intuitively understandable, does not seem fully correct from a strict statistical point of view, given the lack of a gold-standard reference against which to check whether the new method is indeed picking up more true positives/negatives. I would reconsider rephrasing this statement in the absence of a biologically relevant validation set.
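The point about a gold standard follows directly from the definitions: both sensitivity and specificity require a ground-truth label for every tested pair, which is exactly what is missing here. A minimal sketch with hypothetical calls and labels.

```python
def sensitivity_specificity(predicted, truth):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP).

    Both need a ground-truth label for every tested pair.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical calls for four gene pairs and their (normally unknown) truth
print(sensitivity_specificity([True, True, False, False],
                              [True, False, False, True]))  # (0.5, 0.5)
```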

      Finally, the study appears to be well powered, and it is clear that the authors were careful in their explanation of the statistical methods. However, I could not find a copy of the code/script used for the model. Without it, it would be very difficult to fully reproduce the results, as both the language used (Stan) and the method itself are not common in the sleep research field.

      Minor comments

      The statistical cut-off used for the gene expression hierarchical GLMM after BH correction was 0.001, which is 50 times stricter than the common 0.05. Could the authors comment on how this choice may affect the results compared with those available in the literature, and on the rationale for choosing this value?
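For reference, the Benjamini-Hochberg adjustment multiplies the i-th smallest of n p-values by n/i and then enforces monotonicity from the largest value down; the cut-off is applied to these adjusted values. A minimal numpy sketch with hypothetical p-values, showing that values passing at 0.05 can all fail at 0.001.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

q = bh_adjust([0.01, 0.04, 0.03, 0.005])   # approximately [0.02, 0.04, 0.04, 0.02]
print(q < 0.05)    # all four pass at 0.05
print(q < 0.001)   # none pass at 0.001
```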

      Heritability calculations are not mentioned in the Methods. Could it be useful to include a short paragraph? Could the authors also comment on the roughly 10x difference in h2 between the short-sleep replicates?
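One common way to estimate realized heritability from selection lines regresses the cumulative response on the cumulative selection differential across generations and takes the slope as h²; whether the authors used this or another estimator is not stated here, so treat the sketch and its hypothetical numbers as an assumption.

```python
import numpy as np

def realized_heritability(cum_selection_diff, cum_response):
    """Slope of cumulative response on cumulative selection differential.

    The regression is forced through the origin, so h2 = sum(S*R) / sum(S*S).
    """
    S = np.asarray(cum_selection_diff, dtype=float)
    R = np.asarray(cum_response, dtype=float)
    return float(S @ R / (S @ S))

# Hypothetical cumulative values (minutes) over three generations
print(realized_heritability([100, 200, 300], [20, 40, 60]))  # 0.2
```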

      Regarding the model implementation, what would be the implications of not enforcing positive semi-definiteness on the covariance matrix, given that covariance matrices are by definition positive semi-definite?
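Concretely, a 2×2 signal covariance built from two standard deviations and a correlation ρ is positive semi-definite only when |ρ| ≤ 1; outside that range the multivariate normal likelihood is undefined and a Cholesky factorization fails. The sketch checks this via eigenvalues with hypothetical values. In Stan the constraint is typically enforced by declaring the parameter with a constrained type such as `corr_matrix` or `cholesky_factor_corr`, though this may not be how the authors implemented it.

```python
import numpy as np

def signal_cov(sd1, sd2, rho):
    """2x2 covariance from two standard deviations and a correlation."""
    c = rho * sd1 * sd2
    return np.array([[sd1**2, c], [c, sd2**2]])

def is_psd(M, tol=1e-10):
    """True if M is symmetric with no eigenvalue below -tol."""
    M = np.asarray(M, dtype=float)
    return bool(np.allclose(M, M.T) and np.linalg.eigvalsh(M).min() >= -tol)

print(is_psd(signal_cov(1.0, 2.0, 0.9)))   # True
print(is_psd(signal_cov(1.0, 2.0, 1.3)))   # False

try:
    np.linalg.cholesky(signal_cov(1.0, 2.0, 1.3))
except np.linalg.LinAlgError:
    print("Cholesky failed: not a valid covariance matrix")
```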

      The Methods mention that PCA projections were performed on the first 3 components; however, only the first two are shown.
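For context, projecting onto the first three components is a single SVD of the centred samples × genes matrix, and the third score column would supply the missing panel. A minimal numpy sketch on random placeholder data, not the study's expression matrix.

```python
import numpy as np

def pca_project(X, k=3):
    """Scores on the first k principal components and their variance fractions.

    X: samples x genes matrix (e.g. log expression).
    """
    Xc = X - X.mean(axis=0)                            # centre each gene
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                             # PC1..PCk coordinates
    explained = s**2 / np.sum(s**2)
    return scores, explained[:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(24, 50))     # placeholder data, not study samples
scores, explained = pca_project(X, k=3)
# scores[:, 0] vs scores[:, 1] is the usual 2-D plot; scores[:, 2] adds PC3
```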

      Figure 1 shows the mean night sleep time of each population. Could some measure of variability (SE or SD) be represented to give a general idea of the distribution of the values? Additionally, the standard deviations associated with the CVe estimates are mentioned but not shown explicitly. Could they be added to the text to illustrate how much these deviations were reduced?
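The requested variability measures are straightforward to report alongside the means. The sketch computes the mean, SD, SE and a coefficient of variation for one population's hypothetical night-sleep values; CVe is usually defined from the environmental (residual) standard deviation rather than the raw SD, so substituting that estimate for `sd` is assumed here.

```python
import numpy as np

def summarize(values):
    """Mean, SD, SE and coefficient of variation (%) for one population."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                 # sample standard deviation
    se = sd / np.sqrt(x.size)          # standard error of the mean
    cv = 100.0 * sd / mean             # CV in percent of the mean
    return mean, sd, se, cv

# Hypothetical night-sleep values (minutes) for a handful of flies
print(summarize([610, 580, 655, 600, 625]))
```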

      Figure 2 shows the linear model fits for gene CG1304. I find this gene in the list of significant genes for both sexes (Tables S5/6), but it does not seem to be one that shows opposite trends for short- and long-sleep (Tables S7/8). Surprisingly, it shows up again in Table S10! However, the text introducing the figure reads as if this should be one of the 85 sex-independent genes. Would it be better to provide an example of what a significant gene looks like?

      For Figure 3, it would be interesting to show both the GP correlations and the Spearman correlations to illustrate the methodological differences. I would also be curious to see at least one pairwise expression scatter plot, just to see how the genes correlate in a single plot.

      Figures 3S3/4 are described as showing that single- and multi-channel models do not differ substantially. Would this be expected, and why?

      Having built different networks of pairwise gene associations (projected onto a unified network as illustrated in Figure 5), it could be interesting to compare the network topologies at a basic level: node degrees, overlapping sub-networks, whether they are potentially scale-free as previously described for biological systems, etc.
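      The sketch below is a possible starting point for such a comparison. It uses plain Python with hypothetical gene pairs rather than the study's networks. It computes node degrees, the degree distribution, and the fraction of shared edges (Jaccard index) between two networks. Assessing scale-freeness would also require fitting the degree distribution of the real networks, which this sketch does not attempt.

```python
from collections import Counter


def degree_counts(edges):
    """Number of partners (node degree) for each gene in an undirected edge list."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def edge_jaccard(edges_1, edges_2):
    """Shared edges divided by all distinct edges, ignoring edge direction."""
    set_1 = {frozenset(e) for e in edges_1}
    set_2 = {frozenset(e) for e in edges_2}
    union = set_1 | set_2
    return len(set_1 & set_2) / len(union) if union else 0.0


# Hypothetical gene pairs, for illustration only.
males = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"), ("geneC", "geneE")]
females = [("geneB", "geneA"), ("geneC", "geneE"), ("geneF", "geneG")]

print(degree_counts(males).most_common(1))      # [('geneA', 3)]
print(Counter(degree_counts(males).values()))   # degree distribution: degree -> number of genes
print(edge_jaccard(males, females))             # 2 shared / 5 distinct edges = 0.4
```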

      In Table S6, I noticed that some gene symbols were converted to dates (e.g., 1-Dec).

      In the results, the phenotypic response to artificial selection is sometimes described in minutes and at other times in hours. Though this is a minor point, the values would be easier to compare if they were consistently formatted as minutes (hours).

      Over 99% of chains converged after three runs. Even though the reasons for the lack of convergence of the remaining chains were not investigated, could this be a relevant effect? 1% of 3570 interactions is still ~35 potential interactions. Are the non-convergent chains associated with specific genes?

      The GO terms identified as significantly enriched after p-value correction point to a clear association of the 85 identified genes with serine proteases. Could this be discussed further to highlight the biological findings of the work in the context of neuronal function or sleep regulation?

      Could the authors discuss the small overlap between males and females, and between short and long sleep, among the 145 gene pairs identified after the MCMC runs? Similarly, how can the network differences be explained from a biological/evolutionary perspective?

      In the mutational analyses it is pointed out that CG12560 and Jon65Aii only affect females significantly. However, in the following sentence, the authors claim these two genes had the greatest effect on both sexes, which seems contradictory, at least in the way it is described.

      A short comment on how unchanged expression could lead to the observed phenotypic variation might help readers unfamiliar with the method understand how the effects of Minos mutations are biologically mediated. This seems to be the authors' expectation, so could it be non-functional proteins or something else?

      The assumption that interacting genes would have their expression ratio changed by the Minos insertion would hold in situations where the affected gene causally interferes with the candidate's expression. As far as I understand, causality cannot be inferred by the proposed method. Thus, in a situation where both genes are co-regulated by a third factor, no change in expression ratio is to be expected. How would the authors re-interpret their final result in light of this distinction between direct and indirect interactions?

      Referees cross-commenting

      I agree with Reviewer #2's comments, which, to me, generally point out the lack of biological interpretation of the results (and thus of a connection between this study and previous literature). Adding this component would make the manuscript well-rounded and attractive to a wider audience.

      Significance

      This study proposes the application of advanced non-linear methods to study complex traits such as sleep. As implemented, Gaussian Processes are able to identify non-linear correlations between two biological features (e.g. transcripts) over time (e.g. generations), representing an attempt to push the available analytical methods beyond the single-gene paradigm. As such, rather than on the relevance of the biological results themselves, the authors focus on explaining and illustrating the application of the methodological advances obtained and their relevance for a better understanding of biological systems.

      However, the mathematical principles required to understand the implemented method are not trivial and require advanced knowledge of machine learning and statistics. This is a potential barrier, though not an impediment, to its quick and wide adoption by the community. In addition, even though the method is demonstrated to be valid in Drosophila, the resolution required to perform such a study may be difficult to obtain in other model systems, which would likely require further refinement of the statistical approach.

      The main audience interested in this work would be basic sleep researchers. However, this work also relates to the understanding of gene selection over an artificial evolutionary process, so evolutionary and developmental biologists may also be interested. The methodology itself, already used in other fields of study, is a general statistical tool that could be adopted by a broad range of researchers for a diversity of topics. As such, I believe that with this work the authors will be able to stimulate the development and/or rethinking of the available analytical methods to study complex biological systems, though this would likely be done either in collaboration with the authors themselves or by a specific subset of researchers who regularly work with advanced mathematical, statistical and computational principles.

      (Disclaimer) My mathematical training does not reach the PhD-level expertise that may be required to fully understand the methodology described. I have never personally worked with D. melanogaster or used Gaussian Processes in a professional setting. As such, I may not be able to fully evaluate/appreciate the more detailed technical aspects of this work.

    1. Recommendation 4. Introduce the concepts of sexuality education into the official curricula of certain relevant subjects, beyond the subjects related to biological and health aspects and moral and civic education. The current framework raises difficulties because Article L. 312-16 of the Education Code, which in 2001 made at least three annual sexuality education sessions mandatory, does not specify how these sessions are to be organized. Successive circulars have defined an operational framework that has been amended several times. The current framework, set out in the latest circular dated 12 September 2018, did not carry over two points that appeared in the circulars of 1998 and 2003: the allocated teaching hours, and the precise arrangements for actually delivering the sessions. The following amendments, modeled on Article L. 542-3 of the Education Code on the organization of the annual information and awareness session on child abuse [95], would clarify these important questions about the practical implementation of sexuality education (EAS): Recommendation 5. Schedule at least three dedicated annual sessions in the timetables of pupils in primary schools, lower secondary schools (collèges) and upper secondary schools (lycées) (a provision supplementing Article L. 312-16 of the Education Code). Recommendation 6. Assign responsibility for organizing the annual sessions to school heads, in liaison with the health and citizenship education committees (a provision supplementing Article L. 312-16 of the Education Code).
    1. This depends on the ruby code. Some projects will be semi-dormant due to various reasons. That's for us to address as a community. Are we going to let a single decade-old gem prevent us from moving Ruby forward? What's the threshold? There are libraries out there that don't work on Ruby 1.9. We left them behind or replaced them. And are people depending on a gem that's unmaintained really going to be the ones to jump on Ruby 3.0 the day after Christmas 2020? This is also still supposition. Name some gems that are unmaintained and in wide use. We can fix them! We have the technology! In my opinion, if matz's objective is to make the transition to ruby 3.0 simple, then it actually makes a lot of sense to postpone frozen strings by default. Postpone until when? 3.1? So then 3.1 will be the hard break? They've been discussed for what, ten years now? How long is long enough? We've added many ways for people to start transitioning to immutable literal strings, and people are using those mechanisms widely. We've pushed this transition for a long time, and we still have another year until 3.0 is out and longer than that until people will need to make a move. What is the threshold for being "ready" to make this change? Unless we're planning to wait until Ruby 4.0 in 2030 to do this, I think we should do it now. I use frozen strings in most of my ruby projects, most of them set to true via the toplevel comment, so either way, it would not affect me. Exactly. Most people already do use frozen string literals. And adding a pragma means we can transition troublesome code to the new way with a single line per affected file. Heck, we can even add --enable:mutable-literal-string for people that are stuck with some of that old unmaintained code, allowing them to have a soft landing.
    1. Then Sean Black, a programmer on TikTok, saw this and decided to contribute by creating a bot that would automatically log in and fill out applications with random user info, increasing the rate at which he (and others who used his code) could spam the Kellogg’s job applications:

      I think this is a really cool and important form of data punishment. A random programmer saw striking workers trying to improve their working conditions and win a raise, and set out to make a bot to disrupt Kellogg's hiring efforts. I think this is also bad, though, because there might have been real people looking for jobs who would have worked for the wage Kellogg was offering and now won't get hired.

    1. We’ve been accessing Reddit through Python and the “PRAW” code library. The PRAW code library works by sending requests across the internet to Reddit, using what is called an “application programming interface”, or API for short. APIs have a set of rules for what requests you can make, what happens when you make a request, and what information you can get back. If you are interested in learning more about what you can do with PRAW and what information you can get back, you can look at the official documentation for PRAW and the Reddit API. But be warned: they are not organized in a friendly way for newcomers, and it takes some getting used to before you can figure out what these documentation pages are talking about.

      APIs have a set of endpoints, which are the URLs the API can be accessed from, and each endpoint supports a set of methods (such as the HTTP methods GET and POST) that can be used to access and retrieve information. PRAW abstracts away the complexity of making these requests and handling the responses, making it easier to access and work with data from Reddit.
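      This standard-library Python sketch makes the idea of an endpoint concrete by showing two jobs a wrapper like PRAW hides: building an endpoint URL and parsing the JSON that comes back. It does not contact Reddit. It assumes Reddit's public `.json` listing URLs and uses a made-up response trimmed to the fields needed. The helper names `listing_url` and `post_titles` are hypothetical and are not part of PRAW.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.reddit.com"


def listing_url(subreddit, sort="hot", limit=5):
    """Build the URL of a public subreddit listing endpoint (JSON variant)."""
    query = urlencode({"limit": limit})
    return f"{BASE_URL}/r/{subreddit}/{sort}.json?{query}"


def post_titles(response_text):
    """Pull post titles out of a listing response body."""
    listing = json.loads(response_text)
    return [child["data"]["title"] for child in listing["data"]["children"]]


# A trimmed, made-up response body shaped like a Reddit listing.
sample_response = json.dumps({
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t3", "data": {"title": "First post"}},
        {"kind": "t3", "data": {"title": "Second post"}},
    ]},
})

print(listing_url("learnpython"))     # https://www.reddit.com/r/learnpython/hot.json?limit=5
print(post_titles(sample_response))   # ['First post', 'Second post']
```

      With PRAW, the same request becomes a method call that returns Python objects. For example, you iterate over `reddit.subreddit("learnpython").hot(limit=5)` and read each submission's `.title`. PRAW also talks to Reddit's authenticated API, so it needs app credentials rather than these public URLs.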

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewer Comments

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In developing systems, morphogen gradients pattern tissues such that cells along the patterning length sense varying levels of the morphogen. This process has a low positional error, even in the presence of biological noise, in numerous tissues, including the early Drosophila melanogaster embryo. The authors of this manuscript developed a mathematical model to test the effect of noise and mean cell diameter on gradient variability and on the positional error the gradients convey.

      They solved the 1D reaction-diffusion equation for N cells with diameters and kinetic parameters sampled from a physiologically relevant mean and coefficient of variation (CV). They fit the resulting morphogen gradients to a hyperbolic cosine profile and determined the decay length (DL) and amplitude (A) for a thousand independent runs and reported the CV in DL and A.
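      The procedure can be sketched in Python with NumPy for readers unfamiliar with this kind of simulation. This is a minimal illustration, not the authors' code (theirs is in their git repository). It replaces the source domain with a constant influx at x = 0 and applies zero flux at x = L. It draws cell diameters and per-cell D and k from log-normal distributions with a chosen mean and CV. It then fits a log-linear exponential over the first half of the domain instead of the hyperbolic cosine used in the manuscript. All parameter values (D = 1, k = 1/400, so λ = 20 µm; L = 250 µm; CV = 0.3) are illustrative assumptions.

```python
import numpy as np


def sample_positive(rng, mean, cv, size):
    """Log-normal samples with the requested mean and coefficient of variation."""
    if cv == 0:
        return np.full(size, float(mean))
    sigma2 = np.log(1.0 + cv**2)
    return rng.lognormal(np.log(mean) - sigma2 / 2.0, np.sqrt(sigma2), size)


def steady_gradient(rng, L=250.0, dx=0.5, mean_diam=5.0, cv_diam=0.3,
                    D=1.0, k=1.0 / 400.0, cv_kin=0.3, influx=1.0):
    """Finite-volume steady state of D c'' = k c on [0, L] with per-cell D and k.

    Simplification (assumed): constant influx at x = 0, zero flux at x = L.
    """
    n = int(round(L / dx))
    x = (np.arange(n) + 0.5) * dx
    # Draw more than enough cells to cover [0, L], then map grid points to cells.
    diam = sample_positive(rng, mean_diam, cv_diam, int(4 * L / mean_diam) + 10)
    cell = np.searchsorted(np.cumsum(diam), x)
    n_cells = cell.max() + 1
    D_x = sample_positive(rng, D, cv_kin, n_cells)[cell]
    k_x = sample_positive(rng, k, cv_kin, n_cells)[cell]
    # Diffusivity at the faces between neighbouring grid points (harmonic mean).
    D_face = 2.0 * D_x[:-1] * D_x[1:] / (D_x[:-1] + D_x[1:])

    # Assemble A c = b: diffusive exchange with neighbours minus local degradation.
    A = np.diag(-k_x * dx)
    i = np.arange(n - 1)
    A[i, i + 1] += D_face / dx
    A[i + 1, i] += D_face / dx
    A[i, i] -= D_face / dx
    A[i + 1, i + 1] -= D_face / dx
    b = np.zeros(n)
    b[0] = -influx  # morphogen entering the first grid cell from the source
    return x, np.linalg.solve(A, b)


def fit_exponential(x, c, x_max):
    """Log-linear fit c ~ A * exp(-x / lam) over x < x_max; returns (lam, A)."""
    mask = x < x_max
    slope, intercept = np.polyfit(x[mask], np.log(c[mask]), 1)
    return -1.0 / slope, np.exp(intercept)


def gradient_cvs(mean_diam, runs=100, seed=0, **kwargs):
    """CV of the fitted decay length and amplitude over independent random tissues."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(runs):
        x, c = steady_gradient(rng, mean_diam=mean_diam, **kwargs)
        fits.append(fit_exponential(x, c, x_max=0.5 * x[-1]))
    lam, amp = np.array(fits).T
    return lam.std() / lam.mean(), amp.std() / amp.mean()


for diam in (5.0, 40.0):
    cv_lam, cv_amp = gradient_cvs(diam)
    print(f"mean diameter {diam:4.0f} um: CV_lambda={cv_lam:.3f}, CV_A={cv_amp:.3f}")
```

      The face diffusivities use a harmonic mean, a common choice for piecewise-constant coefficients. The 40 µm run should show larger CVs than the 5 µm run, in line with the trend the authors report. The exact values depend on the assumed parameters.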

      The authors found that the CV in DL and A increases with increasing mean cell diameter. They propose that the CV in DL scales as the inverse square root of N, whereas the CV in DL and A is only a weak function of the CV of cell surface area (CVa) if CVa __

      They further looked at the shift in readout boundaries and compared four different readout metrics: spatial averaging, centroid readout, random readout and readout along the length of the cilium. Their results show that spatial averaging and centroid readout have high readout precision.

      They finally showed that the positional error (PE) increases along the patterning length of the tissue and with increasing mean cell diameter.
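      Written out, the two scaling relations summarized above read as follows. Here N ≈ L/μ_δ is the number of cells along a patterning axis of length L with mean cell diameter μ_δ, an approximation that ignores cell-size variability. The last ratio is plain arithmetic for a change from 5 µm to 40 µm cells at fixed L.

```latex
\begin{aligned}
\mathrm{CV}_\lambda &\propto \frac{1}{\sqrt{N}}, \qquad N \approx \frac{L}{\mu_\delta}
  \;\Longrightarrow\; \mathrm{CV}_\lambda \propto \sqrt{\frac{\mu_\delta}{L}} \\
\mathrm{PE} &\propto \sqrt{\mu_\delta}
  \;\Longrightarrow\; \frac{\mathrm{PE}(\mu_\delta = 40\,\mu\mathrm{m})}{\mathrm{PE}(\mu_\delta = 5\,\mu\mathrm{m})} = \sqrt{8} \approx 2.83
\end{aligned}
```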

      The authors also supported their theoretical and simulation results by examining mean cell areas reported in the literature for tissues patterned by morphogen gradients, which likewise point to higher readout precision with smaller cell diameters.

      Major comments:

      Most of the key conclusions are convincing. However, there are four major points that should be addressed. First, the authors conclude the section titled, "The positional error scales with the square root of the average cell diameter," by saying that morphogen systems with small cells can have high precision in absolute length scales, but not on the scale of one cell diameter. They state this would result in salt and pepper patterns in the transition zones. The authors should either support this with biological examples or explain why this is not observed experimentally.

      We thank the referee for pointing out this imprecise comment, which we have removed. The exact nature of transition zones between patterning domains is a subject of ongoing research in our group, and goes beyond the scope of the present work. We will be sharing our results on this aspect in a separate forthcoming publication.

      Second, perhaps the main conclusion of the paper is that morphogen gradients pattern best when the average cell diameter is small. The authors support this by reviewing the apical cell area of epithelial systems that are known to be patterned by morphogens and those that are not (presumably taking apical cell area as a proxy for cell diameter). However, the key parameter is not absolute cell diameter, but the cell diameter relative to the morphogen length scale. The authors should report the ratio of these two quantities in their literature analysis.

      Since cell areas and cell diameters are monotonically increasing functions of one another for reasonably regular cell shapes, we indeed consider apical cell areas as proxies for the cell diameter, as the referee correctly noted. Cell areas are more frequently reported in the literature than cell diameters, which is why we compiled these in our analysis. We have now revised our analysis of the effect of the cell diameter on patterning precision to include further length scales relevant in the patterning process. We show, using the example of the Drosophila wing disc, how the parallel changes in cell diameter and morphogen source size compensate for the increase in gradient length and domain size, which would otherwise reduce patterning precision over time as the readout positions shift away from the source to maintain the same relative position in the growing wing disc.

      Lamentably, accurate measurements of morphogen gradients in epithelial tissues are still rare. In fact, among the listed tissues that are patterned by gradients, we are only aware of measurements of the SHH and BMP gradients in the mouse NT (lambda = 20 µm) and of the Dpp gradients in the Drosophila wing and eye discs [Wartlick, et al., Science, 2011 & Wartlick et al., Development, 2014]. We agree that it would be great if experimental groups would measure this in more tissues. In this revised and extended analysis, we show that the positional error increases with the cell diameter in absolute terms, not only relative to any reference length, be it the gradient length or cell diameter.

      Third, as part of their literature analysis, the authors state that in the Drosophila syncytium, there are morphogen gradients, but they imply that because these gradients operate prior to cellularization, one cannot use the large distances between nuclei as counter evidence to their main conclusion. Rather than simply dismissing the case of the Drosophila syncytium, the authors should explain why this case does not apply, using reasoning based on their model assumptions.

      Our paper is concerned with patterning of epithelia (which we now make clearer in the manuscript), and we would not want to stretch our paper to other tissue types, as the reaction-diffusion process in them differs. But we do not share the referee’s sentiment that the syncytium would present a counter-example. Since our model explicitly represents kinetic variability between spatial regions bounded by cell membranes, which are absent in the syncytium, our model is not directly applicable to it. We now provide this argument in the discussion, as requested by the referee.

      At 100 µm [Gregor et al., Cell, 2007], the Bicoid gradient is 5 times longer than the SHH/BMP gradients in the mouse neural tube and more than 10 times the reported length of the WNT gradient in the Drosophila wing disc [Kicheva et al., Science, 2007]. The nuclei become smaller as they divide because the anterior-posterior length of the Drosophila embryo remains about 500 µm [Gregor et al., Cell, 2007], but even at the earliest patterning stage their diameter will not be larger than 10 µm at midinterphase 12 [Gregor et al., Cell, 2007, Fig. 3A].

      Fourth, related to the above: the authors then state that there are no morphogen gradients known during cellularization. Unless I am misunderstanding their point, this is untrue. The Dpp gradient acts during the process of cellularization and specifies at least three distinct spatial domains of gene expression. Furthermore, not long after gastrulation, EGFR signaling patterns the ventral ectoderm into at least two distinct domains of gene expression. What are the cell areas in that case?

      Unfortunately, the referee does not provide literature references, and we were not able to find anything in the literature ourselves. We have now rephrased the statement to “we are not aware of morphogen gradient readout during cellularisation”.

      Minor comments:

      Figs 1c,d:

      The way the system is set up (DL = 20 micron, patterning length (LP) = 250 micron, nominal cell diameter (D) = 5 micron), DL/L ~ 0.08, so the exponential profile decays to a small value by around 100 micron. This means that in all these simulations, the effective LP was only around 100 micron; cells beyond that saw nearly zero concentration.

      Because of this, when diameters were varied from 0.2 to 40 micron, there could be as few as 2.5 cells in the "patterning region", which could be responsible for the higher variability in DL and A.

      Patterning in the neural tube works across several hundred µm. At x=100µm, there is still exp(-5)=0.0067 of the signal left, which likely translates well into appreciable numbers of the morphogen molecule (see [Vetter & Iber, 2022] for a discussion of concentration ranges cells might sense). Unfortunately, very little is known about absolute morphogen numbers in the different patterning systems; experimental data is available only on relative scales, not in absolute numbers. While more quantitative experiments are still outstanding, modeling work needs to be based on reasonable assumptions. The seemingly quick decay of exponential profiles (when plotted on a linear scale) can be deceiving. In fact, exponential profiles describe the same fold-change over repeated equal distances, which makes them biologically very useful for different readout mechanisms operating on different levels of morphogen abundance. Our simulations are not limited to a patterning length of 100µm. Our work merely shows that variable exponential gradients stay precise over a long distance. We draw no conclusion on whether cells are able to interpret the low morphogen concentrations that arise far in the patterning domain - this aspect certainly deserves further research.

      The referee’s observation is correct in that for a cell diameter of up to 40 µm, there are only few cells in the patterning domain (namely down to about six, for a length of 250µm, as used in the simulations). It is also correct that this is the reason why gradients in such a tissue have greater variability in lambda and C0. This is precisely the main point we are making in this study: The narrower the cells in a tissue of given size, the less variable the morphogen gradients, and the more accurate the positional information they carry. Conversely, the wider the cells in x direction, the more variable the gradients.
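      The numbers in this exchange can be checked with a few lines of Python. The sketch uses the values stated above: λ = 20 µm, L = 250 µm, 40 µm cells, and the revised λ = 50 µm. The helper `fold_change` is a hypothetical illustration of the point that an exponential profile drops by the same factor over every equal distance.

```python
import math

decay_length = 20.0    # µm, gradient decay length (DL) in the original simulations
domain_length = 250.0  # µm, patterning length L


def fold_change(x, step, lam=decay_length):
    """Ratio c(x + step) / c(x) for an exponential profile c(x) = c0 * exp(-x / lam)."""
    return math.exp(-(x + step) / lam) / math.exp(-x / lam)


# Signal left at 100 µm, relative to x = 0: exp(-5).
print(round(math.exp(-100.0 / decay_length), 4))  # 0.0067
# The same fold change per 20 µm near the source and far from it.
print(round(fold_change(0.0, 20.0), 6), round(fold_change(150.0, 20.0), 6))  # 0.367879 0.367879
# Number of 40 µm cells in the whole domain and in the first 100 µm.
print(domain_length / 40.0, 100.0 / 40.0)  # 6.25 2.5
# DL/L in the original and in the revised (DL = 50 µm) simulations.
print(decay_length / domain_length, 50.0 / domain_length)  # 0.08 0.2
```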

      Would any of the results change if DL/L was higher, around 0.2?

      As we consider steady state gradients, nothing changes if we fix the (mean) gradient decay length and only shorten the patterning domain, except for a small boundary effect at the far end of the tissue due to zero-flux conditions applied there. At a fixed gradient length, the steady-state gradients just extend further if DL/L is increased (for example to 0.2), reaching lower concentrations, but the shape remains unchanged, and so does the morphogen concentration at a given absolute readout position.

      To demonstrate what happens at DL/L = 0.2, as requested by the referee, we repeated simulations with an increased gradient decay length of DL=50 micrometers; the length of the patterning domain remained unchanged at L=250 micrometers. As it is not possible to include image files in this response, we have made the plots available at https://git.bsse.ethz.ch/iber/Publications/2022_adelmann_vetter_cell_size/-/blob/main/revision_increased_dl.pdf for the time of the reviewing process. The plots show the resulting gradient variability, which is analogous to Fig 1c,d in the original manuscript. For both gradient parameters, we still recover the identical scaling laws.

      The source region is 25 microns in length, and all cell diameters above 25 micron get defaulted back to 25 micron, which explains the flat lines in the region beyond mu_delta/mu_DL > 1.

      Thanks for pointing this out. We now mention this in the manuscript. Note that it’s the ratio mu_delta/L_s that matters, not mu_delta/mu_lambda. It just so happens in this case, that both are nearly equal, because L_s=5*mu_lambda/4 in our simulations.

      Results:

      Pg 2 (bottom left): In the git repository code, the morphogen gradients are fit to a hyperbolic cosine function (described in reference 19), which is not described in the main text. Having this in the main text would help readers understand why Fig 1c shows variation in d only, D only and all k parameters, whereas Fig 1d shows variation with all individual parameters p, d and D and all k.

      The reason why the impact of CV_p alone on CV_lambda is not plotted in Fig 1c is that it is minuscule. We now mention this in the figure legend. This follows from the fact that the gradient length lambda is determined in the patterning domain, whereas the production rate p sets the morphogen concentration in the source domain, and thus, the gradient amplitude, but not its characteristic length. This is unrelated to the functional form used to fit the shape of the gradients, be it exponential or a hyperbolic cosine. We mention that we fit hyperbolic cosines to the numerical gradients in section Gradient parameter extraction in the Methods section, and we refer the interested reader to the original reference [Vetter & Iber, 2022], which contains all mathematical details, should they be needed.
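      The argument can be made explicit for the linear model with diffusion and linear degradation. Assume a simplified setting in which the source domain is replaced by a constant influx j at x = 0, with zero flux at x = L. The steady state in the patterning domain is then:

```latex
\begin{aligned}
D\,c''(x) &= k\,c(x), \qquad -D\,c'(0) = j, \qquad c'(L) = 0 \\
c(x) &= \frac{j\,\lambda}{D}\,\frac{\cosh\!\big((L - x)/\lambda\big)}{\sinh(L/\lambda)},
\qquad \lambda = \sqrt{D/k}
\end{aligned}
```

      The decay length λ depends only on D and k in the patterning domain. Because the equations are linear in c, the production rate p enters only through the influx j (j ∝ p). It therefore rescales the amplitude without changing λ. Far from x = L, the profile reduces to (jλ/D) e^{-x/λ}.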

      Figure 3b:

      In figures where markers are overlapping perhaps the authors can use a "dot" to identify one set of simulations and a "o" to identify the ones under it. The way the plots are set up currently makes it hard for the reader to understand where certain points on the plot are.

      We use a color code to represent the readout strategy and different symbols to represent the cell diameter in Fig 3b. We agree that for the smallest of the cell diameters, the diamond-shaped data points lie so close that they are not easy to tell apart at first sight. For this reason, we chose different symbol sizes. We would like to keep the symbols as they are to maintain visual consistency with the other figures, which we think is an important feature of our presentation that facilitates the interpretation. Note that all our figures are vector graphics, which allow the reader to zoom in arbitrarily deep, and to easily distinguish the data points. Note also that in this particular case, telling the data points apart is not necessary; recognizing that they are nearly identical is sufficient for the interpretation of our results.

      Methods:

      The Methods could be more descriptive and include certain aspects of the simulations, such as the adjusted lambda, which is only described in the code and not in the main text or supplementary material.

      We apologize for this omitted detail. As shown in Fig. 8g in [Vetter & Iber, 2022], the mean fitted value of lambda drifts away from the prescribed value, depending on which of the kinetic parameters are varied, and by how much. To report the true observed mean gradient length in our results, we corrected for this drift in our implementation, as the referee correctly noticed. We now describe this in the methods section, and we have extended the methods also on other aspects.

      Git code:

      The git code function handles do not correspond to figure numbers and should be updated to make it easier for readers to find the right code.

      Thank you for pointing this out — it was an oversight from an earlier preprint version. The function names now correspond to the figure numbers.

      Reviewer #1 (Significance (Required)):

      This manuscript contributes several key insights to the field of patterning. The three most important contributions of this work to the current literature are: (1) the scaling relationships developed here, (2) the nicely shown finding that PE increases at the tail end of the morphogen profile, and (3) the comparison of various readout strategies.

      Thank you for the positive assessment.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      How morphogen gradients yield precise patterning outputs is an important problem in developmental biology. In this manuscript, Adelmann et al. study the impact of cell size on the precision of morphogen gradients and use a theoretical framework to show that positional error is proportional to the square root of cell diameter, suggesting that the smaller the cells in a patterning field, the more precise the patterns that can be established despite morphogen gradient variability. This result remains true even when cells average the morphogen signal across their surface or when spatial correlations between cells are introduced. Thus, the authors suggest that epithelial tissues patterned by morphogen gradients buffer morphogen variability by reducing apical cell areas, and they support their hypothesis by examining several experimental examples of gradient-based vs. non-gradient-based patterning systems.

      Major comments:

      While the idea that smaller cells yield more precise morphogen gradient outputs is attractive, it is unclear whether patterning systems use this strategy to make patterns more precise, as there are several mechanisms that could achieve precision. Do actual developmental systems use it as a mechanism to increase precision? Or is precision achieved through other mechanisms (for example, cell sorting as in the zebrafish neural tube; Xiong et al. Cell, 2013)? Indeed, classical patterning work on the Drosophila embryo suggests that segmentation patterns are defined by absolute size rather than by an absolute number of cells (Sullivan, Nature, 1987). According to the authors, the patterning stripes should be more precise when embryos have higher cell densities than in the wild type, but stripes are remarkably precise in wild-type embryos. This is likely due to other precision-ensuring mechanisms (such as downstream transcriptional repressors, in this case).

      We want to emphasize that our predictions concern the precision of the gradients, not the precision of their readout, which can be strongly affected by readout noise, as we will show in a forthcoming paper. Cell sorting can sharpen boundaries in the transition zone, but this would not address errors in target domain sizes and is thus different from gradient precision as we discuss it here. Also, cell sorting as observed in the zebrafish neural tube requires higher cell motility than what is observed in most epithelial tissues. The work by Sullivan, Nature, 1987, is concerned with patterning of the early Drosophila embryo, and the stripes are defined already before cellularisation. We are unfortunately not aware of any work that quantified gradient precision at different cell densities in epithelia. This would, of course, be highly interesting data and would indeed put our predictions to a test. We are, to the best of our knowledge, the first to propose this principle with the present work. We have now made these points and distinctions clearer in the revised manuscript. Thank you for bringing this up.

      Their modeling approach is based on exponential gradients formed by diffusion and linear degradation, but in reality, actual morphogen gradients are affected by receptor and proteoglycan binding and are likely not simply exponential and/or interpreted at the steady state. Do the main results of the manuscript hold even for non-exponential gradients or before they reach a steady state?

      We can confirm that our results also hold for non-exponential gradients, as they emerge for example when morphogen degradation is self-enhanced (i.e., non-linear). This result will be published in a follow-up study [BioRxiv: 10.1101/2022.11.04.514993], which we now cite in the concluding remarks in the revised manuscript.

      The analysis of pre-steady-state gradients lies outside the scope of the present work, so whether our results also apply to them remains to be answered in future research. We have added a comment on this to the discussion.

      In their Discussion section, the authors note that several patterning systems, such as the Drosophila wing and eye discs, show smaller cells near the morphogen source relative to other regions in the tissue. This observation suggests a prediction of the authors' hypothesis that can be tested experimentally. In the Drosophila wing and eye discs genetic mosaics of ectopic morphogen sources (such as Dpp) can (and have) been made. Therefore, one could predict that the patterning outputs in a region of larger cross-sectional areas will be more imprecise than in the endogenous source. Since this is a theoretical paper, it is understandable that authors are not going to make this experiment themselves, but I wonder if they can use published data to test this prediction or at least mention it in the manuscript to offer the experimental biology reader an idea of how their hypothesis can be tested experimentally.

      We appreciate that the referee would like to help us inspire the experimental community. Unfortunately, the problem with the proposal is that Dpp has been shown to result in a lengthening of the cells (and thus a smaller cell width) [Widmann & Dahmann, J Cell Sci, 2009]. The Dpp gradient thus ensures a small cell width close to its source, which makes it virtually impossible to test this proposal experimentally in the suggested way. Nevertheless, we have added brief comments on potential experimental testing of our predictions to the discussion.

      Other comments:

      The Methods section should be expanded and should include more details about how the authors account for cell size in their simulations. As presented, I believe that experimental biologists will not be able to grasp how the analysis was done.

      We have expanded on the technical details of our model in the Methods section, in particular in relation to cell size, as requested. To avoid redundancy with existing published descriptions of the modeling details [Vetter & Iber, 2022], we focus here on what has not been covered already, and refer the interested reader to our previous publication. Any work, be it theoretical or experimental, is inevitably less accessible to experts in other disciplines, but we believe that the presentation of our results is sufficiently independent of modeling aspects to be accessible to experimental biologists, too.

      The authors use adjectives such as 'little' and 'small' without a comparative reference. For example, in the abstract the authors say that apical areas "are indeed small in developmental tissues." What does "small" mean? This should be avoided throughout the text.

      We thank the referee for raising this point. Where appropriate, we changed the phrasing accordingly to clarify what the comparative reference is. We leave all sentences unchanged where the statement holds in absolute terms. Note that in the substantially revised analysis on the impact of the different length scales involved in the patterning process, we now explicitly show with simulation data and theory that the absolute positional error increases with increasing absolute cell diameter.

      Reviewer #2 (Significance (Required)):

      Overall, I believe that the manuscript is well written and deserves consideration for publication. However, the authors should consider the points outlined above in order to make their manuscript more accessible and relevant to the developmental biology community.

      Thank you for the positive assessment.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript "Impact of cell size on morphogen gradient precision", the authors Adelmann, Vetter and Iber numerically analyse a one-dimensional PDE-based model of morphogen gradient formation in tissues in which the cell sizes and cell-specific parameters locally affecting the gradient properties are varied according to predefined distributions. They find that the average cell size has the largest impact on the variance of the gradient shape and on the downstream read-out precision, while other factors, such as details of the readout mechanism, have markedly less influence on these properties. In addition, they demonstrate that averaging gradient concentrations over typical cell areas induces a shift of the readout position, which, however, appears to be insignificant (~1% of the cell diameter) for typical parameters.

      Overall, this manuscript is already in very good shape and tackles an interesting topic. Still, I would like the authors to address the comments below before I can recommend publication. My main criticism pertains to some of the authors' derivations, which in my view deserve more detail in places, and to their conclusions about gradient readout precision.

      Thank you for the positive assessment.

      MAJOR COMMENTS

      p. 1, left column: The positional error of the readout position does not depend only on the variation of the gradient parameters, as suggested by the first part of the introduction. Another very important factor is the fluctuation caused by the random arrival of molecules at the promoters that perform the readout, owing to the limited (and typically low) molecule number. In fact, far from the source of the gradient, this noise source is expected to dominate over gradient shape fluctuations. Importantly, these fluctuations also arise for non-fluctuating, "perfect" gradient inputs if copy numbers of the morphogen molecules are limited (which they always are). This important contribution to the noise is neglected in the authors' work. This is OK if the purpose is to focus on the origin and influence of gradient shape fluctuations, but that focus should be clearly highlighted in the introduction, stating explicitly that noise due to the diffusive arrival of transcription factors is not taken into account in the given work (see, e.g., Tkacik, Gregor, Bialek, PLoS ONE 3, 2008).

      In the present work, only the precision of the gradients, not the readout itself, is studied. We now mention this more explicitly in the introduction. We also acknowledge that the readout itself introduces additional noise into the system. We are currently finishing work that addresses exactly this subject, which is outside the scope of the present paper.

      What may have led to misinterpretation of the scope of our work is that we called x_theta the readout position. x_theta defines the location where cells sense (i.e., read out) a certain concentration threshold, and is not meant to be interpreted as the location of a certain readout (a downstream transcription factor) of the morphogen. We have made this distinction clearer in the revised manuscript.

      p.1, right column: Why exactly are the parameters p, d, D assumed to follow a log-normal distribution? Such a distribution has been verified for cell size, but the rationale behind choosing it also for the named parameters should be explained, in particular for D. Why would D depend on local properties of the cell? Which diffusion / transport mechanism precisely is assumed here?

      The motivations for the used log-normal distributions for the kinetic parameters are the following:

      The morphogen production rates, degradation rates and diffusivities must be strictly positive. This rules out a normal distribution. The probability density of near-zero kinetic parameters must vanish quickly, as otherwise no successful patterning can occur. For example, a tiny diffusion coefficient would not enable morphogen transport over biologically useful distances within useful timeframes. This rules out a normal distribution truncated at zero, because very low diffusivities would occur rather frequently for such a distribution. Given the absence of reports on distributions for p, d, D in the literature, we chose a plausible probability distribution that fulfills the above two criteria and has just two parameters, so that it is fully defined by a mean value and a coefficient of variation. The lognormal distribution is one such distribution. Our results are largely independent of the exact choice of probability distribution assumed for the kinetic parameters, under the constraints mentioned above. To demonstrate this, we have repeated a set of simulations with a gamma distribution with the same mean and variance as the lognormal distribution. Below are some simulation results for a gamma distribution with shape parameter a = 1/CV^2 and scale parameter b = mu*CV^2, with CV = 0.3 as used for the results shown in the paper. As can be appreciated from these plots, the results do not change substantially, and our conclusions still hold. As we believe this information is potentially relevant for the readership of our paper, we have added this result and discussion to the supplement and to the conclusion in the main text.
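      For readers who want to set up this kind of comparison, the following minimal Python sketch (not the code used in the paper) draws strictly positive kinetic parameters with a prescribed mean and CV from either distribution. It assumes the standard moment-matching formulas: for the lognormal, the underlying normal has variance ln(1 + CV^2) and mean ln(mu) - ln(1 + CV^2)/2; for the gamma distribution, shape 1/CV^2 and scale mu*CV^2. The function names are hypothetical. Note that Python's random.gammavariate takes the scale, not the rate, as its second argument.

```python
import math
import random
import statistics


def lognormal_params(mean, cv):
    """(mu, sigma) of the underlying normal so that the lognormal variable
    has the requested mean and coefficient of variation."""
    sigma2 = math.log(1.0 + cv ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def gamma_params(mean, cv):
    """(shape, scale) so that the gamma variable has the requested mean and CV."""
    return 1.0 / cv ** 2, mean * cv ** 2


def sample_kinetic(mean, cv, n, dist, rng):
    """n independent, strictly positive draws from the chosen distribution."""
    if dist == "lognormal":
        mu, sigma = lognormal_params(mean, cv)
        return [rng.lognormvariate(mu, sigma) for _ in range(n)]
    if dist == "gamma":
        shape, scale = gamma_params(mean, cv)
        # random.gammavariate(alpha, beta): beta is the scale parameter
        return [rng.gammavariate(shape, scale) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist}")


rng = random.Random(1)
for dist in ("lognormal", "gamma"):
    xs = sample_kinetic(mean=1.0, cv=0.3, n=100_000, dist=dist, rng=rng)
    m = statistics.fmean(xs)
    print(f"{dist:9s} mean={m:.3f} CV={statistics.pstdev(xs) / m:.3f} min={min(xs):.3f}")
```

      Both samples should reproduce the requested mean and CV, and neither contains zero or negative values, which is the property the argument above relies on.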

      We assume extracellular, Fickian morphogen diffusion with effective diffusivity D along the epithelial cells, as specified by Eq. 2. We now state this more explicitly just below Eq. 2 in the revised manuscript. Cell-to-cell variability in the effective diffusivity may arise from effects that alter the effective diffusion path and dynamics along the surface of cells, which we do not model explicitly, but lump into the effective values of D. Such effects may include different diffusion paths (different tortuosities) or transient binding, among others.

      Moreover, is there any relationship between A_i and p_i, d_i and D_i, or are these parameters varied completely independently? If yes, is there a justification for that?

      The parameters are all varied independently, as written in the paragraph below Eq. 2 on the first page (“drawn for each cell independently”). To our knowledge there is no reported evidence for correlations between cell areas, morphogen production rates, degradation rates, or transport rates across epithelia, that we could base our model on. The choice of independent cell parameters therefore represents a plausible model of least assumptions made. Note that we explore the effect of potential spatial correlations in the kinetic parameters between neighboring cells in the section “The effect of spatial correlation”, finding that such correlations, if at all present, are unlikely to significantly alter our results.
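      To make the model setup concrete, here is a minimal sketch of a steady-state solver for a one-dimensional tissue in which every cell independently draws its width, production rate p, degradation rate d and diffusivity D. It assumes that Eq. 2 has the common reaction-diffusion form (D c')' - d c + p = 0, with production only in the source domain and zero-flux boundaries at both ends. It uses a simple finite-volume discretization (two half-volumes in series between neighbours) and a tridiagonal (Thomas) solve. These numerical choices, the function names and the parameter values are illustrative assumptions, not a description of the authors' implementation [Vetter & Iber, 2022]. Cell widths are drawn directly from a lognormal distribution as a 1D simplification of sampling cell areas.

```python
import math
import random


def lognormal(rng, mean, cv):
    """Lognormal draw with the given mean and coefficient of variation."""
    s2 = math.log(1.0 + cv * cv)
    return rng.lognormvariate(math.log(mean) - s2 / 2.0, math.sqrt(s2))


def make_tissue(rng, n_source, n_pattern, diam, cv_diam, p, d, D, cv_k):
    """Draw width, p, d and D for every cell independently (no correlations).
    Production is switched off outside the source domain."""
    return [
        {
            "width": lognormal(rng, diam, cv_diam),
            "p": lognormal(rng, p, cv_k) if i < n_source else 0.0,
            "d": lognormal(rng, d, cv_k),
            "D": lognormal(rng, D, cv_k),
        }
        for i in range(n_source + n_pattern)
    ]


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        denom = diag[i] - (lower[i] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = upper[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (lower[i] * dp[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def steady_state(cells, n_source, m=10):
    """Steady state of (D c')' - d c + p = 0 with zero-flux ends.
    Each cell is split into m volumes; x = 0 marks the end of the source."""
    h, p, d, D = [], [], [], []
    for cell in cells:
        for _ in range(m):
            h.append(cell["width"] / m)
            p.append(cell["p"])
            d.append(cell["d"])
            D.append(cell["D"])
    n = len(h)
    # Conductance between neighbouring volumes: two half-volumes in series.
    g = [1.0 / (h[j] / (2 * D[j]) + h[j + 1] / (2 * D[j + 1])) for j in range(n - 1)]
    lower = [0.0] + [-gj for gj in g]
    upper = [-gj for gj in g] + [0.0]
    diag = [(g[j - 1] if j > 0 else 0.0) + (g[j] if j < n - 1 else 0.0) + d[j] * h[j]
            for j in range(n)]
    rhs = [p[j] * h[j] for j in range(n)]
    c = solve_tridiagonal(lower, diag, upper, rhs)
    x0 = sum(cell["width"] for cell in cells[:n_source])
    x, pos = [], 0.0
    for hj in h:
        x.append(pos + hj / 2.0 - x0)  # volume centre relative to the source edge
        pos += hj
    return x, c


rng = random.Random(42)
cells = make_tissue(rng, n_source=5, n_pattern=50, diam=5.0, cv_diam=0.5,
                    p=1.0, d=1.0 / 400, D=1.0, cv_k=0.3)
x, c = steady_state(cells, n_source=5)
print(f"{len(c)} volumes, concentration at the source edge ~ "
      f"{next(ci for xi, ci in zip(x, c) if xi > 0):.3f}")
```

      With cv_diam = cv_k = 0 the profile in the patterning domain decays with the length sqrt(D/d), which is a convenient sanity check before switching on cell-to-cell variability.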

      p. 2, right column, section on "Spatial averaging": First of all, how is "averaging" exactly defined here? Do the authors assume that the cells can perfectly integrate over their surface in the dimensions perpendicular to their height? If yes, then this should be briefly mentioned here. Secondly, the shift \Delta x calculated by the authors ultimately seems to trace back to the fact that the cells average over an exponential gradient, whose derivative also is exponential, such that levels further to the anterior from the cell center are higher (on average) than levels to the posterior of it. I suppose, therefore, that a similar calculation for linear gradients would not lead to any shift. If these things are true they deserve being mentioned in this part of the manuscript because they provide an intuitive explanation for the shift. Thirdly, in Fig. 2A the cell sizes seem exaggerated with respect to the gradient length. This seems fine for illustrative purposes, but if it is the case it should be mentioned. Also, I believe that this figure panel would benefit from showing another readout case where the average concentration e.g. in cell 1 maps to its corresponding readout position, in order to show that this process repeats in every cell. Moreover, it could be indicated that in the shown case C_\theta matches the average concentration in cell 2 at the indicated position.

      Spatial averaging is defined as perfect integration along the spatial coordinate over a length of 2r (which can be equal to, smaller than, or larger than one cell diameter), as detailed in the supplementary material. In simulations, we use the trapezoid method for numerical integration to obtain the average concentration a cell experiences along its surface area perpendicular to its height.

      The reviewer is correct that the shift is a consequence of averaging over an exponential gradient. The average of an exponential gradient over a cell is higher than the concentration at the cell centroid, hence the small shift. This is mentioned, e.g., in the caption of Fig. S1, but also in the main text ("spatial averaging of an exponential gradient results in a higher average concentration than centroid readout"). We have now added this information to the caption of Fig. 2 as well. As the referee correctly points out, linear gradients would not result in such a shift. A brief comment on this has been added to the revised manuscript.
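      The intuition for the shift can be checked numerically. The sketch below averages a gradient over a window of half-width r with the trapezoid rule and compares the result with the value at the window centre. The decay length, window size and the linear comparison profile are illustrative values, not taken from the paper. For an exponential profile, the ratio of average to centroid value equals sinh(u)/u with u = r/lambda, which always exceeds 1; for a linear profile, the two coincide.

```python
import math


def window_average(conc, center, r, n=200):
    """Trapezoid-rule average of conc(x) over [center - r, center + r]."""
    a, b = center - r, center + r
    step = (b - a) / n
    total = 0.5 * (conc(a) + conc(b)) + sum(conc(a + k * step) for k in range(1, n))
    return total * step / (b - a)


lam = 20.0   # illustrative decay length
r = 2.5      # illustrative averaging half-width (half of a 5-unit cell)
xc = 40.0    # cell centroid


def exponential(x):
    return math.exp(-x / lam)


def linear(x):
    return 1.0 - x / 200.0


for name, profile in (("exponential", exponential), ("linear", linear)):
    ratio = window_average(profile, xc, r) / profile(xc)
    print(f"{name:11s} average / centroid value = {ratio:.6f}")
print(f"sinh(u)/u for u = r/lambda: {math.sinh(r / lam) / (r / lam):.6f}")
```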

      We now mention that the cell size is exaggerated in comparison to the gradient decay length for illustration purposes in the schematic of Fig. 2a, as requested.

      Unfortunately, we had difficulty following the reviewer's final point. We show a specific readout threshold concentration, C_theta, in Fig. 2a. A cell determines its fate based on whether its sensed (possibly averaged) concentration is greater or smaller than C_theta. In the illustration, cells 1 and 2 sense a concentration greater than C_theta, and all further cells sense a concentration smaller than C_theta. Cell fate boundaries necessarily develop at cell boundaries (here, between cells 2 and 3; red). Additionally, the readout position for a continuous domain, where morphogen sensing can occur at an arbitrary point along the patterning axis, is shown (blue). This position can differ from the one restricted to cell borders. Thus, the two scenarios result in different readout positions in the patterning domain, which is what the schematic illustrates. Given that the other referees appear to have found the illustration clear, we are unsure how it could be improved.
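      The difference between the two readout positions in Fig. 2a can be illustrated with a small sketch. It assumes a noise-free exponential gradient C0*exp(-x/lambda), centroid sensing and regularly spaced cell borders; all names and numbers are hypothetical. In the continuous case, the readout position is lambda*ln(C0/C_theta), whereas in the cell-based case the fate boundary has to coincide with the border of the first cell whose sensed concentration falls below C_theta.

```python
import math


def continuous_readout(c0, lam, c_theta):
    """Position where C0*exp(-x/lam) drops to c_theta (sensing at any point)."""
    return lam * math.log(c0 / c_theta)


def cell_border_readout(borders, c0, lam, c_theta):
    """Fate boundary when each cell senses the concentration at its centroid:
    the left border of the first cell whose sensed value is below c_theta.
    borders must be ascending cell-border positions."""
    for left, right in zip(borders, borders[1:]):
        if c0 * math.exp(-0.5 * (left + right) / lam) < c_theta:
            return left
    return borders[-1]  # no cell falls below the threshold


borders = [5.0 * i for i in range(21)]   # 20 cells, 5 units wide (illustrative)
c_theta = math.exp(-2.1)                 # threshold for C0 = 1 (illustrative)
print(continuous_readout(1.0, 20.0, c_theta))
print(cell_border_readout(borders, 1.0, 20.0, c_theta))
```

      With these values the continuous readout lies at x = 42, while the cell-based boundary falls on the cell border at x = 40, because the cell spanning 40 to 45 already senses a concentration below the threshold at its centroid.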

      As for the significance of the magnitude of the shift for typical parameters as calculated by the authors: I believe that it could be said more explicitly and clearly that under biological conditions the calculated shift overall seems insignificant, as it amounts to a small fraction of the cell diameter.

      We have made this more explicit in the text.
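      To give a sense of scale, the shift can be derived for an idealized case. This is a back-of-the-envelope sketch, not the manuscript's derivation. If a cell averages C0*exp(-x/lambda) uniformly over [x - r, x + r], the averaged concentration equals the centroid value multiplied by sinh(u)/u, with u = r/lambda. The threshold position therefore moves away from the source by Delta_x = lambda*ln(sinh(u)/u), which is approximately r^2/(6*lambda). With r equal to half the cell diameter delta, this gives Delta_x/delta approximately equal to delta/(24*lambda), a small fraction of a cell diameter whenever delta is small compared with lambda. The values of delta/lambda below are illustrative.

```python
import math


def averaging_shift(r, lam):
    """Shift of the threshold position when C0*exp(-x/lam) is averaged
    uniformly over [x - r, x + r] instead of being read at x."""
    u = r / lam
    return lam * math.log(math.sinh(u) / u)


lam = 1.0  # work in units of the decay length
for delta in (0.05, 0.1, 0.24, 0.5):          # cell diameter / decay length
    shift = averaging_shift(delta / 2.0, lam)  # averaging over one cell diameter
    print(f"delta/lambda = {delta:4.2f}: shift/delta = {shift / delta:.4f} "
          f"(small-cell approximation {delta / 24.0:.4f})")
```

      For example, delta/lambda = 0.24 gives a shift of about 1% of the cell diameter.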

      Finally, and most importantly: The term "spatial averaging" can have a different meaning in developmental biology than the one employed by the authors. While the authors mean by it that individual cells average the gradient concentration over their area, in other works "spatial averaging" typically means that individual cells sense "their" gradient value (by whatever mechanism) and then exchange molecules activated by it, which encode the read-out gradient value downstream, between neighboring cells, in order to average out the gradient values "measured" under noisy conditions. The noise reduction effect of such spatial averaging can be very significant, as evidenced by this (incomplete) list of works which the authors can refer to:

      - Erdmann, Howard, ten Wolde, PRL 103, 2009

      - Sokolowski & Tkacik, PRE 91, 2015

      - Ellison et al., PNAS 113, 2016

      - Mugler, Levchenko, Nemenman, PNAS 113, 2016

      The main point, however, is that this is a different mechanism from the one described by the authors, and this should be clearly mentioned to distinguish the two. I would therefore also advise the authors to make the section title more precise here, by changing "Spatial averaging barely affects ..." to "Spatial averaging across the cell area barely affects ..." for clarity.

      Most theory development has indeed previously been done with the syncytium of the early Drosophila embryo in mind. However, most patterning in development happens in epithelial (or mesenchymal) tissues, where spatial averaging via translated proteins is not as straightforward and natural as in a syncytium. In fact, a bucket-brigade transport of a produced protein from cell to cell would be difficult to arrange (as degradation upon internalization would have to be prevented), would be subject to much molecular noise, and would be rather slow. Our paper is concerned with patterning in epithelia, which we now state more clearly in the manuscript.

      Regarding the section title: Our analysis does not only cover spatial morphogen averaging over the cell area, but it also includes averaging radii below (in the theory) and far above (in the theory and in the new Fig. 4c, previously 3c) half a cell diameter. With cilia of sufficient length r, epithelial cells could potentially average over spatial regions extending further than their own cell area, without need for inter-cellular molecular exchange between neighboring cells. This is the kind of spatial averaging we explored here. Restricting the section title to the cell area only would therefore be misleading. However, we agree with the referee that the distinction between different meanings of “spatial averaging” is important, and we now emphasize our interpretation and the scope of our work more in the revised text.

      p. 3, Figure 3: It would be good to highlight the fact that the colours in panel A correspond to the bullet colors in the other panels also in the main text.

      We have now also added this to the main text.

      As to the comparison of different readout strategies: How exactly were the different readout mechanisms compared on the mathematical side? More precisely: How was the readout by the whole area matched (in terms of fluxes) to the readout at a single point, be it in the center of the cell or a randomly chosen point? How was it ensured that the comparison is done on an equal footing?

      Our model assumes that a cell senses a single concentration even if it is exposed to a gradient of concentrations. Assuming the French flag model is correct, a cell must make a binary decision based on a sensed concentration in order to determine its fate. The different readout strategies are hypothetical and simplified mechanisms for how a cell could, in principle, detect a local morphogen signal. It is unclear to us what the referee is referring to when mentioning "matching in terms of fluxes", as there are no fluxes involved in the modeled readout strategies. We make no assumption on the underlying biochemical mechanism that would allow cells to implement one of the strategies. The main goal of this analysis was to determine whether different sensing strategies have a significant effect on the precision of morphogen gradients experienced by cells. To ensure that we compare the different mechanisms on an equal footing, we simulated gradients and then calculated, from each gradient, the readout concentration in each cell for each of the methods.
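      The comparison protocol described above (one simulated gradient, several readout strategies evaluated on it) can be sketched as follows. The gradient function, the irregular cell borders and the number of trapezoid sub-intervals are hypothetical choices. Three strategies are shown: centroid readout, the average over the cell (trapezoid rule), and readout at a uniformly random point within each cell. The cilium-based strategy is omitted for brevity.

```python
import math
import random


def readouts(conc, borders, rng, n_sub=100):
    """Per-cell sensed concentration for three hypothetical strategies,
    all evaluated on the same gradient conc(x). borders are ascending."""
    out = {"centroid": [], "average": [], "random": []}
    for a, b in zip(borders, borders[1:]):
        out["centroid"].append(conc(0.5 * (a + b)))
        step = (b - a) / n_sub
        trap = 0.5 * (conc(a) + conc(b)) + sum(conc(a + k * step) for k in range(1, n_sub))
        out["average"].append(trap * step / (b - a))
        out["random"].append(conc(rng.uniform(a, b)))
    return out


def boundary_cell(sensed, c_theta):
    """Index of the first cell whose sensed concentration is below c_theta."""
    return next((i for i, v in enumerate(sensed) if v < c_theta), len(sensed))


rng = random.Random(0)
borders = [0.0]
for _ in range(20):                      # 20 cells with irregular widths
    borders.append(borders[-1] + rng.uniform(3.0, 7.0))


def gradient(x):
    return math.exp(-x / 20.0)           # one (noise-free) example gradient


sensed = readouts(gradient, borders, rng)
for method, values in sensed.items():
    print(method, "boundary at cell", boundary_cell(values, math.exp(-2.0)))
```

      Repeating this for every simulated gradient in an ensemble yields, for each strategy, a distribution of sensed concentrations and boundary positions computed from identical inputs.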

      p. 3, right column: "... similar gradient variabilities, and thus readout precision": Linking to comment 1 above, this is strictly speaking only the case when the only source of fluctuations in the readout is the gradient fluctuations. I would therefore leave this statement out.

      To avoid confusion, we have removed parts of the sentence. Thank you for pointing this out.

      p. 3, section on positional error (right column): In this part I had most troubles following the thoughts of the authors.

      First of all, the measure that the authors use for the positional error is sigma_x / mu_lambda, i.e. the standard deviation of the readout position relative to the gradient length. The question is whether this is the correct measure. It should be specified what the motivation for normalizing by mu_lambda is. In the end, one could argue, what the cells really do care about would be that the developmental process can assign cell fates with single cell precision, for which the other measure shown in Eq. (6) is the representative one. Now in contrast to the former measure, the latter actually increases with decreasing cell diameter.

      We thank the referee for raising this point, and acknowledge that we had not presented this aspect well enough. We have rewritten the entire section and the discussion about biological implications. Instead of normalizing with a constant mean gradient length in the formulas and figures, which left room for misinterpretation, we now vary all relevant length scales in the patterning system to determine the impact of each of them independently on the positional error. We now show that the positional error is (to leading order) proportional to the mean gradient length, the square root of the cell diameter and the square root of the location in the patterned tissue, and inversely proportional to the length of the source domain. We support these new aspects with new simulation data (Fig. 2E-2H, Fig. 3D-G, Fig. S5, Fig. S6). As the positional error is now reported in absolute terms, rather than relative to a particular length scale, the question of the relevant scale is addressed. We now show that the absolute positional error increases with increasing absolute cell diameter.

      We believe that this extension provides additional important insight into what affects the patterning precision. We thank the referee very much for motivating us to expand our analysis.

      Secondly, even when the former measure (sigma_x / mu_lambda) is employed, Fig. 3(D) shows that while it decreases with decreasing cell diameters, in the regime of small diameters the std. dev. of the readout position becomes larger than the average cell diameter, which actually would mean that cell fates cannot be assigned with single-cell precision. While the authors later report both quantities for specific gradients, it should be clarified beforehand which of the measures is the relevant one.

      This has now been addressed by considering absolute length scales as discussed at length in our answer to the previous point.

      Moreover, in the following derivations, mu_x is not properly introduced. What exactly is the definition of that quantity? Is it the mean readout position? If yes, it is not clear why exactly it would be interesting and relevant to the cell. This should be properly explained in a way that does not require the reader to look up further details in another publication.

      The referee is correct in that mu_x is the mean readout position. We apologize for not being clear enough on this, and have now defined this in the introduction together with the definition of sigma_x.
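      As a concrete illustration of these definitions, the sketch below estimates mu_x and sigma_x from an ensemble of gradients. It assumes that each gradient is summarized by an exponential C0*exp(-x/lambda), so that its readout position is x_theta = lambda*ln(C0/C_theta). The spreads in lambda and C0 are hypothetical values chosen for illustration, not the paper's.

```python
import math
import random
import statistics


def readout_position(c0, lam, c_theta):
    """Position where C0*exp(-x/lam) equals c_theta."""
    return lam * math.log(c0 / c_theta)


def positional_error(positions):
    """Mean readout position mu_x and its standard deviation sigma_x."""
    return statistics.fmean(positions), statistics.stdev(positions)


rng = random.Random(7)
mu_lam = 20.0                 # hypothetical mean decay length
c_theta = math.exp(-3.0)      # threshold sensed about 3 decay lengths from the source
xs = [
    readout_position(rng.lognormvariate(0.0, 0.2),      # hypothetical amplitude spread
                     rng.gauss(mu_lam, 0.05 * mu_lam),  # hypothetical decay-length spread
                     c_theta)
    for _ in range(1000)
]
mu_x, sigma_x = positional_error(xs)
print(f"mu_x = {mu_x:.1f}, sigma_x = {sigma_x:.2f}")
```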

      At the end of this section the authors come back to the sigma_x / mu_delta measure again and indeed point out that it increases with decreasing mu_delta, which causes a bit of confusion because the initial part of the section only talks about the increase of the pos. error with mu_delta. Overall I find that this section should be rewritten more clearly. Right now it leaves the reader with the "take home message" that small cells are good because they lead to smaller pos. error, but when the--in my opinion--relevant measure (sigma_x/mu_delta) is employed the opposite is the case. This is confusing and unclear about the authors' intentions in that part.

      See the answer above. The "take-home message" is now formulated in absolute terms regarding the effect of cell diameter, rather than relative to a certain choice of reference scale. Our new analysis revealed a new ratio that determines the positional error, mu_lambda/L_s. We now discuss this relative measure also regarding its biological significance. Once again, we thank the referee for pointing us to this source of confusion, the elimination of which allowed us to improve our analysis.

      Finally, the authors could also supplement the numbers they give for the FGF8 and SHH gradients with the known numbers for the Bcd gradient in Drosophila, which has been studied extensively and constitutes a paradigm of developmental biology. Here mu_delta ~= 6.5 um, while mu_lambda ~= 100 um, such that mu_delta/mu_lambda ~= 0.065.

      While we appreciate that most theoretical work has been done for syncytia, this paper is concerned with patterning of epithelia, which have different patterning constraints, as also explained in a reply further above. We now make the scope of our work clearer in the revised manuscript. But as the referee points out, the diameter of the nucleus relative to the gradient length is such that gradients can be expected to be sufficiently precise.

      p. 4, section on the effect of spatial correlation: Here the authors chose to order the kinetic parameters in ascending or descending order. Is there any biological motivation for that particular choice? Other types of correlations seem possible, e.g. imposing the rule that successive parameter values are sampled starting from the previous value, p_{i+1} = p_i +- delta_{i+1}, where delta_{i+1} are random numbers with a defined variance.

      In the simulations we go from zero correlation (every cell has independent kinetic parameters) to maximal correlation (every cell has the same parameters, resulting effectively in a patterning domain that consists of a single effective "cell"), see Fig. S3. Biologically plausible correlations in between these extremes should retain the same kinetic variability levels (same CVs), which we took from the measured range reported in the literature. We accomplish this by ordering the parameters after independently sampling them for each cell from probability distributions with the desired CV. The motivation for this approach is that it produces a maximal correlation that still reflects the measured biological cell-to-cell variability, which allows us to demonstrate in Fig. S3 that even such a maximal degree of spatial correlation does not qualitatively alter our results. The kind of correlation that the referee suggests introduces a spatial correlation length that lies in between the extremes that we simulated. Since our conclusions still apply even for maximal correlation using the ordering approach, we have no reason to expect that intermediate levels of correlation would behave any differently.
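      The ordering approach can be illustrated with a short sketch. Sampling independently and then sorting leaves the set of values, and hence the mean and CV, unchanged, while introducing a strong correlation between neighbouring cells. The lag-1 autocorrelation used here to quantify this, the function names and the parameter values are illustrative choices, not taken from the paper.

```python
import math
import random
import statistics


def lognormal_sample(rng, mean, cv, n):
    """n independent lognormal draws with the given mean and CV."""
    s2 = math.log(1.0 + cv * cv)
    return [rng.lognormvariate(math.log(mean) - s2 / 2.0, math.sqrt(s2)) for _ in range(n)]


def lag1_autocorrelation(values):
    """Correlation between each cell's value and its right neighbour's value."""
    m = statistics.fmean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((a - m) ** 2 for a in values)
    return num / den


rng = random.Random(0)
independent = lognormal_sample(rng, mean=1.0, cv=0.3, n=50)  # zero correlation
ordered = sorted(independent)  # same values, maximal correlation (reverse=True for descending)

for name, values in (("independent", independent), ("ordered", ordered)):
    cv = statistics.pstdev(values) / statistics.fmean(values)
    print(f"{name:11s} CV = {cv:.3f}, lag-1 autocorrelation = {lag1_autocorrelation(values):.2f}")
```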

      The idea brought forward by the referee effectively introduces a correlation length scale. We discuss this case in the paper, noting that the positional error will scale as sigma_x ~ sqrt(N), where N is the number of cells sharing the same kinetic parameters. A correlation length scale will be proportional to N and will therefore simply scale the positional error uniformly, but is unlikely to reveal any new insight beyond that.

      Moreover, using the idea of the referee as an additional way to introduce correlation is difficult to realise in practice, as we need to recover the mean and variance of the kinetic parameters, while ensuring strict positivity for each of them. A simple random walk, as proposed, would not lend itself easily to achieve this without introducing a bias in the distribution, because negative values need to be prevented. As explained in a reply further above, an important feature of the kinetic parameters is that they are not too small to prevent the formation of a meaningful gradient, which is not straightforward to ensure with the proposed method.

      We acknowledge that there are different types of correlations conceivable, but we expect these correlations to lie between the two extremes that we present in the paper, which show no qualitative difference in the results.

      p. 5, Discussion: "..., but with nuclei much wider than the average cell diameter". To be honest, I could not quite picture what is meant by this sentence. Intuitively, it seems that nuclei cannot be larger than the cells, but I suppose that some kind of special anisotropy is considered here? In any case, this should be made precise.

      The main tissues that are patterned by gradients are epithelia. Our paper focuses on such tissues. It is a well-known feature of pseudostratified epithelia that nuclei are on average wider than the cell width averaged over the apical-basal axis. Nature solves this problem by stacking nuclei above each other along the apical-basal axis, resulting in a single-layered tissue that appears to be a multi-layered stratified tissue when only looking at nuclei. For a schematic illustration of this, see Fig. 1 in [DOI: 10.1016/j.gde.2022.101916]. An image search for "pseudostratified epithelia" on Google yields a plethora of microscopy images. Right at the end of the passage quoted by the referee, we also cite our own study [Gomez et al, 2021], which quantifies this in Fig. 5.

      Moreover, I find that the conclusion that morphogen gradients "provide precise positional information even far away from the morphogen source" goes too far based on the authors' work, precisely because input fluctuations due to limited morphogen copy numbers, which can become detrimentally low far away from the source, are not considered, nor are the timescales needed to both establish and sample such low concentrations far away from the source. While, according to the authors' work, the fluctuations in the morphogen signal may be favorably small, these other factors are expected to impose a strong limit on positional information. This conclusion therefore seems unjustified and should be toned down, or better, removed and replaced by a more accurate one that focuses only on the gradient shape fluctuations, not on the conveyed positional information.

      There is no evidence so far that morphogen gradient concentrations become too low to be sensed by epithelial cells, to the best of our knowledge. What we show is that the gradient variability between embryos remains low enough that precise patterning remains possible. Whether the morphogen concentration remains high enough to be read out reliably by cells is a subject that requires future research. Genetic evidence from the mouse neural tube demonstrates that the SHH gradient is still sensed at a distance beyond 15 lambda (SHH signalling represses PAX7 expression at the dorsal end of the neural tube) [Dessaud et al., Nature, 2007], where an exponential concentration has dropped more than 3-million-fold.
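      For reference, assuming a purely exponential profile C(x) = C0*exp(-x/lambda) over the whole distance, the fold change over 15 decay lengths quoted above is:

      $$\frac{C(0)}{C(15\lambda)} = \frac{C_0}{C_0\, e^{-15}} = e^{15} \approx 3.27 \times 10^{6}$$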

      As the referee correctly quotes, we state that "morphogen gradients remain highly accurate over very long distances, providing precise positional information even far away from the morphogen source". This statement is restricted to the positional information that the gradients convey. It does not address potentially precision-enhancing or precision-deteriorating readout effects, nor does it concern the absolute number of morphogen molecules.

      Positional information is conveyed in several steps. The gradients themselves convey a first level of positional information by varying along the patterning direction, as quantified by the positional error. This is what our conclusion is about. This positional information from the gradients can then be translated into positional information further downstream by specific readout mechanisms, inter-cellular processes, temporal averaging, etc. We make no statement about these further levels of positional information.

      We therefore disagree that our conclusion is unjustified. In fact, we have phrased it exactly having the limited scope of our study in mind, making sure that we restrict the conclusion to the gradients themselves.

      MINOR COMMENTS

      - p. 1: "and find that positional accuracy is the higher, the narrower the cells".

      (This sentence, however, should be revised anyway in view of major comment 5 above.)

      We have added “the”.

      - p. 4: "... with an even slightly smaller prefactor."

      We have removed “even”.

      Reviewer #3 (Significance (Required)):

      I believe that this work is significant to the community working on the theoretical foundations of morphogen gradient precision in developmental systems. The main interesting findings are that small cell diameters lead to smaller positional error (although the relevant measure should be clarified according to my comment no. 5), and that the gradient shape fluctuations are surprisingly robust with respect to the readout mechanism.

      Its limitations lie in the fact that the impact of small copy numbers on the readout, and the associated timescales, are neglected, such that the authors' findings on gradient robustness cannot be transferred to readout robustness / positional information by simple conversion formulas. Comment 5 goes hand in hand with this, as a different conclusion may emerge depending on how the relevant positional error measure is defined. This should be fixed by the authors as indicated in the main part of the report.

      Thank you for your assessment.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      • In developing systems, morphogen gradients pattern tissues such that cells along the patterning length sense varying levels of the morphogen. This process has a low positional error even in the presence of biological noise in numerous tissues, including the early Drosophila melanogaster embryo. The authors of this manuscript developed a mathematical model to test the effect of noise and mean cell diameter on gradient variability and on the positional error that gradients convey.

      • They solved the 1D reaction-diffusion equation for N cells with diameters and kinetic parameters sampled from distributions with physiologically relevant means and coefficients of variation (CV). They fit the resulting morphogen gradients to a hyperbolic cosine profile, determined the decay length (DL) and amplitude (A) for a thousand independent runs, and reported the CV in DL and A.

      • The authors found that the CV in DL and A increases with increasing mean cell diameter. They propose a mathematical relationship in which the CV in DL scales as the inverse square root of N, whereas the CV in DL and A depends only weakly on the CV of cell surface area (CVa) if CVa < 1.

      • They further looked at the shift in readout boundaries and compared four different readout metrics: spatial averaging, centroid readout, random readout and readout along the length of the cilium. Their results show that spatial averaging and centroid readout have high readout precision.

      • They finally showed that the positional error (PE) increases along the patterning length of the tissue and increases with increasing mean cell diameter.

      • The authors also supported their theoretical and simulated results by examining mean cell areas reported in the literature for patterning tissues, which likewise point to higher readout precision with smaller cell diameters.
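      The fitting step summarized in the second bullet can be sketched as follows. This is a minimal, stdlib-only illustration and not the authors' code. It assumes the fitted profile has the form A·cosh((LP − x)/λ)/cosh(LP/λ), a common steady-state solution with a zero-flux boundary at x = LP. The exact form used in the authors' repository and reference 19 may differ. The decay length is found by a simple grid search, and for each candidate the amplitude is solved in closed form. The helper names (`cosh_profile`, `fit_cosh`, `coefficient_of_variation`) are hypothetical. The example uses a noiseless synthetic profile only to show that the fit recovers the parameters. In the study itself, each run's profile comes from simulated per-cell kinetics, and the CV is taken over the fitted DL (or A) values of all runs.

```python
import math
import statistics


def cosh_profile(x, amplitude, decay_length, length):
    """Assumed gradient shape with a zero-flux boundary at x = length."""
    return amplitude * math.cosh((length - x) / decay_length) / math.cosh(length / decay_length)


def fit_cosh(xs, ys, length, candidate_decay_lengths):
    """Least-squares fit of (amplitude, decay length).

    For a fixed decay length the model is linear in the amplitude, so the
    optimal amplitude has a closed form; the decay length itself is picked
    by a grid search over the candidates.
    """
    best = None
    for lam in candidate_decay_lengths:
        basis = [cosh_profile(x, 1.0, lam, length) for x in xs]
        amplitude = sum(y * b for y, b in zip(ys, basis)) / sum(b * b for b in basis)
        sse = sum((y - amplitude * b) ** 2 for y, b in zip(ys, basis))
        if best is None or sse < best[0]:
            best = (sse, amplitude, lam)
    return best[1], best[2]


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, e.g. over fitted DL values of many runs."""
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    length = 250.0                               # patterning length LP (µm)
    xs = [2.5 + 5.0 * i for i in range(50)]      # centroids of 5 µm cells
    grid = [0.5 * k for k in range(20, 81)]      # candidate decay lengths, 10-40 µm
    ys = [cosh_profile(x, 1.0, 20.0, length) for x in xs]
    amp, lam = fit_cosh(xs, ys, length, grid)
    print(f"A = {amp:.3f}, DL = {lam:.1f} µm")   # prints: A = 1.000, DL = 20.0 µm
```

A real fit would typically use a continuous optimizer rather than a grid. The grid keeps the sketch dependency-free.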

      Major comments:

      • Most of the key conclusions are convincing. However, there are four major points that should be addressed. First, the authors conclude the section titled, "The positional error scales with the square root of the average cell diameter," by saying that morphogen systems with small cells can have high precision in absolute length scales, but not on the scale of one cell diameter. They state this would result in salt and pepper patterns in the transition zones. The authors should either support this with biological examples or explain why this is not observed experimentally.

      • Second, perhaps the main conclusion of the paper is that morphogen gradients pattern best when the average cell diameter is small. The authors support this by reviewing the apical cell area of epithelial systems that are known to be patterned by morphogens and those that are not (presumably taking apical cell area as a proxy for cell diameter). However, the key parameter is not absolute cell diameter, but the cell diameter relative to the morphogen length scale. The authors should report the ratio of these two quantities in their literature analysis.

      • Third, as part of their literature analysis, the authors state that in the Drosophila syncytium, there are morphogen gradients, but they imply that because these gradients operate prior to cellularization, one cannot use the large distances between nuclei as counter evidence to their main conclusion. Rather than simply dismissing the case of the Drosophila syncytium, the authors should explain why this case does not apply, using reasoning based on their model assumptions.

      • Fourth, related to the above: the authors then state that there are no morphogen gradients known during cellularization. Unless I am misunderstanding their point, this is untrue. The Dpp gradient acts during the process of cellularization and specifies at least three distinct spatial domains of gene expression. Furthermore, not long after gastrulation, EGFR signaling patterns the ventral ectoderm into at least two distinct domains of gene expression. What are the cell areas in that case?

      Minor comments:

      • Figs 1c,d:

      With the system set up as it is (DL = 20 micron, Patterning Length (LP) = 250 micron, Nominal cell diameter (D) = 5 micron), DL/L ~ 0.08, which makes the exponential profile fall to a small value by around 100 micron. This means that in all these simulations the effective LP was only around 100 micron; cells beyond that saw nearly zero concentration. Because of this, when diameters were varied from 0.2 to 40 micron, there could be as few as 2.5 cells in the "patterning region", which could be responsible for the higher variability in DL and A.

      Would any of the results change if DL/L was higher, around 0.2?
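      The reviewer's numbers can be checked quickly. This assumes a pure exponential c(x) ≈ c(0)·exp(−x/DL) as a stand-in for the fitted cosh profile, which is a close approximation well away from the far boundary. Here c(x) denotes the morphogen concentration; that notation is ours.

```latex
\begin{aligned}
\mathrm{DL}/L = 20/250 = 0.08 \quad&\Rightarrow\quad c(100\,\mu\mathrm{m})/c(0) \approx e^{-100/20} = e^{-5} \approx 6.7\times10^{-3},\\
\mathrm{DL}/L = 0.2 \;\;(\mathrm{DL} = 50\,\mu\mathrm{m}) \quad&\Rightarrow\quad c(100\,\mu\mathrm{m})/c(0) \approx e^{-100/50} = e^{-2} \approx 0.14,\\
\text{cells of diameter } 40\,\mu\mathrm{m} \text{ within the first } 100\,\mu\mathrm{m}: \quad& 100/40 = 2.5 .
\end{aligned}
```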

      The source region is 25 microns in length, and all cell diameters above 25 micron are capped at 25 micron, which explains the flat lines in the region beyond mu_delta/mu_DL > 1.

      Results:

      Pg 2 (bottom left): In the git repository code, the morphogen gradients are fit to a hyperbolic cosine function (described in reference 19), which is not described in the main text. Having this in the main text would help readers understand why fig 1c shows variation in d only, D only and all k parameters, whereas fig 1d shows variation in each individual parameter p, d and D, and in all k.

      • Figure 3b:

      In figures where markers overlap, perhaps the authors can use a "dot" to identify one set of simulations and an "o" to identify the ones under it. As the plots are currently set up, it is hard for the reader to tell where certain points on the plot are.

      Methods:

      The Methods could be more descriptive and include certain aspects of the simulations, such as the adjusted lambda, which is currently described only in the code and not in the main text or the supplementary material.

      Git code:

      The function handles in the git code do not correspond to figure numbers and should be renamed to make it easier for readers to find the right code.

      Significance

      This manuscript contributes certain key aspects to the patterning domain. The three most important contributions of this work to the current literature are: (1) the scaling relationships developed here, (2) the nicely shown result that PE increases at the tail end of the morphogen profile, and (3) the comparison of various readout strategies.

    1. Author Response

      Reviewer #1 (Public Review):

      This theoretical (computational modelling) study explores a mechanism that may underlie beta (13-30Hz) oscillations in the primate motor cortex. The authors conjecture that traveling beta oscillation bursts emerge following dephasing of intracortical dynamics by extracortical inputs. This is a well written and illustrated manuscript that addresses issues of both fundamental and translational importance.

      We are pleased by the reviewer’s judgement about the importance of the question that we consider and about the presentation of our manuscript.

      Unfortunately, existing work in the field is not adequately considered or related to the present work. The rationale of the model network closely follows the description in Sherman et al (2016). The relation (difference/advance) to this published and available model needs to be made explicit. Does the Sherman model lack emerging physiological features that the new proposed model exhibits?

      We view the work of Sherman et al (2016) and ours as complementary. Sherman et al propose a model of a single E-I module, using the terminology of our manuscript, that is much more detailed than ours since it approximately accounts for the layered structure of the cortex using two layers of multi-compartment spiking neurons, each comprising 100 excitatory neurons and 35 inhibitory neurons. This allows a detailed comparison of the model with local MEG signals. We use a much simpler description and only describe the population behavior of the local E and I neuron populations in each module. However, unlike Sherman's model, this allows us to address the spatial aspect of beta oscillations, which is the main target of our work. Our simple description of a local E-I module allows us to consider several hundred E-I modules with spatially structured connectivity and to analyze the spatio-temporal characteristics of beta activity. We have now described the relation of our work with Sherman et al (2019) in the discussion section (lines 540-547).

      The authors may also note the stability analysis in: Yaqian Chen et al., “Emergence of Beta Oscillations of a Resonance Model for Parkinson’s Disease”, Neural Plasticity, vol. 2020, https://doi.org/10.1155/2020/8824760

      We thank the reviewer for pointing out this paper, which had escaped our notice. It presents the stability analysis of a single E-I module with propagation delay (and instantaneous synapses). At the mathematical level, the analysis adds little compared with the much older article of Geisler et al., J Neurophys (2005) that we cite. However, the model specifically proposes to describe beta oscillations in the motor cortex as arising from the interaction between excitatory and inhibitory neurons, as we do. Therefore, we included this reference as well as a reference to the previous work of Pavlides et al., PLoS Comp Biol (2015), where the model was developed.

      The model-based analysis of the traveling nature of the beta frequency bursts appears to be the most original component of the manuscript. Unfortunately, this is also the least worked out component. The phase velocity analysis is limited by the small number (10 × 10) of modeled (and experimentally recorded) sites, and this needs to be acknowledged. How were border effects treated in the model, and what are they?

      We thank the reviewer for these points, which gave us the opportunity to clarify them and improve our manuscript. As described in Methods: Simulations (line 847 and seq.) and shown in Fig. S2 (Fig. S10 in the original submission), we actually simulated our model on a 24 × 24 grid and did all our measurements on a central 10 × 10 grid, to take into account that the electrode array covers only part of the motor cortex. In addition, to minimize border effects, we added on each side of the 24 × 24 grid two rows of E-I modules kept at their (non-oscillating) fixed points of stationary activity, as depicted in Fig. S2. To address the reviewer's concern, and to check that border effects indeed had a minimal impact on our results, we have performed a new set of simulations on a 24 × 24 grid with periodic boundary conditions. The results are shown in the new supplementary Fig. S9 and are indistinguishable from those reported in the main text and figures. In particular, the proportions of the different wave types and the wave speeds are unaffected by this change of boundary conditions. A paragraph has been added in the revised version (lines 371-378) to discuss this point.
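      For readers less familiar with the two boundary treatments, the sketch below illustrates two things. The first is selecting the central 10 × 10 measurement window of a 24 × 24 grid. The second is the wrap-around (toroidal) distance that periodic boundary conditions imply for distance-dependent connectivity. It is a hypothetical illustration that assumes square-lattice coordinates in units of the module spacing. `central_window` and `torus_distance` are not functions from the authors' code.

```python
import math


def central_window(grid_size, window_size):
    """(row, col) indices of the central window_size x window_size block of a square grid."""
    offset = (grid_size - window_size) // 2
    rows = range(offset, offset + window_size)
    return [(i, j) for i in rows for j in rows]


def torus_distance(a, b, grid_size):
    """Euclidean distance between modules a and b with periodic (wrap-around) boundaries."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # Going "around" the edge can be shorter than crossing the grid.
    dx = min(dx, grid_size - dx)
    dy = min(dy, grid_size - dy)
    return math.hypot(dx, dy)


if __name__ == "__main__":
    window = central_window(24, 10)
    print(window[0], window[-1], len(window))   # prints: (7, 7) (16, 16) 100
    # Opposite edges are neighbours on the torus, so no module sits at a border.
    print(torus_distance((0, 0), (23, 0), 24))  # prints: 1.0
```

With periodic boundaries every module has the same neighbourhood, which is why this setup is a natural check on whether the padded, fixed-point border influences the measured wave statistics.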

      How much of the phase velocities are due to unsynchronized random fluctuations? At least an analysis of shuffled LFPs needs to be performed.

      The phase velocities are indeed due to unsynchronized random fluctuations (coming from the finite number of neurons in each of our modules as well as, and more importantly, from the uncorrelated local external inputs). To check that the spatial structure of connectivity was important, we followed the reviewer's suggestion and performed a new set of simulations as a further test. As proposed by the reviewer, after performing the simulations we shuffled the signals of the different electrodes in space, and we did a parallel analysis in which we shuffled the signals from different electrodes in the recordings. We then reclassified the shuffled simulations/recordings in exactly the same way as the original ones. As shown in the new additional Fig. S16, this completely eliminated the time frames classified as “planar waves”, both in the model and in the experimental recordings. Additionally, it only slightly modified the proportion of “synchronized” or “random” episodes, which is intuitively understandable since shuffling does not change the nature of these states. To further assess the impact of connections between modules, we also suppressed them, namely we set their range l to zero. To avoid modifying the working point of a local module by this manipulation, we focused on the case without propagation delay. Without long-range connections, the local dynamics of each module is little modified. However, as shown in the new Fig. S18a, synchronization between neighboring modules is strongly decreased and the proportions of the different wave types are entirely changed: synchronized states and planar waves disappear and are replaced by random states. These results are described in two new paragraphs (lines 401-414 and lines 431-435).
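      A spatial shuffle of the kind described here can be sketched as a random permutation of which grid position each channel's time series is assigned to. Every signal is kept intact, so its spectrum and local oscillation statistics are unchanged. The spatial relationships between neighbouring sites, which coherent planar waves require, are destroyed. This is a minimal illustration under our own assumptions: signals are stored as a dict from grid position to a list of samples, and `shuffle_sites` is a hypothetical helper, not the authors' analysis code.

```python
import random


def shuffle_sites(signals, seed=None):
    """Return a copy of `signals` with the time series randomly reassigned to grid positions.

    `signals` maps (row, col) -> list of samples. Each time series is kept
    intact; only its location on the grid changes. The input is not modified.
    """
    rng = random.Random(seed)
    positions = list(signals)
    series = [signals[p] for p in positions]
    rng.shuffle(series)
    return dict(zip(positions, series))


if __name__ == "__main__":
    grid = {(i, j): [float(10 * i + j)] * 3 for i in range(3) for j in range(3)}
    shuffled = shuffle_sites(grid, seed=1)
    # Same signals on the same positions, but in a shuffled spatial arrangement.
    print(sorted(shuffled.values()) == sorted(grid.values()))  # prints: True
```

The shuffled data would then be passed through the same wave classification as the original data, so any change in the wave-type proportions can be attributed to the lost spatial structure.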

      Is there a relationship between the localizations of the non-global external input and the starting sites of the traveling waves?

      This is also an interesting question that parallels some asked by the other reviewers and which we did our best to address. As described in the “Essential revisions” point 5) above, we aligned all “planar wave events” in space and time with the help of the spatio-temporal phase maps of the oscillations. We did find that planar waves were preceded by an increase in the global synchronization index σp, both in simulations and in experiments. In simulations this increase also corresponded to a shift of the global inputs away from their mean, as depicted in the new Fig. 4 in the main manuscript. However, no significant average spatio-temporal profile of the local inputs emerged when we used these temporal alignments. This is presumably due to the large variability of local inputs that can give rise to planar waves. We have described these results in the new section “Properties of planar waves and characteristics of their inputs”.

      In summary, this work could benefit from a widening of its scope to eventually inspire new experimental research questions. While the model is constructed well, there is insufficient evidence to conclude that the presented model advances over another published model (e.g. Sherman et al., 2016).

      As described in the “Essential revisions” and the discussion section of the manuscript, our work highlights a number of questions that can (and hopefully will) inspire new experimental research. We also hope that we have clarified above that our model complements Sherman et al.’s model and advances it as far as the spatial aspects of beta oscillations in motor cortex are concerned.

      Reviewer #2 (Public Review):

      Kang et al. model cortical dynamics, specifically the distributions of beta burst durations and the proportions of different kinds of spatial waves, using a firing rate model with local E-I connections and long-range, distance-dependent excitatory connections. The model also predicts that the observed cortical activity may result from non-stationary external input (correlated at short time scales) and a combination of two sources of input, global and local. Overall, the manuscript is very clear, concise and well written. The modeling work is comprehensive and makes interesting and testable predictions about the mechanism of beta bursts and waves in cortical activity. There are just a few minor typos, and a few curiosities that the model might be able to address. Notwithstanding, the study is a valuable contribution towards developing data-driven firing rate models.

      We really appreciate the positive comments of the reviewer and thank her/him for them. We have done our best to correct the typos and to address the questions raised by the reviewer.

      1) The model beautifully reproduces the proportions of the different kinds of waves seen in the data (Fig 3); however, the manuscript does not comment, from a mechanistic point of view, on when a planar or random wave would appear for a given set of parameters (e.g., fixed v ext, tau ext, c). If these spatio-temporal activities are functional in nature, their occurrence is unlikely to be purely stochastic, and a strong computational model like this one would be a perfect substrate to ask this question. Is it possible to characterize which aspects of the global/local input fluctuations, or of the interaction of input fluctuations with the network, lead to a specific kind of spatio-temporal activity, even if just empirically?

      This is an important question that parallels some asked by the other reviewers and which we did our best to address. As described in the “Essential revisions” paragraph above, we aligned all “planar wave events” either in phase or at their starting time points. We did find that planar waves were preceded by an increase in the global synchronization index σp, both in simulations and in experiments. In simulations this increase also corresponded to a shift of the global inputs away from their mean, as depicted in the new Fig. 4 in the main manuscript. When we used the same alignment to average spatio-temporal local inputs, we did not see the emergence of any significant patterns. This presumably reflects the high variability of local inputs able to produce a planar wave.

      Do different waves appear in the same trial simulation, or does the same wave type persist over the whole trial? If the former, are the transition probabilities between the different wave types uniform, i.e., is the probability of a planar wave transitioning into a synchronized wave equal to the probability of a random wave transitioning into a synchronized wave?

      In the same trial simulation, different types of waves indeed successively appear. The curiosity of the reviewer led us to investigate this interesting point. Since time frames classified as random or synchronized are much more numerous than the planar (and radial) wave ones, it is much more probable that a planar wave transits into a synchronized or a random pattern than the reverse process (i.e., synchronized and random patterns preferentially transit into each other). Nonetheless, we considered questions related to the one of the reviewer. What are the states preceding a planar wave event? Given that a planar wave episode is preceded by a random (or synchronous) episode, is it more likely to be followed by a random or by a synchronous event? We actually find that the entry state is prominently a synchronized state. Furthermore, when the entry state is synchronized, the exit state is also synchronized much more often than would be expected by chance. This shows that most often, planar waves are created from an underlying synchronized persistent state. This has been described in the revised manuscript (lines 443-451).
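      The entry/exit-state analysis can be illustrated with a small sketch. Given a per-frame sequence of wave-type labels, it collapses consecutive identical frames into episodes and tallies the episode types immediately before and after each planar-wave episode. The label strings and the helpers `episodes` and `entry_exit_counts` are assumptions for illustration. The authors' actual classification and their comparison with chance levels are not reproduced here.

```python
from collections import Counter
from itertools import groupby


def episodes(labels):
    """Collapse consecutive identical frame labels into a list of episode labels."""
    return [label for label, _ in groupby(labels)]


def entry_exit_counts(labels, target="planar"):
    """Count (entry, exit) episode pairs around each `target` episode.

    Target episodes at the very start or end of the sequence have no entry
    or exit state and are skipped.
    """
    eps = episodes(labels)
    pairs = Counter()
    for k in range(1, len(eps) - 1):
        if eps[k] == target:
            pairs[(eps[k - 1], eps[k + 1])] += 1
    return pairs


if __name__ == "__main__":
    frames = ["sync", "sync", "planar", "planar", "sync", "random",
              "sync", "planar", "random", "random", "sync"]
    print(entry_exit_counts(frames))
```

A chance-level comparison could, for instance, contrast these counts with the overall frequencies of each episode type, but how the authors defined "expected by chance" is not specified here.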

      2) Denker et al. (2018) also report a strong relationship between the spatial wave category, beta burst amplitude, beta burst duration and velocity (Fig 6E of Denker et al.); e.g., synchronized waves are the fastest, with the highest beta amplitude and duration. Was this also observed in the model?

      We had long exchanges with Michael Denker about his analysis, since there are some differences between his code and what is described in Denker et al. (2017), possibly because of several typos in the Methods section of Denker et al. (2017). We have checked that the results of our code agree with his, but there are some differences between the results obtained on the available datasets and those reported in Denker et al. from other datasets. We have now provided the detailed statistics of the different wave types as obtained by our analysis in the simulations of model SN (Fig. S9) and SN’ (Fig. S11) and in the recordings for monkey L (Fig. S10) and monkey N (Fig. S12). In the recording data, the amplitude and speed of the synchronized and planar waves are comparable, and higher than in the radial and random wave types. Synchronized events last longer than planar waves and the other wave types. Comparable results are obtained in the simulations, with a few differences nonetheless: the mean amplitude of planar waves is somewhat larger than that of synchronized states, and the ordering of durations across the different states is preserved, but the durations themselves are longer in the simulations than in the recordings (about 40 % longer for the planar waves and almost twice as long for the synchronized states). We attribute these differences to the fact that synchronization is slightly less effective in the recordings than in the model. Long synchronization episodes in the recordings are often cut off by a few time frames where the synchronization index drops below the threshold value for a synchronized pattern. This happens rarely enough not to affect the global statistics of the different states much, but it has a much more visible effect on the measured duration of the synchronized states.

      Reviewer #3 (Public Review):

      In this manuscript, the authors consider a rate model of recurrently connected excitatory-inhibitory (E-I) modules coupled by distance-dependent excitatory connections. The rate-based formulation with adaptive threshold has previously been shown to agree well with simulations of spiking neurons, and it simplifies both the mathematical analysis and the simulations of the model. The cycles of beta oscillations are driven by fluctuating external inputs, and traveling waves emerge from the dephasing caused by external inputs. The authors constrain the parameters of external inputs so that the model reproduces the power spectral density of LFPs, the correlation of LFPs from different channels and the velocity of propagation of traveling waves. They propose that external inputs are a combination of spatially homogeneous inputs and more localized ones. A very interesting finding is that the wave propagation speed is on the order of 30 cm/s in their model, which is consistent with the data but does not depend on propagation delays across E-I modules; this may suggest that propagation speed is not a consequence of unmyelinated axons, as has been suggested by others. Overall, the analysis looks solid, and we found no inconsistency in their mathematical analysis.

      We thank the reviewer for his comments and for his expert review.

      However, we think that the authors should discuss more thoroughly how their modeling assumptions affect their result, especially because they use a simple rate-based model for both theory and simulations, and a very simplified proxy for the LFPs.

      In the revised manuscript, we have performed additional simulations to test different modeling assumptions as suggested by the reviewer and discussed further below.

      The authors introduce anisotropy in the connectivity to explain the findings of Rubino et al. (2006), showing that motor cortical traveling waves propagate preferentially along a specific axis. They introduce anisotropy in the connectivity by imposing that the long range excitatory connections be twice as long along a given axis, and they observe waves propagating along the orthogonal axis, where the connectivity is shorter range. Referring specifically to the direction of propagation found by Rubino et al, could the authors argue why we should expect longer range connections along the orthogonal axis? In fact, Gatter and Powell (1978, Brain) documented a preponderance of horizontal axons in layers 2/3 and 5 of motor cortex in non-human primates that were more spatially extensive along the rostro-caudal dimension as compared with the medio-lateral dimension, and Rubino et al. (2006) showed the dominant propagation direction was along the rostro-caudal axis. This is inconsistent with the modeling work presented in the current manuscript.

      This is an important comment, and we thank the reviewer for pointing out these data in Gatter and Powell (1978). Since the experimental data show that planar wave propagation directions are anisotropically distributed, we have tried to investigate what the underlying mechanism of this anisotropy could be in the framework of our model. Anisotropy in connectivity is an obvious possibility. Given our result and the data of Gatter and Powell, however, it appears that it is not the underlying cause of the observed anisotropy direction in the motor cortex (in the framework of our model). We have thus investigated another possibility, namely that the local external inputs target the motor cortex anisotropically, being more spread out along a given axis (lines 510-529 and new Fig. 5g-l). We find that planar waves propagate preferentially along the orthogonal axis. This leads us to conclude that the observed propagation anisotropy could be a consequence of the external input being more spread out along the medio-lateral axis. Data addressing this issue could be obtained using retroviral tracing techniques.

      The clarity and significance of the work would greatly improve if the authors discussed more thoroughly how their modeling assumptions affect their result. In particular, the prediction that external inputs are a combination of local and global ones relies on fitting the model to the correlation between LFPs at distant channels. The authors note that when the model parameter c=1, LFPs from distant channels are much more correlated than in the data, and thus they have to include local inputs. We wonder whether the strong correlation between distant LFPs would be lower in a more biologically realistic model, for example a spiking model with sparse connectivity and a spiking external population, where all connections are distance-dependent. While the analysis of such a model is beyond the scope of the present work, it would be helpful if the authors discussed whether their prediction on the structure of external inputs would still hold in a more realistic model.

      This is a legitimate question that we indeed asked ourselves. In a previous work with a simpler chain model, we only considered finite-size fluctuations. We found good agreement between our simplified description of finite-size fluctuations and simulations of a spiking network with fully connected modules and sparse distance-dependent connectivity. This leads us to believe that our description of finite-size fluctuations is reliable in this setting. Assuming that this is the case, we find that with 10⁴ neurons or more per module, finite-size noise is not strong enough to replace our local external inputs. Even with 2000 neurons per module, with only intrinsic fluctuations the network is very synchronized (new Fig. S15e-g). With 200 neurons per module, the intrinsic fluctuations are strong enough to replace the fluctuating local inputs (Fig. S15a-d), but this is quite a low number. Our description of local noise would have to underestimate the fluctuations in a more sparsely connected network by a significant amount for agreement with the data to be obtained without local inputs. Moreover, it seems to us quite plausible that different regions of motor cortex receive different inputs but, of course, this can only be settled by further experiments. Together with the new Fig. S15, we have added a paragraph to address this question in the manuscript (lines 379-400).

  6. notebooksharing.space
    1. Group project: the Climate System

      Excellent project, to the point (sometimes too much so). Excellent code quality.

      Correctness 3 Quality 5 Originality 2 Total 10/10

    1. Comments in Python are short descriptions included alongside the code to increase its readability. Developers use them to record their thought process while writing the code.

      IOST23 ASSIGNMENT1 PYTHON COMMENT

    1. Comments can be used to explain Python code. Comments can be used to make the code more readable. Comments can be used to prevent execution when testing code.

      IOST23
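      A short example of the three uses listed in the quoted passage: explaining code, making it more readable, and temporarily preventing a line from executing while testing. The function name and values are made up for illustration.

```python
def average(values):
    # Explain intent: return the arithmetic mean of a non-empty list.
    total = sum(values)  # inline comment: sum of all values
    # print("debug:", total)  # commented out while testing, so this line does not run
    return total / len(values)


if __name__ == "__main__":
    print(average([2, 4, 6]))  # prints: 4.0
```

Everything after `#` on a line is ignored by the interpreter, which is why commenting out a line is a quick way to disable it without deleting it.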

    1. There is no predetermined correlation between this import path and the file system, and the imported module doesn’t have to know anything about the import path used in an importing module.

      This is not a good approach. It's the opposite of what you want. Module resolution remains easy for computers (because of their speed), but tedious for humans.

      As a writer, maybe there's some benefit to having no correlation. As a reader trying to understand a foreign codebase, esp. one who is in the moment trying to figure out, "Where is this thing defined? Where can I read the source code?" when jumping through procedure definitions, not being able to trivially ascertain which file a given thing is in is unnecessary friction. Better to offload a tiny bit of work onto the author who knows the codebase (or at least their own immediate intention) well than to stymie the progress of dozens/hundreds of readers trying to work things out.

    1. Abstract

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.75) and has published the reviews under the same license. These are as follows.

      Reviewer 1. Ned Peel

      Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code?

      Scripts have been made publicly available on GitHub (https://www.github.com/phiweger/adaptive) under an OSI-approved BSD-3-Clause license.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No.

      Is the code executable?

      Unable to test

      Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

      Not applicable.

      Additional Comments: Sent authors accompanying file with comments

      https://gigabyte-review.rivervalleytechnologies.com/journal/gx/download-files?YXJ0aWNsZT0zNjkmZmlsZT0xMzcmdHlwZT1nZW5lcmljJnZpZXc9ZmFsc2U~

      Reviewer 2. Julian Sommer

      Is the code executable?

      The code used for analysis of the data has been published on the corresponding GitHub page. However, a link on this page for downloading data from a public database did not work at the time of testing (resource deleted). Also, although most parts of the code are executable, the data and figures generated by the code do not reproduce the figures in the publication.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Yes. Most of the code in the GitHub repository can be executed, but it requires basic knowledge of coding in the programming languages used. However, for the data presented in this work, I do not see the need for more detailed instructions.

      Is the documentation provided clear and user friendly?

      Only partly

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      Only partly. However, I do not see the need for further instructions.

      Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?

      The data is available from the stated accession numbers, but an additional data link on the github page does not work and might be necessary to test the complete code.

      Additional Comments:

      The study compared three methods of oxford nanopore-based longread sequencing for detection of antibiotic resistant bacterial pathogenes. Therefore, the authors used cultivation based detection of carbapenem-resistant bacteria from a rectal swap and subsequent singe isolate sequencing. This technique was compared to an adaptive sequencing approach using a database of antibiotic resistance genes for adaptive sequence enrichment during the sequencing run facilitation oxford nanopore sequencing. The underlying technology is a unique approach, made possible by the oxford nanopore real-time sequencing technology and is of great interest for future applications in clinical microbiology diagnostics. Therefore, this study is of great importance for this field in general. As additional method, the authors performed metagenome sequencing of the rectal swap without culture, which is a completely different technique with unique advantages and drawbacks, compared to culture-based sequencing methods. This study is important for the development of real time sequencing and adaptive sequencing for the detection of antibiotic resistance genes and in future potentially other genes. It focusses on the adaptive sequencing approach, analysing in detail the factors influencing the performance of this new approach. The number of experiments is limited, as stated by the authors, but the data is nevertheless valuable for future projects. For further improvement, I have some suggestions for the manuscript. 1. The comparison of the three methods is quite complex and one of the main goals of this paper, illustrating, that low-cost sequencing devices (Flongle) can be used for detection of antibiotic resistance genes applying adaptive sequencing. Therefore, the description of this comparison and figure 1C is essential for understanding the data of this comparison of methods. However, figure 1C is hard to read and the represented data is not easily accessible. 
To clarify, I suggest including additional information. Do “Set size” and “Intersection Size” describe absolute numbers of detected antibiotic resistance genes? This information could be included. To better connect the text with the legend of figure 1C, the absolute numbers of detected genes could be added to the text, supplementing the relative detection numbers already stated (lines 51-54, 137-142). Since this part of the figure is essential for understanding, a larger version of this representation would be helpful. 2. Figure 2 is essential for interpreting the presented data on variables influencing adaptive sequencing performance. a. Figure 2A is not easily accessible; in fact, I am not sure what information about the data is represented in this part of the figure (data throughput?). The figure legend does not explain what is shown. I suggest clarifying or, if applicable, deleting this subfigure to improve the readability of figure 2B-D. b. Figure 2D: The meaning of the “log median read length” is not explained in the text or the figure legend and should be clarified. c. Figure 2E: Same as for Figure 2D. In line 119, the absolute read length (3 kb) is stated, but this number is not visualised in the figure. I suggest adding information to the text so that the corresponding data in the figure are easy to find. 3. Discussion: In my opinion, the discussion has some room for improvement. a. Lines 158–162: The authors argue that selective cultivation followed by adaptive sequencing for antibiotic resistance genes yields rapid results, which is important for public health responses, whereas metagenomic sequencing takes at least as long and is not cost-effective. However, might combining culture-free metagenomic sequencing with adaptive sequencing decrease the turnaround time even further without significantly higher costs? 
Although experiments on this are beyond the scope of this study, the authors could discuss this for future applications. b. Line 165: “[…] reads were detected for all resistance genes known to be present […]”. This result does not match the results stated in line 141 (“57.9 % of the resistance genes found”) and line 184 (“nearly two-thirds of all resistance genes”). This should be clarified, or the corresponding data should be referenced in the discussion for readability. c. Line 169: Since the identity between sequencing reads and database hits is important for detection and for the overall performance of the adaptive sequencing approach, I suggest discussing whether future improvements in sequencing accuracy (basecalling algorithms, pore design) might influence the performance of this approach, which is only briefly mentioned in line 190. d. Line 190, “variable sequencing yield of this new flow cell type”: This aspect is first introduced in the conclusion and should be mentioned and discussed earlier.

      Minor comments: 1. Figure 1 description: “[…] carrying nine plasmids and four carbapenemases genes […]”. In line 12, the Raoultella isolate is described as carrying three carbapenemases, and the OXA-1 beta-lactamase pictured in figure 1A is not a carbapenemase. The correct number should be three carbapenemases. 2. Line 67: Flongle flow cells were introduced in 2019. I suggest deleting “recently introduced”. 3. Line 210: The link is not correct. 4. Line 244: “Community standards”: It would be nice to add a reference. 5. Line 255: Reference is missing. 6. Line 283: This step

    1. And it has lots of other capabilities. It can answer historic questions (who was president of the US in 1956), it can write code (Satya Nadella believes 80% of code will be automatically generated), and it can write news articles, information summaries, and more

      1st note

    1. The code above is somewhat simplified and missing some checks that I would advise implementing in a serious production application. For example: The request contains a Date header. Compare it with the current date and time within a reasonable time window to prevent replay attacks. It is advisable that requests with payloads in the body also send a Digest header, and that header be signed along in the signature. If it’s present, it should be checked as another special case within the comparison string: instead of taking the digest value from the received header, recompute it from the received body. While this proves the request comes from an actor, what if the payload contains an attribution to someone else? In reality you’d want to check that both are the same; otherwise one actor could forge messages from other people.
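      The three checks listed above can be sketched as follows. This is a minimal illustration, not the quoted post's code. It assumes draft-cavage-style HTTP Signatures (a `(request-target)` pseudo-header and a `SHA-256=` Digest value) and lowercase header names. A five-minute `MAX_CLOCK_SKEW` stands in for "a reasonable time window". `signing_actor_id` is a hypothetical value taken from the signing key's owner. Verifying the cryptographic signature itself is omitted.

```python
import base64
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Optional

# Assumption: five minutes stands in for "a reasonable time window".
MAX_CLOCK_SKEW = timedelta(minutes=5)


def date_is_fresh(date_header: str, now: Optional[datetime] = None) -> bool:
    """Reject stale or future-dated requests to limit replay attacks."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:  # treat a header without a zone as UTC
        sent = sent.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return abs(now - sent) <= MAX_CLOCK_SKEW


def body_digest(body: bytes) -> str:
    """Digest header value computed from the body actually received."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def comparison_string(signed: List[str], method: str, path: str,
                      headers: Dict[str, str], body: bytes) -> str:
    """Rebuild the string the sender signed (header names assumed lowercase)."""
    lines = []
    for name in signed:
        if name == "(request-target)":
            lines.append(f"(request-target): {method.lower()} {path}")
        elif name == "digest":
            # Special case: never trust the received Digest header value.
            lines.append(f"digest: {body_digest(body)}")
        else:
            lines.append(f"{name}: {headers[name]}")
    return "\n".join(lines)


def actor_matches(activity: dict, signing_actor_id: str) -> bool:
    """The activity must be attributed to the actor whose key signed it."""
    return activity.get("actor") == signing_actor_id
```

Because the digest line is rebuilt from the received body, a sender who tampers with the body after signing produces a comparison string that no longer matches the signature.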
    1. In Python, float is a built-in type; calling float() converts a value, such as an integer or a numeric string, into a floating-point number. Floating-point numbers can represent fractional values like 133.5, 2897.11, and 3571.213, whereas whole numbers like 56, 2, and 33 are integers.

      Meaning of float in the Python programming language
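      A short sketch of how float() behaves in practice, reusing numbers from the quote. `to_float_or_none` is a hypothetical helper for illustration, not part of Python.

```python
def to_float_or_none(value):
    """Hypothetical helper: return float(value), or None if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# float() accepts ints, floats and numeric strings (surrounding whitespace is ignored).
values = [56, "133.5", "  2897.11\n", 3571.213]
converted = [float(v) for v in values]
print(converted)  # [56.0, 133.5, 2897.11, 3571.213]

# The result type changes from int to float.
print(type(33), type(float(33)))  # <class 'int'> <class 'float'>

# Non-numeric input raises ValueError; the helper turns that into None.
print(to_float_or_none("abc"))  # None

# Floats are binary approximations, so decimal arithmetic is not always exact.
print(0.1 + 0.2 == 0.3)  # False
```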

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The submitted manuscript compares the effect of individual chaperones and heat-resistant obscure (Hero) proteins on the overall folding of the TDP-43 LCD domain and its relation to aggregation propensity. To this end, the authors apply smFRET to deduce possible morphological changes of the LCD domain from FRET efficiencies. The authors observe that the LCD domain adopts a more extended structure upon binding of chaperone/Hero proteins, whereas it is collapsed in their absence. Furthermore, immunoblotting of filter trap assays indicates that overexpression of chaperones and Hero proteins reduces aggregation of TDP-43 in vivo. Both the morphological effects on the LCD domain and the aggregation propensity are significantly enhanced for the TDP-43 A315T mutant. Moreover, the authors tested a charge-depleted Hero protein variant with reduced "chaperone-like" behaviour. From this, the authors conclude that the binding or chaperone activity of the Hero protein is based on its residue-specific charges. Finally, the authors conclude that Hero proteins can act similarly to chaperones to maintain protein homeostasis under stress conditions.

      We thank the Reviewer for their insightful evaluation of our study.

      Major comments:

      The similar effect of chaperones and Hero proteins on the morphology of TDP-43 found by the authors is intriguing and the applied experimental procedures seem well described and conducted.

      However, the authors' assumption that a change in morphology of the LCD domain induced by the chaperones and Hero proteins is directly connected to the reduction of TDP-43 aggregation is not entirely clear. Whether overexpression of individual chaperones and Hero proteins has a direct effect on TDP-43 aggregation cannot be tested in vivo alone. It cannot be excluded that, inside the cell, the chaperones and Hero proteins tested here control intermediate processes or work as co-factors for other proteins involved in protein homeostasis, rather than directly influencing the aggregation of TDP-43. Therefore, I recommend in vitro aggregation experiments using the ThT signal as a readout. By individually adding chaperones, Hero proteins and a negative control (BSA or others), a direct effect on TDP-43 aggregation could be established. Such experiments have been used extensively in the field and are quick and straightforward to handle.

      As the Reviewer explains, indirect effects on TDP-43 aggregation in cells may be accounted for by conducting aggregation experiments in vitro, with recombinant proteins. We are currently designing such experiments based on a previously described full-length recombinant TDP-43 with a TEV-cleavable MBP tag (Wang 2018 EMBO J). This can be incubated with Hero/DNAJA2/Control, and aggregation induced by cleavage of the tag, after which aggregation can be measured via filter trap similar to the method described in our work. We will include these results in our revised manuscript.

      We thank the Reviewer for their advice. While we note that it is controversial whether ThT binds to aggregates formed from full-length TDP-43 (used in all our assays in the current manuscript), it is reasonable to apply this assay to the LCD fragment as in the paper referenced by the Reviewer below (Lu 2022 Nat Cell Biol). Such an assay is also a reasonable method for confirming effects of Hero protein and DNAJA2 in vitro, and we can conduct this assay as a back-up if the above does not work.

      In addition, focusing on the LCD domain as the main driver of TDP-43 aggregation limits this study. In particular, recent studies [1] indicate that the RRM1 and RRM2 domains of TDP-43 have a major impact on TDP-43 gelation and maturation into solid aggregates. Unfortunately, these domains have not been studied in this manuscript.

      We thank the Reviewer for their insight. While we are keen to investigate the impact of other regions on the aggregation of TDP-43 in the future, we chose to focus on the LCD in our current study because our smFRET assay is particularly suitable to monitor the dynamic conformational nature of this flexible, unfolded region.

      However, we agree with the Reviewer that the RRMs may affect the activities of Hero11 and DNAJA2. We will create constructs for the RRM-deleted variant, TDP43ΔRRM1&2, and the RNA-binding-deficient variant, TDP43 5FL, for use in our cell-based assay. This will allow us to investigate how these domains influence the effects of Hero11 and DNAJA2, and we will include this in our revised manuscript.

      As an optional alternative to using Hero11KR->G, the buffer conditions could be altered, using higher salt concentrations to promote charge screening. It would be of interest whether the results with Hero11KR->G could be reproduced with wild-type Hero11.

      We will perform our assays with Hero11 in high salt conditions for charge screening. While we agree that it may be a great alternative experiment, we note that changing the salt concentration may directly affect the LCD conformation, possibly complicating interpretation of results.

      [1] Lu et al. Nat Cell Biol;24(9):1378-1393 (2022)

      Minor comments:

      Overall, the text is clearly written, and the figures are appropriate.

      Whether the activity of individual chaperones or Hero proteins on TDP-43 aggregation "may result in the overall fitness of the cell" or "reinforcing the conformational health of the proteome" is disputable without knowing how the overexpression of certain chaperones or Hero proteins alter the formation of toxic TDP-43 oligomers.

      We thank the Reviewer for their balanced critique. We will remove or weaken this point regarding how Hero proteins "may result in the overall fitness of the cell" or may be "reinforcing the conformational health of the proteome" from the discussion.

      Reviewer #1 (Significance (Required)):

      Studying the mechanistic effects of chaperones on aggregating proteins is of major interest to the field for understanding the age-related imbalance of protein homeostasis and the progression of neurological decline, such as that seen in amyotrophic lateral sclerosis (ALS). Furthermore, finding homologous proteins that are also able to inhibit protein aggregation can help to understand the overall mechanisms of protein aggregation and the processes preventing such fatal behaviour. However, the techniques used in this manuscript are not very novel and have been used numerous times before. smFRET is a common technique for looking at protein folding/unfolding and is frequently used as a molecular ruler. The manuscript is of interest to the fields of protein aggregation and folding, smFRET and neurodegeneration.

      My expertise lies in the field of protein aggregation and inhibition due to chaperones, measuring molecular interactions and neurodegenerative diseases.

      We greatly appreciate the Reviewer’s expert opinion on our work. As the Reviewer explains, we believe our work will contribute to the fields of protein aggregation and folding, smFRET and neurodegeneration. While the smFRET method may not be novel on its own, to our knowledge this is the first observation of the TDP-43 LCD, with the effect of a pathogenic mutation, at the single-molecule level. In fact, the production, dye-labeling and isolation of individual molecules is extremely challenging for TDP-43. This was made possible by our technical advances using genetic code expansion to site-specifically introduce an unnatural amino acid in TDP-43, purifying and labeling the TDP-43 from HEK cells, and isolation on glass slides.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors build on their findings (Tsuboyama 2020) that electrostatically charged IDPs (Heros) can protect proteins from denaturation and aggregation. In their previous work, they demonstrated that these Hero proteins could decrease the fraction of insoluble GFP-TDP43∆NLS in mammalian cell lines and that this mode of action was related to the electrostatic charge of the proteins rather than their sequence. Although this protective mode of action appears similar to that of canonical chaperones, it is unclear how the Hero proteins compare. In this study, the authors compare Hero11 to a panel of canonical chaperones in their cell-based assays and show that it prevents aggregation in a way comparable to DNAJA2. Hero11 appears to decrease GFP-TDP43∆NLS aggregates better than some other chaperones. They then use their expertise in smFRET analysis (Tsuboyama 2018) to compare the effects of DNAJA2 and Hero11 (along with Hero11KR-->G, a non-charged control) on the dynamic structures of GFP-TDP43∆NLS (labelled with complementary fluorophores in the LCD domain). Based on analysis of WT GFP-TDP43∆NLS and A315T GFP-TDP43∆NLS, the authors suggest that Hero11 and DNAJA2 maintain the LCD domain of TDP43 in an extended conformation and that, by doing so, aggregation can be prevented (as assessed in the cell-based assay).

      Despite finding the results very interesting, I feel that the study is preliminary and the conclusions drawn are not fully substantiated by the presented experimental work. Many questions need addressing to validate these findings and conclusions (please see more in the "Significance" section). I have tried to list the main concerns below.

      We thank the Reviewer for their detailed and critical assessment of our current study.

      Questions/concerns:

      The authors used double transient transfections but have not shown quantification of chaperone protein levels relative to TDP43. Western blots confirming proper expression (and levels) of the chaperones/Hero protein are crucial; without them, we cannot assume that the differences in TDP-43 aggregation result from effective chaperoning rather than from a lack of expression of some chaperone proteins or high expression of others.

      We agree with the Reviewer that this is an important and straightforward validation experiment. We will perform the Western Blotting to confirm the proper and comparable expression of the chaperones/Hero proteins.

      The authors used quite a high BSA concentration in the smFRET work; it would be useful to see what the TDP43 smFRET trace looks like without BSA incubation (to ensure it is not causing some effect). Also, is there a concentration dependence? The authors mention they are unable to identify a Hero/TDP43 complex, but if the amount of Hero protein is high (given that single molecules are tethered), the change in compaction may not relate to the levels/ratios found in cells (where the changes in aggregation occur). Have the authors considered whether positively charged polymers (poly-Lys) have any impact on the TDP-43 smFRET distribution? Given that the smFRET trace is so heterogeneous, understanding what is happening here would require comparing more than 2 variants.

      As the Reviewer suggests, we will include additional smFRET experiments in our revision.

      First, we will perform the smFRET experiment with TDP-43 alone in PBS buffer. However, we would like to clarify that we used BSA incubation for comparison in the current experiment to account for possible non-specific macromolecular crowding effects on the conformation of the LCD (an effect reported for IDPs in general, for example in Banks 2018 Biophys. J.); we considered it fairer to compare Hero11 against another protein rather than against buffer alone. As the Reviewer suggested, we can also perform the same experiments at lower concentrations of Hero11 and DNAJA2, including equimolar concentrations (as suggested below). Moreover, we can also test poly-K peptides for comparison.

      Although the A315T variant has a very distinct smFRET profile, Hero11KR-->G (which is proposed to have no effect on aggregation or on the smFRET of WT TDP43) clearly has an impact on A315T. Why is this?

      We thank the Reviewer for raising this interesting point. We envision that the observed effect is due to weak interactions between the LCD domain of TDP-43 and Hero11KR->G; even without K and R, there are many other functional amino acids that are fully accessible due to the extremely disordered nature of the protein. The effect is easier to observe with the A315T mutant than with WT TDP-43, presumably because the mutant tends to adopt more compact conformations on its own. Nonetheless, unlike WT Hero11, Hero11KR->G fails to accumulate the very extended form of the LCD (FRET signal of ~0; please see below for an explanation of this value), which appears to be associated with suppression of aggregation. We will include these points in our discussion.

      The LCD region is prone to PTMs. As the tethered protein is taken from expression in mammalian cells, how can the authors be sure that it has no PTMs? Although a clear difference is observed between WT and A315T in terms of "compactness" of the LCD domain, we cannot assume that the effects of DNAJA2 and Hero11 are the same; in fact, the Hero11 KR-->G control for A315T is clearly different from the negative control (BSA) and from the effect seen in WT. As the LCD domain is well known to be a site of post-translational modifications (e.g., phosphorylation, which would affect an electrostatic Hero11), could the effects also be related to changes in PTMs?

      We thank the Reviewer for their insight. We would like to clarify that we make no assumption that our dye-labeled TDP-43 is free of post-translational modifications. Indeed, the fact that it is derived from HEK293 cells suggests it should have post-translational modifications relevant to humans and may be even considered an advantage of our method. (Most structural methods require purification of a large amount of protein, often only possible through recombinant expression in E. coli, thus lacking human-relevant PTMs.) As the Reviewer points out, the LCD is known to have many phosphorylation sites, which may help explain how the positively charged Hero11 interacts with it. Thus, we will perform mass spectrometry of TDP-43 and the A315T variant expressed in HEK cells to identify what post-translational modifications are present.

      The authors mention other studies on DNAJ proteins and TDP-43; is the mechanism by which they suppress aggregation known? If the authors want to compare the unknown effects of Hero11, it would be useful to know what DNAJA2 is doing; otherwise, the results are still not conclusive, showing only that the "change is similar" in two experiments. What is known about DNAJA2 interactions with TDP-43? Did the authors do any pulldown assays to detect a complex in cellulo?

      While previous studies have identified various DNAJ (specifically J-domain protein B-subfamily) proteins that suppress aggregation of overexpressed TDP-43, not much is known about this specific interaction (Udan-Johns 2014 Hum Mol Genet, Chen 2016 Brain, Park 2017 PLOS Genet). To address the Reviewer’s questions, we will include experiments characterizing the effects of DNAJA2 on TDP-43. We will perform colocalization experiments to clarify the effects of DNAJA2 and Hero11 on TDP-43 in the cell. As explained below, we will also perform Pulse Shape Analysis (PulSA), a flow cytometry-based method that can be used to study protein localization patterns in cells, which will also provide insight into the effects on the distribution of TDP-43 in cells. We can also perform co-IP of TDP-43 to determine whether there is a stable complex with DNAJA2 and/or Hero11. Together, these experiments will clarify the similarities and differences between DNAJA2 and Hero11.

      It is unclear how the findings of the smFRET relate to the structural understanding of the LCD domain of TDP43 (i.e., NMR studies?). Is it known whether PTMs are more prominent in the A315T variant, as this may explain its more compact nature? In addition, putative helical structure in the LCD domain may contribute to the changes in compaction.

      The Reviewer brings up an interesting and careful discussion. Currently, it is unknown if PTMs actually cause more compaction, or if they are more prominent in the A315T variant, but we will perform mass spectrometry to detect PTMs.

      As the Reviewer mentions, it would be very interesting to compare our smFRET results to other studies of specific LCD structures. However, it is not trivial to deduce lengths (and structure) from smFRET data as various other factors, for example, dye orientation and local chemical environment, may affect FRET efficiency. Nonetheless, we can still cautiously provide a discussion of how our FRET results compare with previous studies.

      For the dye pair used in our study, Cy3 and ATTO647N, the low/no FRET signals promoted by DNAJA2 and Hero11 correspond to a range of end-to-end distances of 6.9 nm to 10.2 nm (FRET signals of 0.1 to 0.01, respectively). Assuming that the LCD behaves like a ~140 amino acid worm-like chain (WLC) with persistence length (Lp) = 0.8 nm, we expect a mean end-to-end distance of 7.35 nm. Thus, the low FRET peak can be well explained by promotion of an extended WLC behavior of the LCD by DNAJA2 and Hero11. On the other hand, the FRET peaks of WT LCD and the A315T mutant (in the absence of Hero11 or DNAJA2) correspond to ~4 and ~3.3 nm, respectively. We will include a careful discussion of how our results relate to known structural understanding of the LCD in the revised discussion.
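      The distance values above follow from the Förster relation $E = 1/(1 + (r/R_0)^6)$. A minimal sketch of that conversion is below. Assumption: the Förster radius `R0_NM` ≈ 4.76 nm is back-calculated from the FRET/distance pairs quoted in this response; it is not a published value for Cy3/ATTO647N. As the response itself cautions, the conversion ignores dye orientation and local environment effects.

```python
# Assumption: R0 back-calculated from the quoted pairs (E = 0.1 -> 6.9 nm,
# E = 0.01 -> 10.2 nm); not a literature value for this dye pair.
R0_NM = 4.76


def fret_to_distance(e: float, r0: float = R0_NM) -> float:
    """Invert E = 1 / (1 + (r / r0)**6); only defined for 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("distance is only defined for 0 < E < 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


def distance_to_fret(r: float, r0: float = R0_NM) -> float:
    """Förster relation: FRET efficiency for a donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


for e in (0.1, 0.01):
    print(f"E = {e} -> r ~ {fret_to_distance(e):.1f} nm")
for r in (4.0, 3.3):
    print(f"r = {r} nm -> E ~ {distance_to_fret(r):.2f}")
```

With this R0 the sketch recovers roughly 6.9 nm and 10.2 nm for E = 0.1 and 0.01. It also maps r ≈ 3.3 nm to E ≈ 0.9, in line with the A315T peak. The steep sixth-power dependence is why small efficiency changes near 0 correspond to large distance changes.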

      It is unclear how there can be such a prominent FRET ~0 peak and in fact negative values.

      We regret that we did not clearly explain this in the manuscript. Negative values arise when correction factors from the alternating laser excitation (ALEX) scheme are applied to FRET signals. FRET efficiency, $E$, is the ratio of the acceptor signal intensity, $I_A$, to the total signal intensity, $I_D + I_A$ (with a correction factor, $\gamma$, which does not affect the negative values and is not discussed further here), and is given by $E = I_A / (\gamma I_D + I_A)$. However, because of leakage of the donor signal into the acceptor channel and direct excitation of the acceptor dye by the donor laser, raw acceptor intensities, $I_{A,\mathrm{raw}}$, are higher than the true values. For example, the ~0 FRET peaks in question appear at around 0.1–0.2 before correction. These effects are accounted for by applying the respective correction factors, $D_{\mathrm{leakage}}$ and $A_{\mathrm{direct}}$, through $I_A = I_{A,\mathrm{raw}} - D_{\mathrm{leakage}} \times I_D - A_{\mathrm{direct}} \times I_{AA}$, where $I_{AA}$ is the acceptor signal during excitation of the acceptor dye. These two correction factors are determined by inspecting the traces and taking the mean values using the iSMS software (Preus 2015 Nat Methods), and are applied uniformly to all traces in an experiment. When $I_A$ is especially low, such as when FRET is almost 0, the correction terms can exceed $I_{A,\mathrm{raw}}$, resulting in negative values. Values below 0 are therefore not invalid; they have merely been overcompensated by the correction. For the dye pair in our study, FRET efficiencies below 0.1 correspond to distances greater than 6.9 nm, meaning peaks around zero represent LCD conformations with end-to-end distances greater than about 7 nm. Please also note that kernel density estimation often yields distributions extending beyond the (0, 1) range simply because of how these plots are constructed. This will be added to the methods in the revised manuscript.
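      A minimal sketch of the leakage and direct-excitation correction described above shows how a near-zero-FRET molecule can end up with a negative efficiency. The intensities and correction factors are hypothetical numbers chosen for illustration, not values from the study or from iSMS, and `corrected_fret` is an illustrative name.

```python
def corrected_fret(i_d, i_a_raw, i_aa, d_leakage, a_direct, gamma=1.0):
    """FRET efficiency after subtracting donor leakage and acceptor direct excitation.

    I_A = I_A,raw - D_leakage * I_D - A_direct * I_AA
    E   = I_A / (gamma * I_D + I_A)
    """
    i_a = i_a_raw - d_leakage * i_d - a_direct * i_aa
    return i_a / (gamma * i_d + i_a)


# Hypothetical intensities for a molecule with almost no real energy transfer.
i_d, i_a_raw, i_aa = 1000.0, 120.0, 800.0
raw_e = i_a_raw / (i_d + i_a_raw)  # uncorrected proximity ratio
e = corrected_fret(i_d, i_a_raw, i_aa, d_leakage=0.1, a_direct=0.05)
print(f"uncorrected ~ {raw_e:.3f}, corrected ~ {e:.3f}")
```

Here the uncorrected value sits near 0.1, but the two correction terms (100 + 40) exceed the raw acceptor signal (120). The corrected efficiency therefore lands slightly below zero, which is the overcompensation described in the text.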

      The conclusion is that Hero11 and DNAJA2 maintain the TDP43 LCD domain (soluble protein) in an extended form and that this is linked to the decrease in aggregates found in the cell. However, for the cell-based assay, no analysis quantifying the expression levels of TDP43 and the chaperones/Hero is presented, and, more importantly, the complementary soluble fraction (to the filter assay) has not been analysed to show that these biomolecules indeed maintain the proteins in a soluble form. Is it possible that TDP-43 is being degraded?

      As described above, we plan to perform Western Blotting to examine the expression levels of these proteins. To address the concerns about solubility, we will perform Pulse Shape Analysis (PulSA) to quantitatively measure the expression and the soluble/aggregated distribution of GFP-tagged TDP43 in HEK293T cells. Measuring the soluble diffuse signals and the punctate aggregate signals will also tell us whether GFP-TDP43 aggregates differently in the presence of Hero11, DNAJA2 and controls. In addition, to support results from the filter trap assay (FTA), we will provide sedimentation assays, in which the soluble and aggregate fractions from cells are separated by centrifugation and analyzed (Krobitsch 2000 PNAS). These will provide information on TDP-43 in the soluble fraction.

      Reviewer #2 (Significance (Required)):

      Contextually, this study has novelty and potential value for basic research. First, understanding the underlying mechanisms by which Hero proteins prevent aggregation would be valuable for understanding the players in protein homeostasis, which can become imbalanced in disease. Second, the use of smFRET as a tool for understanding the dynamics of TDP-43 and its mutational variants can be powerful in defining structural attributes with pathological consequences in ALS. Although this work compares the effect of a canonical chaperone (DNAJA2) and Hero11 on the dynamics of the monomeric protein and on cellular aggregation, proposing a general mechanism based on data from two TDP-43 variants and a cell-based aggregation assay is premature, and more experimental evidence is needed to define the critical link that prevents aggregation of TDP-43 within the cell. Mechanistically, the study does not give much additional insight into the mode of action of Hero11 in preventing aggregation (nor does it explain what DNAJA2 is doing, and therefore how Hero11 compares and contrasts). The proposed "extended versus collapsed" switch is simplistic and doesn't account for the complexity of TDP-43 structural dynamics. To support their proposed mechanism of action, the authors need to examine TDP-43 mutational variants (specifically disease-related ones) with smFRET to understand exactly what the "collapsed" and "extended" data define before making the leap that this effect is what prevents aggregation. There are some structural studies of residual structure in this region (via NMR) that should be considered (https://doi.org/10.1016/j.str.2016.07.007). Although the A315T variant has a very distinct smFRET profile, Hero11KR-->G (which is proposed to have no effect on aggregation or on the smFRET of WT TDP43) clearly has an impact on A315T. Why is this? 
Have the authors considered that the LCD domain of TDP43 is prone to post-translational modifications? Is this variant more phosphorylated? A PTM such as phosphorylation would surely affect interactions with Hero proteins, as they are positively charged. Given that the protein is expressed in mammalian cells, it is likely that PTMs have occurred (but the authors should analyse this).

      With regard to the cell-based aggregation assays, the authors again present a simplified relationship; however, a number of control experiments and additional questions arise. There appears to be less aggregation with co-expression of some chaperones and Hero11, but what about the soluble fraction? What is the impact of these biomolecules? Do they maintain soluble protein, enhance degradation, or propagate soluble oligomers? Equally, how do we know that the levels of the chaperones/Heros and of TDP-43 are the same in each cell? These are transient transfections, and no western blots are shown to confirm the protein levels. In fact, the authors state that "co-transfection of HSP70 (HSPA8), HSP90 (HSP90AB1) or HOP all failed to suppress TDP-43 aggregation compared to GST" and mention that this contrasts with other studies, but could this be due to a failure to express these proteins in the cell models? Some western blot/lysate analysis is needed. Chaperones often form complexes with their client proteins; is there any evidence of complexes in these cell models (i.e., using immunoprecipitation)?

      We thank the Reviewer for their detailed evaluation and interest in our work. As the Reviewer describes, smFRET is a powerful tool for studying the conformational dynamics of TDP-43, and we hope that this study will contribute to our understanding of how Hero proteins and chaperones prevent aggregation.

      We are also grateful to the Reviewer for their constructive criticism of our current model, and we will revise it accordingly. We completely agree with the Reviewer that there are complex structural dynamics within the LCD that determine aggregation and phase separation behaviors. Our simple model was intended to explain how external factors that suppress aggregation, DNAJA2 and Hero11, could affect the conformation of the LCD at the single-molecule level. As discussed above, we were cautious not to over-interpret how our FRET observations correlate with specific conformations, which led to this simplistic model. We do not intend the “extended versus collapsed” explanation in the model to account for all structural dynamics of the LCD; rather, we wanted to highlight the characteristic low FRET state promoted by DNAJA2 and Hero11. We believe that the experiment plan explained above will address the Reviewer’s concerns in full, and we thank the Reviewer again for helping us to significantly improve our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      - In a recent study (PLoS Biol, 2020), the same authors described an interesting class of proteins they call 'Hero'. Based on their analyses in cultured cells and transgenic Drosophila models, the authors concluded that 'Hero' proteins protect against protein instability and aggregation. So far, this class of proteins has not been analyzed by independent groups.

      In the current manuscript, they mainly confirm their own previous finding that Hero 11 prevents There are several concerns about the presented data:

      We thank the Reviewer for their critical comments on our current manuscript.

      - Based on the filter trap assays shown in figure 1 and 3 the authors conclude that DNAJB8 and Hero11 specifically interfere with the aggregation of TDP-43. However, they do not show that the expression levels of TDP-43 are not altered by the co-expression of the different proteins and are comparable in the different samples. In order to make a relevant statement about the anti-aggregation activity of the analyzed proteins, the ratio between soluble and aggregated TDP-43 has to be analyzed.

      We would like to clarify that the Reviewer means DNAJA2, not DNAJB8. Following the Reviewer’s advice, we will perform western blotting combined with sedimentation assays, in which the soluble and aggregate fractions from cells are separated by centrifugation and analyzed to examine expression levels. We will also perform colocalization experiments and Pulse Shape Analysis (PulSA), a flow cytometry-based method for studying protein localization patterns in cells, which will provide insight into the anti-aggregation activities.

      - The FRET assays shown in figures 2 and 4 indicate a slightly higher FRET efficiency in the presence of Hero11 and DNAJA2 and Hero11. The authors postulate that this phenomenon is causally linked to the activity of Hero11 in preventing aggregation of TDP-43. First, it remains unclear whether the slight increase is really significant. Second, I could not find any experimental evidence to support the assumption that a more collapsed conformation of the TDP-43 LCD, as measured in single-molecule FRET assays, correlates with an increased aggregation tendency of TDP-43.

      We apologize, but we are not sure what the Reviewer means by “a slightly higher FRET efficiency in the presence of Hero11 and DNAJA2 (and Hero11).” We would like to clarify that, in the presence of Hero11 and DNAJA2, we observed a very low (not slightly higher) FRET efficiency of ~0 (Figure 2g and h), suggesting an extended conformation. In contrast, the aggregation-prone A315T variant of TDP-43 shows a very high FRET efficiency of ~0.9 (Figure 4a), indicating a collapsed conformation.
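      The standard Förster relation shows why these two values point to opposite conformations. It links the transfer efficiency E to the donor-acceptor distance r, where R_0 is the Förster radius of the dye pair. The dye pair and its R_0 are not stated in this response, so the distances below are given only as multiples of R_0. This is a textbook sketch that ignores the photophysical corrections used in practice.

```latex
E = \frac{1}{1 + (r/R_0)^6}
\quad\Longleftrightarrow\quad
r = R_0\left(\frac{1}{E} - 1\right)^{1/6}

E \approx 0.9 \;\Rightarrow\; r \approx R_0\,(0.111)^{1/6} \approx 0.69\,R_0 \quad \text{(collapsed)}

E \to 0 \;\Rightarrow\; r \gg R_0 \quad (\text{e.g. } E = 0.05 \Rightarrow r \approx 1.63\,R_0;\ \text{extended})
```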

      A minor comment: if the authors would like to compare the specific activity of different proteins, they should use equal molarities of the different proteins, not equal amounts.

      As the Reviewer suggests, we will include experiments at equal molarities in the revision.

      - For a one-way ANOVA, the residuals of the response variable have to be normally distributed. With n = 3, this cannot be tested. Thus, the quantifications of the results shown in figures 1 and 3 are not reliable.

      We thank the Reviewer for their critical comment on the statistical analysis. We would like to clarify that statistically significant differences in aggregation between conditions compared to a control are based on Dunnett’s test. While ANOVA is typically first performed to test for any significant difference in means before performing a post-hoc test, Dunnett’s test is independent and can be performed without ANOVA.

      Following the Reviewer’s advice, we carefully re-examined our assumption of normality for these data. It is reasonable to perform Dunnett’s test on a sample size of n = 3, and it is generally safe to assume that data from three independent experiments will be reasonably normally distributed. In support of this, a Kolmogorov-Smirnov test on our data in Figure 1 showed that none of the groups differ significantly from normal distributions with the respective mean and standard deviation (p-values greater than 0.05). Thus, we believe it is reasonable to assume that the data and the residuals are normally distributed and that our statistical analyses are reliable. This analysis will be included in the revision to support the normality assumption.

      However, even without assuming a normal distribution for our data in Figure 1, we would still have obtained statistically significant differences: had we relied on a Kruskal-Wallis test, the non-parametric equivalent of ANOVA that makes no assumption of normality, we would have obtained p = 0.005176, well below our significance level of α = 0.05, indicating sufficient evidence of a difference in aggregation among these groups.
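      For reference, the Kruskal-Wallis test mentioned above works on ranks rather than raw values, which is why it needs no normality assumption. The formula below is the standard form without a tie correction. Here k is the number of groups, n_i and R_i are the size and rank sum of group i, and N is the total number of observations. The group sizes and ranks behind the reported p-value are not given in this response, so no numbers are substituted.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} \;-\; 3(N+1),
\qquad N = \sum_{i=1}^{k} n_i

\text{Under } H_0:\; H \;\dot\sim\; \chi^2_{k-1},
\qquad p = P\left(\chi^2_{k-1} \ge H\right),
\qquad \text{reject } H_0 \text{ if } p < \alpha = 0.05
```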

      - The title is imprecise and overstates the presented data:

      'canonical chaperone' suggests that their results are valid for chaperones in general. However, they only tested DNAJA2 in the single-molecule FRET assay. Moreover, HAPA8, another canonical chaperone, obviously had an opposite effect on TDP-43 aggregation (Fig. 1). Similarly, they only tested Hero11. Thus, 'canonical chaperone' has to be replaced by 'DNAJA2' and 'a heat-resistant obscure (Hero) protein' by 'Hero11'. In addition, the term 'conformational modulation' is not as concise as one would expect for the title of a research paper.

      We would like to clarify that the Reviewer means HSPA8 (not HAPA8). According to the Reviewer’s suggestion, we will change the title to “DNAJA2 and Hero11 mediate similar conformational extension and aggregation suppression of TDP-43”.

      Reviewer #3 (Significance (Required)):

      In a recent study (PLoS Biol, 2020) the same authors described an interesting class of proteins they call 'Hero'. Based on their analyses in cultured cells and transgenic Drosophila models, the authors concluded that 'Hero' proteins protect against protein instability and aggregation. So far, this class of proteins has not been analyzed by independent groups.

      In the current manuscript, they mainly confirm their own previous finding that Hero11 prevents aggregation of TDP-43 and present very little new data that would provide new insights. Specifically, only the FRET assays shown in figures 2 and 4 are really new, and these, by the way, could easily be shown in one figure.

      We thank the Reviewer for their critical evaluation of our current study. As the Reviewer suggests, we believe our smFRET results provide new insights into how Hero11 and DNAJA2 function. We would like to emphasize that, rather than confirming our previous findings, our current manuscript mainly addresses a critical point left open in our previous study: the mechanism by which Hero proteins prevent aggregation. Moreover, to our knowledge, this is the first observation of the TDP-43 LCD, including the effect of a pathogenic mutation, at the single-molecule level.

    1. The code from the question above being able to work out in the extended way in C:

      Awkward sentence! Better: You could work out the extended code from the question above in (the programming language) C.

    1. Turbo Plugin SDK

      Rename this to UPI Turbo

      Give a gist of what UPI Turbo is. Create a separate page for UPI Turbo under Payment Methods and link it here.

      Under the headless and UI integration sections, add an intro paragraph.

      In the code samples, change the mobile number to 9900099000.

    1. by a sect of fanatics

      What people outside Salem viewed the Puritans as. Puritans refers to the villagers living in Salem who practise a stricter moral code of Christianity than is generally followed.

    1. Pew reports a steady increase in teen Internet use, from 73 percent in 2000, to 87 percent in 2004, to 95 percent in 2007, and a rapid increase in mobile phone ownership, going from 45 percent in 2004 to 71 percent

      Even with modern culture pushing teenagers to be tech-savvy, it seems that society in general has normalized phone use and ownership to the point that a phone is essential to function in a variety of situations, such as ordering food with a QR code. With the rapid evolution of technology, phones are becoming more of a necessity and less of a privilege.

    1. It’s difficult to say without more information about what the code is supposed to do and what’s happening when it’s executed. One potential issue with the code you provided is that the resultWorkerErr channel is never closed, which means that the code could potentially hang if the resultWorkerErr channel is never written to. This could happen if b.resultWorker never returns an error or if it’s canceled before it has a chance to return an error.

      addy

    1. Adam Marshall Dobrin, Technocrat Founder at XCALIBER DAO/ARKLOUD.XYZ. Writer. Coder. Futurologist. Aspiring dad.

       I came to the particular city that I am in to prove that Operation Gunsider and Project Y were "ruce's" ... #informationoperations that were part of a grand design that literally includes the whole of "Majestic," which is another key word in the research path to where we are going.

       It includes more than that, much more--on this song and who you all are. Closer to God, than ... "most." Closer to me, too. It includes the entirety of the KJV and "all of religion as seen through the eyes of the Christ." It includes missions to teach Latin and English and "reading and writing" to the entirety of humanity; and at this point we have to pause and really understand what is going on.

       We have an "Adam code" that is something like ##305407; it's a word that includes research and development on what to do when the "everybody up?" generation fails or succeeds; it is a way to get "way more voters involved" in a place where we once had a world that could have saved its past, clearly due to the inability to see it at all, much less travel to it. Today I need basic computer knowledge and general concepts of things like terraforming and physics added to the list of things that are "required to vote" in the Constitutional realms considerably here, perhaps somewhere between the third and the fifth Houses of the Capitol of the United States.

       I would like to make the entirety of the past, the entire A.D. timeline and perhaps something bigger than that "intelligent, omni-important, and oligarchical rulers of themselves." I would like to see Technocracy flourish as a word that literally involves the Halo of Cortana and its connection to "how we vote." I wrote for a brief time on how to engage an audience in something called "subconscious voting" and how to connect "checking your vote" to the only Labor the Party has to accomplish on its WED/hour of "required work left once we are done with ... automation, roboticization and the revolution colloquially associated with Bolshevik and Ford.

       I need us to think today what kind of classes we would need to put together for ... "members of the medieval civilization of lore" ... people who coincided in cities with Cathedrals or Missions that match the architecture associated with the One True Church, whether it be the source of the Spanish Armada or the Eastern Orthodox Byzantine Fault.

       What kind of classes are required to understand things like "game theory" and "solar fusion" and also the inner workings of Heaven enough to intelligently vote on whether or not another group of people, for instance, is "educated enough to be considered a peer, or a citizen."

       UK Home Office U.S. Immigration and Customs Enforcement (ICE) Immigration And Nationality Services (IANS)

       It's interesting to "see this answer" INS has aided me here in assuming you understand that acronym has changed from the historical truth, as we consider "naturalization" and what kind of history/nation

       https://lnkd.in/dvUKdGZf

      OPERATION JAZZORCIVILIZE

      Jazzercise is something my mommy did around the time I was born. It's literally just a "jamboree" or some kind of popular ((predominantly)) women's exercise group. They met all over America in the '80s and they wore some funny socks ;) It is the word associated with "changing everybody up" to include the entirety of the capable group of humanity ever living on a rock with religion. It could be bigger than that, but here I've sort of defined it to literally link with significance only the Church of Rome and things that came after it. It is literally what it is, the A.D. timeline. It most likely includes a group of "less than all" who carried things like knowledge and Asimov's Foundation from the Pentagon Technocrat's "Torah guild" ... many thousands of years before the day Christ appears to have been born or died in history.

      This is a big deal. I have come to a place in Deseret I associate with a military group that is literally and ((I pray)) responsible for the colonization or the co-colonization of the known galaxy. I believe we have a number of coveted extra-galactic operations as well, and that they include Soviet and American as well as European operations outside of Deseret. I have come here to prove that Operations Gunsider, Holocaust and Y are "Information Operations," which is modern NSA-talk for "propaganda designed for a purpose." I do not believe the technologies are real, and it's important to understand I lived through the time others call "the Cold War" and saw with my own eyes videos of rockets traveling along United States Federal USHWY1 up and down the Eastern Seaboard ... rather than I-95, though it existed, because of known and intentional fortifications on that road for equipment so heavy it would crumble bridges. We are in a place where ... London may be the only bridge left in existence after the move from NM to NV .. if you know what it means to lose pillars of Samson in a place like the Holy Temple's heart.

      I need this to be taken seriously. If we want to stop moving towards a point where we are going to be angrier with each other than we should be, I need someone in the world with a public company to hire me to build something ... "more public than companies." It starts with software and it ends with codification in the Constitution and beyond. It's a "big deal"; this is a revolution bigger than the invention of voting and money; this is big. I need a paycheck from a company with that kind of oversight at the very least.

      I am open to FTSE, CAC, ASX, DAX, or similar companies on those exchanges, as well as to ones listed on the S&P 500 or the Dow. The exchanges listed are not all-inclusive, but it means something that I "know what they are"; I studied them, and we need something at least as big as an entity governed by laws to be listed on "those." A private company in Dubai, for instance, is not large enough to do this properly.


    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Response to reviewers

      Reviewer #1

      Reviewer #1 (evidence, reproducibility and clarity (required)):

      Winter et al. present a study of Ebola virus fusion in the acidic environment of the late endosome. Based on cryo-ET of Ebola virions undergoing entry into cells, they note that the VP40 matrix is disassembled and dissociated from the viral membrane in virions seen in the endosome. Subsequent in vitro and computational analyses suggest that protons diffuse across the viral membrane and neutralize anionic lipids on the inner leaflet. They argue that this loss of negative charge reduces the affinity of VP40 for the viral membrane. They further suggest that VP40 dissociation from the viral membrane precedes GP-mediated membrane fusion and contributes to reduction in the energy barrier for membrane stalk formation. Whereas most studies have focused on the importance of acidic pH in triggering GP conformational changes during fusion, the present work contributes new appreciation for VP40-membrane interactions.

      We would like to thank the reviewer for all the insightful comments and appreciation of the novelty.

      In the cryo-ET experiments aimed at visualizing Ebola entry, do the authors see evidence of viral membrane fusion? There is no mention of this in the text. Knowing that the virions that show disassembly of the VP40 matrix are in fact the virions that productively enter cells would support the conclusions of the study. As it stands, one is forced to wonder whether the virions that show VP40 disassembly prior to fusion ultimately fuse.

      We first note that the EBOV virions shown entering host cells in Figure 1 were captured by cryo-ET at 48 hours post infection and resulted from 2-3 rounds of infection; thus, the virions can productively enter the cells by macropinocytosis. Virions that are unable to undergo membrane fusion would be processed in lysosomes and would not be detectable by cryo-ET at 48 hours post infection. In addition, the virions captured in late endosomes contain nucleocapsids, so these virions are likely infectious. Together, this is good evidence that we really see events after successful membrane fusion.

      We fully agree with the reviewer that capturing a fusion event would provide further proof that fusion depends on prior disassembly of the VP40 matrix layer. To address this, we acquired additional data on cells at different time points post-infection (15 cells imaged); regrettably, we have not been successful in capturing a membrane fusion event, presumably due to its fast kinetics. In this study, we are technically limited in the amount of virus we can use for infection in BSL4. The current dataset was generated at an MOI of 0.1, which makes capturing entry events difficult, as we would need an MOI of at least 100-1000 to increase the chances of capturing such a rare event.

      Considering the technical difficulties of performing the experiment under BSL4 conditions, we have additionally performed a similar experiment using EBOV VLPs composed of VP40 and GP at high concentration (estimated MOI > 100) (Fig. S5). Despite the high VLP concentration, we could only find 2 out of 18 tomograms showing VLP entry events. These clearly show that the VP40 matrix is disassembled in VLPs residing in endosomes. The same lamellae displayed sites of viral fusion, as evident from enlarged endosomal membrane surfaces studded with GPs facing endosomal lumina. Hence, these new data support our conclusion that VLPs that undergo VP40 disassembly are able to fuse. We have included the new supplementary figure S5 and added the following sentences to the main text:

      Lines 96-102: “We were not able to capture virions residing in endosomes in the process of fusing with the endosomal membrane, presumably because virus membrane fusion is a rapid event. However, in a similar experiment using EBOV VLPs composed of VP40 and GP, we could confirm the absence of ordered VP40 matrix layers in VLPs inside endosomal compartments. Moreover, we were able to capture one fusion event and several intracellular membranes studded with luminal GPs, indicating that fusion had taken place (Fig. S5).”

      In the cryo-ET experiments that evaluate VP40 disassembly in vitro, why do the authors leave out NP from their VLP preparations? There is some evidence in the literature (Li et al., JVI 2016) that NP is necessary to form particles with native morphology. If the authors feel that NP is not necessary for their experiment, perhaps this could be noted.

      Thank you very much for this important comment. Throughout this study, we mainly focused on the fate of the VP40 matrix during entry and thus reduced the complexity of the VLPs to the minimum, VP40 and GP; NP was indeed left out. To address the role of the nucleocapsid in EBOV VLP uncoating, we have now also included data on VLPs prepared by expressing the nucleocapsid components (NP, VP24 and VP35) in addition to GP and VP40. Cryo-ET analysis showed that these VLPs mainly contain loosely coiled nucleocapsid. This is consistent with a study by Bharat et al. 2012, which shows that, compared to virions, VLPs display heterogeneous nucleocapsid assembly states and reduced incorporation of nucleocapsids. Importantly, VLPs containing nucleocapsid also displayed disassembled VP40 matrices at low pH (Fig. S7). Hence, nucleocapsid proteins do not influence the VP40 disassembly driven by low pH, and GP-VP40 VLPs can be used as a model to study VP40 uncoating.

      We included the following statement on lines 150-153: “We further repeated the experiment using VLPs composed of VP40, GP and the nucleocapsid proteins NP, VP24 and VP35, and observed the same low pH-phenotype described above. These results show that nucleocapsid proteins do not influence the VP40 disassembly driven by low pH.”

      The authors argue that acidic pH neutralizes the charge of PS phospholipids, thereby removing the electrostatic interactions of basic residues in VP40 and PS. They also note in the Methods section that 7 amino acids in VP40 are predicted by PROPKA to be protonated at pH 4.5. If the authors feel that protonation of these 7 amino acids is not involved in the loss of affinity for PS, this could be stated explicitly and justified. Could the protonation of these 7 amino acids contribute to disassembly of the VP40 lattice, rather than dissociation from the membrane?

      Thank you for this interesting comment. We note that the amino acids predicted to be protonated (E76, E325, H61, H124, H210, H269, H315; see below) are far away from the interface of interaction with the membrane and also away from the intra-dimerization domain. Hence, they are unlikely to contribute to the loss of affinity for PS, but they may contribute to conformational changes that facilitate the disassembly of the VP40 matrix. For clarification, we have added the following statement to the methods section:

      Lines 541-544: “Importantly, these residues are located away from the interaction interface of VP40 with the membrane and their protonation accordingly does not influence membrane-binding. However, protonation of these residues may contribute to conformational changes that facilitate the VP40 matrix disassembly.”
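      The Henderson-Hasselbalch relation gives a rough sense of why pH 4.5 matters for these residues. It yields the protonated fraction f of an ionizable side chain with a given pKa. The numbers below assume a typical free-histidine side-chain pKa of about 6.0, for illustration only. PROPKA predicts residue-specific, environment-shifted pKa values, and those are not listed in this response.

```latex
f_{\text{prot}} = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}

\mathrm{p}K_a = 6.0,\ \mathrm{pH} = 4.5:\quad f = \frac{1}{1 + 10^{-1.5}} \approx 0.97

\mathrm{p}K_a = 6.0,\ \mathrm{pH} = 7.4:\quad f = \frac{1}{1 + 10^{1.4}} \approx 0.04
```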

      Minor: Figure S5C is difficult to interpret. The red frame on the bars that indicates data acquired at low pH is nearly invisible. It might be better to indicate explicitly (i.e., with words) the pH at which data were acquired.

      Thank you very much for this comment. We have changed the design of the graph accordingly. Please note that the figure numbering has changed and that Figure S5C is now Figure S6C.

      Reviewer #1 (significance (required)):

      The significance of the study stems from the idea that the VP40 lattice and its interaction with the viral membrane play a direct role in facilitating viral fusion. To my knowledge, this has not been previously addressed. The significance would be considerably increased if the authors were able to demonstrate by cryo-ET that the virions with disassembled VP40 were in fact the virions that productively fused. Nonetheless, this work should be of broad interest to researchers studying viral fusion, as it may represent a phenomenon relevant to numerous viruses that enter cells via the endocytic route.

      Reviewer #2

      Reviewer #2 (evidence, reproducibility and clarity (required)):

      The manuscript by Winter et al., entitled "The Ebola virus VP40 matrix undergoes endosomal disassembly essential for membrane fusion," describes the structural aspects of the events that precede Ebola virus (EBOV) membrane fusion in the late endosome and virion uncoating in the cytosol. By combining state-of-the-art cryo-electron tomography (cryo-ET) with biophysical and computational techniques, the authors have elucidated the pivotal role of the ebolaviral matrix virion protein 40 (VP40) in modulating the fusion process. In particular, they discovered that disassembly of the ordered VP40 lattice is low pH-driven, occurs despite the absence of a viral ion channel within the filovirus envelope, and takes place through the weakening of VP40 interactions with lipids at the interface between the ebolaviral envelope and matrix. Overall, the manuscript is well written and the research work is very well conceived, with solid orthogonal experimental approaches that mutually validate their respective results. It is the opinion of this reviewer that the paper contributes to the elucidation of a key step in the EBOV infection cycle and that it will be of great interest to the readership of Review Commons and to the community of structural biologists. Therefore, I recommend the publication of this paper, but only after some minor revisions to the text, the figures and the figure legends, which show inconsistencies in terminology and acronyms and could easily be improved with some minor graphical editing.

      Thank you very much for your positive feedback and your comments.

      Comments:

      • By starting their abstract and introduction sections with the term "Ebola viruses," the authors are (on purpose?) preparing the reader for the implicit statement that their findings could be a paradigm for the other members of the Ebolavirus genus. This is an exciting picture, especially from the perspective of developing VP40-targeting drugs. Therefore, although conclusions in this sense would probably require further studies, I encourage the authors to complement their figure 3 (or a related supplementary figure), and the related text in the manuscript, with a multiple-sequence alignment showing whether and how far the basic patch at the C-terminus of VP40 is conserved within the Ebolavirus genus, especially the residues Lys224, Lys225, Lys274 and Lys275.

      Thank you very much for this comment. We have added a corresponding sequence alignment highlighting the high conservation of the basic patch of amino acids across all Ebola virus species (Suppl. Fig. S6). In the text, we refer to the sequence conservation as follows:

      Lines 213-215: “These interactions are driven by basic patches of amino acids which are highly conserved across all EBOV species (Fig. S8 H), further emphasizing their importance in adaptable membrane binding.”

      • It is a bit inconvenient for the reader to follow how the story unfolds while jumping back and forth between figures, which is why I would recommend moving the sentence at lines 88-91 to the section where figure 5 is discussed.

      We in fact refer to Figure 1 and have fixed the reference accordingly (line 95).

      • Please, avoid the use of the slang "Ebola" without the apposition "virus", and make the text consistent throughout the manuscript by only using the acronym of each term after it was introduced for the first time.

      Thank you for this comment. We have thoroughly revised the use of technical terms.

      Minor revisions:

      Line 1: "matrix protein undergoes"

      We refer here to the entire VP40 matrix layer composed of many VP40 proteins and not to single VP40 proteins (as the individual proteins do not disassemble, but their macromolecular assembly does). For clarification, we changed the title to “matrix layer undergoes”.

      Line 19: "the matrix viral protein 40 (VP40)"

      We have corrected the statement.

      Line 18: considering that a virus "exists" in the form of a virion while temporarily located outside the cell, and as a "molecular entity" consisting of viral proteins and nucleic acids organised in macromolecular complexes during its life cycle inside the infected cell, this reviewer encourages the authors to rephrase as follows: "Ebola viruses (EBOVs) virions are filamentous particles, ..."

      Thank you for your suggestion. We have rephrased it to: “Ebola viruses (EBOVs) assemble into filamentous virions” (line 18).

      Lines 35-36 and line 40: "that is determined by the matrix made up by the viral protein 40 (VP40), which drives ..." And then, directly use the acronym VP24 at line 40

      We have corrected the statement.

      Line 40: as VP24 and VP35 interact with NP but do not interact with the ssRNA genome, please rephrase as follows "the nucleoprotein (NP) which encapsidates the ssRNA genome, and the viral proteins VP24 and VP35 which, together with NP, form the nucleocapsid"

      We have corrected the statement.

      Lines 47-48: "...fusion glycoprotein (GP)...[...] the ebolaviral envelope"

      We have corrected the statement.

      Line 51: "...remarkably long virion of EBOVs undergoes..."

      We have rephrased the statement: line 55: “…remarkably long EBOV virions undergo…”

      Line 63: "... in vitro, and in endo-lysosomal compartments in situ, by cryo-electron..."

      We have corrected the statement.

      Lines 70-71: " to shed light on EBOVs ... [...] with EBOV (Zaire ebolavirus species, Mayinga strain) in biosafety level 4 (BSL4) containment"

      We have corrected the statement.

      Line 72: chemically fixed by? (PFA and GA acronyms have been annotated in figure 1, but should be first mentioned in their explicit form in the text)

      We have now mentioned annotations for GA and PFA both in the main text and in the figure legend in their explicit forms.

      Line 73 (cryo-FIB)

      We have corrected the acronym.

      Line 80: EBOV virions

      We have corrected the statement.

      Figure 1A and line 97: for consistency with the terminology used in the main text, should the term "vitrification" perhaps be preferred over "cryofixation" in the second step? Readers not familiar with the field could be confused by the use of the two synonyms.

      We have replaced the term as suggested.

      Lines 92-93: "...these data indicate [...] and suggest..."

      We have corrected the statement.

      Figure 1C and line 100: in the color legend EBOV is annotated as dark teal; however, in the segmentation of the reconstructed tomogram there are three objects, one of which, in dark teal, is evidently a portion of an EBOV virion inside the endosome, while the other two are in different shades of green. What are those? Could the authors please specify their identity in the figure legend with the corresponding color code? The same applies to supplementary figure S2 (see comment below).

      Thank you very much for this comment. All three green objects are EBOV virions. For clarification, we have added numbers 1-3 to the figure and legend and adjusted the text in the legend accordingly (lines 109-110).

      Line 95: "...tomography of EBOV virions..."

      We have corrected the statement.

      Line 98: "...showing EBOV virions..." (This reviewer refers to the use of the term 'EBOVs' as for different species within the genus rather than for different EBOV particles within a dataset)

      We have corrected the statement.

      Line 105: "... a purified EBOV before..."

      We realized a mistake in our phrasing: the virion shown in Fig. 1 H is not purified, but a virion found adjacent to the plasma membrane of an infected cell. We have changed the phrasing accordingly (lines 117-118).

      Line 110 and 113: "...EBOV matrix..." And "EBOV virus-like particles (VLP)"

      We have corrected the statement.

      Line 140, 141, 145 and 147: "EBOV VLPs" and "EBOV VLP"; idem at lines 188-189, 209 and anywhere else in the manuscript (including figure 4A)

      We have corrected the use of “EBOV VLP(s)” as suggested.

      Line 235: "influenza virus ion channel..."

      We have corrected the statement.

      Line 249: please, use directly the above-introduced acronym for the detergent

      We have revised the use of acronyms.

      Figure 5F: in plot's X axis label: thermolysin (T)?

      Yes, this is correct and stated in the figure legend.

      Line 342: "EBOV have remarkably long..."

      We have corrected the statement.

      Line 420 "...matrix-specific"

      We have corrected the spelling error.

      Line 464: "grids"

      We have corrected the spelling error.

      Line 465: "for cryo-FIB milling"

      We have corrected the statement.

      Line 611: "influenza virus M2 ..." (Please, which influenza virus strain does the gene come from? Alternatively, which is the NCBI Protein and/or UniProt database code?)

      We have added the information to the Methods (line 648): “….A/Udorn/307/1972 (subtype H3N2))…”

      Line 623: please, use the above-designated acronym for the detergent

      We have used the acronym as suggested.

      Line 716: "...based on cryo-ET..."

      We have corrected the statement.

      Line 718: "influenza virus"

      We have corrected the term.

      Line 734: "cryo-ET data"

      We have corrected the term.

      Fig. S8: for consistency with the main text, "thermolysin"

      We have corrected the spelling of thermolysin throughout the manuscript.

      Fig. S2, C and F: are these EBOV virions (as mentioned in the figure title) or EBOV VLPs (as the legends in the two panels of this figure seem to suggest)? Please, the authors should clarify

      Thank you very much for spotting this mistake! These are indeed EBOV virions and we have changed the legends within the figure accordingly.

      Line 1046: "malleable lipid envelope of the EBOV"; this adjective sounds confusing; the reviewer encourages the authors to rephrase for more clarity.

      We have removed the adjective "malleable".

      Reviewer #2 (significance (required)): see above.

      Reviewer #3 (evidence, reproducibility and clarity (required)):

      Winter and colleagues describe the molecular architecture of Ebola virus during entry into host cells. The main claims of the paper are that VP40 is disassembled prior to fusion. Disassembly is driven by the low pH environment in the endosomes. pH-induced uncoating works via "passive equilibration" because the Ebola virus envelope does not contain an ion channel. The authors conclude that structural remodeling of VP40 acts as a molecular switch coupling uncoating to fusion. The main novel results of the manuscript are: In situ cryo-ET of endosomal compartments shows EBOV particles with intact condensed nucleocapsids and disordered protein densities that may relate to detached VP40. Five EBOV particles were imaged in the endosome and all had detached VP40 layers. As controls, budding and extracellular virions showed intact VP40 layers. Incubation of VP40-Gp VLPs with a pH 4.5 buffer leads to the disorder of the VP40 matrix in vitro, which is independent of Gp presence in the VLPs. MD simulations showed VP40 dimers binding to model membranes containing 30% PS at pH 7 and reduced binding at pH 4.5. Lipidomics revealed the lipid composition of VP40-Gp VLPs, demonstrating 9% PS.

      VP40-pHluorin fusions were used to determine acidification of VLPs in vitro and to calculate a permeability coefficient of 1.2 Å s⁻¹, which is quite low compared to the permeability of the plasma membrane (345 Å s⁻¹). Next, they modeled membrane fusion, showing that fusion is more favorable after VP40 disassembly, especially favoring stalk formation. The authors further propose that fusion pore opening is more favorable in the presence of VP40. The authors claim that strong interactions of lipids with VP40 stabilize the hemifusion intermediate. VP40-Gp VLPs can enter host cells independent of pH once Gp has been activated by thermolysin.

      We thank the reviewer for these interesting comments and valuable suggestions.

      Some of the results are overinterpreted and require appropriate modifications.

      Main points that need to be addressed:

      Imperfections of the membrane could be induced by proteins. Does acidification of the virion depend on GP and its transmembrane region? This can be tested with chimeric GP replacing its TM by unrelated trimeric TMs.

      We agree that this is important to consider. We have addressed this question in Fig. 2 K using VLPs composed of VP40 alone. These VLPs lack GP and still display luminal acidification as evident from the disassembled VP40 matrix when incubated at low pH. Therefore, acidification does not depend on GP. For clarification, we have adjusted the following sentence in the discussion:

      Lines 410-413: “Using VLPs of minimal protein composition (VP40 and GP, and VP40 alone), we show that VP40 disassembly, i.e. the detachment of the matrix from the viral envelope, is triggered by low endosomal pH (Fig. 2). This indicates that VP40 disassembly does not depend on structural changes of other viral proteins, including GP, and is driven solely by the acidic environment.”

      Virus entry assays, line 292. The low pH is not only used for Gp cleavage, but induces the conformational changes leading to the post fusion conformation of Gp2. The authors need to check what happens to Gp once it is cleaved by thermolysin. Is this sufficient to induce the conformational changes in Gp? And if so how does entry of such VLPs work, because once the conformational change is triggered, GP2 will adopt the post fusion conformation which is inactive in fusion. This requires further clarification.

      To our knowledge, there is only one study showing that EBOV GP2 changes conformation at low pH, in the form of a rearrangement of the fusion peptide from an extended loop to a kinked conformation (Gregory et al 2011). Importantly, low pH alone is not sufficient to trigger GP-mediated membrane fusion, and NPC1 is needed as a trigger for the membrane fusion process (Das et al, 2020). Hence, proteolytically processed GP requires NPC1 binding to change to its post-fusion conformation. We addressed this question by using pre-cleaved (= GP2) and low pH-treated VLPs in our entry assay (Fig. 5 F). Since low pH-treated VLPs enter host cells as efficiently as VLPs incubated at neutral pH, and low pH-treated and additionally pre-cleaved VLPs enter even more efficiently, it is highly unlikely that low pH triggers the post-fusion conformation, as this should inhibit virus entry (as the reviewer pointed out). In conclusion, low pH does not induce the post-fusion conformation in GP2, and we have included a corresponding sentence for clarification:

      Lines 339-343: “Since thermolysin-treated EBOV VLPs efficiently enter untreated host cells at neutral and low pH, we further conclude that low pH alone does not induce the GP2 post-fusion conformation, which would inhibit virus entry. Together, this suggests a role of low endosomal pH beyond proteolytic processing of EBOV GP, likely for the disassembly of the VP40 matrix.”

      In the fusion model, the authors claim that VP40 disassembly is more favorable for stalk formation, which is likely true. However, they also claim that strong VP40 interaction, which I would interpret as VP40 filaments interacting with the membrane, favors fusion pore opening. The tomograms and the in vitro experiments with VLPs indicate that the complete VP40 matrix is detached from the membrane under low pH conditions.

      We would like to stress that the modelling results for hemifusion formation and pore opening are independently calculated but have to be interpreted together because they occur sequentially. Hemifusion precedes formation of the pore. Hence, even though the model shows that fusion pore opening is favored in the presence of VP40 interaction, membrane fusion cannot proceed to this stage because hemifusion is blocked until the VP40 matrix layer disassembles from the membrane. We apologize for the lack of clarity and have added the following sentences:

      Lines 315-318: “However, it is important to note that hemifusion precedes pore formation in the membrane fusion pathway. Since the disassembly of the VP40 matrix is required for hemifusion and hence for the initiation of membrane fusion, it determines the outcome of the membrane fusion pathway.”

      VLPs are purified. Can the authors exclude the possibility that the purification protocol damages the VLP membrane, leading to in vitro acidification in a low pH environment? Can some of the assays be repeated with non-purified VLPs?

      Thank you very much for this important comment. To address this question, we had performed the cryo-ET experiments using purified and unpurified VLPs and found that they are virtually indistinguishable. Importantly, unpurified VLPs also undergo VP40 disassembly. We now show images from unpurified VLPs in a supplementary figure (Fig. S7). Thereby, the manuscript contains data of purified VLPs while we also provide proof that the purification protocol does not influence the disassembly of the VP40 matrix. We added the following explanatory sentence to the main text:

      Lines 151-156: “We further repeated the experiment using VLPs composed of VP40, GP and the nucleocapsid proteins NP, VP24 and VP35, and observed the same low pH-phenotype described above (Fig. S5 C). Performing the experiments on unpurified VLPs harvested from the supernatant of transfected cells confirmed that the purification protocol applied did not influence the disassembly of the VP40 matrix (Fig. S7).”

      Does acidification only work at pH 4.5?

      We also attempted to verify the acidification of VLPs at higher pH (~5.5 and ~6.0) by cryo-ET; however, subtle structural differences were difficult to quantify. Considering the lower permeability of the VLP membrane compared to the plasma membrane, we think that acidification indeed also occurs at higher pH (as shown for cells), albeit with slower kinetics.

      Minor points:

      Line 37: Ruigrok et al. 2000 J Mol Biol first showed that Ebola VP40 requires negatively charged lipids for interaction.

      Thank you for pointing out this reference. We have included it in the text.

      Fig. 1f: Is VP40 detaching as a filament?

      We have not observed that VP40 detaches as a filament or as a linear segment of multiple VP40 dimers. Since the VP40 dimer is inherently flexible (Fig. 3, Fig. S8) and can rotate along the N- and C-terminal intra- and inter-dimer interfaces, we believe disassembly occurs in a non-ordered fashion (not as filaments, see also Figure 2 G-K).

      References 8 and 28 are the same. We have corrected the reference duplication.

      Lipidomics: The authors find only 9% PS in the VLPs. How do these results compare to the composition of other enveloped viruses that have been reported to assemble on negatively charged lipids?

      We compared the lipid composition of the EBOV VLPs to the lipid composition of influenza viruses and HIV, which both bud from the plasma membrane and require negatively charged lipids. When grown in eggs, the envelope of influenza viruses contains 22-25% PS (Ivanova et al 2015, Li et al 2011), and approximately 12% when produced from MDCK cells (Gerl et al 2012). The envelope of HIV virions produced from HeLa or MT4 cells contains 10-15% PS. These numbers suggest that the producing cell line strongly influences the lipid composition of the virus particles. Besides differences in the producing cell line, the lower amount of PS found in EBOV VLPs could have multiple implications. First, apart from PS, PIP2 has also been shown to interact specifically with VP40 at budding sites in the plasma membrane (Jeevan et al 2017, Johnson et al 2018) and thus also contributes to virion assembly (potentially allowing for a lower PS concentration). Second, as recently shown for paramyxoviruses (Norris et al 2022), binding of PS to viral proteins is not based on charge alone but may include specific binding, in which case a high affinity of viral proteins to PS may allow for a lower PS concentration in the target membrane. Overall, the rather low PS content in Ebola VLPs might be important for VP40 interaction and low pH-driven disassembly.

      EBO virus was suggested to assemble at lipid rafts. Is this reflected by the lipid composition?

      Yes, that is correct. A hallmark of lipid rafts is the enrichment of cholesterol and sphingomyelin (~32 mol% cholesterol, ~14 mol% sphingomyelin) in the microdomains (Pike et al 2002). The lipid composition of the EBOV VLPs determined in our study (~39% cholesterol and ~10 mol% sphingomyelin) is consistent with assembly at lipid rafts. Minor differences stem from the different cell lines and lipidomic approaches used to determine the lipid species.

      Reviewer #3 (significance (required)): In summary, the manuscript is of high technical quality, and the observation that VP40 detaches from the viral membrane prior to membrane fusion is novel and interesting to the field of virus fusion. How acidification occurs in the absence of an ion channel remains to be determined. The authors provide little insight into how this might work. The strong part of the manuscript is the EM part, which shows convincing detachment of the VP40 matrix. I cannot comment much on the modelling part, which, however, sounds solid.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Winter et al., entitled "The Ebola virus VP40 matrix undergoes endosomal disassembly essential for membrane fusion", describes the structural aspects of the events that precede Ebola virus (EBOV) membrane fusion in late endosomes and virion uncoating in the cytosol. By combining state-of-the-art cryo-electron tomography (cryo-ET) with biophysical and computational techniques, they have elucidated the pivotal role of the ebolaviral matrix virion protein 40 (VP40) in modulating the fusion process. In particular, they discovered that disassembly of the ordered VP40 lattice is low pH-driven, occurs despite the absence of a viral ion channel within the filovirus envelope, and takes place through the weakening of VP40 interactions with lipids at the interface between the ebolaviral envelope and matrix. Overall, the manuscript is well written and the research work is very well conceived, with solid orthogonal experimental approaches that mutually validate their respective results. It is the opinion of this reviewer that the paper contributes to the elucidation of a key step in the EBOV infection cycle and that it will be of great interest to the readership of Review Commons and to the community of structural biologists. Therefore, I recommend publication of this paper after some minor revision of the text, the figures and the figure legends. These show inconsistencies in terminology and acronyms and could easily be improved by some minor graphical editing.

      Comments:

      • By starting their abstract and introduction sections with the term "Ebola viruses", the authors are (on purpose?) preparing the reader for the implicit statement that their findings could be a paradigm model for the other members of the Ebolavirus genus. This is an exciting picture, especially from the perspective of developing VP40-targeting drugs. Therefore, although conclusions in this sense would probably require further studies, I encourage the authors to supplement their figure 3 (or a related supplementary figure) and the corresponding text with a multiple-sequence alignment. This alignment should show whether, and to what extent, the basic patch at the C-terminus of VP40 is conserved within the Ebolavirus genus, especially residues Lys224, Lys225, Lys274 and Lys275.

      • It is a bit inconvenient for the reader to follow how a story unfolds while jumping back and forth between figures. For this reason, I would recommend moving the sentence at lines 88-91 to the section where figure 5 is discussed.

      • Please, avoid the use of the slang "Ebola" without the apposition "virus", and make the text consistent throughout the manuscript by only using the acronym of each term after it was introduced for the first time.

      Minor revisions:

      Line 1: "matrix protein undergoes"

      Line 19: "the matrix viral protein 40 (VP40)"

      Line 18: considering that a virus "exists" in the form of a virion while temporarily located outside the cell, and as a "molecular entity" consisting of viral proteins and nucleic acids organised in macromolecular complexes during its life cycle inside the infected cell, this reviewer encourages the authors to rephrase as follows: " Ebola viruses (EBOVs) virions are filamentous particles, ..."

      Lines 35-36 and line 40: "that is determined by the matrix made up by the viral protein 40 (VP40), which drives ..." And then, directly use the acronym VP24 at line 40

      Line 40: as VP24 and VP35 interact with NP but do not interact with the ssRNA genome, please rephrase as follows "the nucleoprotein (NP) which encapsidates the ssRNA genome, and the viral proteins VP24 and VP35 which, together with NP, form the nucleocapsid"

      Lines 47-48: "...fusion glycoprotein (GP)...[...] the ebolaviral envelope"

      Line 51: "...remarkably long virion of EBOVs undergoes..."

      Line 63: "... in vitro, and in endo-lysosomal compartments in situ, by cryo-electron..."

      Lines 70-71: " to shed light on EBOVs ... [...] with EBOV (Zaire ebolavirus species, Mayinga strain) in biosafety level 4 (BSL4) containment"

      Line 72: chemically fixed by? (PFA and GA acronyms have been annotated in figure 1, but should be first mentioned in their explicit form in the text)

      Line 73 (cryo-FIB)

      Line 80: EBOV virions

      Figure 1A and line 97: for consistency with the terminology used in the main text, should the term "vitrification" perhaps be preferred over "cryofixation" in the second step? Readers not familiar with the field could be confused by the use of the two synonyms.

      Lines 92-93: "...these data indicate [...] and suggest..."

      Figure 1C and line 100: in the color legend EBOV is annotated as dark teal; however, in the segmentation of the reconstructed tomogram there are three objects, one of which, in dark teal, is evidently a portion of an EBOV virion inside the endosome, while the other two are in different shades of green. What are those? Could the authors please specify their identity in the figure legend with the corresponding color code? The same applies to supplementary figure S2 (see comment below).

      Line 95: "...tomography of EBOV virions..."

      Line 98: "...showing EBOV virions..." (This reviewer refers to the use of the term 'EBOVs' as for different species within the genus rather than for different EBOV particles within a dataset)

      Line 105: "... a purified EBOV before..."

      Line 110 and 113: "...EBOV matrix..." And "EBOV virus-like particles (VLP)"

      Line 140, 141, 145 and 147: "EBOV VLPs" and "EBOV VLP"; idem at lines 188-189, 209 and anywhere else in the manuscript (including figure 4A)

      Line 235: "influenza virus ion channel..."

      Line 249: please, use directly the above-introduced acronym for the detergent

      Figure 5F: in plot's X axis label: thermolysin (T)?

      Line 342: "EBOV have remarkably long..."

      Line 420 "...matrix-specific"

      Line 464: "grids"

      Line 465: "for cryo-FIB milling"

      Line 611: "influenza virus M2 ..." (Please, which influenza virus strain does the gene come from? Alternatively, which is the NCBI Protein and/or UniProt database code?)

      Line 623: please, use the above-designated acronym for the detergent

      Line 716: "...based on cryo-ET..."

      Line 718: "influenza virus"

      Line 734: "cryo-ET data"

      Fig. S8: for consistency with the main text, "thermolysin"

      Fig. S2, C and F: are these EBOV virions (as mentioned in the figure title) or EBOV VLPs (as the legends in the two panels of this figure seem to suggest)? Please, the authors should clarify

      Line 1046: "malleable lipid envelope of the EBOV"; this adjective sounds confusing; the reviewer encourages the authors to rephrase for more clarity.

      Significance

      see above.

    1. The command summary(m.out, interactions = TRUE, addlvariables = TRUE, standardize = TRUE) provides balance information for interactions and high-order terms of all covariates

      Code to diagnose potential interactions between variables that lead to poor balance
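      The highlighted call comes from R's MatchIt package. The following is a minimal Python sketch of the idea, not MatchIt's code. It computes a standardized mean difference (SMD) for each covariate, its square, and each pairwise product. An imbalance hiding in an interaction then shows up even when every main effect looks balanced. The helper names `smd` and `balance_table` are hypothetical. Scaling by the treated-group standard deviation is an assumption; MatchIt chooses its denominator based on the estimand and its options.

```python
from itertools import combinations
from statistics import mean, stdev


def smd(treated, control):
    """Standardized mean difference scaled by the treated-group SD (an assumed, ATT-style choice)."""
    sd = stdev(treated)
    diff = mean(treated) - mean(control)
    if sd == 0:
        # Degenerate case: no spread in the treated group.
        return 0.0 if diff == 0 else float("inf")
    return diff / sd


def balance_table(rows, treat, covariates, interactions=True):
    """Return {term: SMD} for each covariate and, optionally, squares and pairwise products."""
    terms = {c: (lambda r, c=c: r[c]) for c in covariates}
    if interactions:
        for c in covariates:
            terms[f"{c}^2"] = lambda r, c=c: r[c] ** 2
        for a, b in combinations(covariates, 2):
            terms[f"{a}:{b}"] = lambda r, a=a, b=b: r[a] * r[b]
    treated = [r for r in rows if r[treat] == 1]
    control = [r for r in rows if r[treat] == 0]
    return {
        name: smd([f(r) for r in treated], [f(r) for r in control])
        for name, f in terms.items()
    }


# Toy data: x1 and x2 are balanced on their own, but their product is not.
rows = [
    {"treat": 1, "x1": 0, "x2": 0},
    {"treat": 1, "x1": 1, "x2": 1},
    {"treat": 0, "x1": 0, "x2": 1},
    {"treat": 0, "x1": 1, "x2": 0},
]
table = balance_table(rows, "treat", ["x1", "x2"])
for term, value in table.items():
    print(f"{term:>6}: {value:+.3f}")
```

      In the toy data, both covariates and their squares have an SMD of 0. The `x1:x2` term, however, comes out near 0.71, which is the kind of pattern that checking interactions is meant to surface.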


    1. Among other things, I have traditionally used my Journal to think out loud to myself about my work in hand: the progress I’m making, the problems I’m encountering, and so on. Many of my best ideas have arisen by writing to myself like this.

      Richard Carter uses his writing journal practice to "think out loud" to himself. Often, laying out extended arguments helps people to refine and reshape their thinking as they're better able to see potential holes or missing pieces of arguments. It's the same sort of mechanism which is at work in rubber duck debugging of computer code: by explaining a process one is more easily able to see the missing pieces, errors, or problems with the process at hand.

      Carter's separate note taking and writing journal practice being used as a thought space or writing workshop of sorts is very similar to the process seen in my preliminary studies of Henry David Thoreau's work in which he kept commonplace books and separate (writing) journals which show evidence of his trying ideas on for size and working them before committing them to his published works.

    1. Anybody using this approach to manage contacts? How?

      reply to IvanFerrero at https://forum.zettelkasten.de/discussion/1740/anybody-using-this-approach-to-manage-contacts-how#latest

      Many of the digital note taking tools that run off of text allow you to add metadata to your basic text files (as YAML headers, inline with a key:: value pair, or via #tags). Many of them have search functionality or use other programmatic means like query blocks, DataView, DataViewJS, etc. for doing queries on your files to get back lists, tables, charts, etc. of the data you're looking for.

      The DataView repository has some good examples of how this works with something like Obsidian. Fortunately, if you're using simple text files, you can usually put them into one or more platforms to get the data and affordances you want out of them individually.

      As an example, I have a script block in my daily note in Obsidian for birthdays in my notes that fall on today's date:

      ```dataview
      LIST birthday
      FROM "Lists/People"
      WHERE birthday.month = date(2023-01-18).month AND birthday.day = date(2023-01-18).day
      ```

      If I put the text birthday:: 1927-12-08 into a note about Niklas Luhmann, his name and birthday would appear in my daily note on his birthday. One can use similar functionality to create tables of books they read with titles, authors, ratings, dates read, etc. or a variety of other data input which parses through your plaintext files. Services like Obsidian, Logseq, et al. are getting better about allowing these types of programmatic searches for users without backgrounds in programming and various communities usually provide help for pre-made little snippets like the one above that one can cut and paste into their notes to get the outputs that they need. Another Obsidian based example that uses text files for tracking academic journal articles can be found at https://nataliekraneiss.com/your-academic-reading-list-in-obsidian/; I'm sure there are similar versions for other text-based platforms.
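      Because the notes are plain text, the same lookup can also be reproduced outside Obsidian. The following is a minimal Python sketch. It assumes the notes are available as a title-to-text mapping and that fields use the `key:: value` line form shown above; the bracketed inline-field form is not handled. `inline_fields` and `birthdays_on` are hypothetical helpers, not part of Dataview. The sketch compares both month and day, so someone born on the 8th of a different month is not listed.

```python
import re
from datetime import date

# Matches a Dataview-style inline field on its own line, e.g. "birthday:: 1927-12-08".
# Assumption: only the "key:: value" line form is handled, not "[key:: value]".
FIELD_RE = re.compile(r"^[ \t]*([\w-]+)::[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def inline_fields(text):
    """Return the inline fields of one note as a {key: raw value} dict (last one wins)."""
    return dict(FIELD_RE.findall(text))


def birthdays_on(notes, day):
    """Return sorted note titles whose birthday matches day's month AND day-of-month."""
    hits = []
    for title, text in notes.items():
        raw = inline_fields(text).get("birthday")
        if raw is None:
            continue
        try:
            born = date.fromisoformat(raw)
        except ValueError:
            continue  # skip values that are not ISO 8601 dates
        if (born.month, born.day) == (day.month, day.day):
            hits.append(title)
    return sorted(hits)


# Hypothetical notes: title -> plain-text body.
notes = {
    "Niklas Luhmann": "Sociologist and Zettelkasten user.\nbirthday:: 1927-12-08\n",
    "Example Person": "birthday:: 1990-01-08\n",
}
print(birthdays_on(notes, date(2023, 12, 8)))  # ['Niklas Luhmann']
```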

      In pre-digital times, for a manual version of a rolodex like this in paper, one could use different color cards as pseudo-tags (doctors on yellow cards, family members on blue cards, friends on green cards, etc.) or add edge notches or even tabs to represent different types of metadata. See for example the edge-colored cards in Hawkexpress' Pile of Index Cards: https://www.flickr.com/photos/hawkexpress/albums/72157594200490122

    1. I dream of regional communities forming online, based around organically grown web rings, and for idiosyncrasies to form in the aesthetics of our sites based on the communities we learned to code from

      This is insanely cool. A deglobalised global community.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors develop and freely disseminate the THINGS-data collection, a large-scale dataset incorporating MRI, MEG, eye-tracking, and 4.7 million similarity ratings for 1,854 object concepts. Demonstrating the reliability of their data, the authors replicate nearly a dozen previous neuroimaging papers. This "big data" approach significantly advances our ability to link behavioral measures with neuroimaging at scale, with the potential to spark future insights into how the mind represents objects.

      I thought that the article was well-written, with a sound methodological approach, high-quality results, and well-supported conclusions. I am overall enthusiastic about this work, and I think THINGS will provide an important benchmark for future big data approaches in cognitive and computational neuroscience.

      However, I thought it was also important to articulate more directly the potential insights this dataset can offer to the field. Although the authors mentioned that they "provided five examples for potential research directions", it was not clear to me what these new research directions were, given that the results section describes only replications.

      We thank Reviewer 1 for their positive evaluation and the enthusiasm for our work! We have revised the manuscript to articulate more clearly and directly some potential research directions for the dataset. There are two aspects to consider: What sets these datasets apart from traditional small-scale research? And what sets them apart from other large-scale research? We elaborate on these two aspects in response to specific comments below.

      Reviewer #2 (Public Review):

      Hebart et al. present a large-scale multimodal dataset consisting of fMRI, MEG, and behavioral similarity measures towards the study of object representation in the mind and brain. The effort is immense, the methods are rigorous, the data are of reasonable quality, and the demonstrative analyses are extensive and provocative. (One small note regarding one leg of this multimodal dataset is that the fMRI design consisted of a single image presentation for 0.5 s without repetitions for most of the images. This design choice has particular analysis implications; for example, the dataset will have more power when leveraging a priori grouping of images. However, unlike other datasets of this kind, here the number of images and how they were selected does support this analysis mode, e.g. multiple exemplars per object concept, and rich accompanying metadata and behavioral data.)

      The manuscript is well-written, and the THINGS website that lets you explore the datasets is easy to navigate, delivering on the promise of making this an integrated, expanding worldwide initiative. Further, the datasets have clear complementary strengths relative to other recent large-scale datasets, in terms of the ways that the images were sampled (not to mention being multimodal). Thus, I suspect that the THINGS dataset will be heavily used by the cognitive/computational/neuroscience research community going forward.

      We would like to thank the reviewer for their positive evaluation of our work! We agree that the dataset has more power when leveraging a priori grouping of images, which is specifically the design choice we made here. We also agree that we can better highlight the strength of our dataset with respect to existing datasets regarding multiple exemplars per object concept and the semantic breadth of the included object categories.

      Reviewer #3 (Public Review):

      This manuscript presents a highly valuable dataset with multimodal functional human brain imaging data (fMRI and MEG) as well as behavioural annotations of the stimuli used (thousands of images from the THINGS collection, systematically covering multiple types of concrete nameable objects).

      The manuscript presents details about the dataset, quality control measures, and a careful description of preprocessing choices. The tools and approaches that were used follow the state of the art of the field in human functional brain imaging and I praise the authors for being transparent in their methodological approaches by also sharing their code along with the data. The manuscript also presents a few analyses with the data: 1) multi-dimensional embedding of perceived similarity judgments 2) decoding of neural representations of objects both with fMRI and MEG 3) A replication of findings related to visual size and animacy of objects 4) representation similarity analysis between functional brain data and behavioural ratings 5) MEG-fMRI fusion.

      We thank the reviewer for their overall positive assessment of our work!

    1. Include JavaScript code in your Webpage. Instantiate Razorpay Custom Checkout. Submit Payment Details.

      Check if it is possible to use an OL (ordered list) for this. If not, maybe these 3 points can be removed?

    1. I think it’s bad to hard-code a distinction between ‘real’ and ‘not-real’ into a conceptual system.

      My PKMS (Personal Knowledge Management System) has a folder of People, and I put fictional characters, like Jimmy Neutron, in there.

    1. The factory method is somewhat similar to a traditional constructor. However, it has a significant advantage: its usage is indistinguishable from an ordinary method invocation. This allows us to substitute factory objects for classes (or one class for another) without modifying instance creation code. Instance creation is always performed via a late bound procedural interface.

      The class semantics for new in ES6 really bungled this in an awful way.
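      The following minimal Python sketch illustrates the substitution the passage describes; it is not code from the source. Python creates instances with an ordinary call, so client code written as `make_point(x, y)` accepts either a class or a factory function. `Point`, `interned_point` and `build_points` are hypothetical names. For contrast, an ES6 `class` cannot be called without `new` (doing so throws a `TypeError`), which is one reading of the complaint in the annotation.

```python
class Point:
    """An ordinary class; instantiating it is just a call: Point(x, y)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


_interned = {}


def interned_point(x, y):
    """Hypothetical factory with the same call signature as Point.

    It reuses one shared instance per coordinate pair instead of always allocating.
    """
    key = (x, y)
    if key not in _interned:
        _interned[key] = Point(x, y)
    return _interned[key]


def build_points(make_point):
    """Client code: it only calls make_point, so it cannot tell a class from a factory."""
    return [make_point(0, 0), make_point(1, 2), make_point(0, 0)]


from_class = build_points(Point)             # the class itself acts as the factory
from_factory = build_points(interned_point)  # swapped in without touching build_points
print(from_class[0] is from_class[2], from_factory[0] is from_factory[2])  # False True
```

      The point of the design is that `build_points` never names a concrete class. The decision about how instances are made stays late bound, exactly as the quoted passage argues.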


    1. Lev Manovich identified some main differences between old and new media (Manovich, 2001). There are five main characteristics of the new media, according to Manovich: 1) the numerical representation of the object, i.e. its digital code that enables algorithmic manipulation of the digital object - media becomes programmable; 2) modularity of the object - media elements (images, sounds...) are represented as collections of discrete samples. These elements are assembled into larger-scale objects but they continue to maintain their separate identity. These two, more material characteristics, enable 3) automatization of many operations with new media, as well as 4) the possibility that many different versions of the same “media object” exist, i.e. its variability, that have more deep characteristics with far-reaching consequences. 5) Transcoding is the last characteristic that Manovich describes - to transcode is to translate something into another format. Thus new media becomes unrelated to a particular hardware and it also means that the computer layer and its logic and cultural/content layer influence each other creating a new media logic that cultural sectors must take into account. The described characteristics of the media and cultural objects change our understanding of them.

      6 For Pierre Lévy the concept “virtual” has at least three meanings: a technical meaning associated with IT, a contemporary meaning and a philosophical meaning. In its philosophical sense, the virtual is that which exists potentially rather than actually. As it is currently employed, i.e. in its contemporary meaning, the word virtual often signifies unreality - reality implying a tangible presence (as in virtual reality). In its technical meaning, related with ICT, virtual means the possibility of generating information based on existing digital data and users instructions. As Lévy says “within digital networks, information is obviously physically present somewhere, on a given medium, but it is also virtually present at each point of the network where it i

      For example, a photo that appears online can be edited by many people and used for different purposes, such as propaganda. In this way, perceptions can differ across cultures.

    1. us_macro_quarterly.xlsx

      Data loading in R can be so annoying; please make sure that your code to read such a file matches the file's format as it stands.

      I also suggest that you add a comment on what the code for reading the data does; for example: "this allows R to read an xlsx file with blah blah columns, the first one in character format and 3 additional columns in numeric format".

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors use an unclassified quaranjavirus, Wǔhàn mosquito virus 6 (WuMV-6), to demonstrate the possibility of analysing the global transmission dynamics of orthomyxovirids. The focused surface protein analysis strongly indicates a vertebrate host for WuMV-6 in addition to the insect host. The analysis is then expanded to other quaranjaviruses, which differ considerably in their surface glycoproteins, indicating a complex evolution. Finally, the authors scientifically demonstrate that orthomyxovirids are undersampled and hence that this family will have to expand considerably in the future.

      Major comments: none

      We thank the reviewer for a succinct summary of our study and we are very glad the key messages were sufficiently clear.

      Minor comments: The article lacks precision and hence some global edits are in order. Generally:

      1. For clarity to the reader, please introduce the family Orthomyxoviridae, i.e., its current official composition (i.e., 9 genera, 21 species, and 22 viruses), so the reader is not confused by terms such as "quaranjavirus" or "isavirus".

      This is a fair request though we would prefer to err on the side of caution with regards to the precise number of taxonomic ranks given the flux viral taxonomy has experienced and in light of the deluge of new taxa being discovered all the time. We refer to the “traditional” view of orthomyxovirid taxonomy at the genus level, encompassing the genera described up until 2011.

      After that, please clearly indicate which viruses are classified and which ones are not. For instance, the main virus dealt with in this paper is unclassified, and so are Astopletus and Ūsinis viruses.

      We do not think this is reasonable since virtually all RNA viruses discussed in the text are not classified and their status as such has little bearing on any of our findings.

      Please ensure correct spelling, including diacritics, of the viruses and abbreviations throughout: Wǔhàn mosquito virus 6 (WuMV-6); Húběi orthomyxo-like virus 2 [note the deletion of one "virus"]; Wēnlǐng orthomyxo-like virus 2

      Thank you for the comment, we have added the diacritics where we could identify them but may have missed some.

      For orientation of the reader, please refer to family groups of viruses as -virids (e.g., "orthomyxovirids", "human coronavirids", "some rhabdovirids"). This way it is clear to the reader that, for instance, "quaranjaviruses" refers to a genus-level group

      Thank you, we agree that this adds much needed precision in terminology.

      "influenza" is a disease. There are several viruses that can cause influenza; they belong to four different genera. Please scan for "influenza" and replace each either with a virus name (for instance, in the abstract, "...RNA viruses containing influenza A virus" or with a genus name (e.g., "alphainfluenzaviruses")

      Our apologies for that misnomer. The text has been corrected.

      Please ensure the differentiation of taxa (concepts), such as species, and viruses (things). Orthomyxoviridae cannot infect anything, it can also not be sampled etc. Orthomyxovirids, the physical members of Orthomyxoviridae can infect things. Most instances of "Orthomyxoviridae" should be replaced accordingly.

      Thank you for the comment, this has been corrected as suggested.

      In particular:

      1. The title doesn't make much sense. Orthomyxovirids are not taxonomically incomplete - they are things that we simply may not have samples or may have characterized incompletely. Also, the analyses are largely restricted to quaranjaviruses. Hence, I would suggest "...genome evolution, and broad diversity of quaranjaviruses"

      Our apologies for the confusion. The analyses we used to quantify the orthomyxovirid evolutionary diversity likely still waiting to be discovered were carried out on all known (at the time) members of Orthomyxoviridae, and thus the title must still refer to the entire family rather than to quaranjavirids. We felt that the term “taxonomic incompleteness” conveys to the reader exactly what the reviewer refers to, namely that new taxonomic ranks are likely to come as more evolutionary diversity gets uncovered. Alternative and more precise formulations, like referring to evolutionary incompleteness or something similar, would miss the fact that it is taxonomy that discretises the otherwise continuous evolutionary change.

      Abstract: genomes are not employed and do not make money. Please replace "employed" with "used"

      We have to respectfully disagree since the definition of the word “employ” also includes the meaning “to make use of”.

      Re: point 6 above, Introduction: species/families etc. cannot be discovered. They are being established by people for viruses that may be discovered. Please fix here and elsewhere (in most cases, "species" should be replaced with "viruses")

      We agree that taxonomic ranks are designated and not discovered and have changed the text accordingly.

      P3, second paragraph: please place "jingmenviruses" in quotation marks as this is not an official term (yet). Please add "potentially" ("as potentially causing human disease"). Even the authors only speak of an "association" and do not fulfill Koch's postulates

      We have to respectfully disagree here too. “Jingmenviruses” as a term is unambiguous in referring to a group of related segmented flaviviruses even though the group is not officially assigned a taxonomic rank. We have altered the text to add uncertainty to the claim that jingmenviruses cause disease in humans.

      P3, top right column: "e.g., the tick-borne Johnston Atoll quaranja- and thogotoviruses" is ambiguous. Please change to "e.g., the tick-borne quaranja- and thogotoviruses" or list particular viruses and clarify which belong to which genus

      Apologies for the confusion. We fixed this instance.

      P3, right column "smaller number" - change to "lower number"

      We have altered the offending sentence in response to reviewer 2 and this combination of words is no longer present.

      P3, right column "or only the polymerase" - makes no sense to the reader as it has not been introduced; and grammatically needs to be improved as the polymerase is also encoded on a segment. Likewise, PB1 makes no sense to the unacquainted reader - maybe add a few sentences to the intro right after the family introduction on general genome composition and that PB1 is part of the polymerase holoenzyme?

      We have altered the offending sentence in response to reviewer 2 but we take the point. We’ve added detail about the RNA-directed RNA polymerase of orthomyxovirids to the introduction.

      P4: the Ebola virus glycoprotein is called GP1,2 [with 1,2 in subscript] (also Figure 2 legend)

      Respectfully, while the reviewer is technically correct in that the glycoprotein of Ebola virus is referred to as GP_1,2 in proteomics literature (the 1,2 referencing the protein held together by a cysteine bridge post-cleavage), calling it GP is not out of place in evolutionary studies and the term “Ebola virus GP” is unambiguous to the reader.

      P4: please change "West Africa" to "Western Africa" (the designation of the area by the UN)

      Unfortunately, while we agree that the reviewer is correct in that the UN refers to the region as “Western Africa”, references to the “West African Ebola virus epidemic” are ubiquitous in the literature and thus we do not see the reason to change the term here either.

      P6: change "with Rainbow / Steelhead trout orthomyxviruses" to "with mykissviruses (rainbow trout orthomyxovirus and steelhead trout orthomyxovirus)" [note that virus names are not capitalized except for proper noun components; hence also "infectious salmon anemia virus, bottom right column]

      While we recognise that viruses related to infectious salmon anaemia virus discovered in trout have received a separate taxonomic designation we feel very strongly about not mentioning it in our manuscript. Our fear is that “mykissviruses” have been designated too hastily on the basis of a handful of representatives and that relatives discovered in the future may show an indiscernible continuum between “mykissviruses” and isaviruses, invalidating the former as a valid term. We would therefore strongly prefer to keep references to specific viruses rather than a taxonomic designation that may disappear so that a future reader may have an easier time with our study.

      P6, right column: please change "RNA-dependent" to the IUPAC/IUB-correct "RNA-directed"

      Done.

      Figure 2 is too small. I could not figure out B with or without my confocals... Likewise S2, S3 are way too small. In Fig 2 legend, please place "spike" into lower case

      We understand the reviewer’s concern here but Figure 2B was a compromise between vertical space available on a page, the number of taxa in the PB1 tree, and what we thought important to communicate - the variation in segment number across orthomyxoviruses and mapping of PB1 diversity to gp64 diversity. This was done at the expense of individual taxon name visibility whilst fully zoomed out. To remedy this Figure 2B was rendered in 300 dpi resolution such that zooming in will show individual taxon names clearly. We ultimately hope to publish our study in an online-only journal where printing will not present an issue. Likewise for figures S2 and S3. We have changed “Spike” to be lower case in the legend.

      Figure 3: correct spelling of virus names (from top to bottom): rainbow trout orthomyxovirus, infectious salmon anemia virus, influenza C virus, influenza D virus, influenza A virus, influenza B virus, Wēnlǐng orthomyxo-like virus 2, Dhori virus, Thogoto virus, Jos virus, Aransas Bay virus, ... Johnston Atoll virus, Quaranfil virus, Húběi orthomyxo-like virus 2, Hǎinán orthomyxo-like virus 2, Wǔhàn mosquito virus 6. Also apply to S6 and others where applicable.

      The names for viruses in Figure 3 were taken directly from their NCBI records and since we do not show their accessions there is no other way to disambiguate them to the reader. We have, however, added the necessary diacritics where appropriate.

      [PS: based on the somewhat backward, non-UNICODE editorial manager system, I am worried that the diacritics in virus names above are not rendered correctly. If so, please look up the Pinyin spelling of Wuhan, Hainan, Wenling etc. - the easiest way is to search Wikipedia for the terms and then identify the Pinyin spelling, which is typically pointed out]

      CROSS-CONSULTATION COMMENTS

      I think we (all reviewers) are all largely in agreement - this is a very useful study; the manuscript just needs various adjustments. I agree with the requests of the other two reviewers.

      Reviewer #1 (Significance (Required)):

      The strength of the paper is that it provides a road map on how undersampled taxa may be analyzed and which kind of information can be gleaned from these analyses. The paper also demonstrates that the analysis of seemingly "unimportant" viruses can prove important. The limitation of the paper is that there is no true novel revelation here. The sampling sites of WuMV-2 GenBank records already suggest broad distribution, which often goes along with sequence diversity; the continued discovery of orthomyxovirids in metagenomic studies clearly implied undersampling (but it is nice to have this "gut feeling" scientifically fortified now). The paper is useful for evolutionary virologists, virus taxonomists, orthomyxovirid specialists, and invertebrate virologists.

      We respectfully disagree with the reviewer and believe they may have missed an important point raised by our study. We do not claim that a global distribution of WuMV6 is what makes it remarkable, but rather that 1) its sampled diversity is sufficient to calibrate molecular clocks (in our experience this is not always the case for arthropod viruses) and 2) WuMV6 has reached its current global distribution recently.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This is a nice example of bringing together a variety of data from metatranscriptomic studies to answer fundamental evolutionary questions in the field of viral evolution. There is a focus on a single virus family, and although some might see this as a little restrictive, I think the 'deep-dive' presented in this paper leaves space for a relatively detailed and comprehensive analysis. No doubt, other studies will gain inspiration from the approach presented here and expand this work to other viral groups.

      Overall, the paper is very well written, and the figures are of a very high quality. It is a shame that there are only 3 main figures in the paper because the supplementary figures are well presented and informative.

      We thank the reviewer for the kind words.

      The manuscript discusses the importance of host quite a bit, and for that reason it would have been nice to try and incorporate the host of the various viruses into the figures somehow (perhaps as a supplementary figure, since the trees are already quite busy). This might help orientate the reader.

      While we appreciate that host information is of interest, we foresee several issues. For one, we refer to broad host classes (essentially arthropod versus vertebrate) because they are largely determined by membrane fusion protein classes, the actual focus of our study, which exhibit strong phylogenetic signal. Secondly, host information in metagenomic studies can be imprecise, incorrect or unavailable.

      I have some minor comments or suggestions for the authors to consider below. Note, please use line numbers in the future for your submissions.

      A paragraph in the discussion laying out the limitations of this approach would be useful to the reader and would make this excellent paper even more robust.

      Thank you for the suggestion. We presume the reviewer is referring to our interpolation of orthomyxovirid diversity and included a few sentences about the limitations of this approach in the Discussion.

      Pg 3. The sentence starting 'The vast majority of known orthomyxoviruses use one...' should be made into two sentences to make it easier to read. A second sentence for the arthropod description is the obvious edit.

      We appreciate the suggestion and have included it in the manuscript.

      Pg 3. 'The number of segments of orthomyxoviruses with genomes known to be complete varies from 6 to 8'. Rephrase to - 'Orthomyxoviruses genomes are known to have 6-8 segments, but many metagenomically discovered viruses in this group have incomplete genomes...etc...',

      Thank you for the suggestion, it has been included.

      Figure 1 - what do the white triangles mean? Are these the directions of reassortment? This should be explained in the legend...

      We apologise for the omission, this is now explained.

      New Zealand is covered up by the circular tree. It looks like there is a point which is partially obscured.

      The reviewer spotted a mistake on our part here. The figure included the coordinates for Wellington, New Zealand when the detection was actually in Wellington Shire, Australia. This has been fixed.

      PD analysis - I think you assume that viruses are static in this analysis. As we all know, they continue to mutate and eventually new species will evolve. Is it possible to consider the mutation rate in this analysis and the evolution of new variants, eventually leading to new species? It might be complicated, and maybe a matter for future work, but it might be worth discussing this as a limitation at the very least, especially when extrapolating to the future (although you do not extrapolate too far, so maybe this is not an issue here...). You could choose to discuss this in relation to the bird analogy (which was great), and compare the rate of mutation which will lead to the evolution of new species on a totally different time scale.

      We appreciate the point raised by the reviewer and while we wholly agree that the possibility of new viral taxa arising over time is an important caveat, we felt the discussion ends up being rather short. On one hand taxa definitions for different viral groups can be different, and on the other speciation in RNA viruses is difficult to place in absolute time because of a phenomenon called time-dependence of evolutionary rates. Methods accounting for the latter using sophisticated models or external calibration points would seem to imply that speciation timescales exceed those of research.

      Discussion: When discussing the hypothesis that WMV6 diversity is a result of repeat exposure to vertebrate hosts, can you also describe the alternative hypothesis here, and why the evidence leads you to put more weight on the former.

      This is a fair question and we have mentioned an alternative hypothesis in the discussion that’s been brought up by our colleagues before. It’s a hypothesis that alternating between different hosts induces divergent selection pressures on gp64. We contend that since gp64 proteins are thought to use a highly conserved host receptor (NPC1) we think it likely that no major changes are required when switching hosts. We are open to discussing other alternatives if the reviewer has suggestions.

      CROSS-CONSULTATION COMMENTS

      Seems like we are all in agreement and that after some minor adjustments this will be an excellent contribution.

      Reviewer #2 (Significance (Required)):

      Please see my review above. I did not use your formatting suggestions since I only saw it upon completing my review.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary

      This manuscript describes the use of data from metagenomic analyses to make inferences about the evolutionary and geographic history of the Orthomyxoviridae family of viruses and their hosts. Data from Wuhan Mosquito Virus 6 (WMV6) derived from various RNA-seq analyses is used to analyse loss and gain of virus segments over time, the time since the last common ancestor of these segments and the selection pressure acting on different genes. These results are used to hypothesise about which species have vectored this virus in the past and their geographic distribution. The additional phylogenetic diversity provided by characterisation of additional viruses of this species is quantified and projected into the future to demonstrate the value of further work in this area. The study also demonstrates more generally the benefit of additional sequencing and of characterising viruses in metagenomic datasets, even in cases where novel viruses are not identified.

      Major Comments

      The methodology in this manuscript appears to be sound and the results support the conclusions. Appropriate and detailed analyses have been performed and are described in detail. Code is provided to allow the results to be reproduced. The figures are informative and very well presented. I do not think any additional analyses are required.

      We thank the reviewer for the kind words.

      Minor Comments

      The manuscript is a little hard to follow in places. I think a brief introduction of WHV6 in the introduction section would help with this - where has it been isolated previously and what is known about its evolutionary history (if anything), how is it related to other Orthomyxoviruses. This information is included later but it would improve the flow of the paper to include it in the introduction.

      We apologise for the inconvenience and agree with the reviewer. We have improved the flow of the manuscript per reviewer suggestion.

      I think including a little more about the Method in the Results section would also be helpful, to save the reader jumping back and forth in order to understand the results. For example, at the beginning of the results section, briefly detailing how many samples were included, their broad geographic location and what the analysis is intended to show (e.g. "three full length sequences isolated from China, seven from Australia [...], between 1995 and 2019, were used to generate a reassortment network, in order to show.....") would be helpful. Each of the subsections of the Results would benefit from something similar.

      Apologies for the lack of clarity on our part. We have added more methodological information to each section in the results.

      Although it is clear in the Materials and Methods which datasets have been included, it is less apparent why these were selected. For example, in Figure 1A there are five countries listed - are these countries for which a particularly large amount of full length sequences were available or for which any full length sequence is available? Similarly, for Figure 1B, are these all of the countries where a dataset has originated containing any segment of WHV6?

      The confusion is entirely our fault as we have clearly not provided sufficient detail. This has been fixed now by explaining this better in the methods and Figure 1 legend.

      In the Discussion, it is stated that the frequency and fast evolution of WMV6 place it uniquely to enable tracking of mosquito populations, however there is no evidence presented to support this - does WMV6 evolve faster or occur more frequently than other mosquito RNA viruses?

      Our apologies for the jump in logic. We now expand on what we meant by the following sentence in the discussion: “In our experience, metagenomically discovered RNA viruses can be rare or, when encountered often, do not always contain sufficient signal to calibrate molecular clocks (Webster et al. 2015).”

      CROSS-CONSULTATION COMMENTS

      I also agree with the requests of the other two reviewers and that the manuscript will be in great shape once these are included.

      Reviewer #3 (Significance (Required)):

      This manuscript is very interesting, for the specific results presented here but, more importantly, in opening up further avenues for investigation. The study provides a proof of concept for using viruses derived from metagenomic data for specific and detailed evolutionary and ecological analyses of a single species. The scope of the analysis performed on WMV6 is not particularly broad, but it differs from the typical analysis of viruses in metagenomic datasets, which tends to focus on identification and characterisation of novel viruses only. I believe that this work is valuable to others working in the field, reveals additional potential in existing data and could provide inspiration for many future studies. To my knowledge, it is one of the first studies to focus on a single, fairly under-studied virus, and draw ecological conclusions based on only bioinformatic analyses.

      I think the results presented here for WMV6 may be of interest to a specialised audience, but that the manuscript overall is valuable to a broad audience, including ecologists, evolutionary biologists and virologists conducting fundamental science research.

      We appreciate the reviewer’s kind words.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This manuscript describes the use of data from metagenomic analyses to make inferences about the evolutionary and geographic history of the Orthomyxoviridae family of viruses and their hosts. Data from Wuhan Mosquito Virus 6 (WMV6) derived from various RNA-seq analyses is used to analyse loss and gain of virus segments over time, the time since the last common ancestor of these segments and the selection pressure acting on different genes. These results are used to hypothesise about which species have vectored this virus in the past and their geographic distribution. The additional phylogenetic diversity provided by characterisation of additional viruses of this species is quantified and projected into the future to demonstrate the value of further work in this area. The study also demonstrates more generally the benefit of additional sequencing and of characterising viruses in metagenomic datasets, even in cases where novel viruses are not identified.

      Major Comments:

      The methodology in this manuscript appears to be sound and the results support the conclusions. Appropriate and detailed analyses have been performed and are described in detail. Code is provided to allow the results to be reproduced. The figures are informative and very well presented. I do not think any additional analyses are required.

      Minor Comments:

      The manuscript is a little hard to follow in places. I think a brief introduction of WHV6 in the introduction section would help with this - where has it been isolated previously and what is known about its evolutionary history (if anything), how is it related to other Orthomyxoviruses. This information is included later but it would improve the flow of the paper to include it in the introduction. I think including a little more about the Method in the Results section would also be helpful, to save the reader jumping back and forth in order to understand the results. For example, at the beginning of the results section, briefly detailing how many samples were included, their broad geographic location and what the analysis is intended to show (e.g. "three full length sequences isolated from China, seven from Australia [...], between 1995 and 2019, were used to generate a reassortment network, in order to show.....") would be helpful. Each of the subsections of the Results would benefit from something similar.

      Although it is clear in the Materials and Methods which datasets have been included, it is less apparent why these were selected. For example, in Figure 1A there are five countries listed - are these countries for which a particularly large amount of full length sequences were available or for which any full length sequence is available? Similarly, for Figure 1B, are these all of the countries where a dataset has originated containing any segment of WHV6?

      In the Discussion, it is stated that the frequency and fast evolution of WMV6 place it uniquely to enable tracking of mosquito populations, however there is no evidence presented to support this - does WMV6 evolve faster or occur more frequently than other mosquito RNA viruses?

      CROSS-CONSULTATION COMMENTS

      I also agree with the requests of the other two reviewers and that the manuscript will be in great shape once these are included.

      Significance

      This manuscript is very interesting, for the specific results presented here but, more importantly, in opening up further avenues for investigation. The study provides a proof of concept for using viruses derived from metagenomic data for specific and detailed evolutionary and ecological analyses of a single species. The scope of the analysis performed on WMV6 is not particularly broad, but it differs from the typical analysis of viruses in metagenomic datasets, which tends to focus on identification and characterisation of novel viruses only. I believe that this work is valuable to others working in the field, reveals additional potential in existing data and could provide inspiration for many future studies. To my knowledge, it is one of the first studies to focus on a single, fairly under-studied virus, and draw ecological conclusions based on only bioinformatic analyses.

      I think the results presented here for WMV6 may be of interest to a specialised audience, but that the manuscript overall is valuable to a broad audience, including ecologists, evolutionary biologists and virologists conducting fundamental science research.

      My expertise is in computational genomics, focused on RNA virus evolution.

    1. Reviewer #3 (Public Review):

      In this study, the authors investigate the genetic and environmental causes of elevated Mitochondrial Membrane Potential (MMP) in yeast, and also some physiological effects correlated with increased MMP.

      The study begins with a reanalysis of transcriptional data from a yeast mutant lacking the gene MCT1, whose deletion has been shown to cause defects in mitochondrial fatty acid synthesis. The authors note that in raffinose, mct1del cells, unlike WT cells, fail to induce expression of many genes that code for subunits of the Electron Transport Chain (ETC) and ATP synthase. The deletion of MCT1 also causes induction of genes involved in acetyl-CoA production after exposure to raffinose. The authors therefore conduct a screen to identify mutants that suppress the induction of one of these acetyl-CoA genes, Cit2. They then validate the hits from this screen to see which of their suppressor mutants also reduce expression of four other genes induced in an mct1del strain. This yielded 17 genes that abolished induction of all 5 genes tested in an mct1del background during growth on raffinose.

      The authors chose to focus on one of these hits, the gene coding for the phosphatase SIT4 (related to human PP6), which also caused an increase in expression of two respiratory chain genes. The authors then investigated MMP and mitochondrial morphology in strains carrying SIT4 and MCT1 deletions and, surprisingly, saw that sit4del cells had highly elevated MMP and more reticular mitochondria, and were able to fully import the acetolactate synthase protein Ilv2p and form ETC and ATP synthase complexes, even in an mct1del background. SIT4 deletion thus rescued the mct1del phenotypes of low MMP, fragmented mitochondria, low Ilv2p import and an inability to form ETC and ATP synthase complexes. Surprisingly, the authors find that even though MMP is high and ETC subunits are present in the sit4del mct1del double deletion strain, that strain has low oxygen consumption and cannot grow under respiratory conditions, indicating that the elevated MMP cannot come from a fully functional ETC. The authors also observe that deleting key subunits of ETC complex III (QCR2) and IV (COX5) strongly reduced the MMP of the sit4del mutant, which suggests that most of the increase in MMP in the sit4del mutant depends on a partially functional ETC. The authors note that MMP was still increased in the qcr2del sit4del and cox4del sit4del strains relative to the qcr2del and cox4del strains, indicating that part of the increase in MMP did not depend on the ETC.

      The authors dismiss the possibility that the increase in MMP could have come from reversal of ATP synthase, because they observe that inhibition of ATP synthase with oligomycin led to an increase of MMP in sit4del cells, indicating that ATP synthase operates in the forward direction in sit4del cells.

      Noting that genes for phosphate starvation are induced in sit4del cells, the authors investigate the effects of phosphate starvation on MMP. They found that phosphate starvation caused an increase in MMP and increased Ilv2p import even in the absence of a mitochondrial genome. They find that inhibition of the ADP/ATP carrier (AAC) with bongkrekic acid (BKA) abolishes the increase of MMP in response to phosphate starvation. They speculate that phosphate starvation causes an increase in MMP through the import and conversion of ATP to ADP and subsequent pumping of ADP and inorganic phosphate out of the mitochondria.

      They further show that MMP is also increased when the cyclin-dependent kinase PHO85, which plays a role in phosphate signaling, is deleted. They argue that this indicates that it is not a decrease in phosphate that causes the increase in MMP under phosphate starvation, but rather the perception of a decrease in phosphate as signalled through PHO85. Unlike in the case of SIT4 deletion, the increase in MMP caused by deletion of PHO85 is abolished when MCT1 is deleted.

      Finally, they show an increase in MMP in immortalized human cell lines following phosphate starvation and treatment with the phosphate transporter inhibitor phosphonoformic acid (PFA). They also show an increase in MMP in primary hepatocytes and in midgut cells of flies treated with PFA.

      The link between phosphate starvation and elevated MMP is an important and novel finding, and the evidence is clear and compelling. Based on their experiments in various mammalian contexts, this link appears likely to be generalizable, and they propose and begin to test an interesting hypothesis for how MMP might increase in response to phosphate starvation in the absence of the Electron Transport Chain.

      The link between phosphate starvation and deletion of the conserved phosphatase SIT4 is also interesting and important, and while the authors' experiments and analysis suggest some connection between the two observations, that connection is still unclear.

      Major points

      MitoTracker is a great fluorescent dye, but it measures membrane potential only indirectly. There is a danger that when cells change growth rate, ion concentrations or pH, the fluorescence of all MMP-indicating dyes changes: their signal is confounded. A change in phosphate levels could possibly do both, altering pH and ion concentrations. Because all conclusions of the manuscript are based on a change in MMP, it would be a wise precaution to use a dye-independent measure of membrane potential and to confirm at least some key results.

      MMP strongly influences amino acid metabolism, and indeed the SIT4 knockout has a quite striking amino acid profile, with increased concentrations of histidine, lysine, arginine and tyrosine (http://ralser.charite.de/metabogenecards/Chr_04/YDL047W.html). Could this amino acid profile support the conclusions of the authors? At least lysine and arginine are decreased in petites due to a lack of membrane potential and of iron-sulfur cluster export, and here they are increased. Along these lines, according to the same data resource, the knock-outs CSR2, ASF1, SSN8, YLR0358 and MRPL25 share the same metabolic profile. Due to limited time I did not re-analyse the data provided by the authors, but it would be worth checking whether any of these genes came up in the authors' screens.

      One important claim in the manuscript attempts to explain a mechanism for the MMP increase in response to phosphate starvation which is independent of the ETC and ATP synthase.

      It seems to me that the only direct evidence supporting this claim is that inhibition of the AAC with BKA stops the increase of MitoTracker fluorescence in response to phosphate starvation in both WT and rho0 cells (Figs 4B and 4C). It would strengthen the paper if the authors could provide some orthogonal evidence.

      Introduction/Discussion: The authors might want to make readers aware that the 'reversal' of ATP synthase directionality, i.e. ATP hydrolysis by the ATP synthase as a mechanism to create a membrane potential (in petites), has always been a provocative idea, but one that so far has never been fully substantiated. Indeed, some people who are very familiar with the topic are skeptical that this happens. For instance, Vowinckel et al 2021 (PMID: 34799698) measured precise carbon balances for petite cells and found no evidence for a futile cycle: petites grow more slowly, but accumulate the same biomass from glucose as petites that re-evolve a fast growth rate. Perhaps the manuscript could be updated accordingly.

      In the introduction and conclusion there is discussion of MMP set points. In particular the authors state:

      "Critically, we find that cells often prioritize this MMP setpoint over other bioenergetic priorities, even in challenging environments, suggesting an important evolutionary benefit."

      This does not seem to be consistent with the central finding of the manuscript that MMP changes under phosphate starvation. MMP doesn't seem so much to have a 'set point' but rather be an important physiological variable that reacts to stimuli such as phosphate starvation.

      The authors suggest that deletion of Pho85 causes an increase in MMP because of cellular signaling. However, they also state in the conclusion:

      "Unlike phosphate starvation, the pho85D mutant has elevated intracellular phosphate concentrations. This suggests that the phosphate effect on MMP is likely to be elicited by cellular signaling downstream of phosphate sensing rather than some direct effect of environmental depletion of phosphate on mitochondrial energetics."

      The authors should cite the study that shows deletion of PHO85 causes increased intracellular phosphate concentrations. It also seems possible that the 'cellular signaling' that causes the increase in MMP could be a result of this increase in intracellular phosphate concentrations, which could constitute a direct effect of an environmental overload of phosphate on mitochondrial energetics.

      Related to this point, in the conclusion, the authors state:

      "We now show that intracellular signaling can lead to an increased MMP even beyond the wild-type level in the absence of mitochondrial genome."

      In sum, the data show that signaling is important here, but signaling alone is only the message, not the biophysical process that creates a membrane potential. The authors could revise this statement slightly.

      The authors state in the conclusion that

      "We first made the observation that deletion of the SIT4 gene, which encodes the yeast homologue of the mammalian PP6 protein phosphatase, normalized many of the defects caused by loss of mtFAS, including gene expression programs, ETC complex assembly, mitochondrial morphology, and especially MMP (Fig. 1)"

      The data shown, however, indicate that SIT4 deletion does not normalize the mtFAS defect in terms of MMP. Instead, it causes a huge increase in MMP (a departure away from normality), whether or not MCT1 is present (Fig 1D).

      The language "SIT4 is required for both the positive and negative transcriptional regulation elicited by mitochondrial dysfunction" feels strong. SIT4 seems to influence positive transcriptional regulation in response to mitochondrial dysfunction caused by MCT1 deletion, but it may not be the only factor, because CIT2 expression appears to increase in a sit4del background following a further deletion of MCT1. In terms of negative regulation, SIT4 deletion clearly affects the baseline, but MCT1 deletion still causes downregulation of both examples shown in Fig 1B. This shows that negative transcriptional regulation can still occur in the absence of SIT4. The authors might consider showing fold change of expression, as they do in later figures (Figs 4B and C), to help the reader evaluate the quantitative changes they demonstrate.

      The authors induce phosphate starvation by adding increasing amounts of monobasic potassium phosphate at a pH of 4.1 to phosphate dropout media supplemented with potassium. The authors did well to avoid the confounding effects of removing potassium. However, the final pH of YNB is typically around 5.2. Is it possible that the authors are confounding a change in pH with phosphate starvation? One would expect the media in the phosphate starvation condition to have a higher pH than the phosphate replacement or control media. Perhaps the authors could measure the pH of the media they use for the experiment to understand how much of a factor this could be. One needs to be careful with MitoTracker and any other fluorescent dye when pH changes. Although it has its own constraints, MitoLoc, a protein-based rather than small-molecule marker of MMP, might be a good complement.

    1. Contracts under the common law and Uniform Commercial Code: classification, contract terms and elements, performance. Includes enforcement, breach, and remedies; third-person beneficiary contracts; assignment of contracts; the nature and classification of contracts; the formation and legal definition of a contract; understanding the contract elements of capacity, genuine assent and consideration; what establishes a third person's contract; defining discharge and breach of contract; remedies for breach of contract.

      I can't wait to learn more about contracts, breach of contract and the legal definition of a contract. I want to be able to write my own contracts and to explain and execute them correctly.

    1. Recommendation 25: Adopt a legislative provision, within the code governing relations between users and the administration (code des relations entre les usagers et l’administration), requiring that several ways of accessing public services be preserved, so that no administrative procedure is accessible solely online (by dematerialized means).
    1. Get production-accurate data and preview databases to code against, fast.

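      A specialized tool that takes a snapshot of the production database to set up a development database. It supports Postgres. Because it is an early-stage product, it appears they will build the Postgres-related features first and then add support for MySQL and others.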

    1. • Enshrine the right to a non-violent upbringing and the prohibition of corporal punishment and humiliating treatment in the Education Code (code de l’éducation), the Public Health Code (code de la santé publique) and the Social Action and Families Code (code de l’action sociale et des familles). Addressees: Minister of National Education / Minister of Health and Prevention / Minister for Solidarity, Autonomy and Persons with Disabilities.
    1. When running on Windows using Git Bash and Anaconda, the previous code will not work. Multiline strings containing multiple shell commands are not executed correctly. The simplest workaround is to add &&\ to the end of all lines except the last inside the multiline shell command:
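      The quoted passage does not show the "previous code", so here is a minimal sketch that assumes the shell commands are assembled in Python as a list and joined into one multiline string. The hypothetical helper `join_for_git_bash` applies the workaround. Every line except the last ends with `&&\`: `&&` runs the next command only if the previous one succeeded, and the trailing backslash continues the line.

```python
def join_for_git_bash(commands):
    """Join shell commands into one multiline string for Git Bash.

    Hypothetical helper: every line except the last gets a trailing
    '&&' followed by a backslash (run-next-on-success plus line continuation).
    """
    if not commands:
        raise ValueError("need at least one command")
    return " &&\\\n".join(commands)


if __name__ == "__main__":
    print(join_for_git_bash(["mkdir -p results", "cd results", "echo done"]))
```

      Because of `&&`, a failing command also stops the rest of the chain, which is usually what a multi-step build wants.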
    1. stages of the design process when incompatibilities are identified.

      Designing a spacecraft is very similar to designing a piece of code that has a certain purpose: everything related to that code (the functions, libraries, etc.) is specific to the purpose or goal of the code.

    1. The other option is to run x86_64 Docker images on your ARM64 Mac machine, using emulation. Docker is packaged with software that will translate or emulate x86_64 machine code into ARM64 machine code on the fly; it’s slow, but the code will run.

      Another possible solution for M1 users (see snippets below)
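      A minimal sketch of the emulation route, assuming Docker Desktop with x86_64 emulation is available on the ARM64 Mac. The `--platform linux/amd64` flag of `docker run` requests the x86_64 image variant. The helper name `docker_run_amd64` and the image tag `python:3.11-slim` are illustrative choices, not taken from the quoted text.

```python
import shutil
import subprocess


def docker_run_amd64(image, *command, execute=False):
    """Build (and optionally run) a `docker run` invocation that forces the
    linux/amd64 platform, so Docker emulates x86_64 on an ARM64 host."""
    argv = ["docker", "run", "--rm", "--platform", "linux/amd64", image, *command]
    if execute:
        if shutil.which("docker") is None:
            raise RuntimeError("docker CLI not found on PATH")
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; pass execute=True to run it.
    print(" ".join(docker_run_amd64("python:3.11-slim", "uname", "-m")))
```

      Inside an emulated container, `uname -m` should report `x86_64`. As the passage notes, expect it to run noticeably slower than a native ARM64 image.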

    2. If you have a compiler installed in your Docker image and any required native libraries and development headers, you can compile a native package from the source code. Basically, you add a RUN apt-get update && apt-get install -y gcc and iterate until the package compiles successfully.

      Second possible solution for M1 users

    3. In either case, pure Python will Just Work, because it’s interpreted at runtime: there’s no CPU-specific machine code, it’s just text that the Python interpreter knows how to run. The problems start when we start using compiled Python extensions. These are machine code, and therefore you need a version that is specific to your particular CPU instruction set.

      M1 Python issues
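      To check which CPU architecture a given interpreter (and therefore its compiled extensions) targets, the standard library is enough. The sketch below is a small diagnostic. The function name is illustrative, and the example values in the comments are typical rather than guaranteed.

```python
import importlib.machinery
import platform
import sysconfig


def describe_interpreter():
    """Report the CPU architecture and native-extension details of the
    running Python interpreter."""
    return {
        # Typically "arm64" for a native macOS Python on M1, "aarch64" in an
        # ARM64 Linux container, and "x86_64" under x86_64 emulation.
        "machine": platform.machine(),
        # Platform tag, e.g. "macosx-11.0-arm64" or "linux-x86_64".
        "platform": sysconfig.get_platform(),
        # Filename suffixes this interpreter accepts for compiled extensions.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

      If `machine` does not match the architecture of the wheels you installed, the compiled extensions will fail to import even though pure-Python code runs fine.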

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Marta Sanvicente-García and colleagues developed a comprehensive and versatile genome editing web application and a Nextflow pipeline to support the experimental design and analysis of gene editing.

      The manuscript is well written and all data are clearly shown.

      While I did not test it extensively, the software seems to work well and I have no reason to doubt the authors' claims.

      I usually prefer ready-to-use web applications like OutKnocker, which are generally easier for beginners to use (it would be good if the authors could cite it, since it is very well implemented), but the Nextflow implementation is well suited nonetheless.

      We were able to analyze the testing dataset that they provide, but when we tried to run the software on our own dataset we could not obtain results. We also tried to run it on the testing datasets of CRISPRnano and CRISPResso2, again without results. In all cases the error message was: “No reads mapping to the reference sequence were found.”

      A few minor points:

      Regarding the methods to assess whether genome editing is working, I would definitely include High Resolution Melt Analysis, which is by far the fastest and probably the most sensitive of them.

      Following Reviewer 1's suggestion, we have added this technique to the introduction: “Another genotyping method that has been successfully used to evaluate genome editing is high-resolution melting analysis (HRMA) [REFERENCE]. This is a simple and efficient real-time polymerase chain reaction-based technique.”

      Another important point to tackle is that these pipelines often do not define the system they are working with (e.g. diploid, haploid, etc.). This changes the number of reads needed to unambiguously call the detected genotypes and to perform the downstream analysis (the CRISPRnano authors mentioned this point).

      The introduction already states: " it is capable of analyzing edited bulk cell populations as well as individual clones". In addition, following this suggestion, we have added a recommended sample coverage to the help page of the CRISPR-A web application and to the documentation of the Nextflow pipeline to guide users.
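      Here is a rough back-of-envelope illustration of why ploidy matters for read depth. It is not the coverage CRISPR-A recommends. Assume that each of the k alleles of a clone is sampled with equal probability 1/k, with no PCR bias or sequencing error. The probability that some allele is never seen in n reads is then at most k(1 - 1/k)^n (union bound). The hypothetical helper below finds the smallest n that pushes this bound below a tolerance alpha.

```python
def min_reads_to_see_all_alleles(ploidy, alpha=0.01):
    """Smallest read count n such that ploidy * (1 - 1/ploidy) ** n < alpha.

    Union bound on the chance that at least one of `ploidy` equally likely
    alleles is never observed (assumes no PCR bias or sequencing error).
    """
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    n = 1
    while ploidy * (1 - 1 / ploidy) ** n >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"ploidy {k}: {min_reads_to_see_all_alleles(k)} reads")
```

      With alpha = 0.01, this gives 1 read for a haploid clone, 8 for a diploid clone and 21 for a tetraploid clone. These numbers only cover detecting each allele. Estimating allele fractions precisely would need more reads.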

      I am also wondering whether the name CRISPR-A is appropriate since someone could confuse it with CRISPRa.

      CRISPR-A is an abbreviation for CRISPR-Analytics. Although it is pronounced the same way as CRISPRa (as in CRISPRa screening libraries), it is spelled differently and is easily distinguished by context.

      CROSS-CONSULTATION COMMENTS

      Reviewer 2 did excellent work and raised important concerns about the software that need to be addressed carefully.

      In the meantime, we have had more time to test the software and can confirm some of the findings of Reviewer 1:

      1) We spent hours trying (unsuccessfully) to run CRISPR-A on Nextflow. The software does not seem to run properly.

      2) No manual or instructions can be found in either of their repositories (https://bitbucket.org/synbiolab/crispr-a_nextflow/ and https://bitbucket.org/synbiolab/crispr-a_figures/).

      We have added a readme.md file to both repositories, and we hope that with the new documentation the software can be downloaded and run easily. We have also added an example test to the CRISPR-A Nextflow pipeline to make the software easier to test. Currently, the software is implemented in DSL1 instead of DSL2, so it cannot be run with the latest version of Nextflow. We are planning to make the update soon. We want to do it while moving the pipeline to the crisprseq nf-core pipeline, so that it follows better standards and is fully reproducible and reusable.

      A few more points to be considered:

      • “UMI clustering” is not proper terminology; this is barcode multiplexing/demultiplexing (SQK-LSK109 from Oxford Nanopore).

      We have added more details in the Methods section “Library prep and Illumina sequencing with Unique Molecular Identifiers (UMIs)” to clarify the process and the terminology used: “Unique Molecular Identifiers are added through a two-cycle PCR, called UMI tagging, to ensure that each identifier comes from just one molecule. Barcodes to demultiplex by sample are added later, after the UMI tagging, in the early and late PCR.”

      We had already explained the computational pipeline through which these UMIs are clustered together to obtain a consensus of the amplified sequences, in the “CRISPR-A gene editing analysis pipeline” section of the Methods:

      “An adapted version of the extract_umis.py script from the pipeline_umi_amplicon pipeline (distributed by ONT, https://github.com/nanoporetech/pipeline-umi-amplicon) is used to get UMI sequences from the reads when the three-PCR experimental protocol is applied. Then vsearch⁴⁸ is used to cluster UMI sequences. UMIs are polished using minimap2³² and racon⁴⁹, and consensus sequences are obtained using minialign (https://github.com/ocxtal/minialign) and medaka (https://github.com/nanoporetech/medaka).”

      We have also added the following to the “CRISPR-A gene editing analysis pipeline” Methods section to help explain the differences between the barcodes that can be used: “When working with pooled samples, the samples have to be demultiplexed before running the CRISPR-A analysis pipeline, using the appropriate software for the sequencing platform used. The resulting FASTQ files are the main input of the pipeline.”

      Then, the SQK-LSK109 protocol from Oxford Nanopore is followed through the steps specified in the Methods: “The Custom PCR UMI (with SQK-LSK109), version CPU_9107_v109_revA_09Oct2020 (Nanopore Protocol) was followed from UMI tagging step to the late PCR and clean-up step.”

      Finally, we want to highlight that, as stated in the Methods and the Discussion, UMIs are used to group sequences that have been amplified from the same genome, not to identify different samples: “Precision has been enhanced in CRISPR-A through three different approaches. [...] We also removed indels in noisy positions when the consensus of clusterized sequences by UMI are used after filtering by UBS.” This is also shown in the Results (Fig. 5C).
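      The UMI step described above relies on vsearch, which clusters sequences by identity around centroids. The sketch below is a simplified stand-in, an assumption rather than the pipeline's actual code. It greedily groups UMIs by a crude mismatch count, processing the most abundant UMIs first so that they become cluster centroids. In the real pipeline, the reads in each cluster would then be polished into a consensus (minimap2, racon, medaka).

```python
def mismatches(a, b):
    """Position-wise mismatches plus the length difference (a crude
    stand-in for an alignment-based identity score)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def greedy_umi_clusters(umi_counts, max_mismatches=1):
    """Greedily cluster UMIs. The most abundant unassigned UMI founds a
    cluster, and each later UMI joins the first centroid within
    `max_mismatches`. `umi_counts` maps UMI sequence -> read count."""
    clusters = {}  # centroid UMI -> member UMIs (centroid first)
    for umi in sorted(umi_counts, key=lambda u: (-umi_counts[u], u)):
        for centroid in clusters:
            if mismatches(umi, centroid) <= max_mismatches:
                clusters[centroid].append(umi)
                break
        else:
            clusters[umi] = [umi]
    return clusters


if __name__ == "__main__":
    # Illustrative UMI counts, not real data.
    print(greedy_umi_clusters({"AAAA": 10, "AAAT": 2, "CCCC": 5}))
```

      Processing UMIs in order of abundance means that a rare UMI carrying a sequencing error is absorbed into the abundant UMI it most likely derives from, instead of being counted as a separate molecule.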

      • Text in Figure 5 is hard to read.

      We have increased the font size in Figure 5.

      • They should test the software on ground-truth data.

      We have added a manually classified dataset for the final benchmarking, and CRISPR-A has an accuracy higher than 0.9 for all examined samples. As shown in the figure with manually curated data, CRISPR-A performs well on noisy samples using the empirical noise-removal algorithm, without needing to filter by editing windows.

      • The alignment algorithm is not the best one; I think minimap2 would be better for general purposes (at least it works better for ONT).

      As can be seen in figure 2A, minimap is one of the alignment methods that gives better results for the aim of the pipeline. In addition, we have tuned the parameters (Figure 2B) for a better detection of CRISPR-based long deletions, which can be more difficult to report in a single open gap of the alignment.

      • The minimum configuration for installation was not mentioned (for their Docker/Nextflow pipeline).

      Documentation of the configuration requirements for installation has been added to the readme.md of the repository.

      • Fig 2: why do they use PC4/PC1?

      Principal Component Analysis is used to reduce the number of dimensions in a dataset. It helps to understand the effect of the explanatory variables, detect trends or samples that are labeled in incorrect groups, simplify data visualization, etc. Even though PC4 explains less variability than PC2 or PC3, it helps us understand and better decipher the effect of the 4 different analyzed parameters, even if the differences are not big. We have decided to include other PCs in a supplementary figure to show this.

      • There are still typos and unclear statements throughout the whole manuscript.

      One more drawback is that the software seems to only support single FASTQ uploading (or we cannot see the option to add more FASTQ).

      For paired-end instead of single-end reads, this can be selected at the start in the web application by answering the question “How should we analyze your reads? Type of Analysis: Single-end Reads; Paired-end Reads”. For the pipeline, the documentation now explains how to indicate whether the data are paired-end or single-end, through the “input” and “r2file” configuration variables.

      For multiple samples, and therefore multiple FASTQ files, the web application has a button to add more samples. In the pipeline, multiple samples can be analyzed in a single run by putting them all in one folder and pointing the variable “input” to it.

      Since people usually analyze more than one clone at a time (we usually analyze 96 clones together), this would mean that I have to upload each one of them manually.

      All files can be added to the same folder and analyzed in a single run using the Nextflow pipeline. The web application has a limit of ten samples, which can be added by clicking the “Add more” button.

      Also, in our hands the software (the web server; the Docker image does not work) works with Illumina data but not with ONT data.

      This should be clarified in the manuscript.

      If a FASTQ file is uploaded to CRISPR-A, the analysis can be done even though we haven't specifically optimized the tool for long-read sequencing platforms. We have checked the performance of CRISPR-A with the CRISPRnano nanopore testing dataset, and the analysis succeeded. See results here: https://synbio.upf.edu/crispr-a/RUNS/tmp_1118819937/.

      Summary of the results:

      | Sample | CRISPRnano | CRISPR-A |
      | --- | --- | --- |
      | 'rep_3_test_800' | 42.60 % (-1del); 12.72 % (-10del) | 71% (-1del); 16% (-10del) – 36 (logo) |
      | 'rep_3_test_400' | 37.50 % (-1del); 15.63 % (-10del) | 65% (-1del); 28% (-10del) – 38 (logo) |
      | 'rep_1_test_200' | 39.29 % (-1del); 8.33 % (-17del) | 10del; 17del; 1del |
      | 'rep_1_test_400' | 80.11 % (-17del) | del17; del20; del18; del16; del16 |
      | 'rep_0_test_400' | 80.11 % (-17del) | del17; del20; del18; del16; del16 |
      | 'rep_0_test_200' | 71.91 % (-17del) | del17; del18 |

      As these examples show, CRISPR-A generally reports all indels without classifying them as edits or noise. Because nanopore data contain many indels arising from sequencing errors, the CRISPR-A percentages are not accurate. Even so, CRISPR-A reports more diverse outcomes than CRISPRnano, and these are probably edits.

      Therefore, we have added the following text in results:

      “Although single-molecule sequencing data (e.g. PacBio, Nanopore) can be analyzed by CRISPR-A, targeted sequencing-by-synthesis data are required for precise quantification.”

      Reviewer #1 (Significance (Required)):

      As I mentioned above, I think this could be useful software for people who are screening genome-edited cells. Since CRISPR is widely used, I assume that the audience is broad.

      There are many other tools that perform similarly to CRISPR-A, but this software seems to add a few more features and to be more precise. It is hard to verify that everything the authors claim is accurate, since that requires a lot of testing and time and the review period is just two weeks. But 1) I have no reason to doubt the authors and 2) the software works.

      Broad audience (people using CRISPR)

      Genetics, Genome Engineering, software development (we develop a very similar software), genetic compensation, stem cell biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      CRISPR-Analytics, abbreviated as CRISPR-A, is a web application implementing a tool for analyzing editing experiments. The tool can analyze various experiment types: single cleavage experiments, base editing, prime editing, and HDR. The data required for the analysis consist of raw NGS data or simulated data in FASTQ format, plus the protospacer sequence and cut site. The amplicon sequence is also needed when the amplified genome is absent from the genome reference list. The tool's pipeline is implemented in Nextflow, and an interactive web application visualizes the results of the analysis, including embedding the results into an IGV browser.

      The authors developed a gene editing simulation mechanism that enables the user to assess an experiment design and to predict expected outcomes. Simulated data were generated by SimGE based on primary T cells. The parameters and distributions were also fitted for 3 cell lines (HEK293, K562, and HCT116) to make the simulator more general. The process simulates CRISPR-Cas9 activity and the resulting insertions, deletions, and substitutions. The simulation results are then compared to the experimental results. The authors report the Jensen-Shannon (JS) divergence between the two. The exact distributions that served as input to the JS divergence are not well defined in the manuscript (see below).

      To clarify the used distributions in the JS divergence calculation, we have changed the following piece of text in section “Simulations evaluation” of methods:

      “Afterward, we tested the performance on the fifth fold, generating the simulated sequences with the same target and gRNA as the samples that belong to the fifth fold, in order to calculate the distance between these. The final validation, with the mean parameters of the different training iterations, was performed on a testing data set that was not used in the training. Validation was done with samples that had never taken part in the training process. The Jensen-Shannon distance is used to compare the characterization of real samples and simulated samples since, of the distances explored, it is the one that best differentiates replicates among samples. In order to obtain the different distributions, the T cell data, including 1,521 unique cut sites, was split into different datasets based on the different classes: deletions, insertions and substitutions. For each of these classes, giving as input the datasets with only that class, we obtained the distribution for size and then for position of indels. The same was done for the other three cell lines: K562, HEK293 and HCT116, which included 96 unique cut sites, with three replicates each. The whole datasets (with 1,521 and 96 unique cut sites) were split into five folds (four for training and one for testing) and validation, in order to train and validate the simulator. Using the parameters obtained during the training-test iterations (the average value of the five iterations), we generate simulated sequences with the same target and gRNA as the samples that are assigned to the test subset to calculate the Jensen-Shannon (JS) divergence between the simulated and real samples of that subset. Finally, the same was performed for validation. The inputs for the distance calculations were the class distributions of the generated simulated subset and of its real equivalent (same target and gRNA).”
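      To make the distance concrete, take two discrete distributions P and Q, for example the fraction of deletions of each size in a real sample and in its simulated counterpart. Their Jensen-Shannon divergence is JS(P, Q) = 0.5·KL(P‖M) + 0.5·KL(Q‖M), where M = 0.5·(P + Q). The sketch below computes it from raw counts with base-2 logarithms, so the result lies between 0 (identical distributions) and 1 (disjoint support). The Jensen-Shannon distance is its square root. The helper names and example counts are illustrative and do not come from the manuscript.

```python
import math


def normalize(counts):
    """Turn {category: count} into {category: probability}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {k: v / total for k, v in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be positive wherever p is."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def js_divergence(real_counts, sim_counts):
    """Jensen-Shannon divergence (base 2, range [0, 1]) between two count
    distributions, e.g. indel sizes in a real and a simulated sample."""
    p, q = normalize(real_counts), normalize(sim_counts)
    support = set(p) | set(q)
    p = {k: p.get(k, 0.0) for k in support}
    q = {k: q.get(k, 0.0) for k in support}
    m = {k: 0.5 * (p[k] + q[k]) for k in support}  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Illustrative counts of deletion sizes (bp -> reads), not real data.
    real = {1: 120, 2: 40, 10: 15}
    simulated = {1: 100, 2: 55, 10: 5, 17: 5}
    jsd = js_divergence(real, simulated)
    print(f"JS divergence: {jsd:.4f}, JS distance: {math.sqrt(jsd):.4f}")
```

      Unlike KL divergence, JS divergence is symmetric and stays finite when one sample contains an indel class that the other lacks, as with the 17-bp deletion in the example. This makes it convenient for comparing simulated and observed outcome spectra.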

      The authors also report an investigation of different alignment approaches and how they may affect the resulting characterization of editing activity.

      The authors examine three different approaches to increase what they call "edit quantification accuracy" (elsewhere called "precise allele counts determination"; what does this mean?): (1) spike-in controls, (2) UMIs and (3) using a mock to denoise the results. See below for our comments about these approaches.

      Moreover, the authors developed an empirical model to reduce noise in the detection of editing activity. This is done by using a mock (control) and by normalization and alignment of reads with indels, based on the notion and observation that indels far from the cut site tend to be classified as noise.
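      Here is a minimal sketch of the general idea behind mock-based denoising, under stated assumptions. It is not CRISPR-A's empirical model, which is described in the manuscript. Each indel is keyed by type, start and length. Its mock frequency is subtracted from its treated frequency, and indels far from the cut site must clear a stricter threshold. The `Indel` type, `filter_noise` and the `near`, `near_min` and `far_min` values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    kind: str    # "del" or "ins"
    start: int   # 0-based start position on the amplicon
    length: int


def filter_noise(treated, mock, cut_site, near=5, near_min=0.001, far_min=0.05):
    """Return {indel: net frequency} for indels that survive filtering.

    `treated` and `mock` map Indel -> fraction of reads carrying it.
    The mock frequency is subtracted. Indels starting within `near` bp of the
    cut site must exceed `near_min`; indels farther away must exceed `far_min`.
    """
    kept = {}
    for indel, freq in treated.items():
        net = freq - mock.get(indel, 0.0)
        threshold = near_min if abs(indel.start - cut_site) <= near else far_min
        if net > threshold:
            kept[indel] = net
    return kept
```

      With a cut site at position 100, a 1-bp deletion at 100 seen in 40% of treated reads is kept. A 1-bp insertion at equal frequency in treated and mock samples cancels out. A 3% deletion 70 bp away fails the stricter far-from-cut threshold.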

      The authors then compare 6 different tools in the context of determining and quantifying editing activity. One important comparison approach uses manually curated data. However, the description of how this dataset was created is far from sufficiently clear. The comparison is also performed for the HDR experiment type, for which only 2 other tools are available for comparison.

      We have replaced “alleles” with “editing outcomes” in the section title “Three different approaches to increase precise editing outcomes counts determination” to make it clearer.

      There is already a section in the Methods, “Manual curation of 30 edited samples”, explaining how the manual curation was done.

      We see the potential contribution aspects of the paper to be the following:

      1. NextFlow pipeline implementation is an important engineering contribution. Same is true for the interactive web application
      2. The option to simulate an experiment to assess it is a nice feature and can help experiment design
      3. Identification of amplicons when not provided as input
      4. CRISPR-A seeks substitutions along the entire amplicon sequence and is less dependent on the quantification window and on the putative cut site
      5. Analysis of the difference, in edit activity, comparing different cell lines
      6. CRISPR-A supports the use of UMIs
      7. Interesting sequence pattern insights - like "...found certain patterns associated with low diversity outcomes: free thymine or adenine at the 3' nucleotide upstream of the cut site that leads to insertions of the same nucleotide, a free cytosine at the same place that leads to its loss, and strong micro-homology patterns that lead to long deletions " We further comment on the soundness of these contributions in our comments below and on their significance in our comments related to the general potential significance of the paper.

      Major comments:

      • Upon attempting to run an analysis from the web interface (https://synbio.upf.edu/crispr-a) and using: fastq of Tx and mock (control), the human genome and the gRNA sequence provided as input for the protospacer field, our run was not successful. In fact the site crashed with no interpretable error message from CRISPR-A. We have improved the error handling as well as the explanations in the help page, which now includes a video. We hope these improvements will prevent unexpected crashes.

      • Moreover, there should be clearer context. There is no information regarding the type of experiments that can be analyzed with the tool. We figure it is multiplex PCR and NGS but can the tool also be used for GUIDESeq, Capture, CircleSeq etc.? The experiments that can be analyzed are specified in Results: “CRISPR-A analyzes a great variety of experiments with minimal input. Single cleavage experiments, base editing (BE), prime editing (PE), predicted off-target sites or homology directed repair (HDR) can be analyzed without the need of specifying the experimental approach.” We have also specified this in the Nextflow pipeline documentation and in the web application help page.

      • No off target analysis. Only on-target. The accuracy of the tool allows checking whether edits are produced at predicted off-target sites. This is an off-target analysis with some restrictions, since only variants at the predicted off-target sites are assessed. Translocations or other structural off-targets will not be detected by CRISPR-A, since the input data analyzed by this tool are demultiplexed amplicon or targeted sequencing samples.

      • No translocations and long/complex deletions. The type of input data used does not allow this kind of analysis. Other tools, such as CRISPECTOR, are available for it. We have added this to Supplementary Table 1.

      • We view the use of a mock experiment as control as a must for any sound attempt to measure edit activity. This is even more so when off-target events need to be assessed (any rigorous application of GE, certainly any application aiming for clinical or crop engineering purposes). We therefore think that all investigation of other approaches should be put in this context. We agree that negative controls are necessary to assess editing. For that reason, we have included the possibility of using mocks in the quantification, a functionality that few tools include.

      • It's a nice feature to have simulated data, however, it is not a good approach to rely on it. As can be seen in the manuscript, we highlight the support that simulations can provide without intending to replace experimental data with simulated data. Simulated data has been useful in the development and benchmarking of CRISPR-A, but we are aware of the limitations of simulations. Here are some examples from the manuscript explaining how we have used simulated data, or how it can be used:

      “Analytical tools, and simulations are needed to help in the experimental design.”

      “simulations to help in design or benchmarking”

      “We developed CRISPR-A, a gene editing analyzer that can provide simulations to assess experimental design and outcomes prediction.”

      “Gene editing simulations obtained with SimGE were used to develop the edits calling algorithm as well as for benchmarking CRISPR-A with other tools that have similar applications.”

      Although simulated data has been useful for the development and benchmarking of CRISPR-A, we have also used real data and human-validated data.

      • In p7 the authors indicate the implementation of three approaches to improve quantification. They should be clear as to the fact that many other tools and experimental protocols are also using these approaches. For example, ampliCan, CRISPResso2 and CRISPECTOR all take into account a mock experiment run in parallel to the treatment. Although on page 7 (Results) we do not mention the other tools that also use mocks for noise correction, we detail this information in Supplementary Table 1. CRISPResso2 was not included because, although it can run mocks in parallel, it only uses them to compare results qualitatively, i.e. there is no noise reduction in its pipeline. It has now been added to the table.

      • Figure 1: ○ The figure certainly provides what seems to be a positive indication of the simulations approach being close to measured results. Much more details are needed, however, to fully understand the results.

      We have added more details.

      ○ Squema = scheme ??

      We have replaced the word “schema” with “diagram”.

      ○ What was the clustering approach?

      As stated in the caption of Figure 1, the clustering is hierarchical: “hierarchical clustering of real samples and their simulations from validation data set.” We have also added that “The clustering distance used is the JS divergence between the two subsets.”

      ○ What is the input to the JS calculation? What is the dimension of the distributions compared? These details need to be precisely provided.

      Each distribution has two dimensions: sizes and counts, or positions and counts.

      To clarify which distributions are used in the JS divergence calculation, we have changed the following text in the “Simulations evaluation” section of Methods:

      “Afterward, we tested the performance on the fifth fold, generating the simulated sequences with the same target and gRNA as the samples that belong to the fifth fold, in order to calculate the distance between these. The final validation, with the mean parameters of the different training iterations, was performed on a testing data set that was not used in the training. Validation was done with samples that had never been used in the training process. Jensen distance is used to compare the characterization of real samples and simulated samples since, of the distances explored, it best differentiates replicates among samples. In order to obtain the different distributions, the T cell data, including 1,521 unique cut sites, was split into different datasets based on the different classes: deletions, insertions and substitutions. For each of these classes, giving as input the datasets with only that class, we obtained the distribution for size and then for position of indels. The same was done for the other three cell lines: K562, HEK293 and HCT116, which included 96 unique cut sites, with three replicates each. The whole datasets (with 1,521 and 96 unique cut sites) were split into five folds (four for training and one for testing) and validation, in order to train and validate the simulator. Using the parameters obtained during the training-test iterations (the average value of the five iterations), we generate simulated sequences with the same target and gRNA as the samples that are assigned to the test subset to calculate the Jensen-Shannon (JS) divergence between the simulated and real samples of that subset. Finally, the same was performed for validation. The input for the distance calculations were the generated simulated subset and its real equivalent (same target and gRNA) distributions of the classes.”
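      Below is a minimal sketch of the JS divergence between a real and a simulated distribution, such as deletion sizes and their counts. It assumes each distribution is given as a mapping from value to count. The function name `js_divergence`, the use of base-2 logarithms and the example counts are assumptions. With base 2, the divergence ranges from 0 (identical distributions) to 1 (disjoint distributions). SciPy's `scipy.spatial.distance.jensenshannon` returns the JS distance, which is the square root of this quantity.

```python
import math


def js_divergence(p_counts, q_counts, base=2):
    """Jensen-Shannon divergence between two discrete distributions.

    p_counts / q_counts map a value (e.g. indel size or position) to a count;
    both are normalised to probabilities over the union of their values.
    """
    values = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    p = {v: p_counts.get(v, 0) / p_total for v in values}
    q = {v: q_counts.get(v, 0) / q_total for v in values}
    m = {v: 0.5 * (p[v] + q[v]) for v in values}  # mixture distribution

    def kl(a):
        # KL(a || m); terms with a[v] == 0 contribute nothing
        return sum(a[v] * math.log(a[v] / m[v], base) for v in values if a[v] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical deletion-size counts for a real sample and its simulation
real_sizes = {1: 120, 2: 40, 3: 25, 10: 5}
simulated_sizes = {1: 110, 2: 50, 3: 20, 12: 10}
d = js_divergence(real_sizes, simulated_sizes)
```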

      ○ What clustering/aggregation approach did the authors use here (average dist, min dist, dist of centers?)

      Hierarchical clustering.

      ○ 5 pairs were selected out of how many? Call that number K.

      We have 100 samples in the validation set. Following the suggestion of indicating the total number of samples in the testing set, we have added this information to the figure caption.

      ○ What does the order of the samples in 1C mean? Is 98_real closer to 22_sim than to 98_sim? If so then state it. If not - what is the meaning of the order? Furthermore - how often, over K choose 2 pairs does this mis-matching occur for the CRISPR-A simulator??

      Exactly: it is a hierarchical clustering in which samples are ordered by JS divergence. This was already stated in Results: “In addition, on top of comparing the distance between the experimental sample and the simulated, we have included two experimental samples, SRR7737722 and SRR7737698, which are replicates. These two and their simulated samples show a low distance between them and a higher distance with other samples.” It is also stated in the Figure 1 caption: “For instance, SRR7737722 and SRR7737698, which cluster together, are the real sample and its simulated sample for two replicates.” Since these samples are replicates, their simulations come from the same input. A low distance is therefore expected both between the two real samples and between each of them and their simulations. We have stated this in the Discussion.

      • "From the characterized data we obtained the probability distribution of each class" (page 3) - How is this done? how many guides? how many replicates? what is class? where do you elaborate on it? how do you obtain the distributions? More details of the methods need to be provided. This has been added to Methods.
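      One straightforward way to obtain such per-class distributions is to count the values of one feature within each class and then normalise. The sketch below does this, assuming each called event is stored with its class, its size and its position relative to the cut site. The record layout and the function name are hypothetical; the procedure actually used is the one described in the revised Methods.

```python
from collections import Counter, defaultdict


def class_distributions(events, feature="size"):
    """Empirical probability distribution of `feature` for each event class.

    events: iterable of dicts with hypothetical keys 'cls' (e.g. 'deletion',
    'insertion', 'substitution'), 'size' and 'position'.
    Returns {cls: {value: probability}}.
    """
    counts = defaultdict(Counter)
    for event in events:
        counts[event["cls"]][event[feature]] += 1
    distributions = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        distributions[cls] = {value: n / total for value, n in counter.items()}
    return distributions


events = [
    {"cls": "deletion", "size": 1, "position": -1},
    {"cls": "deletion", "size": 1, "position": -2},
    {"cls": "deletion", "size": 2, "position": -1},
    {"cls": "insertion", "size": 1, "position": 0},
]
size_dist = class_distributions(events, "size")
position_dist = class_distributions(events, "position")
```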

      • The 96 samples used for development here - where are they taken from? This should be indicated the first time these samples are mentioned. Namely - bottom of P6. Added: “The 96 samples, from these cell lines, are obtained from a public dataset BioProject PRJNA326019.”

      • CRISPECTOR is not mentioned in the comparison in the section: "CRISPR-A effectively calls indels in simulated and edited samples" (Table S2). Is there a specific reason for having left it out? Like ampliCan, CRISPECTOR is not in Table S2, because this table shows detailed data from Figure 2. CRISPECTOR is compared with CRISPR-A in Figure 5, where the different approaches to enhance precision, such as using a negative control, are explored.

      • In the section "Improved discovery and characterization of template-based alleles or objective modifications" - part of the analysis was made over simulated data and then over real data. The authors state "it is difficult to explain the origin of these differences...". Thus, needs to be investigated in more detail ... :) (P5) Moreover - the performance over real data is, at the end of the day, the more interesting one for comparison purposes. We have added this sample to the human-validated dataset to better understand what was happening in this case. The results and the pertinent discussion have been added to the manuscript: “CRISPResso2 detects 2% more reads classified as WT. These 2% correspond to the percentage classified as indels by CRISPR-A. In total, the percentage difference between the CRISPResso2 and CRISPR-A template-based class is 0.6%, higher in CRISPR-A. The CRISPR-A percentage is closer to the ground truth data than that of CRISPResso2.”

      • We found no explanation of "spike-in"/"spike experimental data" across the entire article. There is some general language about lengths but the scheme is still totally unclear. We have now indicated in the Methods section where we refer to the spike-in controls.

      • Description of the 96 gRNAs? Is this data from REF26? If so - where do you state this? If so - how do the methods described herein avoid the unique characteristics of the data of REF26? We have added the reference: “The 96 samples, from these cell lines, are obtained from a public dataset BioProject PRJNA326019.” In addition, we use other sources of data: simulations and, now, human-validated data.

      • "distance between the percentage of microhomology mediated end-joining deletions of samples with the same target was calculated and the mean of all these distances was used to reduce the information of the 96 different targets to a single one." (P6) What is the exact calculation used? which distance? How was clustering performed? What is the connection to gene expression? The distance used was the Euclidean distance, and the clustering was performed using hierarchical clustering. We have added this information to the manuscript. Regarding the connection to gene expression, we are exploring the correlation between two phenotypes. The first is the gene expression of the proteins differentially related to the NHEJ and MMEJ pathways. The second is the gene editing landscape (indel patterns related to MMEJ and those more prone to be generated by NHEJ). We have tried to improve this explanation in the manuscript.
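      The calculation described in the quoted passage can be sketched as follows. The sketch assumes each sample is represented by its list of MMEJ deletion percentages over the same ordered set of targets. For a single target, the Euclidean distance between two percentages reduces to their absolute difference. Averaging over targets then gives one distance per pair of samples. The function name and the sample labels are hypothetical. The resulting matrix could be passed to a hierarchical clustering routine, for example SciPy's `scipy.cluster.hierarchy.linkage` applied to its condensed form.

```python
from itertools import combinations


def mean_target_distance(mmej_by_sample):
    """Mean over targets of the per-target distance between two samples.

    mmej_by_sample: {sample: [MMEJ deletion % for target 1..T]}, with targets
    in the same order for every sample. With one value per target, the
    Euclidean distance is the absolute difference.
    Returns {(sample_a, sample_b): mean distance}, symmetric.
    """
    samples = list(mmej_by_sample)
    dist = {(s, s): 0.0 for s in samples}
    for a, b in combinations(samples, 2):
        diffs = [abs(x - y) for x, y in zip(mmej_by_sample[a], mmej_by_sample[b])]
        dist[(a, b)] = dist[(b, a)] = sum(diffs) / len(diffs)
    return dist


# Hypothetical MMEJ percentages for two targets
mmej = {
    "HCT116_rep1": [30.0, 40.0],
    "HCT116_rep2": [32.0, 38.0],
    "K562_rep1": [10.0, 20.0],
}
d = mean_target_distance(mmej)
```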

      • "we have fitted a linear model to transform the indels count depending on its difference in relation to the reference amplicon" (P7) - needs more explanation. Is this part of the pipeline? We have better explained in Methods how we fitted the linear model: “A linear regression model was fitted to obtain the parameters of Equation 1 using spike-in controls experimental data (original count, observed count and size of the change in the synthetic molecules). We have used the lm function from R. Parameter m in Equation 1 is equivalent to the obtained coefficient estimate of x which was 0.156 and n is the intercept (n=10).”

      The model is optionally used as part of the pipeline, as explained at the end of the section “CRISPR-A gene editing analysis pipeline”, to correct amplification biases due to differences in amplicon size. What is part of the pipeline, then, is the use of this model to transform the observed counts into the predicted original counts. This is done with Equation 1 and can be found in the pipeline (VC_parser-cigar.R).
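      The fit quoted above uses R's `lm`. For a single predictor, `lm` fits an ordinary least-squares line. The sketch below reproduces that kind of fit in Python, assuming Equation 1 has the linear form y = m·x + n, with slope m and intercept n as in the response, and with x presumably the size of the change. The exact definition of y, and how corrected counts are derived from it, are given by Equation 1 in the manuscript and are not reproduced here. The function name and the example data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + n (the model R's lm(y ~ x) fits).

    Returns (m, n): slope and intercept.
    """
    n_obs = len(xs)
    if n_obs != len(ys) or n_obs < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n_obs
    mean_y = sum(ys) / n_obs
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    n = mean_y - m * mean_x
    return m, n


# Hypothetical check: points lying exactly on y = 0.156*x + 10
sizes = [-20, -10, 0, 10, 20]
responses = [0.156 * x + 10 for x in sizes]
m, n = fit_line(sizes, responses)  # recovers m ~ 0.156, n ~ 10
```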

      • What is the "...manually curated data set"? (page 8) This is explained in “Manual curation of 30 edited samples” in Methods.

      • Section "CRISPR-A empiric model removes more noise than other approaches" - with what data were the comparisons performed? Moreover, how were the comparison criteria selected (efficiency and sensitivity)? The literature has already used several approaches to compare data analysis tools for editing experiments. See for example ampliCan, CRISPResso (1 and 2) and CRISPECTOR. Maybe the authors should follow similar lines. The data used in this comparison come from reference 26: “26. van Overbeek, M. et al. DNA Repair Profiling Reveals Nonrandom Outcomes at Cas9-Mediated Breaks. Mol. Cell 63, 633–646 (2016).” We have added this to the manuscript.

      The values of efficiency and sensitivity were not used directly for the comparison. We first wanted to evaluate our own algorithm, so we obtained the values of efficiency and sensitivity for the previously mentioned dataset. These values were chosen to show, first, how much noise the algorithm is able to detect and, second, how much of it can be removed by the Tx vs M process. This established a framework in which we can then directly compare the editing percentages reported by the different tools.

      Regarding the approaches used to compare data analysis tools for editing experiments, we explain below why we have not followed similar lines, or how we have now included them:

      In the case of ampliCan, the comparison uses a synthetic dataset with introduced errors:

      "synthetic benchmarking previously used to assess these tools (Lindsay et al. 2016), in which experiments were contaminated with simulated off-target reads that resemble the real on-target reads but have a mismatch rate of 30% per base pair".

      In CRISPResso2, the efficiency was benchmarked against an in-house dataset, but this dataset is not published. Finally, for the benchmarking of CRISPECTOR, a manually curated dataset is used as a standard: "Assessment of such classification requires the use of a gold standard dataset of validated editing rates. In this analysis, we define the validated percent indels as the value determined through a detailed human investigation of the individual raw alignment results". Following a similar approach, we have added a human-validated dataset to complement the analysis we had already done.

      In the end, we consider that simulated or synthetic datasets, such as those used by ampliCan or CRISPResso2, do not capture the complete landscape of confounding events that can be detrimental to the analysis results. Similar limitations apply to a gold standard dataset of validated editing rates. The number of reads or samples that humans can validate is small, because validation is time consuming. In addition, humans can also make errors and have biases. Even so, we found it very valuable to add a human-validated dataset to complete our exploration.

      • In the section "CRISPR-A empiric model removes more noise than other approaches" the authors state, incorrectly, that CRISPECTOR only reports the percentage of editing activity per site (there is much more information reported in the HTML report, including the type of edit event detected - deletion, of various lengths, insertions, substitutions etc). (P8) We thank the reviewer for the observation; the statement is indeed incorrect. What we meant is that with CRISPECTOR we cannot trace each called indel individually, as no spreadsheet or other file with this content is given in the output. Therefore, we cannot investigate which events have been corrected. To make our statement precise, we changed this sentence to the following:

      “Although CRISPECTOR provides extensive statistics and information about the indels, it is not possible to track the reads along its pipeline, so we cannot know which have been corrected and which have not.”

      • Section "CRISPR-A noise subtraction pipeline" describes a pretty naive method for noise subtraction (P12). Should be rigorously compared, for Tx vs Mock experiments, to CRISPECTOR and to CRISPResso2. In the section "CRISPR-A empiric model removes more noise than other approaches", we perform an exhaustive comparison with a dataset that contains 288 mock files vs 864 Tx files. This can be better appreciated in the newly included Supplementary Figure 13A. CRISPResso2 was intentionally left out because its pipeline does not use a model to reduce noise. It relies on other approaches instead, such as reducing the quantification window.

      • "recalculated using a size bias correction model based on spike-in controls empiric data.." (P14). Where is the formula? The formula is Equation 1, which is now correctly referenced.

      • Section "Noise subtraction comparison with ampliCan and CRISPECTOR" - fake mock was generated for comparison. We consider the avoidance of a Mock control in experiments designed to measure editing activity to not be best practice. It is OK to support this approach in CRISPR-A. However - the comparison to tools that predominantly work using a Mock control (including ampliCan and CRISPECTOR) should be done with actual Mock. Not with fake Mock .... (P15) We understand the reviewer's concern on this point, as the use of a “fake” mock may not be best practice for general comparisons. Nevertheless, what we wanted to compare here is the difference in editing percentages with and without a mock. CRISPECTOR requires a mock to run an on-target analysis. The only way to replicate the “no mock” condition was therefore to use a synthetic file with the same depth as the treated files but with no editing or noise events, to avoid any correction outside this framework. The other run was made with the 288 real mocks. This was an ad hoc solution for CRISPECTOR. With ampliCan we used only real mocks, since it allows on-target runs without a mock.

      We replaced the word “fake” with “synthetic” in the “Noise subtraction comparison with ampliCan and CRISPECTOR” section:

      “As for CRISPECTOR, since it requires a mock file to perform on-target analysis, synthetic mock files were generated”.
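      A synthetic mock of the kind described here can be sketched as a FASTQ file in which every read is an exact copy of the reference amplicon. Such a file contains no editing or noise events, and its read count matches the depth of the treated sample. The function names, the read identifiers and the constant quality character (`I`, Phred 40 in Phred+33 encoding) are assumptions. They do not describe the files actually used in the comparison.

```python
def synthetic_mock_records(reference, depth, qual_char="I"):
    """Yield `depth` FASTQ records whose sequence is the unmodified reference."""
    quality = qual_char * len(reference)  # constant base quality for every position
    for i in range(depth):
        yield f"@synthetic_mock_{i}\n{reference}\n+\n{quality}\n"


def write_synthetic_mock(path, reference, depth):
    """Write a synthetic mock FASTQ with the same depth as a treated sample."""
    with open(path, "w") as handle:
        handle.writelines(synthetic_mock_records(reference, depth))


# Hypothetical usage: match the read count of the treated sample
# write_synthetic_mock("synthetic_mock.fastq", amplicon_sequence, depth=treated_read_count)
```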

      Minor comments:

      • "Also, most of these tools lack important functionalities like reference identification, clustering, or noise subtraction" - bold part incorrect for CRISPECTOR. Although the statement is not aimed only at CRISPECTOR, Supplementary Table 1 already details the functionalities of each tool. We have also added more context to that statement to highlight the differences between tools:

      “Although not all of them lack the same functionalities, as can be seen in Supplementary Table 1, CRISPR-A is the only tool that can identify the amplicon reference in a reference genome, correct errors through UMI clustering and sequence consensus, correct quantification errors due to differences in amplicon size, and include interactive plots and a genome browser representation of the alignment.”

      • "Same parameters and probability distributions were fitted for three other cell lines: Hek293, K562, and HCT116 (ref. 26), to make SimGE more generalizable and increase its applicability" (page 3) - how was it fitted? It was fitted in the same way as the T cell samples, as specified in Methods. We have added more detail to Methods explaining how SimGE is built.

      • What is the "nature of modification"? (P5) We have replaced “nature” with “type” for clarity.

      • In the section "CRISPR-A effectively calls indels in simulated and edited samples" (P5) towards the end, the authors write that the CRISPR-A algorithm did not give good results for a few examples. They then state that this was corrected and then yielded good results. There is no explanation of what correction was done, if it was implemented in the code and how to avoid/detect it in further cases. The problem was that the reference sequence used was too short. The CRISPR-A code was not modified. We simply used the whole amplicon reference sequence obtained with the amplicon reference identification functionality of CRISPR-A. We have tried to explain this better in the manuscript: “Once the reference sequence used is the one corresponding to the whole reference amplicon, obtained with CRISPR-A amplicon sequence discovery function, CRISPR-A shows a perfect edition profile”

      • Cell culture, transfection, and electroporation - explanation only for HEK293, what about the others? (P15) We had already explained it for HEK293 and C2C12, which are the experiments done by us. For the analysis of the three cell lines and 96 targets, we cite the source of the data, as these data were not produced in our lab.

      • Typos and unclear wording: ○ "obtention" (P8) → changed to “obtaining”

      ○ "mico" >> micro (P 7,10) → changed

      ○ "Squema" >> scheme (Fig.1) → changed

      ○ "decombuled" (P10) → changed to “separated”

      ○ "empiric" >> empirical (P8 and other places) → changed

      ○ "Delins" (P14) → this is not a typo; it indicates that a deletion and an insertion have taken place together (http://varnomen.hgvs.org/recommendations/DNA/variant/delins/)

      ○ "performancer" (P9) → changed to “performance”

      ○ Change word across all article - "edition" to "editing" → changed. “Edition windows” has been changed to “quantification windows”.

      ○ "...has enough precision to find" (P6) not related to "results" section → We have moved it to the Discussion.
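      In the HGVS recommendation linked in the “Delins” item, a deletion-insertion is written as the affected position or range, followed by `delins` and the inserted sequence. The small helper below builds such descriptions for genomic (`g.`) coordinates. Its name and the example positions are hypothetical and serve only as illustration.

```python
def hgvs_delins(start, end, inserted, prefix="g."):
    """Format an HGVS-style deletion-insertion, e.g. g.6775_6777delinsC.

    A single affected position is written without a range (g.6775delinsGA).
    """
    span = f"{start}" if start == end else f"{start}_{end}"
    return f"{prefix}{span}delins{inserted}"


three_bp_delins = hgvs_delins(6775, 6777, "C")   # range form
single_bp_delins = hgvs_delins(6775, 6775, "GA")  # single-position form
```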

      • Comments on figures: ○ Fig. 2C:

      ■ No CRISPECTOR in the analysis

      It is not included because for on-target analysis this tool requires a mock control sample. For this reason, it is compared in Figure 5D, where samples using negative controls are compared, and in Figure 5E where all tools and their different analysis options are compared.

      ■ It is simulated data only

      Yes, it is. Comparison with real data is done in Figure 2D and 2E. We have now also added ground-truth data to our comparisons, obtained from human validation of the classification of more than 3,000 different reads.

      ■ It is not violin plot as mentioned in the description

      It is a violin plot, but in general the data points show little dispersion, which makes the density curves flat.

      ○ Fig 3A - Is it significant? Yes, it is. We have added this information in the caption of the figure.

      ○ Fig. 4:

      ■ A

      • Each row/column is a vector of 96 guides? No, as it is said in the caption of the figure, it is the “mean between the distances calculated for each of the 96 different targets.”

      • How is the replicate number decided? Is it a different experiment by date? What is separating between experiments? Rep numbers? All this information can be found in the paper from which this dataset comes, which is already referenced.

      ■ B - Differential expression:

      We realized that the caption was incorrect: the explanation for Fig. 4B was missing, and all the following explanations had shifted to the previous letter.

      • How? did you measure RNA? It is already stated in methods that RNAseq data was obtained from SRA database and the analysis was done using nf-core/rnaseq pipeline: “RNAseq differential expression analysis of samples from BioProject PRJNA208620 and PRJNA304717 was performed using nf-core/rnaseq pipeline⁵².”

      • Is the observed data in the figure sufficiently strong in terms of P-value? Yes, as highlighted in the plot with ** and ***. We have also added the p-values to the figure caption.

      • Where is the third cell-line? As mentioned in the text, we chose only the cell lines that show the largest difference in the percentage of MMEJ: “HCT116 than in K562, which are the cell lines with the major and minor ratios of MMEJ compared with NHEJ, respectively”.

      ○ Fig.13 - There is no A and B as mentioned in the text

      We thank the reviewer for the observation as we mistakenly uploaded the wrong figure. We corrected it.

      Reviewer #2 (Significance (Required)):

      We repeat the aspects of contribution, as listed in the first part of the review, and comment about significance:

      • NextFlow pipeline implementation is an important engineering contribution. Same is true for the interactive web application

        Significant engineering contribution. Nonetheless, we were not able to run the analysis. So - needs to be checked.

      We hope that, now that the documentation has been properly added to the repository, it will be easier to run analyses.

      • The option to simulate an experiment to assess it is a nice feature and can help experiment design

        An important methodology contribution

      • Identification of amplicons when not provided as input

        Not important in the context of multiplex PCR and NGS measurement assays, as amplicons will be known. Not clear what other contexts the authors were aiming at.

      It saves time, since there is no need to look up the sequence of each amplicon and add it as input. It can also help to detect unspecific amplification, since all amplicons of the same genome can be retrieved through the amplicon discovery process. In addition, we have already found one example where this avoids incorrect results: “Once the reference sequence used is the one corresponding to the whole reference amplicon, obtained with CRISPR-A amplicon sequence discovery function, CRISPR-A shows a perfect edition profile”. We have added this to the Discussion of the manuscript.
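      The response does not detail how CRISPR-A discovers amplicons. Purely to illustrate the idea of retrieving an amplicon from a reference genome, the sketch below locates a forward primer and the reverse complement of a reverse primer on the plus strand, then returns the sequence between them. Exact string matching, a single strand and the function names are simplifying assumptions. A real implementation would use an aligner and search both strands.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_amplicon(genome, fwd_primer, rev_primer):
    """Return the first plus-strand amplicon bounded by the two primers, or None.

    The reverse primer is given 5'->3' on the minus strand, so its reverse
    complement is searched for downstream of the forward primer.
    """
    start = genome.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = genome.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return genome[start:end + len(rev_site)]


# Hypothetical toy genome: flank + forward primer + insert + reverse-primer site + flank
genome = "TTTT" + "ACGTAC" + "GGGCCC" + "GATCGA" + "TTTT"
amplicon = find_amplicon(genome, "ACGTAC", "TCGATC")
```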

      • CRISPR-A seeks substitutions along the entire amplicon sequence and is less dependent on the quantification window and on the putative cutsite

        Importance/significance needs to be demonstrated

      Figure 3 shows the results of template-based and substitution detection. CRISPR-A is a versatile and agnostic tool for gene editing analysis. It is therefore ready to analyze editing by future tools, since the cut site and other elements of the experimental design are not required. In addition, we have shown that, when a mock is used, its performance is comparable to filtering by quantification windows, while avoiding the loss of edits when the cut site is shifted.
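      The sketch below illustrates the difference between scanning the whole amplicon and restricting calls to a quantification window. It reports the positions where an ungapped read differs from the reference and can optionally keep only the positions inside a window. It assumes equal-length, gap-free alignments. The function name and the example sequences are hypothetical.

```python
def substitutions(read, reference, window=None):
    """0-based positions where an ungapped read differs from the reference.

    If window=(lo, hi) is given, only positions lo <= i < hi are reported,
    mimicking a quantification window; otherwise the whole amplicon is scanned.
    """
    if len(read) != len(reference):
        raise ValueError("this sketch assumes an ungapped, equal-length alignment")
    hits = [i for i, (ref_base, read_base) in enumerate(zip(reference, read))
            if ref_base != read_base]
    if window is not None:
        lo, hi = window
        hits = [i for i in hits if lo <= i < hi]
    return hits


reference = "ACGTACGTAC"
read = "ACGTTCGTAA"  # hypothetical read with substitutions at positions 4 and 9
whole_amplicon = substitutions(read, reference)
windowed = substitutions(read, reference, window=(2, 6))
```

      With the window (2, 6), the substitution at position 9 is missed, while scanning the whole amplicon keeps it.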

      • Analysis of the difference, in edit activity, comparing different cell lines

        Significant contribution. However - the methods need to be much better explained and the results better described in order for this to be useful to the community.

      We have made an effort to describe the results more clearly.

      • CRISPR-A supports the use of UMIs

        Mildly significant technical contribution. However - only addresses on-target. Also addressing off-target would have been significant.

      UMIs have never been used before in this context. Without them, sequencing biases are not taken into account and editing percentages are reported as observed. Distinguishing the different molecules present at the start of amplification allows higher precision, avoiding under- or overestimation of each species in a bulk of cells.
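      Below is a minimal sketch of UMI-based counting. Reads are grouped by UMI, each group is collapsed into a per-position majority consensus, and outcomes are then counted per molecule rather than per read. The sketch assumes exact UMI matching and equal-length reads. CRISPR-A's actual UMI clustering and consensus steps are described in the manuscript and are not reproduced here. The function name and the example reads are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads sharing a UMI into one majority-consensus sequence.

    reads: iterable of (umi, sequence) pairs; reads are assumed equal length.
    Returns {umi: consensus_sequence}.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus = {}
    for umi, seqs in groups.items():
        # most frequent base at each position across the reads of this molecule
        consensus[umi] = "".join(Counter(column).most_common(1)[0][0]
                                 for column in zip(*seqs))
    return consensus


reads = [
    ("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"),  # one molecule, one read error
    ("CCC", "ACGA"),                                    # a second molecule
]
molecules = umi_consensus(reads)
molecule_counts = Counter(molecules.values())  # outcomes counted per molecule
```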

      Off-target analysis can certainly be done by sequencing the predicted off-target sites. There are also other methods, such as GUIDE-seq, that can be used to discover off-targets, but this kind of data is outside the scope of CRISPR-A. Even so, we are aware of the importance of analysing off-targets in the context of a broad analysis platform, and we will take this into consideration when participating in building the nf-core crisprseq pipeline.

      • Interesting sequence pattern insights - like "...found certain patterns associated with low diversity outcomes: free thymine or adenine at the 3' nucleotide upstream of the cut site that leads to insertions of the same nucleotide, a free cytosine at the same place that leads to its loss, and strong micro-homology patterns that lead to long deletions "

        As stated - interesting.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      CRISPR-Analytics, abbreviated as CRISPR-A, is a web application implementing a tool for analyzing editing experiments. The tool can analyze various experiment types - single cleavage experiments, base editing, prime editing, and HDR. The required data for the analysis consists of NGS raw data or simulated data, in fastq, protospacer sequence and cut site. Amplicon sequence is also needed in cases where the amplified genome is absent from the genome reference list. The tool pipeline is implemented in NextFlow and has an interactive web application for visualizing the results of the analysis, including embedding the results into an IGV browser. The authors developed a gene editing simulation mechanism that enables the user to assess an experiment design and to predict expected outcomes. Simulated data was generated by SimGE over primary T-cells. The parameters and distributions were also fitted for 3 cell lines to make it more generalized (Hek293, K562, and HCT116). The process simulated CRISPR-CAS9 activity and the resulting insertions, deletions, and substitutions. The simulation results are then compared to the experimental results. The authors report the Jensen-Shannon (JS) divergence between the results. The exact distributions that served as input to the JS are not well defined in the manuscript (see below).

      The authors also report an investigation of different alignment approaches and how they may affect the resulting characterization of editing activity. The authors examine three different approaches to increase what they call "edit quantification accuracy" (aka, in a different place - "precise allele counts determination" - what is this???): (1) spike-in controls (2) UMI's and (3) using mock to denoise the results. See below for our comments about these approaches. Moreover, the authors developed an empirical model to reduce noise in the detection of editing activity. This is done by using mock (control), and by normalization and alignment of reads with indels, with the notion and observation that indels that are far from the cut site tend to classify as noise. The authors then perform a comparison between 6 different tools, in the context of determining and quantifying editing activities. One important comparison approach uses manually curated data. However - the description of how this dataset was created is far from being sufficiently clear. The comparison is also performed for HDR experiment type, which can be compared only to 2 other tools. We see the potential contribution aspects of the paper to be the following:

      1. The NextFlow pipeline implementation is an important engineering contribution. The same is true for the interactive web application.
      2. The option to simulate an experiment to assess it is a nice feature and can help experiment design
      3. Identification of amplicons when not provided as input
      4. CRISPR-A seeks substitutions along the entire amplicon sequence and is less dependent on the quantification window and on the putative cut site
      5. Analysis of differences in editing activity across cell lines
      6. CRISPR-A supports the use of UMIs
      7. Interesting sequence pattern insights, like "...found certain patterns associated with low diversity outcomes: free thymine or adenine at the 3' nucleotide upstream of the cut site that leads to insertions of the same nucleotide, a free cytosine at the same place that leads to its loss, and strong micro-homology patterns that lead to long deletions"

      We further comment on the soundness of these contributions in our comments below, and on their significance in our comments on the general potential significance of the paper.

      Major comments:

      • When we attempted to run an analysis from the web interface (https://synbio.upf.edu/crispr-a) using FASTQ files of Tx and mock (control), the human genome, and the gRNA sequence as input for the protospacer field, our run failed. In fact, the site crashed with no interpretable error message from CRISPR-A.
      • Moreover, the context should be made clearer. There is no information regarding the type of experiments that can be analyzed with the tool. We assume it is multiplex PCR and NGS, but can the tool also be used for GUIDE-seq, Capture, CIRCLE-seq, etc.?
      • There is no off-target analysis, only on-target.
      • Translocations and long/complex deletions are not addressed.
      • We view the use of a mock experiment as a control as a must for any sound attempt to measure editing activity. This is even more so when off-target events need to be assessed (any rigorous application of GE, certainly any application aimed at clinical or crop engineering purposes). We therefore think that all investigation of other approaches should be put in this context.
      • Simulated data are a nice feature; however, relying on them is not a good approach.
      • On p7 the authors describe the implementation of three approaches to improve quantification. They should make clear that many other tools and experimental protocols also use these approaches. For example, ampliCan, CRISPResso2, and CRISPECTOR all take into account a mock experiment run in parallel to the treatment.
      • Figure 1:
        • The figure certainly provides what seems to be a positive indication that the simulations are close to the measured results. Much more detail is needed, however, to fully understand the results.
        • "Squema" should be "scheme".
        • What was the clustering approach?
        • What is the input to the JS calculation? What is the dimension of the distributions compared? These details need to be precisely provided.
        • What clustering/aggregation approach did the authors use here (average distance, minimum distance, distance between centers)?
        • 5 pairs were selected out of how many? Call that number K.
        • What does the order of the samples in 1C mean? Is 98_real closer to 22_sim than to 98_sim? If so, state it. If not, what is the meaning of the order? Furthermore, how often, over the K choose 2 pairs, does this mismatching occur for the CRISPR-A simulator?
      • "From the characterized data we obtained the probability distribution of each class" (page 3): How is this done? How many guides? How many replicates? What is a class? Where is this elaborated? How are the distributions obtained? More details of the methods need to be provided.
      • The 96 samples used for development here: where are they taken from? This should be indicated the first time these samples are mentioned, namely at the bottom of P6.
      • CRISPECTOR is not mentioned in the comparison in the section: "CRISPR-A effectively calls indels in simulated and edited samples" (Table S2). Is there a specific reason for having left it out?
      • In the section "Improved discovery and characterization of template-based alleles or objective modifications", part of the analysis was performed on simulated data and then on real data. The authors state "it is difficult to explain the origin of these differences...". This needs to be investigated in more detail (P5). Moreover, the performance on real data is, at the end of the day, the more interesting one for comparison purposes.
      • We found no explanation of "spike-in"/"spike experimental data" anywhere in the article. There is some general language about lengths, but the scheme is still totally unclear.
      • What are the 96 gRNAs? Is this data from REF26? If so, where do you state this? And if so, how do the methods described herein avoid depending on the unique characteristics of the REF26 data?
      • "distance between the percentage of microhomology mediated end-joining deletions of samples with the same target was calculated and the mean of all these distances was used to reduce the information of the 96 different targets to a single one." (P6) What is the exact calculation used? Which distance? How was clustering performed? What is the connection to gene expression?
      • "we have fitted a linear model to transform the indels count depending on its difference in relation to the reference amplicon" (P7) - needs more explanation. Is this part of the pipeline?
      • What is the "...manually curated data set"? (page 8)
      • Section "CRISPR-A empiric model removes more noise than other approaches": with what data were the comparisons performed? Moreover, how were the comparison criteria (efficiency and sensitivity) selected? The literature has already used several approaches to compare data analysis tools for editing experiments; see, for example, ampliCan, CRISPResso (1 and 2), and CRISPECTOR. The authors may want to follow similar lines.
      • In the section "CRISPR-A empiric model removes more noise than other approaches" the authors state, incorrectly, that CRISPECTOR only reports the percentage of editing activity per site (there is much more information reported in the HTML report, including the type of edit event detected - deletion, of various lengths, insertions, substitutions etc). (P8)
      • Section "CRISPR-A noise subtraction pipeline" describes a pretty naive method for noise subtraction (P12). Should be rigorously compared, for Tx vs Mock experiments, to CRISPECTOR and to CRISPResso2.
      • "recalculated using a size bias correction model based on spike-in controls empiric data..." (P14). Where is the formula?
      • Section "Noise subtraction comparison with ampliCan and CRISPECTOR": a fake mock was generated for comparison. We do not consider omitting a mock control in experiments designed to measure editing activity to be best practice. It is OK to support this approach in CRISPR-A. However, the comparison to tools that predominantly work with a mock control (including ampliCan and CRISPECTOR) should be done with an actual mock, not with a fake mock (P15).

      Minor comments:

      • "Also, most of these tools lack important functionalities like reference identification, clustering, or noise subtraction": the bold part is incorrect for CRISPECTOR, although the statement is not aimed only at CRISPECTOR.
      • "Same parameters and probability distributions were fitted for three other cell lines: Hek293, K562, and HCT116 [26], to make SimGE more generalizable and increase its applicability" (page 3): how were they fitted?
      • What is the "nature of modification"? (P5)
      • In the section "CRISPR-A effectively calls indels in simulated and edited samples" (P5) towards the end, the authors write that the CRISPR-A algorithm did not give good results for a few examples. They then state that this was corrected and then yielded good results. There is no explanation of what correction was done, if it was implemented in the code and how to avoid/detect it in further cases.
      • Cell culture, transfection, and electroporation (P15): the protocol is explained only for HEK293; what about the other cell lines?
      • Typos and unclear wording:
        • "obtention" (P8)
        • "mico" >> micro (P 7,10)
        • "Squema" >> scheme (Fig.1)
        • "decombuled" (P10)
        • "empiric" >> empirical (P8 and other places)
        • "Delins" (P14)
        • "performancer" (P9)
        • Replace "edition" with "editing" throughout the article
        • "...has enough precision to find" (P6) not related to "results" section
      • Comments on figures:
        • Fig. 2C:
          • No CRISPECTOR in the analysis
          • It is simulated data only
          • It is not a violin plot, as stated in the description
        • Fig. 3A: Is it significant?
        • Fig. 4:
          • A:
            • Is each row/column a vector of 96 guides?
            • How is the replicate number decided? Is it a different experiment by date? What separates the experiments? Rep numbers?
          • B (differential expression):
            • How was it measured? Did you measure RNA?
            • Is the observed data in the figure sufficiently strong in terms of P-value?
            • Where is the third cell line?
        • Fig. 13: There are no panels A and B as mentioned in the text

      Significance

      We repeat the aspects of contribution, as listed in the first part of the review, and comment about significance:

      • The NextFlow pipeline implementation is an important engineering contribution. The same is true for the interactive web application.
        • A significant engineering contribution. Nonetheless, we were not able to run the analysis, so this needs to be checked.
      • The option to simulate an experiment to assess it is a nice feature and can help experiment design
        • An important methodology contribution
      • Identification of amplicons when not provided as input
        • Not important in the context of multiplex PCR and NGS measurement assays, as amplicons will be known. Not clear what other contexts the authors were aiming at.
      • CRISPR-A seeks substitutions along the entire amplicon sequence and is less dependent on the quantification window and on the putative cut site
        • Importance/significance needs to be demonstrated
      • Analysis of differences in editing activity across cell lines
        • Significant contribution. However - the methods need to be much better explained and the results better described in order for this to be useful to the community.
      • CRISPR-A supports the use of UMIs
        • Mildly significant technical contribution. However - only addresses on-target. Also addressing off-target would have been significant.
      • Interesting sequence pattern insights, like "...found certain patterns associated with low diversity outcomes: free thymine or adenine at the 3' nucleotide upstream of the cut site that leads to insertions of the same nucleotide, a free cytosine at the same place that leads to its loss, and strong micro-homology patterns that lead to long deletions"
        • As stated - interesting.
    1. Giterature: preliminary definitions We will add to the already long list of neologisms of digital literature the term giterature, a fusion of the Git computer protocol and literary writing understood in a wider sense, including the editorial apparatus. The following attempt to define giterature is largely inspired by the achievements of the Abrüpt publishing house. The Abrüpt example indeed introduces an alternative form of platforming of the literary fact, one that does not correspond to the third generation as defined by Flores (2019). Hence it seems essential to us to present giterature as a literary phenomenon combining both the editorial and scriptural anchoring in a platform (the software forges, in the case of giterature) and the emancipating mastery of computer writing in the service of creativity and poetic innovation [not sure of the term]. [git, which publishers and writers have been seizing on for some time]. Where the study of the third generation could focus more on strategies of use (??? on Wattpad, Anne Savelli on Instagram?) and/or repurposing [Anne Archet?] of the platforms, giterature requires us to return to the confrontation between two conceptions of writing: literary writing and computer code writing, reunited in the same writing space through the Git protocol and the GitLab platform. Although such a subject could undoubtedly fall within the scope of critical code studies, we will once again favor the intermedial perspective, which in no way prevents us from confronting it with the poetics of code.

      For me, a break needs to be marked here. The case of giterature is very different from the others. I would say something like:

      The cases of twitterature and literatube clearly correspond to the characteristics of the third generation described by Flores: we go looking for the audience where it is, and we use platforms that, for this reason, become technically "transparent". The case of giterature is somewhat different. While it is true that we use an existing platform, it is less true that we look for an already existing audience, because the audience of git (as a protocol) and of online platforms like GitLab is not, or not mainly, a literary audience. The technique thus creates a tension between two possible audiences: geeks and literary people. Geeks who can become literary, but also literary people who can become geeks. The technical question (mastery of the protocol, mastery of writing code, with its rules, its habits, its habitus) thus becomes the central point: the technique is less transparent and is far from being a simple "means of expression". It is the meaning of the literary gesture itself. A literary gesture that explicitly becomes a technical gesture, and a particular type of technical gesture: it is not a question of writing on beautiful paper with a well-cut quill pen, like Flaubert, or in Word on a brand-new Mac, as the Hollywood representation of the writer would have it, but of writing commands in a black terminal, which corresponds more to the image of the hacker than to that of the writer.

    1. Show Most Used Methods Use the code given below to show most used methods on Checkout:

      Show Methods With Offers?

      The heading does not match the sample code and screenshot; this needs to be changed.

    1. Reviewer #1 (Public Review):

      The claim that the hippocampal code is a predictive code because it generates responses matching the most needed variable for each task (time or distance) is not fully supported by the analyses provided in the manuscript. For example, in the elapsed-time task there are also place cells, and in the fixed-distance travel task there are also cells that encode other features. Rather than a predictive code, this could be a regular sampling of the environment with an overrepresentation of the more salient variable that animals need in order to collect rewards. In addition, the analyses provided in the manuscript are rather simple, and better controls could be provided. Improving the analytical quantification of the results is necessary to support the main claim.

      - What is the relationship of each type of cell with the speed of the animal?
      - What is the relationship with the number of trials the animal has run (first 10 trials, last 10 trials, ...)?
      - What is the average firing rate of each neuron? Is there any relationship between intrinsic firing rate and the type of coding that the cell develops in each task?
      - What is the relation of the units of each type with LFP features (theta phase, ripple recruitment)?

    1. Within cooking this might look like:

      Comparing the process of cooking to coding, with the meticulous details that are essential to both, makes coding easier to understand. As this is my first time coding, I am now understanding how crucial each step is in achieving successful code.

  7. betasite.razorpay.com
    1. The fission yeast genome contains a single gene corresponding to each of the Argonaute (ago1), Dicer (dcr1), and RNA-dependent RNA polymerase (rdp1) factors required for RNAi in other systems

      Genes that code for RNAi factors in S. pombe


    1. I mean, people suggested that you could replace legal contracts with smart contracts, which are programs built on the blockchain, and that's usually accompanied by the phrase "code is law". This is a smart contract and this is a legal contract; these two things aren't the same, right? You can't have law be enacted by computer code, because law inherently requires third parties to assess evidence, intentions, and a bunch of other variables that you just can't outsource.

      Fundamental difference between legal contracts and "smart contracts"

      Legal contracts are subject to judgement of evidence and intention. "Code as law" can't do that.

    1. To perform this analysis, the optimization will be run with and without need for DHW. The optimization will run with the objective of minimizing TOTEX.

      Problem with verb tenses. Here is some useful information for you:

      • Generally, the present perfect is used to refer to literature (have studied), the past tense is used to describe anything that you did or are proposing (studied), and the present tense is used to refer to another part of the paper (are shown in Figure 2) or to discuss conclusions.
      • Methods, when they are things you have done, are in the simple past (I studied cats). When they are more like something that goes on inside a code that you developed or inside the overall proposed methodology, then present tense can also be used. Example: Before optimization, 153 gates were used at the airport. (clearly past) By following the proposed methodology, the number of gates needed was reduced to 122. (Result, also past) "In the first step of the proposed methodology, passengers are divided ... Here, the gates were divided by ..." (present tense for methodology discussion, past for what was done to the data).


    1. Actually, using the Hypothesis BOOKMARKLET is much more convenient than 'paste a link' or typing "via.hypothes.is/" in front of every link you want to annotate. With the bookmarklet, all you need to do when you find a page you want to annotate is search, in the mobile browser's address bar, for the name you saved the bookmarklet under, and tap it. It will immediately load Hypothesis on the page, just as clicking the Hypothesis extension does on a PC. To save the bookmarklet link (which can be found at https://web.hypothes.is/start) in the mobile browser, copy the link address of the bookmarklet (which is JavaScript code), then edit an existing (unused) bookmark in the mobile browser and replace its URL with the bookmarklet link. Also give it a title (like "bookmarklet hypothesis"), which you will type in the address bar of the mobile browser to find the bookmarklet bookmark.

      Manual to use hypothes.is in mobile Firefox

      via.hypothes.is does not work as they stopped providing an open proxy. It makes all URL forwarders and standalone apps on Android close to useless.

      The piece of advice provided here works, but it is highly unintuitive.

      The mechanics are as follows: 1. open a page where you want to add an annotation 2. click on the bookmark as if you were opening a new page 3. since the bookmark is actually just a piece of JavaScript, it simply loads the hypothes.is client 4. profit.

      To make it work in Firefox mobile, the instruction is this: 1. create a new arbitrary bookmark on some page. It will appear in the list of your bookmarks. 2. copy the bookmarklet javascript code. I was not able to do it directly in the FF mobile, so I copied it on my desktop and sent it to the phone via an IM 3. edit the newly created bookmark and a) give it a name, e.g., "hypothesize"; and b) replace the URL with the piece of copied javascript code 4. now when you want to add an annotation, follow the process above.

    1. While Clang has historically been faster than GCC at compiling, the output quality has lagged behind. As of 2014, performance of Clang-compiled programs lagged behind performance of the GCC-compiled program, sometimes by large factors (up to 5.5x),[28] replicating earlier reports of slower performance.[26] Both compilers have evolved to increase their performance since then, with the gap narrowing: Comparisons in November 2016 between GCC 4.8.2 versus clang 3.4, on a large harness of test files shows that GCC outperforms clang by approximately 17% on well-optimized source code. Test results are code-specific, and unoptimized C source code can reverse such differences. The two compilers thus seem broadly comparable.[30][unreliable source] Comparisons in 2019 on Intel Ice Lake has shown that programs generated by Clang 10 has achieved 96% of the performance of GCC 10 over 41 different benchmarks (while winning 22 and losing 19 out of them).[29]

      Clang compiles faster than GCC because it uses LLVM as its underlying compiler infrastructure. LLVM has a highly optimised code generation approach that makes it very efficient in terms of both space and time. In addition, Clang benefits from several optimisation passes (such as global optimisations and link-time optimisation) that improve the generated code further.

    1. Investigations into the cause of this behavior are ongoing.

      Hypotheses: - a small bug in my code? - a limitation due to the fact that the anergy network only uses the 18-19°C circuit, hence a limitation of the available power. Check whether such a constraint is defined in AMPL?

    1. all source code for the Designated Smart Contracts

      source code for Designated Smart Contracts should be explained line by line to make them accessible and inclusive to non-coders.

    1. This keeps the infrastructure-as-code syntax and deployment mechanism universal. In this post, we’ll have a look at how we deploy our infrastructure using Terraform at Slack

      desc

    1. Python uses an interpreter, so when you run a Python program, the interpreter translates the Python code into binary while it’s running it.

      What are some reasons that the creator(s) of Python chose to use an interpreter over a standard compiler? Would Python lose its popularity if it used a compiler?
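      Strictly speaking, CPython (the reference implementation) does not translate source code into machine binary while running it: it first compiles the source into bytecode, which its virtual machine then interprets. The standard `dis` module makes this visible; the function `add` below is just an illustration.

```python
import dis


def add(a, b):
    return a + b


# Running the `def` statement already compiled the body; the result is
# stored on the function's code object as a bytes string of bytecode
bytecode = add.__code__.co_code
print(type(bytecode).__name__, len(bytecode), "bytes of bytecode")

# Disassemble the bytecode into human-readable instructions
# (the exact opcode names differ between CPython versions)
for instr in dis.get_instructions(add):
    print(instr.opname, instr.argrepr)
```

      The printed opcodes change across versions (for example, CPython 3.11 replaced `BINARY_ADD` with the more general `BINARY_OP`), a sign that bytecode is an implementation detail of CPython rather than part of the language itself.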

    1. we need to reunite model language and programming languages this was the great vision of Simula of beta and Delta L o Delta was not designed to

      We need to reunite model language and programming languages. This was the great vision of Simula[...] We need to stop believing that we can document programs by some well-written code or "clean code". Clean code is great for small programs. Systems need more than comments and a few diagrams—systems need the voice of the designer in them with multimedia, but they also need more expressive paradigms for putting these in our programs.

    1. Anonymisation, or de-identification, is the process of removing all information that may lead to an individual being identified. Pseudonymisation, on the other hand, means processing personal information in such a way that the data can no longer be linked to a specific data subject without the use of additional information. This often includes substituting identifiers (e.g., social security number) with other values in such a way that they can be matched back to the identifiers by means of a translated code key.

      We recommend pseudonymisation for interview data, e.g. "Informant A who works at school X".
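      A minimal Python sketch of the difference, assuming records are plain dicts and that sequential labels such as "Informant 1" are acceptable pseudonyms (the function name, field names, and identifiers below are hypothetical). Pseudonymisation replaces the identifier with a label and keeps a separate code key that can match it back; deleting the code key is necessary but not sufficient for anonymisation, since other fields (such as the school) may still identify someone.

```python
from itertools import count


def pseudonymise(records, id_field, prefix="Informant"):
    """Return (pseudonymised records, code key); the input records are not modified."""
    code_key = {}   # pseudonym -> original identifier; store separately from the data
    assigned = {}   # original identifier -> pseudonym, so repeats get the same label
    numbers = count(1)
    result = []
    for rec in records:
        original = rec[id_field]
        if original not in assigned:
            pseudonym = f"{prefix} {next(numbers)}"
            assigned[original] = pseudonym
            code_key[pseudonym] = original
        new_rec = dict(rec)  # shallow copy so the caller's data stays intact
        new_rec[id_field] = assigned[original]
        result.append(new_rec)
    return result, code_key


# Hypothetical interview records keyed by a made-up personal identity number
interviews = [
    {"person": "19800101-1234", "school": "X", "quote": "..."},
    {"person": "19900202-5678", "school": "Y", "quote": "..."},
    {"person": "19800101-1234", "school": "X", "quote": "..."},
]
data, key = pseudonymise(interviews, "person")
print([r["person"] for r in data])  # ['Informant 1', 'Informant 2', 'Informant 1']
print(key["Informant 1"])           # re-identification requires the code key
```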

    1. The bulk of the intellectual work of getting the machine to do what one wants will be about coming up with the right examples, the right training data, and the right ways to evaluate the training process

      This speculative process kind of confuses me - surely these existing languages expose APIs very carefully crafted by humans, and as technologies and human needs evolve, we'll want to deliberately and consciously design and evolve the user interfaces to computing that we're capable of. No human can check that machine code will execute how they'll expect in all cases - and if we are to define some complete and inflexible specification for how a program should behave, this is no different from programming itself - albeit with a different paradigm.

    1. I stayed up each night until the problem was released (11pm my time), but I didn’t try to code up the solution right away. Instead, I read the problem description before bed and then thought about how to solve it while falling asleep. I usually woke up every morning with a full sketch of the solution in my head, or something close to it.

      Sleep tactic for solving programming challenges

  8. Dec 2022
    1. The bar shows both my layouts ("Keyboard: us, latam") without coloring the current one, and now everything else on my bar is uncolored as well. According to the previous guide, I should add "output_format = i3bar" to my i3status config file but if I do that, my i3bar doesn't display the outputs, but the code of my config file.

      que

    1. A few examples include ivy.Container.cont_multi_map which is used for mapping a function to all leaves of multiple containers with the same nested structure, ivy.Container.cont_diff which displays the difference in nested structure between multiple containers, and ivy.Container.cont_common_key_chains which returns the nested structure that is common to all containers. There are many more examples, check out the abstract ContainerBase class to see some more!

      It might be useful to explain the reasoning behind keeping some methods static. It would also help to provide examples of the functions and to link to the documentation rather than to the source code.
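      To illustrate what mapping one function over the leaves of several containers with the same nested structure means, here is a pure-Python sketch that uses nested dicts as stand-ins for ivy Containers. This is an assumption-laden illustration, not ivy's implementation: `multi_map` is a hypothetical name, and the real `ivy.Container.cont_multi_map` has its own signature, which should be taken from the ivy documentation.

```python
from typing import Any, Callable, Sequence


def multi_map(fn: Callable[..., Any], containers: Sequence[Any]) -> Any:
    """Apply fn to corresponding leaves of nested dicts that share one structure."""
    first = containers[0]
    if isinstance(first, dict):
        for other in containers[1:]:
            if not isinstance(other, dict) or other.keys() != first.keys():
                raise ValueError("containers must share the same nested structure")
        # Recurse key by key, passing the matching sub-trees from every container
        return {k: multi_map(fn, [c[k] for c in containers]) for k in first}
    if any(isinstance(c, dict) for c in containers):
        raise ValueError("containers must share the same nested structure")
    # Leaf level: call fn with one value from each container
    return fn(*containers)


a = {"w": 1.0, "layer": {"b": 2.0}}
b = {"w": 0.5, "layer": {"b": -1.0}}
print(multi_map(lambda x, y: x + y, [a, b]))  # {'w': 1.5, 'layer': {'b': 1.0}}
```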

    1. Request Parameters

      The format is incorrect here.

      It should have a heading, a description of the API, the endpoint, code, Request Parameters, and Response Parameters.

      Please make the change accordingly.

    1. At the end of the day, Copilot is supposed to be a tool to help developers write code faster, while ChatGPT is a general purpose chatbot, yet it still can streamline the development process, but GitHub Copilot wins hands down when the task is coding focused!

      GitHub Copilot is better at generating code than ChatGPT

    1. Now, if a name is going to be easily changeable forever, please do make it descriptive. I’d much rather maintain code where the variables look like numCols and numRows than i and j. (Just, for the love of God, if you change the meaning, also change the name). But if a name is going to serve as, in any sense, an identifier, something that will point at a big complicated thing from many places far away, make it an opaque identifier. You get similar advice in database schema design — if your user’s email address can change, don’t use their email address as a foreign key in your database. Use a number or a random string instead. Something immutable.
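      The database advice can be sketched in a few lines of Python, using in-memory dicts in place of tables (all names here are hypothetical): the email is a mutable attribute of the user, while every reference uses a random `uuid4` identifier, so changing the email touches exactly one record.

```python
import uuid

users: dict[str, dict[str, str]] = {}   # opaque user_id -> mutable profile
orders: list[dict[str, str]] = []       # orders reference user_id, never the email


def create_user(email: str) -> str:
    user_id = uuid.uuid4().hex  # random, immutable identifier
    users[user_id] = {"email": email}
    return user_id


def place_order(user_id: str, item: str) -> None:
    orders.append({"user_id": user_id, "item": item})


def change_email(user_id: str, new_email: str) -> None:
    # One update in one place; every stored reference to user_id stays valid
    users[user_id]["email"] = new_email


def emails_for_orders() -> list[str]:
    return [users[o["user_id"]]["email"] for o in orders]


uid = create_user("old@example.com")
place_order(uid, "book")
change_email(uid, "new@example.com")
print(emails_for_orders())  # ['new@example.com']
```

      Had the orders stored the email itself, the rename would have required finding and rewriting every order, which is exactly the problem the passage warns about.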
    1. Alan Turing secures the victory in WW2

      A bit exaggerated. Better: a team led by Alan Turing breaks the Enigma code using … and thereby contributes to the Allied victory in the Second World War.

    1. Sprite(2) points to Sprite

      In the screenshots with code examples, what is called "Sprite" here is called "Object": needlessly confusing.

    2. This code puts Sprite(2) in charge of Sprite.

      It is not clear which code is meant here. Perhaps refer back to lesson 2 for an example?

    1. Code enclosed in a warp block is executed all at once.

      The preceding explanation is missing here: without such a warp block, Snap! is busy with all sorts of other tasks, such as actively waiting for keyboard input and updating the images on the screen.

    1. The exercise "Een stapje verder" ("One step further") seems to me like a big leap forward; in practice, students will have to do a lot of cutting and pasting to get their code working. At the very least, you should first explain how to delete a single line, or detach it from a block. Furthermore, I think that even fast students will need a lot of time to figure out how to complete the exercise.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank all the reviewers for their positive and constructive comments on the manuscript. Changes to the main text have been marked in red text in the uploaded file. We address each of the reviewers’ comments point-by-point below. The major revisions include:

      • Improved statistical details and attention to subjective language throughout.
      • New TEM data included in the new Figure 1—figure supplement 1 to illustrate the drastic ultrastructural differences between MCs and neighboring epidermal cells.
      • Inclusion of an estimate of the “recombination efficiency” of our keratinocyte lineage trace in Figure 4.
      • Additional quantification of MC density in the different body regions (Figure 6) and prior to squamation (Figure 7F).
      • Imaging of the zebrafish oral mucosa (new Response Figure 1).
      • More nuanced interpretations of the eda and fgf8a mutant phenotypes.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors describe and characterize the touch system in zebrafish as a new model to study MC development and maintenance. The manuscript is very well written, and the experiments are carefully executed and beautifully illustrated. This study addresses the origin of zebrafish MCs, shows that they are innervated by somatosensory neurons and that they share molecular properties with mammalian MCs. In addition, the authors developed transgenic lines that allow them to study MCs in vivo.

      Genetic lineage tracing shows that zf MCs are derived from the epidermis, as in mouse, and not from neural crest cells, as described for avian MCs. In addition, the longevity and turnover of murine MCs have been controversial. Here, the authors show that zf MCs constantly turn over and that the distribution and turnover rate in the trunk depend on underlying scales. They show that the loss of scales in eda mutants leads to a decrease in MC production and an increase in MC death, showing that scales are required for MC production and maintenance. Using a specific fgf8 mutant allele that causes an increase in Fgf signaling and an increase in scale size, they demonstrate that scales are sufficient to induce MCs.

      In summary, this manuscript is a rigorous and beautiful characterization of MC development and maintenance. The authors demonstrate that zebrafish MCs share many characteristics with mammalian MCs. The generation of MC-specific transgenic lines, coupled with existing transgenic lines that label somatosensory cells and cells in the scales, sets the stage for detailed further analyses. For example, using these tools one can now study how the size of the MC progenitor domain is controlled, whether progenitors migrate, and what the identities of the molecular signals from the nerves and scales to progenitors and differentiated MCs are.

      Minor comments:

      Line 71: Why is the heterogeneity a limitation? Couldn't it also exist in zebrafish?

      Thank you for raising this question. The limitations are meant to refer to current limitations of the rodent system and demonstrate an opportunity for a new model system to complement the rodent system. We have rephrased this section to better articulate this point.

      Introduction:

      “While this system has been useful for understanding many aspects of MC development and function, the rodent system also has several significant limitations that warrant additional models to improve the understanding of MCs.”

      We also added the following to the discussion: “As the majority of the analyses completed here focus on MCs found in the trunk epidermis, it will be intriguing to determine whether all MCs in different skin compartments in the juvenile and adult zebrafish share similar molecular, cellular, and functional properties.”

      Line 295: The authors write: 'Thus, our observations indicate that the decrease in MC cell density in eda mutants is likely due to both reduced MC production and increased MC turnover'.

      It should say: '.. increased MC loss'. In mutants the MCs show poor turnover. I believe the term 'turnover' implies that the cells are being replaced, which is only partially happening here.

      Thank you for the clarification. We agree with the reviewer and have changed the wording from “turnover” to “loss” in lines 295 and 301.

      Line 301: The authors state: 'these data suggest that Eda signaling is required for MC development, maintenance, and distribution along the trunk.'

      The authors do not show any data that Eda signaling is involved in MC development, only that scales are needed. The MC-inducing signals from scales to the epidermis could be independent of Eda signaling. Please rephrase.

      Please discuss the fact that not all MC specification/development depends on scales. Even in the scaleless eda mutants, some MCs form (as in the interscale regions in wt?) and even turn over. Do scales secrete a signal that increases proliferation of existing MC progenitors without affecting specification?

      We respectfully disagree with the reviewer on the interpretation of these results. Our experimental manipulation (examination of eda-/- vs. sibling controls) only allows us to conclude that Eda signaling - either directly or indirectly - is required for these processes along the trunk. Determining whether signaling from scales is required would require identification of the signal(s) and/or loss/ablation of scales independent of Eda. We have rephrased the results to more clearly state our interpretation. The corresponding portion of the discussion now reads “Further investigations are required to determine whether Eda signaling directly regulates the differentiation of MC progenitors. Alternatively, since eda mutants lack scales (Harris et al., 2008) and have decreased epidermal innervation (Rasmussen et al., 2018), MC development may require scale-derived and/or somatosensory neuron-derived signals.”

      Line 320: The authors describe that the fgf8 allele leads to a redistribution of MCs. Is it really a redistribution, or is it ectopic induction or expansion of existing progenitors? Redistribution implies that the expansion is due to a loss of MCs in another region, which I do not see in the data.

      Thank you for raising this point about the potentially poor wording choice relating to “redistribution”. We do not yet know whether the distribution of MCs in fgf8a mutants reflects a redistribution, ectopic induction, or expansion of existing progenitors (these are excellent ideas for future studies). Thus, in response to the reviewer’s comment, we have changed the heading for this results section to “The MC pattern is not predetermined along the trunk” and concluded the section as follows: “... the distribution of MCs tracked with the altered scale size and shape in the mutants, suggesting the MC pattern is not predetermined within the trunk skin compartment (Figure 9E-H).”

      Figures:

      • Figure 1, panels B-C': EM images are very dark and difficult to see. The letter 'a' is on top of the axon; consider moving it to the side and pseudo-coloring the different structures.

      In response to these suggestions, we have adjusted the brightness and contrast to lighten the TEM images in Figure 1B-C’ as much as possible. We also moved the ‘a’ off to the side in Figure 1B’ to make the axon more visible. In response to Reviewer #3’s comments (see below), we also added an additional TEM image in the new Figure 1—figure supplement 1 that has presumptive keratinocytes and a MC differentially pseudo-colored. We hesitate to pseudo-color the cells/structures in Figure 1B-C’ for fear of obscuring the underlying TEM images.

      • Figure 1, panel D: very difficult to see the magenta axons in the cartoon. Please enlarge and make brighter.

      We agree that this needed improvement. In the revised Figure 1D, we made the axons clearer and illustrated the different types of MC-axon associations we observe in Figure 2. We also refer the reader back to this figure in the corresponding axon innervation results section.

      • Figure 2, panels A and D: keeping the same antibody stainings in the same color would help with visualization. Matching the bar plots in panel C would be even nicer.

      Thank you for the suggestion. The revised Figure 2 now has a consistent color scheme.

      • Figure 2, panel C: please identify in the legend if the error bars are SD, SEM or other.

      These error bars represent 95% confidence intervals. This information has been added to the figure legend.

      • Figure 2, panels G and H: MCs are in cyan in the image, but green in the legend.

      This has been corrected.

      • Figure 3: include percentages and total number in the image instead of the legend.

      The numbers and percentages have now been added to the Figure 3 panels. We have left them in the figure legend for clarity on what was scored.

      • Figure 6, panel B: which part of the eye is being depicted?

      Thank you for the question. We imaged the corneal epithelium above the lens. This has been clarified in the appropriate parts of Figures 6 and 8 and the corresponding figure legends.

      • Figure 6, panel F: please provide error bars and statistics to show that the operculum has a higher density of MCs.

      Thank you for the suggestion. In response to the comment, we revised Figure 6F by: 1) increasing the sample size; 2) replotting the data as boxplots rather than bar graphs; and 3) including the results of a one-way ANOVA.

      • Figure 7, panels F-H: for simple linear regression, please also provide F and p values.

      Thank you for the suggestion. This information has been added to the figure legend.

      • Figure 8, panel D: the colors for SL do not follow a sequential scale, making it very hard to tell which is which.

      In response to the reviewer’s suggestion, we tried numerous different color palettes. However, we were unable to find a color palette that allowed us to distinguish individual points as well as the rainbow palette used in Figure 8D. Thus, after careful consideration, we have elected to keep the original palette here. For consistency, we have used the same palette in the revised Figure 8–figure supplement 1D and Figure 9–figure supplement 1D.

      Methods:

      • Line 472: the word "sex" should be used instead of "gender".

      Thank you for the correction. This is fixed in the revision.

      • Image analysis, line 593. Please provide a more detailed explanation or describe the ImageJ macro used for the analysis.

      Our ImageJ macro has been fully annotated and is provided as Figure 2—source code 1 in the revision. The corresponding methods section has also been updated to clarify the methodology.

      Reviewer #1 (Significance (Required)):

      Soft touch is perceived by Merkel cells (MCs). How MCs develop and are maintained is not well understood because MC development is difficult to study in mammals due to their in utero development. The authors describe and characterize the touch system in zebrafish as a new model to study MC development and maintenance. The study demonstrates that the zebrafish touch system shares many characteristics with its mammalian counterpart, namely its developmental origin, innervation and molecular characteristics. In contrast to the situation in mammals, the zebrafish transgenic lines that the authors generated allow in vivo analysis of Merkel cell specification, development and maintenance. Therefore, this study is the foundation for future detailed cellular and molecular analyses of the touch sensory system and will be of interest to developmental biologists studying stem cells, regeneration and aging, as well as neuroscientists.

      We thank the reviewer for their positive assessment of the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This is a very nice and straightforward paper characterizing mechanosensory Merkel cells in the zebrafish skin. The paper uses a number of criteria, based on our knowledge of Merkel cells in mammals, to identify a population of atoh1a-expressing cells with neurosecretory granules and actin-rich microvilli as Merkel cells in the zebrafish skin. The authors have used existing transgenic lines and developed some of their own, described in this paper, to follow the development of Merkel cells in zebrafish. They show that Merkel cells are derived from basal keratinocytes, not neural crest cells. They have region-specific densities that are influenced by underlying structures like scales and fin rays. They go on to show that Ectodysplasin signaling promotes Merkel cell development in the trunk skin but not above the eye or operculum. Reduction of Merkel cells in eda mutants suggests that Eda signaling is required for their development and maintenance. Finally, they show that alteration of the zebrafish scale pattern using a mutant with exaggerated fgf8a expression also alters Merkel cell distribution.

      The data presented is clear and the conclusions are supported by their observations.

      I have no significant issue with the paper as is.

      Reviewer #2 (Significance (Required)):

      This study will serve as an excellent basis for future work looking at studies of Merkel cell development and function in fish. Though Merkel cells have been studied in mammals, establishing a zebrafish model for their study will help overcome many barriers that make their analysis difficult in mammals.

      We thank the reviewer for their positive assessment of the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Brown et al. (2022) seek to characterize and address fundamental questions regarding the development and dynamics of Merkel cells (MCs) in zebrafish (Danio rerio). The authors utilize a diverse and complementary suite of methods to characterize presumptive MCs in the epidermis of adult zebrafish, including electron microscopy, novel transgenic lines, confocal imaging, and various immuno- and non-immunohistological staining techniques. These studies demonstrate that zebrafish MCs share many features with vertebrate (including mammalian) MCs, particularly regarding morphology/structure, putative functions, genetic markers, and bodily distribution.

      After establishing the identity of zebrafish MCs, the authors employ lineage tracing and cell tracking analyses to determine that trunk MCs derive from basal keratinocytes and exhibit regular cell turnover. Finally, the authors examine how trunk scales may affect MC development by using established scale mutants. These results show that the presence/absence of scales influences trunk MC development, while scale characteristics (e.g. shape, size) change the distribution of MCs.

      MAJOR COMMENTS:

      The key conclusions of the manuscript are convincing; however, several points should be addressed by the authors.

      Throughout the manuscript, the authors make general claims about zebrafish MCs (zMCs) based on the evidence collected. Yet, most of this evidence (particularly claims about MC turnover, development, structure) comes from examination and experimentation of a specific MC population: trunk MCs located in the scale epidermis. The authors remark upon mammalian MC diversity (lines 73-74) and go on to highlight the diversity of MCs throughout the adult zebrafish (Figure 6), which have differing densities and distribution patterns. Any statements that suggest all zebrafish MCs share certain qualities/features should be carefully considered given the evidence presented.

      Thank you for raising this important point. We have added wording in the results and discussion to clearly articulate that the majority of our analyses and conclusions are based on trunk MCs:

      Results:

      “Anticipating the conclusion of our analysis below, we shall hereafter refer to the epidermal atoh1a+ cells as MCs, with the majority of the analyses completed on trunk MCs unless stated otherwise.”

      Discussion:

      “As the majority of the analyses completed here focus on MCs found in the trunk epidermis, it will be intriguing to determine whether all MCs in different skin compartments in the juvenile and adult zebrafish share similar molecular, cellular, and functional properties.”

      In the manuscript, the authors validate several markers for the identification of zMCs based upon known mammalian markers (e.g. atoh1a, sox2, piezo2, SV2, 5-HT, and AM1-43; Figures 1-3). Yet, another well-known marker for MCs (CK8) is not addressed (Moll, 1995; Moll, 2005). One zebrafish ortholog for CK8 is krt4, a transgene successfully employed in this study to label keratinocytes. Do zMCs express krt4 or other mammalian MC keratins? Answering this question or addressing this discrepancy would further strengthen the authors' claims that these cells are bona fide zMCs.

      We agree with the reviewer that 1) identification of a keratin(s) that distinguishes MCs from other epidermal cell types in zebrafish would be an excellent reagent; and 2) readers familiar with the mammalian MC literature may similarly wonder why this was not addressed in the manuscript. Indeed, we had considered whether we could identify homologs of CK8, CK20 or other mammalian MC keratins that would label zebrafish MCs. However, despite the confusing nomenclature that would indicate otherwise, the zebrafish keratins share more homology with each other than with the corresponding mammalian proteins (Ho et al., 2022; PMID: 34991727). Our revised results section now includes the following to clarify this point: “For example, keratins, most notably keratin 8 and keratin 20, have been used extensively as markers of mammalian MCs (Moll et al., 1995, 1984). However, zebrafish keratins have undergone extensive gene loss and duplication and are not orthologous to mammalian keratin genes (Ho et al., 2022). Thus, we considered alternative molecular markers to label zebrafish MCs.”

      The authors utilize a previously validated eda mutant line to see if ectodysplasin signaling affects zMC development. While the results of these experiments are convincing, the authors need to make clear whether they are claiming that scales, scale-derived Eda signaling, or Eda signaling alone dictate trunk MC development. It appears that there is some conflation of these ideas, particularly with line 306 ("blocking dermal appendage formation inhibited MC development" is a different claim from 'blocking Eda signaling inhibited MC development'). One way to make this differentiation would be to perform a similar experiment as detailed in Xiao et al. 2016: using a Shh agonist in eda mutants. If scale-specific signals are required in addition to Eda, we would expect to see similar MC densities and patterns in both Shh agonist-treated and non-treated eda mutants.

      We agree that our interpretation of these results could have been more clearly articulated in our initial submission. As discussed above in response to Reviewer #1, we do not yet know whether Eda signaling directly or indirectly influences MC development. We have revised the results section to clarify our interpretation of the results as follows: “Together, these data suggest that either Eda signaling, or a scale-derived signal, is required for MC development, maintenance, and distribution along the trunk. Further studies are required to determine the specific scale-derived signal that regulates MC development in the trunk.”

      The suggestion of using a Shh pathway agonist in eda mutants to attempt to rescue MC differentiation similar to Xiao et al. 2016 is an interesting one. To our knowledge, experiments validating the Smo agonist used by Xiao and colleagues (Hh-Ag1.5) in zebrafish have not been published. We also note that activation of Shh signaling by heat-shock induction of shha expression during squamation led to kyphosis and epidermal migration off of the trunk (Aman et al., 2018; PMID: 30014845). Thus, we respectfully suggest that distinguishing between the various possibilities downstream of Eda is beyond the scope of the current manuscript. We have added a discussion point along these lines: “Further investigations are required to determine whether Eda signaling directly regulates the differentiation of MC progenitors. Alternatively, since eda mutants lack scales (Harris et al., 2008) and have decreased epidermal innervation (Rasmussen et al., 2018), MC development may require scale- and/or somatosensory neuron-derived signals. Finally, we note that trunk MCs are not completely absent in eda mutants, suggesting that a subset of MCs develop independent of Eda signaling.”

      Throughout the manuscript, the authors use subjective language (e.g. line 106). While this reviewer does not wish to suppress or alter the authors' voices, careful consideration should be used when employing these types of descriptors. Furthermore, the authors use suggestively quantitative language inappropriately or unjustifiably. For example, in line 221, the authors use "extensive" when describing the co-labeling between atoh1a+ MCs and lineage-traced basal keratinocytes; the percentage of co-labeled cells ranged from 29-32%. Other quantitative descriptors such as "frequently" (line 171) or "uniform" (line 249) describe various features or phenomena without quantification in figures or supplements.

      Thank you for this comment. We have paid careful attention to our subjective/statistical language in the revision. Regarding the usage of “uniform” - we have added the wording “relatively uniformly” to descriptions and a statement that our term “uniform” was not specifically quantified. Although the uniform appearance was not specifically quantified, we believe this provides an accurate description of the MC localization pattern in certain skin compartments.

      Example word change in Results:

      “For example, MCs were distributed relatively uniformly across the eye, although this spatial pattern was not specifically quantified”

      In the lineage tracing experiments (Figure 4), the authors note that "recombination is not complete" (lines 1016-1017) to explain why not all zMCs express the basal keratinocyte lineage marker. While this idea could be supported by Figure 4-figure supplement 1, one could postulate that zMCs are derived from multiple progenitor lineages. Using the basal keratinocyte lineage tracing validation, the authors could in theory calculate a "recombination efficiency" of this transgenic line and determine approximately the percent of zMCs they 'lose' as a result. Otherwise, the authors could perform other experiments to support the claim that zMCs derive from basal keratinocytes. For example, could the authors photoconvert basal keratinocytes at 1 dpf and see how many derived MCs are still photoconverted later? Could they do this photoconversion experiment with neural crest cells? Could they ablate neural crest cells and determine if MC number is affected? These additional experiments are not necessarily required for publication, but some explanation of the unexpectedly low percentage of basal keratinocyte lineage marker-labeled MCs would suffice.

      We thank the reviewer for raising this important point and the suggestion of calculating a “recombination efficiency”. We note that Cre-responsive transgenes are far from a perfect technology in zebrafish, as recently characterized by Lalonde et al. (2022; PMID:35582941). In response to the reviewer’s comment, we added an estimate of the recombination efficiency to Figure 4 (panels E, G, H). Importantly, the recombination efficiency and the percentage of MCs labeled by the basal keratinocyte Cre tracing were not significantly different. Our revised results section reads as follows: “After raising 4-OHT-treated animals to adulthood, we observed variable (2-81%) co-labeling between the basal keratinocyte lineage trace and a MC reporter (Figure 4D’,F). We note that our lineage tracing strategy did not label all basal keratinocytes (Figure 4D; Figure 4—figure supplement 1), suggestive of incomplete Cre-ERT2 induction and/or transgene recombination. Consistent with the latter possibility, a recent analysis demonstrated Tg(actb2:LOXP-BFP-LOXP-DsRed) has a low recombination efficiency compared to other Cre reporter transgenes (Lalonde et al., 2022). To estimate the local recombination efficiency in imaged regions, we thresholded the DsRed channel and calculated the fraction of skin cells labeled (Figure 4E). Importantly, the proportion of MCs labeled by the basal keratinocyte lineage trace was not significantly different from the local recombination efficiency (Figure 4G-H). These observations support a basal keratinocyte origin of most or all zebrafish MCs.”

      The authors use appropriate statistics and have sufficient replicates when this information is presented. Yet, the presence or absence of these data is not consistent across figure captions. The authors must ensure that they provide the N of adults and scales (when appropriate), the SL range of adults, and transgenic lines used. Statistics are missing in some figures (for example: Figures 4E, 5D, 5E, 6F, 8S-1E, 9E-H) where it would be appropriate to include them. In some figures, the N changes over time (example: 5D, 5E); an explanation in the 'Methods' section would suffice.

      Thank you for noting the need for additional statistics. We have added statistics to the above figures. For Figure 9E-H, we have not added additional statistics. Figure 9E-H serve to graphically visualize differences. We show statistical differences in Figure 9—figure supplement 1 for scale area, aspect ratio, and Feret’s diameter. We have added an explanation related to Figure 5D,E in the methods section: “Animals that died over the course of the experiment were excluded from further analysis.”

      MINOR COMMENTS

      While the authors present an extensive argument for their claims, addressing these additional comments would further strengthen their story.

      Are zMC nuclei lobulated? This ultrastructure characteristic seems to be common in MCs (Chew & Leung, 1994; Tachibana & Nawa, 2002; Moll, 2005; Boulais, 2009).

      We have not observed any lobulation of the MC nuclei by TEM, nor was this commented on in the TEM studies of Whitear and colleagues in other teleosts (Lane and Whitear, 1977; PMID: 198137; Whitear, 1989; PMID: 2510796). Nevertheless, we cannot rule out the possibility that serial sectioning or other high resolution analysis of the nuclear shape may reveal such features. In response to the reviewer’s comment, we have added the following paragraph to the discussion: “While our characterization revealed substantial similarities between mammalian and zebrafish MCs, we did observe anatomical differences in line with previous ultrastructural characterizations of teleost MCs (Lane and Whitear, 1977; Whitear, 1989). For example, the nuclei of mammalian MCs are commonly lobulated (Boulais et al., 2009; Cheng Chew and Leung, 1994; Moll et al., 2005; Tachibana and Nawa, 2002). While we did not observe lobulation of zebrafish MC nuclei by TEM, we cannot rule out that serial sectioning or high-resolution reconstruction of nuclear shape would reveal lobulation. Mammalian MCs typically localize adjacent to basal keratinocytes (Boot et al., 1992; Cheng Chew and Leung, 1994; Fradette et al., 1995; Mihara et al., 1979; Moll et al., 1996; Smith, Jr, 1977), whereas zebrafish MCs appear in upper strata, typically beneath the periderm (Figure 1D,G’’). As the majority of the analyses completed here focus on MCs found in the trunk epidermis, it will be intriguing to determine whether all MCs in different skin compartments in the juvenile and adult zebrafish share similar properties.”

      In Figure 3C and 3", the authors show that AM1-43 labels zMCs. Yet, this technique should also stain sensory axons that associate with MCs (Meyers, 2003). Are axons also stained? Other positive controls for the stains could be useful as a supplement.

      The reviewer is correct that Meyers et al., (2003; PMID: 12764092) report AM1-43 staining of neurites that innervate MCs in the whisker follicle. However, they did not report similar staining of neurites innervating touch dome MCs. In murine hairy skin, the related styryl dye FM1-43 appears to most prominently stain MCs and hair follicle-associated lanceolate endings (Banks et al., 2013 PMID: 23440964; Villarino et al., 2022 preprint DOI: 10.1101/2022.05.26.493600). Our revised legend for Figure 3 now includes the following: “AM1-43 has been reported to stain neurites innervating MCs in murine whisker vibrissae (Meyers et al., 2003). However, our AM1-43 staining regimen did not strongly label cutaneous axons, although we cannot exclude low levels of staining.”

      All of the stains used in our original Figure 3 have been previously validated in zebrafish, which we have more clearly stated and cited in the corresponding results section of the revision. Because these reagents have all been previously validated and our staining patterns are consistent with the literature, we respectfully suggest that positive controls would add little value to the current manuscript. Nevertheless, in response to the reviewer’s comment, we confirmed our piezo2 FISH staining using an independent method (a piezo2 HCR probe). We have included these HCR results as the updated Figure 3D and moved the original Figure 3D to Figure 3—figure supplement 1.

      In Figure 7, the authors argue that as scales develop, MC density increases with scale area. Did the authors compare MC densities of differently-sized scales at the same age? Is fish SL/age a potential confound in the interpretation of these data?

      Thank you for the suggestion. In response to the reviewer’s comment, we have replotted the data in Figure 7G,H for animals in the range 8-10 mm SL in Figure 7—figure supplement 1. We have revised the corresponding results section as follows: “The density and number of MCs positively correlated with scale area (Figure 7G,H), although this trend was less pronounced at stages less than 10 mm (Figure 7—figure supplement 1)”. As discussed above in response to reviewer #1’s suggestion, we also now report F-statistics and P-values for the linear regressions in the figure legends.

      The authors claim that squamation begins at ~9 mm SL (line 268), prior to which MCs were "rare" in the epidermis (supported by data in Figure 7F). However, Figures 8A and 8G suggest that MCs are not rare prior to squamation/9 mm SL. Are these data in conflict?

      Thank you for raising this observation. We do not believe these data are in conflict. Figure 8A and B show images of fish 8.8-8.9 mm SL, immediately prior to squamation. MCs appear about the same time as scales develop but the exact timing varies between animals. To further strengthen this section of the manuscript, we now include quantification of the density of trunk MCs at various stages prior to 9 mm SL (new data added to the developmental timeline in Figure 7F). These data are consistent with our initial interpretation. In the revised results section we clarify this as follows: “Using reporters that label MCs and scale-forming osteoblasts, we rarely observed MCs in the epidermis prior to 8 mm SL (Figure 7B, F). Between 8-10 mm SL, MCs appeared at a low density along the trunk (Figure 7F). MC density rapidly increased from 10-15 mm SL, a period of active scale growth (Figure 7C-F).”

      In Figure 6B-E, the panels are incorrectly labeled as "atoh1a:nls-Eos" (figure caption and fluorescence localization show they are atoh1a:Lifeact-EGFP).

      The low magnification panels were correctly labeled as atoh1a:nls-Eos. The insets showed atoh1a:Lifeact-EGFP as described in the figure legend. We apologize for the confusion and poor data presentation. We have revised Figure 6 to eliminate the problematic labeling/display.

      Figure 9 panels E-H are not referenced in the main body of the text.

      Thank you for pointing this out. Fixed in the revised manuscript.

      In Figure 6, the authors examine MC densities in the tail, but do not quantify changes here with eda mutants as they did for other regions (eye, operculum) in Figure 8. Why was this region not examined?

      We have clarified this point in the revised results section as follows: “eda mutants lack fins at the stages analyzed (Harris et al., 2008) precluding analysis of these regions in the homozygous mutants.”

      The authors do a good job in detailing the current literature regarding MCs, however, two missing areas are noticeable: 1) there is no mention of mammalian MCs that reside in the oral mucosa (Hashimoto, 1972) or whether they exist in zebrafish, and 2) no mention of Merkel-like cells (Halata, 2003) and why the cells in this paper are or are likely not Merkel-like cells.

      Thank you for the suggestions. Regarding the first point, we revised the introduction to reference (Hashimoto, 1972) as follows: “...vertebrates have diverse types of skin and MCs are found in both hairy and glabrous (non-hairy) skin, as well as mucocutaneous regions such as the gingiva and palate (Hashimoto, 1972; Lacour et al., 1991; Moayedi et al., 2021).” We also imaged the mucosal tissue along the roof palate of the adult mouth and identified atoh1a+ cells (see Response Figure 1 below). Close examination of the atoh1a:Lifeact-EGFP signal revealed these cells have a spherical morphology and extend short processes similar to the MCs described across the body regions examined in Figure 6. However, as the microvillar morphology of the palatal atoh1a+ cells is not identical to those identified in other skin regions, we hesitate to call these MCs without performing additional in-depth analyses. We feel that inclusion of these data in the manuscript could distract the reader from the main focus of our study; therefore, we have included them here:

      __Response Figure 1. atoh1a+ cells in the adult oral epithelium.__ (A,B) Low- (A) and high-magnification (B) confocal micrographs of oral roof palate epithelium in an adult expressing reporters for keratinocytes (Tg(krt4:DsRed)) and atoh1a-expressing cells (Tg(atoh1a:Lifeact-EGFP)). (B’) Reconstructed cross section along the yellow line in B showing two atoh1a+ cells in the upper strata of the oral epithelium. Scale bars: 50 µm (A) and 10 µm (B,B’).

      Regarding the second point, we have added the following sentence to the first paragraph of the discussion: “Second, zebrafish MCs extend numerous short, actin-rich microvilli and complex with somatosensory axons, classic morphological hallmarks of MCs (Mihara et al., 1979; Smith, Jr, 1977; Toyoshima et al., 1998). Our morphological observations support the interpretation that these cells are MCs rather than Merkel-like cells, which lack axon association and microvillar processes (reviewed by Halata et al., 2003).”

      It may help readers understand MC morphology in context if the authors include a larger picture of the TEM data that highlights the drastic difference in ultrastructure between MCs and neighboring keratinocytes.

      Thank you for the suggestion. We added a new figure (Figure 1—figure supplement 1) to the revised manuscript that contains an additional TEM image that we believe illustrates the different morphologies of keratinocytes and MCs. We hope this will help the reader contextualize the morphology and position of MCs within the zebrafish epidermis. This is now referenced in the first results section as follows: “The cells appeared relatively small and spherical with a low cytoplasmic-to-nuclear ratio compared to neighboring keratinocytes (Figure 1B,C; Figure 1—figure supplement 1) …”

      Reviewer #3 (Significance (Required)):

      The current manuscript provides significant advances in various biological fields and research communities. For researchers who use zebrafish as a model organism, these findings present a new cell type along with novel and essential genetic tools for its study. These developments open possibilities to further understand MCs, their roles in somatosensory function, and mechanisms of cell type diversification, and to engage in translational research. For those already researching MCs, this manuscript shows that fundamental questions regarding MC function can be rigorously addressed with a new model that circumvents the methodological limitations imposed by mammalian biology. Indeed, the authors do a thorough job of introducing and contextualizing our knowledge of MCs and its outstanding gaps. The authors then situate their findings alongside previous work, largely supporting those findings, and take the extra step of addressing MC controversies and matters of debate. This approach of supporting the current literature and then building on it with new findings makes this work even more impressive. Various audiences will find value in this manuscript, including but not limited to those who study epidermal cell types, the development and influence of skin appendages, somatosensation and sensory disorders, developmental biology, and Merkel cell carcinoma.

      We thank the reviewer for their positive assessment of the manuscript.

    1. Since there is no measurable performance advantage for either, any time (however marginal) spent thinking or talking about a choice between the two is wasted. When you prefer single-quoted strings, you have to think when you need interpolation. When you use double-quoted strings, you never have to think. (I'd also add, anecdotally, that apostrophes are more prevalent in strings than double quotes, which again means less thinking when using double-quoted strings.) Therefore, always using double-quoted strings results in the least possible wasted time and effort.
    1. Request Parameters

      The order is off here.

      The section should have a heading, a description of the API, the endpoint, the code sample, the request parameters, and then the response parameters.

      Please check with the TW and make the changes.

    1. Request:

      ```shell
      curl -u [YOUR_KEY_ID]:[YOUR_KEY_SECRET] \
        -X POST https://api.razorpay.com/v1/virtual_accounts \
        -H "Content-Type: application/json" \
        -d '{
          "receivers": {
            "types": [ "qr_code" ]
          },
          "description": "First Payment by BharatQR",
          "customer_id": "cust_805c8oBQdBGPwS",
          "notes": {
            "reference_key": "reference_value"
          }
        }'
      ```

      The code should be placed before the request and response parameters. Also, check whether the code sample is available in other SDK languages.
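      A non-curl sample could start from Go's standard net/http and encoding/json packages, which can build the same request. This is a sketch, not Razorpay's SDK. The struct and function names (virtualAccountRequest, newVirtualAccountRequest) are hypothetical, and the credentials are placeholders. The request is only constructed, not sent. curl's -u flag corresponds to HTTP Basic authentication, which SetBasicAuth sets here.

      ```go
      package main

      import (
      	"bytes"
      	"encoding/json"
      	"fmt"
      	"net/http"
      )

      // Placeholder credentials (hypothetical); substitute a real key ID and secret.
      const (
      	keyID     = "YOUR_KEY_ID"
      	keySecret = "YOUR_KEY_SECRET"
      )

      type receivers struct {
      	Types []string `json:"types"`
      }

      // virtualAccountRequest mirrors the JSON body of the curl sample.
      type virtualAccountRequest struct {
      	Receivers   receivers         `json:"receivers"`
      	Description string            `json:"description"`
      	CustomerID  string            `json:"customer_id"`
      	Notes       map[string]string `json:"notes"`
      }

      // newVirtualAccountRequest builds, but does not send, the POST request.
      func newVirtualAccountRequest(body virtualAccountRequest) (*http.Request, error) {
      	payload, err := json.Marshal(body)
      	if err != nil {
      		return nil, err
      	}
      	req, err := http.NewRequest(http.MethodPost,
      		"https://api.razorpay.com/v1/virtual_accounts", bytes.NewReader(payload))
      	if err != nil {
      		return nil, err
      	}
      	req.SetBasicAuth(keyID, keySecret) // equivalent of curl -u KEY:SECRET
      	req.Header.Set("Content-Type", "application/json")
      	return req, nil
      }

      func main() {
      	req, err := newVirtualAccountRequest(virtualAccountRequest{
      		Receivers:   receivers{Types: []string{"qr_code"}},
      		Description: "First Payment by BharatQR",
      		CustomerID:  "cust_805c8oBQdBGPwS",
      		Notes:       map[string]string{"reference_key": "reference_value"},
      	})
      	if err != nil {
      		panic(err)
      	}
      	fmt.Println(req.Method, req.URL)
      }
      ```

      To send the request, pass req to http.DefaultClient.Do and check the response status. If an official SDK exists for the target language, the published docs would probably show that instead.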

    1. Reviewer #1 (Public Review):

      The authors provide a simple and clear way to understand an aspect of the implicit bias of a neural population code linking it with well-known machine learning methods and concepts such as kernel regression, sample complexity and efficiency.

      Although the mathematical results the authors employ are not novel, the way they apply them to the problem of neural coding is novel and interesting to a broad audience. In particular, the computational neuroscience community can benefit from this work, as it is one of the few dealing with the impact of the model's implicit bias in explaining real data.

    2. Reviewer #2 (Public Review):

      It is my opinion that the principal utility of this approach lies in its ability to identify the set of 'easily learnable' stimulus-response mappings from neural data, which makes strong behavioral predictions that can be easily evaluated. I envision a simple experiment in which empirically obtained kernel functions are used to rank stimulus-response mappings according to their learnability, which can then be plotted against measures of performance like the observed learning rate and saturated performance. Because kernel functions are empirically obtained, there is even the potential for meaningful cross-species comparisons. If behaviorally validated, one could also use this approach to label cortical populations by the set of easily learned stimulus-response mappings for that population. This allows for the identification of task-relevant neurons or regions, which can be subsequently manipulated to enhance or degrade learning rates.

      Of course, any theoretical approach is only as good as the underlying assumptions. So while the primary strength is the simplicity and generality of this approach, the primary weakness is its neglect of some very real and very relevant aspects of neural data in particular and statistical learning in general. Specifically, the three principal limitations of this work stem from its reliance on the assumptions that (1) neurons are noiseless, (2) decoders are linear, and (3) learned weights are unbiased.

      (1) Within this framework, a realistic stimulus-dependent noise model can be easily introduced and its effects on the kernel and set of easily learned stimulus-response mappings investigated. So while the kernel would be substantially altered via the addition of a realistic noise model, the applications of the approach outlined above would not be affected. The same cannot be said for the efficient coding application described in this manuscript. There, the authors note that rotations and constant shifts of neural activity do not affect the kernel and thus do not affect the generalization error. This kernel invariance is not present when a non-trivial (i.e. non-isotropic) noise model is added. For example, suppose that neurons are independent and Poisson so that noise scales with the mean of the neural response. In this case, adding a baseline firing rate to a population of unimodal neurons representing orientation necessarily reduces the information content of the population while rotations can affect the fidelity with which certain stimulus values are represented. It is important to note, however, that while this particular efficiency result is not compelling, I believe that it is possible to perform a similar analysis that takes into account realistic noise models and focuses on a broad set of 'biologically plausible' kernels instead of particular invariant ones. For example, one could consider noise covariance structures with differential correlations (Moreno-Bote 2014). Since the magnitude of differential correlations controls the redundancy of the population code this would enable an analysis of the role of redundancy in suppressing (or enhancing) generalization error.

      (2) Similarly, the linearity assumption is somewhat restrictive. Global linear decoders of neural activity are known to be highly inefficient and completely fail when decoding orientation in the primary visual cortex in the presence of contrast fluctuations. This is because contrast modulates the amplitude of the neural response and doubling the amplitude means doubling an estimate obtained from a linear decoder even when the underlying orientation has not changed. While the contrast issue could be partially addressed by simply considering normalized neural responses, it is not yet clear how to extend this approach to account for other sources of neural variability and co-variability that cause global linear decoders to fail so badly.

      (3) This analysis relies on the assumption that decoder weights learned in the presence of finite data are efficient and unbiased. This assumption is problematic particularly when it comes to inductive bias and generalization error. This is because a standard way to reduce generalization error is to introduce bias into the learned decoder weights through a penalization scheme that privileges decoder weights with small magnitudes. This kind of regularization is particularly important when neurons are noisy. Fortunately, this issue could be addressed by parameterizing changes in the kernel function by the degree and type of regularization potentially leading to a more general result.
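      A toy illustration of point (2), assuming that contrast acts as a pure multiplicative gain on every neuron. Doubling the responses doubles the output of a fixed linear readout. Dividing the responses by the summed population activity cancels the gain; that normalization is one possible choice, made here for illustration. The weights and responses are made-up numbers, not data.

      ```go
      package main

      import "fmt"

      // linearReadout is a global linear decoder: it returns w·r.
      func linearReadout(w, r []float64) float64 {
      	s := 0.0
      	for i := range w {
      		s += w[i] * r[i]
      	}
      	return s
      }

      // scale multiplies every response by gain g (a stand-in for a contrast change).
      func scale(r []float64, g float64) []float64 {
      	out := make([]float64, len(r))
      	for i, v := range r {
      		out[i] = g * v
      	}
      	return out
      }

      // normalize divides each response by the summed activity,
      // which cancels any gain shared by all neurons.
      func normalize(r []float64) []float64 {
      	total := 0.0
      	for _, v := range r {
      		total += v
      	}
      	return scale(r, 1/total)
      }

      func main() {
      	w := []float64{0.5, 1.0, -0.25} // hypothetical decoder weights
      	r := []float64{2, 4, 1}         // hypothetical responses at one orientation
      	fmt.Println("raw:       ", linearReadout(w, r), linearReadout(w, scale(r, 2)))
      	fmt.Println("normalized:", linearReadout(w, normalize(r)), linearReadout(w, normalize(scale(r, 2))))
      }
      ```

      As the reviewer notes, this fix only addresses a shared gain. Other sources of variability and co-variability are not removed by normalization.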

      Finally, I would like to conclude by explicitly stating that while the limitations imposed by the assumptions listed above temper my enthusiasm with regard to the conclusions drawn in this work, I do not believe there is a fundamental problem with the general theoretical framework. Indeed, items 1 and 3 above can be easily addressed through straightforward extensions of the authors' approach, and I look forward to their implementation. Item 2 is a bit more troublesome, but my intuition tells me that an information-theoretic extension based upon Fisher information may be capable of eliminating all three of these limiting assumptions by exploiting the relationship between FI(\theta) and FI(y=f(\theta)).

    3. Reviewer #3 (Public Review):

      The manuscript presents a theory of generalization performance in deterministic population codes that applies to the case of small numbers of training examples. The main technical result, as far as I understand, is that generalization performance (the expected classification or regression error) of a population code depends exclusively on the 'kernel', i.e. a measure of the pairwise similarity between population activity patterns corresponding to different inputs. The main conceptual results are twofold. First, using this theory, one can understand the inductive biases of the code just from analyzing the kernel, particularly its top eigenfunctions. Second, sample-efficient learning (low generalization error with few samples) depends on whether the task is aligned with the population's inductive bias, that is, whether the target function (i.e. the true map from inputs to outputs) is aligned with the top eigenfunctions of the kernel. For instance, in mouse V1 data, they show that the top eigenfunctions correspond to low-frequency functions of visual orientation (i.e. functions that map a broad range of similar orientations to similar output values). Consistent with the theory, generalization performance for small sample sizes is better for tasks defined by low-frequency target functions. In my opinion, perhaps the most significant finding from a neuroscience perspective is that the conditions for good generalization at low sample sizes are markedly different from those in the large-sample asymptotic regime studied in Stringer et al. 2018 Nature. Stringer et al. proposed a trade-off between high dimensionality and differentiability. This manuscript shows instead that in the low-sample regime high-dimensional codes can be disadvantageous, that differentiability is not required, that the top eigenvalues matter more than the tail of the spectrum, and that what matters is the alignment between the task and the top eigenfunctions. The authors propose sample-efficient learning/generalization as a new principle of neural coding, replacing or complementing efficient coding.
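      To make the notion of a kernel concrete, the sketch below computes a Gram matrix K_ij = (1/N) Σ_n r_n(θ_i) r_n(θ_j) from a toy response matrix R (N neurons by P stimuli). The 1/N normalization is an assumption. The sketch also checks the invariance Reviewer #2 mentions: rotating the population activity (mixing neurons with an orthogonal matrix Q) leaves K unchanged, because (QR)ᵀ(QR) = RᵀR. The two cosine-tuned neurons are hypothetical, and the rotation helper only handles two neurons.

      ```go
      package main

      import (
      	"fmt"
      	"math"
      )

      // gram returns K[i][j] = (1/N) * sum_n R[n][i]*R[n][j],
      // where R[n][i] is the response of neuron n to stimulus i.
      func gram(R [][]float64) [][]float64 {
      	N, P := len(R), len(R[0])
      	K := make([][]float64, P)
      	for i := range K {
      		K[i] = make([]float64, P)
      		for j := 0; j < P; j++ {
      			s := 0.0
      			for n := 0; n < N; n++ {
      				s += R[n][i] * R[n][j]
      			}
      			K[i][j] = s / float64(N)
      		}
      	}
      	return K
      }

      // rotate mixes a two-neuron population by a rotation of angle phi.
      func rotate(R [][]float64, phi float64) [][]float64 {
      	c, s := math.Cos(phi), math.Sin(phi)
      	P := len(R[0])
      	out := [][]float64{make([]float64, P), make([]float64, P)}
      	for i := 0; i < P; i++ {
      		out[0][i] = c*R[0][i] - s*R[1][i]
      		out[1][i] = s*R[0][i] + c*R[1][i]
      	}
      	return out
      }

      // maxAbsDiff returns the largest elementwise |A - B|.
      func maxAbsDiff(A, B [][]float64) float64 {
      	m := 0.0
      	for i := range A {
      		for j := range A[i] {
      			if d := math.Abs(A[i][j] - B[i][j]); d > m {
      				m = d
      			}
      		}
      	}
      	return m
      }

      func main() {
      	thetas := []float64{0, math.Pi / 4, math.Pi / 2, 3 * math.Pi / 4}
      	R := [][]float64{make([]float64, len(thetas)), make([]float64, len(thetas))}
      	for i, th := range thetas {
      		R[0][i] = math.Cos(2 * th) // hypothetical orientation tuning
      		R[1][i] = math.Sin(2 * th)
      	}
      	K := gram(R)
      	fmt.Printf("K[0][0] = %.2f, max |K - K_rot| = %.1e\n", K[0][0], maxAbsDiff(K, gram(rotate(R, 0.7))))
      }
      ```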

      Overall, in my opinion this is a remarkable manuscript, presenting truly innovative theory with somewhat limited but convincing application to neural data. My main concern is that this is highly technical, dense, and long; the mathematical proofs for the theory are buried in the supplement and require knowledge of disparate techniques from statistical physics. Although some of that material on the theory of generalization is covered in previous publications by the authors, it was not clear to me if that is true for all of the technical results or only some.

      Fixed population code, learnable linear readout: the authors acknowledge in the very last sentences of the manuscript that this is a limitation, given that neural tuning curves (the population neural code) are adaptable. I imagine extending the theory to both learnable codes and learnable readouts is hard and I understand it's beyond the scope of this paper. But perhaps the authors could motivate and discuss this choice, not just because of its mathematical convenience but also in relation to actual neural systems: when are these assumptions expected to be a good approximation of the real system?

      The analysis of V1 data, showing a bias for low-frequency functions of orientation, is convincing. But it could help if the authors provided some considerations on the kind of ethological behavioral context where this is relevant, or at least the design of an experimental behavioral task to probe it. Relatedly, it would be useful to construct and show a counter-example: a synthetic code for which the high-frequency task is easier.

      Line 519, data preprocessing: related to the above, is it possible that binning together the V1 responses to gratings with different orientations (a range of 3.6 deg per bin, if I understood correctly) influences the finding of a low-frequency bias?

      I found the study of invariances interesting, where the theory provides a normative prediction for the proportion of simple and complex cells. However, I would suggest the authors attempt to bring this analysis a step closer to the actual data. There are no pure simple and complex cells: the classification is usually based on responses to grating phase (the F1/F0 ratio), and real neurons take a continuum of values. Could the theory qualitatively predict that distribution?

    1. Reviewer #2 (Public Review):

      The authors present a compendium of diffusion MR, dynamic contrast-enhanced MR, histological, and other results in AQP4 KO vs. WT mice which suggest that AQP4 deletion results in stagnation of interstitial fluid movement, enlargement of interstitial volume, and an increase in total brain water. The authors also provide evidence that these effects do not arise due to changes in CSF production, perfusion, or vascular density, strengthening the conclusion that AQP4 is specifically involved in modulating parenchymal resistance, rather than another aspect of glymphatic function. While the study of AQP4 deletion using various MR and histological methods is not novel per se, the breadth of concurrent methodological approaches presented here is uncommonly extensive, and thus provides a strong, self-contained case for the conclusion(s) - more so than other works on such mouse models. The key strength and utility of this work lie in the extent of corroborating evidence provided for the conclusions.

      Another strength of the paper is the development of what appears to be a robust CSF space segmentation approach, which may be of interest to others aiming to quantify glymphatic function using MR. The source code, however, is not provided at this time.

      I have some concerns, specifically about the discussion around transmembrane water exchange - i.e., whether the exchange is truly being measured by the diffusion MR methods - and about the validity of applying an IVIM signal model across the brain. These concerns, however, do not affect the major conclusions of the paper. Indeed, the authors have included analyses using standard ADC fitting which avoids the issues with IVIM. In summary, the paper presents a compelling body of evidence describing the effects of AQP4 deletion in mice.
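      For readers less familiar with the ADC fits the reviewer mentions: a common approach assumes monoexponential decay, S(b) = S0·exp(-b·ADC), and fits ln S against b by least squares. The sketch below does that on synthetic, noise-free signals (S0 = 1000 and ADC = 0.8×10⁻³ mm²/s, chosen for illustration); it is not the authors' pipeline. An IVIM model would instead add a fast pseudo-diffusion compartment, S(b)/S0 = f·exp(-b·D*) + (1 - f)·exp(-b·D), which this two-parameter fit does not capture.

      ```go
      package main

      import (
      	"fmt"
      	"math"
      )

      // fitADC fits ln S = ln S0 - b*ADC by ordinary least squares
      // and returns the estimated S0 and ADC.
      func fitADC(b, s []float64) (s0, adc float64) {
      	n := float64(len(b))
      	var sx, sy, sxx, sxy float64
      	for i := range b {
      		y := math.Log(s[i])
      		sx += b[i]
      		sy += y
      		sxx += b[i] * b[i]
      		sxy += b[i] * y
      	}
      	slope := (n*sxy - sx*sy) / (n*sxx - sx*sx)
      	intercept := (sy - slope*sx) / n
      	return math.Exp(intercept), -slope
      }

      func main() {
      	bvals := []float64{0, 500, 1000, 1500} // s/mm^2 (hypothetical protocol)
      	sig := make([]float64, len(bvals))
      	for i, b := range bvals {
      		sig[i] = 1000 * math.Exp(-b*0.8e-3) // synthetic signal, no noise
      	}
      	s0, adc := fitADC(bvals, sig)
      	fmt.Printf("S0 = %.1f, ADC = %.3e mm^2/s\n", s0, adc)
      }
      ```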

    1. All of the custom codes for FIJI and MATLAB used in this study have been deposited at GitHub (https://github.com/PaulYJ/Axon-spheroid).

      GitHub is not an archival repository. figshare, Zenodo, Dryad, and similar sites are appropriate code repositories.

    1. a. In the Razorpay Dashboard, navigate to Settings.
      b. Under Account Settings, in the Theme Color section, enter the HEX code for your brand. For example, #6822CC.
      c. Click Save Changes.

      The numbering is broken here.

    1. For example, this FindDigits function loads a file into memory and searches it for the first group of consecutive numeric digits, returning them as a new slice.

      ```go
      package main

      import (
      	"os"
      	"regexp"
      )

      var digitRegexp = regexp.MustCompile("[0-9]+")

      func FindDigits(filename string) []byte {
      	b, _ := os.ReadFile(filename)
      	return digitRegexp.Find(b)
      }
      ```

      This code behaves as advertised, but the returned []byte points into an array containing the entire file. Since the slice references the original array, as long as the slice is kept around the garbage collector can’t release the array; the few useful bytes of the file keep the entire contents in memory. To fix this problem, one can copy the interesting data to a new slice before returning it:

      ```go
      func CopyDigits(filename string) []byte {
      	b, _ := os.ReadFile(filename)
      	b = digitRegexp.Find(b)
      	c := make([]byte, len(b))
      	copy(c, b)
      	return c
      }
      ```
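      The aliasing can be observed without touching the file system. In this sketch, findDigits and copyDigits are hypothetical in-memory variants of FindDigits and CopyDigits that take the data directly. After the original buffer is modified, the slice returned by Find changes with it, while the copied slice does not.

      ```go
      package main

      import (
      	"fmt"
      	"regexp"
      )

      var digitRegexp = regexp.MustCompile("[0-9]+")

      // findDigits returns a sub-slice that shares data's backing array.
      func findDigits(data []byte) []byte {
      	return digitRegexp.Find(data)
      }

      // copyDigits returns an independent copy of the match.
      func copyDigits(data []byte) []byte {
      	m := digitRegexp.Find(data)
      	c := make([]byte, len(m))
      	copy(c, m)
      	return c
      }

      func main() {
      	data := []byte("abc 12345 def")
      	shared := findDigits(data)
      	owned := copyDigits(data)

      	data[4] = '9' // overwrite the first digit in the original buffer

      	fmt.Println(string(shared)) // "92345": still aliases data
      	fmt.Println(string(owned))  // "12345": unaffected copy
      }
      ```

      On Go 1.20 and later, bytes.Clone(m) is a shorter equivalent of the make-and-copy pair.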

    1. Author Response

      1) Response to the Editor

      We thank the Editor and the Reviewers for the kind words, the helpful suggestions, and the points of critique, which have all helped us substantially strengthen the manuscript in this revised version. Regarding the 3 general critiques highlighted by the Editor:

      Essential Revisions:

      1) Some hypotheses, in particular the assumption that all individuals have the same inter-burst interval distribution, should be tested, justified, or discussed.

      (a) We have generalized the theory to directly address this point by relaxing the assumption of an identical inter-burst interval distribution for all individuals. In short, the main insights continue to hold, and we discuss the nuances in the text.

      (b) Experimentally, the hypothesis that all single fireflies isolated from the group exhibit the same interburst interval (IBI) distribution could not be rigorously tested. The main reason is practical: to compare IBI distributions across individuals, we would need to collect a large number of fireflies and track them for long durations. This was not realistic given our experimental setup and the short window of firefly emergence. In addition, external environmental factors might slightly alter behavior, making comparisons even more complex. Given the paucity of field data, we therefore retain the assumption that all individual fireflies follow the same IBI distribution.

      2) Comparison between the models and the data must be improved, in particular through a quantification of the differences between distributions and sensitivity analysis of the numerical results.

      (a) Regarding the comparison of the agent-based simulations with experimental data, in Fig. 7 we compare the underlying distributions using the two-sided Kolmogorov-Smirnov test for goodness of fit. This appears to us the most straightforward and informative approach, and it avoids over-fitting.

      (b) Regarding sensitivity analysis for the agent-based simulations, for each β value from 0 to 1 we statistically compared the simulations to the experimental distributions to find the best-fitting β.

      (c) Finally, experimental constraints leave sparse data for characterizing the interburst distribution. We therefore strive to strike a delicate balance in applying sophisticated statistical tools. Those tools suit comparisons of the theoretical and simulated distributions (for which arbitrarily large samples are available) but must also serve comparisons with the finite samples of the empirical distributions. As such, we think it is apposite to use the first two moments of the respective distributions in Fig. 3 to show the striking similarity of trends.
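      As a rough sketch of the comparisons described in (a) and (c), the function below computes the two-sample Kolmogorov-Smirnov statistic D = sup_t |F_sim(t) - F_obs(t)| between empirical CDFs. A helper returns the sample mean and standard deviation. The sketch computes only D, not a p-value; SciPy's scipy.stats.ks_2samp, for example, reports both. The interval values in main are invented for illustration rather than taken from the experiments.

      ```go
      package main

      import (
      	"fmt"
      	"math"
      	"sort"
      )

      // ksStatistic returns the two-sample Kolmogorov-Smirnov statistic:
      // the largest absolute gap between the empirical CDFs of a and b.
      func ksStatistic(a, b []float64) float64 {
      	x := append([]float64(nil), a...)
      	y := append([]float64(nil), b...)
      	sort.Float64s(x)
      	sort.Float64s(y)
      	i, j, d := 0, 0, 0.0
      	for i < len(x) && j < len(y) {
      		t := x[i] // next jump point of either CDF
      		if y[j] < t {
      			t = y[j]
      		}
      		for i < len(x) && x[i] <= t {
      			i++
      		}
      		for j < len(y) && y[j] <= t {
      			j++
      		}
      		diff := math.Abs(float64(i)/float64(len(x)) - float64(j)/float64(len(y)))
      		if diff > d {
      			d = diff
      		}
      	}
      	return d
      }

      // meanStd returns the sample mean and the (n-1)-normalized standard deviation.
      func meanStd(v []float64) (mean, std float64) {
      	for _, x := range v {
      		mean += x
      	}
      	mean /= float64(len(v))
      	ss := 0.0
      	for _, x := range v {
      		d := x - mean
      		ss += d * d
      	}
      	return mean, math.Sqrt(ss / float64(len(v)-1))
      }

      func main() {
      	sim := []float64{12.1, 12.4, 12.9, 13.5, 14.2} // hypothetical simulated intervals (s)
      	obs := []float64{12.0, 12.6, 13.1, 14.8}       // hypothetical observed intervals (s)
      	m, s := meanStd(obs)
      	fmt.Printf("D = %.3f, observed mean = %.2f s, std = %.2f s\n", ksStatistic(sim, obs), m, s)
      }
      ```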

      3) More discussion of the modeling in connection to past theoretical results and existing literature is necessary to better contextualize the present work and assess its originality.

      We have done so, closely following the reviewers' specific suggestions.

      2) Revised terminology: removing usage of “model”

      The word “model” may cause unintended ambiguity, since it could refer to either (1) the theoretical framework, the principle of emergent periodicity, and the attendant analytic calculation, or (2) the agent-based simulation in the computational realization. We have therefore removed all instances of the word “model” from the results presented in the paper and replaced each with its specific meaning (theory or simulation) in context.

      Similarly, in responding to Reviewers’ comments, we clarify what we understand by their use of the word “model” in each case.

      3) Addressing an error in the agent-based simulation code

      We (OM and OP) have now corrected an inadvertent unit typo in the agent-based simulation code. Before the fix, the discharging time (Td) was set to 10000 ms; it is now correctly set to 100 ms. The erroneous value caused very slow discharges, keeping the voltage high until any β increment was received. This resulted in more frequent bursts than we would expect from the model dynamics. In our responses to the reviewers, we refer to this fix as the “unit typo”. We corrected the agent-based simulation panels in Figs. 3 and 5 to reflect the new numerical results, as well as the corresponding sections of the text.

      4) Addressing changes to experimental dataset

      We increased the size of our N=1 dataset (N is the number of fireflies) to match the 10 samples reported in the original text. Additionally, we have added a characterization of the dataset sizes for N=5, 10, 15, and 20 fireflies.

      5) Response to Reviewer 1

      We thank the Reviewer for their kind remarks and for highlighting the strengths of the paper.

      Regarding concerns raised, point by point:

      Reviewer #1 (Public Review):

      Weaknesses:

      The work presented here is an excellent start at understanding the collective behavior of this particular species of firefly. However, the model does not apply to other species in which individual males are intrinsically rhythmic. So the model is less general than it may appear at first.

      We take the Reviewer’s point well. We have added text to the paper to clearly highlight this point.

      The modeling framework is also developed under the very stylized conditions of experiments conducted in a small tent. While that is a natural place to begin, future work should consider the conditions that fireflies encounter in the wild. Swarms that are spread out in space would require a model with a more complicated structure, perhaps with network connectivity and coupling strengths that both change in time as fireflies move around. This is not so much a weakness of the present work as a call to arms for future research.

      We agree with the Reviewer that this is an exciting call to arms for future research!

      Other comments:

      This assumption that all individuals have the same IBI distribution could be directly tested. Has this been done? If not, why not? e.g. Are there difficulties with letting one firefly flash long enough to collect sufficient data to fill out the distribution?

      1. We have generalized the theory to directly address this point by relaxing the assumption that all individuals exhibit the same inter-burst interval distribution. In short: the main insights continue to hold and we discuss the nuances in the text.

      2. Experimentally, the hypothesis that all single fireflies isolated from the group exhibit the same interburst interval (IBI) distribution could not be rigorously tested. The main reason is practical: to compare IBI distributions across individuals, we would need to collect a large number of fireflies and track them for long durations. This was not realistic given our experimental setup and the short window of firefly emergence. In addition, external environmental factors might slightly alter behavior, making comparisons even more complex. Given the paucity of field data, we therefore retain the assumption that all individual fireflies follow the same IBI distribution.

      The derivation given in 6.2.1 is clearer than the approach taken here, which unnecessarily introduces Q, q, and c and then never uses them again.

      We agree with the Reviewer and have accordingly revised the manuscript.

      We have also implemented the suggested edits in the marked up manuscript. We are grateful for the detailed feedback, which helped us substantially extend results, and improve presentation and clarity.

      6) Response to Reviewer 2

      We thank the Reviewer for their thorough feedback. We provide point by point responses below.

      Reviewer #2 (Public Review):

      1) The biological relevance of certain hypotheses is insufficiently discussed. This is important because if the observed behaviour is a universal one, alternative models may explain it as well.

      We thank the reviewer for raising this point. The main hypotheses underlying our models are: 1) individual fireflies in isolation flash at random intervals; 2) these random intervals are drawn from the empirical distribution reported (implicitly: all fireflies follow the same distribution); 3) once a firefly flashes, it triggers all others. Hypothesis 1) is directly supported by the data presented. Hypothesis 2) is comprehensively addressed in the revised manuscript, as discussed previously. Hypothesis 3) is central to the proposed principle, and enables intrinsically non-oscillating individuals to oscillate periodically when in a group. The resulting phenomenon has been compared to experimental data and extensively discussed in the manuscript. Further, we have also simulated the effect of changing the strength of coupling between fireflies based on this hypothesis in the revised section on agent-based simulation.
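      Hypotheses 1 to 3 as stated, plus the identical-distribution assumption, make the collective interval the minimum of N independent draws from the single-firefly distribution. Its survival function is therefore the single-firefly survival function raised to the Nth power: P(T_N > t) = (1 - F(t))^N. The sketch below applies this to a resampled empirical distribution and returns the expected collective interval. The sample values are hypothetical (in seconds), and the function name is ours, not the authors'. As N grows, the expected interval falls toward the smallest observed single-firefly interval.

      ```go
      package main

      import (
      	"fmt"
      	"math"
      	"sort"
      )

      // expectedGroupInterval returns E[min of N draws] when each of N fireflies
      // draws an index uniformly (with replacement) into the empirical samples,
      // and the first flash triggers all others. With samples sorted ascending,
      // P(min index >= k) = ((n-k)/n)^N for 0-based k.
      func expectedGroupInterval(samples []float64, N int) float64 {
      	x := append([]float64(nil), samples...)
      	sort.Float64s(x)
      	n := float64(len(x))
      	e := 0.0
      	for k := range x {
      		atLeastK := math.Pow((n-float64(k))/n, float64(N))
      		atLeastNext := math.Pow((n-float64(k)-1)/n, float64(N))
      		e += x[k] * (atLeastK - atLeastNext) // P(min index == k)
      	}
      	return e
      }

      func main() {
      	samples := []float64{12, 15, 20, 30, 45} // hypothetical single-firefly IBIs (s)
      	for _, n := range []int{1, 5, 10, 15, 20} {
      		fmt.Printf("N = %2d: expected group interval = %.2f s\n", n, expectedGroupInterval(samples, n))
      	}
      }
      ```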

      2) Comparison between the models and the data could be improved, in particular through quantification of the differences between distributions and sensitivity analysis of the numerical results.

      1. Regarding the comparison of the agent-based simulations with experimental data, in Fig. 7 we compare the underlying distributions using the two-sided Kolmogorov-Smirnov test for goodness of fit. This appears to us the most straightforward and informative approach, and it avoids over-fitting.

      2. Regarding sensitivity analysis for the agent-based simulations, for each β value from 0 to 1 we statistically compared the simulations to the experimental distributions to find the best-fitting β.

      3. Finally, experimental constraints leave sparse data for characterizing the interburst distribution. We therefore strive to strike a delicate balance in applying sophisticated statistical tools. Those tools suit comparisons of the theoretical and simulated distributions (for which arbitrarily large samples are available) but must also serve comparisons with the finite samples of the empirical distributions. As such, we think it is apposite to use the first two moments of the respective distributions in Fig. 3 to show the striking similarity of trends.

      Reviewer #2 (Recommendations for the authors):

      A. The assumption that single-firefly spikes obey the same distribution (there is no individual variation in the frequency, or even of the composing number of bursts, of the flash) does not seem to have been verified on the data, that are instead pulled together in one single distribution (Fig. 1D). Moreover, the main feature of such distribution is that it has a minimum at 12 secs (discarding the faster bursts that are not considered in the model) and that it is sufficiently skewed so that it takes a minimal coupling for collective synchrony to emerge. I think that the agreement between the distributions for different N would be more meaningfully discussed having previous work as a reference, whereas now this is relegated to the discussion, so that it is unclear how much of the theoretical results are novel and/or unexpected. Quantification of the distance between distributions would also be interesting: it looks like the two models (analytical and simulations) disagree more among themselves than with the data.

      Regarding the hypothesis that all individual fireflies exhibit the same interflash interval distribution, please see our response to Main Point 1. Regarding the comparison between the analytical theory and the numerical simulations, Figs. 3 and 5 have been revised after a unit typo was found in the code (see Section 3). Following the update, the analytical and numerical results agree in (1) the location of the peak in Fig. 3 for all N values, and (2) the approach of the peak to the minimum of the input distribution as N increases.

      B. If I understand correctly, simulations are introduced as a way to get a dependence on the intensity of the coupling (\beta). There are several issues here. First, I do not see how the coupling constant could change in the present experimental setup, where all fireflies presumably see each other (unlike when there is vegetation). Second, looking at Fig. 3, the critical coupling strength appears to depend very weakly on N, and it is not clear how the 'detailed comparison' that leads to the fit is realized (in fact, the fitted \betas look larger than those at which the transition occurs in Fig. 3A). I think a sensitivity analysis is needed in order to understand how results change when \beta is changed, and also what the effect of the natural Tb distribution (Fig. 2F) is. Results of the simulations might be clearer if, instead of using the envelope of the experimental results, the authors tried to fit it to a standard distribution (e.g. Poisson) so that it can be regularized. This should make it possible to trace the boundary between asynchronous and synchronous firing with higher resolution.

      We have included agent-based numerical simulations to provide a concrete instantiation of the theoretical principle and of the analytical results in the preceding section. The analytic theory has no fitting parameters. In the agent-based simulations, we introduce one additional fitting parameter (β) to examine what happens when we relax one hypothesis of the analytical theory: the instantaneous triggering of all fireflies by an initial flasher. Additionally, the agent-based simulations pave the way for future work, allowing for convenient exploration of the connectivity between individuals and analysis of the behavior of individual fireflies. In this context, please note that Fig. 5 has been corrected (see above), leading to a stronger co-dependence of β and N. In addition to the envelopes, we also report the trends in the first empirical moments (mean and STD) for comparison and for tracking the transition to synchrony.

      C. More care should be put into explaining which initial conditions are assumed for the different models. For instance, the results of paragraph 3 are understandable if all fireflies are initialized just after firing, something that is only learned at the end of the paragraph. I also wonder whether initial conditions may be related to T_bs in the low-coupling region of Fig. 3A not being uniformly distributed, as I would have expected for a desynchronized population.

      We have clarified that, indeed, all fireflies are re-initialized after firing. The initial conditions then become a new random vector of interflash intervals. Importantly, we found after receiving the reviews that, due to inconsistent units in our numerical simulation code, Fig. 5 was incorrect. With proper units, the new results show a much more widespread distribution at low coupling, as expected by the Reviewer.

      D. I found that equations were hard to understand, either because one of the variables was not precisely (or at all) defined, or because some information was missing.
      - Eq. 1: q is not defined.
      - Eq. 2: explain what it means (the probability that the others have not flashed, times the probability that this one flashes). Also, state explicitly what the 'corresponding PDF' is.
      - Eq. 3: the equation for \epsilon(t), to which this is coupled, is missing.

      Why introduce \beta_{i,j} and T_bi if they are then taken to be independent of the indices? Definitions of collective and group burst interval should be provided. It would be clearer if t_b0 were defined in the first paragraph of the results, so as to also clarify its relation to T_b. Define T^i_b in the caption of Fig. 3 (they are defined later than the figure is first discussed). The definition of 'the vertical axis label' (maybe find a word for that...) is rather cumbersome. I could imagine that other definitions would allow the lines in Fig. 3E to converge to the same line for large betas. That would make more sense, since in the strong-coupling limit I see no reason why the collective spiking should differ across N (the analytical model could help here).

      Thank you for these comments; we have incorporated these and related changes.
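      The reviewer's reading of Eq. 2 in point D ("the probability that the others have not flashed times the probability that this one flashes") matches the standard density of the earliest of several independent events. As a sketch, assuming N fireflies whose interflash intervals are independent and share a PDF f(t) and CDF F(t) (notation chosen here for illustration, not taken from the manuscript):

```latex
% One firefly flashes at t (density f(t)); each of the other N - 1 has not yet
% flashed (probability 1 - F(t) each); any of the N fireflies can be the first.
p_{\mathrm{first}}(t) = N\, f(t)\, \bigl[1 - F(t)\bigr]^{N-1},
\qquad
\mathbb{E}[T_{\mathrm{first}}] = \int_0^{\infty} \bigl[1 - F(t)\bigr]^{N}\, dt .
```

      Because [1 − F(t)]^N shrinks as N grows, the expected time to the first flash falls with group size under these assumptions, in line with the density dependence discussed in point E.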

      E. I think that the authors' reading of the two 'dynamical quorum sensing' papers they cite is incorrect: De Monte et al. did not use the Kuramoto model but the same limit-cycle oscillators as in Strogatz, and Taylor et al. consider excitable systems, potentially closer to noisy integrate-and-fire models, at least in that they lack self-sustained oscillations. Both papers show that oscillations appear above a certain density threshold and that their frequency increases with density, as found in this work. A more accurate link to previous publications in synchronization theory, including the models by Kurths and colleagues for fireflies, would be useful in both the Introduction and the Discussion, and would help the reader to position this work and appreciate its original contributions.

      1. Thank you for pointing out an inaccuracy in our literature citations regarding synchronization. We have now made corrections to address this point.

      2. While we take the Reviewer’s point, our theoretical framework (“model”), which builds on the principle of emergent periodicity we propose here, differs fundamentally from extant “models” in the nature of its individuals. The reference in question treats individuals as oscillators, so the fastest frequency is that of the fastest individual oscillator. In contrast, in our work there is no fastest individual oscillator, and the “fastest frequency” has a completely different meaning, since individuals have no particular frequency associated with them. In this sense, our work is not inspired by theirs. That said, we have included the citations suggested by the Reviewer.

      F. The authors say that part of the data is unpublished. I guess they mean that the whole data set will be published with this manuscript. I think the formulation is ambiguous.

      Thank you for this comment. We have now clarified that the data will indeed be published with the manuscript.

    1. (3) A third direction, and I would say maybe the most popular one in AI alignment research right now, is called interpretability. This is also a major direction in mainstream machine learning research, so there’s a big point of intersection there. The idea of interpretability is, why don’t we exploit the fact that we actually have complete access to the code of the AI—or if it’s a neural net, complete access to its parameters? So we can look inside of it. We can do the AI analogue of neuroscience. Except, unlike an fMRI machine, which gives you only an extremely crude snapshot of what a brain is doing, we can see exactly what every neuron in a neural net is doing at every point in time. If we don’t exploit that, then aren’t we trying to make AI safe with our hands tied behind our backs?

      Interesting metaphor - it is a bit like MRI for neural networks but actually more accurate/powerful
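      The "complete access" point in the quoted passage can be made concrete with a toy network. The sketch below is a hypothetical two-layer network with hand-picked weights (no training, no real interpretability library); it simply logs the exact value of every hidden neuron on every input, the kind of readout the passage contrasts with an fMRI snapshot.

```python
import math


def relu(x):
    return max(0.0, x)


class TinyMLP:
    """Hypothetical two-layer network: inputs -> ReLU hidden layer -> sigmoid output."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x, trace=None):
        # Each hidden neuron: weighted sum of the inputs plus a bias, then ReLU.
        hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        if trace is not None:
            # Full access: record the exact activation of every hidden neuron.
            trace.append(hidden)
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))


net = TinyMLP(w1=[[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]],
              b1=[0.0, -0.25, 0.1],
              w2=[1.0, -2.0, 0.5],
              b2=0.0)
trace = []
for x in ([1.0, 0.0], [0.0, 1.0]):
    net.forward(x, trace=trace)
for step, activations in enumerate(trace):
    print(step, activations)
```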

    1. Author Response

      Reviewer #1 (Public Review):

      It is a strength of the current manuscript that it provides a near-complete picture of how the metamorphosis of a higher brain centre comes about at the cellular level. The visualization of the data and analyses is a weakness.

      I do not see any point where the conclusions of the authors need to be doubted, in particular as speculations are expressly defined as such whenever they are presented.

      The fact that molecular or genetic analyses of how the described metamorphic processes are organized are not presented should, I think, not compromise enthusiasm about what is provided at the cellular level.

      We appreciate the comments and guidance that Reviewer #1 has given us on data presentation. We have tried to simplify the figures and make the images larger. For the developmental figures, a couple of illustrative examples are provided in the main figure, with the remainder given in “figure supplements”.

      Reviewer #2 (Public Review):

      This very nice piece of work describes and discusses the developmental progression of larval neurons of the mushroom body into those in the adult Drosophila brain. There are many surprising findings that reveal a number of strategies for how brain development has evolved to serve both the early functions specific to the larval brain and then their eventual roles in the adult brain. I think it is fascinating biology and I was educated while reviewing the paper.

      Line 115-116. 'Output from PPL1 compartments direct avoidance behavior, while that from PAM compartments results in attraction'. This is not correct and is actually reversed. The learning rule is depression so that aversive learning reduces the drive to approach pathways whereas appetitive learning reduces the drive to avoidance pathways. This should be corrected and reference made to studies demonstrating learning-directed depression.

      Line 222. 'It provides feed-forward inhibition from γ4>2>1.' I could be wrong, but I'm not aware of functional evidence that this glutamatergic neuron is inhibitory. It's currently speculation.

      We have noted that this function was proposed by Aso et al.

      Line 242. I think it would be nice if the authors focused on extreme changes and showed larger and nicer images. The rest can be summarized but why not pick a few of the best examples to illustrate the strategies they consider in the discussion?

      We have reduced the number of neurons shown in the new Figs 5 and 6. Hopefully, the images are now large enough to appreciate. Data for the remaining neurons are now in Figure Supplements for Figs 5 and 6.

      Line 249 'became sexually dimorphic'. I may have missed it somewhere but this immediately made me think about the sex of all the images that are shown. Is this explicitly stated somewhere? Was it tracked in all larvae, pupae, and adults?

      The Methods now begin by addressing this point. In an initial screen, we found sex-specific differences only in MBIN-b1 and -b2. After that, we kept no record of the sex of the flies used, except for these two cells.

      Reviewer #3 (Public Review):

      Truman et al. investigated the contribution and remodeling of individual larval neurons that provide input and output to the Drosophila mushroom body through metamorphosis. To this end, they used a collection of split-GAL4 lines targeting specific larval mushroom body input and output neurons, in combination with a conditional flip-switch and imaging, to follow the fates of these cells.

      Interestingly, most of these larval neurons survive metamorphosis and persist in the adult brain; only a small percentage of neurons die. The authors also elegantly show that a substantial number of neurons actually trans-differentiate and play a different role in the larval brain from their final adult role (similar to their role in hemimetabolous insects). This process is relatively understudied in neuroscience and of great interest.

      Using the ventral nerve cord as a proxy, the authors claim that the larval state of the neuron would be their derived state, while their adult identity is ancestral. While the authors did not show this directly for the mushroom body neurons under study, it is a very compelling hypothesis. However, writing the manuscript from this perspective and not from the perspective of the neuron (which first goes through a larval state, metamorphosis, and finally adult state), results in confusing language and I would suggest the authors adjust the manuscript to the 'lifeline' of the neuron.

      We have tried to be more “linear” in our presentation. This should make the text less confusing.

      In general, this manuscript does not explain how the larval brain has evolved as the title suggests but instead describes how the larval brain is remodeled during metamorphosis. It thus generates perspectives on the evolution of metamorphosis, rather than the larval state. Additionally, this manuscript would benefit from major rearrangements in both text and figures for the story to be better comprehended.

      We think that the end of the Discussion does relate to how a larval brain evolves. The evolution of the larval brain is faced with constraints related to the shortened period of embryonic development and the highly conserved temporal and spatial mechanisms that insects use to generate their neuronal phenotypes. These constraints result in a potential mismatch between the neurons that are needed and those that are actually made (revealed by the adult phenotypes of these neurons). The larva then turns to trans-differentiation to temporarily transform unneeded (or dead) neurons into the missing cell types to build its larval circuits.

      We think that these ideas provide some new insights into how a larval brain may have evolved and that our title is appropriate.

      The introduction is very focused on the temporal patterning of the insect nervous system, while none of the data collected incorporate this temporal code. Temporal patterning comes back in the discussion but is purely speculative.

      Speculation about the importance of temporal patterning is now introduced late in the Discussion, with reference to Figure 12.

      Furthermore, the second part of the introduction describes one strategy for remodeling and why that strategy is not likely but does not present an alternative hypothesis. The first section of the results might serve as a better introduction to the paper instead, as it places the results of the paper better and concludes with the main findings. The accompanying Figure 1 would also benefit from a schematic overview of the larval and adult mushroom bodies as presented in Fig. 2A (left).

      This has been revised in the spirit of these comments.

      In the second results section, the authors show the post-metamorphic fates of mushroom body input and output neurons and introduce the concept of trans-differentiation. Readers might benefit from a short explanation of this process. I also encourage the authors to revisit this part of the text since it gives the impression that the neurons themselves undergo active migration (instead of axon remodeling).

      We have tried to make it clear that there is no cell migration. Rather, larval arbors are retracted or fragmented, followed by outgrowth to new, adult targets.

      The discussion starts with a very comprehensive overview of the different strategies that neurons could use during metamorphosis (here too, re-writing the text from the neurons' perspective would increase the reflection of what actually happens to them).

      The Discussion now begins with the gross changes in the MB, with reference to its compartments, and then moves to changes in individual cells. We have reduced our discussion of the metamorphic strategies of cells and no longer include Fig 8A.

      The discussion covers multiple topics concerning trans-differentiation, metamorphosis, memory, and evolution and is often disconnected from the results. It could be significantly shortened to discuss the results of the paper and place them in current literature. Generally, the figures supporting the discussion are hard to comprehend and often do not reflect what the text is saying they are showing.

      The Discussion is still long, but, hopefully, our organization now makes it much easier to read and comprehend.

    1. As it happens, there are many flavors of style guides, including documentation for brand identity, writing, voice and tone, code, design language, and user interface patterns.

      We need this accounted for in our Design System

    1. I will be writing about a series of concepts I’ve been developing called the “simples” of digital literature. Each of these simples describes some element of the deep structure of the text/algorithm interaction inherent in all digital textuality — those places where the mathematical underpinnings of text as it appears on the screen (since there is always something at work keeping the text you are reading now visible) and how artists exploit them to create unique effects.

      This is the most inclusive definition of e-lit I have encountered. It goes beyond text or storytelling in a digital medium to the actual mathematical underpinnings of how text gets onto a screen, and even the fragility of how long it can stay there. There is a process beyond the interaction between reader/viewer and the story/work: one between the computer and the text/code, linking it all together at its root.

    1. In addition, since 2006 our Code of Conduct has formed the basis for how we act in internal processes, and it is part of our agreements with our direct suppliers. We share with our suppliers the responsibility for addressing risks in the production chains. That is why we require our business partners to take appropriate measures in their own supply chains as well, and to pass the Code of Conduct on to their suppliers.

      Lidl shares responsibility with its direct suppliers

    1. Building programs by reusing generic components will seem strange if you think of programming as the act of assembling the raw statements and expressions of a programming language. The integrated circuit seemed just as strange to designers who built circuits from discrete electronic components. What is truly revolutionary about object-oriented programming is that it helps programmers reuse existing code, just as the silicon chip helps circuit builders reuse the work of chip designers.
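      The contrast the passage draws, between assembling raw statements and reusing a generic component, fits in a few lines. As a sketch, Python's standard-library collections.Counter stands in for the reusable component (the passage itself is about object-oriented class libraries in general, not this particular class); the two function names are hypothetical.

```python
from collections import Counter


def word_counts_by_hand(text):
    # Assembling raw statements: loop, branch, and update a dict ourselves.
    counts = {}
    for word in text.lower().split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def word_counts_reused(text):
    # Reusing a generic, already-tested component from the standard library.
    return Counter(text.lower().split())


text = "the chip and the circuit and the code"
print(word_counts_by_hand(text))
print(word_counts_reused(text).most_common(2))
```

      Both functions produce the same counts; the reused version also brings extras such as most_common for free, which is the kind of leverage the passage attributes to reusable components.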

      Oh man, this metaphor really fell apart and, if anything, works against itself.

      "If integrated circuits are superior to discrete components, why exactly are we supposed to be recreating the folly of reaching for reusable components in creating software?"

    1. For each successful payment, the Checkout returns the following:

      * razorpay_payment_id
      * razorpay_order_id
      * razorpay_signature

      Upon payment success, the customer is redirected to the URL provided in the code.

      For every successful payment, razorpay_payment_id, razorpay_order_id and razorpay_signature are submitted via a POST request to the callback_url.
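      The three fields posted to the callback_url are typically checked on the server before an order is marked as paid. A minimal sketch, assuming the scheme Razorpay documents for Standard Checkout (a hex-encoded HMAC-SHA256 of order_id + "|" + razorpay_payment_id, keyed with the account's key secret); confirm it against the current Razorpay documentation before relying on it. The function name and the test values are hypothetical.

```python
import hashlib
import hmac


def verify_razorpay_signature(order_id, payment_id, signature, key_secret):
    """Check the razorpay_signature posted to the callback_url.

    Assumes the documented Standard Checkout scheme: HMAC-SHA256 over
    "<order_id>|<payment_id>", keyed with the key secret, hex-encoded.
    """
    message = f"{order_id}|{payment_id}".encode()
    expected = hmac.new(key_secret.encode(), message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

      Run this only server-side; the key secret should never reach the browser. hmac.compare_digest is used instead of == so that the comparison time does not reveal how much of the signature matched.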

    1. Sometimes you may want to access the native browser event object inside your own code. To make this easy, Alpine automatically injects an $event magic variable:

      $event binds the native browser event object: <button @click="$event.target.remove()">Remove Me</button>

    1. It can only be done through ranking and relevance algorithms, the more localized the better.

      The synopsis here is that:

      * Artificial intelligence selects our content for us.
      * You can't "open up the AI." To constrain AI, you have to be explicit about what you want and code it in, and we "aren't there yet." We are just beginning to set up "computational contracts" (reference: blockchains).

      What can be done now: why not create a marketplace of algorithms from which people choose for themselves? Third parties. This shouldn't inherently detract from the current content-selection business.

      * "And here’s another important thing: right now there’s no consistent market pressure on the final details of how content is selected for users, not least because users aren’t the final customers. (Indeed, pretty much the only pressure right now comes from PR eruptions and incidents.) But if the ecosystem changes, and there are third parties whose sole purpose is to serve users, and to deliver the final content they want, then there’ll start to be real market forces that drive innovation—and potentially add more value."
      * "That has to come from outside—from humans defining goals."
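      The proposed marketplace of third-party ranking algorithms, with users choosing which one selects their content, can be sketched as a pluggable scoring function. Everything here is hypothetical (the Item fields, the ranker names, select_content); it only illustrates the architecture, with one ranker favoring the most local items in the spirit of "the more localized the better."

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    title: str
    distance_km: float    # how far from the user the item originates
    recency_hours: float  # how long ago the item was posted


# A ranker is any function that scores an item; higher scores rank first.
Ranker = Callable[[Item], float]

# Hypothetical third-party rankers a user could pick from a marketplace.
MARKETPLACE: Dict[str, Ranker] = {
    "most_local": lambda item: -item.distance_km,
    "most_recent": lambda item: -item.recency_hours,
}


def select_content(items: List[Item], ranker_name: str, limit: int = 3) -> List[str]:
    """Order items with the ranker the user chose, not one fixed by the platform."""
    ranker = MARKETPLACE[ranker_name]
    ranked = sorted(items, key=ranker, reverse=True)
    return [item.title for item in ranked[:limit]]


feed = [
    Item("Town meeting", 2.0, 30.0),
    Item("World news", 5000.0, 1.0),
    Item("Bake sale", 0.5, 48.0),
]
print(select_content(feed, "most_local"))
print(select_content(feed, "most_recent"))
```

      Keeping the ranker behind a plain function type is what would let outside parties compete to supply it, while the platform keeps serving the same items.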

    1. What explains this success is that Apple provided a reference manual detailing all of the source code and the electrical and electronic schematics. Anyone could then freely develop new software and new peripherals.

      Today's Apple would not give its users the tools they need to develop new software and peripherals. It is striking to see how these companies evolve according to logics different from the ones they championed in their early days.