10,000 Matching Annotations
  1. Feb 2025
    1. Reviewer #1 (Public review):

      Summary:

      In the paper, the authors investigate how the availability of genomic information and the timing of vaccine strain selection influence the accuracy of influenza A/H3N2 forecasting. The manuscript presents three key findings:

      (1) Using real and simulated data, the authors demonstrate that shortening the forecasting horizon and reducing submission delays for sharing genomic data improve the accuracy of virus forecasting.

      (2) Reducing submission delays also enhances estimates of current clade frequencies.

      (3) Shorter forecasting horizons, for example those allowed by the proposed use of "faster" vaccine platforms such as mRNA, result in the most significant improvements in forecasting accuracy.

      Strengths:

      The authors present a robust analysis, using statistical methods based on previously published genetic-based techniques to forecast influenza evolution. Optimizing prediction methods is crucial from both scientific and public health perspectives. The use of simulated as well as real genetic data (collected between April 1, 2005, and October 1, 2019) to assess the effects of shorter forecasting horizons and reduced submission delays is valuable and provides a comprehensive dataset. Moreover, the accompanying code is openly available on GitHub and is well-documented.

      Weaknesses:

      While the study addresses a critical public health issue related to vaccine strain selection and explores potential improvements, its impact is somewhat constrained by its exclusive reliance on predictive methods using genomic information, without incorporating phenotypic data. The analysis remains at a high level, lacking a detailed exploration of factors such as the genetic distance of antigenic sites.

      Another limitation is the subsampling of the available dataset, which reduces several tens of thousands of sequences to just 90 sequences per month with even sampling across regions. This approach, possibly due to computational constraints, might overlook potential effects of regional biases in clade distribution that could be significant. The effect of dataset sampling on presented findings remains unexplored. Although the authors acknowledge limitations in their discussion section, the depth of the analysis could be improved to provide a more comprehensive understanding of the underlying dynamics and their effects.
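
      To make the sampling concern concrete, here is a minimal Python sketch of even subsampling across regions within each month. It is an illustration only, not the authors' pipeline: the record keys `month` and `region`, the function name `subsample_evenly`, and the round-robin strategy are all assumptions made for this example.

```python
import random
from collections import defaultdict

def subsample_evenly(records, per_month=90, seed=0):
    """Pick up to `per_month` records per month, spread evenly across regions.

    `records` is a list of dicts with hypothetical keys "month" and "region".
    """
    rng = random.Random(seed)
    # Group records by month, then by region within each month.
    by_month = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_month[rec["month"]][rec["region"]].append(rec)

    selected = []
    for month in sorted(by_month):
        pools = by_month[month]
        for pool in pools.values():
            rng.shuffle(pool)
        picked = []
        # Round-robin over regions so that no single region dominates the month.
        while len(picked) < per_month and any(pools.values()):
            for region in sorted(pools):
                if pools[region] and len(picked) < per_month:
                    picked.append(pools[region].pop())
        selected.extend(picked)
    return selected
```

      Under this scheme a region with only 10 sequences in a month contributes all 10, and the remaining quota is split among the data-rich regions. Rerunning with different `seed` or `per_month` values is one simple way to probe how sensitive downstream clade frequency estimates are to the subsampling.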

  2. inst-fs-iad-prod.inscloudgate.net
    1. Second-generation immigrants, however, have certain obvious and consequential advantages over their foreign-born peers. Youth of the second generation will not have to contend with the intense disorientation of arriving in a new country. They do not have to learn from scratch the cultural nuances and social etiquette that make life predictable and easier to manage. Learning the new cultural code is stressful and exhausting, as anyone living in a foreign land for a few weeks can attest.

      Second-generation immigrants have an easier time adjusting compared to those who move to a new country, but they still face challenges. They don’t have to learn a new language or adapt to unfamiliar customs, but they still grow up balancing two cultures. Even though they are born in the U.S., they might still experience discrimination or feel different from their peers. This shows that while their struggles may not be the same as first-generation immigrants, they still have to navigate their own unique difficulties.

    1. Editors Assessment:

      DNA has huge potential as a data storage medium because of its incredibly high storage density and stability. This work addresses the potential of modified bases, specifically 5-methylcytosine (5mC), in enhancing DNA data storage systems. This paper introduces a transcoding scheme named R+, which incorporates this modified 5mC base to increase information density beyond the standard limits. By encoding various file types into DNA sequences of between 1.3 and 1.6 kb in size, this method achieves an average recovery rate of 98.97% (with reference), validating the effectiveness of the method. On top of a wet-lab protocol (hosted on protocols.io) for the experimental validation of the transcoding scheme, it also includes open source code for in silico simulation tests. Peer review scrutinised the protocols and validation to check that they are reusable and provide convincing results. As nanopore sequencing has enabled reading of these modified bases, it is timely to make them applicable as extra letters in the molecular alphabet for DNA data storage.

      This evaluation refers to version 1 of the preprint

    2. Abstract: The DNA molecule is a promising next-generation data storage medium. Recently, it has been theoretically proposed that non-natural or modified bases can serve as extra molecular letters to increase the information density. However, the feasibility of the strategy is challenging due to the difficulty in synthesizing and the complex structure of non-natural DNA sequences. Here, we described a practical DNA data storage transcoding scheme named R+ based on an expanded molecular alphabet by introducing 5-methylcytosine (5mC). We also demonstrated the experimental validation by encoding one representative file into several 1.3~1.6 kb in vitro DNA fragments for nanopore sequencing. The results show an average data recovery rate of 98.97% and 86.91% with and without reference respectively. This work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications. Availability & Implementation: R+ is implemented in Python and the code is available under the MIT license at https://github.com/Incpink-Liu/DNA-storage-R_plus
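
      As a rough illustration of why an extra molecular letter raises information density, the sketch below computes the theoretical upper bound of log2(alphabet size) bits per position. It assumes 5mC acts as a single fifth letter alongside A, C, G and T. The function name `bits_per_nt` is hypothetical, and the numbers are an idealized bound, not the density that R+ itself achieves.

```python
import math

def bits_per_nt(alphabet_size):
    """Theoretical upper bound on information per position for a given alphabet."""
    return math.log2(alphabet_size)

standard = bits_per_nt(4)        # A, C, G, T
expanded = bits_per_nt(5)        # assumption: A, C, G, T plus 5mC as a fifth letter
gain = expanded / standard - 1   # relative increase of the upper bound
```

      Under this assumption the bound rises from 2 bits to about 2.32 bits per nucleotide, roughly a 16% increase. Practical schemes give up part of this to sequence constraints and error correction.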

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.147). These reviews (including a protocol review) are as follows.

      Reviewer 1. Abdur Rasool

      Is the source code available, and has an appropriate Open Source Initiative license been assigned to the code? However, the Git links have a typo; the working code is available at https://github.com/Incpink-Liu/DNA-storage-R_plus

      Is the code executable?

      Unable to test. Complete execution of the given code requires time and resources.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined? Unable to test. Additional Comments: This manuscript focuses on DNA data storage based on an expanded molecular alphabet. In view of the challenges of non-natural bases in synthesis, sequencing, and compatibility, the manuscript proposes a DNA data storage scheme containing 5-methylcytosine based on the theory that modified bases can replace non-natural bases as extra molecular letters and develops an adaptive transcoding algorithm named R+ for corresponding experimental validation. The high data recovery rate obtained from sequencing analysis demonstrates its practicability.

      This manuscript provides a simple but relatively universal transcoding algorithm for DNA data storage that introduces additional molecular letters. The proposed DNA data storage scheme outperforms conventional DNA data storage in the potential development of information density. Considering the anticipated decrease in future synthesis costs and the expected advancements in relevant transcoding algorithms, my outlook remains optimistic regarding the potential application of this scheme. I suggest that the manuscript could be accepted after a few minor revisions listed below:

      1. Figure 3 in the paper could be further modified, specifically minimizing the excess white space on both sides of Subfigure A to make it more aesthetically pleasing.
      2. The subfigures A, B, and D in Figure 2 and Figure S2 both demonstrate the difference between poem.txt/program.py and the other four files. However, the manuscript lacks an explanation for this phenomenon. Is it relevant to the file size?
      3. The 8 nt adaptors play a key role during the sequence assembly in the experimental validation, so I suggest supplementing the specific generation process of these linkers. Text descriptions or flow charts are acceptable.
      4. It would be better to add the in silico simulation to the Methods to make its structure more complete.
      5. For the practicality of DNA storage, I suggest citing https://onlinelibrary.wiley.com/doi/10.1002/smtd.202301585 and https://academic.oup.com/bib/article/25/5/bbae463/7759103.
      6. Provide the correct URLs of GitHub links for reproducibility.

      Reviewer 2. Bi Kun

      Are there (ideally real world) examples demonstrating use of the software?

      No. Additional Comments:

      In this study, a practical DNA data storage transcoding scheme named R+, based on an expanded molecular alphabet, is proposed to increase the information density. The experimental validation demonstrates the practicability of DDS-5mC and highlights the enormous potential of modified bases, represented by 5mC, in the field of DNA data storage. Overall, the methods and results look appropriate and promising, but there are minor issues that need to be addressed.

      1. Please indicate the proportion of substitution: insertion: deletion in the error rates of Fig. 4C and D.
      2. What is the meaning of the vertical axis of Fig. 2B? Is it the number of homopolymers per sequence, the longest length of homopolymers, or something else?
      3. Line 304: please add an "s" ("References").
      4. The last sentence of the Abstract: "This work validates the practicability of 5mC over other non-natural bases in DNA storage systems". Please make it correspond with the last paragraph of Results (151-154).
      5. Depending on the journal's guidelines, a Conclusion section can be added if necessary.
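
      The two readings of the Fig. 2B axis raised in point 2 can differ substantially for the same sequence. The following Python sketch computes both; it assumes sequences are plain strings with one character per base and that a homopolymer is a run of at least two identical characters. Both the threshold and the function names are assumptions for illustration, not definitions from the manuscript.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=2):
    """Return the lengths of all runs of identical bases of at least `min_len`."""
    run_lengths = (len(list(group)) for _, group in groupby(seq))
    return [n for n in run_lengths if n >= min_len]

def homopolymer_summary(seq, min_len=2):
    """Report both candidate metrics: the number of runs and the longest run."""
    runs = homopolymer_runs(seq, min_len)
    return {"n_homopolymers": len(runs), "longest_run": max(runs, default=0)}
```

      For example, `homopolymer_summary("AAACGGTTTTC")` counts three homopolymers (AAA, GG, TTTT) with a longest run of 4, so a figure axis needs to state which of the two quantities it reports.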

      Reviewer 3. Lifu Song

      This manuscript explores the application of 5-methylcytosine (5mC) as an additional molecular letter in DNA data storage systems, expanding the molecular alphabet to increase information density. The authors present a novel transcoding scheme (R+) and validate it with both in silico and experimental data. The study explores GC content, homopolymer distribution, and data recovery rates under various conditions, offering detailed insights into practical applications. Experimental validation with nanopore sequencing demonstrates real-world feasibility. By improving storage density and ensuring compatibility with nanopore sequencing, the study addresses significant challenges in incorporating non-natural bases into DNA storage systems. Overall, the manuscript is well-structured and addresses a highly relevant topic in DNA data storage, offering valuable contributions to the field. I recommend it for publication, subject to minor revisions to enhance clarity and precision.

      Suggested minor revisions: 1) Although substitution errors, particularly between C and 5mC, were discussed, the manuscript does not provide a detailed explanation of how these errors affect long-term storage or large-scale applications, both of which are critical for archival storage, the primary use case of DNA data storage technology. 2) The manuscript could benefit from a broader comparison with other high-density DNA storage strategies, such as composite DNA letters, to contextualize the benefits and limitations of 5mC. 3) The discussion could be expanded to address practical challenges, such as strategies to reduce synthesis costs and improve sequencing accuracy for modified bases like 5mC, to provide a more holistic perspective on the technology's scalability.

      Protocol Review: I have taken a look at the experimental protocol associated with this manuscript on the protocols.io website. The protocol looks sensible. I don't have any additional comments about it and am happy for it to go live.

      See: https://dx.doi.org/10.17504/protocols.io.q26g7mr78gwz/v1

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Detailed response to Reviewer comments

      We thank the reviewers for their positive and constructive evaluation of the paper. We have addressed in full the concerns raised as detailed below. We apologize for the long time it took us to respond, which was a consequence of local circumstances in the last year.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The authors analyzed circulating cell-free DNA for COVID-19 using deep sequencing of the methylation and histone modification. The major output was cell-specific quantification. The study involved 120 unvaccinated, hospitalized patients, 19 asymptomatic/mild cases, and 40 controls. Between COVID-19 and controls, they found significant differences in lung epithelial cells, cardiomyocytes, vascular endothelial cells and erythroblasts. The latter two cell types had significant differences even in the asymptomatic patients. It is unclear if the damage seen is related to COVID-19 specifically, or related to general inflammation or infection.

      Strengths of the study include relatively high WGBS/targeted sequencing, along with fragment-level analysis with methods described in their previous work (Loyfer et al. Nature). In addition, they add ChIP-seq data using their published methods. The work comes from a group with leading expertise in methylation cell-free DNA analysis.

      Overall, the work is the most comprehensive analysis to date for COVID-19, and the data would be a valuable resource to the research community. We have major and minor comments that do not necessarily require additional experimental work.

      We thank the reviewer for these supportive comments.

      Major comments:

      1. There is a lack of data, and the methods are not presented in such a way that the results and conclusions can be reproduced and evaluated. Neither the code nor the data to generate the results are available. Both need to be made available during the peer review process.

      Missing data: Fragment-level FASTQ, BAM, or PAT files are needed to reproduce the results. Missing scripts: sharing them, for example on GitHub, is standard and reasonable for reproducing the figures shown. Missing targeted assay method details: The authors should show the data, methods, and details for: "The validation of markers was done using DNA extracted from different cells and tissues, and the methylation status of the CpG block was assessed."

      Thank you. WGBS data files are currently being uploaded to GEO and are waiting for an accession number.

      For the validation of targeted markers, we added a new supplemental table (S11) with data on the methylation status of the loci used in this study in different cells and tissues (i.e. marker specificity), and provided a detailed text and references to the methods used.

      The authors did not list the major limitations of the study in the discussion or elsewhere. These should include (or be addressed with experimental or conclusion changes):

      1) The small sample size of the asymptomatic/mild group (if the emphasis of the paper, as suggested by the title, is on the asymptomatic/mild group; see the next major point).

      Thank you, indeed this is a limitation, and we have now addressed this issue in the text. Despite this limitation, findings regarding this population were statistically significant.

      2) The targeted assay is used on the vast majority of samples, including all of the asymptomatic/mild group. However, it is limited to a particular subset of cell types (total defined by all possible cell types in the body). Those cell types were determined based on WGBS data on only 6 COVID-19 cases.

      Thank you, indeed this is a limitation. WGBS was done on 6 critically ill patients, to uncover the potential cell types that will be of most interest in the targeted assay. In comparison to the WGBS, the targeted assay has a deeper coverage and therefore greater sensitivity. We have now addressed this issue in the limitations section.

      3) The methylation references for the WGBS data were limited to a fraction of all human cell types. For example, this paper was not able to evaluate Schwann cells or peripheral nerves, which was a significant finding for COVID-19 related multisystem inflammatory syndrome (PMID 37279751).

      The WGBS atlas (PMID: 36599988) consists of ~40 cell types that we were able to isolate at a high purity. While this is the most complete methylome atlas of human cell types generated to date, it is indeed incomplete. Unfortunately the scarcity of Schwann cells prevented us from determining the methylome of this cell type, and the matter is to be investigated in future studies. Note that the study referred to by the reviewer described the cell-free transcriptome rather than the cfDNA methylome of patients. cfDNA methylation analysis of Schwann cells remains a challenge to be addressed in future studies. This limitation is explained in the revised text.

      4) The case and control groups (severe, asymptomatic mild, and control) were collected at different times and circumstances, allowing for potential pre-analytical confounders.

      We now addressed this limitation in the text.

      5) cfDNA levels can be influenced by several unmeasured factors, including death, replication leading to more turnover, clearance/stability, and movement from tissue into circulation. The methods used cannot distinguish between these possibilities.

      Indeed, the mechanism by which cfDNA concentration is increased is not fully understood, but is certainly correlated with pathology. We clarify this in the revised text.

      6) (if true) the controls used for the targeted assay were not age/sex matched. The median age for the controls skews younger per Table S1, S2, S3.

      We used control samples that were collected before the pandemic, to make sure that they were not infected with COVID-19. Consequently, there are minor demographic differences (e.g. controls tend to be younger than the hospitalized patients, though similar age to the asymptomatic donors).

      Note that in previous studies, cfDNA levels and origins did not show differences in sex.

      In the WGBS samples, we did age- and sex-match the samples.

      We explain this issue in the revised text.

      7) (optional) It is unclear whether the differences found are attributable to COVID-19, coronavirus infection, viral infections, infections in general, or inflammation in general. The appropriate alternative controls were not addressed in this study. The paper shows some degree of correlation with acute inflammatory markers (CRP, ferritin, neutrophil contribution).

      Indeed, elevated cfDNA from specific tissues reflects tissue turnover or death, with no indication of the cause of pathology. We now addressed this limitation in the text.

      The title is a bit misleading in that it revolves around the asymptomatic patients. However, this is also the group with the lowest representation at n=19. The vast majority of the data is related to the hospitalized patients. While other studies may have looked at hospitalized patients, I agree with the authors that there is merit in deep sequencing and the correlated clinical data.

      Thank you. We chose to highlight in the title the most novel and provocative finding of the study.

      More details on the patient inclusion criteria are needed. Were the asymptomatic/mild positive by PCR test or a point of care immunoassay? We know the viral load is quite dynamic for these patients. What was the timing of the blood draw?

      Likewise, how did you find the hospitalized patients? Was it comprehensive over a period of time? These details help reveal any potential biases in the selection process.

      We do not have information on the viral load in patients. All were positive for a PCR test. For the asymptomatic cases we know the time of the test, and this information is now added in Supplemental Table S2.

      Hospitalized patients were recruited and consented at the Shaare Zedek Medical Center in a rather comprehensive manner – we recruited all patients that we could during May-June 2020. This is explained in the revised methods section.

      Minor comments:

      1. The abstract states: "Asymptomatic patients had elevated levels of immune-derived cfDNA but did not show evidence of pulmonary or cardiac damage." However, in Fig 5, there seems to be a bimodal distribution for the lung epithelial and cardiomyocytes. Unclear if that is an artifact of the graph.

      It is quite interesting that the asymptomatic/mild group seems to have a bimodal distribution in lung epithelial and cardiomyocyte cfDNA. Perhaps this data is not available, and the sample size is small, but could there have been a clinical difference between the two groups (e.g. asymptomatic versus mild, or had symptoms later?). It is unclear how precise the measurements are for the lung epithelial cells.

      Thank you for this comment. Since cfDNA levels of the hospitalized patients are increased by orders of magnitude, we have arranged the graphs in logarithmic scale. Consequently, the bimodality that the reviewer mentions reflects only a slight absolute difference of cfDNA levels from lung and cardiomyocytes: ±1 GE/ml, and we assume that this difference does not reflect clinical significance (and is not statistically different from the controls). This is referred to in the revised text.

      The authors listed 2 prior studies that looked at cell type or tissue damage during COVID-19. There are 2 other studies that I am aware of: PMID: 33651717 (n=84 with n=18 nonhospitalized) but probably shallow WGBS, and 37279751 (n=205 pediatric patients). Importantly, the latter paper found Schwann cells were significantly elevated, which is missing from the current study's assessment. In addition, citation 14 from the same group already found significantly increased vascular endothelial cfDNA in COVID-19 patients with severe disease versus mild. While some findings are consistent, there are also discrepancies.

      As explained above, our DNA methylation atlas does not contain a Schwann cell entry, so we cannot refer to cfDNA from this cell type; the mentioned study used cfRNA to assess this population. This is mentioned in the limitations of the study.

      We now cite more comprehensively existing literature of liquid biopsies in Covid-19, and discuss the potential sources of discrepancy. We believe these result from differences in the methylome atlas, from the higher depth of the targeted assay compared with WGBS, and from our assessment of a unique population of asymptomatic patients.

      Is Fig 2 necessary? Fig. 5 seems to display the same data but with the asymptomatic group.

      Indeed there is some redundancy. Figure 2 shows data on hospitalized patients, and Figure 5 focuses on asymptomatic patients but uses as reference the same controls and severe patients as in Figure 2. We believe that this arrangement helps clarity.

      "Elevated lung cfDNA reflects excessive lung cell death" - recommend this statement is tempered as direct evidence is not available in this study. An alternative explanation could be that endothelial cells are damaged, and it is easier for lung cfDNA to enter blood circulation rather than the respiratory system.

      Thank you for this comment. We have addressed this possibility in the revised Discussion.

      Fig 6: Add unit of measure to heatmap.

      Added.

      Supplemental Fig 1.: Add label to unit of measure in caption or figure. Average or median beta value over a series of CpGs?

      Added. Each row represents a single CpG beta value.

      The authors state the targeted assay "allows for a more accurate and sensitive detection of cfDNA from a given source", which should be tempered unless clear evidence is presented for these statements. In addition, it targets only a small subset of all cell types. The highest cell type contribution from MK cells is only represented by 2 markers.

      We now discuss this in more detail and with caution. Indeed targeted assays may not be more accurate given the use of few markers, but we do believe they are at least theoretically more sensitive given the use of PCR and deep sequencing.

      Targeted assay has a few caveats that the authors should mention or fix:

      The method is not described in detail.

      More details are now provided, including multiplex PCR method and a reference to the script used for interpreting sequence data.

      Methods besides WGBS can have biases in methylation representation and a beta correlation between the 12 samples that underwent WGBS and the targeted assay would be reassuring.

      We have added a graph (new Supplementary Figure S3) showing a good correlation of Covid-19 WGBS data and targeted analysis of the same samples.

      The level of precision at the lower end of cellular contribution would be helpful too. The lung epithelial and cardiomyocyte cells were present at the lower end of the spectrum. This can be shown in a titration of the purified cells into plasma, or at least an in silico titration analyzed with only the targeted markers.

      Thank you. The targeted methylation assay is capable of detecting ~0.1% contribution of DNA from a given source, or 1-5 genome equivalents from this source. This is true also for our lung and cardiomyocyte markers, as previously shown (PMID 35450968, 29691397).
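
      The two sensitivity figures quoted above (a fraction of total cfDNA and a count of genome equivalents) can be related with a short calculation. The Python sketch below assumes the commonly used approximation of about 3.3 pg of DNA per haploid human genome, which is not stated in this response. The 10 ng input in the usage note is a hypothetical amount chosen only for illustration, and the function names are likewise hypothetical.

```python
PG_PER_HAPLOID_GENOME = 3.3  # assumed approximate mass of one haploid human genome, in pg

def ng_to_genome_equivalents(ng):
    """Convert a DNA mass in nanograms to haploid genome equivalents (GE)."""
    return ng * 1000.0 / PG_PER_HAPLOID_GENOME

def source_genome_equivalents(total_ng, fraction):
    """GE expected from a source contributing `fraction` of the total cfDNA."""
    return ng_to_genome_equivalents(total_ng) * fraction
```

      With these assumptions, `source_genome_equivalents(10, 0.001)` gives about 3 GE, so a 0.1% contribution in a 10 ng input falls within the 1-5 GE range mentioned above.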

      The authors state "(i) Evidence of frequent cardiomyocyte death in hospitalized patients... it has not been appreciated that cardiac cell death is a feature shared by most hospitalized patients." However, COVID-19 patients have elevated troponin.

      Thank you. Evidence for troponin elevation was indeed reported in some, but not most, of the hospitalized patients (see PMID: 32652195, 33121710, 32219356, 32211816). Note that troponin is not definitive evidence of cardiac cell death (e.g. the significance of elevated troponin after a marathon or in patients with kidney disease is not clear). This provides a justification for the use of cfDNA for this purpose, as we have shown previously (PMID: 37290439). This is clarified in the revised Discussion.

      The authors state "This signal presumably reflects elevated turnover of red blood cells and increased rate of erythropoiesis". However, could it be also higher nucleated RBCs released into circulation as the authors cited?

      Thank you. Both of these possibilities are valid, and are not mutually exclusive. Elevated NRBC was reported in severe COVID-19, and is strongly associated with higher erythropoiesis. This is clarified in the revised Discussion.

      Fig 2, 4, 5: The graphs seem to suggest that the authors picked 0.001 GE/mL as not detected. Should they label that point appropriately as "not detected" or "ND"? It is not clear why 0.001 GE/mL was picked, and the analytical sensitivity of the targeted test is not reported.

      Right, this was due to the non-zero limit of log graphs. We explain this in the text.
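
      The "non-zero limit of log graphs" arises because zero has no logarithm, so undetected samples must be drawn at some positive floor. A minimal Python sketch of that convention follows. It uses the 0.001 GE/mL value that the reviewer read off the figures as the floor, while the function name `to_log_axis` and the "ND" flag are assumptions illustrating the reviewer's labeling suggestion, not the authors' plotting code.

```python
import math

FLOOR = 0.001  # GE/mL; plotting floor as read from the figures by the reviewer

def to_log_axis(values, floor=FLOOR):
    """Map concentrations to log10 positions, flooring zeros and flagging them as ND."""
    points = []
    for value in values:
        detected = value > 0
        # Values at or below zero cannot be log-transformed, so draw them at the floor.
        position = math.log10(max(value, floor))
        points.append((position, "" if detected else "ND"))
    return points
```

      Carrying the "ND" flag alongside the plotted position keeps floored points distinguishable from genuine measurements near 0.001 GE/mL.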

      How many mLs of plasma were used?

      We have now added to supplemental tables the amount of plasma that was used for each patient.

      Reviewer #1 (Significance (Required)):

      - General assessment: Strengths - 1) Interesting topic: Non-invasive tabulation using deep methylation sequencing of cell type shedding into circulation of an important disease (COVID-19). 2) Deep sequencing using methylation and histone output is a significant improvement on past studies. Although targeting limits the scope of the cell types, the targeting was based on relatively deep WGBS sequencing on 6 cases and 6 controls.

      Limitations - The unique aspects (targeted assay and deep sequencing) are missing data and detailed methodology for reanalysis and reproducibility. See major comment 2.

      • Advance: The authors used deep sequencing through brute force (WGBS) and a unique targeted assay to study COVID-19 from a large group (n=120 patients). They found that endothelial and erythroblast lineages are overrepresented based on the presence and severity of the COVID-19 infection. Their findings are significant and go beyond what has been published. The methodologies and data (i.e. the controls) would be a great resource to the community that can be used beyond the scope of COVID-19.

      • Audience: This article would be appealing to a broad, translational/clinical audience. The authors have published on methylation deconvolution several times before, but to my knowledge, the broader targeted assay is unique and there is a large dataset with correlated clinical information that may be of broad utility.

      • Reviewer expertise: technical expertise with circulating cell-free DNA. translational/clinical expertise.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      They performed deep WGBS on severe COVID-19 and HC plasma samples, applied the novel UXM algorithm that includes 40 human cell types to identify the tissue origins of cfDNA, and showed increased cfDNA from diverse cell and tissue types in COVID-19 patients compared with healthy controls. Besides WGBS, they also performed a targeted methylation assay to measure cellular turnover/death and tissue injury from major cell and tissue types involved in COVID-19 pathogenesis and used it as a predictor of poor outcome. Finally, they showed that cfChIP-seq can identify heightened immune responses associated with COVID-19 and asymptomatic patients. Previous studies have shown that cfDNA has a great potential to map tissue injuries in COVID-19 and predict patient outcomes (Cheng et al., 2021 & Andargie et al., 2021). The expanded reference methylation atlas and the addition of the targeted methylation assay and cfChIP-seq in this study are very informative and fascinating. Please allow me to congratulate Ben-Ami and colleagues for this wonderful work.

      Thank you for this encouraging feedback.

      Below are some points that need to be addressed to improve the manuscript: Major 1. Given the heterogeneous nature of COVID-19 clinical manifestation, the limited number of patients (n=6) raises concern about the significance of the WGBS analysis. The authors need to provide further details as to why they performed WGBS on only 6 samples out of 120 subjects and what the selection criteria were.

      Study design was impacted by resource limitations. We were able to perform deep WGBS only on a small number of samples, so have used this as a guide to the general nature of tissue turnover in COVID-19 patients, and later used a narrower, highly sensitive, more affordable and more broadly available targeted assay. This is clarified in the revised text (Discussion, section on limitations of study).

      The gene expression analysis with cfChIP-seq is interesting. Likewise, Differentially Methylated Regions (DMR) can infer gene expression. Is the methylation analysis also showing increased interferon response in COVID-19 patients? This study also showed increased cfDNA from monocytes that is not reflected in blood cell counts. Does cfChIP-seq identify inflammatory response-related genes in monocytes/macrophages? Hadjadj et al. 2020 (PMID: 32661059: Science) reported impaired interferon response in severe COVID-19 patients. Whereas this study showed heightened interferon response in severe and asymptomatic/mild COVID-19 patients compared to healthy controls, there was no difference between mild and severe COVID-19 patients. The authors should consider validating their finding with plasma cytokine measurement. cfChIP-seq also identifies cfDNA tissues-of-origin (PMID: 33432199). How is the correlation between these three assays (WGBS, targeted methylation assay and cfChIP-seq) to detect cell death/turnover?

      Thank you for these comments. While cfChip does indeed reflect gene expression patterns in the cells that released cfDNA, cfDNA methylation patterns are indicative of cell identity (i.e. tissue of origin) but not dynamic gene expression (PMID: 30100054). Unfortunately, current cfChip technology, while revealing gene expression patterns in the cells that released cfChromatin, does not inform which cell types have expressed these genes (e.g. monocytes or T cells). Thus we can state that the cells releasing cfDNA expressed interferon-stimulated genes, but we cannot say which cells were expressing these genes.

      We were unable to perform additional measurements e.g. cytokines since our blood samples are almost entirely depleted.

      With regards to the tissue origins of cfDNA: as shown in the paper, there is a general good agreement between WGBS and the targeted assay. In the revised version we show a good correlation between findings in specific samples that were subject to both WGBS and the targeted assay (Supplemental Figure S3). In our hands the sensitivity and specificity of cfChip-seq for detecting tissue origins of cfDNA are lower than cfDNA methylation, hence we elected to use the cfChip information only for inference of gene expression.

      It is unclear whether hospitalized COVID-19 subjects experienced particular organ involvement. It would be interesting to link the tissue-specific cfDNA to different COVID-19 endotypes. For instance, cardiac involvement and cardiomyocyte cfDNA.

      Indeed, linking tissue-specific cfDNA to clinical phenotype has been challenging. Elevated lung cfDNA is correlated with disease severity (which is well established to be associated with pulmonary damage). We were unable to link elevated cardiac cfDNA to a clinical cardiac phenotype, in part because only limited cardiac assays (e.g., troponin and cardiac echo) were performed on the hospitalized patients.

      Previous studies reported cfDNA concentration in healthy controls ranges between 3 and 15 ng/mL. This study's median cfDNA level for asymptomatic COVID-19 patients falls within that range. It would be interesting if the authors comment on the methodology differences, including plasma volume, correction for extraction efficiency, and cfDNA assay type.

      Indeed, asymptomatic patients had a mild, though highly statistically significant elevation in total cfDNA concentration relative to controls, as shown in Figure 5. Samples of asymptomatic patients and controls were obtained and processed identically using the Qiasymphony liquid handling robot. This is described in the revised methods. Plasma volume collected for each sample is now shown in Supp Tables S1-4.

      Were the asymptomatic/Mild case samples collected in the same time frame as Hospitalized patients? It would be interesting if the authors comment on the effect of SARSCOV-2 variants and viral loads on plasma cfDNA level.

      Yes, all samples were collected during the same period (May-October 2020). This is stated in the revised methods. Unfortunately, we do not have information on specific variants or viral loads.

      The author showed cfDNA from total T cells and CD8 cells in particular. The authors should comment on why CD4+ T was not shown instead of T cells (which includes both CD4 and CD8 cells).

      Unfortunately our current methylome atlas does not allow for identification of specific methylation markers for CD4+ cells (PMID: 34842142).

      Considering the expensive nature of deep sequencing, it would be interesting if the authors comment on applying the UXM algorithm for low- and medium-coverage sequenced samples.

      The algorithm applies to WGBS samples regardless of depth, obviously with reduced performance in low coverage sequencing. Formal analysis of performance on multiple WGBS samples is ongoing.

      Minor 1. The timing of blood sample collection from hospital admission or testing positive for COVID-19 is important to use cfDNA as a predictor of outcome. The authors should explain when the sample was collected for asymptomatic/mild patients. If it's not in the "acute phase" it should also be clarified for comparison with hospitalized COVID-19.

      We have now added the time of sampling – typically a week or two after diagnosis (Supplemental Table S2).

      Is there a reason the authors included repeated measures of cfDNA within the same subject (N=120, n=142; Figure 1A)? The author should consider statistical correction for repeated measures. This is important to reduce bias.

      Thank you, we have now reanalyzed the data including only one sample for each patient. The results are largely the same as the original analysis (for reviewer eyes only).

      I believe the authors forgot to include "Code and data availability" declaration. I encourage the authors to make publicly available the WGBS data and deconvolution algorithm for reproducibility purposes.

      WGBS data files are currently being uploaded to GEO and are waiting for an accession number.

      Figure 1D should show individual data points to see the pattern of tissue-specific cfDNA better, especially as COVID-19 shows heterogeneous clinical presentation. Please consider overlaying the data point on the histogram.

      Thank you, we have changed the graph to show each datapoint.

      Methods - Page 27, the first sentences from the last paragraph, please include the unit

      Thank you, we have changed the paragraph.

      after the number "75".... In fact, this paragraph is identical to the previous paper (PMID: 33432199); please consider paraphrasing the section.

      Done.

      Please clearly define "deteriorated." What WHO score or range is considered as deteriorated?

      Deteriorated patients were defined as [maximal WHO score post sample] – [WHO score at sampling day] > 0. This is now clarified in the revised results section.

      The authors mix between 40 and 37 reference cell types. Please be consistent.

      Thank you. Done.

      Page 6, line 3, please replace erythrocyte with erythroblast.

      Done.

      Page 28, line 10, please replace COVID with COVID-19.

      Done.

      Figure 5D needs a key for recovered versus deteriorated.

      Done (figure 4D).

      Figure 5, legend title, please fix the number of healthy controls.... (n=30-45).

      Done.

      Reviewer #2 (Significance (Required)): This manuscript used a deep WGBS approach with an expanded human cell-type methylation atlas and novel deconvolution algorithm, targeted methylation assay (which makes the cfDNA test easy to use in a clinical lab setting) and cfChIP-seq on plasma cfDNA based on epigenetic markers to identify specific cellular/organ involved in COVID-19 pathogenesis and identify potential mechanistic insights associated with heightened inflammatory response. Compared to the previous study, the limited sample size raises concerns about the significance of whole-genome bisulfite sequencing data in COVID-19 patients. Additionally, whether the tissue-specific cfDNA tracks specific COVID-19-associated endotypes has yet to be discussed. Taken together, this cfDNA may help to understand COVID-19 pathogenesis and define tissue or organ injuries.

      My expertise is in Genomics and Immunology.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In the manuscript entitled "Epigenetic liquid biopsies reveal elevated vascular endothelial cell turnover and erythropoiesis in asymptomatic COVID-19 patients," Ben-Ami and colleagues perform WGBS, targeted methylation assay and cfChIP-seq to measure cellular turnover/death or tissue injuries and infer gene expression profile in COVID-19 patients and healthy controls. First, they performed deep WGBS on severe COVID-19 and HC plasma samples, applied the novel UXM algorithm that includes 40 human cell types to identify the tissue origins of cfDNA, and showed increased cfDNA from diverse cell and tissue types in COVID-19 patients than healthy controls. Besides WGBS, they also performed targeted methylation assay to measure cellular turnovers/death and tissue injury from major cell and tissue types involved in COVID-19 pathogenesis and used as a predictor of poor outcome. Finally, they showed that cfChIP-seq can identify heightened immune responses associated with COVID-19 and asymptomatic patients. Previous studies have shown that cfDNA has a great potential to map tissue injuries in COVID-19 and predict patient outcomes (Cheng et al., 2021 & Andargie et al., 2021). The expanded reference methylation atlas and the addition of targeted methylation assay and cfCHIP-seq in this study are very informative and fascinating. Please allow me to congratulate Ben-Ami and colleagues for this wonderful work.

      Below are some points that need to be addressed to improve the manuscript:

      Major

      1. Given the heterogeneous nature of COVID-19 clinical manifestation, the limited number of patients (n=6) raises concern about the significance of WGBS analysis. The authors need to provide further details as to why they performed WGBS only from 6 samples out of 120 subjects and what was the selection criteria.
      2. The gene expression analysis with cfCHIp-seq is interesting. Likewise, Differentially Methylated Regions (DMR) can infer gene expression. Is the methylation analysis also showing increased interferon response in COVID-19 patients? This study also showed increased cfDNA from monocytes that is not reflected in blood cell counts. Does cfCHIP-seq identify inflammatory response-related genes in monocytes/macrophages? Hadjadj et al. 2020 (PMID: 32661059: Science) reported impaired interferon response in severe COVID-19 patients. Whereas this study showed heightened interferon response in severe and asymptomatic/mild COVID-19 patients compared to healthy controls, there was no difference between Mild and Severe COVID-19 patients. The author should consider validating their finding with plasma cytokine measurement. cfChip-seq also identifies cfDNA tissues-of-origin (PMID: 33432199). How is the correlation between these three assays (WGBS, targeted methylation assay and cfCHIP-seq) to detect cell death/turnover?
      3. It is unclear whether hospitalized COVID-19 subjects experienced particular organ involvement. It would be interesting to link the tissue-specific cfDNA to different COVID-19 endotypes. For instance, cardiac involvement and cardiomyocyte cfDNA.
      4. Previous studies reported cfDNA concentration in healthy controls ranges between 3 and 15 ng/mL. This study's median cfDNA level for asymptomatic COVID-19 patients falls within that range. It would be interesting if the authors comment on the methodology differences, including plasma volume, correction for extraction efficiency, and cfDNA assay type.
      5. Were the asymptomatic/Mild case samples collected in the same time frame as Hospitalized patients? It would be interesting if the authors comment on the effect of SARSCOV-2 variants and viral loads on plasma cfDNA level.
      6. The author showed cfDNA from total T cells and CD8 cells in particular. The authors should comment on why CD4+ T was not shown instead of T cells (which includes both CD4 and CD8 cells).
      7. Considering the expensive nature of deep sequencing, it would be interesting if the authors comment on applying the UXM algorithm for low- and medium-coverage sequenced samples.

      Minor

      1. The timing of blood sample collection from hospital admission or testing positive for COVID-19 is important to use cfDNA as a predictor of outcome. The authors should explain when the sample was collected for asymptomatic/mild patients. If it's not in the "acute phase, " it should also be clarified for comparison with hospitalized COVID-19.
      2. Is there a reason the authors included repeated measures of cfDNA within the same subject (N=120, n=142; Figure 1A)? The author should consider statistical correction for repeated measures. This is important to reduce bias.
      3. I believe the authors forgot to include "Code and data availability" declaration. I encourage the authors to make publicly available the WGBS data and deconvolution algorithm for reproducibility purposes.
      4. Figure 1D should show individual data points to see the pattern of tissue-specific cfDNA better, especially as COVID-19 shows heterogeneous clinical presentation. Please consider overlaying the data point on the histogram.
      5. Methods - Page 27, the first sentences from the last paragraph, please include the unit.
      6. after the number "75".... In fact, this paragraph is identical to the previous paper (PMID: 33432199); please consider paraphrasing the section.
      7. Please clearly define "deteriorated." What WHO score or range is considered as deteriorated?
      8. The authors mix between 40 and 37 reference cell types. Please be consistent.
      9. Page 6, line 3, please replace erythrocyte with erythroblast.
      10. Page 28, line 10, please replace COVID with COVID-19.
      11. Figure 5D needs a key for recovered versus deteriorated.
      12. Figure 5, legend title, please fix the number of healthy controls.... (n=30-45).

      Significance

      This manuscript used a deep WGBS approach with an expanded human cell-type methylation atlas and novel deconvolution algorithm, targeted methylation assay (which makes the cfDNA test easy to use in a clinical lab setting) and cfChIP-seq on plasma cfDNA based on epigenetic markers to identify specific cellular/organ involved in COVID-19 pathogenesis and identify potential mechanistic insights associated with heightened inflammatory response. Compared to the previous study, the limited sample size raises concerns about the significance of whole-genome bisulfite sequencing data in COVID-19 patients. Additionally, whether the tissue-specific cfDNA tracks specific COVID-19-associated endotypes has yet to be discussed. Taken together, this cfDNA may help to understand COVID-19 pathogenesis and define tissue or organ injuries.

      My expertise is in Genomics and Immunology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors analyzed circulating cell-free DNA for COVID-19 using deep sequencing of the methylation and histone modification. The major output was cell-specific quantification. The study involved 120 unvaccinated, hospitalized patients, 19 asymptomatic/mild cases, and 40 controls. Between COVID-19 and controls, they found significant differences in lung epithelial cells, cardiomyocytes, vascular endothelial cells and erythroblasts. The latter two cell types had significant differences even in the asymptomatic patients. It is unclear if the damage seen is related to COVID-19 specifically, or related to general inflammation or infection.

      Strengths of the study include relatively high-depth WGBS/targeted sequencing, along with fragment-level analysis with methods described in their previous work (Loyfer et al. Nature). In addition, they add ChIP-seq data using their published methods. The work comes from a group with leading expertise in methylation cell-free DNA analysis.

      Overall, the work is the most comprehensive analysis to date for COVID-19, and the data would be a valuable resource to the research community. We have major and minor comments that do not necessarily require additional experimental work.

      Major comments:

      1. There is a lack of data, and the methods are not presented in such a way that the results and conclusions can be reproduced and evaluated. Neither the code nor the data to generate the results are available. Both need to be made available during the peer review process.

      Missing data: Fragment-level FASTQ, BAM, or PAT files are needed to reproduce the results. Missing scripts: sharing scripts, for example on GitHub, is standard and reasonable for reproducing the figures shown. Missing targeted assay method details: the authors should show the data, methods, and details for: "The validation of markers was done using DNA extracted from different cells and tissues, and the methylation status of the CpG block was assessed."

      1. The authors did not list the major limitations of the study in the discussion or elsewhere. These should include (or be addressed with experimental or conclusion changes):
        1. The small sample size of the asymptomatic/mild group (if the emphasis of the paper, as suggested by the title, is on the asymptomatic/mild group - see the next major point).
        2. The targeted assay is used on the vast majority of samples, including all of the asymptomatic/mild group. However, it is limited to a particular subset of cell types (total defined by all possible cell types in the body). Those cell types were determined based on WGBS data on only 6 COVID-19 cases.
        3. The methylation references for the WGBS data were limited to a fraction of all human cell types. For example, this paper was not able to evaluate Schwann cells or peripheral nerves, which was a significant finding for COVID-19 related multisystem inflammatory syndrome (PMID 37279751).
        4. The case and control groups (severe, asymptomatic mild, and control) were collected at different times and circumstances, allowing for potential pre-analytical confounders.
        5. cfDNA levels can be influenced by several unmeasured factors, including death, replication leading to more turnover, clearance/stability, and movement from tissue into circulation. The methods used cannot distinguish between these possibilities.
        6. (if true) The controls used for the targeted assay were not age/sex matched. The median age for the controls skews younger per Tables S1, S2, S3.
        7. (optional) It is unclear whether the differences found are attributable to COVID-19, coronavirus infection, viral infections, infections in general, or inflammation in general. The appropriate alternative controls were not addressed in this study. The paper shows some degree of correlation with acute inflammatory markers (CRP, ferritin, neutrophil contribution).
      2. The title is a bit misleading in that it revolves around the asymptomatic patients. However, this is also the group with the lowest representation at n=19. The vast majority of the data is related to the hospitalized patients. While other studies may have looked at hospitalized patients, I agree with the authors that there is merit in deep sequencing and the correlated clinical data.
      3. More details on the patient inclusion criteria are needed. Were the asymptomatic/mild positive by PCR test or a point of care immunoassay? We know the viral load is quite dynamic for these patients. What was the timing of the blood draw?

      Likewise, how did you find the hospitalized patients? Was it comprehensive over a period of time? These details help reveal any potential biases in the selection process.

      Minor comments:

      1. The abstract states: "Asymptomatic patients had elevated levels of immune-derived cfDNA but did not show evidence of pulmonary or cardiac damage." However, in Fig 5, there seems to be a bimodal distribution for the lung epithelial and cardiomyocytes. Unclear if that is an artifact of the graph.

      It is quite interesting that the asymptomatic/mild group seems to have a bimodal distribution in lung epithelial and cardiomyocyte cfDNA. Perhaps this data is not available, and the sample size is small, but could there have been a clinical difference between the two groups (e.g. asymptomatic versus mild, or had symptoms later?). It is unclear how precise the measurements are for the lung epithelial cells.

      2. The authors listed 2 prior studies that looked at cell type or tissue damage during COVID-19. There are 2 other studies that I am aware of: PMID: 33651717 (n=84 with n=18 nonhospitalized) but probably shallow WGBS, and 37279751 (n=205 pediatric patients). Importantly, the latter paper found Schwann cells were significantly elevated, which is missing from the current study's assessment. In addition, citation 14 from the same group already found significantly increased vascular endothelial cfDNA in COVID-19 patients with severe disease versus mild. While some findings are consistent, there are also discrepancies.
      3. Is Fig 2 necessary? Fig. 5 seems to display the same data but with the asymptomatic group.
      4. "Elevated lung cfDNA reflects excessive lung cell death" - recommend this statement is tempered as direct evidence is not available in this study. An alternative explanation could be that endothelial cells are damaged, and it is easier for lung cfDNA to enter blood circulation rather than the respiratory system.
      5. Fig 6: Add unit of measure to heatmap. Supplemental Fig 1.: Add label to unit of measure in caption or figure. Average or median beta value over a series of CpGs?
      6. The authors state the targeted assay "allows for a more accurate and sensitive detection of cfDNA from a given source", which should be tempered unless clear evidence is presented for these statements. In addition, it targets only a small subset of all cell types. The highest cell type contribution from MK cells is only represented by 2 markers.
      7. Targeted assay has a few caveats that the authors should mention or fix: The method is not described in detail. Methods besides WGBS can have biases in methylation representation and a beta correlation between the 12 samples that underwent WGBS and the targeted assay would be reassuring. The level of precision at the lower end of cellular contribution would be helpful too. The lung epithelial and cardiomyocyte cells were present at the lower end of the spectrum. This can be shown in a titration of the purified cells into plasma, or at least an in silico titration analyzed with only the targeted markers.
      8. The authors state "(i) Evidence of frequent cardiomyocyte death in hospitalized patients... it has not been appreciated that cardiac cell death is a feature shared by most hospitalized patients." However, COVID-19 patients have elevated troponin.
      9. The authors state "This signal presumably reflects elevated turnover of red blood cells and increased rate of erythropoiesis". However, could it be also higher nucleated RBCs released into circulation as the authors cited?
      10. Fig 2, 4, 5: The graphs seem to suggest that the authors picked 0.001 GE/mL as not detected. Should they label that point appropriately as "not detected" or "ND"? It is not clear why 0.001 GE/mL was picked, and the analytical sensitivity of the targeted test is not reported.
      11. How many mLs of plasma were used?

      Significance

      General assessment:

      Strengths

      1. Interesting topic: Non-invasive tabulation using deep methylation sequencing of cell type shedding into circulation of an important disease (COVID-19).
      2. Deep sequencing using methylation and histone output is a significant improvement on past studies. Although targeting limits the scope of the cell types, the targeting was based on relatively deep WGBS sequencing on 6 cases and 6 controls.

      Limitations

      The unique aspects (targeted assay and deep sequencing) are missing data and detailed methodology for reanalysis and reproducibility. See major comment 2.

      Advance: The authors used deep sequencing through brute force (WGBS) and a unique targeted assay to study COVID-19 from a large group (n=120 patients). They found that endothelial and erythroblast lineages are overrepresented based on the presence and severity of the COVID-19 infection. Their findings are significant and go beyond what has been published. The methodologies and data (i.e. the controls) would be a great resource to the community that can be used beyond the scope of COVID-19.

      Audience: This article would be appealing to a broad, translational/clinical audience. The authors have published on methylation deconvolution several times before, but to my knowledge, the broader targeted assay is unique and there is a large dataset with correlated clinical information that may be of broad utility.

      Reviewer expertise: technical expertise with circulating cell-free DNA. translational/clinical expertise.

    1. TÜLU-V2-mix (Ivison et al., 2023). TÜLU-V2-mix is designed to enhance instruction-following capabilities in large language models, offering a diverse dataset that improves the model’s generalization and execution abilities across multi-domain tasks. It covers a wide range of tasks, including question answering, code generation, translation, and multi-turn conversations, with a strong emphasis on multilingual adaptability and handling complex real-world scenarios. Skywork-Reward, on the other hand, is designed to align models with human preferences using preference pairs, helping models learn to generate user-preferred responses, such as fluent and coherent text. While TÜLU-V2-mix excels in generalization across a wide range of tasks, Skywork-Reward specializes in optimizing user-centric outputs. Together, they address complementary goals for advancing language model capabilities.

      The dataset covers a range of tasks, including QA, code generation, translation, and multi-turn conversation, with an emphasis on adapting to multiple languages and handling complex real-world contexts.

    1. ;; * eldoc integration
       (defun scimax-jupyter-signature ()
         "Try to return a function signature for the thing at point."
         (when (and (eql major-mode 'org-mode)
                    (string= (or (get-text-property (point) 'lang) "") "jupyter-python"))
           (save-window-excursion
             ;;; Essentially copied from (jupyter-inspect-at-point).
             (jupyter-org-with-src-block-client
              (cl-destructuring-bind (code pos)
                  (jupyter-code-context 'inspect)
                (jupyter-inspect code pos nil 0)))
             ;; The inspection result is written to the *Help* buffer.
             (when (get-buffer "*Help*")
               (with-current-buffer "*Help*"
                 (goto-char (point-min))
                 (prog1
                     (cond
                      ;; Prefer the line holding the signature, if there is one.
                      ((re-search-forward "Signature:" nil t 1)
                       (buffer-substring (line-beginning-position) (line-end-position)))
                      ;; Otherwise fall back to the first line of the docstring.
                      ((re-search-forward "Docstring:" nil t 1)
                       (forward-line)
                       (buffer-substring (line-beginning-position) (line-end-position)))
                      (t nil))
                   ;; get rid of this so we don't accidentally show old results later
                   (with-current-buffer "*Help*"
                     (toggle-read-only)
                     (erase-buffer))))))))

    1. Reviewer #2 (Public review):

      Summary:

      This paper presents an interesting and useful analysis of grid cell heterogeneity, showing that the experimentally observed heterogeneity of spacing and orientation within a grid cell module can allow more accurate decoding of location from a single module.

      Strengths:

      (1) I found the statistical analysis of the grid cell variability to be very systematic and convincing. I also found the evidence for enhanced decoding of location based on between cell variability within a module to be convincing and important, supporting their conclusions.

      (2) Theoreticians have developed models that focus on the use of grid cells that are highly regular in their parameters, and usually vary only in the spatial phase of cells within modules and the spacing and orientation between modules. This focus on consistency is partly to obtain the generalization of the grid cell code to a broad range of previously unvisited locations. In contrast, most experimentalists working with grid cells know that many if not most grid cells show high variability of firing fields, as demonstrated in the figures in experimental papers. The authors of this current paper have highlighted this discrepancy, and shown that the variability shown in the data could actually enhance decoding of location.

    1. In this report, we carefully dissect the framework of RLHF and discuss the entire process that determines the success of the algorithm’s training. We explored how the quality of the reward model affects the final result of the policy model. We find that the quality of the reward model directly determines the upper bound of the policy model, and designing an appropriate PPO algorithm is crucial for RLHF’s successful training. Moreover, accurate code implementation matters in deep policy (practice makes perfect). Therefore, we have conducted in-depth evaluations of the inner workings of PPO algorithm to study how code-level and theory-level optimizations change agent training dynamics. We propose to monitor the PPO training process by using action space modeling metrics derived from the policy model, such as perplexity, response length, and KL divergence between the policy model and the SFT model. These metrics are more informative of the training stability than the values of response reward and loss functions. Based on these observations, we identify the policy constraints in the PPO algorithm as the key factor to achieve consistent alignment with human preferences. After extensive comparative experiments with various possible implementations of PPO framework, we finally introduce a preferable policy optimization algorithm named PPO-max, which incorporates the collection of effective and essential implementations, and is carefully calibrated to avoid interference among them. PPO-max alleviates the instability of vanilla PPO training and enables longer training steps with a larger training corpus. We evaluate PPO-max on 7B and 13B SFT models, demonstrating comparable alignment performance with ChatGPT.
      • Finds that the quality of the reward model directly determines the upper bound of the policy model, and that designing the PPO algorithm appropriately is crucial for training RLHF.
      • Proposes monitoring the PPO training process with action-space modeling metrics derived from the policy model (perplexity, response length, KL divergence between the policy model and the SFT model). These metrics say more about training stability than the values of the response reward and the loss function.

      => Based on these observations, the policy constraints in the PPO algorithm are identified as the key factor in aligning the model with human preferences consistently.

      Contribution: introduces a new optimization algorithm named PPO-max, which integrates a set of necessary and effective improvements and is carefully tuned to avoid conflicts among them. PPO-max reduces the instability of vanilla PPO training and allows longer training runs with a larger corpus.

    1. spelled out the rights, responsibilities, and rules of conduct regarding the interactions of free persons and slave

      Code Noir helped form the Creole community in New Orleans by encouraging racial mixing

    1. 3.8.2 Remote Procedure Calls

      One of the most common forms of remote service is the RPC paradigm, which was designed as a way to abstract the procedure-call mechanism for use between systems with network connections. It is similar in many respects to the IPC mechanism described in Section 3.4, and it is usually built on top of such a system. Here, however, because we are dealing with an environment in which the processes are executing on separate systems, we must use a message-based communication scheme to provide remote service.

      In contrast to IPC messages, the messages exchanged in RPC communication are well structured and are thus no longer just packets of data. Each message is addressed to an RPC daemon listening to a port on the remote system, and each contains an identifier specifying the function to execute and the parameters to pass to that function. The function is then executed as requested, and any output is sent back to the requester in a separate message.

      A port in this context is simply a number included at the start of a message packet. Whereas a system normally has one network address, it can have many ports within that address to differentiate the many network services it supports. If a remote process needs a service, it addresses a message to the proper port. For instance, if a system wished to allow other systems to be able to list its current users, it would have a daemon supporting such an RPC attached to a port—say, port 3027. Any remote system could obtain the needed information (that is, the list of current users) by sending an RPC message to port 3027 on the server. The data would be received in a reply message.

      The semantics of RPCs allows a client to invoke a procedure on a remote host as it would invoke a procedure locally. The RPC system hides the details that allow communication to take place by providing a stub on the client side. Typically, a separate stub exists for each separate remote procedure.
When the client invokes a remote procedure, the RPC system calls the appropriate stub, passing it the parameters provided to the remote procedure. This stub locates the port on the server and marshals the parameters. The stub then transmits a message to the server using message passing. A similar stub on the server side receives this message and invokes the procedure on the server. If necessary, return values are passed back to the client using the same technique. On Windows systems, stub code is compiled from a specification written in the Microsoft Interface Definition Language (MIDL), which is used for defining the interfaces between client and server programs. Parameter marshaling addresses the issue concerning differences in data representation on the client and server machines. Consider the representation of 32-bit integers. Some systems (known as big-endian) store the most significant byte first, while other systems (known as little-endian) store the least significant byte first. Neither order is “better” per se; rather, the choice is arbitrary within a computer architecture. To resolve differences like this, many RPC systems define a machine-independent representation of data. One such representation is known as external data representation (XDR). On the client side, parameter marshaling involves converting the machine-dependent data into XDR before they are sent to the server. On the server side, the XDR data are unmarshaled and converted to the machine-dependent representation for the server. Another important issue involves the semantics of a call. Whereas local procedure calls fail only under extreme circumstances, RPCs can fail, or be duplicated and executed more than once, as a result of common network errors. One way to address this problem is for the operating system to ensure that messages are acted on exactly once, rather than at most once. Most local procedure calls have the “exactly once” functionality, but it is more difficult to implement. 
First, consider “at most once.” This semantic can be implemented by attaching a timestamp to each message. The server must keep a history of all the timestamps of messages it has already processed or a history large enough to ensure that repeated messages are detected. Incoming messages that have a timestamp already in the history are ignored. The client can then send a message one or more times and be assured that it only executes once. For “exactly once,” we need to remove the risk that the server will never receive the request. To accomplish this, the server must implement the “at most once” protocol described above but must also acknowledge to the client that the RPC call was received and executed. These ACK messages are common throughout networking. The client must resend each RPC call periodically until it receives the ACK for that call. Yet another important issue concerns the communication between a server and a client. With standard procedure calls, some form of binding takes place during link, load, or execution time (Chapter 9) so that a procedure call's name is replaced by the memory address of the procedure call. The RPC scheme requires a similar binding of the client and the server port, but how does a client know the port numbers on the server? Neither system has full information about the other, because they do not share memory. Two approaches are common. First, the binding information may be predetermined, in the form of fixed port addresses. At compile time, an RPC call has a fixed port number associated with it. Once a program is compiled, the server cannot change the port number of the requested service. Second, binding can be done dynamically by a rendezvous mechanism. Typically, an operating system provides a rendezvous (also called a matchmaker) daemon on a fixed RPC port. A client then sends a message containing the name of the RPC to the rendezvous daemon requesting the port address of the RPC it needs to execute. 
The port number is returned, and the RPC calls can be sent to that port until the process terminates (or the server crashes). This method requires the extra overhead of the initial request but is more flexible than the first approach. Figure 3.29 shows a sample interaction.
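
The parameter-marshaling step described above can be sketched in C. This is a minimal illustration, not the code of any particular RPC system: the function names and the two-field request layout (procedure identifier followed by one integer argument) are assumptions made for this example. The only fact taken from XDR itself is that a 32-bit integer is encoded as four bytes, most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a 32-bit integer as four bytes, most significant byte first.
   Shifting the value (instead of copying its memory) gives the same
   bytes on big-endian and little-endian hosts. */
void marshal_u32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Decode four big-endian bytes back into the host representation. */
uint32_t unmarshal_u32(const uint8_t in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Hypothetical request layout for this sketch:
   bytes 0..3 = procedure identifier, bytes 4..7 = one integer argument. */
#define REQUEST_BYTES 8

void marshal_request(uint32_t proc_id, uint32_t arg, uint8_t out[REQUEST_BYTES]) {
    marshal_u32(proc_id, out);
    marshal_u32(arg, out + 4);
}
```

A client stub would call `marshal_request()` before sending, and the server stub would call `unmarshal_u32()` on each field after receiving, so neither side needs to know the other's byte order.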

      Remote Procedure Calls (RPCs) abstract the communication process between distributed systems by allowing a client to invoke procedures on a remote machine as if they were local functions. Unlike raw socket communication, which requires applications to structure their own messages, RPCs handle function calls transparently. A key component of RPCs is parameter marshaling, which ensures data compatibility between different architectures (e.g., big-endian vs. little-endian). Additionally, RPC implementations must handle network failures and duplicate execution risks, requiring techniques such as timestamping and acknowledgment messages. The use of a matchmaker service for dynamic binding enhances flexibility, allowing clients to locate available RPC services at runtime rather than relying on fixed port assignments.
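
The "at most once" rule from the quoted passage (ignore any message whose timestamp is already in the server's history) can be sketched as follows. The type and function names, the history size of 64, and the use of a plain counter in place of a real procedure are all assumptions for illustration. A real server would also need a history large enough that a late duplicate is never forgotten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_SIZE 64  /* assumed bound; must cover the longest retry window */

typedef struct {
    uint64_t seen[HISTORY_SIZE]; /* timestamps of requests already processed */
    size_t count;                /* number of valid entries in seen[] */
    size_t next;                 /* slot overwritten by the next new timestamp */
    int executions;              /* how many times the procedure actually ran */
} rpc_server;

/* Linear scan of the history; enough for a sketch of this size. */
static bool already_seen(const rpc_server *s, uint64_t timestamp) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->seen[i] == timestamp) {
            return true;
        }
    }
    return false;
}

/* Deliver one request. Returns true if the procedure was executed,
   false if the request was a duplicate and was ignored. */
bool rpc_deliver(rpc_server *s, uint64_t timestamp) {
    if (already_seen(s, timestamp)) {
        return false;
    }
    s->seen[s->next] = timestamp;          /* record before executing */
    s->next = (s->next + 1) % HISTORY_SIZE;
    if (s->count < HISTORY_SIZE) {
        s->count++;
    }
    s->executions++;                       /* stands in for the real procedure */
    return true;
}
```

For "exactly once", the client would additionally resend the same timestamped request until it receives an ACK; the history above is what makes those resends harmless.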

    2. 3.7.1 POSIX Shared Memory Several IPC mechanisms are available for POSIX systems, including shared memory and message passing. Here, we explore the POSIX API for shared memory. POSIX shared memory is organized using memory-mapped files, which associate the region of shared memory with a file. A process must first create a shared-memory object using the shm_open() system call, as follows: fd = shm_open(name, O_CREAT | O_RDWR, 0666); The first parameter specifies the name of the shared-memory object. Processes that wish to access this shared memory must refer to the object by this name. The subsequent parameters specify that the shared-memory object is to be created if it does not yet exist (O_CREAT) and that the object is open for reading and writing (O_RDWR). The last parameter establishes the file-access permissions of the shared-memory object. A successful call to shm_open() returns an integer file descriptor for the shared-memory object. Once the object is established, the ftruncate() function is used to configure the size of the object in bytes. The call ftruncate(fd, 4096); sets the size of the object to 4,096 bytes. Finally, the mmap() function establishes a memory-mapped file containing the shared-memory object. It also returns a pointer to the memory-mapped file that is used for accessing the shared-memory object. The programs shown in Figure 3.16 and Figure 3.17 use the producer–consumer model in implementing shared memory. The producer establishes a shared-memory object and writes to shared memory, and the consumer reads from shared memory. 
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <sys/shm.h> #include <sys/stat.h> #include <sys/mman.h> int main() { /* the size (in bytes) of shared memory object */ const int SIZE = 4096; /* name of the shared memory object */ const char *name = "OS"; /* strings written to shared memory */ const char *message_0 = "Hello"; const char *message_1 = "World!"; /* shared memory file descriptor */ int fd; /* pointer to shared memory object */ char *ptr; /* create the shared memory object */ fd = shm_open(name, O_CREAT | O_RDWR, 0666); /* configure the size of the shared memory object */ ftruncate(fd, SIZE); /* memory map the shared memory object */ ptr = (char *) mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); /* write to the shared memory object */ sprintf(ptr, "%s", message_0); ptr += strlen(message_0); sprintf(ptr, "%s", message_1); ptr += strlen(message_1); return 0; } Figure 3.16 Producer process illustrating POSIX shared-memory API. #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/shm.h> #include <sys/stat.h> #include <sys/mman.h> int main() { /* the size (in bytes) of shared memory object */ const int SIZE = 4096; /* name of the shared memory object */ const char *name = "OS"; /* shared memory file descriptor */ int fd; /* pointer to shared memory object */ char *ptr; /* open the shared memory object read-only */ fd = shm_open(name, O_RDONLY, 0666); /* memory map the shared memory object; the mapping is read-only because the descriptor was opened with O_RDONLY */ ptr = (char *) mmap(0, SIZE, PROT_READ, MAP_SHARED, fd, 0); /* read from the shared memory object */ printf("%s", (char *)ptr); /* remove the shared memory object */ shm_unlink(name); return 0; } Figure 3.17 Consumer process illustrating POSIX shared-memory API. The producer, shown in Figure 3.16, creates a shared-memory object named OS and writes the infamous string “Hello World!” to shared memory. 
The program memory-maps a shared-memory object of the specified size and allows writing to the object. The flag MAP_SHARED specifies that changes to the shared-memory object will be visible to all processes sharing the object. Notice that we write to the shared-memory object by calling the sprintf() function and writing the formatted string to the pointer ptr. After each write, we must increment the pointer by the number of bytes written. The consumer process, shown in Figure 3.17, reads and outputs the contents of the shared memory. The consumer also invokes the shm_unlink() function, which removes the shared-memory segment after the consumer has accessed it. We provide further exercises using the POSIX shared-memory API in the programming exercises at the end of this chapter. Additionally, we provide more detailed coverage of memory mapping in Section 13.5. 3.7.2 Mach Message Passing As an example of message passing, we next consider the Mach operating system. Mach was especially designed for distributed systems, but was shown to be suitable for desktop and mobile systems as well, as evidenced by its inclusion in the MacOS and iOS operating systems, as discussed in Chapter 2. The Mach kernel supports the creation and destruction of multiple tasks, which are similar to processes but have multiple threads of control and fewer associated resources. Most communication in Mach—including all inter-task communication—is carried out by messages. Messages are sent to, and received from, mailboxes, which are called ports in Mach. Ports are finite in size and unidirectional; for two-way communication, a message is sent to one port, and a response is sent to a separate reply port. Each port may have multiple senders, but only one receiver. Mach uses ports to represent resources such as tasks, threads, memory, and processors, while message passing provides an object-oriented approach for interacting with these system resources and services. 
Message passing may occur between any two ports on the same host or on separate hosts on a distributed system. Associated with each port is a collection of port rights that identify the capabilities necessary for a task to interact with the port. For example, for a task to receive a message from a port, it must have the capability MACH_PORT_RIGHT_RECEIVE for that port. The task that creates a port is that port's owner, and the owner is the only task that is allowed to receive messages from that port. A port's owner may also manipulate the capabilities for a port. This is most commonly done in establishing a reply port. For example, assume that task T1 owns port P1, and it sends a message to port P2, which is owned by task T2. If T1 expects to receive a reply from T2, it must grant T2 the right MACH_PORT_RIGHT_SEND for port P1. Ownership of port rights is at the task level, which means that all threads belonging to the same task share the same port rights. Thus, two threads belonging to the same task can easily communicate by exchanging messages through the per-thread port associated with each thread. When a task is created, two special ports—the Task Self port and the Notify port—are also created. The kernel has receive rights to the Task Self port, which allows a task to send messages to the kernel. The kernel can send notification of event occurrences to a task's Notify port (to which, of course, the task has receive rights). The mach_port_allocate() function call creates a new port and allocates space for its queue of messages. It also identifies the rights for the port. Each port right represents a name for that port, and a port can only be accessed via a right. Port names are simple integer values and behave much like UNIX file descriptors. 
The following example illustrates creating a port using this API: mach_port_t port; // the name of the port right mach_port_allocate(         mach_task_self(), // a task referring to itself         MACH_PORT_RIGHT_RECEIVE, // the right for this port         &port); // the name of the port right Each task also has access to a bootstrap port, which allows a task to register a port it has created with a system-wide bootstrap server. Once a port has been registered with the bootstrap server, other tasks can look up the port in this registry and obtain rights for sending messages to the port. The queue associated with each port is finite in size and is initially empty. As messages are sent to the port, the messages are copied into the queue. All messages are delivered reliably and have the same priority. Mach guarantees that multiple messages from the same sender are queued in first-in, first-out (FIFO) order but does not guarantee an absolute ordering. For instance, messages from two senders may be queued in any order. Mach messages contain the following two fields: A fixed-size message header containing metadata about the message, including the size of the message as well as source and destination ports. Commonly, the sending thread expects a reply, so the port name of the source is passed on to the receiving task, which can use it as a “return address” in sending a reply. A variable-sized body containing data. Messages may be either simple or complex. A simple message contains ordinary, unstructured user data that are not interpreted by the kernel. A complex message may contain pointers to memory locations containing data (known as “out-of-line” data) or may also be used for transferring port rights to another task. Out-of-line data pointers are especially useful when a message must pass large chunks of data. 
A simple message would require copying and packaging the data in the message; out-of-line data transmission requires only a pointer that refers to the memory location where the data are stored. The function mach_msg() is the standard API for both sending and receiving messages. The value of one of the function's parameters, either MACH_SEND_MSG or MACH_RCV_MSG, indicates whether it is a send or a receive operation. We now illustrate how it is used when a client task sends a simple message to a server task. Assume there are two ports, client and server, associated with the client and server tasks, respectively. The code in Figure 3.18 shows the client task constructing a header and sending a message to the server, as well as the server task receiving the message sent from the client. #include <mach/mach.h> struct message { mach_msg_header_t header; int data; }; mach_port_t client; mach_port_t server; /* Client Code */ struct message message; // construct the header message.header.msgh_size = sizeof(message); message.header.msgh_remote_port = server; message.header.msgh_local_port = client; // send the message mach_msg(&message.header, // message header MACH_SEND_MSG, // sending a message sizeof(message), // size of message sent 0, // maximum size of received message - unnecessary MACH_PORT_NULL, // name of receive port - unnecessary MACH_MSG_TIMEOUT_NONE, // no time outs MACH_PORT_NULL // no notify port ); /* Server Code */ struct message message; // receive the message mach_msg(&message.header, // message header MACH_RCV_MSG, // receiving a message 0, // size of message sent - unnecessary sizeof(message), // maximum size of received message server, // name of receive port MACH_MSG_TIMEOUT_NONE, // no time outs MACH_PORT_NULL // no notify port ); Figure 3.18 Example program illustrating message passing in Mach. The mach_msg() function call is invoked by user programs for performing message passing. 
mach_msg() then invokes the function mach_msg_trap(), which is a system call to the Mach kernel. Within the kernel, mach_msg_trap() next calls the function mach_msg_overwrite_trap(), which then handles the actual passing of the message. The send and receive operations themselves are flexible. For instance, when a message is sent to a port, its queue may be full. If the queue is not full, the message is copied to the queue, and the sending task continues. If the port's queue is full, the sender has several options (specified via parameters to mach_msg()): 1. Wait indefinitely until there is room in the queue. 2. Wait at most n milliseconds. 3. Do not wait at all but rather return immediately. 4. Temporarily cache a message. Here, a message is given to the operating system to keep, even though the queue to which that message is being sent is full. When the message can be put in the queue, a notification message is sent back to the sender. Only one message to a full queue can be pending at any time for a given sending thread. The final option is meant for server tasks. After finishing a request, a server task may need to send a one-time reply to the task that requested the service, but it must also continue with other service requests, even if the reply port for a client is full. The major problem with message systems has generally been poor performance caused by copying of messages from the sender's port to the receiver's port. The Mach message system attempts to avoid copy operations by using virtual-memory-management techniques (Chapter 10). Essentially, Mach maps the address space containing the sender's message into the receiver's address space. Therefore, the message itself is never actually copied, as both the sender and receiver access the same memory. This message-management technique provides a large performance boost but works only for intrasystem messages.

      In the section discussing POSIX Shared Memory, the shared memory mechanism relies on memory-mapped files and uses the shm_open(), ftruncate(), and mmap() functions to facilitate inter-process communication (IPC). Shared memory allows processes to communicate efficiently by writing to a memory region that can be accessed by multiple processes. The producer-consumer model is used in this example, where one process (the producer) writes to shared memory, and another (the consumer) reads the data. This method is particularly fast compared to other IPC methods because it avoids the overhead of copying data between processes. However, it is important to note that proper synchronization is required to ensure that concurrent access to shared memory does not lead to data corruption or inconsistency.
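
The shm_open(), ftruncate(), and mmap() sequence can also be written with the return values checked, which the textbook figures omit for brevity. The sketch below is one possible arrangement under stated assumptions: the helper names `shm_write_text()` and `shm_read_text()` and the object name used in the usage note are invented for this example, and both helpers are shown in one file even though the producer and consumer would normally be separate processes. The reader maps the object with `PROT_READ` only, because a descriptor opened with `O_RDONLY` cannot back a writable `MAP_SHARED` mapping.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_SIZE 4096  /* size of the shared-memory object in bytes */

/* Producer side: create the object, size it, map it, and copy text into it.
   Returns 0 on success and -1 on any failure. */
int shm_write_text(const char *name, const char *text) {
    if (strlen(text) >= SHM_SIZE) {
        return -1;                       /* text plus terminator must fit */
    }
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    if (fd == -1) {
        return -1;
    }
    if (ftruncate(fd, SHM_SIZE) == -1) {
        close(fd);
        return -1;
    }
    char *ptr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping outlives the descriptor */
    if (ptr == MAP_FAILED) {
        return -1;
    }
    strcpy(ptr, text);
    munmap(ptr, SHM_SIZE);
    return 0;
}

/* Consumer side: open read-only, map read-only, copy the text out,
   then remove the object. Returns 0 on success and -1 on any failure. */
int shm_read_text(const char *name, char *out, size_t out_size) {
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1) {
        return -1;
    }
    char *ptr = mmap(NULL, SHM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (ptr == MAP_FAILED) {
        return -1;
    }
    snprintf(out, out_size, "%s", ptr);  /* bounded copy of the stored string */
    munmap(ptr, SHM_SIZE);
    shm_unlink(name);                    /* remove the name once it has been read */
    return 0;
}
```

A producer would call `shm_write_text("/os_demo", "Hello World!")` and a consumer would later call `shm_read_text("/os_demo", buf, sizeof buf)`. On older glibc versions the program may need to be linked with `-lrt`. As the note above says, real concurrent use still needs synchronization, which this sketch does not provide.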

      In Mach's Message Passing, communication between tasks occurs via messages sent to and received from ports. Mach message passing is designed for distributed systems and allows efficient communication, even between systems. Each task is associated with ports that manage communication through unidirectional messages, and port rights control access to these messages. The flexibility of message passing in Mach is evident in its queuing system, where messages are queued in FIFO order and can be managed using different waiting strategies if the queue is full. Additionally, Mach uses virtual memory mapping techniques to avoid the overhead of copying messages, enhancing performance in local communications.

    3. 3.6 IPC in Message-Passing Systems In Section 3.5, we showed how cooperating processes can communicate in a shared-memory environment. The scheme requires that these processes share a region of memory and that the code for accessing and manipulating the shared memory be written explicitly by the application programmer. Another way to achieve the same effect is for the operating system to provide the means for cooperating processes to communicate with each other via a message-passing facility. Message passing provides a mechanism to allow processes to communicate and to synchronize their actions without sharing the same address space. It is particularly useful in a distributed environment, where the communicating processes may reside on different computers connected by a network. For example, an Internet chat program could be designed so that chat participants communicate with one another by exchanging messages. A message-passing facility provides at least two operations: send(message) and receive(message) Messages sent by a process can be either fixed or variable in size. If only fixed-sized messages can be sent, the system-level implementation is straightforward. This restriction, however, makes the task of programming more difficult. Conversely, variable-sized messages require a more complex system-level implementation, but the programming task becomes simpler. This is a common kind of tradeoff seen throughout operating-system design. If processes P and Q want to communicate, they must send messages to and receive messages from each other: a communication link must exist between them. This link can be implemented in a variety of ways. We are concerned here not with the link's physical implementation (such as shared memory, hardware bus, or network, which are covered in Chapter 19) but rather with its logical implementation. 
Here are several methods for logically implementing a link and the send()/receive() operations: Direct or indirect communication Synchronous or asynchronous communication Automatic or explicit buffering We look at issues related to each of these features next. 3.6.1 Naming Processes that want to communicate must have a way to refer to each other. They can use either direct or indirect communication. Under direct communication, each process that wants to communicate must explicitly name the recipient or sender of the communication. In this scheme, the send() and receive() primitives are defined as: send(P, message)—Send a message to process P. receive(Q, message)—Receive a message from process Q. A communication link in this scheme has the following properties: A link is established automatically between every pair of processes that want to communicate. The processes need to know only each other's identity to communicate. A link is associated with exactly two processes. Between each pair of processes, there exists exactly one link. This scheme exhibits symmetry in addressing; that is, both the sender process and the receiver process must name the other to communicate. A variant of this scheme employs asymmetry in addressing. Here, only the sender names the recipient; the recipient is not required to name the sender. In this scheme, the send() and receive() primitives are defined as follows: send(P, message)—Send a message to process P. receive(id, message)—Receive a message from any process. The variable id is set to the name of the process with which communication has taken place. The disadvantage in both of these schemes (symmetric and asymmetric) is the limited modularity of the resulting process definitions. Changing the identifier of a process may necessitate examining all other process definitions. All references to the old identifier must be found, so that they can be modified to the new identifier. 
In general, any such hard-coding techniques, where identifiers must be explicitly stated, are less desirable than techniques involving indirection, as described next. With indirect communication, the messages are sent to and received from mailboxes, or ports. A mailbox can be viewed abstractly as an object into which messages can be placed by processes and from which messages can be removed. Each mailbox has a unique identification. For example, POSIX message queues use an integer value to identify a mailbox. A process can communicate with another process via a number of different mailboxes, but two processes can communicate only if they have a shared mailbox. The send() and receive() primitives are defined as follows: send(A, message)—Send a message to mailbox A. receive(A, message)—Receive a message from mailbox A. In this scheme, a communication link has the following properties: A link is established between a pair of processes only if both members of the pair have a shared mailbox. A link may be associated with more than two processes. Between each pair of communicating processes, a number of different links may exist, with each link corresponding to one mailbox. Now suppose that processes P1, P2, and P3 all share mailbox A. Process P1 sends a message to A, while both P2 and P3 execute a receive() from A. Which process will receive the message sent by P1? The answer depends on which of the following methods we choose: Allow a link to be associated with two processes at most. Allow at most one process at a time to execute a receive() operation. Allow the system to select arbitrarily which process will receive the message (that is, either P2 or P3, but not both, will receive the message). The system may define an algorithm for selecting which process will receive the message (for example, round robin, where processes take turns receiving messages). The system may identify the receiver to the sender. 
A mailbox may be owned either by a process or by the operating system. If the mailbox is owned by a process (that is, the mailbox is part of the address space of the process), then we distinguish between the owner (which can only receive messages through this mailbox) and the user (which can only send messages to the mailbox). Since each mailbox has a unique owner, there can be no confusion about which process should receive a message sent to this mailbox. When a process that owns a mailbox terminates, the mailbox disappears. Any process that subsequently sends a message to this mailbox must be notified that the mailbox no longer exists. In contrast, a mailbox that is owned by the operating system has an existence of its own. It is independent and is not attached to any particular process. The operating system then must provide a mechanism that allows a process to do the following: Create a new mailbox. Send and receive messages through the mailbox. Delete a mailbox. The process that creates a new mailbox is that mailbox's owner by default. Initially, the owner is the only process that can receive messages through this mailbox. However, the ownership and receiving privilege may be passed to other processes through appropriate system calls. Of course, this provision could result in multiple receivers for each mailbox. 3.6.2 Synchronization Communication between processes takes place through calls to send() and receive() primitives. There are different design options for implementing each primitive. Message passing may be either blocking or nonblocking—also known as synchronous and asynchronous. (Throughout this text, you will encounter the concepts of synchronous and asynchronous behavior in relation to various operating-system algorithms.) Blocking send. The sending process is blocked until the message is received by the receiving process or by the mailbox. Nonblocking send. The sending process sends the message and resumes operation. Blocking receive. 
The receiver blocks until a message is available. Nonblocking receive. The receiver retrieves either a valid message or a null. Different combinations of send() and receive() are possible. When both send() and receive() are blocking, we have a rendezvous between the sender and the receiver. The solution to the producer–consumer problem becomes trivial when we use blocking send() and receive() statements. The producer merely invokes the blocking send() call and waits until the message is delivered to either the receiver or the mailbox. Likewise, when the consumer invokes receive(), it blocks until a message is available. This is illustrated in Figures 3.14 and 3.15. message next_produced; while (true) {      /* produce an item in next_produced */      send(next_produced); } Figure 3.14 The producer process using message passing. message next_consumed; while (true) {      receive(next_consumed);      /* consume the item in next_consumed */ } Figure 3.15 The consumer process using message passing. 3.6.3 Buffering Whether communication is direct or indirect, messages exchanged by communicating processes reside in a temporary queue. Basically, such queues can be implemented in three ways: Zero capacity. The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message. Bounded capacity. The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link's capacity is finite, however. If the link is full, the sender must block until space is available in the queue. Unbounded capacity. The queue's length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks. 
The zero-capacity case is sometimes referred to as a message system with no buffering. The other cases are referred to as systems with automatic buffering.

      Message-passing systems provide an alternative to shared memory for interprocess communication (IPC), allowing processes to communicate and synchronize without sharing address space. This is particularly useful in distributed environments where processes reside on different computers. Message passing relies on send() and receive() operations, which can handle fixed or variable-sized messages, with fixed sizes simplifying system implementation but increasing programming complexity. Communication links between processes can be established through direct or indirect methods. Direct communication requires processes to explicitly name each other, creating a one-to-one link, while indirect communication uses mailboxes (or ports) for message exchange, allowing more flexibility and modularity. Synchronization in message passing can be either blocking (synchronous) or nonblocking (asynchronous). Blocking send() causes the sender to wait until the message is received, while blocking receive() makes the receiver wait until a message is available. Nonblocking send() allows the sender to continue immediately after dispatching the message, and nonblocking receive() allows the receiver to check for messages without waiting. When both operations are blocking, a rendezvous occurs, simplifying synchronization. Buffering determines how messages are temporarily stored before being received. Zero-capacity queues require senders to wait for the recipient to be ready, bounded queues impose a fixed limit on messages and may require senders to wait when full, while unbounded queues allow infinite messages without blocking. The choice of buffering impacts system performance and message flow control.
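
To make the bounded-capacity and nonblocking cases concrete, here is a small in-process model of a mailbox. It is only a sketch: the `mailbox` and `message` types, the capacity of 4, and the `try_` function names are assumptions for this example, and a real facility would live in the kernel and be shared between processes. Where the quoted text says the sender "must block" on a full queue, this nonblocking variant reports failure instead so the caller can decide whether to retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_CAPACITY 4  /* the n of the bounded-capacity case; arbitrary here */

typedef struct {
    int sender;   /* identifier of the sending process (hypothetical field) */
    int payload;  /* message contents, reduced to one integer */
} message;

typedef struct {
    message slots[MAILBOX_CAPACITY];
    size_t head;   /* index of the oldest queued message */
    size_t count;  /* number of messages currently queued */
} mailbox;

/* Nonblocking send: queue the message and return true,
   or return false immediately if the mailbox is full. */
bool mailbox_try_send(mailbox *mb, message msg) {
    if (mb->count == MAILBOX_CAPACITY) {
        return false;
    }
    size_t tail = (mb->head + mb->count) % MAILBOX_CAPACITY;
    mb->slots[tail] = msg;
    mb->count++;
    return true;
}

/* Nonblocking receive: copy out the oldest message and return true,
   or return false (the "null" result) if nothing is queued. */
bool mailbox_try_receive(mailbox *mb, message *out) {
    if (mb->count == 0) {
        return false;
    }
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_CAPACITY;
    mb->count--;
    return true;
}
```

A blocking send() or receive() could be layered on top by waiting and retrying while the corresponding `try_` call returns false; a zero-capacity link has no slots at all, so sender and receiver must meet directly.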

    4. 3.5 IPC in Shared-Memory Systems Interprocess communication using shared memory requires communicating processes to establish a region of shared memory. Typically, a shared-memory region resides in the address space of the process creating the shared-memory segment. Other processes that wish to communicate using this shared-memory segment must attach it to their address space. Recall that, normally, the operating system tries to prevent one process from accessing another process's memory. Shared memory requires that two or more processes agree to remove this restriction. They can then exchange information by reading and writing data in the shared areas. The form of the data and the location are determined by these processes and are not under the operating system's control. The processes are also responsible for ensuring that they are not writing to the same location simultaneously. To illustrate the concept of cooperating processes, let's consider the producer–consumer problem, which is a common paradigm for cooperating processes. A producer process produces information that is consumed by a consumer process. For example, a compiler may produce assembly code that is consumed by an assembler. The assembler, in turn, may produce object modules that are consumed by the loader. The producer–consumer problem also provides a useful metaphor for the client–server paradigm. We generally think of a server as a producer and a client as a consumer. For example, a web server produces (that is, provides) web content such as HTML files and images, which are consumed (that is, read) by the client web browser requesting the resource. One solution to the producer–consumer problem uses shared memory. To allow producer and consumer processes to run concurrently, we must have available a buffer of items that can be filled by the producer and emptied by the consumer. This buffer will reside in a region of memory that is shared by the producer and consumer processes. 
A producer can produce one item while the consumer is consuming another item. The producer and consumer must be synchronized, so that the consumer does not try to consume an item that has not yet been produced. Two types of buffers can be used. The unbounded buffer places no practical limit on the size of the buffer. The consumer may have to wait for new items, but the producer can always produce new items. The bounded buffer assumes a fixed buffer size. In this case, the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full. Let's look more closely at how the bounded buffer illustrates interprocess communication using shared memory. The following variables reside in a region of memory shared by the producer and consumer processes:

    #define BUFFER_SIZE 10
    typedef struct { . . . } item;
    item buffer[BUFFER_SIZE];
    int in = 0;
    int out = 0;

The shared buffer is implemented as a circular array with two logical pointers: in and out. The variable in points to the next free position in the buffer; out points to the first full position in the buffer. The buffer is empty when in == out; the buffer is full when ((in + 1) % BUFFER_SIZE) == out. The code for the producer process is shown in Figure 3.12, and the code for the consumer process is shown in Figure 3.13. The producer process has a local variable next_produced in which the new item to be produced is stored. The consumer process has a local variable next_consumed in which the item to be consumed is stored.

    item next_produced;
    while (true) {
         /* produce an item in next_produced */
         while (((in + 1) % BUFFER_SIZE) == out)
           ; /* do nothing */
         buffer[in] = next_produced;
         in = (in + 1) % BUFFER_SIZE;
    }

Figure 3.12 The producer process using shared memory.

    item next_consumed;
    while (true) {
         while (in == out)
           ; /* do nothing */
         next_consumed = buffer[out];
         out = (out + 1) % BUFFER_SIZE;
         /* consume the item in next_consumed */
    }

Figure 3.13 The consumer process using shared memory.

This scheme allows at most BUFFER_SIZE − 1 items in the buffer at the same time. We leave it as an exercise for you to provide a solution in which BUFFER_SIZE items can be in the buffer at the same time. In Section 3.7.1, we illustrate the POSIX API for shared memory. One issue this illustration does not address concerns the situation in which both the producer process and the consumer process attempt to access the shared buffer concurrently. In Chapter 6 and Chapter 7, we discuss how synchronization among cooperating processes can be implemented effectively in a shared-memory environment.

      Interprocess communication (IPC) in shared-memory systems allows processes to communicate by creating a shared-memory region. Normally, operating systems restrict processes from accessing each other’s memory, but shared memory requires processes to agree to lift this restriction. These processes determine data structure and management without operating system intervention. A common example is the producer–consumer problem, where a producer generates data consumed by a consumer. This paradigm extends to client-server models, such as web servers providing content to browsers. Shared memory enables concurrent execution of producers and consumers through a buffer, which can be either unbounded (allowing unlimited production) or bounded (with a fixed size, requiring synchronization). A bounded buffer, implemented as a circular array, uses two pointers, in and out, to manage data flow. The buffer is empty when in == out and full when ((in + 1) % BUFFER_SIZE) == out. The producer adds items to the buffer, while the consumer removes them. However, simultaneous access by both processes can lead to conflicts, requiring synchronization techniques, discussed in later chapters. This model enhances efficiency by minimizing kernel intervention, but careful synchronization is necessary to avoid issues like race conditions and data inconsistency.
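
To make the empty and full conditions concrete, here is a minimal single-process sketch of the same circular array. It reuses `buffer`, `in`, `out`, and `BUFFER_SIZE` from the quoted passage; the helpers `try_produce`, `try_consume`, and `buffer_count` are my own names, `item` is assumed to be a plain `int`, and the functions return false where the textbook loops would busy-wait. It does not set up real shared memory and contains no synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFFER_SIZE 10

typedef int item;               /* assumption: items are plain ints */

static item buffer[BUFFER_SIZE];
static int in = 0;              /* next free position */
static int out = 0;             /* first full position */

bool buffer_is_empty(void) { return in == out; }
bool buffer_is_full(void)  { return ((in + 1) % BUFFER_SIZE) == out; }

/* Number of items currently stored (at most BUFFER_SIZE - 1). */
int buffer_count(void) { return (in - out + BUFFER_SIZE) % BUFFER_SIZE; }

/* Returns false where the textbook producer would spin on a full buffer. */
bool try_produce(item next_produced) {
    if (buffer_is_full())
        return false;
    buffer[in] = next_produced;
    in = (in + 1) % BUFFER_SIZE;
    return true;
}

/* Returns false where the textbook consumer would spin on an empty buffer. */
bool try_consume(item *next_consumed) {
    assert(next_consumed != NULL);
    if (buffer_is_empty())
        return false;
    *next_consumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    return true;
}
```

One slot always stays unused so that in == out can only mean "empty", which is why the capacity is BUFFER_SIZE − 1.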

    5. Process Control Block Each process is represented in the operating system by a process control block (PCB)—also called a task control block. A PCB is shown in Figure 3.3. It contains many pieces of information associated with a specific process, including these: Process state. The state may be new, ready, running, waiting, halted, and so on. Program counter. The counter indicates the address of the next instruction to be executed for this process. CPU registers. The registers vary in number and type, depending on the computer architecture. They include accumulators, index registers, stack pointers, and general-purpose registers, plus any condition-code information. Along with the program counter, this state information must be saved when an interrupt occurs, to allow the process to be continued correctly afterward when it is rescheduled to run. CPU-scheduling information. This information includes a process priority, pointers to scheduling queues, and any other scheduling parameters. (Chapter 5 describes process scheduling.) Memory-management information. This information may include such items as the value of the base and limit registers and the page tables, or the segment tables, depending on the memory system used by the operating system (Chapter 9). Accounting information. This information includes the amount of CPU and real time used, time limits, account numbers, job or process numbers, and so on. I/O status information. This information includes the list of I/O devices allocated to the process, a list of open files, and so on.

      The OS tracks each process using a PCB, which stores critical information like process state, program counter, CPU registers, memory management data, and I/O status. This allows the OS to pause and resume processes efficiently.
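
As a rough illustration of what a PCB might hold, here is a sketch of a C struct with the fields listed in the passage, plus two helper functions that mimic saving and restoring the CPU context. Every name and size here (`pcb`, `proc_state`, `NUM_REGS`, `MAX_OPEN_FILES`, `pcb_save_context`, `pcb_restore_context`) is hypothetical. Real kernels use much larger structures and save registers in architecture-specific code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 8          /* assumed register count */
#define MAX_OPEN_FILES 8    /* assumed per-process limit */

typedef enum { PROC_NEW, PROC_READY, PROC_RUNNING, PROC_WAITING, PROC_HALTED } proc_state;

typedef struct pcb {
    int pid;                          /* process number (accounting) */
    proc_state state;                 /* process state */
    uintptr_t program_counter;        /* address of the next instruction */
    uintptr_t registers[NUM_REGS];    /* saved CPU registers */
    int priority;                     /* CPU-scheduling information */
    struct pcb *next_in_queue;        /* link in a scheduling queue */
    uintptr_t base, limit;            /* memory-management information */
    unsigned long cpu_time_used;      /* accounting information */
    int open_files[MAX_OPEN_FILES];   /* I/O status information */
} pcb;

/* Simulates what happens on an interrupt: stash the CPU state in the PCB. */
void pcb_save_context(pcb *p, uintptr_t pc, const uintptr_t regs[NUM_REGS]) {
    assert(p != NULL && regs != NULL);
    p->program_counter = pc;
    memcpy(p->registers, regs, sizeof p->registers);
    p->state = PROC_READY;
}

/* Simulates rescheduling: copy the saved state back out of the PCB. */
void pcb_restore_context(pcb *p, uintptr_t *pc, uintptr_t regs[NUM_REGS]) {
    assert(p != NULL && pc != NULL && regs != NULL);
    *pc = p->program_counter;
    memcpy(regs, p->registers, sizeof p->registers);
    p->state = PROC_RUNNING;
}
```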

    6. 3.1.1 The Process Informally, as mentioned earlier, a process is a program in execution. The status of the current activity of a process is represented by the value of the program counter and the contents of the processor's registers. The memory layout of a process is typically divided into multiple sections, and is shown in Figure 3.1. These sections include: Text section—the executable code Data section—global variables Heap section—memory that is dynamically allocated during program run time Stack section—temporary data storage when invoking functions (such as function parameters, return addresses, and local variables)

      A process consists of different sections in memory: text (executable code), data (global variables), heap (dynamic memory), and stack (function call data). The heap and stack grow dynamically during execution, but the OS must ensure they don’t overlap.

    7. Early computers allowed only one program to be executed at a time. This program had complete control of the system and had access to all the system's resources. In contrast, contemporary computer systems allow multiple programs to be loaded into memory and executed concurrently. This evolution required firmer control and more compartmentalization of the various programs; and these needs resulted in the notion of a process, which is a program in execution. A process is the unit of work in a modern computing system.

      Early computers executed only one program at a time, whereas modern systems allow multiple programs to run concurrently. This required the concept of a process, which is a program in execution. A system consists of many processes, some running user code and others handling system operations.

    1. Summary of the Talk: Building More Powerful User Interfaces in the Browser

      Introduction & Motivation

      • The speaker reflects on a year’s work in adopting modern web technologies (AMD, Backbone, HTML5, CSS3, new browser APIs) but realizes that the fundamental power given to users has not improved significantly from 15 years ago.

      "I looked at what we had built at the end of the year and I said you know I think I could have built this 15 years ago when I started writing JavaScript."

      • Traditional web applications function like simple forms: users provide inputs, and the application computes outputs. The speaker seeks a way to eliminate this rigid distinction and create more interactive and dynamic UI models.

      "You have to decide in advance which things you think are input... and which things you think are output."

      Example: Federal Budget Visualization

      • A web-based visualization of the US 2013 federal budget illustrates the limitations of traditional input-output models.

      "What if we could actually take this and change this and not have a distinction between input and output?"

      • The speaker explores how users could interact more dynamically by locking certain variables (e.g., keeping the deficit fixed) and observing how other variables (e.g., taxes) adjust automatically.

      Concept of Constraint Programming

      • Constraint programming allows defining relationships between variables instead of prescribing explicit procedures for computation.

      "Constraint programming is about writing our programs in terms of relations instead of procedures."

      • Example: Instead of coding tax rates explicitly, define relationships between tax brackets and let the system adjust them dynamically.

      "We make all of our variables both input and output."

      Cassowary Solver for UI Constraints

      • Cassowary.js (c.js) is a JavaScript port of the Cassowary constraint solver and enables constraint solving for UI applications. The Cassowary algorithm also underlies Apple's Auto Layout.

      "Cassowary is a fantastic library for doing constraint programming in JavaScript."

      • It ensures relationships like total spending = defense + non-defense spending remain consistent, even when one variable is modified dynamically.

      "The solver will automatically for us make sure that this relationship between the three variables always holds."

      • Supports priority constraints to handle over-constrained systems (e.g., some constraints are "required," others are "nice to have").

      "If you have problems that are over constrained where there's no complete solution, Cassowary will find you the best solution you can find."

      Practical Implementation

      • Basic interaction model:
      • Mark a variable as being edited (beginEdit).
      • Suggest new values (suggestValue).
      • End editing (endEdit).
      • Add constraints (stayConstraint) to keep certain variables fixed.

      "The reason this is called Suggest and not set... is that this value might not lead to an actual solution."

      • Using constraints simplifies complex relationships, such as progressive taxation and revenue calculations.

      "Several very natural things fell out of writing the relationships that would have been a real pain to code by hand."

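      The idea that every variable is both input and output, with "stay" constraints deciding which ones hold still, can be shown with a toy example. The sketch below is in C and is not Cassowary or its API: it handles one hard-coded relation (total = defense + non-defense), and the names `budget`, `var_id`, and `budget_suggest` are hypothetical. A real solver handles many linear constraints with priorities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy relation: value[VAR_TOTAL] == value[VAR_DEFENSE] + value[VAR_NON_DEFENSE]. */
typedef enum { VAR_TOTAL, VAR_DEFENSE, VAR_NON_DEFENSE, VAR_COUNT } var_id;

typedef struct {
    double value[VAR_COUNT];
    bool stay[VAR_COUNT];   /* true: this variable should keep its value */
} budget;

/* Suggest a value for `edited`, then restore the relation by adjusting the
   first other variable that is not marked stay. Returns false, leaving the
   budget untouched, when both other variables are locked. */
bool budget_suggest(budget *b, var_id edited, double suggested) {
    assert(b != NULL);
    var_id adjust = VAR_COUNT;
    for (int v = 0; v < VAR_COUNT; v++) {
        if ((var_id)v != edited && !b->stay[v]) {
            adjust = (var_id)v;
            break;
        }
    }
    if (adjust == VAR_COUNT)
        return false;   /* suggestion rejected: nothing is free to move */

    b->value[edited] = suggested;
    switch (adjust) {
    case VAR_TOTAL:
        b->value[VAR_TOTAL] = b->value[VAR_DEFENSE] + b->value[VAR_NON_DEFENSE];
        break;
    case VAR_DEFENSE:
        b->value[VAR_DEFENSE] = b->value[VAR_TOTAL] - b->value[VAR_NON_DEFENSE];
        break;
    case VAR_NON_DEFENSE:
        b->value[VAR_NON_DEFENSE] = b->value[VAR_TOTAL] - b->value[VAR_DEFENSE];
        break;
    default:
        break;
    }
    return true;
}
```

      With the total locked, raising defense lowers non-defense. With the total unlocked and non-defense locked, the same edit raises the total instead. The false return is a crude stand-in for why the talk says values are suggested rather than set.
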
      Limitations of Cassowary

      • Only supports linear equations and numeric variables (e.g., it cannot handle quadratic constraints).

      "C can only solve problems where the variables are numbers and it can only solve where the relationships between the numbers are linear expressions."

      • Proposes improvements:
      • Nonlinear constraint solvers (for geometric problems, e.g., keeping a point on a circle).
      • MiniKanren & Core.logic for relational programming on non-numeric problems.

      MiniKanren & Core.logic

      • A relational programming model that generalizes constraints beyond numbers to trees, lists, colors, and abstract structures.

      "MiniKanren provides relational programming just like we've been doing with Cassowary but over non-numerical problem domains."

      • Example: Sudoku Solver in Core.logic solves the problem declaratively by defining constraints rather than writing procedural code.

      "This looks like a statement of the problem of Sudoku, and yet this will run really fast and give us the answers."

      • Benchmarks: a Core.logic Sudoku solver running as JavaScript in the browser is reported to be, on average, about 100x faster than Peter Norvig’s Python version.

      "This version which is not handwritten just happens to use a constraint solver works on average about a hundred times faster than Peter Norvig's Python code in JavaScript in the browser."

      Future Directions: Cooperating Solvers

      • Alan Kay’s Viewpoints Research Institute explores "cooperating solvers," where different constraint-solving techniques (dataflow, logical constraints) work together.

      "How do they cooperate? There's some really interesting PDFs and example code showing how to do that."

      • Potential applications:
      • More powerful layout engines.
      • Interactive problem-solving beyond algebraic constraints.

      Bonus: Bret Victor’s Scrubbing Calculator

      • Inspired by Bret Victor’s UI work, the speaker demonstrates an interactive "scrubbing calculator" implemented in Cassowary.

      "This is an example of the kind of program that would be hard to write by hand unless you really want to write your own computer algebra system."

      • Allows solving for any variable in an equation dynamically just by adjusting values interactively.

      "If we had a real computer algebra system in JavaScript wouldn't that be great?"

      Conclusion

      • Advocates for constraint programming as a means to build more powerful, flexible UI applications.
      • Suggests integrating constraint-based approaches in mainstream web development to create more intelligent, adaptive, and user-driven applications.

      "If we want to build more powerful applications, we’ve got to give our users more leverage."

      • Calls for open-source contributions to Cassowary and similar projects.

      "C.JS is a fantastic one... let's go out and build more powerful user interfaces."

    1. Summary of DevTools FM Podcast with Juan Capa on Membrane.io

      Introduction and Background

      • Juan Capa is the creator of Membrane.io, a still-in-development platform for simplifying API automation and internal tooling.

      "Juan is the creator of membrane.io, a still-on-development platform for simplifying API Automation and internal tooling."

      • He has a background in game development, having spent over a decade working on console, mobile, and web games.

      "I have a background in game development. I spent about 10 years a little bit more than 10 years working in game development."

      • Worked at Vercel on the CDN team after being hired through Twitter, then briefly returned to Zynga before joining Mighty under a program that allowed him to work part-time on Membrane.

      "I saw a tweet by Guillermo Rauch ... He hired me to work for Vercel ... I spent two years there as the lead in the CDN team."

      "Then I guess my last last thing I did was join Mighty ... working on my startup but also working three days for them."

      • Now focusing on Membrane full-time and looking to onboard users soon.

      "So yeah now I'm on Membrane 100% and yeah hoping that I can show to the world and onboard some users in the coming week or two."

      Membrane: Concept and Vision

      • Membrane was inspired by game engines, where every entity is programmable and data is universally accessible.

      "In game development, you’re dealing with this Engine with this universe, and this universe is completely programmable."

      • Aimed at simplifying API automation and small-scale applications, particularly for personal automation.

      "It’s a place to write programs to build personal automation ... optimized for personal automation programs."

      • Membrane provides an abstraction over APIs, allowing users to interact with data and automate workflows through a graph-based system.

      "The key to Membrane is this whole concept of a graph that is the main thing that programs use to manipulate the world."

      • Designed to be highly accessible by integrating with Visual Studio Code and leveraging JavaScript/TypeScript.

      "The entire thing is built inside of Visual Studio Code ... The most used IDE is Visual Studio Code and the most used language is JavaScript."

      Durability & Orthogonal Persistence

      • Membrane implements "orthogonal persistence," ensuring program state is always durable.

      "I decided to start building what is sometimes called orthogonal persistence, which is this concept of a durable program."

      • Every Membrane program is an SQLite database, meaning all messages, state, and execution history are stored persistently.

      "Every Membrane program is actually just one SQLite database."

      • Programs execute with an event-sourcing model, where all inputs and outputs are first logged in SQLite before execution.

      "Every message that it receives, it first goes in the database and then it's processed."

      • Uses the Linux kernel's soft-dirty page tracking so that only the memory pages that actually changed are serialized.

      "I use quickjs ... and there’s a constant in the Linux kernel called Soft Dirty Pages ... only serialize the pages that actually change."

      • Future improvements include optimizing serialization using WebAssembly’s linear memory model.

      "I’m saving more data than I should, so there’s even more optimizations I can do."

      Observability & Debugging

      • Membrane prioritizes perfect observability, logging every event to enable full program introspection and debugging.

      "If it’s not in the logs, it didn’t happen."

      • Allows time-travel debugging, replaying past states and executions.

      "You can go back to when that message was received and then run the code that was available back then."

      • Aims to support snapshot-based time travel for enhanced debugging.

      "The first version I’m gonna have of that type of time travel is going to be with a snapshot that is taken every hour."

      Membrane’s Graph Model

      • Membrane’s "graph" serves as a type-safe, unified interface for APIs.

      "Everything is a node, which you can think of as an object or a scalar (string, number, JSON type)."

      • Drivers enable API connectivity, converting external APIs into Membrane’s schema and providing a consistent interface.

      "The GitHub driver has a schema ... basically it mirrors the GitHub API as a Membrane schema."

      • Pagination is abstracted away, making API traversal seamless.

      "With Membrane, you have this object that’s a one-page, and a page has a reference to the next page."

      • Users can mount different programs' graphs into their own, dynamically expanding their automation environment.

      "Your graph is basically the combination of all the graphs of all your programs."

      Chrome Extension & API Interfacing

      • Membrane includes a Chrome extension that recognizes API entities on webpages.

      "What it does is it asks Membrane, ‘Hey, do any of the programs under Juan’s account recognize anything on this page?’"

      • Future improvements will allow automatic driver installation when encountering unrecognized APIs.

      "Eventually, I can just offer you the option to install that driver with a click from the Chrome extension."

      • Currently requires users to provide their own API keys, but OAuth-based authentication is planned.

      "Right now, you have to bring your own keys."

      Cron & Automation Features

      • Membrane features built-in cron-like timers, which are stored in SQLite and visualized in the UI.

      "The SQLite database has a table called timers, and that table holds all scheduled actions."

      • Users can visually track when timers will execute and manually trigger actions for testing.

      "From Visual Studio Code, you can just hover on each timer and see how long until it fires."

      • Logs every timer execution, ensuring full transparency in automation workflows.

      "If it’s not in the logs, it didn’t happen."

      Potential for Expansion & Future Vision

      • Membrane’s approach is inspired by game development tooling, where objects and behaviors are always inspectable.

      "In game engines, you’re dealing with objects where you can see all their properties and control them."

      • Aims to provide a seamless developer experience, where APIs become interactable entities without custom adapters.

      "If you wanted to automate something with Twitter, you shouldn’t have to pre-install a driver."

      • Exploring self-hosting and open-source models to improve privacy and decentralization.

      "Self-hosting membrane is going to be a thing ... I think I want to make it open-source."

      • Could enable mobile implementations, particularly for interacting with on-device automation.

      "You could just access your Membrane graph from your phone."

      • Possibility of auto-generating API drivers from HAR files or OpenAPI specs.

      "There are ways to generate API specs from network traffic ... from that API spec, you can generate the driver."

      Conclusion

      • Membrane is a powerful tool aimed at making personal automation and API interaction seamless, leveraging game engine principles for maximum programmability.
      • It provides persistent execution, deep observability, and a graph-based API abstraction layer that simplifies working with external services.
      • With a focus on usability, it integrates tightly with VS Code and JavaScript while also offering innovative features like event sourcing, time travel debugging, and drag-and-drop API connections.
      • The future of Membrane includes open-source possibilities, mobile integrations, and potentially eliminating the need for manually defining API adapters.
      • It represents a new paradigm in developer tooling, where programs are durable, transparent, and universally programmable.
    1. They draw statistical correlations between a person’s zip code or language patterns and her potential to pay back a loan or handle a job. These correlations are discriminatory, and some of them are illegal.

      I've always been curious as to how statistical analysis of one's language, socio-economic status, and upbringing can be illegal in one context but expected in another, such as sports.

    1. Uncovering secrets of the proteome: Alternate RNA decoding & Protein asymmetry shaping cell fate

      This presentation "Uncovering the secrets of the human proteome" was given in May 2024 at the 50th anniversary celebration of the Barnett Institute. It focused on progress in proteomics and two new discoveries: 1. Alternate RNA decoding results in stable and abundant proteins in mammals 2. Proteome asymmetry in mouse and human embryos before fate specification

      The vastness and complexity of the human proteome have hampered its exploration. New mass spectrometry technologies are transcending those limitations and allowing for large gains in sensitivity, sequence coverage, spatial and temporal resolution. I will discuss the conceptual drivers of this progress and provide examples of how it will advance our understanding of the human proteome and enable better therapeutics.

      https://youtu.be/F4-PUuz5kcQ?si=5iHIshGprYutcRih

  3. Jan 2025
    1. Anthology, Blackboard by Anthology

      Exhaling deeply while grading my 47th essay of the evening, coffee gone cold beside me

      Oh, Blackboard... bitter laugh

      Let me tell you about Blackboard through the fragments of my daily struggle, through the prism of 4/4 teaching load spread across three campuses just to make rent:

      Each semester begins the same— Login attempts like scattered prayers Dashboard a maze of broken promises Features that mock with their corporate sheen While I upload syllabi at midnight Again and again and again

      They sold us dreams of streamlined workflows But my grades still vanish into digital void Support tickets float unanswered Like autumn leaves in administrative wind While students message: "Professor, I can't find..." And I drown in workarounds

      The cost? Oh, the cost... Not from my adjunct's pittance But I watch department meetings Where deans speak of budget constraints Yet somehow there's always money For another Blackboard module Another upgrade Another promise

      Canvas beckons from across the quad Where my tenure-track colleagues reside In their technical paradise While we contingent faculty Navigate this labyrinth of legacy code Because migration costs too much For our satellite campus

      Do you know what it's like To build a course shell from scratch Four times a year Because "course copy" fails While grading deadline looms? To explain to students Why their mobile app won't load?

      Rubbing temples, reaching for cold coffee

      But tomorrow I'll log in again Because what choice do we have? When you're paid by the course You dance to the tune they play Even when the music stutters Even when the platform breaks

      Don't talk to me about "bad actors" Talk to me about survival About making do About teaching despite, not because of These digital walls we're given

      ...I should get back to grading. These essays won't grade themselves, and the Blackboard SpeedGrader is down. Again.

    1. Making functions also can help us organize our code. It lets us give a name to a block of code, and when we use it, those function names can help make the code more understandable. Making code as functions also helps in letting us put those pieces of code in other files or in code libraries, so the file we are working on is smaller and easier to manage.

      Functions help organize code by giving a block of code a name, which improves readability. They also support abstraction and reuse, which makes code easier to test and maintain. Because each function has its own local scope, functions also help avoid variable name conflicts.

    1. Reviewer #2 (Public review):

      Summary:

      The authors use a genetically encoded fluorescent sensor, GRABNE, to measure NE dynamics in the dorsal hippocampus of mice in response to multiple behavioral manipulations. A non-linear model and regression were used to quantitatively assess the contribution of multiple behavioral covariates to changes in NE signaling, with the result that NE signal dynamics were best predicted by time from event transitions, with the signal exponentially decaying over a period of seconds to minutes after transitions. Event transitions were implemented as a transfer from a home cage to a novel arena, a transfer to a familiar linear track, and the introduction of novel objects. Additional experiments showed that spatial context transitions dominate NE signaling over novel object presentations, and experience accelerates the decay of the NE signal after spatial context transitions. Correspondingly, the hippocampal CA1 spatial code takes minutes to stabilize after context transition in both novel and familiar spaces.

      Strengths:

      A strength of the study is the use of the NE sensor with sub-second resolution, non-linear modeling, and regression to identify the prominent variable of interest as time from event transition, and multiple behavioral controls. The use of multiple behavioral designs to investigate the effect of familiarity, experience, and interaction of spatial context transitions and novel object introduction is a strength. Relating the dynamics of NE signal decay to the rate of CA1 spatial code changes is also a strength.

      Weaknesses:

      A minor weakness is that the concept of an event boundary needs to be more broadly discussed. The manuscript uses event transitions such as spatial context changes and novel object introduction to implement an event boundary. However, especially in episodic memory studies in humans, event structure and boundaries have also been shown to occur through the automatic segmentation of experiences into discrete events (Baldassano et al., Neuron, 2017; Radvansky and Zacks, Curr. Opin. Behav. Sci., 2017). The rodent experiments in the current manuscript explicitly introduce event boundaries through changes in context or objects, which can potentially be conflated with novelty. A discussion of these differences, and whether NE can also have a role in event boundary transitions based on automatic segmentation of experiences, will add to the impact of the manuscript.

    1. eLife Assessment

      This study examined the important question of how neurons code temporal information across the hippocampus, dorsal striatum, and orbitofrontal cortex. Using a behavioral task in the rat that requires discrimination between short and long time intervals, the authors conclude that time intervals are represented in all three regions and that synchronized activity of time-coding cells across the brain regions is coordinated by theta rhythms. However, several weaknesses are noted, and in its current form, the study provides incomplete evidence for understanding how temporal information is processed and coordinated throughout these brain networks.

    1. The turn-of-last-century British artist William Morris once said you can’t have art without resistance in the materials. The computer and its multifarious peripherals are the materials. The code is the art.

      The author compares the computer to an artist's materials, much like a paint canvas.

    1. The critic in a critique must engage deeply in the substance of the problem a designer is solving, meaning the more expertise they have on a problem, the better. After all, the goal of a critique is to help someone else understand what you were trying to do and why, so they can provide their own perspective on what they would have done and why

      This statement makes sense to me because there's a limit to how much knowledge of general design principles can help you advise other people. It reminds me of when I did Robotics in high school. As a senior, I would look over my underclassmen's code, and in my experience it would have been impossible to give them meaningful advice on how to approach their problem without understanding the topic and how to code it myself.

    1. For quite some time we had an issue in the text editor related to the cursor getting stuck when it was at the end of a paragraph. It was annoying, but we did not know how to address it and it stayed open for half a year. That part of the code depended on Pharo objects that were wrapping Rust objects. It was only when we added the ability to inspect Rust objects that we found the problem.

      Perhaps this is why it is still necessary to switch windows and return to GToolkit when the cursor stops moving.

    1. Introduction and Purpose

      “I would like to tell you why fulcro is awesome and why it's much easier to learn than you might believe so we will look at what fulcro is and what it can do for you and why is it interesting...”

      • Emphasizes that the talk aims to introduce Fulcro, explain its ease of learning, and highlight its benefits.

      Speaker Background

      “So first of all who is… I've been doing back-end development since 2006 and front-end development since 2014 on and off…”

      • Establishes the speaker’s credibility with extensive development experience.

      “...I built learning materials for Fulcro beginners and I pair program with and mentor my private clients on their first Fulcro project...”

      • Demonstrates the speaker’s active role in teaching Fulcro to newcomers.

      Motivation for Fulcro

      “When I create web applications I want to be productive and I want to have fun… I don't want to have to manually track whether the data started loading or finished or failed…”

      • Highlights the desire to reduce boilerplate and tedious manual tasks.

      “I don't want to write tons of boilerplate and especially not to do that and again and again for every new type data in my application…”

      • Stresses that Fulcro removes repetitive coding patterns, enhancing developer efficiency.

      Choosing a Full-Stack Framework

      “Now there are simpler Frameworks… or you can pick a full stack framework that has all the parts you need…”

      • Explains how Fulcro’s integrated approach can be preferable to patching together multiple libraries.

      “...malleable web framework designed for sustainable development of real world full stack web applications...”

      • Defines Fulcro as a flexible system that supports complex, long-lived applications.

      Key Fulcro Capabilities

      “It can render data in the UI and it uses React so it's wraps React for that…”

      • Confirms that Fulcro uses React under the hood for rendering.

      “It can manage state… it keeps the state for you at some place… re-render the UI so it reflects that state…”

      • Describes automatic state management and reactive re-rendering.

      “It makes it easy to load data from the backend… you have full control...”

      • Emphasizes the fine-grained control over data fetching.

      “Fulcro also caches the data for you automatically and it does so in normalized form…”

      • Highlights how normalized data storage simplifies updates across the UI.

      “Fulcro has excellent developer experience for multiple reasons… the biggest is locality and navigability…”

      • Points out how Fulcro keeps relevant code together, making it easier to navigate and maintain.

      Core Principles

      1. Graph API / EQL (EDN Query Language)

      “...we use graph API instead of rest API which means that we have just a single endpoint and it's the front end which asks the back end for what data it wants by sending over a query…”

      • Simplifies data retrieval by letting the client specify exactly what it needs.

      2. UI as Pure Function of State

      “UI is pure function of state… components only ever get the data they need from their parent…”

      • Removes side effects from the rendering flow.

      3. Locality

      “...to understand the UI component I shouldn't be forced to jump over four different files… so in Fulcro a component doesn’t have only a body but also a configuration map…”

      • Co-locates component queries, rendering, and logic in one place.

      4. Normalized Client-Side State

      “...it stores that data normalized in a simple tabular form where entities contain other entities replaced with references…”

      • Ensures any update in one place is reflected throughout the UI.
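
The normalization idea can be sketched in a few lines. This is not Fulcro's API or data format, only a language-neutral illustration in Python. The `type`/`id` convention, the `normalize` helper, and the sample to-do data are all assumptions made for the sketch.

```python
from typing import Any

# Hypothetical convention for this sketch: every entity is a dict with a
# "type" and an "id"; the pair (type, id) plays the role of a reference.
Ident = tuple[str, Any]


def is_entity(value: Any) -> bool:
    return isinstance(value, dict) and "type" in value and "id" in value


def normalize(value: Any, tables: dict[str, dict[Any, dict]]) -> Any:
    """Store nested entities in `tables` and return a reference in their place."""
    if isinstance(value, list):
        return [normalize(item, tables) for item in value]
    if is_entity(value):
        flat = {key: normalize(child, tables) for key, child in value.items()}
        # Merge into any existing row so repeated entities collapse into one.
        tables.setdefault(value["type"], {}).setdefault(value["id"], {}).update(flat)
        return (value["type"], value["id"])
    return value


tree = {
    "type": "list", "id": 1, "title": "Chores",
    "items": [
        {"type": "item", "id": 10, "label": "Buy milk", "done": False},
        {"type": "item", "id": 11, "label": "Walk dog", "done": False},
    ],
}
tables: dict[str, dict[Any, dict]] = {}
root_ref = normalize(tree, tables)

# One write to the item's single row is visible to every view that refers to it.
tables["item"][10]["done"] = True
```

Because each entity lives in exactly one row, an update does not have to be repeated for every place in the UI where that entity is shown.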

      Architecture Overview

      “...it's a full stack web framework so it has the front end and back end part… front end is Fulcro proper… the back end is Fulcro’s library Pathom…”

      • Describes the division between the Fulcro client and the Pathom-based server.

      “On the front end… we have client DB… we have a transactional subsystem… to the back end we have Pathom… as kind of adapter between the tree of data the UI wants and whatever data sources there are.”

      • Clarifies how Fulcro’s client and server components communicate via EQL queries and mutations.

      UI Rendering Process

      “...UI is a tree of components and for each component we have a query… these queries are composed up so that the root component’s query is the query for the whole page.”

      • Outlines how each component declares its data needs, culminating in a single root query.

      “...Fulcro takes this query, combines it with the client DB, and forms a tree of data that matches the query shape, then hands it off to the root to render.”

      • Demonstrates the round-trip from query to final rendered UI.
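
A rough sketch of the "query plus client DB gives a tree of data" step follows. It is written in Python rather than ClojureScript and does not use Fulcro's real functions. The query shape (a list of attribute names, with joins as one-entry dicts), the `db_to_tree` name, and the sample DB are assumptions for illustration only.

```python
from typing import Any

# Hypothetical client DB: tables keyed by entity type, then id.
# References to other entities are (type, id) tuples.
DB = {
    "list": {1: {"title": "Chores", "items": [("item", 10), ("item", 11)]}},
    "item": {
        10: {"label": "Buy milk", "done": True},
        11: {"label": "Walk dog", "done": False},
    },
}

# A query is a list of attribute names; a join is a one-entry dict
# {attribute: sub_query}. This mimics the shape of EQL, not its syntax.
ROOT_QUERY = ["title", {"items": ["label", "done"]}]


def is_ref(value: Any) -> bool:
    return isinstance(value, tuple) and len(value) == 2


def db_to_tree(query: list, ref: tuple, db: dict) -> dict:
    """Follow `query` from the entity at `ref`, returning a tree shaped like the query."""
    kind, ident = ref
    entity = db.get(kind, {}).get(ident, {})
    result: dict = {}
    for element in query:
        if isinstance(element, dict):
            for attribute, sub_query in element.items():
                target = entity.get(attribute)
                if isinstance(target, list):
                    result[attribute] = [db_to_tree(sub_query, r, db) for r in target]
                elif is_ref(target):
                    result[attribute] = db_to_tree(sub_query, target, db)
        elif element in entity:
            # Attributes that are missing from the DB are left out of the result.
            result[element] = entity[element]
    return result


props = db_to_tree(ROOT_QUERY, ("list", 1), DB)
```

The resulting `props` has the same nesting as `ROOT_QUERY`, which is what would be handed to the root component to render.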

      Component Example

      “Here we can see how a Fulcro component looks in code… The most important part here is the query…”

      • Provides a code snippet showing query co-location with the component.

      “...the component also includes the queries of its child components so the parent can pass down just the needed data.”

      • Reinforces that data flows naturally down the component tree.
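
The talk's snippet is not reproduced in these notes, so here is a stand-in sketch of the same idea in Python. It is not Fulcro code: `TodoItem`, `TodoList`, and the list-based query shape are hypothetical names chosen to show a query living next to a pure render function, with the parent including the child's query.

```python
# Hypothetical mini-components: each one pairs a query (what data it needs)
# with a pure render function (how that data becomes output).

class TodoItem:
    query = ["label", "done"]

    @staticmethod
    def render(props: dict) -> str:
        mark = "x" if props["done"] else " "
        return f"[{mark}] {props['label']}"


class TodoList:
    # The parent embeds the child's query as a join on "items",
    # so the root query describes the data for the whole subtree.
    query = ["title", {"items": TodoItem.query}]

    @staticmethod
    def render(props: dict) -> str:
        lines = [props["title"]]
        # The parent hands each child only the slice of data the child asked for.
        lines += [TodoItem.render(item) for item in props["items"]]
        return "\n".join(lines)


props = {
    "title": "Chores",
    "items": [
        {"label": "Buy milk", "done": True},
        {"label": "Walk dog", "done": False},
    ],
}
output = TodoList.render(props)
```

Rendering the same `props` twice gives the same `output`, which is the "UI is a pure function of state" property in miniature.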

      Learning Fulcro

      “People have this assumption or believe that Fulcro is hard to learn but it's not…”

      • Dispels the notion of steep difficulty.

      “There are simpler frameworks that do just one thing… but you need to handle a number of tasks and that you need to work across both front end and back end…”

      • Explains why novices might find full-stack solutions initially overwhelming.

      “You need to rewire your brain… if you come in expecting that things just work the way you expect you will be running into walls…”

      • Advises a mindset shift for those accustomed to different paradigms.

      Recommended Beginner Resources

      “...the Fulcro Developer's Guide… it describes everything in great detail but it can be overwhelming…”

      • Mentions the official documentation’s comprehensive nature.

      “...start with the do it yourself Fulcro Workshop… play with the concepts in practice and see how they work...”

      • Suggests hands-on learning as the best first step.

      “...there's this minimalist Fulcro tutorial… tries to teach you the absolute minimum amount of things you need to know…”

      • Recommends a focused tutorial that avoids overload.

      Simplicity Through Principles

      “Fulcro doesn't do any magic… its operation is straightforward and very much possible to understand…”

      • Emphasizes that Fulcro’s complexity is principled, not opaque.

      “...UI is pure function of data, standard input of data is the graph API, standard output of side effects is the transaction subsystem, and data is data, meaning queries and mutations are just data.”

      • Summarizes how Fulcro simplifies data handling, state management, and side effects uniformly.

      Demo Highlights

      “So let's have a demo… a simple Fulcro application showing todo list…”

      • Introduces a working demonstration of a to-do list in Fulcro.

      “...every side effect goes through transaction subsystem so I should see data here and I do, I see that they are loading them…”

      • Illustrates how Fulcro logs and displays all transactions for debugging.

      “I can also see the response… the data mirrors the query… if I ask for something that doesn't exist I get back empty data…”

      • Demonstrates the transparency of EQL-based queries and responses.

      Conclusion and Key Takeaways

      “Takeaways… that full stack frameworks are really useful and especially that Fulcro is really worth looking into and learning is not hard if you are a little smart about it…”

      • Concludes that Fulcro offers an approachable path to building maintainable full-stack ClojureScript applications.

      “Here are some awesome resources especially the Fulcro Community guide where you find the workshop and tutorial…”

      • Reiterates the availability of community-driven materials to support new learners.
    1. Summary of the Tech Talk on Software Development Leverage

      Speaker's Background & Context

      • The speaker has experience with nine startups, with four successes (defined as acquired or still operational).

        "I've been involved in nine startups, four successes so far, success defined as either bought by somebody else or still exists."

      • Core interests include minimal degradation over time, maximum architectural clarity, and minimal boilerplate.

        "I want to build systems that have a minimal amount of that maximum architecture clarity... I want a small number of Core Concepts and I also want minimal boilerplate."

      • Prefers Clojure and ClojureScript due to Lisp features, a REPL, macros, full-stack capabilities, and immutable data.

        "The main things are that it's a Lisp, I've got a REPL, I've got macros, I've got full stack language immutable data and literals."

      Concept of Software Development Leverage

      • Defines leverage in software as maximizing efficiency while minimizing incidental complexity.

        "What’s the minimal amount of code I can write to build these things?"

      • Software generally consists of forms and reports, and optimizing these elements reduces complexity.

        "A lot of what we write are forms or reports essentially."

      • Critiques past attempts at UI and form abstraction (e.g., Informix 4GL, Visual Basic, Rails, Java Enterprise) as insufficient or overly complex.

        "Every kind of library on the planet trying to do the same sort of thing."

      • Identifies challenges in leverage: short levers, fragile systems, opposing mindsets, and complex structures.

        "You can have too short of a lever, the object that we're trying to move could be too big for the lever, or my strength... I could have a crowd of people who are just philosophically opposed to levers."

      Key Approaches to Leverage

      • Minimal Incidental Complexity: Reducing unnecessary complexity that accumulates over time.

        "We love minimal incidental complexity... other communities don’t even think about that."

      • Functional & Immutable Data Models: Advocates for a pure functional approach to state management and UI rendering.

        "The state of the world is some immutable thing, initialized somehow, then I walk from step to step running some pure function."

      • Generalized Pure Functions: Aiming for functional purity while acknowledging that some dynamism is needed.

        "To me, you're starting by breaking the ideal. You're saying, 'I’m not really going to use pure functions for that.'"

      • Component-Based Rendering: Prefers data-driven UI, minimizing reliance on React’s event-based state management.

        "A pure function, a render of some sort of transform of the world."

      Core Abstractions for Software Leverage

      1. Entity-Attribute-Value (EAV) Model: A flexible, normalized data structure for representing application state.

        "The first one is just the power of entity attribute value."

      2. Idents (Universal Entity Identifiers): Unique tuples ([type id]) for referencing entities.

        "The kind allows you to prevent collisions... useful semantic information."

      3. Graph Queries: Uses EDN-like queries to efficiently pull and update data.

        "Attach logic to graph queries that say when you get the result of this query, here's how you normalize it."

      4. Full-Stack Datified Mutations: CQRS-like abstractions over side effects and state transitions.

        "CQRS kind of idea... I’m going to make an abstract thing that says what I want to do."
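
To make "mutations are just data" concrete, here is a small Python sketch. It is not Fulcro's transaction system. The `mutation` decorator, the `transact` function, the `todo/toggle` name, and the outbox list are hypothetical, chosen only to show one data description being applied locally and also queued for a remote.

```python
from typing import Callable

# Hypothetical registry mapping a mutation name to a pure local handler.
HANDLERS: dict[str, Callable[[dict, dict], dict]] = {}


def mutation(name: str):
    def register(handler):
        HANDLERS[name] = handler
        return handler
    return register


@mutation("todo/toggle")
def toggle(state: dict, params: dict) -> dict:
    items = state["item"]
    item = items[params["id"]]
    # Build new dicts rather than mutating the old state in place.
    return {**state, "item": {**items, params["id"]: {**item, "done": not item["done"]}}}


def transact(state: dict, tx: list, remote_queue: list) -> dict:
    """Apply each mutation locally and queue the same data for the server."""
    for name, params in tx:
        state = HANDLERS[name](state, params)
        remote_queue.append((name, params))
    return state


before = {"item": {10: {"label": "Buy milk", "done": False}}}
outbox: list = []
after = transact(before, [("todo/toggle", {"id": 10})], outbox)
```

The caller only says what it wants done, as a `(name, params)` pair. How that changes local state and what gets sent to the server are decided elsewhere, which is the CQRS-like separation the speaker describes.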

      Emergent Benefits of This Approach

      • Normalized State Representation: Enables automatic merging of data, reducing complexity in state updates.

        "This gives me on my world, my immutable World in that diagram of kind of our idealized application."

      • Minimizing UI Boilerplate: Using annotated queries and data-driven components reduces manual UI code.

        "A UI location-specific way to annotate my UI... initial state is just a mirror of that."

      • Abstracting Side Effects: Remote calls and transactions become well-structured, reducing ad-hoc state management.

        "Transact things... processing system talks to remotes for side effects, talks to the database for local changes, and triggers renders."

      State Machines for Process Control

      • Advocates state machines for handling application logic, avoiding scattered imperative code.

        "Very often, process is just peppered around everywhere... having a state machine that abstracts over this is powerful."

      • Uses state charts (Harel state machines) for complex workflows like authentication.

        "State charts are way better when your state machine gets large."
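
As a minimal illustration of keeping process logic in one place, here is a flat state machine for a login flow in Python. The states, events, and function names are invented for this sketch. It is a plain transition table, not a Harel statechart (no nested or parallel states) and not the speaker's library.

```python
# Hypothetical login flow written as a transition table:
# (current state, event) -> next state.
TRANSITIONS = {
    ("logged_out", "submit"): "checking",
    ("checking", "success"): "logged_in",
    ("checking", "failure"): "logged_out",
    ("logged_in", "logout"): "logged_out",
}


def step(state: str, event: str) -> str:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], start: str = "logged_out") -> str:
    state = start
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["submit", "success"])` walks `logged_out` to `checking` to `logged_in`, and every allowed transition can be read off the table instead of being scattered through event handlers.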

      Fulcro & RAD (Rapid Application Development)

      • Fulcro: A ClojureScript-based framework built on these principles.

        "How do I simplify F? How do I get these core pieces generic enough to reuse?"

      • RAD: Built to automate UI and backend generation, minimizing redundant work.

        "I really wanted to minimize the boilerplate right... tired of handwriting schema."

      • Plugins for Databases, Forms, Reports, and APIs: Reduces custom implementation for common application patterns.

        "Datomic support gives me my network API and integration with Datomic in 1900 lines of code."

      Key Takeaways

      • Graph-based, normalized application state leads to better leverage and scalability.
      • Functional purity where possible, and controlled side effects when necessary.
      • Automatic UI and backend generation through metadata and introspection.
      • Composable, small-core abstractions allow flexibility without unnecessary complexity.

      "A very small number of Core Concepts... it's pluggable, you can escape from everything... it's just an annotated data model."

      This approach significantly reduces the long-term maintenance cost of applications by emphasizing reusability, composition, and functional principles.

    1. Reviewer #2 (Public review):

      This work presents a 27-region DMR model for early diagnosis and prognostic prediction of colorectal cancer using plasma methylation markers. While this non-invasive diagnostic and prognostic tool could interest a broad readership, several critical issues require attention.

      Major Concerns:

      (1) Inconsistencies and clarity issues in data presentation

      a) Sample size discrepancies

      - The abstract mentions screening 119 CRC tissue samples, while Figure 1 shows 136 tissues. Please clarify if this represents 119 CRC and 17 normal samples.
      - The plasma sample numbers vary across sections: the abstract cites 161 samples, Figure 1 shows 116 samples, and the Supplementary Methods mentions 77 samples (13 Normal, 15 NAA, 12 AA, 37 CRC).

      b) Methodological inconsistencies

      - The Supplementary Material reports 477 hypermethylated sites from TCGA data analysis (Δβ>0.20, FDR<0.05), but Figure 1 indicates 499 sites.
      - The manuscript states that analyzing TCGA data across six cancer types identified 499 CRC-specific methylation sites, yet Figure 1 shows 477. Please also explain the rationale for selecting these specific cancer types from TCGA.
      - The main text mentions "404 CRC-specific DMRs", while Figure 1 shows "404 MCBs". The authors need to clarify whether these terms are interchangeable and how MCBs are defined.

      (2) Methodological documentation

      - The Results section requires a more detailed description of marker identification procedures and justification of methodological choices.
      - Figure 3 panels need reordering for sequential citation.

      (3) Quality control and data transparency

      - No quality control metrics are presented for the in-house sequencing data (e.g., sequencing quality, alignment rate, BS conversion rate, coverage, PCA plots for each cohort).
      - The analysis code should be publicly available through GitHub or Zenodo.
      - At a minimum, processed data should be made publicly accessible to ensure reproducibility.

    1. Reviewer #1 (Public review):

      Summary:

      Flowers et al. describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy is not clear!).

    1. The Code Noir intended to restrict enslaved people’s everyday movements and activities, while the hierarchy of commandeurs and plantation managers readily used torture as punishment aimed to prevent rebellious behaviors.

      similar to the US

    1. blancs (unimportant whites or mulattoes) are coming." The crew blew again and sat down as the sun was rising. At 11:00 o'clock we landed at Ansa-a-galets.

      Seems as though US recolonization reinstated or re-emphasized the Code Noir.

    1. For the purposes of this Code, “technology-assisted social work services” include any social work services that involve the use of computers, mobile or landline telephones, tablets, video technology, or other electronic or digital technologies; this includes the use of various electronic or digital platforms, such as the Internet, online social media, chat rooms, text messaging, e-mail and emerging digital applications. Technology-assisted social work services encompass all aspects of social work practice, including psychotherapy; individual, family, or group counseling; community organization; administration; advocacy; mediation; education; supervision; research; evaluation; and other social work services.

      Adopting technology-assisted social work practices can help people in multiple ways. Some people who need help do not live within driving distance, or daytime appointments may not work for them, so having the option of video technology could be a helpful resource. I would adopt only select technology-assisted practices because confidentiality issues could be a barrier. I would only use Zoom or give clients a work phone number to call if need be. Anything more, such as social media platforms, can be tricky to navigate without conflicts of interest. Now that I am practicing as a social worker, I would use social media to promote the facility where I work and show people that there is help out there. No personal information would be exposed in this process.

    2. The NASW Code of Ethics reflects the commitment of all social workers to uphold the profession’s values and to act ethically. Principles and standards must be applied by individuals of good character who discern moral questions and, in good faith, seek to make reliable ethical judgments.

      I believe the NASW Code of Ethics makes it challenging to maintain one set of rules for all social workers to follow. While the NASW Code of Ethics relies on "individuals of good character" to make ethical judgments, it also states it has "formal procedures to adjudicate ethics complaints against its members." The Code of Ethics could do more to spell out what constitutes poor ethical judgment serious enough to warrant complaint procedures, define gray areas of ethical judgment, and address who or what agencies have the power to punish poor ethical choices by social workers. This lack of clear definition leaves room for misinterpretation, makes mistakes easy to commit, and can jeopardize a social worker's job or licensure. This gap also allows agencies to punish social workers who've made mistakes differently, and perhaps be harsher on social workers with baccalaureate degrees than on those with master's degrees, or even differ based on sex, race, age, religious belief, et cetera.

    3. The Code is designed to help social workers identify relevant considerations when professional obligations conflict or ethical uncertainties arise.

      This section of the NASW Code of Ethics reminds me of an event at my job where an In-Community therapist (IIC) violated boundaries with a client / youth. We define youths as any of our clients under 21 years old, whom we work to monitor and adjust services per their mental and behavioral health needs. In this event, the IIC sent her husband to the shelter where the youth was staying to bring food to the client. While this was a nice gesture, it violated the youth's HIPAA rights and crossed many personal boundaries. In this instance, I think about the professional obligations that conflicted with the IIC's personal, ethical dilemmas. Receiving the call that the youth was hungry must have raised many ethical uncertainties for this therapist/IIC.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, which leads to the experimentally observed phenomenon of feature competition. The authors also examine how various (hyper)parameters, such as adaptation timescale, the excitatory-to-inhibitory cell ratio, regularization strength, and background current, affect the model. These findings add biological realism to a specific implementation of efficient coding. They show that efficient coding explains, or at least is consistent with, multiple experimentally observed properties of excitatory and inhibitory neurons.

      As discussed in the first round of reviews, the model's ability to replicate biological observations such as the 4:1 ratio of excitatory vs. inhibitory neurons hinges on somewhat arbitrary hyperparameter choices. Although this may limit the model's explanatory power, the authors have made significant efforts to explore how these parameters influence their model. It is an empirical question whether the uncovered relationships between, e.g., metabolic cost and the fraction of excitatory neurons are biologically relevant.

      The revised manuscript is also more transparent about the model's limitations, such as the lack of excitatory-excitatory connectivity. Further improvements could come from explicitly acknowledging additional discrepancies with biological data, such as the widely reported weak stimulus tuning of inhibitory neurons in the primary sensory cortex of untrained animals.

      We thank the Reviewer for their insightful characterization of our paper and for further suggestions on how to improve it. We have now further improved the transparency about the model’s limitations and have explicitly acknowledged the discrepancy with biological data about connection probability and about the selectivity of inhibitory neurons (pages 4 and 15).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength.

      Strengths: 

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models. In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important. Lastly, though several of the observations have been reported and studied before, this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses: 

      This work is the latest among a line of research papers studying the properties of efficient spiking networks. Many of the characteristics and findings here have been discussed before, thereby limiting the new insights that this work can provide. Thus, the conclusions of this work should be considered and understood in the context of those previous works, as the authors state. Furthermore, the number of assumptions and free parameters in the model, though necessary to bring the model closer to biophysical reality, makes it more difficult to understand and to draw clear conclusions from. As the authors state, many of the optimality claims depend on these free parameters, such as the dimensionality of the input signal (M=3), the relative weighting of encoding error and metabolic cost, and several others. This raises the possibility that it is not the case that the set of biophysical properties measured in the brain is accounted for by efficient coding, but rather that theories of efficient coding are flexible enough to be consistent with this regime. With this in mind, some of the conclusions made in the text may be overstated and should be considered in this light.

      Conclusions, Impact, and additional context: 

      Notions of optimality are important for normative theories, but they are often studied in simple models with as few free parameters as possible. Biophysically detailed and mechanistic models, on the other hand, will often have many free parameters by their very nature, thereby muddying the connection to optimality. This tradeoff is an important concern in neuroscientific models. Previous efficient spiking models have often been criticized for their lack of biophysically-plausible characteristics, such as large synaptic weights, dense connectivity, and instantaneous communication. This work is an important contribution in showing that such networks can be modified to be much closer to biophysical reality without losing their essential properties. Though the model presented does suffer from complexity issues which raise questions about its connections to "optimal" efficient coding, the extensive study of various parameter dependencies offers a good characterization of the model and puts its conclusions in context.

      We thank the Reviewer for their thorough and accurate assessment of our paper.  

      Reviewer #3 (Public review): 

      Summary: 

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work. 

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs. 

      They then investigate in depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and show the networks can operate in a biologically realistic regime.

      Strengths: 

      * The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field

      * They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly

      * They put sensible constraints on their networks, while still maintaining the good properties these networks should have

      Weaknesses: 

      * One of the core goals of the paper is to make a more biophysically realistic network than previous work using similar optimization principles. One of the important things they consider is a split into E and I neurons. While this works fine, and they consider the coding consequences of this, it is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. This would be out of scope for the current paper however.

      * The theoretical advances in the paper are not all novel by themselves, as most of them (in particular the split into E and I neurons and the use of biophysical constants) had been achieved in previous models. However, the authors discuss these links thoroughly and do more in-depth follow-up experiments with the resulting model. 

      Assessment and context: 

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporate aspects of energy efficiency. For computational neuroscientists this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers the model provides a clearer link of efficient coding spiking networks to known experimental constraints and provides a few predictions.

      We thank the Reviewer for a positive assessment and for pointing out the merits of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my previous concerns, and I agree that the manuscript has improved. However, I believe they could still do more to acknowledge two notable mismatches between the model and experimental data.

      (1) Stimulus selectivity of excitatory and inhibitory neurons 

      In the model, excitatory and inhibitory neurons exhibit similar stimulus selectivity, which appears inconsistent with most experimental findings. The authors argue that whether inhibitory neurons are less selective remains an open question, citing three studies in support. However, only one of these studies (Ranyan) was conducted in primary sensory cortex and it is, to my knowledge, one of the few papers showing this (indeed, it's often cited as an exception). The other two studies (Kuan and Najafi) recorded from the parietal cortex of mice trained on decision making tasks, and therefore seem less relevant to the model.

      In contrast to the cited studies, the overwhelming majority of the work has found that inhibitory neurons in sensory cortex, in particular those expressing Parvalbumin, are less stimulus selective than excitatory cells. And this is indeed the prevailing view, as summarized by the review from Hu et al. (Science, 2014): "PV+ interneurons exhibit broader orientation tuning and weaker contrast specificity than pyramidal neurons." This view emerged from numerous classical studies, including Sohya et al. (J. Neurosci., 2007), Cardin (J. Neurosci., 2007), Nowak (Cereb. Cortex, 2008), Niell et al. (J. Neurosci., 2008), Liu (J. Neurosci., 2009), Kerlin (Neuron, 2010), Ma et al. (J. Neurosci., 2010), Hofer et al. (Nature Neurosci. 2011), and Atallah et al. (Neuron 2012). Weak inhibitory tuning has been confirmed by recent studies, such as Sanghavi & Kar (biorxiv 2023), Znamenskiy et al. (Neuron 2024), and Hong et al. (Nature, 2024).

      The authors should acknowledge this consensus and cite the conflicting evidence. Failing to do so is cherry picking from the literature. Since training can increase the stimulus selectivity of PV+ neurons to that of Pyr levels, also in primary visual cortex (Khan et al. Neuron 2018), a favourable interpretation of the model is that it represents a highly optimized, if not overtrained, state.

      We have carefully considered the literature cited by the Reviewer. We agree with the interpretation that the stimulus selectivity of inhibitory neurons in our model is higher than that of Parvalbumin-positive inhibitory neurons in the primary sensory cortex of naïve animals. We have edited the text in the Discussion (page 14).

      (2) Connection probability 

      The manuscript claims that "rectification sets the overall connection probability to 0.5, consistent with experimental results (Pala & Petersen; Campagnola et al.)." However, the cited studies, and others, report significantly lower probabilities, except for Pyr-PV (E-I connections in the model). For example, Campagnola et al. measured PV-Pyr connectivity at 34% in L2/3 and 20% in L5.

      It's perfectly acceptable that the model cannot replicate every detail of biological circuits. But it's important to be cautious when claiming consistency with experimental data.

      Here as well, we agree with the Reviewer that the connection probability of 0.5 is consistent with the reported connectivity of Pyr-PV neurons, but less so with the reported connectivity of PV-Pyr neurons. We have now made our claim about the compatibility of the connection probability in our model with empirical observations more precise (page 4).

      Reviewer #2 (Recommendations for the authors): 

      I commend the authors for an extremely thorough and detailed rebuttal, and for all of the additional work put in to address the reviewer concerns. For the most part, I am satisfied with the current state of the manuscript. 

      We thank the Reviewer for recognizing our effort to address the first round of reviews to the best of our ability.

      Here are some small points still remaining that I think the authors should address: 

      (1) Pg. 8, "We verified the robustness of the model to small deviations from the optimal synaptic weights" - while the authors now cite Calaim et al. 2022 in the discussion, its relevance to several of the results justify its inclusion in other places. Here is one place where the authors test something that was also studied in this previous paper.

      The Reviewer is correct that Calaim et al. (eLife 2022) addressed the robustness of synaptic weights, and we now cite this study when describing our results on jittering of synaptic connections (page 8).

      (2) Pg. 9, "In our optimal E-I network we indeed found that optimal coding efficiency is achieved in absence of within-neuron feedback or with weak adaptation in both cell types" Pg. 10, "the absence of within-neuron feedback or the presence of weak and short-lasting spike-triggered adaptation in both E and I neurons are optimally efficient solutions" The authors seem to state that both weak adaptation and no adaptation at all are optimal. In contrast to the rest of the results presented, this is very vague and does not give a particular level of adaptation as being optimal. The authors should make this more clear. 

      We agree that the text about the optimal level of adaptation was unclear. The optimal solution is no adaptation, while weak and short-lasting adaptation defines a slightly suboptimal, yet still efficient, network state, as now stated on page 10.

      (3) Pg. 13, "In summary our analysis suggests that optimal coding efficiency is achieved with four times more E neurons than I neurons and with mean I-I synaptic efficacy about 3 times stronger..." --- claims such as these are still too strong, in my opinion. It is rather the case that the particular ratio of E to I neurons and connections strengths can be made consistent with an optimally efficient regime.

      We agree here as well. We have revised the text (page 13) to better explain our results.

      (4) Pg. 14, "firing rates in the 1CT model were highly sensitive to variations in the metabolic constant" (Fig. 8I, as compared to Fig. 6C). This difference between the 1CT and E-I networks is striking, and I would suspect it is due to some idiosyncrasies in the difference between the two models (e.g., the relative amount of delay that it takes for lateral inhibition to take effect, or the fact that E-E connections have not been removed in this model). The authors should ideally back up this result with some justified explanation. 

      We agree with the Reviewer that the delay for lateral inhibition in the E-I model is twice that of the 1CT model and that the E-I model gains stability from the lack of E-E connectivity. Furthermore, the tuning is stronger in I compared to E neurons in the E-I model, which contributes to making the E-I network inhibition-dominated (Fig. 1H). In contrast, the average excitation and inhibition in the 1CT model are of exactly the same magnitude. The property of being inhibition-dominated makes the E-I model more stable. We report these observations in the revised text (pages 14-15).

      Reviewer #3 (Recommendations for the authors): 

      Overall my points were very well responded to and I removed most of my weaknesses.

      I appreciate the authors implementing my suggested analysis change for Figure 8, and I find the result very clear. I would further suggest they add a bit of text for the reader as to why this is done. For a new reader without much knowledge of these networks at first it seems the inhibitory population is very good at representation in fig 8G: so why is it not further considered in fig 8H?

      We thank the Reviewer for providing further suggestions. We have now clarified in the text why only the excitatory population of the E-I model is considered in the comparison between the E-I model and the one-cell-type (1CT) model (page 14).

      Thanks for sharing the code. From a quick browse through it looks very manageable to implement for follow up work, although some more guidance for how to navigate the quite complicated codebase and how to reproduce specific paper results would be helpful.

      We have also updated the code repository, which now includes more complete instructions on how to reproduce the results of each figure. We renamed the folders containing the code so that each points to a specific figure in the paper. The repository now also contains the output of the numerical simulations we ran, which allows all figures to be replotted immediately. We have deposited the repository at Zenodo so that the final version of the code is associated with the DOI https://doi.org/10.5281/zenodo.14628524. This is mentioned in the section Code availability (page 17).

    1. 2.10 Operating-System Debugging

      This section explores operating-system debugging, covering failure analysis, performance monitoring, and advanced tracing tools. Debugging involves identifying and fixing errors in software and hardware, with performance tuning aiming to eliminate processing bottlenecks. When a process fails, operating systems log errors and may generate core dumps for analysis, while kernel failures result in crash dumps. Debugging kernel issues is complex due to hardware control and limited debugging tools. Performance monitoring relies on counters and tracing methods. Linux provides tools like ps, top, vmstat, and /proc for tracking resource usage, while Windows uses Task Manager. Tracing tools, such as strace, gdb, and tcpdump, capture event-based data for in-depth analysis. The BCC toolkit, built on eBPF, enables secure and low-impact debugging of live systems by tracing interactions between user and kernel code. BCC tools, such as disksnoop for disk I/O and opensnoop for system calls, provide real-time insights into system performance and security without disrupting critical applications.
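
      As a rough illustration of counter-based monitoring through /proc, the sketch below parses text in the format of /proc/meminfo and estimates the fraction of memory in use. The function names and the sample text are illustrative assumptions, not part of the textbook, and the MemAvailable field is assumed to be present (it is reported by reasonably recent Linux kernels).

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_fraction(fields: dict[str, int]) -> float:
    """Fraction of RAM in use, derived from MemTotal and MemAvailable."""
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


# Hypothetical sample so the sketch also runs on systems without /proc.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    4000000 kB
"""

if __name__ == "__main__":
    proc = Path("/proc/meminfo")
    text = proc.read_text() if proc.exists() else SAMPLE
    print(f"memory in use: {memory_used_fraction(parse_meminfo(text)):.1%}")
```

      Tools such as vmstat and top read the same kind of kernel-maintained counters; a tracing tool, by contrast, would record individual events rather than sample totals.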

    2. 2.9 Building and Booting an Operating System

      This section outlines the process of building and booting an operating system, emphasizing flexibility for various hardware configurations. It begins by discussing Operating-System Generation, explaining that while most computers come with preinstalled OSs, users can build their own through a series of steps: writing or obtaining source code, configuring, compiling, installing, and booting. System configuration can be highly tailored, modular, or dynamic, affecting system size and adaptability. A case study on Linux details building an OS from source, including downloading the kernel, configuring it, compiling modules, and installing it. An alternative method involves running Linux as a virtual machine using software like VirtualBox or VMware. The System Boot process follows OS generation, starting with a bootstrap program (boot loader) that loads the kernel into memory. Traditional BIOS-based booting has largely been replaced by UEFI, offering faster and more efficient startup. Boot loaders like GRUB allow dynamic kernel parameter selection. Linux and Android use different boot mechanisms, with Android maintaining initramfs as its root filesystem. Most OSs include recovery modes for troubleshooting and repairs.

    3. 2.7 Operating-System Design and Implementation In this section, we discuss problems we face in designing and implementing an operating system. There are, of course, no complete solutions to such problems, but there are approaches that have proved successful. 2.7.1 Design Goals The first problem in designing a system is to define goals and specifications. At the highest level, the design of the system will be affected by the choice of hardware and the type of system: traditional desktop/laptop, mobile, distributed, or real time. Beyond this highest design level, the requirements may be much harder to specify. The requirements can, however, be divided into two basic groups: user goals and system goals. Users want certain obvious properties in a system. The system should be convenient to use, easy to learn and to use, reliable, safe, and fast. Of course, these specifications are not particularly useful in the system design, since there is no general agreement on how to achieve them. A similar set of requirements can be defined by the developers who must design, create, maintain, and operate the system. The system should be easy to design, implement, and maintain; and it should be flexible, reliable, error free, and efficient. Again, these requirements are vague and may be interpreted in various ways. There is, in short, no unique solution to the problem of defining the requirements for an operating system. The wide range of systems in existence shows that different requirements can result in a large variety of solutions for different environments. For example, the requirements for Wind River VxWorks, a real-time operating system for embedded systems, must have been substantially different from those for Windows Server, a large multiaccess operating system designed for enterprise applications. Specifying and designing an operating system is a highly creative task. 
Although no textbook can tell you how to do it, general principles have been developed in the field of software engineering, and we turn now to a discussion of some of these principles. 2.7.2 Mechanisms and Policies One important principle is the separation of policy from mechanism. Mechanisms determine how to do something; policies determine what will be done. For example, the timer construct (see Section 1.4.3) is a mechanism for ensuring CPU protection, but deciding how long the timer is to be set for a particular user is a policy decision. The separation of policy and mechanism is important for flexibility. Policies are likely to change across places or over time. In the worst case, each change in policy would require a change in the underlying mechanism. A general mechanism flexible enough to work across a range of policies is preferable. A change in policy would then require redefinition of only certain parameters of the system. For instance, consider a mechanism for giving priority to certain types of programs over others. If the mechanism is properly separated from policy, it can be used either to support a policy decision that I/O-intensive programs should have priority over CPU-intensive ones or to support the opposite policy. Microkernel-based operating systems (discussed in Section 2.8.3) take the separation of mechanism and policy to one extreme by implementing a basic set of primitive building blocks. These blocks are almost policy free, allowing more advanced mechanisms and policies to be added via user-created kernel modules or user programs themselves. In contrast, consider Windows, an enormously popular commercial operating system available for over three decades. Microsoft has closely encoded both mechanism and policy into the system to enforce a global look and feel across all devices that run the Windows operating system. All applications have similar interfaces, because the interface itself is built into the kernel and system libraries. 
Apple has adopted a similar strategy with its macOS and iOS operating systems. We can make a similar comparison between commercial and open-source operating systems. For instance, contrast Windows, discussed above, with Linux, an open-source operating system that runs on a wide range of computing devices and has been available for over 25 years. The “standard” Linux kernel has a specific CPU scheduling algorithm (covered in Section 5.7.1), which is a mechanism that supports a certain policy. However, anyone is free to modify or replace the scheduler to support a different policy. Policy decisions are important for all resource allocation. Whenever it is necessary to decide whether or not to allocate a resource, a policy decision must be made. Whenever the question is how rather than what, it is a mechanism that must be determined. 2.7.3 Implementation Once an operating system is designed, it must be implemented. Because operating systems are collections of many programs, written by many people over a long period of time, it is difficult to make general statements about how they are implemented. Early operating systems were written in assembly language. Now, most are written in higher-level languages such as C or C++, with small amounts of the system written in assembly language. In fact, more than one higher-level language is often used. The lowest levels of the kernel might be written in assembly language and C. Higher-level routines might be written in C and C++, and system libraries might be written in C++ or even higher-level languages. Android provides a nice example: its kernel is written mostly in C with some assembly language. Most Android system libraries are written in C or C++, and its application frameworks—which provide the developer interface to the system—are written mostly in Java. We cover Android's architecture in more detail in Section 2.8.5.2. 
The advantages of using a higher-level language, or at least a systems-implementation language, for implementing operating systems are the same as those gained when the language is used for application programs: the code can be written faster, is more compact, and is easier to understand and debug. In addition, improvements in compiler technology will improve the generated code for the entire operating system by simple recompilation. Finally, an operating system is far easier to port to other hardware if it is written in a higher-level language. This is particularly important for operating systems that are intended to run on several different hardware systems, such as small embedded devices, Intel x86 systems, and ARM chips running on phones and tablets. The only possible disadvantages of implementing an operating system in a higher-level language are reduced speed and increased storage requirements. This, however, is not a major issue in today's systems. Although an expert assembly-language programmer can produce efficient small routines, for large programs a modern compiler can perform complex analysis and apply sophisticated optimizations that produce excellent code. Modern processors have deep pipelining and multiple functional units that can handle the details of complex dependencies much more easily than can the human mind. As is true in other systems, major performance improvements in operating systems are more likely to be the result of better data structures and algorithms than of excellent assembly-language code. In addition, although operating systems are large, only a small amount of the code is critical to high performance; the interrupt handlers, I/O manager, memory manager, and CPU scheduler are probably the most critical routines. After the system is written and is working correctly, bottlenecks can be identified and can be refactored to operate more efficiently.

      Operating system design and implementation involve defining clear goals and balancing user and system requirements. User goals focus on convenience, reliability, and speed, while system goals emphasize ease of design, flexibility, and efficiency. A key principle is separating mechanisms (how to do something) from policies (what to do), enabling flexibility and adaptability. For example, microkernel systems use minimal, policy-free mechanisms, allowing customization, while systems like Windows integrate both for consistency. Modern operating systems are typically written in higher-level languages like C or C++, with some assembly for critical parts, improving portability, maintainability, and performance. Compiler optimizations and efficient algorithms often outweigh the benefits of assembly language, making higher-level languages preferable for most OS development.
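
      The separation of mechanism from policy can be sketched in a few lines. In this minimal example, the queueing function is the mechanism and knows nothing about task types, while each policy is a small function that assigns priorities. The names Task, run_order, favor_io, and favor_cpu are hypothetical and chosen only for illustration; a real scheduler is far more involved.

```python
import heapq
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    name: str
    io_bound: bool


# A policy maps a task to a priority; lower numbers run first.
Policy = Callable[[Task], int]


def run_order(tasks: list[Task], policy: Policy) -> list[str]:
    """Mechanism: a priority queue that is unaware of what the priorities mean."""
    # The index breaks ties so that equal-priority tasks keep submission order.
    heap = [(policy(task), index, task.name) for index, task in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def favor_io(task: Task) -> int:
    """Policy: I/O-intensive programs go first."""
    return 0 if task.io_bound else 1


def favor_cpu(task: Task) -> int:
    """Policy: the opposite choice, CPU-intensive programs go first."""
    return 1 if task.io_bound else 0


tasks = [
    Task("compile", io_bound=False),
    Task("backup", io_bound=True),
    Task("render", io_bound=False),
]

if __name__ == "__main__":
    print(run_order(tasks, favor_io))   # ['backup', 'compile', 'render']
    print(run_order(tasks, favor_cpu))  # ['compile', 'render', 'backup']
```

      Switching from one policy to the other changes only the function passed in; run_order itself is untouched, which is the flexibility the passage describes.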

    4. 2.6 Why Applications Are Operating-System Specific Fundamentally, applications compiled on one operating system are not executable on other operating systems. If they were, the world would be a better place, and our choice of what operating system to use would depend on utility and features rather than which applications were available. Based on our earlier discussion, we can now see part of the problem—each operating system provides a unique set of system calls. System calls are part of the set of services provided by operating systems for use by applications. Even if system calls were somehow uniform, other barriers would make it difficult for us to execute application programs on different operating systems. But if you have used multiple operating systems, you may have used some of the same applications on them. How is that possible? An application can be made available to run on multiple operating systems in one of three ways: 1. The application can be written in an interpreted language (such as Python or Ruby) that has an interpreter available for multiple operating systems. The interpreter reads each line of the source program, executes equivalent instructions on the native instruction set, and calls native operating system calls. Performance suffers relative to that for native applications, and the interpreter provides only a subset of each operating system's features, possibly limiting the feature sets of the associated applications. 2. The application can be written in a language that includes a virtual machine containing the running application. The virtual machine is part of the language's full RTE. One example of this method is Java. Java has an RTE that includes a loader, byte-code verifier, and other components that load the Java application into the Java virtual machine. This RTE has been ported, or developed, for many operating systems, from mainframes to smartphones, and in theory any Java app can run within the RTE wherever it is available. 
Systems of this kind have disadvantages similar to those of interpreters, discussed above. 3. The application developer can use a standard language or API in which the compiler generates binaries in a machine- and operating-system-specific language. The application must be ported to each operating system on which it will run. This porting can be quite time consuming and must be done for each new version of the application, with subsequent testing and debugging. Perhaps the best-known example is the POSIX API and its set of standards for maintaining source-code compatibility between different variants of UNIX-like operating systems. In theory, these three approaches seemingly provide simple solutions for developing applications that can run across different operating systems. However, the general lack of application mobility has several causes, all of which still make developing cross-platform applications a challenging task. At the application level, the libraries provided with the operating system contain APIs to provide features like GUI interfaces, and an application designed to call one set of APIs (say, those available from IOS on the Apple iPhone) will not work on an operating system that does not provide those APIs (such as Android). Other challenges exist at lower levels in the system, including the following. Each operating system has a binary format for applications that dictates the layout of the header, instructions, and variables. Those components need to be at certain locations in specified structures within an executable file so the operating system can open the file and load the application for proper execution. CPUs have varying instruction sets, and only applications containing the appropriate instructions can execute correctly. Operating systems provide system calls that allow applications to request various activities, such as creating files and opening network connections. 
Those system calls vary among operating systems in many respects, including the specific operands and operand ordering used, how an application invokes the system calls, their numbering and number, their meanings, and their return of results. There are some approaches that have helped address, though not completely solve, these architectural differences. For example, Linux—and almost every UNIX system—has adopted the ELF format for binary executable files. Although ELF provides a common standard across Linux and UNIX systems, the ELF format is not tied to any specific computer architecture, so it does not guarantee that an executable file will run across different hardware platforms. APIs, as mentioned above, specify certain functions at the application level. At the architecture level, an application binary interface (ABI) is used to define how different components of binary code can interface for a given operating system on a given architecture. An ABI specifies low-level details, including address width, methods of passing parameters to system calls, the organization of the run-time stack, the binary format of system libraries, and the size of data types, just to name a few. Typically, an ABI is specified for a given architecture (for example, there is an ABI for the ARMv8 processor). Thus, an ABI is the architecture-level equivalent of an API. If a binary executable file has been compiled and linked according to a particular ABI, it should be able to run on different systems that support that ABI. However, because a particular ABI is defined for a certain operating system running on a given architecture, ABIs do little to provide cross-platform compatibility. In sum, all of these differences mean that unless an interpreter, RTE, or binary executable file is written for and compiled on a specific operating system on a specific CPU type (such as Intel x86 or ARMv8), the application will fail to run. 
Imagine the amount of work that is required for a program such as the Firefox browser to run on Windows, macOS, various Linux releases, iOS, and Android, sometimes on various CPU architectures.

      Applications are often operating-system specific due to differences in system calls, binary formats, and CPU instruction sets. System calls, which enable applications to interact with the OS, vary across platforms, making cross-platform execution challenging. Three approaches enable multi-OS compatibility: 1) Interpreted languages (e.g., Python) use interpreters to execute code on different OSes, though performance and feature sets may be limited. 2) Virtual machines (e.g., Java) run applications within a portable runtime environment, but with similar limitations. 3) Porting applications to each OS using standard APIs (e.g., POSIX) is time-consuming. Binary formats (e.g., ELF, PE) and ABIs further complicate cross-platform compatibility, as they are tied to specific architectures and OSes. These factors make developing cross-platform applications, like Firefox, a complex task requiring significant adaptation for each OS and CPU architecture.
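
      To make the point about binary formats concrete, the sketch below reads the first 20 bytes of an ELF header and reports the file type and target machine. Two files can both be valid ELF and still target different CPUs, which is why a shared format alone does not give portability. The helper names and the synthetic header are assumptions for illustration, and only a handful of machine codes are listed.

```python
import struct

ELF_MAGIC = b"\x7fELF"
FILE_TYPES = {1: "relocatable", 2: "executable", 3: "shared object / PIE", 4: "core"}
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}


def describe_elf(header: bytes) -> dict[str, str]:
    """Decode the class, file type, and machine fields of an ELF header."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type and e_machine are 16-bit fields right after the 16-byte ident block.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": "64-bit" if header[4] == 2 else "32-bit",
        "type": FILE_TYPES.get(e_type, f"unknown ({e_type})"),
        "machine": MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }


def fake_header(e_type: int, e_machine: int) -> bytes:
    """Build a synthetic little-endian, 64-bit header so the demo needs no file."""
    ident = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
    return ident + struct.pack("<HH", e_type, e_machine)


if __name__ == "__main__":
    print(describe_elf(fake_header(2, 62)))
    print(describe_elf(fake_header(2, 183)))
```

      On a Linux system, passing the first 20 bytes of a real binary (for example, read from a file opened in "rb" mode) to describe_elf should give information comparable to what the file or readelf commands report, though those tools decode far more of the header.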

    5. 2.5 Linkers and Loaders Usually, a program resides on disk as a binary executable file—for example, a.out or prog.exe. To run on a CPU, the program must be brought into memory and placed in the context of a process. In this section, we describe the steps in this procedure, from compiling a program to placing it in memory, where it becomes eligible to run on an available CPU core. The steps are highlighted in Figure 2.11. Figure 2.11 The role of the linker and loader. Source files are compiled into object files that are designed to be loaded into any physical memory location, a format known as an relocatable object file. Next, the linker combines these relocatable object files into a single binary executable file. During the linking phase, other object files or libraries may be included as well, such as the standard C or math library (specified with the flag -lm). A loader is used to load the binary executable file into memory, where it is eligible to run on a CPU core. An activity associated with linking and loading is relocation, which assigns final addresses to the program parts and adjusts code and data in the program to match those addresses so that, for example, the code can call library functions and access its variables as it executes. In Figure 2.11, we see that to run the loader, all that is necessary is to enter the name of the executable file on the command line. When a program name is entered on the command line on UNIX systems—for example, ./main—the shell first creates a new process to run the program using the fork() system call. The shell then invokes the loader with the exec() system call, passing exec() the name of the executable file. The loader then loads the specified program into memory using the address space of the newly created process. (When a GUI interface is used, double-clicking on the icon associated with the executable file invokes the loader using a similar mechanism.) 
The process described thus far assumes that all libraries are linked into the executable file and loaded into memory. In reality, most systems allow a program to dynamically link libraries as the program is loaded. Windows, for instance, supports dynamically linked libraries (DLLs). The benefit of this approach is that it avoids linking and loading libraries that may end up not being used into an executable file. Instead, the library is conditionally linked and is loaded if it is required during program run time. For example, in Figure 2.11, the math library is not linked into the executable file main. Rather, the linker inserts relocation information that allows it to be dynamically linked and loaded as the program is loaded. We shall see in Chapter 9 that it is possible for multiple processes to share dynamically linked libraries, resulting in a significant savings in memory use. Object files and executable files typically have standard formats that include the compiled machine code and a symbol table containing metadata about functions and variables that are referenced in the program. For UNIX and Linux systems, this standard format is known as ELF (for Executable and Linkable Format). There are separate ELF formats for relocatable and executable files. One piece of information in the ELF file for executable files is the program's entry point, which contains the address of the first instruction to be executed when the program runs. Windows systems use the Portable Executable (PE) format, and macOS uses the Mach-O format. ELF FORMAT Linux provides various commands to identify and evaluate ELF files. For example, the file command determines a file type. If main.o is an object file, and main is an executable file, the command file main.o will report that main.o is an ELF relocatable file, while the command file main will report that main is an ELF executable. ELF files are divided into a number of sections and can be evaluated using the readelf command.

      Linkers and loaders play a crucial role in transforming a program from a disk-based binary executable (e.g., a.out or prog.exe) into a memory-resident process ready for CPU execution. Source files are compiled into relocatable object files, which the linker combines into a single executable, incorporating libraries like the standard C library. The loader then loads this executable into memory, adjusting addresses through relocation to enable proper function calls and variable access. On UNIX systems, the shell uses fork() and exec() system calls to create a process and invoke the loader. Dynamic linking, as seen with Windows DLLs, allows libraries to be linked and loaded only when needed, saving memory. Executable files follow standard formats like ELF (Linux), PE (Windows), or Mach-O (macOS), containing machine code, symbol tables, and entry points. Tools like the file and readelf commands help analyze ELF files, distinguishing between relocatable and executable formats.
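
      The fork() and exec() sequence described above can be sketched with Python's thin wrappers around those system calls. This is a simplified, POSIX-only illustration of what a shell does, not how any particular shell is implemented; run_program is a hypothetical helper name, and os.waitstatus_to_exitcode assumes Python 3.9 or later.

```python
import os
import sys


def run_program(argv: list[str]) -> int:
    """Run argv roughly the way a simple UNIX shell would; return its exit status."""
    pid = os.fork()  # create a new process as a copy of this one
    if pid == 0:
        # Child: ask the kernel to load the named executable into this
        # process's address space, replacing the current program image.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # conventional status when the program cannot be run
    # Parent: wait for the child to finish and decode its wait status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # Use the current Python interpreter as a program that is known to exist.
    code = run_program([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with status", code)
```

      Note that execvp returns only on failure: on success the child's code is gone, replaced by the newly loaded program, which is why the error handling sits directly after the call.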

    6. Virtualization is a technology that allows us to abstract the hardware of a single computer (the CPU, memory, disk drives, network interface cards, and so forth) into several different execution environments, thereby creating the illusion that each separate environment is running on its own private computer. These environments can be viewed as different individual operating systems (for example, Windows and UNIX) that may be running at the same time and may interact with each other. A user of a virtual machine can switch among the various operating systems in the same way a user can switch among the various processes running concurrently in a single operating system. Virtualization allows operating systems to run as applications within other operating systems. At first blush, there seems to be little reason for such functionality. But the virtualization industry is vast and growing, which is a testament to its utility and importance. Broadly speaking, virtualization software is one member of a class that also includes emulation. Emulation, which involves simulating computer hardware in software, is typically used when the source CPU type is different from the target CPU type. For example, when Apple switched from the IBM Power CPU to the Intel x86 CPU for its desktop and laptop computers, it included an emulation facility called “Rosetta,” which allowed applications compiled for the IBM CPU to run on the Intel CPU. That same concept can be extended to allow an entire operating system written for one platform to run on another. Emulation comes at a heavy price, however. Every machine-level instruction that runs natively on the source system must be translated to the equivalent function on the target system, frequently resulting in several target instructions. If the source and target CPUs have similar performance levels, the emulated code may run much more slowly than the native code. 
With virtualization, in contrast, an operating system that is natively compiled for a particular CPU architecture runs within another operating system also native to that CPU. Virtualization first came about on IBM mainframes as a method for multiple users to run tasks concurrently. Running multiple virtual machines allowed (and still allows) many users to run tasks on a system designed for a single user. Later, in response to problems with running multiple Microsoft Windows applications on the Intel x86 CPU, VMware created a new virtualization technology in the form of an application that ran on Windows. That application ran one or more guest copies of Windows or other native x86 operating systems, each running its own applications. (See Figure 1.16.) Windows was the host operating system, and the VMware application was the virtual machine manager (VMM). The VMM runs the guest operating systems, manages their resource use, and protects each guest from the others.

      Overview of Cloud Computing: Building on virtualization, cloud computing provides on-demand online access to computing, storage, and software, and users pay according to the resources they consume. Deployment models include public, private, and hybrid clouds; service models include SaaS, PaaS, and IaaS. A single environment can combine several of these service types.

      Embedded Systems: Embedded systems are special-purpose computing devices with limited user interfaces, built for specific tasks. To meet strict timing constraints, they frequently run real-time operating systems, which must process sensor data and respond to it quickly. They are used in many fields, including industrial automation, medical devices, and automotive control.

      Free and Open-Source Operating Systems: GNU/Linux is one example of an open-source operating system whose source code is freely available to modify and redistribute. The free software movement promotes user freedoms,

    1. If one gets rid of these habits one can think more clearly, and to think clearly is a necessary first step toward political regeneration: so that the fight against bad English is not frivolous and is not the exclusive concern of professional writers.

      While I've not read the whole passage yet, I really do wonder what exactly is being talked about here. Does writing in lowercase in casual communication really destroy much of anything? Depending on context, I speak and write in many different ways, some of which involve a lot of swear words, slang, "poor" punctuation, and unnecessary abbreviations. However, from what I can see, none of these different code switches seems to detract from any others. They're merely forms of expression. Again, what is being called "bad habit" here? I hope, out of everything, that it's the habit of leaving things unexamined...

    1. You can find the corrected, final version of the code here.

      On the menu page, at menu__partDish, we understood why you redefined the line-height, but you should have explained the size!

    1. So they substitute stand-in data, or proxies. They draw statistical correlations between a person’s zip code or language patterns and her potential to pay back a loan or handle a job. These correlations are discriminatory, and some of them are illegal.

      The transparency, quality, and level of bias of the data going in affect the quality and bias of the data coming out. Loose correlations will yield skewed data and predictions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new approach to modeling human behavioral responses using image-computable models. They create a model (VAM) that is a combination of a standard CNN coupled with a standard evidence accumulation model (EAM). The combined model is then trained directly on image-level data using human behavioral responses. This approach is original and can have wide applicability. However, many of the specific findings reported are less compelling.

      Strengths:

      (1) The manuscript presents an original approach to fitting an image-computable model to human behavioral data. This type of approach is sorely needed in the field.

      (2) The analyses are very technically sophisticated.

      (3) The behavioral data are large both in terms of sample size (N=75) and in terms of trials per subject.

      Weaknesses:

      Major

      (1) The manuscript appears to suggest that it is the first to combine CNNs with evidence accumulation models (EAMs). However, this was done in a 2022 preprint (https://www.biorxiv.org/content/10.1101/2022.08.23.505015v1) that introduced a network called RTNet. This preprint is cited here, but never really discussed. Further, the two unique features of the current approach discussed in lines 55-60 are both present to some extent in RTNet. Given the strong conceptual similarity in approach, it seems that a detailed discussion of similarities and differences (of which there are many) should feature in the Introduction.

      Thanks for pointing this out—we agree that the novel contributions of our model (the VAM) with respect to prior related models (including RTNet) should be clarified, and have revised the Introduction accordingly. We include the following clarifications in the Introduction:

      “The key feature of the VAM that distinguishes it from prior models is that the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework. Thus, both the visual representations learned by the CNN and the EAM parameters are directly constrained by behavioral data. In contrast, prior models first optimize the CNN to perform the behavioral task, then separately fit a minimal set of high-level CNN parameters [RTNet, Rafiei et al., 2024] and/or the EAM parameters to behavioral data [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. As we will show, fitting the CNN with human data—rather than optimizing the model to perform a task—has significant consequences for the representations learned by the model.”

      E.g. in the case of RTNet, the variability of the Bayesian CNN weight distribution, the decision threshold, and the magnitude of the noise added to the images are adjusted to match the average human accuracy (separately for each task condition). RTNet is an interesting and useful model that we believe has complementary strengths to our own work.

      Since there are several other existing models in addition to the VAM and RTNet that use CNNs to generate RTs or RT proxies (by our count, at least six that we cite earlier in the Introduction), we felt it was inappropriate to preferentially include a detailed comparison of the VAM and RTNet beyond the passage quoted above.

      (2) In the approach here, a given stimulus is always processed in the same way through the core CNN to produce activations v_k. These v_k's are then corrupted by Gaussian noise to produce drift rates d_k, which can differ from trial to trial even for the same stimulus. In other words, the assumption built into VAM appears to be that the drift rate variability stems entirely from post-sensory (decisional) noise. In contrast, the typical interpretation of EAMs is that the variability in drift rates is sensory. This is also the assumption built into RTNet where the core CNN produces noisy evidence. Can the authors comment on the plausibility of VAM's assumption that the noise is post-sensory?

      In our view, the VAM is compatible with a model in which the drift rate variability for a given stimulus is due to sensory noise, since we do not specify the origin of the Gaussian noise added to the drift rates. As the reviewer notes, the CNN component of the VAM processes a given stimulus deterministically, yielding the mean drift rates. This does not preclude us from imagining an additional (unmodeled) sensory process that adds variability to the drift rates. The VAM simply represents this and other hypothetical sources of variability as additive Gaussian noise. We agree however that it is worthwhile to think about the origin of the drift rate variability, though it is not a focus of our work.
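The point that a deterministic mean drift rate plus additive Gaussian noise yields trial-to-trial variability for the same stimulus can be sketched with a generic linear ballistic accumulator trial. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the parameter values, and the uniform start-point distribution are hypothetical choices following a standard LBA formulation.

```python
import random

def simulate_lba_trial(mean_drifts, rng, noise_sd=1.0, threshold=2.0,
                       max_start=1.0, non_decision_time=0.2):
    """Simulate one trial of a generic LBA (illustrative parameter values).

    mean_drifts plays the role of the deterministic CNN outputs v_k.
    Returns (choice_index, rt), or None if no accumulator reaches threshold.
    """
    finish_times = []
    for v_k in mean_drifts:
        d_k = rng.gauss(v_k, noise_sd)       # trial-specific drift rate
        start = rng.uniform(0.0, max_start)  # random starting evidence
        # An accumulator with non-positive drift never reaches the threshold.
        if d_k > 0:
            finish_times.append((threshold - start) / d_k)
        else:
            finish_times.append(float("inf"))
    fastest = min(finish_times)
    if fastest == float("inf"):
        return None
    return finish_times.index(fastest), non_decision_time + fastest
```

Calling the function repeatedly with the same `mean_drifts` (that is, the same stimulus) produces different choices and RTs, and the sketch is agnostic about whether the noise is read as sensory or decisional.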

      (3) Figure 2 plots how well VAM explains different behavioral features. It would be very useful if the authors could also fit simple EAMs to the data to clarify which of these features are explainable by EAMs only and which are not.

      In our view, fitting simple EAMs to the data would not be especially informative and poses a number of challenges for the particular task we study (LIM) that are neatly avoided by using the VAM. In particular, as we show in Figure 2, the stimuli vary along several dimensions that all appear to influence behavior: horizontal position, vertical position, layout, target direction, and flanker direction. Since the VAM is stimulus-computable, fitting the VAM automatically discovers how all of these stimulus features influence behavior (via their effect on the drift rates outputted by the CNN). In contrast, fitting a simple EAM (e.g. the LBA model) necessitates choosing a particular parameterization that specifies the relationship between all of the stimulus features and the EAM model parameters. This raises a number of practical questions. For example, should we attempt to fit a separate EAM for each stimulus feature, or model all stimulus features simultaneously?

      Moreover, while we could in principle navigate these issues and fit simple EAMs to the data, we do not intend to claim that simple EAMs fail to explain the relationship between stimulus features and behavior as well as the VAM. Rather, the key strength of the VAM relative to simple EAMs is that it includes a detailed and biologically plausible model of human vision. The majority of the paper capitalizes on this strength by showing how behavioral effects of interest (namely congruency effects) can be explained in terms of the VAM’s visual representations.

      (4) VAM is tested in two different ways behaviorally. First, it is tested to what extent it captures individual differences (Figure 2B-E). Second, it is tested to what extent it captures average subject data (Figure 2F-J). It wasn't clear to me why for some metrics only individual differences are examined and for other metrics only average human data is examined. I think that it will be much more informative if separate figures examine average human data and individual difference data. I think that it's especially important to clarify whether VAM can capture individual differences for the quantities plotted in Figures 2F-J.

      We would like to clarify that Fig. 2J in fact already shows how well the VAM captures individual differences for the average subject data shown in Fig. 2H (stimulus layout) and Fig. 2I (stimulus position). For a given participant and stimulus feature, we calculated the Pearson's r between model/participant mean RTs across each stimulus feature value. Fig. 2J shows the distribution of these Pearson’s r values across all participants for stimulus layout and horizontal/vertical position.
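The per-participant comparison described here can be sketched as follows: average RTs within each value of a stimulus feature, separately for the participant and the fitted model, then correlate the two sets of means. The function names and the `(feature_value, rt)` trial format are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

def mean_rt_by_value(trials):
    """Average RT for each stimulus feature value, from (value, rt) pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for value, rt in trials:
        sums[value][0] += rt
        sums[value][1] += 1
    return {value: total / count for value, (total, count) in sums.items()}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def model_participant_r(participant_trials, model_trials):
    """Correlate model and participant mean RTs across shared feature values."""
    p_means = mean_rt_by_value(participant_trials)
    m_means = mean_rt_by_value(model_trials)
    values = sorted(set(p_means) & set(m_means))
    return pearson_r([p_means[v] for v in values],
                     [m_means[v] for v in values])
```

Repeating `model_participant_r` for every participant and feature gives one r value per participant, and the distribution of those values is the kind of summary the response attributes to Fig. 2J.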

      Fig. 2G also already shows how well the VAM captures individual differences in behavior. Specifically, this panel shows individual differences in mean RT attributable to differences in age. For Fig. 2F, which shows how the model drift rates differ on congruent vs. incongruent trials, there is no sensible way to compare the models to the participants at any level of analysis (since the participants do not have drift rates). 

      (5) The authors look inside VAM and perform many exploratory analyses. I found many of these difficult to follow since there was little guidance about why each analysis was conducted. This also made it difficult to assess the likelihood that any given result is robust and replicable. More importantly, it was unclear which results are hypothesized to depend on the VAM architecture and training, and which results would be expected in performance-optimized CNNs. The authors train and examine performance-optimized CNNs later, but it would be useful to compare those results to the VAM results immediately when each VAM result is first introduced.

      Thanks for pointing this out—we apologize for any confusion caused by our presentation of the CNN analyses. We have added in additional motivating statements, methodological clarifications, and relevant references to our Results, particularly for Figure 3 in which we first introduce the analyses of the CNN representations/activity. In general, each analysis is prefaced by a guiding question or specific rationale, e.g. “How do the models' visual representations enable target selectivity for stimuli that vary along several irrelevant dimensions?” We also provide numerous references in which these analysis techniques have been used to address similar questions in CNNs or the primate visual cortex.

      We chose to maintain the current organization of our results in which the comparison between the VAM and the task-optimized models are presented in a separate figure. We felt that including analyses of both the VAM and task-optimized models in the initial analyses of the CNN representations would be overwhelming for many readers. As the reviewer acknowledges, some readers may already find these results challenging to follow. 

      (6) The authors don't examine how the task-optimized models would produce RTs. They say in lines 371-2 that they "could not examine the RT congruency effect since the task-optimized models do not generate RTs." CNNs alone don't generate RTs, but RTs can easily be generated from them using the same EAM add-on that is part of VAM. Given that the CNNs are already trained, I can't see a reason why the authors can't train EAMs on top of the already trained CNNs and generate RTs, so these can provide a better comparison to VAM.

      We appreciate this suggestion, but we judge the suggestion to “train EAMs on top of the already trained CNNs and generate RTs” to be a significant expansion of the scope of the paper with multiple possible roads forward. In particular, one must specify how the outputs of the task-optimized CNN (logits for each possible response) relate to drift rates, and there is no widely-accepted or standard way to do this. Previously proposed methods include transforming representation distances in the last layer to drift rates (https://doi.org/10.1037/xlm0000968), fitting additional subject-specific parameters that map the logits to drift rates (https://doi.org/10.1007/s42113-019-00042-1), or using the softmax-scored model outputs as drift rates directly (https://doi.org/10.1038/s41562-024-01914-8), though in the latter case the RTs are not on the same scale as human data. In our view, evaluating these different methods is beyond the scope of this paper. An advantage of the VAM is that one does not have to fit two separate models (a CNN and an EAM) to generate RTs.

      Nonetheless, we agree that it would be informative to examine something like RTs in the task-optimized models. Our revised Results section now includes an analysis of the confidence of the task-optimized models’ decisions, which we use as a proxy for RTs:

      “Since the task-optimized models do not generate RTs, it is not possible to directly measure RT congruency effects in these models without making additional assumptions about how the CNN's classification decisions relate to RTs. However, as a coarse proxy for RT, we can examine the confidence of the CNN's decisions, defined as the softmax-scored logit (probability) of the most probable direction in the final CNN layer. This choice of RT proxy is motivated by some prior studies that have combined CNNs with EAMs [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. These studies explicitly or implicitly derive a measure of decision confidence from the activity of the last CNN layer. The confidence measure is then mapped to the EAM drift rates, such that greater decision confidence generally corresponds to higher drift rates (and therefore shorter RTs).

      We calculated the average confidence of each task-optimized CNN separately for congruent vs. incongruent trials. On average, the task-optimized models showed higher confidence on congruent vs. incongruent trials (W = 21.0, p < 1e-3, Wilcoxon signed-rank test; Cohen's d = 0.99; n = 75 models). These analyses therefore provide some evidence that task-optimized CNNs have the capacity to exhibit congruency effects, though an explicit comparison of the magnitude of these effects with human data requires additional modeling assumptions (e.g., fitting a separate EAM).”
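The confidence proxy quoted above (the softmax probability of the most probable response) and a paired effect size can be sketched in a few lines. The function names are hypothetical, and the paired form of Cohen's d used here (mean difference divided by the standard deviation of the differences) is only one common convention; the response does not state which variant was used.

```python
from math import exp, sqrt

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits):
    """Probability assigned to the most probable response."""
    return max(softmax(logits))

def paired_cohens_d(a, b):
    """Paired-samples effect size: mean(a - b) / sd(a - b), assumed convention."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff / sd_diff
```

With one mean confidence value per model for congruent trials and one for incongruent trials, `paired_cohens_d` summarizes the size of the difference. The significance test itself would typically come from a library routine such as SciPy's `scipy.stats.wilcoxon`, which is not reproduced here.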

      (7) The Discussion felt very long and mostly a summary of the Results. I also couldn't shake the feeling that it had many just-so stories related to the variety of findings reported. I think that the section should be condensed and the authors should be clearer about which explanations are speculations and which are air-tight arguments based on the data.

      We have shortened the Discussion modestly and added language to distinguish the more speculative arguments from those directly supported by our data.

      Specifically, we added in the phrase “we speculate that…” for two suggestions in the Discussion (paragraphs 3 and 5), and we ensured that any other more speculative suggestions contain such clarifying language. We have also added in subheadings in the Discussion to help readers navigate this section. 

      (8) In one of the control analyses, the authors train different VAMs on each RT quantile. I don't understand how it can be claimed that this approach can serve as a model of an individual's sensory processing. Which of the 5 sets of weights (5 VAMs) captures a given subject's visual processing? Are the authors saying that the visual system of a given subject changes based on the expected RT for a stimulus? I feel like I'm missing something about how the authors think about these results.

      We agree that these particular analyses may cause confusion and have removed them from our revised manuscript.

      Reviewer #2 (Public Review):

      In an image-computable model of speeded decision-making, the authors introduce and fit a combined CNN-EAM (a 'VAM') to flanker-task-like data. They show that the VAM can fit mean RTs and accuracies as well as the congruency effect that is present in the data, and subsequently analyze the VAM in terms of where in the network congruency effects arise.

      Overall, combining DNNs and EAMs appears to be a promising avenue to seriously model the visual system in decision-making tasks compared to the current practice in EAMs. Some variants have been proposed or used before (e.g., doi.org/10.1016/j.neuroimage.2017.12.078 , doi.org/10.1007/s42113-019-00042-1), but always in the context of using task-trained models, rather than models trained on behavioral data. However, I was surprised to read that the authors developed their model in the context of a conflict task, rather than a simpler perceptual decision-making task. Conflict effects in human behavior are particularly complex, and thereby, the authors set a high goal for themselves in terms of the to-be-explained human behavior. Unfortunately, the proposed VAM does not appear to provide a great account of conflict effects that are considered fundamental features of human behavior, like the shape of response time distributions, and specifically, delta plots (doi.org/10.1037/0096-1523.20.4.731). The authors argue that it is beyond the scope of the presented paper to analyze delta plots, but as these are central to studies of human conflict behavior, models that aim to explain conflict behavior will need to be able to fit and explain delta plots.

      Theories on conflict often suggest that negative/positive-trending delta plots arise through the relative timing of response activation related to relevant and irrelevant information.

      Accumulation for relevant and irrelevant information would, as a result, either start at different points in time or the rates vary over time. The current VAM, as a feedforward neural network model, does not appear to be able to capture such effects, and perhaps fundamentally not so: accumulation for each choice option is forced to start at the same time, and rates are a static output of the CNN.

      The proposed solution of fitting five separate VAMs (one for each of five RT quantiles) is not satisfactory: it does not explain how delta plots result from the model, for the same reason that fitting five evidence accumulation models (one per RT quantile) does not explain how response time distributions arise. If, for example, one would want to make a prediction about someone's response time and choice based on a given stimulus, one would first have to decide which of the five VAMs to use, which is circular. But more importantly, this way of fitting multiple models does not explain the latent mechanism that underlies the shape of the delta plots.

      As such, the extensive analyses on the VAM layers and the resulting conclusions that conflict effects arise due to changing representations across layers (e.g., "the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations"), while inspiring, remain hard to weigh, as they are contingent on the assumption that the VAM can capture human behavior in the conflict task, which it struggles with. That said, the promise of combining CNNs and EAMs is clearly there. A way forward could be to either adjust the proposed model so that it can explain delta plots, which would potentially require temporal dynamics and time-varying evidence accumulation rates, or perhaps to start simpler and combine CNN-EAMs that are able to fit more standard perceptual decision-making tasks without conflict effects.

      We thank the reviewer for their thoughtful comments on our work. However, we note that the VAM does in fact capture the positive-trending RT delta plot observed in the participant data (Fig. S4A), though the intercepts for models/participants differ somewhat. On the other hand, the conditional accuracy functions (Fig. S4B) reveal a more pronounced difference between model and participant behavior. As the reviewer points out, capturing these effects is likely to require a model that can produce time-varying drift rates, whereas our model produces a fixed drift rate for a given stimulus. We also agree that fitting a separate VAM to each RT quantile is not a satisfactory means of addressing this limitation and have removed these analyses from our revised manuscript.
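For readers unfamiliar with RT delta plots, the following minimal sketch shows one common way to compute them: sort the RTs in each condition, split them into equal-sized bins, and plot the incongruent-minus-congruent difference of the bin means against their average. The function names and the choice of five bins are assumptions for illustration; the response does not specify the exact binning procedure behind Fig. S4A.

```python
def bin_means(rts, n_bins=5):
    """Mean RT within each of n_bins equal-sized bins of the sorted RTs."""
    ordered = sorted(rts)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least one RT per bin")
    means = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        means.append(sum(chunk) / len(chunk))
    return means

def delta_plot(congruent_rts, incongruent_rts, n_bins=5):
    """Return (mean RT, congruency effect) points, one per RT bin."""
    cong = bin_means(congruent_rts, n_bins)
    incong = bin_means(incongruent_rts, n_bins)
    return [((c + i) / 2, i - c) for c, i in zip(cong, incong)]
```

A positive-trending delta plot is one in which the second element of each point grows from the fastest to the slowest bin, meaning the congruency effect is larger for slower responses.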

      However, while we agree that accurately capturing these dynamic effects is a laudable goal, it is in our view also worthwhile to consider explanations for the mean behavioral effect (i.e. the accuracy congruency effect), which can occur independently of any consideration of dynamics. One of our main findings is that across-model variability in accuracy congruency effects is better attributed to variation in representation geometry (target/flanker subspace alignment) vs. variation in the degree of flanker suppression. This finding does not require any consideration of dynamics to be valid at the level of explanation we pursue (across-user variability in congruency effects), but also does not preclude additional dynamic processes that could give rise to more specific error patterns. Our revised discussion now includes a section where we summarize and elaborate on these ideas:

      “It is not difficult to imagine how the orthogonalization mechanism described above, which explains variability in accuracy congruency effects across individuals, could act in concert with other dynamic processes that explain variability in congruency effects within individuals (e.g., as a function of RT). In general, any process that dynamically gates the influence of irrelevant sensory information on behavioral outputs could accomplish this, for example ramping inhibition of incorrect response activation [https://doi.org/10.3389/fnhum.2010.00222], a shrinking attention spotlight [https://doi.org/10.1016/j.cogpsych.2011.08.001], or dynamics in neural population-level geometry [https://doi.org/10.1038/nn.3643]. To pursue these ideas, future work may aim to incorporate dynamics into the visual component and decision component of the VAM with recurrent CNNs [https://doi.org/10.48550/arXiv.1807.00053, https://doi.org/10.48550/arXiv.2306.11582] and the task-DyVA model [https://doi.org/10.1038/s41562-022-01510-8], respectively.”

      Reviewer #3 (Public Review):

      Summary:

      In this article, the authors combine a well-established choice-response time (RT) model (the Linear Ballistic Accumulator) with a CNN model of visual processing to model image-based decisions (referred to as the Visual Accumulator Model - VAM). While this is not the first effort to combine these modeling frameworks, it uses this combination of approaches uniquely.

      Specifically, the authors attempt to better understand the structure of human information representations by fitting this model to behavioral (choice-RT) data from a classic flanker task. This objective is made possible by using a very large (by psychological modeling standards) industry data set to jointly fit both components of this VAM model to individual-level data. Using this approach, they illustrate (among other results) (1) how the interaction between target and flanker representations influence the presence and strength of congruency effects, (2) how the structure of representations changes (distributed versus more localized) with depth in the CNN model component, and (3) how different model training paradigms change the nature of information representations. This work contributes to the ML literature by demonstrating the value of training models with richer behavioral data. It also contributes to cognitive science by demonstrating how ML approaches can be integrated into cognitive modeling. Finally, it contributes to the literature on conflict modeling by illustrating how information representations may lead to some of the classic effects observed in this area of research.

      Strengths:

      (1) The data set used for this analysis is unique and is made publicly available as part of this article. Specifically, they have access to data for 75 participants with >25,000 trials per participant. This scale of data/individual is unusual and is the foundation on which this research rests.

      (2) This is the first time, to my knowledge, that a model combining a CNN with a choice-RT model has been jointly fit to choice-RT data at the level of individual people. This type of model combination has been used before but in a more restricted context. This joint fitting, and in particular, learning a CNN through the choice-RT modeling framework, allows the authors to probe the structure of human information representations learned directly from behavioral data.

      (3) The analysis approaches used in this article are state-of-the-art. The training of these models is straightforward given the data available. The interesting part of this article (opinion of course) is the way in which they probe what CNN has learned once trained. I find their analysis of how distractor and target information interfere with each other particularly compelling as well as their demonstration that training on behavioral data changes the structure of information representations when compared to training models on standard task-optimized data.

      Weaknesses:

      (1) Just as the data in this article is a major strength, it is also a weakness. This type of modeling would be difficult, if not impossible to do with standard laboratory data. I don't know what the data floor would be, but collecting tens of thousands of decisions for a single person is impractical in most contexts. Thus this type of work may live in the realm of industry. I do want to re-iterate that the data for this study was made publicly available though!

      We suspect (but have not systematically tested) that the VAMs can be fitted with substantially less data. We use data augmentation techniques (various randomized image transformations) during training to improve the generalization capabilities of the VAMs, and these methods are likely to be particularly important when training on smaller datasets. One could consider increasing the amount of image data augmentation when working with smaller datasets, or pursuing other forms of data augmentation like resampling from estimated RT distributions (see https://doi.org/10.1038/s41562-022-01510-8 for an example of this). In general, we don’t think that prospective users of our approach should be discouraged if they have only a few hundred trials per subject (or less) - it’s worth trying!

      (2) While this article uses choice-RT data it doesn't fully leverage the richness of the RT data itself. As the authors point out, this modeling framework, the LBA component in particular, does not account for some of the more nuanced but well-established RT effects in this data. This is not a big concern given the already nice contributions of this article and it leads to an opportunity for ongoing investigation.

      We agree that fully capturing the more nuanced behavioral effects you mention (e.g. RT delta plots and conditional accuracy functions) is a worthwhile goal for future research; see our response to Reviewer #2 for a more detailed discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The phrase in the Abstract "convolutional neural network models of visual processing and traditional EAMs are jointly fitted" made me initially believe that the two models were fitted independently. You may want to re-word to clarify.

      We think that the phrase “jointly fitted” already makes it clear that both the CNN and EAM parameters are estimated simultaneously, in agreement with how this term is usually used. But we have nonetheless appended some additional clarifying language to that sentence (“in a unified Bayesian framework”).

      (2) Lines 27-28: EAMs "are the most successful and widely-used computational models of decision-making." This is only true for the specific type of decision-making examined here, namely joint modeling of choice and response times. Signal detection theory is arguably more widely-used when response times are not modeled.

      Thanks for pointing this out - we have revised the referenced sentence accordingly.

      (3) Could the authors clarify what is plotted in Figure 2F?

      Fig. 2F shows the drift rates for the target, flanker, and “other” (non-target/non-flanker) accumulators averaged over trials and models for congruent vs. incongruent trials. In case this was a source of confusion, we do not show the value of the flanker drift rates on congruent trials because the flanker and target accumulators are identical (i.e. the flanker/congruent drift rates are equivalent to the target/congruent drift rates).

      (4) Lines 214-7: "The observation that single-unit information for target direction decreased between the fourth and final convolutional layers while population-level decoding remained high is especially noteworthy in that it implies a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code." Can the authors clarify why this is the only reasonable explanation for these results? It seems like many other explanations could be construed.

      We have added additional clarification to this section and now use more tentative language:

      “The observation that single-unit information for target direction decreased between the fourth and final convolutional layers indicates that the units become progressively less selective for particular target directions. Since population-level decoding remained high in these layers, this suggests a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code.”

      (5) Lines 372-376: "Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans." While I agree with the general sentiment, I feel that its application here is strange. Unless I'm missing something, in the context of the preceding sentence, the authors seem to be saying that researchers in the field expect that CNNs can produce a behavioral phenomenon (RTs) that is completely outside of their design and training. I don't think that anyone actually expects that.

      We moved the discussion/analyses of RTs to the next paragraph. It should now be clear that this statement refers specifically to the absence of an accuracy congruency effect in the task-optimized models.

      (6) Lines 387-389: "As a result, the VAMs may learn richer representations of the stimuli, since a variety of stimulus features-layout, stimulus position, flanker direction-influence behavior (Figure 2)." That is certainly true of tasks like this one where an optimal model would only focus on a tiny part of the image, whereas humans are distracted by many features. I'm not sure that this distractibility is the same as "richer representations". When CNNs classify images based on the background, would the authors claim that they have richer representations than humans?

      We agree that “richer” may not be the best way to characterize these representations, and have changed it to “more complex”.

      (7) Is it possible that drift rate d_k for each response happens to be negative on a given trial? If so, how is the decision given on such trials (since presumably none of the accumulators will ever reach the boundary)?

      It is indeed possible for all of the drift rates to be negative, though we found that this occurred for a vanishingly small number of trials (mean ± s.e.m. percent trials/model: 0.080 ± 0.011%, n = 75 models), as reported in the Methods. These trials were excluded from analyses.
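
      To make the edge case concrete, the following is a minimal sketch of a single linear ballistic accumulator (LBA) trial, assuming the standard formulation in which each accumulator rises linearly from its start point to a shared threshold. The function name `lba_trial` and its parameters are hypothetical and are not taken from the authors' code. When every sampled drift rate is non-positive, no accumulator can reach the threshold, so the trial has no defined choice or RT and has to be handled separately (here it returns `None`).

```python
import math

def lba_trial(drifts, starts, threshold, t0=0.0):
    """Simulate one LBA trial (illustrative sketch, not the authors' implementation).

    drifts:    sampled drift rate of each accumulator on this trial
    starts:    sampled start point of each accumulator (each below threshold)
    threshold: shared response boundary
    t0:        non-decision time added to the winning finish time
    Returns (choice_index, rt), or None if no accumulator can reach threshold.
    """
    # An accumulator with non-positive drift never reaches the boundary.
    finish = [
        (threshold - k) / d if d > 0 else math.inf
        for d, k in zip(drifts, starts)
    ]
    rt = min(finish)
    if math.isinf(rt):
        return None  # all drifts non-positive: no decision on this trial
    return finish.index(rt), rt + t0
```

      A caller would typically count the `None` trials and drop them before computing summary statistics, which mirrors the exclusion described above.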

      (8) Can the authors comment on how they chose the CNN architecture and whether they expect that different architectures will produce similar results?

      Before establishing the seven-layer CNN architecture used throughout the paper, we conducted some preliminary experiments using other architectures that differed primarily in the number of CNN layers. We found that models with significantly fewer than seven layers typically failed to reach human-level accuracy on the task while larger models achieved human-level accuracy but (unsurprisingly) took longer to train.

      Reviewer #3 (Recommendations For The Authors):

      - In the introduction to this paper (particularly the paragraph beginning in line 33), the authors note that EAMs have typically been used in simplified settings and that they do not provide a means to account for how people extract information from naturalistic stimuli. While I agree with this, the idea of connecting CNNs of visual processing with EAMs for a joint modeling framework has been done. I recommend looking at and referencing these two articles as well as adjusting the tenor of this part of an introduction to better reflect the current state of the literature. For full disclosure, I am one of the authors on these articles. https://link.springer.com/article/10.1007/s42113-019-00042-1 https://www.sciencedirect.com/science/article/abs/pii/S0010027721001323

      We agree—thanks for pointing this out. The revised Introduction now discusses prior related models in more detail (including those referenced above) and better clarifies the novel contributions of our model. We specifically highlight that a novel contribution of the VAM is that “the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework.”

      - The statement in lines 56-58 implies that this is the first article to glue CNNs together with EAMs. I would edit this accordingly based on the prior comment here and references provided. I will note that the second feature of the approach in this paper is still novel and really nice, namely the fact that the CNN and the EAM are jointly fitted. In the aforementioned references, the CNN is trained on the image set, and individual level Bayesian estimation was only applied to the EAM. Thus, it may be useful to highlight the joint estimation aspect of this investigation as well as how the uniqueness of the data available makes it possible.

      Agreed—see above.

      - Figure 3c and associated text. I understand the MI analysis you are performing here; however, it is difficult to interpret as it stands. In the figure, what does an MI of 0.1 mean? Can you give some context to that scale? I do find the interpretation of the hunchback shape in lines 210-222 to be somewhat of a stretch. The discussion that precedes this (lines 199-209) is clear and convincing. Can this discussion be strengthened more? And more interpretability of Figure 3c would be helpful; entropic scales can be hard to interpret without some context or scale associated.

      The MI analyses in Fig. 3C (and also Figs. 4C and 6E) show normalized MI, in which the raw MI has been divided by the entropy of the stimulus feature distribution. This normalization facilitates comparing the MI for different stimulus features, which is relevant for Figs. 4C and 6E. The normalized MI has a possible range of [0, 1], where 1 indicates perfect correlation between the two variables and 0 indicates complete independence. We now note in the legend of these figures that the possible normalized MI range is [0, 1], which should help with interpreting these values. Our revised results section for Fig. 3C now also includes some additional remarks on our interpretation of the hunchback shape of the MI.
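
      The normalization described above can be sketched for discrete variables as follows. This is an illustrative example under stated assumptions, not the authors' analysis code: the function names are hypothetical, both variables are assumed to be discrete (for instance, unit activity that has already been binned), and entropies are measured in bits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a discrete sample."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def normalized_mi(feature, response):
    """Mutual information between feature and response, divided by H(feature).

    Because MI(feature; response) <= H(feature), the result lies in [0, 1]:
    0 when the two variables are independent in the sample, 1 when the
    response fully determines the feature.
    """
    h_feature = entropy(feature)
    if h_feature == 0:
        return 0.0  # a constant feature carries no information to recover
    h_joint = entropy(list(zip(feature, response)))
    mi = h_feature + entropy(response) - h_joint
    return mi / h_feature
```

      On this scale, a value such as 0.1 would mean that the response accounts for roughly one tenth of the uncertainty in the stimulus feature.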

      - Lines 244-248 and the analyses in Figure 3 suggest a change in the behavior of the CNN around layer 4. This is just a musing, but what would happen if you just used a 4 layer CNN, or even a 3 layer? This is not just a methods question. Your analysis suggests a transition from localized to distributed information representation. Right now, the EAM only sees the output of the distributed representation. What if it saw the results of the more local representations from early layers? Of course, a shallower network may just form the distributed representations earlier, but it would be interesting if there were a way to tease out not just the presence of distributed vs local representations, but the utility of those to the EAM.

      Thanks for this interesting suggestion. We did do some preliminary experiments in models with fewer layers, though we only examined the outputs of these models and did not assess their representations. We found that models with 3–5 layers generally failed to achieve human-level accuracy on the task. In principle, one could relate this observation to the representations of these models as a means of assessing the relative utility of distributed/local representations. However, there are confounding factors that one would ideally control for in order to compare models with different numbers of layers in this fashion (namely, the number of parameters).

      - Section Line 359 (Task optimized models) - It would be helpful to clarify here what these task-optimized models are being trained to do. As I understand it, they are being trained to directly predict the target direction. But are you asking them to learn to predict the true target direction? Or are you training them to predict what each individual responds? I think it is the second (since you have 75 of these), but it's not clear. I looked at the methods and still couldn't get a clear description of this. Also, are you just stripping the LBA off of the end of the CNN and then essentially putting a softmax in its place? If so, it would be helpful to say so.

      The task-optimized models were actually trained to output the true target direction in each stimulus, rather than trained to match the decisions of the human participants. We trained 75 such models since we wanted to use exactly the same stimuli as were used to train each VAM. The task-optimized CNNs were identical to those used in the VAMs, except that the outputs of the last layer were converted to softmax-scored probabilities for each direction rather than drift rates. The Results and Methods sections now include additional commentary that clarifies these points.
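
      As a rough illustration of the readout difference described above, the sketch below converts last-layer outputs into direction probabilities with a softmax and scores them against the true target direction with a cross-entropy loss. The function names, the use of cross-entropy as the training loss, and the four-way output in the usage note are assumptions made for the example, not details confirmed by the response.

```python
import math

def softmax(logits):
    """Convert last-layer outputs into probabilities over response directions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_index):
    """Negative log-probability assigned to the true target direction.

    Assumed loss for illustration; the true label is the target direction
    in the stimulus, not the participant's response.
    """
    return -math.log(softmax(logits)[true_index])
```

      For example, `softmax([2.0, 0.5, 0.1, -1.0])` assigns the highest probability to the first direction, whereas in the VAM the same four outputs would instead be interpreted as drift rates feeding the accumulators.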

      - Line 373-376: This statement is pretty well established at this point in the similarity judgement literature. I recommend looking at and referencing https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13226 https://www.nature.com/articles/s41562-020-00951-3 https://link.springer.com/article/10.1007/s42113-020-00073-z

      Thanks for pointing this out. For reference, the statement in question is “Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans.”

      We agree that the first and third reference you mention are relevant, and we now cite them along with some other relevant work. In our view, the second reference you mention is not particularly relevant (that paper introduces a new computational model for similarity judgements that is fit to human data, but does not comment on training models to perform tasks vs. fitting to human data).

      - Line 387-388: "VAMs may learn richer representations". This is a bit of a philosophical point, but I'll go ahead and mention it. The standard VAM does not necessarily learn "richer" feature representations. Rather, you are asking the VAM and task-optimized models to do different things. As a result, they learn different representations. "Better" or "richer" is in the eye of the beholder. In one view, you could view the VAM performance as sub-par since it exhibits strange artifacts (congruency effects) and the expansion of dimensionality in the VAM representations is merely a side-effect of poor performance. I'm not advocating this view, just playing devil's advocate and suggesting a more nuanced discussion of the difference between the VAM and task-optimized models.

      We agree—this is a great point. We have changed this statement to read “the VAMs may learn more complex [rather than richer] representations of the stimuli”.

      - Lines 567-570: Here you discuss how the LBA backend of the VAM can't account for shrinking spotlight-like RT effects but that fitting models to different RT quantiles helps overcome this. I find this to be one of the weakest points of the paper (the whole process of fitting RT quantiles separately to begin with). This is just a limitation of the RT component of the model. This is a great paper but this is just a limitation inherent in the model. I don't see a need to qualify this limitation and think it would be better to just point out that this is a limitation of the LBA itself (be more clear that it is the LBA that is the limiting factor here) and that this leaves room for future research. From your last sentence of this paragraph, I agree that recurrent CNNs would be interesting. I will note that RNN choice-RT models are out there (though not with CNNs as part of the model).

      We agree and have revised this section of the Discussion accordingly (see our response to Reviewer #2 for more detail). We also removed the analyses of models trained on separate RT quantiles.

    1. Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data were clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torque deficits are seen with cerebellar block.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for single-joint movements. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      We appreciate the reviewer’s point that cerebellar contributions can be particularly critical near the endpoint of a reach. In our current task design, monkeys were indeed required to hold at the target briefly (100 ms for Monkeys S and P, and 150 ms for Monkeys C and M) before receiving a reward. However, given the size of the targets and the velocity of movements, it often happened that the monkey didn’t have to stop its movement to obtain a reward. Importantly, we relaxed the task’s requirements (by increasing target size and reducing temporal constraints) to allow monkeys to perform the task under cerebellar block conditions, as we found that the strict criteria yielded a low success rate in these conditions. This design is suboptimal for studying endpoint accuracy which, as we now appreciate, is an important aspect of cerebellar control. In our revision, we will clarify these aspects of the task design and acknowledge that it is suboptimal for examining the role of the cerebellum in endpoint control. Future studies will address this point explicitly.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is that we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al., bioRxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      This is a very interesting suggestion. Although our main analysis focused on target-directed reaching movements, we have the data for the between-trial movements under continuous stimulation (e.g., return to center movements). In our revised supplementary material, we will examine the effect of cerebellar block on endpoint velocities in inter-trial movements versus task-related movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      The reviewer is right: movements to all targets engage both the shoulder and the elbow, but the extent of single-joint participation varied in a target-specific manner. In the manuscript, we used the term “single-joint” to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

      We will include a clear statement in the Methods section specifying which components of the dataset and analyses are entirely new. While some of the same animals and the same stimulation protocol were presented in prior work, the inverse-dynamics modeling, the analyses of progressive movement changes across trials under stimulation, and the analysis of the invariance of motor noise to movement velocity are newly reported in this manuscript.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data were clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      We appreciate the reviewer’s emphasis on distinguishing reduced muscle tone and altered co-contraction patterns as possible explanations for decreased limb velocity. Our focus on torques arises from previous studies suggesting that the core deficit in cerebellar ataxia is impaired prediction of coupling torques. This point will be added in the discussion section of our revised manuscript where we will explain why we prioritize muscle torques and how muscle-level activation collectively contributes to net joint torques. Also, we will underscore that the observed velocity deficits primarily reflect a reduction of self-generated torque at the shoulder (whether acute or adaptive), rather than any reduction in passive torques.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      Successful trials were trials in which monkeys didn’t leave the center position before the go signal and reached the peripheral target within specific time criteria. These criteria varied across monkeys. We will include detailed definitions of our success criteria in the revised methods section of our manuscript. Specifically, we will update our methods section to include (i) the timing criteria of each phase of the trials and (ii) the size of the peripheral targets indicating the tolerance for endpoint accuracy.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torque deficits are seen with cerebellar block.

      We acknowledge the reviewer’s observation regarding Targets 1 and 5 being side-to-side rather than strictly “outward” or “inward.” In the first section of our results, we grouped the targets in this way to emphasize the notably stronger effect of the cerebellar block on targets involving shoulder flexion (‘outward’) as compared to those involving shoulder extension (‘inwards’). For subsequent analyses we focused on the effects of cerebellar block on outward targets where movements were single-joint (Target 1) vs. multi-joint (Targets 2-4). To clarify this aspect, in our revised manuscript we will explain the rationale for grouping T1–T4 as “outward” and T5–T8 as “inward,” including how we defined them.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      We will revise the figure labels and legend to clarify how each axis is defined. Please note that the data were pooled only after confirming that each animal showed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewers’ feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
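
      One common way to compute a partial correlation that controls for animal identity is to remove each animal's own mean from both variables and then correlate the residuals. The sketch below illustrates that approach; it is an assumption about how such an analysis could be set up, not a description of the authors' method, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center_within(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def partial_corr_by_group(x, y, groups):
    """Correlation of x and y after removing between-group (per-animal) offsets."""
    return pearson(center_within(x, groups), center_within(y, groups))
```

      Centering within animal removes differences in baseline between monkeys, so the resulting coefficient reflects the within-animal relationship rather than offsets between animals. A significance test on this coefficient would need degrees of freedom reduced by the number of animals, which this sketch does not compute.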

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

      The reviewer is right: the relations between timing, decomposition, and variability need to be presented explicitly. In our revision, we will explain how decomposed movements may reflect impaired temporal coordination across multiple joints, a critical cerebellar function. We will also clarify how increased variability in joint coordination can result in increased trial-to-trial variability of trajectories.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Weaknesses:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.

      We agree that our approach could more explicitly exploit the rapid reversibility of high-frequency stimulation (HFS) by examining post-stimulation ‘washout’ periods. However, for the present dataset, we ended the session after the set of cerebellar block trials. We plan to study the effect of cerebellar block on immediate post-block washout trials in the future.  

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.

      The reviewer is right: movements to Targets 6–8 (inward) were seemingly unaffected despite also involving significant interaction torques. In fact, we have already attempted to address this issue in the discussion section of version 1 of our manuscript. Specifically, we note that while outward targets (2–4) tend to involve higher coupling torque impulses on average, this alone does not fully explain the differential impact of cerebellar block, as illustrated by discrepancies at the individual target level (e.g., target 7 vs. target 1). We proposed two possible explanations: (1) a bias toward shoulder flexion in the effect of cerebellar block, consistent with earlier studies showing ipsilateral flexor activation or tone changes following stimulation or lesioning of the deep cerebellar nuclei; and (2) a posture-related facilitation of inward (shoulder extension) movements from the central starting position.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.

      The reviewer is right that the superior cerebellar peduncle carries both descending and ascending fibers, and that cerebellar nuclei project to subcortical as well as cortical targets. However, it is also important to note that in primates the cerebello-thalamo-cortical (CTC) pathway greatly expanded (at the expense of the cerebello-rubro-spinal tract) in mediating cerebellar control of voluntary movements (Horne and Butler, 1995). The cerebello-subcortical pathways lost their importance over the course of evolution (Nathan and Smith, 1982; Padel et al., 1981; ten Donkelaar, 1988). In our previous study, we found that the ascending spinocerebellar axons that enter the cerebellum through the SCP are weakly task-related and that the descending system is quite small (Cohen et al., 2017). However, we cannot rule out an effect of HFS mediated in part through other systems. In the revised introduction section, we will clarify this point and use more careful language about the scope of our stimulation, emphasizing that HFS disrupts cerebellar communication broadly, rather than solely the cerebello-thalamo-cortical pathway.

      The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.

      We recognize the reviewer’s concern about linking movement decomposition and trial-to-trial trajectory variability with motor noise. As presented in our discussion section, we interpret these motor abnormalities as a form of motor noise in the sense that they are generated by faulty motor commands. We draw our interpretation from the findings of previous research work which show that the cerebellum aids in the state estimation of the limb and subsequent generation of accurate feedforward commands. Therefore, disruption of the cerebellar output may lead to faulty motor commands resulting in the observed asynchronous joint activations (i.e., movement decomposition) and unpredictable trajectories (i.e., increased trial-to-trial variability). Both observed deficits resemble increased motor noise.

      Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.

      We agree that to strictly focus on feedforward control, we could have examined the measured variables in the first 50-100 ms of the movement which has been shown to be unaffected by feedback responses (Pruszynski et al. 2008, Todorov and Jordan 2002, Pruszynski and Scott 2012, Crevecoeur et al. 2013). However, in our task the amplitude of movements made by our monkeys was small and therefore the response measures we used were too small in the first 50-100 ms for a robust estimation. Also, fixing a time window led to an unfair comparison between control and cerebellar block trials, in which velocity was significantly reduced and therefore movement time was longer. Therefore, we used the peak velocity, torque-impulse at the peak velocity and maximum deviation of the hand trajectory as response measures. We will acknowledge this point in the discussion section of our revised manuscript. We will also tone down references to feedforward control throughout the text of our revised manuscript as suggested by the reviewer.

      The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.

      Indeed, as reviewer #1 also noted, movements to targets 1 and 5 are not purely single-joint but rather have relatively low coupling torques. Our intention in using the term “single-joint” was to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      The labels in Figure 3d are confusing and could use more explanation in the figure legend.

      In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?

      We will revise the figure legend and main-text explanation for Figure 3d. Please note that the data were pooled only after confirming that the data from each animal showed a similar trend. Specifically, the correlation coefficients were positive, and significant for 3 out of the 4 monkeys. Moreover, following the reviewers’ feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
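
      The concern about pooling and the partial correlation mentioned in the response can be illustrated with a minimal sketch. It is not the authors' analysis: the response does not say how the partial correlation was computed, so the sketch assumes the simplest form, in which each animal's mean is removed from both variables before correlating. The data are synthetic, and the names `coupling`, `velocity_drop`, and `partial_corr_by_group` are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def partial_corr_by_group(x, y, groups):
    """Correlate x and y after removing each group's mean from both.

    Centering within groups is equivalent to regressing out one dummy
    variable per group, i.e. controlling for group identity.
    """
    x = np.asarray(x, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    groups = np.asarray(groups)
    for g in np.unique(groups):
        mask = groups == g
        x[mask] -= x[mask].mean()
        y[mask] -= y[mask].mean()
    return pearson_r(x, y)


# Synthetic example (assumption): three animals with different baselines.
# Within every animal the two variables are negatively related, but the
# baseline offsets make the pooled correlation strongly positive.
animal = np.repeat(["A", "B", "C"], 4)
offset = np.repeat([0.0, 10.0, 20.0], 4)
within = np.tile([0.0, 1.0, 2.0, 3.0], 3)
coupling = offset + within
velocity_drop = offset - within

pooled_r = pearson_r(coupling, velocity_drop)
partial_r = partial_corr_by_group(coupling, velocity_drop, animal)
print(f"pooled r = {pooled_r:.2f}, partial r = {partial_r:.2f}")
```

      In this constructed case the pooled and within-animal correlations have opposite signs, which is the kind of artifact the reviewer's question is guarding against.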

      In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?

      We will provide a breakdown of the success rates as a function of targets. However, one should note that success/failure may depend on several factors beyond impaired limb dynamics. In a previous study (Nashef et al. 2019) we identified several causes of failure, such as (i) not entering the central target in time, (ii) moving out too early from the peripheral target, (iii) reaction time longer than permitted, or (iv) premature exit from the central target before permitted.

    1. Since the discovery of this community of users by practitioners and researchers in the health sciences in the late 1990s [8], every analysis has initially consisted of considering the ana-mia Web from the angle of a categorization of its contents: weight-loss advice, images likely to trigger risky behaviors, tricks for hiding food restriction or rejection from relatives and acquaintances, autobiographical accounts, links to providers of specialized health services (equipment, nutrition, medical care) [9]. In particular, "personal pages" and blogs, with their emphatic attention to their authors' codes of communication, seem to fit a vision strongly centered on individuals, their problems, and their trajectories.

      Epistemic argument: the categorization of Internet content with their own code

    1. Summary of the Talk on the Future of CMS, No-Code, Low-Code, and AI-Generated Applications

      Evolution of CMS and No-Code Tools

      Traditional CMS:

      "Back in the day it was like WordPress... the original web where we would write code and then we would just push it up to servers."

      CMS emerged to allow non-developers to contribute to web content without coding.

      No-Code Tools:

      "No code is like your drag and drop GUI... Webflow or whatever."

      Introduced drag-and-drop interfaces for broader accessibility, with pros and cons in usability.

      AI in No-Code:

      "Fast forward even further we've got AI coding right... now a person can just make an app."

      AI models like Claude 3.5 enable app generation with minimal developer intervention.

      Current No-Code/Low-Code AI Tools Landscape

      Key Tools in the Market:

      "Let's create the definitive list here... Cursor, Bolt, Lovable."

      Cursor is for developers; Bolt and Lovable cater to non-developers with different strengths.

      Strengths and Weaknesses:

      "Bolt is a great boilerplate generator... Lovable is great if you want ShadCN styling."

      Developers prefer Bolt for flexibility; Lovable is preferred for pre-styled design systems.

      Challenges with AI-Generated Code

      Integration Issues:

      "It's not your existing code base... you need to use your components, design system, and backend logic."

      AI-generated code often exists in isolation, making integration difficult for enterprise use.

      Code Quality Concerns:

      "Engineers are not going to want a pull request by a non-engineer."

      Quality control and maintainability remain significant barriers.

      Customization and Precision:

      "Webflow is hard to use... but it gives you 100% precision control."

      While AI provides convenience, fine-grained control is still preferred by professionals.

      Future of AI-Driven Development

      Combining AI with Structured CMS-like Workflows:

      "Ideally, we have something like a headless CMS where we can make updates over API."

      Future solutions should enable AI updates via APIs while maintaining design consistency.

      Ideal Workflow Vision:

      "In an ideal world, we can be editing with prompts and visually."

      The goal is a hybrid model with AI-driven automation and manual precision controls.

      AI-Based Iteration and Optimization:

      "AI should listen to your customers... iterate really fast."

      Faster feedback loops and continuous optimization through AI experimentation.

      Technical Approaches to Solving Challenges

      Meta's React Blocks:

      "What React blocks let developers do is a backend dev in Python can code up a React UI."

      An approach that allows dynamic UI changes without shipping new native app versions.

      Mitosis Framework:

      "Mitosis is a project that explores transpilation and visual manipulation."

      Enables converting JSX into structured JSON for flexible rendering and AI-based updates.

      Code-Driven Visual Editing:

      "SwiftUI allows updating code with visual previews and vice versa."

      Bidirectional code editing is a possible future solution but is still complex.

      Current Limitations and Considerations

      Performance and Feasibility Issues:

      "When I had Google bots crawling my AI-generated site, I got a $4,000/day Anthropics bill."

      Generating content in real-time is currently too expensive at scale.

      Security and Compliance Risks:

      "Dynamic code delivery is ripe with security challenges."

      Any AI-driven solutions must consider performance, security, and governance.

      Key Use Cases and Applications

      Prototyping vs. Production:

      "Phenomenal prototyping tools, but moving to production is challenging."

      AI tools excel in concept validation but require extensive refinement for production.

      Personalization Opportunities:

      "The AI could automatically scale things up or down based on performance."

      Future possibilities include hyper-personalized user experiences.

      Conclusion and Outlook

      Near-Term Expectations:

      "Webflow and Framer will likely add more AI features over time."

      Existing players are expected to incorporate AI capabilities gradually.

      Long-Term Potential:

      "AI tools will eventually iterate and personalize dynamically based on user input."

      The convergence of AI, CMS, and design systems may redefine how software is built.

      This summary captures the essence of the speaker's discussion, highlighting key concepts, industry trends, challenges, and possible future developments in the AI-powered CMS and no-code/low-code space.

    1. "exhaustive-to include all conditions-and to ensure thatnoparticular event of sickness will be classified undermore than one code numbe

      a whole host of problems with this! 1. ethnocentric to believe that events of sickness are only coded under one number 2. misunderstands some sicknesses as interactions of others 3. misunderstands cultural definitions of sickness

    1. monopoly in terms of the share of aggregate industry capex that is spent on training and inference infrastructure.

      Nvidia currently holds a dominant position in the market for AI compute hardware—especially GPUs used in training and inference. While it is not literally the only option, the company’s technology and ecosystem have led to a large share of the data center capital spending on AI workloads flowing toward Nvidia products. Its robust software platform (including CUDA and related libraries), longstanding relationships with cloud providers, and high-performance hardware have contributed to this situation, resulting in a market share that can appear “monopolistic.”

      Brief Explanation

      Technical Lead: Nvidia pioneered specialized GPUs and software tools for AI, making its technology the default choice for many enterprises and research teams.

      Software Ecosystem: CUDA (Nvidia’s parallel computing platform) is mature, widely adopted, and well-supported, creating high switching costs for developers.

      Cloud Provider Partnerships: Major cloud providers (AWS, Azure, Google Cloud) heavily feature Nvidia GPUs, which solidifies Nvidia’s presence and channels a large portion of AI-related spending its way.

      Competition Exists: Competitors like AMD, Intel, and various AI-chip startups do exist, but they currently hold smaller market shares. Growing interest in alternatives may eventually reduce Nvidia’s dominance, but for now, Nvidia’s position remains strong.


      Nvidia’s software ecosystem—centered on the CUDA platform and its extensive libraries—offers a few key advantages:

      Easy Integration with AI Frameworks: CUDA support is built right into popular AI tools (like TensorFlow and PyTorch), so developers automatically get GPU acceleration without re-writing large sections of code.

      Optimized Performance: Nvidia continuously refines its libraries (cuDNN, TensorRT, etc.) to extract maximum speed from its GPUs for both training and inference.

      Mature Developer Tools and Documentation: Years of updates and community feedback mean robust toolsets, reliable drivers, and a wealth of tutorials, making the learning curve more manageable.

      Rich Ecosystem of Extensions and Plugins: Nvidia has packaged domain-specific SDKs (e.g., Clara for healthcare, Metropolis for smart cities), which simplify development for specialized use cases.

      High Switching Costs: Because so many workloads, frameworks, and research projects have been built and tested on CUDA, organizations are often reluctant to switch to other platforms. This entrenches Nvidia’s position and keeps its ecosystem strong.

    1. Because you have more important things to do than enriching lawyers or imposing petty restrictions on users of your code

      enriching lawyers

    1. How it works 1. Become a member Purchase your annual membership ($79.99) and instantly add your Global Nomad Pass to your Apple or Google Wallet (just like a boarding pass!) Get your pass 2. Visit a local partner Each of our 500+ local partners offers an exclusive, member-only discount. See the full list of partners & discounts below or look for this sign in the partner’s window! View partners & discounts 3. Show your pass Just show your Global Nomad Pass in-store to a staff member before you pay. 4. Scan to redeem The staff member will point you to a QR code - simply scan it to redeem the discount and save money. That's it! Start saving money

      Distinguishable content - The content in this section is clearly laid out and includes preset links to the information pages being discussed in each point. This is good for web accessibility as it makes it easier to understand the section and where the users need to go to find specific information.

    1. Get a $15 online promo code towards your next purchase when you spend $100* using in-store pick up. Select brands & styles. January 23 – February 12

      Some promotional banners on the homepage, like the "Get a $15 online promo code..." banner, have white text on bright backgrounds, making them easy to read. This is especially beneficial for users with low vision or colour blindness. Good design follows the perceivable principle, which stresses the importance of strong colour contrast. Setting the text against a bright background colour makes the banners easier to read for everyone.

    1. approaches. Qualitative research in anthropology aims to comprehensively describe human behavior and the contexts in which it occurs while quantitative research seeks patterns in numerical data that can explain aspects of human behavior. Quantitative patterns can be gleaned from statistical analyses, maps, charts, graphs, and textual descriptions. Surveys are a common quantitative technique that usually involves closed-ended questions in which respondents select their responses from a list of pre-defined choices such as their degree of agreement or disagreement, multiple-choice answers, and rankings of items. While surveys usually lack the sort of contextual detail associated with qualitative research, they tend to be relatively easy to code numerically and, as a result, can be easier to analyze than qualitative data. Surveys are also useful for gathering specific data points within a large population, something that is challenging to do with many qualitative techniques.

      I find it really interesting how anthropologists combine both qualitative and quantitative research to get a more complete picture of human behavior. It makes me think about how much of our daily lives can actually be measured with numbers, like food intake or physical activity, and how that data can be used to understand bigger cultural patterns. One thing I wonder is how do anthropologists balance the need for numerical data with the deeper, more personal insights that qualitative methods provide? Can numbers truly capture the complexity of human experiences, especially in something as culturally rich as food habits? I also think it’s fascinating how nutritional anthropology connects cultural and environmental factors to health. It makes me reflect on how different cultural diets affect people’s health and how my own eating habits are shaped by both culture and convenience.

    1. BBC Accessibility Help

      Robust: Compatibility with Assistive Technologies The website's structure and code are compatible with various assistive technologies, including screen readers and magnification tools. This is a good practice as it ensures that the content is accessible across different devices and assistive technologies, fulfilling the robust principle.

    1. Join Our Monthly Newsletter "*" indicates required fields First Name* Last Name* Email Address* Company Name

      Labeled Forms: The forms on Eco Canada are well-labeled, which I think will allow screen readers to correctly identify input fields (name, email, company, etc) and guide users through the process. This is an essential feature that ensures everyone can complete forms without confusion and join the Eco Canada community.

    1. SOLID principles (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion) are design principles that promote maintainable and flexible code.

      Reading this makes me want to write more functional code

    2. Test-Driven Development (TDD) is a software development approach that emphasizes writing tests before writing the actual code.

      "Get Ahead of Development" was what Thor on YouTube used to talk about.

      Yeah, that's not how I develop something. I start with a function and print out a bunch of stuff from inside it, write a test or a way to run it manually, then run it over and over until the print statements all make sense. I may use a testing framework for that, or I might just have a bash command that I run repeatedly and save in the docs.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript, the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful for better understanding the hemodynamic response at the network/graph level.

      Strengths:

      The manuscript provides a pipeline that detects changes in vessel diameter while simultaneously locating the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph-level mechanisms regulating activity-dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing the performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e., corrected for background etc.) does not seem a particularly useful comparison.

      We have now included comparisons to Imaris (commercial software) for segmentation and VesselVio (open-source software) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months to generate per image stack.

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took days for a rater to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of the benefits of our graph extraction pipeline to VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. Vesselvio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph. 
Furthermore, VesselVio relies on the distance transform of the user supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”
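
      The sensitivity of distance-transform radii to mask boundary placement can be illustrated with a minimal sketch. It is not VesselVio's code: the synthetic disk, the helper names `disk_mask` and `edt_radius`, and the image size are assumptions chosen for illustration. The only library call is SciPy's `distance_transform_edt`.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def disk_mask(size, radius):
    """Binary mask of a filled disk centred in a size x size image (hypothetical helper)."""
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2
    return (yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2

def edt_radius(mask):
    """Radius estimate at the centre pixel: distance to the nearest background pixel."""
    dist = distance_transform_edt(mask)
    c = mask.shape[0] // 2
    return dist[c, c]

# Two masks of the same "vessel" cross-section whose boundaries differ by one pixel,
# as might happen between two raters or two segmentation models.
r_tight = edt_radius(disk_mask(41, 5.0))
r_loose = edt_radius(disk_mask(41, 6.0))
print(r_tight, r_loose, r_loose - r_tight)
```

      In this toy case the radius estimate moves by roughly the full boundary shift, because the distance transform reads the radius directly off the mask. A gradient-based boundary search on the image intensities does not depend on the mask edge in the same way.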

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      In response to the reviewer’s comment, 2D slices have been added to Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to the use of Matlab. Also, the pipeline code was not made available during review, contrary to the authors' claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to GitHub and is available at the following location: https://github.com/AICONSlab/novas3d

      The Matlab code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, the visualizations do not make clear how well the pipeline performs, where it fails, etc. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating that roughly 20-40% of voxels do not overlap between the inferred and ground truth segmentations. I did not notice this high discrepancy earlier. A thorough discussion of the errors appearing in the segmentation pipeline would be necessary, in my view, to better assess the quality of the pipeline.

      2D slices from the additional datasets have been added in the Supplementary Figure 13 to aid in visualizing the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) is good when compared to those (0.56-0.86) of 3D segmentations of large datasets in microscopy [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set. We found that the raters had a mean intraclass correlation (ICC) of 0.73 [7]. Our model outperformed this score on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than this baseline.
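
      For readers less familiar with the metric, here is a minimal sketch of the Dice coefficient on binary masks. The function name `dice` and the toy "vessel" are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy thin "vessel": a 3-pixel-wide strip versus the same strip drawn
# one pixel wider on each side (5 pixels wide).
truth = np.zeros((21, 100), dtype=bool)
truth[9:12, :] = True
wider = np.zeros((21, 100), dtype=bool)
wider[8:13, :] = True

score = dice(truth, wider)
print(score)
```

      The wider mask fully contains the thin one, yet the score is 2·300/(300+500) = 0.75. For thin structures, a one-pixel disagreement in boundary placement is enough to pull Dice well below 1.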

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figure 1E,2C,2D). However, focal stimulations, where the stimulation light is raster scanned over a small region in the field of view, show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than those elicited by optogenetic stimulation as in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to the diffuse stimulation in Mester et al. 2019, as opposed to the raster-scanned focal stimulation we used in this study and in Mester et al. 2019, where we observed focal photostimulation to elicit vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm2 photostimulation. Further, even with focal photostimulation, we do see light intensity dependence of the duration of the vascular responses. Indeed, in Supplementary Figure 2, 1.1 mW/mm2 photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm2; the 1.1 mW/mm2 responses are in line, duration-wise, with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e. excluding the additional experiments presented in the Supplementary Figure 2 that were collected over a limited FOV at 3s per stack), we have collected one stack every 42 seconds. The first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by the Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and - provided reasonable signal-to-noise ratio (cf. Figure 5) - is applicable, as is, to data acquired at much higher sampling rates.
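
      The timing described above can be made concrete with a short sketch. The 42 s stack period and ~0.44 s per slice come from the response; everything else is an assumption: slices are taken to be acquired back to back starting right after photostimulation, and `toy_response` is a purely hypothetical biphasic curve, not the authors' data.

```python
import numpy as np

STACK_PERIOD_S = 42.0  # one stack every 42 s (from the response)
SLICE_TIME_S = 0.44    # approximate acquisition time per slice (from the response)

# Number of whole slices that fit in one stack period, and when each one starts
# relative to the end of photostimulation (assuming back-to-back acquisition).
n_slices = int(STACK_PERIOD_S // SLICE_TIME_S)
slice_start_s = np.arange(n_slices) * SLICE_TIME_S

def toy_response(t):
    """Hypothetical biphasic diameter change: early constriction, later dilation."""
    return -np.exp(-t / 5.0) + 0.8 * np.exp(-((t - 25.0) / 8.0) ** 2)

first_slice = toy_response(slice_start_s[0])  # sampled immediately after stimulation
later_slice = toy_response(slice_start_s[57])  # sampled about 25 s later
print(n_slices, slice_start_s[-1], first_slice, later_slice)
```

      Under these assumptions, slices within the same stack sample the response at different post-stimulus delays, so a single stack can see opposite signs of a biphasic response at different slice positions. This is the scenario the reviewer raises; whether it applies depends on the real response kinetics.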

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf Fig 1b and Fig 2, and section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported on constrictions and dilations being elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment: we placed a special order for fluorescent beads with a known diameter of 7.32 ± 0.27 μm, imaged them following our imaging protocol, and then used our pipeline to estimate their diameter. Our estimate converged on the manufacturer-specified diameter, at 7.34 ± 0.32 μm. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of known diameter imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32 μm and a specified range of 7.32 ± 0.27 μm, as determined by the manufacturer using a Coulter counter, were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the images to degrade the SNR to the same level as that of the blood vessels. The images of the beads were segmented with a threshold, centroids were calculated for individual spheres, and planes with a random normal vector were extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data: i.e., by the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale that is finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty on the diameter of the beads decreased, and our estimate of the bead's diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27um), as tabulated in Table 1.”
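
      The spoke-based boundary search described in this excerpt can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: it omits deconvolution and the averaging over multiple orthogonal planes, the synthetic blurred disk stands in for a bead cross-section, and `spoke_radius` and all parameter values other than the 36 spokes are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def spoke_radius(image, center, n_spokes=36, max_r=20.0, step=0.25):
    """Mean distance from `center` to the steepest intensity drop along each spoke."""
    radii = np.arange(0.0, max_r, step)
    estimates = []
    for theta in np.linspace(0.0, 2.0 * np.pi, n_spokes, endpoint=False):
        rows = center[0] + radii * np.sin(theta)
        cols = center[1] + radii * np.cos(theta)
        # Sample the image along the spoke with bilinear interpolation.
        profile = map_coordinates(image, [rows, cols], order=1)
        # The boundary is taken where the intensity gradient is most negative.
        gradient = np.gradient(profile, step)
        estimates.append(radii[np.argmin(gradient)])
    return float(np.mean(estimates))

# Synthetic bright disk, blurred and lightly corrupted with Gaussian noise.
size, true_radius = 64, 8.0
yy, xx = np.mgrid[:size, :size]
disk = ((yy - 32) ** 2 + (xx - 32) ** 2 <= true_radius ** 2).astype(float)
rng = np.random.default_rng(0)
image = gaussian_filter(disk, sigma=1.0) + rng.normal(0.0, 0.01, disk.shape)

estimate = spoke_radius(image, (32, 32))
print(estimate, 2 * estimate)  # radius and diameter estimates, in pixels
```

      Each spoke's estimate is quantized by the sampling step and the pixel grid; averaging across spokes (and, in the authors' pipeline, across planes) is what allows the combined estimate to be finer than any single spoke's.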

      Reviewer #1 (Recommendations for the authors):

      Comments on the authors' replies to the reviews:

      - Supplementary Figure 13:

      As indicated before, the 3D images and their scale make it impossible to judge the quality of the outputs.

      As mentioned above, 2D slices have been added to Supplementary Figure 13.

      - Supplementary Table 3:

      There is a significant increase in the Hausdorff and Mean Surface Distance measures for the new data. Why?

      A single vessel being missed by either the rater or the model would significantly affect the Hausdorff distance (HD) and by extension Mean Surface Distance: this is particularly pertinent in the LSFM image with its much larger FOV and thus a potential for much larger max distances to result from missed vessels in the prediction or ground truth data. Large Hausdorff distances may indicate a vessel was missed in either the ground truth or the segmentation mask.
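
      A small sketch shows how strongly one missed vessel can move the Hausdorff distance. The point sets are synthetic and the helper name `hausdorff` is an assumption; the underlying call is SciPy's `directed_hausdorff`, which returns the directed distance as the first element of a tuple.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets of shape (n, d)."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

# Two parallel "vessels" 40 pixels apart in the ground truth.
vessel_1 = np.column_stack([np.arange(50.0), np.zeros(50)])
vessel_2 = np.column_stack([np.arange(50.0), np.full(50, 40.0)])
truth = np.vstack([vessel_1, vessel_2])

pred_shifted = truth + [0.0, 1.0]    # both vessels found, boundaries off by 1 pixel
pred_missed = vessel_1 + [0.0, 1.0]  # second vessel missed entirely

hd_shifted = hausdorff(truth, pred_shifted)
hd_missed = hausdorff(truth, pred_missed)
print(hd_shifted, hd_missed)
```

      With every vessel detected, the distance equals the one-pixel boundary offset. With one vessel missed, it jumps to the spacing between the missed vessel and the nearest predicted one, which grows with the field of view.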

      Of note, these additional datasets were annotated by a different rater than those who labeled the ground truth data. Boundary placement varies considerably between raters. In a test where three raters segmented the same three images from the original dataset, we computed an ICC of 0.73 across their segmentations. Our model's Dice scores on out-of-distribution datasets were on par with the inter-rater ICC on the Thy1ChR2 2PFM data.

      - Supplementary Figure 2: The authors provide useful data on the time responses. However, looking at those figures, it is puzzling why certain vessels were selected as responding as there seems almost no change after stimulation. In addition, some of the responses seem to actually start several tens of seconds before the actual stimulus (particularly in A).

      Only some traces in C and D (dark blue) seem to be actually responding vessels.

      This is not discussed and unclear.

      Supplementary Figure 2 displays the time courses of vessel calibre for all vessels in the FOV, not just those deemed responders.

      The aforementioned effects are due to the loess smoothing filter having been applied to the time courses for the preliminary response, which has been rectified in the updated figures. In particular, Supplementary Figure 2 has been updated with separate loess smoothing before and after photostimulation. The (pre-stimulation) effect is gone once the loess smoothing has been separated.
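
      Why smoothing across the stimulus can create an apparent pre-stimulus response is easy to reproduce. The sketch below uses a centred moving average as a simple stand-in for loess (a deliberate substitution, named here so it is not mistaken for the authors' filter); the step-shaped trace, window size, and function names are assumptions.

```python
import numpy as np

def moving_average(x, window):
    """Centred moving average with windows truncated at the edges (stand-in for loess)."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    return np.array([x[max(0, i - half): i + half + 1].mean() for i in range(len(x))])

def smooth_split(x, stim_index, window):
    """Smooth the pre- and post-stimulus samples independently."""
    return np.concatenate([
        moving_average(x[:stim_index], window),
        moving_average(x[stim_index:], window),
    ])

# Toy trace: flat baseline, then a step change at the stimulus (sample 20).
trace = np.concatenate([np.zeros(20), np.ones(20)])
joint = moving_average(trace, 9)   # one filter across the whole trace
split = smooth_split(trace, 20, 9)  # separate filters before and after the stimulus

print(joint[16:20])  # rises before the stimulus
print(split[16:20])  # stays at baseline
```

      With a single filter, post-stimulus samples fall inside the window of pre-stimulus points, so the smoothed trace starts changing before the stimulus. Splitting the filter at the stimulus removes that leakage.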

      - R Point 7: As indicated before, and in agreement with the other reviewer, the quality of the results in 3D is difficult to judge. The current manuscript shows no 2D sections comparing 'ground truth' with inferred results, which would enable a much better judgment. The provided video is still 3D and not a video going through 2D slices. Also, in the video the overlap of vasculature and raw data seems to be very good and near 100%, so why is the Dice measure reported earlier so low? Is this a particularly good example?

      Some examples, indicating where the pipeline fails (and why) would be helpful to see, to judge its performance better (ideally in 2d slices).

      As discussed in the public comments, the 2D slices are now included in Supplementary Figure 4 and Supplementary Figure 13 to facilitate visual assessment. The vessels are long and thin, so slight dilations or constrictions affect the Dice scores without being easy to see.

      - Author response images 6 and 7. From the presented data, the constrictions measured in the smaller vessels may be a result (at least partly) of noise. This seems to be particularly the case in Author response image 7, left top and bottom, for example. It would be helpful to show the actual estimates of the vessel radii overlaid on the (raw) images. In some of the pictures the noise level seems to reach higher values than the 10-20% of noise used in the tests by the authors in the revision.

      The vessel radii are estimated as averages across all vertices of the individual vessels: it is thus not possible to overlay them meaningfully in 2D slices: in Figure 2B, we do show a rendering of sample vessel-wise radii estimates.

      - "We tested the centerline detection in Python, scipy (1.9.3) and Matlab. We found that the Matlab implementation performed better due to its inclusion of a branch length parameter for the identification of terminal branches, which greatly reduced the number of false branches; the Python implementation does not include this feature (in any version) and its output had many more such "hair" artifacts. Clearmap skeletonization uses an algorithm by Palagyi & Kuba(1999) to thin segmentation masks, which does not include hair removal. Vesselvio uses a parallelized version of the scipy implementation of Lee et al. (1994) algorithm which does not do hair removal based on a terminal branch length filter; instead, Vesselvio performs a threshold-based hair removal that is frequently overly aggressive (it removes true positive vessel branches), as highlighted by the authors."

      This statement is wrong. The removal of small branches in skeletons is algorithmically independent of the skeletonization algorithm itself. The authors cite a reference concerned with the algorithm they are currently employing for the skeletonization. Careful assessment of that reference shows that this algorithm removes small length branches after skeletonization is performed. This feature is available in open-source packages as well, or could be easily implemented.

      We appreciate that skeletonization is distinct from hair removal and have reworded this paragraph for clarity. We are currently working with SciPy developers to implement hair removal in their image processing pipeline so as to render our pipeline fully open-source.

      The removal of hairs after skeletonization with length based thresholding leads to the possibility of removing parts of centerlines in the main part of vessels after branch points with hairs. The Matlab implementation does not do this and leaves the main branches intact.

      This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].
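
      One possible reading of this iterative procedure, written as a plain-Python sketch on a toy graph, is given below. It is not the Matlab routine the authors use: the segment dictionary format, the function name `prune_hairs`, and the rule of re-merging pass-through nodes after each removal are assumptions made for illustration.

```python
from collections import defaultdict

def prune_hairs(segments, min_length=20.0):
    """Iteratively drop the shortest terminal segment below min_length.

    segments: dict mapping segment id -> (node_a, node_b, length_um).
    After each removal, any node left with exactly two segments is merged
    away, so a surviving branch becomes part of the main vessel.
    """
    segs = dict(segments)
    while True:
        incident = defaultdict(list)
        for sid, (a, b, _) in segs.items():
            incident[a].append(sid)
            incident[b].append(sid)
        # Merge one pass-through node (degree 2) per iteration, if any exists.
        merged = False
        for node, sids in incident.items():
            if len(sids) == 2 and sids[0] != sids[1]:
                a1, b1, l1 = segs.pop(sids[0])
                a2, b2, l2 = segs.pop(sids[1])
                end1 = b1 if a1 == node else a1
                end2 = b2 if a2 == node else a2
                segs[sids[0]] = (end1, end2, l1 + l2)
                merged = True
                break
        if merged:
            continue
        # A hair is short and terminal on exactly one end.
        hairs = [
            (length, sid)
            for sid, (a, b, length) in segs.items()
            if length < min_length and (len(incident[a]) == 1) != (len(incident[b]) == 1)
        ]
        if not hairs:
            return segs
        del segs[min(hairs)[1]]  # remove the shortest hair first

toy = {
    "main1": ("A", "J", 100.0),
    "main2": ("J", "K", 60.0),
    "hair1": ("J", "h1", 5.0),
    "tip_a": ("K", "h2", 8.0),
    "tip_b": ("K", "h3", 15.0),
}
pruned = prune_hairs(toy)
print(pruned)
```

      In the toy graph, the junction `K` ends in two short terminal branches. The shorter one is removed and the longer one is merged into the main vessel rather than deleted, so the centerline keeps its full extent. A naive single-pass length threshold would have removed both.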

      - "On the reviewer's comment, we did try inputting normalized images into Ilastik, but this did not improve its results." This is surprising. Reasonable standard preprocessing (e.g. background removal, equalization, and vessel enhancement) would probably restore most of ilastik's performance in the indicated panel.

      While the improvement may be present in a particular set of images, the generalizability of such improvement to other patches is often poor in our experience, as reflected by the aforementioned results and the widespread uptake of DL approaches to image segmentation. The in vivo datasets also contain artifacts arising from, e.g., bleeding into the FOV, to which ilastik is highly sensitive. This is an example of noise that is not easily removed by standard preprocessing.

      - "Typical pre-processing/standard computer vision techniques with parameter tuning do not generalize on out-of-distribution data with different image characteristics, motivating the shift to DL-based approaches."

      I disagree with this statement. DL approaches can typically generalize when trained with a sufficient amount of diverse data. However, DL approaches can also fail with new out-of-distribution data. In that situation they can only be 'rescued' via new, time-intensive data generation and retraining. Simple standard image pre-processing steps (e.g. to remove background or boost vessel structures) have well-defined parameters that can be easily adapted to new out-of-distribution data, as clear interpretations are available. The time to adapt those parameters is typically much shorter than that needed to retrain DL frameworks.

      We find that the standard image processing approaches with parameter tuning work robustly only if fine-tuned on individual images; i.e., the fine-tuning does not generalize across datasets. This approach thus does not scale to experiments with large images or high throughput. While DL models may not generalize to out-of-distribution data, fine-tuning DL models with a small subset of labels generally produces models that are superior to parameter tuning and can be applied to entire studies. Moreover, DL fine-tuning is typically an efficient process because very little labelling and training time is required.

      - It is still unclear how the authors' pipeline performs compared with other (open-source or commercially) available pipelines. As indicated before, comparing to ilastik, particularly when feeding non-preprocessed data, does not seem to be a particularly high bar.

      This question has also been raised by the other reviewer who asked to compare to commercially available pipelines.

      This question was not answered by the authors; instead, the authors reply by claiming to provide an open-source pipeline. In fact, the use of Matlab in their pipeline does not make it fully open-source either. Moreover, as mentioned before, open-source pipelines for comparison do exist.

      As discussed above, the manuscript now includes comparisons to Imaris for segmentation and VesselVio for graph extraction. The pipeline is on GitHub.

      -"We agree with the review that this question is interesting; however, it is not addressable using present data: activated neuronal firing will have effects on their postsynaptic neighbors, yet we have no means of measuring the spread of activation using the current experimental model."

      Distances to the closest neuron in the manuscript are measured without checking if it's active. Thus, distances to the first set of n neurons could be measured in the same way, ignoring activation effects.

      Shorter distances to an entire ensemble of neurons would still be (more) informative of metabolic demands.

      This could indeed be done within the existing framework. The connected-components-3d package can be used to extract individual neurons in the FOV from the neuron segmentation mask. The distance from each neuron to each point on the vessel centerlines could then be calculated.
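
      A rough sketch of that idea follows. To stay within SciPy it uses `scipy.ndimage.label` in place of the connected-components-3d package, which is a substitution, not the authors' choice. The function name, the use of neuron centroids, the choice of `k`, and the assumption of isotropic voxels are all illustrative.

```python
import numpy as np
from scipy.ndimage import label, center_of_mass
from scipy.spatial import cKDTree

def distances_to_nearest_neurons(neuron_mask, centerline_points, k=3):
    """Distances (in voxels) from each centerline point to its k nearest neuron centroids."""
    labels, n_neurons = label(neuron_mask)  # one integer label per connected neuron
    centroids = np.array(
        center_of_mass(neuron_mask.astype(float), labels, list(range(1, n_neurons + 1)))
    )
    k = min(k, n_neurons)
    dists, _ = cKDTree(centroids).query(centerline_points, k=k)
    return np.asarray(dists).reshape(len(centerline_points), -1)

# Toy volume with two cube-shaped "neurons" and one centerline point.
mask = np.zeros((20, 20, 20), dtype=bool)
mask[2:4, 2:4, 2:4] = True        # centroid at (2.5, 2.5, 2.5)
mask[10:12, 10:12, 10:12] = True  # centroid at (10.5, 10.5, 10.5)
points = np.array([[2.5, 2.5, 6.5]])

d = distances_to_nearest_neurons(mask, points, k=2)
print(d)
```

      Each row holds the sorted distances from one centerline point to its `k` nearest neurons, so a per-point summary (for example the mean over the row) would give the ensemble distance the reviewer suggests. For anisotropic voxels, coordinates would first need scaling to physical units.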

      - model architecture:

      It is unclear from the description if any positional encoding was used for the image patches.

      It is unclear if the architecture/pipeline can handle arbitrary volume sizes or is trained on fixed volume shapes. In the latter case, how is the pipeline applied?

      The model includes positional encoding, as described in Hatamizadeh et al. 2021.

      The model can be applied to images of any size, as demonstrated on larger images in Supplementary Figure 9 and on smaller images in Supplementary Figure 2. The pipeline is applied in the same way. It will read in the size of an input image and output an image of the same size.

      - Transformer models often show better results when using a learning rate scheduler that adjusts the learning rate (typically with up and down ramps). Did the authors test such approaches?

      We did not use a learning rate scheduler, as we found we were getting good results without using one.
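
      For context, the kind of schedule the reviewer describes (a ramp up followed by a ramp down) might look like the sketch below. It is not part of the authors' training setup, which used no scheduler; the function name and all default values are hypothetical.

```python
import math

def warmup_cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=100):
    """Linear warm-up to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))

schedule = [warmup_cosine_lr(s, total_steps=1000) for s in range(1001)]
print(schedule[0], schedule[99], schedule[550], schedule[1000])
```

      The learning rate rises linearly during the first 100 steps, peaks at `base_lr`, and then decays smoothly to zero by the final step.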

      - formula (4): The 95% percentile of two numbers is the max, and thus (5) is certainly not what the HD95 metric is. The formula is simply wrong as displayed.

      Thank you. The formula has been updated.

      - formula (5): formula 5 is certainly wrong: n_X, n_y are either integer numbers as indicated by the sum indices or sets when used in the distances, but can't be both at the same time.

      Thank you for your comment. The Formula has been updated.
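
      For orientation, a minimal sketch of one widely used HD95 convention: the larger of the two directed 95th-percentile nearest-neighbour distances between two point sets. The updated manuscript formula is not reproduced in this response, so this convention and the interpolating percentile are assumptions, and the function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def directed_distances(source, target):
    """For every point in source, the distance to its nearest point in target."""
    return [min(math.dist(p, q) for q in target) for p in source]


def hd95(points_x, points_y):
    """Maximum of the two directed 95th-percentile distances."""
    forward = percentile(directed_distances(points_x, points_y), 95)
    backward = percentile(directed_distances(points_y, points_x), 95)
    return max(forward, backward)


# One stray point in Y dominates the plain Hausdorff distance (10.0),
# while the percentile-based variant is less sensitive to it.
x_points = [(0, 0), (1, 0)]
y_points = [(0, 0), (1, 0), (1, 10)]
value = hd95(x_points, y_points)
```

      Here the percentile is taken over the per-point nearest-neighbour distances of each set, not over the two directed summary values, which is the distinction the reviewer raises.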

      - The statement:

      "this functionality of the skeletonization algorithm is currently unavailable in any python implementation, but is available in Matlab [56]."

      is not correct (see reply above)

      Please see the response above. This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].

      - the centerline extraction is performed after taking the union of smoothed masks. The union operation can induce novel 'irregular' boundaries that degrade skeletonization performance. I would expect to apply smoothing after the union?

      Indeed the images were smoothed via dilation after taking the union, as described in the previous set of responses to private comments.

      - "The radius estimate defined the size of the Gaussian kernel that was convolved with the image to smooth the vessel: smaller vessels were thus convolved with narrower kernels."

      It's unclear which images were filtered.

      We have updated this text for clarity:

      The radius estimate defined the size of the Gaussian kernel that was convolved with the 2D image slice to smooth the vessel: smaller vessels were thus convolved with narrower kernels.

      - Was deconvolution applied to the raw images or after Gaussian filtering?

      The deconvolution was applied before Gaussian filtering.

      - ",we extracted image intensities in the orthogonal plane from the deconvolved raw registered image. A 2D Gaussian kernel with sigma equal to 80% of the estimated vessel-wise radius was used to low-pass filter the extracted orthogonal plane image and find the local signal intensity maximum searching, in 2D, from the center of the image to the radius of 10 pixels from the center."

      Would it not be better to filter the 3d image before extracting a 2d plane and filter then ?

      That could be done, but would incur a significant computational speed penalty. 2D convolutions are faster, and produced excellent accuracy when estimating radii in our bead experiment.

      What algorithm was used to obtain the 2d images?

      The 2d images were obtained using scipy.ndimage.map_coordinates.
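
      A minimal sketch of how an orthogonal plane can be resampled with scipy.ndimage.map_coordinates. Only the use of that function comes from the response; the basis construction, the half_width parameter, linear interpolation (order=1) and the boundary mode are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def orthogonal_plane(volume, center, tangent, half_width=3):
    """Sample a square 2D slice through `center`, perpendicular to `tangent`."""
    t = np.asarray(tangent, dtype=float)
    t = t / np.linalg.norm(t)
    # Pick a helper axis that is not parallel to the tangent.
    helper = np.array([1.0, 0.0, 0.0]) if abs(t[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(t, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(t, u)  # unit length, because t and u are orthonormal

    offsets = np.arange(-half_width, half_width + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    # coords has shape (3, n, n): one (axis0, axis1, axis2) position per output pixel.
    origin = np.asarray(center, dtype=float)[:, None, None]
    coords = origin + u[:, None, None] * a + v[:, None, None] * b
    return map_coordinates(volume, coords, order=1, mode="nearest")


# Toy volume whose intensity equals the index along axis 0.
volume = np.arange(11, dtype=float)[:, None, None] * np.ones((11, 11, 11))
plane = orthogonal_plane(volume, center=(5, 5, 5), tangent=(1, 0, 0))
```

      With the tangent along axis 0, every sampled pixel lies at index 5 on that axis, so the extracted plane is constant in this toy volume.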

      - Figure 2H: is this the filtered image or the raw data?

      Panel H is raw data.

      - It would be good to see a few examples of the raw data overlaid with the radial estimates to evaluate the approach (beyond the example in K).

      Additional examples are shown in Figure 5.

      - Figure 2 K: Why are boundary points greater than 2 standard deviations away from the mean excluded ?

      They are excluded to account for irregularities as vessels approach junctions [10], [11].
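
      A minimal sketch of such a two-standard-deviation filter on the boundary estimates of a single vertex. The use of the population standard deviation and the example values are assumptions, and the function name is hypothetical.

```python
import statistics


def exclude_outliers(radial_distances, n_sd=2.0):
    """Keep boundary estimates within n_sd standard deviations of their mean."""
    mean = statistics.fmean(radial_distances)
    sd = statistics.pstdev(radial_distances)  # population SD (assumed choice)
    return [r for r in radial_distances if abs(r - mean) <= n_sd * sd]


# Hypothetical boundary distances (in pixels) around one centerline vertex;
# the last value mimics a ray that escaped into a neighbouring branch.
boundary = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 12.0]
kept = exclude_outliers(boundary)
radius_estimate = statistics.fmean(kept)
```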

      - Figure 2 L: what exactly is plotted here ? What are vertex wise changes, is that the difference between the minimum and maximum of all the detected radii for a single vertex? Why do some vessels (red) show high values consistently throughout the vessel ?

      Figure 2L displays the change in vertex radius, within this FOV, following photostimulation relative to baseline.

      - Assortativity: to calculate the assortativity, are radius changes binned in any form to account for the fact that otherwise, $e_{xy}$ and related measures will likely be based on single data points?

      Assortativity is not calculated from single data points. It can be calculated either by binning into categories or directly on scalars, i.e. the average radius across a vessel segment:

      See here for info on calculating assortativity from binned categories (i.e. classifying a vessel as a constrictor, dilator or non-responder):

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.attribute_assortativity_coefficient.html#networkx.algorithms.assortativity.attribute_assortativity_coefficient

      And see here for calculating assortativity from scalar values:

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.numeric_assortativity_coefficient.html#networkx.algorithms.assortativity.numeric_assortativity_coefficient

      We calculated the assortativity using scalar values.

      In both cases, one uses all nodes and calculates the correlation between each node and its neighbours with an attribute that can be binned or is a scalar. Binning the value on a given node would not affect the number of nodes in a graph.
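
      A minimal sketch of the scalar case: the Pearson correlation of a node attribute taken over the two ends of every edge, with each undirected edge counted in both directions. It is meant to illustrate the quantity that the networkx function linked above computes for an undirected graph; exact agreement with the library has not been verified here, and the graph and values are hypothetical.

```python
def numeric_assortativity(edges, values):
    """Pearson correlation of a scalar node attribute across undirected edges."""
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [values[u], values[v]]
        ys += [values[v], values[u]]
    mean = sum(xs) / len(xs)  # xs and ys hold the same multiset of values
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var  # undefined (division by zero) if all values are equal


# Hypothetical graph: nodes are vessel segments, values are mean radius changes.
edges = [(0, 1), (2, 3)]
like_with_like = numeric_assortativity(edges, {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0})
like_with_unlike = numeric_assortativity(edges, {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0})
```

      Every edge contributes a pair of values regardless of binning, which is why the number of data points is set by the graph and not by the attribute.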

      - "Ilastik tended to over-segment vessels, i.e. the model returned numerous false positives, having a high recall (0.89 ± 0.19) but low precision (0.37 ± 0.33) (Figure 3, Supplementary Table 3)."

      As indicated before, and looking at Figure 4, the over-segmentation seems to be due to an overly high background. A suggested preprocessing step on the raw images to remove background could have avoided this.

      The images were normalized in preprocessing.

      - Figure 4: The 3d panels are not much easier to read in the revised version. As suggested by other reviewers, 2d sections indicating the differences and errors would be much more helpful to judge the pipeline's quality more appropriately.

      As discussed above, 2D sections are now available in a supplementary figure.

      - Figure 3: What would be the dice score (and other measures) between two ground truths extracted from annotations by two humans (assisted e.g. by Ilastik)?

      Two additional human raters annotated the images. We observed an ICC of 0.73 across a total of three raters on the three images.

      - Figure 5: The authors only provide the absolute value of SU for the sigma noise levels. This only has some meaning when compared to the mean or median SU of the images. In the text the maximal intensity of 1023 SU is mentioned, but what are those values in images with weaker / smaller vessels (as provided in the constriction examples in the revision)?

      I am unclear why this validation figure should be part of the main manuscript while generalization performance is left out.

      The manuscript has been updated with the mean SNR value of 5.05 ± 0.15 to provide context for the quality of our images.

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” ArXiv191210003 Cs Eess Q-Bio, Dec. 2019, Accessed: Dec. 09, 2020. [Online]. Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

      (9) T. C. Lee, R. L. Kashyap, and C. N. Chu, “Building Skeleton Models via 3-D Medial Surface Axis Thinning Algorithms,” CVGIP Graph. Models Image Process., vol. 56, no. 6, pp. 462–478, Nov. 1994, doi: 10.1006/cgip.1994.1042.

      (10) M. Y. Rennie et al., “Vessel tortuousity and reduced vascularization in the fetoplacental arterial tree after maternal exposure to polycyclic aromatic hydrocarbons,” Am. J. Physiol.-Heart Circ. Physiol., vol. 300, no. 2, pp. H675–H684, Feb. 2011, doi: 10.1152/ajpheart.00510.2010.

      (11) J. Steinman, M. M. Koletar, B. Stefanovic, and J. G. Sled, “3D morphological analysis of the mouse cerebral vasculature: Comparison of in vivo and ex vivo methods,” PLOS ONE, vol. 12, no. 10, p. e0186676, Oct. 2017, doi: 10.1371/journal.pone.0186676.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point response to the public review:

      General Comment: “Using computational modeling, this manuscript explores the effect of growth feedback on the performance of gene networks capable of adaptation. The authors selected 425 hypothetical synthetic circuits that were shown to achieve nearly perfect adaptation in two earlier computational studies (see Ma et al. 2009, and Shi et al. 2017). They examined the effects of cell growth feedback by introducing additional terms to the ordinary differential equation-based models, and performed numerical simulations to check the retainment and the loss of the adaptation responses of the circuits in the presence of growth feedback. The authors show that growth feedback can disrupt the gene network adaptation dynamics in different ways, and report some exceptional core motifs which allow for robust performance in the presence of growth feedback. They also used a metric to establish a scaling law between a circuit robustness measure and the strength of growth feedback. These results have important implications in the field of synthetic biology, where unforeseen interactions between designed gene circuits and the host often disrupt the desired behavior. The paper’s conclusions are supported by their simulation results, although these are presented in their summary formats and it would be useful for the community if the detailed results for each topology were available as a supplementary file or through the authors’ GitHub repository.”

      We are grateful for the referee’s positive evaluation of our work. We have updated our GitHub and OSF repositories with detailed results for each topology. Additionally, we have included other simulation codes, result data, and detailed explanations in these two repositories that may be of interest to our readers.

      Strength 1: “This work included a detailed investigation of the reasons for adaptation failure upon introducing cell growth to the systems. The comprehensiveness of the analysis makes the work stand out among studies of functional screening of network topologies of gene regulation.”

      We are grateful for the referee’s positive assessment of our work, notably the recognition of the ‘detailed investigation’ we conducted, and the ‘comprehensiveness of the analysis’ we provided.

      Strength 2: “The authors’ approaches for assessment of robustness, such as the survival ratio Q, can be useful for a wide range of topologies beyond adaptation. The scaling law obtained with those approaches is interesting.”

      We are grateful for the referee’s positive evaluation of our defined factors for assessing circuit robustness. We also appreciate the acknowledgment of the “interesting” nature of the scaling law we discovered using the assessment factor R.

      Weaknesses 1: “The title suggests that the work investigates the ’effects of growth feedback on gene circuits’. However, the performance of ’nearly perfect adaptation’ was chosen for the majority of the work, leaving the question of whether the authors’ conclusion regarding the effects of growth feedback is applicable to other functional networks.”

      We agree that our previous title was too broad, and we have changed it from “Effects of growth feedback on gene circuits: A dynamical understanding” to “Effects of growth feedback on adaptive gene circuits: A dynamical understanding”. Although we include some brief results and discussion on gene circuits with bistability, most of our results and discussion focus on circuits that exhibit adaptation.

      The new title is more specific and should be a more appropriate summary of the paper.

      Weaknesses 2: “This work relies extensively on an earlier study, evaluating only a selected set of 425 topologies that were shown to give adaptive responses (Shi et al., 2017). This limited selection has two potential issues. First, as the authors mentioned in the introduction, growth feedback can also induce emerging dynamics even without existing function-enabling gene circuits, as an example of the ”effects of growth feedback on gene circuits”. Limiting the investigation to only successful circuits for adaptation makes it unclear whether growth feedback can turn the circuits that failed to produce adaptation by themselves into adaptation-enabling circuits. Secondly, as the Shi et al. (2017) study also used numerical experiments to achieve their conclusions about successful topologies, it is unclear whether the numerical experiments in the present study are compatible with the earlier work regarding the choice of equation forms and ranges of parameter values. The authors also assumed that all readers have sufficient understanding of the 425 topologies and their derivation before reading this paper.”

      We agree with the reviewer that several issues need to be clarified in our new manuscript. We have added new discussions for all of them.

      We agree with the reviewer that growth feedback could turn non-adaptive circuits into adaptation-enabling circuits, and this indeed presents a compelling topic for future research. We have added the following discussion of a related matter to our paper. We find that in our simulated dataset, there are cases where a higher degree of growth feedback can restore the adaptation that has been lost in a circuit. However, as we discuss in this new paragraph, a comprehensive study in the direction of turning non-adaptive circuits into adaptation-enabling circuits will “require entirely different approaches for sampling circuit parameters and selecting candidate network topologies, demanding significantly high computational costs.” Given that this topic extends beyond the scope of the current paper, we leave this matter to future research.

      “Although the primary focus of this paper is on how growth feedback can undermine an originally adaptive circuit and how to design circuits that are robust against such feedback, our simulated dataset reveals instances where growth feedback can benefit the circuit within certain ranges. Specifically, we identified 2,092 circuits across 306 different topologies where adaption, lost at an intermediate level of growth feedback, is restored at higher levels. This is 1.4% of all circuits tested. We anticipate that additional circuits exhibiting this loss-and-recovery behavior exist, as our sampling of six discrete levels of k<sub>g</sub> (0,0.2,0.4,0.6,0.8,1.0) might have overlooked numerous cases. This result again suggests the possible advantages of growth feedback in gene circuits (Tan et al., 2009; Nevozhay et al., 2012; Deris et al., 2013; Feng et al., 2014; Melendez-Alvarez and Tian, 2022). A comprehensive study into how growth feedback can endow or enhance adaption in circuits would require entirely different approaches for sampling circuit parameters and selecting candidate network topologies, demanding significantly high computational costs. Given that this topic extends beyond the scope of the current paper, we leave this matter to future research.”
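
      A minimal sketch of how such loss-and-recovery cases could be flagged from per-level adaptation results. The six levels are those quoted above; the boolean input format and the function name are hypothetical and do not reproduce the repository code.

```python
K_G_LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def shows_loss_and_recovery(adaptive_by_level):
    """True if adaptation is lost at some level and present again at a higher one."""
    if not adaptive_by_level[0]:
        return False  # the circuit must adapt without growth feedback
    lost = False
    for adaptive in adaptive_by_level[1:]:
        if not adaptive:
            lost = True
        elif lost:
            return True
    return False


# One flag per entry of K_G_LEVELS, in increasing order of k_g.
recovered = shows_loss_and_recovery([True, True, False, True, True, True])
never_recovered = shows_loss_and_recovery([True, True, False, False, False, False])
```

      Because only six discrete levels are sampled, a circuit that loses and regains adaptation between two neighbouring levels would not be flagged, which matches the caveat in the quoted paragraph.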

      We have added the following discussions about the reasoning behind using the 425 network topologies selected from the study Shi et al. (2017).

      “We use these 425 network topologies from the study (Shi et al., 2017), avoiding redundancy with established results. Due to the unique focus of our research on the effects of growth feedback and the need to evaluate quantitative ratios of robust circuits among all functional ones, we have chosen to use a 20-fold increase in the number of random parameter sets for each network topology compared to the simulations in (Shi et al., 2017). This approach makes it computationally prohibitive to scan all possible 16,038 three-node circuits. We carefully follow the settings in (Shi et al., 2017), which also analyzed TRNs with the AND logic as in this paper. Detailed descriptions of our simulation experiments are provided in the Methods section. To make our results more convincing, we have adopted a set of adaptation criteria that are stricter than those used in (Shi et al., 2017). Consequently, the ratio of adaptive circuits is somewhat lower in our study, with 4 out of the 425 network topologies not demonstrating adaptation.”

      Other than the more strict adaptation criteria and much larger sampling sizes, as we mentioned in this paragraph, we have carefully followed the simulation details of the study Shi et al. (2017). This includes but is not limited to: the dynamical equations (when k<sub>g</sub> = 0), the input signals, the scales and ranges of the circuit parameters to be randomly sampled, and the sampling method (Latin hypercube sampling). One of the authors of the current paper was also the first author of the study Shi et al. (2017), who helped us verify the details of simulations (among many other contributions). These identical settings justify our usage of the established results with the 425 network topologies.
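
      For readers unfamiliar with the sampling method, a minimal sketch of Latin hypercube sampling on the unit cube. It illustrates the general technique only: the parameter ranges and scales used in the study are not given here, and the function name is hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata ...
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... then shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(10, 3, random.Random(0))
```

      Each unit-cube coordinate would then be mapped onto the range of the corresponding circuit parameter; that mapping depends on the scales chosen in the study and is omitted.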

      To provide more information about these 425 network topologies, we have added the following introduction. It introduces the structural features of the networks, especially the shared core motifs for adaptation. In our GitHub and OSF repositories, we have also provided relevant data about the 425 topologies, including the topology structures and the parameter sets we scanned.

      “These topologies can be classified into two families based on the core topology: networks with a negative feedback loop (NFBL) and networks with an incoherent feed-forward loop (IFFL) (Shi et al., 2017). More specifically, there are 206 network topologies in the NFBL family. All of these NFBL topologies have a negative feedback loop for node B. This negative feedback loop can be formed by the loop from node B to A and back to B (such as the circuit shown in Fig. 1 (a)), by node B to C and back to B, or by a longer route, from node B to A and then to C and back to B. There is always a self-activation link from B to B in all these 206 NFBL networks. There are 219 network topologies in the IFFL family. All of them have two feed-forward pathways from the input node A to the output node C. One pathway goes from node A to C directly, while the other involves node B in the middle. One of the pathways is activating while the other one is inhibitory.”
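
      The two core motifs described above can be checked mechanically on a signed link list. The following is a minimal sketch, assuming a topology is encoded as a dictionary from (source, target) to +1 for activation or -1 for inhibition; the encoding and function names are hypothetical.

```python
def has_iffl(links):
    """Direct A->C link and indirect A->B->C path with opposite overall signs."""
    direct = links.get(("A", "C"), 0)
    indirect = links.get(("A", "B"), 0) * links.get(("B", "C"), 0)
    return direct * indirect == -1


def has_nfbl_through_b(links):
    """Negative feedback loop through B via A, via C, or via A and then C."""
    loops = [
        [("B", "A"), ("A", "B")],
        [("B", "C"), ("C", "B")],
        [("B", "A"), ("A", "C"), ("C", "B")],
    ]
    for loop in loops:
        sign = 1
        for link in loop:
            sign *= links.get(link, 0)  # a missing link zeroes the product
        if sign == -1:
            return True
    return False


# Hypothetical examples of each family.
nfbl_example = {("A", "C"): 1, ("A", "B"): 1, ("B", "A"): -1, ("B", "B"): 1}
iffl_example = {("A", "C"): 1, ("A", "B"): 1, ("B", "C"): -1}
```

      A loop or pair of pathways is negative, or incoherent, when the product of its link signs is -1, which is all the two checks compute.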

      Weaknesses 3: “The authors’ model does not describe the impact of growth via a biological mechanism: they model growth as an additional dilution rate and calculate growth rate based on a phenomenological description with growth rate occurring at a maximum (k<sub>g</sub>) scaled by the circuit ’burden’ b(t). Therefore, the authors’ model does not capture potential growth rate changes in parameter values (e.g., synthetic protein production falls with increasing growth rate; see Scott & Hwa, 2023).”

      In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work, Zhang, et al. Nature chemical biology 16.6 (2020): 695-701. We agree that an increased growth rate can change synthetic protein production. However, the dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth as mentioned by the reviewer. Still, we agree that taking the growth effect on the production rate into account would provide a more comprehensive study, but it is beyond the scope of the present work. We have added the following paragraph in the Discussion section of our paper.

      “In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work (Zhang et al. (2020)). However, growth feedback is inherently complex (Klumpp et al. (2009)). For instance, an increased growth rate can change protein synthesis rate (Hintsche and Klumpp (2013); Scott and Hwa (2023)), and cell growth rates can affect the distribution of protein expression in cell populations (Gouda et al. (2019)). In our paper, we concentrate on a simplified model with dilution, which we consider to have captured the dominant factor. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. Incorporating the impact of growth rate on protein synthesis into our model would offer a more comprehensive analysis, a task beyond the scope of this paper but presenting an intriguing opportunity for future research to address the complexities of growth feedback.”
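
      A minimal sketch of how dilution enters a single-protein equation, assuming the per-capita growth rate has the form k_g / (1 + b) mentioned in the reviewers' comments and treating the burden b as a constant. The manuscript's full equations, in which the burden depends on the circuit state, are not reproduced, and the production and degradation parameters are hypothetical.

```python
def per_capita_growth(k_g, burden):
    """Assumed form: maximal rate k_g reduced by the burden term 1 / (1 + b)."""
    return k_g / (1.0 + burden)


def simulate_protein(k_g, burden, production=1.0, degradation=0.5,
                     x0=0.0, dt=0.001, t_end=50.0):
    """Forward-Euler integration of dx/dt = production - (degradation + growth) * x."""
    x = x0
    growth = per_capita_growth(k_g, burden)  # acts as an extra dilution rate
    for _ in range(int(t_end / dt)):
        x += dt * (production - (degradation + growth) * x)
    return x


no_growth = simulate_protein(k_g=0.0, burden=1.0)
with_growth = simulate_protein(k_g=1.0, burden=1.0)
```

      In this toy setting dilution simply adds to the degradation rate, so the steady state drops from production / degradation to production / (degradation + growth).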

      Weaknesses 4: “The authors made several claims about the bifurcations (infinite-period, saddle-node, etc) underlying the abrupt changes leading to failures of adaptations. There is a lack of evidence supporting these claims. Both local and global bifurcations can be demonstrated with semi-analytic approaches such as numerical continuation along with investigations of eigenvalues of the Jacobian matrix. The claims based on ODE solutions alone are not sound.”

      After further simulations and verification, we found that most of the failures we previously described as bifurcation-induced among the type-V and type-VI failures should instead be categorized as bistability- or multistability-induced failures. They still involve abrupt switching between adaptive and non-adaptive states, as we described in the previous version of the manuscript. However, at the critical k<sub>g</sub> they are actually still far away from the bifurcation points. We have corrected all relevant descriptions and figures, including panel Fig. 4 (c) and its captions. We have added the following paragraph in the paper to explain this issue.

      “One might expect bifurcations to play an important role in many type-V and type-VI failures. However, in our simulations, failures precisely at the bifurcation point are not observed. This is because the bifurcation points under consideration, such as fold bifurcations, are where one of the attraction basins diminishes to zero. For a failure to occur exactly at the bifurcation point, the initial condition would need to coincide precisely with the infinitesimally small basin just before it vanishes. More realistically, failures almost always largely precede the exact bifurcation point. They happen while the basin is still contracting and the basin boundary crosses the initial condition or O<sub>1</sub>. An example is shown in Fig. 4(b), where bistability persists, yet the lighter orange basin with a larger O<sub>1</sub>(C) cannot be reached as the boundary shifts away from the initial condition A<sub>0</sub> and B<sub>0</sub>. As another example, in Fig. 4 (c) from a different circuit, the higher O<sub>2</sub>(C) state disappears at k<sub>g</sub> ≈ 0.012 and switches to a lower O<sub>2</sub>(C), but this point is not a bifurcation.

      It is the point where the stable O<sub>1</sub> continuously crosses the basin boundary of O<sub>2</sub>.”

      Our further simulations have verified the existence of the oscillation-related bifurcations. We have added a new appendix discussing the phenomena associated with them in more detail.

      Weaknesses 5: “The impact of biochemical noise is not evaluated in this work; the author’s analysis is only carried out in a deterministic regime.”

      In this paper, we have not taken into account biochemical noise as we focus solely on scenarios where all protein concentrations are high. In these circumstances, the influence of noise is relatively minor. Incorporating biochemical noise, which originates from various sources and possesses diverse characteristics, would significantly complicate the analysis beyond the scope of our current work. However, exploring this aspect could be an intriguing avenue for future research. We have included the following discussions in our paper.

      “Our study focuses on scenarios where random noises are ignored. Realistically, gene circuits are subjected to diverse types of noise, which can complicate their predictable behavior and design. These noises can originate externally from a noisy input signal I, or intrinsically, directly affecting the circuit components. Further, these noises can be classified based on various mechanisms that cause them (Colin et al. (2017); Sartori and Tu (2011)). And with different mechanisms, each type of noise can be characterized by different attributes such as frequency, amplitude, and noise color. These variances can lead to different impacts on the circuits, potentially necessitating unique mechanisms or designs for the attenuation of each category (Sartori and Tu (2011); Qiao et al. (2019)). Given the extensive complexity and the need for thorough investigation, these noise-related challenges are beyond the scope of this paper and require a series of future studies.”

      Point-by-point response to the recommendations for the authors:

      Comment 1: - The authors’ github repository, detailed in their code availability statement, is currently unavailable and likely contains some of the answers to the queries here.

      We have updated our GitHub and OSF repositories with simulation codes, result data, and detailed explanations. The link to our GitHub repository in the previous version of the manuscript contained a format error, making it inaccessible to the referees. We apologize for this mistake and have corrected it.

      Comment 2: - At present, it is not clear how the 425 topologies are created from the system of equations (Eq. 6-8) or from the circuit diagram in Fig 1a. This could do with being explicitly stated for the reader.

      We have added the following paragraph to discuss how the 425 topologies are selected and which common motifs and connections they share.

      “Previous research identified 425 different three-node TRN network topologies that can achieve adaptation in the absence of growth feedback (Shi et al., 2017), providing the base of our computational study. These topologies can be classified into two families based on the core topology: networks with a negative feedback loop (NFBL) and networks with an incoherent feed-forward loop (IFFL) (Shi et al., 2017). More specifically, there are 206 network topologies in the NFBL family. All of these NFBL topologies have a negative feedback loop for node B. This negative feedback loop can be formed by the loop from node B to A and back to B (such as the circuit shown in Fig. 1 (a)), by node B to C and back to B, or by a longer route, from node B to A and then to C and back to B. There is always a self-activation link from B to B in all these 206 NFBL networks. There are 219 network topologies in the IFFL family. All of them have two feed-forward pathways from the input node A to the output node C. One pathway goes from node A to C directly, while the other involves node B in the middle. One of the pathways is activating while the other one is inhibitory. We use these 425 network topologies from the study (Shi et al., 2017), avoiding redundancy with established results. Due to the unique focus of our research on the effects of growth feedback and the need to evaluate quantitative ratios of robust circuits among all functional ones, we have chosen to use a 20-fold increase in the number of random parameter sets for each network topology compared to the simulations in (Shi et al., 2017). This approach makes it computationally prohibitive to scan all possible 16,038 three-node circuits. We carefully follow the settings in (Shi et al., 2017), which also analyzed TRNs with the AND logic as in this paper. Detailed descriptions of our simulation experiments are provided in the Methods section. 
      To make our results more convincing, we have adopted a set of adaptation criteria that are stricter than those used in (Shi et al., 2017). Consequently, the ratio of adaptive circuits is somewhat lower in our study, with 4 out of the 425 network topologies not demonstrating adaptation.”

      Comment 3: - In the main text, the authors mentioned that they chose 425 network topologies for this study, whereas the number is 435 in the abstract. Please correct the error.

      The number 435 in our previous abstract referred to the 10 four-node circuits that we studied in the appendix, in addition to the 425 three-node network topologies. To avoid confusion and potential misunderstandings among readers, we have revised this expression of “435 distinct topological structures” to “more than four hundred topological structures”.

      Comment 4: - Please can the authors include the topologies they have studied in an appendix or as supplementary material. The impact of this work would increase significantly if for each topology the authors could include a pie chart similar to the one shown in Fig 2 so that others can use these results.

      We fully acknowledge the potential benefits of providing simulation results for each topology. However, including over four hundred more figures in this paper is not feasible. Moreover, we expect that many readers may also be interested in results not only for individual topologies but also for subsets sharing specific motifs or regulatory connections. Therefore, we have provided all the necessary data and codes in our GitHub repository to make these pie charts. We have included a detailed guide on how to generate these pie charts in the GitHub Readme file. These allow readers to plot the pie chart and extract distributions for any individual topology or use conditions to filter any subset of topologies as required. We believe this approach offers greater flexibility for our readers. We have also added the following explanation in the Methods section.

      “The codes implementing these criteria are available in our GitHub repository, with the link provided in the ”Code Availability” section. The failure type results for all circuits tested are available in our OSF repository, with the link provided in the ”Data Availability” section. An additional note is provided in the README file of our GitHub repository for further guidance on generating pie charts similar to Fig. 2 for any network topology or subset of topologies.”

      Comment 5: - At present, the authors have not given sufficient detail for their numerical methods (e.g. to identify bistability or oscillations) to enable the work to be repeated. I would appreciate it if the authors could expand their Methods section or provide a description of their method as an appendix. Additionally, the authors must clarify how many parameter sets per topology showed successful adaptation.

      In response to this comment, we have reorganized and expanded our Methods section, especially the new “Numerical simulations of circuit dynamics” and “Numerical criteria for functional adaptation and failure types” subsections. We added details on how we define and evaluate a “relatively steady state”, how to determine if there is an oscillation, how to determine the critical k<sub>g</sub> value, and how to determine if a failure is continuous or abrupt. Readers can also find the corresponding codes in our GitHub repository, where we provide a README file to help the readers locate the script file they need.
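
      As a rough illustration of the kind of numerical check described above, the sketch below labels the end of a simulated trajectory as steady or oscillating. It is only a minimal sketch under stated assumptions: the function name `classify_tail`, the relative tolerance, and the window length are hypothetical and are not taken from the authors' repository, whose actual criteria are given in the revised Methods.

```python
from math import sin


def classify_tail(values, rel_tol=1e-3, window=200):
    """Label the end of a trajectory as 'steady' or 'oscillating'.

    Hypothetical criterion (an assumption, not the authors' definition):
    the tail is 'steady' when its peak-to-peak range is smaller than
    rel_tol times its mean magnitude. A slow transient that has not yet
    settled would also fail this test, so a real criterion needs more care.
    """
    tail = values[-window:]
    spread = max(tail) - min(tail)
    scale = max(abs(sum(tail) / len(tail)), 1e-12)  # avoid division by zero
    return "steady" if spread / scale < rel_tol else "oscillating"


# Toy trajectories for illustration only (not circuit simulations).
decaying = [1.0 + 0.5 * 0.95 ** t for t in range(1000)]
periodic = [1.0 + 0.5 * sin(0.1 * t) for t in range(1000)]

print(classify_tail(decaying))
print(classify_tail(periodic))
```

      A relative rather than absolute tolerance is used here so that the same threshold can be applied to outputs of very different magnitudes; this is a design choice of the sketch.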

      The number of parameter sets per topology that showed successful adaptation is precisely our definition of the Q-value. Q-values of most of the circuits we tested are shown in multiple figures in the paper. A complete table of Q-values with different topologies and different k<sub>growth</sub> values can be found in our OSF repository.

      Comment 6: - Looking at the Model Description, there seem to be multiple issues, as follows. The model should be rewritten and all simulations redone with the model corrected as described below:

      (a) The ”strength of growth feedback” is modeled by the maximal growth parameter k<sub>g</sub> in Equation (12). However, this rate does not represent growth feedback. In fact, this parameter must be present also for the system without growth feedback, Equations (6 - 8), because those cells grow as well! So Equation (12) with b(t)=0 should also be added to Equations (6 - 8), in addition to the dilution terms in each equation.

      (b) The dilution due to growth (dN/dt)*(B/N) is only added to Equations (9 - 11). This is wrong - growth affects (dilutes) all protein concentrations, even without growth feedback, so similar terms must be added even to equations without growth feedback, i.e., to Equations (6 - 8).

      (c) The term representing growth feedback is actually the fraction 1/(1+b(t)). To adjust the strength of growth feedback, some parameters should be introduced into this term. Specifically, the term currently has a Hill form with Hill coefficient = 1 and sensitivity = 1. The term should be converted into a general Hill function, and the parameters of that function should be altered to represent growth feedback. This Hill function is called a cellular (phenotypic) fitness landscape, see Nevozhay et al., 2012.
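
      The generalization suggested in point (c) can be sketched as follows. This is a minimal illustration, assuming the common repressive Hill parameterization 1/(1 + (b/K)^n); the function and argument names are hypothetical and do not come from the manuscript. With sensitivity K = 1 and Hill coefficient n = 1 it reduces to the fraction 1/(1+b(t)) quoted in the comment.

```python
def growth_feedback_factor(burden, sensitivity=1.0, hill_coefficient=1.0):
    """Generalized Hill-type factor 1 / (1 + (b / K)**n).

    burden           -- b(t), the circuit burden (assumed non-negative)
    sensitivity      -- K, the burden at which the factor equals 1/2 (assumed name)
    hill_coefficient -- n, the steepness of the response (assumed name)
    """
    if burden < 0:
        raise ValueError("burden must be non-negative")
    return 1.0 / (1.0 + (burden / sensitivity) ** hill_coefficient)


# The defaults recover the original term 1 / (1 + b).
print(growth_feedback_factor(2.0))
# A steeper landscape with the same sensitivity.
print(growth_feedback_factor(2.0, hill_coefficient=2.0))
```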

      Equations (6-8) describe only one part of the full model we study. We present them solely to avoid overwhelming readers with a large number of parameters defined for the first time. They are not used in our simulations and serve only to explain the meaning of the parameters. In our simulations throughout the paper, we only used Eqs. (9-13) (with various topologies). We have revised the text to make this point clear. We have added the following descriptions in the section Model Description:

      “In order not to overwhelm readers with too many terms and parameters, we first describe a partial model (an isolated circuit without growth feedback) before introducing the complete model that we study in this work.”

      “Equations (9) to (13) are the dynamical equations we actually use for simulating the circuit dynamics.”

      Additionally, in the newly added subsection “Numerical simulations of circuit dynamics” in the Methods, we explicitly mention that:

      “The dynamical equations we use are similar to Eqs. (9-13) but with different topologies.”

      We consider dilution due to cell growth as the dominant factor of growth feedback. In fact, we study the adaptive circuits without growth and their ability to maintain their adaptive behaviors after dilution into a fresh medium, based on a recent work [Zhang, et al., Nature Chemical Biology 16.6 (2020): 695-701]. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. The term mentioned in the comment is about how the burden of the circuit affects cell growth. We agree that it can be interesting to have a more comprehensive study on how different degrees of nonlinearity of this term can have different effects on the overall robustness towards the growth feedback problem, but this is not part of our primary focus and is beyond the scope of this paper. In this paper, we are mostly concerned with the variability of the strength of the growth feedback/dilution, controlled by the parameter k<sub>g</sub>, instead of the different types of nonlinearity.

      Comment 7:  - On the right side of Equation (7), the first term should be inhibitory, right?

      This is indeed an error. We accidentally reversed the regulation from A to B and B to A when inputting the formula. We have corrected both terms.

      Comment 8: - It seems to me that a better transition from Figs 6 and 7 to Fig 8 can be made. Did the authors choose the three circuits in Fig 8 based on the three distinct groups shown in Fig 6 and 7? The rationale for choosing the three topologies given the clusters identified earlier can be explained more clearly.

      We agree that more explanation can be provided here. We have added the following descriptions in the caption of Fig. 8:

      “The other three curves represent circuits with different robustness levels: high (Circuit No. 98), moderate (Circuit No. 3), and low (Circuit No. 28) values of R, to demonstrate that this scaling behavior is generic. Each of these three circuit topologies is selected from one of the three groups illustrated in Fig. 6 and Fig. 7, and they have the highest Q(k<sub>g</sub> = 0) value within their respective groups.”

      and in the main text:

      “The three other curves represent circuit topologies that have a relatively high, moderate, and low value R among the 425 topologies tested, to demonstrate that this scaling behavior is generic. (These three topologies are the highest Q(k<sub>g</sub> = 0) topology in each of the three groups shown in Fig. 6 and Fig. 7.)”

      Comment 9: - The insights from the neural network model seem to be very limited. It would be interesting to see if the model can predict the performance of network topologies that have not been exposed to the model during training.

      Machine learning is not a focus of this paper. For the section the comment was referring to, the main research question is on the relationship between circuit robustness and topology, and the point we are trying to make is that the robustness dependency varies across different connections — some connections are critical, while others are less impactful. The neural-network-based analysis was only used to provide further support to this point by demonstrating that through optimization, neural networks automatically assign different levels of weights to different connections in the circuits.

      We agree that it can be an interesting topic to study how machine learning can be used to help us design functional and robust circuits, as discussed in the final paragraph of the Discussion section. However, such an investigation would require a series of more comprehensive and carefully designed simulation experiments to validate if “neural networks can predict the performance of network topologies that have not been exposed to the model during training”. One point requiring extra care is that many of the network topologies we study are very similar to one another, with shared motifs and links, which complicates a clean separation between training and test topologies. These considerations extend beyond the scope of this paper.

      Other potential improvements or future work

      Comment 10: - The growth feedback examined in this paper comes from the effect of protein levels on the cell division rate (growth rate). However, the opposite effect can also occur; cell growth rates can affect the distribution of protein expression in cell populations. A good reference is Kheir Gouda et al., which is already on the list of references. These opposite effects should be described and discussed.

      We agree that growth feedback is inherently complex and has many biological effects, and in our paper, we are using a simplified model to study the dominant factor of growth feedback. We have added the following paragraph in the Discussion section, which involves the opposite effect mentioned in the comment.

      “In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work (Zhang et al. (2020)). However, growth feedback is inherently complex (Klumpp et al. (2009)). For instance, an increased growth rate can change protein synthesis rate (Hintsche and Klumpp (2013); Scott and Hwa (2023)), and cell growth rates can affect the distribution of protein expression in cell populations (Gouda et al. (2019)). In our paper, we concentrate on a simplified model with dilution, which we consider to have captured the dominant factor. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. Incorporating the impact of growth rate on protein synthesis into our model would offer a more comprehensive analysis, a task beyond the scope of this paper but presenting an intriguing opportunity for future research to address the complexities of growth feedback.”

      Comment 11: - It may be worth mentioning that growth feedback can lead to persistence, see PMID:27010473.

      We have included this research as a citation.

      Comment 12: - While some other networks (two-node) are discussed, it would be worth doing this analysis for all one- and two-node networks, perhaps controlled by small molecules added externally. If not here, then as a future plan.

      We agree that this is an interesting idea for future studies.

      Comment 13: - The manuscript analyzes the deterministic dynamics of a set of gene networks. However, gene expression is always stochastic, and gene circuits have been designed to control stochastic gene expression. For example, gene expression distributions can be reshaped, or even new peaks can appear, which would be worth mentioning, PMID: 30341217. The effect of growth feedback on stochastic gene expression and future perspectives of systematically studying this should be discussed.

      We have added the following paragraph in the Discussion section to discuss the effects of noises and stochasticity. The research mentioned in the comment is also included.

      “Our study focuses on scenarios where random noises are ignored. Realistically, gene circuits are subjected to diverse types of noise, which can complicate their predictable behavior and design. These noises can originate externally from a noisy input signal I, or intrinsically, directly affecting the circuit components. Further, these noises can be classified based on various mechanisms that cause them (Colin et al. (2017); Sartori and Tu (2011)). And with different mechanisms, each type of noise can be characterized by different attributes such as frequency, amplitude, and noise color. These variances can lead to different impacts on the circuits, potentially necessitating unique mechanisms or designs for the attenuation of each category (Sartori and Tu (2011); Qiao et al. (2019)). Given the extensive complexity and the need for thorough investigation, these noise-related challenges are beyond the scope of this paper and require a series of future studies.”

  4. social-media-ethics-automation.github.io
    1. Julia Evans. Examples of floating point problems. January 2023.

      The article gives examples to show problems like calculation errors, issues with comparisons, and how floating point representation affects algorithms. Evans stresses that developers should understand these risks to prevent bugs and make their code more reliable. In general, the article acts as a guide to spot and fix floating point problems in software development.
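
      Two of the problem types mentioned above, comparison issues and accumulated calculation errors, can be reproduced in a few lines. This is a small illustration using only Python's standard library; it is not taken from the article itself.

```python
import math

# Comparison problem: 0.1 and 0.2 have no exact binary representation.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Accumulated error: adding 0.1 ten times does not give exactly 1.0.
naive_sum = sum([0.1] * 10)
exact_sum = math.fsum([0.1] * 10)  # fsum tracks the lost low-order bits
print(naive_sum == 1.0)          # False
print(exact_sum == 1.0)          # True
```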

    2. W3Schools. Python Lists. URL: https://www.w3schools.com/python/python_lists.asp (visited on 2023-11-24). [d8] W3Schools. Python Tuples. URL: https://www.w3schools.com/python/python_tuples.asp (visited on 2023-11-24). [d9] W3Schools. Python Sets. URL: https://www.w3schools.com/python/python_sets.asp (visited on 2023-11-24).

      I like W3Schools very much because the website presents a lot of detailed knowledge about coding. It helps me a lot because I can look things up when I have questions. It also has code that I can try online, which I think helps me understand better.

    1. My students tell me about an important new skill: it involves maintaining eye contact with someone while you text someone else

      Very very unfortunately, tech has evolved in such torturous ways as to have me trying to make eye contact with a professor as I check my phone under the table for the necessary log-in code to pull up my note-taking software on my laptop. Dear lord, I just want to take notes! (I guess typing in a username and password just doesn't cut it these days---now we need both of those, an entire additional device, a cup of sugar and a forklift certification to even log into something anymore...)

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We are genuinely grateful to the Editors and Reviewers for taking time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. We decided to do our very best to implement all suggestions, as detailed in the point-by-point rebuttal letter below. We feel that our paper has improved considerably as a result. 

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.  

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We now better describe the ratio of numbers of E and I neurons found in real data, as suggested. The first submission already contained an analysis of how the optimal ratio of E vs I neuron numbers depends in our model on the relative weighting of the loss of E and I neurons and on the relative weighting of the encoding error vs the metabolic cost in the loss function (see Fig. 7E). We revised the text on page 12 describing Fig. 7E.

      To allow readers to easily form a clear idea of how the weighting of the error vs the cost may influence the optimal network configuration, we now present how optimal parameters depend on the weighting in a systematic way, by always including this type of analysis when studying all other model parameters (time constants of single E and I neurons, noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity). These results are shown in Supplementary Fig. S4 A-D and H, and we comment briefly on each of them in the Results sections (pages 9, 10, 11 and 12) that analyze each of these parameters.

      Following this Reviewer’s comment, we now include a joint analysis of network performance relative to the ratio of E-I neuron numbers and the ratio of mean I-I to E-I connectivity (Fig. 7J). We found a positive correlation between the optimal values of these two ratios. This implies that a lower ratio of E-I neuron numbers, such as the 2:1 ratio in human cortex mentioned by the reviewer, predicts a lower optimal ratio of I-I to E-I connectivity and thus weaker inhibition in the network. We made sure that this finding is suitably described in the revision (page 13).

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity implementing lateral inhibition similar to that proposed in the recent studies mentioned by the Reviewer. We apologize if this was not clear enough in the previous version. We streamlined the presentation to make it clearer in revision.  We nevertheless think it useful to report the effects of perturbations within this network because these results give information about how lateral inhibition works in our network. Thus, we kept presenting it in the revised version, although we de-emphasized and simplified its presentation. We now give more emphasis to the novelty of the derivation of this connectivity rule from the principles of efficient coding (pages 4 and 6). We also describe better (page 8) what the specific results of our simulated perturbation experiments add to the existing literature.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We improved the Limitations paragraph in Discussion, and also anticipated caveats in tandem with results when needed, as suggested. 

      We now mention the assumption of equal time constants between the targets and readouts in the Abstract. 

      We have now added an analysis of network performance and dynamics as a function of the time constant of the target (τ<sub>x</sub>) to Supplementary Fig. S5 (C-E). These results are briefly discussed in the text on page 13. The only measure sensitive to τ<sub>x</sub> is the encoding error of E neurons, with a minimum at τ<sub>x</sub> = 9 ms, while I neurons and metabolic cost show no dependency. Firing rates, variability of spiking as well as the average and instantaneous balance show no dependency on τ<sub>x</sub>. We note that τ<sub>x</sub> = τ, with τ = 1/λ the time constant of the population readout (Eq. 9), is an assumption we use when we derive the model from the efficiency objective (Eq. 18 to 23). In our new and preliminary work (Koren, Emanuel, Panzeri, bioRxiv 2024), we derived a more general class of models where this assumption is relaxed, which gives a network with E-E connectivity that adapts to the time constant of the stimulus. Thus, the reviewer is correct in the intuition that the network requires E-E connectivity to better integrate target signals with a different time constant than the time constant of the membrane. We now better emphasize this limitation in Discussion (page 16).

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future – but most of the “predictions” from the model are actually findings that broadly match earlier experimental results, making them “postdictions”.

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

      We now comment on every result from the model as either matching earlier experimental results, or being a prediction for experiments. 

      In Section “Assumptions and emergent properties of the efficient E-I network derived from first principles”, we report (page 4) that neural networks have connectivity structure that relates to tuning similarity of neurons (postdiction). 

      In Section “Encoding performance and neural dynamics in an optimally efficient E-I network” we report (page 5) that in a network with optimal parameters, I neurons have higher firing rate than E neurons (postdiction), that single neurons show temporally correlated synaptic currents (postdiction) and that the distribution of firing rates across neurons is log-normal (postdiction). 

      In Section “Competition across neurons with similar stimulus tuning emerging in efficient spiking networks” we report (page 6)  that the activity perturbation of E neurons induces lateral inhibition on other E neurons, and that the strength of lateral inhibition depends on tuning similarity (postdiction). We show that activity perturbation of E neurons induces lateral excitation in I neurons (prediction). We moreover show that the specific effects of the perturbation of neural activity rely on structured E-I-E connectivity (prediction for experiments, but similar result in Sadeh and Clopath, 2020). We show strong voltage correlations but weak spike-timing correlations in our network (prediction for experiments, but similar result in Boerlin et al. 2013). 

      In Section “The effect of structured connectivity on coding efficiency and neural dynamics”, we report (page 7) that our model predicts a number of differences between networks with structured and unstructured (random) connectivity. In particular, structured networks differ from unstructured ones by showing better encoding performance, lower metabolic cost, weaker variance over time in the membrane potential of each neuron, lower firing rates and weaker average and instantaneous balance of synaptic currents.

      In Section “Weak or no spike-triggered adaptation optimizes network efficiency”, we report (page 9) that our model predicts better encoding performance in networks with adaptation compared to facilitation. Our results suggest that adaptation should be stronger in E compared to I (PV+) neurons (postdiction). In the same section, we report (page 10) that our results suggest that the instantaneous balance is a better predictor of model efficiency than average balance (prediction).

      In Section “Non-specific currents regulate network coding properties”, we report (page 10) that our model predicts that more than half of the distance between the resting potential and firing threshold is taken by external currents that are unrelated to feedforward processing (postdiction). We also report (page 11) that our model predicts that moderate levels of uncorrelated (additive) noise are beneficial for efficiency (prediction for experiments, but similar results in Chalk et al., 2016, Koren et al., 2017, Timcheck et al. 2022).

      In Section “Optimal ratio of E-I neuron numbers and of mean I-I to E-I synaptic efficacy coincide with biophysical measurements”, we predict the optimal ratio of E to I neuron numbers to be 4:1 (postdiction) and the optimal ratio of mean I-I to E-I connectivity to be 3:1 (postdiction). Further, we report (page 13) that our results predict that a decrease in the ratio of E-I neuron numbers is accompanied by a decrease in the ratio of mean I-I to E-I connectivity.

      Finally, in Section “Dependence of efficient coding and neural dynamics on the stimulus statistics”, we report (page 13) that our model predicts that the efficiency of the network has almost no dependence on the time scale of the stimulus (prediction). 

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some longstanding puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.  

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We indeed built our work on these important previous studies, and we apologize if this was not clear enough. We thus improved the text to make sure that credit to previous studies is more precisely and more clearly given (see detailed reply for the list of changes made). 

      To clarify how we built on previous work, we expanded the comparison of our results with those of Boerlin et al. (2013) on voltage correlations and uncorrelated spiking (page 7), the comparison with the derivation of physical units in Boerlin et al. (2013) (page 3), the discussion of how results on the ratio of the number of E to I neurons relate to Calaim et al. (2022) and Barrett et al. (2016) (page 16), and the comment on the previous work by Gutierrez and Deneve on adaptation (page 8).

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive and may be in conflict with previous findings of efficient spiking networks. This includes the following. 

      Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. 

      Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). 

      Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation as to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      With regard to the concern that our previous analyses considered optimal parameter sets determined with a sweep of a single parameter at a time, we have addressed this issue in two ways. First, we presented (Figures 6I and 7J and text on pages 11 and 13) results of joint sweeps of pairs of parameters whose joint variations are expected to influence optimality in a way that cannot be understood by varying one parameter at a time. These new analyses complement the joint parameter sweep of the time constants of single E and I neurons (t<sub>r</sub><sup>E</sup> and t<sub>r</sub><sup>I</sup>) that has already been presented in Fig. 5A (former Fig. 4A). Second, we conducted, within a reasonable/realistic range of possible variations of each individual parameter, a Monte-Carlo random joint sampling (10,000 simulations with 20 trials each) of all 6 model parameters that we explored in the paper. We presented these new results in Fig. 2 and discuss them on pages 5-6.

      The Reviewer is correct in stating that the error (RMSE) exhibits a counterintuitive minimum as a function of the metabolic constant despite the fact that, intuitively, for vanishing metabolic constant the network is solely minimizing the coding error (Fig. 6B). In our understanding, this counterintuitive finding is due to the presence of noise in the membrane potential dynamics. In the presence of noise, a non-vanishing metabolic constant is needed to suppress “inefficient” spikes purely induced by noise that do not contribute to coding and increase the error. This gives rise to a form of “stochastic resonance”, where the noise improves detection of the signal coming from the feedforward currents. We note that the metabolic constant and the noise variance both appear in the non-specific external current (Eq. 29f in Methods), and, thus, a covariation in their optimal values is expected. Indeed, we find that the optimal metabolic constant monotonically increases as a function of the noise variance, with stronger regularization (larger beta) required to compensate for larger variability (larger sigma) (Fig. 6I). Finally, we note that a moderate level of noise (which, in turn, induces a non-trivial minimum of the coding error as a function of beta) in the network is optimal. The beneficial effect of moderate levels of noise on performance in networks with efficient coding has been shown in different contexts in previous work (Chalk et al. 2016, Koren and Deneve, 2017). The intuition is that the noise prevents the excessive synchronization of the network and insufficient single neuron variability that decrease the performance. The points above are now explained in the revised text on page 11.

      The Reviewer is also correct in stating that the network exhibits an optimal performance for intermediate values of the number of I neurons and the number of encoded features. In our understanding, the optimal number of encoded features of M=3 arises simply because all the other parameters were optimized for that value of M. The purpose of those analyses was not to state that a network optimally encodes only a given number of features, but to show how a network whose parameters are optimized for a given M performs reasonably well when M is varied. We clarify this on page 13 of Results and in Discussion on page 16. In the same Discussion paragraph we refer also to the results of Calaim et al. mentioned by the Reviewer.

      To address the concern about the comparison of efficiency between the E-I and the 1CT model, we took advantage of the Reviewer’s suggestions to consider this issue more deeply. In revision, we now compare the efficiency of the 1CT model with the E population of the E-I model (Fig. 8H). This new comparison changes the conclusion about which model is more efficient, as it shows the 1CT model is slightly more efficient than the E-I model. Nevertheless, the E-I model performance is more robust to small variations of optimal parameters, e.g., it exhibits biologically plausible firing rates for non-optimal values of the metabolic constant. See also the reply to point 3 of the Public Review of Reviewer 2 for more detail. We added these results and the ensuing caveats for the interpretation of this comparison on Page 14, and also revised the title of the last subsection of Results.  

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply corresponds to the values that should come out of the derivation.

      We thank the reviewer for raising these important questions.

      In the first submission, we presented both the encoding error and the metabolic cost separately as a function of the parameters, so that readers could get an understanding of how stable optimal parameters would be to the change of the relative weighting of encoding error and metabolic cost. We specified this in Results (page 5) and we kept presenting separately encoding and metabolic terms in the revision.

      However, we agree that it is important to present the explicit quantification of how the optimal parameters may depend on g<sub>L</sub>. In the first submission, we showed the analysis for all possible weightings in the case of two parameters for which we found this analysis was the most relevant – the ratio of neuron numbers (Fig. 7E, Fig. 6E in first submission) and the optimal number of input features M (see last paragraph on page 13 and Fig. 8D). We now show this analysis also for the rest of the studied model parameters in the Supplementary Fig. S4 (A-D and H). This is discussed on pages 9, 10, 11 and 12.

      With regard to the concern that the scaling of synaptic weights should not be controlled separately for each connection type in the network, we agree and we would like to clarify that we did not control such scaling separately. Apologies if this was not clear enough. From the optimal analytical solution, we obtained that the connectivity scales with the standard deviation of decoding weights (s<sub>w</sub><sup>E</sup> and s<sub>w</sub><sup>I</sup>) of the pre and postsynaptic populations (Methods, Eq. 32). We studied the network properties as a function of the ratio of average I-I to E-I connectivity (Fig. 7 F-I; Supplementary Fig. S4 D-H), which is equivalent to the ratio of standard deviations s<sub>w</sub><sup>I</sup> /s<sub>w</sub><sup>E</sup> (see Methods, Eq. 35). We clarified this in text on page 12.

      Next, it is correct that our synaptic weights are an order of magnitude smaller than the metabolic constant. We analysed a simpler version of the network that has the coding and dynamics identical to our full model (Methods, Eq. 25) but without the external currents. We found that the optimal parameters determining the firing threshold in such a simpler network were biologically implausible (see Supplementary Text 2 and Supplementary Table S1). We considered as another simple solution the rescaling of the synaptic efficacy so as to obtain a biologically plausible threshold. However, that gave implausible mean synaptic efficacy (see Supplementary Text 2). Thus, to be able to define a network with biologically plausible firing threshold and mean synaptic efficacy, we introduced the non-specific external current. After introducing such current, we were able to shift the firing threshold to biologically plausible values while keeping realistic values of mean synaptic efficacy. Biologically plausible values for the firing threshold are around 15–20 mV above the resting potential (Constantinople and Bruno, 2013), which is the value that we have in our model. A plausible value for the average synaptic strength is between a fraction of one millivolt and a couple of millivolts (Constantinople & Bruno, 2013, Campagnola et al. 2022), which also corresponds to values that the synaptic weights take. The above results are briefly explained in the revised text on page 4.

      Finally, to study the optimality of the network when changing multiple parameters at a time, we added a new analysis with Monte-Carlo random joint sampling (10,000 parameter sets with 20 trials for each set) of all 6 model parameters that we explored in the paper. We compared (Fig. 2) the so-obtained results of each simulation with those obtained from the understanding gained from varying one or two parameters at a time (optimal parameters reported in Table 1 and used throughout the paper). We found (Fig. 2) that the optimal configuration in Table 1 was never improved by any other simulations we performed, and that the first three random simulations that came the closest to the optimal one of Table 1 had stronger noise intensity but also stronger metabolic cost than the configuration in Table 1. The second, third and fourth configurations had longer time constants of both E and I single neurons (adaptation time constants). Ratio of E-I neuron numbers and of I-I to E-I connectivity in the second, third and fourth best configuration were either jointly increased or decreased with respect to our configuration. These results are reported in Fig. 2 and in Tables 2-3 and they are discussed in Results (page 5).
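
      The joint sampling described above can be illustrated with a minimal sketch. It assumes uniform sampling within each parameter range and a user-supplied loss function; the function names, the uniform distribution, the parameter ranges and the toy loss below are hypothetical stand-ins and are not taken from the paper's code.

```python
import numpy as np

def monte_carlo_search(loss_fn, param_ranges, n_sets=10000, n_trials=20, seed=0):
    """Jointly sample all parameters at random and keep the set with the lowest mean loss.

    loss_fn(params, rng) is a hypothetical callable returning the loss of one trial.
    param_ranges maps each parameter name to a (low, high) tuple.
    Uniform sampling within each range is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    best_params, best_loss = None, np.inf
    for _ in range(n_sets):
        # Draw every parameter at once, so joint effects are explored.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in param_ranges.items()}
        # Average the loss over independent trials for this parameter set.
        mean_loss = np.mean([loss_fn(params, rng) for _ in range(n_trials)])
        if mean_loss < best_loss:
            best_params, best_loss = params, mean_loss
    return best_params, best_loss

def toy_loss(params, rng):
    """Hypothetical loss: a quadratic bowl centred at 1.0 plus trial-to-trial noise."""
    bowl = sum((value - 1.0) ** 2 for value in params.values())
    return bowl + 0.01 * rng.standard_normal()
```

      In this sketch, a configuration found by one-at-a-time sweeps would be considered robust if no randomly sampled set returns a lower mean loss.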

      Reviewer #3 (Public Review):

      Summary:

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.  

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      We improved the text to make sure that credit to previous studies is more precisely and more clearly given (see rebuttal to the specific suggestions of Reviewer 2 for a full list).

      We apologize if this was not clear enough in the previous version. 

      With regard to the specific point raised here about the E-I split, we revised the text on page 2. With regard to the realistic units, we revised the text on page 3. Finally, we commented on relation between our results and results of the study by Barrett et al. (2016) on page 16.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. We clarified this in revision (page 4).

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We carefully considered these possibilities and decided to compare only the E population of the E-I model with the 1-CT model. On Fig.8G (7C of the first submission), E neurons have a slightly higher error and cost compared to the 1CT network. In the revision, we compared the loss of E neurons of the E-I model with the loss of the 1-CT model. Using such comparison, we found that the 1CT network has lower loss and is more efficient compared to E neurons of the E-I model. We revised Figure 8H and text on page 14 to address this point. 

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We tried to make the presentation of the model more accessible to a non-computational audience in the revised paper. We carefully edited the text throughout to make it as accessible as possible. 

      Assessment and context:

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks to known experimental constraints and provides a few predictions.

      Thanks for these kind words. We revised the paper to make sure that these points emerge more clearly and in a more accessible way from the revised paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Referring to the major comments:

      (1) Be upfront about particular modelling choices and why you made them; avoid talk of a "striking/surprising", etc. ability to explain data when this actually requires otherwise-arbitrary choices and auxiliary assumptions. Ideally, this nuance is already clear from the abstract.

      We removed all the "striking/surprising" and similar expressions from the text. 

      We added to the Abstract the assumption of equal time constants of the stimulus and of the membrane of E and I neurons and the assumption of the independence of encoded stimulus features.

      In revision, we performed additional analyses (joint parameter sweeps, Monte-Carlo joint sampling of all 6 model parameters) providing additional evidence that the network parameters in Table 1 capture reasonably well the optimal solution. These are reported on Figs. 2, 6I and 7J and in Results (pages 5, 11 and 13). See rebuttal to weaknesses of the public review of the Referee 2 for details.

      (2) Make even more of an effort to acknowledge prior work on the importance of structured E-I and I-E connectivity.

      We have revised the text (page 4) to better place our results within previous work on structured E-I and I-E connectivity.

      (3) Be clear about the model's limitations and mention them throughout the text. This will allow readers to interpret your results appropriately.

      We now comment more on model's limitations, in particular the simplifying assumption about the network's computation (page 16), the lack of E-E connectivity (page 3), the absence of long-term adaptation (page 10), and the simplification of only having one type of inhibitory neurons (page 16). 

      (4) Present your "predictions" for what they are: aspects of the model that can be made consistent with the existing data after some fitting. Except in the few cases where you make actual predictions, which deserve to be highlighted.

      We followed the suggestion of the reviewer and distinguished cases where the model is consistent with the data (postdictions) from actual predictions, where empirical measurements are not available or not conclusive. We compiled a list of predictions and postdictions in response to the point 4 of Reviewer 1. In revision, we now comment about every property of the model as either reproducing a known property of biological networks (postdiction) or being a prediction. We improved the text in Results on pages 4, 5, 6, 7, 9, 10, 11, 12 and 13 to accommodate these requests.

      Minor comments and recommendations

      It's a sizable list, but most can be addressed with some text edits.

      (1) The image captions should give more details about the simulations and analyses, particularly regarding sample sizes and statistical tests. In Figure 5, for example, it is unclear if the lines represent averages over multiple signals and, if so, how many. It's probably not a single realization, but if it is, this might explain the otherwise puzzling optimal number of three stimuli. Box plots visualize the distribution across simulation trials, but it's not clear how many. In Figure 7d, a star suggests statistical significance, but the caption does not mention the test or its results; the y-axis should also have larger limits.

      All statistical results were computed on 100 or 200 simulation trials, depending on the figure, with duration of the trial of 1 second of simulated time. To compute statistical results in Fig. 1, we used 10 trials with duration of 10 seconds for each trial. Each trial consisted of M independent realizations of Ornstein-Uhlenbeck (OU) processes as stimuli, independent noise in the membrane potential and an independent draw of tuning parameters, such that the results are general over specific realization of these random variables. Realizations of the OU processes were independent across stimulus dimensions and across trials. We added this information in the caption of each figure. 
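
      As an illustration of the stimuli described above, the following sketch draws M independent Ornstein-Uhlenbeck processes with an Euler-Maruyama scheme. The integration scheme, the time step, the time constant, the noise amplitude and the function name are assumptions made for this example, not values or code from the paper.

```python
import numpy as np

def simulate_ou_stimuli(n_features=3, duration_ms=1000.0, dt_ms=0.1,
                        tau_ms=10.0, sigma=2.0, seed=0):
    """Simulate independent OU processes dx = -(x / tau) dt + sigma dW.

    Returns an array of shape (n_features, n_steps). All default values
    are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration_ms / dt_ms))
    x = np.zeros((n_features, n_steps))
    noise_scale = sigma * np.sqrt(dt_ms)
    for t in range(1, n_steps):
        # Independent Gaussian increments per feature keep the dimensions independent.
        xi = rng.standard_normal(n_features)
        x[:, t] = x[:, t - 1] - (dt_ms / tau_ms) * x[:, t - 1] + noise_scale * xi
    return x
```

      Using a different seed for every trial gives realizations that are independent across trials as well as across stimulus dimensions.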

      The optimal number of M=3 stimuli is the result of measuring the performance of the network in 100 simulation trials (for each parameter value), thus following the same procedure as for all other parameters. Boxplots on Fig. 8G-H were also generated from results computed in 100 simulation trials, which we have now specified in the caption of the figure, together with the statistical test used for assessing the significance (two-tailed t-test). We also enlarged the limits of Fig. 8H (7D in the previous version).

      (2) The Oldenburg paper (reference 62) finds suppression of all but nearby neurons in response to two-photon stimulation of small neural ensembles (instead of single neurons, as in Chettih & Harvey). This isn't perfectly consistent with the model's results, even though the Oldenburg experiments seem more relevant given the model's small size, and strong connectivity/high connection probability between similarly tuned neurons. What might explain the potential mismatch?

      We sincerely apologize for not having been precise enough on this point when comparing our model against Chettih & Harvey and Oldenburg et al. We corrected the sentence (page 6) to remove the claim that our model reproduces both. 

      We speculate that the discrepancy between perturbing our model and the Oldenburg data may arise from the lack of E-E connectivity in our model. Synaptic connections between E neurons with similar selectivity could create an enhancement instead of suppression between neuronal pairs with very similar tuning. We added a sentence about this in the section with perturbation experiments “Competition across neurons with similar stimulus tuning emerging in efficient spiking networks” (page 7) where we discuss this limitation of our model. We feel that this example shows the utility of deriving perturbation results from our model, as not all networks with some degree of lateral inhibition will show the same perturbation results. Comparing perturbation results from our model with those from real data thus has some value for better appreciating the strengths and limitations of our approach.

      (3) "Previous studies optogenetically stimulated E neurons but did not determine whether the recorded neurons were excitatory or inhibitory " (p. 11). I believe Oldenburg et al. did specifically image excitatory neurons.

      The reviewer is correct about Oldenburg et al. imaging specifically excitatory neurons. We have revised this part of the Discussion (page 15). 

      (4) The authors write that efficiency is particularly achieved where adaptation is stronger in E compared to I neurons (p. 7; Figure 4). Although this would be consistent with experimental data (the I neurons in the model seem akin to fast-spiking Pv+ cells), I struggle to see it in the figure. Instead, it seems like there are roughly two regimes. If either of the neuronal timescales is faster than the stimulus timescale, the optimisation fails. If both are at least as slow, optimisation succeeds.

      We agree with the reviewer that the adaptation properties of our inhibitory neurons are compatible with Pv+ cells. What is essential for determining the dynamical regime of the network is less the relation to the time constant of the stimulus (t<sub>x</sub>) but rather the relation between the time constant of the population readout (t, which is also the membrane time constant) and the time constant of the single neuron (t<sub>r</sub><sup>y</sup> for y=E and y=I; see Eq. 23, 25 or 29e). The relation between t and t<sub>r</sub><sup>y</sup> determines if single neurons generate spike-triggered adaptation (t<sub>r</sub><sup>y</sup> > t) or spike-triggered facilitation (t<sub>r</sub><sup>y</sup> < t; see Table 4). In regimes with facilitation in either E or I neurons (or both), the network performance strongly deteriorates compared to regimes with adaptation (Fig. 5A). 

      Beyond adaptation leading to better performance, we also found different effects of adaptation in E and I neurons. We acknowledge that the difference of these effects was difficult to see from the Fig. 4B in the first submission. We have now replotted results from previously shown Fig. 4B to focus on the adaptation regime only, (since the Fig. 5A already establishes that this is the regime with better performance). We also added figures showing the differential effect of adaptation in E and I cell type on the firing rate and on the average loss (Fig. 5C-D). Fig. 5B and C (top plots) show that with adaptation in E neurons, the error and the loss increase more slowly than with adaptation in I neurons. Moreover, the firing rate in both cell types decreases with adaptation in E neurons, while this is not the case with adaptation in I neurons (Fig. 5D). These results are added to the figure panels specified above and discussed in text on page 9.

      To clarify the relation between neuronal and stimulus timescale, we now also added the analysis of network performance as a function of the time constant of the stimulus t<sub>x</sub> (Supplementary Fig. S5 C-E). We found that the model's performance is optimal when the time constant of the stimulus is close to the membrane time constant t. This result is expected, because the equality of these time constants was imposed in our analytical derivation of the model (t<sub>x</sub>  = t). We see a similar decrease in performance for values of t<sub>x</sub>  that are faster and slower with respect to the membrane time constant (Supplementary Fig. S5C, top). These results are added to the figure panels specified above and discussed in text on page 13.

      (5) A key functional property of cortical interneurons is their lower stimulus selectivity. Does the model replicate this feature?

      We think that whether I neurons are less selective than E neurons is still an open question. A number of recent empirical studies reported that the selectivity of I neurons is comparable to the selectivity of E neurons (see, e.g., Kuan et al. Nature 2024, Runyan et al. Neuron 2010, Najafi et al. Neuron 2020). In our model, the optimal solution prescribes a precise structure in recurrent connectivity (see Eq. 24 and Fig. 1C(ii)) and structured connectivity endows I neurons with stimulus selectivity. To show this, we added plots of example tuning curves and the distribution of the selectivity index across E and I neurons (Fig. 8E-F) and described these new results in Results (page 14). Tuning curves in our network were similar to those computed in a previous work that addressed stimulus tuning in efficient spiking networks (Barrett et al. 2016). We evaluated tuning curves using M=3 constant stimulus features and we varied one of the features while the two others were kept fixed. We provided details on how the tuning curves and the selectivity index were computed in a new Methods subsection (“Tuning curves and selectivity index”) on page 50.

      (6) The final panels of Figure 4 are presented as an approach to test the efficiency of biological networks. The authors seem to measure the instantaneous (and time-averaged) E-I balance while varying the adaptation parameter and then correlate this with the loss. If that is indeed the approach (it's difficult to tell), this doesn't seem to suggest a tractable experiment. Also, the conclusion is somewhat obvious: the tighter the single neuron balance, the fewer unnecessary spikes are fired. I recommend that the authors clearly explain their analysis and how they envision its application to biological data.

      We indeed measured the instantaneous (and time-averaged) E-I balance while varying the adaptation parameters and then correlated this with the loss. We did not want to imply that the latter panels of Figure 4 are a means to test the efficiency of biological networks or that we are suggesting new and possibly unfeasible experiments. We see it as a way to better conceptually understand how spike-triggered adaptation helps the network’s coding efficiency, by tightening the E-I balance in a way that reduces the number of unnecessary spikes. We apologize if the previous text was confusing in this respect. We have now removed the initial paragraph of the former Results subsection (including removing the subsection title) and added new text about the different effects of adaptation in E and I neurons on page 9. We also thoroughly revised Figure 5.

      (7) The external stimuli are repeatedly said to vary (or be tracked) across "multiple time scales", which might inadvertently be interpreted as (i) a single stimulus containing multiple timescales or (ii) simultaneously presented stimuli containing different timescales. These scenarios are potential targets for efficient coding through neuronal adaptation (reference 21 in the manuscript and Pozzorini et al. Nat. Neuro. 2013), but they are not addressed in the current model. I recommend the authors clarify their statements regarding timescales (and if they're up for it, acknowledge this as a limitation).

      We thank the reviewer for bringing up this interesting point. To address the second point raised by the Reviewer (simultaneously presented stimuli containing multiple timescales), we performed new analyses to test the model with simultaneously presented stimuli that have different timescales. We found that the model encodes efficiently such stimuli.  We tested the case with a 3-dimensional stimulus where each dimension is an Ornstein-Uhlenbeck process with a different time constant. More precisely, we kept the time constant in the first dimension fixed (at 10 ms), and varied the time constant in the second and third dimension such that the time constant in the third dimension is doubled with respect to the second dimension. We plotted the encoding error in every stimulus dimension for E and I neurons (Fig. 8B, left plot) as well as the encoding error and the metabolic cost averaged across stimulus dimensions (Fig. 8B, right plot). The results are briefly described with text on page 13.
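
      The construction of the three stimulus time constants and the per-dimension encoding error described above can be sketched as follows. The helper names are hypothetical, and the per-dimension error is taken here to be the root-mean-square error between a signal and its estimate, which is an assumption about how the quantity in Fig. 8B is computed.

```python
import numpy as np

def stimulus_time_constants(tau_second_ms, tau_first_ms=10.0):
    """Time constants for a 3-dimensional stimulus.

    The first dimension is fixed (10 ms), the second is varied, and the
    third is twice the second, following the description in the text.
    """
    return np.array([tau_first_ms, tau_second_ms, 2.0 * tau_second_ms])

def rmse_per_dimension(x, x_hat):
    """Root-mean-square error for each stimulus dimension.

    x and x_hat have shape (n_dimensions, n_time_steps); the result has
    shape (n_dimensions,).
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.sqrt(np.mean((x - x_hat) ** 2, axis=1))
```

      Averaging the returned vector across dimensions gives a single error value comparable across different choices of the second time constant.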

      Regarding the case i) (single stimulus containing multiple timescales), we considered two possibilities. One possibility is that timescales of the stimulus are separable, and in this case a single stimulus containing several time scales can be decomposed in several stimuli with a single time scale each. As we assign a new set of weights for each dimension of the decomposed stimulus, this case is similar to the case ii) that we already addressed. Another possibility is that timescales of the stimulus cannot be separated. This case is not covered in the present analysis and we listed it among the limitations of the model. We revised the text (page 13) around the question of multiple time scales and included the citation of Pozzorini et al. (2013). 

      (8) It is claimed that the model uses a mixed code to represent signals, citing reference 47 (Rigotti et al., Nature 2013). But whereas the model seems to use linear mixed selectivity, the Rigotti reference highlights the virtues of nonlinear mixed selectivity. In my understanding, a linearly mixed code does not enjoy the same benefits since it’s mathematically equivalent to a non-mixed code (simply rotate the readout matrix). I recommend that the authors clarify the type of selectivity used by their model and how it relates to the paper(s) they cite.

      The reviewer is correct that our selectivity is a linear mixing of input variables, and differs from the selectivity in Rigotti et al. (2013) which is non-linear. We revised the sentence on page 4 to clarify better that the mixed selectivity we consider is linear and we removed Rigotti’s citation. 

      (9) Reference 46 is cited as evidence that leaky integration of sensory features is a relevant computation for sensory areas. I don’t think this is quite what the reference shows. Instead, it finds certain morphological and electrophysiological differences between single pyramidal neurons in the primary visual cortex compared to the prefrontal cortex. Reference 46 then goes on to speculate that these are differences relevant to sensory computation. This may seem like a quibble, but given the centrality of the objective function in normative theories, I think it's important to clarify why a particular objective is chosen.

      We agree that Amatrudo et al. was not the best reference and that the previous text was confusing. We have thus tried to improve its clarity. We looked at the previous theoretical efficient coding papers that introduced this leaky integration and could not find in them a justification of this assumption based on experimental papers. However, there is evidence that neurons in sensory structures and in cortical association areas respond to time-varying sensory evidence by summing stimuli over time with a weight that decreases steadily going back in time from the time of firing, which suggests that neurons integrate time-varying sensory features. In many cases, these integration kernels decay approximately exponentially going back in time, and several models that successfully explain perceptual readouts of neural activity assume leaky integration. This suggests that the mathematical approximation of leaky integration of sensory evidence, though possibly simplistic, is reasonable. We revised the text in this respect (page 2).

      (10) The definition of the objective function uses beta as a tuning parameter, but later parts of the text and figures refer to a parameter g_L which might only be introduced in the convex combination of Eq. 40a.

      This is correct. Parameter optimization has been performed on a weighted sum of the average encoding error and cost as given by the Eq. 39a (40a in first submission), with the weighting g<sub>L</sub> for the error versus the cost, and not the beta that is part of the objective in Eq.10. The convex combination in Eq. 39a allowed us to find a set of optimal parameters that is within biologically realistic parameter ranges, which includes realistic values for the firing threshold. The average encoding error and metabolic cost (the two terms on the right-hand side of Eq. 39a, without weighting with g<sub>L</sub>) in our network are of the same order (see Fig 8G for the E-I model where these values are plotted separately for the optimal network). Weighing the cost with optimal beta that is in the range of ~10 would have yielded a network that optimizes almost exclusively the metabolic cost and would bias the results towards solutions with poor encoding accuracy.
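
      To make the weighting argument concrete, here is a minimal sketch. It assumes that the convex combination has the form g<sub>L</sub> × error + (1 − g<sub>L</sub>) × cost and that the alternative weighting has the form error + beta × cost; neither equation is reproduced in this reply, so both forms, the function names and the numerical values are illustrative assumptions.

```python
def convex_loss(encoding_error, metabolic_cost, g_l):
    """Assumed convex combination: g_l weights the error, (1 - g_l) the cost."""
    if not 0.0 <= g_l <= 1.0:
        raise ValueError("g_l must lie in [0, 1]")
    return g_l * encoding_error + (1.0 - g_l) * metabolic_cost


def cost_share(encoding_error, metabolic_cost, error_weight, cost_weight):
    """Fraction of the weighted sum that is contributed by the cost term."""
    weighted_error = error_weight * encoding_error
    weighted_cost = cost_weight * metabolic_cost
    return weighted_cost / (weighted_error + weighted_cost)


# Illustrative values only: error and cost of the same order of magnitude.
error, cost = 3.0, 3.0

# Weighting the cost by a factor of about 10 lets the cost dominate the sum.
share_with_beta = cost_share(error, cost, error_weight=1.0, cost_weight=10.0)

# A convex combination with g_l = 0.7 keeps the error term dominant.
share_with_g_l = cost_share(error, cost, error_weight=0.7, cost_weight=0.3)
```

      With these placeholder numbers the cost term makes up roughly 91% of the beta-weighted sum but only 30% of the convex combination, which is the imbalance the reply describes.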

      To document more fully how the choice of weighting of the error with the cost (g<sub>L</sub>) affects the optimal parameters, we have now added a new analysis (Fig. 8D and Supplementary Fig. S4 A-D and H) showing optimal parameters as a function of this weighting. We commented on these results in the text on pages 9-11 and 12. For further details, please see also the reply to point 1 of Reviewer 1.

      (11) Figure 1J: "In E neurons, the distribution of inhibitory and of net synaptic inputs overlap". In my understanding, they are in fact identical, and this is by construction. It might help the reader to state this.

      We apologize for an unclear statement. In E neurons, net synaptic current is the sum of the feedforward current and of recurrent inhibition (Eq. 29c and Eq. 42). With our choice of tuning parameters that are symmetric around zero and with stimulus features that have vanishing mean, the mean of the feedforward current is close to zero. Because of this, the mean of the net current is negative and is close to the mean of the inhibitory current. We have clarified this in the text (page 5).

      (12) A few typos:

      -  p1. "Minimizes the encoding accuracy" should be "maximizes..."

      -  p1: "as well the progress" should be something like "as well as the progress"

      -  p.11: "In recorded neurons where excitatory or inhibitory", "where" should be "were"

      -  Fig3: missing parentheses (B)

      -  Fig4B: the 200 ticks on the y-scale are cut off.

      -  Panel Fig. 5a: "stimulus" should be "stimuli".

      -  Ref 24 "Efficient andadaptive sensory codes" is missing a space.

      -  p. 26: "requires" should be "required".

      -  On several occasions, the article "the" is missing.

      We thank the reviewer for kindly pointing out the typos that we now corrected.

      Reviewer #2 (Recommendations For The Authors):

      I would like to give the authors more details about the two main weaknesses discussed above, so that they may address specific points in the paper. First, there is the relation to previous work. Several published articles have presented very similar results to those discussed here, including references 5, 26, 28, 32, 33, 42, 43, 48, and an additional reference not cited by the authors (Calaim et al. 2022 eLife e73276). This includes:

      (1) Derivation of an E-I efficient spiking network, which is found in refs. 28, 42, 43, and 48. This is not reflected in the text: e.g., "These previous implementations, however, had neurons that did not respect Dale's law" (Introduction, pg. 1); "Unlike previous approaches (28, 48), we hypothesize that E and I neurons have distinct normative objectives...". The authors should discuss how their derivation compares to these.

      We have now fully clarified on page 3 that our model builds on the seminal previous works that introduced E-I networks with efficient coding (Supplementary text in Boerlin et al. 2013, Chalk et al. 2016, Barrett et al. 2016). 

      (2) Inclusion of a slow adaptation current: I believe this also appears in a previous paper (Gutierrez & Deneve 2019, ref. 33) in almost the exact same form, and is again not reflected in the text: "The strength of the current is proportional to the difference in inverse time constants ... and is thus absent in previous studies assuming that these time constants are equal (... ref. 33)". Again, the authors should compare their derivation to this previous work.

      We thank the reviewer for pointing this out. We sincerely apologize if our previous version did not recognize sufficiently clearly that the previous work of Gutierrez and Deneve (eLife 2019; ref 33) introduced first the slow adaptation current that is similar to spike-triggered adaptation in our model. We have made sure that the revised text recognizes it more clearly. We also explained better what we changed or added with respect to this previous work (see revised text on page 8). 

      The work by Gutierrez and Deneve (2019) emphasizes the interplay between a single-neuron property (an adapting current in single neurons) and a network property (network-level coding through structured recurrent connections). They use a network that does not distinguish E and I neurons. Our contribution instead focuses on adaptation in an E-I network. To improve the presentation following the Reviewer’s comment, we now better emphasize the differential effect of adaptation in E and in I neurons in the revision (Fig. 5 B-D). Moreover, Gutierrez and Deneve studied the effect of adaptation on slower time scales (1 or 2 seconds) while we study adaptation on a finer time scale of tens of milliseconds. The revised text detailing this is on page 8.

      (3) Background currents and physical units: Pg. 26: "these models did not contain any synaptic current unrelated to feedforward and recurrent processing" and "Moreover previous models on efficient coding did not thoroughly consider physical units of variables" - this was briefly described in ref. 28 (Boerlin et al. 2013), in which the voltage and threshold are transformed by adding a common constant, and additional aspects of physical units are discussed.

      It is correct that Boerlin et al (2013) suggested adding a common constant to introduce physical units. We now revised the text to make clearer the relation between our results and the results of Boerlin et al. (2013) (page 3). In our paper, we built on Boerlin et al. (2013) and assigned physical units to computational variables that define the model's objective (the targets, the estimates, the metabolic constant, etc.). We assigned units to computational variables in such a way that physical variables (such as membrane potential, transmembrane currents, firing thresholds and resets) have the correct physical units.  We have now clarified how we derived physical units in the section of Results where we introduce the biophysical model (page 3) and specified how this derivation relates to the results in Boerlin et al. (2013).

      (4) Voltage correlations, spike correlations, and instantaneous E/I balance: this was already pointed out in Boerlin et al. 2013 (ref 28; from that paper: "Despite these strong correlations of the membrane potentials, the neurons fire rarely and asynchronously") and others including ref. 32. The authors mention this briefly in the Discussion, but it should be more prominent that this work presents a more thorough study of this well-known characteristic of the network.

      We agree that it would be important to comment on how our results relate to these results in Boerlin et al. (2013). It is correct that in Boerlin et al. (2013) neurons have strong correlations in the membrane potentials, but fire asynchronously, similarly to what we observe in our model. However, asynchronous dynamics in Boerlin et al. (2013) strongly depends on the assumption of instantaneous synaptic transmission and time discretization, with a “one spike per time bin” rule in numerical implementation. This rule enforces that at most one spike is fired in each time bin, thus actively preventing any synchronization across neurons. If this rule is removed, their network synchronizes, unless the metabolic constant is strong enough to control such synchronization to bring it back to asynchronous regime (see ref. 36). Our implementation does not contain any specific rule that would prevent synchronization across neurons. We now cite the paper by Boerlin and colleagues and briefly summarize this discussion when we describe the result of Fig. 3D on page 7. 

      (5) Perturbations and parameters sweep: I found one previous paper on efficient spiking networks (Calaim et al. 2022) which the authors did not cite, but appears to be highly relevant to the work presented here. Though the authors perform different perturbations from this previous study, they should ideally discuss how their findings relate to this one. Furthermore, this previous study performs extensive sweeps over various network parameters, which the authors might discuss here, when relevant. For example, on pg. 8, the authors write “We predict that, if number of neurons within the population decreases, neurons have to fire more spikes to achieve an optimal population readout” – this was already shown in Calaim et al. 2022 Figure 5, and the authors should mention if their results are consistent.

      We apologize for not being aware of Calaim et al. (2022) when we submitted the first version of our paper. This important study is now cited in the revised version. We have now, as suggested, performed sweeps of multiple parameters inspired by the work of Calaim et al. This new analysis is described extensively in the reply to the Weaknesses in the Public Review of Reviewer 2, is shown in Figs. 2, 6I and 7J, and is described on pages 5, 11 and 13.

      The Reviewer is also correct that the compensation mechanism that applies when changing the ratio of E-I neuron numbers is similar to the one described in Barrett et al. (2016) and related to our claim “if number of neurons within the population decreases, neurons have to fire more spikes to achieve an optimal population readout”. We have now added (page 11) that this prediction is consistent with the finding of Barrett et al. (2016).

      With regard to the dependence of optimal coding properties on the number of neurons, we have tried to better describe similarities and differences with our work and that of Calaim et al as well as with the work of Barrett et al. (2016) which reports highly relevant results. These additional considerations are summarized in a paragraph in Discussion (page 16).

      (6) Overall, the authors should distinguish which of their results are novel, which ones are consistent with previous work on efficient spiking networks, and which ones are consistent in general with network implementations of efficient and sparse coding. In many of the above cases, this manuscript goes into much more depth and study of each of the network characteristics, which is interesting and commendable, but this should be made clear. In clarifying the points listed above, I hope that the authors can better contextualize their work in relation to previous studies, and highlight what are the unique characteristics of the model presented here.

      We made a number of clarifications of the text to provide better contextualization of our model within existing literature and to credit more precisely previous publications. This includes commenting on previous studies that introduced separate objective functions of E and I neurons (page 2), spike-triggered adaptation (page 8), physical units (page 3), and changes in the number of neurons in the network (page 16). 

      Next, there are the claims of optimal parameters. As explained on pg. 35 (criterion for determining optimal model parameters), it appears to me that they simply vary each parameter one at a time around the optimal value. This argument appears somewhat circular, as they would need to know the optimal parameters before starting this sweep. In general, I find these optimality considerations to be the most interesting and novel part of the paper, but the simulations are relatively limited, so I would ask the authors to either back them up with more extensive parameter sweeps that consider covariations in different parameters simultaneously (as in Calaim et al. 2022). Furthermore, the authors should make sure that they are not breaking any of the required relationships between parameters necessary for the optimization of the loss function. Again, some of the results (such as coding error not being minimized with zero metabolic cost) suggests that there might be issues here. 

      We thank the reviewer for this insightful suggestion. We have now added a joint sweep of all relevant model parameters using a Monte-Carlo parameter search with 10,000 iterations. We randomly drew parameter configurations from predetermined parameter ranges that are detailed in the newly added Table 2. Parameters were sampled from a uniform distribution. We varied all six model parameters studied in the paper (metabolic constant, noise intensity, time constant of single E and I neurons, ratio of E to I neurons and ratio of the mean I-I to E-I connectivity). We now present these results in a new Figure 2. We did not find any set of parameters with lower loss than the parameters in Table 1 when the weighting of the error with the cost was in the following range: 0.4 < g<sub>L</sub> < 0.81 (Fig. 2C). While our large but finite Monte-Carlo random sampling does not fully prove that the configuration we selected as optimal (in Table 1) is a global optimum, it shows that this configuration is highly efficient. Further, and as detailed in the rebuttal to the Weaknesses of the Public Review of Referee 2, analyses of the near-optimal solutions are compatible with the notion (resulting from the joint parameter sweep studies that we added to Figures 6 and 7) that network optimality may be influenced by joint covariations in parameters. These new results are reported in Results (pages 5, 11 and 13) and in Figures 2, 6I and 7J.
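
      The search procedure described above can be sketched as a plain random search with uniform sampling. This is a generic illustration rather than the authors' implementation: the parameter names, the ranges and the toy quadratic loss are hypothetical placeholders (they are not the values of Table 2), and in the actual analysis the loss would come from simulating the network.

```python
import random


def random_search(loss_fn, param_ranges, n_iterations=10_000, seed=0):
    """Draw every parameter uniformly from its range and keep the best draw."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iterations):
        candidate = {
            name: rng.uniform(low, high)
            for name, (low, high) in param_ranges.items()
        }
        loss = loss_fn(candidate)
        if loss < best_loss:
            best_params, best_loss = candidate, loss
    return best_params, best_loss


# Hypothetical two-parameter example with a known minimum at (12, 4).
toy_ranges = {"param_a": (0.0, 30.0), "param_b": (0.0, 10.0)}


def toy_loss(params):
    """Placeholder loss; stands in for the simulated average loss."""
    return (params["param_a"] - 12.0) ** 2 + (params["param_b"] - 4.0) ** 2


best_params, best_loss = random_search(toy_loss, toy_ranges)
```

      Because all parameters are drawn jointly on every iteration, the search probes covariations between parameters rather than varying one parameter at a time around a presumed optimum.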

      Some more specific points:

      (1) In general, I find it difficult to understand the scaling of the RMSE, cost, and loss values in Figures 4-7. Why are RMSE values in the range of 1-10, whereas loss and cost values are in the range of 0-1? Perhaps the authors can explicitly write the values of the RMSE and loss for the simulation in Figure 1G as a reference point.

      Encoding error (RMSE), metabolic cost (MC) and average loss for a well performing network are within the range of 1-10 (see Fig. 8G or 7C in the first submission). To ease the visualization of results, we normalized the cost and the loss on Figs. 6-8 in order to plot them on the same figure (while the computation of the optima is done following the Eq. 39 and is without normalization). We have now explicitly written the values of RMSE, MC and the average loss (non-normalized) for the simulation in Fig. 1D on page 5, as suggested by the reviewer. We have also revised Fig. 4 and now show the absolute and not the relative values of the RMSE and the MC (metabolic cost). 

      (2) Optimal E-I neuron ratio of 4:1 and efficacy ratio of 3:1: besides being unintuitive in relation to previous work, are these two optimal settings related to one another? If there are 4x more excitatory neurons than inhibitory neurons, won't this affect the efficacy ratio of the weights of the two populations? What happens if these two parameters are varied together?

      Thanks for this insightful point. Indeed, the optima of these two parameters are interdependent and positively correlated: if we decrease the E-I neuron ratio, the optimal efficacy ratio decreases as well. To better show this relation, we added figures with a 2-dimensional parameter search (Fig. 7J) where we jointly varied the two ratios. The red cross on the right figure marks the optimal ratios used as optimal parameters in our study. These findings are discussed on page 13.

      (3) Optimal dimensionality of M=[1,4]: Again, previous work (Calaim et al. 2022) would suggest that efficient spiking networks can code for arbitrary dimensional signals, but that performance depends on the redundancy in the network - the more neurons, the better the coding. From this, I don't understand how or why the authors find a minimum in Figure 7B. Why does coding performance get worse for small M?

      We optimized all model parameters with M=3 and this is the reason why M=3 is the optimal number of inputs when we vary this parameter. Our network shows a distinct minimum of the encoding error as a function of the stimulus dimensionality for both E and I neurons (Fig. 8C, top). This minimum is reflected in the minimum of the average loss (Fig. 8C, bottom). The minimum of the loss is shifted (or biased) by the metabolic cost, with strong weighting of the cost lowering the optimal number of inputs. This is discussed on pages 13-14.

      Here is a list of other, more minor points that the authors can consider addressing to make the results and text clearer:

      (1) Feedforward efficient coding models: in the introduction (pg. 1) and discussion (pg. 11) it is mentioned that early efficient coding models, such as that of Olshausen & Field 96, were purely feedforward, which I believe to be untrue (e.g., see Eq. 2 of O&F 96). Later models made this even more explicit (Rozell et al. 2008). Perhaps the authors can either clarify what they meant by this, or downplay this point.

      We sincerely apologize for the oversight present in the previous version of the text. We agree with the reviewer that the model in Olshausen and Field (1996) indeed defines a network with recurrent connections, and the same type of recurrent connectivity has been used by Rozell et al. (2008, 2013). The structure of the connectivity in Olshausen and Field (as well as in Rozell et al (2008)) is closely related to the structure of connectivity that we derived in our model. We have corrected the text in the introduction (page 1) to remove these errors.

      (2) Pg. 2 - The authors state: "We draw tuning parameters from a normal distribution...", but in the methods, it states that these are then normalized across neurons, so perhaps the authors could add this here, or rephrase it to say that weights are drawn uniformly on the hypersphere.

      We rephrased the description of how weights were determined (page 2).
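
      For readers who want the reviewer's suggestion spelled out: drawing each component from a normal distribution and then scaling the vector to unit length yields directions distributed uniformly on the hypersphere. The sketch below assumes that the normalization is applied to each neuron's M-dimensional tuning vector; the function name and the population size are hypothetical.

```python
import math
import random


def draw_tuning_vectors(n_neurons, n_dims, seed=0):
    """Sample unit-length tuning vectors, uniformly distributed on the hypersphere."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n_neurons):
        # Independent standard normal components give an isotropic direction.
        raw = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm for x in raw])
    return vectors


# Example: 400 neurons (an arbitrary number) tuned to a 3-dimensional stimulus.
tuning = draw_tuning_vectors(n_neurons=400, n_dims=3)
```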

      (3) Pg. 2 - "We hypothesize the time-resolved metabolic cost to be proportional to the estimate of a momentary firing rate of the neural population" - from what I can see, this is not the usual population rate, which would be an average or sum of rates across the population.

      Indeed, the time-dependent metabolic cost is not the population rate (in the sense of the sum of instantaneous firing rates across neurons), but is proportional to it by a factor of 1/t. More precisely, we can define the instantaneous estimate of the firing rate of a single neuron i as z<sub>i</sub>(t) = 1/t<sub>r</sub> r<sub>i</sub>(t) with r<sub>i</sub>(t) as in Eq. 7. We have clarified this in the revised text on page 3. 

      (4) Pg. 3: "The synaptic strength between two neurons is proportional to their tuning similarity if the tuning similarity is positive" - based on the figure and results, this appears to be the case for I-E, E-I, and I-I connections, but not for E-E connections. This should be clarified in the text. Furthermore, one reference given in the subsequent sentence (Ko et al. 2011, ref. 51), is specifically about E-E connections, so doesn't appear to be relevant here.

      We have now specified that the Eq. 24 does not describe E-E connections. We also agree that the reference (Ko et al. 2011) did not adequately support our claim and we thus removed it and revised the text on page 3 accordingly.
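
      The connectivity rule quoted by the reviewer can be illustrated as follows. The sketch assumes that tuning similarity is the dot product of the two neurons' tuning vectors and that connections with non-positive similarity are set to zero; Eq. 24 itself is not reproduced here, so the function name and the `scale` factor are hypothetical.

```python
def rectified_similarity_weights(pre_vectors, post_vectors, scale=1.0):
    """Connection strengths proportional to positive tuning similarity.

    Returns a matrix indexed as weights[post][pre]. Pairs whose tuning
    similarity (dot product) is not positive get a strength of zero.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return [
        [scale * max(0.0, dot(post, pre)) for pre in pre_vectors]
        for post in post_vectors
    ]


# Toy example with 2-dimensional tuning vectors.
e_tuning = [[1.0, 0.0], [0.0, 1.0]]
i_tuning = [[1.0, 0.0], [-1.0, 0.0]]
weights_e_to_i = rectified_similarity_weights(e_tuning, i_tuning)
```

      In this toy example only the pair with identical tuning is connected; the oppositely tuned and orthogonal pairs receive zero weight. Consistent with the reply, such a rule would be applied to E-I, I-E and I-I connections but not to E-E connections.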

      (5) Pg. 3: "the relative weight of the metabolic cost over the encoding error controls the operating regime of the network" and "and an operating regime controlled by the metabolic constant" - what do you mean by operating regime here?

      We used the expression “operating regime” in the sense of a dynamical regime of the network.  However, we agree that this expression may be confusing and we removed it in revision. 

      (6) Pg. 3: "Previous studies interpreted changes of the metabolic constant beta as changes to the firing thresholds, which has less biological plausibility" - can the authors explain why this is less plausible, or ideally provide a reference for it?

      In biological networks, global variables such as brain state can strongly modulate the way neural networks respond to a feedforward stimulus. These variables influence neural activity in at least two distinct ways. One is by changing non-specific synaptic inputs to neurons, which is a network-wide effect (Destexhe and Pare, Nature Reviews Neurosci. 2003). This is captured in our model by changing the strength of the mean and fluctuations in the external currents. Beyond modulating synaptic currents, another way of modulating neural activity is by changing cell-intrinsic factors that modulate the firing threshold in biological neurons (Pozzorini et al. 2013). Previous studies on spiking networks with efficient coding interpreted the effect of the metabolic constant as changes to the firing threshold (Koren and Deneve, 2017, Gutierrez and Deneve 2019), which corresponds to cell-intrinsic factors. Here we instead propose that the metabolic constant modulates the neural activity by changing the non-specific synaptic input, homogeneously across all neurons in the network. Interpreting the metabolic constant as setting the mean of the non-specific synaptic input was necessary in our model to find an optimal set of parameters (as in Table 1) that is also biologically plausible. We revised the text accordingly (page 4).

      (7) Pg. 4: Competition across neurons: since the model lacks E-E connectivity, it seems trivial to conclude that there is competition through lateral inhibition, and it can be directly determined from the connectivity. What is gained from running these perturbation experiments?

      We agree that a reader with a good understanding of sparse / efficient coding theory can tell that there is competition across neurons with similar tuning already from the equation for the recurrent connectivity (Eq. 24). However, we presume that not all readers can see this from the equations and that it is worth showing this with simulations.

      Following the reviewer's comment, we have now downplayed the result about the model manifesting lateral inhibition in general on page 6. We have also removed its extensive elaboration in Discussion.

      One reason to run perturbation experiments was to test to what extent the optimal model qualitatively replicates empirical findings, in particular, single neuron perturbation experiments in Chettih and Harvey, 2019, without specifically tuning any of the model parameters. We found that the model reproduces qualitatively the main empirical findings, without tuning the model to replicate the data. We revised the text on page 5 accordingly.

      A further reason to run these experiments was to refine predictions about the minimal amount of connectivity structure that generates perturbation response profiles that are qualitatively compatible with empirical observations. To establish this, we performed perturbation experiments while removing the connectivity structure of a particular connectivity sub-matrix (E-I, I-I or I-E; Fig. S3 F). This allowed us to determine which connectivity matrix has to be structured to observe results that qualitatively match empirical findings. We found that the structure of E-I and I-E connectivity is necessary, but not the structure of I-I connectivity. Finally, we tested partial removal of the connectivity structure, where we replaced the precise (and optimal) connectivity structure with a simpler connectivity rule. In the optimal connectivity, the connection strength is proportional to the tuning similarity. A simpler connectivity rule, in contrast, only specifies that neurons with similar tuning share a connection, and beyond this the connection strength is random. Running perturbation experiments in a network obeying this simpler connectivity rule still qualitatively replicated empirical results from Chettih and Harvey (2019). This is shown in Supplementary Fig. S2F and described on page 8.

      (8) Pg. 4: "the optimal E-I network provided a precise and unbiased estimator of the multidimensional and time-dependent target signal" - from previous work (e.g., Calaim et al. 2022), I would guess that the estimator is indeed biased by the metabolic cost. Why is this not the case here? Did you tune the output weights to remove this bias?

      Output weights were not tuned to remove the bias. On Fig. 1H in the first submission we plotted the bias for the network that minimizes the encoding error. We forgot to specify this in the text and figure caption, for which we apologize. We now replaced this figure with a new one (Fig. 1E) where we plot the bias of the network minimizing the average loss (with parameters as in Table 1). The bias of the network minimizing the error is close to zero, B^E = 0.02 and B^I = 0.03.  The bias of the network minimizing the loss is stronger and negative, B^E = -0.15 and B^I=-0.34. In the text of Results, we now report the bias of both networks (i.e., optimizing the encoding error and optimizing the loss). We also added a plot showing trial-averaged estimates and a time-dependent bias in each stimulus dimension (Supplementary figure S1 F). Note that the network minimizing the encoding error requires a lower metabolic constant (β = 6) than the network optimizing the loss (β=14), however, the optimal metabolic cost in both networks is nonzero. We revised the text and explained these points on page 5.

      (9) Pg. 4: "The distribution of firing rates was well described by a log-normal distribution" - I find this quite interesting, but it isn't clear to me how much this is due to the simulation of a finite-time noisy input. If the neurons all have equal tuning on the hypersphere, I would expect that the variability in firing is primarily due to how much the input correlates with their tuning. If this is true, I would guess that if you extend the duration of the simulation, the distribution would become tighter. Can you confirm that this is the stationary distribution of the firing rates?

      We now simulated the network with longer simulation time (10 seconds of simulated time instead of 2 seconds used previously) and also iterated the simulation across 10 trials to report a result that is general across random draws of tuning parameters (previously a single set of tuning parameters was used). The reviewer is correct that the distribution of firing rates of E neurons has become tighter with longer simulation time, but distributions remain log-normal. We also recomputed the coefficient of variation (CV) using the same procedure. We updated these plots on Fig. 1F.

      (10) Pg. 4: "We observed a strong average E-I balance" - based on the plots in Figure 1J, the inputs appear to be inhibition-dominated, especially for excitatory neurons. So by what criterion are you calling this strong average balance?

      The reviewer is correct about the fact that the net synaptic input to single neurons in our optimal network shows excess inhibition and the network is inhibition-dominated, so we revised this sentence (page 5) accordingly.  

      (11) Pg. 4: Stronger instantaneous balance in I neurons compared to E neurons - this is curious, and I have two questions: (1) can the authors provide any intuition or explanation for why this is the case in the model? and (2) does this relate to any literature on balance that might suggest inhibitory neurons are more balanced than excitatory neurons?

      In our model, I neurons receive excitatory and inhibitory synaptic currents through synaptic connections that are precisely structured. E neurons receive structured inhibition and a feedforward current. The feedforward current consists of M=3 independent OU processes projected on the tuning vectors of E neurons w<sub>i</sub><sup>E</sup>. We speculate that because the synaptic inhibition and feedforward current are different processes and the 3 OU inputs are independent, it is harder for E neurons to achieve an instantaneous balance as precise as that in I neurons. While we think that the feedforward current in our model reflects biologically plausible sensory processing, it is not a mechanistic model of feedforward processing. In biological neurons, real feedforward signals are implemented as a series of complex feedforward synaptic inputs from downstream areas, while the feedforward current in our model is a sum of stimulus features, and is thus a simplification of the biological process that generates feedforward signals. We speculate that a mechanistic implementation of the feedforward current could increase the instantaneous balance in E neurons. Furthermore, the presence of E-E connections could potentially also increase the instantaneous balance in E neurons. We revised the Discussion to address these important questions, which relate to model limitations and could be advanced in future work. We could not find any empirical evidence directly comparing the instantaneous balance in E versus I neurons. We have reported these considerations in the revised Discussion (page 16).

      (12) Pg. 5, comparison with random connectivity: "Randomizing E-I and I-E connectivity led to several-fold increases in the encoding error as well as to significant increases in the metabolic cost" and Discussion, pg. 11: "the structured network exhibits several fold lower encoding error compared to unstructured networks": I'm wondering if these comparisons are fair. First, regarding activity changes that affect the metabolic cost - it is known that random balanced networks can have global activity control, so it is not straightforward that randomizing the connectivity will change the metabolic cost. What about shuffling the weights but keeping an average balance for each neuron's input weights? Second, regarding coding error, it is trivial that random weights will not map onto the correct readout. A fairer comparison, in my opinion, would at least be to retrain the output weights to find the best-fitting decoder for the three-dimensional signal, something more akin to a reservoir network.

      Thank you for raising these interesting questions. The purpose of comparing networks with and without connectivity structure was to observe causal effects of the connectivity structure on the neural activity. We agree that the effect on the encoding error is close to trivial, because shuffling of connectivity weights decouples neural dynamics from decoding weights. We have carefully considered the Reviewer's suggestions to better compare the performance of structured and unstructured networks.

      In reply to the first point, we followed the reviewer's suggestion and compared the optimal network with a shuffled network that matched the optimal network in its average balance. This was achieved by increasing the metabolic constant, decreasing the noise intensity and slightly decreasing the feedforward stimulus (we did not find a way to match the net current in both cell types by changing a single parameter). As we compared the metabolic cost between the optimal and the shuffled network with matched average balance, we still found lower metabolic cost in the optimal network, even though the difference was now smaller. We replaced Fig. 3B from the first submission with these new results in Fig. 4B and commented on them in the text (page 7).

      In reply to the second point, we followed the reviewer’s suggestion and compared the encoding error (RMSE) of the optimal network with that of the network with shuffled connectivity, where decoding weights are trained to optimally reconstruct the target signal. As suggested, we now analyzed the encoding error of the networks using decoding weights trained on the set of spike trains generated by the network, using linear least-squares regression to minimize the decoding error. For a fair and quantitative comparison, and because we did not train the decoding weights of our structured model, we performed this same analysis using spike trains generated by networks with structured and with shuffled recurrent connectivity. We found that the encoding error is smaller in the E population and much smaller in the I population in the structured compared to the random network. Decoding weights found numerically in the optimal network approach the uniform distribution of weights that we used in our model (Fig. 4A, right). In contrast, decoding weights obtained from the random network do not converge to a uniform distribution, but instead form a much sparser distribution, in particular in I neurons (Supplementary Fig. S3 A). These additional results, reported in the above-mentioned figures, are discussed in the text on page 14.
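
      As a sketch of the least-squares step described above (the symbols here are illustrative assumptions and are not taken from the paper): if R is an N×T matrix holding the (possibly low-pass filtered) spike trains of N neurons over T time bins, and X is the M×T target signal, the trained decoding weights D (M×N) and the resulting encoding error could be written as follows, provided R Rᵀ is invertible.

```latex
\hat{D} \;=\; \arg\min_{D}\, \lVert X - D R \rVert_F^{2}
        \;=\; X R^{\top} \left( R R^{\top} \right)^{-1},
\qquad
\mathrm{RMSE} \;=\; \sqrt{ \frac{1}{M T}\, \lVert X - \hat{D} R \rVert_F^{2} } .
```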

      (13) Pg. 5: "a shift from mean-driven to fluctuation-driven spiking" and Pg. 11 "a network structured as in our efficient coding solution operates in a dynamical regime that is more stimulus-driven, compared to an unstructured network that is more fluctuation driven" - I would expect that the balanced condition dictates that spiking is always fluctuation driven. I'm wondering if the authors can clarify this.

      We agree with the reviewer that networks with and without connectivity structure are fluctuation-driven, because in a mean-driven network the mean current must be suprathreshold (Ahmadian and Miller, 2021), which is not the case of either network. We removed the claim of the change from mean to fluctuation driven regime in the revised paper. We are grateful to the Reviewer for helping us tighten the elaboration of our findings.

      (14) Pg. 5: "suggesting that variability of spiking is independent of the connectivity structure" - the literature of balanced networks argues against this. Is this not simply because you have a noisy input? Can you test this claim?

      We thank the reviewer for the suggestion. We tested this claim by measuring the coefficient of variation in networks receiving a constant stimulus. In particular, we set the same strength in each of the M=3 stimulus dimensions and set the stimulus amplitude such as to match the firing rate of the optimal network in response to the OU stimulus. We computed the coefficient of variation in 200 simulation trials.  The removal of connectivity structure did not cause significant change of the coefficient of variation in a network driven by a constant stimulus (Fig. 4E). These additional results are discussed in text on page 7. 

      We have also taken on board the comment about variability of spiking being independent of the connectivity structure. We removed this claim in the revision, because we only tested a couple of specific cases where the connectivity is structured with respect to tuning similarity (fully structured, fully unstructured and partially unstructured networks). This is not exhaustive of all possible structures that recurrent connectivity may have.
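
      For readers who want to reproduce this kind of check, a minimal sketch of a coefficient-of-variation computation follows. It assumes the common definition (standard deviation of interspike intervals divided by their mean); the response does not spell out the exact estimator, and the function name and example spike times are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(spike_times):
    """CV of the interspike intervals (ISIs) of one spike train.

    Assumed definition: std(ISI) / mean(ISI). Spike times are in seconds
    and must be sorted in increasing order.
    """
    isis = [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes to estimate the CV")
    return pstdev(isis) / mean(isis)


# Hypothetical spike trains: a clock-like one and an irregular one.
regular = [0.1 * k for k in range(11)]
irregular = [0.0, 0.02, 0.30, 0.33, 0.70, 0.71, 1.00]

cv_regular = coefficient_of_variation(regular)
cv_irregular = coefficient_of_variation(irregular)
print(cv_regular, cv_irregular)
```

      A perfectly regular train gives a CV near zero, while Poisson-like spiking gives a CV near one; averaging the per-neuron value over trials would give a population estimate.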

      (15) Pg. 6: "we also removed the connectivity structure only partially, keeping like-to-like connectivity structure and removing all structure beyond like-to-like" - can you clarify what this means, perhaps using an equation? What connectivity structure is there besides like-to-like?

      In the optimal model, the strength of the synapse between a pair of neurons is proportional to the tuning similarity of the two neurons, Y<sub>ij</sub> proportional to J<sub>ij</sub> for Y<sub>ij</sub> >0 (see Eq. 24 and Fig. 1C(ii)). Besides networks with optimal connectivity, we also tested networks with a simpler connectivity rule. Such a simpler rule prescribes a connection if the pair of neurons has similar tuning (Y<sub>ij</sub> >0), and no connection otherwise. The strength of the connection following this simpler connectivity rule is otherwise random (and not proportional to pairwise tuning similarity Y<sub>ij</sub> as it is in the optimal network). We clarified this in the revision (page 8), also by avoiding the term “like-to-like” for the second type of networks, which could indeed be prone to confusion.
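
      The two connectivity rules described above can be contrasted in a short sketch. It assumes that the tuning similarity Y_ij is the dot product of the two neurons' tuning vectors and sets the proportionality constant to 1; both are illustrative assumptions, and the function names are hypothetical rather than taken from the authors' code.

```python
import random


def tuning_similarity(w_i, w_j):
    """Assumed form of Y_ij: dot product of the two tuning vectors."""
    return sum(a * b for a, b in zip(w_i, w_j))


def optimal_weight(w_i, w_j):
    """Optimal rule: strength proportional to Y_ij when Y_ij > 0, else no connection."""
    y = tuning_similarity(w_i, w_j)
    return y if y > 0 else 0.0


def simpler_rule_weight(w_i, w_j, rng):
    """Simpler rule: a connection exists iff Y_ij > 0, but its strength is random."""
    y = tuning_similarity(w_i, w_j)
    return rng.uniform(0.1, 1.0) if y > 0 else 0.0


rng = random.Random(0)
similar = ([1.0, 0.0, 0.0], [0.5, 0.5, 0.0])      # Y_ij = 0.5 > 0
dissimilar = ([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])  # Y_ij = -1 < 0

print(optimal_weight(*similar), simpler_rule_weight(*similar, rng))
print(optimal_weight(*dissimilar), simpler_rule_weight(*dissimilar, rng))
```

      Both rules produce the same pattern of present and absent connections; only the optimal rule ties the strength of a connection to the degree of tuning similarity.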

      (16) Pgs. 6-7: "we indeed found that optimal coding efficiency is achieved with weak adaptation in both cell types" and "adaptation in E neurons promotes efficient coding because it enforces every spike to be error-correcting" - this was not clear to me. First, it appears as though optimal efficiency is achieved without adaptation nor facilitation, i.e., when the time constants are all equal. Indeed, this is what is stated in Table 1. So is there really a weak adaptation present in the optimal case? Second, it seems that the network already enforces each spike to be error-correcting without adaptation, so why and how would adaptation help with this?

      We agree with the Reviewer that the network without adaptation in E and I neurons is already optimal. It is also true that most spikes in an optimal network should already be error-correcting (besides some spikes that might be caused by the noise). However, regimes with weak adaptation in E neurons remain close to optimality. Spike-triggered facilitation, meanwhile, adds spikes that are unnecessary and decrease network efficiency. We revised Fig. 5 (Fig. 4 in first submission) and replaced the 2-dimensional plots in Fig. 4 C-F with plots that show the differential effect of adaptation in E neurons (top) and in I neurons (bottom plots) for the measures of the encoding error (RMSE), the efficiency (average loss) and the firing rate (Fig. 5B-D). In the new Fig. 5C it is evident that the loss of E and I population grows slowly with adaptation in E neurons (top) while it grows faster with adaptation in I neurons (bottom). These considerations are explained in the revised text on page 9.

      (17) Pg. 7: "adaptation in E neurons resulted in an increase of the encoding error in E neurons and a decrease in I neurons" - it would be nice if the authors could provide any explanation or intuition for why this is the case. Could it perhaps be because the E population has fewer spikes, making the signal easier to track for the I population?

      We agree that this could indeed be the case. We commented on it in revision (page 9).

      (18) Pg. 7: "The average balance was precise...with strong adaptation in E neurons, and it got weaker when increasing the adaptation in I neurons (Figure 4E)" - I found the wording of this a bit confusing. Didn't the balance get stronger with larger I time constants?

      By increasing the time constant of I neurons, the average imbalance got weaker (closer to zero) in E neurons (Fig. 5G, left), but stronger (further away from zero) in I neurons (Fig. 5G, right). We have revised the text on page 9 to make this clearer.

      (19) Pg. 7: Figure 4F is not directly described in the text.

      We have now added text (page 9) commenting on this figure in revision.

      (20) Pg. 8: "indicating that the recurrent network dynamics generates substantial variability even in the absence of variability in the external current" -- how does this observation relate to your earlier claim (which I noted above) that "variability of spiking is independent of connectivity structure"?

      We agree that the claim about variability of spiking being independent of connectivity structure was overstated and we thus removed it. The observation that we wanted to report is that both structured and unstructured networks have very similar levels of variability of spiking of single neurons. The fact that much of the variability of the optimal network is generated by recurrent connections is not incompatible. We revised the related text (page 11) for clarity.

      (21) Pg. 9: "We found that in the optimally efficient network, the mean E-I and I-E synaptic efficacy are exactly balanced" - isn't this by design based on the derivation of the network?

      True, the I-E connectivity matrix is the transpose of the E-I connectivity matrix, and their means are the same by the analytical solution. This however remains a finding of our study. We have clarified this in the revised text (page 12).

      (22) Pg. 30, eq. 25: the authors should verify if they include all possible connectivity here, or if they exclude EE connectivity beforehand.

      We now specify that the equation for recurrent connectivity (Eq. 24, Eq. 25 in first submission) does not include the E-E connectivity in the revised text (page 41).

      Reviewer #3 (Recommendations For The Authors):

      Essential

      (1)  Currently, they measure the RMSE and cost of the E and I population separately, and the 1CT model. Then, they average the losses of the E and I populations, and compare that to the 1CT model, with the conclusion that the 1CT model has a higher average loss. However, it seems to me that only the E population should be compared to the 1CT model. The I population loss determines how well the I population can represent the E population representation (which it can do extremely well). But the overall coding accuracy of the network of the input signal itself is only represented by the E population. Even if you do combine the E and I losses, they should be summed, not averaged. I believe a more fair conclusion would be that the E/I networks have generally slightly worse performance because of needing to follow Dale's law, but are still highly efficient and precise nonetheless. Of course, I might be making a critical error somewhere above, and happy to be convinced otherwise!

      We carefully considered the reviewer's comment and tested different ways of combining the losses of the E and I population. We decided to follow the reviewer's suggestion and to compare the loss of the E population of the E-I model with the loss of the one cell type model. As is already evident from Fig. 8G, such a comparison indeed changes the result to make the 1CT model more efficient. Also, the sum of losses of E and I neurons results in the 1CT model being more efficient than the E-I model. Note, however, the robustness of the E-I model to changes in the metabolic constant (Fig. 6C, top). The firing rates of the E-I model stay within physiological ranges for any value of the metabolic constant, while the firing rate of the 1CT model skyrockets when the metabolic constant is lower than optimal (Fig. 8I).

      We added to Results (page 14) a summary of these findings.

      (2) The methods and main text should make much clearer what aspects of the derivation are novel, and which are not novel (see review weaknesses for specifics).

      We specified these aspects, as discussed in more detail in the above reply to point 4 of the public review of Reviewer 1.

      Request:

      If possible, I would like to see the code before publication and give recommendations on that (is it easy to parse and reproduce, etc.)

      We are happy to share the computer code with the reviewer and the community. We added a link to our public repository containing the computer code that we used for simulations and analysis to the preprint and submission (section “Code availability” on page 17). 

      Suggestions:

      (1) I believe that for an eLife audience, the main text is too math-heavy at the beginning, and it could be much simplified, or more effort could be made to guide the reader through the math.

      We tried to do our best to improve the clarity of description of mathematical expressions in the main text.

      (2) Generally vector notation makes network equations for spiking neurons much clearer and easier to parse, I would recommend using that throughout the paper (and not just in the supplementary methods).

      We now use vector notation throughout the paper whenever we think that this improves the intelligibility of the text. 

      (3) In the discussion or at the end of the results adding a clear section summarizing what the minimal requirements or essential assumptions are for biological networks to implement this theory would be helpful for experimentalists and theorists alike.

      We have added such a section in Discussion (page 15). 

      (5) I think the title is a bit too cumbersome and hard to parse. Might I suggest something like 'Efficient coding and energy use in biophysically realistic excitatory-inhibitory spiking networks' or 'Biophysically constrained excitatory-inhibitory spiking networks can efficiently implement efficient coding'.

      We followed reviewer’s suggestion and changed the title to “Efficient coding in biophysically realistic excitatory-inhibitory spiking networks.”

      (6) How the connections were shuffled exactly was not clear to me in how it was described now. Did they just take the derived connectivity, and shuffle the connections around? I recommend a more explicit methods section on it (I might have missed it).

      Indeed, the connections of the optimal network were randomly shuffled, without repetition, between all neuronal pairs of a specific connectivity matrix. This preserves all properties of the distribution of connectivity weights and only removes the structure of the connectivity, which is precisely what we wanted to test. We now added a section in Methods (“Removal of connectivity structure”) on pages 51-52 where we explain how the connectivity structure is removed.
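
      The shuffling procedure described above can be sketched as a permutation of the entries of one connectivity matrix. This is an illustrative sketch under that reading, not the authors' code; the function name and the example matrix are hypothetical.

```python
import random


def shuffle_connectivity(matrix, rng):
    """Permute all entries of one connectivity matrix without repetition.

    Every weight is reused exactly once, so the distribution of weights is
    unchanged; only the assignment of weights to neuron pairs is randomized.
    """
    flat = [w for row in matrix for w in row]
    rng.shuffle(flat)  # in-place random permutation
    n_cols = len(matrix[0])
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


# Hypothetical 3x4 connectivity matrix (e.g. I-to-E weights).
structured = [
    [0.9, 0.0, 0.2, 0.0],
    [0.0, 0.7, 0.0, 0.1],
    [0.3, 0.0, 0.8, 0.0],
]
shuffled = shuffle_connectivity(structured, random.Random(42))
print(shuffled)
```

      Because the shuffle is a permutation, summary statistics such as the mean and variance of the weights are identical before and after shuffling.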

      (7) Figure 1 sub-panel ordering was confusing to read (first up down, then left right). Not sure if rearranging is possible, but perhaps it could be A, B, and C at the top, with subsublabels (i) and (ii). Might become too busy though.

      We followed this suggestion and rearranged the Fig. 1 as suggested by the reviewer. 

      (8) Equation 3 in the main text should specify that 'y' stands for either E or I.

      This has been specified in the revision (page 3). 

      (9) Figure 1D shows a rough sketch of the types of connectivities that exist, but I would find it very useful to also see the actual connection strengths and the effect of enforcing Dale's law.

      We revised this figure (now Fig. 1B (ii)) and added connection strengths as well as a sketch of a connection that was removed because of Dale’s law.

      (10) The main text mentions how the readout weights are defined (normal distributions), but I think this should also be mentioned in the methods.

      Agreed. We indeed had a Methods section, “Parametrization of synaptic connectivity” (page 46), where we explain how readout weights are defined. We apologize if the reference to this section was not salient enough in the first submission. We made sure that the revised main text contains a clear pointer to this Methods section for details.

      (11) The text seems to mix ‘decoding weights’ and ‘readout weights’.

      Thanks for this suggestion to use consistent language. We opted for ‘decoding weights’ and removed ‘readout weights’.

      (12) The way the paper is written makes it quite hard to parse what are new experimental predictions, and what results reproduce known features. I wonder if some sort of 'box' is possible with novel predictions that experimentalists could easily look at and design an experiment around.

      We now revised the text. We clarified for every property of the model if this property is a prediction of facts that were not yet experimentally tested or if it accounts for previously observed properties of biological neurons. Please see the reply to point 4 of Reviewer 1. 

      (13) Typo's etc.:

      Page 5 bottom -- ("all") should have one of the quotes change direction (common latex typo, seems to be the only place with the issue).

      We thank the reviewer for pointing out this typo that has been removed in revision.

    1. There are multiple choices for the Proxy App in this case: Option 1: using the single TAC ProxyApp on TON to relay users’ transactions on TAC EVM extension network. In this case, you can use the TAC SDK to easily implement the frontend with TON Connect. Option 2: you can build your own TAC Proxy App on TON and build a frontend that will prepare and execute transactions for your proxy app. Note: TAC SDK (Option 1) is recommended if you don’t know how to write secure code in FunC (TON Native language). Opt for a custom Proxy App only if you need specific business logic to be executed on TON side before triggering Solidity code deployed on TAC EVM.

      remove this

    1. Auto-Generation Process Proxy apps are automatically generated when you integrate your EVM application with TAC. You don’t need to write proxy contract code yourself - TAC handles this automatically.

      remove this - tac team is working on this planned on roadmap

    1. API Access On the frontend side, developers can access the TON network through two public endpoints: Mainnet: https://toncenter.com/api/v2/jsonRPC Testnet: https://testnet.toncenter.com/api/v2/jsonRPC To deploy Solidity code and directly access the TAC EVM Layer, developers can rely on public RPC endpoints that implement the standard EVM JSON-RPC interface: Mainnet: to-be-defined Testnet (Turin): [Coming Soon - later this week] Third-party providers like Ankr offer private RPC endpoints and augmented APIs.

      remove this

    1. There is no agreed theory of language--as elementary mechanics, for example, is an agreed theory. Later on in this book we will see why there cannot be.

      In this small section the writer claims to understand that when pertaining to the Theory of Language there is no universal code/law that readers or philosophers can come together and claim. The theory isn't a static subject with its own concrete law and order that's followed like the agreed upon elementary mechanics.

    1. # calculating the monthly average temperature for each month

      you compute it correctly, but it wasn't necessary to plot every month separately. If you want to check it for every month, I would recommend to just use a for loop, i.e.: for m in np.arange(1,12.1,1): TM_month= TM.sel(month=m)-273.15 .... This avoids repeating similar code...
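
      A minimal sketch of the loop suggested above follows. To stay self-contained it uses a plain dictionary as a stand-in for the climatology (the values are hypothetical); with an xarray object, the loop body would be the `TM.sel(month=m) - 273.15` expression from the comment. Note that `range(1, 13)` yields the integers 1 to 12, whereas `np.arange(1, 12.1, 1)` yields floats.

```python
KELVIN_OFFSET = 273.15

# Hypothetical stand-in for the monthly climatology TM: month -> mean temperature in kelvin.
monthly_mean_kelvin = {month: 270.0 + 2.5 * month for month in range(1, 13)}

monthly_mean_celsius = {}
for month in range(1, 13):
    # One loop body replaces twelve near-identical copies of the same code.
    monthly_mean_celsius[month] = monthly_mean_kelvin[month] - KELVIN_OFFSET

for month, temperature in monthly_mean_celsius.items():
    print(f"month {month:2d}: {temperature:6.2f} degC")
```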

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: The manuscript by Yang et al. describes a new CME accessory protein. CCDC32 has been previously suggested to interact with AP2 and in the present work the authors confirm this interaction and show that it is a bona fide CME regulator. In agreement with its interaction with AP2, CCDC32 recruitment to CCPs mirrors the accumulation of clathrin. Knockdown of CCDC32 reduces the amount of productive CCPs, suggestive of a stabilisation role in early clathrin assemblies. Immunoprecipitation experiments mapped the interaction of CCDC32 to the α-appendage of the AP2 complex α-subunit. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facial-neuro-developmental syndrome disrupt the interaction of this protein to the AP2 complex. The manuscript is well written and the conclusions regarding the role of CCDC32 in CME are supported by good quality data. As detailed below, a few improvements/clarifications are needed to reinforce some of the conclusions, especially the ones regarding CFNDS.

      Response: We thank the referee for their positive comments. In light of a recently published paper describing CCDC32 as a co-chaperone required for AP2 assembly (Wan et al., PNAS, 2024, see reviewer 2), we have added several additional experiments to address all concerns and consequently gained further insight into CCDC32-AP2 interactions and the important dual role of CCDC32 in regulating CME.

      Major comments:

      1) Why could the protein only be visualized at CCPs after knockdown of the endogenous protein? This is highly unusual, especially on stable cell lines. Could this be that the tag is interfering with the expressed protein function, rendering it incapable of outcompeting the endogenous? Does this point to a regulated recruitment?

      Response: The reviewer is correct, this would be unusual; however, it is not the case. We misspoke in the text (although the figure legend was correct): these experiments were performed without siRNA knockdown, and we can indeed detect eGFP-CCDC32 being recruited to CCPs in the presence of endogenous protein. Nonetheless, we repeated the experiment to be certain.

      2) The disease mutation used in the paper does not correspond to the truncation found in patients. The authors use an 1-54 truncation, but the patients described in Harel et al. have frame shifts at the positions 19 (Thr19Tyrfs*12) and 64 (Glu64Glyfs*12), while the patient described in Abdalla et al. has the deletion of two introns, leading to a frameshift around amino acid 90. Moreover, to precisely test the function of these disease mutations, one would need to add the extra amino acids generated by the frame shift. For example, as denoted in the mutation description in Harel et al., the frameshift at position 19 changes the Threonine 19 to a Tyrosine and adds a run of 12 extra amino acids (Thr19Tyrfs*12).

      Response: The labels of the disease mutants p.(Thr19Tyrfs∗12) and p.(Glu64Glyfs∗12) are based on a 194aa polypeptide version of CCDC32 initiated at a nonconventional start site that contains a 9 aa peptide (VRGSCLRFQ) upstream of the N-terminus we show. Thus, we are indeed using the appropriate mutation site (see: https://www.uniprot.org/uniprotkb/Q9BV29/entry). The reviewer is correct that we have not included the extra 12 aa in our construct; however, as these residues are not present in the other CFNDS mutants, we think it unlikely that they contribute to the disease phenotype. Rather, as neither of the clinically observed mutations contains the 78-98 aa sequence required for AP2 binding and CME function, we are confident that this defect contributed to the disease. Thus, we are including the data on the CCDC32(1-54) mutant, as we believe these results provide a valuable physiological context to our studies.
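
      The numbering argument above can be checked with a short sketch. It assumes that the two numbering schemes differ only by the 9-residue N-terminal extension quoted in the response, so that positions shift by a constant offset; the function name is hypothetical.

```python
# 9-residue peptide upstream of the conventional N-terminus, as quoted in the response.
UPSTREAM_PEPTIDE = "VRGSCLRFQ"
OFFSET = len(UPSTREAM_PEPTIDE)


def to_conventional_numbering(position_in_194aa_form):
    """Map a residue position in the 194 aa form to the shorter form (assumed constant offset)."""
    return position_in_194aa_form - OFFSET


# Frameshift positions as labeled in the patient mutations (194 aa numbering).
for label, position in [("Thr19Tyrfs*12", 19), ("Glu64Glyfs*12", 64)]:
    first_altered = to_conventional_numbering(position)
    last_native = first_altered - 1
    print(f"{label}: first altered residue {first_altered}, native residues 1-{last_native}")
```

      Under this assumption, the Glu64 frameshift leaves residues 1-54 of the shorter form intact, which matches the CCDC32(1-54) construct discussed above.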

      3) The frameshift caused by the CFNDS mutations (especially the one studied) will likely lead to nonsense mediated RNA decay (NMD). The frameshift is well within the rules where NMD generally kicks in. Therefore, I am unsure about the functional insights of expressing a disease-related protein which is likely not present in patients.

      Response: We thank the reviewer for bringing up this concern. However, as shown in new Figure S1, the mutant protein is expressed at levels comparable to the WT, suggesting that NMD is not occurring.

      4) Coiled coils generally form stable dimers. The typically hydrophobic core of these structures is not suitable for transient interactions. This complicates the interpretation of the results regarding the role of this region as the place where the interaction to AP2 occurs. If the coiled coil holds a stable CCDC32 dimer, disrupting this dimer could reduce the affinity to AP2 (by reduced avidity) to the actual binding site. A construct with an orthogonal dimeriser or a pulldown of the delta78-98 protein with the GST AP2a-AD could be a good way to sort this issue.

      Response: We were unable to model a stable dimer (or other oligomer) of this protein with high confidence using AlphaFold 3.0. Moreover, we were unable to detect endogenous CCDC32 co-immunoprecipitating with eGFP-CCDC32 (Fig. S6C). Thus, we believe that the moniker, based solely on the alpha-helical content of the protein, is a misnomer. We have explained this in the main text.

      Minor comments:

      1) The authors interchangeably use the term "flat CCPs" and "flat clathrin lattices". While these are indeed related, flat clathrin lattices have been also used to refer to "clathrin plaques". To avoid confusion, I suggest sticking to the term "flat CCPs" to refer to the CCPs which are in their early stages of maturation.

      Response: Agreed. Thank you for the suggestion. We have renamed these structures flat clathrin assemblies, as they do not acquire the curvature needed to classify them as pits, and do not grow to the size that would classify them as plaques.

      Significance

      General assessment: CME drives the internalisation of hundreds of receptors and surface proteins in practically all tissues, making it an essential process for various physiological processes. This versatility comes at the cost of a large number of molecular players and regulators. To understand this complexity, unravelling all the components of this process is vital. The manuscript by Yang et al. gives an important contribution to this effort as it describes a new CME regulator, CCDC32, which acts directly at the main CME adaptor AP2. The link to disease is interesting, but the authors need to refine their experiments. The requirement for endogenous knockdown for recruitment of the tagged CCDC32 is unusual and requires further exploration.

      Advance: The increased frequency of abortive events presented by CCDC32 knockdown cells is very interesting, as it hints to an active mechanism that regulates the stabilisation and growth of clathrin coated pits. The exact way clathrin coated pits are stabilised is still an open question in the field.

      Audience: This is a basic research manuscript. However, given the essential role of CME in physiology and the growing number of CME players involved in disease, this manuscript can reach broader audiences.

      Response: We thank the referee for recognizing the 'interesting' advances our studies have made and for considering these studies as 'an important contribution' to 'an essential process for various physiological processes' and able 'to reach broader audiences'. We have addressed and reconciled the reviewer's concerns in our revised manuscript.

      Field of expertise of the reviewer: Clathrin mediated endocytosis, cell biology, microscopy, biochemistry.


      Reviewer #2

      Evidence, reproducibility and clarity

      In this manuscript, the authors demonstrate that CCDC32 regulates clathrin-mediated endocytosis (CME). Some of the findings are consistent with a recent report by Wan et al. (2024 PNAS), such as the observation that CCDC32 depletion reduces transferrin uptake and diminishes the formation of clathrin-coated pits. The primary function of CCDC32 is to regulate AP2 assembly, and its depletion leads to AP2 degradation. However, this study did not examine AP2 expression levels. CCDC32 may bind to the appendage domain of AP2 alpha, but it also binds to the core domain of AP2 alpha. Overall, while this work presents some interesting ideas, it remains unclear whether CCDC32 regulates AP2 beyond the assembly step.

      Response: We thank the reviewer for drawing our attention to the Wan et al. paper, that appeared while this work was under review. However, our in vivo data are not fully consistent with the report from Wan et al. The discrepancies reveal a dual function of CCDC32 in CME that was masked by complete knockout vs siRNA knockdown of the protein, and also likely affected by the position of the GFP-tag (C- vs N-terminal) on this small protein. Thus:

      • Contrary to Wan et al., we do not detect any loss of AP2 expression (see new Figure S3A-B) upon siRNA knockdown. Most likely the ~40% residual CCDC32 present after siRNA knockdown is sufficient to fulfill its catalytic chaperone function but not its structural role in regulating CME beyond the AP2 assembly step.
      • Contrary to Wan et al., we have shown that CCDC32 indeed interacts with the intact AP2 complex (Figure S3C and 6B,C), showing that all 4 subunits of the AP2 complex co-IP with full length eGFP-CCDC32. Interestingly, whereas the full length CCDC32 pulls down the intact AP2 complex, the ∆78-98 mutant retains its ability to pull down the β2-µ2 hemicomplex, but its interactions with α:σ2 are severely reduced. While this result is consistent with the report of Wan et al. that CCDC32 binds to the α:σ2 hemi-complex, it also suggests that the interactions between CCDC32 and AP2 are more complex and will require further studies.
      • Contrary to Wan et al., we provide strong evidence that CCDC32 is recruited to CCPs. Interestingly, modeling with AlphaFold 3.0 identifies a highly probable interaction between alpha helices encoded by residues 66-91 on CCDC32 and residues 418-438 on α. The latter are masked by µ2-C in the closed conformation of the AP2 core, but exposed in the open conformation triggered by cargo binding, suggesting that CCDC32 might only bind to membrane-bound AP2. Thus, our findings are indeed novel and indicate striking multifunctional roles for CCDC32 in CME, making the protein well worth further study.

      • Besides its role in AP2 assembly, CCDC32 may potentially have another function on the membrane. However, there is no direct evidence showing that CCDC32 associates with the plasma membrane.

      Response: We disagree, our data clearly shows that CCDC32 is recruited to CCPs (Fig. 1B) and that CCPs that fail to recruit CCDC32 are short-lived and likely abortive (Fig. 1C). Wan et al. did not observe any colocalization of C-terminally tagged CCDC32 to CCPs, whereas we detect recruitment of our N-terminally tagged construct, which we also show is functional (Fig. 6F). Further, we have demonstrated the importance of the C-terminal region of CCDC32 in membrane association (see new Fig. S7). Thus, we speculate that a C-terminally tagged CCDC32 might not be fully functional. Indeed, SIM images of the C-terminally-tagged CCDC32 in Wan et al., show large (~100 nm) structures in the cytosol, which may reflect aggregation.

      CCDC32 binds to multiple regions on AP2, including the core domain. It is important to distinguish the functional roles of these different binding sites.

      Response: We have localized the AP2-ear binding region to residues 78-99 and shown these to be critical for the functions we have identified. As described above we now include data that are complementary to those of Wan et al. However, our data also clearly point to additional binding modalities. We agree that it will be important to map these additional interactions and identify their functional roles, but this is beyond the scope of this paper.

      AP2 expression levels should be examined in CCDC32 depleted cells. If AP2 is gone, it is not surprising that clathrin-coated pits are defective.

      Response: Agreed. We have examined this by western blotting (Figure S3A-B) and detect no reduction in the levels of any of the AP2 subunits in CCDC32 siRNA knockdown cells. As stated above, this could be due to residual CCDC32 present in the siRNA KD vs. the CRISPR-mediated gene KO.

      If the authors aim to establish a secondary function for CCDC32, they need to thoroughly discuss the known chaperone function of CCDC32 and consider whether and how CCDC32 regulates a downstream step in CME.

      Response: Agreed. We have described the Wan et al. paper, which came out while our manuscript was in review, in our Introduction. As described above, there are areas of agreement and of discrepancy, which are thoroughly documented and discussed throughout the revised manuscript.

      The quality of Figure 1A is very low, making it difficult to assess the localization and quantify the data.

      Response: The low signal:noise in Fig. 1A that the reviewer is concerned about is due to a diffuse distribution of CCDC32 on the inner surface of the plasma membrane. We now more explicitly describe this binding, which we believe reflects a specific interaction mediated by the C-terminus of CCDC32; thus the degree of diffuse membrane binding we observe follows: eGFP-CCDC32(FL) > eGFP-CCDC32(∆78-98) > eGFP-CCDC32(1-54) ~ eGFP/background (see new Fig. S7). Importantly, the colocalization of CCDC32 at CCPs is confirmed by the dynamic imaging of CCPs (Fig. 1B).

      In Figure 6, why aren't AP2 mu and sigma subunits shown?

      Response: Agreed. Not being aware of CCDC32's possible dual role as a chaperone, we had assumed that the AP2 complex was intact. We have now added this data in Figure 6 B,C and Fig. S3C, as discussed above.

      Page 5, top, this sentence is confusing: "their surface area (~17 x 10 nm2) remains significantly less than that required for the average 100 nm diameter CCV (~3.2 x 103 nm2)."

      Response: Thank you for the criticism. We have clarified the sentence and corrected a typo, which would definitely be confusing. The section now reads, "While the flat CCSs we detected in CCDC32 knockdown cells were significantly larger than in control cells (Fig. 4D, mean diameter of 147 nm vs. 127 nm, respectively), they are much smaller than typical long-lived flat clathrin lattices (d ≥ 300 nm) (Grove et al., 2014). Indeed, the surface area of the flat CCSs that accumulate in CCDC32 KD cells (mean ~1.69 × 10⁴ nm²) remains significantly less than the surface area of an average 100 nm diameter CCV (~3.14 × 10⁴ nm²). Thus, we refer to these structures as 'flat clathrin assemblies' because they are neither curved 'pits' nor large 'lattices'. Rather, the flat clathrin assemblies represent early, likely defective, intermediates in CCP formation."
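
      As an illustrative arithmetic check of the two areas quoted in the revised sentence (a sketch only, not part of the authors' analysis), the snippet below assumes that a flat CCS can be approximated as a disc of the mean measured diameter (147 nm) and a CCV as a sphere of 100 nm diameter. The function and variable names are hypothetical. Under these assumptions the disc area comes out close to the quoted mean of roughly 1.69 × 10⁴ nm², and the sphere surface area matches the quoted 3.14 × 10⁴ nm².

```python
import math


def disc_area(diameter_nm: float) -> float:
    """Area (nm^2) of a flat circular patch with the given diameter."""
    radius = diameter_nm / 2.0
    return math.pi * radius ** 2


def sphere_surface_area(diameter_nm: float) -> float:
    """Surface area (nm^2) of a sphere with the given diameter."""
    radius = diameter_nm / 2.0
    return 4.0 * math.pi * radius ** 2


# Assumption: a flat CCS is modelled as a disc of the mean diameter (147 nm).
flat_ccs_area = disc_area(147.0)
# Assumption: a CCV is modelled as a sphere of 100 nm diameter.
ccv_area = sphere_surface_area(100.0)

print(f"flat CCS (disc, d = 147 nm): {flat_ccs_area:.3e} nm^2")
print(f"CCV (sphere, d = 100 nm):    {ccv_area:.3e} nm^2")
print(f"ratio flat CCS / CCV:        {flat_ccs_area / ccv_area:.2f}")
```

      The small difference between the disc estimate and the quoted mean is expected, because the mean of individual areas need not equal the area computed from the mean diameter.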

      Significance

      Please see above. (From above: Overall, while this work presents some interesting ideas, it remains unclear whether CCDC32 regulates AP2 beyond the assembly step.)

      Response: Our responses above argue that we have indeed established that CCDC32 regulates AP2 beyond the assembly step. We have also identified several discrepancies between our findings and those reported by Wan et al., most notably binding between CCDC32 and mature AP2 complexes and the AP2-dependent recruitment of CCDC32 to CCPs. It is possible that these discrepancies are due to the position of the GFP tag (ours is N-terminal, theirs is C-terminal; we show that the N-terminally tagged CCDC32 rescues the knockdown phenotype, while Wan et al. do not provide evidence for functionality of the C-terminal construct).

      Reviewer #3

      Evidence, reproducibility and clarity (Required):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding between CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull down experiments. Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, known to play a role in CFNDS, is also addressed in this study and shown to have endocytic defects.

      Response: We thank the reviewer for their positive remarks regarding the quality of our data and the strength of our conclusions.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2, whereby the following major and minor points remain to be addressed:

      • The authors show that CCDC32 depletion leads to the formation of brighter and static clathrin coated structures (Figure 2), but that these were only prevalent to 7.8% and masked the 'normal' dynamic CCPs. At the same time, the authors show that the absence of CCDC32 induces pits with shorter life times (Figure 1 and Figure 2), the 'majority' of the pits. Clarification is needed as to how the authors arrive at these conclusions and these numbers. The authors should also provide (and visualize) the corresponding statistics. The same statement is made again later on in the manuscript, where the authors explain their electron microscopy data. Was the number derived from there?

      These points are critical to understanding CCDC32's role in endocytosis and is key to understanding the model presented in Figure 8. The numbers of how many pits accumulate in flat lattices versus normal endocytosis progression and the actual time scales could be included in this model and would make the figure much stronger.

      Response: Thank you for these comments. We understand the paradox between the visual impression and the reality of our dynamic measurements. We have been visually misled by this in previous work (Chen et al., 2020), which emphasizes the importance of unbiased image analysis afforded to us through the well-documented cmeAnalysis pipeline, developed by us (Aguet et al., 2013) and now used by many others (e.g. (He et al., 2020)).

      The % of static structures was not derived from electron microscopy data, but quantified using cmeAnalysis, which automatedly provides the lifetime distribution of CCPs. We have now clarified this in the manuscript and added a histogram (Fig. S4) quantifying the fraction of CCPs in lifetime cohorts 150s (static).

      • In relation to the above point, the statistics of Figure 2E-G and the analysis leading there should also be explained in more detail: For example, what are the individual points in the plot (also in Figures 6G and 7G)? The authors should also use a few phrases to explain software they use, for example DASC, in the main text.

      Response: Each point in these bar graphs represents a movie, where n ≥ 12. These details have been added to the respective figure legends. We have also added a brief description of DASC analysis in the text.

      • There are several questions related to the knock-down experiments that need to be addressed:

      Firstly, knock-down of CCDC32 does not seem to be very strong (Figure S2B). Can the level of knock-down be quantified?

      Response: We have now quantified the KD efficiency. It is ~60%. This turns out to be fortuitous (see responses to reviewer 2), as a recent publication, which came out after we completed our study, has shown by CRISPR-mediated knockout that CCDC32 also plays an essential chaperone function required for AP2 assembly. We do not see any reduction in AP2 levels or its complex formation under our conditions (see new Supplemental Figure S3), which suggests that the effects of CCDC32 on CCP dynamics are more sensitive to CCDC32 concentration than its role as a chaperone. Our phenotypes would have been masked by more efficient depletion of CCDC32.

      In page 6 it is indicated that the eGFP-CCDC32(1-54) and eGFP-CCDC32(∆78-98) constructs are siRNA-resistant. However in Fig S2B, these proteins do not show any signal in the western blot, so it is not clear if they are expressed or simply not detected by the antibody. The presence of these proteins after silencing endogenous CCDC32 needs to be confirmed to support Figures 6 and Figures 7, which critically rely on the presence of the CCDC32 mutants.

      Response: Unfortunately, the C-terminally truncated CCDC32 proteins are not detected because they lack the antibody epitope; indeed, even the ∆78-98 deletion is poorly detected (compare the GFP blot in new S1A with the anti-CCDC32 blot in S1B). However, these constructs contain the same siRNA-resistance mutation as the full-length protein. That they are expressed and siRNA-resistant can be seen in Fig. S2A (now Fig. S1A) blotting for GFP.

      In Figures 6 and 7, siRNA knock-down of CCDC32 is only indicated for sub-figures F to G. Is this really the case? If not, the authors should clarify. The siRNA knock-down in Figure 1 is also only mentioned in the text, not in the figure legend. The authors should pay attention to make their figure legends easy to understand and unambiguous.

      Response: No, it is not the case. Thank you for pointing out the uncertainty. We have added these details to the Figure legends and checked all Figure legends to ensure that they clearly describe the data shown.

      • It is not exactly clear how the curves in Figure 3C (lower panel) on the invagination depth were obtained. Can the authors clarify this a bit more? For example, what are kT and kE in Figure 3A? What is I0? And how did the authors derive the logarithmic function used to quantify the invagination depth? In the main text, the authors say that the traces were 'logarithmically transformed'. This is not a technical term. The authors should refer to the actual equation used in the figure.

      Response: This analysis was developed by the Kirchhausen lab (Saffarian and Kirchhausen, 2008). We have added these details and reference them in the Figure legend and in the text. We also now use the more accurate descriptor 'log-transformed'.

      • In the discussion, the claim 'The resulting dysregulation of AP2 inhibits CME, which further results in the development of CFNDS.' is maybe a bit too strong of a statement. Firstly, because the authors show themselves that CME is perturbed, but by no means inhibited. Secondly, the molecular link to CFNDS remains unclear. Even though CCDC32 mutants seem to be responsible for CFNDS and one of the mutant has been shown in this study to have a defect in endocytosis and AP2 binding, a direct link between CCDC32's function in endocytosis and CFNDS remains elusive. The authors should thus provide a more balanced discussion on this topic.

      Response: We have modified and softened our conclusions, which now read that the phenotypes we see likely "contribute to" rather than "cause" the disease.

      • In Figure S1, the authors annotate the presence of a coiled-coil domain, which they also use later on in the manuscript to generate mutations. Could the authors specify (and cite) where and how this coiled-coil domain has been identified? Is this predicted helix indeed a coiled-coil domain, or just a helix, as indicated by the authors in the discussion?

      Response: See response to Reviewer 1, point 4. We have changed this wording to alpha-helix. The 'coiled-coil' reference is historical and unlikely to be a true reflection of CCDC32 structure. AlphaFold 3.0 predictions were unable to identify with certainty any coiled-coil structures, even if we modelled potential dimers or trimers, and we find no evidence of dimerization of CCDC32 in vivo. We have clarified this in the text.

      Minor comments

      • In general, a more detailed explanation of the microscopy techniques used and the information they report would be beneficial to provide access to the article also to non-expert readers in the field. This concerns particularly the analysis methods used, for example: How were the cohort-averaged fluorescence intensity and lifetime traces obtained? How do the tools cmeAnalysis and DASC work? A brief explanation would be helpful.

      Response: We have expanded Methods to add these details, and also described them in the main text.

      • The axis label of Figure 2B is not quite clear. What does 'TfnR uptake % of surface bound' mean? Maybe the authors could explain this in more detail in the figure legend? Is the drop in uptake efficiency also accessible by visual inspection of the images? It would be interesting to see that.

      Response: This is a standard measure of CME efficiency. 'TfnR uptake % of surface bound' = Internalized TfnR/Surface bound TfnR. Again, images may be misleading as defects in CME lead to increased levels of TfnR on the cell surface, which in turn would result in more Tfn uptake even if the rate of CME is decreased.
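
      The ratio defined in this response can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and in arbitrary units; they are not values from the manuscript. The example is chosen only to show how the absolute amount of internalized TfnR can rise while the efficiency (internalized relative to surface-bound) falls, which is why raw images may mislead.

```python
def uptake_percent_of_surface_bound(internalized: float, surface_bound: float) -> float:
    """Internalized TfnR expressed as a percentage of surface-bound TfnR."""
    if surface_bound <= 0:
        raise ValueError("surface_bound must be positive")
    return 100.0 * internalized / surface_bound


# Hypothetical control cell: 100 units on the surface, 40 units internalized.
control = uptake_percent_of_surface_bound(internalized=40.0, surface_bound=100.0)

# Hypothetical CME-defective cell: more receptor accumulates on the surface
# (200 units), so more is internalized in absolute terms (50 units),
# yet the fraction taken up is lower.
defective = uptake_percent_of_surface_bound(internalized=50.0, surface_bound=200.0)

print(f"control:   {control:.1f}% of surface bound")
print(f"defective: {defective:.1f}% of surface bound")
```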

      • Figure 4: How is the occupancy of CCPs in the plasma membrane measured? What are the criteria used to divide CCSs into Flat, Dome or Sphere categories?

      Response: We have expanded Methods to add these details. Based on the degree of invagination, the shapes of CCSs were classified as either: flat CCSs with no obvious invagination; dome-shaped CCSs that had a hemispherical or less invaginated shape with visible edges of the clathrin lattice; and spherical CCSs that had a round shape with the edges of the clathrin lattice not visible in 2D projection images. In most cases, the shapes were obvious in 2D PREM images. In uncertain cases, the degree of CCS invagination was determined using images tilted at ±10-20 degrees. The areas of CCSs were measured using ImageJ and used for the calculation of the CCS occupancy on the plasma membrane.
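
      The response does not spell out the occupancy formula. A minimal sketch follows, assuming that occupancy is the summed CCS area expressed as a percentage of the imaged plasma membrane area. That definition, the function name, and the example values are all assumptions made for illustration and are not taken from the manuscript's Methods.

```python
from typing import Sequence


def ccs_occupancy_percent(ccs_areas_nm2: Sequence[float], membrane_area_nm2: float) -> float:
    """Summed CCS area as a percentage of the imaged membrane area.

    Assumed definition: occupancy = 100 * sum(CCS areas) / membrane area.
    """
    if membrane_area_nm2 <= 0:
        raise ValueError("membrane_area_nm2 must be positive")
    return 100.0 * sum(ccs_areas_nm2) / membrane_area_nm2


# Hypothetical areas (nm^2) of three CCSs traced in one image.
example_ccs_areas = [1.7e4, 1.2e4, 0.9e4]
# Hypothetical imaged membrane area (nm^2), i.e. 1 square micrometre.
example_membrane_area = 1.0e6

occupancy = ccs_occupancy_percent(example_ccs_areas, example_membrane_area)
print(f"CCS occupancy: {occupancy:.2f}% of membrane area")
```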

      • Figure 5B: Can the authors explain, where exactly the GFP was engineered into AP2 alpha? This construct does not seem to be explained in the methods section.

      Response: We have added this information. The construct, which corresponds to an insertion of GFP into the flexible hinge region of AP2, at aa649, was first described by (Mino et al., 2020) and shown to be fully functional. This information has been added to the Methods section.

      • Figure S1B: The authors should indicate the colour code used for the structural model.

      Response: We have expanded our structural modeling using AlphaFold 3.0 in light of the recent publication suggesting that CCDC32 interacts with the µ2 subunit and does not bind full-length AP2. These results are described in the text. The color coding now reflects certainty values given by AlphaFold 3.0 (Fig. S6B, D).

      • The list of primers referred to in the materials and methods section does not exist. There is a Table S1, but this contains different data. The actual Table S1 is not referenced in the main text. This should be done.

      Response: We apologize for this error. We have now added this information in Table S2.

      Significance (Required):

      In this study, the authors analyse a so-far poorly understood endocytic accessory protein, CCDC32, and its implication for endocytosis. The experimental tool set used, allowing to quantify CCP dynamics and invagination is clearly a strength of the article that allows assessing the impact of an accessory protein towards the endocytic uptake mechanism, which is normally very robust towards mutations. Only through this detailed analysis of endocytosis progression could the authors detect clear differences in the presence and absence of CCDC32 and its mutants. If the above points are successfully addressed, the study will provide very interesting and highly relevant work allowing a better understanding of the early phases in CME with implication for disease.

      The study is thus of potential interest to an audience interested in CME, in disease and its molecular reasons, as well as for readers interested in intrinsically disordered proteins to a certain extent, claiming thus a relatively broad audience. The presented results may initiate further studies of the so-far poorly understood and less well known accessory protein CCDC32.

      Response: We thank the reviewer for their positive comments on the significance of our findings and the importance of our detailed phenotypic analysis made possible by quantitative live cell microscopy. We also believe that our new structural modeling of CCDC32 and our findings of complex and extensive interactions with AP2 make the reviewer's point regarding intrinsically disordered proteins even more interesting and relevant to a broad audience. We trust that our revisions indeed address the reviewer's concerns.

      The field of expertise of the reviewer is structural biology, biochemistry and clathrin mediated endocytosis. Expertise in cell biology is rather superficial.


      References:

      Aguet, F., C.N. Antonescu, M. Mettlen, S.L. Schmid, and G. Danuser. 2013. Advances in Analysis of Low Signal-to-Noise Images Link Dynamin and AP2 to the Functions of an Endocytic Checkpoint. Dev Cell. 26:279-291.

      Chen, Z., R.E. Mino, M. Mettlen, P. Michaely, M. Bhave, D.K. Reed, and S.L. Schmid. 2020. Wbox2: A clathrin terminal domain-derived peptide inhibitor of clathrin-mediated endocytosis. J Cell Biol. 219.

      Grove, J., D.J. Metcalf, A.E. Knight, S.T. Wavre-Shapton, T. Sun, E.D. Protonotarios, L.D. Griffin, J. Lippincott-Schwartz, and M. Marsh. 2014. Flat clathrin lattices: stable features of the plasma membrane. Mol Biol Cell. 25:3581-3594.

      He, K., E. Song, S. Upadhyayula, S. Dang, R. Gaudin, W. Skillern, K. Bu, B.R. Capraro, I. Rapoport, I. Kusters, M. Ma, and T. Kirchhausen. 2020. Dynamics of Auxilin 1 and GAK in clathrin-mediated traffic. J Cell Biol. 219.

      Mino, R.E., Z. Chen, M. Mettlen, and S.L. Schmid. 2020. An internally eGFP-tagged α-adaptin is a fully functional and improved fiduciary marker for clathrin-coated pit dynamics. Traffic. 21:603-616.

      Saffarian, S., and T. Kirchhausen. 2008. Differential evanescence nanometry: live-cell fluorescence measurements with 10-nm axial resolution on the plasma membrane. Biophys J. 94:2333-2342.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding between CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull down experiments. Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, known to play a role in CFNDS, is also addressed in this study and shown to have endocytic defects.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2, whereby the following major and minor points remain to be addressed:

      • The authors show that CCDC32 depletion leads to the formation of brighter and static clathrin coated structures (Figure 2), but that these were only prevalent to 7.8% and masked the 'normal' dynamic CCPs. At the same time, the authors show that the absence of CCDC32 induces pits with shorter life times (Figure 1 and Figure 2), the 'majority' of the pits.

      Clarification is needed as to how the authors arrive at these conclusions and these numbers. The authors should also provide (and visualize) the corresponding statistics. The same statement is made again later on in the manuscript, where the authors explain their electron microscopy data. Was the number derived from there?

      These points are critical to understanding CCDC32's role in endocytosis and is key to understanding the model presented in Figure 8. The numbers of how many pits accumulate in flat lattices versus normal endocytosis progression and the actual time scales could be included in this model and would make the figure much stronger.

      • In relation to the above point, the statistics of Figure 2E-G and the analysis leading there should also be explained in more detail: For example, what are the individual points in the plot (also in Figures 6G and 7G)? The authors should also use a few phrases to explain software they use, for example DASC, in the main text.

      • There are several questions related to the knock-down experiments that need to be addressed:

      Firstly, knock-down of CCDC32 does not seem to be very strong (Figure S2B). Can the level of knock-down be quantified?

      In page 6 it is indicated that the eGFP-CCDC32(1-54) and eGFP-CCDC32(∆78-98) constructs are siRNA-resistant. However in Fig S2B, these proteins do not show any signal in the western blot, so it is not clear if they are expressed or simply not detected by the antibody. The presence of these proteins after silencing endogenous CCDC32 needs to be confirmed to support Figures 6 and Figures 7, which critically rely on the presence of the CCDC32 mutants.

      In Figures 6 and 7, siRNA knock-down of CCDC32 is only indicated for sub-figures F to G. Is this really the case? If not, the authors should clarify. The siRNA knock-down in Figure 1 is also only mentioned in the text, not in the figure legend. The authors should pay attention to make their figure legends easy to understand and unambiguous.

      • It is not exactly clear how the curves in Figure 3C (lower panel) on the invagination depth were obtained. Can the authors clarify this a bit more? For example, what are kT and kE in Figure 3A? What is I0? And how did the authors derive the logarithmic function used to quantify the invagination depth? In the main text, the authors say that the traces were 'logarithmically transformed'. This is not a technical term. The authors should refer to the actual equation used in the figure.

      • In the discussion, the claim 'The resulting dysregulation of AP2 inhibits CME, which further results in the development of CFNDS.' is maybe a bit too strong of a statement. Firstly, because the authors show themselves that CME is perturbed, but by no means inhibited. Secondly, the molecular link to CFNDS remains unclear. Even though CCDC32 mutants seem to be responsible for CFNDS and one of the mutant has been shown in this study to have a defect in endocytosis and AP2 binding, a direct link between CCDC32's function in endocytosis and CFNDS remains elusive. The authors should thus provide a more balanced discussion on this topic.

      • In Figure S1, the authors annotate the presence of a coiled-coil domain, which they also use later on in the manuscript to generate mutations. Could the authors specify (and cite) where and how this coiled-coil domain has been identified? Is this predicted helix indeed a coiled-coil domain, or just a helix, as indicated by the authors in the discussion?

      Minor comments:

      • In general, a more detailed explanation of the microscopy techniques used and the information they report would be beneficial to provide access to the article also to non-expert readers in the field. This concerns particularly the analysis methods used, for example: How were the cohort-averaged fluorescence intensity and lifetime traces obtained? How do the tools cmeAnalysis and DASC work? A brief explanation would be helpful.
      • The axis label of Figure 2B is not quite clear. What does 'TfnR uptake % of surface bound' mean? Maybe the authors could explain this in more detail in the figure legend? Is the drop in uptake efficiency also accessible by visual inspection of the images? It would be interesting to see that.
      • Figure 4: How is the occupancy of CCPs in the plasma membrane measured? What are the criteria used to divide CCSs into Flat, Dome or Sphere categories?
      • Figure 5B: Can the authors explain, where exactly the GFP was engineered into AP2 alpha? This construct does not seem to be explained in the methods section.
      • Figure S1B: The authors should indicate the colour code used for the structural model.
      • The list of primers referred to in the materials and methods section does not exist. There is a Table S1, but this contains different data. The actual Table S1 is not referenced in the main text. This should be done.

      Significance

      In this study, the authors analyse a so-far poorly understood endocytic accessory protein, CCDC32, and its implication for endocytosis. The experimental tool set used, allowing to quantify CCP dynamics and invagination is clearly a strength of the article that allows assessing the impact of an accessory protein towards the endocytic uptake mechanism, which is normally very robust towards mutations. Only through this detailed analysis of endocytosis progression could the authors detect clear differences in the presence and absence of CCDC32 and its mutants. If the above points are successfully addressed, the study will provide very interesting and highly relevant work allowing a better understanding of the early phases in CME with implication for disease.

      The study is thus of potential interest to an audience interested in CME, in disease and its molecular reasons, as well as for readers interested in intrinsically disordered proteins to a certain extent, claiming thus a relatively broad audience. The presented results may initiate further studies of the so-far poorly understood and less well known accessory protein CCDC32.

      The field of expertise of the reviewer is structural biology, biochemistry and clathrin mediated endocytosis. Expertise in cell biology is rather superficial.

    1. If one economy suffered, the effects of that decline were sure to be felt elsewhere.

      In week 1, we learned of the communal attitude reflected in ancient Egypt and the Code of Hammurabi. The Egyptians understood that if the canal in one area was damaged, it impacted not only that farmer but everyone downstream. The same holds true for the trade routes. Would they send aid to those distant communities that were more fragile? Or was it every man for himself?

    1. Reviewer #1 (Public review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) methods in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Appraisal:

      The authors present a thorough evaluation of the neural noise hypothesis of developmental dyslexia in a sample of adolescents and young adults using multiple methods of measuring excitatory/inhibitory imbalances as an indicator of neural noise. The authors concluded that there was no support for the neural noise hypothesis of dyslexia in their study, based on non-significant group differences and Bayes factors. This conclusion is justified, and further research is called for to more broadly evaluate the neural noise hypothesis in developmental dyslexia.

      Impact:

      This study provides an exemplar foundation for the evaluation of the neural noise hypothesis of dyslexia. Other researchers may adopt the model applied in this paper to examine neural noise in various populations with/without dyslexia, or across a continuum of reading abilities, to more thoroughly examine evidence (or lack thereof) for this hypothesis. Notably, the lack of evidence here does not rule out the possibility of a role for neural noise in dyslexia, and the authors point out that presentation with co-occurring conditions, such as ADHD, may contribute to neural noise in dyslexia. Dyslexia remains a multi-faceted and heterogeneous neurodevelopmental condition, and many genetic, neurobiological and environmental factors play a role. This study demonstrates one step toward evaluating neurobiological mechanisms that may contribute to reading difficulties.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) methods in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, the MRS acquisition was not optimized to quantify GABA, so the findings (or lack thereof) should be interpreted with caution. Specifically, while 7T MRS affords the benefit of quantifying metabolites, such as GABA, without spectral editing, this quantification is best achieved with echo times (TE) of 68 or 80 ms in order to minimize the spectral overlap between glutamate and GABA and reduce contamination from the macromolecular signal (Finkelman et al., 2022, https://doi.org/10.1016/j.neuroimage.2021.118810). The data in the present study were acquired at TE=28 ms, and are therefore likely affected by overlapping Glu and GABA peaks at 2.3 ppm that are much more difficult to resolve at this short TE, which could directly affect the measures that are meant to characterize the Glu/GABA+ ratio/imbalance. In future research, MRS acquisition schemes should be optimized for the acquisition of Glutamate, GABA, and their relative balance.

      As the authors note in the discussion, additional factors such as MRS voxel location, participant age, and participant sex could influence associations between neural noise and reading abilities and should be considered in future studies.

      We have modified Figure 2 and revised the paragraph discussing the MRS methodological limitations in accordance with Reviewer #1's recommendations. Additionally, we have included the CRLB and linewidth thresholds in the Results section. Furthermore, a new figure showing the correlations between EEG and MRS biomarkers has been added (Figure 3).

      Appraisal:

      The authors present a thorough evaluation of the neural noise hypothesis of developmental dyslexia in a sample of adolescents and young adults using multiple methods of measuring excitatory/inhibitory imbalances as an indicator of neural noise. The authors concluded that there was not support for the neural noise hypothesis of dyslexia in their study based on null significance and Bayes factors. This conclusion is justified, and further research is called for to more broadly evaluate the neural noise hypothesis in developmental dyslexia.

      Impact:

      This study provides an exemplar foundation for the evaluation of the neural noise hypothesis of dyslexia. Other researchers may adopt the model applied in this paper to examine neural noise in various populations with/without dyslexia, or across a continuum of reading abilities, to more thoroughly examine evidence (or lack thereof) for this hypothesis. Notably, the lack of evidence here does not rule out the possibility of a role for neural noise in dyslexia, and the authors point out that presentation with co-occurring conditions, such as ADHD, may contribute to neural noise in dyslexia. Dyslexia remains a multi-faceted and heterogeneous neurodevelopmental condition, and many genetic, neurobiological and environmental factors play a role. This study demonstrates one step toward evaluating neurobiological mechanisms that may contribute to reading difficulties.

      Reviewer #2 (Public review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory imbalance, as quantified by beta power in EEG and the Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia, and the study is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, a commonly hypoactivated region in dyslexia. This region is a common region of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

      Reviewer #3 (Public review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in dyslexia. Supported by Bayesian statistics, their results show convincing evidence of no differences in EI balance between groups, challenging the neural noise hypothesis.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in dyslexia.

    1. Below is a concise overview of the key concepts in the article “How Real-Time Materialized Views Work with ksqlDB, Animated.” It explains:

      1. What Real-Time Materialized Views Are
        • A real-time materialized view is a continuously updated “pre-aggregated” or “read-optimized” result of incoming streaming data.
        • Instead of recalculating the entire view on demand (as in many traditional databases), stream processing incrementally updates the view with each new event (the “delta”).

      2. How ksqlDB Maintains These Views
        • Continuous Queries: When you write a SQL-like query in ksqlDB (e.g., CREATE TABLE ... SELECT ... FROM readings GROUP BY ... EMIT CHANGES;), ksqlDB creates a persistent query that runs forever, reading new events from Kafka topics and updating the view.
        • Incremental Updates + Changelog: As ksqlDB updates the materialized view in its local state store (RocksDB), it also emits a new record to a changelog topic in Kafka that captures the change.
          • This changelog topic is essentially the “audit log” of every update.
          • The local RocksDB store is fast but treated as transient; changelog topics in Kafka provide durability and fault tolerance.

      3. Push vs. Pull Queries
        • Pull Queries ask for the current state of the materialized view at the moment you run the query (e.g., SELECT * FROM avg_readings WHERE sensor=...;).
        • Push Queries subscribe to changes as they happen (e.g., SELECT * FROM avg_readings EMIT CHANGES;). You get a continuous stream of updates whenever a new change arrives.

      4. RocksDB as the Local Store
        • Each partition of the input stream(s) to a ksqlDB query is associated with its own local RocksDB instance.
        • RocksDB stores the current state needed for aggregations, joins, etc.
        • Because data is partitioned, all rows with the same key end up on the same partition (and thus the same RocksDB instance).

      5. Automatic Repartitioning
        • If your grouping key is not the same as the original Kafka key, ksqlDB must shuffle data so that rows with the same group key end up on the same partition.
        • This shuffle is automatically handled by creating a *-repartition topic.
        • If your original keys are already aligned with the grouping columns, ksqlDB skips this shuffle to save I/O.

      6. Fault Tolerance via Changelogs
        • If a ksqlDB node dies, a new node can rebuild the materialized view by replaying the changelog from Kafka.
        • Changelog topics use log compaction, which removes older updates to each key, keeping only the latest.
        • This keeps replay time manageable (rather than applying every single historical update).

      7. Latest-by-Offset Aggregations
        • Besides sum, min, max, or average, ksqlDB also supports “latest by offset” to store just the most recent value for each key, effectively creating a “recency cache.”
        • Example:

          ```sql
          CREATE TABLE latest_readings AS
            SELECT sensor,
                   LATEST_BY_OFFSET(area) AS area,
                   LATEST_BY_OFFSET(reading) AS last
            FROM readings
            GROUP BY sensor
            EMIT CHANGES;
          ```

          • This ensures the table always reflects the last known value for each key (based on Kafka offset).
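      The incremental-update, changelog, and compaction ideas above can be sketched in a few lines of Python. This is a toy model, not ksqlDB code: the class `AvgView`, the helpers `compact` and `restore`, and the in-memory dict and list standing in for RocksDB and the Kafka changelog topic are all assumptions made for illustration.

```python
class AvgView:
    """Toy materialized view: running average of readings per sensor."""

    def __init__(self):
        self.state = {}      # stands in for the local state store: key -> (sum, count)
        self.changelog = []  # stands in for the changelog topic: one record per update

    def apply(self, sensor, reading):
        """Incrementally fold one new event (the delta) into the view."""
        total, count = self.state.get(sensor, (0.0, 0))
        updated = (total + reading, count + 1)
        self.state[sensor] = updated
        self.changelog.append((sensor, updated))  # record the change for recovery

    def pull(self, sensor):
        """Pull-query analogue: read the current value without rescanning events."""
        total, count = self.state[sensor]
        return total / count


def compact(changelog):
    """Log-compaction analogue: keep only the latest record for each key."""
    latest = {}
    for key, value in changelog:
        latest[key] = value
    return list(latest.items())


def restore(changelog):
    """Rebuild a view on a fresh node by replaying a (possibly compacted) changelog."""
    view = AvgView()
    for key, value in changelog:
        view.state[key] = value
    view.changelog = list(changelog)
    return view
```

      Replaying the compacted log restores the same state as replaying every historical update, which is why compaction shortens recovery without changing the result.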

      Why This Matters

      • Fast Queries: Because the materialized view is already “pre-aggregated,” queries against it are extremely fast—no need to scan or recalculate everything from scratch.
      • Real-Time Updates: The view is updated continuously as new data arrives, so you always have a near-real-time representation of what is happening.
      • Scalable & Fault-Tolerant: Using Kafka’s partitions and log compaction for changelogs, ksqlDB scales horizontally (across multiple nodes) and recovers state quickly when nodes fail.

      Further Resources

      • Try It Out
        • The ksqlDB quickstart is a straightforward way to experiment locally.
        • Once it’s running, you can execute the code examples in the article to see real-time materialized views in action.
      • Next Steps
        • Deep dive into ksqlDB’s fault tolerance and scaling model (i.e., how queries distribute across clusters).
        • Explore additional stream processing patterns such as windowed aggregations for time-based summaries.
        • Learn how joins work between tables and streams in ksqlDB (similar incremental update logic, but with different partitioning considerations).

      In essence, real-time materialized views in ksqlDB let you maintain continually up-to-date “snapshots” of streaming data. By storing incremental results in a local state store and capturing updates in a Kafka changelog, ksqlDB can serve extremely fast queries, recover quickly from failures, and scale out for large data volumes.
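      The repartitioning step mentioned in the overview can also be illustrated with a small sketch. It assumes rows are plain dicts and uses CRC32 as a stand-in hash (Kafka's default partitioner for keyed records uses murmur2); the function names `partition_for` and `repartition` are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash, so equal keys always collide."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(rows, group_key, num_partitions):
    """Shuffle rows so that all rows sharing `group_key` land in one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        target = partition_for(str(row[group_key]), num_partitions)
        partitions[target].append(row)
    return partitions
```

      Once every row for a given group key sits in a single partition, one local store can hold the complete aggregate for that key, which is what makes per-partition state sufficient.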

    1. Remote Code Execution (RCE) in a DoD website. joaomatosf submitted a report to U.S. Dept Of Defense. March 24, 2018, 2:59am UTC

      SUMMARY: The DoD https://███/psc/EXPROD/ Web System uses the Oracle PeopleSoft platform, which is vulnerable to Remote Code Execution (RCE) and Denial of Service (DoS) attacks through Java object deserialization (CWE-502) in the “monitor” service. An attacker can therefore generate and send malicious Java objects of special types to your system and achieve arbitrary effects (such as RCE or DoS) during their deserialization (the objects are deserialized by the readObject() method without any type of validation). This is related to CVE-2017-10366 [1].

      PROOF OF CONCEPT: For the PoC, I sent a special serialized Java object in order to force the vulnerable server to perform a DNS lookup for a domain controlled by me (dod.jexboss.info). In this way, if the code is executed successfully by the DoD server, I will receive a DNS query from DoD and see it in the logs of my BIND daemon (the vulnerable DoD server will perform a local DNS query for dod.jexboss.info, and the local DNS will try to query the authoritative nameserver for the jexboss.info domain (ns1.jexboss.info), which is mine). For more details about the payload used, see [2]. Attached is a video detailing the PoC.

      Generating the payload: to generate the payload I used the tool ysoserial.

      ```
      $ git clone https://github.com/frohoff/ysoserial.git
      $ cd ysoserial
      $ mvn clean package -DskipTests
      $ cd target
      $ java -jar ysoserial-0.0.6-SNAPSHOT-all.jar URLDNS http://dod.jexboss.info > payload
      ```

    1. The problem is, once you really understand a problem, you realize that most problems are not solvable at all.  They’re tangled webs of causality, which one might call “wicked” problems

      "The problem is, once you really understand a problem, you realize that most problems are not solvable at all. They’re tangled webs of causality, which one might call “wicked” problems"

      I understand that all designers must consider this, but it is something I am uncomfortable with. I get stressed out when I try to fix one problem and then realize that three other problems must be fixed. It often leads me to think about just restarting or going back to square one, if that makes sense. For example, my coding assignments often put me in this scenario: I think of a solution for one part of the code but then realize there are issues with the other pieces of code that I wrote. One problem revealing or leading to other problems is something I dislike but have learned to get better at handling.

    1. Now, there are many reasons one might be suspicious about utilitarianism as a cheat code for acting morally, but let’s assume for a moment that utilitarianism is the best way to go. When you undertake your utility calculus, you are, in essence, gathering and responding to data about the projected outcomes of a situation. This means that how you gather your data will affect what data you come up with.

      This is interesting because it shows how important the information is when applying utilitarianism to decision-making. Even if the goal is to improve happiness, our decisions may turn out to be unethical if the evidence is inaccurate or lacking. It makes me wonder if utilitarianism can be used as an accurate moral code. Also, the results may not be accurate if the data is incomplete or unfair.

    2. Now, there are many reasons one might be suspicious about utilitarianism as a cheat code for acting morally, but let’s assume for a moment that utilitarianism is the best way to go. When you undertake your utility calculus, you are, in essence, gathering and responding to data about the projected outcomes of a situation.

      Utilitarianism's reliance on utility calculus highlights its practical challenge: gathering and predicting the outcomes of actions accurately is inherently uncertain and subjective. Its aim is to maximise well-being; however, because it requires forecasting the consequences of actions and accounting for all those affected, its application is more speculative than precise.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) As VRMate (a component of behaviorMate) is written using Unity, what is the main advantage of using behaviorMate/VRMate compared to using Unity alone paired with Arduinos (e.g. Campbell et al. 2018), or compared to using an existing toolbox to interface with Unity (e.g. Alsbury-Nealy et al. 2022, DOI: 10.3758/s13428-021-01664-9)? For instance, one disadvantage of using Unity alone is that it requires programming in C# to code the task logic. It was not entirely clear whether VRMate circumvents this disadvantage somehow -- does it allow customization of task logic and scenery in the GUI? Does VRMate add other features and/or usability compared to Unity alone? It would be helpful if the authors could expand on this topic briefly.

      We have updated the manuscript (lines 412-422) to clarify the benefits of separating the VR system as an isolated program and a UI that can be run independently. We argue that “…the recommended behaviorMate architecture has several important advantages. Firstly, by rendering each viewing angle of a scene on a dedicated device, performance is improved by splitting the computational costs across several inexpensive devices rather than requiring specialized or expensive graphics cards in order to run…, the overall system becomes more modular and easier to debug [and] implementing task logic in Unity would require understanding Object-Oriented Programming and C# … which is not always accessible to researchers that are typically more familiar with scripting in Python and Matlab.”

      VRMate receives detailed configuration information from behaviorMate at runtime specifying which VR objects to display, and receives position updates during experiments. Any other necessary information about triggering rewards or presenting non-VR cues is still handled by the UI, so no editing of Unity is necessary. Scene configuration information is in the same JSON format as the settings files for behaviorMate. Additionally, Unity Editor scripts provided in the VRMate repository permit customizing scenes through a “drag and drop” interface and then writing the scene configuration files programmatically. Users interested in these features should see our github page to find example scene.vr files and download the VRMate repository (including the editor scripts). We provide four VR contexts, as well as a settings file that uses one of them, on the behaviorMate github page (https://github.com/losonczylab/behaviorMate) in the “vr_contexts” and “example_settigs_files” directories. These examples are provided to assist VRMate users in getting set up and give a more detailed picture of how VRMate and behaviorMate interact.

      (2) The section on "context lists", lines 163-186, seemed to describe an important component of the system, but this section was challenging to follow and readers may find the terminology confusing. Perhaps this section could benefit from an accompanying figure or flow chart, if these terms are important to understand.

      We have kept the terms context and context list in order to maintain a degree of parity with the Java code. However, we have updated lines 173-175 to define the term context for the behaviorMate system: “... a context is grouping of one or more stimuli that get activated concurrently. For many experiments it is desirable to have multiple contexts that are triggered at various locations and times in order to construct distinct or novel environments.”

      a. Relatedly, "context" is used to refer to both when the animal enters a particular state in the task like a reward zone ("reward context", line 447) and also to describe a set of characteristics of an environment (Figure 3G), akin to how "context" is often used in the navigation literature. To avoid confusion, one possibility would be to use "environment" instead of "context" in Figure 3G, and/or consider using a word like "state" instead of "context" when referring to the activation of different stimuli.

      Thank you for the suggestion. We have updated Figure 3G to say “Environment” in order to avoid confusion.

      (3) Given the authors' goal of providing a system that is easily synchronizable with neural data acquisition, especially with 2-photon imaging, I wonder if they could expand on the following features:

      a. The authors mention that behaviorMate can send a TTL to trigger scanning on the 2P scope (line 202), which is a very useful feature. Can it also easily generate a TTL for each frame of the VR display and/or each sample of the animal's movement? Such TTLs can be critical for synchronizing the imaging with behavior and accounting for variability in the VR frame rate or sampling rate.

      Different experimental demands require varying levels of precision in these kinds of synchronization signals. For this reason, we have opted against a “one-size-fits-all” approach to synchronization with physiology data in behaviorMate. Importantly, this keeps the individual rig costs low, which can be useful when constructing setups specifically for training animals. behaviorMate will log TTL pulses sent to GPIO pins set up as sensors, and can be configured to generate TTL pulses at regular intervals. Additionally, all UDP packets received by the UI are time stamped and logged. We also include the output of the Arduino millis() function in all UDP packets, which can be used for further investigation of clock drift between system components. Importantly, since the system is event driven, there cannot be accumulating drift across running experiments between the behaviorMate UI and networked components such as the VR system.
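      As an illustration of how logged millis() values might be used to inspect clock drift, the following sketch fits a straight line between controller timestamps and PC receive timestamps. It is not part of behaviorMate; the function names and the assumption that both clocks are available as paired lists in milliseconds are hypothetical.

```python
def fit_clock(arduino_ms, pc_ms):
    """Least-squares fit of pc_ms ~= slope * arduino_ms + offset."""
    n = len(arduino_ms)
    if n < 2 or n != len(pc_ms):
        raise ValueError("need at least two paired timestamps")
    mean_x = sum(arduino_ms) / n
    mean_y = sum(pc_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in arduino_ms)
    if sxx == 0:
        raise ValueError("controller timestamps must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(arduino_ms, pc_ms))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


def drift_ppm(slope):
    """Relative clock-rate difference in parts per million (0 means no drift)."""
    return (slope - 1.0) * 1e6
```

      A slope of exactly 1 means the two clocks tick at the same rate; the offset only reflects their different starting points and network latency, so it is the slope that indicates drift.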

      For these reasons, we have not needed to implement a VR frame synchronization TTL for any of our experiments. However, one could extend VRMate to send "sync" packets back to behaviorMate to log precisely when each frame was displayed, or to emit TTL pulses (if using the same ODROID hardware we recommend in the standard setup for rendering scenes). This would be useful if it is important to account for slight changes in the frame rate at which the scenes are displayed. However, splitting rendering of large scenes between several devices results in fast update times, and our testing and benchmarks indicate that display updates are smooth and continuous enough to appear coupled to movement updates from the behavioral apparatus and sufficient for engaging navigational circuits in the brain.

      b. Is there a limit to the number of I/O ports on the system? This might be worth explicitly mentioning.

      We have updated lines 219-220 in the manuscript to provide this information: Sensors and actuators can be connected to the controller using one of the 13 digital or 5 analog input/output connectors.

      c. In the VR version, if each display is run by a separate Android computer, is there any risk of clock drift between displays? Or is this circumvented by centralized control of the rendering onset via the "real-time computer"?

      This risk is mitigated by the real-time computer/UI sending position updates to the VR displays. The maximum amount scenes can be out of sync is limited because they will all recalibrate on every position update – which occurs multiple times per second as the animal is moving. Moreover, because position updates are constantly being sent by behaviorMate to VRMate and VRMate is immediately updating the scene according to this position, the most the scene can become out of sync with the mouse's position is proportional to the maximum latency multiplied by the running speed of the mouse. For experiments focusing on eliciting an experience of navigation, such a degree of asynchrony is almost always negligible. For other experimental demands it could be possible to incorporate more precise frame timing information but this was not necessary for our use case and likely for most other use cases. Additionally, refer to the response to comment 3a.
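      The bound described above is simple arithmetic: worst-case position error equals latency multiplied by running speed. The sketch below makes that explicit; the function name and the example numbers (20 ms latency, 30 cm/s running speed) are illustrative assumptions, not measurements from the manuscript.

```python
def max_position_error_cm(latency_s, speed_cm_per_s):
    """Upper bound on how far the displayed scene can lag the animal's position."""
    if latency_s < 0 or speed_cm_per_s < 0:
        raise ValueError("latency and speed must be non-negative")
    return latency_s * speed_cm_per_s


# Illustrative values only: 20 ms worst-case latency, mouse running at 30 cm/s.
error_cm = max_position_error_cm(0.020, 30.0)
print(f"worst-case lag: {error_cm:.2f} cm")
```

      With these assumed values the worst-case lag is well under a centimetre, which is the sense in which such asynchrony is described as negligible for navigation experiments.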

      Reviewer #2 (Public review):

      (1) The central controlling logic is coupled with GUI and an event loop, without a documented plugin system. It's not clear whether arbitrary code can be executed together with the GUI, hence it's not clear how much the functionality of the GUI can be easily extended without substantial change to the source code of the GUI. For example, if the user wants to perform custom real-time analysis on the behavior data (potentially for closed-loop stimulation), it's not clear how to easily incorporate the analysis into the main GUI/control program.

      Without any edits to the existing source code, behaviorMate is highly customizable through the settings files, which allow users to combine the existing contexts and decorators in arbitrary combinations. Therefore, by generating novel settings files, users have been able to perform a wide variety of 1D navigation tasks, well beyond our anticipated use cases. The typical method for providing closed-loop stimulation would be to set up a context which is triggered by animal behavior using decorators (e.g. based on position, lap number and time) and then trigger the stimulation with a TTL pulse. Rarely, if users require a behavioral condition not currently implemented or composable out of existing decorators, it would require writing custom code in Java to extend the UI. Performing such edits requires only knowledge of basic object-oriented programming in Java and generating a single subclass of either the BasicContextList or ContextListDecorator classes. In addition, the JavaFX version of behaviorMate (under development) incorporates a plugin system which does not require recompiling the code in order to make these changes. However, since the JavaFX software is currently under development, documentation does not yet exist. All software is open-sourced and available on github.com for users interested in generating plugins or altering the source code.

      We have added this caveat to the manuscript in order to clarify this point (Lines 197-202): “However, if the available set of decorators is not enough to implement the required task logic, some modifications to the source code may be necessary. These modifications, in most cases, would be very simple and only a basic understanding of object-oriented programming is required. A case where this might be needed would be performing novel customized real-time analysis on behavior data and activating a stimulus based on the result”
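      The idea of composing a context out of stacked decorators (position, lap number, time) can be sketched as follows. This is a Python analogy for the pattern only; behaviorMate itself is written in Java, and the class names `AlwaysActive`, `PositionWindow`, and `EveryNthLap` are hypothetical and do not correspond to its BasicContextList or ContextListDecorator classes.

```python
class AlwaysActive:
    """Hypothetical base context with no conditions attached."""

    def is_active(self, position_mm, lap, time_s):
        return True


class PositionWindow:
    """Decorator: active only while the animal is inside [start_mm, end_mm)."""

    def __init__(self, inner, start_mm, end_mm):
        self.inner = inner
        self.start_mm = start_mm
        self.end_mm = end_mm

    def is_active(self, position_mm, lap, time_s):
        in_window = self.start_mm <= position_mm < self.end_mm
        return in_window and self.inner.is_active(position_mm, lap, time_s)


class EveryNthLap:
    """Decorator: active only on laps that are a multiple of n."""

    def __init__(self, inner, n):
        self.inner = inner
        self.n = n

    def is_active(self, position_mm, lap, time_s):
        return lap % self.n == 0 and self.inner.is_active(position_mm, lap, time_s)


# Stimulate between 500 and 700 mm, but only on every second lap.
stimulation = EveryNthLap(PositionWindow(AlwaysActive(), 500, 700), 2)
```

      Each decorator adds one condition and delegates the rest to the object it wraps, so new task logic usually means stacking existing pieces rather than writing a new class.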

      (2) The JSON messaging protocol lacks API documentation. It's not clear what the exact syntax is, supported key/value pairs, and expected response/behavior of the JSON messages. Hence, it's not clear how to develop new hardware that can communicate with the behaviorMate system.

      The most common approach for adding novel hardware is to use TTL pulses (or accept an emitted TTL pulse to read sensor states). This type of hardware addition is possible through the existing GPIO without the need to interact with the software or JSON API. Users looking to take advantage of the ability to set up and configure novel behavioral paradigms without writing any software would be limited to adding hardware which can be triggered by, and report to the UI with, a TTL pulse; however, fairly complex actions can be triggered this way.

      For users looking to develop more customized hardware solutions that interact closely with the UI or GPIO board, additional documentation on the JSON messaging protocol has been added to the behaviormate-utils repository (https://github.com/losonczylab/behaviormate_utils). Additionally, we have added a link to this repository in the Supplemental Materials section (line 971) and referenced this in the manuscript (line 217) to make it easier for readers to find this information.

      Furthermore, developers looking to add completely novel components to the UI can implement the interface described by Context.java (see the JavaDoc: https://www.losonczylab.org/behaviorMate-1.0.0/) in order to exchange custom messages with hardware. These messages would be defined within the custom context and interact with the custom hardware (meaning the interested developer would make a novel addition to the messaging API). Additionally, it should be noted that, without editing any software, any UDP packets sent to behaviorMate from an IP address specified in the settings will be time stamped and logged in the stored behavioral data file. This means that there is a large variety of hardware implementation solutions, using both standard UDP messaging and TTL pulses, that can work with behaviorMate with minimal effort. Finally, see the response to R2.1 for a discussion of the JavaFX version of the behaviorMate UI, including plugin support.

      (3) It seems the existing control hardware and the JSON messaging only support GPIO/TTL types of input/output, which limits the applicability of the system to more complicated sensor/controller hardware. The authors mentioned that hardware like Arduino natively supports serial protocols like I2C or SPI, but it's not clear how they are handled and translated to JSON messages.

      We provide an implementation for an I2C-based capacitance lick detector, which interested developers may wish to copy if support for novel I2C or SPI hardware is needed. Users with less development experience wishing to expand the hardware capabilities of behaviorMate could also develop adapters which can be triggered on a TTL input/output. Additionally, more information about the JSON API and how messages are transmitted to the PC by the Arduino is given in point (2) and the expanded online documentation.

      a. Additionally, because it's unclear how easy it is to incorporate arbitrary hardware with behaviorMate, the "Intranet of things" approach seems to lose its appeal. Since the manuscript currently focuses mainly on a specific set of hardware designed for a specific type of experiment, it's not clear what the advantages are of implementing communication over a local network as opposed to the typical connections using USB.

      As opposed to the serial communication protocols typical of USB, networking protocols function seamlessly on the basis of asynchronous message passing. Messages may be routed internally (e.g. to a PC's localhost address, i.e. 127.0.0.1) or to a variety of external hardware (e.g. using IP addresses such as those in the range 192.168.1.2 - 192.168.1.254). Furthermore, network-based communication allows modules, such as VR, to be added easily. behaviorMate systems can be easily expanded using low-cost Ethernet switches and consume only a single network adapter on the PC (i.e. they are not limited by the number of physical USB ports). Furthermore, UDP message passing is implemented in almost all modern programming languages in a platform-independent manner (meaning that the same software can run on OSX, Windows, and Linux). Lastly, as we have pointed out (Line 117), a variety of tools exist for inspecting network packets and debugging, meaning that it is possible to run behaviorMate with simulated hardware for testing and debugging.
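      To make the message-passing idea concrete, here is a minimal sketch of a simulated device sending a JSON packet over UDP on localhost to a listener that time stamps and logs it. It is not behaviorMate code: the helper names and the message fields (`"lick"`, `"pin"`) are hypothetical and do not reflect the actual behaviorMate JSON schema.

```python
import json
import socket
import time


def open_udp_socket(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_json(sock, message, dest):
    """Serialize a dict as JSON and send it as a single UDP datagram."""
    sock.sendto(json.dumps(message).encode("utf-8"), dest)


def receive_and_log(sock, log, timeout_s=1.0):
    """Wait for one datagram, time stamp it on arrival, and append it to the log."""
    sock.settimeout(timeout_s)
    data, sender = sock.recvfrom(4096)
    entry = {
        "time": time.time(),
        "sender": sender,
        "message": json.loads(data.decode("utf-8")),
    }
    log.append(entry)
    return entry
```

      Because both ends are ordinary sockets, the "device" side can be replaced by a short script during testing, which is one way to exercise a rig's software without the physical hardware attached.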

      The IOT nature of behaviorMate means there is no requirement for novel hardware to be implemented using an Arduino, since any system capable of UDP communication can be configured. For example, VRMate is usually run on Odroid C4s; however, one could easily create a system using Raspberry Pis or even additional PCs. behaviorMate is agnostic to the format of the UDP messages, but packaging any data in the JSON format for consistency would be encouraged. If the new hardware is a sensor whose input needs to be time stamped and logged, then all that is needed is to add the IP address and port information to the ‘controllers’ list in a behaviorMate settings file. If more complex interactions are needed with novel hardware, then a custom implementation of ContextList.java may be required (see response to R2.2). However, the provided UdpComms.java class could be used to easily send/receive messages from custom Context.java subclasses.
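
      As a minimal sketch of the message-passing idea described above, the following Python snippet packages a reading as JSON and sends it as a single UDP datagram to a listener on the same machine. The message layout (`lick_sensor`, `pin`, `state`) and all helper names are illustrative assumptions, not the documented behaviorMate schema; the project's online documentation describes the actual API.

```python
import json
import socket


def encode_message(payload: dict) -> bytes:
    """Serialize a message dict to UTF-8 JSON bytes for one UDP datagram."""
    return json.dumps(payload).encode("utf-8")


def decode_message(datagram: bytes) -> dict:
    """Parse a received UDP datagram back into a dict."""
    return json.loads(datagram.decode("utf-8"))


def loopback_roundtrip(payload: dict, timeout_s: float = 1.0) -> dict:
    """Send one JSON datagram to a listener on localhost and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout_s)
        sender.sendto(encode_message(payload), receiver.getsockname())
        datagram, _addr = receiver.recvfrom(4096)
        return decode_message(datagram)


if __name__ == "__main__":
    # Hypothetical message layout, used only for illustration.
    print(loopback_roundtrip({"lick_sensor": {"pin": 2, "state": 1}}))
```

      Because the sender only needs an IP address and a port, the same code could target a microcontroller, a single-board computer, or another PC by changing the destination address.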

      Solutions for highly customized hardware do require basic familiarity with object-oriented programming using the Java programming language. However, in our experience most behavioral experiments do not require these kinds of modifications. The majority of 1D navigation tasks, which behaviorMate is currently best suited to control, require touch/motion sensors, LEDs, speakers, or solenoid valves, which are easily controlled by the existing GPIO implementation. It is unlikely that custom subclasses would even be needed.

      Reviewer #3 (Public review):

      (1) While using UDP for data transmission can enhance speed, it is thought that it lacks reliability. Are there error-checking mechanisms in place to ensure reliable communication, given its criticality alongside speed?

      The provided GPIO/behavior controller implementation sends acknowledgement packets in response to all incoming messages as well as start and stop messages for contexts and “valves”. In this way the UI can update to reflect both requested state changes as well as when they actually happen (although there is rarely a perceptible gap between these two states unless something is unplugged or not functioning). See Line 85 in the revised manuscript “acknowledgement packets are used to ensure reliable message delivery to and from connected hardware”.
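
      One way acknowledgement packets can support reliable delivery over UDP is sketched below: the sender repeats a message until a matching acknowledgement arrives. The response above states only that acknowledgements are sent; the retry policy, the `id` and `ack` field names, and the in-memory `LossyLink` stand-in are assumptions made for this illustration.

```python
import json
from typing import Callable, Optional


def send_with_ack(
    send: Callable[[bytes], None],
    wait_for_ack: Callable[[], Optional[bytes]],
    message: dict,
    max_attempts: int = 3,
) -> int:
    """Send `message` until a matching acknowledgement arrives.

    Returns the number of attempts used; raises TimeoutError if none succeeds.
    """
    payload = json.dumps(message).encode("utf-8")
    for attempt in range(1, max_attempts + 1):
        send(payload)
        reply = wait_for_ack()  # None models a timeout with no reply
        if reply is not None and json.loads(reply).get("ack") == message["id"]:
            return attempt
    raise TimeoutError(f"no acknowledgement for message {message['id']}")


class LossyLink:
    """In-memory stand-in for a controller that drops the first `drop` datagrams."""

    def __init__(self, drop: int) -> None:
        self.drop = drop
        self._pending: Optional[bytes] = None

    def send(self, payload: bytes) -> None:
        if self.drop > 0:
            self.drop -= 1
            self._pending = None  # datagram lost, so no acknowledgement is produced
            return
        msg_id = json.loads(payload)["id"]
        self._pending = json.dumps({"ack": msg_id}).encode("utf-8")

    def wait_for_ack(self) -> Optional[bytes]:
        reply, self._pending = self._pending, None
        return reply


if __name__ == "__main__":
    link = LossyLink(drop=1)
    attempts = send_with_ack(link.send, link.wait_for_ack, {"id": 7, "valve": "open"})
    print(f"acknowledged after {attempts} attempt(s)")
```

      In a real setup the two callables would wrap a UDP socket with a receive timeout; the gap between the request and its acknowledgement is what lets a UI distinguish a requested state change from a completed one.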

      (2) Considering this year's price policy changes in Unity, could this impact the system's operations?

      VRMate is not affected by the recent changes in pricing structure of the Unity project.

      The existing compiled VRMate software does not need to be regenerated to update VR scenes, or implement new task logic (since this is handled by the behaviorMate GUI). Therefore, the VRMate program is robust to any future pricing changes or other restructuring of the Unity program and does not rely on continued support of Unity. Additionally, while the solution presented in VRMate has many benefits, a developer could easily adapt any open-source VR Maze project to receive the UDP-based position updates from behaviorMate or develop their own novel VR solutions.

      (3) Also, does the Arduino offer sufficient precision for ephys recording, particularly with a 10ms check?

      Electrophysiology recording hardware typically has additional I/O channels which can provide assistance with tracking behavior/synchronization at a high resolution. While behaviorMate could still be used to trigger reward valves, either the ephys hardware or some additional high-speed DAQ would be recommended to maintain accurate alignment with high-speed physiology data. behaviorMate could still be set up as normal to provide closed and open-loop task control at behaviorally relevant timescales alongside a DAQ circuit recording events at a consistent temporal resolution. While this would increase the relative cost of the individual recording setup, identical rigs for training animals could still be configured without the DAQ circuit, avoiding unnecessary cost and complexity.

      (4) Could you clarify the purpose of the Sync Pulse? In line 291, it suggests additional cues (potentially represented by the Sync Pulse) are needed to align the treadmill screens, which appear to be directed towards the Real-Time computer. Given that event alignment occurs in the GPIO, the connection of the Sync Pulse to the Real-Time Controller in Figure 1 seems confusing.

      A number of methods exist for synchronizing recording devices like microscopes or electrophysiology recordings with behaviorMate’s time-stamped logs of actuators and sensors. For example, the GPIO circuit can be configured to send sync triggers, or receive timing signals as input. Alternatively a dedicated circuit could record frame start signals and relay them to the PC to be logged independently of the GPIO (enabling a high-resolution post-hoc alignment of the time stamps). The optimal method to use varies based on the needs of the experiment. Our setups have a dedicated BNC output and specification in the settings file that sends a TTL pulse at the start of an experiment in order to trigger 2p imaging setups (see line 224, specifically that this is a detail of “our” 2p imaging setup). We provide this information as it might be useful suggesting how to have both behavior and physiology data start recording at the same time. We do not intend this to be the only solution for alignment. Figure 1 indicates an “optional” circuit for capturing a high speed sync pulse and providing time stamps back to the real time PC. This is another option that might be useful for certain setups (or especially for establishing benchmarks between behavior and physiology recordings). In our setup event alignment does not exclusively occur on the GPIO.

      a. Additionally, why is there a separate circuit for the treadmill that connects to the UI computer instead of the GPIO? It might be beneficial to elaborate on the rationale behind this decision in line 260.

      Event alignment does not occur on the GPIO; separating concerns between position tracking and more general input/output features improves performance and simplifies debugging. In this sense we maintain a single event loop on the Arduino, avoiding the need to either run multithreaded operations or rely extensively on interrupts, which can cause unpredictable code execution (e.g. when multiple interrupts occur at the same time). Our position tracking circuit is therefore coupled to a separate, low-cost Arduino Mini which has the singular responsibility of position tracking.

      b. Moreover, should scenarios involving pupil and body camera recordings connect to the Analog input in the PCB or the real-time computer for optimal data handling and processing?

      Pupil and body camera recordings would be independent data streams which can be recorded separately from behaviorMate. Aligning these forms of full motion video could require frame triggers, which could be configured on the GPIO board using single TTL-like outputs or by configuring a valve to be “pulsed”, which is a provided type of customization.

      We also note that a more advanced developer could easily leverage camera signals to provide closed-loop control by writing an independent module that sends UDP packets to behaviorMate. For example, a separate computer-vision-based position tracking module could be written in any preferred language and use UDP messaging to send body tracking updates to the UI without editing any of the behaviorMate source code (and could even be used for updating 1D location).

      (5) Given that all references, as far as I can see, come from the same lab, are there other labs capable of implementing this system at a similar optimal level?

      To date, two additional labs have published using behaviorMate, the Soltez and Henn labs (see revised lines 341-342). Since behaviorMate has only recently been published and made available open source, only external collaborators of the Losonczy lab have had access to the software and design files needed to do this. These collaborators did, however, set up their own behavioral setups in separate locations with minimal direct support from the authors, similar to the support that anyone seeking to set up a behaviorMate system would find online on our GitHub page or by posting to the message board.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (4) To provide additional context for the significance of this work, additional citations would be helpful to demonstrate a ubiquitous need for a system like behaviorMate. This was most needed in the paragraph from lines 46-65, specifically for each sentence after line 55, where the authors discuss existing variants on head-fixed behavioral paradigms. For instance, for the clause "but olfactory and auditory stimuli have also been utilized at regular virtual distance intervals to enrich the experience with more salient cues", suggested citations include Radvansky & Dombeck 2018 (DOI: 10.1038/s41467-018-03262-4), Fischler-Ruiz et al. 2021 (DOI: 10.1016/j.neuron.2021.09.055).

      We thank the reviewer for the suggested missing citations and have updated the manuscript accordingly (see line 58).

      (5) In addition, it would also be helpful to clarify behaviorMate's implementation in other laboratories. On line 304 the authors mention "other labs" but the following list of citations is almost exclusively from the Losonczy lab. Perhaps the citations just need to be split across the sentence for clarity? E.g. "has been validated by our experimental paradigms" (citation set 1) "and successfully implemented in other labs as well" (citation set 2).

      We have split the citation set as suggested (see lines 338-342).

      Minor Comments:

      (6) In the paragraph starting line 153 and in Fig. 2, please clarify what is meant by "trial" vs. "experiment". In many navigational tasks, "trial" refers to an individual lap in the environment, but here "trial" seems to refer to the whole behavioral session (i.e. synonymous with "experiment"?).

      In our software implementation we had originally used “trial” to refer to an imaging session rather than experiment (and have made updates to start moving to the more conventional lexicon). To avoid confusion we have removed this use of “trial” throughout the manuscript and replaced it with “experiment” whenever possible.

      (7) This is very minor, but in Figure 3 and 4, I don't believe the gavage needle is actually shown in the image. This is likely to avoid clutter but might be confusing to some readers, so it may be helpful to have a small inset diagram showing how the needle would be mounted.

      We assessed the image both with and without the gavage needle and found the version in the original (without) to be easier to read and less cluttered and therefore maintained that version in the manuscript.

      (8) In Figure 5 legend, please list n for mice and cells.

      We have updated the Figure 5 legend to indicate that for panels C-G, n=6 mice (all mice were recorded in both VR and TM systems), with 3253 cells in VR classified as significantly tuned place cells and 6101 tuned cells in TM.

      (9) Line 414: It is not necessary to tilt the entire animal and running wheel as long as the head-bar clamp and objective can rotate to align the imaging window with the objective's plane of focus. Perhaps the authors can just clarify the availability of this option if users have a microscope with a rotatable objective/scan head.

      We have added the suggested caveat to the manuscript in order to clarify when the goniometers might be useful (see lines 281-288).

      (10) Figure S1 and S2 could be referenced explicitly in the main text with their related main figures.

      We have added explicit references to Figures S1 and S2 in the relevant sections (see lines 443, 460 and 570).

      (11) On line 532-533, is there a citation for "proximal visual cues and tactile cues (which are speculated to be more salient than visual cues)"?

      We have added citations to both Knierim & Rao 2003 and Renaudineau et al. 2007 which discuss the differential impact of proximal vs distal cues during navigation as well as Sofroniew et al. 2014 which describe how mice navigate more naturally in a tactile VR setup as opposed to purely visual ones.

      (12) There is a typo at the end of the Figure 2 legend, where it should say "Arduino Mini."

      This typo has been fixed.

      Reviewer #2 (Recommendations For The Authors):

      (4) As mentioned in the public review: what is the major advantage of taking the IoT approaches as opposed to USB connections to the host computer, especially when behaviorMate relies on a central master computer regardless? The authors mentioned the readability of the JSON messages, making the system easier to debug. However, the flip side of that is the efficiency of data transmission. Although the bandwidth/latency is usually more than enough for transmitting data and commands for behavior devices, the efficiency may become a problem when neural recording devices (imaging or electrophysiology) need to be included in the system.

      behaviorMate is not intended to do everything, and is limited mainly to controlling behavior and providing some synchronizing TTL-style triggers. In this way the system can easily and inexpensively be replicated across multiple recording setups; this is particularly useful for constructing additional animal training setups. The system is sufficient for capturing behavioral inputs at relevant timescales (see the benchmarks in Figures 3 and 4 as well as the position-correlated neural activity in Figures 5 and 6 for a demonstration of this). Additional hardware might be needed to align the behaviorMate output with neural data; for example, a high-speed DAQ or input channels on electrophysiology recording setups could be utilized (if provided). As all recording setups are different, the ideal solution would depend on details which are hard to anticipate. We do not mean to convey that the full neural data would be transmitted to the behaviorMate system (especially using the JSON/UDP communications that behaviorMate relies on).

      (5) The author mentioned labView. A popular open-source alternative is bonsai (https://github.com/bonsai-rx/bonsai). Both include a graphical-based programming interface that allows the users to easily reconfigure the hardware system, which behaviorMate seems to lack. Additionally, autopilot (https://github.com/auto-pi-lot/autopilot) is a very relevant project that utilizes a local network for multiple behavior devices but focuses more on P2P communication and rigorously defines the API/schema/communication protocols for devices to be compatible. I think it's important to include a discussion on how behaviorMate compares to previous works like these, especially what new features behaviorMate introduces.

      We believe that behaviorMate provides a more opinionated and complete solution than the projects mentioned. A wide variety of 1D navigational paradigms can be constructed in behaviorMate without the need to write any novel software. For example, bonsai is a “visual programming language” and would require experimenters to construct a custom implementation of each of their experiments. We have opted to use Java for the UI with distributed computations across modules in various languages. Given the IOT methodology it would be possible to use any number of programming languages or APIs; a large number of design decisions were made when building the project and we have opted to not include this level of detail in the manuscript in order to maintain readability. We strongly believe in using non-proprietary and open source projects, when possible, which is why the comparison with LabView-based solutions was included in the introduction. Also, we have added a reference to the autopilot project in the section of the introduction where this is discussed.

      (6) One of the reasons labView/bonsai are popular is they are inherently parallel and can simultaneously respond to events from different hardware sources. While the JSON events in behaviorMate are asynchronous in nature, the handling of those events seems to happen only in a main event loop coupled with GUI, which is sequential by nature. Is there any multi-threading/multi-processing capability of behaviorMate? If so it's an important feature to highlight. If not I think it's important to discuss the potential limitation of the current implementation.

      IOT solutions are inherently concurrent since the computation is distributed. Additional parallelism could be added by further distributing concerns between additional independent modules running on independent hardware. The UI has an event loop which aggregates inputs and then updates contexts based on the current state of those inputs sequentially. This sort of “snapshot” of the current state is necessary to reason about when to start certain contexts based on their settings and applied decorators. While the behaviorMate UI uses multithreading libraries in Java to be more performant in certain cases, the degree to which this represents true vs “virtual” concurrency would depend on the individual PC architecture it is run on and how the operating system allocates resources. For this reason, we have argued in the manuscript that behaviorMate is sufficient for controlling experiments at behaviorally relevant timescales, and have presented benchmarks and discussed different synchronization approaches to permit users to determine if this is sufficient for their needs.

      (7) The context list is an interesting and innovative approach to abstract behavior contingencies into a data structure, but it's not currently discussed in depth. I think it's worth highlighting how the context list can be used to cover a wide range of common behavior experimental contingencies with detailed examples (line 185 might be a good example to give). It's also important to discuss the limitation, as currently the context lists seem to only support contingencies based purely on space and time, without support for more complicated behavior metrics (e.g. deliver reward only after X% correct).

      To access more complex behavior metrics during runtime, custom context list decorators would need to be implemented. While this is less common in the sort of 1D navigational behaviors the project was originally designed to control, adding novel decorators is a simple process that only requires basic object oriented programming knowledge. As discussed we are also implementing a plugin-architecture in the JavaFX update to streamline these types of additions.
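
      To illustrate the kind of decorator described above, the sketch below wraps a position-based context so that it only becomes active once a running fraction of correct outcomes reaches a threshold. behaviorMate's actual decorators are Java classes; the Python class names, method names, and the accuracy rule here are hypothetical and serve only to show the pattern.

```python
class PositionContext:
    """Stand-in for a context that is active inside a fixed position window."""

    def __init__(self, start_mm: float, end_mm: float) -> None:
        self.start_mm = start_mm
        self.end_mm = end_mm

    def check(self, position_mm: float) -> bool:
        return self.start_mm <= position_mm <= self.end_mm


class AccuracyGate:
    """Decorator that suppresses the wrapped context until accuracy is high enough."""

    def __init__(self, wrapped, threshold: float, min_outcomes: int = 1) -> None:
        self.wrapped = wrapped
        self.threshold = threshold
        self.min_outcomes = min_outcomes
        self.correct = 0
        self.total = 0

    def record_outcome(self, correct: bool) -> None:
        """Update the running tally after each scored response."""
        self.total += 1
        if correct:
            self.correct += 1

    def check(self, position_mm: float) -> bool:
        # Require at least one outcome so the accuracy ratio is defined.
        if self.total < max(1, self.min_outcomes):
            return False
        if self.correct / self.total < self.threshold:
            return False
        return self.wrapped.check(position_mm)


if __name__ == "__main__":
    gate = AccuracyGate(PositionContext(100.0, 200.0), threshold=0.75, min_outcomes=4)
    for outcome in (True, True, False, True):
        gate.record_outcome(outcome)
    print(gate.check(150.0))  # inside the window with 3 of 4 correct
```

      Because the gate exposes the same `check` method as the context it wraps, the surrounding event loop would not need to know whether a context is decorated.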

      Minor Comments:

      (8) In line 202, the author suggests that a single TTL pulse is sent to mark the start of a recording session, and this is used to synchronize behavior data with imaging data later. In other words, there are no synchronization signals for every single sample/frame. This approach either assumes the behavior recording and imaging are running on the same clock or assumes evenly distributed recording samples over the whole recording period. Is this the case? If so, please include a discussion on limitations and alternative approaches supported by behaviorMate. If not, please clarify how exactly synchronization is done with one TTL pulse.

      While the TTL pulse triggers the start of neural data acquisition in our setups, various options exist for controlling for the described clock drift across experiments, and the appropriate one depends on the type of recordings made, frame rate, duration of recording, etc. Therefore, behaviorMate leaves open many options for synchronization at different time scales (e.g. adding a frame-sync circuit as shown in Figure 1 or sending TTL pulses to the same DAQ recording electrophysiology data). Expanded consideration of different synchronization methods has been included in the manuscript (see lines 224-238).
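
      When sync pulses are recorded on both the behavior clock and the recording clock, one common post-hoc approach is to fit a linear map between the two sets of timestamps and use it to convert behavior event times. The sketch below assumes drift is approximately linear over the session, which is an assumption of this example rather than a claim from the manuscript; the function names are illustrative.

```python
from typing import Sequence, Tuple


def fit_clock_map(
    behavior_times: Sequence[float], imaging_times: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of imaging_t = scale * behavior_t + offset.

    Each pair (behavior_times[k], imaging_times[k]) is the same sync pulse
    as time stamped by the two clocks.
    """
    n = len(behavior_times)
    if n != len(imaging_times) or n < 2:
        raise ValueError("need at least two paired sync pulses")
    mean_b = sum(behavior_times) / n
    mean_i = sum(imaging_times) / n
    var_b = sum((b - mean_b) ** 2 for b in behavior_times)
    if var_b == 0:
        raise ValueError("sync pulses must occur at distinct times")
    cov = sum(
        (b - mean_b) * (i - mean_i) for b, i in zip(behavior_times, imaging_times)
    )
    scale = cov / var_b
    offset = mean_i - scale * mean_b
    return scale, offset


def to_imaging_time(behavior_t: float, scale: float, offset: float) -> float:
    """Convert a behavior-clock timestamp to the imaging clock."""
    return scale * behavior_t + offset


if __name__ == "__main__":
    # Synthetic pulses: imaging clock starts 0.5 s later and runs 0.1% fast.
    behavior = [0.0, 10.0, 20.0, 30.0]
    imaging = [0.5, 10.51, 20.52, 30.53]
    scale, offset = fit_clock_map(behavior, imaging)
    print(f"scale={scale:.4f}, offset={offset:.3f} s")
```

      With only a single start pulse, the offset can be estimated but the scale cannot, which is why periodic pulses or frame-sync signals are needed to correct for drift.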

      (9) Is the computer vision-based calibration included as part of the GUI functionality? Please clarify. If it is part of the GUI, it's worth highlighting as a very useful feature.

      The computer vision-based benchmarking is not included in the GUI. It is in the form of a script made specifically for this paper. However for treadmill-based experiments behaviorMate has other calibration tools built into it (see line 301-303).

      (10) I went through the source code of the Arduino firmware, and it seems most "open X for Y duration" functions are implemented using the delay function. If this is indeed the case, it's generally a bad idea since delay completely pauses the execution and any events happening during the delay period may be missed. As an alternative, please consider approaches comparing timestamps or using interrupts.

      We have avoided the use of interrupts on the GPIO due to the potential for unpredictable code execution. A blocking delay is only executed if the duration is 10 ms or less, as we cannot guarantee that the Arduino event loop cycles faster than this. Durations longer than 10 ms are time stamped and non-blocking. We have adjusted this MAX_WAIT to be specified as a macro so it can be more easily adjusted (or set to 0).
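
      The timing rule described above can be sketched as follows. This is a Python stand-in for the firmware logic, not the Arduino source: only the `MAX_WAIT` name and the 10 ms value come from the response, while the class, its methods, and the simulated clock are assumptions for illustration.

```python
MAX_WAIT_MS = 10  # mirrors the MAX_WAIT macro described in the response


class PulsedOutput:
    """Models an output that opens for a fixed duration without stalling the loop."""

    def __init__(self) -> None:
        self.is_open = False
        self.blocked_ms = 0  # total time spent in blocking waits
        self._opened_at_ms = 0
        self._duration_ms = 0

    def open_for(self, duration_ms: int, now_ms: int) -> None:
        """Short pulses block briefly; longer pulses are time stamped instead."""
        if duration_ms <= MAX_WAIT_MS:
            # Stands in for a blocking delay(duration_ms) on the microcontroller.
            self.blocked_ms += duration_ms
            self.is_open = False
            return
        self.is_open = True
        self._opened_at_ms = now_ms
        self._duration_ms = duration_ms

    def update(self, now_ms: int) -> None:
        """Called once per event-loop pass; closes the output when time is up."""
        # Comparing elapsed time (now - start) is the usual rollover-safe idiom
        # for unsigned millisecond counters.
        if self.is_open and now_ms - self._opened_at_ms >= self._duration_ms:
            self.is_open = False


if __name__ == "__main__":
    valve = PulsedOutput()
    valve.open_for(100, now_ms=1000)
    for now in (1050, 1100):
        valve.update(now)
        print(now, valve.is_open)
```

      Between `update` calls the loop remains free to service other sensors, which is the property the reviewer was asking about.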

      (11) Figure 3 B, C, D, and Figure 4 D, E suffer from noticeable low resolution.

      We have converted Figure 3B, C, D and 4C, D, E to vector graphics in order to improve the resolution.

      (12) Figure 4C is missing, which is an important figure.

      This figure appeared when we rendered and submitted the manuscript. We apologize if the figure was generated such that it did not load properly in all pdf viewers. The panel appears correctly in the online eLife version of the manuscript. Additionally, we have checked the revision in Preview on Mac OS as well as Adobe Acrobat and the built-in viewer in Chrome and all figure panels appear in each so we hope this issue has been resolved.

      (13) There are thin white grid lines on all heatmaps. I don't think they are necessary.

      The grid lines have been removed from the heatmaps as suggested.

      (14) Line 562 "sometimes devices directly communicate with each other for performance reasons", I didn't find any elaboration on the P2P communication in the main text. This is potentially worth highlighting as it's one of the advantages of taking the IoT approaches.

      In our implementation it was not necessary to rely on P2P communication beyond what is indicated in Figure 1. The direct communication referred to in line 562 is meant only to refer to the examples expanded on in the rest of the paragraph i.e. the behavior controller may signal the microscope directly using a TTL signal without looping back to the UI. As necessary users could implement UDP message passing between devices, but this is outside the scope of what we present in the manuscript.

      (15) Line 147 "Notably, due to the systems modular architecture, different UIs could be implemented in any programming language and swapped in without impacting the rest of the system.", this claim feels unsupported without a detailed discussion of how new code can be incorporated in the GUI (plugin system).

      This comment refers to the idea of implementing “different UIs”. This would entail users desiring to take advantage of the JSON messaging API and the proposed electronics while fully implementing their own interface. In order to facilitate this option we have improved documentation of the messaging API posted in the README file accompanying the arduino source code. We have added reference to the supplemental materials where readers can find a link to the JSON API implementation to clarify this point.

      Additionally, while a plugin system is available in the JavaFX version of behaviorMate, this project is currently under development and we will update the online documentation as it matures; it is unrelated to the intended claim about completely swapping out the UI.

      Reviewer #3 (Recommendations For The Authors):

      (6) Figure 1 - the terminology for each item is slightly different in the text and the figure. I think making the exact match can make it easier for the reader.

      - Real-time computer (figure) vs real-time controller (ln88).

      The manuscript was adjusted to match figure terminology.

      - The position controller (ln565) - position tracking (Figure).

      We have updated Figure 1 to highlight that the position controller does the position tracking.

      - Maybe add a Behavior Controller next to the GPIO box in Figure 1.

      We updated Figure 1 to highlight that the Behavior Controller performs the GPIO responsibility such that "Behavior Controller" and "GPIO circuit" may be used interchangeably.

      - Position tracking (fig) and position controller (subtitle - ln209).

      We updated Figure 1 to highlight that the position controller does the position tracking.

      - Sync Pulse is not explained in the text.

      The caption for Figure 1 has been updated to better explain the Sync Pulse and additional system boxes.

      (7) For Figure 3B/C: What is the number of data points? It would be nice to see the real population, possibly using a swarm plot instead of box plots. How likely are these outliers to occur?

      In order to better characterize the distributions presented in our benchmarking data, we have added mean and standard deviation information to the plots in Figures 3 and 4. For Figure 3B: 0.0025 +/- 0.1128, Figure 3C: 12.9749 +/- 7.6581, Figure 4C: 66.0500 +/- 15.6994, Figure 4E: 4.1258 +/- 3.2558.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Plasmacytoid dendritic cells (pDCs) are the major producers of type I interferon after viral infections and play a key role in the antiviral immune response. This article by Joshi et al. investigates the role of pDCs in regulating Hepatitis E virus (HEV) infection. In Fig. 1, the authors investigated the immunocompetence of different cell lines, and HepG2/C3A and PLC3 were chosen for further studies. By utilizing a combination of flow cytometry, RT-qPCR and other techniques, the authors showed in Fig. 2 that the cell-cell contacts between pDCs and HEV-infected cells induce the pDCs to secrete interferon (IFN). This interaction is mediated by cell adhesion molecules and is dependent on TLR7 signaling. The authors then went on to show that the IFN produced by pDCs controlled the viral spread. Further, using several mutant forms of the ORF2 protein and utilizing imaging, RT-qPCR and other techniques, in Fig. 3 and 4 the authors elucidated the importance of the glycosylation pattern, the localization of different forms of the HEV ORF2 protein, and cell-cell contact in triggering the immune response. Overall, this study provided insights into the pDC-mediated IFN response against HEV.

      Major comments:

      1. The authors report that in the PLC3 cells, STOP mutation significantly reduced IFN⍺ production (Fig. 3f), significantly reduced pDC contact with infected cells (Fig. 4c) and thus concluded that the ORF2g/c is involved in pDC-infected cell interaction and IFN⍺ production. However, in the HepG2/C3A cells, the STOP mutation does not decrease the IFN⍺ production (Fig. 3e). In the manuscript, one of the key conclusions is that the glycosylated form of ORF2 leads to better recognition of the infected cells by pDC. So, it is critical that the difference in the IFN⍺ production between these two cell lines with STOP mutation is addressed with further details.
      2. The authors show that the IFN⍺ response was reduced in 5R/5A mutant HepG2/C3A cells (Fig. 3e), whereas the IFN⍺ response was completely absent in 5R/5A mutant PLC3 cells (Fig. 3f). The authors suggested that the difference in IFN⍺ response may be due to lack of ORF2i in PLC3 and other cell specific regulation in HepG2/C3A. Further evidence for this differential regulation would strengthen the claim.
      3. In the PLC3-pDC co-culture experiment (Fig. 2b), there is already an induction of IFN-λ1 (Interferon Lambda 1) in the uninfected PLC3-pDC co-culture (right panel, Fig. 2b). An explanation for the IFN-λ1 (Interferon Lambda 1) expression in the uninfected state would be helpful.

      Additional comments:

      1. Authors checked the expression of two ISGs- MXA, ISG15 in Fig. 1a-c, 2a-b. Were the expressions of other ISGs, such as members of OAS family (OAS1, OAS2 etc.), IFITM family or any other ISGs checked? This may be helpful, since in the Fig. 2c there is IFN⍺ production in pDC-infected PLC3 co-culture, but the ISGs (MXA, ISG15) are not upregulated significantly in Fig. 2b.
      2. In the HepG2/C3A-pDC co-culture experiment (Fig. 2a), there is not much difference in IFN-λ1 (Interferon Lambda 1) level in the infected HepG2/C3A-pDC co-culture (right panel, Fig. 2a) in comparison to infected HepG2/C3A alone (left panel, Fig. 2b), and also this outcome is different from that in the PLC3 experiment (Fig. 2b). Further clarification would help to support the conclusion regarding the IFN-λ1 (Interferon Lambda 1) upregulation in HEV infected cells-pDC co-culture.
      3. The authors show that in the pDC-PLC3 co-culture system, IFN⍺ was induced at 18h (Fig. 2c-2e), but the viral replication was not decreased in PLC3 cells (Fig. 2g). But, the HepG2/C3A-pDC co-culture has reduced viral replication at 18h (Fig. 2f). An explanation for the difference in the observation in two different cell lines at the same timepoints would strengthen the antiviral role of pDCs on HEV infected cells.
      4. The authors quantified the fold change in HEV infected PLC3+ cells in Fig. 2h. Was it performed by flow cytometry? It would be helpful to mention it in the figure legend. Also, if the said quantitation was done by flow cytometry, performing a similar assay with HepG2/C3A cells at 48h would provide the readers a better idea about the antiviral response across the cell lines at comparable timepoints.

      Minor comments:

      1. Was it expected to observe the increased induction of IL6 (Fig. 1b) in HepG2/C3A cells (but not in other cell lines) after IFN-λ (Interferon Lambda) treatment?
      2. In Fig. 3e, for the WT cells, 4 datapoints are visible while in the legend it is mentioned n=5.
      3. Typo: IRS661 in line 263, 699, Figure 2e.
      4. Typo: 200l in line 579.
      5. Catalogue number for ELISA kit is missing (Line 584).
      6. It would be helpful if the color code for the imaging in Supplementary figure 2f is provided on the top of the images, as it is provided in other images.

      Significance

      This article by Joshi et al. provides insight into the role of pDCs in controlling HEV infection. However, the importance of pDC-infected cell contact mediated IFN-I secretion in the antiviral response has been previously shown by the authors' group (Assil et al., 2019, Cell Host & Microbe) and by others as well (e.g., Yun et al., 2021, Sci. Immunol.). The involvement of integrin-mediated cell adhesion and TLR signaling in mediating this response was also shown. Though this manuscript does not advance the field of pDC biology or virology significantly, it does provide a better understanding of the pDC antiviral response in the landscape of HEV infection. Although it is out of the scope of this manuscript, elucidation of the mechanistic regulation of how ORF2g/c controls the pDC-infected cell contact would be of great interest and significance. Overall, this study could be of interest to a general audience, especially to virologists and researchers working in pDC biology.

    1. TJJD shall compile statewide statistics on the incidence of abuse, neglect, and exploitation as required by §261.402, Family Code. (1) The following statistical data, which contains no case-specific identifiers, is available to the public upon written request: (A) the number of reported allegations of abuse, neglect, and exploitation; (B) the classifications assigned to reported allegations of abuse, neglect, and exploitation; and (C) the dispositions assigned to investigations of reported allegations of abuse, neglect, and exploitation.

      is this different?

    2. Juvenile justice professionals must report [to the appropriate authorities and/or entities] any unethical behavior or violations of the code of ethics to TJJD and the administration of the department, program, facility, or non-juvenile justice contract facility where the juvenile justice professional is an employee, volunteer, or contractor

      good

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The work analyzes how centrosomes mature before cell division. A critical aspect is the accumulation of pericentriolar material (PCM) around the centrioles to build competent centrosomes that can organize the mitotic spindle. The present work builds on the idea that the accumulation of PCM is catalyzed either by the centrioles themselves (leading to a constant accumulation rate) or by enzymes activated by the PCM itself (leading to autocatalytic accumulation). These ideas are captured by a previous model derived for PCM accumulation in C. elegans (ref. 8) and are succinctly summarized by Eq. 1. The main addition of the present work is to allow the activated enzymes to diffuse in the cell, so they can also catalyze the accumulation of PCM in other centrosomes (captured by Eqs. 2-4). The authors claim that this helps centrosomes to reach the same size, independent of potential initial mismatches.

      A strength of the paper is the simplicity of the equations, which are reduced to the bare minimum and thus allow a detailed inspection of the physical mechanism. One shortcoming of this approach is that all equations assume that the diffusion of molecules is much faster than any of the reactive time scales, although there is no experimental evidence for this.

      We appreciate the reviewer’s recognition of the strengths of our work. Indeed, the centrosome growth model incorporates multiple timescales corresponding to various reactions, and existing experimental data do not directly provide diffusion constants for the cytosolic proteins. However, we can estimate these diffusion constants using protein mass, based on the Stokes-Einstein relation, and compare the diffusion timescales with the reaction timescales obtained from FRAP analysis. For example, we estimate that the diffusion timescale for centrosomes separated by 5-10 micrometers is much smaller than the reaction timescales deduced from the FRAP experiments. Specifically, for SPD-5, a scaffold protein with a mass of ~150 kDa, the estimated diffusion constant is ~17 µm²/s, using the Stokes-Einstein relation and a reference diffusion constant of ~30 µm²/s for a 30 kDa GFP protein (reference: Bionumbers book). This results in a diffusion timescale of ~1 second for centrosomes 10 µm apart. In contrast, FRAP recovery timescales for SPD-5 in C. elegans embryos are on the order of several minutes, suggesting that scaffold protein binding reactions are much slower than diffusion. Therefore, a reaction-limited model is appropriate for studying PCM self-assembly during centrosome maturation. We have revised the manuscript to clarify this point and to include a discussion of the diffusion and reaction timescales.
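
      The estimate in the preceding paragraph can be reproduced with a short back-of-envelope calculation. The sketch below is illustrative only: it assumes globular proteins (so that the hydrodynamic radius scales as mass to the power 1/3) and a three-dimensional mean-squared-displacement relation, neither of which is stated explicitly in the response, and the function names are hypothetical.

```python
# Back-of-envelope check of the diffusion estimate quoted in the response.
# Assumptions (not taken from the manuscript): globular proteins, so that
# radius ~ mass**(1/3), and a 3D random walk with <r^2> = 6 * D * t.

def scaled_diffusion_constant(d_ref, mass_ref_kda, mass_kda):
    """Stokes-Einstein scaling: D is inversely proportional to radius ~ mass**(1/3)."""
    return d_ref * (mass_ref_kda / mass_kda) ** (1.0 / 3.0)


def diffusion_time(distance_um, d_um2_per_s, dim=3):
    """Time for the mean-squared displacement to reach distance**2 in `dim` dimensions."""
    return distance_um ** 2 / (2 * dim * d_um2_per_s)


d_gfp = 30.0  # um^2/s, reference value for a ~30 kDa GFP quoted in the text
d_spd5 = scaled_diffusion_constant(d_gfp, mass_ref_kda=30.0, mass_kda=150.0)
t_diff = diffusion_time(distance_um=10.0, d_um2_per_s=d_spd5)

print(f"D(SPD-5) ~ {d_spd5:.1f} um^2/s")
print(f"diffusion time over 10 um ~ {t_diff:.2f} s")
```

      Under these assumptions the diffusion constant comes out between 17 and 18 µm²/s and the diffusion time just under one second, consistent with the values quoted above and far shorter than FRAP recovery times of several minutes.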

      Spatially extended model with diffusion

      Both the reviewers have pointed out the importance of considering diffusion effects in centrosome size dynamics, and we agree that this is important to explore. We have developed a spatially extended 3D version of the centrosome growth model, incorporating stochastic reactions and diffusion (see Appendix 4). In this model, the system is divided into small reaction volumes (voxels), where reactions depend on local density, and diffusion is modeled as the transport of monomers/building blocks between voxels.

      We find that diffusion can alter the timescales of growth, particularly when the diffusion timescale is comparable to or slower than the reaction timescale, potentially mitigating size inequality by slowing down autocatalysis. However, the main conclusions of the catalytic growth model remain unchanged, showing robust size regulation independent of diffusion constant or centrosome separation (Figure 2—figure supplement 3). Hence, we focused on the effect of subunit diffusion on the autocatalytic growth model. We find that in the presence of diffusion, the size inequality reduces with increasing diffusion timescale, i.e., increasing distance between centrosomes and decreasing diffusion constant (Figure 2—figure supplement 4). However, the lack of robustness in size control in the autocatalytic growth model remains, i.e., the final size difference increases with increasing initial size difference. Notably, in the diffusion-limited regime (very small diffusion or large distances), the growth curve loses its sigmoidal shape, resembling the behavior in the non-autocatalytic limit (Figure 2). These findings are discussed in the revised manuscript.

      Another shortcoming of the paper is that it is not clear what species the authors are investigating and how general the model is. There are huge differences in centrosome maturation and the involved proteins between species. However, this is not mentioned in the abstract or introduction. Moreover, in the main body of the paper, the authors mention C. elegans on pages 2 and 3, but refer to Drosophila on page 4, switching back to C. elegans on page 5, and discuss Drosophila on page 6. This is confusing and looks as if they are cherry-picking elements from various species. The original model in ref. 8 was constructed for C. elegans and it is not clear whether the autocatalytic model is more general than that. In any case, a more thorough discussion of experimental evidence would be helpful.

      We believe one strength of our approach is its applicability across organisms. Our goal in comparing the theoretical model with experimental data from C. elegans and D. melanogaster is to demonstrate that the apparent qualitative differences in centrosome growth across species (see, e.g., the extent of size scaling discussed in the section “Cytoplasmic pool depletion regulates centrosome size scaling with cell size”) may arise from the same underlying mechanisms in the theoretical model, albeit with different parameter values. We acknowledge differences in regulatory molecules between species, but the core pathways remain conserved (see, e.g., Raff, Trends in Cell Biology 2019, section “Molecular Components of the Mitotic Centrosome Scaffold Appear to Have Been Conserved in Evolution from Worms to Humans”). In the revised manuscript, we have expanded the introduction to clarify this point and explain how our theory applies across species. We have also provided a clearer discussion of the experimental systems used throughout the manuscript and the available experimental evidence.

      The authors show convincingly that their model compensates for initial size differences in centrosomes and leads to more similar final sizes. These conclusions rely on numerical simulations, but it is not clear how the parameters listed in Table 1 were chosen and whether they are representative of the real situation. Since all presented models have many parameters, a detailed discussion on how the values were picked is indispensable. Without such a discussion, it is not clear how realistic the drawn conclusions are. Some of this could have been alleviated using a linear stability analysis of the ordinary differential equations from which one could have gotten insight into how the physical parameters affect the tendency to produce equal-sized centrosomes.

      Following the suggestion of the reviewer, we have revised the manuscript to add references and discussions justifying the choice of the parameter values used for the numerical simulations. These references and parameter choices can be found in Table 1 and Table 2, and are also discussed in relevant figure captions and within the manuscript text.

      We thank the reviewer for the excellent suggestion of including linear stability analysis of the ODE models of centrosome growth. We included linear stability analyses of the catalytic and autocatalytic growth models in Appendix 3. Analysis of the catalytic growth model reaffirms the robustness of size equality and the analysis of autocatalytic growth provides an approximate condition of size inequality. We have modified the revised manuscript to discuss these results.

      The authors use the fact that their model stabilizes centrosome size to argue that their model is superior to the previously published one, but I think that this conclusion is not necessarily justified by the presented data. The authors claim that "[...] none of the existing quantitative models can account for robustness in centrosome size equality in the presence of positive feedback." (page 1; similar sentence on page 2). This is not shown convincingly. In fact, ref 8. already addresses this problem (see Fig. 5 in ref. 8) to some extent.

      The linear stability analysis shown in Fig 5 in ref 8 (Zwicker et al, PNAS, 2014) shows that the solutions are stable around the fixed point and it was inferred from this result that Ostwald ripening can be suppressed by the catalytic activity of the centriole, therefore stabilizing the centrosomes (droplets) against coarsening by Ostwald ripening. But, if size discrepancy arises from the growth process (e.g., due to autocatalysis) the timescale of relaxation for such discrepancy is not clear from the above-mentioned result. We show (in figure 2 - figure supplement 3) that for any appreciable amount of positive feedback, the solution moves very slowly around the fixed point (almost like a line attractor) and cannot reach the fixed point in a biologically relevant timescale. Hence the model in ref 8 does not provide a robust mechanism for size control in the presence of autocatalytic growth. We have added this discussion in the Discussion section.

      More importantly, the conclusion seems to largely be based on the analysis shown in Fig. 2A, but the parameters going into this figure are not clear (see the previous paragraph). In particular, the initial size discrepancy of 0.1 µm^3 seems quite large, since it translates to a sphere of a radius of 300 nm. A similarly large initial discrepancy is used on page 3 without any justification. Since the original model itself already showed size stability, a careful quantitative comparison would be necessary.

      We thank the reviewer for the valuable suggestions. The parameters used in Fig. 2A are listed in Table 1 with corresponding references, and we used the parameter values from Zwicker et al. (2014) for rate constants and concentrations.

      The issue of initial size differences between centrosomes is important, but quantitative data on this are not readily available for C. elegans and Drosophila. Centrosomes may differ initially due to disparities in the amount and incorporation rate of PCM between the mother and daughter centrioles. Based on available images and videos (Cabral et al, Dev. Cell, 2019, DOI: https://doi.org/10.1016/j.devcel.2019.06.004), we estimated an initial radius of ~0.5 μm for centrosomes. Accounting for a 5% radius difference would lead to a volume difference of ~0.1 μm³, which was used in our analysis (Fig. 2A). These differences likely arise from distinct growth conditions of centrosomes containing different centrioles (older mother and newer daughter).
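
      The arithmetic behind this estimate, and behind the reviewer's remark that 0.1 µm³ corresponds to a sphere of roughly 300 nm radius, can be checked directly. The sketch below assumes spherical centrosomes and interprets the "5% radius difference" as one centrosome having a radius 5% larger than the other; both are assumptions made for illustration, and the helper names are hypothetical.

```python
import math


def sphere_volume(radius_um):
    """Volume (um^3) of a sphere with the given radius (um)."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3


def sphere_radius(volume_um3):
    """Radius (um) of a sphere with the given volume (um^3)."""
    return (3.0 * volume_um3 / (4.0 * math.pi)) ** (1.0 / 3.0)


r0 = 0.5                              # um, initial radius estimated in the text
v0 = sphere_volume(r0)                # volume of the smaller centrosome
dv = sphere_volume(1.05 * r0) - v0    # extra volume for a 5% larger radius
r_equiv = sphere_radius(0.1)          # radius of a sphere of volume 0.1 um^3

print(f"V0 = {v0:.3f} um^3, dV = {dv:.3f} um^3, r_equiv = {r_equiv:.3f} um")
```

      With these assumptions the volume difference is a little under 0.1 µm³, and a sphere of volume 0.1 µm³ has a radius just under 0.3 µm, so the two statements describe the same quantity from different angles.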

      More importantly, we emphasize that the initial size difference does not qualitatively alter the results presented in Figure 2. We agree that a quantitative analysis will further clarify our conclusions, and we have revised the manuscript accordingly. For example, Figure 2—figure supplement 3 provides a detailed analysis of how the final centrosome size depends on initial size differences across various parameter values. Additionally, Appendix 3 now includes analytical estimates of the onset of size inequality as a function of these parameters.

      The analysis of the size discrepancy relies on stochastic simulations (e.g., mentioned on pages 2 and 4), but all presented equations are deterministic. It's unclear what assumptions go into these stochastic equations, and how they are analyzed or simulated. Most importantly, the noise strength (presumably linked to the number of components) needs to be mentioned. How is this noise strength determined? What are the arguments for this choice? This is particularly crucial since the authors quote quantitative results (e.g., "a negligible difference in steady-state size (∼ 2% of mean size)" on page 4).

      As described in the Methods, we used the exact Gillespie method (Gillespie, JPC, 1977) to simulate the evolution of the stochastic trajectories of the systems, corresponding to the deterministic growth and reaction kinetics outlined in the manuscript. We've expanded the Methods to include further details on the stochastic simulations and refer to Appendix 1, where we describe the chemical master equations governing autocatalytic growth.
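
      For readers unfamiliar with the method, a minimal sketch of the Gillespie algorithm is given below. It is not the reaction scheme used in the manuscript: the toy model (a single structure exchanging subunits with a finite pool), the rate constants, and the function name are all hypothetical and chosen only to show how waiting times and reaction choices are sampled.

```python
import random


def gillespie_growth(n_total, k_on, k_off, t_end, seed=0):
    """Exact stochastic simulation of a toy reversible assembly process.

    Hypothetical reactions (not the manuscript's model):
      pool -> structure   with propensity k_on  * (n_total - n)
      structure -> pool   with propensity k_off * n
    Returns the event times and the number of subunits in the structure.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, sizes = [t], [n]
    while True:
        a_on = k_on * (n_total - n)
        a_off = k_off * n
        a_total = a_on + a_off
        if a_total == 0.0:
            break
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_on:
            n += 1
        else:
            n -= 1
        times.append(t)
        sizes.append(n)
    return times, sizes


times, sizes = gillespie_growth(n_total=1000, k_on=0.01, k_off=0.01, t_end=500.0)
print(f"{len(times) - 1} reaction events, final size {sizes[-1]} subunits")
```

      In this toy model the fluctuations around the mean size shrink relative to the mean as the pool size `n_total` grows, which is the sense in which noise strength is tied to the number of components.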

      The noise strength (fluctuations about the mean size of centrosome) does depend on the total monomer concentration (the pool size), and this may affect size inequality. Similar values of the total monomer concentration were used in the catalytic (0.04 µM) and autocatalytic growth (0.33 µM) simulations. These values for the pool size are similar to previous studies (Zwicker et al, PNAS, 2014) and have been optimized to obtain a good fit with experimental growth curves from C. elegans embryo data.

      To present more quantitative results, we have revised our manuscript to add data showing the effect of pool size on centrosome size inequality (Figure 3 - figure supplement 2). We find the size inequality in catalytic growth to increase with decreasing pool size as the origin of this inequality is the stochastic fluctuation in individual centrosome size. The size inequality (ratio of dv/<V>) in the autocatalytic growth does not depend (strongly) on the pool size (dv and <V> both increase similarly with pool size).

      Moreover, the two sets of testable predictions that are offered at the end of the paper are not very illuminative: The first set of predictions, namely that the model would anticipate an "increase in centrosome size with increasing enzyme concentration, the ability to modify the shape of the sigmoidal growth curve, and the manipulation of centrosome size scaling patterns by perturbing growth rate constants or enzyme concentrations.", are so general that they apply to all models describing centrosome growth. Consequently, these observations do not set the shared enzyme pool apart and are thus not useful to discriminate between models. The second part of the first set of predictions about shifting "size scaling" is potentially more interesting, although I could not discern whether "size scaling" referred to scaling with cell size, total amount of material, or enzymatic activity at the centrioles. The second prediction is potentially also interesting and could be checked directly by analyzing published data of the original model (see Fig. 5 of ref. 8). It is unclear to me why the authors did not attempt this.

      In response to the reviewers' valuable feedback, we have revised the manuscript to include results on potential methods for distinguishing catalytic growth from autocatalytic growth. Since the growth dynamics of a single centrosome do not significantly differ between these two models, it is necessary to experimentally examine the growth dynamics of a centrosome pair under various initial size perturbations. In Figure 3-figure supplement 2, we present theoretical predictions for both catalytic and autocatalytic growth models, illustrating the correlation between initial and final sizes after maturation. The figure demonstrates that the initial size difference and final size difference should be correlated only in autocatalytic growth, and that the relative size inequality decreases with increasing subunit pool size in catalytic growth while it remains almost unchanged in autocatalytic growth. These predictions can be experimentally examined by inducing varying centrosome sizes at the early stage of maturation for different expression levels of the scaffold-forming proteins.

      A second experimentally testable feature of the catalytic growth model involves sharing of the enzyme between both centrosomes. This could be tested through immunofluorescent staining of the kinase or by constructing a FRET reporter for PLK1 activity, to determine whether the active form of PLK1 is found in the cytoplasm around the centrosomes, which would indicate a shared pool of active enzyme. Additionally, photoactivated localization microscopy could be employed, where fluorescently tagged enzyme is selectively photoactivated in one centrosome and the intensity is measured at the other centrosome to determine the extent of enzyme sharing between the centrosomes.

      We also discuss shifts in centrosome size scaling behavior with cell size by varying parameters of the catalytic growth model (Fig 4). While quantitative analysis of size scaling in Drosophila is currently unavailable, such an investigation could enable us to distinguish the catalytic growth model from other models. We have included this point in the Discussion section.

      “The second prediction is potentially also interesting …” We assume the reviewer is referencing the scenario in Zwicker et al. (ref 8), where differences in centriole activity lead to unequal centrosome sizes. The data in that study represent a case of centrosome growth with variable centriole activity, resulting in size differences in both autocatalytic and catalytic growth models. This differs from our proposed experiment, where we induce unequal centrosome sizes without modifying centriole activity. We have now revised the text to clarify this distinction.

      Taken together, I think the shared enzyme pool is an interesting idea, but the experimental evidence for it is currently lacking. Moreover, the model seems to make few testable predictions that differ from previous models.

      We appreciate the reviewer’s interest in the core idea of our work. As mentioned earlier, we have improved the clarity in model predictions in the revised discussion section. Unfortunately, the lack of publicly available experimental data limits our ability to provide more direct experimental evidence. However, we are hopeful that our theoretical model will inspire future experiments to test these model predictions.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Banerjee & Banerjee argue that a solely autocatalytic assembly model of the centrosome leads to size inequality. The authors instead propose a catalytic growth model with a shared enzyme pool. Using this model, the authors predict that size control is enzyme-mediated and are able to reproduce various experimental results such as centrosome size scaling with cell size and centrosome growth curves in C. elegans.

      The paper contains interesting results and is well-written and easy to follow/understand.

      We are delighted that the reviewer finds our work interesting, and we appreciate the thoughtful suggestions provided. In response, we have revised the text and figures to incorporate these recommendations. Below, we address each of the reviewer’s comments point by point:

      Suggestions:

      ● In the Introduction, when the authors mention that their "theory is based on recent experiments uncovering the interactions of the molecular components of centrosome assembly" it would be useful to mention what particular interactions these are.

      As the reviewer suggested, we have modified the introduction section to add the experimental observations upon which we build our model.

      ● In the Results and Discussion sections, the authors note various similarities and differences between what is known regarding centrosome formation in C. elegans and Drosophila. It would have been helpful to already make such distinctions in the Introduction (where some phenomena that may be C. elegans specific are implied to hold for centrosomes universally). It would also be helpful to include more comments on the possible implications for other systems in which centrosomes have been studied, such as human, Zebrafish, and Xenopus.

      We thank the reviewer for this suggestion. We have modified the Introduction to motivate the comparative study of centrosome growth in different organisms and draw relevant connections to centrosome growth in other commonly studied organisms like Zebrafish and Xenopus.

      ● For Fig 1.C, the two axes are very close to being the same but are not. It makes the graph a little bit more difficult to interpret than if they were actually the same or distinctly different. It would be more useful to have them on the same scale and just have a legend.

      We have modified Figure 1C in the revised manuscript. The plot now shows the growth of a single centrosome and of a pair of centrosomes on the same y-axis scale.

      ● The authors refer to Equation 1 as resulting from an "active liquid-liquid phase separation", but it is unclear what that means in this context because the rheology of the centrosome does not appear to be relevant.

      We used the term “active liquid-liquid phase separation” simply to refer to a previous model proposed by Zwicker et al (PNAS, 2014) where the underlying process of growth results from liquid-liquid phase separation. We agree with the reviewer that the rheological property of the centrosome is not very relevant in our discussions and we have thus removed the sentence from the revised manuscript to avoid any confusion.

      ● The authors reject the non-cooperative limit of Eq 1 because, even though it leads to size control, it does not give sigmoidal dynamics (Figure 2B). While I appreciate that this is just meant to be illustrative, I still find it to be a weak argument because I would guess a number of different minor tweaks to the model might keep size control while inducing sigmoidal dynamics, such as size-dependent addition or loss rates (which could be due to reactions happening on the surface of the centrosome instead of in its bulk, for example). Is my intuition incorrect? Is there an alternative reason to reject such possible modifications?

      The reviewer raises an interesting point here. However, we disagree with the idea that minor adjustments to the model can produce sigmoidal growth curves while still maintaining size control. In the absence of an external, time-dependent increase in building block concentration (which would lead to an increasing growth rate), achieving sigmoidal growth requires a positive feedback mechanism in the growth rate. This positive feedback alone could introduce size inequality unless shared equally between the centrosomes, as it is in our model of catalytic growth in a shared enzyme pool. The proposed modification involving size-dependent addition or loss rates due to surface assembly/disassembly may result in unequal sizes precisely because of this positive feedback. A similar example is provided in Appendix 1, where assembly and disassembly across the pericentriolar material volume lead to sigmoidal growth but also generate significant size inequality and lack of robustness in size control.

      ● While the inset of Figure 3D is visually convincing, it would be good to include a statistical test for completeness.

      Following the reviewer’s suggestion, we present a statistical analysis in Figure 3 - Figure supplement 2 in the modified manuscript to enhance clarity. We show that the size difference values are uncorrelated (Pearson’s correlation coefficient ~ 0) with the initial size difference indicating the robustness of the size regulation mechanism.
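
      As a reminder of what this statistic measures, the sketch below computes Pearson's correlation coefficient from paired samples. The function name and the example inputs are hypothetical and purely illustrative; they are not the simulation data analyzed in the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0.0 or s_yy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative inputs only: a perfectly linear relation and an uncorrelated one.
r_linear = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_uncorrelated = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
print(r_linear, r_uncorrelated)
```

      A coefficient near zero between initial and final size differences, as reported above, indicates that the final size difference carries no linear memory of the initial one; a coefficient near one would indicate that initial differences persist or are amplified.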

      ● The authors note that the pulse in active enzyme in their model is reminiscent of the Polo kinase pulse observed in Drosophila. Can the authors use these published experimental results to more tightly constrain what parameter regime in their model would be relevant for Drosophila? Can the authors make predictions of how this pulse might vary in other systems such as C. elegans?

      Thank you for the insightful suggestion regarding the use of pulse dynamics in experiments to better constrain the model’s parameter regime. In our revised manuscript, we attempted this analysis; however, the data from Wong et al. (EMBO 2022) for Drosophila are presented as normalized intensity in arbitrary units, rather than as quantitative measures of centrosome size or Polo enzyme concentration. This lack of quantitative data limits our ability to benchmark the model beyond capturing qualitative trends. We thus believe that quantitative measurements of centrosome size and enzyme concentration are necessary to achieve a tighter alignment between model predictions and biological data.

      We discuss the enzyme dynamics in C. elegans in the revised manuscript. We find the enzyme dynamics corresponding to the fitted growth curves of C. elegans centrosomes are distinctly different from the ones observed in Drosophila. Instead of the pulse-like feature, we find a step-like increase in (cytosolic) active enzyme concentration.

      ● The authors mention that the shared enzyme pool is likely not diffusion-limited in C. elegans embryos, but this might change in larger embryos such as Drosophila or Xenopus. It would be interesting for the authors to include a more in-depth discussion of when diffusion will or will not matter, and what the consequence of being in a diffusion-limit regime might be.

      Both the reviewers have pointed out the importance of considering diffusion effects in centrosome size dynamics, and we agree that this is important to explore. We have developed a spatially extended 3D version of the centrosome growth model, incorporating stochastic reactions and diffusion (see Appendix 4). In this model, the system is divided into small reaction volumes (voxels), where reactions depend on local density, and diffusion is modeled as the transport of monomers/building blocks between voxels.

      We find that diffusion can alter the timescales of growth, particularly when the diffusion timescale is comparable to or slower than the reaction timescale, potentially mitigating size inequality by slowing down autocatalysis. However, the main conclusions of the catalytic growth model remain unchanged, showing robust size regulation independent of diffusion constant or centrosome separation (Figure 2—figure supplement 3). Hence, we focused on the effect of subunit diffusion on the autocatalytic growth model. We find that in the presence of diffusion, the size inequality reduces with increasing diffusion timescale, i.e., increasing distance between centrosomes and decreasing diffusion constant (Figure 2—figure supplement 4). However, the lack of robustness in size control in the autocatalytic growth model remains, i.e., the final size difference increases with increasing initial size difference. Notably, in the diffusion-limited regime (very small diffusion or large distances), the growth curve loses its sigmoidal shape, resembling the behavior in the non-autocatalytic limit (Figure 2). These findings are discussed in the revised manuscript.

      ● The authors state "Firstly, our model posits the sharing of the enzyme between both centrosomes. This hypothesis can potentially be experimentally tested through immunofluorescent staining of the kinase or by constructing FRET reporter of PLK1 activity." I don't understand how such experiments would be helpful for determining if enzymes are shared between the two centrosomes. It would be helpful for the authors to elaborate.

      Our results indicate that the centrosome-activated enzyme must be shared for robust regulation of centrosome size equality. If a FRET reporter of the active form of the enzyme (e.g., PLK1) can be constructed, then the localization of the active form of the enzyme may be determined in the cytosol. We propose this based on reports of studying PLK activities in subcellular compartments using FRET as described in Allen & Zhang, BBRC (2006). Such experiments would provide direct evidence of the shared enzyme pool. Following the reviewer’s suggestion, we have modified the description of the possible FRET-based experimental test for the shared enzyme pool hypothesis in the revised manuscript.

      Additionally, we have added another possible experimental test based on photoactivated localization microscopy (PALM), where tagged enzyme can be selectively photoactivated in one centrosome and intensity measured at the other centrosome to indicate whether the enzyme is shared between the centrosomes.

      Recommendations for the authors:

      The manuscript needs to clarify better what species the model describes, how alternative models were rejected, and how the parameters were chosen.

      In the revised manuscript, we have connected the chemical species in our model to those documented in organisms like Drosophila and C. elegans. This connection is detailed in the main text under the Catalytic Growth Model section and summarized in Table 2. We discuss alternative models and our reasons for excluding them in the first results section on autocatalytic growth, with additional details provided in Appendix 1 and the accompanying supplementary figures. The selection of model parameters is addressed in the main text and methods, with references listed in Table 1. We believe that these revisions, along with our point-by-point responses to reviewer comments, comprehensively address all reviewer concerns.

      Reviewer #1 (Recommendations For The Authors):

      I think the style and structure of the paper could be improved on at least two accounts:

      (1) What's the role of the last section ("Multi-component centrosome model reveals the utility of shared catalysis on centrosome size control.")? It seems to simply add another component, keeping the essential structure of the model untouched. Not surprisingly, the qualitative features of the model are preserved and quantitative features are not discussed anyway.

      This model provides a more realistic description of centrosome growth by incorporating the dynamics of the two primary scaffold-forming subunits and their interactions with an enzyme. It is based on the observation that the major interaction pathways among centrosome components are conserved across many organisms (see Raff, Trends in Cell Biology, 2019 and Table 2), typically involving two scaffold-forming proteins and one enzyme that mediates positive feedback between them. These pathways may involve homologous proteins in different species.

      This model allows us to validate the experimentally observed spatial spread of the two subunits, Cnn and Spd-2, in Drosophila. Additionally, we used it to investigate the impact of relaxing the assumption of a shared enzyme pool on size control. Although similar insights could be obtained using a single-component model, the two-component model offers a more biologically relevant framework. We have highlighted these points in the revised manuscript to ensure clarity.

      (2) The very long discussion section is not very helpful. First, it mostly reiterates points already made in the main text. Second, it makes arguments for the choice of modeling (top left column of page 8), which probably should have been made when introducing the model. Third, it introduces new results (lower left column of page 8), which should probably be moved to the main text. Fourth, the interpretation of the model in light of the known biochemistry is useful and should probably be expanded although I think it would be crucial to keep information from different organisms clearly separate (this last point actually holds for the entire manuscript).

      We thank the reviewer for the feedback. We have modified the discussion section to focus more on the interpretation of the results, model predictions and future outlook with possible experiments to validate crucial aspects of the model. We have moved most of the justifications to the main text model description.

      Here are a few additional minor points:

      * page 1: Typo "for for" → "for"

      * Page 8: Typo "to to" → "to"

      We thank the reviewer for the useful recommendations. We have corrected all the typos in the revised manuscript.

      * Why can diffusion be neglected in Eq. 1? This is discussed only very vaguely in the main text (on page 3). Strangely, there is some discussion of this crucial initial step in the discussion section, although the diffusion time of PLK1 is compared to the centrosome growth time there and not the more relevant enzyme-mediated conversion rate or enzyme deactivation rate.

      We now discuss the justification for neglecting diffusion while motivating the model. We have added a more detailed discussion in the Methods section. We estimate the diffusion timescales of the scaffold formers and the enzyme and compare them with the turnover timescales of the respective proteins Spd-2, Cnn and Polo. We find that the proteins diffuse quickly relative to their FRAP recovery timescales, indicating that the reaction timescales are slower than the diffusion timescales. Nevertheless, following the reviewer’s suggestion, we have also investigated the effect of diffusion on the growth process in Appendix 4.
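
      The comparison described here amounts to a separation of timescales. The expression below is the standard three-dimensional diffusion estimate; the symbols L (a characteristic length such as the cell size), D (the diffusion coefficient of the protein) and \tau_{\mathrm{turnover}} (the FRAP recovery time) are notation introduced here for illustration and are not taken from the manuscript.

```latex
\tau_D \sim \frac{L^2}{6D} \ll \tau_{\mathrm{turnover}}
```

      When this inequality holds, the cytoplasmic pool is effectively well mixed on the timescale of scaffold assembly, which is the regime in which a diffusion-free description is reasonable.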

      * Page 3: The comparison k_0^+ ≫ k_1^+ is meaningless without specifying the number of subunits n. I even doubt that this condition is the correct one since even if k_0^+ is two orders of magnitude larger than k_1^+, the autocatalytic term can dominate if there are many subunits.

      We thank the reviewer for the insightful comment on the comparison between the growth rates k^+_0 and k^+_1. Indeed, the pool size matters and we have now included a linear stability analysis of the autocatalytic growth equations in Appendix 3 to estimate the condition for size inequality. We have commented on these new findings in the revised manuscript.
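
      As a rough illustration of why the pool size enters this comparison, the sketch below integrates an assumed two-centrosome autocatalytic growth law, dS_i/dt = (k0 + k1 S_i)(N - S_1 - S_2) - k_minus S_i. The functional form, the parameter values and the function names are assumptions made for illustration; they are not the equations or parameters of the manuscript.

```python
def simulate(k0, k1, k_minus, n_total, s1, s2, t_end=50.0, dt=0.01):
    """Forward-Euler integration of two scaffolds growing from a shared pool."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        pool = n_total - s1 - s2              # free subunits left in the cytoplasm
        ds1 = (k0 + k1 * s1) * pool - k_minus * s1
        ds2 = (k0 + k1 * s2) * pool - k_minus * s2
        s1 += dt * ds1
        s2 += dt * ds2
    return s1, s2


def difference_growth_rate(k1, k_minus, pool):
    """Rate multiplying d = S1 - S2 in this model: dd/dt = (k1 * pool - k_minus) * d."""
    return k1 * pool - k_minus
```

      In this toy model the size difference grows whenever k1 * pool exceeds k_minus, so the outcome depends on the number of available subunits and not only on the ratio of k0 to k1. For example, simulate(1e-3, 1e-3, 1e-2, 1000, 20.0, 10.0) amplifies an initial difference of 10 subunits, whereas setting k1 to zero lets the same difference decay.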

      * The Eqs. 2-4 are difficult to follow in my mind. For instance, it is not clear why the variables N_av and N_av^E are introduced when they evidently are equivalent to S_1 and E. It would also help to explicitly mention that V_c is the cell volume. Moreover, do these equations contain any centriolar activity? If so, I could not understand what term mediates this. If not, it might be good to mention this explicitly.

      Following the reviewer’s suggestion, we have modified the equations 2-4 and added the definition of V_c to enhance clarity in the revised manuscript. The centriole activity is given by k^+ in the catalytic model. We now explicitly mention it.

      * Page 4: The observed peak of active enzyme (Fig 3C) is compared to the experimental observation of a PLK1 peak at centrosomes in Drosophila (ref. 28). However, if I understand correctly, the peak in the model refers to active enzyme in the entire cell (and the point of the model is that this enzymatic pool is shared everywhere), whereas the experimental measurement quantified the amount of PLK1 at the centrosome (and not the activity of the enzyme). How is the quantity in the model related to the experimental measurements?

      The reviewer is correct in pointing out the difference between the quantities calculated from our model and those measured in the experiment by Wong et al. We have clarified this point in the revised manuscript. We hypothesize that if future experiments can observe active (phosphorylated) Polo, for example with a FRET reporter of activity, then the cytosolic pulse should be observable as well. We discuss this point in the revised manuscript.

      * Page 6: The asymmetry due to differences in centriolar activity has apparently been analyzed for both models (Eq. 1 and Eqs. 2-4), referring to a parameter k_0^+ in both cases. How does this parameter enter in the latter model? More generally, I don't really understand the difference in the two rows in Fig. 5 - is the top row referring to growth driven by centriolar activity while the lower row refers to pure autocatalytic growth? If so, what about the hybrid model where both mechanisms enter? This is particularly relevant, since ref. 8 claims that such a hybrid model explains growth curves of asymmetric centrosomes quantitatively. Along these lines, the analysis of asymmetric growth is quite vague and at most qualitative. Can the models also explain differential growth quantitatively?

      We believe the reviewer’s comment on centrosome size asymmetry may stem from a lack of clarity in our initial explanation. In this section, as shown in Figure 5, we compare the full autocatalytic model (where both k_0^+ and k_1^+ are non-zero) with the catalytic model. The confusion might have arisen due to an unclear definition of centriolar activity in the catalytic growth model, which we have clarified in the revised manuscript. Specifically, we use k^+ in the catalytic model and k_0^+ in the autocatalytic model as indicators of centriolar activity.

      Our findings quantitatively demonstrate that variations in centriole activity can robustly drive size asymmetry in catalytic growth, independent of initial size differences. However, in autocatalytic growth, increased initial size differences make the system more vulnerable to a loss of regulation, as positive feedback can amplify these differences, ultimately influencing the final size asymmetry. Our results do not contradict Zwicker et al. (ref 8); rather, they complement it. We show that size asymmetry in autocatalytic growth is governed by both centriole activity and positive feedback, highlighting that centriole activity alone cannot robustly regulate centrosome size asymmetry within this framework.

      * The code for performing the simulations does not seem to be available

      We have now made the main codes available in a GitHub repository. Link: https://github.com/BanerjeeLab/Centrosome_growth_model

    1. Conal Elliott's work on denotational design and his influential papers - Conal got his PhD at Carnegie Mellon University in the '90s under Frank Pfenning, working on higher-order unification. - Conal has devoted his life to thinking about and refining graphic computation and the tools behind it, and has published influential papers on various topics related to functional programming and denotational design.

      Living in a forest setting with deep connection to nature. - Conal lives on 20 acres next to his family's 60 acres and has a deep emotional connection to the place because of his parents' presence. - He sees a connection between nature and technology, highlighting the non-sequential nature of computation and neurology.

      We are in a pre-scientific age of thinking about computation. - Humans have created thinking organisms that think systematically, leading to computation. - We are in an awkward phase of thinking about computation in a clumsy and pre-scientific way.

      Humans are driven by curiosity to understand the universe. - We have a limited ability to perceive the universe due to our evolutionary constraints. - Through the advancement of science and technology, we have developed tools like telescopes, microscopes, high-speed cameras, and time-lapse to enhance our perception.

      Elegance and wonder in computer science - Elegance is the deepest value in computer science, inspiring a sense of play and wonder. - Computer science is in a deeply inelegant phase, but there is potential for Elegance and Beauty in the field.

      Elegance as a guiding value in theoretical physics - Elegance guided Einstein in developing the special and general theory of relativity. - Modern civilization is built on general relativity and quantum physics; GPS system corrects for relativity.

      Elegance and simplicity in formalizing concepts in computer science - Elegance and simplicity in formalizing concepts are related - People often mistake familiarity for simplicity in programming

      Academia today lacks time for critical thinking - Focused on churning out papers and credentials - Issue of education accessibility affecting teaching quality

      Semantics is crucial in programming - Meanings are called semantics - The relationship between a program and its meaning is important

      Dana Scott answered the crucial question of the mathematical meaning of Lambda calculus in 1970. - Lambda calculus was originally intended for encoding higher-order logic and quantifiers, not for programming. - Peter Landin realized the potential of Lambda calculus for programming and introduced the concept of executing Lambda calculus on a machine.

      Languages convey meanings, computation looks at meanings. - Languages and programming languages serve the same purpose: to convey meanings. - Computation and technological tools help us observe and understand meanings in various forms, from stars and quasars to microorganisms and atoms.

      Euclid revolutionized geometry with his conceptual approach - Introduced a new way of thinking about geometry with axioms and postulates - Plato's influence on the idea of mathematical space and its relation to the physical world

      Mathematics describes real truth and possibly taps into platonic truth. - Platonist perspective considers mathematics as a way to describe truth beyond us. - Success of mathematical fantasy or story inspires acting as if tapping into platonic truth.

      Ancient beliefs about movement of stars and planets - Stars and planets thought to move in circular paths due to perfection/God concept - Some stars behaved differently, known as 'The Wanderers' or planets

      Kepler discovered planetary laws - Planets move in an ellipse, not a circle - Kepler's explanation lacked why planets move in an ellipse

      Scientific theories evolve with enhanced observations - Newton's theory successful until discrepancies discovered in the 20th century - Einstein's theory validated through observations of planet Mercury during solar eclipse

      Scientific exploration is an unending journey - Science aims to understand what we don't know - In academia, the system often fails to reward wonder and not knowing

      Denotational semantics helps distinguish beauty and elegance from complexity - Beauty or elegance in theory is described precisely in terms of mathematics - Fortran, led by John Backus, introduced expressions, advancing from Von Neumann style sequential programming

      Functional programming emphasizes expressions over statements - Fortran blends statements and expressions but still leans towards statements - Functional programming eliminates everything except expressions

      Hardware limitations led to sequential model prototyping - John Von Neumann's experiment from 1947 is still relevant in 2022 - John Backus discussed fundamental problems in computing during the war

      Von Neumann bottleneck affects computer performance. - Physical bottleneck slows computers due to high heat generation. - Mental bottleneck limits brain capacity and mental efficiency.

      Breaking out of the Von Neumann bottleneck - The Von Neumann style of programming forces us to think small and is fundamentally sequential and mechanistic. - The lecture emphasizes the importance of thinking in larger, powerful notions and focusing on functions rather than words.

      Functions as building blocks for knowledge - Functions built from other functions allow for scalability and creation of complex systems - Importance of denotational semantics in designing new languages rather than just explaining existing ones

      Backus emphasized fixing defects and learning from mistakes. - Using denotational semantics reveals detailed defects in existing languages. - Advancement in computer science involves replacing outdated concepts like go-to with structured and functional programming.

      The cost of focusing on education and progress is losing the ability to make significant advances in science. - The speaker expresses disappointment with the impact of Academia on progress and science. - The speaker remains dedicated to truth and beauty, advocating for the importance of denotational semantics in making aesthetic distinctions.

      Ideas are expressions of beauty or ugliness which give deep insights across fields. - Denotational semantics serves as a reliable guide to beauty and elegance in ideas. - Beauty and elegance are valuable guides for understanding the universe and computation.

      Passion for mathematics and computer graphics - Attended undergrad in math at UC Santa Barbara with a small group of math students in a nurturing environment - Transitioned to grad school at Carnegie Mellon for computer science and pursued computer graphics due to love for geometry and math

      Had to change plans at Carnegie Mellon - Arrived at CMU to study computer graphics, but found out the people I wanted to study with had left - Discovered a group focusing on reasoning about programs, which became the focus of my PhD work

      Transition to computer graphics and involvement in group projects - Worked with notable advisors like Dana Scott, John Reynolds, and Frank Pfenning - Focused on exploring the next advancements in programming interfaces and data structures at Sun Microsystems

      Introduction to denotational semantics in understanding language meanings - Studied denotational semantics under Stephen Brookes and Dana Scott in grad school, leading to a revelation on language meanings - Believes language meanings should be independent of specific machines and analyzed compositionally for better understanding

      Graphics programs are sequential commands organizing video memory for visual output. - Graphics programs are different from traditional software due to their focus on organizing instructions for video memory. - Alternative design paradigms focus on conveying meanings and inventing tools to help users view desired content through a computer.

      Designing a language library for geometry and colors - Creating a composable vocabulary of geometry and colors, similar to modern linguistic frameworks - Developing a rich system of types for three-dimensional geometry and adding a time component to the design

      Rendering graphics offscreen to build up incrementally for a correct answer. - Rendering offscreen allows showing previous true things before replacing them incrementally. - Temporal discreteness in computer graphics breaks compositionality and introduces fundamental bugs.

      Compositional models with approximations lose accuracy when composed - Compositional models incorporating approximations result in gross inaccuracies upon composition - Functional reactive programming involves composing before approximating for accurate results

      Outline fonts are resolution independent - Outline fonts are continuous and do not have pixels when zoomed in - Switching from bitmap graphics to outline fonts improves efficiency and clarity

      Transition from discrete to continuous programming in space and time. - Examples of continuous programming in space like fonts, 2D and 3D geometry, vector graphics. - Applying continuous programming principles to time requires a fundamental shift in implementing and describing things that vary with time within the Von Neumann model.

      John Reynolds introduced the idea of using functions from the reals instead of sequences for solving time interpolation problems - This approach helped in resolving issues with interpolations and time manipulation - Continuous time modeling was found to be more effective than discrete modeling for things that vary with time
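
      A minimal sketch of this idea follows, assuming a time-varying value is modeled as a plain function from a real-valued time to a value. The names Behavior, lift2, time_transform and sample are hypothetical and chosen for illustration; they are not the API of any particular functional reactive programming library.

```python
import math
from typing import Callable

Behavior = Callable[[float], float]   # a value that varies over continuous time


def lift2(op, a: Behavior, b: Behavior) -> Behavior:
    """Combine two behaviors pointwise in time."""
    return lambda t: op(a(t), b(t))


def time_transform(a: Behavior, warp: Callable[[float], float]) -> Behavior:
    """Re-time a behavior (e.g. slow it down) without any interpolation step."""
    return lambda t: a(warp(t))


def sample(a: Behavior, times):
    """Discretize only at the very end, when output is actually needed."""
    return [a(t) for t in times]
```

      Slowing a wave to half speed is then time_transform(math.sin, lambda t: t / 2), which is exact at every instant. With a sequence of samples, the same operation would need values between the stored samples and therefore an interpolation scheme.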

      Functional programming requires a shift from loops to lazy lists - Functional programming involves describing the mathematical model behind the data manipulation - The common reasoning that input and output data should have the same nature is wrong in functional programming

      Functional reactive programming is about understanding concepts in the simplest, most elegant compositional terms. - It emphasizes denotational semantics, where types have a mathematical model. - It focuses on fully explaining operations in terms of the model, independent of implementation.

      Programming expresses ideas with clear understanding before implementation - Category Theory is appreciated for its precise and elegant tools in mathematics - Functional Reactive Programming lacks denotational and compositional principles, leading to fundamental misunderstandings in programming

      Algebraic patterns like monoids and distributivity are powerful for organizing reasoning - There are different types of monoids like addition and multiplication each with their own properties - Multiplication distributes over addition and zero plays a special role in this interaction
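
      The pattern can be sketched in a few lines, assuming a monoid is represented as an associative operation paired with its identity element. The Monoid class and fold helper are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    combine: Callable[[Any, Any], Any]   # associative binary operation
    identity: Any                        # neutral element for combine


ADD = Monoid(lambda x, y: x + y, 0)
MUL = Monoid(lambda x, y: x * y, 1)


def fold(m: Monoid, xs):
    """Reduce a sequence with any monoid; an empty input yields the identity."""
    return reduce(m.combine, xs, m.identity)
```

      The same fold works for both monoids, which is the reuse the note refers to. The interaction between them is the distributive law, for example 3 * (4 + 5) equals 3 * 4 + 3 * 5, together with the fact that the additive identity 0 annihilates under multiplication.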

      Algebra and category theory provide reusability and reasoning in mathematics and programming. - Algebra allows for reasoning that is parameterized and applicable to different mathematical scenarios. - Category theory generalizes various algebraic concepts and is important for correctness in programming.

      The complexity of Python programs and limited cognitive abilities can lead to a lack of understanding. - Options include quitting the profession or divorcing what you've seen from what you do. - Another option is switching to a language with simple semantics, such as purely functional or denotative languages.

      Denotative programming allows for proving program correctness. - Denotative programming enables answering questions about the multiple meanings of programs. - Functional programs can have meanings within a cartesian closed category.

      Tropical semi-rings relate to timing analysis of parallel computations. - Understanding operations of plus and max in relation to semi-rings. - Realization of dot products and matrix multiplication pattern in timing analysis.
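
      To make the connection concrete, here is a small sketch of matrix multiplication over the (max, +) semiring, assuming each matrix entry is the delay from one signal to another and that negative infinity marks a missing connection. The function names are illustrative.

```python
NEG_INF = float("-inf")   # additive identity of the (max, +) semiring


def maxplus_dot(row, col):
    """Tropical 'dot product': + replaces *, and max replaces +."""
    return max((r + c for r, c in zip(row, col)), default=NEG_INF)


def maxplus_matmul(a, b):
    """Compose two stages: entry [i][j] is the worst-case delay from
    input j of stage b to output i of stage a."""
    cols = list(zip(*b))
    return [[maxplus_dot(row, col) for col in cols] for row in a]
```

      Sequential composition adds delays along a path, and parallel paths are merged by taking the slowest one, so maxplus_matmul([[1, 2]], [[3], [4]]) gives [[6]], the larger of 1 + 3 and 2 + 4.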

      Timing analysis can be described compositionally using the language of categories - I realized that parallel and sequential composition are the fundamental building blocks of computation - The typed Lambda calculus has more than one model, and the mathematical values it describes can have different interpretations

      Realizing the connection between Haskell and lambda calculus led to successful compilation to hardware. - Haskell translates to a small core lambda calculus - Interpreting lambda calculus in cartesian closed categories enabled successful compilation to hardware

      Exploring unconventional categories for computation - Discovering powerful ideas by compiling categories since 1980 - Seeking beauty in solutions to drive innovation and never settling for unsatisfactory answers

      Geometry and the introduction to proof changed my life - The systematic way of exploring what is true and growing knowledge in geometry was a life-changing concept for me. - Discovering computers at Lawrence Hall of Science through the Star Trek Club in high school eventually led me to computation.

      Introduction to programming through games on teletypes - Experiencing games and printing out results on rolls of paper as souvenirs - Discovering source code hidden in the printed paper, initiating an interest in programming

      Started college with no computer science department, emphasized logic and enjoyed math contests - Computer science classes offered in math department or College of Engineering - Discovered talent and passion for math despite discouragement from elementary school teacher

      The origin of computer science in universities and its impact on its development - Initial classes were labeled as Computer Science or logic, sparking a debate on department placement. - Placement in engineering rather than mathematics influenced the practical nature of computer science education.

      Transition from imperative to functional programming - Discovered Haskell as a better alternative to imperative programming languages - Applied Haskell in programming for 25+ years and mentorship in hardware design for machine learning

      Realizing the power of category theory in simplifying automatic differentiation - Changed vocabulary to be more symmetric with respect to composition - Describing automatic differentiation in the language of categories simplifies and generalizes it
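
      A simplified sketch of the idea, restricted to scalar functions: each differentiable function returns its value together with its derivative at that point, and composition applies the chain rule once and for all. The derivative is stored as a plain number standing in for the linear map "multiply by f'(x)"; this representation and all names below are assumptions for illustration, not the formulation used in the talk.

```python
import math
from typing import Callable, Tuple

# A differentiable function maps x to (f(x), f'(x)).
D = Callable[[float], Tuple[float, float]]


def compose(g: D, f: D) -> D:
    """Sequential composition g . f, with the chain rule built in."""
    def h(x: float) -> Tuple[float, float]:
        y, dy = f(x)
        z, dz = g(y)
        return z, dz * dy
    return h


def identity(x: float) -> Tuple[float, float]:
    """Identity arrow: its derivative is 1 everywhere."""
    return x, 1.0


def sin_d(x: float) -> Tuple[float, float]:
    return math.sin(x), math.cos(x)


def square_d(x: float) -> Tuple[float, float]:
    return x * x, 2.0 * x
```

      With an identity and an associative composition, these functions form a category, and differentiating sin(x^2) reduces to compose(sin_d, square_d) with no separate differentiation pass.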

      Denotational design is key for software implementation - Haskell was not effective for teaching denotational design - Inner guidance essential for understanding and using Haskell effectively

      Struggling with teaching denotations and homomorphisms in programming - Encountered issues with students not understanding correct implementations - Wanted compiler to indicate errors instead of personally correcting

      Understanding the question is more important than answering it correctly - Operational thinking is about biases in answering problems and questions - The most important thing is to understand the question in the most beautiful way

      Realization about teaching and learning process - Programmers differ in their attitude towards being told they're wrong - Importance of being open to feedback for growth in programming

      Automation has benefits but limited scalability - SMT automation has advantages in problem-solving but faces scaling limitations - Despite advancements, SMT technology cannot achieve unlimited scalability

      Agda is the most tasteful tool for working with dependent types. - Agda offers beauty, consistency, simplicity, and tremendous power. - Agda contributes to an incredibly beautiful story about the equivalence of computation, logic, and the foundations of mathematics.

      Exploring if all of mathematics can be built on logic - David Hilbert's attempt to formalize logic in the early stages - Can natural numbers be understood via logic as a foundation?

      Natural numbers are a profound and important concept - Natural numbers are a product of human construction on top of other systems - Peano numbers are a significant concept in mathematics

      Constructive logic allows expression of proofs as either A or B - In constructive logic, every proof of A or B can be expressed as a proof of A or a proof of B - Brouwer's logic allows for this expression without the law of excluded middle, leading to simple answers for negation, implication, truth, and falsehood in terms of types.

      De Bruijn pioneered logic computable through computers - Exploration of dependent typing and realization of logic and types - Mechanization of information and manipulation, leading to modern programming languages

      The power of math and knowledge in programming - Manipulating from the bones is a powerful and beautiful concept - Embracing sequential stateful notion of computation limits insights and learning

      Written language enabled deep reflection and improvement of ideas. - Written language allowed ideas to be examined and improved over time. - Written language initiated a feedback loop for continuous enhancement of concepts.

      Continuous improvement through iterative optimization - Iteratively refining program logic and expressions for efficiency and clarity. - Enhanced abstraction and reusability through denotational design and parameterization.

      The debate on using formal proofs in industry - Industry perspective often argues against formal proofs due to perceived time constraints and impracticality. - Decision to use formal proofs depends on the objectives and the value placed on accuracy and thoroughness.

      Achieving 100% correctness is the only way to reach 95%. - Errors compound, leading to significant deviations in calculations. - Approximations and probable correctness can lead to overall incorrectness in complex projects.
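
      The compounding argument can be checked with one line of arithmetic, under the simplifying assumption (not stated in the talk) that the parts of a system fail independently. The function name is illustrative.

```python
def chance_all_correct(per_part_correctness: float, parts: int) -> float:
    """Probability that every one of `parts` independent parts is correct."""
    return per_part_correctness ** parts
```

      Under that assumption, twenty parts that are each 95% likely to be right are jointly right only about 36% of the time, since chance_all_correct(0.95, 20) is roughly 0.358. Only a per-part correctness of 1.0 keeps the overall figure from shrinking as the system grows.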

      Inspired by deep conversation - The conversation has been engaging and has touched on major topics of interest. - The speaker hopes to discuss denotational design and its application in software design.

      Create space for contemplation in the age of instant information - Encourage meditation and reflection on content - Announcement about a dedicated email for audience feedback and inquiries

    1. Reviewer #3 (Public review):

      Summary:

      Li et al. describe an audiovisual temporal recalibration experiment in which participants perform baseline sessions of ternary order judgments about audiovisual stimulus pairs with various stimulus-onset asynchronies (SOAs). These are followed by adaptation at several adapting SOAs (each on a different day), followed by post-adaptation sessions to assess changes in psychometric functions. The key novelty is the formal specification and application/fit of a causal-inference model for the perception of relative timing, providing simulated predictions for the complete set of psychometric functions both pre and post adaptation.

      Strengths:

      (1) Formal models are preferable to vague theoretical statements about a process, and prior to this work, certain accounts of temporal recalibration (specifically those that do not rely on a population code) had only qualitative theoretical statements to explain how/why the magnitude of recalibration changes non-linearly with the stimulus-onset asynchrony of the adaptor.

      (2) The experiment is appropriate, the methods are well described, and the average model prediction is a good match to the average data (Figure 4). Conclusions are supported by the data and modelling.

      (3) The work should be impactful. There seems a good chance that this will become the go-to modelling framework for those exploring non population-code accounts of temporal recalibration (or comparing them with population-code accounts).

      (4) Key issues for the generality of the model, such as recalibration asymmetries reported by other authors that are inconsistent with those reported here, are thoughtfully discussed.

      Weaknesses:

      (1) Models are not compared using a gold-standard measure such as leave-one-out cross validation. However, this is legitimate given lengthy model fitting times, and a sensible approximation is presented.

      (2) The model misses in a systematic way for the psychometric functions of some participants/conditions. In addition to misses relating to occasional failures to estimate the magnitude of recalibration, some of the misses are because all functions are only permitted to shift in central tendency (whereas some participants show changes better characterized at one or both decision criteria). Given the fact that the modelling in general embraces individual differences, it might have been worth allowing different kinds of change for different participants. However, this is not really critical for the central concern (changes in the magnitude of recalibration for different adaptors) and there is a limit to how much can be done along these lines without making the model too flexible to test.

      (3) As a minor point, the model relies on simulation, which may limit its take-up/application by others in the field (although open access code will be provided).

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study asks whether the phenomenon of crossmodal temporal recalibration, i.e. the adjustment of time perception by consistent temporal mismatches across the senses, can be explained by the concept of multisensory causal inference. In particular, they ask whether causal inference explains temporal recalibration better than a model assuming that crossmodal stimuli are always integrated, regardless of how discrepant they are.

      The study is motivated by previous work in the spatial domain, where it has been shown consistently across studies that the use of crossmodal spatial information is explained by the concept of multisensory causal inference. It is also motivated by the observation that the behavioral data showcasing temporal recalibration feature nonlinearities that, by their nature, cannot be explained by a fixed integration model (sometimes also called mandatory fusion).

      To probe this the authors implemented a sophisticated experiment that probed temporal recalibration in several sessions. They then fit the data using the two classes of candidate models and rely on model criteria to provide evidence for their conclusion. The study is sophisticated, conceptually and technically state-of-the-art, and theoretically grounded. The data clearly support the authors’ conclusions.

      I find the conceptual advance somewhat limited. First, by design, the fixed integration model cannot explain data with a nonlinear dependency on multisensory discrepancy, as already explained in many studies on spatial multisensory perception. Hence, it is not surprising that the causal inference model better fits the data.

      We have addressed this comment by including an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration effects by employing a heuristic approximation of the causal-inference process (Fig. 3). We also updated the previous competitor model with a more reasonable asynchrony-correction model as the baseline of model comparison, which assumes recalibration aims to restore synchrony whenever the sensory measurement of SOA indicates an asynchrony. The causal-inference model outperformed both models, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual levels (Fig. S4).

      Second, and again similar to studies on spatial paradigms, the causal inference model fails to predict the behavioral data for large discrepancies. The model predictions in Figure 5 show the (expected) vanishing recalibration for large delta, while the behavioral data don’t decay to zero. Either the range of tested SOAs is too small to show that both the model and data converge to the same vanishing effect at large SOAs, or the model's formula is not the best for explaining the data. Again, the studies using spatial paradigms have the same problem, but in my view, this poses the most interesting question here.

      We included an additional simulation (Fig. 5B) to show that the causal-inference model can predict non-zero recalibration for long adapter SOAs, especially in observers with a high common-cause prior and low sensory precision. This ability to predict a non-zero recalibration effect even at large SOA, such as 0.7 s, is one key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.
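
      The qualitative dependence on the prior and on sensory precision can be illustrated with a generic causal-inference computation. The sketch below assumes zero-mean Gaussian distributions of SOA under the common-cause and separate-cause scenarios plus Gaussian sensory noise. This generative form, the default widths, and the function names are assumptions made for illustration and are not the model specification used in the manuscript.

```python
import math


def gaussian_pdf(x, sd):
    """Zero-mean Gaussian density evaluated at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_common_cause(measured_soa, p_common, sigma_sensory,
                           sd_common=0.05, sd_separate=0.5):
    """P(common cause | measured SOA) under an assumed Gaussian model.

    Under a common cause, true SOAs cluster near zero (sd_common); under
    separate causes they are widely spread (sd_separate). Sensory noise
    (sigma_sensory) is added in both cases. All values are in seconds.
    """
    sd1 = math.hypot(sd_common, sigma_sensory)
    sd2 = math.hypot(sd_separate, sigma_sensory)
    like1 = gaussian_pdf(measured_soa, sd1)
    like2 = gaussian_pdf(measured_soa, sd2)
    return p_common * like1 / (p_common * like1 + (1.0 - p_common) * like2)
```

      In this toy version, a measured SOA of 0.7 s retains a substantial posterior probability of a common cause only when the prior is high and the sensory noise is large, which is the combination in which recalibration would be expected to persist at long adapter SOAs.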

      In my view there is nothing generally wrong with the study, it does extend the 'known' to another type of paradigm. However, it covers little new ground on the conceptual side.

      On that note, the small sample size of n=10 is likely not an issue, but still, it is on the very low end for this type of study.

      This study used a within-subject design, which included 3 phases each repeated in 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al.’s goal is to understand the mechanisms of audiovisual temporal recalibration. This is an interesting challenge that the brain readily solves in order to compensate for real-world latency differences in the time of arrival of audio/visual signals. To do this they perform a 3-phase recalibration experiment on 9 observers that involves a temporal order judgment (TOJ) pretest and posttest (in which observers are required to judge whether an auditory and visual stimulus were coincident, auditory leading or visual leading) and a conditioning phase in which participants are exposed to a sequence of AV stimuli with a particular temporal disparity. Participants are required to monitor both streams of information for infrequent oddballs, before being tested again in the TOJ, although this time there are 3 conditioning trials for every 1 TOJ trial. Like many previous studies, they demonstrate that conditioning stimuli shift the point of subjective simultaneity (PSS) in the direction of the exposure sequence.

      These shifts are modest - maxing out at around -50 ms for auditory leading sequences and slightly less than that for visual leading sequences. Similar effects are observed even for the longest offsets where it seems unlikely listeners would perceive the stimuli as synchronous (and therefore under a causal inference model you might intuitively expect no recalibration, and indeed simulations in Figure 5 seem to predict exactly that which isn't what most of their human observers did). Overall I think their data contribute evidence that a causal inference step is likely included within the process of recalibration.

      Strengths:

      The manuscript performs comprehensive testing over 9 days and 100s of trials and accompanies this with mathematical models to explain the data. The paper is reasonably clearly written and the data appear to support the conclusions.

      Weaknesses:

      While I believe the data contribute evidence that a causal inference step is likely included within the process of recalibration, this to my mind is not a mechanism but might be seen more as a logical checkpoint to determine whether whatever underlying neuronal mechanism actually instantiates the recalibration should be triggered.

      We have addressed this comment by replacing the fixed-update model with an asynchrony-correction model, which assumes that the system first evaluates whether the measurement of SOA is asynchronous, thus indicating a need for recalibration (Fig. 3). If it does, it shifts the audiovisual bias by a proportion of the measured SOA. We additionally included an asynchrony-contingent model, which is capable of replicating the nonlinearity of recalibration effects by a heuristic approximation of the causal-inference process.
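
As a rough illustration of the asynchrony-correction rule described above, the following Python sketch applies it over a run of exposure trials. The function name, the criterion, the learning rate, and the omission of measurement noise are all assumptions made for illustration; none of them is taken from the manuscript's implementation.

```python
def asynchrony_correction(adapter_soa, n_trials, criterion=0.05,
                          learning_rate=0.01, bias=0.0):
    """Hypothetical sketch of an asynchrony-correction update (units: seconds).

    On each exposure trial the measured SOA is the physical SOA plus the
    current audiovisual bias (measurement noise is omitted for simplicity).
    If the measurement falls outside +/- criterion it is treated as
    asynchronous, and the bias is shifted by a proportion of the measured SOA.
    """
    for _ in range(n_trials):
        measured = adapter_soa + bias
        if abs(measured) > criterion:          # judged asynchronous
            bias -= learning_rate * measured   # shift reduces the next measured SOA
    return bias


# Change in bias after exposure, for a few illustrative adapter SOAs
for soa in (0.0, 0.1, 0.3, 0.7):
    print(soa, round(asynchrony_correction(soa, n_trials=250), 4))
```

In this noiseless sketch the accumulated shift only grows with the adapter SOA, so the rule on its own does not produce a decline at large SOAs.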

      Model comparisons indicate that the causal-inference model of temporal recalibration outperforms both alternative models (Fig. 4A). Furthermore, the model predictions demonstrate that the causal-inference model more accurately captures recalibration at large SOAs at both the group level (Fig. 4B) and individual level (Fig. S4).

      The authors’ causal inference model strongly predicts that there should be no recalibration for stimuli at 0.7 s offset, yet only 3/9 participants appear to show this effect. They note that a significant difference in their design and that of others is the inclusion of longer lags, which are unlikely to originate from the same source, but don’t offer any explanation for this key difference between their data and the predictions of a causal inference model.

      We added further simulations to show that the causal-inference model can predict non-zero recalibration also for longer adapter SOAs, especially in observers with a large common-cause prior (Fig. 5A) and low sensory precision (Fig. 5B). This ability to predict a non-zero recalibration effect even at longer adapter SOAs, such as 0.7 s, is a key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.
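
To make the role of the common-cause prior and sensory precision concrete, here is a minimal Python sketch of the posterior probability of a common cause for a single measured SOA. It assumes zero-mean Gaussian measurement noise and a Gaussian prior over SOAs for separate causes; these distributions, the parameter values, and the function names are illustrative assumptions and may differ from the model in the manuscript.

```python
import math


def normal_pdf(x, sd):
    """Density at x of a zero-mean Gaussian with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_common_cause(measured_soa, p_common, sigma, sigma_separate=0.5):
    """Posterior probability that sound and light share a cause.

    Assumptions (illustrative only): under a common cause the true SOA is 0,
    so the measurement is N(0, sigma^2); under separate causes the true SOA
    is drawn from N(0, sigma_separate^2), so the measurement is
    N(0, sigma^2 + sigma_separate^2).
    """
    like_common = normal_pdf(measured_soa, sigma)
    like_separate = normal_pdf(measured_soa, math.hypot(sigma, sigma_separate))
    numerator = p_common * like_common
    return numerator / (numerator + (1.0 - p_common) * like_separate)


for p_common in (0.5, 0.9):
    for sigma in (0.05, 0.3):
        post = posterior_common_cause(0.7, p_common, sigma)
        print(f"p_common={p_common}, sigma={sigma}: P(C=1 | m=0.7 s) = {post:.3f}")
```

Under these assumptions, a measurement of 0.7 s leaves essentially no posterior mass on a common cause when the noise is small, but a sizeable amount when the noise is large and the prior is high, which is the qualitative pattern the response describes.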

      I’m also not completely convinced that the causal inference model isn’t ‘best’ simply because it has sufficient free parameters to capture the noise in the data. The tested models do not (I think) have equivalent complexity - the causal inference model fits best, but has more parameters with which to fit the data. Moreover, while it fits ‘best’, is it a good model? Figure S6 is useful in this regard but is not completely clear - are the red dots the actual data or the causal inference prediction? This suggests that it does fit the data very well, but is this based on predicting held-out data, or is it just that by having more parameters it can better capture the noise? Similarly, S7 is a potentially useful figure but it's not clear what is data and what are model predictions (what are the differences between each row for each participant; are they two different models or pre-test post-test or data and model prediction?!).

      I'm not an expert on the implementation of such models but my reading of the supplemental methods is that the model is fit using all the data rather than fit and tested on held-out data. This seems problematic.

      We recognize the risk of overfitting with the causal-inference model. We now rely on Bayesian model comparisons, which use model evidence for model selection. This method automatically incorporates a penalty for model complexity through the marginalization over the parameter space (MacKay, 2003).
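
The complexity penalty that comes from marginalizing over parameters can be seen in a textbook toy case unrelated to the study's data: a coin-flip sequence explained either by a fixed fair coin or by a coin whose bias is a free parameter with a uniform prior. The Python sketch below, with hypothetical function names, computes both marginal likelihoods in closed form.

```python
from math import factorial


def evidence_fixed(k, n):
    """Probability of one specific sequence with k heads in n flips when p = 0.5."""
    return 0.5 ** n


def evidence_flexible(k, n):
    """Marginal likelihood of the same sequence when p has a uniform prior.

    Integrating p^k (1 - p)^(n - k) over p in [0, 1] gives k! (n - k)! / (n + 1)!.
    """
    return factorial(k) * factorial(n - k) / factorial(n + 1)


for k in (5, 9):
    ratio = evidence_flexible(k, 10) / evidence_fixed(k, 10)
    print(f"{k} heads in 10 flips: Bayes factor (flexible / fixed) = {ratio:.2f}")
```

With balanced data the flexible model is penalized for spreading its prior over biases it does not need; with strongly unbalanced data the extra parameter pays for itself.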

      Our design is not suitable for cross-validation because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out that the parameter estimates correspond to local minima.

      I would have liked to have seen more individual participant data (which is currently in the supplemental materials, albeit in a not very clear manner as discussed above).

      We have revised Supplementary Figures S4-S6 to show additional model predictions of the recalibration effect for individual participants, and participants’ temporal-order judgments are now shown in Supplement Figure S7. These figures confirm the better performance of the causal-inference model.

      The way that S3 is described in the text (line 141) makes it sound like everyone was in the same direction; however, it is clear that 2/9 listeners show the opposite pattern, and 2 have confidence intervals close to zero (albeit on the negative side).

      We have revised the text to clarify that the asymmetry occurs in both directions and is idiosyncratic (lines 168-171). We summarized the distribution of the individual asymmetries of the recalibration effect across visual-leading and auditory-leading adapter SOAs in Supplementary Figure S2.

      Reviewer #3 (Public Review):

      Summary:

      Li et al. describe an audiovisual temporal recalibration experiment in which participants perform baseline sessions of ternary order judgments about audiovisual stimulus pairs with various stimulus-onset asynchronies (SOAs). These are followed by adaptation at several adapting SOAs (each on a different day), followed by post-adaptation sessions to assess changes in psychometric functions. The key novelty is the formal specification and application/fit of a causal-inference model for the perception of relative timing, providing simulated predictions for the complete set of psychometric functions both pre and post-adaptation.

      Strengths:

      (1) Formal models are preferable to vague theoretical statements about a process, and prior to this work, certain accounts of temporal recalibration (specifically those that do not rely on a population code) had only qualitative theoretical statements to explain how/why the magnitude of recalibration changes non-linearly with the stimulus-onset asynchrony of the adapter.

      (2) The experiment is appropriate, the methods are well described, and the average model prediction is a fairly good match to the average data (Figure 4). Conclusions may be overstated slightly, but seem to be essentially supported by the data and modelling.

      (3) The work should be impactful. There seems a good chance that this will become the go-to modelling framework for those exploring non-population-code accounts of temporal recalibration (or comparing them with population-code accounts).

      (4) A key issue for the generality of the model, specifically in terms of recalibration asymmetries reported by other authors that are inconsistent with those reported here, is properly acknowledged in the discussion.

      Weaknesses:

      (1) The evidence for the model comes in two forms. First, two trends in the data (non-linearity and asymmetry) are illustrated, and the model is shown to be capable of delivering patterns like these. Second, the model is compared, via AIC, to three other models. However, the main comparison models are clearly not going to fit the data very well, so the fact that the new model fits better does not seem all that compelling. I would suggest that the authors consider a comparison with the atheoretical model they use to first illustrate the data (in Figure 2). This model fits all sessions but with complete freedom to move the bias around (whereas the new model constrains the way bias changes via a principled account). The atheoretical model will obviously fit better, but will have many more free parameters, so a comparison via AIC/BIC or similar should be informative.

      In the revised manuscript, we switched from AIC to Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).

      We have addressed this comment by updating the former competitor model into a more reasonable version that induces recalibration only for some measured SOAs and by including another (asynchrony-contingent) model that is capable of predicting the nonlinearity and asymmetry of recalibration (Fig. 3) while heuristically approximating the causal inference computations. The causal-inference model outperformed the asynchrony-contingent model, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual level (Fig. S4).

      (2) It does not appear that some key comparisons have been subjected to appropriate inferential statistical tests. Specifically, lines 196-207 - presumably this is the mean (and SD or SE) change in AIC between models across the group of 9 observers. So are these differences actually significant, for example via t-test?

      We statistically compared the models using Bayes factors (Fig. 4A). The model evidence for each model was approximated using Variational Bayesian Monte Carlo. Bayes factors provided strong evidence in support of the causal-inference model relative to the other models.
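
For readers unfamiliar with the quantity, a Bayes factor can be computed from two approximate log model evidences by simple subtraction on the log scale. The Python sketch below uses made-up numbers and hypothetical function names; summing across participants is an assumption of the sketch, not something stated in the response.

```python
def log_bayes_factor(log_evidence_a, log_evidence_b):
    """Log Bayes factor of model A over model B from their log model evidence."""
    return log_evidence_a - log_evidence_b


def group_log_bayes_factor(log_evidence_a, log_evidence_b):
    """Sum per-participant log Bayes factors (treats participants as independent)."""
    return sum(log_bayes_factor(a, b) for a, b in zip(log_evidence_a, log_evidence_b))


# Made-up log-evidence values for three participants; not results from the study.
model_a = [-1510.2, -1498.7, -1623.4]
model_b = [-1515.0, -1497.9, -1630.1]
print(group_log_bayes_factor(model_a, model_b))
```

A positive value favors model A; individual participants can still point the other way, as the second made-up participant does here.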

      (3) The manuscript tends to gloss over the population-code account of temporal recalibration, which can already provide a quantitative account of how the magnitude of recalibration varies with adapter SOA. This could be better acknowledged, and the features a population code may struggle with (asymmetry?) could be considered.

      We simulated a population-code model to examine its prediction of the recalibration effect for different adapter SOAs (lines 380–388, Supplement Section 8). The population-code model can predict the nonlinearity of recalibration, i.e., a decreasing recalibration effect as the adapter SOA increases. However, to capture the asymmetry of recalibration effects across auditory-leading and visual-leading adapter stimuli, we would need to assume that the auditory-leading and visual-leading SOAs are represented by neural populations with unequal tuning curves.

      (4) The engagement with relevant past literature seems a little thin. Firstly, papers that have applied causal inference modeling to judgments of relative timing are overlooked (see references below). There should be greater clarity regarding how the modelling here builds on or differs from these previous papers (most obviously in terms of additionally modelling the recalibration process, but other details may vary too). Secondly, there is no discussion of previous findings like that in Fujisaki et al.’s seminal work on recalibration, where the spatial overlap of the audio and visual events didn’t seem to matter (although admittedly this was an N = 2 control experiment). This kind of finding would seem relevant to a causal inference account.

      References:

      Magnotti JF, Ma WJ and Beauchamp MS (2013) Causal inference of asynchronous audiovisual speech. Front. Psychol. 4:798. doi: 10.3389/fpsyg.2013.00798

      Sato, Y. (2021). Comparing Bayesian models for simultaneity judgement with different causal assumptions. J. Math. Psychol., 102, 102521.

      We have revised the Introduction and Discussion to better situate our study within the existing literature. Specifically, we have incorporated the suggested references (lines 66–69) and provided clearer distinctions on how our modeling approach builds on or differs from previous work on causal-inference models, particularly in terms of modeling the recalibration process (lines 75–79). Additionally, we have discussed findings that might contradict the assumptions of the causal-inference model (lines 405–424).

      (5) As a minor point, the model relies on simulation, which may limit its take-up/application by others in the field.

      Upon acceptance, we will publicly share the code for all models (simulation and parameter fitting) to enable researchers to adapt and apply these models to their own data.

      (6) There is little in the way of reassurance regarding the model’s identifiability and recoverability. The authors might for example consider some parameter recovery simulations or similar.

      We conducted a model recovery for each of the six models described in the main text and confirmed that the asynchrony-contingent and causal-inference models are identifiable (Supplement Section 11). Simulations of the asynchrony-correction model were sometimes best fit by causal-inference models, because the latter behaves similarly when the prior of a common cause is set to one.

      We also conducted a parameter recovery for the winning model, the causal-inference model with modality-specific precision (Supplement Section 13).

      Key parameters, including the audiovisual bias, the amount of auditory latency noise, the amount of visual latency noise, the criterion, and the lapse rate, showed satisfactory recovery performance. The less accurate recovery of  is likely due to a tradeoff with the learning rate.

      (7) I don't recall any statements about open science and the availability of code and data.

      Upon acceptance of the manuscript, all code (simulation and parameter fitting) and data will be made available on OSF and publicly available.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      In addition to the comments below, we would like to offer the following summary based on the discussion between reviewers:

      The major shortcoming of the work is that there should ideally be a bit more evidence to support the model, over and above a demonstration that it captures important trends and beats an account that was already known to be wrong. We suggest you:

      (1) Revise the figure legends (Figure 5 and Figure 6E).

      We revised all figures and figure legends.

      (2) Additionally report model differences in terms of BIC (which will favour the preferred model less under the current analysis);

      We now base the model comparison on Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).

      (3) Move instead to fitting the models multiple times in order to get leave-one-out estimates of the best-fitting log-likelihood for each left-out data point (and then sum those for the comparison metric).

      Unfortunately, our design is not suitable for cross-validation methods because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out local minima.

      (4) Offer a comparison with a more convincing model (for example, an atheoretical fit with free parameters for all adapters, as suggested by Reviewer 3).

      We updated the previous competitor model and included an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration (Fig. 3). The causal-inference model still outperformed the asynchrony-contingent model (Fig. 4A). Furthermore, model predictions show that only the causal-inference model captures non-zero recalibration effects for long adapter SOAs at both the group level (Fig. 4B) and individual level (Figure S4).

      Reviewer #1 (Recommendations For The Authors):

      A larger sample size would be better.

      This study used a within-subject design, which included 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data rather than on group statistics.

      It would be good to better put the study in the context of spatial ventriloquism, where similar model comparisons have been done over the last ten years and there is a large body of work to connect to.

      We now discuss our model in relation to models of cross-modal spatial recalibration in the Introduction (lines 70–78) and Discussion (lines 324–330).

      Reviewer #2 (Recommendations For The Authors):

      Previous authors (e.g., Yarrow et al.) have described latency shift and criterion change models as providing a good fit of experimental data. Did the authors attempt a criterion shift model in addition to a shift model?

      We have considered criterion-shift variants of our atheoretical recalibration models in Supplement Section 1. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of bias or a criterion. We fit each model variant separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best captured the data of most participants.

      Figure 4B - I'm not convinced that the modality-independent uncertainty is anything but a straw man. Models not allowed to be asymmetric do not show asymmetry? (the asymmetry index is irrelevant in the fixed update model as I understand it so it is not surprising the model is identical?).

      We included the assumption that temporal uncertainty might be modality-independent for several reasons. First, there is evidence suggesting that a central mechanism governs the precision of temporal-order judgments (Hirsh & Sherrick, 1961), indicating that precision is primarily limited by a central mechanism rather than the sensory channels themselves. Second, from a modeling perspective, it was necessary to test whether an audio-visual temporal bias alone, i.e., assuming modality-independent uncertainty, could introduce asymmetry across adapter SOAs. Additionally, most previous studies implicitly assumed symmetric likelihoods, i.e., modality-independent latency noise, by fitting cumulative Gaussians to the psychometric curves derived from 2AFC-TOJ tasks (Di Luca et al., 2009; Fujisaki et al., 2004; Harrar & Harris, 2005; Keetels & Vroomen, 2007; Navarra et al., 2005; Tanaka et al., 2011; Vatakis et al., 2007, 2008; Vroomen et al., 2004).

      Why does a zero SOA adapter shift the pss towards auditory leading? Is this a consequence of the previous day’s conditioning - it’s not clear from the methods whether all listeners had the same SOA conditioning sequence across days.

      The auditory-leading recalibration effect for an adapter SOA of zero has been consistently reported in previous studies (e.g., Fujisaki et al., 2004; Vroomen et al., 2004). This effect reflects the asymmetry in recalibration. This asymmetry can be explained by differences across modalities in the noisiness of the latencies (Figure 5C) in combination with audiovisual temporal bias (Figure S8).

      We added details about the order of testing to the Methods section (lines 456–457).

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      “Our results indicate that human observers employ causal-inference-based percepts to recalibrate cross-modal temporal perception” Your results indicate this is plausible. However, this statement (basically repeated at the end of the intro and again in the discussion) is - in my opinion - too strong.

      We have revised the statement as suggested.

      Intro and later

      Within the wider literature on relative timing perception, the temporal order judgement (TOJ) task refers to a task with just two response options. Tasks with three response options, as employed here, are typically referred to as ternary judgments. I would suggest language consistent with the existing literature (or if not, the contrast to standard usage could be clarified).

      Ref: Ulrich, R. (1987). Threshold models of temporal-order judgments evaluated by a ternary response task. Percept. Psychophys., 42, 224-239.

      We revised the term for the task as suggested throughout the manuscript.

      Results, 2.2.2

      “However, temporal precision might not be due to the variability of arrival latency.” Indeed, although there is some recent evidence that it might be.

      Ref: Yarrow, K., Kohl, C, Segasby, T., Kaur Bansal, R., Rowe, P., & Arnold, D.H. Neural-latency noise places limits on human sensitivity to the timing of events. Cognition, 222, 105012 (2022).

      We included the reference as suggested (lines 245–248).

      Methods, 4.3.

      Should there be some information here about the order of adaptation sessions (e.g. random for each observer)?

      We added details about the order of testing to the Methods section (lines 456–457).

      Supplemental material section 1.

      Here, you test whether the changes resulting from recalibration look more like a shift of the entire psychometric function or an expansion of the psychometric function on one side (most straightforwardly compatible with a change of one decision criterion). Fine, but the way you have done this is odd, because you have introduced a further difference in the models (Gaussian vs. exponential latency noise) so that you cannot actually conclude that the trend towards a win for the bias-shift model is simply down to the bias vs. criterion difference. It could just as easily be down to the different shapes of psychometric functions that the two models can predict (with the exponential noise model permitting asymmetry in slopes). There seems to be no reason that this comparison cannot be made entirely within the exponential noise framework (by a very simple reparameterization that focuses on the two boundaries rather than the midpoint and extent of the decision window). Then, you would be focusing entirely on the question of interest. It would also equate model parameters, removing any reliance on asymptotic assumptions being met for AIC.

      We revised our exploration of atheoretical recalibration models. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of the cross-modal temporal bias or as a shift of the criterion. We fit each model separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best described the data of most participants.

      References

      Di Luca, M., Machulla, T.-K., & Ernst, M. O. (2009). Recalibration of multisensory simultaneity: cross-modal transfer coincides with a change in perceptual latency. Journal of Vision, 9(12), Article 7.

      Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual simultaneity. Nature Neuroscience, 7(7), 773–778.

      Harrar, V., & Harris, L. R. (2005). Simultaneity constancy: detecting events with touch and vision. Experimental Brain Research, 166(3-4), 465–473.

      Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62(5), 423–432.

      Keetels, M., & Vroomen, J. (2007). No effect of auditory-visual spatial disparity on temporal recalibration. Experimental Brain Research, 182(4), 559–565.

      MacKay, D. J. (2003). Information theory, inference and learning algorithms. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=201b835c3f3a3626ca07be68cc28cf7d286bf8d5

      Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., & Spence, C. (2005). Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research, 25(2), 499–507.

      Tanaka, A., Asakawa, K., & Imai, H. (2011). The change in perceptual synchrony between auditory and visual speech after exposure to asynchronous speech. Neuroreport, 22(14), 684–688.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2007). Temporal recalibration during asynchronous audiovisual speech perception. Experimental Brain Research, 181(1), 173–181.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2008). Audiovisual temporal adaptation of speech: temporal order versus simultaneity judgments. Experimental Brain Research, 185(3), 521–529.

      Vroomen, J., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Recalibration of temporal order perception by exposure to audio-visual asynchrony. Cognitive Brain Research, 22(1), 32–35.

    1. to determine a composite risk score

      Did you come up with this risk score yourself, or is there a reference you took it from? If you came up with it, you should explain it in the text, not only in the code. Otherwise, give the reference for further details.

    2. # Calculating the average Temperature in °C:
       t2m = ds1.t2m.mean(dim='time')
       avg_t2m = t2m - 273.15 # Converting from [K] into [°C]
       # Defining the map projection (how does the earths sphere get depicted on a 2-dimensional map):
       ax = plt.axes(projection=ccrs.Robinson())
       # Plotting the variable T onto ax:
       avg_t2m.plot(ax=ax, transform=ccrs.PlateCarree(), cmap='plasma', vmin=-40, vmax=30, cbar_kwargs={'label': 'Temperature [°C]'})
       # Note: The keyword "transform" tells the function in which projection the data is stored.
       # Adding gridlines and coastlines to the plot
       ax.coastlines(); ax.gridlines();
       ax.set_title("Average annual 2m air temperature (ERA5 1979-2018)")
       plt.show()

      Nicely done! Great code commenting!

      However, if you add vmin and vmax, make sure to justify, double-check which values to choose...
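
One way to justify colour limits is to derive them from the data rather than picking them by eye. The Python sketch below computes percentile-based limits with NumPy; the helper name `robust_limits`, the 2nd/98th percentile choice, and the synthetic temperature field are assumptions for illustration, not part of the reviewed notebook.

```python
import numpy as np


def robust_limits(values, lower=2.0, upper=98.0):
    """Return (vmin, vmax) as percentiles of the data, ignoring NaNs."""
    data = np.asarray(values, dtype=float)
    vmin, vmax = np.nanpercentile(data, [lower, upper])
    return float(vmin), float(vmax)


# Synthetic temperatures in °C standing in for the real field (assumption)
rng = np.random.default_rng(0)
fake_t2m = rng.normal(loc=5.0, scale=15.0, size=(90, 180))
vmin, vmax = robust_limits(fake_t2m)
print(f"vmin={vmin:.1f}, vmax={vmax:.1f}, "
      f"data range=({fake_t2m.min():.1f}, {fake_t2m.max():.1f})")
```

The resulting values can be passed as `vmin` and `vmax` to the plot call and quoted in the text as the justification. xarray's plotting functions also accept `robust=True`, which applies the same 2nd/98th percentile idea when no limits are given.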

    1. For hundreds of years, people tried to decipher the message on the Rosetta Stone but were unable to crack the code. Finally, in 1822, Jean-Francois Champollion, who could read both Greek and Coptic,

      Wow, I wonder how Champollion was able to read Greek and Coptic. Why could nobody else decipher the Rosetta Stone? Were they not able to learn how to read in these languages or did they not know that it was written in Greek and Coptic?

      • Definition and Promise of Reactivity

        “Reactivity is the future of JS frameworks! Reactivity allows you to write lazy variables that are efficiently cached and updated, making it easier to write clean and fast code.”

        • The article frames reactivity as a foundational approach for modern JavaScript frameworks, emphasizing its power to optimize code by caching and selectively recalculating only what changes.
      • Introduction to Reactively

        “I’ve been working on a new fine grained reactivity library called Reactively inspired by my work on the SolidJS team.”

        • Reactively is presented as a fine-grained, “tiny (<1 kb)” library that focuses on lazy variables, caching, and incremental recalculation, aiming to be “the fastest reactive library in its category.”
      • Core Functionality Example

        “Here’s an example of using Reactively for a lazy variable:”

        • The provided code snippet (with reactive(10) and a deferred fetch) illustrates how Reactively defers computations and retrieves data only when needed, ensuring on-demand execution for performance benefits.
      • Dependency Graph Awareness

        “Reactive libraries work by maintaining a graph of dependencies between reactive elements.”

        • By automatically detecting and organizing which reactive elements depend on which sources, libraries like Reactively minimize developer effort and maximize performance by only re-executing relevant nodes in the graph.
      • Wide Use Across Frameworks

        “Reactivity libraries are at the heart of modern web component frameworks like Solid, Qwik, Vue, and Svelte.”

        • Fine-grained reactivity is not limited to standalone libraries; it also underpins the state management logic of multiple popular front-end ecosystems.
      • Primary Goals of Reactive Libraries

        “The goal of a reactive library is to run reactive functions when their sources have changed.”

        • Two fundamental objectives emerge—efficiency (avoiding unnecessary computations) and glitch-free updates (ensuring all dependencies are in sync before rendering to the user).
      • Lazy vs. Eager Evaluation

        “Reactive libraries can be divided into two categories: lazy and eager.”

        • Eager libraries update as soon as a source changes, while lazy libraries defer recalculation until the value is explicitly requested. Each approach has implications for performance optimizations and complexity handling.
      • The Diamond Problem & Equality Check Problem

        “Evaluating D twice is inefficient and may cause a user visible glitch.”

        • Eager libraries often face the “diamond problem” (risk of double-updating downstream nodes), while lazy libraries must handle the “equality check problem” (e.g., re-checking parents unnecessarily if the value hasn’t changed).
      • MobX Algorithm

        “MobX uses a two pass algorithm, with both passes proceeding from A down through its observers.”

        • MobX solves the diamond problem by a “count” mechanism across two phases, ensuring every dependent node is updated exactly once and in the correct order, also tracking whether a parent’s value has changed for equality checks.
      • Preact Signals Approach

        “Preact checks whether the parents of any signal need to be updated before updating that signal. It does this by storing a version number on each node and on each edge…”

        • Preact employs version numbers and a two-phase “down” and “up” traversal. This allows quick detection of stale nodes without re-walking the entire graph when no real changes occur.
      • Reactively’s Graph Coloring Mechanism

        “Instead of version numbers, Reactively uses only graph coloring.”

        • Reactively marks changed nodes as red (dirty) and their children as green (check). An “up” phase (updateIfNecessary()) recalculates only if a node or any of its ancestors is red, then cleans the graph state once values are confirmed.
      • Benchmarks and Observations

        “Reactively is the fastest (who would’ve guessed 😉).”

        • Although performance differences are often negligible for typical app use, experiments show Reactively performing strongly under heavy load, with Solid excelling in wide graphs, and Preact Signals being both swift and memory-efficient.
      • Memory Management & Data Structures

        “In a future blog post we'll look at the data structures and optimizations used in each framework…”

        • Beyond the core update algorithms, internal implementation details—like how each library manages and structures reactive nodes—play a crucial role in overall speed and memory usage.
      • Overall Conclusion

        “Most important is that the framework not run any user code unnecessarily.”

        • Across modern fine-grained reactive libraries, the central guiding principle is to avoid superfluous work. The focus remains on providing glitch-free, lazy or eager updates while preserving efficiency and correctness.
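The graph-coloring idea summarized under "Reactively's Graph Coloring Mechanism" can be sketched in a few lines of Python. This is a minimal illustration, not Reactively's actual code: the `Node` class, the `runs` counter, and the three state constants are hypothetical names chosen for the sketch. A changed source marks its direct observers red (`DIRTY`) and everything further downstream green (`CHECK`); a read then walks "up" and recomputes only when a parent's value really changed.

```python
CLEAN, CHECK, DIRTY = 0, 1, 2  # CHECK = "green", DIRTY = "red"


class Node:
    """A signal (fn is None) or a computed value (fn given). Hypothetical sketch."""

    def __init__(self, value=None, fn=None, sources=()):
        self.value = value
        self.fn = fn
        self.sources = list(sources)
        self.observers = []
        self.state = DIRTY if fn is not None else CLEAN
        self.runs = 0  # how often the user function was executed
        for source in self.sources:
            source.observers.append(self)

    def set(self, value):
        # "Down" phase: direct observers become red, deeper ones green.
        if value != self.value:
            self.value = value
            for observer in self.observers:
                observer._mark(DIRTY)

    def _mark(self, state):
        if self.state < state:
            self.state = state
            for observer in self.observers:
                observer._mark(CHECK)

    def get(self):
        if self.fn is not None:
            self._update_if_necessary()
        return self.value

    def _update_if_necessary(self):
        # "Up" phase: a green node first asks its parents to settle.
        if self.state == CHECK:
            for source in self.sources:
                source._update_if_necessary()
                if self.state == DIRTY:  # a parent's value really changed
                    break
        if self.state == DIRTY:
            self._recompute()
        self.state = CLEAN

    def _recompute(self):
        old = self.value
        self.value = self.fn(*(source.get() for source in self.sources))
        self.runs += 1
        if old != self.value:
            for observer in self.observers:
                observer.state = DIRTY


# Diamond: d depends on a through both b and c.
a = Node(value=1)
b = Node(fn=lambda x: x % 2, sources=[a])
c = Node(fn=lambda x: x + 1, sources=[a])
d = Node(fn=lambda p, q: (p, q), sources=[b, c])
d.get()
a.set(3)
d.get()  # d is recomputed once, even though two of its parents were marked
```

In this sketch `b` is re-run after `a.set(3)` but yields the same value, so a node that depends only on `b` stays green and is never re-run, which is the equality check the summary describes.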
    1. # Plot
       fig, axes = plt.subplots(2, 1, figsize=(12, 10), subplot_kw={'projection': ccrs.PlateCarree()})
       # January
       ax1 = axes[0]
       monthly_avg_precip.sel(month=1).plot.contourf(ax=ax1, levels=[0.5, 1, 2, 3, 4, 5, 7, 10, 15, 20, 40], cmap='YlGnBu')
       # August
       ax2 = axes[1]
       monthly_avg_precip.sel(month=8).plot.contourf(ax=ax2, levels=[0.5, 1, 2, 3, 4, 5, 7, 10, 15, 20, 40], cmap='YlGnBu')
       # set title
       ax1.set_title('2.1: Mean average precipitation per day in January (mm/day)')
       ax1.coastlines()
       ax2.set_title('2.2: Mean average precipitation per day in August (mm/day)')
       ax2.coastlines()
       plt.show()

      Great label units and titles! Levels are also correctly shown. Though I would prefer "precipitation" over "tp" ;-)

      Gridlines and the Robinson projection are missing (same as in the figures above). Here, the gridlines in particular would help to show how far north or south of the equator the ITCZ lies...

      Potential solution for the code:

      prcp_cycle = ds.tp.groupby('time.month').mean() * 1000
      pm = prcp_cycle.sel(month=1)
      ax = plt.axes(projection=ccrs.Robinson())
      pm.plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'label': r'precipitation (mm day$^{-1}$)'}, levels=[0.5, 1, 2, 3, 4, 5, 7, 10, 15, 20, 40], cmap='YlGnBu')
      ax.set_title(r'$\overline{P_{Jan}}$')
      ax.coastlines(); ax.gridlines();

    2. mean_temp = ds_t2m['t2m'].mean(dim='time')  # in Kelvin
       mean_temp_celcius = mean_temp - 273.15
       # Plot
       fig, ax = plt.subplots(figsize=(12, 5), subplot_kw={'projection': ccrs.PlateCarree()})
       mean_temp_celcius.plot(ax=ax, cmap='coolwarm', cbar_kwargs={'label': 'Temperature (°C)'})
       ax.coastlines()
       ax.set_title('1.1: Temporal mean Temperature $\overline{T}$ (°C)')
       plt.show()

      Nice titles and labels. However, it is better to use the Robinson projection, as the Earth is not a plate ;-) The same is true for the other figures, e.g. by using this code:

      t2_tavg = ds.t2m.mean(dim='time') - 273.15
      ax = plt.axes(projection=ccrs.Robinson())
      t2_tavg.plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'label': r'$\overline{T}$ [°C]'})
      ax.coastlines(); ax.gridlines();

    1. levels = [0.5, 1, 2, 3, 4, 5, 7, 10, 15, 20, 40]
       fig, axes = plt.subplots(2, 1, figsize=(15, 10), subplot_kw={'projection': ccrs.Robinson()})
       fig.suptitle('Average Daily Precipitation \n', fontsize=15, fontweight='bold')
       # January
       jan = pre_mthly_mm.sel(month=1)
       jan.plot(ax=axes[0], cmap='YlGnBu', transform=ccrs.PlateCarree(), cbar_kwargs={'label': 'Precipitation(mm)'})
       axes[0].coastlines(); axes[0].gridlines()
       axes[0].set_title('January')
       # August
       aug = pre_mthly_mm.sel(month=8)
       aug.plot(ax=axes[1], cmap='YlGnBu', transform=ccrs.PlateCarree(), cbar_kwargs={'label': 'Precipitation(mm)'})
       axes[1].coastlines(); axes[1].gridlines()
       axes[1].set_title('August');

      You define the levels as requested, but you do not use them to plot the figure. The differences are more visible when the plot actually shows distinct levels, as requested.

      Example code:

      prcp_cycle = ds.tp.groupby('time.month').mean() * 1000
      pm = prcp_cycle.sel(month=1)
      ax = plt.axes(projection=ccrs.Robinson())
      pm.plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'label': r'precipitation (mm day$^{-1}$)'}, levels=[0.5, 1, 2, 3, 4, 5, 7, 10, 15, 20, 40], cmap='YlGnBu')
      ax.set_title(r'$\overline{P_{Jan}}$')
      ax.coastlines(); ax.gridlines();

      In addition, the precipitation unit is "mm per day" (not just mm, as you labelled it).

      • Motivation & Purpose of the Talk

        “This talk is called I see what you mean what a tiny language can teach us about gigantic systems… which sort of uh formed the rock on which I built all of my thesis work.”

        • Alvaro introduces a small, “tiny” language (Dedalus) that explores how to effectively build and reason about distributed systems by focusing on semantics rather than purely operational details.
      • Importance of Abstraction & Its Pitfalls

        “Abstraction is a thing… arguably the the the best tool that we have in computer science… but sometimes it’s harmful.”

        • While abstraction helps manage complexity, it can also hide essential details about distributed behavior and lead to design failures (e.g., RPC “leaking” distributed complexity).
      • Division Between “Infrastructure Programmers” and “Users”

        “We tend to think of abstractions as these fixed boundaries, you know these walls… we put us right we put the Geniuses the infrastructure programmers the 10x Engineers below the wall… who goes above the wall well the the despised users.”

        • Alvaro criticizes the mindset that library writers and library users are separate classes of people; instead, we all alternate between these roles.
      • Spark for a Declarative Approach

        “…if you kind of squint your eyes the work that I was doing in those two modes [C code vs. SQL queries]… it wasn’t really that different.”

        • Observing that data wrangling in both imperative and declarative styles shares core similarities prompted an interest in “could you write distributed systems using a logic/query language?”
      • Model-Theoretic Semantics & Queries

        “…model theoretic semantics say that no no the meaning of a program is precisely the structures that make the statements in the program true… data becomes a Common Language…”

        • A logic-based or query-based approach allows mapping “programs” to “outcomes” directly through data, making correctness and debugging potentially clearer than in purely imperative styles.
      • Datalog & Concurrency

        “Datalog is interesting because we see that there's this rich, intimate connection between talking about what you don't know and having to commit to particular orders to get deterministic results.”

        • Datalog provides a unifying lens for data, but the addition of recursion, negation, and timing must be carefully managed to keep semantics deterministic in distributed settings.
      • Introducing Dedalus (pronounced ‘Day-Duh-Luss’)

        “So the idea is we want to take that clock and reify it, make the clock a piece of every uh unit of knowledge that we have… time is just a device that was invented to keep everything from happening at once.”

        • Dedalus extends Datalog with explicit time and asynchronous rules so programmers can represent mutable state, concurrency, and message ordering in a precise logical framework.
      • Three Rule Types in Dedalus

        “…we say you know every every record has a timestamp… deductive rules say the conclusion has the same timestamp as the premise… inductive rules say the conclusion has one higher timestamp… asynchronous rules say hey look there's this infinite domain of time we randomly pick from it…”

        • Dedalus’s key contribution is capturing “now,” “next,” and “eventually” semantics, reflecting real-world distributed behaviors (e.g., immediate local inference vs. future state vs. network delays).
      • State as “Induction in Time”

        “…unlike in databases which with having no time had only state in Dedalus there is no state… state is what you get when you say when you know something then you know it at the next time and by induction you keep knowing it.”

        • Dedalus reframes state changes as an inductive process on discrete time steps, allowing logic-based reasoning about mutation.
      • Confluence & Determinism

        “If we take away that pesky negation… or with very carefully controlled negation… monotonic… we know that negation free or monotonic more broadly Dedalus programs are confluent… they're deterministic without coordination.”

        • By restricting programs to monotonic logic (no negation, or only carefully controlled negation), a system can behave deterministically despite asynchronous execution and failures.
      • Significance for Distributed Systems

        “…there’s this Rich intimate connection between… the meaning of programs, the uniqueness of a model… and this really valuable systems property of deterministic outcomes…”

        • Dedalus reveals how purely logical constructs (stable models, minimal models) can correspond directly to reliable, deterministic distributed protocols in practice.
      • Legacy & Extensions

        “…on top of Bloom we built Blazes… that allow programmers… exactly why they aren’t if they aren’t [deterministic]… lineage driven fault injection… we can prove that our programs are fault tolerant…”

        • Dedalus’s ideas led to subsequent systems like Bloom, Blazes, and lineage-driven fault injection that leverage logic-based reasoning to auto-generate or verify coordination strategies.
      • Closing Thoughts & Academic Invitation

        “We don’t do a good enough job respecting our users… If any of you are interested in spending the next five or six years screwing around inventing languages Building Systems with them… I’m looking for PhD students.”

        • Alvaro emphasizes user-focused abstractions, fluid design, and invites new students to further this research in language-driven system development.
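The three rule types and the "state as induction in time" idea from the talk summary above can be made concrete with a small Python sketch. This is not Dedalus syntax; the function names are hypothetical, and the bounded, strictly later delay in `asynchronous` is an assumption of the sketch (the talk only says the timestamp is picked from an infinite domain of time).

```python
import random


def deductive(fact, t):
    """'Now': the conclusion carries the same timestamp as the premise."""
    return (fact, t)


def inductive(fact, t):
    """'Next': the conclusion carries the successor timestamp."""
    return (fact, t + 1)


def asynchronous(fact, t, rng, max_delay=10):
    """'Eventually': the timestamp is chosen nondeterministically.

    Assumption for this sketch: it is drawn from a bounded window after t.
    """
    return (fact, t + rng.randint(1, max_delay))


def persist(fact, start, steps):
    """State as induction: once known, the fact is re-derived at each next step."""
    known = [deductive(fact, start)]
    for _ in range(steps):
        last_fact, last_t = known[-1]
        known.append(inductive(last_fact, last_t))
    return known


history = persist("leader(a)", 0, 3)
message = asynchronous("ping(a, b)", 5, random.Random(0))
```

Here `history` holds the same fact at timestamps 0 through 3, which is all that "mutable state" amounts to in this framing, while `message` arrives at some later timestamp that the sender does not control.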
    1. On the other hand, some bots are made with the intention of harming, countering, or deceiving others. For example, people use bots to spam advertisements at people. You can use bots as a way of buying fake followers [c8], or making fake crowds that appear to support a cause (called Astroturfing [c9]). As one example, in 2016, Rian Johnson, who was in the middle of directing Star Wars: The Last Jedi, got bombarded by tweets that all originated in Russia (likely making at least some use of bots).

      I've had my experiences with antagonistic bots. In high school, a few of my friends would use Kahoot bots to overflow the number of students who joined the Kahoot game. There was this website where you would give the Kahoot game code and the bots would join by themselves.

    1. ### plotting the temporal average temperature
       t2_tavg = ds.t2m.mean(dim='time') - 273.15
       ax = plt.axes(projection=ccrs.Robinson())
       t2_tavg.plot(ax=ax, transform=ccrs.PlateCarree(), cmap='coolwarm', center=False, vmin=-40, vmax=20, levels=10, cbar_kwargs={'label': '°C'})
       ax.set_title('Average annual 2m air temperature, ERA5 1979-2018')
       ax.coastlines(); ax.gridlines();

      Nice labelling of the code. However, the levels are strangely defined. You would need to increase the number of levels to get round numbers (e.g. one for every 5 °C), or just use a continuous scale. I would also prefer a color scale that is centered at 0 °C (not center=False).
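One way to follow this advice is to build the level boundaries explicitly rather than asking for a count. The sketch below uses only plain Python; `round_levels` and `symmetric_limits` are hypothetical helper names, and the 5 °C step is taken from the comment above.

```python
def round_levels(vmin, vmax, step):
    """Return level boundaries from vmin to vmax (inclusive), spaced by step."""
    count = int(round((vmax - vmin) / step))
    return [vmin + i * step for i in range(count + 1)]


def symmetric_limits(vmin, vmax):
    """Return limits of equal magnitude so that zero sits mid-scale."""
    bound = max(abs(vmin), abs(vmax))
    return -bound, bound


levels = round_levels(-40, 20, 5)      # boundaries at every 5 °C
limits = symmetric_limits(-40, 20)     # zero-centered colour range
```

The resulting list could be passed as `levels=levels` in place of `levels=10` in the plotting call. Symmetric limits are one simple way to put 0 °C in the middle of a diverging colormap such as `coolwarm`.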

    1. SAIDs MUST be encoded as a CESR [CESR] Primitive. As defined above, a CESR Primitive includes a pre-pended derivation code that encodes the cryptographic suite or algorithm used to generate the digest.

      ^

  5. notebooksharing.space notebooksharing.space
    1. # Compute the temporal mean temperature
       t2m = ds['t2m']
       t2m_c = t2m - 273.15  # Convert to Celsius
       T_mean = t2m_c.mean(dim='time')
       # Plot the global mean temperature
       fig = plt.figure(figsize=(12, 5))
       ax = plt.axes(projection=ccrs.Robinson())
       T_mean.plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'label': 'Mean Temperature (°C)'})
       ax.coastlines()
       ax.gridlines()
       plt.title('Temporal Mean Temperature (°C)')
       plt.show()

      Great description of labels, code and very good that you added a title!

    1. Python uses an interpreter, so when you run a Python program, the interpreter translates the Python code into binary while it’s running it.

      This design brings several advantages to Python. Since it is interpreted line by line during execution, Python allows for more lenient type checking and requirements, making it easier for beginners to start learning and using the language. Additionally, as long as different platforms implement their own interpreters, the same code can run seamlessly across platforms without worrying about compatibility—something that is often a complex issue for many other languages. The downside, however, lies in some performance limitations. But for our course's purpose it is a perfect match.
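As a small illustration of the translation step described above: in CPython, the reference implementation, source text is first compiled to bytecode, and the interpreter then executes that bytecode. The sketch below uses only the standard library; the variable names are arbitrary.

```python
import dis

source = "total = 1 + 2"

# Compile the source text into a code object holding bytecode.
code_obj = compile(source, "<example>", "exec")

# Execute the compiled code in a fresh namespace.
namespace = {}
exec(code_obj, namespace)

# List the names of the bytecode instructions the interpreter will run.
instructions = [ins.opname for ins in dis.get_instructions(code_obj)]
print(namespace["total"], instructions)
```

The exact instruction names differ between Python versions, which is why no expected output is shown for the instruction list.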

  6. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. Pseudocode. November 2023. Page Version ID: 1185265918. URL: https://en.wikipedia.org/w/index.php?title=Pseudocode&oldid=1185265918 (visited on 2023-11-17).

      Pseudocode is a simplified, language-agnostic way of describing the logic and steps of an algorithm or program using plain language and basic programming constructs. It helps developers understand and plan code structure without focusing on syntax or language-specific details.

    1. How are people’s expectations different for a bot and a “normal” user?

      Chapter 3 focuses on bots. First, I learned that bots can perform actions through social media accounts and can appear to be like any other user. Chapter 3 also discusses the ethical significance of bots, such as the division of responsibility between the person and the operator who created the bot. As for how people's expectations differ for a bot and a "normal" user, I think the biggest difference is that the comments and content posted by ordinary users are more real. Although they may be more subjective, they also draw conclusions based on the user's personal feelings and experiences. A bot's words, by contrast, have no personal emotion, and its statements are the product of code.

    1. Ask for work that is easy to verify. Your job as a programmer using an LLM is to read the code it produces, think about it, and decide if the work is good. You can ask an LLM to do things you would never ask a human to do. “Rewrite all of your new tests introducing an <intermediate concept designed to make the tests easier to read>” is an appalling thing to ask a human, you’re going to have days of tense back-and-forth about whether the cost of the work is worth the benefit. An LLM will do it in 60 seconds and not make you fight to get it done. Take advantage of the fact that redoing work is extremely cheap.

      I would like to read people giving more examples of this kind of thing, and how they do it

    1. Software engineers

      I didn’t think about software engineering being a part of design. It felt like problem-solving, but I realized that design is a part of problem-solving, usually occurring at the start. I view design as pseudocode: after understanding the problem, I think about the approaches I’m going to take to solve it, which in this case is the design.

    1. The first term is the constant 3, representing the three assignment statements at the start of the fragment. The second term is $3n^2$, since there are three statements that are performed $n^2$ times due to the nested iteration. The third term is $2n$, two statements iterated $n$ times. Finally, the fourth term is the constant 1, representing the final assignment statement. This gives us $T(n) = 3 + 3n^2 + 2n + 1 = 3n^2 + 2n + 4$. By looking at the exponents, we can easily see that the $n^2$ term will be dominant and therefore this fragment of code is $O(n^2)$.

      how to get the T(n) equation
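One way to see where the equation comes from is to count the assignments while running code of the same shape. The fragment itself is not quoted in this annotation, so the one below is an assumed stand-in with the structure the passage describes: three initial assignments, three inside a doubly nested loop, two inside a single loop, and one at the end. The function names are hypothetical.

```python
def count_assignments(n):
    """Run a fragment of the described shape and count executed assignments."""
    steps = 0
    # Three assignments at the start: the constant term 3.
    a, b, c = 5, 6, 10
    steps += 3
    # Nested loops: three assignments run n * n times, giving 3n^2.
    for i in range(n):
        for j in range(n):
            x, y, z = i * i, j * j, i * j
            steps += 3
    # Single loop: two assignments run n times, giving 2n.
    for k in range(n):
        w, v = a * k + 45, b * b
        steps += 2
    # One final assignment: the constant term 1.
    d = 33
    steps += 1
    return steps


def t(n):
    """Closed form from the passage: T(n) = 3n^2 + 2n + 4."""
    return 3 * n**2 + 2 * n + 4


for n in (0, 1, 10, 100):
    print(n, count_assignments(n), t(n))
```

The counted total matches the closed form for every `n`, and for large `n` the $3n^2$ term accounts for almost all of it, which is why the fragment is $O(n^2)$.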

  7. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. When they set foot in kindergarten, how many years "behind" are they in learning opportunities, literacy and numeracy development, reading and writing "behaviors," and the many benefits of quality early care?

      Many families do not have the privilege of picking a home and area code based on whether the schools are good or bad; they are limited to picking an address they can afford. The statement about how far behind a child is when they get to kindergarten also favors the wealthy, because many parents aren't able to send their child to day care before school starts, where they could get a head start, and many families work full time and are not able to teach their children at home before kindergarten.

  8. notebooksharing.space notebooksharing.space
    1. paste0(root_dir, "/output/_1_classified_points/*_ground.las")

      super minor change, but for future code adaptability, it may make sense to replace this by something like file.path(dir_list[1], "*_ground.las").

      I won't mention this for the other write_las operations, but this would apply across the entire code.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review: 

      Summary: 

      The authors present a new application of the high-content image-based morphological profiling Cell Painting (CP) to single cell type classification in heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that the nucleus and its surroundings contain sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and could describe intermediate states of the maturation process from iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in-depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) explicitly assessing replication biases (batch effects); (ii) direct comparison of feature-based (à la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) systematic assessment of the contribution of each fluorescent channel; (iv) evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) evaluating the performance of different spatial contexts around the cell/nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

      Comments on latest version:

      I have consulted with Reviewer #3 and both of us were impressed by the revised manuscript, especially by the clear and convincing evidence regarding the nucleocentric model's use of the nuclear periphery and its benefit for the case of dense cultures. However, there are two issues that are incompletely addressed (see below). Until these are resolved, the "strength of evidence" was elevated to "compelling".

      First, the analysis of the patch size is not clearly indicating that the 12-18um range is a critical factor (Fig. 4E). On the contrary, the performance seems to be not very sensitive to the patch size, which is actually a desired property for a method. Still, Fig. 4B convincingly shows that the nucleocentric model is not sensitive to the culture density, while the other models are. Thus, the authors can adjust their text saying that the nucleocentric approach is not sensitive to the patch size and that the patch size is selected to capture the nucleus and some margins around it, making it less prone to segmentation errors in dense cultures.

      We agree that there is a significant tolerance to different patch sizes, and have therefore reformulated the conclusion as suggested in the results and the discussion sections (pages 10 and 16). As very large patch sizes (>40µm) do increase the variability of the predictions and the imbalance between recall and precision, we have left this observation in the results section, as it also motivates the use of smaller patch sizes.

      Second, the GitHub does not contain sufficient information to reproduce the analysis. Its current state is sparse with documentation that would make reproducing the work difficult. What versions of the software were used? Where should data be downloaded? The README contains references to many different argparse CLI arguments, but sparse details on what these arguments actually are, and which parameters the authors used to perform their analyses. Links to images are broken. Ideally, all of these details would be present, and the authors would include a step-by-step tutorial on how to reproduce their work. Fixing this will lead to an "exceptional" strength of evidence.

      We have added additional information to the GitHub to increase the reproducibility of the analysis.  

      • The README now contains additional documentation and more extensive explanations. A flowchart has been added, making the dataflow and order of analyses more clear.  

      • The accompanying dataset is 20GB in size and can be downloaded as a .zip-file from https://figshare.com/articles/dataset/Nucleocentric-Profiling/27141441?file=49522557. This file contains 2x480 raw images and a layout file.  

      • The software versions used are listed in Table 4 of the manuscript. To increase reproducibility, a Conda environment file (.yaml) has been added to the GitHub repository. It can be installed and contains the correct package versions.

      • For each script and its arguments, the README now contains a short description of its meaning, whether it is required or optional, and its default setting.

      • A step-by-step tutorial on the use of the test dataset has been included. This tutorial includes the arguments used to run the code from the command line terminal.

      Recommendations for the authors:

      There is no reference in the text to Fig. 2D or to Fig. 3C.

      This has been changed. The text has been added to the manuscript at page 6 (fig. 2D) and the reference to Fig. 3C has been included at page 8.

  9. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. Most Americans believe that everyone has the right to pursue success but that only some deserve to win, based on their talent, effort, or ambition

      I heard something similar about the lottery, where people think that almost anyone can win, but only a single person does. With so many like-minded individuals, it's the perfect business opportunity. But having this idea as well as the opposing view is what makes America America: speech. I disagree, though; you cannot disregard a person's parents' economic status, zip code, and social status when determining their success. Schools can never equalize this.

    1. Visual Studio Code. To code and complete the course exercises, you will need: a code editor (Visual Studio Code) to write the code files, and also an up-to-date web browser (Chrome, Safari, Firefox, or Edge) to see the result of what you code live.

      Hello, first of all, thank you for sharing this varied information about computing. This is partly a test, and partly to ask whether it is possible to make the Hypothesis extension work on Opera GX. On the highlighted text, I would also suggest adding that Visual Studio Code can be set to French (I discovered it through your video and searched the internet for how to do it).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with two lipid compositions similar to native viral membranes. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region. The revised manuscript demonstrates that these lipid interactions are robust to alterations in membrane composition and rigidity. However, it does not address the reverse: that phospholipids known experimentally not to associate with these antibodies (if any such lipids exist) also fail to interact in MD simulations.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. These simulations recapitulate lipid binding interactions solved in published crystallographic studies but also lead to the discovery of a novel lipid binding site the authors term the "Loading Site", which could guide future experiments with this antibody.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. These CG simulations, which cannot resolve atomistic interactions, are nonetheless compelling because negative controls (ab 13h11, BSA) that should not associate with membrane indeed sample significantly less membrane.

      Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations: namely, its ability to predict more subtle changes along a lineage. Intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive, creative, and biologically inspired. Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane proximal exposed region (MPER) of the HIV-1 envelope trimer. The simulation recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The use of multiscale MD simulations allows for a detailed exploration of the system at different time and length scales. The combination of MD simulations and structural bioinformatics provides a comprehensive approach to validate the identified binding sites. Finally, the steered MD simulations offer quantitative insights into the binding strength between the membrane and bnAbs.

      While the simulations and analyses provide qualitative insights into the binding interactions, they do not offer a quantitative assessment of energetics. The coarse-grained simulations exhibit artifacts and thus require careful analysis.

      This study contributes to a deeper understanding of the molecular mechanisms underlying bnAb recognition of the HIV-1 envelope. The insights gained from this work could inform the design of more potent and broadly neutralizing antibodies.

      Recommendations for the authors:

      Reviewing Editor:

      We recommend the authors remove the figure and section related to bnAb LN01, perform additional analysis (e.g., further expanding on the differences in antibody binding in the presence or absence of antigen), and present this as a separate manuscript in a follow-up study.

      We consider the analysis of a bnAb with a transmembrane antigen and of LN01 to be essential to the manuscript and to constitute novel results. The study of LN01 provides many insights distinct from those for the other MPER bnAbs in this study. We agree that further characterization of LN01 and of bnAbs with transmembrane antigen or full-length Env is intriguing and necessary to complete the full mechanistic understanding of lipid-associated antibodies. The LN01 section in this paper is novel in the field and provides preliminary evidence motivating further work, which we agree is beyond the scope of this already long and detailed study.

      Reviewer #1 (Recommendations for the authors):

      I appreciate the degree to which the authors responded to my previous points raised in the private review, including edits where I might have missed something in the manuscript or relevant literature. I imagine such a point-by-point response was quite onerous. Thank you also for balancing presentation/clarity with content/rigor considering the large information content of this manuscript; in silico results are inherently hard to present given the delicate balance between rigorous validation and novel information content. I apologize if I repeat points raised and addressed previously and commend the authors on their revised study, which is much improved in clarity; any additional revisions are of course entirely at your discretion.

      "...now having more diversity in lipid headgroup chemistries" references the wrong figure-it should be: Figure 2-figure supplement 2A-C. The incorrect figure is also referenced again several sentences down: "...relevant CDR and framework surface loops..."

      Thank you for pointing out this error. We have corrected figure references.

      "One shared conformational difference observed for these bnAbs the higher cholesterol bilayers was slightly more extensive and broader interaction profiles as well as modestly deeper embedding of the relevant CDR and framework surfaces loops" please rephrase

      Thank you for this suggestion.  We rephrased this for improved clarity and flow. 

      "These results bolster the feasibility for using all-atom MD as an in silico platform to explore differential phospholipid affinity at these sites (i.e., specificity studies) and influence on antibody preferred conformation as membrane composition and lipid chemistry are systematically varied" Please tone down these speculations-you have demonstrated that simulations are robust to different headgroup chemistries but have not provided evidence for the exclusion of lipids that are known not to associate with these antibodies.

      We rephrased this speculation to highlight the potential of this application. We also emphasize future studies that would be required to achieve this application in the following sentence.

      “These results motivate use of all-atom MD as an in silico approach for exploring differential phospholipid affinity at these sites…”

      Figure 2A: Specify which PDB entry corresponds to the displayed crystal structures in the main figure or caption.

      We clarified these PDB entries in the figure caption. 

      Check reference formatting in supplemental figures when generating VOR.

      I am not sure how relevant this might be to the claims of Figure 2-figure supplement 3, but AlphaFold3-based phospholigand docking might provide an additional orthogonal approach if relevant ligand(s) are available for such analysis (particularly for the newly proposed 10E8 POPC complex).

      Thank you for this suggestion. AI/ML-based prediction methods like AF3 and RoseTTAFold All-Atom (RFAA) are interesting new methods that have emerged since our initial submission. We have decided these experiments are beyond the scope of this already long and detailed study. We have added a sentence suggesting use of these methods in future work.

      "We next studied bnAb LN01 to interrogate differences" --> this transition still reads a bit unclear. Why shift gears and change antibodies? Also, while you do go into its interactions both +/- antigen, there's no lead into the simulation initialization with and without antigen to guide the reader into the comparisons you will draw in the figure. Also, the order of information presentation is a bit strange, where the rationale for choosing a single monomeric helix is brought up in the middle of the paragraph instead of at the beginning of the section. In the next paragraph, it goes back to the initialization of the membrane composition again, which feels a bit disorganized-I do appreciate the unique challenge of having to weave through so much quality data! In fact, if you were to conduct simulations of membrane + antigen vs. membrane + LN01 vs. membrane + LN01 + antigen, I am tempted to say that this could be removed from this manuscript and flow better as a paper in and of itself.

      We thank the reviewer for the suggestion to improve the writing style. We feel this section adds a lot of value to the manuscript, so we have kept it in the paper and improved the transition as well as the rationale.

      We chose to study the additional antibody LN01 and the monomeric MPER-TM antigen conformation because structural evidence was already available, requiring no additional creative model building. This rationale has been added to the new text.

      We changed the order of information as suggested, moving the rationale for the antigen fragment earlier in the paragraph, followed by the background on the lipid sites from the crystal structure, which leads into the simulation set-up. We clarified in the opening sentence of the paragraph that the simulation initialization was similar for systems with and without antigen.

      "previously observed snorkeling and hydration of TM Arg686" --> Is this R696 (numbering could be different based on the particular Env)?

      Thank you for noting this typo, we have corrected the numbering.

      Potential font color issue with Figure 3-Figure supplement 1 B and part of A text-could be fixed in typesetting.

      The discussion reads very well. Is it possible to direct antibody maturation, even in an engineered context, towards membrane affinity without increasing immunogenic polyreactivity? This is mentioned very briefly and cited with ref 36, but I would be interested in the author's thoughts on this topic.

      We thank the reviewer for this insightful idea to explore in future work. Our conclusion alludes to the possibility of artificially evolving membrane affinity, studied by MD, as done in vitro by Nieva and co-workers. Because of their hypothetical nature, we have chosen not to elaborate on those ideas in this manuscript.

      Reviewer #2 (Recommendations for the authors):

      To ensure reproducibility and facilitate further research, the authors should publicly deposit the code for running the MD simulations and analyses (e.g., on GitHub) along with the underlying data used in the study (e.g., on Zenodo.org).

      We appreciate the consideration for open-source code and analysis. Representative code and simulation trajectories were uploaded to the following repositories:

      https://github.com/cmaillie98/mper_bnAbs.git

      https://zenodo.org/records/13830877

      —-

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with a lipid composition similar to the native virion. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. Additional contacts and conformational restraints imposed by ectodomain regions of the envelope glycoprotein, however, remain unaddressed-the size of such simulations likely runs into technical limitations including sampling and compute time.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive. However, given the large amount of data presented within the manuscript, the text would benefit from clearer subsections that delineate discrete mechanistic discoveries, particularly for experimentalists interested in antibody discovery and design. One area the paper does not address involves the polyreactivity associated with membrane binding antibodies-MD simulations and/or pulling velocity experiments with model membranes of different compositions, with and without model antigens, would be needed. Finally, given the challenges in initializing these simulations and their limitations, the text regarding their generalized use for discovery, rather than mechanism, could be toned down.

      Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public Review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane proximal exposed region (MPER) of the HIV-1 envelope trimer. The simulation recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The conclusions from the paper are mostly well supported by the simulations, however, they remain very descriptive and the key findings should be better described and validated. In particular:

      It has been shown that the lipid composition of HIV membrane is rich in cholesterol [1], which accounts for almost 50% molar ratio. The authors use a very different composition and should therefore provide a reference. It has been shown for 4E10 that the change in lipid composition affects dynamics of the binding. The robustness of the results to changes of the lipid composition should also be reported.

      The real advantage of the multiscale approach (coarse grained (CG) simulation followed by a back-mapped all atom simulation) remains unclear. In most cases, the binding mode in the CG simulations seem to be an artifact.

      The results reported in this study should be better compared to available experimental data. For example how does the approach angle compare to cryo-EM structure of the bnAbs engaging with the MPER region, e.g. [2-3]? How do these results from this study compare to previous molecular dynamics studies, e.g.[4-5]?

      References

      (1) Brügger, Britta, et al. "The HIV lipidome: a raft with an unusual composition." Proceedings of the National Academy of Sciences 103.8 (2006): 2641-2646.

      (2) Rantalainen, Kimmo, et al. "HIV-1 envelope and MPER antibody structures in lipid assemblies." Cell Reports 31.4 (2020).

      (3) Yang, Shuang, et al. "Dynamic HIV-1 spike motion creates vulnerability for its membrane-bound tripod to antibody attack." Nature Communications 13.1 (2022): 6393.

      (4) Carravilla, Pablo, et al. "The bilayer collective properties govern the interaction of an HIV-1 antibody with the viral membrane." Biophysical Journal 118.1 (2020): 44-56.

      (5) Pinto, Dora, et al. "Structural basis for broad HIV-1 neutralization by the MPER-specific human broadly neutralizing antibody LN01." Cell Host & Microbe 26.5 (2019): 623-637.

      Considering reviewer suggestions, we slightly reorganized the results section into specific sub-sections with headings and changed the order in which key results were presented to make the subsequent analysis more accessible for readers. Supplemental materials were redistributed into eLife format, with each supplemental item grouped with a corresponding main figure. Many slight modifications of detail were made to figures (mostly supplemental items) without changing their character, such as clearer axis labels or revised annotations within panels.

      The major additions within the results sections based on the reviews were:

      (1) An expanded comparison of our simulation analyses with previous simulations and with existing cryo-EM structural evidence for MPER antibodies’ membrane orientation in the context of full-length antigen, resulting in new supplemental figure panels.

      (2) New atomistic simulations of 10E8, PGZL1, and 4E10 evaluating the phospholipid binding predictions in a different lipid composition more closely modeling HIV membranes.

      Minor edits to the analyses and interpretations include:

      (1) Outlining the geometric components contributing to variance in substates after clustering the atomistic 10E8, 4E10, and PGZL1 simulations.

      (2) Better defining the variance and durability of membrane interactions within and across systems in the coarse grain methods section.

      (3) Removed interpretations in the original results sections regarding polyreactivity and energetics for MPER bnAbs that were not explicitly supported by data.   

      (4) More context on the prevalence of bnAb loop geometries in the structural informatics section.

      (5) Rationale for the choice of the continuous helix MPER-TM conformation in LN01-antigen conformations, and citations to previous gp41 TM simulations.

      (6) Removed language on the novelty of the coarse grain and steered pulling simulations as newly developed approaches; tempering the potential discriminating power and applications of those approaches, in light of their limitations.

      The discussion was revised to provide more novel context for the results within the field, including the direct relevance of the simulation methods for evaluating immune tolerance mechanisms and for antibody engineering. We have shared custom scripts used for molecular dynamics analysis on GitHub (https://github.com/cmaillie98/mper_bnAbs.git) and uploaded trajectories to a public repository hosted on Zenodo (https://zenodo.org/records/13830877).

      Recommendations for the authors:

      Below, I provide an extensive list of minor edits associated with the text and figures for the authors to consider. I provide these with the hope of increasing the accessibility of the manuscript to broader audiences but leave changes to the discretion of the authors.

      Text/clarity

      Figure 1 main text

      The main text discussing Figure 1 is disorganized, making the analysis difficult to follow. I would suggest the following: moving the sentence, "4E10 and PGZL1 are structurally homologous" immediately after the paragraph discussing the simulation initiation. Then, add a sentence that directly compares the experimental affinity, neutralization, and polyreactivity of 4E10 and PGZL1 (later, an unintroduced idea pops up, "These patterns may in part explain 4E10's greater polyreactivity"). Next, lead into the discussion of the MD simulation data with something to the effect of: "Given these similarities, we first compared mechanisms of membrane insertion between 4E10 and PGZL1 to bolster confidence in our predictions". Later, the sentence "Across 4E10 and PGZL1 simulations, the bound lipid phosphates"

      We thank the reviewer for the suggestion and we have restructured the beginning of the results to implement this style: to first introduce then discuss the comparative PGZL1 & 4E10 results, i.e. Figure 1 plus associated supplements.

      In the background and the introduction text leading up to Figure 1, CDR-H3 is discussed at length, however, the first figure focuses almost entirely on how CDR-H1 coordinates a lipid phosphate headgroup. Are there experimental mutations in this loop that do not affect affinity (e.g., to a soluble gp41 peptide), but do affect neutralization (like the WAWA mutation for CDR-H3, discussed later)?

      We have altered the Introduction (para 2) and Results (4E10/PGZL1 sub-section) to give a more balanced discussion of CDRs H1 & H3. That includes referencing experimental data addressing the reviewer’s question: a PGZL1 clone, H4K3, in which mutations to CDR-H1 were introduced and shown to have minimal impact on affinity to MPER peptide via ELISA and BLI, although those mutant bnAbs had significantly reduced neutralization efficacy (PMC6879610).

      The sentence "These phospholipid binding events were highly stable, typically persisting for hundreds of nanoseconds" should be moved down to immediately precede, "[However], in a PGZL1 simulation, we observed a". This would be a good place for a paragraph break following, "Thus, these bnABs constitutively", since this block of text is very long.

      Similarly, the sentence and parts of the section, "Likewise, the interactions coordinating the lipid phosphate oxygens at CDR-H1" more appropriately belongs immediately before or after the sentence, "Our simulations uncover the CDR-lipid interactions that are the most feasible".

      Thank you for the detailed guidance in reorganizing the Figure 1 results.  We followed the advice to directly compare 4E10 and PGZL1 results separately from 10E8, moving those sections of text appropriately.  New paragraph breaks were added to improve accessibility and flow of concepts throughout the Results.

      In the sentence, "our simulations uncover CDR-lipid interactions that are the most feasible and biologically relevant in the context of a full [HIV] lipid bilayer... validation to which of the many possible ions" --> have you confidently determined lipid binding and positioning outside of the site validated in figure 1? Which site(s) are these referencing? The next two sentences then introduce two new ideas on the loop backbone stability then lead into lipid exchange, which is a bit jarring.

      We have adjusted the language concerning the putative ions/lipids electron density across the many PGZL1 and 4E10 crystal structures, and additionally make the explicit point that we confidently determined the lack of lipid binding outside of the site focused on in Figure 1.

      “… both bnAbs showed strong hotspots for a lipid phosphate bound within the CDR-H1 loops, with minimal phospholipid or cholesterol ordering around the proteins elsewhere.  The simulated lipid phosphates bound within CDR-H1 have exceptional overlap with electron densities and atomic details of modelled headgroups from respective lipid-soaked co-crystal structures…”

      Figure 2 main text

      "We similarly investigated bnAb 10E8" - Please make this a separate subheader, the block text is very long up to this point.

      Thank you for the suggestion. We introduced a sub-header to separate work on 10E8 all-atom simulations.

      "we observed a POPC complexed with... modelled as headgroup phosphoglycerol anions..." - please cite the references within the text.

      Thank you for pointing out this missing reference, we added the appropriate reference.

      "One striking and novel observation" - please remove the phrase "striking" throughout, for following best practices in scientific writing (PMC10212555)-this is generally well-done throughout.

      We removed “striking” from our text per your suggestion.

      "This CDR-L1 site highlights... (>500 fold) across HIV strains" - How much do R29 and Y32 also contribute to antigen binding and the conformation of this loop? These mutants also decreased Kd by approximately 20X, and based on the co-crystal structure with the TM antigen (PDB: 4XCC), seem to play a more direct role in antigen contact. Additionally, these residues should be highlighted on a figure, otherwise it's difficult to understand why they are important for membrane association.

      We thank the reviewer for their deep engagement with these supporting experimental details. The R29A+Y32A 10E8 mutant referenced in the text showed only a 4-fold Kd increase, a modest change for an SPR binding experiment, whereas the R29E+Y32E 10E8 mutant resulted in a 40x Kd increase, which is the “20x” the reviewer refers to. Both 10E8 mutants showed similarly drastic reductions in breadth and potency, of over 2 orders of magnitude on average.

      These mutated CDR-L1 residues are not directly involved in antigen contact and adopt the same loop helix conformation when antigen is bound. A minor impact on antigen binding affinity could be due to altered pre-organization of the CDR loops upon losing interactions from the Tyr & Arg sidechains - particularly Tyr31 in contact with CDR-H3.

      As per the suggestion, a more clearly annotated figure panel denoting these sidechains has been added to Figure 2-Figure Supplement 1 for the 10E8 analysis.

      "Structural searches querying... identified between 10^5 and 2*10^6..." - why is this value represented as such a large range? Does this depend on the parameters used for analysis? Please clarify.

      Additionally, how prevalent are any random loop conformations compared to the ones you searched? It's otherwise difficult to attribute number of occurrences within the 2 A cutoff to biological significance, as this number is not put in context.

      We appreciate the reviewer’s comment to contextualize the range and relative frequency of the bnAb loop conformations. RMSD and loop length are the key parameters, which can be controlled for by searching reference loops of similar length. The main point of the backbone-level searching is simply to show that the bnAb loops are not particularly rare when compared with loops of similar length.

      We did as was suggested and added comparison to random loops of the same length to the main text, including a new Supplementary Table 4.   

      “…identified between 10^5 to 2∙10^6 geometrically similar sub-segments within natural proteins (<2 Å RMSD)40, reflecting they are relatively prevalent (not rare) in the protein universe, comparing well with frequency of other surface loops of similar length in antibodies (Supplementary Table 3).”

      "We next examined the geometries" could start after its own new subheading. Moreover, while there's an emphasis on tilt for neutralization, there is not a figure clearly modelling the proposed Env tilt compared to the relatively planar bilayer. It would be helpful to have an additional panel somewhere that shows the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for outlining an interesting element to consider in our analysis of a multi-step binding mechanism for MPER antibodies. We added additional figure panels in the supplement to outline the similarities and differences between our simulations and Fabs with the inferred membranes in cryo-EM experiments of full-length HIV Env.  The simulated Fabs’ angles are very similar with only minor tilting to match the cryo-EM antibody-membrane geometries. 

      We added Figure 1-figure supplement 1A & Figure 2-figure supplement 2A, and altered the text to reflect this:

      “The primary difference is Env-bound Fabs in cryo-EM adopt slightly more shallow approach angles (~15°) relative to the bilayer normal.  The simulated bnAbs in isolation prefer orientations slightly more upright, but presenting CDRs at approximately the same depth and orientation.  Thus, these bnAbs appear pre-disposed in their membrane surface conformations, needing only a minor tilt to form the membrane-antibody-antigen neutralization complex.”

      Env tilt dynamics and membrane curvature of natural virions may reconcile some of these differences. Recent in situ tomography of full-length Env in pseudo-virions corroborates our approximation of flat bilayers over the short length scales around Env.
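
      To make the approach-angle comparison above concrete, the following is a minimal sketch of how a tilt relative to the bilayer normal could be computed from two vectors. It is an illustration only and is not taken from the deposited analysis scripts: the function names, the definition of a Fab "long axis" vector, and the choice of the z-axis as the bilayer normal are all assumptions made here.

```python
import math


def approach_angle_deg(fab_axis, bilayer_normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between a Fab axis vector and the bilayer normal.

    fab_axis: hypothetical vector along the Fab long axis (for example,
        from the constant-domain centroid to the variable-domain centroid).
    bilayer_normal: assumed to be the z-axis for a flat bilayer in the xy plane.
    """
    dot = sum(a * n for a, n in zip(fab_axis, bilayer_normal))
    norm_axis = math.sqrt(sum(a * a for a in fab_axis))
    norm_normal = math.sqrt(sum(n * n for n in bilayer_normal))
    if norm_axis == 0.0 or norm_normal == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_axis * norm_normal)))
    return math.degrees(math.acos(cos_theta))


def tilt_difference_deg(axis_simulated, axis_cryoem, bilayer_normal=(0.0, 0.0, 1.0)):
    """Extra tilt (degrees) of the cryo-EM pose relative to the simulated pose."""
    return (approach_angle_deg(axis_cryoem, bilayer_normal)
            - approach_angle_deg(axis_simulated, bilayer_normal))
```

      Under these assumptions, a simulated Fab standing upright along the normal and an Env-bound Fab whose axis is inclined by 15° would give a tilt difference of about 15°, which is the scale of re-orientation described in the quoted text.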

      The sentence "we next examined the geometries" mentions "potential energy cost, if any, for reorienting...". However, there's no further discussions of geometry or energy cost within this section. Please rephrase, or move this figure to main and increase discussion associated with the various conformational ensembles, their geometry, and their phospholipid association.

      As the reviewer highlights, the unbiased simulations and our analysis do not explicitly evaluate energetics.  We removed this phrase, and now only allude to the minimal energy barrier between the similar geometric conformations, relative to the tilting & access requirements for antigen binding mechanism.

      “The apparent barrier for re-orientation is likely much less energetically constraining than shielding glycans and accessibility of MPER”

      ".. describing the spectrum of surface-bound conformations" cites the wrong figure.

      Thank you for noticing this error; we corrected the figure reference to (Figure 2-figure supplement 4).

      Please comment on the significance of how global clustering (Fig. S5A-C) was similar for 4E10 and PGZL1, but different for 10E8 (e.g., blue, orange, and yellow clusters for 4E10 and PGZL1 versus cyan, red, and green clusters for 10E8). As the cyan cluster seems to be much closer in Euclidean space to the 4E10/PGZL1 clusters, it might warrant additional analysis. What do these clusters represent in terms of structure/conformation? How do these clusters differ in membrane insertion as in (A)?

      We are grateful that you identified analysis in the geometric clustering section that may be of interest to other readers. We have added an additional supplementary table (Table 2) detailing the CDR loop membrane insertion and global Fab angles that describe each cluster, to demonstrate their similarities and differences. We also better describe, in the relevant results section, how global clustering was similar for 4E10 and PGZL1 but different for 10E8.

      The cyan cluster is not close in structure to the 4E10/PGZL1 clusters. We note the original figure panel had an error. The updated Figure 2-supplement 4B shows the correct Euclidean distance hierarchy, with an early split between the 4E10/PGZL1 and 10E8 clusters.

      Figure 3 main text

      The start of this section, "We next studied bnAb LN01...", is a good place for a new subheader.

      We have added an additional subheader here: Antigen influence on membrane bound conformations and lipid binding sites for LN01

      There should be a sentence in the main text defining the replicate setup and production MD run time. Is the apo and complex based on a published structure? How do you embed the MPER? Is the apo structure docked to membrane like in 4E10? The MD setup could also be better delineated within the methods.

      The first two paragraphs in this section have been updated to clarify the relevant simulations configuration and Fab membrane docking prediction details. 

      The procedure for predicting an initial membrane insertion was the same, except that we now use the LN01-TM complex and the calculation accounts for the membrane burial of the TM domain and MPER fragment. As mentioned, LN01 is predicted to insert its CDR loops similarly with or without the TM-MPER fragment. The geometry differs from that of PGZL1/4E10 and 10E8, as noted in the text.

      Please comment on the oligomerization state of the antigen used in the MD simulation: how does the simulation differ from a crossed MPER as observed in an MPER antibody-bound Env cryo-EM structure (PMID: 32348769), a three-helix bundle (PMC7210310), or single transmembrane helix (PMC6121722)? How does the model MPER monomer embed in the membrane compared to simulations with a trimeric MPER (PMC6035291, PMID: 33882664)-namely, key arginine residues such as R696?

      We thank the reviewer for pointing out critical underlying rationale for modeling this TM-MPER-LN01 complex which we have corrected in the revised draft. The range of potential conformations and display of MPER based on TM domain organization could easily be its own paper – we in fact have a manuscript in preparation on the topic.  

      The updated text expands the rationale for choosing the monomeric uninterrupted helix form of the MPER-TM model antigen (para 1 of LN01 section). The alternative conformations we did not explore are called out, with the references provided by the reviewer.

      The discussion qualifies that the MPER presentation is likely oversimplified here, noting that MPER display in the full-length Env trimer will vary across conformational states and membrane environments. However, the only cryo-EM structures of full-length Env with TM domains resolved have this continuous helix MPER-TM conformation, seen both within crossing TM dimers and in dissociated TM monomers.

      Are there additional analyses that can validate the dynamics of the MPER monomer in the membrane and relative to LN01? Such as key contacts you would expect to maintain over the duration of the MD simulation?

      We also expanded the description of this TM domain’s behavior and dynamics (tilt, orientation, Arg696 snorkeling, and complex with LN01) to provide a clearer picture of the simulation results, which align with past MD of the gp41 TM domain as a monomer (para 2 of LN01 section). We also noted key LN01-MPER contacts that were maintained.

      How does the model MPER modulate membrane properties like lipid density and lipid proximities near LN01?

      We checked and didn’t notice differences for the types of lipids (chol, etc) proximal to the MPER-TM or the CDR loops versus the bulk lipid bilayer distributions.  Due to the already long & detailed nature of this manuscript, we elect not to include discussion on this topic.

      Supplemental figure 1H-I would be better positioned as a figure 3-associated supplemental figure.

      We rearranged to follow the eLife format and have paired supplemental panels with their most relevant main figures.

      Figure 3F/H reference a "loading site" but this site is defined much later in the text, which was confusing.

      Thank you for pointing out this source of confusion, we rearranged our discussion to reflect the order in which we present data in figures.

      What evidence suggests that lipids "quickly exchange from the Loading site into the X-ray site by diffusion"? I do not gather this from Figure S1H/I.

      We have rearranged the Loading site and X-ray site RMSD maps in Figure 3-Figure supplement 1 to better illustrate how a lipid exchanges between these sites.

      Figure 4 main text

      The authors assert that in the CG simulations, restraints, "[maintain] Fab tertiary and quaternary structure". However, backbone RMSD does not directly assert this claim-an additional analysis of the key interfacial residues between chains, or geometric analysis between the chains, would better support this claim.

      Thank you for raising this point. We rephrased to add that the major sidechain contacts between heavy and light chains persist, in addition to backbone RMSD, to describe how these Fabs stably maintain the fold in the CG representation.

      In several cases, CG models sample and then dissociate from the membrane. In the text, the authors mention, "course-grained models can distinguishing unfavorable and favorable membrane-bound conformations". Is there a particular orientation that causes/favors membrane association and dissociation? This analysis could look at conformations immediately preceding association and dissociation to give clues as to what orientation(s) favor each state.

      Thank you for suggesting this interesting analysis. Clustering analyses of associated states are presented in Figure 5, Figure 5-Figure Supplement 1, and Figure 6, which show all CDR- and framework loop-directed insertion. This feature is currently described in the main text.

      We did not find a strong correlation of specific orientations with “pre-dissociation” states or ineffective non-inserting “scanning” events. We revised the key sentence to reflect the main takeaway: non-CDR alternative conformations did not insert, and most of those with CDRs inserted in a different manner than in the all-atom simulations were also prone to dissociate:

      “Given that non-CDR directed and alternative CDR-embedded orientations readily dissociate, we conclude that coarse-grained models can distinguish unfavorable and favorable membrane-bound conformations to an extent that provides utility for characterizing antibody-bilayer interaction mechanisms.”

      Figure 6 main text

      "For 4E10, trajectories initiated from all three geometries..." only two geometries are shown for each antibody. Please include all three on the plot.

      The plots include markers for all three geometries for 4E10, highlighted with stars or letters on the density plots of sampled angles (Figure 6B,C).

      "Aligning a full-length IgG... unlikely that two Fabs simultaneously..." Are there theoretical conformations in which two Fabs could simultaneously associate with membrane? If this was physiological or could be designed rationally, could an antibody benefit further from avidity?

      Our modeling suggests the theoretical conformations having two Fabs on the membrane are infeasible.  It’s even less likely multiple Env antigens could be engaged by one IgG.  We have revised the text to express this more clearly.

      Figure 7 main text

      "An intermediate... showed a modest reduction in affinity..." what affinity does PGZL1 have for this antigen?

      The preceding sentence provides this information: “Mature PGZL1 has relatively high affinity to the MPER epitope peptide (Kd = 10 nM) and demonstrates great breadth and potency, neutralizing 84% of a 130 strain panel.”

      Figures

      Figure 1

      It would be helpful to have an additional panel at the top of this figure further zoomed out showing the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for the suggestion to include this analysis. We have added text reflecting this information and made new supplemental panels that compare simulated 4E10 and 10E8 Fab conformations to cryoEM density maps of Fabs bound to full-length HIV Env (Figure 1-figure supplement 1A and Figure 2-figure supplement 2A).

      In Figure 1, space permitting, it would be helpful to annotate the distances between the phosphates and side chains (similarly, for Figure S1A).

      To avoid overloading the main figure panels with text, the relevant distances are listed in the methods section. Those distances are used to define the “bound” lipid phosphate state. Generally, we note the interactions are within hydrogen bonding distance.

      Annotating "Replicate 1" and "Replicate 2" on the left side of Figure 1C/D would make this figure immediately intuitive.

      We have added these labels.

      Figure caption 1C: Please clarify the threshold/definition of a contact used to binarize "bound" versus "unbound" (for example, "mean distance cutoff of 2 Å between the phosphate oxygen and the COM of CDR-H1") [on further reading of the methods section, this criterion is quite involved and might benefit from a sentence that includes "see methods"]. Additionally, C could use a sentence explaining the bar, as in E: "Phosphate binding is mapped above each MD trajectory". Please define FR-H3 in the figure caption for E/F.

      We have added these details to the figure caption.
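
      The kind of binarization discussed here can be pictured with a minimal sketch. It assumes one distance per trajectory frame between a lipid phosphate and a CDR site and a single cutoff; the 0.35 nm cutoff, the function names, and the toy distances are illustrative assumptions, and the criterion actually used in the manuscript (see its methods) is more involved.

```python
def binarize_contacts(distances_nm, cutoff_nm=0.35):
    """Label each frame as bound (True) when the distance is within the cutoff.

    The cutoff is a hypothetical value chosen only for illustration.
    """
    return [d <= cutoff_nm for d in distances_nm]


def occupancy_percent(bound_flags):
    """Percentage of frames in which the site is counted as occupied."""
    if not bound_flags:
        return 0.0
    return 100.0 * sum(bound_flags) / len(bound_flags)


# Toy per-frame phosphate-to-site distances in nm (made-up numbers).
toy_distances = [0.30, 0.32, 0.50, 0.80, 0.33, 0.31, 0.29, 0.90]
flags = binarize_contacts(toy_distances)
print(occupancy_percent(flags))
```

      Keeping the per-frame flags, rather than only the aggregate percentage, is what allows a "bound" bar to be drawn above each trajectory.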

      Because Figure 1 is aggregated simulation time, it would be helpful to also represent the data as individual replicates or incorporate this information to calculate standard deviations/statistics (e.g., 1 microsecond max using the replicates to compute a standard deviation).

      We believe the current quantification and display of data, which shares all trajectories, is sufficient to convey the major point of how often each CDR-phospholipid binding site is occupied. Further tracking and statistics of inter-atomic distances would likely be too tedious and add minimal value. The phosphate oxygens show some dynamics between the polar groups within the CDR site, but our “bound” state definitions sufficiently describe the key participating interactions that are made.

      Figure 2

      For A, it would be helpful to annotate the yellow and blue mesh on the figure itself.

      We have defined the orange phosphate and blue choline densities.

      Also, where are R29 and Y32 relative to this site? In the X-ray panels, Y38 is not shown, and the box delineating the zoom-in is almost imperceptible.

      Thank you for this suggestion to include those amino acids, which are referenced in the text as critical sites where mutation impacts function. To clarify, Y32 is the PDB numbering for residue Y38 in IMGT numbering. We have added a panel to Figure 2-Figure Supplement 1 with a cartoon graphic of the 10E8 loop groove showing sidechains and annotating R29 and Y38, staying consistent with our use of IMGT numbering in the manuscript.

      Figure 3

      It might read clearer to have "LN01+MPER-TM" and "LN01-Apo" in the middle of A/B and C/D, respectively, and a dotted line delineating the left and right side of the figure panels.

      We have added these details to the figure for clarity for readers.

      It would be helpful to show some critical interactions that are discussed in the text, such as the salt bridge with K31, by labeling these on the figure (e.g., in E-H).

      We drafted figure panels with dashed lines to indicate those key interactions.  However, they became almost imperceptible and overloaded with annotations that distracted from the overall details.  For K31, the interaction occurs in LN01 crystal structures readers can refer to.

      Why are axes cut off for J?

      We corrected this.

      Please re-define K/L plots as in Figure 1, and explain abbreviations.

      We updated the figure caption to reflect these changes.

      Figure 4

      The caption for panel A states that the Fab begins in solvent 1-2 nm above the bilayer, but the main text states 0.5-2 nm.

      We have reconciled this difference and listed the correct distances: 0.5-2 nm.

      Please label the y-axis as "Replicate" for relevant figure panels so that they are more immediately interpretable.

      This label has been added.

      A legend with "membrane-associated" and "non-associated" within the figure would be helpful. Additionally, the average percent membrane associated, with a standard deviation, should be shown (Similar to 1C, albeit with the statistics).

      This legend has been added.  We also added the additional statistical metrics requested to strengthen our analysis.

      The text references "10, 14, and 12 extended insertion events" for the three antibody-based simulations. How do you define "extended insertion events"? Would breaking this into average insertion time and standard deviation better highlight the association differences between MPER antibodies and controls, in addition to the variability due to different random initializations?

      We thank the reviewer for the insightful suggestion on how to better organize quantitative analysis to support the method. Supplemental Table 3 includes these numbers.

      Figure 5

      The analysis in Fig. S6C could be included here as a main figure.

      The drafted revised figure adding S6C to Figure 5 made for too much information.  Likewise, putting this panel S6C separated it from the parent clustering data of S6B, so we decided to keep these figures separated.  The S6 figure is now Figure 5-figure supplement 1.

      Figure 6

      Please annotate membrane insertion on E as %.

      These are phosphate binding RMSD/occupancy vs time.  The panels are now too small to annotate by %.  The qualitative presentation is sufficient at this stage.  The quantitative % are listed in-line within text when relevant to support assertions made. 

      Please use the figure caption to explain why certain clusters (e.g., 10E8 cluster A, artifact, Fig. S6E) are not included in panel E.

      We have added this information in the figure caption.

      Figure 7

      Please show all points on the box and whisker plots (panels E and F), and perform appropriate statistical tests to see if means are significantly different (these are mentioned in the text, but should be annotated on the graph and mentioned within the figure caption).

      We have changed these plots to show all data points along with relevant statistical comparisons. The figure captions describe the unpaired t-tests used.

      Figure S1

      G, H, and I do not belong here-they should be moved to accompany their relevant text section, which associates with Figure 3. It would be helpful to associate this with Figure 3 in the eLife format, "Figure 3-Supplemental Figure 1" or its equivalent.

      It's very difficult to distinguish the green and blue circles on panel G.

      We darkened the shading and added an outline for better visualization.

      Subfigure I is missing a caption, could be included with H: "(H,I) Additional replicates for LN01+TM (H) and LN01 (I)".

      We corrected this as suggested.

      Why is H only 3 simulations and not 4? Does it not have a lipid in the x-ray site? Also, the caption states "(top, green)" and "(bottom, cyan)", but the green vs. cyan figures are organized on the left and right. Additional labels within the figure would help make this more intuitive.

      If the point of H and I is to illustrate that POPC exchanges between the X-ray and loading sites, this is unclear from the figure. Consider clarifying these figures.

      Thank you for describing the confusion in this figure. We have added labels to clarify.

      Figure S2 (panels split between revised Figure 4 associated figure supplements)

      The LN01 figures should likely follow later so that they can associate with Figure 3, despite being a similar analysis.

      We corrected supplements to eLife format so supplements are associated with relevant main figures.

      Figure S3 (panels split between revised Figure 1 & 2 associated figure supplements)

      As hydrophobicity is discussed as a driving factor for residue insertion, it would be helpful to have a rolling hydrophobicity chart underneath each plot to make this claim obvious.

      We prefer the current format, due to the worry of having too much information in these already data-rich panels.  As well, residues are not apolar but are deeply inserted.

      Figure S4 (panels split between revised Figure 1 & 2 associated figure supplements)

      It would be helpful to label the relevant loops on these figures.

      We have labeled loops for clarity.

      Do any of these loops have minor contacts with Env in the structure?

      The 4E10 and PGZL1 CDR-H1 loops do not directly contact the MPER peptides bound in crystal structures.

      FRL-3 and CDR-H1 in 10E8 do not contact the MPER peptide antigen component based on x-ray crystal structures.

      Do motif contacts with lipid involve minor contacts with additional loops other than those displayed in this figure?

      The phosphate-loop interactions in motifs used as query bait here are mediated solely by the backbone and side chain interactions of the loops displayed. We visually inspected most matches and did not see any “consensus” additional peripheral interactions common across each potential instance in the unrelated proteins.  The supplied Supplemental Table 2 contains the information if a reader wanted to conduct a detailed search. 

      Why is there such a difference between the loop conformation adopted in the X-ray structure and that in the MD simulation, and why does this lead to the large observed differences in ligand-binding structure matches?

      We thank the reviewer for carefully noting our error in labeling of CDR loop and framework region input queries. We revised the labeling to clarify the issue.

      There is minimal structural difference between the loops in x-ray and MD.

      Figure S5 (Figure 2-Figure supplement 4)

      This figure is not colorblind friendly. It would be helpful to change to a colorblind-friendly palette, as the data are interesting but uninterpretable to some.

      We have left this figure the same.

      "Susbstates" - "Substates"

      Corrected, thank you.

      Panel B is uninterpretable. Please break the axis so that the Euclidean distances can be represented accurately but the histograms can be interpreted.

      We have adjusted axis for this plot to better illustrate the cluster thresholds.

      The clusters in D-H should be analyzed in greater depth. What is the structural relevance of these clusters other than differences in phospholipid occupancy in (I)? Snapshots of representative poses for each cluster could help clarify these differences.

      We have adjusted the text to describe the geometric differences in each of those clusters that result in their different, and in some cases exceptionally lower, propensities for forming the key phospholipid interaction.

      The figure caption should make it clear that 3 μs of aggregate simulation time is being used here instead of 4 μs to start with unique tilt initializations, e.g., "unique starting membrane-bound conformations (0 degrees, -15 degrees, 15 degrees initialization relative to the docked pose)". Further, why was the particular 0-degree replicate chosen while the other was thrown out? Or was this information averaged? Why is the full 4 μs then used for D-I?

      We thank the reviewer for noting these details.  We didn’t want to bias the differential between 10E8 and 4E10/PGZL1 by including the replicate simulations.  The analysis was mainly intended to achieve more coarse resolution distinction between 10E8 and the similar PGZL1/4E10.  

      In the subsequent clustering of individual bnAb simulation groups, the replicate 0-degree simulations had sufficiently different geometric sampling and unique lipid binding behavior that we thought they should be used (4 μs total) to achieve finer conformational resolution for each bnAb.

      Figure S6 (now Figure 5-Figure Supplement 1)

      Please label the CDRs in C and provide a color key like in other figures. Also, please label the y-axes. This figure could move to main below 5B with the clusters "A,B,C" labeled on 5B.

      We have added the axis labels and color key legend. We retained a minimal CDR loop labeling scheme for the higher-throughput interaction profiles here, where colored sections in the residue axes denote CDR loop regions.

      Figure S7 (Figure 7 Figure Supplement 1)

      Panels A and B would likely read better if swapped.

      We have swapped these panels for a better flow.

      For panel C, please display mean and standard deviation, and compare these values with an appropriate statistical test.

      This is already displayed in the main figure, so we have removed it from the supplement.

      For E and F, please clarify from which trajectory(s) you are extracting this conformation. Are these the global mean/representative poses? How do they compare to other geometrically distinct clusters?

      The requested information was added to the supplemental figure caption. These are phosphate-bound frames selected at 2 distinct time points from the 0-degree tilt replicates for both 4E10 and 10E8, representing at least 2 distinct macroscopic substates that differ in global light chain and heavy chain orientation towards the membrane.

      Table S2 (now Supplementary Table 3)

      Please add details for the 13h11 simulation.

      Additionally, please add average contact time and their standard deviation to the table, rather than just the aggregated total time. This will highlight the variability associated with the random initializations of each simulation.

      We have added the details for 13h11 and the requested analysis (average aggregated time ± standard deviation and average time per association event ± standard deviation) to supplement our summary statistics for this method.
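
      The per-event statistics requested here can be sketched as follows, assuming each replicate is reduced to a per-frame flag marking membrane association and that frames are evenly spaced. The function names, the 10 ns frame spacing, and the toy trace are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev


def event_durations(associated, dt_ns=1.0):
    """Return the duration (ns) of each uninterrupted run of associated frames."""
    durations = []
    run = 0
    for flag in associated:
        if flag:
            run += 1
        elif run:
            durations.append(run * dt_ns)
            run = 0
    if run:  # close an event that is still ongoing at the final frame
        durations.append(run * dt_ns)
    return durations


def summarize(durations):
    """Mean and sample standard deviation of event durations."""
    if not durations:
        return 0.0, 0.0
    sd = stdev(durations) if len(durations) > 1 else 0.0
    return mean(durations), sd


# Toy association trace for one replicate (made-up values).
trace = [False, True, True, False, True, True, True, True, False, True]
durations = event_durations(trace, dt_ns=10.0)
avg, sd = summarize(durations)
print(durations, avg, sd)
```

      Reporting the spread per event, in addition to the aggregate time, exposes how much of the total comes from a few long events versus many short ones.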

      Reviewer #2 (Recommendations For The Authors):

      (1) The structure of the manuscript should be improved. For example, almost half of the introduction (three paragraphs) summarizes the results. I found it hard to navigate all the data and specific interactions described in the results section. Furthermore, the claims at the end of several sections seem unsupported, especially those about the generalization of the approach. These should be moved to the discussion section. The discussion is pretty general and does not provide much context to the results presented in this study.

      We have significantly reorganized the results section to improve the flow of the manuscript and accessibility for readers, especially the first sections on all-atom simulations. We also removed claims not directly supported by data from our results, and expanded on some of these concepts in the discussion to provide more context for the results.

      (2) The author should cite more rigorously previous work and refrain from using the term "develop" to describe the simple use of a well established method. E.g. Several studies have investigated membrane protein interactions e.g. [1], membrane protein-bilayer self-assembly [2], steered molecular dynamics [3], etc.

      Thank you for identifying relevant work for the simulations that set precedent for our novel application to antibody-membrane interactions.  We have removed language about development of simulation methods from the text and now better reference the precedent simulation methods used here.

      (3) Have the authors considered estimating the PMF by combining the steered MD simulation through the application of Jarzynski's equality?

      We performed preliminary PMF calculations for Fab-membrane binding, but saw they were taking upward of 40 μs to reach convergence. Steered simulations focused on a key lipid may be easier.

      Although PMFs are beyond the scope of this work, we added proposals & allusion to their utility as the next steps for more rigorous quantification of fab-membrane interactions.
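
      For reference, Jarzynski's equality relates the equilibrium free energy difference to an exponential average over nonequilibrium work values, exp(-ΔF/kT) = ⟨exp(-W/kT)⟩. A minimal sketch of the estimator, assuming work values in kJ/mol collected from repeated steered pulls, is given below; the function name, the kT value (about 2.494 kJ/mol near 300 K), and the toy work values are illustrative assumptions and no result from the manuscript is reproduced.

```python
import math


def jarzynski_free_energy(work_values, kT=2.494):
    """Estimate dF = -kT * ln( mean( exp(-W / kT) ) ) from a list of work values.

    Uses a log-sum-exp shift so large work values do not underflow.
    """
    if not work_values:
        raise ValueError("at least one work value is required")
    scaled = [-w / kT for w in work_values]
    shift = max(scaled)
    log_mean = shift + math.log(
        sum(math.exp(s - shift) for s in scaled) / len(scaled)
    )
    return -kT * log_mean


# Toy work values in kJ/mol (made up for illustration only).
toy_work = [8.0, 10.0, 12.0]
estimate = jarzynski_free_energy(toy_work)
print(estimate)
```

      Because the average is dominated by the rare low-work pulls, the estimate never exceeds the mean work and typically needs many repeats to stabilize.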

      Minor

      (4) The term "integrative modeling" is usually used for computational pipelines which incorporate experimental data. Multiscale modeling would be more appropriate for this study.

      We altered descriptions throughout the manuscript to reflect this comment.

      (5) Units to report the force in the steered molecular dynamics are incorrect. They should be 98.

      We changed axes and results to correctly report this unit.

      (6) Labels for axes of several graphs are missing.

      We added labels to all axes of graphs, except for a few where stacked labels can be easily interpreted to save space and reduce complexity in figures.

      (7) Figure 3 K & L is this really < 1% of total? The term "total" should also be clarified.

      Thank you for pointing this out. We corrected the % labels so that the axes run from 0-100%, and we clarified "total" in the figure caption.

      (8) The font size in figures should be made uniform.

      This suggestion has been applied.

      (9) Time needed for steered MD should be reported in CPUh and not hours (page 17).

      We removed comments on explicit time measurements for our simulations.

      (10) The version of the Martini force field is missing from the methods section.

      We used Martini 2.6 and added this to the methods.

      References

      (1) Prunotto, Alessio, et al. "Molecular bases of the membrane association mechanism potentiating antibiotic resistance by New Delhi metallo-β-lactamase 1." ACS infectious diseases 6.10 (2020): 2719-2731.

      (2) Scott, Kathryn A., et al. "Coarse-grained MD simulations of membrane protein-bilayer self-assembly." Structure 16.4 (2008): 621-630.

      (3) Izrailev, S., et al. "Computational molecular dynamics: challenges, methods, ideas. Chapter 1. Steered molecular dynamics." (1997).

    1. Here is a timed summary of the transcript:

      • 0:00-3:20 Introduction to electronic voting
      • Mélanie Mondo and Maxime Lalisse present Scrutin.app, an application that aims to guarantee secure and free (libre) electronic voting.
      • Distinction between voting on electronic voting machines and online electronic voting; the application focuses on the latter.
      • Advantages and drawbacks of online electronic voting.
      • ANSSI recommendations for electronic voting: verifiability, transparency, confidentiality.
      • Goal of Scrutin.app: guarantee ballot secrecy, verifiability, and transparency, and offer a free and open-source solution.
      • Target audience of the application: collectives and associations.
      • Supported voting methods: uninominal (single-choice) voting and majority judgment.

      • 3:20-7:40 How Scrutin.app works

      • Maxime Lalisse details how Scrutin.app works, stressing its ease of use and security.
      • Security advantages of the mobile application.
      • Scrutin.app builds on the Belenios protocol, developed by Loria in Nancy.
      • Explanation of vote encryption and verifiability.
      • Resistance to undue influence and the ability to revote.

      • 7:40-11:00 Confidentiality and verifiability of votes

      • Mélanie Mondo explains the asymmetric and homomorphic encryption used to guarantee ballot secrecy.
      • Role of the election guardians and sharing of the encryption secret.
      • Anonymization of votes via homomorphic encryption or mixnets.
      • Transparency and verifiability through a public database and through ceremonies for creating and tallying elections.
      • Verification that votes have been taken into account, and verification of the results.
      • Verifiability of candidate eligibility via signatures.
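
      The idea of tallying encrypted ballots without opening any single one can be illustrated with a toy sketch of additively homomorphic ("exponential") ElGamal, a standard textbook construction. The tiny group parameters, function names, and yes/no ballots are assumptions chosen for readability; the sketch is insecure as written and is not claimed to reproduce what Belenios or Scrutin.app implements.

```python
import secrets

# Toy group: P = 2*Q + 1 with P and Q prime; G generates the subgroup of order Q.
# These parameters are far too small for real use.
P, Q, G = 467, 233, 4


def keygen():
    """Return (private key x, public key G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(pub, vote):
    """Encrypt a small integer vote as (G^r, G^vote * pub^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(G, vote, P) * pow(pub, r, P)) % P


def add_ciphertexts(c1, c2):
    """Multiplying ciphertexts component-wise adds the underlying votes."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def decrypt_tally(priv, ciphertext, max_total):
    """Recover the summed votes by a small brute-force search on the exponent."""
    a, b = ciphertext
    g_m = (b * pow(pow(a, priv, P), -1, P)) % P
    for m in range(max_total + 1):
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("tally outside the expected range")


priv, pub = keygen()
votes = [1, 0, 1, 1, 0]  # toy yes/no ballots
ballots = [encrypt(pub, v) for v in votes]
total = ballots[0]
for c in ballots[1:]:
    total = add_ciphertexts(total, c)
print(decrypt_tally(priv, total, len(votes)))
```

      Only the combined ciphertext is ever decrypted, so individual ballots stay hidden. In a real deployment the private key would be split among several guardians rather than held by one party.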

      • 11:00-12:30 Plans and audience questions

      • Mélanie Mondo and Maxime Lalisse present the current state of the application and how it is used.
      • Planned improvements: user interface, website, availability on app stores, integration of new voting methods.
      • Offer to support organizations and associations in setting up online votes.
      • Start of the question-and-answer session with the audience.

      • 12:30-15:00 Security and private keys

      • An audience member questions the sending of private keys by email.
      • Maxime Lalisse explains the current system, which is based on Belenios and sends a password that generates a private key.
      • Discussion of using PGP to sign votes.
      • Maxime Lalisse acknowledges the need to improve private key management but considers it a long-term goal.

      • 15:00-17:20 Role of the guardians and security of the web application

      • An audience member asks about the role of the guardians and whether they must be present for the tally.
      • Maxime Lalisse explains the principle of Shamir's secret sharing and the option to choose the security parameters, including making every voter a guardian.
      • Another audience member questions the reliability of the web application compared with the mobile application.
      • Maxime Lalisse highlights the advantages of the mobile application in terms of security and source-code verifiability.
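
      Shamir's secret sharing, mentioned above, can be sketched in a few lines: the secret is the constant term of a random polynomial over a prime field, each guardian receives one point on it, and any `threshold` points recover the secret by Lagrange interpolation at zero. The prime, function names, and parameters are illustrative assumptions and are not taken from Belenios or Scrutin.app.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for this toy example


def split_secret(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
print(recover_secret(shares[:3]))
```

      With a threshold of 3 out of 5, any three guardians can reconstruct the key for the tally, while two or fewer learn nothing about it. Setting the number of shares to the number of voters corresponds to making every voter a guardian.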

      • 17:20-20:10 Security of access to the server and the database

      • An audience member raises the question of the security of access to the server and the database.
      • Maxime Lalisse explains that Belenios and Scrutin.app use an append-only log, which prevents data from being deleted from the database.
      • The database is distributed, and every device can verify that no data has been deleted.
      • The database administrator cannot tamper with it.
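
      One common way to make a log append-only in a verifiable sense is to chain entries by hash, so that removing or altering an entry breaks every later link. The sketch below illustrates that general idea only; the hash chain, the field names, and the functions are assumptions and do not describe the actual data structures of Belenios or Scrutin.app.

```python
import hashlib

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def entry_hash(prev_hash, payload):
    """Hash of an entry, bound to the hash of the entry before it."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log, payload):
    """Add an entry that commits to the current head of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log):
    """Recompute the chain; any deleted or modified interior entry is detected."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
for ballot in ["ballot-A", "ballot-B", "ballot-C"]:  # toy payloads
    append(log, ballot)
print(verify(log))
```

      A device that remembers the most recent hash it has seen can also detect truncation of the tail, since the stored head would no longer appear in the chain.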

      • 20:10-21:10 Business model and funding

      • An audience member asks whether a business model is planned for Scrutin.app.
      • Mélanie Mondo mentions ongoing thinking about funding, notably through foundations.
      • The current goal is to offer a free and accessible tool, with a possible premium option for large-scale elections.
      • The possibility of a donation system is also mentioned.

      • 21:10-23:00 Real-time results and resistance to coercion

      • An audience member returns to the question of the guardians and real-time results.
      • Maxime Lalisse clarifies that results are revealed only at tallying time, even though the database is accessible to everyone.
      • Another audience member asks how resistant Scrutin.app is to coercion.
      • Mélanie Mondo acknowledges the difficulty of this problem, which is tied to both the confidentiality and the verifiability of the vote.
      • Several solutions are mentioned, such as revoting, trust in the voting device, and protocols involving trusted third parties.

      • 23:00-End of the presentation

      • Maxime Lalisse points out that current paper voting also has weaknesses with respect to coercion.
      • He recalls the importance of a healthy society around voting, whatever system is used.
      • Maxime Lalisse encourages the audience to use Scrutin.app and to contact the team with any question or specific need.
    1. Reviewer #2 (Public review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the timepoints were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, a technique that has only recently been developed and adapted.

      The analysis methods are solid and involved a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).
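
      To make the idea of a cross-modal recurrence plot concrete, the following sketch assumes that each modality is reduced to one vectorized FC pattern per time window and that the similarity between windows is measured by Pearson correlation. Both choices, along with the function names and toy patterns, are illustrative assumptions and may differ from the operationalization used in the study.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def cross_modal_recurrence(fc_a, fc_b):
    """Matrix whose entry [i][j] compares window i of modality A with window j of B."""
    return [[pearson(a, b) for b in fc_b] for a in fc_a]


# Toy vectorized FC patterns: two time windows per modality (made-up numbers).
fc_fmri = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
fc_eeg = [[6.0, 4.0, 2.0], [2.0, 4.0, 6.0]]
crp = cross_modal_recurrence(fc_fmri, fc_eeg)
print(crp)
```

      In this toy case the high values fall off the diagonal: the same patterns occur in both modalities but at different time windows, which is the kind of spatial convergence with temporal divergence the review describes.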

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      The authors also ran control analyses to verify the effect of temporal window size and of different functional connectivity operationalizations. I also applaud their effort to make the analysis code open source.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analysis: homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps; and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case: more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives such as anatomical ROIs or group-defined ROIs).

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously[23] found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses[27], here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As per the reviewer’s suggestion and alluded to in the paragraph above, we have created anatomically aligned visual maps, providing an analogous test to the between-participant analyses like SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
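
      The benchmark described in this passage has two ingredients: a leave-one-out average of other participants' maps in template space, and a comparison of Fisher-z-transformed correlations between methods. The sketch below illustrates both under stated assumptions: the function names are hypothetical, the maps are assumed to be already aligned to a common template, and the confidence interval uses a participant-level bootstrap, which the excerpt does not confirm as the resampling procedure.

```python
import numpy as np

def loo_anatomical_prediction(maps):
    """Predict each participant's map as the mean of all other participants' maps.

    maps: (n_participants, n_vertices) array, already in a shared template space.
    """
    total = maps.sum(axis=0)
    return (total - maps) / (maps.shape[0] - 1)

def fisher_z(r):
    """Fisher z-transform of correlation coefficients (clipped to avoid infinities)."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def bootstrap_mean_difference(z_a, z_b, n_boot=10000, seed=0):
    """Mean paired difference (z_a - z_b) with a 95% percentile bootstrap CI.

    Resampling is over participants (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(z_a, dtype=float) - np.asarray(z_b, dtype=float)
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    return diff.mean(), np.percentile(boot_means, [2.5, 97.5])
```

      Here `z_a` would hold the Fisher-z correlations for anatomical alignment and `z_b` those for functional alignment, one value per test participant. Note that the passage correlates gradients along traced lines rather than whole maps, so the correlation step itself would differ from a plain map-to-map correlation.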

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than based on movie-viewing data alone - leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands[12, 13, 24] and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion[25–27]. Movies have been useful in awake infant fMRI for studying event segmentation[28], functional alignment[29], and brain networks[30]. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas[28], but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between[31]). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses[27, 32–34].” Pg. 3-4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      Furthermore, in the discussion we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres[31].” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45].” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion[25-27].” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains[25,26,35,42,43].” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript shows evidence from a dataset with awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and completed a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously[23] found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses[27], here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17

      Note that we do not compare the anatomically aligned maps with the ICA maps statistically. This is because these analyses are not comparable: ICA is run within-participant whereas anatomical alignment is necessarily between-participant (using either infants or adults). Nonetheless, an interested reader can refer to the Table where we report the results of anatomical alignment and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test the suitability of movie-based maps to replace task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper (that movies evoke fine-grained organization in infant visual cortex) do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which are collected independently from the same infant in the session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double dipping occurred.

      Perhaps a cause of the reviewer’s interpretation is that the gradients used in the analysis are not clearly described. We now provide this additional description:

      “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10
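
      To make the gradient comparison concrete, the sketch below samples a task-evoked map and a movie-derived component along the same traced lines and correlates the two profiles. It is an illustration only: `gradient_correlation`, the representation of a traced line as an ordered array of vertex indices, and the choice to average per-line correlations are all assumptions, not details given in the response. The `use_abs` option reflects the fact that the sign of an ICA component is arbitrary.

```python
import numpy as np

def gradient_correlation(task_map, component_map, traced_lines, use_abs=True):
    """Correlate intensity profiles of two surface maps along shared traced lines.

    task_map, component_map: 1D arrays of values per vertex.
    traced_lines: list of integer index arrays, each an ordered path of vertices
                  traced on the task-evoked map (hypothetical representation).
    Returns the mean per-line Pearson correlation (absolute value if use_abs).
    """
    per_line = [
        np.corrcoef(task_map[line], component_map[line])[0, 1]
        for line in traced_lines
    ]
    r = float(np.mean(per_line))
    return abs(r) if use_abs else r
```

      Because the lines are traced once on the retinotopy data and then held fixed, the component values sampled along them come entirely from the movie data.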

      Regarding the SRM analyses, we take great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map from the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to clarify this point.

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies[45].” Pg. 14
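
      The independence argument can be illustrated with a simplified, deterministic variant of shared response modeling written in plain NumPy. This is a sketch under stated assumptions rather than the authors' pipeline: the function names are hypothetical, the fit alternates orthogonal Procrustes updates instead of using the probabilistic SRM, and all inputs are assumed to be voxels-by-timepoints arrays with matching timepoints. The point it demonstrates is that the held-out participant contributes only movie data, and only after the shared space is frozen.

```python
import numpy as np

def orthonormal_basis(data, shared):
    """Orthonormal W (voxels x k) minimising ||data - W @ shared|| (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(data @ shared.T, full_matrices=False)
    return u @ vt

def fit_shared_space(train_movies, k, n_iter=10):
    """Fit a k-dimensional shared response from training participants only.

    train_movies: list of (voxels_i, timepoints) arrays.
    Returns (bases, shared): per-participant bases and the (k, timepoints) shared response.
    """
    # Initialise from the first participant's top-k temporal components.
    _, _, vt = np.linalg.svd(train_movies[0], full_matrices=False)
    shared = vt[:k]
    for _ in range(n_iter):
        bases = [orthonormal_basis(x, shared) for x in train_movies]
        shared = np.mean([w.T @ x for w, x in zip(bases, train_movies)], axis=0)
    return bases, shared

def predict_heldout_map(train_maps, bases, shared, heldout_movie):
    """Predict a held-out participant's map without using their retinotopy data.

    train_maps: list of (voxels_i,) task-based maps from the training participants.
    """
    # Project training maps into the shared space and average them.
    shared_map = np.mean([w.T @ m for w, m in zip(bases, train_maps)], axis=0)
    # The shared space is frozen: only the held-out basis is estimated, from movie data.
    w_new = orthonormal_basis(heldout_movie, shared)
    return w_new @ shared_map
```

      The predicted map would then be compared with the held-out participant's task-based map, which is the first point at which their retinotopy data enters the analysis.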

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in the participants approximately a year prior to the identification of ICs. The researcher selecting the components didn’t use the ground-truth retinotopic maps as reference, nor did they pay attention to the participant IDs when sorting the IC components. Indeed, they weren’t trying to find participant specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process from adults, we find results of comparable strength as we found in infants, as shown below. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.
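
      The percentile figures quoted above describe where a selected component falls among all candidate components when each is scored by its correlation with the task-based map. A minimal sketch of such a ranking follows; `percentile_rank` is a hypothetical helper, and the exact ranking convention used in the paper (for example, how ties are handled) is not stated in this response.

```python
import numpy as np

def percentile_rank(selected_r, all_r):
    """Percentage of candidate components whose correlation is <= the selected one.

    selected_r: correlation of the chosen component with the task-based map.
    all_r:      correlations for every candidate component of that participant.
    """
    all_r = np.asarray(all_r, dtype=float)
    return 100.0 * np.mean(all_r <= selected_r)
```

      A rank near 100 would mean the experimenter picked close to the best available component, whereas a rank near 50 is what selecting a component at random would give on average.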

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers recording BOLD while watching videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex but do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the authors have achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics), nor is any validation of the statistical analysis provided. Indeed, what I think the manuscript means, or better what I understood, may be very far from what the authors want to present, given how obscure the methods and the result presentation are.

      In the present form, this reviewer considers that the manuscript needs to be totally rewritten, with the results of each technique presented alongside appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too vast in number to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now. 

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Moeller et al., 2009) where the underlying organization and neurophysiology is established. However, we have made changes to these methods that differ from their original usage (e.g., we use SRM rather than hyperalignment, we use meridian mapping rather than traveling wave retinotopy, we use movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected data from 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head-coil) and their data were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, such that it will be easy to compare infant and adult results, as per the reviewer’s request.

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      Enhancing the visualization

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is out of scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      Our public response describes the citations to relevant adult research that we have added and the further motivation we have provided for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Thus, because this claim is already not supported by our analyses, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then using those areas to run the homotopy analysis on the other half of the data) would qualitatively change the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186--1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that describes the contribution of data quantity to the results observed. We compare the amount of data collected with the magnitude of same vs. different stream effect (Figure 1B) and within stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7
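
      As a minimal sketch of the kind of check reported in these quotes, the following computes a Pearson correlation between per-participant movie duration and an effect size. The values are made up for illustration and are not data from the study; the permutation-based p-value and the names `pearson_r`, `permutation_p`, `duration_s`, and `stream_effect` are assumptions of this sketch, not a description of the test used in the manuscript.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # break the pairing between x and y
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up illustration values (hypothetical), one entry per participant.
duration_s = [186, 320, 410, 540, 615, 780, 905, 1116]
stream_effect = [0.21, 0.35, 0.18, 0.30, 0.25, 0.33, 0.19, 0.28]

r = pearson_r(duration_s, stream_effect)
p = permutation_p(duration_s, stream_effect)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

      A correlation near zero with a large p-value would be read, as in the quotes above, as no evidence that the amount of data drives the effect in the sample tested.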

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review.

      However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment have the same goal (finding a maximally aligned representation between participants based on brain function), but their implementations differ. The application of functional alignment to infants is novel, as is its use with movie data that is relatively short by comparison to standard adult data. Indeed, this is the most thorough demonstration that SRM, or any functional alignment procedure, can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion.

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45], which may prove especially useful for infant fMRI[52].” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are less smooth and more fine-grained than spatial frequency maps. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can result in meridian maps that are constituted by multiple independent components (e.g., a component sensitive to horizontal orientations, and a separate component sensitive to vertical orientations). Nonetheless, we have now conducted the ICA procedure on adult participants and again find it is easier to identify spatial frequency components than meridian components, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age range at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” in places where appropriate. Nonetheless, we have kept “infants” in some places, particularly those where we are comparing the sample to adults. Adding “and toddlers” could imply three samples being compared which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI[1-6], one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks[7].”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting up the data into age bins might reveal age effects, we do not think we can perform adequately powered null hypothesis testing of the age trend. In order to investigate the contribution of age, larger samples will be needed. That said, we can see from the data that we have reported that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, thus there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants as those aged between 0-3 years old, for the purposes of their conference. Indeed, we have never received this concern across dozens of prior reviews for previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers” and to older individuals in our sample as “toddlers” wherever it is appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is out of scope of this paper, but an interested reader could easily generate these maps from the data we have released.

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckman, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017). 

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults[41]; however, they are often not the primary driver of function[39]. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas --- in case motion transients drive consistent activity across areas --- then the effects described here are negligibly different (Figure S5).” Pg. 7

      (3) We recognize that despite these analyses, it would be helpful to see what this pattern looks like in adults where we know more about the visual field properties and the function of dorsal and ventral streams. This has been done previously (e.g., Haak & Beckman, 2018), but we have now run those analyses on adults in our sample, as described in the public review. As with infants, there are reliable differences in the homotopy between streams (Figure S1). The MDS results show that the adult data was more complex than the infant data, since it was best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS such that the structure of the ventral and dorsal streams is also dissociable. 
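
      The motion control described in point (2) can be sketched as follows: remove the linear contribution of framewise motion from each area's time course, then correlate the residuals. This is a minimal illustration on simulated data, assuming a single motion regressor and ordinary least squares; the names `residualize`, `motion`, `area_a`, and `area_b` are hypothetical, and the exact regression used for Figure S5 may differ.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(y, confound):
    """Remove the least-squares linear fit of `confound` (plus intercept) from `y`."""
    n = len(y)
    mc, my = sum(confound) / n, sum(y) / n
    slope = (sum((c - mc) * (v - my) for c, v in zip(confound, y))
             / sum((c - mc) ** 2 for c in confound))
    return [v - my - slope * (c - mc) for c, v in zip(confound, y)]


# Simulated data (hypothetical): two areas share a neural signal and a motion artifact.
rng = random.Random(0)
n_vols = 300
motion = [abs(rng.gauss(0, 1)) for _ in range(n_vols)]  # framewise displacement per volume
neural = [rng.gauss(0, 1) for _ in range(n_vols)]       # signal common to both areas
area_a = [s + 0.8 * m + rng.gauss(0, 0.5) for s, m in zip(neural, motion)]
area_b = [s + 0.8 * m + rng.gauss(0, 0.5) for s, m in zip(neural, motion)]

raw_r = pearson_r(area_a, area_b)
clean_a = residualize(area_a, motion)
clean_b = residualize(area_b, motion)
clean_r = pearson_r(clean_a, clean_b)
print(f"raw r = {raw_r:.2f}, motion-controlled r = {clean_r:.2f}")
```

      In this simulation the shared neural signal survives the regression, so the correlation between areas stays high after motion is removed, which is the pattern the response describes for the infant data.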

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. It is often the case, as reported elsewhere (Moeller et al., 2009), that ICA extracts portions of visual maps. To make a full visual map would require combining components into a composite (e.g., a component that has a high response in the periphery and another component that has a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps; however, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual components of ICA. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures to show results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, yet we do release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults, with amounts of data comparable to what we collected from infants, can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard.

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity -- rather than anatomy -- and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses[27,32-34].” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45].” Pg. 21

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses being conducted here: we are using data from a group of participants to predict the neural responses in a held-out participant. For motion to explain consistency between participants, the motion would need to be time-locked across participants. Even if motion were time-locked during movie watching, motion would impair the formation of an adequate model that can contain retinotopic information. Thus, motion should only hurt the ability to find a shared response that can be used for predicting retinotopic maps. Hence, we observed these results despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting SRM adult works well for early cortical areas, but not for more ventral and associative, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version and is shared with the public data release. We have added the additional detail that the probabilistic atlas is thresholded at 0% in order to be liberally inclusive. 

      “We used the occipital mask from the MNI structural atlas[63] in standard space -- defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe -- and used the inverted transform to put it into native functional space.” Pg. 27–28
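
      To make the thresholding rule in this quote concrete, here is a minimal sketch of binarizing a probabilistic atlas at an above-zero threshold. The per-voxel probabilities are hypothetical illustration values, and a real pipeline would operate on a 3D atlas volume and then apply the inverted transform, which is not shown here.

```python
# Hypothetical probabilities (in percent) that each voxel is labelled occipital lobe.
occipital_prob = [0, 0, 3, 12, 57, 98, 0, 1]

# Liberal threshold: keep any voxel with an above-zero probability.
threshold = 0
mask = [p > threshold for p in occipital_prob]

print(f"{sum(mask)} of {len(mask)} voxels retained")
```

      A stricter threshold (for example, 50) would shrink the mask to the voxels most consistently labelled as occipital, at the cost of excluding the borders of the lobe.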

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      My main concern is the use of the 700K SNP dataset. This set of SNPs suffers from a heavy ascertainment bias, which can be seen in the PCA in the supplementary material where all the aurochs cluster in the center within the variation of cattle. Given the coverage of some of the samples, multiple individuals would have fewer than 10K SNPs covered. The majority of these are unlikely to be informative here given that they would just represent fixed positions between taurine and indicine or SNPs mostly variable in milk cattle breeds. The authors would get a much better resolution (i.e. many more SNPs to work with their very low genome coverage data) using the 1000 bull genome project VCF data set:

      https://www.ebi.ac.uk/ena/browser/view/PRJEB42783 which is based on whole genome resequencing data from many cattle. This will certainly help with improving the resolution of qpAdm and f4 analysis, which have huge confidence intervals in most cases. Right now some individuals have huge confidence intervals ranging from 0 to 80% aurochs ancestry...

      We thank the reviewer for this suggestion. We repeated our analyses with a SNP panel from Run 6 of the 1000 Bulls project presented in Naval-Sanchez et al 2020. This panel reduced standard errors and narrowed down confidence intervals for the ancient samples. Another consequence is that more single-source qpAdm models can now be rejected highlighting the abundance of hybridization. For our comparison to modern breeds, we still use the 700K dataset as it provides a set of different modern European cattle breeds.

      I agree with the authors that qpAdm is likely to give quite a noisy estimate of ancestry here (which likely explains part of the issue I mentioned above). Although qpAdm is good for model testing, for ancestry proportions the authors could instead use an explicit f4 ratio; this would allow them to specify a model, which would make the result easier to interpret.

      We have added ancestry estimates from f4 ratios to the manuscript and display them together with qpAdm and Struct-f4 (as suggested by reviewer #3) in our new Table 1. We decided to keep all three different estimates to illustrate that results are not consistent for all analyses. An additional feature of qpAdm is the possibility that two source models can be rejected and additional ancestries can be identified.
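
      For readers unfamiliar with the statistic, the following is a minimal sketch of an f4-ratio ancestry estimate computed from per-SNP allele frequencies. It assumes the standard definition of f4 as the average product of allele-frequency differences; the population roles (`relative`, `outgroup`, `src1`, `src2`, `target`) and the simulated frequencies are hypothetical. It is not the pipeline used for the estimates in Table 1, which would also require standard errors (e.g., from a block jackknife).

```python
import random


def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    n = len(p_a)
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n


def f4_ratio(p_rel, p_out, p_target, p_src1, p_src2):
    """Proportion of source-1 ancestry in the target:
    f4(rel, out; target, src2) / f4(rel, out; src1, src2),
    where `rel` is a population related to source 1 and `out` is an outgroup.
    """
    return f4(p_rel, p_out, p_target, p_src2) / f4(p_rel, p_out, p_src1, p_src2)


# Simulated allele frequencies (hypothetical), one value per SNP.
rng = random.Random(1)
n_snps = 5000
true_alpha = 0.3  # simulated share of source-1 ancestry in the target

src1 = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
src2 = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
outgroup = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
relative = [b + rng.uniform(-0.1, 0.1) for b in src1]  # shares drift with source 1

# The target is an exact mixture with no drift or sampling noise, so the
# estimate recovers true_alpha up to floating-point error.
target = [true_alpha * b + (1 - true_alpha) * c for b, c in zip(src1, src2)]

estimate = f4_ratio(relative, outgroup, target, src1, src2)
print(f"estimated source-1 ancestry: {estimate:.3f}")
```

      With real low-coverage data the numerator and denominator are both noisy, which is one reason f4-ratio, qpAdm, and Struct-f4 estimates can disagree for the same individual.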

      The interpretation of the different levels of allele sharing on X vs autosome being the result of sex-bias admixture is not very convincing. Could these differences simply be due to a low recombination rate on the X chromosome and/or lower effective population size, which would lead to less efficient purifying selection?

      Following this comment (and another comment referring to the X chromosome analysis by reviewer #2), we decided to remove sex bias from the title of our study and add more information on the caveats of this analysis. While estimating ancestry on the X chromosome can be difficult, we also add that our patterns are consistent with what has been suggested based on ancient mitochondrial data (Verdugo et al 2019). For Neolithic Anatolia, it has been suggested that the insemination of domestic cows by aurochs bulls was intentional or even ritual (Peters et al 2012). A recent parallel archaeogenomic study also concluded sex-biased introgression from autosomal, X-chromosomal and Y-chromosomal data (Rossi et al 2024). As our results are consistent with these previous studies as well as the lower differentiation of modern breeds on the X chromosome (da Fonseca et al 2019), we still consider the general pattern of our results valid even if the exact extent of sex bias is difficult to assess.

      The authors suggest that the rejection of two-population models in some domestic populations might be due to indicine ancestry. This seems relatively straightforward to test.

      We had already performed this analysis of modeling their ancestry from three sources using qpAdm. The results are shown in Supplementary Table S6 and we now refer to this more explicitly in the text: “The presence of indicine ancestry can be confirmed in a qpAdm analysis using three sources resulting in fitting models for all breeds (Supplementary Table S6).”

      The first sentence of the paper is a bit long-winded; also, dogs were domesticated before the emergence of farming societies.

      We rephrased the first sentence to “Domestication of livestock and crops has been the dominant and most enduring innovation of the transition from a hunter-gathering lifestyle to farming societies.”

      It would be good to be specific about the number of genomes and coverage info in the last paragraph of the intro.

      This information is included in the first paragraph of the results section and we decided to not duplicate the numbers in the preceding introduction paragraph to retain a flow for the readers.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors investigated the admixture history of domestic cattle since they were introduced into Iberia, by studying genomic data from 24 ancient samples dated to ~2000-8000 years ago and comparing them to modern breeds. They aimed to (1) test for introgression from (local) wild aurochs into domestic cattle; (2) characterize the pattern of admixture (frequency, extent, sex bias, directionality) over time; (3) test for correlation between genetic ancestry and stable isotope levels (which are indicative of ecological niche); and (4) test for the hypothesized higher aurochs ancestry in a modern breed of fighting bulls.

      Strengths:

      Overall, this study collects valuable new data that are useful for testing interesting hypotheses, such as admixture between domestic and wild populations, and correlation between genome-wide aurochs ancestry and aggressiveness.

      Thank you for highlighting the importance of our study and the potential of our dataset.

      Weaknesses:

      Most conclusions are partially supported by the data presented. The presence of admixed individuals in prehistorical periods supports the hypothesized introgression, although this conclusion needs to be strengthened with an analysis of potential contamination. The frequency, sex-bias, and directionality of admixture remain highly uncertain due to limitations of the data or issues with the analysis. There is considerable overlap in stable isotope values between domestic and wild groups, indicating a shared ecological niche, but variation in classification criteria for domestic vs wild groups and in skeletal elements sampled for measurements significantly weakens this claim. Lastly, the authors presented convincing evidence for relatively constant aurochs ancestry across all modern breeds, including the Lidia breed which has been bred for aggressiveness for centuries. My specific concerns are outlined below.

      Contamination is a common concern for all ancient DNA studies. Contamination by modern samples is perhaps unlikely for this specific study of ancient cattle, but there is still the possibility of cross-sample contamination. The authors should estimate and report contamination estimates for each sample (based on coverage of autosomes and sex chromosomes, or heterozygosity of Y or MT DNA). Such contamination estimates are particularly important to support the presence of individuals with admixed ancestry, as a domestic sample contaminated with a wild sample (or vice versa) could appear as an admixed individual.

      We thank the reviewer for this suggestion. Due to our low coverage data, we focused on estimating contamination from the mitochondrial data by implementing the approach used by Green et al (2008). We make the code for this step available on GitHub. While most samples displayed low levels of contamination, we identified one sample (moo013a) with a surprisingly high (~50%) level of contamination, which was excluded from further analysis.
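      As a rough illustration of what a read-counting contamination estimate looks like, the sketch below computes the fraction of mitochondrial reads at diagnostic positions that carry a non-endogenous allele, with a Wilson score interval. This is an assumption-laden simplification in the spirit of counting-based approaches; it is not the authors' GitHub code, and the function name and the example counts are hypothetical.

```python
import math


def contamination_estimate(endogenous_reads, contaminant_reads, z=1.96):
    """Point estimate and Wilson score interval for the contaminant fraction.

    endogenous_reads: reads matching the sample's consensus at diagnostic sites
    contaminant_reads: reads carrying the alternative allele at those sites
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    n = endogenous_reads + contaminant_reads
    if n == 0:
        raise ValueError("no informative reads at diagnostic positions")
    p = contaminant_reads / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts for a heavily contaminated library.
point, lower, upper = contamination_estimate(endogenous_reads=50, contaminant_reads=50)
```

      With few informative reads the interval is wide, which is one reason low-coverage samples are hard to assess for cross-sample contamination.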

      A major limitation of this study is uncertainty in the "population identity" for most sampled individuals (i.e., whether an individual belonged to the domesticated or wild herd when they were alive). Based on chronology, morphology, and genetic data, it is clear the Mesolithic samples from the Artusia and Mendandia sites are bona fide aurochs, but the identities of individuals from the other two sites are much less certain. Indeed, archeological and morphological evidence from El Portalon supports the presence of both domestic animals and wild aurochs, which is echoed by the inter-individual heterogeneity in genetic ancestry. Based on results shown in Fig 1C and Fig 2 it seems that individuals moo017, moo020, and possibly moo012a are likely wild aurochs that had been hunted and brought back to the site by humans. Although the presence of individuals (e.g., moo050, moo019) that can only be explained by two-source models strongly supports that interbreeding happened (if cross-contamination is ruled out), it is unclear whether these admixed individuals were raised in the domestic population or lived in the wild population and were hunted.

      The reviewer is pointing out an important topic, the unknown identity of the studied individuals. We have revised the text making clear that we do not know whether the individuals were hunted or herded. At the same time, their genomic ancestry speaks for itself showing that there was hybridization between wild and domestic and that different individuals carried different degrees of wild ancestry. In the revised version, we have added the unknown identity as well as the fact that our results can be affected by changes in both human hunting and herding practices over time. Regardless of the exact identity of the individuals, our results can still be seen as (a) evidence for hybridization and (b) changes in human practices (hunting and/or herding) and their relationship to bovids over time.

      Such uncertainty in "population identity" limits the authors' ability to make conclusions regarding the frequency, sex bias, and directionality of gene flow between domestic and wild populations. For instance, the wide range of ancestry estimates in Neolithic and Chalcolithic samples could be interpreted as evidence of (1) frequent recent gene flow or (2) mixed practices of herding and hunting and less frequent gene flow. Similarly, the statement about "bidirection introgression" (on pages 8 and 11) is not directly supported by data. As the genomic, morphological, and isotope data cannot confidently classify an individual as belonging to the domesticated or wild population, it seems impossible to conclude the direction of gene flow (if by "bidirection introgression" the authors mean something other than "bidirectional gene flow", they need to clearly explain this before reaching the conclusion.)

      We have removed “bidirectional introgression” from the text and replaced it with the more neutral term “hybridization”. Furthermore, we used the revision to mention at several places in the text that it is not clear whether the sequenced individuals were hunted or herded and that the observed pattern likely reflects changes in both hunting and herding practices.

      The f4 statistics shown in Fig 3B are insufficient to support the claim regarding sex-biased hybridization, as the f4 statistic values are not directly comparable between the X chromosome and autosomes. Because the effective population size is different for the X chromosome and autosomes (roughly 3:4 for populations with equal numbers of males and females), the expected amount of drift is different, hence the fraction of allele sharing (f4) is expected to be different. In fact, the observation that moo004 whose autosomal genome can be modeled as 100% domestic ancestry still shows a higher f4 value for the X chromosome than autosomes hints at this issue. A more robust metric to test for sex-biased admixture is the admixture proportion itself, which can be estimated by qpAdm or f4-ratio (see Patterson et al 2012). However, even with this method, criticism has been raised (e.g., Lazaridis and Reich 2017; Pfennig and Lachance, 2023). In general, detecting sex-biased admixture is a tough problem.
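      The 3:4 figure quoted above follows from the standard textbook formulas for effective population size with separate sexes. The sketch below evaluates them; the function names are hypothetical, and the breeding numbers in the second example are invented to show how the ratio moves when few males reproduce, not taken from the study.

```python
def ne_autosomes(n_males, n_females):
    """Effective size for autosomes: 4 * Nm * Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x(n_males, n_females):
    """Effective size for the X chromosome: 9 * Nm * Nf / (4 * Nm + 2 * Nf)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Ratio of X-chromosomal to autosomal effective population size."""
    return ne_x(n_males, n_females) / ne_autosomes(n_males, n_females)


equal_sexes = x_to_autosome_ratio(100, 100)   # equal breeding sex ratio
few_males = x_to_autosome_ratio(10, 100)      # hypothetical: few breeding bulls
```

      With equal numbers of breeding males and females the ratio is exactly 3/4, but it rises above 1 when only a few males breed, so the expected difference in drift between the X chromosome and autosomes depends on the breeding structure as well.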

      In response to this comment and another comment by reviewer #1, we decided to remove sex bias from the title. In the revised version of our study, we have now switched this analysis from f4 statistics to comparing f4 ratios between the X chromosome and autosomes (Figure 3). Furthermore, we have added more information on the caveats of this analysis citing the articles mentioned by the reviewer. At the same time, we highlight that our patterns are consistent with what has been suggested based on ancient mitochondrial data (Verdugo et al 2019). Unfortunately, the low coverage data does not allow us to call Y-chromosomal haplotypes, which would also allow an analysis of the paternal lineage. But our results are consistent with additional examples from the literature: For Neolithic Anatolia, it has been suggested that the insemination of domestic cows by aurochs bulls has been intentional or even ritual (Peters et al 2012) and there is a lower differentiation of modern breeds on the X chromosome (da Fonseca et al 2019). A recent parallel archaeogenomic study also concluded sex-biased introgression from autosomal, X-chromosomal and Y-chromosomal data (Rossi et al 2024). Similar to the broader hybridization signal, our interpretation does not depend on the estimates for single individuals as we describe the broader pattern. As our results are consistent with previous results based on other types of data, we still consider the general pattern of our results valid even if the exact extent of sex bias is difficult to assess.

      In general, the stable isotope analysis seems to be very underpowered, due to the issues of variation in classification criteria and skeletal sampling location discussed by the authors in supplementary material. The authors claimed a significant difference in stable nitrogen isotope between (inconsistently defined) domestic cattle and wild aurochs, but no figures or statistics are presented to support this claim. Please describe the statistical method used and the corresponding p-values. The authors can consider including a figure to better show the stable isotope results.

      In combination with updated tables, we have added a supplementary figure showing the stable isotope results (S9). In light of the reanalysis of the genetic data, we have reassessed the genetic models used to assign species in the stable isotope analysis. We have provided more details of the statistical methods used and the p-values are given in the supplementary materials. There is a significant difference in the nitrogen isotope values when comparing B. taurus and B. primigenius (identified on morphology) but no other comparisons are significant at the p = 0.05 threshold. The reviewer highlights what we have mentioned in the supplementary material regarding the varied skeletal elements used for stable isotope analysis and the difficulty of assigning a species identity (as this depends on what criteria are used; morphological or some kind of genetic threshold of ancestry). Indeed, how to identify the species is at the heart of the paper. Given that identity could be defined in many ways, we have used 3 different genetic models to reflect this and the morphological categories, to help explore different possible scenarios. The reviewer is correct to point out that some of this analysis is not helped by the variety of skeletal elements used, but we have been careful not to over-interpret the results. The only samples that have nitrogen values higher than one standard deviation from the mean are domestic cattle, so it is not unreasonable to suggest that only domestic cattle have high nitrogen isotope values.

      Reviewer #3 (Public Review):

      Summary:

      Günther and colleagues leverage ancient DNA data to track the genomic history of one of the most important farm animals (cattle) in Iberia, a region showing peculiarities both in terms of cultural practices as well as a climatic refugium during the LGM, the latter of which could have allowed the survival of endemic lineages. They document interesting trends of hybridisation with wild aurochs over the last 8-9 millennia, including a stabilisation of aurochs ancestry ~4000 years ago, at ~20%, a time coincidental with the arrival of domestic horses from the Pontic steppe. Modern breeds such as the iconic Lidia used in bullfighting or bull running retain a comparable level of aurochs ancestry.

      Strengths:

      The generation of ancient DNA data has proven crucial to unravel the domestication history of traditional livestock, and this is challenging due to the environmental conditions of the Iberian peninsula, less favourable to DNA preservation. The authors leverage samples unearthed from key archaeological sites in Spain, including the karstic system of Atapuerca. Their results provide fresher insights into past management practices, and permit characterisation of significant shifts in hybridization with wild aurochs.

      We thank the reviewer for their positive assessment of our work and for highlighting the strength and potential of the study.

      Weaknesses:

      - Treatment of post-mortem damage: the base quality of nucleotide transitions was recalibrated down to a quality score of 2, but for 5bp from the read termini only. In some specimens (e.g. moo022), the damage seems to extend further. Why not use dedicated tools (e.g. mapDamage), or check the robustness by conditioning on nucleotide transversions?

      We agree that using such a non-standard data preparation approach requires some testing. Since our main analyses are all based on f statistics, we compared f4 statistics and f4 ratios of our rescaled base quality data with data only using transversion sites. While estimates are highly correlated, the data set reduced to transversions produces larger confidence intervals in f4 ratios due to the lower number of sites. Consequently, we decided to use the rescaled data for all analyses displayed in main figures. We also prefer not to perform reference-based rescaling as implemented in mapDamage as it might be sensitive to mapping bias (Günther & Nettelblad 2019).
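      Conditioning on transversions works because post-mortem deamination shows up as C-to-T and G-to-A changes, which are transitions. The sketch below shows the filtering step on a list of biallelic sites; the tuple layout `(position, ref, alt)` and the function names are assumptions made for illustration, not the format used by the authors.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref, alt):
    """True if the ref/alt pair swaps a purine for a pyrimidine or vice versa."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid biallelic SNP: {ref}/{alt}")
    return (ref in PURINES) != (alt in PURINES)


def keep_transversions(sites):
    """Drop transition sites (A<->G, C<->T), which can mimic deamination damage."""
    return [site for site in sites if is_transversion(site[1], site[2])]


# Hypothetical sites: (position, reference allele, alternative allele).
sites = [(101, "C", "T"), (205, "A", "T"), (310, "G", "A"), (412, "G", "C")]
filtered = keep_transversions(sites)
```

      The filter discards the two transition sites here, which mirrors the trade-off described in the response: damage-prone sites are removed, but fewer sites remain and confidence intervals widen.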

      - Their more solid analyses are based on qpAdm, but rely on two single-sample donor populations. As the authors openly discuss, it is unclear whether CPC98 is a good proxy for Iberian aurochs despite possibly forming a monophyletic clade (the number of analysed sites is simply too low to assess this monophyly; Supplementary Table S2). Additionally, it is also unclear whether Sub1 was a fully unadmixed domestic specimen, depleted of aurochs ancestry. The authors seem to suggest themselves that sex-biased introgression may have already taken place in Anatolia ("suggesting that sex-biased processes already took place prior to the arrival of cattle to Iberia").

      We expanded the discussion on this topic but removed the analysis of whether European aurochs form a clade due to the low number of sites. We do highlight that a recent parallel study on aurochs genomes confirmed that Western European aurochs form a clade, probably even originating from an Iberian glacial refugium (Rossi et al 2024). Even if minor structure in the gene pool of European aurochs might affect our quantitative results, it should not drive the qualitative pattern. The same should be the case for Sub1 as our tests would detect additional European aurochs ancestry that was not present in Sub1. The corresponding paragraph now reads:

      “A limitation of this analysis is the availability of genomes that can be used as representatives of the source populations as we used German and British aurochs to represent western European aurochs ancestry and a single Anatolian Neolithic to represent the original domestic cattle that was introduced into Europe. Our Mesolithic Iberian aurochs contained too little endogenous DNA to be used as a proxy aurochs reference and all Neolithic and Chalcolithic samples estimated with predominantly aurochs ancestry (including the 2.7x genome of moo014) already carry low (but significant) levels of domestic ancestry. However, the fact that all of these aurochs samples carried P mitochondria strongly suggests that western European aurochs can be considered monophyletic. Furthermore, a recent parallel study also concluded that Western European aurochs all form a clade (27). The Anatolian Sub1 might also not be depleted of any European aurochs ancestry and could not fully represent the original European Neolithic gene pool as also indicated by qpAdm and Struct-f4 identifying small proportions of other Asian ancestries in some Iberian individuals.

      While these caveats should affect our quantitative estimates of European aurochs ancestry, they should not drive the qualitative pattern as our tests would still detect any excess European aurochs ancestry that was not present in Neolithic Anatolia.”

      Alternatively, I recommend using Struct-f4 as it can model the ancestry of all individuals together based on their f4 permutations, including outgroups and modern data, and without the need to define pure "right" and "left" populations such as CPC98 and Sub1. It should work with low-coverage data, and allows us to do f4-based MDS plots as well as to estimate ancestry proportions (including from ghost populations).

      We thank the reviewer for this suggestion. We added Struct-f4 as an analysis but observed that it would not converge in an individual-based analysis due to the low coverage of most of our samples. We added Struct-f4 results for samples with >0.1X to the new Table 1; the results are similar to the results obtained using f4 ratios and (to a lower degree) the qpAdm results.

      - In the admixture graph analyses (supplementary results), the authors use population groups based on a single sample. If these samples are pseudohaploidised (or if coverage is insufficient to estimate heterozygosity - and it is at least for moo004 and moo014), f3 values are biased, implying that the fitted graph may be wrong. The graph shown in Fig S7 is in fact hard to interpret. For example, the aurochs Gyu2 from Anatolia but not the aurochs CPC98 also from Anatolia received 62% of ancestry from North Africa? The Neolithic samples moo004 and moo014 also show the same shocking disparity. I would consider re-doing this analysis with more than one sample per population group.

      There seems to be some confusion relating to the sample identity in these figures. CPC98 is British and not Anatolian while Gyu2 is from the Caucasus and not Anatolia, which would explain why they are different. Furthermore, moo004 is mostly of domestic ancestry while moo014 is mostly of European aurochs ancestry according to our other analyses, which should explain why they also behave differently in this analysis. To avoid confusion and since this is a supplementary analysis from which we are not drawing any major conclusions, we decided to remove the graphs and the analysis from the study.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Fig 3A: The red regression line is misleading. It seems to show that the average aurochs ancestry fraction has been steadily decreasing since ~8000 years ago, but the "averaging" is not meaningful as not all samples necessarily represent domestic cattle remains and the sample size is rather small. In other words, the samples are just a small, random collection of domestic and wild animals, and the average ancestry is subject to large sampling noise. I would suggest removing the regression line (along with the associated confidence interval) in this figure. It would also be helpful to label the samples with their IDs and morphology in the plot for cross-reference with other figures. Also, it is said in the legend that "Modern Iberian breeds... are added around date 0 with some vertical jitter". Do the authors mean "horizontal jitter" instead?

      Thank you for noticing this! We have removed the regression line and corrected the figure legend.

      Fig 2 vs Fig 3A: are the error bars the same in these two plots? They seem to be highly similar, if not identical, but the legends read very differently ("95% confidence interval by block-jackknife" vs. "one standard error"). Please explain.

      The figure legends have been corrected.

      Fig 3B: What do the error bars in Fig 3B mean? 95% confidence interval or one standard error? Please clarify in the legend.

      We have removed this figure and replaced it with a different way of displaying the results (now Figure 3). We ensured that the error bars are displayed consistently across figures.
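      The distinction raised in these comments, one standard error versus a 95% confidence interval from a block jackknife, can be made concrete with a short sketch. It assumes equally sized genomic blocks (a weighted variant is generally preferable when blocks differ in the number of sites), and the function names and the block values are hypothetical, not taken from the manuscript.

```python
import math


def block_jackknife(blocks, estimator):
    """Delete-one block jackknife for equally sized blocks.

    blocks: list of per-block data (any type the estimator understands)
    estimator: function mapping a list of blocks to a single number
    Returns (estimate on all blocks, jackknife standard error).
    """
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")
    full = estimator(blocks)
    # Re-estimate with each block left out in turn.
    leave_one_out = [estimator(blocks[:j] + blocks[j + 1:]) for j in range(g)]
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return full, math.sqrt(variance)


def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate 95% interval: estimate +/- 1.96 standard errors."""
    return estimate - z * standard_error, estimate + z * standard_error


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-block statistics.
block_values = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate, se = block_jackknife(block_values, mean)
low, high = confidence_interval(estimate, se)
```

      Error bars drawn at one standard error are roughly half as wide as a 95% interval, so two figures built from the same jackknife output can look similar yet need different legends.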

      According to the f4 statistics shown in Fig 1C and Fig 3B, moo012b carries a relatively high amount of domestic ancestry. How is this compatible with the observation in Fig 2 that this individual can be modeled with 100% aurochs (i.e., aurochs as the single source)? Does this simply reflect the low genome coverage?

      moo012b is indeed one of the lowest coverage samples in our dataset at <0.02x sequencing depth. Even in our revised analysis using more sites, there is a discrepancy between the results of f4 statistics and qpAdm (suggesting mostly domestic ancestry) and f4 ratio suggesting mostly aurochs ancestry (Figure 1C and Table 1). We believe that this highlights the sensitivity of different methods to assumptions about the relationships of sources and potential “outgroups” which might not be well resolvable with low coverage data and in the presence of potentially complex admixture. Our general results, however, do not depend on the estimates for single individuals as our interpretations are based on the general pattern.

      I don't fully understand the rationale behind the statement "However, at some point, the herding practices must have changed since modern Iberian breeds show approximately 20-25% aurochs ancestry". Can the stable ancestry fraction from 4000 years to the present (relative to the highly variable ancestry before) reflect a discontinuation of hunting rather than changes in herding practices?

      We agree that this statement was not justified here; we rephrased the sentence to “In fact, from the Bronze Age onwards, most estimates overlap with the approximately 25% aurochs ancestry in modern Iberian cattle” and generally tried to make the text more nuanced on the issue of herding and hunting practices.

      Reviewer #3 (Recommendations For The Authors):

      Thanks for this interesting piece of work. The results are clearly presented, and I have no additional concerns other than those reflected in the public report, except perhaps:

      (i) trying to use more informative sample names (eg. including the date and location). It may facilitate reading without going back and forth to the table "Sample List".

      We have now added a main table listing our post-Mesolithic samples together with their age, site and estimated aurochs ancestry proportions. We hope that this table makes it easier for readers to follow our sample IDs.

      (ii) Briefly describe in the main text the age of aurochs and Sub1 not generated in this study.

      Fixed.

    1. Plasmid, as a mobile genetic element, plays a pivotal role in facilitating the transfer of traits, such as antimicrobial resistance, among the bacterial community. Annotating plasmid-encoded proteins with the widely used Gene Ontology (GO) vocabulary is a fundamental step in various tasks, including plasmid mobility classification. However, GO prediction for plasmid-encoded proteins faces two major challenges: the high diversity of functions and the limited availability of high-quality GO annotations. Thus, we introduce PlasGO, a tool that leverages a hierarchical architecture to predict GO terms for plasmid proteins. PlasGO utilizes a powerful protein language model to learn the local context within protein sentences and a BERT model to capture the global context within plasmid sentences. Additionally, PlasGO allows users to control the precision by incorporating a self-attention confidence weighting mechanism. We rigorously evaluated PlasGO and benchmarked it against six state-of-the-art tools in a series of experiments. The experimental results collectively demonstrate that PlasGO has achieved commendable performance. PlasGO significantly expanded the annotations of the plasmid-encoded protein database by assigning high-confidence GO terms to over 95% of previously unannotated proteins, showcasing impressive precision of 0.8229, 0.7941, and 0.8870 for the three GO categories, respectively, as measured on the novel protein test set.

      This work has been peer reviewed in GigaScience (see paper), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer name: **Nguyen Quoc Khanh Le**

      Review content:

      1. The manuscript introduces PlasGO, which leverages a hierarchical architecture for GO term prediction in plasmid-encoded proteins. However, the novelty of the approach could be questioned. While the combination of protein language models and BERT for GO prediction is innovative, similar methods have been applied in other contexts.

      2. The study heavily relies on data from the RefSeq database, yet there is limited discussion on the quality and completeness of this data. The manuscript should address potential biases introduced by incomplete or incorrect GO annotations in the database. Moreover, the study uses protein sequences of up to 1K length, which might exclude relevant longer sequences, potentially limiting the model's applicability to all plasmid-encoded proteins.

      3. The manuscript claims that PlasGO can generalize well to novel proteins, but this claim is based on a specific dataset. The model's generalizability should be tested on more diverse and independent datasets, including plasmids from different bacterial species or environmental contexts.

      4. While the model's performance is quantitatively evaluated, the interpretability of the results remains unclear. The study should include an analysis of how well the model's predictions align with known biological functions and pathways. Additionally, it would be helpful to include examples where PlasGO provides novel insights that other models do not, thereby demonstrating its practical utility.

      5. The manuscript does not provide detailed information on the computational resources required to train and run PlasGO. Given the complexity of the model, there are potential concerns about its scalability, particularly for larger plasmid datasets or in settings with limited computational power.

      6. The manuscript compares PlasGO with several state-of-the-art tools, but the comparison might not be fully exhaustive. Additionally, statistical significance tests for performance differences should be provided to support the comparative analysis.

      7. Language models have been used in previous bioinformatics studies i.e., PMID: 37381841, PMID: 38636332. Therefore, the authors are suggested to refer to more works in this description to attract a broader readership.

      8. The study should discuss any ethical considerations related to the use of public datasets, particularly regarding data privacy and consent if any sensitive data is involved. Furthermore, the potential commercial implications of the PlasGO tool, especially if it is used for proprietary research, should be addressed.

      9. While the manuscript mentions that PlasGO's code will be made available, it is crucial to ensure that all aspects of the research are fully reproducible.

      10. The hierarchical architecture and the use of extensive training data might lead to overfitting, especially given the high dimensionality of the input features. The manuscript should discuss the measures taken to prevent overfitting, such as regularization techniques, dropout, or cross-validation strategies.

      11. The study could benefit from a more detailed discussion on the practical implications of using PlasGO in real-world plasmid research. How can this tool be integrated into existing workflows for plasmid function prediction? What are the potential limitations in practical applications?

    1. these communities are trying to sift through the layers of the world to see what else might have been left behind at the code level by the developers during the making of the game.

      intertextual (esp wrt code); marginalia; SKAM — transmedia emergent storytelling

    1. Reviewer #1 (Public review):

      Summary:

      This work aims at understanding the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow us to conclude if more neurones are recruited during the learning process or if the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main striatal targets of thalamic input is the cholinergic neurons, which weren't investigated here. Is there information that could be provided?

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      The authors used endoscopy to measure the activity of POm axons in the DLS, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neuron or if new neurons are progressively recruited in the process.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figures 1 and 2 are very clean and consistent with POm injection.

      Comment after review: The weaknesses remain as concerns have not been addressed. The dataset is interesting but the interpretation, due partly to the lack of control (especially relative to VPM contamination), is difficult.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work aims to understand the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task, they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow us to conclude if more neurons are recruited during the learning process or if the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      We thank the reviewer for these constructive comments. The points are addressed below.  

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main striatal targets of thalamic input is the cholinergic neurons, which weren't investigated here. Is there information that could be provided?

      This is true of the parafascicular (Pf) thalamic nucleus, which has been well studied in this context. However, there is much less known about the striatal projections of other thalamic nuclei, including POm, and their inputs to cholinergic neurons. Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here would be an important area of further study.

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      Tracing of single POm axons by Ohno et al. (2012) indicated that POm axons form a branched collateral that innervates striatum, while the main axon continues in the rostral-dorsal direction to innervate cortex. We think it is reasonable, based on the morphology, that our optogenetic suppression experiment restricted the suppression of glutamate release to this branch and avoided the other branches of the axon that project to cortex. However, testing this would require monitoring S1 activity during the POm-striatal axon suppression, which we did not do in this study.

      It is a very interesting question whether there could be different axon activity behavior in striatum versus S1. There is surprising evidence that POm synaptic terminals are different sizes in S1 and M1 and show different synaptic physiological properties depending on these cortical projection targets (Casas-Torremocha et al., 2022). Based on this, it is possible that POm-striatal synapses show distinct properties compared to cortex; however, this will need to be tested in future work.

      The authors used endoscopy to measure the POm axons in the DLS activity, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neuron or if new neurons are progressively recruited in the process.

      This is a good point. It would be necessary to perform chronic two-photon imaging of POm neurons (or chronic electrophysiological recordings) to determine whether the activity of individual neurons increased versus whether individual neuron activity levels remained similar but new neurons became active with learning. Even under baseline conditions, it is not known in detail what fraction of the population of POm neurons is active during sensory processing or behavior, highlighting how much is still to be discovered in this exciting area of neuroscience.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figures 1 and 2 are very clean and consistent with POm injection.

      Although this image was selected to demonstrate the position of the POm injection site and optical fiber implant above striatal axons, the reviewer is correct that there appears to be mixed labeling of axons in L4 and L5a. In some cases, there was expression slightly outside the border of POm (see Fig. 1B, right), which might explain the cortical innervation pattern in this figure. While cortically bound VPM axons pass through the striatum, they do not form synaptic terminals until reaching the cortex (Hunnicutt et al., 2016). If, as may be the case, inhibitory opsins suppress release of neurotransmitter at synaptic terminals more effectively than action potential propagation in axons, it may be likely that optogenetic suppression of POm-striatal terminals is more effective than suppression of action potentials in off-target-labelled VPM axons of passage. Ideally, we could compare effects of suppression of POm-striatal synapses with POm-cortical synapses and VPM-cortical synapses, but this was outside the bandwidth of the present study.

      Reviewer #2 (Public Review):

      Summary:

      Yonk and colleagues show that the posterior medial thalamus (POm), which is interconnected with sensory and motor systems, projects directly to major categories of neurons in the striatum, including direct and indirect pathway MSNs, and PV interneurons. Activity in POm-striatal neurons during a sensory-based learning task indicates a relationship between reward expectation and arousal. Inhibition of these neurons slows reaction to stimuli and overall learning. This circuit is positioned to feed salient event activation to the striatum to set the stage for effective learning and action selection.

      Strengths:

      The results are well presented and offer interesting insight into an understudied thalamostriatal circuit. In general, this work is important as part of a general need for an increased understanding of thalamostriatal circuits in complex learning and action selection processes, which have generally received less attention than corticostriatal systems.

      Weaknesses:

      There could be a stronger connection between the connectivity part of the data - showing that POm neurons contact D1, D2, and PV neurons in the striatum but with some different properties - and the functional side of the project. One wonders whether the POm neurons projecting to these subtypes of striatal neurons have unique signaling properties related to learning, or if there is a uniform, bulk signal sent to the striatum. This is not a weakness per se, as it's reasonable for these questions to be answered in future papers.

      We are very interested in understanding the potentially distinct learning-related synaptic and circuit changes that may occur at the POm synapses with D1- and D2-SPNs, PV interneurons, and other striatal cell types. We agree that this would be an important topic for further investigation.

      All the in vivo activity-related conclusions stem from data from just 5 mice, which is a relatively small sample set. Optogenetic groups are also on the small side.

      We appreciate this point and agree that higher N can be important for observing robust effects. A factor of our experiments that helped reduce the number of animals used was the longitudinal design, with repeated measures in the same subjects. This allowed for the internal control of comparing learning effects in the same subject from naïve to expert stages and therefore increased robustness. Even with relatively small group sizes, results were statistically significant, suggesting that the use of more mice was unnecessary, which we considered consistent with best practice in the use of animals in research. We also note that our group sizes were consistent with other studies in the field.  

      Reviewer #3 (Public Review):

      Yonk and colleagues investigate the role of the thalamostriatal pathway. Specifically, they studied the interaction of the posterior thalamic nucleus (PO) and the dorsolateral striatum in the mouse. First, they characterize connectivity by recording DLS neurons in in-vitro slices and optogenetically activating PO terminals. PO is observed to establish depressing synapses onto D1 and D2 spiny neurons as well as PV neurons. Second, PO axons are imaged by fiber photometry in mice trained to discriminate textures. Initially, no trial-locked activity is observed, but as the mice learn, PO develops responses timed to the audio cue that marks the start of the trial and precedes touch. PO does appear to encode the tactile stimulus type or outcome. Optogenetic suppression of PO terminals in striatum slows task acquisition. The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing.

      A great strength of this paper is its timeliness. Thalamostriatal processing has received almost no attention in the past, and the field has become very interested in the possible functions of PO. Additionally, the experiments exploit multiple cutting-edge techniques.

      There seem to be some technical/analytical weaknesses. The in vitro experiments appear to have some contamination of nearby thalamic nuclei by the virus delivering the opsin, which could change the interpretation. Some of the statistical analyses of these data also appear inappropriate. The correlative analysis of POm activity in vivo, licking, and pupil could be more convincingly done.

      The bigger weakness is conceptual - why should striatal circuitry need "priming" by the thalamus in order to process sensory stimuli? Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? It is unclear from the experiments that the thalamostriatal pathway exists for priming sensory processing. In fact, the optogenetic suppression of the thalamostriatal pathway seems to speak against that idea.

      We thank the reviewer for these constructive comments. The points are addressed below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Do POm neurons innervate CINs also? The connection between the PF thalamus and CINs is mentioned in a couple of places - one question is how unique are the input patterns for the POm versus adjacent sensorimotor thalamic regions, including the PF? This isn't a weakness per se but knowing the answer to that question would help in forming a more complete picture of how these different thalamostriatal circuits do or do not contribute uniquely to learning and action selection.

      Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing.

      Another difference between Pf and other thalamic nuclei (likely including POm) comes from anatomical tracing evidence (Smith et al., 2014; PMID: 24523677) which indicates that Pf inputs form the majority of their synapses onto dendritic shafts of SPNs, while other thalamic nuclei form synapses onto dendritic spines. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here, including ChAT neurons and subcellular localization, would be an important area of further study.

      It would be useful to know to what extent these POm-striatum neurons are activated generally during movement, versus this discrimination task specifically.

      We agree that distinguishing general movement-related activity from task-specific activity would be very useful. Earlier work (Petty et al., 2021) showed a close relationship between POm neuron activity, spontaneous (task-free) whisker movements, and pupil-indexed arousal in head-restrained mice. Oram et al. (2024; PMID: 39003286) recently recorded VPM and POm in freely moving mice during natural movements, finding that activity of both nuclei correlated with head and whisker movements. These studies indicate that POm is generally coactive with exploratory head and whisker movements.

      During task performance, the situation may change with training and attentional effects. For example, Petty and Bruno (2024) (https://elifesciences.org/reviewed-preprints/97188) showed that POm activity correlates more closely with task demands than tactile or visual stimulus modality. Our data indicate that POm axonal signals are increased at trial start during anticipation of tactile stimulus delivery and through the sensory discrimination period, then decrease to baseline levels during licking and water reward collection (Fig. 3). Results of Petty and Bruno (2024) together with ours suggest that POm is particularly active during the context of behaviorally relevant task performance. Thus, we think it is likely that, while pupil dilation indexes general movement and arousal, POm activity is more specific to movement and arousal associated with task engagement and behavioral performance. We have strengthened this point in the Discussion.

      Many of the data panels and text for legends/axes are quite small, and the stroke on line art is quite faint - overall figures could be improved from a readability standpoint.

      We thank the reviewer for their careful attention to the figures. 

      Reviewer #3 (Recommendations For The Authors):

      Major

      (1) Page 4, the Results regarding PSP and distance from injection site. The r-squared is the wrong thing to look at to test for a relationship. One should look at the p-value on the coefficient corresponding to the slope. The p-value is probably significant given the figures, in which case there may be a relationship contrary to what is stated. All the low r-squared value says is that, if there is a relationship, it does not explain a lot of the PSP variability.

      We thank the reviewer for alerting us to this oversight. We have included the p value (p = 0.0293) in the figure and legend, and indicated that the relationship is “small but significant”.
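
      The distinction raised in this exchange, between how much variance a regression explains (r-squared) and whether the slope differs from zero (the p-value on the slope coefficient), can be illustrated with a minimal sketch. The data below are synthetic and are not the PSP or injection-distance measurements from the manuscript; the function name `slope_test`, the sample size, and the noise level are assumptions chosen for illustration, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
import random
from statistics import NormalDist, fmean


def slope_test(x, y):
    """Ordinary least squares of y on x.

    Returns (slope, r_squared, p_value). The two-sided p-value for the
    slope uses a normal approximation to the t distribution, so it is
    only a rough guide unless the sample is large.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    residual_ss = syy - slope * sxy           # sum of squared residuals
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / se_slope
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return slope, r_squared, p_value


# Synthetic example: a weak but real trend buried in large noise.
rng = random.Random(0)
n = 2000
x = [i / (n - 1) for i in range(n)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

slope, r_squared, p_value = slope_test(x, y)
print(f"slope={slope:.3f}  r^2={r_squared:.3f}  p={p_value:.2g}")
```

      In this sketch the fitted line explains only a few percent of the variance, yet the slope is clearly non-zero. A low r-squared alone therefore does not show that a relationship is absent.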

      (2) Figure 1B suggests that the virus injections extend beyond POm and into other thalamic structures. Do any of the results change if the injections contaminating other nuclei are excluded from the analysis? I am not suggesting the authors change the figures/analyses. I am simply suggesting they double-check.

      We selected for injections that were predominantly expressing in POm as determined by post-hoc histological analysis (see Fig. 1, right). As above, we think that axons of passage that do not form striatal synapses are less likely to be suppressed than axons with terminals; however, this would need to be determined in further experiments. Because the preponderance of expression is within POm, we think the results would be similar even with a stricter selection criterion. 

      (3) The authors conclude that POm and licking are not correlated (bottom of page 6 pertaining to Figures 3A-F). The danger of these analyses is that they assume that GCaMP8 is a perfect linear reporter of POm spikes. The reliability of GCaMP8 has been quantified in some cell types, but not thalamic neurons, which have relatively higher firing rates.

      The reviewer is correct that the relationship between GCaMP8 fluorescence changes and spiking has not been sufficiently characterized in thalamic neurons, and that this would be important to do.

      What if the indicator is simply saturated late into the trial (after the average reaction time)? It would look like there is no response and one would conclude no correlation, but there could be a very strong correlation.

      While saturation is worthy of concern, the signal dynamics here argue against this possibility. The reason is that the signal increased in the early part of the trial and decreased by the end. If saturation was an issue, this would have been apparent during the initial increase. When the signal decreased in amplitude at the end of the trial, this indicates that the signal is not saturated because it is returning from a point closer to its maximum (and is becoming less saturated).

      Also, what happens between trials? Are the correlations the same, stronger, weaker? Ideally, the authors would analyze the data during and between trials.

      Between trials the signal did not show further changes in baseline beyond what was displayed at the start and end of behavioral trials. There were no consistent increases or decreases in signals between trials, except perhaps during strong whisking bouts. This is anecdotal because we did not analyze between-trial data. However, it is interesting and important to note that signals increased dramatically in amplitude from naïve, early learning to expert behavioral performance (Fig. 3), highlighting that POm-axonal signals relate to behavioral engagement and performance rather than spontaneous behaviors.  

      (4) Axonal activity could also appear more correlated with the pupil than licking because pupil dynamics are slow like the dynamics of calcium indicators. These kernels could artificially inflate the correlation. Ideally, the authors could consider these temporal effects. Perhaps they could deconvolve the temporal profiles of calcium and pupil before correlating? Or equivalently incorporate the profiles into their analysis?

      We analyzed the lick probability histograms, which had a temporal profile similar to the calcium signals (Fig. 3D,E), ruling out concerns about the influence of signal kinetics on the correlations. It is also worth noting that we observed changes in correlations between calcium signals and pupil with learning stage (Fig. 3I), even though the temporal profiles (signal dynamics) are not changing. Thus, the kinetics of the signals themselves are not the driver of the correlations; rather, the driver is the change in relative timing between calcium signals and pupil that occurs with learning.

      (5) The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing. The data here support the first part. It is not clear that the data support the second part, largely because it is vague what "priming" of sensory processing or "a key role in the initial stages of action selection" (p.9) even means here. Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? Some conceptual proposals from the authors, even if speculative and not offered as a conclusion, would be helpful.

      We appreciate these good points and have added further consideration and revision of the concept of priming and potential roles in an extensively revised Discussion section.

      (6) The photometry shows that PO turns on about 2 seconds before the texture presentation. PO's activity seems locked to the auditory cue, not the texture (Figure 2). This means that the attempt to suppress the thalamostriatal pathway with JAWS (Figure 4) is rather late, isn't it? Some PO signals surely go through. This seems to contradict the idea of priming above. It would be good if the authors could factor this into their narrative. Perhaps labelling the time of the auditory cue in Figure 4C would also be helpful.

      The start of texture presentation (movement of the texture panel toward the mouse) and auditory cue occur at the same time. To clarify this, we added a label “start tone” in Figure 4C and also in Figure 2C.

      For optogenetic (JAWS) suppression, we intentionally chose a time window between start tone onset and texture presentation, because our photometry experiments showed that this was when the preponderance of the signal occurred. However, the reviewer is correct that our chosen optogenetic suppression (JAWS) onset occurs shortly after the photometry signal has already started, potentially leaving the early photometry signal un-suppressed. Our motivation for choosing a restricted time window surrounding the texture presentation time was 1) to minimize illumination and potential heating of brain tissue; 2) to target a time window that avoids the auditory cue but covers stimulus presentation. We did not want to extend the duration of the suppression to before the trial started, because this could produce task-non-specific effects, such as distraction or loss of attention before the start of the trial.

      Even if some signal were getting through before suppression, we don’t think this contradicts the possibility of ‘priming’, because the process underlying priming would still be disrupted even if not totally suppressed. This would alter the temporal relationship between POm-striatal inputs and further corticostriatal inputs (from S1 and M1 cortex, for example). We have included further consideration of these points and possible relation to the priming concept in the Discussion.

      Minor

      (1) Page 5, "the sensitivity metric is artificially increased". What do you mean "artificially"? The mice are discriminating better. It is true that either a change in HR or FAR can cause the sensitivity metric to change, but there is nothing artificial or misleading about this.

      We removed the word artificial and clarified our definition of behaviorally Expert in this context:

      “Mice were considered Expert once they had reached ≥ 0.80 Hit Rate and ≤ 0.30 FA Rate for two consecutive sessions in lieu of a strict sensitivity (d’) threshold; we found this definition more intuitive because d’ is enhanced as Hit Rate and FA Rate approach their extremes (0 or 1)”
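
      To make the quoted criterion concrete, the following is a minimal sketch of the sensitivity index d' (z-transformed Hit Rate minus z-transformed FA Rate) and of the two-consecutive-session Expert rule. It is not the authors' analysis code: the function names, the clipping value `eps` used to keep rates away from exactly 0 or 1, and the example session values are all assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, eps=0.01):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates are clipped to [eps, 1 - eps] (an assumed convention) because
    the inverse normal CDF is undefined at exactly 0 or 1.
    """
    def clip(p):
        return min(max(p, eps), 1 - eps)

    z = NormalDist().inv_cdf
    return z(clip(hit_rate)) - z(clip(fa_rate))


def first_expert_session(sessions, min_hit=0.80, max_fa=0.30, run=2):
    """Index of the session that completes `run` consecutive sessions
    meeting the Hit Rate and FA Rate thresholds, or None if never met.

    `sessions` is a list of (hit_rate, fa_rate) pairs in training order.
    """
    streak = 0
    for i, (hit, fa) in enumerate(sessions):
        streak = streak + 1 if (hit >= min_hit and fa <= max_fa) else 0
        if streak == run:
            return i
    return None


# Hypothetical training history: (hit rate, false-alarm rate) per session.
sessions = [(0.60, 0.50), (0.85, 0.25), (0.70, 0.20), (0.90, 0.10), (0.85, 0.25)]
print(first_expert_session(sessions))
print(round(d_prime(0.80, 0.30), 2), round(d_prime(0.99, 0.30), 2))
```

      Because the z-transform grows steeply near 0 and 1, the same FA Rate paired with a Hit Rate close to 1 yields a much larger d', which is the behaviour the quoted passage refers to when it prefers explicit rate thresholds.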

      (2) Page 7, "Upon segmentation (Figure S4G-J)". Do you mean "segregation by trial outcome"?

      Corrected.

      (3) Page 9, "POm projections may have discrete target-specific functions, such that POm-striatal inputs may play a distinct role in sensorimotor behavior compared to POm-cortical inputs". Would POm-cortical inputs not also be sensorimotor? The somatosensory cortex contains a lot of corticostriatal cells. It also has various direct and indirect links to the motor cortex as well.

      We have clarified the wording here to convey the possibility that POm signals could be received and processed differently by striatal versus cortical circuitry, and have moved this statement to later in the discussion for better elaboration.

      (4) The Methods state that male and female mice were used. Why not say how many of each and whether or not there are any sex-specific differences?

      We added the following information to the Methods:

      The numbers of male and female mice were as follows, by experiment type: 6 male, 4 female (electrophysiology); 3 male, 2 female (fiber photometry); 4 male, 5 female (optogenetics). Data were not analyzed for sex differences.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration by De Hollander, Forstmann et al. (HBM 2017) that 3T fMRI (as well as many 7T imaging sequences) does not afford sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

      Comments on revised version:

      This is my second review of this article, now entitled "Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm" by Isherwood and colleagues.

      The authors have been very responsive to the initial round of reviews.

      I still think it would be helpful to see a combined investigation of the available 7T data, just to really drive the point home that even with the best parameters and a multi-study sample size, fMRI cannot detect any increases in BOLD activity on successful stop compared to go trials. However, I agree with the authors that these "sub samples still lack the temporal resolution seemingly required for looking at the processes in the SST." As such, I don't have any more feedback.

      We thank the reviewer for their positive feedback, and for their thorough and constructive comments on our initial submission. 

      Reviewer 2:

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPE, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

      Since the initial submission, the authors have improved their theoretical synthesis and changed their SSRT calculation method to the more appropriate integration method with replacement for go omissions. They have also done a better job of explaining how these fMRI results situate within the broader response inhibition literature including work using other neuroscience methods.

      They have also included a new Bayes Factor analysis. In the process of evaluating this new analysis, I recognized the following comments that I believe justify additional analyses and discussion:

      First, if I understand the authors' pipeline, for the ROI analyses it is not appropriate to run FSL's FILM method on the data that were generated by repeating the same time series across all voxels of an ROI. FSL's FILM uses neighboring voxels in parts of the estimation to stabilize temporal correlation and variance estimates and was intended and evaluated for use on voxelwise data. Instead, I believe it would be more appropriate to average the level 1 contrast estimates over the voxels of each ROI to serve as the dependent variables in the ROI analysis.

      We agree with the reviewer’s assertion that this approach could create estimation problems. However, in this instance, we turned off the spatial smoothing procedure that FSL’s FILM normally uses for estimating the amount of autocorrelation – thus, the autocorrelation was estimated based on each voxel’s timeseries individually. We also confirmed that all voxels within each ROI had identical statistics, which would not be the case if the autocorrelation estimates differed per voxel. We have added the following text to the Methods section under fMRI analysis: ROI-wise:

      Note that the standard implementation of FSL FILM uses a spatial smoothing procedure prior to estimating temporal autocorrelations which is suitable for use only on voxelwise data (Woolrich et al., 2001). We therefore turned this spatial smoothing procedure off and instead estimated autocorrelation using each voxel’s individual timeseries.

      Second, for the group-level ROI analyses there seem to be inconsistencies when comparing the z-statistics (Figure 3) to the Bayes Factors (Figure 4), in that very similar z-statistics have very different Bayes Factors within the same contrast across different brain areas, which seemed surprising (e.g., a z of 6.64 has a BF of .858 while another with a z of 6.76 has a BF of 3.18). The authors do briefly discuss some instances in which the frequentist and Bayesian results differ, but they never explain why similar z-statistics yield very different Bayes Factors for a given contrast across different brain areas. I believe a discussion of this would be useful.

      We thank the reviewer for their keen observation, and agree that this is indeed a strange inconsistency. Upon reviewing this issue, we came across an error in our analysis pipeline, which led to inconsistent scaling of the parameter estimates between datasets. We corrected this error, and included new tables (Figures 3, 4, and Supplementary Figure 5) which now show improved correspondence between the frequentist results from FSL and the Bayesian results.

      We have updated the text of the Results section accordingly. In this revision, we have also updated all BFs to be expressed in log<sub>10</sub> form, to ensure consistency for the reader. Updates to the manuscript are given below.

      Results: Behavioural Analyses:

      Consistent with the assumptions of the standard horse-race model (Logan & Cowan, 1984), the median failed stop RT is significantly faster within all datasets than the median go RT (Aron_3T: p < .001, BF<sub>log10</sub> = 2.77; Poldrack_3T: p < .001, BF<sub>log10</sub> = 23.49; deHollander_7T: p < .001, BF<sub>log10</sub> = 8.88; Isherwood_7T: p < .001, BF<sub>log10</sub> = 2.95; Miletic_7T: p = .0019, BF<sub>log10</sub> = 1.35). Mean SSRTs were calculated using the integration method and are all within normal range across the datasets.
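      As an illustration of the integration method mentioned above, the following is a minimal sketch and not the authors' pipeline. It assumes the common formulation in which SSRT is the go RT at the quantile given by p(respond | stop signal), minus the mean stop-signal delay (SSD). The function name, argument names, and the handling of ties and omissions are assumptions made for this example.

```python
import math
from statistics import mean


def ssrt_integration(go_rts, ssds, p_respond_given_signal):
    """Estimate SSRT (same units as the inputs) with the integration method.

    go_rts: response times on go trials
    ssds: stop-signal delays on stop trials
    p_respond_given_signal: proportion of stop trials with a response
    """
    if not go_rts or not ssds:
        raise ValueError("go_rts and ssds must be non-empty")
    if not 0 < p_respond_given_signal <= 1:
        raise ValueError("p_respond_given_signal must be in (0, 1]")

    sorted_rts = sorted(go_rts)
    # Find the go RT at which the cumulative go RT distribution reaches
    # p(respond | signal); this is the "nth RT".
    n = math.ceil(p_respond_given_signal * len(sorted_rts))
    nth_rt = sorted_rts[max(n - 1, 0)]
    # SSRT is the nth RT minus the mean stop-signal delay.
    return nth_rt - mean(ssds)


# Hypothetical data, in seconds.
example_go_rts = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
example_ssds = [0.20, 0.25, 0.30]
example_ssrt = ssrt_integration(example_go_rts, example_ssds, 0.5)
```

      With these hypothetical values the nth RT is 0.60 s and the mean SSD is 0.25 s, so the estimate is about 0.35 s.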

      Results: ROI-wise GLMs:

      To further statistically compare the functional results between datasets, we then fit a set of GLMs using the canonical HRF with a temporal derivative to the timeseries extracted from each ROI. Below we show the results of the group-level ROI analyses over all datasets using z-scores (Fig. 3) and log-transformed Bayes Factors (BF; Fig. 4). Note that these values were time-locked to the onset of the go signal. See Supplementary Figure 5 for analyses where the FS and SS trials were time-locked to the onset of the stop signal. To account for multiple comparisons, threshold values were set using the FDR method for the frequentist analyses. 
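      The text above states only that thresholds were set "using the FDR method". As a hedged sketch, the Benjamini-Hochberg step-up procedure is one widely used way to control the FDR. Whether it is the exact procedure used here is an assumption, as are the function name and the q = 0.05 level.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    such that p_(k) <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_passing_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, one per ROI.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejected = benjamini_hochberg(example_p, q=0.05)
```

      In this hypothetical set only the two smallest p-values pass their rank-specific thresholds, although five are below an uncorrected .05.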

      For the FS > GO contrast, the frequentist analysis found significant positive z-scores in all regions bar left and right M1, and the left GPi. The right M1 showed a significant negative z-score; left M1 and GPi showed no significant effect in this contrast. The BFs showed moderate or greater evidence for the alternative hypothesis in bilateral IFG, preSMA, caudate, STN, Tha, and VTA, and right GPe. Bilateral M1 and left GPi showed moderate evidence for the null. Evidence for other ROIs was anecdotal (see Fig 4). 

      For the FS > SS contrast, we found significant positive z-scores in all regions except the left GPi. The BFs showed moderate or greater evidence for right IFG, right GPi, and bilateral M1, preSMA, Tha, and VTA, and moderate evidence for the null in left GPi. Evidence for other ROIs was anecdotal (see Fig 4).

      For the SS > GO contrast we found significant positive z-scores in bilateral IFG, right Tha, and right VTA, and significant negative z-scores in bilateral M1, left GPe, right GPi, and bilateral putamen. The BFs showed moderate or greater evidence for the alternative hypothesis in bilateral M1 and right IFG, and moderate or greater evidence for the null in left preSMA, bilateral caudate, bilateral GPe, left GPi, bilateral putamen, and bilateral SN. Evidence for other ROIs was anecdotal (see Fig 4).

      Although the frequentist and Bayesian analyses are mostly in line with one another, there were also some differences, particularly in the contrasts with FS. In the FS > GO contrast, the interpretation of the GPi, GPe, putamen, and SN differ. The frequentist models suggest significantly increased activation for these regions (except left GPi) in FS trials. In the Bayesian model, this evidence was found to be anecdotal in the SN and right GPi, and moderate in the right GPe, while finding anecdotal or moderate evidence for the null hypothesis in the left GPe, left GPi, and putamen. For the FS > SS contrast, the frequentist analysis showed significant activation in all regions except for the left GPi, whereas the Bayesian analysis found this evidence to be only anecdotal, or in favour of the null for a large number of regions (see Fig 4 for details).

      Since the Bayes Factor analysis appears to be based on repeated measures ANOVA and the z-statistics are from Flame1+2, the BayesFactor analysis model does not pair with the frequentist analysis model very cleanly. To facilitate comparison, I would recommend that the same repeated measures ANOVA model should be used in both cases. My reading of the literature is that there is no need to be concerned about any benefits of using Flame being lost, since heteroscedasticity does not impact type I errors and will only potentially impact power.

      We agree with the reviewer that there are differences between the two analyses. The advantage of the z-statistics from FSL’s flame 1+2 is that these are based on a multi-level model in which measurement error in the first level (i.e., subject level) is taken into account in the group-level analysis. This is an advantage especially in the current paper since the datasets differ strongly in the degree of measurement error, both due to the differences in field strength and in the number of trials (and volumes). Although multilevel Bayesian approaches exist, none (except by use of custom code) allow for convolution with the HRF of a design matrix like typical MRI analyses. Thus, we extracted the participant-level parameter estimates (converted to percent signal change), and only estimated the dataset and group level parameters with the BayesFactor package. As such, this approach effectively ignores measurement error. However, despite these differences in the analyses, the general conclusions from the Bayesian and frequentist analyses are closely aligned after we corrected for the error described above. The Bayesian results are more conservative, which can be explained by the unfiltered participant-level measurement error increasing the uncertainty of the group-level parameter estimates. At worst, the BFs represent the lower bounds of the true effect, and are thus safe to interpret.

      We have also included an additional figure (Supplementary Figure 7) that shows the correspondence between the BFs and the z scores. 

      Though frequentist statistics suggest that many basal ganglia structures are significantly more active in the FS > SS contrast (see 2nd row of Figure 3), the Bayesian analyses are much more equivocal, with no basal ganglia areas showing Log10BF > 1 (which would be indicative of strong evidence). The authors suggest that "the frequentist and Bayesian analyses are mostly in line with one another", but in my view, this frequentist vs. Bayesian analysis for the FS > SS contrast seems to suggest substantially different conclusions. More specifically, the frequentist analyses suggest greater activity in FS than SS in most basal ganglia ROIs (all but 2), but the Bayesian analysis did not find *any* basal ganglia ROIs with strong evidence for the alternative hypothesis (or a difference), and several with more evidence for the null than the alternative hypothesis. This difference between the frequentist and Bayesian analyses seems to warrant discussion, but unless I overlooked it, the Bayesian analyses are not mentioned in the Discussion at all. In my view, the frequentist analyses are treated as the results, and the Bayesian analyses were largely ignored.

      The original manuscript only used frequentist statistics to assess the results, and then added Bayesian analyses later in response to a reviewer comment. We agree that the revised discussion did not consider the Bayesian results in enough detail, and have updated the manuscript throughout to more thoroughly incorporate the Bayesian analyses and improve overall readability. 

      In the Methods section, we have updated the fMRI analysis – general linear models (GLMs): ROI-wise GLMs section to more thoroughly incorporate the Bayesian analyses as follows:

      We compared the full model (H1) comprising trial type, dataset and subject as predictors to the null model (H0) comprising only the dataset and subject as predictors. Datasets and subjects were modeled as random factors in both cases. Since effect sizes in fMRI analyses are typically small, we set the scaling parameter on the effect size prior for fixed effects to 0.25, instead of the default of 0.5, which assumes medium effect sizes (note that the same qualitative conclusions would be reached with the default prior setting; Rouder et al., 2009). We divided the resultant BFs from the full model by the null model to provide evidence for or against a difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Andraszewicz et al., 2014; Jeffreys, 1939). To facilitate interpretation of the BFs, we converted them to the logarithmic scale. The approximate conversion between the interpretation of logarithmic BFs and standard interpretation on the adjusted Jeffreys’ scale can be found in Table 3.
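      To make the log-scale reading concrete, here is a small sketch that maps a log10 Bayes Factor to a verbal label. Table 3 is not reproduced in this response, so the cutpoints used (raw BFs of 3 and 10, i.e. log10 values of about 0.48 and 1) are an assumption based on commonly used Jeffreys-style categories; the function name and label wording are likewise hypothetical.

```python
import math

# Assumed cutpoints on the log10 scale (raw BF of 3 and 10); the manuscript's
# Table 3 may use different or finer-grained boundaries.
MODERATE_CUTOFF = math.log10(3)
STRONG_CUTOFF = 1.0


def describe_log10_bf(log10_bf):
    """Return a verbal label for a log10 Bayes Factor (H1 relative to H0)."""
    if log10_bf == 0:
        return "no evidence either way"

    magnitude = abs(log10_bf)
    if magnitude < MODERATE_CUTOFF:
        strength = "anecdotal"
    elif magnitude < STRONG_CUTOFF:
        strength = "moderate"
    else:
        strength = "strong or greater"

    # Positive values favour the full model (H1), negative values the null (H0).
    favoured = "H1" if log10_bf > 0 else "H0"
    return f"{strength} evidence for {favoured}"


label_positive = describe_log10_bf(2.77)
label_negative = describe_log10_bf(-0.6)
```

      Under these assumed cutpoints, a log10 BF of 2.77 reads as strong or greater evidence for H1, and a value of -0.6 as moderate evidence for the null.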

      The Bayesian results are also more incorporated into the Discussion as follows: 

      Evidence for the role of the basal ganglia in response inhibition comes from a multitude of studies citing significant activation of either the SN, STN or GPe during successful inhibition trials (Aron, 2007; Aron & Poldrack, 2006; Mallet et al., 2016; Nambu et al., 2002; Zhang & Iwaki, 2019). Here, we re-examined activation patterns in the subcortex across five different datasets, identifying differences in regional activation using both frequentist and Bayesian approaches. Broadly, the frequentist approach found significant differences between most ROIs in FS>GO and FS>SS contrasts, and limited differences in the SS>GO contrast. The Bayesian results were more conservative; while many of the ROIs showed moderate or strong evidence, some with small but significant z scores were considered only anecdotal by the Bayesian analysis. In our discussion, where the findings between analytical approaches differ, we focus mainly on the more conservative Bayesian analysis.

      Here, our multi-study results found limited evidence that the canonical inhibition pathways (the indirect and hyperdirect pathways) are recruited during successful response inhibition in the SST. We expected to find increased activation in the nodes of the indirect pathway (e.g., the preSMA, GPe, STN, SN, GPi, and thalamus) during successful stop compared to go or failed stop trials. We found strong evidence for activation pattern differences in the preSMA, thalamus, and right GPi between the two stop types (failed and successful), and limited evidence, or evidence in favour of the null hypothesis, in the other regions, such as the GPe, STN, and SN. However, we did find recruitment of subcortical nodes (VTA, thalamus, STN, and caudate), as well as preSMA and IFG activation during failed stop trials. We suggest that these results indicate that failing to inhibit one’s action is a larger driver of the utilisation of these nodes than action cancellation itself. 

      These results are in contention with many previous fMRI studies of the stop signal task as well as research using other measurement techniques such as local field potential recordings, direct subcortical stimulation, and animal studies, where activation of particularly the STN has consistently been observed (Alegre et al., 2013b; Aron & Poldrack, 2006; Benis et al., 2014; Fischer et al., 2017; Mancini et al., 2019; Wessel et al., 2016).

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Espejo et al. describe a method, SICKO, that allows for long-term longitudinal examination of bacterial colonization in the gut of C. elegans. SICKO utilizes a well-plate format where single worms are housed in each well with a small NGM pad surrounded by an aversive palmitic acid barrier to prevent worms from fleeing the well. The main benefit of this method is that it captures longitudinal data across individual worms with the ability to capture tens to hundreds of worms at once. The output data of SICKO in the heatmap is also very clear and robustly shows bacterial colonization in the gut across a large sample size, which is far superior to the current gold standard of imaging 10-20 worms in a cross-sectional manner at various timepoints of aging. They then provide a few examples of how this method can be applied to understand how colonization correlates with animal health.

      Strengths:

      -The method presented in this manuscript is sure to be of great utility to the host-pathogen field of C. elegans. The method also allows for utilization of large sample sizes and a way to present highly transparent data, both of which are excellent for promoting rigor and reproducibility of science.

      -The manuscript also does a great job in describing the limitations of the system, which is always appreciated.

      -The methods section for the SICKO data analysis pipeline and the availability of the code on Github are strong pluses.

      Weaknesses:

      -There are minor weaknesses in the methods that could be addressed relatively easily by expanding the explanation of how to set up the individual worm chambers (see comment 1 below).

      I am making all my comments and suggestions to the reviewers public, as I believe these comments can be useful to the general readership as well. Comment 1 is important to make the methods more accessible and comment 2 is important to make the data presentation more accessible to a broader audience. However, comments 3-4 are things/suggestions that should be considered by the authors and future users of SICKO for interpretation of all the data presented in the manuscript.

      (1) The methods section needs to be described in more detail. Considering that this is a methods development paper, more detailed explanation is required to ensure that readers can actually adapt these experiments into their labs.

      (a) What is the volume of lmNGM in each well?

      (b) Recommended volume of bacteria to seed in each well?

      (c) A file for the model for the custom printed 3D adaptor should be provided.

      (d) There should be a bit more detail on how the chambers should be assembled with all the components. After reading this, I am not sure I would be able to put the chamber together myself.

      (e) What is the recommended method to move worms into individual wells? Manual picking? Pipetting in a liquid?

      (f) Considering that a user-defined threshold is required (challenging for non-experienced users), example images should be provided on what an acceptable vs. nonacceptable threshold would look like.

      (2) The output data in 1e is very nice - it is a very nice and transparent plot, which I like a lot. However, since the data is complex, a supplemental figure to explain the data better would be useful to make it accessible for a broader audience. For example, highlighting a few rows (i.e., individual worms) and showing the raw image data for each row would be useful. What I mean is that it would be useful to show what does the worm actually look like for a "large colony size" or "small colony size"? What is the actual image of the worm that represents the yellow (large), versus dark blue (small), versus teal (in the middle)? And also the transition from dark blue to yellow would also be nice to be shown. This can probably also just be incorporated into Fig. 1d by just showing what color each of those worm images from day 1 to day 8 would represent in the heat map (although I still think a dedicated supplemental figure where you highlight a few rows and show matching pictures for each row in image files would be better).

      (3) I am not sure that using single-timepoint cross-sectional data is a fair comparison since several studies do multi-timepoint cross-sectional studies (e.g., day 1, day 5, day 9). This is especially true for using only day 1 data - most people do gut colonization assays at later timepoints since the gut barrier has been shown to break down at older ages, not day 1. The data collected by SICKO is done every day across many individual worms and is clearly superior to this type of cross-sectional data (even with multiple timepoints), and I think this message would be further strengthened by comparing it directly to cross-sectional data collected across more than 1 timepoint of aging.

      (4) The authors show that SICKO can detect differences in wild-type vs. pmk-1 loss of function and between OP50 and PA14. However, these are very dramatic conditions that conventional methods can easily detect. I would think that the major benefit of SICKO over conventional methods is that it can detect subtle differences that cross-sectional methods would fail to visualize. It might be useful to see how well SICKO performs for these more subtle effects (e.g., OP50 on NGM vs. bacteria-promoting media; OP50 vs. HT115; etc.).

      (a) Similar to the above comment, the authors discuss how pmk-1 has colonization-independent effects on host-pathogen interactions. Maybe using a more direct approach to affect colonization (e.g., perturbing gut actin function like act-5) would be better.

  10. Dec 2024
    1. Author response:

      The following is the authors’ response to the previous reviews.

      We have carefully addressed all the reviewers' suggestions, and detailed responses are provided at the end of this letter. In summary:

      • We conducted two additional replicates of the study to obtain more robust and reliable data.

      • The Introduction has been revised for greater clarity and conciseness.

      • The Results section was shortened and reorganized to highlight the key findings more effectively.

      • The Discussion was modified according to the reviewers' suggestions, with a focus on reorganization and conciseness.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision: 

      Overall, I am not quite convinced about the possible shift in host use in the Argentinian populations of Cx. quinquefasciatus. The evidence from the papers that the authors cite is not strong enough to derive this conclusion. Therefore, I think that the introduction and discussion parts where they talk about host shift in Cx. quinquefasciatus should be removed completely as it misleads the readers. I suggest limiting the manuscript to talking only about the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quinquefasciatus

      As mentioned in the previous revision, we agree with the reviewer's observation about the lack of evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from Argentina. We have included this topic in the discussion.

      Additionally, we added a paragraph to the discussion section covering the limitations of our study and conclusions. One of them is the fact that our results are based on experiments under controlled conditions. Future studies are needed to elucidate whether the same trend is found in the field.

      Reviewer #1 (Recommendations for the authors): 

      Abstract

      Line 73: shift in feeding behavior

      Accepted as suggested. 

      Discussion

      Line 258: addressed that

      Accepted as suggested.

      Line 263: blood is nutritionally richer

      Accepted as suggested.

      Reviewer #2 (Public Review): 

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer concerns, with several exceptions that continue to cause concern about the conclusions of the study.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field. 

      (3) The manuscript has become a lot clearer and easier to read with the revisions - thank you to the authors for working hard to make many of the suggested changes. 

      Weaknesses:

      (1) The authors have decided not to follow the suggestion of conducting experimental replicates of the study. This is understandable given the significant investment of resources and time necessary; however, it leaves the study lacking support. Experimental replication is an important feature of a strong study and helps to provide confidence that the observed patterns are real and replicable. Without replication, I continue to lack confidence in the conclusions of the study.

      We included replicates as suggested.  

      (2) The authors have included some additional discussion about the counterintuitive nature of their results, but the paragraph discussing this in the discussion was confusing. I believe that this should be revised. This is a key point of the paper and needs to be clear to the reader.

      Revised as suggested. 

      (3) There should be more discussion of the host switching observed in the two studies conducted in Argentina referenced by the authors. Since host switching is the foundation for the hypothesis tested in this paper, it is important to fully explain what is currently known in Argentina. 

      Accepted as suggested.

      (4) In some cases, the explanations of referenced papers are not entirely accurate. For example, when referencing Erram et al 2022, I think the authors misrepresented the paper's discussion regarding pre-diuresis: Erram et al. are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility (rather than leading to higher fecundity on birds, as stated in this manuscript). The study performed by Erram et al. also didn't prove this phenomenon; they just suggest it as a possible mechanism to explain their results, so that should be made clear when referencing the paper.

      Changed as suggested.

      (5) In some cases, the conclusions continue to be too strongly worded for the evidence available. For example, lines 322-324: I don't think the data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. 

      The wording was modified as suggested to tie our discussion more closely to the results.

      (6) There is limited mention of the caveat that this experiment was performed with simulated seasonality that does not perfectly replicate seasonality in the field. I think this caveat should be discussed in the discussion (e.g. that humidity is held constant).

      This topic is now included in the discussion as suggested. 

      Reviewer #2 (Recommendations for the authors): 

      59-60: These terms should end with -phagic instead of -philic. These papers study blood feeding patterns, not preference. I understand that the Janssen paper calls it "mammalophilic" in its title, but this was an incorrect use of the term in that paper. There are some review papers that explain the difference in this terminology if it's helpful.

      Accepted as suggested. 

      73: edit to "in" feeding behavior 

      Accepted as suggested.

      77-78: Given that the premise of your study is based on the phenomenon of host switching, I suggest that you expand your discussion of these two papers. What did they observe? Which hosts did they switch from / to and how dramatic was the shift?

      Accepted as suggested. 

      79: replace acknowledged with experienced 

      Accepted as suggested.

      79-80: the way that this is written is misleading. It suggests that Spinsanti showed that seasonal variation in SLEV could be attributed to a host shift, which isn't true. This citation should come before the comma and then you should use more cautious language in the second half. E.g which MIGHT be possible to attribute to .... 

      Accepted as suggested.

      80-82: this is not convincing. Even if the Robin isn't in Argentina, Argentina does have migrating birds, so couldn't this be the case for other species of birds? Do any of the birds observed in previous blood meal analyses in Argentina migrate? If so, couldn't this hypothesis indeed play a role? 

      A paragraph about this topic was added to the discussion as suggested.

      90: hypotheses for what? The fall peak in cases? Or host switching? 

      Changed to be clearer.

      98: where was this mentioned before? I think "as mentioned before" can be removed. 

      Accepted as suggested.

      101: edit to "whether an interaction effect exists" 

      Accepted as suggested.

      104: edit to "We hypothesize that..." 

      Accepted as suggested.

      106: reported host USE changes, not host PREFERENCE changes, right? 

      All the terminology was changed to refer to host pattern rather than preference, to avoid confusion.

      200: Briefly reading Carsey and Harden, it looks like the methodology was developed for social science. Is there anything you can cite to show this applied to other types of data? If not, I think this requires more explanation in your MS. 

      This was removed as replicates were included.

      237-239: I think it is best not to make a definitive statement about greater/higher if it isn't statistically significant; I suggest modifying the sentences to state that the differences you are listing were not significantly different up front rather than at the end, otherwise if people aren't reading carefully, they may get the wrong impression. 

      Accepted as suggested.

      245: you only use the term MS-I once before and I forgot what it meant since it wasn't repeated, so I had to search back through with command-F. I suggest writing this out rather than using the acronym. 

      Accepted as suggested.

      249: edit to: "an interaction exists between the effect of..." 

      Accepted as suggested.

      253-254: greater compared to what? 

      Changed for clarity.

      258-260: edit for grammar

      Accepted as suggested.

      260-262: edit for grammar; e.g. "However, this assumption lacks solid evidence; there is a scarcity of studies regarding nutritional quality of avian blood and its impact on mosquito fitness." 

      Accepted as suggested.

      263: edit: blood is nutritionally... 

      Accepted as suggested.

      264-267: This doesn't sound like an accurate interpretation of what the paper suggests regarding pre-diuresis in their discussion - they are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility. They also don't show this, they just suggest it as a possible mechanism to explain their results. 

      This topic was removed given the restructuring of discussion.

      253-269: You should tie this paragraph back to your results to explicitly compare/contrast your findings with the previous literature. 

      Accepted as suggested.

      270-282: This paragraph would be a good place to explain the caveat of working in the laboratory - for example, humidity was the same across the two seasons which I'm guessing isn't the case in the field in Argentina. You can discuss what aspects of laboratory season simulation do not accurately replicate field conditions and how this can impact your findings. You said in your response to the reviewers that you weren't interested in measuring other variables (which is fair, and not expected!), but the beauty of the discussion section is to be able to think about how your experimental design might impact your results - one possibility is that your season simulation may not have produced the results produced by true seasonal shifts. 

      Accepted as suggested.

      279-281: You say your experiment was conducted within the optimal range, which would suggest that both summer and autumn were within that range, but then you only talk about summer as optimal in the following sentence. 

      Changed for clarity.

      281-282: You should clarify this sentence - state what the interaction has an effect on. 

      Accepted as suggested.

      283-291: I appreciate that your discussion now acknowledges the small sample size and the questions that remain unanswered due to the results being opposite to that of the hypothesis, but this paragraph lacks some details and in places doesn't make sense. 

      I think you need to emphasize which groups had small sample size and which conclusions that might impact. I also think you need to explain why the sample size was substantially smaller for some groups (e.g. did they refuse to feed on the mouse in the autumn?). I appreciate that sample sizes are hard to keep high across many groups and two gonotrophic periods, but unfortunately, that is why fitness experiments are so hard to do and by their nature, take a long time. I understand that other papers have even lower sample size, but I was not asked to review those papers and would have had the same critique of them. I don't believe that creating simulated data via a Monte Carlo approach can make up for generating real data. As I understand it from your explanation, you are parametrizing the Monte Carlo simulations with your original data, which was small to begin with for autumn mouse. Using this simulation doesn't seem like a satisfactory replacement for an experimental replicate in my opinion. I maintain that at least a second replicate is necessary to see whether the patterns that you have observed hold. 

      We performed a power analysis and added more replicates to address the issue of sample size. This critique is discussed further in the Discussion. The simulation approach was removed entirely.

      Regarding the directionality of the interaction effect, I think this warrants more discussion. Lines 287-291 don't make sense to me. You suggest that feeding on birds in the autumn may confer a reproductive advantage when conditions are more challenging. But then why wouldn't they preferentially feed on birds in the autumn, rather than mammals? I suggest rewriting this paragraph to make it clearer. 

      Accepted as suggested.

      297: earlier mentioned treatments? Do you mean compared to the first gonotrophic cycle? This isn't clear. 

      Changed for clarity.

      302-303: Did you clarify whether you are allowed to reference unpublished data in eLife? 

      This was removed to follow the guidelines of eLife.

      316-317: "it becomes apparent" sounds awkward, I suggest rewording and also explaining how this conclusion was made. 

      Accepted as suggested.

      322-324: I think that this statement is too strongly worded. I don't think your data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. Please modify this and make your conclusions more cautious and closely linked to what you actually demonstrated. 

      Accepted as suggested.

      325: change will perform to would have 

      Accepted as suggested.

      326: add to the sentence: "and vice versa in the summer" 

      Accepted as suggested.

      330: possible explanations, not explaining scenarios. 

      Accepted as suggested.

      517: I think you should repeat the abbreviation definitions in the caption to make it easier for readers, otherwise they have to flip back and forth which can be difficult depending on formatting.

      Accepted as suggested. 

      In general, I think that your captions need more information. I think the best captions explain the figure relatively thoroughly such that the reader can look at the figure and caption and understand without reading the paper in depth. (e.g. the statistical test used).

      Data availability: The eLife author instructions do say that data must be made available, so there should be a statement on data availability in your MS. I also suggest you make the code available.

      Accepted as suggested.

    1. Reviewer #2 (Public review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript "Cluster size determines morphology of transcription factories in human cells".

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well-prepared, and the text is mostly well-written save for a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization, and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the experiments suggested in the Discussion section have actually been done and their results can be exploited. In my judgement, unless these additional analyses are added to a level at which crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is better suited to a theoretical biophysics venue than to a biology journal.

      Major points

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend to instead talk about only their size/extent. The title is, by the same argument, in my opinion misleading as to the content of this study.

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a "small" transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point iii are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units' activation etc.
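
      The simple analysis suggested above could be sketched roughly as follows. This is a minimal illustration, not a SPRITE pipeline: the cluster representation (each association as a list of TF labels, one per associated genomic element), the function names, and the toy data are all hypothetical assumptions. The "purity" used here, the fraction of elements carrying the most common TF label, is only one possible measure of how mixed or demixed an association is.

```python
from collections import Counter, defaultdict


def purity(tf_labels):
    """Fraction of elements carrying the most common TF label (1.0 = one TF only)."""
    counts = Counter(tf_labels)
    return max(counts.values()) / len(tf_labels)


def mean_purity_by_size(clusters):
    """Group associations by size k and average their purity within each group."""
    by_size = defaultdict(list)
    for labels in clusters:
        if len(labels) >= 2:  # a single element is not an association
            by_size[len(labels)].append(purity(labels))
    return {k: sum(v) / len(v) for k, v in sorted(by_size.items())}


# Toy data (hypothetical): each list holds the dominant TF of every
# genomic element found together in one k-mer association.
toy_clusters = [
    ["A", "A"],
    ["B", "B"],
    ["A", "A", "B"],
    ["A", "B", "A", "B"],
    ["A", "A", "B", "B"],
]

result = mean_purity_by_size(toy_clusters)
for k, p in result.items():
    print(f"k={k}: mean purity {p:.2f}")
```

      Plotting mean purity against k for real data would show whether TF composition changes systematically with association size. A real analysis would also need a null model that accounts for the overall TF frequencies.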

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo's group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these contacts, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and these data sets are not even acknowledged. I also strongly suggest analyzing these data or, if they are already sufficiently analyzed, discussing them in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors' predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors' key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to its full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      RNA promotes the formation of spatial compartments in the nucleus
      https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      Complex multi-enhancer contacts captured by genome architecture mapping
      https://www.nature.com/articles/nature21411

      Cell-type specialization is encoded by specific chromatin topologies
      https://www.nature.com/articles/s41586-021-04081-2

      Super-enhancer interactomes from single cells link clustering and transcription
      https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support the cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, the groups of Ben Sabari and Mustafa Mir in particular have presented astonishing results that seem highly relevant to the Discussion of this manuscript.

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and "stressed cells". This distinction was initially established by Ibrahim Cisse's lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      Mediator and RNA polymerase II clusters associate in transcription-dependent condensates
      https://www.science.org/doi/10.1126/science.aar4199

      Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering
      https://www.science.org/doi/10.1126/sciadv.aay6515

      RNA polymerase II clusters form in line with surface condensation on regulatory chromatin
      https://www.embopress.org/doi/full/10.15252/msb.202110272

      If "morphology" should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper:

      Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin
      https://www.science.org/doi/10.1126/science.ade5308

      (5) The statement "scripts are available upon request" is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper must be deposited as a publicly accessible Supplement with the article. It would be better if they were structured and sufficiently documented and then deposited in external repositories that are appropriate for sharing such program code and models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The revised manuscript contains new results and additional text. Major revisions:

      (1) Additional simulations and analyses of networks with different biophysical parameters and with identical time constants for E and I neurons (Methods, Supplementary Fig. 5).

      (2) Additional simulations and analyses of networks with modifications of connectivity parameters to further analyze effects of E/I assemblies on manifold geometry (Supplementary Fig. 6).

      (3) Analysis of synaptic current components (Figure 3 D-F; to analyze mechanism of modest amplification in Tuned networks). 

      (4) More detailed explanation of pattern completion analysis (Results).

      (5) Analysis of classification performance of Scaled networks (Supplementary Fig.8).

      (6) Additional analysis (Figure 5D-F) and discussion (particularly section “Computational functions of networks with E/I assemblies”) of functional benefits of continuous representations in networks with E-I assemblies. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Meissner-Bernard et al. present a biologically constrained model of a telencephalic area of adult zebrafish, an area homologous to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, such as the head direction circuits in flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitatory-inhibitory subnetworks.

      Strengths: 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biologically constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Weaknesses: 

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike in biological networks), one would expect a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? A mechanistic account and analysis of these results are missing in the current version of the paper.

      We agree that further mechanistic insights would be of interest and addressed this issue at different levels:

      (1) Biophysical parameters: to determine whether network behavior depends on specific choices of biophysical parameters in E and I neurons we equalized biophysical parameters across neuron types. The main observations are unchanged, suggesting that the observed effects depend primarily on network connectivity (see also response to comment [2]).

      (2) Mechanism of modest amplification in E/I assemblies: analyzing the different components of the synaptic currents demonstrates that the modest amplification of activity in Tuned networks results from an “imperfect” balance of recurrent excitation and inhibition within assemblies (see new Figures 3D-F and text p.7). Hence, E/I co-tuning substantially reduces the net amplification in Tuned networks as compared to Scaled networks, thus preventing discrete attractor dynamics and stabilizing network activity, but a modest amplification still occurs, consistent with biological observations.

      (3) Representational geometry: to obtain insights into the network mechanisms underlying effects of E/I assemblies on the geometry of population activity we tested the hypothesis that geometrical changes depend, at least in part, on the modest amplification of activity within E/I assemblies (see Supplementary Figure 6). We changed model parameters to either prevent the modest amplification in Tuned networks (increasing I-to-E connectivity within assemblies) or introduce a modest amplification in subsets of neurons by other mechanisms (concentration-dependent increase in the excitability of pseudo-assembly neurons; Scaled I networks with reduced connectivity within assemblies). Manipulations that introduced a modest, input-dependent amplification in neuronal subsets had geometrical effects similar to those observed in Tuned networks, whereas manipulations that prevented a modest amplification abolished these effects (Supplementary Figure 6). Note however that these manipulations generated different firing rate distributions. These results provide a starting point for more detailed analyses of the relationship between network connectivity and representational geometry (see p.12).

      In summary, our additional analyses indicate that effects of E/I assemblies on representational geometry depend primarily on network connectivity, rather than specific biophysical parameters, and that the resulting modest amplification of activity within assemblies makes an important contribution. Further analyses may reveal more specific relationships between E/I assemblies and representational geometry, but such analyses are beyond the scope of this study.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all of them. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibitory ones: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if these, and other unconstrained, parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      We thank the reviewer for raising this point. We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. Nevertheless, to assess whether network behavior depends on specific choices of biophysical parameters in E and I neurons, we have performed additional simulations with equal synaptic time constants and equal biophysical parameters for all neurons. Each neuron also received the same number of inputs from each population (see revised Methods). Results were similar to those observed previously (Supplementary Fig.5 and p.9 of main text). We therefore conclude that the main effects observed in Tuned networks cannot be explained by differences in biophysical parameters between E and I neurons but are primarily a consequence of network connectivity.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function. 

      In the previous manuscript, the analysis of potential computational benefits other than pattern classification was limited and the discussion of this issue was condensed into a single itemized paragraph to avoid excessive speculation. Although a thorough analysis of potential computational benefits exceeds the scope of a single paper, we agree with the reviewer that this issue is of interest and therefore added additional analyses and discussion.

      In the initial manuscript we analyzed pattern classification primarily to investigate whether Tuned networks can support this function at all, given that they do not exhibit discrete attractor states. We found this to be the case, which we consider a first important result.

      Furthermore, we found that precise balance of E/I assemblies can protect networks against catastrophic firing rate instabilities when assemblies are added sequentially, as in continual learning. Results from these simulations are now described and discussed in more detail (see Results p.11 and Discussion p.13).

      In the revised manuscript, we now also examine additional potential benefits of Tuned networks and discuss them in more detail (see new Figure 5D-F and text p.11). One hypothesis is that continuous representations provide a distance metric between a given input and relevant (learned) stimuli. To address this hypothesis, we (1) performed regression analysis and (2) trained support vector machines (SVMs) to predict the concentration of a given odor in a mixture based on population activity. In both cases, Tuned E+I networks outperformed Scaled and _rand_ networks in predicting the concentration of learned odors across a wide range of mixtures (Figure 5D-F). E/I assemblies therefore support the quantification of learned odors within mixtures or, more generally, assessments of how strongly a (potentially complex) input is related to relevant odors stored in memory. Such a metric assessment of stimulus quality is not well supported by discrete attractor networks because inputs are mapped onto discrete network states.
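
      The idea of reading out a concentration from population activity can be illustrated with a deliberately simplified sketch. It is not the regression or SVM analysis described above: the toy population (50 neurons with two non-overlapping activity patterns), the noise level, and the one-dimensional least-squares readout along the learned odor's pattern are all assumptions made purely for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)
n_neurons = 50
# Hypothetical activity patterns: the learned odor drives neurons 0-9,
# the other mixture component drives neurons 10-19.
learned = [1.0 if i < 10 else 0.0 for i in range(n_neurons)]
other = [1.0 if 10 <= i < 20 else 0.0 for i in range(n_neurons)]


def population_response(c):
    """Toy activity for a mixture containing fraction c of the learned odor."""
    return [c * a + (1 - c) * b + rng.gauss(0, 0.02)
            for a, b in zip(learned, other)]


# Fit the readout on one set of mixtures ...
train_c = [i / 10 for i in range(11)]
train_x = [dot(population_response(c), learned) for c in train_c]
slope, intercept = fit_line(train_x, train_c)

# ... and predict the concentration for mixtures not used in the fit.
test_c = [0.25, 0.55, 0.85]
pred_c = [slope * dot(population_response(c), learned) + intercept
          for c in test_c]
max_error = max(abs(p - t) for p, t in zip(pred_c, test_c))
print(f"largest prediction error: {max_error:.3f}")
```

      In this toy example the readout works because activity varies continuously with concentration. If responses were instead mapped onto a few discrete states, the projection would no longer vary smoothly with concentration.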

      The observation that Tuned networks do not map inputs onto discrete outputs indicates that such networks do not classify inputs as distinct items. Nonetheless, the observed geometrical modifications of continuous representations support the classification of learned inputs or the assessment of metric relationships by hypothetical readout neurons. Geometrical modifications of odor representations may therefore serve as one of multiple steps in multi-layer computations for pattern classification (and/or other computations). In this scenario, the transformation of odor representations in Dp may be seen as related to transformations of representations between different layers in artificial networks, which collectively perform a given task (notwithstanding obvious structural and mechanistic differences between artificial and biological networks). In other words, geometrical transformations of representations in Tuned networks may overrepresent learned (relevant) information at the expense of other information and thereby support further learning processes in other brain areas. An obvious corollary of this scenario is that Dp does not perform odor classification per se based on inputs from the olfactory bulb but reformats representations of odor space based on experience to support computational tasks as part of a larger system. This scenario is now explicitly discussed (p.14).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks. 

      Strengths: 

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. In particular, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper.

      Weaknesses: 

      that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks. 

      We have now quantified classification by networks with discrete attractor dynamics (Scaled) along with other networks. However, because the neuronal covariance matrix for such networks is low rank and not invertible, pattern classification cannot be analyzed by QDA as in Figure 5B. We therefore classified patterns from the odor subspace by template matching, assigning test patterns to one of the four classes based on correlations (see Supplementary Figure 8). As expected, Scaled networks performed well, but they did not outperform Tuned networks. Moreover, the performance of Scaled networks, but not Tuned networks, depended on the order in which odors were presented to the network. This hysteresis effect is a direct consequence of persistent attractor states and decreased the general classification performance of Scaled networks (see Supplementary Figure 8 for details). These results confirm the prediction that networks with discrete attractor states can efficiently classify inputs, but also reveal disadvantages arising from attractor dynamics. Moreover, the results indicate that the classification performance of Tuned networks is also high under the given task conditions, which simulate a biologically realistic scenario.
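
      Template matching based on correlations can be sketched as follows. The templates, the test pattern, and the function names are hypothetical; how templates were constructed in the actual analysis is not specified here, so this shows only the general idea of assigning a pattern to the class whose template it correlates with most strongly.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def classify(pattern, templates):
    """Return the label of the template with the highest correlation."""
    return max(templates, key=lambda label: pearson(pattern, templates[label]))


# Hypothetical class templates (e.g. mean firing rates of six neurons).
templates = {
    "odor_1": [5, 1, 0, 0, 1, 0],
    "odor_2": [0, 4, 5, 1, 0, 0],
    "odor_3": [0, 0, 1, 5, 4, 0],
    "odor_4": [1, 0, 0, 0, 3, 5],
}

test_pattern = [0, 3, 4, 2, 1, 0]
label = classify(test_pattern, templates)
print(label)
```

      Unlike QDA, this procedure does not require inverting a covariance matrix, so it remains applicable when the covariance matrix is low rank.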

      We would also like to emphasize that classification may not be the only task, and perhaps not even a main task, of Dp/piriform cortex or other memory networks with E/I assemblies. Conceivably, other computations could include metric assessments of inputs relative to learned inputs or additional learning-related computations. Please see our response to comment (3) of reviewer 1 for a further discussion of this issue. 

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).  

      We thank the reviewer for this comment. We now explicitly discuss multistability (see p. 12) and refer to additional references in the statements addressing network stability.

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks. (similar to Figure 1B). 

      We thank the reviewer for this suggestion. We have added raster plots of responses to both familiar and novel inputs in the revised manuscript (Figure 2D and Supplementary Figure 4A).

      Reviewer #3 (Public Review): 

      Summary: 

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli. 

      Strengths: 

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies exploring the functional advantage of these locally defined manifolds, and how other network properties shape those manifolds.

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor stimulus triggered spiking activity is within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions. 

      Weaknesses: 

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. For me, this sounds less like the network can perform pattern completion but it can nonlinearly increase the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion. 

      We believe that this comment concerns a semantic misunderstanding and apologize for any lack of clarity. We added a definition of pattern completion in the text: “…the retrieval of the whole memory from noisy or corrupted versions of the learned input.” Pattern completion may be assessed using different procedures. In computational studies, it is often analyzed by delivering input to a subset of the assembly neurons which store a given memory (partial activation). Under these conditions, we find recruitment of the entire assembly in all structured networks, as demonstrated in Supplementary Figure 3. However, these conditions are unlikely to occur during odor presentation because the majority of neurons do not receive any input.

      Another more biologically motivated approach to assess pattern completion is to gradually modify a realistic odor input into a learned input, thereby gradually increasing the overlap between the two inputs. This approach had been used previously in experimental studies (references added to the text p.6). In the presence of assemblies, recurrent connectivity is expected to recruit assembly neurons (and thus retrieve the stored pattern) more efficiently as the learned pattern is approached. This should result in a nonlinear increase in the similarity between the evoked and the learned activity pattern. This signature was prominent in Scaled networks but not in Tuned or rand networks. Obviously, the underlying procedure is different from the partial activation of the assembly described above because input patterns target many neurons (including neurons outside assemblies) and exhibit a biologically realistic distribution of activity. However, this approach has also been referred to as “pattern completion” in the neuroscience literature, which may be the source of semantic confusion here. To clarify the difference between these approaches we have now revised the text and explicitly described each procedure in more detail (see p.6). 

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states. 

      We agree that a more rigorous comparison of Tuned and Scaled networks would be of interest. We have added the variance analysis (Fig. 4G) and angle distributions (Fig. 4H) for both Tuned I and Scaled networks. However, the Mahalanobis distances and Quadratic Discriminant Analysis cannot be applied to Scaled networks because their neuronal covariance matrix is low rank and not invertible. To nevertheless compare these networks, we performed template matching by assigning test patterns to one of the four odor classes based on correlations to template patterns (Supplementary Figure 8; see also response to the first comment of reviewer 2). Interestingly, Scaled networks performed well at classification but did not outperform Tuned networks, and exhibited disadvantages arising from attractor dynamics (Supplementary Figure 8; see also response to the first comment of reviewer 2). Furthermore, in further analyses we found that continuous representational manifolds support metric assessments of inputs relative to learned odors, which cannot be achieved by discrete representations. These results are now shown in Figure 5D-E and discussed explicitly in the text on p.11 (see also response to comment 3 of reviewer 1).
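
      The template-matching procedure mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the synthetic templates, the noise level, and the function name `template_match` are assumptions made for illustration only. Each test pattern is assigned to the class whose template has the highest Pearson correlation with it, which requires no covariance inversion.

```python
import numpy as np

def template_match(test_patterns, templates):
    """Assign each test pattern to the template with the highest Pearson correlation.

    test_patterns: (n_tests, n_neurons); templates: (n_classes, n_neurons).
    Returns predicted class indices and the full correlation matrix.
    """
    def zscore(x):
        # z-score each row so that a scaled dot product equals Pearson's r
        x = x - x.mean(axis=1, keepdims=True)
        return x / x.std(axis=1, keepdims=True)

    zt = zscore(np.asarray(test_patterns, dtype=float))
    zc = zscore(np.asarray(templates, dtype=float))
    corr = zt @ zc.T / zt.shape[1]
    return corr.argmax(axis=1), corr

# Synthetic data (hypothetical): four odor classes, noisy test patterns.
rng = np.random.default_rng(0)
n_classes, n_neurons = 4, 200
templates = rng.gamma(shape=2.0, scale=1.0, size=(n_classes, n_neurons))
true_labels = np.repeat(np.arange(n_classes), 25)
tests = templates[true_labels] + rng.normal(0.0, 0.5, size=(100, n_neurons))

pred, corr = template_match(tests, templates)
accuracy = (pred == true_labels).mean()
print(f"4-way template-matching accuracy: {accuracy:.2f}")
```

      Because the decision relies only on correlations, the sketch remains applicable when the covariance matrix is rank deficient, which is the situation described for Scaled networks.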

      We preferred not to add a sentence in the Discussion about benefits of networks having both sources of inhibition, as we find this a bit too speculative.

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.  

      Thank you for this comment. We have provided additional data figures to support the following statements:

      “d<sub>M</sub> was again increased upon learning, particularly between learned odors and reference classes representing other odors (Supplementary Figure 9)”

      “decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (Supplementary Figure 6 B)”  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing. 

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks. 

      The paper is generally well-written, the figures are informative and of good quality, and multiple approaches and metrics have been used to test and support the main results of the paper. 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation. 

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper. 

      Precise balancing of excitation and inhibition in subnetworks would lead to the cancellation of specific dynamical modes responsible for the amplification of responses (hence, deviating from the attractor dynamics with an unstable specific mode). What is the key difference in the specific E/I networks here (tuned I or/and tuned E+I) which make them stand between random and attractor networks? Excitatory and inhibitory neurons have different parameters in the model (Table 1). Time constants of inhibitory and excitatory synapses are also different (P. 13). Are these parameters causing networks to be effectively more excitation dominated (hence deviating from a random spectrum which would be expected from a precisely balanced E/I network, with exactly the same parameters of E and I neurons)? It is necessary to analyse the network models, describe the key mechanism for their amplification, and pinpoint the key differences between E and I neurons which are crucial for this. 

      To address these comments we performed additional simulations and analyses at different levels. Please see our reply to comment (1) of the public review (reviewer 1) for a detailed description. We thank the reviewer for these constructive comments.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales are (with E synaptic time constants being larger than inhibition: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if they are varying these - and other unconstrained - parameters? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.  

      We thank the reviewer for this comment. We have now carried out additional simulations with equal time constants for all neurons. Please see our reply to the public review for more details (comment 2 of reviewer 1).

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

      Please see our reply to the public review (comment 3 of reviewer 1).

      Specific comments: 

      Abstract: "resulting in continuous representations that reflected both relatedness of inputs and *an individual's experience*" 

      It didn't become apparent from the text or the model where the role of "individual's experience" component (or "internal representations" - in the next line) was introduced or shown (apart from a couple of lines in the Discussion) 

      We consider the scenario that assemblies are the outcome of an experience-dependent plasticity process. To clarify this, we have now made a small addition to the text: “Biological memory networks are thought to store information by experience-dependent changes in the synaptic connectivity between assemblies of neurons.”

      P. 2: "The resulting state of "precise" synaptic balance stabilizes firing rates because inhomogeneities or fluctuations in excitation are tracked by correlated inhibition" 

      It is not clear what the "inhomogeneities" specifically refers to - they can be temporal, or they can refer to the quenched noise of connectivity, for instance. Please clarify what you mean. 

      The statement has been modified to be more precise: “…“precise” synaptic balance stabilizes firing rates because inhomogeneities in excitation across the population or temporal variations in excitation are tracked by correlated inhibition…”.

      P. 3 (and Methods): When odour stimulus is simulated in the OB, the activity of a fraction of mitral cells is increased (10% to 15 Hz) - but also a fraction of mitral cells is suppressed (5% to 2 Hz). What is the biological motivation or reference for this? It is not provided. Is it needed for the results? Also, it is not explained how the suppressed 5% are chosen (e.g. randomly, without any relation to the increased cells?). 

      We thank the reviewer for this comment. These changes in activity directly reflect experimental observations. We apologize that we forgot to include the references reporting these observations (Friedrich and Laurent, 2001 and 2004); this is now fixed.

      In our simulation, OB neurons do not interact with each other, and the suppressed 5% were indeed randomly selected. We changed the text in Methods accordingly to read: “An additional 75 randomly selected mitral cells were inhibited” 
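
      As a hedged illustration of the stimulus description above (not the authors' implementation), an odor can be sketched as a vector of mitral cell firing rates. The population size of 1500 is inferred from "75 cells = 5%"; the baseline rate `BASELINE_HZ` is a hypothetical placeholder because it is not stated in this passage; drawing the inhibited cells disjointly from the activated ones is an assumption based on the word "additional".

```python
import numpy as np

N_MITRAL = 1500      # inferred: 75 inhibited cells correspond to 5%
N_ACTIVATED = 150    # 10% of mitral cells, raised to 15 Hz
N_INHIBITED = 75     # 5% of mitral cells, lowered to 2 Hz
BASELINE_HZ = 6.0    # hypothetical baseline rate (assumption, not from the text)

def make_odor(rng):
    """Return a firing-rate vector plus the activated and inhibited cell indices."""
    rates = np.full(N_MITRAL, BASELINE_HZ)
    # One draw without replacement keeps the two subsets disjoint (assumption).
    chosen = rng.choice(N_MITRAL, size=N_ACTIVATED + N_INHIBITED, replace=False)
    activated, inhibited = chosen[:N_ACTIVATED], chosen[N_ACTIVATED:]
    rates[activated] = 15.0
    rates[inhibited] = 2.0
    return rates, activated, inhibited

rng = np.random.default_rng(1)
rates, activated, inhibited = make_odor(rng)
```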

      P. 4, L. 1-2: "... sparsely connected integrate-and-fire neurons with conductance-based synapses (connection probability ≤5%)."

      Specify the connection probability of specific subtypes (EE, EI, IE, II).  

      We now refer to the Methods section, where this information can be found. 

      “... conductance-based synapses (connection probability ≤5%, Methods)”  

      P. 4, L. 6-7: "Population activity was odor-specific and activity patterns evoked by uncorrelated OB inputs remained uncorrelated in Dp (Figure 1H)" 

      What would happen to correlated OB inputs (e.g. as a result of mixture of two overlapping odours) in this baseline state of the network (before memories being introduced to it)? It would be good to know this, as it sheds light on the initial operating regime of the network in terms of E/I balance and decorrelation of inputs.  

      This information was present in the original manuscript (Figure 3), but we improved the writing to further clarify this issue: “(…) we morphed a novel odor into a learned odor (Figure 3A), or a learned odor into another learned odor (Supplementary Figure 3B), and quantified the similarity between morphed and learned odors by the Pearson correlation of the OB activity patterns (input correlation). We then compared input correlations to the corresponding pattern correlations among E neurons in Dp (output correlation). In rand networks, output correlations increased linearly with input correlations but did not exceed them (Figure 3B and Supplementary Figure 3B)”

      P. 4, L. 12-13: "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, ..."

      Where is this shown?

      (There are other occasions too in the paper where references to the supporting figures are missing). 

      We now provide the statistics: “Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20”

      P. 4: "In each network, we created 15 assemblies representing uncorrelated odors. As a consequence, ~30% of E neurons were part of an assembly ..." 

      15 x 100 / 4000 = 37.5% - so it's closer to 40% than 30%. Unless there is some overlap? 

      Yes: despite odors being uncorrelated and connectivity being random, some neurons (6 % of E neurons) belong to more than one assembly.
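
      The arithmetic behind this exchange can be checked with a short sketch. It assumes, purely for illustration, that each assembly is an independent uniform draw of 100 neurons out of 4000 E neurons; the authors' actual selection procedure may differ, so the numbers are only expected to be of the same order as those quoted above.

```python
import numpy as np

N_E, N_ASSEMBLIES, ASSEMBLY_SIZE = 4000, 15, 100
p = ASSEMBLY_SIZE / N_E  # chance that a given neuron is in a given assembly

# Closed-form expectations under independent uniform sampling (assumption).
frac_any = 1 - (1 - p) ** N_ASSEMBLIES
frac_multi = frac_any - N_ASSEMBLIES * p * (1 - p) ** (N_ASSEMBLIES - 1)

# Monte Carlo check of the same quantities.
rng = np.random.default_rng(0)
counts = np.zeros(N_E, dtype=int)
for _ in range(N_ASSEMBLIES):
    counts[rng.choice(N_E, size=ASSEMBLY_SIZE, replace=False)] += 1
sim_any = (counts >= 1).mean()
sim_multi = (counts >= 2).mean()

print(f"in >=1 assembly: analytic {frac_any:.3f}, simulated {sim_any:.3f}")
print(f"in >=2 assemblies: analytic {frac_multi:.3f}, simulated {sim_multi:.3f}")
```

      Under this assumption roughly 32% of E neurons belong to at least one assembly and about 5% to more than one, which is consistent with the "~30%" in the manuscript and close to the 6% overlap reported in the reply.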

      P. 4: "When α reached a critical value of ~6, networks became unstable and generated runaway activity (Figure 2B)."

      Can this transition point be calculated or estimated from the network parameters, and linked to the underlying mechanisms causing it? 

      We thank the reviewer for this interesting question. The instability arises when inhibition fails to efficiently counterbalance the increased recurrent excitation within Dp. The transition point is difficult to estimate because it can depend on several parameters, including the probability and strength of E-to-E connections and assembly size, among others. We have therefore not attempted to estimate it analytically.

      P. 4: "Hence, non-specific scaling of inhibition resulted in a divergence of firing rates that exhausted the dynamic range of individual neurons in the population, implying that homeostatic global inhibition is insufficient to maintain a stable firing rate distribution."

      I don't think this is justified based on the results and figures presented here (Fig. 2E) - the interpretation is a bit strong and biased towards the conclusions the authors want to draw. 

      To more clearly illustrate the finding that in Scaled networks, assembly neurons are highly active (close to maximal realistic firing rates) whereas non-assembly neurons are nearly silent we have now added Supplementary Fig. 2B. Moreover, we have toned down the text: “Hence, non-specific scaling of inhibition resulted in a large and biologically unrealistic divergence of firing rates (Supplementary Figure 2B) that nearly exhausted the dynamic range of individual neurons in the population, indicating that homeostatic global inhibition is insufficient to maintain a stable firing rate distribution”

      P. 5, third paragraph: Description of Figure 2I, inset is needed, either in the text or caption. 

      The inset is now referred to in the text: “we projected synaptic conductances of each neuron onto a line representing the E/I ratio expected in a balanced network (“balanced axis”) and onto an orthogonal line (“counter-balanced axis”; Figure 2I inset, Methods).”
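
      The projection described in this reply can be sketched as follows. This is a minimal geometric illustration, not the authors' analysis code: the function name `project_balance`, the parameter `ratio` (the expected inhibitory-to-excitatory conductance ratio), and the absence of any centering step are all assumptions.

```python
import numpy as np

def project_balance(g_exc, g_inh, ratio):
    """Project (g_exc, g_inh) pairs onto a balanced and an orthogonal axis.

    ratio: hypothetical expected g_inh / g_exc in a balanced network.
    """
    norm = np.hypot(1.0, ratio)
    balanced_axis = np.array([1.0, ratio]) / norm    # direction of the balance line
    counter_axis = np.array([-ratio, 1.0]) / norm    # orthogonal direction
    g = np.column_stack([g_exc, g_inh])
    return g @ balanced_axis, g @ counter_axis

# A neuron lying exactly on the balance line has no counter-balanced component.
ratio = 1.5
g_exc = np.array([2.0, 2.0])
g_inh = np.array([3.0, 4.0])   # first neuron balanced, second with excess inhibition
along, across = project_balance(g_exc, g_inh, ratio)
```

      In this sketch, deviations from balance appear as non-zero values of `across`, while `along` measures the overall magnitude of balanced synaptic input.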

      P. 5, last paragraph: another example of writing about results without showing/referring to the corresponding figures: 

      "In rand networks, firing rates increased after stimulus onset and rapidly returned to a low baseline after stimulus offset. Correlations between activity patterns evoked by the same odor at different time points and in different trials were positive but substantially lower than unity, indicating high variability ..." 

      And the continuation with similar lack of references on P. 6: 

      "Scaled networks responded to learned odors with persistent firing of assembly neurons and high pattern correlations across trials and time, implying attractor dynamics (Hopfield, 1982; Khona and Fiete, 2022), whereas Tuned networks exhibited transient responses and modest pattern correlations similar to rand networks." 

      Please go through the Results and fix the references to the corresponding figures on all instances. 

      We thank the reviewer for pointing out these overlooked figure references, which are now fixed.

      P. 8: "These observations further support the conclusion that E/I assemblies locally constrain neuronal dynamics onto manifolds." 

      As discussed in the general major points, mechanistic explanation in terms of how the interaction of E/I dynamics leads to this is missing. 

      As discussed in the reply to the public review (comment 3 of reviewer 1), we have now provided more mechanistic analyses of our observations.

      P. 9: "Hence, E/I assemblies enhanced the classification of inputs related to learned patterns."

      The effect seems to be very small. Also, any explanation for why for low test-target correlation the effect is negative (random doing better than tuned E/I)?

      The size of the effect (p<sub>learned</sub> – p<sub>novel</sub> = 0.074; difference of means; Figure 5C) may appear small in terms of absolute probability, but it is substantial relative to the maximum possible increase (1 – p<sub>novel</sub> = 0.133; Figure 5C). The fact that for low test-target correlations the effect is negative is a direct consequence of the positive effect for high test-target correlations and the presence of 2 learned odors in the 4-way forced choice task.
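
      The relative effect size implied by these two numbers can be made explicit with a short calculation. Only the two values quoted in the reply are used; the variable names are illustrative.

```python
p_gain = 0.074     # p_learned - p_novel, as quoted in the reply (Figure 5C)
headroom = 0.133   # 1 - p_novel, as quoted in the reply (Figure 5C)

p_novel = 1.0 - headroom
p_learned = p_novel + p_gain
relative_gain = p_gain / headroom   # share of the possible improvement achieved

print(f"p_novel = {p_novel:.3f}, p_learned = {p_learned:.3f}")
print(f"relative gain = {relative_gain:.2f}")
```

      In other words, learning closes a little over half of the gap between the baseline classification probability and perfect classification.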

      P. 9: "In Scaled I networks, creating two additional memories resulted in a substantial increase in firing rates, particularly in response to the learned and related odors"

      Where is this shown? Please refer to the figure.

      We thank the reviewer again for pointing this out. We forgot to include a reference to the relevant figure which has now been added in the revised manuscript (Figure 6C).

      P. 10: "The resulting Tuned networks reproduced additional experimental observations that were not used as constraints including irregular firing patterns, lower output than input correlations, and the absence of persistent activity" 

      It is difficult to present these as "additional experimental observations", as all of them are negative, and can exist in random networks too - hence cannot be used as biological evidence in favour of specific E/I networks when compared to random networks. 

      We agree with the reviewer that these additional experimental observations cannot be used as biological evidence favouring Tuned E+I networks over random networks. We merely wanted to point out that additional observations, which were not used to fit the model, do not invalidate the existence of E-I assemblies in biological networks. As assemblies tend to result in persistent activity in other types of networks, we feel that this observation is worth pointing out.

      Methods: 

      P. 13: Describe the parameters of Eq. 2 after the equation. 

      Done.

      P. 13: "The time constants of inhibitory and excitatory synapses were 10 ms and 30 ms, respectively." 

      What is the (biological) justification for the choice of these parameters? 

      How would varying them affect the main results (e.g. local manifolds)? 

      We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. We have now also simulated networks with equal time constants for excitatory and inhibitory synapses and equal biophysical parameters for excitatory and inhibitory neurons, which did not affect the main results (see also reply to the public review: comment 2 of reviewer 1).

      P. 14: "Care was also taken to ensure that the variation in the number of output connections was low across neurons"

      How exactly?

      More detailed explanations have now been added in the Methods section: “connections of a presynaptic neuron y to postsynaptic neurons x were randomly deleted when their total number exceeded the average number of output connections by ≥5%, or added when they were lower by ≥5%.”
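
      The procedure quoted above might look roughly like the following sketch. It is an illustration under several assumptions that the quoted sentence does not specify: connections are stored as a boolean matrix `conn[post, pre]`, the average out-degree is computed once before any changes, out-degrees are adjusted just far enough to fall within the 5% band, and self-connections are excluded.

```python
import numpy as np

def equalize_out_degree(conn, tol=0.05, rng=None):
    """Randomly delete or add outputs so each out-degree lies within tol of the mean."""
    rng = np.random.default_rng() if rng is None else rng
    conn = conn.copy()
    mean_out = conn.sum(axis=0).mean()            # average number of output connections
    upper = int(np.floor(mean_out * (1 + tol)))
    lower = int(np.ceil(mean_out * (1 - tol)))
    for pre in range(conn.shape[1]):
        targets = np.flatnonzero(conn[:, pre])
        if targets.size > upper:
            # Too many outputs: delete randomly chosen connections.
            drop = rng.choice(targets, size=targets.size - upper, replace=False)
            conn[drop, pre] = False
        elif targets.size < lower:
            # Too few outputs: add connections to randomly chosen new targets.
            free = np.flatnonzero(~conn[:, pre])
            free = free[free != pre]              # no self-connections (assumption)
            add = rng.choice(free, size=lower - targets.size, replace=False)
            conn[add, pre] = True
    return conn

rng = np.random.default_rng(3)
n = 500
conn = rng.random((n, n)) < 0.05                  # sparse random connectivity
np.fill_diagonal(conn, False)
balanced = equalize_out_degree(conn, rng=rng)
out_degree = balanced.sum(axis=0)
```

      The sketch only constrains output connections; any additional constraints on input connections in the actual model are outside its scope.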

      Reviewer #2 (Recommendations For The Authors): 

      Congratulations on the great and interesting work! The results were nicely presented and the idea of continuous encoding on manifolds is very interesting. To improve the quality of the paper, in addition to the major points raised in the public review, here are some more detailed comments for the paper: 

      (1) Generally, citations have to improve. Spiking networks with excitatory assemblies and different architectures of inhibitory populations have been studied before, and the claim about improved network stability in co-tuned E-I networks has been made in the following papers that need to be correctly cited: 

      • Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W. 2011. Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science 334:1-7. doi:10.1126/science.1212991 (mentions that emerging precise balance on the synaptic weights can result in the overall network stability) 

      • Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 (among other things, contrasts stability and competition which arises from multistable networks with global inhibition and reciprocal inhibition)

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7 (compares different architectures of inhibition and their effects on network dynamics)

      • Lagzi F, Fairhall A. 2022. Tuned inhibitory firing rate and connection weights as emergent network properties. bioRxiv 2022.04.12.488114. doi:10.1101/2022.04.12.488114 (here, see the eigenvalue and UMAP analysis for a network with global inhibition and E/I assemblies) 

      Additionally, there are lots of pioneering work about tracking of excitatory synaptic inputs by inhibitory populations, that are missing in references. Also, experimental work that show existence of cell assemblies in the brain are largely missing. On the other hand, some references that do not fit the focus of the statements have been incorrectly cited. 

      The authors may consider referencing the following more pertinent studies on spiking networks to support the statement regarding attractor dynamics in the first paragraph in the Introduction (the current citations of Hopfield and Kohonen are for rate-based networks): 

      • Wong, K.-F., & Wang, X.-J. (2006). A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26(4), 1314-1328. https://doi.org/10.1523/JNEUROSCI.3733-05.2006 

      • Wang, X.-J. (2008). Decision making in recurrent neuronal circuits. Neuron, 60(2), 215-234. https://doi.org/10.1016/j.neuron.2008.09.034  

      • F. Lagzi, & S. Rotter. (2015). Dynamics of competition between subnetworks of spiking neuronal networks in the balanced state. PloS One. 

      • Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14(3), 477-485. 

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7. 

      • Amit DJ, Tsodyks M (1991) Quantitative study of attractor neural network retrieving at low spike rates: I. substrate-spikes, rates and neuronal gain. Network 2:259-273. 

      • Mazzucato, L., Fontanini, A., & La Camera, G. (2015). Dynamics of Multistable States during Ongoing and Evoked Cortical Activity. Journal of Neuroscience, 35(21), 8214-8231. 

      We thank the reviewer for the reference suggestions. We have carefully reviewed the reference list and made the following changes, which we hope address the reviewer’s concerns:

      (1) We adjusted references about network stability in co-tuned E-I networks.

      (2) We added the Lagzi & Rotter (2015), Amit & Tsodyks (1991), Mazzucato et al. (2015) and Goldman-Rakic (1995) papers in the Introduction as studies on attractor dynamics in spiking neural networks. We preferred to omit the two X.-J. Wang papers, as they describe attractors in decision making rather than memory processes.

      (3) We added the Ko et al. 2011 paper as experimental evidence for assemblies in the brain. In our view, there are few experimental studies showing the existence of cell assemblies in the brain, which we distinguish from cell ensembles, i.e., groups of coactive neurons.

      (4) We also included Hennequin 2018, Brunel 2000, Lagzi et al. 2021 and Eckmann et al. 2024, which we had not cited in the initial manuscript.

      (5) We removed the Wiechert et al. 2010 reference as it does not support the statement about geometry-preserving transformation by random networks.

      (2) The gist of the paper is about how the architecture of inhibition (reciprocal vs. global in this case) can determine network stability and salient responses (related to multistable attractors and variations) for classification purposes. It would improve the narrative of the paper if this point is raised in the Introduction and Discussion section. Also see a relevant paper that addresses this point here: 

      Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 

      Classification has long been proposed to be a function of piriform cortex and autoassociative memory networks in general, and we consider it important. However, the computational function of Dp or piriform cortex is still poorly understood, and we do not focus only on odor classification as a possibility. In fact, continuous representational manifolds also support other functions such as the quantification of distance relationships of an input to previously memorized stimuli, or multi-layer network computations (including classification). In the revised manuscript, we have performed additional analyses to explore these notions in more detail, as explained above (response to public reviews, comment 3 of reviewer 1). Furthermore, we have now expanded the discussion of potential computational functions of Tuned networks and explicitly discuss classification but also other potential functions. 

      (3) A plot for the values of the inhibitory conductances in Figure 1 would complete the analysis for that section. 

      In Figure 1, we decided to only show the conductances that we use to fit our model, namely the afferent and total synaptic conductances. As the values of the inhibitory conductances can be derived from panel E, we refrained from plotting them separately for the sake of simplicity. 

      (4) How did the authors calculate correlations between activity patterns as a function of time in Figure 2E, bottom row? Does the color represent correlation coefficient (which should not be time dependent) or is it a correlation function? This should be explained in the Methods section. 

      The color represents the Pearson correlation coefficient between activity patterns within a narrow time window (100 ms). We updated the Figure legend to clarify this: “Mean correlation between activity patterns evoked by a learned odor at different time points during odor presentation. Correlation coefficients were calculated between pairs of activity vectors composed of the mean firing rates of E neurons in 100 ms time bins. Activity vectors were taken from the same or different trials, except for the diagonal, where only patterns from different trials were considered.”
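
      The computation described in this legend can be sketched as below. This is an illustrative reading of the text, not the authors' code: firing rates are assumed to be already binned into an array of shape (trials, time bins, neurons), and the synthetic data are hypothetical.

```python
import numpy as np

def time_bin_correlations(rates):
    """Mean Pearson correlation between activity vectors at pairs of time bins.

    rates: (n_trials, n_bins, n_neurons) mean firing rates per time bin.
    Off-diagonal entries average over same-trial and different-trial pairs;
    diagonal entries use different-trial pairs only.
    """
    rates = np.asarray(rates, dtype=float)
    n_trials, n_bins, _ = rates.shape
    z = rates - rates.mean(axis=2, keepdims=True)
    z = z / np.linalg.norm(z, axis=2, keepdims=True)   # centred, unit-norm vectors
    # corr[a, i, b, j]: Pearson r between (trial a, bin i) and (trial b, bin j)
    corr = np.einsum('ain,bjn->aibj', z, z)
    different = ~np.eye(n_trials, dtype=bool)
    out = np.empty((n_bins, n_bins))
    for i in range(n_bins):
        for j in range(n_bins):
            pairs = corr[:, i, :, j]
            out[i, j] = pairs[different].mean() if i == j else pairs.mean()
    return out

# Synthetic example: a fixed pattern plus independent noise in every bin and trial.
rng = np.random.default_rng(2)
pattern = rng.gamma(shape=2.0, scale=1.0, size=50)
rates = pattern + rng.normal(0.0, 1.0, size=(5, 8, 50))
corr_matrix = time_bin_correlations(rates)
```

      Excluding same-trial pairs on the diagonal avoids the trivial correlation of 1 between a pattern and itself. A bin in which all neurons have identical rates would yield undefined correlations and would need separate handling.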

      (5) Figure 3 needs more clarification (both in the main text and the figure caption). It is not clear what the axes are exactly, and why the network responses for familiar and novel inputs are different. The gray shaded area in panel B needs more explanation as well.  

      We thank the reviewer for the comment. We have improved Figure 3A, the figure caption, as well as the text (see p.6). We hope that the figure is now clearer.

      (6) The "scaled I" network, known for representing input patterns in discrete attractors, should exhibit clear separation between network responses in the 2D PC space in the PCA plots. However, Figure 4D and Figure 6D do not reflect this, as all network responses are overlapped. Can the authors explain the overlap in Figure 4D? 

      In Figure 4D, activity of Scaled networks is distributed between three subregions in state space that are separated by the first 2 PCs. Two of them indeed correspond to attractor states representing the two learned odors while the third represents inputs that are not associated with these attractor states. To clarify this, please see also the density plot in Figure 4E. The few datapoints between these three subregions are likely outliers generated by the sequential change in inputs, as described in Supplementary Figure 8C.

      (7) The reason for writing about the ISN networks is not clear. Co-tuned E-I assemblies do not necessarily have to operate in this regime. Also, the results of the paper do not rely on any of the properties of ISNs, but they are more general. Authors should either show the paradoxical effect associated with ISN (i.e., if increasing input to I neurons decreases their responses) or show ISN properties using stability analysis (See computational research conducted at the Allen Institute, namely Millman et al. 2020, eLife ). Currently, the paper reads as if being in the ISN regime is a necessary requirement, which is not true. Also, the arguments do not connect with the rest of the paper and never show up again. Since we know it is not a requirement, there is no need to have those few sentences in the Results section. Also, the choice of alpha=5.0 is extreme, and therefore, it would help to judge the biological realism if the raster plots for Figs 2-6 are shown.

      We have toned down the part on ISN and reduced it to one sentence for readers who might be interested in knowing whether activity is inhibition-stabilized or not. We have also added the reference to the Tsodyks et al. 1997 paper from which we derive our stability analysis. The text now reads “Hence, pDp<sub>sim</sub> entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b, Tsodyks et al., 1997).”  

      We have now also added the raster plots as suggested by the reviewer (see Figure 2D, Supplementary Figure 1 G, Supplementary Figure 4). We thank the reviewer for this comment.

      (8) In the abstract, authors mention "fast pattern classification" and "continual learning," but in the paper, those issues have not been addressed. The study does not include any synaptic plasticity. 

      Concerning “continual learning”, we agree that we do not simulate the learning process itself. However, Figure 6 shows results of a simulation where two additional patterns were stored in a network that already contained assemblies representing other odors. We consider this a crude way of exploring the end result of a “continual learning” process. “Fast pattern classification” is mentioned because activity in balanced networks can follow fluctuating inputs with high temporal resolution, while networks with stable attractor states tend to be slow. This is likely to account for the occurrence of hysteresis effects in Scaled but not Tuned networks as shown in Supplementary Fig. 8.

      (9) In the Introduction, the first sentence in the second paragraph reads: "... when neurons receive strong excitatory and inhibitory synaptic input ...". The word strong should be changed to "weak".

      Also, see the pioneering work of Brunel 2000. 

      In classical balanced networks, strong excitatory inputs are counterbalanced by strong inhibitory inputs, leading to a fluctuation-driven regime. We have added Brunel 2000.

      (10) In the second paragraph of the introduction, the authors refer to studies about structural co-tuning (e.g., where "precise" synaptic balance is mentioned, and Vogels et al. 2011 should be cited there) and functional co-tuning (which is, in fact, different than tracking of excitation by inhibition, but the authors refer to that as co-tuning). It makes it easier to understand which studies talk about structural co-tuning and which ones are about functional co-tuning. The paper by Znamenski 2018, which showed both structural and functional tuning in experiments, is missing here. 

      We added the citation to the now published paper by Znamenskiy et al. (2024).

      (11) The third paragraph in the Introduction misses some references that address network dynamics that are shaped by the inhibitory architecture in E/I assemblies in spiking networks, like Rost et al 2018 and Lagzi et al 2021. 

      These references have been added.

      (12) The last sentence of the fourth paragraph in the Introduction implies that functional co-tuning is due to structural co-tuning, which is not necessarily true. While structural co-tuning results in functional co-tuning, functional co-tuning does not require structural co-tuning because it could arise from shared correlated input or heterogeneity in synaptic connections from E to I cells.  

      We generally agree with the reviewer, but we are uncertain which sentence the reviewer refers to.

      We assume the reviewer refers to the last sentence of the second (rather than the fourth paragraph), which explicitly mentions the “…structural basis of E/I co-tuning…”. If so, we consider this sentence still correct because the “structural basis” refers not specifically to E/I assemblies, but also includes any other connectivity that may produce co-tuning, including the connectivity underlying the alternative possibilities mentioned by the reviewer (shared correlated input or heterogeneity of synaptic connections).

      (13) In order to ensure that the comparison between network dynamics is legit, authors should mention up front that for all networks, the average firing rates for the excitatory cells were kept at 1 Hz, and the background input was identical for all E and I cells across different networks.

      We slightly revised the text to make this clearer: “We (…) uniformly scaled I-to-E connection weights by a factor of χ until E population firing rates in response to learned odors matched the corresponding firing rates in rand networks, i.e., 1 Hz”.

      (14) In the last paragraph on page 5, my understanding was that an individual odor could target different cells within an assembly in different trials to generate trial-to-trial variability. If this is correct, this needs to be mentioned clearly. 

      This is not correct, an odor consists of 150 activated mitral cells with defined firing rates. As now mentioned in the Methods, “Spikes were then generated from a Poisson distribution, and this process was repeated to create trial-to-trial variability.”
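
      The point can be illustrated with a minimal sketch in which the set of active cells and their firing rates are fixed for a given odor, and only the Poisson draw differs between trials. This assumes a homogeneous Poisson process sampled through exponential inter-spike intervals. The function names and the example rates are hypothetical, and the authors' implementation may differ (for example, by drawing spikes per time step).

```python
import random

def poisson_spike_times(rate_hz, duration_s, rng):
    """Spike times of a homogeneous Poisson process on [0, duration_s)."""
    times = []
    if rate_hz <= 0:
        return times  # silent cell
    t = rng.expovariate(rate_hz)  # first inter-spike interval
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times

def odor_trials(rates_hz, duration_s, n_trials, seed=0):
    """One list of spike trains (one per mitral cell) for each trial.

    rates_hz defines the odor and is identical across trials; the random
    spike draw is the only source of trial-to-trial variability.
    """
    rng = random.Random(seed)
    return [
        [poisson_spike_times(r, duration_s, rng) for r in rates_hz]
        for _ in range(n_trials)
    ]
```

      Calling `odor_trials([5.0, 20.0, 0.0], 2.0, 3)` returns three trials for the same three cells, with spike times that differ from trial to trial.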

      (15) The last paragraph on page 6 mentions that the four OB activity patterns were uncorrelated, but if they were designed as in Figure 4A, due to the existing overlap between the patterns, they cannot be uncorrelated. 

      This appears to be a misunderstanding. We mention in the text (and show in Figure 4B) that the four odors which “… were assigned to the corners of a square…” are uncorrelated.  The intermediate odors are of course not uncorrelated. We slightly modified the corresponding paragraph (now on page 7) to clarify this: “The subspace consisted of a set of OB activity patterns representing four uncorrelated pure odors and mixtures of these pure odors. Pure odors were assigned to the corners of a square and mixtures were generated by selecting active mitral cells from each of the pure odors with probabilities depending on the relative distances from the corners (Figure 4A, Methods).”

      (16) The notion of "learned" and "novel" odors may be misleading as there was no plasticity in the network to acquire an input representation. It would be beneficial for the authors to clarify that by "learned," they imply the presence of the corresponding E assembly for the odor in the network, with the input solely impacting that assembly. Conversely, for "novel" inputs, the input does not target a predefined assembly. In Figure 2 and Figure 4, it would be especially helpful to have the spiking raster plots of some sample E and I cells.  

      As suggested by the reviewer, we have modified the existing spiking raster plots in Figure 2, such that they include examples of responses to both learned and novel odors. We added spiking raster plots showing responses of I neurons to the same odors in Supplementary Figure 1F, as well as spiking raster plots of E neurons in Supplementary Figure 4A. To clarify the usage of “learned” and “novel”, we have added a sentence in the Results section: “We thus refer to an odor as “learned” when a network contains a corresponding assembly, and as “novel” when no such assembly is present.”.

      (17) In the last paragraph of page 8, can the authors explain where the asymmetry comes from? 

      As mentioned in the text, the asymmetry comes from the difference in the covariance structure of different classes. To clarify, we have rephrased the sentence defining the Mahalanobis distance: 

      “This measure quantifies the distance between the pattern and the class center, taking into account covariation of neuronal activity within the class. In bidirectional comparisons between patterns from different classes, the mean dM may be asymmetric if neural covariance differs between classes.”
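
      The asymmetry can be made concrete with a toy sketch, restricted to two-dimensional patterns so that the covariance matrix can be inverted in closed form. The function names and the example classes are illustrative assumptions; the analysis in the paper uses higher-dimensional activity vectors.

```python
import math

def class_stats(patterns):
    """Mean vector and 2x2 sample covariance (sxx, sxy, syy) of 2-D patterns."""
    n = len(patterns)
    mx = sum(p[0] for p in patterns) / n
    my = sum(p[1] for p in patterns) / n
    sxx = sum((p[0] - mx) ** 2 for p in patterns) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in patterns) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in patterns) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(v, patterns):
    """Distance between pattern v and the center of a reference class,
    scaled by the covariance of that class."""
    (mx, my), (sxx, sxy, syy) = class_stats(patterns)
    dx, dy = v[0] - mx, v[1] - my
    det = sxx * syy - sxy ** 2
    # Quadratic form d^T S^{-1} d with the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

# Two classes with different spread: the same Euclidean separation gives
# different distances depending on which class serves as the reference.
tight = [(0, 0), (1, 0), (0, 1), (1, 1)]
broad = [(10, 10), (14, 10), (10, 14), (14, 14)]
d_broad_to_tight = mahalanobis((12, 12), tight)    # center of broad vs. tight class
d_tight_to_broad = mahalanobis((0.5, 0.5), broad)  # center of tight vs. broad class
```

      In this toy case the distance measured against the tight class is about 28.2, whereas the distance measured against the broad class is about 7.0, although the Euclidean distance between the class centers is identical in both directions.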

      (18) The first paragraph of page 9: random networks are not expected to perform pattern classification, but just pattern representation. It would have been better if the authors compared Scaled I network with E/I co-tuned network. Regardless of the expected poorer performance of the E/I co-tuned networks, the result would have been interesting. 

      Please see our reply to the public review (reviewer 2).

      (19) Second paragraph on page 9, the authors should provide statistical significance test analysis for the statement "... was significantly higher ...". 

      We have performed a Wilcoxon signed-rank test, and reported the p-value in the revised manuscript (p < 0.01). 

      (20) The last sentence in the first paragraph on page 11 is not clear. What do the authors mean by "linearize input-output functions", and how does it support their claim? 

      We have now amended this sentence to clarify what we mean: “…linearize the relationship between the mean input and output firing rates of neuronal populations…”.

      (21) In the first sentence of the last paragraph on page 11, the authors mentioned “high variability”, but it is not clear compared with which of the other 3 networks they observed high variability.

      Structurally co-tuned E/I networks are expected to diminish network-level variability. 

      “High variability” refers to the variability of spike trains, which is now mentioned explicitly in the text. We hope this more precise statement clarifies this point.

      (22) Methods section, page 14: "firing rates decreased with a time constant of 1, 2 or 4 s". How did they decrease? Was it an implementation algorithm? The time scale of input presentation is 2 s and it overlaps with the decay time constant (particularly with the one with 4 s decrease).  

      Firing rates decreased exponentially. We have added this information in the Methods section.
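
      As a hedged illustration of the reviewer's point about overlapping time scales, exponential decay can be sketched as below. The baseline parameter `r_inf` and the assumption that decay starts at t = 0 are hypothetical, since the response does not state the asymptote or the onset of the decay.

```python
import math

def decayed_rate(r0, t, tau, r_inf=0.0):
    """Firing rate after t seconds of exponential decay from r0 toward r_inf.

    tau is the time constant in seconds; r_inf is an assumed baseline.
    """
    return r_inf + (r0 - r_inf) * math.exp(-t / tau)

# Fraction of the initial rate remaining after a 2 s presentation
# for the three time constants mentioned in the Methods.
remaining = {tau: decayed_rate(1.0, 2.0, tau) for tau in (1.0, 2.0, 4.0)}
```

      Under these assumptions, about 14%, 37% and 61% of the initial rate remain after 2 s for time constants of 1, 2 and 4 s, respectively.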

      Reviewer #3 (Recommendations For The Authors): 

      In the following, I suggest minor corrections to each section which I believe can improve the manuscript. 

      - There was no github link to the code in the manuscript. The code should be made available with a link to github in the final manuscript. 

      The code can be found here: https://github.com/clairemb90/pDp-model. The link has been added in the Methods section.

      Figure 1: 

      - Fig. 1A: call it pDp not Dp. Please check if this name is consistent in every figure and the text. 

      Thank you for catching this. Now corrected in Figure 1, Figure 2 and in the text.

      - The authors write: "Hence, pDpsim entered an inhibition-stabilized balanced state (Sadeh and Clopath, 2020b) during odor stimulation (Figure 1D, E)." and then later "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, demonstrating that activity was indeed inhibition-stabilized. These results were robust against parameter variations (Methods)." I would suggest moving the second sentence before the first sentence, because the fact that the network is in the ISN regime follows from the shuffled spike timing result. 

      Also, I'd suggest showing this as a supplementary figure. 

      We thank the reviewer for this comment. We have removed “inhibition-stabilized” from the first sentence, as there is no strong evidence of this in Rupprecht and Friedrich, 2018, and removed “indeed” from the second sentence. We also provide more detailed statistics. The text now reads “Hence, pDpsim entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b).”

      Figure 2: 

      - "... Scaled I networks (Figure 2H." Missing ) 

      Corrected.

      - The authors write "Unlike in Scaled I networks, mean firing rates evoked by novel odors were indistinguishable from those evoked by learned odors and from mean firing rates in rand networks (Figure 2F)." 

      Why is this something you want to see? Isn't it that novel stimuli usually lead to high responses? Eg in the paper Schulz et al., 2021 (eLife) which is also cited by the authors it is shown that novel responses have high onset firing rates. I suggest clarifying this (same in the context of Fig. 3C). 

      In Dp and piriform cortex, firing rates evoked by learned odors are not substantially different from firing rates evoked by novel odors. While small differences between responses to learned versus novel odors cannot be excluded, substantial learning-related differences in firing rates, as observed in other brain areas, have not been described in Dp or piriform cortex. We added references in the last paragraph of p.5. Note that the paper by Schulz et al. (2021) models a different type of circuit.  

      - Fig. 2B: Indicate in figure caption that this is the case "Scaled I" 

      This is not exactly the case “Scaled I”, as the parameter χ (increased I to E strength) is set to 1.

      - Suppl Fig. 2I: Is E&F ever used in the manuscript? I couldn't find a reference. I suggest removing it if not needed. 

      Suppl. Fig 2I E&F is now Suppl Fig.1G&H. We now refer to it in the text: “Activity of networks with E assemblies could not be stabilized around 1 Hz by increasing connectivity from subsets of I neurons receiving dense feed-forward input from activated mitral cells (Supplementary Figure 1GH; Sadeh and Clopath, 2020).”

      Figure 3: 

      - As mentioned in my comment in the public review section, I find the arguments about pattern completion a little bit confusing. For me it's not clear why an increase of output correlations over input correlations is considered "pattern completion" (this is not to say that I don't find the nonlinear increase of output correlations interesting). For me, to test pattern completion with second-order statistics one would need to do a similar separation as in Suppl Fig. 3, ie measuring the pairwise correlation at cells in the assembly L that get direct input from L OB with cells in the assembly L that do not get direct input from OB. If the pairwise correlations of assembly cells which do not get direct input from OB increase in correlations, I would consider this as pattern completion (similar to the argument that increase in firing rate in cells which are not directly driven by OB are considered a sign of pattern completion). 

      Also, it now seems to me that there are contradictory results: in Fig. 3 only Scaled I can lead to pattern completion, while in the context of Suppl. Fig. 3 the authors write "We found that assemblies were recruited by partial inputs in all structured pDpsim networks (Scaled and Tuned) without a significant increase in the overall population activity (Supplementary Figure 3A)." I suggest clarifying what exactly the authors mean by pattern completion, why the increase of output correlations above input correlations can be considered pattern completion, and why the results differ when looking at firing rates versus correlations. 

      Please see our reply to the public review (reviewer 3).

      - I actually would suggest adding Suppl. Fig. 3 to the main figure. It shows a more intuitive form of pattern completion and in the text there is a lot of back and forth between Fig. 3 and Suppl. Fig. 3 

      We feel that the additional explanations and panels in Fig.3 should clarify this issue and therefore prefer to keep Supplementary Figure 3 as part of the Supplementary Figures for simplicity.  

      - In the whole section "We next explored effects of assemblies ... prevented strong recurrent amplification within E/I assemblies." the authors could provide a link to the respective panel in Fig. 2 after each statement. This would help the reader follow your arguments. 

      We thank the reviewer for pointing this out. The references to the appropriate panels have been added. 

      - Fig. 3A: I guess the x-axis has been shifted upwards? Should be at zero. 

      We have modified the x-axis to make it consistent with panels B and C.  

      - Fig. 3B: In the figure caption, the dotted line is described as the novel odor but it is actually the unit line. The dashed lines represent the reference to the novel odor. 

      Fixed.

      - Fig. 3C: The " is missing for Pseudo-Assembly N

      Fixed.

      - "...or a learned odor into another learned odor." Have here a ref to the Supplementary Figure 3B.

      Added.

      Figure 4:   

      - "This geometry was largely maintained in the output of rand networks, consistent with the notion that random networks tend to preserve similarity relationships between input patterns (Babadi and Sompolinsky, 2014; Marr, 1969; Schaffer et al., 2018; Wiechert et al., 2010)." I suggest adding here reference to Fig. 4D (left). 

      Added.

      - Please add a definition of E/I assemblies. How do the authors define E/I assemblies? I think they consider both, Tuned I and Tuned E+I as E/I assemblies? In Suppl. Fig. 2I E it looks like tuned feedforward input is defined as E/I assemblies. 

      We thank the reviewer for pointing this out. E/I assemblies are groups of E and I neurons with enhanced connectivity. In other words, in E/I assemblies, connectivity is enhanced not only between subsets of E neurons, but also between these E neurons and a subset of I neurons. This is now clarified in the text: “We first selected the 25 I neurons that received the largest number of connections from the 100 E neurons of an assembly. To generate E/I assemblies, the connectivity between these two sets of neurons was then enhanced by two procedures.”. We removed “E/I assemblies” in Suppl. Fig.2, where the term was not used correctly, and apologize for the confusion.

      - Suppl. Fig. 4: Could the authors please define what they mean by "Loadings" 

      The loadings indicate the contribution of each neuron to each principal component, see adjusted legend of Suppl. Fig. 4: “G. Loading plot: contribution of neurons to the first two PCs of a rand and a Tuned E+I network (Figure 4D).”

      - Fig. 4F: The authors might want to normalize the participation ratio by the number of neurons (see e.g. Dahmen et al., 2023 bioRxiv, "relative PR"), so the PR is bound between 0 and 1 and the dependence on N is removed. 

      We thank the reviewer for the suggestion, but we prefer to use the non-normalized PR as we find it more easily interpretable (e.g. number of attractor states in Scaled networks).
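
      For reference, the two quantities under discussion can be sketched as follows, assuming the standard definition of the participation ratio in terms of the eigenvalues of the activity covariance matrix. The function names are illustrative, and the eigenvalues are taken as given rather than computed from data.

```python
def participation_ratio(eigenvalues):
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Ranges from 1 (all variance in one dimension) to N (variance spread
    evenly over N dimensions).
    """
    total = sum(eigenvalues)
    return total * total / sum(lam * lam for lam in eigenvalues)

def relative_participation_ratio(eigenvalues):
    """PR divided by the number of dimensions N, bounded between 1/N and 1."""
    return participation_ratio(eigenvalues) / len(eigenvalues)
```

      The non-normalized value reads directly as an effective number of dimensions, which is the interpretability argument made in the response, while the normalized value removes the dependence on N.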

      - Fig. 4G&H: as mentioned in the public review, I'd add the case of Scaled I to be able to compare it to the Tuned E+I case. 

      As already mentioned in the public review, we thank the reviewer for this suggestion, which we have implemented.

      - Figure caption Fig. 4H "Similar results were obtained in the full-dimensional space." I suggest showing this as a supplemental panel. 

      Since this only adds little information, we have chosen not to include it as a supplemental panel to avoid overloading the paper with figures.

      Figure 5: 

      - As mentioned in the public review, I suggest that the authors add the Scaled I case to Fig. 5 (it's shown in all figures and also in Fig. 6 again). I guess for Scaled I the separation between L and M will be very good? 

      Please see our reply to the public review (reviewer 3).

      - Fig. 5A&B: I am a bit confused about which neurons are drawn to calculate the Mahalanobis distance. In Fig. 5A, the schematic indicates that the vector B from which the neurons are drawn is distinct from the distribution Q. For the example of odor L, the distribution Q consists of pure odor L with odors that have little mixtures with the other odors. But the vector v for odor L seems to be drawn only from odors that have slightly higher mixtures (as shown in the schematic in Fig. 5A). Is there a reason to choose the vector v from different odors than the distribution Q? 

      The distribution Q and the vector v consist of activity patterns across the same neurons in response to different odors. The reason to choose a different odor for v was to avoid having this test datapoint being included in the distribution Q. We also wanted Q to be the same for all test datapoints. 

      What does "drawn from whole population" mean? Does this mean that the vectors are drawn from any neuron in pDp? If yes, then I don't understand how the authors can distinguish between different odors (L,M,O,N) on the y-axis. Or does "whole population" mean that the vector is drawn across all assemblies as shown in the schematic in Fig. 5A and the case "neurons drawn from (pseudo-) assembly" means that the authors choose only one specific assembly? In any case, the description here is a bit confusing, I think it would help the reader to clarify those terms better.  

      Yes, “drawn from whole population” means that we randomly draw 80 neurons from the 4000 E neurons in pDp. The y-axis means that we use the activity patterns of these neurons evoked by one of the 4 odors (L, M, N, O) as reference. We have modified the Figure legend to clarify this: “d<sub>M</sub> was computed based on the activity patterns of 80 E neurons drawn from the four (pseudo-) assemblies (top) or from the whole population of 4000 E neurons (bottom). Average of 50 draws.”

      - Suppl Fig. 5A: In the schematic the distance is called d_E(\bar{Q},\bar{V}) while the colorbar has d_E(\bar{Q},\bar{Q}) with the Qs in different color. The green Q should be a V. 

      We thank the reviewer for spotting this mistake, it is now fixed.

      - Fig. 5: Could the authors comment on the fact that a random network seems to be very good at classifying patterns on its own. Maybe in the Discussion? 

      The task shown in Figure 5 is a relatively easy one, a forced-choice between four classes which are uncorrelated. In Supplementary Figure 9, we now show classification for correlated classes, which is already much harder.

      Figure 6: 

      - Is the correlation induced by creating mixtures like in the other Figures? Please clarify how the correlations were induced. 

      We clarified this point in the Methods section: “The pixel at each vertex corresponded to one pure odor with 150 activated and 75 inhibited mitral cells (…) and the remaining pixels corresponded to mixtures. In the case of correlated pure odors (Figure 6), adjacent pure odors shared half of their activated and half of their inhibited cells.”. An explicit reference to the Methods section has also been added to the figure legend.

      - Fig. 6C (right): why don't we see the clear separation in PC space as shown in Fig. 4? Is this related to the existence of correlations? Please clarify. 

      Yes. The assemblies corresponding to the correlated odors X and Y overlap significantly, and therefore responses to these odors cannot be well separated, especially for Scaled networks. We added the overlap quantification in the Results section to make this clear. “These two additional assemblies had on average 16% of neurons in common due to the similarity of the odors.”

      - "Furthermore, in this regime of higher pattern similarity, dM was again increased upon learning, particularly between learned odors and reference classes representing other odors (not shown)." Please show this (maybe as a supplemental figure). 

      We now show the data in Supplementary Figure 9.

      Discussion: 

      - The authors write: "We found that transformations became more discrete map-like when amplification within assemblies was increased and precision of synaptic balance was reduced. Likewise, decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (not shown)." 

      Where do I see the first point? I guess when I compare in Fig. 4D the case of Scaled I vs Tuned E+I, but the sentence above sounds like the authors showed this in a more step-wise way eg by changing the strength of \alpha or \beta (as defined in Fig. 1). 

      Also I think if the authors want to make the point that decreasing amplification in assemblies changes transformation with a different rate distribution in scaled vs tuned networks, the authors should show it (eg adding a supplemental figure). 

      The first point is indeed supported by data from different figures. Please note that the revised manuscript now contains further simulations that reinforce this statement, particularly those shown in Supplementary Figure 6, and that this point is now discussed more extensively in the Discussion. We hope that these revisions clarify this general point.

      The effects of decreasing amplification in assemblies are now shown in Supplementary Figure 6 (Scaled[adjust]).

      - I suggest adding the citation Znamenskiy et al., 2024 (Neuron; https://doi.org/10.1016/j.neuron.2023.12.013), which shows that excitatory and inhibitory (PV) neurons with functional similarities are indeed strongly connected in mouse V1, suggesting the existence of E/I assembly structure also in mammals.

      Done.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      Developing a reliable method to record ancestry and distinguish between human somatic cells presents significant challenges. I fully acknowledge that my current evidence supporting the claim of lineage tracing with fCpG barcodes is inadequate. I agree with Reviewer 1 that fCpG barcodes are essentially a cellular division clock that diverges over time. A division clock could potentially document when cells cease to divide during development, with immediate daughter cells likely exhibiting more similar barcodes than those that are less related. Although it remains uncertain whether the current fCpG barcodes capture useful biological information, refinement of this type of tool could complement other approaches that reconstruct human brain function, development, and aging.

      Due to my lack of clarity, the fCpG barcode was perceived to be a new type of cell classifier. However, it is fundamentally different. fCpG sites are selected based on their differences between cells of the same type, while traditional cell classifiers focus on sites with consistent methylation patterns in cells of the same type. Despite these opposing criteria, fCpG barcodes and traditional cell classifiers may align because neuron subtypes often share common progenitors. As a result, cells of the same phenotype are also closely related by ancestry, and ex post facto, have similar fCpG barcodes. fCpG barcodes are complementary to cell type classifiers, and potentially provide insights into aspects such as mitotic ages, diversity within a clade, and migration of immediate daughters---information which is otherwise difficult to obtain. The title has been modified to “Human Brain Ancestral Barcodes” to better reflect the function of the fCpG barcodes. The manuscript is edited to correct errors, and a new Supplement is added to further explain fCpG barcode mechanics and present new supporting data.

      Reviewer #1 (Public review):

      I thank Reviewer 1 for his constructive comments. Major noted weaknesses were 1) insufficient clarity and brevity of the methodology, 2) inconsistent or erroneous use of neurodevelopmental concepts, and 3) lack of consideration for alternative explanations.

      (1) The methodology is now outlined in detail in a new Supplement, including simulations that indicate that the error rate consistent with the experimental data is about 0.01 changes in methylation per fCpG site per division.

      (2) Conceptual and terminology errors noted by the Reviewers are corrected in the manuscript.

      (3) I agree completely with the alternative explanation of Reviewer 1 that fCpGs are “a cellular division clock that diverges over 'time'”. Differences between more traditional cell type classifiers and fCpG barcodes are more fully outlined in the new Supplement. Ancestry recorded by fCpGs and cell type classifiers are confounded because cells of the same phenotype typically have common progenitors---cells within a clade have similar fCpG barcodes because they are closely related. fCpG barcodes can complement cell type classifiers with additional information such as mitotic ages, ancestry within a clade, and daughter cell migration.
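
      The division-clock idea and the error rate quoted in point (1) can be illustrated with a minimal sketch. It assumes that each fCpG site is binary (0 = unmethylated, 1 = methylated) and flips independently and symmetrically with a fixed probability per division. This is a simplification for illustration; the function names are hypothetical and the simulations in the Supplement may use a different model.

```python
import random

def divide(barcode, error_rate, rng):
    """Copy a barcode through one division; each site flips with probability error_rate."""
    return [1 - s if rng.random() < error_rate else s for s in barcode]

def descend(barcode, n_divisions, error_rate, rng):
    """Follow a single lineage for n_divisions divisions."""
    for _ in range(n_divisions):
        barcode = divide(barcode, error_rate, rng)
    return barcode

def mismatch_fraction(a, b):
    """Fraction of sites at which two barcodes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def expected_mismatch(error_rate, n_divisions_apart):
    """Expected mismatch between two cells separated by n divisions in total,
    under independent symmetric flips; it saturates at 0.5."""
    return 0.5 * (1.0 - (1.0 - 2.0 * error_rate) ** n_divisions_apart)
```

      Under these assumptions and an error rate of 0.01, two cells separated by 50 divisions in total are expected to differ at about 32% of sites, whereas immediate daughters differ at about 2%, which is why closely related cells tend to carry similar barcodes.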

      Reviewer #1 (Recommendations for the authors):

      (1) A lot of the interpretations suffer from an extremely loose/erroneous use of developmental concepts and a lack of transparency. For instance:

      a) The thalamus is not part of the brain stem

      Corrected.

      b) The pons contains cells other than inhibitory neurons in the data; the same is true for the hippocampus which contains multiple cell types

      Corrected to refer to the specific cell types in these regions.

      c) The author talks about the rostral-caudal timing a lot which is not really discussed to this degree in the cited references. Thus, it is also unclear how interneurons fit in this model as they are distinguished by a ventral-dorsal difference from excitatory neurons. Also, it is unclear whether the timing is really as distinct as claimed. For instance, inhibitory neurons and excitatory neurons significantly overlap in their birth timing. Finally, conceptually, it does not make sense to go by developmental timing as the author proposes that it is the number of divisions that is relevant. While they are somewhat correlated there are potentially stark differences.

      The manuscript attempts to describe what might be broadly expected when barcodes are sampled from different cell types and locations. As a proposed mitotic clock, the fCpG barcode methylation level could time when each neuron ceased division and differentiated. The wide ranges of fCpG barcode methylation of each cell type (Fig 2A) would be consistent with significant overlap between cell types. The manuscript is edited to emphasize overlapping rather than distinct sequential differentiation of the cell types.

      d) Neocortical astrocytes and some oligodendrocytes share a lineage, whereas a subset of oligodendrocytes in the cortex shares an origin with interneurons. This could confound results but is never discussed.

      The manuscript does not assess glial lineages in detail because neurons were preferentially included in the sampling whereas glial cells were non-systematically excluded. This sampling information is now included in the section “fCpG barcode identification”.

      e) Neocortical interneurons should be more closely related in terms of lineage-to-excitatory neurons than other inhibitory neurons of, for instance, the pons. This is not clearly discussed and delineated.

      This is not discussed. It may not be possible to analyze these details with the current data. The ancestral tree reconstructions indicate that excitatory neurons that appear earlier in development (and are more methylated) are more often more closely related to inhibitory neurons.

      f) While there is some spread of excitatory neurons tangentially, there is no tangential migration at the scale of interneurons as (somewhat) suggested/implied here.

      The abstract and results have been modified to indicate greater inhibitory than excitatory neuron tangential migration, but that the extent of excitatory neuron tangential migration cannot be determined because of the sparse sampling and that barcodes may be similar by chance.

      g) The nature of the NN cells is quite important as cells not derived from the neocortical anlage are unlikely to share a developmental origin (e.g., microglia, endothelial cells). This should be clarified and clearly stated.

      The manuscript is modified to indicate that NN cells are microglial and endothelial cells. These cells have different developmental origins, and their data are present in Fig 2A, but are not further used for ancestral analysis.  

      (2) The presentation is often somewhat confusing to me and lacks detail. For instance:

      a) The methods are extremely short and I was unable to find a reference for a full pipeline, so other researchers can replicate the work and learn how to use the pipeline.

      The pipeline, including Python code, is outlined in the new Supplement.

      b) Often numbers are given as ~XX when the actual number with some indication of confidence or spread would be more appropriate.

      Data ranges are often indicated with the violin plots.

      c) Many figure legends are exceedingly short and do not provide an appropriate level of detail.

      Figure legends have been modified to include more detail.

      d) Not defining groups in the figure legends or a table is quite unacceptable to me. I do not think that referring to a prior publication (that does not consistently use these groups anyway) is sufficient.

      The cell groups are based on the annotations provided with each single cell in the public databases.

      e) The used data should be better defined and introduced (number of cells, different subtypes across areas, which cells were excluded; I assume the latter as pons and hippocampus are only mentioned for one type of neuronal cells, see also above).

      The data used are present in Supplemental File 2 under the tab “cell summary H01, H02, H04”.

      f) Why were different upper bounds used for filtering for H01 and H02, and H04 is not mentioned? Why are inhibitory and excitatory neurons specifically mentioned (Lines 61-66)?

      The filtering is used to eliminate, as much as possible, cell type specific methylation, or CpG sites with skewed neuron methylation. The filtering eliminates CpG sites with high or low methylation within each of the three brains, and within the two major neuron subtypes. The goal is to enrich for CpG sites with polymorphic but not cell type specific methylation. This process is ad hoc as success criteria are currently uncertain. The extent of filtering is balanced by the need to retain sufficient numbers of fCpGs to allow comparisons between the neurons.

      g) What 'progenitor' does the author refer to? The Zygote? If yes, can the methylation status be tested directly from a zygote? There is no single progenitor for these cells other than the zygote. Does the assumption hold true when taking this into account? See, for instance, PMID 33737485 for some estimation of lineage bottlenecks.

      A brain progenitor cell can be defined as the common ancestor of all adult neurons, and is the first cell where each of its immediate daughter cell lineages yield adult neurons. The zygote is a progenitor cell to all adult cells, and barcode methylation at the start of conception, from the oocyte to the ICM, was analyzed in the new Supplement. The proposed brain progenitor cell with a fully methylated barcode was not yet evident even in the ICM.

      (3) I am generally not convinced that the fCpGs represent anything but a molecular clock of cell divisions and that many of the similarities are a function of lower division numbers where the state might be more homogenous. This mainly derives from the issues cited above, the lack of convincing evidence to the contrary, and the sparsity of the assessed data.

      Agree that the fCpG barcode is a mitotic clock that becomes polymorphic with divisions. As outlined in the new Supplement, ancestry and cell type are confounded because cells of the same type typically have a common progenitor.

      a) There appears little consideration or modeling of what the ability to switch back does to the lineage reconstruction.

      fCpG methylation flipping is further analyzed and discussed in the new Supplement.

      b) None of the data convinced me that the observations cannot be explained by the aforementioned molecular clock and systematic methylation similarities of cell types due to their cell state.

      See above

      (4) Uncategorized minor issues:

      a) The author should explain concepts like 'molecular clock hypothesis' (line 27) or 'radial unit hypothesis' (line 154), as they are somewhat complex and might not be intuitive to readers.

      The molecular clock hypothesis is deleted and the radial unit hypothesis is explained in more detail in the manuscript.

      b) Line 32: '[...] replication errors are much higher compared to base replication [...]'. I think this is central to the method and should be better explained and referenced. Maybe even through a schematic, as this is a central concept for the entire manuscript.

      The fCpG barcode mechanics are better explained in the new Supplement. With simulations, the fCpG flip rate is about 0.01 per division per fCpG.
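
      As a rough illustration of what a flip rate of about 0.01 per division per fCpG implies, the sketch below computes the expected barcode methylation after a given number of divisions. It assumes a fully methylated starting barcode and symmetric flipping (a site switches state with the same probability in either direction); both are simplifying assumptions made here for illustration, and the function name is hypothetical rather than taken from the Supplement.

```python
def expected_methylation(divisions: int, flip_rate: float = 0.01) -> float:
    """Expected fraction of methylated fCpGs after `divisions` cell divisions.

    Assumptions (illustrative only): the barcode starts fully methylated and
    every site flips state with probability `flip_rate` per division, in
    either direction. The recurrence m[n+1] = m[n]*(1 - r) + (1 - m[n])*r
    has the closed form used below and relaxes toward 0.5.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return 0.5 + 0.5 * (1.0 - 2.0 * flip_rate) ** divisions


if __name__ == "__main__":
    for n in (0, 10, 50, 100, 500):
        print(n, round(expected_methylation(n), 3))
```

      Under these assumptions the average methylation falls from 100% toward 50% and then plateaus, because back-flips balance forward flips. This is why methylation level becomes less informative as a clock once a barcode approaches 50%.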

      c) Line 41: 'neonatal'. Does the author mean to say prenatal? Most of the cells discussed are postmitotic before birth.

      Corrected to prenatal.

      d) Line 96: what does 'flip' mean in this context? Please also see the comment on Figure 2C.

      Edited to “change”.

      e) Lines 134-135: I am not sure whether the author claims to provide evidence for this question, and I would be careful with claims that this work does resolve the question here.

      Have toned down claims as evidence for my analysis is currently inadequate.

      f) Lines 192-193: I disagree as the fCpGs can switch back and the current data does not convince me that this is an improvement upon mosaic mutation analysis. In my mind, the main advantage is the re-analysis of existing data and the parallel functional insights that can be obtained.

      Lineage analysis is more straightforward with DNA sequencing, but with an error rate of ~10^-9 per base per division, one needs to sequence a billion base pairs to distinguish between immediate daughter cells. By contrast, with an inferred error rate of ~10^-2 per fCpG per division, much less sequencing (about a million-fold less) is needed to find differences between daughter cells.
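
      The arithmetic behind this comparison can be sketched as follows, assuming that the expected number of differences per division is simply the per-site rate multiplied by the number of sites examined. The helper name is hypothetical, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def sites_for_one_expected_change(rate_per_site_per_division: Fraction) -> Fraction:
    """Number of sites needed to expect one change per division.

    Assumes expected changes = rate * sites, so sites = 1 / rate.
    """
    if rate_per_site_per_division <= 0:
        raise ValueError("rate must be positive")
    return 1 / rate_per_site_per_division


# Rates as quoted in the response above.
base_rate = Fraction(1, 10**9)   # per base per division
fcpg_rate = Fraction(1, 10**2)   # per fCpG per division

bases_needed = sites_for_one_expected_change(base_rate)
fcpgs_needed = sites_for_one_expected_change(fcpg_rate)
fold_fewer_sites = bases_needed / fcpgs_needed

if __name__ == "__main__":
    print("bases needed:", bases_needed)
    print("fCpGs needed:", fcpgs_needed)
    print("fold fewer sites:", fold_fewer_sites)
```

      On these inputs the ratio of site counts is 10^7, within an order of magnitude of the figure quoted above. The calculation counts informative sites rather than sequenced bases, so it is an order-of-magnitude check and not a sequencing-cost estimate.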

      g) Lines 208-209: I would be careful with claims of complexity resolution given many of the limitations and inherent systematic similarities, as well as the potential of fCpGs to change back to an ancestral state later in the lineage.

      Have modified the manuscript to indicate the analysis would be more challenging due to back changes.

      h) There seem to be few figures that assess phenomena across the three brains. Even when they exist there is no attempt to provide any statistical analyses to support the conclusions or permutations to assess outlier status relative to expectations.

      The analysis could be more extensive, but with only three brains, any results, like this study itself, would be rightly judged inadequate.

      i) Figure 2B: there appears to be a higher number of '0s' for, for instance, inhibitory neurons compared to excitatory neurons. Is that correct and worth mentioning? The changing axes scales also make it hard to assess.

      Inhibitory neurons do appear to have more unmethylated fCpGs compared to excitatory neurons, but in general, most inhibitory fCpGs are methylated with a skew to fully methylated fCpGs, consistent with the barcode starting predominately methylated and inhibitory neurons generally appearing earlier in development relative to excitatory neurons.

      j) Figure 2C: I have several issues with this. A minor one is the use of 'Glial' which, I believe, does not appear anywhere else before this, so I am unclear what this curve represents. Generally, however, I am not sure what the y-axis represents, as it is not described in the methods or figure legend. I initially thought it was the cumulative frequency, but I do not think that this squares with the data shown in B. I appreciate the overall idea of having 'earlier'/samples with fewer divisions being shifted to the left, but it is very confusing to me when I try to understand the details of the plot.

      This graph is now better described in the legend. “Glial” cells are defined as oligodendrocytes and astrocytes. Other non-neuronal cells (such as microglial cells) have now been removed from the graph.

      This graph attempts to illustrate how it may be possible to reconstruct brain development from adult neurons, assuming barcodes are mitotic clocks that become polymorphic with cell division. The X axis is “time”, and the Y axis indicates when different cell types reach their adult levels. The cartoon indicates what is visually present along the X axis during development--- brainstem, then ganglionic eminences with a thin cortex, and finally the mature brain with a robust cortex. Time for the X axis is barcode methylation and starts at 100% and ends at 50% or greater methylation. The fCpG barcode methylation of each cell places it on this timeline and indicates when it ceased dividing and differentiated.

      The Y axis indicates the progressive accumulation of the final adult contents of each cell type during this timeline. Early in development, the brain is rudimentary and adult cells are absent. At 90% methylation, only the inhibitory neurons in the pons are present. At 80% methylation, some excitatory neurons are beginning to appear. Inhibitory neurons in the pons have reached their final adult levels and many other inhibitory neuron types are reaching adult levels. By 70% methylation, most inhibitory neurons have reached their adult levels, and more adult excitatory neurons (mainly low cortical neurons, L4-6) and glial cells are beginning to appear. By 60% methylation, inhibitory neurogenesis has largely finished. Adult excitatory neurons and glial cells are more abundant and reach their adult levels by 50% or greater cell barcode methylation levels.

      The graph illustrates a rough alignment between mitotic ages inferred by barcode methylation levels and the physical appearances of different neuronal types during development. Many neurons die during development, and this graph, if valid, indicates when neurons that survive to adulthood appear during development.
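
      One way to read the curves described above is as a cumulative fraction. For each cell type, the curve gives the fraction of sampled adult cells whose barcode methylation is at or above a given level as the "time" axis runs from 100% down to 50%. The sketch below is a minimal illustration of that reading. The function name and the methylation values are made up for the example and are not data from the study.

```python
def cumulative_fraction(methylation_levels, thresholds):
    """Fraction of cells with barcode methylation >= each threshold.

    `thresholds` is expected in descending order (for example 1.0 down to
    0.5), mirroring a time axis that starts at full methylation.
    """
    total = len(methylation_levels)
    if total == 0:
        raise ValueError("at least one cell is required")
    return [sum(m >= t for m in methylation_levels) / total for t in thresholds]


# Hypothetical per-cell barcode methylation values, for illustration only.
example_cells = {
    "pons inhibitory": [0.95, 0.92, 0.90, 0.88],
    "cortical excitatory": [0.78, 0.66, 0.60, 0.55],
}
thresholds = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

if __name__ == "__main__":
    for cell_type, levels in example_cells.items():
        print(cell_type, cumulative_fraction(levels, thresholds))
```

      With descending thresholds each curve can only rise, and a cell type whose barcodes are more methylated reaches its adult level further to the left.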

      k) Figure 4Bff: it is confusing to me that the text jumps to these panels after introducing Figure 5. This makes it very hard to read this section of the text.

      The Figures appear in the order they are first referred to in the text.

      l) Figure 5A: could any of this difference be explained by the shared lineage of excitatory neurons and dorsal neocortical glia?

      Not sure

      m) Figure 5B: after stating that interneurons have a higher lineage fidelity, the figure legend here states the opposite and I am somewhat confused by this statement.

      The legend and text have been clarified. Fig 5A restricts fidelity to within inhibitory cell types. Fig 5B compares between neuron subtypes, and illustrates more apparent inhibitory subtype switching, albeit there are more interneuron subtypes than excitatory subtypes.

      n) Figure 5E: generally, the use of tSNE for large pairwise distance analysis is often frowned upon (e.g., PMID 37590228), and I would reconsider this argument.

      This analysis was an attempt to illustrate that cells of the same phenotype based on their tSNE metrics can be either closely or more distantly related. Although the tSNE comparisons were restricted to subtypes (and not to the entire tSNE graph), tSNE are not designed for such comparisons. This graph and discussion are deleted. 

      Reviewer #2 (Public review):

      The manuscript by Shibata proposed a potentially interesting idea that variation in methylcytosine across cells can inform cellular lineage in a way similar to single nucleotide variants (SNVs). The work builds on the hypothesis that the "replication" of methylcytosine, presumably by DNMT1, is inaccurate and produces stochastic methylation variants that are inherited in a cellular lineage. Although this notion can be correct to some extent, it does not account for other mechanisms that modulate methylcytosines, such as active gain of methylation mediated by DNMT3A/B activity and active demethylation mediated by TET activity. In some cases, it is known that the modulation of methylation is targeted by sequence-specific transcription factors. In other words, inaccurate DNMT1 activity is only one of the many potential ways that can lead to methylation variants, which fundamentally weakens the hypothesis that methylation variants can serve as a reliable lineage marker. With that being said (being skeptical of the fundamental hypothesis), I want to be as open-minded as possible and try to propose some specific analyses that might better convince me that the author is correct. However, I suspect that the concept of methylation-based lineage tracing cannot be validated without some kind of lineage tracing experiment, which has been successfully demonstrated for scRNA-seq profiling but not yet for methylation profiling (one example is Delgado et al., Nature, 2022).

      I thank Reviewer 2 for the careful evaluation. The validation experiment example (Delgado et al.) introduced sequence barcodes in mice, which is not generally feasible for human studies.

      (1) The manuscript reported that fCpG sites are predominantly intergenic. The author should also score the overlap between fCpG sites and putative regulatory elements and report p-values. If fCpG sites commonly overlap with regulatory elements, that would increase the possibility that these sites are actively regulated by enhancer mechanisms other than maintenance methyltransferase activity.

      As mentioned for Reviewer 1, fCpGs are filtered to eliminate cell type specific methylation.

      (2) The overlap between fCpG and regulatory sequence is a major alternative explanation for many of the observations regarding the effectiveness of using fCpG sites to classify cell types correctly. One would expect the methylation level of thousands of enhancers to be quite effective in distinguishing cell types based on the published single-cell brain methylome works.

      As mentioned above, the manuscript did not clearly indicate that the fCpG barcode is not a cell type classifier. The distinctions between fCpG barcodes and cell type classifiers are better explained in the new Supplement.

      (3) The methylation level of fCpG sites is higher in hindbrain structures and lower in forebrain regions. This observation was interpreted as the hindbrain being the "root" of the methylation barcodes and, through "progressive demethylation" produced the methylation states in the forebrain. This interpretation does not match what is known about methylation dynamics in mammalian brains, in particular, there is no data supporting the process of "progressive demethylation". In fact, it is known that with the activation of DNMT3A during early postnatal development in mice or humans (Lister et al., 2013. Science), there is a global gain of methylation in both CH and CG contexts. This is part of the broader issue I see in this manuscript, which is that the model might be correct if "inaccurate mC replication" is the only force that drives methylation dynamics. But in reality, active enzymatic processes such as the activation of DNMT3A have a global impact on the methylome, and it is unclear if any signature for "inaccurate mC replication" survives the de novo methylation wave caused by DNMT3A activity.

      Reviewer 2 highlights a critical potential flaw in that any ancestral signal recorded by random replication errors could be overwritten by other active methylation processes. I cannot present data that indicates fCpG replication errors are never overwritten, but new data indicate barcode reproducibility and stability with aging.

      New data are also present where barcodes are compared between daughter cells (zygote to ICM) in the setting of active and passive demethylation, when germline methylation is erased. This new analysis shows that daughter cells in 2 to 8 cell embryos have more related barcodes than morula or ICM cells. The subsequent active remethylation by a wave of DNMT3A activity may underlie the observation that the barcode appears to start predominately methylated in brain progenitors.

      (3) Perhaps one way the author could address comment 3 is to analyze methylome data across several developmental stages in the same brain region, to first establish that the signal of "inaccurate mC replication" is robust and does not get erased during early postnatal development when DNMT3A deposits a large amount of de novo methylation.

      See above

      (4) The hypothesis that methylation barcodes are homogeneous among progenitor cells and more polymorphic in derived cells is an interesting one. However, in this study, the observation was likely an artifact caused by the more granular cell types in the brain stem, intermediate granularity in inhibitory cells, and highly continuous cell types in cortical excitatory cells. So, in other words, single-cell studies typically classify hindbrain cell types that are more homogenous, and cortical excitatory cells that are much more heterogeneous. The difference in cell type granularity across brain structures is documented in several whole-brain atlas papers such as Yao et al. 2023 Nature part of the BICCN paper package.

      As noted above, fCpG barcode polymorphisms and cell type differentiation are confounded because cells of the same phenotype tend to have common progenitors. The fCpG barcode is not a cell type classifier but more a cell division clock that becomes polymorphic with time. Although fCpG barcodes could be more polymorphic in cortical excitatory cells because there are many more types, fCpG barcodes would inherently become more polymorphic in excitatory cells because they appear later in development.

      (5) As discussed in comment 2, the author needs to assess whether the successful classification of cell types (brain lineage) using fCpG was, in fact, driven by fCpG sites overlapping with cell-type specific regulatory elements.

      Although unclear in the manuscript, the fCpG is not a cell classifier and the barcode is polymorphic between cells of the same type. fCpG barcodes can appear to be cell classifiers because cell types appear at different times during development, and therefore different cell types have characteristic average barcode methylation levels.

      (6) In Figure 5E, the author tried to address the question of whether methylation barcodes inform lineage or post-mitotic methylation remodeling. The Y-axis corresponds to distances in tSNE. However, tSNE involves non-linear scaling, and the distances cannot be interpreted as biological distances. PCA distances or other types of distances computed from high-dimensional data would be more appropriate.

      The Figure and discussion are deleted (similar comment by Reviewer 1).

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Human Brain Barcodes", the author sought to use single-cell CpG methylation information to trace cell lineages in the human brain.

      Strengths:

      Tracing cell lineages in the human brain is important but technically challenging. Lineage tracing with single-cell CpG methylation would be interesting if convincing evidence exists.

      Weaknesses:

      As the author noted, "DNA methylation patterns are usually copied between cell division, but the replication errors are much higher compared to base replication". This unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. The unreliable CpG methylation status also raises the question of what the "Barcodes" refer to in the title and across this study. Barcodes should be stable in principle and not dynamic across cell generations, as defined in Reference#1. It is not convincing that the "dynamic" CpG methylation fits the "barcodes" terminology. This problem is even more concerning in the last section of results, where CpG would fluctuate in post-mitotic cells.

      I thank Reviewer 3 for his thoughtful and careful evaluation. I think the “barcode” terminology is appropriate. Dynamic engineered barcodes such as CRISPR/Cas9 mutable barcodes are used in biology to record changes over time. The fCpG barcode appears to start with a single state in a progenitor cell and changes with cell division to become polymorphic in adult cells. Therefore, I think the description of a dynamic fCpG barcode is appropriate.

      Reviewer #3 (Recommendations for the authors):

      (1) As the author noted, "DNA methylation patterns are usually copied between cell division, but the replication errors are much higher compared to base replication". This unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. To establish DNA methylation as a means for lineage tracing, one control experiment would be testing whether the DNA methylation patterns can faithfully track cell lineages for in vitro differentiated & visibly tracked cell lineages. Has this kind of experiment been done in the field?

      These types of experiments have not been performed to my knowledge and an appropriate tissue culture model is uncertain. New single cell WGBS data from the zygote to ICM indicate that more immediate daughter cells have more related barcodes even in the setting of active DNA demethylation.

      (2) The study includes assumptions that should be backed with solid rationale, supporting evidence, or reference. Here are a couple of examples:

      a) the author discarded stable CpG sites with <0.2 or >0.8 average methylation without a clear rationale in H02, and then used <0.3 and >0.7 for a specific sample H01.

      The filtering was ad hoc and was used to remove, as much as possible, CpG sites with cell type specific or patient specific methylation. CpG sites with skewed methylation are more likely cell type specific, whereas X chromosome CpG sites with methylation closer to 0.5 in male cells are more likely to be unstable. The ad hoc filtering attempted to remove cell specific CpG sites while still retaining enough CpG sites to allow comparisons between cells.
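
      The filtering logic can be sketched as below, using the average-methylation bounds quoted by the reviewer (0.2 to 0.8 for H02, 0.3 to 0.7 for H01). This is a minimal sketch and not the author's pipeline. The function name, the data layout, and the requirement that a site pass in every neuron group are assumptions made for illustration. No bounds are given here for H04, so none are included.

```python
# Bounds on average methylation per brain, as quoted in the review.
BOUNDS = {"H01": (0.3, 0.7), "H02": (0.2, 0.8)}


def keep_site(group_means, lower, upper):
    """Keep a CpG site only if its mean methylation is intermediate in every group.

    `group_means` maps a neuron group (for example "excitatory" or
    "inhibitory") to the site's average methylation in that group. Sites
    that are skewed in any group are treated as likely cell type specific.
    """
    if not group_means:
        raise ValueError("at least one group mean is required")
    return all(lower <= mean <= upper for mean in group_means.values())


if __name__ == "__main__":
    # Hypothetical site with intermediate methylation in both neuron groups.
    site = {"excitatory": 0.55, "inhibitory": 0.62}
    print(keep_site(site, *BOUNDS["H01"]))
```

      Tighter bounds remove more skewed sites but leave fewer fCpGs for comparison, which is the trade-off described in the response.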

      b) The author assumed that the early-formed brain stem would resemble progenitors better and have a higher average methylation level than the forebrain. However, this difference in DNA methylation status could reflect developmental timing or cell type-specific gene expression changes.

      This observation that brain stem neurons that appear early in development have highly methylated fCpG barcodes in all 3 brains supports the idea that the fCpG barcode starts predominately methylated. Alternative explanations are possible.

      (3) The conclusion that excitatory neurons undergo tangential migration is unclear - how far away did the author mean for the tangential direction? Lateral dispersion is known, but it would be striking that the excitatory neurons travel across different brain regions. The question is, how would the author interpret shared or divergent methylation for the same cell type across different brain regions?

      As noted with Reviewer 1, this analysis is modified to indicate that evidence of tangential migration is greater for inhibitory than excitatory neurons, but the extent of excitatory neuron migration is uncertain because of sparse sampling, and because fCpG barcodes can be similar by chance.

      (4) The sparsity and resolution of the single-cell DNA methylation data. The methylation status is detected in only a small fraction (~500/31,000 = 1.6%) of fCpGs per cell, with only 48 common sites identified between cell pairs. Given that the human genome contains over 28 million CpG sites, it is important to evaluate whether these fCpGs are truly representative. How many of these sites were considered "barcodes"?

      fCpG barcodes are distinct from traditional cell type classifiers, and how fCpGs are identified are better outlined in the new Supplement.

      (5) While focusing on the X-chromosome may simplify the identification of polymorphic fCpGs, the confidence in determining its methylation status (0 or 1) is questionable when a CpG site is covered by only one read. Did the author consider the read number of detected fCpGs in each cell when calculating methylation levels? Certain CpG sites on autosomes may also have sufficient coverage and high variability across cells, meeting the selection criteria applied to X-chromosome CpGs.

      In most cases, a fCpG site was covered by only a single read.

      (6) The overall writing in the Title, the Main text, Figure legends, and Methods sections is overly simplified, making it difficult to follow. For instance, how did the author perform PWD analysis? How did they handle missing values when constructing lineage trees?

      There is not much introduction to lineage tracing in the human brain or the use of DNA methylation to trace cell lineage.

      These shortcomings are improved in the manuscript and with the new Supplement. The analysis pipeline, including the Python programs, is outlined and included as new Supplemental materials. IQ-TREE can handle the binary fCpG barcode data and skips missing values with its standard settings.
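
      For readers unfamiliar with pairwise distance (PWD) on sparse binary barcodes, the sketch below shows one common definition: the fraction of mismatched sites among the sites covered in both cells, with uncovered sites skipped. The exact definition used in the manuscript is not given in this response, so the function name, the use of None for missing values, and the `min_shared` parameter are assumptions for illustration.

```python
def pairwise_distance(barcode_a, barcode_b, min_shared=1):
    """Fraction of mismatched sites among sites covered in both barcodes.

    Barcodes are equal-length sequences of 0 (unmethylated), 1 (methylated),
    or None (site not covered in that cell). Returns None when fewer than
    `min_shared` sites are covered in both cells.
    """
    if len(barcode_a) != len(barcode_b):
        raise ValueError("barcodes must have the same length")
    shared = [
        (a, b)
        for a, b in zip(barcode_a, barcode_b)
        if a is not None and b is not None
    ]
    if len(shared) < min_shared:
        return None
    mismatches = sum(a != b for a, b in shared)
    return mismatches / len(shared)


if __name__ == "__main__":
    cell_1 = [1, 0, None, 1]
    cell_2 = [1, 1, 0, None]
    print(pairwise_distance(cell_1, cell_2))
```

      Because each cell covers only a small fraction of fCpGs, the number of shared sites per pair is small, so a minimum-overlap cutoff such as `min_shared` is one way to avoid distances based on very few sites.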

      Line 80: it is unclear: "Brain patterns were similar"

      Clarified

      Line 98: The meaning is unclear here: "Outer excitatory and glial progenitor cells are present" What are these glial progenitor cells and when/how they stop dividing?

      The glial cells are the oligodendrocytes and astrocytes. The main takeaway is that these glial cells have low barcode methylation, consistent with their appearance later in development.

      Line 104: It is unclear if this is a conclusion or assumption -- "A progenitor cell barcode should become increasingly polymorphic with subsequent divisions." The "polymorphic" happens within the progenitors, their progenies, or their progenies at different time points.

      The statement is now clarified as an assumption in the manuscript.

      Similarly line 134 "Barcodes would record neuronal differentiation and migration." Is this a conclusion from this study or a citation? How is the migration part supported?

      The reasoning is better explained in the manuscript.  Migration can be documented if immediate daughter cells with similar barcodes are found in different parts of the adult brain, albeit analysis is confounded by sparse sampling and because barcodes may be similar by chance.

      Line 148 and 150: "Nearest neighbor ... neuron pairs" in DNA methylation status would conceivably reflect their cell type-specific gene expression, how did the author distinguish this from cell lineage?

      As noted above, because cells with similar phenotypes usually arise from common progenitors, cells within a clade are also usually related. However, the barcodes are still polymorphic within a clade and potentially add complementary information on mitotic ages, ancestry within a clade, and possible cell migration.

      Figure 3C: "Cells that emerge early in development" Where are they on the figure?

      Hindbrain neurons differentiate early in development and their barcodes are more methylated. The figure has been modified to label some of the values with their neuron types. Also, the older figure mistakenly included data from all 3 brains and now the data are only from brain H01.

      Figures 4D and 4E, distinguishing cell subtypes is challenging, as the same color palette is used for both excitatory and inhibitory neurons.

      Unfortunately, this is a limitation imposed by the complexity of the figures and the number of distinguishable colors.

      Figures 4 and 5, what are these abbreviations?

      The abbreviations are presented in Figure 1 and maintained in subsequent figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank the reviewer for their comments and suggestions. We have made several edits to the paper to address these comments, including the addition of several new control experiments, corrections to mislabeled figures in Fig 2, and other additions to improve the clarity of several figures.

      This work is missing several controls that are necessary to substantiate their claims. My most important concern is that the optogenetic screen for neurons that alter pathogenic lawn occupancy does not have an accompanying control on non-pathogenic OP50 bacteria. Hence, it remains unclear whether these neuronal inhibition experiments lead to pathogen-specific or generalized lawn-leaving alterations. For strains that show statistical differences between - and + ATR conditions, the authors should perform follow-up validation experiments on non-pathogenic OP50 lawns to ensure that the observed effect is PA14-specific. Similarly, neuronal inhibition experiments in Figures 5E and H are only performed with naïve animals on PA14 - we need to see the latency to re-entry on OP50 as well, to make general conclusions about these neurons' role in pathogen-specific avoidance.

      We have added data from new control experiments to Fig. S1 (subfigures B, C) for both exit and re-entry dynamics on OP50. We find that inhibition of neurons produces different effects on both lawn entry and exit on PA14 compared to OP50. On OP50, inhibition of neurons failed to change the re-entry dynamics for any of the lines that showed delayed latency to re-entry on PA14. Our results suggest that the neural control of re-entry dynamics we see is PA14-specific.

      My second major concern is regarding the calcium imaging experiments of candidate neurons involved in lawn re-entry behavior. Although the data shows that AIY, AVK, and SIA/SIB neurons all show reduced activity following pathogen exposure, the authors do not relate these activity changes to changes in behavior. Given the well-established links between these cells and forward locomotion, it is essential to not only report differences in activity but also in the relationship between this activity and locomotory behavior. If animals are paused outside of the pathogen lawn, these neurons may show low activity simply because the animals are not moving forward. Other forward-modulated neurons may also show this pattern of reduced activity if the animals remain paused. Given that the authors have recorded neural activity before and after contact with pathogenic bacteria in freely moving animals, they should also provide an analysis of the relationship between proximity to the lawn and the activity of these neurons.

      In response, we added an additional supplementary figure S7 to illustrate the role of each neuron in navigational control and added text to the discussion to better explain the role of each neuron type in the regulation of re-entry, in light of our previously published work on SIA in speed control.

      This work is missing methodological descriptions that are necessary for the correct interpretation of the results shown here. Figure 2 suggests that the determination of statistical significance across the optogenetic inhibition screen will be found in the Methods, but this information is not to be found there. At various points in the text, authors refer to "exit rate", "rate constant", and "entry rate". These metrics seem derived from an averaged measurement across many individual animals in one lawn evacuation assay plate. However "latency to re-entry" is only defined on a per-animal basis in the lawn re-exposure assay. These differences should be clearly stated in the methods section to avoid confusion and to ensure that statistics are computed correctly.

      Additional details have been added to the methods section to provide more in-depth information on the statistical analysis performed. In brief, the latency to re-entry is calculated in the same way across all assays: re-entry events across replicate experiments for a given experimental condition are aggregated together and used to calculate relevant statistics.
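      The aggregation described in this response can be sketched as follows. This is a minimal illustration only: the latency values are hypothetical, and the choice of the median as the summary statistic is an assumption, since the response does not say which statistics were computed.

```python
from statistics import median


def pooled_latency_stats(replicates):
    """Pool per-animal re-entry latencies (minutes) across replicate plates.

    `replicates` is a list of lists, one inner list per replicate experiment.
    The median is an assumed summary statistic, chosen for illustration.
    """
    pooled = [latency for plate in replicates for latency in plate]
    return {"n_events": len(pooled), "median_latency": median(pooled)}


# Hypothetical latencies (minutes) from three replicate plates of one condition.
minus_atr = [[2.0, 3.5, 4.0], [1.5, 2.5], [3.0, 5.0, 2.0]]
stats = pooled_latency_stats(minus_atr)
print(stats)
```

      Pooling events before computing statistics means each animal, not each plate, contributes one observation, which matches the per-animal definition of latency to re-entry.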

      This work also contains mislabeled graphs and incorrect correspondence with the text, which make it difficult to follow the authors' claims. The text suggests that Pdop-2::Arch3 and Pmpz-1::Arch3 show increased exit rates, whereas Figure 2 shows that Pflp-4::Arch3 but not Pmpz-1::Arch3 has increased exit rate. The authors should also make a greater effort to correctly and clearly label which type of behavioral experiment is used to generate each figure and describe the differences in experimental design in the main text, figure legends, and methods. Figure 2E depicts trajectories of animals leaving a lawn over a 2.5-minute interval but it is unclear when this time window occurs within the 18-hour lawn leaving assay. Likewise, Figure 2H depicts a 30-minute time window which has an unclear relationship to the overall time course of lawn leaving. This figure legend is also mislabeled as "Infected/Healthy", whereas it should be labeled "-/+ ATR".

      In Figures 2C and F, the x-axis labels are in a different order, making it difficult to compare between the 2 plots. Promoter names should be italicized. What does the red ring mean in Figure 2A? Figure 2 legend incorrectly states that four lines showed statistically significant changes for the Exit rate constant - only 2 lines are significant according to the figure.

      We thank the reviewer for identifying this embarrassing error. Figures 2C and F were flipped; we have corrected this and apologize for the error. Promoter names have been italicized, and we have added text to the captions explaining that the red ring is a ring light for background illumination of the worms. In addition, we have corrected the error in the figure legends from “Infected/Healthy” to “+/- ATR”.

      Lines in figure 2C and 2F are ordered by significance rather than keeping the same order in both. Majority feedback from colleagues suggested that this ordering was preferred.

      This work raises the interesting possibility that different sets of neurons control lawn exit and lawn re-entry behaviors following pathogen exposure. However, the authors never directly test this claim. To rigorously show this, the authors would need to show that lawn-exit-promoting neurons (CEPs, HSNs, RIAs, RIDs, SIAs) are dispensable for lawn re-entry behavior and that lawn re-entry promoting neurons (AVK, SIA, AIY, MI) are dispensable for lawn exit behavior in pathogen-exposed animals.

      We agree with the reviewer’s comments that there is insufficient evidence to show a complete decoupling of lawn exit and lawn re-entry. However, we note that our screen results show that only 1 line (dop-2) shows changes in both exit and re-entry dynamics upon neural inhibition (Fig. 2). This seems to suggest that at least some degree of neural control of re-entry is decoupled from exit.

      Please label graph axes with units in Figure 1 - instead of "Exit Rate" make it #exits per worm per hour, and make it more clear that Figures 1C and E have a different kind of assay than Figures 1A, B and D. There should be more consistency between the meaning of "pre/post" and "naive/infected/healthy" - and how many hours constitutes post.

      We have edited Figure 1 and made additions to the captions of figure 1 to make both points clearer. We have also standardized our language for subsequent figures (such as figure 5) to provide less ambiguity in pre/post and naïve/infected/healthy.

      Figure 5 - it should be made more clear when the stimulation/inhibition occurred in these experiments and how long they were recorded/analyzed.

      We have added additional details to the figure captions to make it clearer when the data was collected.

      Workspaces and code have been added under a data availability section in the manuscript.

      Reviewer 2:

      However, the paper's main weakness lies in its lack of a detailed mechanism explaining how the delayed reentry process directly influences the actual locomotor output that results in avoidance. The term 'delayed reentry' is used as a dynamic metric for quantifying the screening, yet the causal link between this metric and the mechanistic output remains unclear. Despite this, the study is well-structured, with comprehensive control experiments, and is very well constructed.

      We thank the reviewer for their comments and suggestions. We have added additional data and details to our work to cover these weaknesses, as can be seen in our responses to the suggestions below.

      (1) A key issue in the manuscript is the mechanistic link between the delayed process and locomotor output. AIY is identified as a crucial neuron in this process, but the specifics of how AIY influences this delay are not clear. For instance, does AIY decrease the reversal rate, causing animals to get into long-range search when they leave the bacterial lawn? Is there any relationship between pdf-2 expression and reversal rates? Given that AIY typically promotes long-range motion when activated, the suppression of this function and its implications on motion warrants further clarification.

      We have included additional data to explain how AIY might be able to regulate lawn entry behaviors and have added more to the discussion to explain how neural suppression might lead to changes in the behavior (new figure S7). Both AIY and SIA dynamics have been linked to worm navigation. In previous work (Lee 2019), we have demonstrated that SIA can control locomotory speed. Inhibition of SIA decreases locomotory speed, and as a result may serve to drive the increased latency of re-entry.

      AIY’s role in navigation has been previously established (Zhaoyu 2014), but we have added an additional supplementary figure and edited our discussion to further illustrate this point. As can be seen in the new figure S7, AIY neural activity undergoes a transition after removal from a bacterial lawn, going from low activity to high activity. This activity increase is correlated with a transition from a high reversal rate local search state to a long range search state characterized by longer runs. Inhibition of AIY during this long range search state increased the reversal rate, resulting in a higher rate of re-orientations. This might serve as a part of the mechanistic explanation for AIY’s role in preventing lawn re-entry, as inhibition dramatically increased the rate of re-orientation, preventing worms from making directed runs into the bacterial lawn. However, there is an additional effect of the inhibition of AIY, not seen during food search. Inhibition of AIY in the context of a pathogenic bacterial lawn leads to stalling at the edge. Therefore, AIY could have an additional role in governing the animal's movement upon contact with a pathogenic lawn after exposure.

      (2) I recommend including supplementary videos to visually demonstrate the process. These videos might help others identify aspects of the mechanism that are currently missing or unclear in the text.

      (4) The authors mention that the worms "left the lawn," but the images suggest that the worms do not stray far and remain around the perimeter. Providing videos could help clarify this observation and strengthen the argument by visually connecting these points

      Additional supplementary videos (1-3) taken at several stages of lawn evacuation have been added to visually demonstrate the process.

      (3) Regarding the control experiments (Figure 1E-G), the manuscript describes testing animals picked from a PA14-seeded plate and retesting them on different plates. It's crucial to clarify the differences between these plates. Specifically, the region just outside the lawn should be considered, as it is not empty and worms can spread bacteria around. Testing animals on a new plate with a pristine proximity region might introduce variables that affect their behavior.

      We have reworded the paper to make it clearer that these new conditions on a fresh PA14 lawn represent a different type of assay from the lawn evacuation assay. Fresh PA14 plates will indeed have a pristine proximity region compared to plates where the worms have spread the bacteria.

      These experiments were done to test if the evacuation effect is purely due to aversive signals left on the lawn or attractive signals left outside of the lawn. Given that worms are known to be able to leave compounds such as ascarosides to communicate with each other, we wanted to test that this lawn re-entry defect was not simply the result of deposited pheromones. Without any other method to remove such compounds, we relied on using fresh PA14 lawns instead to test this. We have updated the manuscript to clarify this point.

      (5) The manuscript notes that the PA14 strain was grown without shaking. Typically, growing this strain without agitation leads to biofilm formation. Clarifying whether there is a link between biofilm formation and avoidance behavior would add depth to the understanding of the experimental conditions and their impact on the observed behaviors.

      As the reviewer has noted, growth of PA14 without shaking might indeed lead to biofilm formation. This is a legitimate concern, as evidence from previous work has suggested that biofilm formation could be linked to pathogen avoidance, since worms make use of mechanosensation to avoid pathogenic bacteria (Chang et al. 2011). However, we do not observe substantial formation of biofilm in our cultured bacteria, likely because our growth time is too short for substantial biofilm formation to occur. We also note that our evacuation dynamics appear to be of similar timescale to results reported in previous work which used different growth conditions. As such, we believe that our growth conditions are similar to those historically used in the lawn evacuation literature.

      Reviewer 3:

      Weaknesses:

      My only concern is that the authors should be more careful about describing their "compressed sensing-based approach". Authors often cite their previous Nature Methods paper, but should explain more because this method is critical for this manuscript. Also, this analysis is based on the hypothesis that only a small number of neurons are responsible for a given behavior. Authors should explain more about how to determine sparsity parameters, for example.

      We have added more details to our paper outlining some of the details involved in our compressed sensing approach. We go into more detail about how we chose sparsity parameters and note that our discovered neurons for re-entry appear to be robust over choice of sparsity parameters. These additional details can be found in both the paper body and the methods section.

      Line 45: This paragraph tries to mention that there should be "small sets of neurons" that can play key roles in integrating previous information to influence subsequent behavior. Is it valid as an assumption in the nervous systems?

      We want to clarify that what is important is not that there are ‘small sets of neurons’, but rather that these key neurons make up a small fraction of the total number of neurons in the nervous system. More correctly: the compressed sensing approach identifies information bottlenecks in the neural circuits, and the assumption is that the number of neurons in these bottlenecks is small. This is the underlying sparsity assumption that allows us to use a compressed sensing based approach to identify these neurons. We have reworded this section to make it clear that what is important is not that the total number of neurons is small, but that they must be a small fraction of the total number of neurons in the nervous system.
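      The sparsity assumption discussed here can be illustrated with a toy sparse-recovery problem. The sketch below is not the authors' pipeline: it uses a generic L1-regularized least-squares solver (ISTA), and the line-to-neuron matrix `A`, the measured effects `y`, and the values of the sparsity parameter `lam` are all hypothetical. Each row of `A` is a transgenic line and each column a neuron class; a 1 means the line expresses in that neuron.

```python
def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small coefficients become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft thresholding."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # A^T residual is the negative gradient of the least-squares term.
        descent = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] + step * descent[j], step * lam) for j in range(n)]
    return x


# Hypothetical screen: 4 lines (rows) covering 5 neuron classes (columns).
A = [
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
# Hypothetical behavioral effect per line, generated by neuron 2 alone.
y = [1.0, 1.0, 0.0, 0.0]

# Sweep the sparsity parameter and record which neurons are selected.
supports = {
    lam: [j for j, v in enumerate(ista(A, y, lam)) if abs(v) > 1e-3]
    for lam in (0.05, 0.1, 0.5)
}
print(supports)
```

      Although there are fewer lines than neurons, the L1 penalty favors the explanation that uses the fewest neurons. Checking that the selected set stays the same across a range of `lam` values is one simple way to probe robustness to the sparsity parameter; real screen data are noisy, so this toy example overstates how clean the recovery would be.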

      Line 125: "These approaches…" Authors repeatedly mentioned this statement to emphasize that their compressed sensing-based approach is the best choice. Are you really sure?

      We agree that there are several approaches that might allow for faster screening of the nervous system. For example, many studies approach the problem by looking at neurons with synapses onto a neuron already known to be implicated in the behavior or find neurons that express a key gene known to regulate the behavior of interest. These approaches utilize prior information to greatly reduce the pool of candidate neurons needed to be screened.

      In the absence of such prior information, we believe that our compressed sensing based approach allows a rapid way to perform an unbiased interrogation of the entire nervous system to identify key neurons at bottlenecks of neural circuits. Once these key neurons are identified, neurons upstream and downstream of these key neurons can be investigated in the future. This approach gives us the added advantage of being able to identify neurons that do not connect to neurons that are already implicated in the behavior, or that don’t have clear genetic signatures in the behavior of interest. Our approach further allows for screening of neurons with no clear single genetic marker without the need to utilize intersectional genetic strategies. We should not use the phrase “best choice”, which might not be justified. We have reworded these statements, and we believe that compressed sensing based methods provide a complementary approach to those in the literature.

      Line 42: If authors refer to mushroom bodies and human hippocampus in relation to the significance of their work, authors should go back to these references in the Discussion and explain how their work is important.

      We thank the reviewer for this feedback, and we have added to our discussion to expand upon these points.

      Line 151: "the accelerated pathogen avoidance" Accelerated pathogen avoidance does not necessarily indicate the existence of the neural mechanism that inhibits the association of pathogenicity with microbe-specific cues (during early stages: first two hours).

      We agree with the reviewer’s statements that these results alone do not indicate the presence of an early avoidance mechanism. Other evidence for early avoidance mechanisms exists as seen in two choice assay experiments (Zhang 2005), and our results do seem to support this. However, we agree that early neural inhibition is insufficient evidence towards such a mechanism. We have thus removed this statement for accuracy.

    1. Runs a full copy of the Visual Studio Code editor. · Completely portable - runs off a USB, cloud drive (DropBox, iCloud drive, OneDrive, etc) or hard drive.

      > to

      Description

    2. Visual Studio Code supports Portable mode. This mode enables all data created and maintained by VS Code to live near itself, so it can be moved around across

      to

    1. Enable Portable mode Windows, Linux After unzipping the VS Code download, create a data folder within VS Code's folder:

      enable portable mode
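      The step quoted above (creating a `data` folder inside the unzipped VS Code folder on Windows or Linux) can be scripted. This is a minimal sketch: the helper name `enable_portable_mode` and the folder name used in the demonstration are hypothetical, and the demonstration runs against a temporary directory rather than a real VS Code download.

```python
from pathlib import Path
import tempfile


def enable_portable_mode(vscode_dir):
    """Create the `data` folder inside an unzipped VS Code folder.

    Returns the path of the created (or already existing) folder.
    """
    data_dir = Path(vscode_dir) / "data"
    data_dir.mkdir(exist_ok=True)
    return data_dir


# Demonstration on a throwaway directory standing in for the unzipped download.
with tempfile.TemporaryDirectory() as tmp:
    vscode_dir = Path(tmp) / "VSCode-linux-x64"  # hypothetical folder name
    vscode_dir.mkdir()
    created = enable_portable_mode(vscode_dir)
    print(created.name, created.is_dir())
```

      Using `exist_ok=True` makes the helper safe to run again on a folder that is already in Portable mode.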

    1. Abstract. Background: The expanding availability of large-scale genomic data and the growing interest in uncovering gene-disease associations call for efficient tools to visualize and evaluate gene expression and genetic variation data. Methodology: Data collection involved filtering biomarkers related to multiple neurological diseases from the ClinGen database. We developed a comprehensive pipeline that was implemented as an interactive Shiny application and a standalone desktop application. Results: NeuroVar is a tool for visualizing genetic variation (single nucleotide polymorphisms and insertions/deletions) and gene expression profiles of biomarkers of neurological diseases. Conclusion: The tool provides a user-friendly graphical user interface to visualize genomic data and is freely accessible on the project’s GitHub repository (https://github.com/omicscodeathon/neurovar).

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.143). These reviews are as follows.

      Reviewer 1. Joost Wagenaar

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

      Yes. There is a clear statement of need, but the audience is not very targeted. The investigators outline the need for tools to help users identify phenotypic subtypes of disease and describe how the tool would help with this. Although the investigators mention that the tool will allow users to analyze biomarker data, the scope of the types of analysis that can be performed is relatively small. I think that it would benefit the tool to better define the targeted users (clinicians, data scientists, enthusiasts?) and develop specifically towards a single audience.

      The tool leverages several existing R packages to run the analysis over the data and the provided tool can be described as a user-friendly wrapper around these libraries. The interface allows users to submit a file, and plot the results of the analysis within the app.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No. I did not see any guidelines for contributing to the project in the paper, or in the associated GitHub repository.

      Is the documentation provided clear and user friendly?

      Yes, the investigators did a great job providing documentation and installation instructions. [also video demo: https://youtu.be/cYZ8WOvabJs?si=DnxVuL65yr0wYYjq]

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      Yes, the investigators provide a clearly-stated list of dependencies and instructions on how to install them prior to running the application.

      Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?

      Yes. The paper, and GitHub repository point to a public dataset that can be used to test the application.

      Are there (ideally real world) examples demonstrating use of the software?

      Yes. The investigators provide a video highlighting the use of the application and provide a use-case where they use the app to validate some existing knowledge.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

      No. The application is sufficiently small that no automated or manual testing would necessarily be required beyond validating that the application works.

      Additional Comments:

      The proposed application provides a nice tool that makes visualization and analysis of VCF data easier for users who are not comfortable working within R directly. It provides a nice demonstration of how the scientific community can wrap scientific tools into deployable applications and tools that can be easily understood. A question remains about the target audience for an application like this, as most people who are interested in these types of analyses and visualizations are, in fact, familiar enough with R or other programming languages to directly leverage the libraries and plot the results.

      That said, as data integration and multi-omics visualization become more complex and the app provides more ways to visualize the data in meaningful ways, I do strongly believe that applications like this can provide a meaningful addition to the scientific tools that are available.

      Reviewer 2. Ruslan Rust

      Is the language of sufficient quality?

      Yes. The language of the document is of sufficient quality. I did not notice any major issues.

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

      Yes, the authors provide a statement of need. They mention the need for a specialized software tool to identify genes from transcriptomic data and genetic variations such as SNPs, specifically for neurological diseases. Perhaps the authors could expand in the introduction on how they chose the diseases; for example, stroke is not listed among the neurological diseases.

      Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code?

      Yes, the source code is available on GitHub under the following link: https://github.com/omicscodeathon/neurovar. Additionally, the authors deposited the source code and additional supplementary data in a permanent repository with Zenodo under the following DOI: https://zenodo.org/records/13375493. They also provided test data: https://zenodo.org/records/13375591. I was able to download and access the complete set of data.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No. I did not find any way to contribute, report issues or seek support. I would recommend that the authors add this information to the Github README file.

      Is the code executable?

      Yes, I could execute the code using Rstudio 4.3.3

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Yes. I could follow the installation process, but perhaps the authors could add a few more details on how to download from GitHub, as some scientists may have trouble with it. An installation video (in addition to the video demonstration of the NeuroVar Shiny app) might also be helpful.

      Is the documentation provided clear and user friendly?

      Yes. The documentation is provided and is user friendly. I was able to install, test and run the tool using RStudio. The authors may also consider offering a simple website link for the Shiny tool if possible. This may enable access for scientists who are not familiar with R. In particular, it is great that the authors provided a demonstration video. I was able to reproduce the steps. However, I would recommend adding more information to the YouTube video; for example, a reference to the preprint/paper and the GitHub link would help connect the data. Perhaps the authors could also expand a bit on the options for exporting data from their software and provide different formats, e.g., PDF/PNG/JPEG. I think it is important for many researchers to be able to export their outputs, e.g., the heatmaps.

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      Yes, dependencies are listed in the manuscript and in the repository, and are installed automatically. It worked for me with RStudio version 4.3.3.

      Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?

      Yes the authors provide test data with this doi: https://doi.org/10.5281/zenodo.13375590

      Are there (ideally real world) examples demonstrating use of the software?

      Yes, the authors use the example of epilepsy, focal epilepsy, and the gene of interest DEPDC5. I replicated their search and got the same results. However, I find that the labels in Figure 1 for the gene’s transcript could be a bit clearer; for example, it is not clear to me what transcript start and end refer to. It might also be helpful if the authors provided an example dataset for the expression data that is loaded in the software by default. Furthermore, the authors present case study results using RNA-seq in ALS patients with mutations in the FUS, TARDBP, SOD1, and VCP genes.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

      No. Automated testing is not used as far as I can access it.

      Additional Comments: The preprint version of this paper was also reviewed in ResearchHub: https://www.researchhub.com/paper/7381836/neurovar-an-open-source-tool-for-gene-expression-and-variation-data-visualization-for-biomarkers-of-neurological-diseases/reviews

      My expertise: I am an assistant professor in neuroscience and physiology at the University of Southern California and work on stem cell therapies for stroke. We are particularly interested in working with genomic data and the development of new biomarkers for stroke, AD and other neurological diseases.

      Summary: The authors provide a software tool, NeuroVar, that helps visualize genetic variations and gene expression profiles of biomarkers in different neurological diseases.

    1. If two positioned elements overlap each other without a z-index specified, the element defined last in the HTML code will be shown on top.

      When two positioned elements (such as relative, absolute, fixed, or sticky) overlap and no z-index value is set for either of them, the browser uses a basic rule to decide which element appears on top of the other:

      The element defined last in the HTML code appears on top: the default stacking order of elements depends on their order in the HTML code. The element that comes later in the code (closer to the end of the document) appears above the element that precedes it.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      1. General Statements

      We thank the reviewers for their thorough and positive evaluation of the manuscript.

      2. Point-by-point description of the revisions

      We revised the manuscript following the suggestions of the reviewers to make the article more concise and comprehensible to a wider audience. Specifically, we rearranged Section 5, rewrote the difficult-to-understand sections 5 and 6, and removed unnecessary or overlapping text in Introduction and Discussion. We have also addressed the specific points raised by the reviewers. The responses to individual points are detailed below.

      Reviewer 1:

      The reviewer did not ask for any changes to the manuscript.

      We thank the reviewer for the positive evaluation of the manuscript.

      Reviewer 2:

      1/ Title: Structure-based mechanism of RyR channel operation by calcium and magnesium ions

      The authors may consider using an alternative term instead of "operation".

      Thank you for the suggestion. We considered and discussed the term "RyR channel operation" very thoroughly with several colleagues, including native English speakers, and we found it to represent the complex RyR behavior in situ and in experiments most exactly. Alternative terms such as "control" suggest a one-way deterministic action from the ion binding to the protein state, which is not the case. The terms such as "modulation" implicate the presence of a higher RyR state-governing principle, such as phosphorylation, nitrosylation, binding of auxiliary proteins, etc.

      2/ Abstract: Please spell out CFF and MWC theorem.

      Thank you for the proposal. CFF was changed to caffeine; MWC was changed to Monod-Wyman-Changeux.

      3/ Line 87-88: "In striated muscle cells, RyR channels cluster at discrete sites of sarcoplasmic reticulum attached to the sarcolemma where electrical excitation triggers transient calcium release by activation of RyRs."

      There is no attachment between sarcoplasmic reticulum and sarcolemma, please rewrite.

      We respectfully disagree, since there is strong evidence for the existence of discrete contact sites between the sarcolemma and sarcoplasmic reticulum both at triads of skeletal muscle (Rossi et al., 2019) and at dyads of cardiac muscle (Mackrill, 2022), at which both membranes are firmly attached.

      However, to avoid potential misunderstanding, we changed the sentence to "In striated muscle cells, RyR channels cluster at the discrete sites of sarcoplasmic reticulum attached to the sarcolemma in triads or dyads, where electrical excitation triggers transient calcium release by activation of RyRs" (lines 86-87).

      4/ Lines 104-107: "Recently, mathematical modeling of the cardiac calcium release site (Iaparov et al., 2022) confirmed that Mg2+ ions could at the same time act as the negative competitor at the calcium activation site and as an inhibitor at the inhibition site. Unfortunately, the structural counterpart of RyR inactivation, an inhibitory binding site for divalent ions, has not been located yet in RyR structures."

      Note that the exact structural counterpart exists (Nayak et al., 2022, 2024), where Ca and Mg were found both at the activation and inhibition sites. The paragraph should be updated accordingly.

      We respectfully disagree. In the cited works of Nayak et al. (2022; 2024) it was shown that Ca and Mg ions bind firmly at the activation site. Both atoms were also observed at the ACP molecule bound at the ATP binding site. However, they were not observed at the divalent ion-binding inhibition site, which is distinct from the ATP binding site and resides in the loops of the EF-hand region.

      However, to clarify the meaning of the disputed sentence, we have changed it to: "Although binding of Ca2+ or Mg2+ to an inhibitory binding site has not been observed yet in RyR structures, a consensus is emerging that the EF-hand loops constitute this site (Gomez et al., 2016; Zheng and Wen, 2020; Nayak et al., 2024; Chirasani et al., 2024 )" (lines 107-109).

      5/ Lines 108-110: The activation of RyR by agonists was shown to be accompanied by a conformational change around the Ca2+ binding site that leads to a decrease in the free energy and to a concomitant increase of the Ca2+ binding affinity and a population shift between the closed and open conformations (Dashti et al., 2020).

      Please clarify to what state does the "decrease in free energy" refer, to the open or to the closed state?

      Thank you for the proposal. The text was changed to: "The activation of RyR by agonists was shown to be accompanied by a conformational change around the Ca2+ binding site that leads to a decrease in the free energy of the open state and concomitantly to an increase of the Ca2+ binding affinity of the activation site. As a result, the occurrence probability of a RyR state/conformation shifts from the closed toward the open (Dashti et al., 2020)" (lines 110-113).

      6/ Figure 2: please indicate if distances were measured between the C-alphas or side chains.

      Thank you for the proposal. The figure legend was modified to "Distances D1 between the Cα atoms of E4075 and R4736 or equivalent. Right - Distances D2 between the Cα atoms of K4101 and D4730 or equivalent."

      7/ Line 353-357: "These data suggest that interactions between the basic arginine residue R4736 and the acidic residues at the start of the initial helix E of the EF1-hand are specific for Ca2+-dependent inactivation in RyR1, whereas the interactions between the lysine K4101 that immediately follows the F helix of EF1 and the middle of the S23 loop (corresponding to D4730 and I4731 in RyR1) may play a part in the inactivation of both RyR1 and RyR2 isoforms."

      Sentence is unclear; please rewrite. Overall, the entire section "Spatial interactions between the EF-hand and S23* regions" should be simplified and shortened.

      Thank you for the proposal. The text was changed to: "These data suggest that interactions between the basic arginine residue R4736 and the acidic residues E4075 and D4079 are specific for Ca2+-dependent inactivation in RyR1, whereas the interactions between the lysine K4101 and the residues D4730 and I4731 (rRyR1 notation) may play a part in the inactivation of both RyR1 and RyR2 isoforms." (lines 334-337).

      We did not find a way to make the whole section both simpler and shorter without losing clarity.

      8/ Lines 246-249 and Table 1. "all structures corresponding to rRyR1 residues 4063-4196 were subjected to energy minimization and submitted to the MIB2 server for evaluation of the ion binding score (IBS) of individual amino acid residues and the number of ion binding poses (NIBP) for Ca and Mg ions."

      Please elaborate on the "ion binding score" and "number of ion binding poses" concepts and provide reference for the MIB2 server.

      Thank you for the proposal. We added the reference for the server (Lu et al., 2022) (line 228) and added the information: "IBS values of individual residues are determined using sequence and structure conservation comparison with 409 and 209 respective templates from the PDB database for Ca2+ and Mg2+ (Lin et al., 2016) and assessing the similarity of the configuration of the residue to its configurations in known structures of its complexes with the given metal (Lu et al., 2012). Ion binding sites are determined by locally aligning the query protein with the metal ion-binding templates and calculating its score as the RMSD-weighted scoring function Z. The site is accepted if it has a scoring function Z>1, and based on the local 3D structure alignment between the query protein and the metal ion-binding template, the metal ion in the template is transformed into the query protein structure (Lin et al., 2016). The larger the IBS value, the higher the tendency of the residue to bind the ion. The larger the NIBP value, the larger the number of such complexes with acceptable structure" (lines 224-234).

      9/ Lines 460-466: Nine structural models of RyR were selected, and then these are referred to in the text only with the pdb code. The reviewer understands that it would be difficult to recapitulate all conditions but either a table in the main manuscript file or a minimal description in the text following the pdb code would increase clarity and help readers to follow the content.

      Thank you for the proposal. We added a new Table 2 "Model structures used for identifying the allosteric pathways" on line 452 that contains the required information, and inserted a reference to it in the text at line 446 "According to these criteria we selected five RyR1 model structures (Table 2)..."

      10/ Line 467: "In the selected structures, we identified residues with high allosteric coupling intensities (ACI) for both the inhibition and activation network and compared them with residues important for ligand binding and gating of RyR (Table 2)."

      Please define further the concept of "allosteric coupling intensities". The corresponding methods section appears to focus on the outputs of the OHM server without delving too much into the algorithm or principles followed. Is the allosteric coupling between neighboring residues, or does it reflect movement of the residues due to ligand binding? Is there a "reference" state or are the comparisons carried out within each allosteric state? This would help to better introduce the sections "The inhibition network" and "The activation network".

      Thank you for this suggestion. We have lately realized, considering both the server output and the original work of Wang et al. (2020), that a better term for the variable depicting the role of the residue in the allosteric pathway would be the residue importance RI rather than the ACI. The allosteric pathway is determined on the basis of the network of contacts between pairs of residues in the given structure. The more contacts are present between two residues, the higher is the probability that a perturbation will be propagated from one to the other residue (Eq. 3 of Wang et al. (2020)). An allosteric pathway is then defined as the pathway that transmits the signal the whole way from the allosteric site to the active site.

      Based on this, we have changed the term "allosteric coupling intensity" to "residue importance" throughout the text and figures of the manuscript. It should be underlined that this change has no effect whatsoever on the presented data and conclusions. We inserted the following formulation in the Results section:

      "The term residue importance defines the extent to which the given residue is involved in the propagation of a perturbation from the allosteric site to the active site, i.e., the fraction of simulated perturbations transmitted through this particular residue. The more contacts are present between two residues, the higher is the probability that a perturbation will be propagated from one to the other residue (Wang et al., 2020)." (lines 439-443).

      We also inserted the following formulations into the Methods section: "The simulation of the perturbation propagation was performed 10 000 times per structure and pathway to estimate the values of residue importance." (lines 1093-1095), and we expanded the relevant sentence: "Allosteric pathways were traced using the server OHM (https://dokhlab.med.psu.edu/ohm/#/home, (Wang et al., 2020)), in which the allosteric pathway is determined on the basis of the network of contacts between pairs of residues in the given structure." (lines 1082-1084).
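
      The definition of residue importance given above can be illustrated with a toy Monte Carlo sketch. This is not the OHM algorithm of Wang et al. (2020): the random-walk rule, the contact counts, the residue labels, and the function name are all assumptions chosen only to show how a "fraction of simulated perturbations transmitted through a residue" can be estimated.

```python
import random
from collections import Counter


def residue_importance(contacts, source, target, n_runs=10_000, max_steps=1_000, seed=0):
    """Fraction of completed source-to-target walks that pass through each residue.

    contacts maps residue -> {neighbor: number_of_contacts}. At each step the
    perturbation moves to a neighbor with probability proportional to the
    number of contacts (an illustrative rule, not the published one).
    """
    rng = random.Random(seed)
    visits = Counter()
    completed = 0
    for _ in range(n_runs):
        current = source
        visited = {source}
        for _ in range(max_steps):
            if current == target:
                break
            neighbors = list(contacts[current])
            weights = [contacts[current][n] for n in neighbors]
            current = rng.choices(neighbors, weights=weights, k=1)[0]
            visited.add(current)
        if current == target:
            completed += 1
            visits.update(visited)
    if completed == 0:
        return {}
    return {res: count / completed for res, count in visits.items()}


# Hypothetical three-residue chain: every path from A to C must cross B.
chain = {"A": {"B": 3}, "B": {"A": 3, "C": 1}, "C": {"B": 1}}
importance = residue_importance(chain, "A", "C", n_runs=200)
```

      In the chain example every completed walk crosses the middle residue, so its estimated importance equals that of the two end points; in a branched network, residues on rarely used branches would receive lower values.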

      11/ Figure 8: The figure would be more meaningful if the pathways were drawn in the context of the 3D structure.

      Thank you for the proposal. The pathways described in Fig. 8 are too complex for description in the RyR 3D structure, therefore they were not presented in the original manuscript. However, to follow the reviewer's proposal we have illustrated the pathways observed in the inactivated RyR1 channel (7tdg) and the open RyR2 channel (7u9) in Expanded View Figure EV1 and added the corresponding Expanded View Movie EV1 and EV2. These RyR structures were selected for displaying both the intra- and inter-monomeric inactivation pathways.

      12/ Lines 610-612: "The structure of the inactivated RyR2 has not been determined yet; however, it is plausible to suppose that it exists at high concentrations of divalent ions and differs from the inactivated RyR1 structure by the extent of EF-hand - S23* coupling. "

      The speculation would be more fit for the discussion section.

      Thank you for the proposal; however, the sentence introduces a logical supposition, necessary there for reasoning on the construction of the model. We reformulated the sentence to: "In the absence of a structure of the inactivated RyR2, the model assumes that such a structure exists at high concentrations of divalent ions and differs from the inactivated RyR1 structure by the extent of EF-hand - S23* coupling." (lines 573-575).

      13/ Lines 617-619: Closed and primed macrostates could be combined into a single closed macrostate of the model since both are closed and cannot be functionally distinguished at a constant ATP concentration.

      The rationale for combining closed with primed does not seem a good idea, especially since the authors also mention that "the primed state is structurally very close to the open state" (lines 925-926). If the COI model is based on the structural findings, in principle it seems that primed should be treated separately.

      Thank you for the proposal. The use of both the closed and primed states was crucial for solving the model. As a matter of fact, although the primed and closed states are in part structurally different, functionally they are identical, that is, closed. Consequently, to be distinguished in a functional model we would need to incorporate single-channel data obtained under conditions when the ratio of closed and primed channels was modulated under otherwise identical conditions. Unfortunately, such a set of data, for instance at a varying ATP concentration for a range of cytosolic Ca2+ concentrations, does not exist for either RyR1 or RyR2 channels. Moreover, while there are several RyR1 high-resolution structures in the primed state (such as the 7tzc that we used; 2.45 Å; Melville et al. (2022)), the resolution of the corresponding RyR2 structures (6jg3, 6jh6, 6jhn; 4.5 - 6.1 Å; Chi et al. (2019)) is not sufficient for determination of allosteric pathways. Fortunately, however, the two sets of conditions for RyR2 open probability data that were available in the literature turned out to represent activation of channels either selectively from the closed state (Fig. 10C), or almost selectively from the primed state (Fig. 10A, B). This allowed us to interpret the difference in the allosteric coefficients as a consequence of this fact.

      To better clarify the idea, the corresponding text of the Discussion was modified as follows (lines 926-931): "RyR channels can be considered mostly in the primed state under these conditions since the binding of ATP analogs induces the primed structural macrostate in RyRs even in the absence of Ca2+ (Cholak et al., 2023). Fortunately, the two sets of conditions for RyR2 open probability data that were available in the literature turned out to represent activation of channels either selectively from the closed state (Fig. 10C), or selectively from the primed state (Fig. 10A, B).", and "construction of such a model is at present hampered by the lack of open probability data at a sufficiently wide range of experimental conditions and the absence of high-resolution structures of WT RyR2 in the primed state" (lines 934-937).

      14/ Line 619. Please define the "COI" acronym. I assume it is closed, open and inactivated but this is not mentioned.

      We thank the reviewer for noticing the insufficiency. We expanded the specific sentence as follows: "Therefore, we constructed the model of RyR operation, termed the COI (closed-open-inactivated) model, in which we assigned a functional macrostate corresponding to each of the closed, open, and inactivated structural macrostates (Figure 9A)" (line 582).

      15/ Figure 9: The diagrams are difficult to follow. Something that could improve it is to differentiate more between open and closed gates, but further elaboration would help the reader.

      We thank the reviewer for paying attention to details. The open state was differentiated in Figure 9 (after line 603) by adding a pore opening to the gate.

      To elaborate on the gating transitions and to keep the manuscript concise, we added a new Expanded View Figure EV2, which illustrates the relationship between the ion binding within macrostates and the transitions between macrostates.

      Nevertheless, because of the complexity of the model, which would require a multidimensional presentation, we had to limit the illustration to the binding of only the first ions at the binding sites. We hope that it will help the reader to grasp the principle of the model function more easily.

      16/ One comment is that the manuscript is too long; the manuscript exceeds the typical length required by most journals. To enhance its suitability for publication, the content needs to be synthesized and streamlined. The manuscript is written for an audience specialized in the RyR field and may be challenging for outsiders or for readers unfamiliar with structure and/or biophysical models.

      We thank the reviewer for raising this problem. The specific contribution to the understanding of RyR operation communicated by this manuscript was achieved by the synergy of approaches coming from different fields of RyR research - the structural, the functional, and the synthetic/systems ones. This needed deep immersion into complex studies performed over several decades to unwrap their complementary contributions. Only then could we synthesize the stepwise advances and integrate the mosaic of partial discoveries into the COI model. When conceptualizing the manuscript we also considered a two-paper version, one on structural aspects and the other on modeling aspects. We realized that the two papers would need to overlap heavily on the allosteric mechanism to be understandable separately and would be difficult to publish in the same journal. We also anticipated a typical side effect: structuralists and modelers would read just their own parts and would not sufficiently appreciate the feedback from alternative views on how to design and interpret future structural, functional, and modeling studies.

      Compacting the manuscript would be extremely difficult for us. In our view, a denser text would make it even more challenging for readers unfamiliar with some of the numerous approaches used here, as often happens in prominent multidisciplinary journals. Maybe it would be possible with the help of AI, but for now, we prefer to remain authentic.

      Nevertheless, we made some effort. To shorten the manuscript, we have removed the paragraph describing the timeline of the search for the RyR inhibition site that was originally on lines 126-151 and replaced it with the paragraph on lines 129-134: "The regulatory domains involved in both, activation and inactivation of RyRs (Figure 1) are located in the C-terminal quarter of the RyR. The Central domain participates in the Ca2+ binding activation site; the C-terminal domain bears several residues of Ca-, ATP- and caffeine-binding activation sites; the U-motif participates at the ATP- and caffeine-binding sites; the EF-hand region contains the putative Ca-binding pair EF1 and EF2; and the S23 loop bears one residue of the caffeine-binding site and two residues interacting with the EF-hand region of a neighboring monomer (Samso, 2017; Hadiatullah et al., 2022)". We also removed the statements about the proposed kinetic mechanism of inactivation by Nayak et al. (2022), originally on lines 175-184. Finally, we removed the discussion of the work of Gomez et al. (2016) originally on lines 882-889, since it fully overlapped with the statements in Results on lines 358-367 (now lines 338-347). We also moved the text of the subsection "Relationship between the COI model and RyR allosteric pathways" (originally lines 670-685) into subsection "Construction of the model of RyR operation", lines 592-603 and 645-662 of the revised version.

      17/ Another comment is the limited consideration of two relevant published works. One is by Chirasani et al. (2024), focused on allosteric pathways similar to the ones described here. The other work is by Nayak et al (2024), with cryo-EM structures of RyR1 focused on the interplay with Mg2+ and Ca2+. Overall, the manuscript would be strengthened by incorporating such related results in the literature.

      We thank the reviewer for the concerns, but we cannot fully agree. The paper of Chirasani et al. (2024) was cited in the manuscript as its online-first version, Chirasani et al. (2023). The manuscript now refers to the printed version proposed by the reviewer. The Chirasani et al. work was discussed on lines 870-881. The paper concentrates on the interaction between the EF-hand region and the S23 segment and its effect on RyR inactivation, which we referenced in the manuscript, but not on the allosteric pathways as mentioned by the reviewer. To broaden the consideration of this important work, we have introduced a more detailed discussion of Chirasani et al. (2024) by adding the following text to the manuscript (lines 881-888): "Based on their structural analysis of the open RyR1 structure 5tal, Chirasani et al. (2024) proposed that narrowing the gap between the EF-hand domain and S23 loop, resulting in H-bonding interactions between the EF-hand residue K4101 and the S23 loop residue D4730, and those between the EF-hand residues E4075, Q4076, D4079 and the S23 loop residue R4736, is a consequence of the binding of Ca2+ to the EF-hands. However, our PDBePISA analysis revealed a similar number of interactions between the EF-hand region and the S23 loop not only in open and inactivated but also in primed RyR1 structures (Figure 3). The presence of EF hand-S23 hydrogen bonds in the primed and open RyR1 structures suggests that the proximity of the EF-hand domain and S23 loop is a structural trait distinguishing RyR1 from RyR2, not a consequence of Ca2+ binding to the EF hand."

      The data and ideas of the illuminating work of Nayak et al. (2024) were discussed and referred to in the manuscript in several places, originally lines 74, 77, 164 (Introduction), 311, 340 (Results), 892-893, and 971 (Discussion). To broaden consideration of this work, we have expanded the discussion of this paper by adding the text shown in bold into the Introduction: "Recent studies reporting RyR structure at a high divalent ion concentration provide only indirect support for the molecular mechanism of Ca2+/Mg2+-dependent inactivation. Wei et al. (2016) and Nayak et al. (2024) observed a change in the conformation of the RyR1 EF-hands in the presence of 100 µM Ca2+ and 10 mM Mg2+, respectively, compared to low-calcium or low-magnesium conditions." (lines 135-138) and in the Discussion (lines 889-891): "The recent RyR1 structure 7umz (Nayak et al., 2024) provided evidence of Mg2+ ion bound in the RyR activation site, thus confirming the functional studies that established competition between Ca2+ and Mg2+ at this activation site (Laver et al., 1997; Zahradnikova et al., 2003; Zahradnikova et al., 2010)."

      Reviewer 3:

      Minor comment: While I am not an expert in allosteric model construction and therefore cannot fully assess their methodological approach, I observed that the authors fixed a number of parameters to achieve model convergence. A more detailed explanation of the rationale behind these fixed parameters would enhance clarity. Currently, these parameters are not clearly specified in the text and are somewhat obscured by the broader description of all parameters included in the model.

      We thank the reviewer very much for this comment, which made us realize that the relevant sections were written in a too technical manner, without sufficient explanation of the ideas behind the derivation and optimization of the model. To clarify the rationale of this process, we have rewritten the subsection "Derivation of the model open probability equation" and the section "Description of RyR operation by the COI model". In the subsection "Derivation of the model open probability equation", we have explained the simplification of the full set of equations (Eqs. 3A-C) into Eqs. 4A-C (lines 642 - 666). In the section "Description of RyR operation by the COI model", we have explained the extent of over-parametrization and the rationale of reducing it by three methods: combining the data into groups with common parameter values; eliminating parameter interdependence by fixation of one parameter at a preset value taken from the literature or postulated a priori; and sharing parameter values between data groups when no significant difference between these values was observed (lines 683-685, 702-710, 719-740).

      We hope that these changes make the manuscript more comprehensible.

      REFERENCES

      Chi, X., D. Gong, K. Ren, G. Zhou, G. Huang, J. Lei, Q. Zhou, and N. Yan. 2019. Molecular basis for allosteric regulation of the type 2 ryanodine receptor channel gating by key modulators. Proceedings of the National Academy of Sciences of the United States of America. 116:25575-25582.

      Chirasani, V.R., M. Elferdink, M. Kral, J.S. Carter, S. Heitmann, G. Meissner, and N. Yamaguchi. 2024. Structural and functional interactions between the EF hand domain and S2-S3 loop in the type-1 ryanodine receptor ion channel. The Journal of biological chemistry. 300:105606.

      Cholak, S., J.W. Saville, X. Zhu, A.M. Berezuk, K.S. Tuttle, O. Haji-Ghassemi, F.J. Alvarado, F. Van Petegem, and S. Subramaniam. 2023. Allosteric modulation of ryanodine receptor RyR1 by nucleotide derivatives. Structure. 31:790-800 e794.

      Dashti, A., G. Mashayekhi, M. Shekhar, D. Ben Hail, S. Salah, P. Schwander, A. des Georges, A. Singharoy, J. Frank, and A. Ourmazd. 2020. Retrieving functional pathways of biomolecules from single-particle snapshots. Nature communications. 11:4734.

      Gomez, A.C., T.W. Holford, and N. Yamaguchi. 2016. Malignant hyperthermia-associated mutations in the S2-S3 cytoplasmic loop of type 1 ryanodine receptor calcium channel impair calcium-dependent inactivation. American journal of physiology. 311:C749-C757.

      Hadiatullah, H., Z. He, and Z. Yuchi. 2022. Structural Insight Into Ryanodine Receptor Channelopathies. Frontiers in pharmacology. 13:897494.

      Laver, D.R., T.M. Baynes, and A.F. Dulhunty. 1997. Magnesium inhibition of ryanodine-receptor calcium channels: Evidence for two independent mechanisms. J.Membrane.Biol. 156:213-229.

      Lin, Y.F., C.W. Cheng, C.S. Shih, J.K. Hwang, C.S. Yu, and C.H. Lu. 2016. MIB: Metal Ion-Binding Site Prediction and Docking Server. Journal of chemical information and modeling. 56:2287-2291.

      Lu, C.H., C.C. Chen, C.S. Yu, Y.Y. Liu, J.J. Liu, S.T. Wei, and Y.F. Lin. 2022. MIB2: metal ion-binding site prediction and modeling server. Bioinformatics. 38:4428-4429.

      Lu, C.H., Y.F. Lin, J.J. Lin, and C.S. Yu. 2012. Prediction of metal ion-binding sites in proteins using the fragment transformation method. PLoS One. 7:e39252.

      Mackrill, J.J. 2022. Evolution of the cardiac dyad. Philosophical transactions of the Royal Society of London. Series B, Biological sciences. 377:20210329.

      Melville, Z., K. Kim, O.B. Clarke, and A.R. Marks. 2022. High-resolution structure of the membrane-embedded skeletal muscle ryanodine receptor. Structure. 30:172-180 e173.

      Nayak, A.R., W. Rangubpit, A.H. Will, Y. Hu, P. Castro-Hartmann, J.J. Lobo, K. Dryden, G.D. Lamb, P. Sompornpisut, and M. Samso. 2024. Interplay between Mg(2+) and Ca(2+) at multiple sites of the ryanodine receptor. Nature communications. 15:4115.

      Nayak, A.R., and M. Samso. 2022. Ca(2+) inactivation of the mammalian ryanodine receptor type 1 in a lipidic environment revealed by cryo-EM. eLife. 11.

      Rossi, D., A.M. Scarcella, E. Liguori, S. Lorenzini, E. Pierantozzi, C. Kutchukian, V. Jacquemond, M. Messa, P. De Camilli, and V. Sorrentino. 2019. Molecular determinants of homo- and heteromeric interactions of Junctophilin-1 at triads in adult skeletal muscle fibers. Proceedings of the National Academy of Sciences of the United States of America. 116:15716-15724.

      Samso, M. 2017. A guide to the 3D structure of the ryanodine receptor type 1 by cryoEM. Protein science : a publication of the Protein Society. 26:52-68.

      Wang, J., A. Jain, L.R. McDonald, C. Gambogi, A.L. Lee, and N.V. Dokholyan. 2020. Mapping allosteric communications within individual proteins. Nature communications. 11:3862.

      Wei, R., X. Wang, Y. Zhang, S. Mukherjee, L. Zhang, Q. Chen, X. Huang, S. Jing, C. Liu, S. Li, G. Wang, Y. Xu, S. Zhu, A.J. Williams, F. Sun, and C.C. Yin. 2016. Structural insights into Ca(2+)-activated long-range allosteric channel gating of RyR1. Cell research. 26:977-994.

      Zahradnikova, A., M. Dura, I. Gyorke, A.L. Escobar, I. Zahradnik, and S. Gyorke. 2003. Regulation of dynamic behavior of cardiac ryanodine receptor by Mg2+ under simulated physiological conditions. American journal of physiology. 285:C1059-1070.

      Zahradnikova, A., I. Valent, and I. Zahradnik. 2010. Frequency and release flux of calcium sparks in rat cardiac myocytes: a relation to RYR gating. The Journal of general physiology. 136:101-116.

      Zheng, W., and H. Wen. 2020. Investigating dual Ca(2+) modulation of the ryanodine receptor 1 by molecular dynamics simulation. Proteins. 88:1528-1539.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This is an interesting contribution by Zahradnikova et al. on the structure-based mechanism of RyR operation by calcium and magnesium. To this effect, they systematically and quantitatively compare multiple structures from the PDB database using bioinformatics. The comparisons between structures are rigorous and include RyR reconstructions in multiple conditions. They represent the most comprehensive structural comparison to date. The study proposes the main long-range allosteric pathways between the activation and inhibition ion binding sites and the ion gate and rationalizes the different inactivation properties of RyR1 and RyR2, which is an important question in the field.

      Based on the allosteric model, they built a model of "RyR operation" using Monod-Wyman-Changeux and Markov theorems. While the reviewer cannot comment on the mathematical model due to lack of expertise, it appears to have high predictive power and strong agreement with single channel functional data.

      Specific issues are noted below.

      Title: Structure-based mechanism of RyR channel operation by calcium and magnesium ions

      The authors may consider using an alternative term instead of "operation".

      Abstract: Please spell out CFF and MWC theorem.

      Line 87-88: "In striated muscle cells, RyR channels cluster at discrete sites of sarcoplasmic reticulum attached to the sarcolemma where electrical excitation triggers transient calcium release by activation of RyRs."

      There is no attachment between sarcoplasmic reticulum and sarcolemma, please rewrite.

      Lines 104-107: "Recently, mathematical modeling of the cardiac calcium release site (Iaparov et al., 2022) confirmed that Mg2+ ions could at the same time act as the negative competitor at the calcium activation site and as an inhibitor at the inhibition site. Unfortunately, the structural counterpart of RyR inactivation, an inhibitory binding site for divalent ions, has not been located yet in RyR structures."

      Note that the exact structural counterpart exists (Nayak et al., 2022, 2024), where Ca and Mg were found both at the activation and inhibition sites. The paragraph should be updated accordingly.

      Lines 108-110: The activation of RyR by agonists was shown to be accompanied by a conformational change around the Ca2+ binding site that leads to a decrease in the free energy and to a concomitant increase of the Ca2+ binding affinity and a population shift between the closed and open conformations (Dashti et al., 2020).

      Please clarify which state the "decrease in free energy" refers to: the open or the closed state?

      Figure 2: please indicate if distances were measured between the C-alphas or side chains.

      Line 353-357: "These data suggest that interactions between the basic arginine residue R4736 and the acidic residues at the start of the initial helix E of the EF1-hand are specific for Ca2+-dependent inactivation in RyR1, whereas the interactions between the lysine K4101 that immediately follows the F helix of EF1 and the middle of the S23 loop (corresponding to D4730 and I4731 in RyR1) may play a part in the inactivation of both RyR1 and RyR2 isoforms."

      Sentence is unclear; please rewrite. Overall, the entire section "Spatial interactions between the EF-hand and S23* regions" should be simplified and shortened.

      Lines 246-249 and Table 1. "all structures corresponding to rRyR1 residues 4063-4196 were subjected to energy minimization and submitted to the MIB2 server for evaluation of the ion binding score (IBS) of individual amino acid residues and the number of ion binding poses (NIBP) for Ca and Mg ions."

      Please elaborate on the "ion binding score" and "number of ion binding poses" concepts and provide reference for the MIB2 server.

      Lines 460-466: Nine structural models of RyR were selected, and then these are referred to in the text only with the pdb code. The reviewer understands that it would be difficult to recapitulate all conditions but either a table in the main manuscript file or a minimal description in the text following the pdb code would increase clarity and help readers to follow the content.

      Line 467: "In the selected structures, we identified residues with high allosteric coupling intensities (ACI) for both the inhibition and activation network and compared them with residues important for ligand binding and gating of RyR (Table 2)."

      Please define further the concept of "allosteric coupling intensities". The corresponding methods section appears to focus on the outputs of the OHM server without delving too much into the algorithm or principles followed. Is the allosteric coupling between neighboring residues, or does it reflect movement of the residues due to ligand binding? Is there a "reference" state or are the comparisons carried out within each allosteric state? This would help to better introduce the sections "The inhibition network" and "The activation network".

      Figure 8: The figure would be more meaningful if the pathways were drawn in the context of the 3D structure.

      Lines 610-612: "The structure of the inactivated RyR2 has not been determined yet; however, it is plausible to suppose that it exists at high concentrations of divalent ions and differs from the inactivated RyR1 structure by the extent of EF-hand - S23* coupling. "

      The speculation would be more fit for the discussion section.

      Lines 617-619: Closed and primed macrostates could be combined into a single closed macrostate of the model since both are closed and cannot be functionally distinguished at a constant ATP concentration.

      The rationale for combining closed with primed does not seem a good idea, especially since the authors also mention that "the primed state is structurally very close to the open state" (lines 925-926). If the COI model is based on the structural findings, in principle it seems that primed should be treated separately.

      Line 619. Please define the "COI" acronym. I assume it is closed, open and inactivated but this is not mentioned.

      Figure 9: The diagrams are difficult to follow. Something that could improve it is to differentiate more between open and closed gates, but further elaboration would help the reader.

      Significance

      Overall, the work is a valuable conceptual contribution to the field that will help in the mechanistic understanding of RyR function.

      One comment is that the manuscript is too long; the manuscript exceeds the typical length required by most journals. To enhance its suitability for publication, the content needs to be synthesized and streamlined. The manuscript is written for an audience specialized in the RyR field and may be challenging for outsiders or for readers unfamiliar with structure and/or biophysical models.

      Another comment is the limited consideration of two relevant published works. One is by Chirasani et al. (2024), focused on allosteric pathways similar to the ones described here. The other work is by Nayak et al (2024), with cryo-EM structures of RyR1 focused on the interplay with Mg2+ and Ca2+. Overall, the manuscript would be strengthened by incorporating such related results in the literature.

    1. drive modularity through the pattern of change

      Put code that changes frequently together in the same module (or service), and separate code that changes at different frequencies into different modules (or services)
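
      A minimal sketch of how the "pattern of change" could be measured, assuming a hypothetical commit history given as lists of touched file paths (the paths and the `co_change_counts` helper are invented for illustration). Files that repeatedly change together are candidates for the same module:

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files changes in the same commit."""
    pairs = Counter()
    for files in commits:
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical history: each inner list is the set of files touched by one commit.
history = [
    ["billing/invoice.py", "billing/tax.py"],
    ["billing/invoice.py", "billing/tax.py", "ui/theme.py"],
    ["ui/theme.py"],
    ["billing/invoice.py", "billing/tax.py"],
]
counts = co_change_counts(history)
```

      In this toy history the two billing files change together far more often than either does with the UI file, which is the kind of signal the note suggests using when drawing module boundaries.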

    1. Reviewer #1 (Public review):

      Summary:

      The authors in this manuscript performed scRNA-seq on a cohort of 15 early-stage cervical cancer patients with a mixture of adeno- and squamous cell carcinoma, HPV status, and several samples that were upstaged at the time of surgery. From their analyses they identified differential cell populations in both immune and tumour subsets related to stage, HPV status, and whether a sample was adenocarcinoma or squamous cell. Putative microenvironmental signaling was explored as a potential explanation for their differential cell populations. Through these analyses the authors also identified SLC26A3 as a potential biomarker for later stage/lymph node metastasis which was verified by IHC and IF. The dataset is likely useful for the community. The accuracy and clarity have been improved from the previous version, and additional immunofluorescence supporting the existence of their proposed cluster is now present. That said, there remain some issues with the strength of some claims (particularly in the abstract and results sections) and some of the cell type definitions.

      Strengths

      The dataset could be useful for the community.

      SLC26A3 could potentially be a useful marker to predict lymph node metastasis with further study.

      Weaknesses

      Causal language is used in the abstract around the immunosuppressive microenvironment and signal cross-talk between the Epi_10_CYSTM1 cluster and Tregs. The data show localization that supports a possible interaction and probable cytokines, but functional experiments would be needed to establish causality.

      In the description of the single cell data processing there is no mention of batch effect correction. Given that many patients were analyzed, and no mention was made of pooling or deconvolution, it must be assumed these were run separately which invariably leads to batch effects. Given the good overlays across patients some batch correction must have been performed. How was batch effect correction performed?

      While statistics were added to the clinical correlates, it would appear that single variables are being assessed one at a time by chi-squared analysis. This ignores the higher order structure of the data and the correlations between some variables resulting in potentially spurious findings. This is compounded as some categories had below 5 observations violating the assumptions of a chi-squared test.
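
      The small-count concern can be checked directly. The sketch below is an illustration under stated assumptions, not the authors' analysis: the table values are invented, and Fisher's exact test is shown only as one commonly used alternative for 2 x 2 tables (the review does not name a specific replacement). It computes expected cell counts under independence and flags cells below 5:

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts, for illustration only: rows = marker status, columns = outcome.
table = [[1, 6], [5, 3]]
small_cells = [e for row in expected_counts(table) for e in row if e < 5]
p_value = fisher_exact_2x2(table)
```

      An exact test only addresses the small-count assumption; it does not address the reviewer's separate point about correlations between variables, which would call for a multivariable model.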

      The description of all analytical steps remains quite truncated. While the inclusion of annotated code is useful, a full description of which tools were used, with which settings, and why each were chosen, is a minimum needed to properly interpret the results. This is as important in a mainly analytical paper as the experimental parameters.

      Validation of the clustering results remains a problem. The only details provided are that FindClusters was used. This depends on a manual choice of multiple parameters, including the number of k-nearest neighbours, whether Louvain or Leiden clustering is used, the resolution parameter, and others (how many variable genes/PCs, etc.). Why were these parameters selected, and how do you know that you're not over- or under-clustering?

      The cluster Epi_10_CYSTM1 remains somewhat problematic. None of the additional data supports its existence outside of the single patient who has cells from that population. Additionally, it falls well outside of any of the other Epithelial cells to the point that drawing it as part of a differentiation order doesn't even make sense. Indeed, most of the upregulated pathways in this cluster appear to be related to class II antigen presentation which would fit better with a dendritic cell/macrophage than an epithelial cell. While the IF at the end does support the existence of the cluster, numbers are still very limited, and this doesn't have data on the antigen presenting function. At the least a strong disclaimer should be included in the text that this population is essentially exclusive to one sample in the scRNA data.

      The linkage between the cluster types and IHC for prediction of lymph node metastasis is tenuous. Most of the strongly cluster associated markers were not predictive despite their clusters being theoretically enriched. This inability to recognize the clusters in additional samples using alternative methods does not give confidence that these clusters are robust. SLC26A3 being associated with upstaging may very well be a useful marker, however, given the lack of association of the other markers, it may be premature to say this is due to the same Epi_10_CYSTM1 cluster.

      There are multiple issues in the classification of T cells and neutrophils. In the analysis of T cell subset, all CD4+ T cells are currently scored as Tregs, what happened to the T-helper cells? Additionally, Activated T and Cytotoxic T both seem to contain CD8+ cells, but all their populations have equivalent expression of the activation marker CD69. Moreover, the "Cytotoxic" ones also express TIGIT, HAVCR2 and LAG3 which are generally exhaustion markers. For neutrophils, several obviously different clusters have been grouped together (Neu_1 containing two diametrically opposite cell clouds being an obvious example).

      Again in the CellChat section of the results causal language is being repeatedly used. These are just possible interactions, not validated ones. While the co-localization in the provided IF images certainly supports the co-localization, this still is only correlative and doesn't prove causality.

      Minor Issues

      The sentence "However, due to the low morbidity of ADC, in-depth investigations are insufficient" could be misinterpreted. Morbidity generally refers to the severity or health burden rather than the frequency of cases, though it's true in some studies prevalence is used for the overall impact of the disease on a population and referred to as morbidity. In this instance though, "incidence" or "prevalence" would be clearer word choices.

      The previous rebuttal states that clusters/cell type calls were refined to eliminate issues such as epithelial cells creeping into the T cell cluster, however, the cell %s have not been altered according to the change tracking. Shouldn't all the %s have been altered even if only slightly?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal hippocampal CA1 neurons plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a particularly new finding, although the authors' identification of structural alterations from within dorsal CA1 stratum lacunosum moleculare (SLM) as a candidate mechanism for the learning-related plasticity is potentially novel and exciting. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. Yet, the significance of these findings is substantially limited by several technical shortcomings in the experimental design, and the authors' central interpretation. Otherwise, there remain several strengths in the design and interpretation that offset some of these concerns.

      Given that much is already known about the role of RE and hippocampus in modulating fear learning and extinction, it remains unclear whether addressing these concerns would substantially increase the impact of this study beyond the specific area of speciality. Below, several major weaknesses will be highlighted, followed by several miscellaneous comments.

      Methodological:

      (1) One major methodological weakness in the experimental design involves the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway use N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7; note that there are 7 figures in total). In every instance, N = animal number should be used. It is unclear which of these results would remain significant if N = animal number were used in each or how many more animals would be required. This is problematic since these data comprise the main evidence for the authors' central conclusion that specific structural synaptic changes are associated with fear extinction learning.

      We do agree with the reviewer that N = animal number is the preferred way to present data in most of our experiments. However, in some experimental groups we observed a very low number of entries. For example, in the 5US group we found RE+/+ spines only in 3 out of 6 analyzed animals. We believe that this observation is not due to technical problems as mCherry virus transduction required to find RE+/+ spines is similar in all experimental groups and we analyzed similar volumes of tissue. While this result still allows the calculation of density of RE+/+ spines per animal it generates no entries for spine area and PSD95 mean gray value if N = animal number. Hence, we decided to use N=animals to calculate spines and boutons densities, and N=dendritic spines/boutons to calculate other spine/bouton parameters. 
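
      The difference between the two choices of N can be made concrete with a small sketch. The animal IDs and spine areas below are invented, and `per_animal_means` is a hypothetical helper; it only illustrates collapsing per-spine measurements to one value per animal before any statistical test:

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements):
    """Collapse (animal_id, value) pairs to one mean value per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical spine areas (arbitrary units); animal IDs and values are invented.
spines = [
    ("m1", 0.20), ("m1", 0.30), ("m1", 0.25),
    ("m2", 0.40),
    ("m3", 0.10), ("m3", 0.30),
]
animal_means = per_animal_means(spines)
n_spines, n_animals = len(spines), len(animal_means)
```

      Here N drops from 6 spines to 3 animals. An animal with no detected spines contributes no entry at all, which is the situation the authors describe for the 5US group.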

      (2) There is a lack of specific information regarding what constitutes learning with respect to behavioral freezing. It is never clearly stated what specific intervals are used over which freezing is measured during acquisition, extinction, and in extinction retrieval tests. Additionally, assessment of freezing during retrieval at 5- and 30-min time points doesn't lay to rest the possibility that there were differences in the decay rate over the 30-min period (also see below).

      We added a detailed description of how learning was assessed.

      ln 125-134: “For assessment of learning we used percent of time spent by animals freezing (% freezing). Freezing behavior was defined as complete lack of movement, except respiration. To assess within-session learning (working memory) we compared pre- and post-US freezing frequency (the first 148 sec vs last 30 sec) during the CFC session (day 1). To assess formation of long-term contextual fear memory, we compared pre-US freezing (day 1) and the first 5 minutes of the Extinction session (day 2). To assess within session contextual fear extinction we ran 2-way ANOVA to assess the effect of time and manipulation on freezing frequency. Freezing data were analyzed in 5-minute bins. To assess formation of long-term contextual fear extinction memory we compared the first 5 minutes of the Extinction session (day 2) and Test session (day 3).”

      As suggested by the reviewer, we also added data for all six 5-minute bins of Extinction sessions.
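
      As an illustration of the binning described above, the sketch below computes percent freezing in 5-minute bins. It assumes, hypothetically, that freezing has been scored as one boolean per second; the session values are invented and the scoring software's actual output format is not modeled:

```python
def percent_freezing_per_bin(freezing, bin_seconds=300):
    """Percent of time spent freezing in consecutive bins.

    `freezing` holds one boolean per second (True = freezing).
    """
    bins = []
    for start in range(0, len(freezing), bin_seconds):
        chunk = freezing[start:start + bin_seconds]
        bins.append(100.0 * sum(chunk) / len(chunk))
    return bins


# Hypothetical 30-minute session scored once per second: freezing declines over time.
session = []
for minute in range(30):
    frozen_seconds = 60 - 2 * minute  # invented values for illustration
    session += [True] * frozen_seconds + [False] * (60 - frozen_seconds)

bins = percent_freezing_per_bin(session)
```

      Reporting all six bins, rather than only the first and last, is what makes differences in the decay rate of freezing visible.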

      (3) A minor-to-moderate methodological weakness concerns the authors' decision to utilize saline injected groups as controls for the chemogenetics experiments (Figs. 5, 6). The correct design is to have a CNO-only group with the same viral procedure sans hM4Di. This concern is partly mitigated by the inclusion of a CNO vs. saline injection control experiment (Fig. 6).

      Figure 5 does not describe a chemogenetic experiment.

      We added new groups with control virus (CNO vs saline) to Figure 6 (now Fig. 6D and H).

      The chemogenetic experiment shown on Figure 7 has all 4 experimental groups (Control vs hM4Di and saline vs CNO).

      (4) In the electron microscopic analyses of dendritic spines (Fig. 5), comparison of only the fear acquisition versus extinction training, and the lack of inclusion of a naïve control group, makes it difficult to understand how these structural synaptic changes are occurring relative to baseline. It is noteworthy that the authors utilize the tripartite design in other anatomical analyses (Fig. 2-4).

      We added data for the Naive mice to Figure 5.

      (5) Interpretation:

      The main interpretive weakness in the study is the authors' claim that their data shows a role for the RE-CA1 pathway in memory consolidation (i.e., see Abstract). This claim is based on the premise that, although RE-CA1 pathway inactivation with CNO treatment 30 min prior to contextual fear extinction did not affect freezing at 5- and 30-min time points relative to saline controls, these rats showed greater freezing when tested on extinction retrieval 24 h thereafter. First, the data do not rule out possible differences in the decay rate of freezing during extinction training due to CNO administration. Next, the fact that CNO is given prior to training still leaves open the possibility that acquisition was affected, even if there were not any frank differences in freezing. Support for this latter possibility derives from the fact that mice tested for extinction retrieval as early as 5 min after extinction training (Fig. 6C) showed the same impairments as mice tested 24 h later (Figs. 6A). Further, all the structural synaptic changes argued to underlie consolidation were based on analysis at a time point immediately following extinction training, which is too early to allow for any long-term changes that would underlie memory consolidation, but instead would confer changes associated with the extinction training event.

      We do agree with the reviewer that our data do not allow us to conclude whether RE-CA1 pathway is involved in acquisition or consolidation of CFE memory. Therefore, we avoid those terms in the manuscript. We just conclude that RE→CA1 participates in the CFE.

      Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms underlying the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to the consolidation of contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid, and the methodology is appropriate. The strength of this work is that it builds nicely on existing literature and provides new molecular insight into a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The findings in this paper are well supported by the data, but a more detailed description of the methods is needed.

      (1) In the paragraph Analysis of dCA1 synapses after contextual fear extinction (CFE), more experimental and methodological data should be given in the text:

      - how was PSD95 used for the analysis, what was the difference between RE. Even if Thy1-GFP mice were used in Fig.2, it appears they were not used for bouton size analysis. To improve clarity, I suggest moving panel 2C to Figure 3. It is not clear whether all RE axons were indiscriminately analysed in Fig. 2 or if only the ones displaying colocalization with both PSD95 and GFP were analysed. If GFP was not taken into account here, analysed boutons could reflect synapses onto inhibitory neurons and this potential scenario should be discussed.

      PSD-95 immunostaining in close apposition to boutons was used to identify RE boutons innervating CA1 (Fig 1 and 2). In these cases the PSD-95 signal was not quantified. PSD-95 in close apposition to dendritic spines was used as a proxy of PSDs in CA1 (Figure 3, 4 and 7). In these cases we assessed the integrated mean gray value of PSD-95 signal per dendritic spine (Figure 3, 4) or per ROI (Figure 7). This is explained in detail in the section Confocal microscopy and image quantification (ln 149-172).

      GFP signal was not taken into account during boutons analysis. This is explained in the materials and methods section Confocal microscopy and image quantification (ln 149-172).

      We indicate that PSD-95 is a marker of excitatory synapses located both on excitatory and inhibitory neurons.

      Ln 258: RE boutons were identified in SO and SLM as axonal thickenings in close apposition to PSD-95-positive puncta (a synaptic scaffold used as a marker of excitatory synapses located both on excitatory and inhibitory neurons) (Kornau et al., 1995; El-Husseini et al., 2000; Chen et al., 2011; Dharmasri et al., 2024).

      We also cite literature demonstrating that RE projects to the hippocampal formation and forms asymmetric synapses with dendritic spines and dendrites, suggesting innervation of excitatory synapses on both excitatory and aspiny inhibitory neurons (ln 673).

      As advised by the reviewer the Figure 2C panel was moved to Figure 3 (now it is Fig 3A).

      (2) in the methods: The volume of intra-hippocampal CNO injections should be indicated. The concentration of 3 uM seems pretty low in comparison with previous studies. CNO source is missing.

      This section has been rewritten to be more clear. The concentration of CNO was chosen based on the previous studies (Stachniak et al., 2014).

      ln 103: “Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannulae (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.”

      (3) More details of what software/algorithm was used to score freezing should be included.

      Freezing was automatically scored with VideoFreeze™ Software (Med Associates Inc.).

      (4) Antibody dilutions for IHC should be indicated. Secondary antibody incubation time should be indicated.

      The missing information is added.

      ln 144: “Next, sections were incubated in 4°C overnight with primary antibodies directed against PSD-95 (1:500, Millipore, MAB 1598), washed three times in 0.3% Triton X-100 in PBS and incubated in room temperature for 90 minutes with a secondary antibody bound with Alexa Fluor 647 (1:500, Invitrogen, A31571).”

      (5) No statement about code and data availability is present.

      The statements are added.

      ln 785: Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).

      Reviewer #3 (Public review):

      Summary:

      This paper examined the role of nucleus reuniens (RE) projections to dorsal CA1 neurons in context fear extinction learning. First, the authors show that RE neurons send excitatory projections to the stratum oriens (SO) and the stratum lacunosum moleculare (SLM), but not the stratum radiatum (SR). After mice undergo context fear conditioning, the synaptic connections between RE and dCA1 neurons in the SLM (but not the SO) are weakened (reduced bouton and spine density). This weakening is reversed by extinction learning, which leads to enhanced synaptic connectivity between RE inputs and dendrites in the SLM. Control experiments demonstrate that the observed changes are due to extinction and not caused by simple exposure to the context. Extinction learning also induced increases in the size (volume and surface area) of the post-synaptic density (PSD) in SLM. To establish the functional role of RE inputs to dCA1, the researchers used an inhibitory DREADD to silence this pathway during extinction learning. They observe that extinction memory (measured 2 hours or 24 hours later) is impaired by this inhibition. Control experiments show that the extinction memory deficit is not simply due to increased freezing caused by inactivation of the pathway or injections of CNO. Inhibiting the RE projection during extinction learning also reduced PSD-95 protein levels in the spines of dCA1 neurons.

      Strengths:

      Based on their results, the authors conclude that, "the RE→SLM pathway participates in the updating of fearful context value by actively regulating CFE-induced molecular and structural synaptic plasticity in the SLM.". I believe the data are generally consistent with this hypothesis, although there is an important control condition missing from the behavioral experiments.

      Weaknesses:

      (1) A defining feature of extinction learning is that it is context specific (Bouton, 2004). It is expressed where it was learned, but not in other environments. Similarly, it has been shown that internal contexts (or states) also modulate the expression of extinction (Bouton, 1990). For example, if a drug is administered during extinction learning, it can induce a specific internal state. If this state is not present during subsequent testing, the expression of extinction is impaired just as it is when the physical context is altered (Bouton, 2004). It is possible that something similar is happening in Figure 6. In these experiments, CNO is administered to inactivate the RE-dCA1 projection during extinction learning. The authors observe that this manipulation impairs the expression of extinction the next day (or 2-hours later). However, the drug is not given again during the test. Therefore, it is possible that CNO (and/or inactivation of the RE-dCA1 pathway) induces a state change during extinction that is not present during subsequent testing. Based on the literature cited above, this would be expected to disrupt fear extinction as the authors observed. To determine if this alternative explanation is correct, the researchers need to add groups that receive CNO during extinction training and subsequent extinction testing. If the deficits in extinction expression reported in Figure 6 result from a state change, then these groups should not exhibit an impairment. In contrast, if the authors' account is correct, then the expression of extinction should still be disrupted in mice that receive CNO during training and testing.

      We do agree with the reviewer that such an experiment would be interesting. However, it could be also confusing as we could not distinguish whether the possible behavioral effects are related to the state-dependent aspects of CFE or impaired recall of CFE. Importantly, previous studies showed that RE is crucial for extinction recall (Totty et al., 2023). We also show that CFE memory is impaired not only when the animals recall CFE without CNO (day 3) but also with CNO (day 4) (Figure 6C). Moreover, we do not see the effects of CNO on CFE in the control groups (Figure 6D and H). So we believe that it is unlikely that CNO results in state-dependent CFE.

      (2) In their analysis of dCA1 synapses after contextual fear extinction (CFE) (Figure 4), the authors should have compared Ctx and Ctx-Ctx animals against naïve animals (as they did in Figure 3) when comparing 5US and Ext with naïve animals. Otherwise, the authors cannot make the following conclusion; "since changes of SLM synapses were not observed in the animals exposed to the familiar context that was not associated with the USs, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general.".

      We assume that the key experimental groups to conclude about synaptic plasticity related to particular behavior are the groups that differ just by one factor/experience. For CFE that would be mice sacrificed immediately before and after CFE session (Figure 2 & 3); on the other hand to conclude about the effects of the re-exposure to the neutral context mice sacrificed before and after second exposure to the neutral context are needed (Figure 4). The naive group, as it differs by at least two manipulations from the Ext and Ctx-Ctx groups, is interesting but not crucial in both cases. This group would be necessary if we focused on the memories of FC or novel context. However, these topics are not the main focus of the current manuscript. Still, the naive group is shown on Figures 2 & 3 to check if CFE brings spine parameters to the levels observed in mice with low freezing.

      We have re-written the cited paragraph to be more precise in our conclusions.

      "Overall, our data demonstrate that synapses in all dCA1 strata undergo structural or molecular changes relevant to CFC and/or CFE. However, only in SLM CFE-induced synaptic changes are likely to be directly regulated by RE inputs as they appear on RE+ dendrites and spines. Since such changes of SLM synapses were not observed in the animals re-exposed to the neutral context, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general."

      (3) In the materials and methods section, the description of cannula placements is confusing and needs to be rewritten.

      This section has been rewritten.

      ln 103: “Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannulae (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other/ Minor:

      In the beginning of the second paragraph on p. 21 of the Results section, it states that "RE-dCA1 has no effect on working memory," although it was not clear what data the authors were referring to support this conclusion.

      We refer there to the changes of freezing behavior within the CFE session. This is explained now.

      Reviewer #2 (Recommendations for the authors):

      No statement about code and data availability is present.

      The statements are added.

      ln 785: “Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software. We agree we focus on lightsheet microscopy data, therefore to narrow the scope we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to.

      You have selectively dropped the last part of that sentence that is key: “.... 3D volumes, often in cleared neural tissue” – which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we literally make it clear our claims are on MesoSPIM and cleared data.

      The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      This is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning (i.e., no method is at an F1-Score of 1).

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool, and glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have a F1-Score (dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:

      *  comparison to thresholding (with the same post-processing as the proposed method) * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      *  comparison to references 8 and 9.
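
      For concreteness, the thresholding baseline suggested above could be sketched roughly as follows. This is a minimal illustration, not code from the manuscript or the plugin: the function names `otsu_threshold` and `threshold_baseline` and the toy volume are hypothetical, Otsu's method is re-implemented by hand with NumPy to keep the sketch short (in practice `skimage.filters.threshold_otsu` would be the usual choice), and none of the post-processing steps of the proposed method are reproduced.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(volume, bins=256):
    """Return the bin center that maximizes between-class variance (Otsu)."""
    hist, edges = np.histogram(volume, bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w_low = np.cumsum(hist)                 # voxels in bins 0..i
    w_high = np.cumsum(hist[::-1])[::-1]    # voxels in bins i..end
    mean_low = np.cumsum(hist * centers) / np.maximum(w_low, 1)
    mean_high = (np.cumsum((hist * centers)[::-1])
                 / np.maximum(w_high[::-1], 1))[::-1]
    # Variance between the classes "bins <= i" and "bins > i".
    between = w_low[:-1] * w_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return centers[np.argmax(between)]


def threshold_baseline(volume):
    """Otsu threshold, then 3D connected components as instance labels."""
    foreground = volume > otsu_threshold(volume)
    labels, n_objects = ndimage.label(foreground)  # face connectivity by default
    return labels, n_objects


# Hypothetical toy volume: two bright cubes on a dark background.
toy = np.zeros((20, 20, 20))
toy[2:6, 2:6, 2:6] = 1.0
toy[12:17, 12:17, 12:17] = 1.0
toy_labels, toy_count = threshold_baseline(toy)
```

      On real data, touching cells would be merged into a single component by this baseline, which is why the request above asks for the same post-processing to be applied to every method being compared.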

      Ref 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNets, and a transformer – SwinUNetR. Moreover, models in the MONAI package can be used. Note, to our knowledge the transformer results also are a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment, presentation of the results, and reproducibility.

      We address your concerns and misunderstandings below.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small nuclei, much smaller than the pretraining datasets of CellPose and StarDist. I assume that this is one of the main reasons why these well-established methods don't work for this dataset.

      StarDist is not pretrained; we trained it from scratch, as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them; it is that we can match them with a self-supervised method.

      Limiting method comparison to only this dataset may create a misleading impression that CellSeg3D is superior for all kinds of 3D nucleus segmentation tasks, whereas this might only hold for small nuclei.

      The GT dataset we labeled has nuclei that are normal brain-cell sized. Moreover in Figure 2 we show very different samples with both dense and noisy (c-FOS) labeling.

      We also clearly do not claim this is superior for all tasks, from our text: “First, we benchmark our methods against Cellpose and StarDist, two leading supervised cell segmentation packages with user-friendly workflows, and show our methods match or outperform them in 3D instance segmentation on mesoSPIM-acquired volumes" – we explicitly do NOT claim beyond the scope of the benchmark. Moreover we state: "We found that WNet3D could be as good or better than the fully supervised models, especially in the low data regime, on this dataset at semantic and instance segmentation" – again noting on this dataset. Again, we only claimed we can be as good as these methods with an unsupervised approach, and in the low-GT data regime we can excel.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I doubt that the claims hold for larger and or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work and he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be much stronger if a **fair** comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a) this is not a valid experimental setup and amounts to training on your test set. If b) this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3.

      We apologize for this confusion; we have now expanded the methods to clarify the setup is now b; you can see what we exactly did as well in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions.

      For clarity, we additionally link each individual notebook now in the Methods.
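
      The distinction between setups (a) and (b) raised above can be illustrated with a short sketch: the post-processing hyperparameter (here a single probability threshold) is chosen on a validation split only, and the metric is then reported on a disjoint test split. This is an assumed, simplified illustration and not the code from the linked notebooks; the function names, the candidate thresholds, and the toy arrays are hypothetical.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0


def select_threshold(val_preds, val_truths, candidates):
    """Pick the threshold with the best mean Dice on the validation split only."""
    scores = [
        np.mean([dice(p > t, g) for p, g in zip(val_preds, val_truths)])
        for t in candidates
    ]
    return candidates[int(np.argmax(scores))]


def evaluate(test_preds, test_truths, threshold):
    """Report mean Dice on the held-out test split with a fixed threshold."""
    return float(np.mean([dice(p > threshold, g)
                          for p, g in zip(test_preds, test_truths)]))


# Hypothetical toy data: background predicted at 0.4, foreground at 0.7.
truth_val = np.zeros((8, 8, 8), bool)
truth_val[2:5, 2:5, 2:5] = True
truth_test = np.zeros((8, 8, 8), bool)
truth_test[4:7, 1:4, 3:6] = True
pred_val = np.where(truth_val, 0.7, 0.4)
pred_test = np.where(truth_test, 0.7, 0.4)

best = select_threshold([pred_val], [truth_val], candidates=[0.3, 0.5, 0.8])
test_score = evaluate([pred_test], [truth_test], best)
```

      Keeping `select_threshold` and `evaluate` on separate volumes is what makes the reported number an estimate of out-of-sample performance rather than a fit to the test set.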

      (3) I cannot reproduce any of the results using the plugin. I tried to reproduce some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite close (Average BG intensity ~0.4, FG intensity 0.6-0.7). I try to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that would be required to reproduce the images (it sounds like you had this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to reproduce the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      We are surprised to hear this; did you follow this notebook, which directly reproduces the steps to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      We also expanded the methods to include the exact values from the notebook into the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, that is why we retrained Cellpose and found good performance results (also reported in Figure 1). Resizing GB to TB volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics.
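
      To make the difference between the two metrics discussed in this exchange concrete, here is a small sketch contrasting a voxel-wise (semantic) Dice score with an instance-level F1 score at an IoU threshold. It is an illustrative assumption, not the StarDist implementation or the manuscript's evaluation code: the function names are hypothetical, and the greedy one-to-one matching is only adequate for IoU thresholds of about 0.5 or higher, where each object can have at most one valid partner (apart from exact ties at 0.5).

```python
import numpy as np


def semantic_dice(pred_labels, true_labels):
    """Voxel-wise Dice on foreground masks; ignores object identity."""
    pred, truth = pred_labels > 0, true_labels > 0
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0


def instance_f1(pred_labels, true_labels, iou_threshold=0.5):
    """F1 over objects: a true object counts as found if IoU >= threshold."""
    true_ids = [int(i) for i in np.unique(true_labels) if i != 0]
    pred_ids = [int(i) for i in np.unique(pred_labels) if i != 0]
    used, tp = set(), 0
    for t in true_ids:
        t_mask = true_labels == t
        for p in np.unique(pred_labels[t_mask]):
            p = int(p)
            if p == 0 or p in used:
                continue
            p_mask = pred_labels == p
            iou = (t_mask & p_mask).sum() / (t_mask | p_mask).sum()
            if iou >= iou_threshold:
                used.add(p)
                tp += 1
                break
    fp, fn = len(pred_ids) - tp, len(true_ids) - tp
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0


# Hypothetical example: two true objects merged into one predicted blob.
truth = np.array([[1, 1, 1, 1, 0, 2, 2, 2, 2]])
merged = np.ones_like(truth)

dice_score = semantic_dice(merged, truth)   # high: most foreground voxels agree
f1_score = instance_f1(merged, truth)       # zero: no object matches at IoU 0.5
```

      In this toy case the merged prediction scores 16/17 on Dice but 0 on instance F1, which is the kind of discrepancy that motivates reporting the instance-level metric for instance segmentation.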

      We added a link to the paper you mention as well.

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth. Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. Thus, while self-supervision saves hundreds of hours of manual labor, it comes at the cost of working in more limited regimes. This is why we don’t claim it should replace excellent methods like Cellpose or StarDist, but rather that it complements them and can be used on mesoSPIM samples, as we show here.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) One of the listed contributions is "adding the SoftNCuts loss". This is not true, reference 10 already introduced that loss.

      “Our changes include a conversion to a fully 3D architecture and adding the SoftNCuts loss” - we dropped the comma and added the word “AND” to note that we added the 3D version of the SoftNCuts loss TO the 3D architecture, which 10 did not do.

      (2) "Typically, these methods use a multi-step approach" to segment 3D from 2D: this is only true for CellPose, StarDist does real 3D.

      That is why we preface with “typically” which implies not always.

      (3) "see Methods, Figure 1c, c)" is missing an opening parenthesis.

      (4) K is not introduced in equation (1) (presumably the number of classes, which seems to be 2 for all experiments considered).

      k actually was introduced just below equation 1 as the number of classes. We added the note that k was set to 2.

      (5) X is not introduced in equation (2) (presumably the spatial position of a voxel).

      Sorry for this oversight. We add that $X$ is the spatial position of the voxel.

      Reviewer #2 (Recommendations For The Authors):

      To improve the paper the weaknesses mentioned above should be addressed:

      (1) Compare to StarDist and/or CellPose on further datasets, esp. using pre-trained CellPose, to see if the claims of competitive performance with state-of-the-art approaches hold up for the case of different nucleus morphologies. The EmbedSeg datasets from Figure 2 c are well suited for this. In the current form, the claims are too broad and not supported if thorough experiments are performed on a single dataset with a very specific morphology. Note: even if the method is not fully competitive with CellPose / StarDist on these Datasets it holds merit since a segmentation method that works for small nuclei as in the mesoSPIM dataset and works self-supervised is very valuable.

      (2) Clarify how the best instance segmentation hyperparameters are found. If you indeed optimize these on the same part of the dataset used for evaluating metrics then the current experimental set-up is invalid. If this is not the case I would still rethink if this is a good way to report the results since it does not seem to reflect user experience. I found it not possible to find good hyperparameters for either of the two segmentation approaches I tried (see also next point) so I think these numbers are too optimistic.

      (3) Improve the instance segmentation part of the plugin: either provide troubleshooting for how to install pyClesperanto correctly to use the voronoi-based instance segmentation or implement it based on more standard functionality like skimage / scipy. Provide more guidance for finding good hyperparameters for the segmentation task.

      (4) Make sure image resizing is done correctly when using pre-trained CellPose models and report on this.

      (5) Report F1 Scores only (unless there is a compelling reason to also report Dice).

      (6) Address the limitations of the method in more detail.

      On a positive note: all data and code are available and easy to download/install. A minor comment: it would be very helpful to have line numbers for reviewing a revised version.

      All comments are also addressed in the public reviews.

    1. Add the code to create an object called path_data_raw

      [hugo] I would suggest we remove the object here for the variable name. I hear that it makes them practice object, but I feel it's confusing to introduce here("data", "raw") and then suddenly ask them to do here(path_data_raw, "msf_linelist_moissala_2023-09-24.xlsx"). The variable does not add much here and is only truly important with long automated scripts ...

    2. The principles you learned in the Data Management module will apply here as well: we should do our best to ensure that our projects won’t just work today but can also be reused and shared in the future. While doing this is not always easy, there are several best practices that can help us, and one of the most important is to start with a good, organized code base.

      Test of a comment here - what happens if I render ?

    1. Reviewer #2 (Public review):

      Summary:

      This study evaluated the aperiodic component in the medial prefrontal cortex (mPFC) using resting-state EEG recordings from 149 individuals with chronic pain and 115 healthy participants. The findings showed no significant differences in the aperiodic component of the mPFC between the two groups, nor was there any correlation between the aperiodic component and pain intensity. These results were consistent across various chronic pain subtypes and were corroborated by whole-brain analyses. The study's robustness was further reinforced by preregistration and multiverse analyses, which accounted for a wide range of methodological choices.

      Strengths:

      This study was rigorously conducted, yielding clear and conclusive results. Furthermore, it adhered to stringent open and reproducible science practices, including preregistration, blinded data analysis, and Bayesian hypothesis testing. All data and code have been made openly available, underscoring the study's commitment to transparency and reproducibility.

      Weaknesses:

      The aperiodic exponent of the EEG power spectrum is often regarded as an indicator of the excitatory/inhibitory (E/I) balance. However, this measure may not be the most accurate or optimal for quantifying E/I balance, a limitation that the authors might consider addressing in the future.

      Comments on revisions:

      All my comments have been well addressed.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1: 

      Summary:

      In this study, Avila et al. tested the hypothesis that chronic pain states are associated with changes in the excitability of the medial prefrontal cortex (mPFC). The authors used the slope of the aperiodic component of the EEG power spectrum (= the aperiodic exponent) as a novel, non-invasive proxy for the cortical excitation-inhibition ratio. They performed source localization to estimate the EEG signals generated specifically by the mPFC. By pooling resting-state EEG recordings from three existing datasets, the authors were able to compare the aperiodic exponent in the mPFC and across the whole brain (at all modeled cortical sources) between 149 chronic pain patients and 115 healthy controls. Additionally, they assessed the relationship between the aperiodic exponent and pain intensity reported by the patients. To account for heterogeneity in pain etiology, the analysis was also performed separately for two patient subgroups with different chronic pain conditions (chronic back pain and chronic widespread pain). The study found robust evidence against differences in the aperiodic exponent in the mPFC between people with chronic pain and healthy participants, and no correlation was observed between the aperiodic exponent and pain intensity. These findings were consistent across different patient subgroups and were corroborated by the whole-brain analysis.
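
      As background for readers less familiar with the measure, the aperiodic exponent can be illustrated as the negative slope of a straight-line fit to the power spectrum in log-log coordinates. The sketch below is a deliberately simplified assumption and not the extraction method used in the study: the function name, the 2 to 40 Hz fitting range, and the synthetic spectrum are hypothetical, and dedicated spectral parameterization tools additionally model oscillatory peaks before estimating the aperiodic component.

```python
import numpy as np


def aperiodic_exponent(freqs, power, fmin=2.0, fmax=40.0):
    """Fit log10(power) = offset - exponent * log10(freq) within [fmin, fmax]."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    band = (freqs >= fmin) & (freqs <= fmax)
    slope, offset = np.polyfit(np.log10(freqs[band]), np.log10(power[band]), 1)
    return -slope, offset


# Synthetic 1/f^1.5 spectrum with no oscillatory peaks (illustration only).
freqs = np.arange(1.0, 100.0, 0.5)
power = 10.0 * freqs ** -1.5
exponent, offset = aperiodic_exponent(freqs, power)
```

      On this synthetic spectrum the fit recovers an exponent of about 1.5; on real EEG, alpha and beta peaks would bias such a plain linear fit, which is one reason the choice of extraction method matters.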

      Strengths:

      The study is based on sound scientific reasoning and rigorously employs suitable methods to test the hypothesis. It follows a pre-registered protocol, which greatly increases the transparency and, consequently, the credibility of the reported results. In addition to the planned steps, the authors used a multiverse analysis to ensure the robustness of the results across different methodological choices. I find this particularly interesting, as the EEG aperiodic exponent has only recently been linked to network excitability, and the most appropriate methods for its extraction and analysis are still being determined. The methods are clearly and comprehensively described, making this paper very useful for researchers planning similar studies. The results are convincing, and supported by informative figures, and the lack of the expected difference in mPFC excitability between the tested groups is thoroughly and constructively discussed.

      We are grateful for the appreciation of the strengths of our study.  

      Weaknesses:

      Firstly, although I appreciate the relatively large sample size, pooling data recorded by different researchers using different experimental protocols inevitably increases sample variability and may limit the availability of certain measures, as was the case here with the reports of pain intensity in the patient group. Secondly, the analysis heavily relies on the estimation of cortical sources, an approach that offers many advantages but may yield imprecise results, especially when default conduction models, source models, and electrode coordinates are used. In my opinion, this point should be discussed as well.

      We agree that the heterogeneous sample of people with chronic pain increases variability and limits the availability of clinical measures. We further agree on the limitations of source space analysis. Therefore, we have added these limitations to the discussion section.

      Reviewer #2: 

      Summary:

      This study evaluated the aperiodic component in the medial prefrontal cortex (mPFC) using resting-state EEG recordings from 149 individuals with chronic pain and 115 healthy participants. The findings showed no significant differences in the aperiodic component of the mPFC between the two groups, nor was there any correlation between the aperiodic component and pain intensity. These results were consistent across various chronic pain subtypes and were corroborated by whole-brain analyses. The study's robustness was further reinforced by preregistration and multiverse analyses, which accounted for a wide range of methodological choices.

      Strengths:

      This study was rigorously conducted, yielding clear and conclusive results. Furthermore, it adhered to stringent open and reproducible science practices, including preregistration, blinded data analysis, and Bayesian hypothesis testing. All data and code have been made openly available, underscoring the study's commitment to transparency and reproducibility.

      We appreciate the appraisal of the strengths of our study, highlighting our efforts in open and reproducible science practices.

      Weaknesses:

      The aperiodic exponent of the EEG power spectrum is often regarded as an indicator of the excitatory/inhibitory (E/I) balance. However, this measure may not be the most accurate or optimal for quantifying E/I balance, a limitation that the authors might consider addressing in the future.

      We are grateful for this suggestion and fully agree that the aperiodic component of the power spectrum is not necessarily the most optimal and accurate measure for quantifying E/I balance. We have now included this limitation in the discussion section.

      Recommendations for the authors

      Reviewer #1: 

      (1) In the Results section, it might be helpful to provide the mean values of the aperiodic exponent (before age correction) for all tested groups and subgroups. As this measure is still not widely used, providing these values would allow readers to better understand the normal range of the aperiodic exponent.

      We have added the mean values of the aperiodic exponent and their standard deviation (before age correction) to the manuscript's results section (pages 6 and 11).

      (2) When reporting the aperiodic exponent across all cortical sources (Q3), I think it would be useful to include the raw values in Figure 6 in the main text rather than in the Supplementary Materials. At a glance, these plots seem to suggest that the aperiodic exponent differs between groups in the occipital and parietal regions, even though no tests were significant after correcting for multiple comparisons. Maybe this observation also deserves a mention in the text and possibly in the Discussion?

      We have moved the report on the aperiodic exponent across all cortical sources from the Supplementary Material to the main text. It is now Fig. 7 of the main manuscript. Moreover, we agree that the plots suggest group differences in certain brain regions. However, according to our rigorous open and reproducible science practices and pre-registration, we prefer not to speculate on these non-significant findings. 

      (3) In the Methods section, when describing the participants, the authors state that "Gender was balanced across both groups...". It might be better to avoid referring to the datasets as "balanced," considering that the sample includes almost twice as many females as males.

      We have replaced the misleading statement with the more precise statement that ”the gender ratio of both groups was similar.”

      (4) In the Methods section, when describing the source localization, I find it slightly confusing that the authors first mention the anterior cingulate cortex as a possible label included in the mPFC cortical parcels but then state that the version of the cortical atlas used did not contain such a label. It might be simpler not to mention the cingulate cortex at all.

      We have deleted the misleading sentence from the manuscript.  

      Reviewer #2: 

      (1) The aperiodic exponent of the EEG power spectrum is often considered an indicator of the excitatory/inhibitory (E/I) balance, but this measure can be susceptible to artifacts. It is important to acknowledge this limitation and consider exploring alternative measures to quantify the E/I ratio in future studies.

      We are grateful for this suggestion and fully agree that the aperiodic component of the power spectrum is not necessarily the most optimal and accurate measure for quantifying E/I balance. We have now included this limitation in the discussion section.

      (2) The study assumed a linear relationship between the E/I ratio (represented by the aperiodic exponent of the EEG power spectrum) and chronic pain. However, this assumption may not hold true in all cases, and this point could be discussed in the study.

      We fully agree that the relationship between the E/I ratio and chronic pain might not be a linear one and have added this point to the discussion section.

      (3) The aperiodic component was characterized in eyes-closed resting-state EEG recordings, although EEG data were collected in both eyes-closed and eyes-open conditions. The authors could also consider assessing the aperiodic component from EEG data with eyes open.

      We thank the reviewer for this suggestion. We have focused our analysis on eyes-closed recordings since these recordings are usually less contaminated by artifacts than eyes-open recordings. Moreover, in our current datasets, some participants were missing eyes-open recordings. We agree that performing similar analyses for the eyes-open recordings would also be interesting. However, adding these analyses would double the amount of data included in the manuscript, which would likely overload it. We have, therefore, now included a statement to the discussion that future studies should also analyze eyes-open EEG recordings.  

      (4) The EEG power spectrum was calculated from signals after source reconstruction, a crucial step for targeting specific brain regions. However, this process can introduce potential signal distortions, such as variations in source waveforms depending on different regularization parameters. To ensure the robustness of the results, the authors could perform the same analysis at the sensor level, for example, using signals recorded at Fz.

      We agree on the potential shortcomings and limitations of source space analysis and have added this limitation to the discussion section.

      (5) It would be beneficial to present the raw EEG power spectrum averaged across subjects for each condition, along with the scalp distribution of the aperiodic exponent. This would enhance readers' understanding of the study and help demonstrate the quality of the data.

      We are grateful for this suggestion and added the power spectrum for each condition and the scalp distribution of the aperiodic exponent to the Supplementary Material.

      (6) Linear regression models were used to control for the influence of age on aperiodic exponents and pain intensity ratings. However, it is unclear why other relevant variables, such as gender and medication use, were not considered.

      We agree that the aperiodic exponent might be influenced by gender and medication. As these analyses had not been included in our pre-registered analysis plan, we have not performed them. Moreover, although we agree that gender might have an impact, we have not found any evidence for this so far. Regarding medication, we fully agree that medication can influence the measure. However, medication was very heterogeneous, including drugs with fundamentally different mechanisms of action. Thus, we do not see a robust way to appropriately analyze these effects with sufficient statistical power. We have now added this important point to the discussion section.
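
      As an illustration of what controlling for age with linear regression can mean in practice, the sketch below regresses age out of both variables and then relates the residuals. This is one common approach, shown under stated assumptions, and not necessarily the model the authors used. All numbers are hypothetical toy values, not data from the study.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residualize(values, covariate):
    """Remove the linear effect of a covariate and return the residuals."""
    slope, intercept = ols_fit(covariate, values)
    return [v - (intercept + slope * c) for v, c in zip(values, covariate)]


# Hypothetical toy data (assumption), one entry per participant.
age = [25.0, 34.0, 41.0, 52.0, 60.0, 68.0]
exponent = [1.10, 1.05, 1.00, 0.92, 0.90, 0.81]
pain = [3.0, 6.0, 4.0, 7.0, 5.0, 8.0]

exponent_resid = residualize(exponent, age)
pain_resid = residualize(pain, age)

# Slope of pain on exponent after the linear effect of age is removed from both.
partial_slope, _ = ols_fit(exponent_resid, pain_resid)
print(f"age-adjusted slope: {partial_slope:.3f}")
```

      Additional covariates such as gender or medication class would enter the same way, as further regressors, which is where the statistical power concern raised in the response becomes relevant.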

      (7) The authors may consider addressing or discussing the impact of inter-individual variability on the negative results, particularly given that the data were derived from multiple experiments.

      We agree that the heterogeneous sample of people with chronic pain increases variability and limits the availability of clinical measures. We have added this limitation to the discussion.

    1. Create a new section in your code called File Paths. Add the code to create an object called path_data_raw that contains the path to your raw data folder using the function here(). We can now pass our new variable path_data_raw back into here() in order to create a full path to a specific data file.

      Can we not simplify and just use here::here() without saving to a new variable?

    1. iacore helpfully wrote a script using Deno to set up a nest. You can find the code and instructions on their repo: https://git.envs.net/iacore/featherwiki-deno-server

      to

    1. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate a fully unsupervised, high-throughput approach (meaning very little human interaction is required) to quantifying marmoset behavior in unconstrained environments.

      Strengths:

      The authors provide an approach that is scalable, easy to implement at face value, and highly robust. Currently, most behavioral quantification approaches do not work well on marmosets, or the published examples that do look promising do not scale towards high throughput as demonstrated by the authors.

      While marmosets can certainly be a useful translational research model even without free-behavior quantification, the authors make a compelling case that this approach can be useful for studying treatments in emerging marmoset disease models.

      Overall this is a very exhaustive manuscript that overcomes significant shortcomings in previous work and speaks highly to the use of marmosets for unconstrained behavioral and neural assessment.

      Weaknesses:

      Recording marmoset behavior at a 60 Hz frame rate is a significant limitation of the approach, which can hopefully be alleviated in the future through better cameras and reconstruction pipelines. Marmosets (in the reviewer's experience) have a lot of motion energy above the 30 Hz Nyquist limit imposed by this system and are agile to a degree that requires higher frame rates.
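
      The arithmetic behind this point can be sketched as follows. The example assumes ideal uniform sampling of a pure sinusoid, and the function names are illustrative. Motion components above half the frame rate are not lost cleanly but fold back to lower apparent frequencies.

```python
def nyquist_limit(frame_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return frame_rate_hz / 2.0


def aliased_frequency(signal_hz, frame_rate_hz):
    """Apparent frequency of a sinusoid after sampling at frame_rate_hz."""
    folded = signal_hz % frame_rate_hz
    return min(folded, frame_rate_hz - folded)


frame_rate = 60.0
print(nyquist_limit(frame_rate))            # 30.0
print(aliased_frequency(25.0, frame_rate))  # 25.0, below the limit, unchanged
print(aliased_frequency(40.0, frame_rate))  # 20.0, a 40 Hz movement appears as 20 Hz
```

      Under these assumptions, a 40 Hz movement component recorded at 60 Hz would be indistinguishable from a 20 Hz one, which is why a higher frame rate, rather than post hoc filtering, is needed to capture it.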

      The manuscript neglects recent approaches to non-human primate behavioral quantification from other groups that should be included. Simians are simians after all.

      As a minor weakness, this reviewer would have liked to see code shared for the reviewers to evaluate, especially pertaining to the high throughput and robustness of the approach.