10,000 Matching Annotations
  1. Apr 2026
    1. The agent would not have looked for this without studying other backends during the research phase. From the CPU code alone, the two-step approach looks fine.

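      Surprisingly: the AI agent found an optimization opportunity missing from the CPU backend by studying the implementations of other backends. This suggests that AI agents can transfer knowledge across codebases and find optimizations that human developers might overlook, a distinctive strength of AI in code understanding.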

    1. Coding is the dominant use case for AI by nearly an order of magnitude. It's abundantly clear in the [reported explosive growth] of companies like Cursor, as well as the [hyper growth] of tools like Claude Code and Codex.

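      Surprisingly: coding has become the dominant enterprise use case for AI, exceeding other use cases by nearly an order of magnitude. Engineers using AI tools can raise their productivity 10-20x. This striking gain explains why companies are adopting AI coding tools so quickly, and it upends conventional assumptions about software development workflows.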

    2. Code is upstream of all other applications because it's the core building block for any piece of software, so AI's accelerating impact on code should accelerate every other domain.

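      "Code is upstream of all other applications" is the most strategically insightful sentence in the whole report. AI's penetration of programming is not just one industry's story; it is an infrastructure upgrade for the AI adoption of every industry. When the cost of building software drops 10x, the cost of building AI tools in every software-dependent vertical drops with it. This explains why the boom in coding AI is not merely "a hot sector" but an amplifier for the entire AI value chain. The takeaway for Zhipu AI: stronger code capability is a prerequisite for every enterprise agent scenario.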

    1. Within eight days, the same campaign had cascaded from GitHub Actions to Docker Hub, npm, PyPI, and the VS Code extension marketplace. With just one token across five ecosystems, thousands of organizations were potentially impacted.

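      Surprisingly: with just one access token, the attackers crossed five major ecosystems (GitHub Actions, Docker Hub, npm, PyPI, and the VS Code extension marketplace) in only eight days, affecting thousands of organizations. This shows how striking the scale and speed of modern supply chain attacks have become.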

    2. We are building a world where machines write the code, machines choose the dependencies, and machines ship the updates. The AI agents are building the software. If we don't secure the supply chain they rely on, the AI agents are cooked.

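      Most people assume AI will make software development more efficient and more secure, but the author warns that if we do not protect the supply chain that AI agents depend on, the agents themselves become attack targets. This challenges the mainstream view that AI progress necessarily improves security, and it offers a counterintuitive warning.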

    1. In recent weeks, Apple has either pulled or blocked updates to apps such as Anything and Replit, pushing developers to change how their tools generate and execute code.

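      Surprisingly, Apple is actively blocking or pulling updates to apps that use AI coding tools, such as Anything and Replit. This suggests that Apple is cautious about how AI generates and executes code and is concerned that these tools may violate its App Review Guidelines and Developer Program License, reflecting the company's unease about the complexity of AI technology.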

    1. Step (1) Clean up most of the allocated kernel-space memory (e.g., the process's running time info). Step (2) Clean up the exiting process's user-space memory. Step (3) Notify the parent with SIGCHLD. [Diagram: the child calls exit(); the kernel performs steps (1), (2) and (3); exit() returns.]

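      After a child process calls exit(), the kernel does not immediately delete its process table entry (part of the PCB). Instead, it marks the process as a zombie and retains information such as the exit status (exit code). At the same time, the kernel sends SIGCHLD to the parent to notify it that the child has terminated. The parent then obtains the child's PID and exit status via wait() or waitpid(), and the process's remaining resources are finally reclaimed.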

    2. How do the two processes communicate?

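      The child returns to the kernel, and the kernel wakes up the parent. The return value of the parent's wait() is the child's PID. The child passes its exit status to the kernel via exit(code); the kernel encodes that status into status; after calling wait(&status), the parent can retrieve the child's exit code with WEXITSTATUS(status).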
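
The exit/wait handshake described above can be sketched in Python, whose `os` module wraps the same POSIX calls. This is a minimal illustration, assuming a Unix-like system where `os.fork` is available; the helper name `run_child_and_wait` is hypothetical and is not taken from the annotated slides.

```python
import os


def run_child_and_wait(code):
    """Fork a child that exits with `code`; return (pid_matches, exit_code)."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate at once, handing `code` to the kernel.
        # Until the parent reaps it, the child remains a zombie.
        os._exit(code)
    # Parent: block until the child has terminated, then reap it.
    reaped_pid, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    # The value returned by waitpid() is the child's PID, and
    # WEXITSTATUS decodes the exit code from the packed status word.
    return reaped_pid == pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_child_and_wait(7))
```

The only information that flows back from child to parent here is the PID and the small integer exit code, which is the communication channel the annotation describes.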

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1

      Point

      Summary

      Response

      1.1

      Overall, the study lacks well-controlled experiments comparing hypoxia induced by DMOG with hypoxia induced by 1% O₂ for assessing ERα occupancy throughout.

      To assess whether DMOG-induced changes in ERα occupancy reflect bona fide hypoxia, we measured ERα binding by ChIP-qPCR under 1% oxygen over 48 hours, compared to normoxic (21% oxygen) cells and input controls in matched cells at the GREB1 and TFF1 loci. Our findings demonstrate that 1% oxygen treatment recapitulates the ERα binding changes observed with DMOG, at the time points of our RNA-seq experiments.

      We have included these results in Figure 1F of the preliminary revision of the manuscript.

      1.2

      Lack of evidence for the impact of other co-transcription factors, such as HIFs, under hypoxia in Fig. 1.

      We thank the reviewer for this comment. We have clarified that motif enrichment analysis is included to characterise the sequence context of ERα binding sites and to confirm enrichment of known ER-associated motifs (e.g. EREs), rather than to infer functional involvement of additional transcription factors under hypoxia. Corresponding interpretative statements have been removed from the Results and restricted to the Discussion.

      1.3

      Lack of evidence that DMOG induces HIF protein expression in MCF7 cells.

      To confirm that DMOG induces HIF protein expression, we analysed HIF1α and HIF2α protein levels by western blot. We have included these results in Supplementary Figure S1A of the preliminary revision to address this concern.

      1.4

      Figure 1: ATAC-seq was performed under 1% O₂, whereas ChIP-seq was conducted with DMOG treatment, making these conditions not directly comparable.

      We acknowledge that the ERα ChIP-seq (DMOG) and ATAC-seq datasets were generated under different conditions and are therefore not directly comparable. To address this, we have performed ChIP-qPCR under bona fide hypoxia (1% oxygen) at canonical ERα target loci (TFF1 and GREB1), demonstrating that the directionality of ERα binding changes observed with DMOG is recapitulated under physiological hypoxia. These data provide a direct comparison of ERα occupancy across conditions and support the use of DMOG as a proxy for hypoxia in our ChIP-seq experiments.

      If requested, we are willing to perform ATAC-seq at 16 h under 1% oxygen. However, because the original dataset was generated under 0.1% oxygen, and canonical ERα-bound sites show minimal accessibility changes under severe hypoxia, we anticipate limited additional insight from repeating this experiment.

      1.5a

      Figure S1: ERα ChIP lacks estradiol (E2) treatment in MCF7 cells with or without DMOG.

      The statement that the ERα ChIP samples lack estrogen treatment is incorrect. Estradiol was not an experimental variable and cells were intentionally maintained under estrogen-rich conditions to preserve tumour-relevant ERα activity.

      We have now clarified this in the preliminary revision by stating in the Methods section that cells were routinely cultured in “estrogen-rich Dulbecco’s Modified Eagle Medium”, and by noting the use of estrogen-rich conditions in the Figure S1 legend.

      1.5b

      The single-gene examples of DMOG effects shown in Fig. S1A are not significant.

      The peak illustrated in Figure S1A (now Figure S1D) is intended to provide a visual confirmation of peak calling and enrichment patterns underlying the genome-wide redistribution observed in Figure 1. The peak was called by the MACS2 pipeline (code available from https://doi.org/10.5281/zenodo.17221105) with a -log10(q-value) = 268.5, which passes the MACS2 cut-off q

      1.6a

      Fig. S2 lacks 1% O₂ conditions,

      We wish to clarify that Figure S2 (now Figure S4) serves as quality control specifically for the DMOG-treated ChIP-seq dataset presented in Figure 1C. The purpose of the plot is to visualize unfiltered motif enrichment to confirm that the identified peaks represent bona fide ERα binding events within the DMOG condition. Motif enrichment under a 1% oxygen environment would not provide this validation. In all cases the ERE is the most significantly enriched motif.

      With respect to ERα binding under 1% oxygen, we have now assessed this via targeted ChIP-qPCR validation (Figure 1F).

      1.6b

      Fig. S3 lacks DMOG-induced HIF factor assessments.

      The DMOG-induced changes in HIF1α and HIF2α expression are shown in Figure S1 of this revision proposal and have been incorporated into the manuscript as part of the changes described in response 1.3.

      1.7a

      Figure S4: Estradiol (E2) treatment is missing from the controls, and the figure labeling is of poor quality.

      We have substantially improved the labelling of Figure S4, now Figure S6.

      Additionally, we have clarified that all samples were cultured in estrogen-rich media and treated with either vehicle control or 100 nM fulvestrant; thus estrogen is present in all conditions including the controls.

      1.7b

      Hypoxic conditions for assessing ER status and appropriate controls are also lacking.

      We agree that monitoring ERα stability under hypoxic conditions is essential.

      We provided a western blot assessment of ERα protein levels at 0, 8 and 48 hours of treatment with 1% oxygen or DMOG, compared to normoxic controls, included as Supplementary Figures S1B, C in the preliminary revision.

      These blots demonstrate that the cells remain positive for ERα protein expression at 0, 8 and 48 h.

      1.8

      Figure S5: The description of fulvestrant treatments under hypoxic conditions is unclear.

      We thank the reviewer for this comment. To clarify the experimental design, we now signpost the reader in the figure legend of Figure S5 (now S7) to the schematic diagram provided in Figure 3B, and provide a summary stating that the experiment employed a factorial design combining a 96-hour fulvestrant treatment with exposure to 1% oxygen for the final 48 hours.

      1.9

      Supplemental legends: These require major revision; they are of poor quality and lack statistical details and references to biological replicates.

      We have extensively revised all supplementary figure legends to ensure clarity and precision.

      1.10

      Overall comparisons throughout the manuscript are weak; the figures appear sloppy and lack sufficient effort in presentation.

      Following this comment, we carefully reviewed the presentation of all figures throughout the manuscript. We improved the organisation and labelling of the Supplementary Figures to facilitate clearer comparison of the data. In particular, full western blots are now clearly annotated and supplementary legends have been expanded to provide sufficient context for each figure to be interpreted independently.

      1.11

      i) In general, the manuscript in its present form does not contribute greatly beyond published work, as the ERα cistrome is well documented and has been studied for its role in regulating gene expression, particularly in ERα-positive breast cancer.

      ii) Additionally, the lack of a thorough comparison between DMOG- and/or 1% oxygen-induced hypoxia in the MCF7 ER+ model diminished initial interest in the manuscript.

      iii) The lack of consideration of estradiol exposure under hypoxic conditions with 1% oxygen and/or DMOG also limits relevance to patients with ER+ BrCa.

      iv) The ERα epigenomic profile has been extensively studied including work under hypoxic conditions.

      i) We respectfully disagree that the manuscript does not extend prior work. Despite extensive characterisation of ERα, its role in shaping hypoxia-driven transcription in ER+ breast cancer has not been defined. Here, we identify an ERα-dependent hypoxic response (EDHR), demonstrating a reciprocal interaction between hypoxia and ERα activity.

      ii) In revision, we address concerns regarding DMOG by validating ERα binding under 1% oxygen using ChIP-qPCR thereby confirming our result in bona fide hypoxia. Additionally, all RNA-seq and functional assays, including ENaC targeting, were performed under 1% oxygen in the original manuscript.

      iii) All experiments were conducted under estrogen-complete conditions, now explicitly clarified, reflecting tumour-relevant ERα activity.

      iv) Together, these data establish a reciprocal interaction between ERα and hypoxia and uncover a targetable vulnerability in hypoxic ER+ breast cancer, linking transcriptional regulation to therapeutic opportunity.

      Reviewer 2

      No.

      Summary

      Response

      General Comments

      2.1

      ENaC is proposed as a therapeutic vulnerability based on amiloride sensitivity assays. Additional experiments are required, such as western blot validation of ENaC regulation under hypoxia and loss-of-function approaches to assess its contribution to the phenotype.

      We agree that further validation of ENaC involvement would strengthen this observation. We will assess ENaC protein levels under hypoxia (1% oxygen) ± fulvestrant by western blot and perform siRNA-mediated depletion of ENaC subunits to test their contribution to the hypoxia-specific amiloride-sensitive phenotype by viability assay (see also response 3.3).

      2.2

      Fulvestrant is used to dissect ERα dependency. However, as a SERD, it may alter chromatin and transcription independently of a simple loss of ERα. An additional control would strengthen interpretation.

      The experimental design already controls for potential fulvestrant-specific transcriptional effects, as all four conditions (± hypoxia, ± fulvestrant) were included. EDHR genes were defined based on induction under hypoxia, loss of this induction following ERα degradation, and absence of residual hypoxic induction in the presence of fulvestrant. Consistent with this, SCNN1B and SCNN1G do not show significant fulvestrant-responsive changes under normoxia (Figure 5C,D).

      We also note that fulvestrant has been shown to induce minimal global chromatin remodelling (Guan et al., 2019), supporting its use to assess ERα dependency without broadly confounding chromatin accessibility; this reference is now included in the manuscript.
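
The three EDHR criteria stated in this response (induction under hypoxia, loss of that induction after ERα degradation, and no residual hypoxic induction with fulvestrant) can be read as a filter over the four-condition factorial design. The sketch below is only an illustration of that logic: it assumes per-gene log2 fold changes for hypoxia versus normoxia, computed separately in vehicle-treated and fulvestrant-treated cells. The thresholds `lfc_min` and `residual_max`, the gene names, and all values are hypothetical and are not the authors' pipeline, which would also involve significance testing.

```python
from dataclasses import dataclass


@dataclass
class GeneContrast:
    name: str
    lfc_hypoxia_vehicle: float      # log2FC, hypoxia vs normoxia, vehicle
    lfc_hypoxia_fulvestrant: float  # log2FC, hypoxia vs normoxia, fulvestrant


def is_edhr_like(gene, lfc_min=1.0, residual_max=0.5):
    """Apply the three criteria with hypothetical cut-offs."""
    # 1. Induced by hypoxia when ERα is intact.
    induced = gene.lfc_hypoxia_vehicle >= lfc_min
    # 2. Induction is reduced once ERα has been degraded.
    attenuated = gene.lfc_hypoxia_fulvestrant < gene.lfc_hypoxia_vehicle
    # 3. No residual hypoxic induction remains with fulvestrant.
    no_residual = gene.lfc_hypoxia_fulvestrant <= residual_max
    return induced and attenuated and no_residual


# Made-up example genes, for illustration only.
genes = [
    GeneContrast("GENE_A", 2.0, 0.1),  # induced, induction lost
    GeneContrast("GENE_B", 2.0, 1.8),  # induced, still induced without ERα
    GeneContrast("GENE_C", 0.2, 0.1),  # not hypoxia-induced
]
edhr_like = [g.name for g in genes if is_edhr_like(g)]
print(edhr_like)
```

A gene that fulvestrant alters under both normoxia and hypoxia, without changing its hypoxic fold change, fails the second and third criteria, which is how the 2 × 2 design separates ERα-dependent hypoxic induction from a general drug effect.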

      2.3

      The molecular mechanism by which ERα modulates the hypoxic transcriptome, specifically how the ERα and HIF pathways converge at ENaC loci, should be studied further.

      We further examined the potential convergence of ERα and hypoxic signalling at the ENaC loci (included as Figure 5E in the revision proposal), showing genome browser views of the SCNN1G and SCNN1B loci and highlighting hypoxia-induced HIF1α binding and ERα association at these sites.

      To further support this, we will perform RT-qPCR validation of SCNN1G and SCNN1B expression following treatment ± IOX5 and ± fulvestrant. IOX5 is a selective PHD inhibitor that stabilises HIF proteins, enabling us to assess the contribution of HIF signalling independently of other oxygen-dependent effects associated with hypoxia.

      2.4

      In addition, to assess the relevance of this work for luminal breast cancer and ERα expression, specific validation in TNBC should be performed.

      To assess the clinical relevance of SCNN1B and SCNN1G in ER-positive and ER-negative subgroups, we performed Cox proportional hazards analyses in TCGA and METABRIC cohorts individually, including ER status and stratifying by ER-positive and ER-negative cases (Figure 6C). These analyses support the association of SCNN1G with poorer relapse-free survival specifically in ER-positive patients.

      2.5

      The authors should provide RT-qPCR validation of the key EDHR genes, especially since this signature is later used for downstream analyses.

      We agree that independent validation would strengthen these findings. We will perform RT-qPCR validation of key EDHR genes (including SCNN1B and SCNN1G) under ± hypoxia and ± fulvestrant conditions to confirm ERα-dependent hypoxic induction.

      Limitations

      2.6

      Reprogramming of the ERα cistrome under cellular stress is well documented. The study extends these ideas but does not clearly demonstrate a new mechanistic paradigm, particularly because the EDHR is defined primarily through omics approaches without strong mechanistic validation. In addition, we have to keep in mind that the study uses DMOG to model hypoxia-driven chromatin changes, but DMOG inhibits many 2-oxoglutarate-dependent dioxygenases non-selectively.

      This makes it difficult to attribute ERα cistrome reprogramming specifically to hypoxia, rather than to broad off-target effects. The transcriptomic dataset is more convincing but needs the validation suggested previously.

      While ERα cistrome reprogramming has been described, our study demonstrates a reciprocal interaction in which ERα not only responds to hypoxia but actively shapes hypoxia-driven transcription, defining an ERα-dependent hypoxic response (EDHR).

      We acknowledge the limitations of DMOG and have addressed this by validating key ERα binding events under bona fide hypoxia (1% oxygen) using ChIP–qPCR, confirming our findings under physiological conditions (response 1.1).

      To further strengthen mechanistic insight, we will assess the requirement for HIF stabilisation using the selective PHD inhibitor IOX5, combined with RT-qPCR analysis of SCNN1G and SCNN1B ± IOX5 ± fulvestrant (responses 2.3 and 2.5). In addition, we will validate the functional relevance of ENaC through protein-level analysis and siRNA-mediated depletion, as described in response 2.1.

      Together, these additions address concerns regarding DMOG specificity and provide further support for a functional interaction between ERα and hypoxic signalling.

      Audience

      2.7

      Given its reliance on omics datasets and preliminary functional assays, the paper will likely appeal to a specialized audience in transcriptional regulation, hypoxia signalling, and ER+ breast cancer biology. However, the limited mechanistic depth and uncertain translational relevance due to the lack of in vivo validation, may reduce its impact for broader oncology or therapeutic-development audiences. Without stronger validation, the findings may be perceived as niche and mainly of interest to researchers focused on ERα chromatin dynamics rather than to the wider cancer research community.

      The study incorporates multiple layers of human relevance, including spatial transcriptomic analyses demonstrating enrichment of EDHR within hypoxic tumour regions and survival analyses linking EDHR and ENaC expression to clinical outcome.

      In revision, we address the reviewer’s concerns through targeted validation (ChIP-qPCR in hypoxia, western blotting, and RT–qPCR). Together, these additions strengthen the mechanistic and translational relevance of the study.

      Reviewer 3

      No.

      Summary

      Response

      Major comments

      3.1

      The DMOG ChIP-seq provides a valuable first look at ERα redistribution. Since DMOG inhibits both HIF hydroxylases and oxygen-dependent demethylases, the driver of the observed changes remains ambiguous. It would help to include either ERα ChIP-seq under bona fide hypoxia or a selective PHD inhibitor condition (for example IOX5, as you discuss) to separate HIF stabilisation from broad demethylase inhibition. If ChIP-seq is not feasible, a brief ATAC validation at a small panel of gained and lost loci would still increase confidence.

      We acknowledge that mimetics of hypoxia can introduce off-target effects. To address this, we have validated our ERα ChIP-seq findings using ChIP-qPCR at representative loci (TFF1 and GREB1), demonstrating consistent changes in ERα binding under bona fide hypoxia (1% oxygen) (now included in Figure 1F).

      As acknowledged by the reviewer, ChIP-seq under these conditions is likely not feasible due to cell number constraints. We are willing to undertake ATAC-seq if required (as stated in response 1.4); however, we do not feel it would directly address ERα occupancy at these loci. We therefore consider our targeted ChIP-qPCR to be the most appropriate approach to validate ERα redistribution under hypoxia.

      3.2a

      The factorial RNA-seq is well designed and the attenuation analyses are clear. The EDHR selection is stringent and reproducible across two ER+ lines.

      To support the claim of ERα dependence mechanistically, a small number of targeted perturbations would go far. For example,

      i) confirm EDHR induction for SCNN1B and SCNN1G in hypoxia with and without fulvestrant by RT-qPCR

      We agree that targeted validation would strengthen the mechanistic support for ERα dependence. We will perform RT-qPCR validation of SCNN1B and SCNN1G under hypoxia ± fulvestrant to confirm ERα-dependent hypoxic induction (see also response 2.5).

      3.2b

      ii) test whether short-term ERα knockdown reproduces the effect.

      ERα dependency is already assessed through fulvestrant-mediated degradation within the factorial design, which provides a well-established and direct approach to evaluate ERα function. As EDHR genes are defined by loss of hypoxic induction following ERα degradation, this constitutes a robust assessment of ERα-dependent effects.

      We will therefore focus on orthogonal validation through RT-qPCR (response 2.5), together with additional mechanistic and functional analyses using IOX5 and ENaC perturbation (responses 2.1 and 2.3), rather than introducing an ERα knockdown approach, although we would consider this if required.

      3.2c

      iii) A complementary test with a HIF-1α or HIF-2α knockdown at one time point would help position EDHR relative to HIF.

      This request aligns with point 2.3, which addresses the convergence of ERα and HIF signalling. While HIF knockdown under hypoxia would assess necessity, we will instead assess the contribution of HIF signalling using the selective PHD inhibitor IOX5, as this allows us to isolate HIF stabilisation from broader hypoxia-associated effects and avoids additional perturbation associated with transfection-based approaches. We will perform RT-qPCR analysis of SCNN1B and SCNN1G following treatment ± IOX5 ± fulvestrant to determine whether HIF stabilisation is sufficient to support ERα-dependent induction of EDHR genes.

      3.3

      The amiloride result is intriguing and consistent with a hypoxia-specific dependency. Because amiloride is pleiotropic, it would strengthen the conclusion to add one genetic and one pharmacological specificity control. A brief SCNN1B or SCNN1G knockdown in hypoxia should phenocopy the viability effect if ENaC contributes. In parallel, testing benzamil at sub-micromolar doses would provide a more ENaC-selective pharmacological readout. These can be performed in MCF7 and, resources permitting, in T47D.

      To address the reviewer’s concern regarding pleiotropic effects, we propose (aligning with our response to 2.1) to apply siRNA-mediated knockdown of SCNN1B and SCNN1G under hypoxia to determine whether this reproduces our observed viability effect, thereby providing direct evidence for ENaC involvement.

      We agree that additional pharmacological validation could further support specificity, and would consider inclusion of a more ENaC-selective inhibitor if required.

      3.4

      The RFS associations for SCNN1B and SCNN1G are compelling. It would be helpful to report whether the associations persist in a multivariable model that at least includes ER status, grade and nodal status where available, or to state clearly when this is not possible across merged datasets. Even a sensitivity analysis in TCGA with ER+ cases only would contextualise the hazard ratios.

      We have analysed TCGA and METABRIC cohorts individually using Cox proportional hazards models, as this functionality is not available for merged datasets in KMplot. ER status was included in the models, and analyses were additionally stratified by ER-positive and ER-negative subgroups. The number of relapse events per subgroup is approximately 40; therefore, additional covariates such as grade and nodal status were not included given the limited number of events per model.

      Within ER-positive patients, high SCNN1G expression is associated with poorer relapse-free survival (TCGA HR 1.45, p = 0.0027), while SCNN1B shows a similar trend that does not reach statistical significance. These analyses are presented in Figure 6C and in the results section of the preliminary revision, and support the findings from the Kaplan–Meier analysis.

      3.5

      The spatial association of EDHR with EMT hotspots is a nice piece of the story. A short clarification of how spot-level cell type composition was handled will help readers interpret proximity results. If cell type deconvolution scores are available in the source dataset, adding a sentence on whether EDHR enrichment tracks tumour epithelial content would be useful.

      Spatial cell type composition and spot annotations were used as provided in the SpottedPy dataset, based on Cell2location-derived deconvolution scores and STARCH tumour annotations, without additional re-estimation.

      To address the reviewer’s suggestion, we examined the relationship between EDHR enrichment and epithelial content and observed no significant correlation at the neighbourhood level.

      These points have now been clarified in the manuscript.

      3.6

      Data processing for ChIP-seq and RNA-seq is documented and accessions are provided. The RNA-seq includes n=3 per condition, which is appropriate, and the correlation and LFC analyses are clearly presented. For the amiloride assay, the two-way ANOVA with interaction is appropriate; please add the exact n and whether experiments were independently repeated, and include the underlying values in a source table for transparency. These are small presentational edits rather than new experiments.

      In the preliminary revision we have added a statement to the amiloride assay figure (Figure 6D) clarifying that n = 3 independent biological replicates were performed per condition. In addition, we now provide the underlying numerical values for this assay in Table S11.

      3.7

      A small, hypothesis-driven mechanistic link from EDHR to ENaC function would substantially elevate impact without becoming a long project. For example, testing whether hypoxia increases amiloride-sensitive Na⁺ current in MCF7 and whether fulvestrant abrogates that increase would directly connect the transcriptional and functional observations. If available, patch-clamp or a simple SBFI-based Na⁺ imaging readout could suffice.

      We agree that directly linking EDHR to ENaC channel activity would further strengthen the mechanistic connection. We will prioritise genetic validation of ENaC function through siRNA-mediated depletion (response 2.1), which directly tests the requirement for ENaC in the hypoxia-specific viability phenotype.

      We are willing to explore the feasibility of measuring the amiloride-sensitive Na+ currents under normoxia and acute hypoxia (via perfusion of cells with bathing solution bubbled with nitrogen during recording) ± fulvestrant to further connect hypoxic regulation to channel activity.

      Minor Comments

      3.8

      Please show representative ERα ChIP-seq browser snapshots for at least one gained, one conserved and one lost locus alongside input for both conditions.

      We have now included representative ERα ChIP-seq browser snapshots for gained, conserved, and lost loci, together with input controls for both conditions, in Figure S3 of the revised manuscript.

      3.9

      In Figure 1D, the ATAC-seq comparison uses 0.1% O₂ for 48 h while the RNA-seq uses 1% O₂. Briefly justify the choice and discuss any expected differences.

      We thank the reviewer for this point. The ATAC-seq dataset was generated under 0.1% oxygen in the original study, whereas RNA-seq experiments in this work were performed at 1% oxygen to reflect tumour-relevant hypoxic conditions. The more severe hypoxia used for ATAC-seq would be expected to maximise detection of chromatin accessibility changes. Despite this, chromatin accessibility changes were limited, with ERα binding occurring predominantly at pre-accessible regions. This has now been clarified in the manuscript.

      3.10

      In the Methods for spatial analyses, specify the thresholds for hotspot calling and how the neighbourhood radius was chosen.

      The neighbourhood parameter was set to 8, corresponding to the immediate neighbouring spots in Visium data, consistent with package guidance. We have clarified this in the manuscript text.

      3.11

      For the EDHR heatmap, consider marking the 14 consensus genes and indicating which belong to the ENaC module to aid readability.

      We have marked the 14 EDHR consensus genes and indicated the ENaC module in the revised heatmap to aid readability.

      3.12

      Please report exact sample sizes and replicate numbers in all figure legends and provide a single table with all statistical tests, n, and p values.

      We have reported exact sample sizes and replicate numbers in all relevant figure legends and included Table S11 summarising all statistical tests, sample sizes (n), and p values.

      3.13

      A schematic summarising the experimental timelines for ChIP-seq, RNA-seq and viability would help orient readers.

      We have added timelines for these experiments as requested.

      3.14

      Minor copyedits: consistent formatting of O₂, gene symbols and reagent catalogue numbers.

      We have standardised oxygen notation throughout the manuscript to use “oxygen” in the main text and “O2” where appropriate (e.g. figures).

      Reagent catalogue numbers have now been standardised for consistency of presentation in the revised manuscript.

      Gene and protein nomenclature were already formatted according to accepted conventions and were verified for consistency.

      3.15

      The manuscript is well referenced. Where you contrast your findings with long-term CoCl₂ hypoxia, a sentence on why acute DMOG and short-term 1% O₂ may reveal different ERα behaviours would help position the novelty.

      We thank the reviewer for this suggestion. We have expanded the manuscript to clarify that acute hypoxia (1% oxygen) and DMOG treatment capture early, dynamic hypoxic responses, in contrast to chronic CoCl2 exposure, which reflects longer-term adaptation. This distinction is relevant to tumour biology, where hypoxia is often transient due to unstable vascularisation. The following statement has been added to the manuscript:

      “In addition to such chronic hypoxic adaptation, tumour hypoxia can also be dynamic, with cells experiencing acute or transient hypoxic exposure due to unstable vascularisation; an established contributor to tumour progression (Liu et al, 2022a; Koh & Powis, 2012). Thus, in contexts where both signalling pathways remain active, the dependence of the hypoxic response on ERα in ER+ cells has not been previously characterised.”

      Primary Limitations

      3.16

      DMOG vs hypoxia in the cistrome experiment,

      To address concerns regarding the use of DMOG, we have validated key ERα binding events from the ChIP-seq dataset by ChIP–qPCR at the TFF1 and GREB1 loci under bona fide hypoxia (1% oxygen) in biological triplicate (Figure 1F). These data demonstrate consistent changes in ERα binding under hypoxia, supporting that the DMOG-induced redistribution reflects hypoxia-driven changes.

      3.17

      the absence of direct HIF or cofactor perturbations

      We acknowledge the absence of direct HIF perturbation. To address this, we will assess the contribution of HIF signalling through stabilisation approaches, including RT-qPCR analysis of SCNN1B and SCNN1G ± IOX5 ± fulvestrant (response 3.2), to determine whether HIF activation is sufficient to support ERα-dependent induction.

      3.18

      and the pleiotropy of amiloride.

      To address the potential pleiotropy of amiloride, we will perform siRNA-mediated knockdown of SCNN1G and SCNN1B to provide independent validation of ENaC-dependent effects (response 3.3).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      This study explores how hypoxia reshapes ERα signalling in ER-positive breast cancer and whether this cross-talk exposes targetable vulnerabilities. The authors first map ERα binding in MCF7 cells after dioxygenase inhibition with DMOG and observe a genome-wide redistribution with enrichment of ERE, FOXA1 and AP-1 motifs at gained sites while chromatin accessibility at these loci appears unchanged in public ATAC-seq after hypoxia. They then perform RNA-seq in MCF7 and T47D using a factorial design that combines fulvestrant-mediated ERα degradation with 1% O₂ to define an ERα-dependent hypoxia response (EDHR). A 14-gene consensus EDHR signature includes ENaC regulatory subunits SCNN1B and SCNN1G, whose higher expression is associated with poorer RFS in ER+ cohorts. Functionally, amiloride increases viability in normoxia but reduces viability under hypoxia in MCF7 across a dose range. Spatial transcriptomics from ER+ tumours shows EDHR expression enriched at the margins of hypoxia and estrogen-hallmark regions and adjacent to EMT hotspots. Raw data and code availability are stated for the central datasets and accessions are provided. Together the results argue that ERα helps organise a distinct hypoxic programme and suggest a context-specific sensitivity to ENaC inhibition.

      Major comments

      The paper addresses a timely question with a clear narrative arc and brings together ChIP-seq, RNA-seq, pharmacology, survival analysis and spatial transcriptomics. The EDHR concept is interesting and the ENaC angle is original. The work is already strong and with a few targeted additions and clarifications it can be made more persuasive without becoming a new project.

      1) The DMOG ChIP-seq provides a valuable first look at ERα redistribution. Since DMOG inhibits both HIF hydroxylases and oxygen-dependent demethylases, the driver of the observed changes remains ambiguous. It would help to include either ERα ChIP-seq under bona fide hypoxia or a selective PHD inhibitor condition (for example IOX5, as you discuss) to separate HIF stabilisation from broad demethylase inhibition. If ChIP-seq is not feasible, a brief ATAC validation at a small panel of gained and lost loci would still increase confidence. Estimated time: 6-8 weeks for a focused follow up with two conditions and biological duplicates/triplicates.

      2) The factorial RNA-seq is well designed and the attenuation analyses are clear. The EDHR selection is stringent and reproducible across two ER+ lines. To support the claim of ERα dependence mechanistically, a small number of targeted perturbations would go far. For example, confirm EDHR induction for SCNN1B and SCNN1G in hypoxia with and without fulvestrant by RT-qPCR and test whether short-term ERα knockdown reproduces the effect. A complementary test with a HIF-1α or HIF-2α knockdown at one time point would help position EDHR relative to HIF. Estimated time: 3-4 weeks for qPCR and siRNA validations.

      3) The amiloride result is intriguing and consistent with a hypoxia-specific dependency. Because amiloride is pleiotropic, it would strengthen the conclusion to add one genetic and one pharmacological specificity control. A brief SCNN1B or SCNN1G knockdown in hypoxia should phenocopy the viability effect if ENaC contributes. In parallel, testing benzamil at sub-micromolar doses would provide a more ENaC-selective pharmacological readout. These can be performed in MCF7 and, resources permitting, in T47D. Estimated time: 4-6 weeks.

      4) The RFS associations for SCNN1B and SCNN1G are compelling. It would be helpful to report whether the associations persist in a multivariable model that at least includes ER status, grade and nodal status where available, or to state clearly when this is not possible across merged datasets. Even a sensitivity analysis in TCGA with ER+ cases only would contextualise the hazard ratios. Estimated time: 1-2 weeks.

      5) The spatial association of EDHR with EMT hotspots is a nice piece of the story. A short clarification of how spot-level cell type composition was handled will help readers interpret proximity results. If cell type deconvolution scores are available in the source dataset, adding a sentence on whether EDHR enrichment tracks tumour epithelial content would be useful. Estimated time: 1 week.

      Reproducibility and statistics

      Data processing for ChIP-seq and RNA-seq is documented and accessions are provided. The RNA-seq includes n=3 per condition, which is appropriate, and the correlation and LFC analyses are clearly presented. For the amiloride assay, the two-way ANOVA with interaction is appropriate; please add the exact n and whether experiments were independently repeated, and include the underlying values in a source table for transparency. These are small presentational edits rather than new experiments.

      Optional

      A small, hypothesis-driven mechanistic link from EDHR to ENaC function would substantially elevate impact without becoming a long project. For example, testing whether hypoxia increases amiloride-sensitive Na⁺ current in MCF7 and whether fulvestrant abrogates that increase would directly connect the transcriptional and functional observations. If available, patch-clamp or a simple SBFI-based Na⁺ imaging readout could suffice. Estimated time: 6-8 weeks.

      Minor comments

      1. Please show representative ERα ChIP-seq browser snapshots for at least one gained, one conserved and one lost locus alongside input for both conditions.
      2. In Figure 1D, the ATAC-seq comparison uses 0.1% O₂ for 48 h while the RNA-seq uses 1% O₂. Briefly justify the choice and discuss any expected differences.
      3. In the Methods for spatial analyses, specify the thresholds for hotspot calling and how the neighbourhood radius was chosen.
      4. For the EDHR heatmap, consider marking the 14 consensus genes and indicating which belong to the ENaC module to aid readability.
      5. Please report exact sample sizes and replicate numbers in all figure legends and provide a single table with all statistical tests, n, and p values.
      6. A schematic summarising the experimental timelines for ChIP-seq, RNA-seq and viability would help orient readers.
      7. Minor copyedits: consistent formatting of O₂, gene symbols and reagent catalogue numbers.

      Prior studies

      The manuscript is well referenced. Where you contrast your findings with long-term CoCl₂ hypoxia, a sentence on why acute DMOG and short-term 1% O₂ may reveal different ERα behaviours would help position the novelty.

      Significance

      General assessment

      The strongest aspects are the carefully designed factorial RNA-seq that cleanly separates ERα and hypoxia effects, the discovery of a concise EDHR signature reproducible across two ER+ lines, and the integration with spatial transcriptomics that places EDHR near EMT-rich tumour regions. The ENaC connection is new and potentially actionable, and the context-dependent amiloride response is a practical lead. Limitations are primarily mechanistic: DMOG vs hypoxia in the cistrome experiment, the absence of direct HIF or cofactor perturbations, and the pleiotropy of amiloride.

      Advance

      To my knowledge, this is the first description of a distinct ERα-dependent hypoxic programme in ER+ breast cancer that includes ENaC regulatory subunits and links to an EMT-adjacent spatial niche. The conceptual advance is the positioning of ERα as a coordinator of a subset of hypoxia-induced genes rather than as a parallel pathway, together with an initial functional readout that suggests a therapeutic angle through ENaC modulation. With the targeted additions outlined above, the study would move from strong association to a more mechanistic and translationally relevant model.

      Audience

      The work will interest a specialised audience in nuclear receptor biology, hypoxia signalling, tumour microenvironment, and ion transport in cancer. It has potential relevance for basic researchers studying ERα cistrome dynamics, for groups using spatial transcriptomics to define micro-niches, and for translational researchers exploring metabolic and ionic vulnerabilities in ER+ disease.

      Expertise disclosure

      Keywords: nuclear receptors, chromatin profiling, transcriptomics, spatial transcriptomics, breast cancer biology.

      I am not a domain expert in ion channel electrophysiology; my comments on ENaC pharmacology focus on specificity and study design rather than detailed channel biophysics.

      Tone

      I find the paper well conceived and already compelling. The suggested experiments are focused, realistic in scope, and primarily aim to turn several strong associations into concise mechanistic statements that would further increase confidence and impact.

    1. Reviewer #1 (Public review):

      Summary:

      This carefully executed study uncovers the functional relevance of curl signals that impinge on the retina every time an observer's gaze direction and movement direction are not aligned.

      Strengths:

      This finding is important, highlighting the functional role of an abundant incidental signal (curl in retinal motion) that has thus far been believed to be a nuisance that needs to be filtered out of the retinal motion stream.

      The study's evidence is compelling: a combination of psychophysical experiments and critical manipulations, control theory and neural modeling, which together make an internally consistent and biologically plausible case for the role of curl signals in estimating heading direction.

      This study uncovers the functional relevance of curl signals that occur on the retina when an observer is moving, and gaze is not straight ahead. The experimental and modeling results clearly go beyond previous studies and significantly advance our understanding of vision-based navigation.

      Another clear strength is that the study uses tightly controlled experimental manipulation to provide strong test cases for the hypothesis that curl is used for visual navigation. These conditions are important to constrain the proposed model (and future models) of heading control.

      The modeling is very clearly described, and the modeling and analysis code is published and freely available. The authors go beyond a back-of-the-envelope control model and show how it might be implemented at the neural-circuit level. The model is biologically plausible.

      Weaknesses:

      The discussion would benefit from an extension of the implications of the study and predictions of their model.

    2. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate Reviewer #1’s very positive feedback. Incorporating the perspective of ‘incidental’ sensory signals is a valuable suggestion that aligns perfectly with our findings. We agree that this perspective significantly strengthens the impact of our paper.

      In the revised version, we will update the manuscript to bridge these perspectives (the functional role of ‘incidental’ sensory signals and the role of retinal flow in navigation). In addition, we will elaborate on the potential predictions of the model and on possible manipulations that might affect the integration between sensory evidence (the curl signal) and the straight-ahead prior.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s feedback regarding the formalization of our reference frames. We agree that certain definitions were implicitly assumed rather than explicitly stated. We will revise the manuscript to provide all necessary self-contained information, ensuring that the geometry of the task response and the definition of heading are unambiguous. Also, we will address the gap between the task response (in world coordinates) and the functional role of the controller, as well as the other points raised by the reviewer.

      Major issues:

      (1a), (2a) Clarification of Reference Frames

      The reviewer asks: “To ‘directly estimate heading’ relative to what?”

      In our study, participants were instructed to report their “perceived direction of self-motion” by aligning a rotational encoder (steering wheel) with the direction they felt they were moving within the 3D simulated scene. Consequently, participants reported their instantaneous heading in a world-centered reference frame, from which the 3D trajectories were reconstructed. Since the reviewer had to infer this information, it should be clarified to ensure it is immediately evident.

      Participants were informed that the initial heading (i.e. θ<sub>0</sub> in our controller nomenclature) was oriented “straight ahead” relative to their body, which was aligned longitudinally with the experimental room. We will modify Figure 1B and revise the Methods section to explicitly clarify this initial alignment and the instructions provided to participants.

      In the revised manuscript, we will clarify that while the participant’s report is world-centered, the retinal curl provides a gaze-relative heading signal. Although this was already mentioned, we will emphasize this point. In natural navigation toward a fixated target, a world-centered vector is often unnecessary; an error signal indicating heading relative to fixation is sufficient (as the reviewer also notes). However, the initial alignment of the heading within the 3D scene allows the brain to “calibrate” this internal controller, mapping the retinal curl signal onto the 3D world coordinates required for the task.

      The reviewer also asks how we can be certain that participants were reporting in world coordinates rather than an alternative frame, such as “heading relative to the fixation target.” We believe our “Cancelled Curl” (and over-cancelled) conditions provide the most compelling evidence to rule out this alternative. In these conditions, the physical position of the fixation target in the scene remained identical to the unaltered flow condition. If participants were simply reporting heading relative to the fixation target’s spatial location, the observed biases should have persisted regardless of the flow manipulation. Instead, the bias vanished when the curl was removed. This causal evidence proves that the bias is driven by the retinal motion signal (curl) rather than the spatial orientation of the eyes or the target’s position in the scene. Furthermore, the temporal evolution of the response supports a world-centered integration. For simulated straight paths, the perceived heading remains straight for the first few seconds (consistent with the initial world-centered alignment), with biases only emerging after approximately 3 seconds of integration (a point we elaborate on in our response to Reviewer #3). Had participants been responding based on a simple gaze-relative reference frame from the onset, these biases would have manifested significantly earlier. We will incorporate these points into the revised Discussion to better frame our findings alongside other cues, such as the Focus of Expansion (FOE), that contribute to heading estimation.

      (1b) The reviewer notes that we must be clear about the relationship between curl and heading (relative to fixation) and the variables that affect curl.

      Beyond the discrepancy between heading (θ) and gaze (ψ), curl is geometrically determined by translational self-motion speed (υ), eye height (h), and pitch (α). More specifically, curl = (υ sin ψ cos α)/h. The derivation will be included in the Supplementary Information. Since h = d sin α, where d is the 3D distance to the fixation point, we could express cos α as a function of distance. Certainly, there is no 1:1 mapping from the curl signal to heading relative to gaze (e.g. θ – ψ). Participants would need to know υ and eye height, plus extra-retinal information. Frenz et al. (2003, Vis Res.) showed that people can estimate self-motion directly from optic flow across different simulated eye heights and gaze angles; extra-retinal information can, in addition, provide knowledge of ψ and α. It is then plausible that the visual system can use and transform the curl signal from a qualitative directional cue (i.e. steering left or right of fixation) into a quantitative steering command. By combining curl with knowledge of gaze orientation and eye height, the visual system can resolve ambiguities in the flow field and utilize curl as a more precise error signal for locomotor control. These aspects will be included in the new version.
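The relation quoted in this response can be tried out numerically. The sketch below is only an illustration of the formula curl = (υ sin ψ cos α)/h together with h = d sin α; the function names and all numeric values are hypothetical and are not taken from the authors' code. It shows why curl alone does not map 1:1 onto heading relative to gaze: two different combinations of speed and gaze offset give the same curl.

```python
import math


def retinal_curl(speed, gaze_offset_deg, pitch_deg, eye_height):
    """Curl from the quoted relation: curl = (v * sin(psi) * cos(alpha)) / h."""
    psi = math.radians(gaze_offset_deg)
    alpha = math.radians(pitch_deg)
    return speed * math.sin(psi) * math.cos(alpha) / eye_height


def pitch_from_distance(eye_height, distance):
    """Pitch alpha (in degrees) obtained by inverting h = d * sin(alpha)."""
    return math.degrees(math.asin(eye_height / distance))


# Hypothetical values, chosen only for illustration.
h = 1.6                               # eye height in metres
alpha = pitch_from_distance(h, 5.0)   # fixation point 5 m away

# Two different (speed, gaze offset) pairs that produce the same curl.
curl_a = retinal_curl(1.0, 30.0, alpha, h)   # sin(30 deg) = 0.5
curl_b = retinal_curl(0.5, 90.0, alpha, h)   # sin(90 deg) = 1.0
print(curl_a, curl_b)
```

Under these assumptions, an observer who knows the speed, eye height and pitch could invert the relation to recover the gaze offset, which is the point the response makes about combining curl with extra-retinal information.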

      (2b) Mismatch between task and controller

      We thank the reviewer for this point. We have addressed the alignment of the reference frames in our response to Issues 1a and 2a. Once the initial orientation (θ<sub>0</sub>) is established in the world frame, the controller model generates steering adjustments that directly translate into heading predictions within that same world reference frame. By treating the perceptual report as an output of the locomotor controller, we resolve the discrepancy between the steering task and the reported heading.

      (2c) No raw data provided

      We respectfully disagree with the reviewer’s interpretation regarding data smoothing. The thin lines in Figure 2 represent the mean 3D paths derived directly from the response variable (θ<sub>0</sub>) across trials of identical conditions for each participant (as detailed in the ‘Computation of Perceived Path’ section). No smoothing or filtering has been applied to these plotted trajectories other than computing the mean across trials. We also wish to remind the reviewer that the raw data and analysis code remain publicly accessible for further inspection. Regarding the visual representation: in earlier versions of the manuscript, we included shaded 95% Confidence Intervals (CIs) in Figure 2. However, this addition rendered the plot overly cluttered and obscured the individual trajectories. We therefore elected to present individual participant means (thin lines) alongside group averages (thick lines) to emphasize inter-subject variability. For clarity, the 95% CIs are explicitly displayed in Figure 3, where the data density is more conducive to shaded areas.

      (3) Difference with Matthis et al (2022)

      While Matthis et al. (2022) described the existence of retinal curl during walking and the information it can provide relative to gaze, our paper provides the causal link: by manipulating the curl in real time (the ‘cancelled’ and ‘over-cancelled’ curl conditions), we provide the critical evidence that perceived heading is affected by this signal.

      (4) Eye movements analysis

      We thank the reviewer for noting that retinal slip (velocity error) is a more critical metric than positional gaze error. We agree that tracking inaccuracies can introduce translational noise into the flow field. The 3° threshold was established based on the eye tracker’s specifications and the naturalistic setup (1-meter viewing distance without head stabilization). Across all participants, the mean positional error ranged from 1.016° to 1.5° (1 deg is 2.08 cm in our setup). We also calculated retinal slip values, which ranged from 0.12 to 0.27 deg/s (X dimension) and 0.12 to 0.23 deg/s (Y dimension). These values are comparable to natural oculomotor drift (Kowler et al., 1979) and are understandably small given the low velocity of the fixation target. Consequently, it is highly unlikely that retinal slip influenced the results. Furthermore, assuming that tracking error remained consistent across fixation conditions, any present retinal slip cannot explain why the bias followed the retinal curl manipulation as predicted by the controller. We therefore consider retinal slip to be an unlikely confounding factor.

      (5) the separate and joined fits

      We thank the reviewer for the opportunity to clarify the logic behind our modeling choices. We acknowledge that the “separate fits” are inherently less informative due to the high number of free parameters relative to the data. Our primary scientific goal was not to achieve perfect descriptive accuracy via 30 parameters, but to test a specific functional hypothesis through the “joint fit.”

      The Logic of the Joint Fit:

      We agree with the reviewer that the joint fit misses some paths in some conditions. Of course, the joint fit reflects a significant compromise. The “Gain” (the weighting of the curl signal) is likely not a static constant but is dynamically tuned based on task demands, confidence in the visual signal, simulated speed, and so on. By using a single Gain parameter, we intentionally ignore this contextual variability to see how much of the behavior can be explained by a “minimalist” controller. In this sense, the 2-parameter joint model is a deliberate attempt to test this limit. By forcing a single Gain parameter to account for all conditions across both straight and curved paths within one flow manipulation (e.g. unaltered flow), we are asking whether a single, fixed linear relationship between retinal curl and steering effort/gain can explain the results. We view the joint fit not as a “perfect” model, but as a stronger test of the curl-based control theory. The fact that a 2-parameter model can capture the direction and scale of biases across such a diverse set of conditions (straight/curved paths, five fixation eccentricities) suggests that retinal curl is a robust signal. Upon closer analysis, the discrepancies between the joint model and the data are most pronounced in the over-cancelled condition, in which sensory evidence becomes more ecologically inconsistent with the extra-retinal information (gaze direction). While the joint fit successfully demonstrates that a single parameter can capture the general functional role of curl, it fails to account for the complex sensory re-weighting that occurs in ecologically inconsistent conditions (like ‘over-cancelled’ flow). We will update the manuscript to discuss these limitations, framing the model as a parsimonious first-order approximation based on a minimal set of parameters rather than a complete description of human heading perception.

      (6) On the neural simulations

      We acknowledge that the presentation of the neural model requires more clarity regarding its objectives and its relationship to the behavioral data.

      We first wish to clarify the intended scope of the neural ring-attractor model. Our primary goal was not to provide a comprehensive account of behavioral performance across all conditions (which is the role of the controller model), but rather to demonstrate a biologically plausible mechanism that explains the emergence of the “Opposite-to-Gaze” bias. While the controller demonstrates that the bias follows a specific control law, the neural model shows how such a law can emerge from known primate neurophysiology, specifically, spiral-tuned MSTd neurons, gaze-contingent inhibition, and an egocentric “straight-ahead” prior.

      Why Straight Paths are Sufficient for this Objective. The reviewer asks why only straight paths were simulated. In our study, the straight-path condition with eccentric gaze is the purest test of the bias mechanism. Simulating the straight paths allowed us to isolate the interaction between foveal inhibition and the straight-ahead prior without the confounding variable of path-curvature flow. Given the complexity of the neural network’s parameter space, we focused on these conditions to provide a clear neuro-plausible explanation.

      Units: Pixels vs. Degrees. We acknowledge that the use of “pixels” in the plots of internal neural dynamics may appear awkward. Because the neural network operates on input stimuli that are defined by the pixel resolution of the videos used in the simulations, we used pixels as the native coordinate system to describe the movement of activity peaks within the network’s internal “map.”

      Behavioral Output (Meters): Importantly, the final heading estimates produced by the network are not left in pixels. We use a pinhole camera model to reconstruct the 3D trajectories from the neural activity. These results are expressed in meters, allowing for a direct comparison with the human behavioral data.

      Addressing Wild Oscillations and Smooth Paths. The oscillations observed in the instantaneous heading estimates reflect the stochastic nature of the population peak when tracking high-frequency sensory inputs. In our model, the synaptic time constant (τ) was kept relatively small to ensure a fast, low-latency response to changes in self-motion. While increasing τ would have produced smoother internal dynamics, it would also have introduced delays into the control loop. Instead, we chose to maintain this high sensory responsiveness and applied a temporal moving average later to the network’s decoding to reconstruct the 3D trajectories.

      In addition, the neural activity over time is shown in two ways. The first is the heatmap, which shows the neuron with the preferred heading; here one can see more oscillations, especially when the fixation point is closer to the centre (eccentricities -2 and 2), due to greater competition between the sensory evidence and the straight-ahead prior. The second is the decoded heading. In the ring-attractor model, the decoded heading is not determined by a single neuron but is calculated using a population vector average (equation 19). By summing across the entire population, the decoder effectively integrates sensory evidence from many neurons simultaneously. One can appreciate (see e.g. Fig. 5B) that averaged decoding leads to a smoother estimate (the white dashed line, whose visibility will be improved in the revised version). Behavioral work by Burr and Santoro (2001) suggests that global motion signals (divergence and rotation in optic flow) are integrated over much longer timescales (roughly 1000 ms to 3000 ms) than local motion units (~200 ms).
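The contrast drawn here between the single most active neuron and the population-level estimate can be illustrated with a generic population vector decoder. This is a minimal sketch of the standard technique, not the authors' equation 19, which is not reproduced in this response; the tuning-curve shape, the 10-degree spacing of preferred headings and the size of the perturbation are all assumptions made for the example.

```python
import math


def population_vector_heading(preferred_deg, rates):
    """Decode heading as the angle of the rate-weighted sum of unit vectors."""
    x = sum(r * math.cos(math.radians(p)) for p, r in zip(preferred_deg, rates))
    y = sum(r * math.sin(math.radians(p)) for p, r in zip(preferred_deg, rates))
    return math.degrees(math.atan2(y, x))


# Hypothetical ring of 36 neurons with preferred headings every 10 degrees.
preferred = list(range(-180, 180, 10))


def bump(center_deg, kappa=4.0):
    """Von Mises-shaped activity bump centred on center_deg (peak rate 1.0)."""
    return [math.exp(kappa * (math.cos(math.radians(p - center_deg)) - 1.0))
            for p in preferred]


rates = bump(20.0)
decoded_clean = population_vector_heading(preferred, rates)

# Perturb one neighbouring neuron so that it becomes the most active one.
noisy = list(rates)
noisy[preferred.index(40)] += 0.3
peak_noisy = preferred[noisy.index(max(noisy))]
decoded_noisy = population_vector_heading(preferred, noisy)

print(decoded_clean, peak_noisy, decoded_noisy)
```

In this toy case the identity of the most active neuron jumps by a full 10-degree step or more after the perturbation, while the population vector moves only slightly, which is the smoothing effect the response attributes to averaging across the population.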

      See also our comment on temporal integration in the responses to reviewer #3.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for the comments regarding the definition of heading at different time scales, the role of the gait cycle, and the temporal integration of the curl signal. They will help us refine the manuscript’s core arguments.

      We agree that “heading” must be precisely defined within the context of the differing temporal demands of balance and steering. While instantaneous retinal motion provides the high-frequency feedback necessary for momentary postural adjustments and balance, our study is concerned with heading as a gaze-relative signal used for the continuous control of a locomotor trajectory. As such, we will revise the manuscript to specify that the perceived heading measured in our task reflects a signal integrated over the gait cycle to filter out the oscillatory noise induced by head bob and sway.

      The reviewer correctly notes that gait-induced head bob and sway produce high-frequency oscillations in the curl signal, yet our behavioral results show smooth, slowly evolving biases. The visual system does not react to “instantaneous” curl, which would lead to jittery, unstable heading estimates. Instead, it integrates flow over a timescale roughly commensurate with a full gait cycle (~500–1000 ms). This implies a significant temporal integration process, consistent with evidence (Burr and Santoro, 2001, Vis Res.) indicating that optic flow signals (radial and rotational components) are integrated over windows of up to approximately 3 seconds to ensure perceptual stability. Neurally, this likely involves the projection from area MSTd to the Ventral Intraparietal area (VIP), a pathway where fast, eye-centered sensory inputs are transformed into stable, body-centered representations suitable for guiding long-term steering behavior (Chen et al. 2011, J Neurosci.). By grounding our definition of heading in these specific temporal and neural constraints, we aim to clarify how the visual system exploits retinal curl for goal-directed action in natural, dynamic environments and relate our findings to recent studies addressing the role of retinal motion on balance (Powell et al. 2026, bioRxiv).

      In our implementation, we explicitly address the high-frequency noise introduced by gait dynamics by smoothing the retinal curl signals computed from the stimulus videos before they are fed into the controller. This temporal filtering allows the fit of the controller’s prediction to the response data while remaining robust to the rapid fluctuations of head bob and sway. In contrast, the neural ring-attractor model would not require an external smoothing step; instead, the integration is an emergent property of the system’s architecture that can be controlled with different parameters. The dynamics of the synaptic weights and the characteristic “leak” in the population activity naturally implement a leaky integration of sensory evidence, ensuring that the decoded heading reflects a sustained estimate rather than an instantaneous response to visual noise.
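The idea of leaky integration described here can be sketched with a first-order filter. The example below is an illustrative stand-in, not the authors' smoothing step or their ring-attractor dynamics; the time constant, step frequency, oscillation amplitude and mean curl value are all hypothetical. It shows how a slowly varying curl component survives while a gait-rate oscillation is strongly attenuated.

```python
import math


def leaky_integrate(signal, dt, tau):
    """Forward-Euler leaky integrator for tau * dy/dt = -y + x."""
    y = 0.0
    out = []
    for x in signal:
        y += (dt / tau) * (x - y)
        out.append(y)
    return out


dt = 0.01        # sample interval in seconds
tau = 1.0        # integration time constant in seconds (hypothetical)
gait_hz = 2.0    # step frequency of the simulated head bob (hypothetical)
n = 1000         # 10 seconds of signal

# Constant curl of 0.1 plus a large gait-rate oscillation of amplitude 0.5.
raw = [0.1 + 0.5 * math.sin(2 * math.pi * gait_hz * i * dt) for i in range(n)]
smooth = leaky_integrate(raw, dt, tau)

# Inspect the last second, after the initial transient has decayed.
tail = smooth[-100:]
ripple = max(tail) - min(tail)
mean_tail = sum(tail) / len(tail)
print(ripple, mean_tail)
```

A larger tau suppresses the oscillation further but makes the output slower to follow genuine changes in curl, which mirrors the trade-off between smoothness and latency mentioned in the response to Reviewer #2.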

    1. SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

      Authors: Jialong Chen, Xander Xu, Hu Wei, Chuan Chen, Bing Zhao

      Abstract: Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing. However, in the real world, the development of mature software is typically predicated on complex requirement changes and long-term feature iterations -- a process that static, one-shot repair paradigms fail to capture. To bridge this gap, we propose SWE-CI, the first repository-level benchmark built upon the Continuous Integration loop, aiming to shift the evaluation paradigm for code generation from static, short-term functional correctness toward dynamic, long-term maintainability. The key insight is simple: Maintainability can be revealed by tracking how functional correctness changes over time. The benchmark comprises 100 tasks, each deriving from a real-world code repository with a development history spanning an average of 233 days and 71 consecutive commits. SWE-CI requires agents to systematically resolve these tasks through dozens of rounds of analysis and coding iterations. SWE-CI provides valuable insights into how well agents can sustain code quality throughout long-term evolution.

      Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

      Cite as: arXiv:2603.03823 [cs.SE] (or arXiv:2603.03823v4 [cs.SE] for this version), https://doi.org/10.48550/arXiv.2603.03823

      Submission history (from Jialong Chen): [v1] Wed, 4 Mar 2026 08:20:25 UTC (3,311 KB); [v2] Tue, 17 Mar 2026 15:22:33 UTC (3,312 KB); [v3] Wed, 18 Mar 2026 12:07:41 UTC (3,315 KB); [v4] Wed, 1 Apr 2026 05:06:38 UTC (6,535 KB)

      AI agent

    1. Dernier regard

      I was thinking that you could mention the new regulations and laws for influencers that give the people who consume their content better transparency (the ARPP code); for example, when photos are retouched, we are supposed to be told.

  2. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. Code-switching. November 2023. Page Version ID: 1185649746. URL: https://en.wikipedia.org/w/index.php?title=Code-switching&oldid=1185649746 (visited on 2023-11-24).

      This source talks about "code-switching" which it describes as when people switch between two different languages. It lists different types of code-switching, as well as how it can be applied or used. The source also gives written examples toward the end.

    2. Code-switching. November 2023. Page Version ID: 1185649746. URL:

      This reminds me of what I learned in my linguistics class last quarter about style shifting and code switching. Everyone does it to some extent to show belonging, community, and connection to a certain group. We switch the way we speak to project a specific image of ourselves, or we switch according to who's listening to be more like them or more different than them to establish connection or disconnect.

    1. Author response:

      Reviewer 1:

      Porte et al. investigate how observers form confidence judgments about the presence vs absence of near-threshold audiovisual stimuli. In two psychophysical detection experiments, human participants judged whether a stimulus (visual, auditory, or audiovisual) was present or absent, reported amodal confidence, and then gave modality-specific detection and confidence ratings using a bidimensional scale. The authors report that audiovisual (AV) stimuli are detected more accurately than unimodal stimuli, but that multisensory stimulation does not improve metacognitive efficiency. Participants are more confident in absence than in presence judgments. They extend a previously proposed model to an audiovisual setting, assuming evidence is available only for presence and that absence is inferred via counterfactual detectability. Detection is modeled with a disjunctive integration rule across modalities, while confidence is explained by a combination of conjunctive (for presence) and disjunctive/negation-of-disjunction (for absence) rules.

      We thank the reviewer for thoroughly evaluating our work.

      There are several points I wish to have clarified, outlined below:

      (1) Framing of bimodal vs unimodal detection

      On p.3, the introduction states that "Adults typically show higher detection rates and faster reaction times for bimodal than for unimodal stimuli." This is broadly consistent with the literature, but as written, it obscures the fact that these effects depend critically on experimenter-defined stimulus strengths. It is trivial to construct cases where a strong unimodal stimulus is more detectable than a bimodal stimulus made of two very weak unimodal stimuli. If "bimodal" is understood as the co-presentation of two unimodal components matched in detectability, then Bayes-rule-based arguments indeed predict better detection for the bimodal case; how much better is theoretically interesting, but not quantified in this paper. There is an entire literature on the combination of two unimodal stimuli, which is not touched on. For a pertinent reference, see Ernst & Banks 2002. I recommend clarifying that the statement assumes comparable unimodal intensities.

      We will clarify that when discussing bimodal stimuli, we mean the co-presentation of two unimodal stimuli of similar intensity. We will add references to the literature on discrimination tasks showing that multisensory cue combination follows Bayes-rule integration (e.g., Ernst & Banks, 2002; Battaglia et al., 2003; Alais & Burr, 2004) and clarify how our work differs from this rich body of work and what novel contributions it provides.

      (2) Relationship to signal detection theory and counterfactual perceptibility

      In the introduction, the authors write, "If sensory evidence is only available for presence," motivating counterfactual perceptibility as a necessary ingredient to infer absence. However, standard signal detection theory (SDT) already provides a widely accepted framework in which a continuous internal response is present on both signal and noise (absent) trials, with absence corresponding to the noise distribution and decisions implemented by a criterion. Thus, there is no logical need to invoke counterfactual perceptibility simply to define absence; rather, the Mazor-style framework adds an explicit belief model about detectability and an optimal stopping policy. It would strengthen the paper to more clearly state how the proposed model goes beyond SDT conceptually, acknowledge that SDT can account for presence/absence decisions without counterfactuals, and position the counterfactual account as a hypothesis about how observers actually compute absence/confidence, not as a necessity.

      One of the central claims of the paper is that detection in the case of absence requires counterfactual reasoning. The authors should demonstrate whether or not an SDT-based generative model can describe these amodal and uni- and bi-modal stimulus decisions. In such an SDT-based generative model, the noise distribution is shared across conditions, and unimodal vs bimodal differences are captured by changes in the mean or variance of the signal+noise distribution.

      We will clarify that our framework explains how absence judgments (and related confidence) are formed, and what it adds to SDT models, including the reproduction of reaction times and a normative explanation of criterion placement (results about RTs are available in the supplementary materials). We will also run additional model comparisons assessing how an SDT-based generative model performs compared to our Bayesian model based on counterfactual perceivability.
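      As a rough illustration of the SDT-based generative model discussed in comment (2), the sketch below computes predicted "present" rates when a single noise distribution is shared across conditions and each stimulus condition only changes the mean or variance of the signal+noise distribution. The criterion and all distribution parameters are hypothetical values chosen for illustration; they are not estimates from the study.

```python
from statistics import NormalDist


def p_yes(criterion, mean=0.0, sd=1.0):
    """Probability that a Gaussian internal response exceeds the criterion."""
    return 1.0 - NormalDist(mean, sd).cdf(criterion)


# Hypothetical parameters (assumptions, not fitted values).
CRITERION = 1.0
CONDITIONS = {
    "absent": (0.0, 1.0),       # noise distribution shared by all conditions
    "auditory": (0.8, 1.0),     # (mean, sd) of the signal+noise distribution
    "visual": (1.2, 1.0),
    "audiovisual": (2.0, 1.2),  # bimodal: larger mean, possibly larger variance
}


def predicted_yes_rates(criterion=CRITERION, conditions=CONDITIONS):
    """Predicted proportion of 'present' responses for each condition."""
    return {name: p_yes(criterion, m, s) for name, (m, s) in conditions.items()}


if __name__ == "__main__":
    for name, rate in predicted_yes_rates().items():
        print(f"{name:12s} P('present') = {rate:.3f}")
```

      In a model of this kind, absence is simply the noise distribution, so "absent" responses need no counterfactual term; comparing its fit against the counterfactual model is what would separate the two accounts.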

      (3) Confidence vs performance: is AV confidence special?

      The paper's central claims about multisensory confidence and metacognition would be stronger if the authors showed that AV confidence deviates from what is expected given performance alone. From the reported results, AV accuracy is around 80%, with visual and auditory at about 60% and 40%, respectively. Given that confidence typically monotonically scales with accuracy, the first question is whether AV confidence is entirely explained by improved performance, or whether there is an additional multisensory contribution. A simple, informative analysis would be for each subject, plot mean confidence vs per cent correct for AV, V, A, and absent conditions, and to test whether AV confidence lies above the trend predicted by accuracy alone.

      This is an excellent suggestion, and we will conduct the proposed analysis.
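      One possible way to carry out the analysis proposed in comment (3) is sketched below: for a single participant, a least-squares line relating mean confidence to accuracy is fitted on the non-AV conditions, and the AV condition is then compared with the value that line predicts. The function name, the condition labels, and the choice of a simple linear trend are assumptions made for illustration; the reviewer's suggestion does not prescribe a specific fitting method.

```python
def av_confidence_residual(accuracy, confidence, target="AV"):
    """Observed minus predicted mean confidence for the `target` condition.

    `accuracy` and `confidence` map condition labels to one participant's
    proportion correct and mean confidence. The prediction comes from a
    least-squares line fitted to all conditions except `target`.
    """
    others = [cond for cond in accuracy if cond != target]
    xs = [accuracy[cond] for cond in others]
    ys = [confidence[cond] for cond in others]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = intercept + slope * accuracy[target]
    return confidence[target] - predicted


if __name__ == "__main__":
    # Made-up values for one hypothetical participant.
    acc = {"A": 0.40, "V": 0.60, "absent": 0.90, "AV": 0.80}
    conf = {"A": 0.40, "V": 0.50, "absent": 0.65, "AV": 0.70}
    print(av_confidence_residual(acc, conf))
```

      A positive residual would indicate AV confidence above the accuracy-based trend; testing whether residuals are reliably positive across participants would address the reviewer's question.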

      (4) Metacognitive measures: logistic regression slopes vs meta-d′/d′

      In the "Multisensory effects on metacognitive performance" section, the authors define "metacognitive sensitivity" as the slope of a Bayesian logistic regression predicting accuracy from confidence. There is substantial literature showing that logistic-slope measures of metacognitive sensitivity are criterion-dependent and can be affected by both task and confidence criteria (for one example, see Rausch & Zehetleitner, 2017). In contrast, meta-d′/d′ was specifically developed to provide a bias-invariant measure of metacognitive efficiency. Though this, too, is dated (see Boundy-Singer et al., 2023). Given that the authors already estimate HMeta-d-based M-ratios, it is unclear why they rely on logistic regression slopes as their primary "metacognitive sensitivity" metric in Figure 4A. I suggest either replacing the logistic-slope metric with SDT-based measures (meta-d′, meta-d′/d′) or providing a clear justification for using logistic slopes, along with a discussion of their known limitations.

      Additionally, Figure 3 reports M-ratios without showing the corresponding d′ or meta-d′ for judge-present vs judge-absent conditions. Presenting these would help contextualize the metacognitive efficiency results and clarify whether differences are driven mainly by changes in metacognitive sensitivity, changes in task performance, or both. The d' values per condition could be added to Figure 2A.

      All typical measures of metacognitive sensitivity are influenced by metacognitive bias and task performance to some extent, and none of them is a pure measure of type-2 sensitivity (e.g., see Rahnev, 2025). Here, we chose logistic regression because it enables modeling interactions with other predictors in a factorial design with a limited number of trials.

      We will clarify the limitations of metacognitive sensitivity measures and better explain why we then used the M-ratio to estimate metacognitive performance while controlling for underlying task performance.

      Thank you for this suggestion. We will add the d’ values per condition to Figure 2A.

      (5) Interpretation of confidence in absence vs presence

      The authors emphasise that it is surprising subjects are more confident in absence than in presence judgments, both at amodal and modality-specific levels. However, Figure 2B suggests that absent responses are very accurate: absent is reported as present only in about 10% of absent trials, implying a high correct rejection rate. If confidence tracks outcome probability, higher confidence for absence may be at least partly expected. Before attributing this asymmetry primarily to counterfactual reasoning, it would be important to explicitly relate confidence to accuracy for hits, misses, false alarms, and correct rejections and show whether absence confidence remains elevated relative to presence after controlling for accuracy differences across judgment types and conditions. Without this, the interpretation that higher absence confidence is inherently "unexpected" seems overstated.

      This higher confidence for absence judgments than for presence judgments was observed while controlling for response accuracy. We will clarify this in the main text.

      (6) Model: integration rules, confidence, and evidence strength

      The modeling section extends the Mazor et al. ideal observer to two modality-specific sensors, with disjunctive integration for detection and then disjunctive vs conjunctive integration rules for confidence. I have a few comments.

      First, the detection rule is disjunctive and is reported as a finding. However, the conclusion that detection relies on a disjunctive rule ("present if A or V") closely mirrors the task instructions: participants are explicitly told to respond "present" if they detect the stimulus in any modality. As such, this seems more like a sanity check than a novel empirical finding. Relatedly, the conjunctive detection rule is a weak null. The conjunctive rule ("present only if both A and V") is behaviorally implausible given the task instructions. A more informative baseline would be an SDT-style scalar-evidence model (see comment 2), rather than a conjunctive rule that participants would have to actively violate the instructions to follow.

      Second, confidence in the model is defined as the probability of being correct at the time of the detection decision. However, this implies a fixed amount of evidence at decision time unless additional mechanisms are invoked. This issue is well known in diffusion modeling (see Kiani et al. 2014) and deserves explicit discussion; otherwise, it is unclear how the model produces graded confidence from a bound-crossing rule alone.

      Third, the authors do not consider a straightforward evidence-strength account of confidence. When both modalities indicate presence, there is, on average, more total sensory evidence than in unimodal trials, making correct decisions more likely and, under most frameworks, confidence higher. Likewise, weak evidence in both modalities can be stronger evidence for absence than moderate in one and weak in the other. Many of the patterns that motivate the presence-conjunctive/absence-disjunctive mix could arise from a model where confidence simply reflects the amount of evidence for the chosen option, without positing distinct logical integration rules for presence vs absence. As the authors note, purely disjunctive or purely conjunctive confidence rules fail to capture the trends in confidence reports in Figure 7, leading them to adopt a combined presence-conjunctive/absence-disjunctive rule. A more parsimonious alternative (that confidence scales with evidence magnitude and cross-modal agreement) should be explicitly considered and, ideally, implemented as a competing model. Finally, if the model is intended as a good account of the data, it would be useful to report whether it also reproduces the metacognitive efficiency patterns (M-ratios) beyond the mean confidence patterns shown in Figures 7-8. At present, the model appears systematically over-confident, which should be acknowledged and quantified.

      Indeed, the disjunctive rule was expected, given our design; we will clarify this. As mentioned above, we will directly compare the results of our current model with those of a more traditional SDT-based generative model, as suggested by the reviewer.

      Contrary to a classical drift diffusion model, the model does not assume a fixed decision boundary, but derives an optimal stopping policy per time point and belief state. As a result, and depending on beliefs about perceptual evidence and the temporal discounting factor, optimal decision boundaries can be asymmetric and may collapse asymmetrically toward 0. Furthermore, given the asymmetry in the information value between sensor activations and inactivations, and differences in the information value of sensor activations of the two modalities, boundary crossing can lead to belief states that are far or close to the decision boundary, depending on the nature of the evidence. Together, even without an explicit modeling of post-decisional evidence, the model can account for variability in the total accumulated evidence at decision time.

      From our understanding, the proposed alternative is equivalent to our current model, in which confidence scales with evidence magnitude.

      The model was not fitted to confidence data, which could explain its overall overconfidence. To further test our model, we will assess its ability to reproduce patterns of metacognitive efficiency (M-ratios).

      (7) Confidence asymmetry index (CAI) and modality weighting

      The confidence asymmetry index (CAI) is defined as the difference between auditory and visual confidence on AV vs absent trials, and the authors report strong correlations between observed and simulated CAI across participants. They interpret this as evidence that subjects place different weights on auditory vs visual signals. Several questions arise. First, does CAI capture asymmetries beyond what is expected from accuracy differences between modalities and conditions? Second, because the simulated data are generated from model fits to the observed data, a correlation between observed and simulated CAI is expected: the model is built to reproduce the individual patterns it is then compared to. A stronger test would compare CAI from data simulated with modality-specific belief parameters, versus CAI from data simulated with constrained equal belief parameters (same θs). Relatedly, the paper would benefit from a plot showing the distribution of θs for A- and V-present stimuli across subjects. These values could also be related to unimodal sensitivity measured in the calibration/training phases. A natural prediction is that higher unimodal sensitivity should correspond to higher belief parameters for presence.

      The model was not fitted to either the modality-specific responses or the confidence ratings, so the correlation between observed and simulated CAI was not expected and provides a good test of our model's ability to reproduce the observed patterns. We will test whether the same correlations hold when using the difference in accuracy instead of the confidence.

      We found that the best model is the one with the same belief across the visual and auditory sensors. Given this, we cannot investigate how modality-specific belief parameters are linked to unimodal sensitivity for each participant.

      Reviewer 2:

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1 . In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      This is an excellent suggestion. We will run this analysis and refit the d’ and criterion response using unequal-variance models to see whether we observe the same results.
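      The zROC check mentioned by the reviewer can be sketched as follows. Confidence-by-response categories are ordered from "sure absent" to "sure present", cumulative hit and false-alarm rates are z-transformed at each category boundary, and a line is fitted to z(hit) against z(false alarm). Under equal-variance SDT the slope is 1; under unequal variance it estimates the ratio of noise SD to signal SD. The function name and input layout are assumptions made for illustration, and an ordinary least-squares fit is used here for simplicity rather than a maximum-likelihood ROC fit.

```python
from statistics import NormalDist


def zroc_slope(present_counts, absent_counts):
    """Least-squares slope of z(hit rate) on z(false-alarm rate).

    Both inputs hold response counts (or proportions) per rating category,
    ordered from 'sure absent' to 'sure present'. The first and last
    categories must be non-empty in both lists, otherwise a rate of 0 or 1
    is produced and the z-transform is undefined.
    """
    z = NormalDist().inv_cdf
    n_present = sum(present_counts)
    n_absent = sum(absent_counts)
    z_hit, z_fa = [], []
    # Sweep the criterion across every boundary between adjacent categories.
    for k in range(1, len(present_counts)):
        z_hit.append(z(sum(present_counts[k:]) / n_present))
        z_fa.append(z(sum(absent_counts[k:]) / n_absent))
    mean_fa = sum(z_fa) / len(z_fa)
    mean_hit = sum(z_hit) / len(z_hit)
    sxx = sum((f - mean_fa) ** 2 for f in z_fa)
    sxy = sum((f - mean_fa) * (h - mean_hit) for f, h in zip(z_fa, z_hit))
    return sxy / sxx


if __name__ == "__main__":
    # Made-up counts over six rating categories.
    present = [10, 15, 20, 25, 40, 90]
    absent = [80, 50, 30, 20, 12, 8]
    print(zroc_slope(present, absent))
```

      A slope reliably different from 1 across participants would suggest that equal-variance d' and criterion estimates should be replaced by their unequal-variance counterparts.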

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      We will clarify this point in the manuscript. In Experiment 2, the stimulus was absent on 50% of the trials. As a result, the 50% of stimulus present trials were split into the three possible conditions, resulting in a sixth of the trials being auditory, a sixth visual, and a sixth audiovisual; we will make these proportions clearer in the text.

      We did not have any a priori hypotheses about the response criteria for Experiment 1. The reviewer is right, the proportion of absent versus present trials can indeed have an impact on response bias. In fact, one of the goals of Experiment 2 was to test whether the low frequency of absent trials compared to present ones could explain both response bias and higher confidence in absence observed in Experiment 1, which we found was not the case, as we did not observe a difference between the two experiments. We will clarify this in our revision.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      We will add individual data to Figure 2.

      Indeed, staircases failed to achieve 50% detection on average; participants for whom psychometric curves did not converge were excluded, as were those at floor level in one of the two modalities.

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that this data wasn't produced by bots? There are recent high-profile publications in PNAS and Behavioral Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7 ). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      Data were collected on the Prolific platform, which has been shown to yield high-quality data (Kay, 2025). However, we agree that this is a potential concern and will add a note of caution in the revised manuscript, even if the risk that the data do not come from humans but from bots is low (Huskey et al., 2026; Chetverikov, 2026).

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      We will clarify this aspect of the analysis. Data were binned into four quartiles based on the overall distribution of confidence values across participants, based on the binning used in the example in Fleming (2017). We will examine whether changing the number of bins changes the results (Dayan, 2023).
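      The binning step described above can be sketched as follows: quartile cut points are computed once from the pooled confidence ratings of all participants, and every rating is then assigned to one of four bins. The function name, the use of the "inclusive" quantile method, and the handling of ratings that fall exactly on a cut point are assumptions made for illustration; the response does not specify these details.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_bins(pooled_confidence, n_bins=4):
    """Assign each continuous confidence rating to a bin in 1..n_bins.

    Cut points come from the pooled distribution, so the same edges apply
    to every participant. A rating equal to a cut point goes to the upper bin.
    """
    cuts = quantiles(pooled_confidence, n=n_bins, method="inclusive")
    bins = [bisect_right(cuts, c) + 1 for c in pooled_confidence]
    return bins, cuts


if __name__ == "__main__":
    # Made-up continuous ratings pooled over participants.
    ratings = [0.05, 0.20, 0.35, 0.50, 0.62, 0.71, 0.88, 0.97]
    bins, cuts = quartile_bins(ratings)
    print(cuts)
    print(bins)
```

      Changing `n_bins` and re-estimating metacognitive efficiency on the re-binned data is one way to check whether the results depend on the number of bins.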

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      Yes, logistic regression has been used to quantify metacognitive sensitivity before. We will add the relevant papers as references (e.g., Sandberg et al., 2010; Norman et al., 2011; Siedlecka et al., 2016; Wierzchoń et al., 2012; Faivre et al., 2018; Pereira et al., 2023).

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])." Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      The number of participants was determined using a Bayesian optional stopping rule, as preregistered. The reviewer is right that the delta values are very similar in the two experiments. Given that a difference was found in only one experiment, we decided not to draw conclusions from it.

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      We will examine how the proportion of miscategorisation changed throughout the task.

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      Our main goal was to investigate how participants form integrated, supramodal confidence judgments on the basis of multisensory sources of information. Therefore, the amodal confidence judgments are required here.

      Moreover, the model was fitted to response times that corresponded to the amodal judgment. Because we had no meaningful response times for the modality-specific judgment, we could not use them to fit the model.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      Indeed, the model does not capture the full behavioral effects reflecting multisensory interference in the modality-specific responses. We suppose that the model does not reproduce these interferences, as it is only fitted to amodal detection accuracy, and as the two sensors are completely independent from one another. We will clarify this aspect in the text.

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      In addition to the d’ and criterion comparison between the observed and simulated data, we will compare modality-specific d’ and the correlations between observed and simulated confidence.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      To further support this claim, we will add a quantitative comparison of the different confidence fits.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      Thank you for this remark. Indeed, the difference in metacognitive efficiency may be driven by differences in the d’ values, and so a lower d’ for auditory stimuli can lead to higher metacognitive efficiency for a similar metacognitive sensitivity.

      Reviewer 3:

      This study used a pre-registered novel behavioural paradigm and computational modelling to investigate multi-sensory influences on detection and confidence. Participants performed amodal detection of auditory and visual stimuli (indicating that a stimulus was there when either an auditory stimulus or a visual stimulus or both were present), followed by amodal and unimodal confidence ratings. Detection was higher when both stimuli were present, and the presence of one modality increased the confidence in the presence of the other modality. In contrast to previous detection studies, confidence was higher for absent than for present judgements, but metacognitive efficiency was higher for present judgements. Metacognitive sensitivity was higher for bimodal stimuli, but this was not the case for metacognitive efficiency, suggesting that the sensitivity might be driven by first-order performance. The computational model showed that both detection and confidence in absence followed a disjunctive evidence integration rule, while confidence in presence followed a conjunctive integration rule.

      We thank the reviewer for engaging with our work.

      Strengths:

      The paper has several major strengths. Firstly, it addresses a novel research question using an innovative and well-controlled paradigm. Furthermore, the paradigm and analyses were pre-registered, and all effects that were interpreted were replicated in two independent samples. Finally, the paper uses an advanced computational model to capture counterintuitive patterns in the data.

      Weaknesses:

      The major weakness of the paper is the narrative structure. It is not always clear how the different analyses relate to the main research question. Many different effects are reported in terms of detection accuracy, bias, confidence and metacognition, as well as cross-modal and unimodal versus bimodal effects. It would help readability if the paper were streamlined in terms of the research question that is being answered, which I believe is specifically about multimodal absence judgements. Relatedly, for a reader not intimately familiar with the metacognition literature, the difference between MRatio, metacognitive sensitivity and metacognitive efficiency is not obvious. It would be good to clarify this more in the manuscript.

      We will improve the narrative structure so that each result clearly relates to the research question.

      We will also add a clearer definition of the various metacognition metrics to improve readability.

      In general, the conclusions drawn by the authors seem to be supported by the results. However, I was missing quantitative model comparisons between the conjunctive and the disjunctive models and an explanation of why the models systematically overestimated the confidence ratings. Furthermore, the 'perceptual multisensory interference' section reports on very interesting effects, but these are not supported by statistical tests in the main text. It would help to assess the strength of the claims if the statistical evidence in favour of these claims were presented together in the main text.

      The model was not fitted to confidence data, which could explain its overall overconfidence. As stated in previous responses, we will perform additional analyses to evaluate the model’s ability to reproduce confidence ratings. As some of the results were not replicated across experiments, we decided to put all statistical results related to multisensory interference in the supplementary materials and to focus only on consistent results across experiments.

      One other concern is that in real-world multi-sensory perception, such as the mosquito example in the introduction, the auditory and visual signals have a strong natural association, which means that if you hear the auditory signal, you expect that you will see the visual signal soon and vice versa. As far as I understood, this association was not present in the current paradigm, which might influence the type of effects that one would expect to see.

      The relation here is indeed artificial; we tried to reinforce it as much as possible in the task instructions by telling participants that they had to “detect a mosquito” that could be present auditorily, visually, or both. We acknowledge that this artificial association between the visual and auditory stimuli may indeed influence our results.

      References

      Alais, D., & Burr, D. (2004). The Ventriloquist Effect Results from Near-Optimal Bimodal Integration. Current Biology, 14(3), 257–262. https://doi.org/10.1016/j.cub.2004.01.029

      Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. JOSA A, 20(7), 1391–1397. https://doi.org/10.1364/JOSAA.20.001391

      Chetverikov, A. (2026). Online behavioral studies are safe for now : Unusual RTs do not imply bots (A reply to Van der Stigchel et al., 2026) (Gjw5u_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/gjw5u_v1/

      Dayan P. (2023). Metacognitive Information Theory. Open mind : discoveries in cognitive science, 7, 392–411. https://doi.org/10.1162/opmi_a_00091

      Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), Article 6870. https://doi.org/10.1038/415429a

      Faivre, N., Filevich, E., Solovey, G., Kühn, S., & Blanke, O. (2018). Behavioral, Modeling, and Electrophysiological Evidence for Supramodality in Human Metacognition. Journal of Neuroscience, 38(2), 263‑ 277. https://doi.org/10.1523/JNEUROSCI.0322-17.2017

      Fleming, S. M. (2017). HMeta-d: Hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neuroscience of Consciousness, 2017(1), nix007. https://doi.org/10.1093/nc/nix007

      Huskey, R., Zhao, Z., Parry, D. A., & Fisher, J. T. (2026). An AI agent can complete the Attention Network Test with human-like behavioral signatures: Implications for the bot-or-not debate (T2jru_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/t2jru_v1/

      Kay, C. S. (2025). Why you shouldn’t trust data collected on MTurk. Behavior Research Methods, 57, 340. https://doi.org/10.3758/s13428-025-02852-7

      Norman, E., Price, M. C., & Jones, E. (2011). Measuring strategic control in artificial grammar learning. Consciousness and Cognition, 20(4), 1920-1929. https://doi.org/10.1016/j.concog.2011.07.008

      Pereira, M., Skiba, R., Cojan, Y., Vuilleumier, P., & Bègue, I. (2023). Preserved Metacognition for Undetected Visuomotor Deviations. Journal of Neuroscience, 43(35), 6176–6184. https://doi.org/10.1523/JNEUROSCI.0133-23.2023

      Rahnev, D. (2025). A comprehensive assessment of current methods for measuring metacognition. Nature Communications, 16(1), 701. https://doi.org/10.1038/s41467-025-56117-0

      Sandberg, K., Timmermans, B., Overgaard, M., & Cleeremans, A. (2010). Measuring consciousness: Is one measure better than the other? Consciousness and Cognition, 19(4), 1069–1078. https://doi.org/10.1016/j.concog.2009.12.013

      Siedlecka, M., Paulewicz, B., & Wierzchoń, M. (2016). But I Was So Sure! Metacognitive Judgments Are Less Accurate Given Prospectively than Retrospectively. Frontiers in Psychology, 7, 218. https://doi.org/10.3389/fpsyg.2016.00218

      Wierzchoń, M., Asanowicz, D., Paulewicz, B., & Cleeremans, A. (2012). Subjective measures of consciousness in artificial grammar learning task. Consciousness and cognition, 21(3), 1141-1153. https://doi.org/10.1016/j.concog.2012.05.012

    2. Reviewer #2 (Public review):

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1. In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that these data weren't produced by bots? There are recent high-profile publications in PNAS and Behavior Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])". Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      (14) Lastly, I think it would be interesting to look at how instructions about modality-specific attention could modulate these findings, in terms of how unimodal (unimodal visual, unimodal auditory) or bimodal attention might modulate these effects. This is an idea for future work.
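
      Points (1), (5), and (6) above each name a concrete computation: the slope of the zROC as a check on the equal-variance assumption, the binning of continuous confidence ratings before computing meta-d', and the Type 2 AUROC as an alternative measure of metacognitive sensitivity. The following is a minimal Python sketch of all three, not the authors' pipeline. The function names, the quantile-based binning rule, and the default of four bins are assumptions made for illustration, and the zROC function assumes that no cumulative hit or false-alarm rate is exactly 0 or 1 (such cells would need a correction before the z-transform).

```python
from statistics import NormalDist


def zroc_slope(present_counts, absent_counts):
    """Least-squares slope of the zROC built from rating counts.

    Counts are ordered from 'sure absent' to 'sure present'.
    Under an equal-variance SDT model the slope is 1.
    """
    z = NormalDist().inv_cdf
    n_present, n_absent = sum(present_counts), sum(absent_counts)
    z_hit, z_fa = [], []
    # Each cut between adjacent rating categories gives one ROC point.
    for k in range(1, len(present_counts)):
        z_hit.append(z(sum(present_counts[k:]) / n_present))
        z_fa.append(z(sum(absent_counts[k:]) / n_absent))
    mean_fa = sum(z_fa) / len(z_fa)
    mean_hit = sum(z_hit) / len(z_hit)
    sxy = sum((f - mean_fa) * (h - mean_hit) for f, h in zip(z_fa, z_hit))
    sxx = sum((f - mean_fa) ** 2 for f in z_fa)
    return sxy / sxx


def bin_confidence(ratings, n_bins=4):
    """Map continuous ratings to bins 1..n_bins using empirical quantiles."""
    ordered = sorted(ratings)
    n = len(ordered)
    # Edges sit at the 1/n_bins, 2/n_bins, ... quantiles of the observed ratings.
    edges = [ordered[min(n - 1, (n * k) // n_bins)] for k in range(1, n_bins)]
    return [1 + sum(r >= e for e in edges) for r in ratings]


def type2_auroc(confidence, correct):
    """Probability that a correct trial has higher confidence than an error.

    Ties count as one half, which makes this the rank-based AUROC.
    """
    hits = [c for c, ok in zip(confidence, correct) if ok]
    errors = [c for c, ok in zip(confidence, correct) if not ok]
    if not hits or not errors:
        raise ValueError("need at least one correct and one incorrect trial")
    wins = sum((h > e) + 0.5 * (h == e) for h in hits for e in errors)
    return wins / (len(hits) * len(errors))
```

      Re-running an analysis with several values of `n_bins` is one simple way to address the question of whether the number of bins changes the results. Note that heavily tied ratings can leave some quantile bins empty or uneven.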

    1. The way we present ourselves to others around us (our behavior, social role, etc.) is called our public persona [f20]. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching [f21]. While modified behaviors to present a persona or code switch may at first look inauthentic, they can be a way of authentically expressing ourselves in each particular setting. For example: speaking in a formal manner when giving a presentation or answering questions in a courtroom may be a way of authentically sharing your experiences and emotions, but tailored to the setting; sharing those same experiences and emotions with a close friend may look very different, but still can be authentic. Different communities have different expectations and meanings around behavior and presentation. So what is appropriate authentic behavior depends on what group you are from and what group you are interacting with, like this gif of President Obama below: Fig. 6.6 President Obama giving very different handshakes [f22] to a white man and a Black man (Kevin Durant [f23]). See also this Key & Peele comedy sketch on greeting differences [f24] with Jordan Peele [f25] playing Obama, and also Key & Peele’s Obama’s Anger Translator sketch [f26]. Read/watch more about code-switching here: How Code-Switching Explains The World [f27]; ‘Key & Peele’ Is Ending. Here Are A Few Of Its Code Switch-iest Moments [f28]. Still, modifications of behavior can also be inauthentic. In the YouTube Video Essay: YouTube: Manufacturing Authenticity (For Fun and Profit!) [f29] by Lindsay Ellis, Ellis explores nuances in authenticity as a YouTuber. She highlights the emotional labor [f30] of keeping emotional expressions consistent with their public persona, even when they are having different or conflicted feelings. She also highlights how various “calls to action” (e.g., “subscribe to my channel”) may be necessary for business and can be (and appear) authentic or inauthentic.

      It appears that in the chapter "authenticity" is used as a positive term for social roles without defining what differentiates an authentic role from one that has been constructed as a performance. At what point, if a role is being carried out consistently and deliberately, does the performance become the person? This question is hinted at through Ellis, yet it leaves room for philosophical inquiry as to when a "genuine self" becomes apparent (bad faith by Sartre, or perhaps simply the issue of whether there is any true self at all).

    2. The way we present ourselves to others around us (our behavior, social role, etc.) is called our public persona [f20]. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching [f21]. While modified behaviors to present a persona or code switch may at first look inauthentic, they can be a way of authentically expressing ourselves in each particular setting. For example: speaking in a formal manner when giving a presentation or answering questions in a courtroom may be a way of authentically sharing your experiences and emotions, but tailored to the setting; sharing those same experiences and emotions with a close friend may look very different, but still can be authentic. Different communities have different expectations and meanings around behavior and presentation. So what is appropriate authentic behavior depends on what group you are from and what group you are interacting with, like this gif of President Obama below:

      Yes, I agree with this observation that we often behave differently in front of people who are new to us than in front of those we are close to. I have personally experienced this several times in my life: I often polish my personality in a way that will please others when I am in front of newer people. But I do not do the same when I talk to my closer friends whom I have known for a long time.

    1. Week of April 14: Confirm Claude Code/Cowork access for all. Demo of Granola team folder + research site generation workflow.

      Add to the meeting agenda please

    1. Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the bluetooth enabled head-mounted devices - they also have built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments, a continuous Y-maze, which elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While authors claim that the ~2m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      The programming process sounds quite complicated. It would be nice if they had OTA updates. But described and open source. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      It is unclear what the maximum number of devices that could be used without wireless interference is. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

    2. Author response:

      eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

      We thank the eLife team for taking the time to consider and assess our manuscript. Please find below our provisional author responses accompanying the first version of the Reviewed Preprint.

      We would like to clarify an important error regarding the battery model reported in the manuscript. We mistakenly referred to the CP1254-A3 (1.8 g), whereas the battery used for all devices is the CP9440 A4X (0.8 g).

      Importantly, this correction reduces the total device weight by approximately 1 g compared to the value assumed by Reviewer #3. We believe this directly addresses the concern raised regarding battery weight in both the individual review and the overall eLife assessment.

      We will correct this error in the revised manuscript and clearly report the exact battery model and total device weight.

      For reference, the official VARTA CoinPower catalog is available here:

      https://www.varta-ag.com/fileadmin/varta/industry/downloads/products/lithium-ion-cells/VARTA_CoinPower_EN_digital_221124_A5_6p.pdf

      The battery used in BlueBerry is listed on the last line of page 2.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless stimulation in a closed loop based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice along with some device validation results.

      We thank the reviewer for the statement.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

      We thank the reviewer for these valuable comments. We did consider implementing programmable control of stimulation power, for example using a digital potentiometer. However, in our current design this approach was not sufficient because the output current supported by typical digital potentiometers is too low for the high-power LEDs used in our system. For this reason, we did not include programmable stimulation current in the present version. We agree that this is a limitation and that further work is needed to identify a suitable solution for adjustable stimulation power, which we plan to pursue in future versions of the device. We will revise the manuscript to make this limitation and future direction clearer.

      We also agree that the use of a battery-powered wireless system introduces an important trade-off. We will revise the manuscript to discuss this limitation more explicitly.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the bluetooth enabled head-mounted devices - they also have built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments, a continuous Y-maze, which elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While authors claim that the ~2m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      We appreciate the reviewer’s enthusiasm. Thank you.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      We thank the reviewer for this important suggestion. We agree that this is a missing characterization in the current manuscript. In the revised version, we will include a more detailed analysis of the system’s power budget, including the maximum stimulation power supported by the BlueBerry device, the corresponding output currents, and the contribution of the main integrated circuits to overall current consumption.
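
      The dependence of battery life on the stimulation paradigm can be illustrated with a first-order power budget. The sketch below is a simplification under stated assumptions, not a measurement of the BlueBerry device: it treats current draw as a constant baseline (radio and microcontroller) plus an LED current weighted by the stimulation duty cycle, and it ignores battery derating and voltage sag. All parameter names are hypothetical, and real values would have to come from the datasheets and from bench measurements.

```python
def runtime_minutes(capacity_mah, baseline_ma, led_ma, duty_cycle):
    """First-order battery runtime estimate in minutes.

    capacity_mah: usable battery capacity (placeholder, take from a datasheet)
    baseline_ma:  mean current of radio and microcontroller (placeholder)
    led_ma:       LED current while the LED is on (placeholder)
    duty_cycle:   fraction of time the LED is on, between 0 and 1
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must lie in [0, 1]")
    mean_ma = baseline_ma + duty_cycle * led_ma
    if mean_ma <= 0:
        raise ValueError("mean current must be positive")
    # mAh divided by mA gives hours; convert to minutes.
    return 60.0 * capacity_mah / mean_ma
```

      Making `baseline_ma` a function of the Bluetooth message rate would be one way to express the dependence on message rate that the reviewer asks about.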

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      In the revised manuscript, we will clarify more explicitly that latency is a key limitation of the current system. We will also further investigate the source of this latency, including whether it can be reduced through additional configuration changes. In addition, we will include comparative latency measurements using different Arduino modules as the central BLE controller for the BlueHub device.
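
      The reviewer's polling hypothesis can be made concrete with a toy model. The sketch below assumes, purely for illustration, that a command is only forwarded at the next tick of a fixed polling schedule and that all other delays add up to a constant overhead. Whether this is the actual mechanism in the BlueHub/BlueBerry link is not established here, and the function and parameter names are hypothetical.

```python
import math


def polled_latency_ms(event_ms, poll_interval_ms, overhead_ms):
    """Latency of a command that is only forwarded at the next polling tick."""
    if poll_interval_ms <= 0:
        raise ValueError("poll_interval_ms must be positive")
    # Next tick at or after the event; an event exactly on a tick waits 0 ms.
    next_tick_ms = math.ceil(event_ms / poll_interval_ms) * poll_interval_ms
    return (next_tick_ms - event_ms) + overhead_ms
```

      In this model the waiting time ranges from 0 up to one polling interval, so the minimum latency equals the fixed overhead and the spread equals the polling interval. Comparing measured latencies against both quantities is one way to tell which of the two a configuration change would reduce.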

      The programming process sounds quite complicated. It would be nice if they had OTA updates. But described and open source. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      We will investigate this matter and provide a simpler installation and configuration script to set up both the BlueHub and BlueBerry systems.

      It is unclear what the maximum number of devices that could be used without wireless interference is. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      Due to the current structure of the ArduinoBLE library used in BlueHub devices, each BlueHub unit can support active communication with a maximum of 3 BlueBerry units. We thank the reviewer for highlighting this point and will clarify it in the next version of the paper.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

      In the revised submission, we will deposit all code used to program both the BlueHub and BlueBerry devices, together with the Gerber files required for PCB fabrication, alongside the paper.

      Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the BlueBerry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the BlueBerry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was especially designed to be integrated with existing software used for closed-loop neuroscience behavioral experiments, validation of the system is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice for navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics among the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating BlueBerry in closed-loop setups.

      We thank the reviewer for the statement.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc) are included only in the supplementary materials.

      As correctly mentioned, in the revised manuscript we will move the necessary quantifications from the supplementary section to the main text.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are lacking sufficient detail.

      We will describe this point in more detail in the revised manuscript.

      (3) System range (3 meters reported) is limited for a BLE device.

      The reported system range is the range over which communication is considered reliable. In the revised manuscript we quantify this by reporting the Received Signal Strength (RSS) value for multiple BlueBerry devices across varying distances.

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      That is indeed a limitation of our system. We will investigate the feasibility of integrating programmable stimulation protocols in an updated version of the BlueBerry device.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      We thank the reviewer for this comment. In the revised manuscript, we will clarify that the thermal measurements reported here apply only to the specific superficial implant geometry and stimulation conditions used in this study. Because tissue heating depends strongly on implant design and on parameters such as optical power, pulse width, and stimulation frequency, a general safety statement cannot be made for all possible implant configurations. Since the primary goal of this work is to present the wireless device platform rather than to validate a particular implant design, thermal safety should be evaluated individually for each implant and stimulation paradigm.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

      In the revised manuscript, we will include a comparison table summarizing our system alongside currently available wireless optogenetic devices.

    1. Because MCP is a newly developed protocol, there are many potential vulnerabilities that have not yet been fully explored or addressed. While some tools have been created specifically to protect MCP systems, Pysealer offers a more general solution by focusing on the integrity of the underlying source code itself. This broader approach helps safeguard against a wide range of attacks, not just those unique to MCP. The importance of protecting MCP and similar systems is underscored by the significant financial impact of cybersecurity breaches. For example, the average cost of a data breach in the United States in 2024 was $9.36 million [2]. As organizations increasingly rely on MCP for critical AI applications, implementing robust security measures like Pysealer becomes essential to prevent costly incidents.

      Too short to be its own subsection. Expand or merge with subsequent sections.

    2. At a high level, Pysealer introduces a novel approach to version control by enabling code to version control other code.

      Is this actually true? In my understanding, Pysealer is not really doing version control in the Git sense.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study addresses a critical and timely question regarding the role of a subpopulation of cortical interneurons (Chrna2-expressing Martinotti cells) in motor learning and cortical dynamics. However, while some of the behavior and imaging data are impressive, the small sample sizes and incomplete behavioral and activity analyses make interpretation difficult; therefore, they are insufficient to support the central conclusions. The study may be of interest to neuroscientists studying cortical neural circuits, motor learning, and motor control.

      We thank the reviewers and the editors for the insightful comments. We are pleased to report that the raised issues with the manuscript can be addressed by improving clarity in our writing of specific sections and by providing additional analysis. Specifically, it was not clear in the manuscript text that although we show illustrative data with a lower number of animals, our conclusions are supported by data with a larger and sufficient sample size. Also, the description of our control experiments has been improved to clarify our proper treatment controls. We therefore clarify below that our study presents compelling and sufficient evidence to support our conclusions. We have responded to all the comments, explaining how each concern has been addressed. All line and figure numbers mentioned here refer to the numbering of the reviewed manuscript version. All references are cited as DOIs.

      Reviewer #1 (Public review):

      There are many major issues with the study. The findings across experiments are inconsistent, and it is unclear how the authors performed their analyses or why specific time points and comparisons were chosen. The study requires major re-analysis and additional experiments to substantiate its conclusions.

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions.

      (1a) Behavior task - the pellet-reaching task is a well-established paradigm in the motor learning field. Why did the authors choose to quantify performance using "success pellets per minute" instead of the more conventional "success rate" (see PMID 19946267, 31901303, 34437845, 24805237)? It is also confusing that the authors describe sessions 1-5 as being performed on a spoon, while from session 6 onward, the pellets are presented on a plate. However, in lines 710-713, the authors define session 1 as "naive," session 2 as "learning," session 5 as "training," and "retraining" as a condition in which a more challenging pellet presentation was introduced. Does "naive session 1" refer to the first spoon session or to session 6 (when the food is presented on a plate)? The same ambiguity applies to "learning session 2," "training session 5," and so on. Furthermore, what criteria did the authors use to designate specific sessions as "learning" versus "training"? Are these definitions based on behavioral performance thresholds or some biological mechanisms? Clarifying these distinctions is essential for interpreting the behavioral results.

      We agree that success rate is a more conventional measure than the number of successful prehensions per minute. We have changed all behavior quantifications to success rate. Note that all behavioral conclusions drawn before are still valid under the new quantification (see Figures 1, 4, and 5). Importantly, the terms “learning,” “training,” and “retraining” were defined based on task structure and prior literature on motor learning stages rather than predetermined behavioral performance thresholds. These labels reflect progression through the task design (initial acquisition, continued practice under stable conditions, and adaptation to altered task demands), not biologically distinct or threshold-defined phases. We have revised the Methods section to make these definitions and transitions explicit to avoid ambiguity in interpreting the behavioral results.

      (1b) Judging from Figures 1F and 4B, even in WT mice, it is not convincing that the animals have actually learned the task. In all figures, the mice generally achieve 10-20 pellets per minute across sessions. The only sessions showing slightly higher performance are session 5 in Figure 1F ("train") and sessions 12 and 13 in Figure 4B ("CLZ"). In the classical pellet-reaching task, animals are typically trained for 10-12 sessions (approximately 60 trials per session, one session per day), and a clear performance improvement is observed over time. The authors should therefore present performance data for each individual session to determine whether there is any consistent improvement across days. As currently shown, performance appears largely unchanged across sessions, raising doubts about whether motor learning actually occurred.

      As described in the "Single pellet prehension task" section of the Methods, the elevated plate slot for pellet delivery in our setup box is at a challenging position, outside the slit and 2 cm to the right, forcing the mice to use the left paw. Therefore, mice need to be trained in gradually harder positions, using a spoon to deliver the pellet instead of placing it directly at the plate slot. Due to the gradually increasing difficulty of the task, the success rate curve remains flat, while the total number of attempts and the number of successful prehensions per minute increase (Figure 1 F-H). We therefore argue that motor learning indeed occurred, with a relatively constant success rate when performing a gradually harder task. Further, the success rate and number of successful prehensions of our mice are within levels previously reported for trained mice (10.3791/51238). We added the precise plate slot position to the methods section to clarify the need for a delivery method of gradually increasing difficulty.
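      To make the distinction between the two behavioral measures concrete, the following minimal sketch computes both for two sessions. The counts and session durations are hypothetical, chosen only to illustrate how success rate can stay flat while successful prehensions per minute increase; they are not data from the study.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Fraction of prehension attempts that were successful."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


def successes_per_minute(successes: int, minutes: float) -> float:
    """Number of successful prehensions per minute of session time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return successes / minutes


# Hypothetical sessions (illustrative numbers only, not from the study).
sessions = {
    "early": {"successes": 10, "attempts": 40, "minutes": 10.0},
    "late": {"successes": 20, "attempts": 80, "minutes": 10.0},
}

for name, s in sessions.items():
    rate = success_rate(s["successes"], s["attempts"])
    per_min = successes_per_minute(s["successes"], s["minutes"])
    print(f"{name}: success rate = {rate:.2f}, successes/min = {per_min:.1f}")
```

      In this made-up example the mouse doubles its attempts and its successes, so the per-minute measure doubles while the success rate is unchanged.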

      (1c) The authors also appear to neglect existing literature on the role of SST-INs in motor learning and local circuit plasticity (e.g., PMID 26098758, 36099920). Although the current study focuses on a specific subpopulation of SST-INs, the results reported here are entirely opposite to those of previous studies. The authors should, at a minimum, acknowledge these discrepancies and discuss potential reasons for the differing outcomes in the Discussion section.

      We thank the reviewer for pointing this out. It is by no means neglect, but a careful balance in discussing previous literature that can be fairly compared with our findings. It is becoming increasingly clear, with mounting evidence from modern transcriptomic and connectomic studies, that the canonical “three-cardinal” interneuron populations (SST⁺, PV⁺, VIP⁺) represent oversimplified groupings that mask considerable heterogeneity. For example, in a comprehensive single-cell RNA-sequencing (scRNA-seq) study covering ~1.3 million cells from mouse cortex and hippocampus, the authors identified dozens of discrete GABAergic subtypes beyond the classical marker-defined classes, revealing continuous and graded variation in molecular identity across cortical and hippocampal regions (10.1016/j.cell.2021.04.021). Moreover, a recent study focusing on SST-expressing interneurons demonstrated that even within the SST class there are multiple subtypes with distinct laminar distributions, axonal projection patterns, and circuit connectivity (for instance, two different Martinotti subtypes vs. a non-Martinotti SST subtype targeting different pyramidal neuron types and dendritic compartments) (10.1016/j.neuron.2023.05.032). Finally, developmental single-cell transcriptomics shows that interneuron diversity is already apparent at early postmitotic stages, indicating that these subtypes are pre-specified rather than being mere activity-dependent states (10.1038/s41467-018-07458-1). These findings argue strongly that the traditional SST⁺ / PV⁺ / VIP⁺ classification, while useful as a coarse heuristic, fails to capture the rich diversity in molecular, morphological, and functional phenotypes that likely underlie distinct roles in circuit computation and behavior.

      The consequence is that studies using any of these three markers must be interpreted cautiously, since in reality several quite different neuronal populations are studied at once, especially if no effort was made to tease out which of the participating populations (inside the “cardinal” population) contribute to the effects seen. Most likely, the reported results are based on a mixed population, in the worst-case scenario on populations with opposite effects. In any case, we have now included the role of SST-INs in motor learning and M1 circuitry in the discussion section. We also respectfully disagree that our findings are the opposite of previous SST-IN studies. We show that increasing Ma2 excitability improved execution of an already learned movement, while 10.1038/nn.4049 showed that both activating (which is different from increasing excitability) and inhibiting SST-INs impaired the learning of a stereotyped movement. Similarly, 10.1016/j.neuron.2022.08.018 showed that increasing SST-IN excitability impairs motor learning, not execution of a previously learned movement. While we found that increasing excitability of Ma2 cells did not affect motor learning, note that Ma2 cells are a subset of Martinotti cells with homogeneous electrophysiological and morphological properties (10.1371/journal.pbio.2001392), and Martinotti cells themselves are a subset of SST+ cells (10.1016/j.neuron.2023.05.032). The discussion has been updated to include this reasoning.

      (2a) Calcium imaging - The methodology for quantifying fluorescence changes is confusing and insufficiently described. The use of absolute dF values ("detrended by baseline subtraction," lines 565-567) for analyses that compare activity across cells and animals (e.g., Figure 1H) is highly unconventional and problematic. Calcium imaging is typically reported as dF/F0 or z-scores to account for large variations in baseline fluorescence (F0) due to differences in GCaMP expression, cell size, and imaging quality. Absolute dF values are uninterpretable without reference to baseline intensity - for example, a dF of 5 corresponds to a 100% change in a dim cell (F0 = 5) but only a 1% change in a bright cell (F0 = 500). This issue could confound all subsequent population-level analyses (e.g., mean or median activity) and across-group comparisons. Moreover, while some figures indicate that normalization was performed, the Methods section lacks any detailed description of how this normalization was implemented. The critical parameters used to define the baseline are also omitted. The authors should reprocess the imaging data using a standardized dF/F0 or z-score approach, explicitly define the baseline calculation procedure, and revise all related figures and statistical analyses accordingly.
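      The reviewer's arithmetic can be checked with a minimal sketch. The function names below are illustrative, and the z-score helper assumes a user-supplied baseline segment; neither is taken from the authors' pipeline.

```python
from statistics import mean, pstdev


def df_over_f0(delta_f: float, f0: float) -> float:
    """Relative fluorescence change: absolute change divided by baseline F0."""
    if f0 <= 0:
        raise ValueError("baseline F0 must be positive")
    return delta_f / f0


def z_score(trace, baseline):
    """Express each sample of `trace` in units of the baseline's standard deviation.

    `baseline` is an assumed, user-chosen segment of the recording.
    """
    mu = mean(baseline)
    sd = pstdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sd for x in trace]


# The same absolute change (dF = 5) in a dim and in a bright cell.
dim = df_over_f0(5.0, 5.0)       # 1.0, i.e. a 100% change
bright = df_over_f0(5.0, 500.0)  # 0.01, i.e. a 1% change
print(f"dim cell: {dim:.0%}, bright cell: {bright:.0%}")
```

      The same absolute dF therefore maps to very different relative changes, which is the reason the reviewer asks for a normalized measure.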

      The calcium imaging used here is 1-photon microendoscopic video data. To our knowledge, it is not possible to extract the true cell baseline over time from 1-photon data, since the background component includes signals from multiple sources, and usually has fluctuations larger than the neural signal itself. We agree that absolute dF values cannot be compared across cells, and that is not what we report here. The CNMF-E algorithm outputs the temporal activity of each neuron with the background component already removed (10.7554/eLife.28728) and therefore the baseline subtraction used in our study is already standardized (10.7554/eLife.38173). Note that although it is common in the literature to record 1-photon data and perform similar preprocessing (some form of baseline subtraction and/or normalization by noise std), referring to the resulting trace as dF/F, that is not entirely correct, since true F0 extraction is not possible. We thus chose to refer to the resulting preprocessed traces as what they actually are - dF detrended (raw trace with estimated background components removed). However, we agree that a better description of the process would be helpful in our manuscript, and that the nomenclature might be confusing to readers. We therefore expanded the methods section to better explain that we will now refer to F0 as the background component (and refer to our resulting traces as dF/F) and explain how it was determined. We also updated the example traces in Figure 1E to now show the raw traces, the estimated background components and the detrended traces.

      (2b) Figure 1G - It is unclear why neural activity during successful trials is already lower one second before movement onset. Full traces with longer duration before and after movement onset should also be shown. Additionally, only data from "session 2 (learning)" and a single neuron are presented. The authors should present data across all sessions and multiple neurons to determine whether this observation is consistent and whether it depends on the stage of learning.

      We agree that it would be beneficial to show longer traces as an example of prehension-related activity, so we expanded Figure 1I to show a longer trace for a single neuron. We added to Supplemental Figure 2 plots showing longer traces from all sessions including all neurons for both genotypes.

      (2c) Figure 1H - The authors report that chemogenetic activation of Chrna2 cells induces differential changes in PyrN activity between successful and failed trials. However, one would expect that activating all Chrna2 cells would strongly suppress PyrN activity rather than amplifying the activity differences between trials. The authors should clarify the mechanism by which Chrna2 cell activation could exaggerate the divergence in PyrN responses between successful and failed trials. Perhaps, performing calcium imaging of Chrna2 cells themselves during successful versus failed trials would provide insight into their endogenous activity patterns and help interpret how their activation influences PyrN activity during successful and failed trials.

      The reviewer is correct to assume that increasing excitability of Ma2 cells would suppress PC activity. As shown in Supplemental Figure 2I, that is exactly what we observe when considering only non-prehension related activity. Thus, it is very interesting that the opposite effect is seen for prehension-related activity. This finding also aligns with our results from the assembly analysis, which show that assembly activity is decreased within the prehension window compared to outside the prehension window. Unfortunately, imaging Ma2 cells would only add information about their influence on PCs if we imaged both populations simultaneously, which would require equipment and reagents we do not currently have. However, the endogenous activity patterns of Ma2 cells and the direct connectivity between Ma2 and pyramidal cells were previously investigated in detail (10.1371/journal.pbio.2001392). We therefore expanded the discussion to better explain that the differential changes in PC activity when increasing Ma2 excitability could be due to increased PC synchronization, since a single Ma2 cell connects to several PCs and, upon release from inhibition, all connected PCs fire synchronously.

      (2d) Figure 1H - Also, in general, the Cre+ (red) data points appear consistently higher in activity than the Cre- (black) points. This is counterintuitive, as activating Chrna2 cells should enhance inhibition and thereby reduce PyrN activity. The authors should clarify how Cre+ animals exhibit higher overall PyrN activity under a manipulation expected to suppress it. This discrepancy raises concerns about the interpretation of the chemogenetic activation effects and the underlying circuit logic.

      As explained above, increasing Ma2 excitability indeed decreased non-prehension related PC activity, and the proposed mechanism has been added to the discussion section. We also made clearer in the results section that we are referring to prehension-related PC activity, and emphasized that overall non-prehension related PC activity is decreased.

      (3) The statistical comparisons throughout the manuscript are confusing. In many cases, the authors appear to perform multiple comparisons only among the N, L, T, and R conditions within the WT group. However, the central goal of this study should be to assess differences between the WT and hM3D groups. In fact, it is unclear why the authors only provide p-values for some comparisons but not for the majority of the groups.

      We agree that a clearer description of the statistical analysis is warranted. We expanded the statistical analysis methods section to clarify, among other things, that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that only significant p-values are reported in the figures; the absence of a p-value for a comparison therefore means that it is not significant.

      (4a) Figure 4 - It is hard to understand why the authors introduce LFP experiments here, and the results are difficult to interpret in isolation. The authors should consider combining LFP recordings with calcium imaging (as in Figure 1) or, alternatively, repeating calcium imaging throughout the entire re-training period. This would provide a clearer link between circuit activity and behavior and strengthen the conclusions regarding Chrna2 cell function during re-training.

      Unfortunately, it is not possible in our setup to record calcium imaging and LFP simultaneously, since the implants needed for the miniscope occupy the entire space above the animal’s cranium. Recording calcium imaging during the execution of learned movements is also impractical. If the animals were implanted before the training phase, the signal would likely be too degraded for recordings after the training sessions, since the miniscope signal quality decreases over time and over successive miniscope attachments. If the animals were implanted between the training and retraining phases (as in the LFP group), the gap between training and retraining would be even larger, at least 28 days (as opposed to 16 days for the LFP group), which would affect the performance in the task. Therefore, LFP recordings provide an understanding of the higher-level changes in neural activity when excitation is increased in Ma2 cells during the execution of learned movements. We respectfully disagree that the results from the LFP group cannot be interpreted in isolation, since we found that mice with increased excitability of Ma2 cells display increased low theta and gamma power during the prehension movement. As discussed in the manuscript, the increased high gamma band power when Ma2 cells are overexcitable, particularly for the successful trials in the planning phase, suggests that Ma2 cells may have a role in influencing theta and gamma oscillations during motor performance (lines 1348-1355).

      (4b) It is unclear why CLZ has no apparent effect in session 11, yet induces a large performance increase in sessions 12 and 13. Even then, the performance in sessions 12 and 13 (30 successful pellets) is roughly comparable to Session 5 in Figure 1F. Given this, it is questionable whether the authors can conclude that Chrna2 cell activation truly facilitates previously acquired motor skills?

      We understand that a source of confusion for the behavioral data in the LFP group was the absence of data from sessions 1-7, together with the missing explanation about the task changing from spoon to plate (as explained in the answers to questions 1a and 1b). Since the animals are getting pellets from the spoon in session 5 (easier) and from the plate in later sessions (harder), the fact that animals achieved the same performance on the plate as they had in the last spoon session indicates that they relearned the movement. To further clarify the training development, we added the full set of sessions (1-13) to Supplemental Figure 7, indicating the spoon-to-plate switch after session 5 and the 16-day gap between sessions 7 and 8 (due to viral injection and electrode implantation surgeries).

      (5) Figure 5 - The authors report decreased performance in the pasta-handling task (presumably representing a newly learned skill) but observe no difference in the pellet-reaching task (presumably an already acquired skill). This appears to contradict the authors’ main claim that Chrna2 cell activation facilitates previously acquired motor skills.

      We respectfully disagree that the results for the pasta-handling task conflict with the finding that increasing Ma2 excitability facilitates previously acquired movements. The pasta-handling task specifically measures forepaw dexterity (as outlined in lines 442-444), therefore assessing forelimb function unrelated to learning. Mice perform a set of stereotyped movements to manipulate the pasta, so no learning is required (note that animals were habituated to the arena, followed by a single test session, with no training sessions). We specifically mention in the results section that "we used the pasta handling task to assess forepaw dexterity that does not require learning" (lines 1137-1139). Our findings support our reported conclusion that "Ma2 cells may have a role in orchestrating precise forelimb movements that do not require previous specific training" (lines 1154-1156).

      (6) Supplementary Figure 1 - The c-Fos staining appears unusually clean. Previous studies have shown that even in home-cage mice, there are substantial numbers of c-Fos+ cells in M1 under basal conditions (PMID 31901303, 31901303). Additionally, the authors should present Chrna2 cell labeling and c-Fos staining in separate channels. As currently shown, it is difficult to determine whether the c-Fos+ cells are truly Chrna2+ cells.

      Our c-Fos staining works well, as we have refined this method over several of our projects. Unfortunately, we could not check the references mentioned in the comment, since the PMID points to a study that does not mention c-Fos (possibly an incorrect PMID?). However, we found our images to have c-Fos levels in controls similar to those of other studies (for example 10.3389/fnana.2014.00013 Figure 1A and 10.1109/TBME.2024.3401136 Supplemental Figure 2C). Thus, we do find background activity of c-Fos in both Cre+ and control mice, but the c-Fos stain appears clean because of the strong up-regulation and fluorescent signal in exogenously activated hM3Dq+ cells. We also noticed that the manuscript was missing a methods section for the c-Fos experiments, and therefore added a section detailing the hM3Dq activation validation (lines 487-498). Further, the figure now displays separate channels for hM3Dq+ cells (magenta) and c-Fos (cyan) for better clarity.

      (7) Overall, the authors selectively report statistical comparisons only for findings that support their claims, while most other potentially informative comparisons are omitted. Complete and transparent reporting is necessary for proper interpretation of the data.

      As explained above (comment 3), we expanded the statistical description in the methods to explain that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that omitted comparisons are non-significant.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure legends - The authors should provide more detailed information in the figure legends, such as N values. It is also not explained what the bold bars, as well as the highest and lowest bars, represent. Clear labeling is essential for proper interpretation of the data.

      We revised all figure legends to add n-numbers for all quantification plots, and expanded the Statistical analysis methods section to explain the labeling of all quantifications.

      (2) Presentation of plots - The authors need to improve the clarity and completeness of their figure presentations. For example:

      (a) In Figure 1F, it is unclear whether the results were obtained under chemogenetic activation, as this information is missing from both the figure and the legend. Currently, it could be a comparison of Cre+ mice with Cre- mice without any manipulations.

      (b) In Figure 1H, p-values are reported, but it is not specified which groups are being compared. As mentioned above, why are p-values only given to some comparisons? Does that mean the others are not significant?

      (c) In Figure 1D, a scale bar should be provided.

      (d) In Figure 1E, the y-axis (fluorescence) scale should be clearly indicated.

      We thank the reviewer for the attention to the figure details. We added the missing scale bars for Figures 1D-E. We also clarified in the results section that all miniscope recordings were performed under clozapine treatment. As answered above (comments 3 and 7), we expanded the methods section to state that although all comparisons were made and appropriately corrected for multiple comparisons, only significant comparisons were reported. As for the groups being compared, every significance bar clearly connects two groups, which are the ones being compared. We also expanded the Statistical Analysis section to state that “Significance bars without ticks represent pairwise comparisons, while significance bars with downward ticks represent an effect.”

      Reviewer #2 (Public review):

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions. Core findings of this paper, such as the lack of effect of Ma2 cell activation on motor learning, as well as the altered neuronal activity, rely on a sample size of n=3 mice per condition, which is likely underpowered to detect differences in behavior and contributes to the somewhat disconnected results on calcium activity, activity timing, and neuronal assembly activity.

      We understand that the source of confusion is the difference between the number of mice used for calcium imaging and the number of mice used for assessing the effect of increased Ma2 excitability on motor learning. The core finding that increased Ma2 excitability did not alter motor learning is supported by the data shown previously in Supplemental Figure 5 (now Figure 1F-H), with n=6 Cre+ and n=7 controls, which has enough statistical power to detect the effect of training session (F(3,33) = 9.254, power = 0.997) and should have enough power to detect the effect of group (estimated power of 0.835 for F(1,11)). The behavioral performance of the miniscope-recorded mice was shown in the previous version for transparency; however, no conclusion was drawn based on that data. To improve clarity, we now present data from the previous Supplemental Figure 5 as Figures 1F–H. This dataset clearly demonstrates that increased excitability of Ma2 cells did not affect motor learning. In addition, note that all quantification and conclusions drawn about neuronal activity are based on robust sample sizes: 1070 cells for controls and 403 for Chrna2-Cre+, or 70 assemblies for controls and 48 for Chrna2-Cre+. These sample sizes ensure sufficient statistical power, as demonstrated by the multiple significant effects and pairwise differences reported in our study. We reiterate that no underpowered tests were conducted in this study, and no conclusions about behavioral outcomes were drawn from n = 3 control and n = 3 Chrna2-Cre+ mice.

      More comprehensive analyses and data presentation are also needed to substantiate the results. For example, examining calcium activity and behavioral performance on a trial-by-trial basis could clarify whether closely spaced reaching attempts influence baseline signals and skew interpretation.

      We agree, and we performed a trial-by-trial analysis to verify the effect of adjacent prehensions on the trial signal. We found that only 17.7% of adjacent trials were affected by a previous trial. In addition, we selected only trials not preceded by another trial for at least 6 s, and evaluated whether activity immediately before the trial (-3 to -1 s) differs from activity long before the trial (-5 to -3 s). The rationale is that if a trial affected the baseline, then activity immediately before the trial would differ from activity long before it. In this analysis, we found no genotype- or session-related differences in baseline amplitude between epochs. Together, these results confirm that prehension-related activity does not systematically alter non-prehension epochs. The results are shown in Supplemental Figure 3.
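      The selection and epoch comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, trial onsets and sample times are assumed to be in seconds, the first trial of a recording is treated as isolated, and each epoch is summarized by its mean amplitude.

```python
from statistics import mean


def isolated_trials(onsets, min_gap=6.0):
    """Keep trials whose preceding trial started at least `min_gap` seconds earlier.

    Assumption: the first trial has no predecessor and is kept.
    """
    kept, previous = [], None
    for onset in sorted(onsets):
        if previous is None or onset - previous >= min_gap:
            kept.append(onset)
        previous = onset
    return kept


def epoch_mean(times, values, start, stop):
    """Mean of the samples whose time falls in the half-open window [start, stop)."""
    selected = [v for t, v in zip(times, values) if start <= t < stop]
    return mean(selected)  # raises StatisticsError if the window is empty


def baseline_difference(times, values, onset):
    """Activity immediately before the trial minus activity long before it."""
    near = epoch_mean(times, values, onset - 3.0, onset - 1.0)  # -3 to -1 s
    far = epoch_mean(times, values, onset - 5.0, onset - 3.0)   # -5 to -3 s
    return near - far


# Hypothetical onsets (seconds): the trials at 2 s and 13 s follow too closely.
print(isolated_trials([0.0, 2.0, 10.0, 13.0, 20.0]))
```

      A per-trial difference near zero for the isolated trials would be consistent with the baseline not being shifted by the preceding movement; group-level statistics on these differences are outside the scope of this sketch.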

      The study uses cre-negative mice as controls for hM3Dq-mediated activation, which does not account for potential effects of Cre-dependent viral expression that occur only in Cre-positive mice. This important control would be necessary to substantiate the conclusion that it is increased Ma2 cell activity that drives the observed changes in behavior and cortical activity.

      Having a control group of Cre+ mice injected with a Cre-dependent control vector carrying, for example, only a fluorescent reporter would add one more layer of certainty that the effects observed here are due to CLZ-induced hM3Dq activation. We do not agree, however, that this control is necessary to confirm our findings. Cre-dependent expression alone has already been extensively shown to have no effect in studies comparing a DREADD activator to a vehicle treatment (for example 10.7554/eLife.38052, 10.1523/JNEUROSCI.0537-18.2018, 10.7554/eLife.67822). We also showed this for our LFP group (Figure 4), further confirming no effect of Cre-dependent hM3Dq expression alone.

      An unspecific effect of clozapine, where the treatment affects animals without the hM3Dq receptor, would be much more likely. We do control for this by giving the same treatment to Cre+ and Cre- mice. Moreover, since we use a low dose of clozapine, a lack of hM3Dq activation would be more likely, which we also controlled for with the c-Fos experiment as explained in the answer to the Minor point 1. Nevertheless, we added to the discussion that although we find it highly unlikely that the effects found here are due to Cre-dependent viral expression, we have not recorded Cre+ animals expressing control vectors instead of hM3Dq (lines 1360-1375).

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) One of the main findings in this paper is that Chrna2-Cre cell activation did not affect learning of the prehension task; however, the presented data do not convincingly support this claim. Looking at Fig.1F, Cre+ mice appear to have an overall lower number of successful prehensions compared to control mice. If this is not statistically significant, it is likely because n=3 mice for each group is underpowered. To better judge the behavior of these mice, it would be necessary to plot success rate and overall number of prehensions over the entire course of training, in addition to successes per minute. Given that n=3, plotting all individual data points would make more sense than showing a violin plot. Relatedly, in Supplemental Figure 5, there appears to be a clear effect on reduced success rates in Cre+ mice, which is stated in the figure legends, whereas the result section states: we found no effect of genotype on prehension success rates (lines 895-896). The authors should ensure that these behavior experiments are sufficiently powered to detect potential differences in learning between groups and present the complete data and statistical analysis.

      As explained in our response to Comment 1, the finding that increased Ma2 excitability did not alter motor learning is not based on the data in the previous Figure 1F (n=3 Cre+ and n=3 controls, shown for transparency). Instead, it is supported by the data in the previous Supplemental Figure 5, now Figures 1F-H, with n=6 Cre+ and n=7 controls, for which we found only overall effects of training session, but no effect of genotype, with no significant post-hoc pairwise comparisons. We agree that plotting the success rate, total number of prehensions and successful prehensions per minute, for all 6 sessions, allows better evaluation of the mice's behavior. We moved Supplemental Figure 5 into Figure 1, plotting the three measures for the full set of sessions, with individual data points within the violin plots, and expanded the description of the statistical results in the main text. We reiterate that no underpowered tests were conducted in this study, and no conclusions were drawn from the n = 3 controls and 3 Chrna2-Cre+ mice.

      (2) The authors mention that a significant fraction of prehension trials overlapped with a preceding prehension attempt. Were those attempts excluded from the analysis? The stark differences in calcium signals at baseline before prehension onset in some sessions (Figure 1G, Supplementary Figure 2D) suggest that trials preceding closely in time might play a role and could skew the analysis and interpretation.

      Overlapping trials were not excluded from the previous analysis. As summarized in our response to Comment 2, and expanded in the results section (lines 876-894), we found that only 17.7% of adjacent trials were affected by a previous trial, and that when selecting only trials not preceded by another trial for at least 6s, we found no effect of prehension-related activity in the baseline preceding the trials.

      (3) Relatedly, to test the differences in calcium activity before and after prehension onset, it would be clearer to use a delta F/F measure where the 1 second before onset is used as baseline.

      Since a large proportion of neurons are more active before the onset (in the movement planning phase, Figure 2C), the activity 1s before the movement onset cannot be considered as F0. Dividing the activity during the movement by the activity during the planning phase would generate a different measure, a form of execution/planning ratio. We performed this analysis as an additional measure and found a three-way interaction effect of genotype, session, and prehension accuracy, driven by genotype effects in early sessions, indicating that Ma2 activity might be involved in the planning/execution activity balance. Those results are now described in the results section and shown in Supplemental Figure 4.

      (4) For the experiments in which mice were trained prior to Ma2 cell activation (Fig.4), the behavior in sessions 8-10 does not seem to have reached a plateau yet, and the increase in successful prehensions in sessions 11-13 of Cre+ mice could just be a continuation of training. It would be more convincing to show the original training curve of those mice in sessions 1-7. Additionally, the authors should perform a two-way ANOVA test for the interaction of drug and genotype, rather than two separate one-way ANOVAs.

      We agree, and we now show the curve for sessions 1-7 in Supplemental Figure 7, showing that the success ratio for sessions 8-10 is similar to session 7. Also, a 2-way ANOVA was already performed, although the full report was missing from the manuscript. We switched from successful prehensions per minute to success ratio (see Reviewer #1 comment 1a) and now include the full report, in which we found an overall effect of session, and when grouping by genotype, we found an effect for Cre+ but not control mice (lines 1065-1072).

      Minor points

      (1) The validation experiment for the efficacy of hM3Dq is somewhat confusing. It is surprising that the few hM3Dq-mCherry expressing cells in the cre-negative mice did not show increased c-Fos staining since non-specific leaky hM3Dq expression would presumably still lead to a functional DREADD. The better control for validating the efficacy of hM3Dq-mediated Chrna2-Cre cell activation would be to show c-Fos staining in Cre+ mice with or without clozapine injection. This would control for non-specific c-Fos expression and neuronal activation purely by expression of the DREADD. In cre-negative control mice, the comparison should also be between mice with and without clozapine injection to control for non-specific neuronal activation regardless of hM3Dq expression.

      We thank the reviewer for raising this point and agree that validation of hM3Dq efficacy and specificity requires careful interpretation. In principle, any hM3Dq-expressing cell, including the few hM3Dq-mCherry+ cells observed in Cre– mice, could respond to clozapine. However, in practice, effective DREADD activation depends on sufficient receptor expression levels and on the pharmacodynamics of clozapine in the brain (Gomez et al., 2017, Science, 10.1126/science.aan2475). In our dataset, even in Chrna2-Cre+ mice, only ~76% of hM3Dq+ cells showed c-Fos induction after clozapine, indicating that receptor expression and/or ligand access is not uniform across cells. Consistent with this, the very sparse and weak hM3Dq expression observed in Cre- mice resulted in only 0.8% of hM3Dq+ cells showing c-Fos induction, which is in line with previous reports demonstrating that low-level “leaky” expression is insufficient to drive neuronal activation (e.g. 10.1038/s41467-019-12236-z; 10.1523/JNEUROSCI.0537-18.2018; 10.1523/ENEURO.0363-21.2021).

      The reviewer also suggests that an ideal validation would compare Cre+ mice with and without clozapine to control for any c-Fos induction driven purely by DREADD expression. We agree that such a comparison is informative, and note that in our experiments the c-Fos assay was designed specifically to test whether the low clozapine dose used (0.01 mg/kg) is sufficient to activate hM3Dq in Ma2 cells, rather than to assay baseline effects of viral expression.

      Importantly, non-specific effects of clozapine itself were controlled for throughout the study by administering the same clozapine dose to both Chrna2-Cre+ and Cre– mice in all behavioral and physiological experiments. Thus, any clozapine-driven neuronal activation independent of hM3Dq would be expected to appear in both groups.

      Together, these results indicate that (i) the clozapine dose used is sufficient to robustly activate hM3Dq-expressing Ma2 cells, (ii) sparse leaky expression in Cre– mice is not sufficient to drive measurable activation, and (iii) the effects reported in the manuscript are unlikely to be explained by non-specific clozapine actions or by viral expression alone.

      (2) The authors state in the methods section that "only neurons that displayed a significant change comparing the before onset and after onset phases" were included in the analysis. This appears to bias the data towards neurons that change their activity with the prehension movement. If this is the intention, the authors should clearly state this and their rationale in the results section and show what proportion of recorded neurons fall into this category.

      Yes, thank you for pointing this out; the explanation for this exclusion criterion was missing. We expanded the methods section “Neural activity around prehensions” to explain that since we are evaluating the role of Ma2 cells in the prehension-related activity of pyramidal cells, we excluded neurons with no prehension-related activity. We also stated in the expanded text that 15.97% of recorded neurons were excluded due to no prehension-related activity.

      (3) I don’t understand the peak PC activity latency shown in Figure 2D. How is it possible that there are negative peak latencies during the prehension phase, which is defined as >0sec, (upper right panel), and positive peak latencies in the before prehension phase, which is defined as <0sec, (lower right panel)?

      As stated in lines 939-941 and in the Figure 2C legend, neurons were sorted into "before prehension" or "during prehension" neurons according to their activity during successful prehensions. One of our main findings is that the temporal patterns of pyramidal cells were strongly affected by prehension accuracy (lines 941-944), meaning that a significant number of neurons shifted prehension phases when the mouse performed a failed prehension (as illustrated in Figure 2C; note how the temporal pattern is not preserved from successful to failed prehensions). That is why, for failed prehensions, there are negative latencies for neurons that were classified as "during prehension" and positive latencies for neurons classified as "before prehension" in successful trials. We expanded the sorting explanation in the results section (lines 944-950) to better highlight the latency change between different prehension accuracies.

      (4) Please specify how baseline subtraction (detrending) was performed for the calcium image analysis.

      We expanded the methods section “Neural signal extraction” to better explain that we will now refer to F0 as the background component (and refer to our resulting traces as dF/F) and explain how it was determined (lines 614-619).

      (5) The authors state that they found a "dissociation between changes in neural activity and performance outcomes". Since they only analyzed motor performance by quantifying successful prehensions, this statement should be caveated with the notion that other aspects of the behavior (e.g., trajectories/speed) could be affected but were not measured.

      We agree, and we expanded the discussion section to acknowledge that we focused the behavioral analysis on success ratio, and that other measures not investigated could also be affected (lines ????-????).

      (6) Are the differences in theta and gamma power specific to the prehension trials, or does Ma2 cell activation generally increase LFP activity in those bands?

      We thank the reviewer for the question, as we had not analyzed general LFP activity in the previous version. We performed the same analysis now including only LFP from epochs outside prehension windows across the full sessions. We found that Mα2 cell activation actually reduces LFP power across all bands specifically in Session 13 when no prehension is being performed. These findings are now included as Supplemental Figure 7.

      (7) Please define terms that might not be familiar to a typical reader in the field, such as "assemblies", when first introducing them in the text.

      We revised the introduction where we now define assemblies (lines 85-88).

      (8) Please specify the n-numbers for each figure throughout the manuscript. For example, in some figures, the number of trials or the number of neurons is used; however, it is not clear what this number is.

      We agree that although the n-numbers are stated in the text, it would be clearer to add them also to the figure legends. All figure legends now contain n-numbers for panels showing quantifications.

      (9) Relatedly, while the inclusion of supplemental tables with expanded statistical results is commendable, several statistical test details are missing, such as for Figure 5.

      We have fully revised the text to add any missing statistical details for the statements in the Supplemental Tables.

    1. Geoghegan, Bernard Dionysius. Code: From Information Theory to French Theory. Sign, Storage, Transmission. Duke University Press, 2023

      Confused by this going on this list and not the prior one

    1. Briefing Note: Priorities for Child Protection and Juvenile Justice

      Executive Summary

      This document summarizes the strategic orientations and reforms undertaken by the Ministry of Justice to strengthen child protection and modernize juvenile justice.

      Key points include:

      Urgency and Speed: Shorter time to judgment (down from 18 months to 8.7 months in four years) and creation of a provisional protection order (ordonnance de protection provisoire) allowing the prosecutor to rule within 72 hours.

      Overhaul of Placement: Closure of public Closed Educational Centers (Centres Éducatifs Fermés, CEF) in favor of the Unités de Placement de la Jeunesse et de l'Éducation (UJPE), with an emphasis on educational continuity (52 weeks per year).

      Major Staffing Increase: Creation of 1,600 positions at the Ministry of Justice, including 50 new children's judge offices (cabinets de juges des enfants) in two years and 70 positions at the Protection Judiciaire de la Jeunesse (PJJ).

      Legislative Changes: Support for abolishing the statute of limitations for sexual crimes against minors, for the mandatory presence of a lawyer for the child, and an intention to reform the "excuse de minorité" (sentence mitigation for minors) for the most serious crimes.

      Protection against Modern Scourges: Fight against the prostitution of minors (6 in 10 prostitutes are minors), a ban on mobile phones in placement centers, and regulation of nitrous oxide.

      --------------------------------------------------------------------------------

      1. Strengthening the Protection of Child Victims

      Judicial Urgency and Protective Measures

      The emphasis is on the need for a justice system that adapts to the child's pace.

      Provisional protection order: A new mechanism allows the prosecutor to act within 72 hours to protect a minor immediately, with no-contact orders and provisional allocation of the home to the protective parent.

      The case must then be referred to the judge within 8 days, and the judge has 15 days to rule.

      Law of 18 March 2024: Provides for automatic withdrawal of parental authority from parents convicted of a crime or sexual violence against their child, and extends the suspension of the exercise of that authority to the point of formal charging (mise en examen).

      Support and Rights of Minors

      Lawyer for the child: Support for the mandatory presence of a lawyer in educational assistance proceedings (assistance éducative).

      A pilot with the bar associations is envisaged before the measure is generalized by law.

      Pediatric Reception Units (Unités d'Accueil Pédiatrique, UAPED): Rollout under way across the whole country to improve how victims' statements are taken and how victims are cared for.

      Judicial assistance dogs: Increased from 10 to about thirty dogs at present, with a target of 100 dogs (one per département) within one to two years, to calm children during proceedings.

      --------------------------------------------------------------------------------

      2. Reform of Juvenile Criminal Justice

      Balance between Sanction and Education

      The ministerial doctrine rejects any opposition between these two concepts.

      Sanction as an educational act: "Sanction is part of education. Sanction on its own is not an end in itself [...] and an education without any prohibitions leads to anything goes."

      Effectiveness of the Code de la Justice Pénale des Mineurs (CJPM): The time between the offense and the sanction has been halved in four years (8.7 months in 2024 versus 18 months in 2020).

      Transformation of Placement Facilities

      The assessment of the Closed Educational Centers (CEF) is described as severe: high cost (30 to 50% more), a runaway rate identical to that of standard centers, and educational neglect (only 5 to 10 hours of classes per week).

      Creation of the UJPE: These new units merge the former group homes (foyers) and the CEF to guarantee a pathway of educational rebuilding.

      Recruitment of technical teachers: Reopening of a competitive examination for 40 teachers reporting directly to the Ministry of Justice, in order to provide 26 hours of classes per week, 52 weeks a year, including during school holidays.

      Health and Addictions: Recruitment of 60 nurses to compensate for gaps in psychiatric care and addiction treatment in placement centers.

      --------------------------------------------------------------------------------

      3. Resources and Organization of the Justice System

      Increase in Staffing

      The Justice budget allows an unprecedented increase in human resources:

      Judiciary: Creation of 50 additional children's judge offices in two years (notably in Bobigny, Cambrai, Alès).

      At present, some offices handle between 400 and 500 cases.

      PJJ: 70 positions recreated, making it possible to reinforce staffing where it had been falling for 20 years (e.g., Marseille, Île-de-France).

      Milieu ouvert (community-based supervision): Reassignment of 150 educators to milieu ouvert to bring the workload down to about 23 cases per officer (versus 25 previously).

      Unity of Command

      The current system is considered too fragmented (several ministries involved, responsibilities shared with the départements for the Aide Sociale à l'Enfance, ASE).

      A desire for better coordination, or even a single line of responsibility, is expressed.

      --------------------------------------------------------------------------------

      4. Societal Issues and New Threats

      Sexual Violence and Abolition of the Statute of Limitations

      End of the limitation period: Favorable opinion on abolishing the statute of limitations for sexual crimes against minors, as well as for crimes of blood (premeditated murders).

      Prostitution of minors: An alarming finding shows that 60% of prostitutes in France are minors.

      Dedicated units within the PJJ have been operational for three months to fight this scourge and the pimping networks.

      Digital Safety and Addictions

      Phone ban: The new circular on educational and criminal policy bans mobile phones in the bedrooms of placement centers to protect minors from digital predation (traffickers, pimps).

      Nitrous oxide: Support for criminalizing its transport and online purchase (outside a medical setting), as poisonings tripled between 2020 and 2023.

      Debates on Criminal Responsibility

      Excuse de minorité: Position in favor of ending the automatic mitigation of sentences for the most serious crimes (premeditated murders, torture) committed by minors aged 13 to 15.

      This would require a constitutional change while preserving the specialized handling of juvenile cases.

      --------------------------------------------------------------------------------

      5. Key Data and Statistics

      | Indicator | Source figure |
      | --- | --- |
      | Average time to judgment (2020) | 18 months |
      | Average time to judgment (2024) | 8.7 months |
      | Cases per children's judge office | 400 to 500 (average) |
      | Proportion of minors among prostitutes | 60% |
      | Number of minors under the ASE | 400,000 (of whom 200,000 are in placement) |
      | Hours of classes in CEF | < 10 h/week (versus 26 h in a standard setting) |
      | Placements with trusted third parties | < 9% (19,000 young people) |

      --------------------------------------------------------------------------------

      Notable Quotes

      "A child does not live at the pace of an administrative file or a court file. [...] 4 months for a minor is a lifetime."

      "We should, to a large extent, be ashamed of the way we treat some of these children, particularly in child welfare services (aide sociale à l'enfance)."

      "Placement must protect, not make the child even more vulnerable."

      "Sanction is part of education. [...] An education without ever any prohibitions leads to anything goes."

    1. Reviewer #2 (Public review):

      Summary and strengths:

      The authors present a description of their online tool to estimate real-world performance of predictive models. The authors bring together different calculations to make better-informed implementation choices. It is a very nice tool to go from effect sizes to base rates to decision curve analysis. The paper describes the background and use of the tool with examples and seems like an extended version of their online how-to. The methods themselves are not new, but I think the tool will be valuable for researchers from different fields. Tools already exist for the conversion of effect sizes (my current favorite is https://www.escal.site/), but I haven't seen measurement noise being incorporated previously. The main benefit is the evaluation of performance under different real-world scenarios. Code is available on GitHub, and the manuscript is well-written.

      Weaknesses:

      While comprehensive explanation and examples are important for correct use of the tool, I don't really see the added value above their online how-to guide, as the software itself has already been published (Karvelis, P. and Diaconescu, A. O. (2025b). E2p simulator: An interactive tool for estimating real world predictive utility of research findings. Journal of Open Source Software, 10(114):8334.)

    1. One thing that stood out to me is how lists and dictionaries are used to model social media relationships, like users and who they follow. It’s interesting that something as complex as online networks is built from simple structures in code. This connects to earlier course ideas about how platforms shape interactions. It also makes me wonder how much these technical choices influence what users see online, since the way data is organized could affect visibility, recommendations, and even social behavior.
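      As a small illustration of the point in this annotation, the sketch below models follow relationships with a dictionary of lists and derives a simple "visibility" ranking from it. The user names, function names, and the ranking rule are hypothetical, chosen only for the example; real platforms use far more elaborate data and ranking signals.

```python
# Hypothetical data: each key is a user, each value is the list of accounts they follow.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": [],
}

def followers_of(user, follows):
    """Return a sorted list of the accounts that follow `user`."""
    return sorted(name for name, following in follows.items() if user in following)

def follower_counts(follows):
    """Count followers for every account that appears in the dictionary."""
    counts = {name: 0 for name in follows}
    for following in follows.values():
        for target in following:
            counts[target] = counts.get(target, 0) + 1
    return counts

def rank_by_followers(follows):
    """Order accounts from most to fewest followers (ties broken alphabetically)."""
    counts = follower_counts(follows)
    return sorted(counts, key=lambda name: (-counts[name], name))
```

      Even in this toy version, the choice of structure matters: the dictionary makes "who does X follow" a direct lookup, while "who follows X" requires scanning every entry, and the ranking rule alone decides which account would be shown first.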

    1. Several studies we cite used an algorithm alone or an algorithm in combination with human assessment of gender to assign a code of male or female based on the sex typically associated with their first name, the pronouns their first name is typically associated, and their gender presentation in pictures on institutional websites (e.g., Edwards et al., 2018; Murray et al., 2018, 2019; Walker et al., 2015; West et al., 2013)

      I am fascinated that algorithms are organizing based on names. Names are so symbolic as well as completely subjective.

    1. In contrast, the markedness model emphasizes that speakers use CS as a tool to present a certain persona; they exploit participants’ sense of the indexicality of each code (see indexicals) and of the contrast between the social import of codes in a given context (Myers-Scotton 1993 inter alia). Some analysts, such as B. Rampton, C. Stroud, and J. Gafaranga, emphasize CS as exemplifying the speaker’s creative agency.

      When code switching happens, the speaker typically has this creative agency

    1. his cross-linguistic observation affords the plausible assumption that the syntax of code-switched constructions strives toward well-formedness, i.e., when the guest (embedded) constituents are mixed into the host (matrix) language, the syntax operates to optimize well-formedness. In other words, when the guest items are introduced to the host language, certain adjustments follow, naturally, since items (words, phrases) from one language with one set of well-formedness conditions move to a language with another set of well-formedness conditions.
    2. This entry focuses on research that deals with the structural design of code-switching, the knowledge and ability underlying bilinguals' use of two languages within a sentence. This ability known variously as ‘code-mixing’ (see Code-mixing), or ‘intra-sentential code-switching’

      Code switching is also called code mixing and intra-sentential code switching

    1. A denial of the existence of such a code can be viewed as a denial of the culture that sustains it

      If gone about incorrectly, someone can be accused of hate because they try to teach Standard English

    2. Some of these students are able to switch codes, using Standard English when necessary. Others are unable to do this effectively and, as a result, suffer the negative academic and social consequences of using nonstandard English in settings where Standard English is required

      Being able to code-switch is necessary in the classroom

  3. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. I took a look at the “Programming paradigm” source from this bibliography section. This source helped me understand that programming isn’t just one way of writing code, but actually includes multiple “paradigms” that shape how problems are approached. What stood out to me is how coding languages like Python or Java can support more than one paradigm, which means there isn’t just one correct way to structure a program. This makes me think about how flexibility in programming reflects real world problem solving, where there is rarely just one way to solve an issue. The rest of the source basically explains different types of programming paradigms and how they classify and organize different programming languages.

    2. Programming paradigm. July 2023. Page Version ID: 1167849453. URL: https://en.wikipedia.org/w/index.php?title=Programming_paradigm&oldid=1167849453 (visited on

      I think programming paradigms are less about strict categories and more about different ways of thinking through problems. Wikipedia frames them as high-level approaches to structuring programs, like imperative or declarative styles, but in the context of bots on social media, this feels especially relevant. Bots aren’t just code; they reflect choices about control, automation, and interaction. For example, a reactive or rule-based paradigm directly shapes how bots respond to users. Ethically, that means the paradigm itself can embed biases or power dynamics, which makes how we code inseparable from how bots behave online.

    1. The Git Commands I Run Before Reading Any Code

      Commands for: - What Changes the Most - Who Built This - Where Do Bugs Cluster - Is This Project Accelerating or Dying - How Often Is the Team Firefighting

    1. eed, as we read and discussed our letters, we could find nothing that would be considered nonstandard by the readers of these letters, despite the frequent inclusion of other languages to make a point. Some students used double negatives despite the knowledge that it was wrong in a more formal setting. Again, it was clear that the inclusion of a double negative was integral to the point being made. At the same time, many students had to stop and think when I asked them if such code switching would be more acceptable in a formal setting if the writer was wealthy and powerful. "The rich can write anyway they want," said one student after a few moments of silen

      code switching in this assignment was seen as okay by all students

    2. As other letters were read and critiqued, more of this context-driven code switching was unearthed. Ernestina referred to her boyfriend as her querido rather than use the English equivalent of "honey" or "sweetheart." Again, the code switch was an example of the mercurial, contextual character of language and the way it is manipulated to advance the nuances of our feelings. As with Illeana before, Ernestina knew of English words to express her affection but wanted the letter to capture the special spirit of a note to her boyfriend. Many students, in turn, recognized the unique circumstances that permit one to code switch and the ways that audience helps define the concept of correctness. "These letters wouldn't be real if they didn't use certain Spanish words," suggested one student as we reviewed the d

      more about code switching and using Spanish and english words together.

    3. as it wrong or incorrect, for example, for Illeana to tell her father that she wished to stop being approached by his friend who is antipatico? As the class read and discussed her letter they were quick to recognize the appropriateness of the use of Spanish in this context. "There isn't a word in English that describes that feeling," she said in defending her use of the Spanish alternative. Indeed, in the context in which she was writing, and with her father as her audience, code switching was "standard" and correct for the letter she was writing. Such stylistic switching, argues Guadalupe Valdes, "occurs not because speakers lack an equivalent in one of their languages, but because they wish to convey a precise meaning" (127). This practice, she later contends, is a "sign of strength rather than weakness" (127) in using language. Clearly, this was the case for Illeana, who knew of an English alternative but who aspired to make the letter congruent with the affection she felt for he

      for people who are from different backgrounds sometimes there is not an English word they can use to describe what they are feeling so then using their native language is important.

    4. ne week later, students presented their second drafts and began to delve into the dynamics of correctness and context. Illeana's essay began with a plea to her father that he stop working so hard and consider his family, who seemed to be growing apart from him as he worked constantly. In reading her short, one page missive, it was interesting to note the uses of language that represented a clear divergence from what most English speakers would consider standard English--even for a letter to a family member. "I want you to work less and spend more time with your family, Papi," she wrote. As she read on, what was most fascinating about her letter was her ability to code switch or use both English and Spanish as a way to communicate with her father more effectivel

      Students are working on making their essays correct. One student code switches in hers.

    1. Carlos and his peers know a lot about language; their knowledge is reflected in comments such as the one above. They know there are many varieties of English in their speech community: "Whites they speak different from blacks" (Mario, a nonnative speaker of English). They understand that speakers may vary their English according to setting and interlocutor: "In the class, we have to speak nice you know, but not on the street... when some people in the street talk bad, you have to speak bad to him" (Luis, a nonnative speaker of English). Some even suggest that they have made choices about which variety they want to speak as their second language: "I speak like white Americans. That's a choice" (Paul, a nonnative speake

      Carlos and his peers are aware of different forms of English that are spoken and use code switching.

    1. When one of us ran the program, who made those posts (me? you? the bot?)?

      I personally believe that the person who made the post is the person who ran it. I don't think that the bot made the code but is closer to a tool that can be used like a shovel. I also think that the person who made the post is simply the person who created the tool to make the post.

    1. Summary Note: Analysis of the Mechanisms and Judicial Treatment of Intrafamily Violence

      Executive Summary

      This document analyzes the systemic dynamics of intrafamily violence as observed during court hearings.

      The key points show that violence is not an isolated incident but a system of domination built on coercive control and a sense of ownership of the other person ("you belong to me").

      Perpetrators of violence use recurring defense mechanisms: minimization, denial, reversal of guilt, and discrediting of the victim.

      The ultimate act, femicide, often occurs when the victim tries to break free of this control (separation, pregnancy).

      Faced with this historical and cultural legacy of male domination, the judicial institution is moving toward a more specialized approach.

      The judge's role is now to decode these mechanisms, name the facts precisely, and correct systemic inequalities in order to interrupt the intergenerational cycle of violence.

      --------------------------------------------------------------------------------

      I. The Logic of Domination and Coercive Control

      Intrafamily violence follows a structured pattern of behavior aimed at creating a climate of captivity within the home.

      • The sense of ownership: The perpetrator regards the victim as an object that belongs to him.

      This logic shows up in statements such as "I love you, I'm going to kill you, you belong to me".

      • Coercive control: This mechanism consists of micro-regulating the victim's daily life.

      It includes:

      • Surveillance of communications (reading text messages, monitoring social media).

      • Restriction of movements and outings.

      • Control of social relationships.

      • The climate of fear: Death threats ("I'll kill you", "I'll kill all of you") are used to keep those around the perpetrator in a state of submission and constant terror.

      II. Rhetoric and Defense Mechanisms of Perpetrators

      Analysis of the hearings highlights systematic discursive strategies that defendants use to escape responsibility.

      1. Minimization and euphemism

      Perpetrators often present acts of violence as accidents or miscalculations:

      • Use of phrases such as "bad luck" (« manque de peau » in the original) or "misjudged the distance" to justify a headbutt or property damage.

      • Replacement of violent terms with softening words (e.g., describing spitting as something one "brings", as though it were a gift).

      • Distinction between being "quick-tempered" and being "violent".

      2. Reversal of guilt

      Perpetrators try to justify their acts by pointing to the victim's behavior:

      • Violence is presented as a response to "provocations".

      • The "there's no smoke without fire" argument is used to shift responsibility onto the victim.

      • The perpetrator sometimes presents himself as the real victim, pushed to the limit.

      3. Discrediting the victim

      A frequent tactic is to portray the victim as "crazy", "a liar", or "hysterical" in order to invalidate her testimony before the court.

      III. The Cycle of Violence and Risk Factors

      Intrafamily violence follows a progressive trajectory that can lead to femicide.

      | Stage | Characteristics |
      | --- | --- |
      | Entrapment | Silent establishment of coercive control, sometimes without prior physical violence. |
      | Escalation | Gradual increase in the severity of the sanctions: threats, then physical violence. |
      | Act of violence | Often triggered by a break in control (announcement of divorce, separation). |

      Factors that increase the risk of femicide:

      • The intention to leave: When the victim escapes, the man may prefer to "break his toy" rather than lose control.

      • Pregnancy: Perceived as an intrusion into the fusional relationship in favor of the child, threatening the exclusivity of possession.

      • Jealousy: The discovery of a third person, even a potential one, triggers a destructive reaction.

      IV. Societal and Cultural Dimensions

      The document stresses that this violence does not consist of isolated incidents but rests on a historical structure.

      • Legal heritage: Until the 19th century (Napoleonic Code), a man had a legitimate "right of correction" over his wife and children.

      This historical remnant still influences attitudes today.

      • Responsibility of popular culture: Fiction (series, films) often contributes to a "culture of femicide" by trivializing the bodies of abused women or by romanticizing the crime under the label "crime of passion".

      • Reality of the crime: The document insists that "no one kills out of love".

      The term "crime of passion" obscures the criminal reality of the act.

      • Intergenerational transmission: Children who witness violence are marked by it.

      While boys tend to reproduce the pattern of the aggressor, girls tend to reproduce a pattern of victimization.

      V. The Judge's Role: Toward Quality Justice

      The role of judges is evolving to respond better to the issues raised by gender-based violence.

      • Decoding the mechanisms: The judge must be able to identify coercive control and not be misled by the perpetrator's rhetoric.

      • Establishing the truth: This means renaming the facts precisely and placing guilt back on the perpetrator, regardless of the victim's behavior.

      • Systemic analysis: The judge must ask three fundamental questions in order to correct gender inequalities:

        • Is my judgment the product of a systemic inequality?

        • Could the way I speak to the victim worsen this inequality (e.g., asking her why she did not leave)?

        • Can my decision correct a systemic inequality?

      VI. Resources and Support Services

      It is essential not to remain isolated when facing these situations. Concrete tools exist for victims and witnesses:

      • 3919: National reference number for violence against women (anonymous and free of charge).

      • 119: Number dedicated to child protection.

      • Police and gendarmerie services: Available 24 hours a day for filing a complaint, providing a concrete and immediate response.

      • Listening centers: Places offering advice to those who fear becoming victims or who wish to prepare for a judicial process that they find intimidating.

  4. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. Zero-based numbering. September 2023. Page Version ID: 1176111995. URL: https://en.wikipedia.org/w/index.php?title=Zero-based_numbering&oldid=1176111995#Origin (visited on 2023-11-24).

      I remember learning about zero-based numbering in my intro to Java class, and it was really confusing at first because I was used to counting starting at 1. When I started using arrays, I kept making mistakes by trying to access the first element with index 1 instead of 0. After practicing more, I realized that starting at 0 actually makes loops and counting work more smoothly in code. It also helped me understand how computers keep track of positions in memory. Even though it was frustrating at first, it ended up making more sense the more I used it in the context of this
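      The two points in this note, that loops line up neatly with zero-based indices and that an index works like an offset in memory, can be illustrated with a small sketch. It is written in Python rather than Java, but Java arrays are indexed the same way. The base address and element size below are made-up values chosen only for illustration, not how any particular runtime lays out memory.

```python
# Hypothetical values chosen only for illustration.
BASE_ADDRESS = 1000   # assumed starting address of the array in memory
ELEMENT_SIZE = 4      # assumed size of each element in bytes


def element_address(index: int) -> int:
    """Address of the element at a zero-based index: base + index * size."""
    return BASE_ADDRESS + index * ELEMENT_SIZE


scores = [90, 85, 77]

first = scores[0]                # the first element lives at index 0
last = scores[len(scores) - 1]   # the last valid index is length - 1

# range(len(scores)) yields 0, 1, 2: exactly the valid indices.
visited = [scores[i] for i in range(len(scores))]
addresses = [element_address(i) for i in range(len(scores))]
```

      With zero-based indexing the first element sits exactly at the base address (offset 0), which is one common explanation for why the convention feels natural once loops and memory positions are considered together.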

    1. There are several ways computer programs are involved with social media. One of them is a “bot,” a computer program that acts through a social media account. There are other ways of programming with social media that we won’t consider a bot (and we will cover these at various points as well): The social media platform itself is run with computer programs, such as recommendation algorithms (chapter 12). Various groups want to gather data from social media, such as advertisers and scientists. This data is gathered and analyzed with computer programs, which we will not consider bots, but will cover later, such as in Chapter 8: Data Mining. Bots, on the other hand, will do actions through social media accounts and can appear to be like any other user. The bot might be the only thing posting to the account, or human users might sometimes use a bot to post for them. Note that sometimes people use “bots” to mean inauthentically run accounts, such as those run by actual humans, but are paid to post things like advertisements or political content. We will not consider those to be bots, since they aren’t run by a computer. Though we might consider these to be run by “human computers” who are following the instructions given to them, such as in a click farm:

      I find it interesting how the definition of a "bot" depends on whether the actions are automated by code or done by humans following instructions. In addition to that, the idea that click farms are like human computers blurs the line between automation and human behavior for me. For example, if both bots and click farms can manipulate engagement or spread information, should we treat them differently just because one uses code and the other uses people?
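      The distinction drawn in the quoted passage, between a program that acts through an account and a program that only gathers and analyzes data, might be sketched as follows. The `Account` class, `run_bot`, and `count_mentions` are hypothetical names invented for this illustration; they stand in for a real platform and do not correspond to any actual social media API.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical in-memory stand-in for a social media account."""
    handle: str
    posts: list[str] = field(default_factory=list)

    def post(self, text: str) -> None:
        self.posts.append(text)


def run_bot(account: Account, messages: list[str]) -> int:
    """A bot in the passage's sense: a program that acts through an account."""
    for message in messages:
        account.post(message)
    return len(messages)


def count_mentions(account: Account, word: str) -> int:
    """Not a bot: it only reads and analyzes posts and never acts."""
    return sum(word.lower() in post.lower() for post in account.posts)


account = Account(handle="@example")
account.post("Hello from a human")   # a person posting by hand
posted = run_bot(account, ["Daily reminder", "Hello again"])
hello_count = count_mentions(account, "hello")
```

      In this sketch the human post and the bot posts end up in the same list and look alike afterwards, which mirrors the passage's remark that a bot "can appear to be like any other user".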

    1. When scientists wanted these human computers to do a task for them, they would give these human computers instructions for what they wanted calculated. These instructions were given in a regular human language (like English), and in math notation. Then the human computers would send back the results of whatever calculation they had been asked to perform. But human computers were eventually replaced by electronic computers, and communication with electronic computers was not straightforward.

      I realized that computing has always been about communication, rather than just calculation. Even before machines, people had to give clear instructions to human computers, which is similar to how we write code today. What specifically stood out to me was how this work was often done by women, but their contributions are rarely recognized in discussions about technology.

    1. On 2025-09-02 12:08:41, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.08.11.669624

      About registration:
      We could not find any information about the pre-registration of your study in the preprint. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the preprint, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.

      About Protocol Sharing:
      We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework (https://osf.io) or Zenodo (https://zenodo.org/). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link) / in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About the Statistical Analysis Plan Sharing:
      We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework (https://osf.io). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link) / in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About Deviations and/or changes:
      We could not find any information about potential deviations or changes to the protocol in your preprint. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled "Changes to the Initial Protocol" in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.

      About Data sharing / FAIR Data:
      We found insufficient information about your data sharing approach. Data should be findable, i.e., data are to be assigned a globally unique and persistent identifier (for instance, a DOI is assigned to the dataset, or the data are registered or indexed in a searchable resource). Data should also be accessible, i.e., data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/

      About Code sharing:
      We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/

    1. On 2025-09-02 12:07:37, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.08.11.669634

      About registration:
      We could not find any information about the pre-registration of your study in the preprint. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the preprint, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.

      About Protocol Sharing:
      We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework (https://osf.io) or Zenodo (https://zenodo.org/). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link) / in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About the Statistical Analysis Plan Sharing:
      We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework (https://osf.io). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link) / in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About Deviations and/or changes:
      We could not find any information about potential deviations or changes to the protocol in your preprint. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled "Changes to the Initial Protocol" in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.

      About Data sharing / FAIR Data:
      We found insufficient information about your data sharing approach. Data should be findable, i.e., data are to be assigned a globally unique and persistent identifier (for instance, a DOI is assigned to the dataset, or the data are registered or indexed in a searchable resource). Data should also be accessible, i.e., data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/

      About Code sharing:
      We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/

    1. On 2025-08-26 09:35:41, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.08.13.669948

      About registration:
      We could not find any information about the pre-registration of your study in the preprint. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the preprint, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.

      About Protocol Sharing:
      We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework (https://osf.io) or Zenodo (https://zenodo.org/). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link) / in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About the Statistical Analysis Plan Sharing:
      We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework (https://osf.io). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link) / in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About Deviations and/or changes:
      We could not find any information about potential deviations or changes to the protocol in your preprint. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled "Changes to the Initial Protocol" in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.

      About Data sharing / FAIR Data:
      While we could access your data in OSF, we could not find any DOI. Sharing data is important for enhancing transparency and reproducibility. Provided the data are not sensitive, we encourage you to share them on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about data sharing, visit https://www.go-fair.org/

      About Code sharing:
      While we could access your code [interventioncontro_arm_1][code_location], we could not find any DOI. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/

      Comments:
      During the evaluation of your preprint, I noticed the presence of a post hoc analysis. I recommend creating a dedicated section that clearly describes any protocol deviations or changes from the initial plan, in order to enhance transparency and clarity.

    1. On 2025-08-26 09:34:49, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.08.08.669288

      About registration:
      We could not find any information about the pre-registration of your study in the preprint. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the preprint, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.

      About Protocol Sharing:
      We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework (https://osf.io) or Zenodo (https://zenodo.org/). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link) / in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About the Statistical Analysis Plan Sharing:
      We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework (https://osf.io). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link) / in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About Deviations and/or changes:
      We could not find any information about potential deviations or changes to the protocol in your preprint. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled "Changes to the Initial Protocol" in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.

      About Data sharing / FAIR Data:
      While we could access your data in OSF, we could not find any DOI. Sharing data is important for enhancing transparency and reproducibility. Provided the data are not sensitive, we encourage you to share them on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about data sharing, visit https://www.go-fair.org/

      About Code sharing:
      While we could access your code [interventioncontro_arm_1][code_location], we could not find any DOI. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/

      Comments:
      Dear Author,
      Thank you for contributing to open science by sharing your data. To further strengthen the impact and sustainability of your work, I encourage you to:

      • Assign a Digital Object Identifier (DOI) to your dataset, to ensure long-term accessibility and reliability of the link.

      • Share your metadata, to facilitate the interpretation and reuse of your data.

      • Share your study protocol, to help other researchers better understand and reproduce your work.

    1. On 2025-08-26 09:32:02, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.08.03.668152

      About registration:
      We could not find any information about the pre-registration of your study in the preprint. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the preprint, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.

      About Protocol Sharing:
      We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework (https://osf.io) or Zenodo (https://zenodo.org/). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link) / in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About the Statistical Analysis Plan Sharing:
      We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework (https://osf.io). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link) / in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About Deviations and/or changes:
      We could not find any information about potential deviations or changes to the protocol in your preprint. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled "Changes to the Initial Protocol" in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.

      About Data sharing / FAIR Data:
      We found insufficient information about your data sharing approach. Data should be findable, i.e., data are to be assigned a globally unique and persistent identifier (for instance, a DOI is assigned to the dataset, or the data are registered or indexed in a searchable resource). Data should also be accessible, i.e., data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/

      About Code sharing:
      We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/

    1. On 2025-08-26 09:31:03, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.08.02.668124

      About registration:
      We could not find any information about the pre-registration of your study in the preprint. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the preprint, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.

      About Protocol Sharing:
      We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework (https://osf.io) or Zenodo (https://zenodo.org/). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link) / in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About the Statistical Analysis Plan Sharing:
      We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework (https://osf.io). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link) / in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.

      About Deviations and/or changes:
      We could not find any information about potential deviations or changes to the protocol in your preprint. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled "Changes to the Initial Protocol" in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.

      About Data sharing / FAIR Data:
      We found insufficient information about your data sharing approach. Data should be findable, i.e., data are to be assigned a globally unique and persistent identifier (for instance, a DOI is assigned to the dataset, or the data are registered or indexed in a searchable resource). Data should also be accessible, i.e., data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/

      About Code sharing:
      We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/

    1. On 2025-08-26 09:26:21, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.07.24.666513

      About registration: <br /> We could not find any information about the pre-registration of your study in the pre-print. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the pre-print, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.<br /> About Protocol Sharing: <br /> We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework ( https://osf.io ) or Zenodo ( https://zenodo.org/ ). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link)/in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.<br /> About the Statistical Analysis Plan Sharing: <br /> We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework ( https://osf.io ). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link)/in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.<br /> About Deviations and/or changes<br /> We could not find any information about potential deviations or changes to the protocol in your pre-print. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled 'Changes to the Initial Protocol' in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.<br /> About Data sharing / FAIR Data<br /> We found insufficient information about your data sharing approach. Data should be findable, i.e. data are assigned a globally unique and persistent identifier (for instance, there is a DOI assigned to the dataset, or data are registered or indexed in a searchable resource). Data should also be accessible, i.e. data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/ <br /> About Code sharing<br /> We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/ .

    1. On 2025-08-26 09:23:11, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2023.12.12.571308

      About registration: <br /> We could not find any information about the pre-registration of your study in the pre-print. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the pre-print, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.<br /> About Protocol Sharing: <br /> We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework ( https://osf.io ) or Zenodo ( https://zenodo.org/ ). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link)/in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.<br /> About the Statistical Analysis Plan Sharing: <br /> We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework ( https://osf.io ). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link)/in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.<br /> About Deviations and/or changes<br /> We could not find any information about potential deviations or changes to the protocol in your pre-print. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled 'Changes to the Initial Protocol' in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.<br /> About Data sharing / FAIR Data<br /> We found insufficient information about your data sharing approach. Data should be findable, i.e. data are assigned a globally unique and persistent identifier (for instance, there is a DOI assigned to the dataset, or data are registered or indexed in a searchable resource). Data should also be accessible, i.e. data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/ <br /> About Code sharing<br /> We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/ .

    1. On 2025-08-26 09:21:59, user Constant VINATIER wrote:

      Feedback on your preprint: https://doi.org/10.1101/2025.07.23.666382

      About registration: <br /> We could not find any information about the pre-registration of your study in the pre-print. Pre-registration involves documenting the hypotheses, methods, and/or analyses of a scientific study prior to its conduct (10.1073/pnas.1708274114; 10.1038/s41562-021-01269-4). If your study was pre-registered, we strongly encourage you to include the registration number in the pre-print, ideally in the abstract, to make this important information easy to retrieve, as this practice enhances transparency and reproducibility. If the study was not pre-registered, this should be acknowledged as a limitation. For future studies, we recommend pre-registering on an appropriate repository.<br /> About Protocol Sharing: <br /> We did not find the protocol for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a publicly available repository such as the Open Science Framework ( https://osf.io ) or Zenodo ( https://zenodo.org/ ). You can then include a statement in the Methods section indicating that your protocol is openly available (e.g., 'The protocol for this study is available at (link)/in the supplementary material'). Sharing your protocol will help readers better understand your study and enable them to reproduce it if they wish to test it.<br /> About the Statistical Analysis Plan Sharing: <br /> We did not find the Statistical Analysis Plan (SAP) for your study. If you have one, we encourage you to share it as supplementary material or deposit it in a repository such as the Open Science Framework ( https://osf.io ). You can then include a statement in the Methods section indicating that your SAP is openly available (e.g., 'The SAP for this study is available at (link)/in the supplementary material'). Sharing your SAP will help readers better understand your study and enable them to reproduce it if they wish to test it.<br /> About Deviations and/or changes<br /> We could not find any information about potential deviations or changes to the protocol in your pre-print. Since such deviations are common, if this applies to your study, we strongly encourage you to include a subsection titled 'Changes to the Initial Protocol' in the Methods section and discuss these changes as a potential limitation of your results. If any deviations occurred during your study, please specify them in this new subsection.<br /> About Data sharing / FAIR Data<br /> We found insufficient information about your data sharing approach. Data should be findable, i.e. data are assigned a globally unique and persistent identifier (for instance, there is a DOI assigned to the dataset, or data are registered or indexed in a searchable resource). Data should also be accessible, i.e. data are retrievable by their identifier and can be accessed following an open, free, and universally implementable protocol. As your data are not sensitive, we encourage you to share them openly on a data sharing repository (Dryad, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about good practices of data sharing, visit https://www.go-fair.org/ <br /> About Code sharing<br /> We could not find any information about your (statistical) code. Sharing code is important for enhancing transparency and reproducibility, especially since it does not contain sensitive information. We encourage you to openly share it on a code sharing platform (GitHub, Codepen, CodShare, etc.) and include the Digital Object Identifier (DOI) in the Methods section. If you want more information about code sharing, visit https://fair-software.nl/ .

    1. On 2025-03-31 10:55:42, user Marco Barilari wrote:

      Dear authors,

      We enjoyed discussing this preprint in our Layer Seminar Journal Club.

      The draft makes a good case for the importance of developing denoising methods and for their capacity to improve the quality of high-resolution fMRI data.

      To further strengthen the impact of this work, we would suggest openly sharing the code with the fMRI community. We believe there is a strong need for improvements in data quality, which may come not only from better data acquisition but also from data denoising.

      If code becomes available, we would love to explore this methodology on our own datasets.

      With best regards,

      The Layer Seminar Journal Club

      Marco Barilari, Renzo Huber, Omer Faruk Gulban, Kenshu Koiso, Alessandra Pizzuti

    1. On 2025-03-14 18:02:28, user Julius Zhu wrote:

      Note to add to this preprint (Zheng et al., 2022).

      Since the publication of this genetically encoded sensor-based image analysis program (or GESIAP2.0) on bioRxiv in 2022 (Zheng et al., 2022), we have received extensive feedback from collaborators, colleagues, and reviewers—some of whom commented and reviewed our manuscripts that utilized GESIAP to analyze synaptic properties (Huang et al., 2025; Zhang et al., 2025a; Zhang et al., 2025b). This feedback has been instrumental in identifying several limitations associated with GESIAP2.0. Fortunately, these issues have been resolved in the latest version, GESIAP3.0, which incorporates algorithms independently developed by Roger Zhu and colleagues at Zhejiang University (Zhu et al., 2024).

      One limitation of GESIAP2.0 concerns the initial movement correction procedure. GESIAP2.0 employs translational (or affine) alignment algorithms to correct movements in fluorescence signals collected from ex vivo imaging experiments (Zheng et al., 2022). While effective for minor displacements, this method struggles with larger movements, leading to the exclusion of many cells from analysis. GESIAP3.0 incorporates an advanced non-rigid movement correction algorithm that corrects complex displacements in behaving animals, as well as small movements in ex vivo experiments (Zhu et al., 2024). Recent tests show that this algorithm eliminates the need for alignment, improving the robustness and accuracy of the image analysis.

      Both GESIAP2.0 (Zheng et al., 2022) and GESIAP3.0 (Zhu et al., 2024) utilize the Landweber deconvolution procedure (Sage et al., 2017), a published method that enhances microscopic image quality by "reassigning" detected photons to their original emitting locations in both ex vivo and in vivo experiments.

      Another notable limitation of GESIAP2.0 is the denoising procedure. Initially, GESIAP2.0 used a non-synaptic function to fit fluorescence responses (Zheng et al., 2022). This method proved ineffective at identifying key parameters of fluorescence responses, including latency and peak, and it often failed to separate small fluorescence responses from background noise, resulting in an underestimation of releasing synapses (Zheng et al., 2022). Roger Zhu later introduced a double-exponential synaptic function to improve fitting, but its high computational demands prevented its application to all data in the preprint (Zheng et al., 2022). In GESIAP3.0, Roger Zhu and his Zhejiang University team implemented a least-squares non-linear optimization using the Levenberg-Marquardt algorithm to fit the double-exponential synaptic function (Zhu et al., 2024). This method combines gradient descent with the Gauss-Newton method (Moré, 2006; Gavin, 2019), fitting both the rise and decay phases of synaptic responses to optimize parameter regression. Biological constraints based on the response properties of genetically encoded sensors (e.g., Borden et al., 2020; Sun et al., 2020; Wan et al., 2021) were also incorporated to restrict parameter ranges, improving both speed and accuracy.
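
      To make the fitting step concrete, here is a minimal pure-Python sketch of a Levenberg-Marquardt fit of a double-exponential response. It is not the GESIAP3.0 code: the functional form `double_exp`, the parameter names, the bounds, and the synthetic trace are all assumptions for illustration, and the parameter ranges are enforced by simple clipping, which is only one possible way to apply constraints.

```python
import math

def double_exp(t, amp, tau_rise, tau_decay):
    """Assumed response shape: zero before onset, then a rise and a slower decay."""
    if t < 0:
        return 0.0
    return amp * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))

def residuals(params, times, values):
    return [double_exp(t, *params) - y for t, y in zip(times, values)]

def sum_sq(params, times, values):
    return sum(r * r for r in residuals(params, times, values))

def clip(params, bounds):
    """Keep each parameter inside its (low, high) range (hypothetical constraints)."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, bounds)]

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_levenberg_marquardt(times, values, start, bounds, n_iter=200):
    params = clip(list(start), bounds)
    cost = sum_sq(params, times, values)
    damping = 1e-2
    n = len(params)
    for _ in range(n_iter):
        if cost < 1e-20:
            break
        res = residuals(params, times, values)
        # Forward-difference Jacobian, one column per parameter.
        cols = []
        for k in range(n):
            h = 1e-6 * max(1.0, abs(params[k]))
            shifted = params[:]
            shifted[k] += h
            res_k = residuals(shifted, times, values)
            cols.append([(a - b) / h for a, b in zip(res_k, res)])
        jtj = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
               for i in range(n)]
        jtr = [sum(a * r for a, r in zip(cols[i], res)) for i in range(n)]
        for i in range(n):
            jtj[i][i] *= 1.0 + damping  # Marquardt scaling of the diagonal
        step = solve(jtj, [-g for g in jtr])
        trial = clip([p + s for p, s in zip(params, step)], bounds)
        trial_cost = sum_sq(trial, times, values)
        if trial_cost < cost:
            # Accepted step: less damping, closer to Gauss-Newton.
            params, cost = trial, trial_cost
            damping /= 10.0
        else:
            # Rejected step: more damping, closer to a short gradient step.
            damping *= 10.0
    return params

# Synthetic, noise-free trace with made-up parameters (amp, tau_rise, tau_decay).
times = [0.05 * i for i in range(61)]
true_params = [1.0, 0.1, 0.5]
trace = [double_exp(t, *true_params) for t in times]
bounds = [(0.01, 10.0), (0.01, 0.3), (0.3, 5.0)]
fitted = fit_levenberg_marquardt(times, trace, [0.8, 0.15, 0.6], bounds)
```

      The damping term is what interpolates between the two methods named above: small damping gives a Gauss-Newton step, large damping gives a short step along the negative gradient.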

      The final step in GESIAP2.0 is flawed (Zheng et al., 2022). To amplify small signals extracted during denoising, baseline responses are arbitrarily subtracted, introducing random variations in the fluorescence responses. In contrast, GESIAP3.0 improves small signal extraction using double-exponential synaptic function fitting (Zhu et al., 2024). When necessary, it further reduces residual noise with a bilateral non-linear filter (Tomasi and Manduchi, 1998), enhancing image clarity without inflating fluorescence ΔF/F0 values.
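
      As an illustration of why a bilateral filter smooths noise without blurring genuine signal steps, here is a minimal one-dimensional sketch. The filter in Tomasi and Manduchi (1998) is defined on images; the 1D version, the function name, the parameter values, and the toy trace below are assumptions for illustration and are not taken from GESIAP3.0.

```python
import math

def bilateral_filter_1d(values, radius=2, sigma_space=1.0, sigma_range=0.5):
    """Weighted average where weights fall off with distance AND with value difference."""
    smoothed = []
    for i, center in enumerate(values):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - radius), min(len(values), i + radius + 1)):
            spatial = (i - j) ** 2 / (2 * sigma_space ** 2)
            tonal = (values[j] - center) ** 2 / (2 * sigma_range ** 2)
            weight = math.exp(-spatial - tonal)
            weighted_sum += weight * values[j]
            weight_total += weight
        smoothed.append(weighted_sum / weight_total)
    return smoothed

# Toy trace: a noisy baseline followed by a noisy plateau (made-up numbers).
trace = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
smoothed = bilateral_filter_1d(trace)
```

      Samples on the far side of the step receive almost no weight because their values differ strongly from the center sample, so the baseline and the plateau are each averaged separately.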

      In summary, GESIAP2.0 (Zheng et al., 2022) provides further proof-of-principle that nanoscopic sensor-based functional imaging analysis can dissect synaptic properties of transmission across various transmitters, cell types, and animal species (cf. (Zhu et al., 2020; Lin et al., 2021)). This analysis also implies both shared patterns and a diverse range of presynaptic release dynamics that contribute to neuromodulation. However, code defects in GESIAP2.0 lead to inaccurate estimates of synaptic properties, limiting its ability to address fundamental biological questions. Our new manuscripts show that synaptic properties calculated using GESIAP3.0 (Huang et al., 2025; Zhang et al., 2025a; Zhang et al., 2025b) differ significantly from those obtained with GESIAP2.0 and align closely with results from independent approaches. Hence, we encourage readers to contact the Zhejiang University team for potential collaborations and access to the GESIAP3.0 code.

      • J. Julius Zhu, corresponding author of this preprint (Zheng et al., 2022).

      REFERENCES:<br /> Borden PM et al. (2020) A fast genetically encoded fluorescent sensor for faithful in vivo acetylcholine detection in mice, fish, worms and flies. bioRxiv doi: https://doi.org/10.1101/2020.02.07.939504 .<br /> Gavin HP (2019) The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems. Department of Civil Environmental Engineering Duke University August 3.<br /> Huang L, Chang Y, Yang Z, Lynch WJA, Venton BJ, Zhu JJ (2025) Coding principles of dopaminergic transmission modes. Science Advances: revised.<br /> Lin L, Gupta S, Zheng WS, Si K, Zhu JJ (2021) Genetically encoded sensors enable micro- and nano-scopic decoding of transmissions in healthy and diseased brains. Molecular psychiatry:443–455.<br /> Moré JJ (2006) The Levenberg-Marquardt algorithm: implementation and theory. In: Numerical analysis: proceedings of the biennial Conference held at Dundee, June 28–July 1, 1977, pp 105-116: Springer.<br /> Sage D, Donati L, Soulez F, Fortun D, Schmit G, Seitz A, Guiet R, Vonesch C, Unser M (2017) DeconvolutionLab2: An open-source software for deconvolution microscopy. Methods 115:28-41.<br /> Sun FM, Zhou J, Dai B, Qian T, Zeng J, Li X, Zhuo Y, Zhang Y, Wang Y, Qian C, Tan K, Feng J, Dong H, Lin D, Cui G, Li Y (2020) New and improved GRAB fluorescent sensors for monitoring dopaminergic activity in vivo. Nature methods 17:1156–1166.<br /> Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: Sixth international conference on computer vision (IEEE Cat. No. 98CH36271), pp 839-846: IEEE.<br /> Wan J, Peng W, Li X, Qian T, Song K, Zeng J, Deng F, Hao S, Feng J, Zhang P, Zhang Y, Zou J, Pan S, Shin M, Venton BJ, Zhu JJ, Jing M, Xu M, Li Y (2021) A genetically encoded sensor for measuring serotonin dynamics. Nature neuroscience 24:746-752.<br /> Zhang Y, Zhang P, Shin M, Chang Y, Abbott SBG, Venton BJ, Zhu JJ (2025a) Coding principles and mechanisms of serotonergic transmission modes. 
Molecular psychiatry ePub ahead of print: doi: https://doi.org/10.1038/s41380-025-02930-4 .<br /> Zhang Y, Zhou J, Scott MM, Looger LL, Li Y, Cui G, Zhu JJ (2025b) Adrenergic signals determine demanding task performance. Manuscript in preparation.<br /> Zheng WS et al. (2022) GESIAP: a versatile genetically encoded sensor-based image analysis program. bioRxiv https://doi.org/10.1101/2022.10.05.511006 .<br /> Zhu PK, Zheng WS, Zhang P, Jing M, Borden PM, Ali F, Guo K, Feng J, Marvin JS, Wang Y, Wan J, Gan L, Kwan AC, Lin L, Looger LL, Li Y, Zhang Y (2020) Nanoscopic visualization of restricted non-volume cholinergic and monoaminergic transmission with genetically encoded sensors. Nano Lett 20:4073-4083.<br /> Zhu RE, Diao X, Liu X, Ru Q, Wu Z, Zhang Z, Looger LL, Zhu J (2024) GESIAP3.0: sensor-based image analysis program for transmission visualization in vivo. bioRxiv doi: https://doi.org/10.1101/2024.10.28.620522 .

    1. On 2024-09-06 06:42:50, user Alessio wrote:

      Dear Clarissa and colleagues,

      Thanks for the extremely interesting read. After a long time investigating the role of dendritic nonlinearities in single neuron computation, it is a pleasure to see it formalized in a statistical mechanics fashion. I look forward to digging into the SI and Methods to understand the ins and outs of the approach.

      I want to ask you if you intend to make the code for the computational experiments available upon publication or if it is already available.

      Also, if you are interested in the topic from a more biophysical perspective, you could check out these two studies where I focused on the temporal aspect of the dendritic nonlinearity and its interaction with Hebbian synaptic plasticity.

      https://physoc.onlinelibrary.wiley.com/doi/10.1113/JP283399 <br /> https://www.biorxiv.org/content/10.1101/2023.09.26.559322v3 <br /> Thanks again for your work, and all the best!<br /> Alessio

    1. On 2023-09-11 18:24:50, user markus wrote:

      Review of Stöckl and Maass (2023) “Local prediction-learning in high-dimensional spaces enables neural networks to plan”<br /> by Markus Meister and Zeyu Jing

      The authors propose a neuromorphic algorithm by which an agent can learn the structure of an environment and decide on the best course of action to reach a given target location in the shortest time. The principle is to embed both the states (graph nodes) and the actions (graph edges) in the same high-dimensional space, such that the graph distances are represented by geometric distances between nodes, and the actions correspond to directions in the space. Then the agent can pursue the shortest route by choosing the action that points in the direction of the target node. The embeddings of both states and actions are learned during random exploration of the graph, based on optimizing a prediction of the next state from the previous state and action. Ultimately the embedding should be such that the next state can be predicted by simply adding the action vector to the previous state. The authors argue that all this can be accomplished with biologically plausible learning rules.
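
      The principle summarized above can be sketched in a few lines of Python. This is a toy illustration, not the authors' model: the embeddings are written by hand rather than learned, and the graph, the vectors, and all names (`STATE_VECS`, `ACTION_VECS`, `AFFORDANCES`) are hypothetical.

```python
def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy graph with edges A-B, A-C and B-D, embedded by hand so that
# "next state = current state + action vector" holds exactly.
STATE_VECS = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (0, 1, 0), "D": (1, 0, 1)}
ACTION_VECS = {"A->B": (1, 0, 0), "A->C": (0, 1, 0),
               "B->D": (0, 0, 1), "B->A": (-1, 0, 0)}
AFFORDANCES = {"A": ["A->B", "A->C"], "B": ["B->D", "B->A"], "C": [], "D": []}

def predict_next(state, action):
    """Add the action vector to the state vector and return the nearest known state."""
    predicted = add(STATE_VECS[state], ACTION_VECS[action])
    def squared_distance(name):
        diff = sub(STATE_VECS[name], predicted)
        return dot(diff, diff)
    return min(STATE_VECS, key=squared_distance)

def choose_action(state, goal):
    """Pick the affordable action that points most strongly toward the goal."""
    goal_direction = sub(STATE_VECS[goal], STATE_VECS[state])
    return max(AFFORDANCES[state],
               key=lambda action: dot(ACTION_VECS[action], goal_direction))

def navigate(start, goal, max_steps=10):
    state, path = start, []
    while state != goal and len(path) < max_steps and AFFORDANCES[state]:
        action = choose_action(state, goal)
        path.append(action)
        state = predict_next(state, action)
    return path

route = navigate("A", "D")
```

      Each decision uses only the current state and the goal, which is the sense in which the agent chooses "the action that points in the direction of the target node".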

      Evaluation

      This is an interesting suggestion for how an agent might learn the structure of its environment, namely how states and actions are connected. This ultimately results in a goal signal, by which the correct action can be chosen that leads to the goal in the shortest amount of time. Some of my questions concern whether the algorithms are really neuromorphic, in the sense that they could be implemented in biological neural circuits.

      Concerns about bio-plausibility:

      1. The learning rule for state-action association, Eqns 2 and 3: This looks difficult to implement with neurons, because:

      a) it involves two different time points on the post-synaptic side: s_{t+1} appears at time t+1, but \hat s_{t+1} at time t;

      b) to make s_{t+1} appear in that neural population, the action vector must temporarily be switched off, requiring some kind of control system;

      c) the postsynaptic variable is a difference between two successive time points. It is not obvious how to implement such a delta rule with biophysics.

      1. The computation of utility, Eqn 4: Again, how would this be implemented with neurons?

      a) It involves multiplying a population vector with another vector in the same population; what would be the mechanism for such a multiply-and-add operation?

      b) In the left term, the activity is driven by an action; in the right term, activity of the same neurons is driven by observations. How would that be accomplished?

      c) The right term involves subtracting activity from the current observations from activity driven by some remembered observation. How are the remembered observations stored? Are these fed into the network one after another? How are they subtracted?

      1. Normalization of synaptic weights

      a) Eqn 5: Here each input synapse gets modified by a factor that depends on the strengths of all the other synapses onto that same neuron. This violates the locality of synaptic plasticity: A synapse should be modified based only on its own strength and the activity of the pre-and postsynaptic neurons.

      b) Another related normalization appears in line 660.

      1. Winner-take-all choice (Eqn 7): Please explain how this would work in a neuromorphic system. How does the agent sample the values of the affordable actions? By actually executing them in the real world one at a time and comparing the resulting output from Eqn 4? But then the observations o_t would change as all the actions are played out. Or is the comparison done “mentally” without real-world action? If so, does this require some accessory system that can look up the codes of all the affordable actions and feed them into the network one at a time?

      In summary, if the authors advocate that this model could be implemented in brains, it would be helpful if the proposal included a bio-plausible neural circuit for each of these operations.

      Other questions:

      1. Graphs with cycles

      a) As pointed out (p.9), the theory behind this learning model causes problems with cyclic graphs. All the graph edges are supposed to be orthogonal in the embedding space, yet adding the edges around a loop should give zero. Both can’t be true. It is not clear how this conflict was resolved. Does it require careful parameter tuning so as to sustain performance of the model?

      b) Line 585 describes a hack by which the agent was prevented from traveling cycles during exploration. That requires oracular knowledge of the graph. How would that work in a real-world system?
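
      The conflict raised in point (a) can be written out in one line. As a sketch, assume the $k$ edges of a cycle are embedded as mutually orthogonal, nonzero vectors $a_1, \dots, a_k$ (the symbols are introduced here for illustration), and that traversing an edge adds its vector to the state. Then

$$
\Bigl\| \sum_{i=1}^{k} a_i \Bigr\|^2
= \sum_{i=1}^{k} \|a_i\|^2 + 2 \sum_{i<j} a_i \cdot a_j
= \sum_{i=1}^{k} \|a_i\|^2 > 0 ,
$$

      because every cross term $a_i \cdot a_j$ vanishes. Returning to the starting node, however, requires $\sum_{i=1}^{k} a_i = 0$. Under these assumptions the two conditions cannot hold at once, so at least one of them has to be relaxed for cyclic graphs.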

      1. The concept of “action”

      a) In the first part of the paper (p.1-12), every edge on the graph is considered a different action, and they are all encoded with one-hot vectors. For example, a robot action like “turn left” would have a different one-hot code at every different location. This is somewhat different from the common concept of “action”, which is specific to the agent’s movements, not where they are executed. By postulating a one-hot code for every edge on the graph one effectively circumvents the problem of path integration, which requires making a prediction from stringing together actions that may appear identical from the agent’s perspective.

      b) Please explain how a naive agent entering a new environment will already have a one-hot code for every edge on the graph, even if they involve identical movements of the agent.

      c) What if the actions instead were agent-centered, e.g. {left, right, forward, back} as in the second part of the paper (p.13ff). Would the system still be able to learn an arbitrary graph?

      1. “Affordances”

      a) What is the source of the “affordances” in Eqn 6. They represent part of the graph structure. Don’t these need to be learned as well? In the present formulation it seems they are offered to the system without learning (contra line 194).

      b) By what mechanism would this happen in a neuromorphic system, and how would they be represented by the neural circuit?

      c) Figure 4e highlights the fact that the agent takes an action that was never taken during exploration (Fig 4c). But what if there was a wall between those two nodes, and that’s why the action never occurred there? How does the agent know that the action is “affordable”?

      1. Parameter sensitivity

      a) It appears that for each of the illustrated graphs the embedding space had a different dimensionality, and different choices were made for the learning rates (p.22). In line 610, the state space has fewer dimensions than the number of actions, which obviously precludes finding orthogonal vectors. What motivated all these different choices?

      b) How robust is the system to the relevant parameters? Can the same agent with one parameter set learn different spaces? The report would benefit from exploring a wider range of graphs and scanning over parameter values.

      1. Related work

      There is prior research on these topics that could be used to put the current work in context. In particular, various versions of model-based reinforcement learning acquire the structure of the state space through learning, so that a goal-directed policy can be superposed on that. The present paper only deals with learning the state space, not with learning the goal locations: Those are provided by some unspecified accessory system. Within the RL literature there has been recent enthusiasm about the “successor representation”, which is an embedding that helps predict the agent’s next state. For neuromorphic models that learn the successor representation, see for example Fang 2023 (https://doi.org/10.7554/eLife.80680) and literature cited there.

      Other suggestions:

      1. There is frequent reference to “planning”. But the model presented here doesn’t make any plans. Once it arrives at a node it decides on the next action, as though it had a lookup table (see line 13). It does not “think ahead”. In the neuroscience and robotics fields, “planning” usually concerns evaluating the outcome of successive actions ahead of time, for example comparing the value of different routes. The behavior implemented here is more like “online navigation”. This may be confusing to the reader.

      2. In Figures 4 and 5, is there a meaning to the emojis painted on the nodes? At least this reader finds them distracting. Preceding figures worked just fine without emojis.

      3. In part 2 (p.13ff) the main difference from part 1 is that the actions are now agent-centric. It may help the reader to point this out. Because there are only 4 action vectors, the predictions are forced into a 4-dimensional subspace of the high-dimensional space. Eventually, the learning process squeezes that into a 2-dimensional subspace that accurately reproduces the geometry of the graph.

      4. Eqn 4: should start with u_t=

      5. Figure 6c: Perhaps show a bit more of the time course to document that the two variables have settled.

      6. Fig 2 caption: “square root of the length of the shortest path”: should this be “square root of the sum of squares along the shortest path”? As stated, it doesn’t resemble the law of Pythagoras.

      7. Fig 4a: use different arrows for the 4 actions.

      8. Typos:<br /> Line 68<br /> Line 32<br /> Line 250: Meaning of this sentence unclear.<br /> Line 392<br /> Line 505<br /> Eqn 12: should be Va instead of V?

    1. On 2023-02-23 20:27:02, user Olavo Amaral wrote:

      I recently reviewed this manuscript for a journal. For the sake of transparency, I thought it was worth it to post my comments here on bioRxiv as well, as it brings the review effort within the public domain. Let me know if you have any feedback and congratulations on the work: it's a nice paper on a very important topic.

      Summary:

      The manuscript addresses the question of “shortcut citations” in methods descriptions. Although this problem is frequently mentioned in debates about methodological reproducibility, it is understudied and it’s nice to see actual research about it.<br /> The results contain three main sections, which study (a) the prevalence of various types of citations in the methods sections of articles in highly cited journals, including shortcut ones, (b) examples of what happens when shortcut citations are followed and (c) a review of journal policies. This is followed by a reasonably extensive discussion focused on (d) guidelines on how to use shortcut citations.<br /> I generally agree that this is an interesting structure, as it (a) documents the phenomenon, (b) evaluates to what degree it represents a problem, (c) inquires what is being done to address it and (d) suggests additional measures. The weakest link in the chain, however, seems to be point (b) (i.e. measuring the impact of the problem), as I am not sure the case studies provided are enough to quantify this. I will try to make this clear in my main point below.

      Main point:

      • While the numbers of articles and citations in the first section of the study are probably sufficient to provide an overview of the use of citations, the 15 articles included as case studies in the second section are not. The authors seem to acknowledge this limitation, as they refrain from making a quantitative synthesis of these articles. That said, this leads this section of the manuscript to fall short in accurately presenting the importance of the problem. <br /> Although I found the visualization for each case study provided in Fig. S2 interesting, I would doubt that most readers will really make the effort to go through each one of them, much less be able to synthesize the data in their own heads to reach meaningful conclusions. Thus, I would strongly recommend that the authors provide some kind of quantitative synthesis of the problem in this section (i.e. What percentage of shortcut citations can ultimately be traced to the original reference? What’s the average number of steps? What percentage is behind a paywall? What percentage reaches a dead end or an insufficient description?).<br /> I note that 15 articles are probably too few for this purpose, and that the sample of articles in which citations are followed would have to be expanded. Thus, I would recommend that the authors perform a sample size calculation to reach the number of citations/articles that can provide reliable estimates within a given confidence interval. For this purpose, it’s worth noting that it would be desirable to perform synthesis both at the level of citations (i.e. what percentage of citations in the sample can be traced?) and at the level of articles (i.e. what percentage of articles in the sample have at least one untraceable citation?), as citations within a single article should not be considered as fully independent units when it comes to representing the whole population of citations. 
      Thus, using articles as units for the purpose of sample size calculation might be the better option.
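
      As a rough illustration of such a calculation, the R sketch below uses the standard normal-approximation formula for estimating a proportion, n = z² p(1 − p) / e², and then inflates it with a design effect, 1 + (m − 1) × ICC, to account for citations being clustered within articles. The function names, the margin of error, the number of citations followed per article (m) and the intra-article correlation (ICC) are all illustrative assumptions, not values taken from the manuscript.

      ```r
      # Citations needed to estimate a proportion p within +/- margin
      # (normal approximation, assuming simple random sampling).
      sample_size_proportion <- function(p = 0.5, margin = 0.05, conf_level = 0.95) {
        z <- qnorm(1 - (1 - conf_level) / 2)
        ceiling(z^2 * p * (1 - p) / margin^2)
      }

      # Articles needed when citations are clustered within articles.
      # m (citations followed per article) and icc are hypothetical values.
      articles_needed <- function(n_citations, m = 5, icc = 0.2) {
        design_effect <- 1 + (m - 1) * icc
        ceiling(n_citations * design_effect / m)
      }

      n_cit <- sample_size_proportion(p = 0.5, margin = 0.05)  # worst case: p = 0.5
      n_art <- articles_needed(n_cit, m = 5, icc = 0.2)
      print(c(citations = n_cit, articles = n_art))
      ```

      With p = 0.5 and a margin of 5 percentage points, the first function returns 385 citations under simple random sampling; the number of articles then depends strongly on the assumed clustering, which is why the choice of unit matters.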

      Other general points:

      • The categorization of scientific fields is somewhat strange: most people would probably consider neuroscience a subfield of biology, so presenting both as separate categories may puzzle some readers. I understand that this is a consequence of the JCR categories used, but making this clearer from the start (e.g. “examine the use of shortcut citations in neuroscience, biology and psychiatry journals” in the abstract) and perhaps referring to the biology journals as “general biology” would help to avoid confusion.<br /> Still on this point, the selection of fields is narrow and ad hoc. I understand that this is a limitation posed by the authors’ own expertise, but it is nevertheless one of the main weaknesses of the manuscript. Thus, the narrow range of scientific fields examined should probably be mentioned in the limitations section.

      • Even within this relatively narrow sample of fields, the kinds of methods that deserve a protocol probably vary a lot: I’d guess that psychiatry journals include a lot of surveys and instruments, while biology and neuroscience might have predominantly wet lab protocols. It would be interesting if somewhere in the paper (possibly in the example cases provided) we could get a feeling of what kind of “protocols” we are talking about, even if only in a general sense. If quantifying/classifying them is not feasible, at least some illustrative examples could be provided. Are we talking about methods to quantify proteins? Scales to measure depression? Electrophysiology setups for rodents? The citation culture probably depends a lot on the particular method, so the whole discussion sounds a bit disembodied without touching on this point somewhere.

      • Why are only minimum/maximum numbers of citations within shortcuts and the youngest/oldest citation coded? This looks like an approach to simplify data extraction, but it ends up providing very limited information (i.e. especially if there are many citations per paper, the oldest and youngest ones give very little information on the actual range).<br /> Moreover, this ends up making data visualization in Fig. 3 much less intuitive than it could be (i.e. it would be clearer and more informative to provide the full range of citation ages). If the authors could provide the full ranges (although I’m not sure that this is feasible), this would likely strengthen the paper. If not, I’d reconsider whether Fig. 3 should be included in the main results, as I don’t think the results as displayed say much about the sample of citations as a whole.

      • Some points in the case series description and discussion mention that some references “provided a description that was no longer state-of-the-art” and that this may be a problem. I don’t really get the idea here: methods citations are supposed to provide an accurate description of what was done in a study, not of the current state of the art of the method. In this sense, descriptions shouldn’t age badly or become “not state-of-the-art”.<br /> I understand the concern that a very old shortcut citation raises suspicions that it might not really describe what was done in the paper (as it is likely that no one uses certain methods in exactly the same way after 50 years). But if this is what the authors meant, this should be stated more clearly, as it is not really the impression that comes out of reading these passages.<br /> In the same vein, mentioning in the discussion that “supplemental methods cannot be updated” is technically correct, but is not a limitation in terms of making methods sections reproducible (which seems to be the point of the paper). For the purpose of methods description, whatever was used in a paper should remain static, even though the method may evolve in subsequent studies.

      • In terms of data sharing, one thing I could not find in the manuscript or in the OSF was the DOI and title of the articles used as case studies in Fig. S2. I may have missed it, but as there was no folder for the case series section I didn’t know where to look for it. As this seems important for reproducing the findings, this list should be provided somewhere (possibly as a document within the OSF) and cited within the text and legend to figure S2.

      Minor points:

      Introduction:<br /> - The correct name of the project mentioned in the first paragraph is Reproducibility Project: Cancer Biology (not “for Cancer Biology”).

      • “This risk of bias for randomization sequence generation and allocation concealment was unclear…” – this sentence seems odd (in particular the “This” at the start), please revise the wording.

      Figure 1:<br /> - Isn’t the methods section a viable alternative for sharing details needed to reproduce experiments as well? While I agree that in many cases a separate protocol may be a better option, that depends on the length of detail that is needed, which will vary greatly depending on the method. Therefore, I would argue that the methods section should be included as an option in the figure – saying that the information “should” be shared in a separate document sounds overprescriptive.

      • The second “readers” can be omitted from the third sentence of the figure legend.

      Methods:

      • Instead of citing the full OSF page for “protocols, data and code for the prevalence study and journal policy studies” using a single link, wouldn’t it make sense to cite a specific DOI for each of these resources? The same thing holds for points in the text in which specific resources are cited (e.g. “The full search strategy is available on the OSF repository” could point to a direct link to the search strategy rather than to the full OSF page).<br /> I think this is optional, as the Readme files in the OSF are clear. But providing specific links to each resource would be more consistent with the authors’ recommendation of providing pages for book citations, for example (in the sense of sparing the reader the trouble to search for a resource within a larger space).

      • What is meant by “top journals” exactly? Are those the ones with the highest impact factor in the JCR in their specific fields? Although this would be my guess, it is not clear from the description.

      • The data on whether papers were related to SARS-CoV-2 sounded rather gratuitous, as Covid-19 was not mentioned anywhere in the introduction. If this data is to be kept in the paper (I personally don’t think it adds much), the rationale for extracting this should be mentioned somewhere.

      • Though this eventually became clear, I initially had a hard time understanding what was meant by “number of citations per shortcut”. This could be made clearer when this variable is first introduced.

      • The description of a probable shortcut states that “additional details are not provided in the following sentences or elsewhere in the methods sections”. But what happens if the method is fully explained outside of the methods section (i.e. in the supplementary material or in a repository)? I was unsure how these cases were classified, so it’s probably worth commenting explicitly on it.

      • Electronic searches were performed using the terms “[journal name]”, “journal citation reports ranking”, “author guidelines”, “journal policy”, and “impact factor”. I don’t quite get what this search is meant to achieve. Why would one need to search for “impact factor” to look for policies?

      Results:

      Figure 2:

      • The different areas have different mean numbers of methods citations per paper (being somewhat higher in Biology). Thus, showing the results for different categories in percentages as in Fig. 2A may cause misleading impressions: although there are still fewer “How” citations in Biology than in Neuroscience or Psychiatry when measuring absolute numbers, the actual difference is smaller (while that in “Who or what” citations is even larger). Having the bars represent absolute numbers (possibly still displaying the percentage within the bars), with overall longer bars for Biology, would likely provide a more accurate impression of what’s going on.
      • It took me a while to understand the right panel in Fig. 2B. While the fact that the two sides of the violin plots represent different data eventually becomes clear, wouldn’t it make it easier on the reader to break the information for probable and possible citations into separate plots (especially as the left panel uses symmetric violin plots)?

      Tables S5 and S6:<br /> - Can’t the information in these tables be included in the legend for Fig.2 and Fig.3 (as it is relatively short and essentially synthesizes the data in the figures)? This is optional, but would leave the information in one place instead of creating a lot of supplements.

      Figure 5:<br /> - Are the categories in Fig.5A and 5D mutually exclusive? It would seem to me that a journal could encourage providing sufficient methodological details both in the author guidelines and as policy, and that they may encourage sharing methods in more than one place (i.e. repository or supplemental files). This is likely worth commenting on in the legend.

      Discussion:

      • I don’t think the Germany and California examples mentioned in Box 1 are needed: there are plenty of places in the world with much worse access, and these particular examples are not particularly representative of difficulties faced by the world at large.

      • While I agree with the recommendation to “make all methods publications open access”, I don’t think that there’s any particular reason why methods papers are different from the rest of science (in the sense that they should be open access), so I’m not sure the recommendation really belongs here.

      • The discussion about copyright issues described in the list of recommendations is long for an item in a list. Thus, it probably would fit better in the main text or in a box.

      Table S7:<br /> - I get the feeling that Table S7 would read better if lines and columns were reversed (i.e. methods as lines, features as columns), but it may be a matter of taste.<br /> - Why are supplemental files and protocol journals deemed static while shortcut citations are not? This does not make much sense to me.<br /> - I’d say supplemental files would generally be expected to have been peer reviewed. I agree that this is likely not always the case, but that probably depends more on the reviewer than on the journal (e.g. I don’t know of journals that explicitly exempt supplementary material from the peer review process), so I’d remove “depending on the journal”.<br /> - The comment “protocols remain available over time” made for repositories stands for all categories: it makes sense when comparing a protocol repository to a lab notebook, not to the other forms of describing protocols. Thus, I’d probably not include it as an advantage here.<br /> - I’d argue that both shortcut citations and supplemental files are “findable” for whoever’s reading the paper (which is likely what matters here), so I’d be inclined to remove this category.<br /> - Clinical journals are not the only ones to publish protocols as articles (the systematic review community has a tradition of publishing protocols, for example).

      Figure 6:<br /> - In the last no/no option, describing the method in the main text (if it is simple enough to fit) should also be included as an alternative.

    1. On 2022-11-30 22:07:30, user JJ wrote:

      This is a nice preprint looking at the effect of perceptual visuo-proprioceptive mismatch on reach control. The authors should be commended for sharing their data and code online. I downloaded them and was able to reproduce the different figures of the manuscript and the statistical analyses. The wiki on OSF is also very well documented and explains perfectly what the different data are. The figures in the preprint are beautiful.

      I also have a few comments:<br /> 1. Upon looking at the figures that you get from OSF, it appears that the variability in reach distance is much higher in the mismatch group than in the veridical group, even before the experimental manipulation. Are there any reasons for this?<br /> 2. I am a little bit puzzled by the statistical analysis. There are 20 reaches pre-manipulation and 20 reaches post-manipulation but only 5 reaches pre-manipulation are included in the analysis. Would it be better to have a 2x2x4 design with factors group, pre vs post and time instead of a 2x5 design (group x time) where pre and post times fall under the same factor? Are there any reasons to exclude the first 15 reaches pre-manipulation?

      Congratulations on this interesting preprint

      JJ Orban

    1. On 2022-10-24 11:08:05, user Anne Urai wrote:

      In this paper, Weilnhammer et al. tackle an intriguing question at the heart of decision-making theory: do observers use stable strategies, or do their strategies ('modes') change within a session? This paper analyses the autocorrelation in perceptual decision-making, and finds that it fluctuates between externally- and internally-driven strategies in ways that are strikingly similar between humans and mice.

      I very much enjoyed reading this work. The topic is exciting, and together with the Ashwood paper (which I see as a companion) this work will undoubtedly spur many future developments across psychology and neuroscience. The authors' commitment to open science is laudable, and I am glad to see them making such good use of open, large-scale, community-curated datasets. I do have a few questions that are important to fully understand and validate the authors' conclusions, and several minor suggestions.

      My expertise lies in the analysis of behavioral data, so I will mostly comment on main figures 1-4 rather than the model in figure 5.

      I thank the authors for also sharing this work as a preprint. As a signatory of Publish Your Reviews, I have committed to publish my peer reviews alongside the preprint version of an article. Specifically, this review will also be posted on the bioRxiv comments section. For more information, see publishyourreviews.org.

      Major questions<br /> 1. Do mode fluctuations have a characteristic timescale? I find terminology like 'oscillated as 1/f noise' (l. 985) a bit confusing: there may be frequency-specific oscillations, *or* a 1/f spectrum, but these are not the same (https://www.nature.com/articles/s41593-020-00744-x). While the 1/f characteristic is clearly shown in Figures 2D, 3D, 4D (and the authors discuss self-organized criticality), I could not reconcile this with some other aspects of the analyses and writing. First, are phase and coherence not usually computed as a function of frequency? If so, which frequency is used in Figure 2E,F? Methods section 7.3.4 and l. 148 could be elaborated for readers unfamiliar with spectral analyses of behavior (how should the units of coherence be interpreted?). Second, how can the simulation with a phasic mode switch at a single frequency (Figure 5A) give rise to a 1/f spectrum (Figure 4D), rather than a spectrum with an oscillatory peak (l. 425)? Third, there is a long history of investigating oscillations in perception (work by Fries, VanRullen, Kastner and many others, as the authors cite) albeit at a much faster timescale than shown here. Since the unit in this manuscript is trials (not time), are these two lines of work inherently incomparable, or can something be said about the typical trial length and therefore the interpretation of the best-fitting f = 0.11 in their model fits (l. 890/891)? Throughout the manuscript, it would help to distinguish oscillatory vs. 1/f. Alternatively, I may fundamentally misunderstand the results, in which case further elaboration and clarification would be great.<br /> 4. It would be very helpful if the full set of figure panels (as in Figures 2, 3, 4) was reproduced for each of their control simulations (S4, S6, S7), to better compare these models' behavior against human and mouse data.
      This would increase confidence in the main findings, and further pinpoint what exact behavioral signatures are unique to bimodal inference (rather than arousal fluctuations or decision bounds). I would like to suggest further control analyses to strengthen the existing ones. First, to conclude that internal-external mode fluctuations do not reflect periods of 'limited capacity', 'energy budget' or 'unstructured neuro-cognitive noise' (section 5.3), can you simulate and fit data from a process that additionally (or only) has periods of low/high perceptual sensitivity or task engagement (for instance, simulating high lapses)? Especially in the mouse data (as shown by Ashwood), there are likely periods of disengagement from the task, e.g. when mice become satiated towards the end of the session. One prediction may be that this would lead to more errors in history-congruent modes (as in 3A). Second, recent work (reviewed in https://doi.org/10.51628/001c.35908) has shown that slow drifts in decision boundary, without any strategic history-dependent updating, may give rise to statistical confounds and apparent history-dependence. It is difficult to intuit how such a process may affect the analyses presented here: could the authors simulate a process with only a slowly drifting bound (beyond the static response bias in section 5.4)?
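
      To make the distinction raised in the first major question concrete, the following R sketch contrasts the two cases on synthetic data only: a sinusoid at 0.11 cycles per trial embedded in white noise, and a signal constructed to have power proportional to 1/f. The series length, the noise level and the way the 1/f signal is built are assumptions chosen for illustration; they are unrelated to the authors' data or model.

      ```r
      set.seed(1)
      n     <- 512                    # number of trials (hypothetical)
      trial <- 0:(n - 1)
      k     <- 1:(n / 2 - 1)          # Fourier indices below the Nyquist frequency
      freq  <- k / n                  # frequency in cycles per trial

      # Raw periodogram evaluated at the Fourier frequencies in `freq`.
      periodogram <- function(x) (Mod(fft(x))^2 / length(x))[k + 1]

      # (a) Oscillatory process: sinusoid at 0.11 cycles per trial plus white noise.
      x_osc <- sin(2 * pi * 0.11 * trial) + rnorm(n)

      # (b) 1/f process: sum of cosines with power proportional to 1/f
      #     (amplitude proportional to 1/sqrt(f)) and random phases.
      phase <- runif(length(k), 0, 2 * pi)
      waves <- cos(outer(trial, 2 * pi * freq) + rep(phase, each = n))
      x_1f  <- as.vector(waves %*% (1 / sqrt(freq)))

      # Signature of (a): a spectral peak at a characteristic frequency.
      peak_osc <- freq[which.max(periodogram(x_osc))]

      # Signature of (b): a straight line in log-log coordinates, no peak.
      slope_1f <- unname(coef(lm(log(periodogram(x_1f)) ~ log(freq)))[2])

      print(c(peak_frequency = peak_osc, loglog_slope = slope_1f))
      ```

      The oscillatory series shows a narrow peak near 0.11 cycles per trial, whereas the 1/f series has no preferred frequency and its log-log spectrum falls along a line with slope close to -1.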

      Minor suggestions<br /> 1. I applaud the authors for sharing their full workflow on OSF. However, I did find the format (all files in a zip) a bit difficult to work with: for instance, it's not possible to view the code in-line without downloading. To further increase the usefulness of the codebase to others, consider exploring ways to present the code in a way that allows easy re-running, inspection and versioning (e.g. in a notebook form, or at least with the scripts on GitHub). Also, comments on how to use the files (where to start? how to install/run? what are the dependencies? what version of R?) would be of great help to others who want to implement the same method.<br /> 3. Could the authors show standard (and history-conditioned) psychometric curves in both modes? This would show if there are considerable lapses, which can bias the estimation of history-dependent logistic regression models (see http://www.journalofvision.org/lookup/doi/10.1167/14.7.9). If lapses are considerable, this may need to be taken into account in the model (or at least simulated to check that it doesn't introduce confounds). Related to this, how many human studies did not have stimulus strength information (and thus presumably only one level of task difficulty), and do the results look the same without studies with these missing data?<br /> 17. The authors recognize that trial-to-trial variations in stimulus strength (i.e. task difficulty) are a major driver of choices, and account for this in their control analysis. However, when defining stimulus-congruence, this is (as far as I can tell) only done based on the sign (i.e. a binary indicator), thus removing the _degree_ of stimulus congruence. Would the results look the same if stimulus-congruence was instead coded as a continuous variable, i.e. being more congruent when stimulus strength is high?<br /> 18.
Why not show the logistic regression results in Figure 2B, which takes into account several confounds that are now hidden in the supplement?<br /> 19. Why do the mixed effects models only have random intercepts, and not random slopes? It seems that sensory and history dependence vary substantially across observers, which random slopes could capture.<br /> 20. Consider visualizing the phase information (Figure 2E) on a circular plot. To better interpret the coherence and phase information (see my first main question), it would be very helpful to discuss whether the two modes are anti-correlated/alternate at a specific frequency.<br /> 21. Please define 'infra-slow' when first used (see first main question). Is this characterized by a specific frequency range?<br /> 22. I am unsure if the selection of IBL mouse sessions may affect these results. Specifically, the authors here use a simple performance criterion of 80% in easy trials. However, in this task contrasts were introduced gradually, meaning that 80% correct on easy trials may happen early on in training as well as very late (with very different contrast sets). In Figure S5, could it be indicated which sessions were incorporated into the main analysis? Would the results hold if using a more stringent criterion to consider animals 'trained', as proposed in the original IBL paper (https://github.com/int-brai..., which also incorporates bias and lapses)? A related point is that TDs are a lot larger (Figure 3H) than in the original paper (IBL et al. 2021, Figure 3 - supplement 2A), which may be remarked upon.<br /> 23. Also Figure S5: it would be of great interest to also see the within-session changes in mode, for mice as well as humans (see also my main question above on satiety).<br /> 24. The green plots in Supplementary Figure 2 are interesting but also a bit worrying, showing all sorts of autocorrelation in the stimulus sequences that make the paper's conclusions trickier to assess. 
      The authors already discuss some features of the IBL task design that introduce specific autocorrelation patterns (i.e. post-error bias correction). Is such information, describing the specific algorithms for sequence generation, available for the studies from the confidence database? And could the authors relate specific stimulus sequences to the behavioral modes they observe?<br /> 25. In section 5.7, it should be noted that mice did receive single-trial feedback. How about humans? Splitting the confidence database into those studies with and without single-trial feedback could be used to nicely test the predictions in line 549 / Figure 5.<br /> 26. RT distributions are characteristically long-tailed, which can strongly affect scaling them. I am a bit confused that all values in Figure 2H lie below zero: should this not be zero-meaned? Was a transform (e.g. log) used before z-scoring? If not, could the authors show the RT distributions per study before and after outlier removal and scaling, to give a better sense of the distributions that were used in the analyses (or replicate 2H without normalization so that the real RT units are visible, as in 3H)? It would also be nice to add the range of individual cutoff values used for exclusion criteria (l. 776).<br /> 27. To make the magnitude of weights in figure 2B, 3B and 4B easier to interpret, consider adding the weight for sensory stimuli (see Abrahamyan figure 3; this may need to be on a different y-scale).<br /> 28. lines 436-445: are these correlations linear, or may there be quadratic relationships between posterior certainty and RT, confidence, TDs? A supplementary figure would be nice.<br /> 29. Very minor: I had some issues printing the figures, likely due to many transparent datapoints in the pdf. For the final version, consider exporting the figures to a high-resolution bitmap format to reduce the size.<br /> 30. The work contains a couple of remaining typos (e.g. 'ressource'), duplicate words, etc.<br /> 31. l.
      148, S2F -> 2F.<br /> 32. l. 653, where in the Ashwood paper is this number of >100 trials mentioned? As far as I can tell, they only analyze the first 90 trials of each session (see their figure 3E).<br /> 33. Some references should be updated: for instance, 21 is now published in eLife, and 12 & 66 point to the same paper (published and preprint). Please check all the references to make sure that they point to the most recent versions.<br /> 34. Consider adding an author contribution statement and/or displaying this visually (see e.g. https://twitter.com/Steinme..., https://elifesciences.org/a... figure 6). Also, is there a reason why one author is now omitted, compared to the first bioRxiv version?
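
      On the point about RT scaling (item 26), a small R sketch with simulated data shows why z-scored long-tailed RTs can sit mostly below zero even though their mean is zero, and how a log transform before z-scoring changes this. The lognormal parameters and the helper name `zscore` are arbitrary assumptions for illustration, not estimates from the Confidence Database or the authors' pipeline.

      ```r
      set.seed(1)

      # Hypothetical long-tailed reaction times in seconds (lognormal).
      rt <- rlnorm(5000, meanlog = log(0.6), sdlog = 0.5)

      # Standardize to zero mean and unit standard deviation.
      zscore <- function(x) (x - mean(x)) / sd(x)

      z_raw <- zscore(rt)        # z-scored on the raw scale
      z_log <- zscore(log(rt))   # log-transformed first, then z-scored

      # The mean is zero in both cases; the median shows where most trials lie.
      print(c(mean_raw   = mean(z_raw),
              median_raw = median(z_raw),
              median_log = median(z_log)))
      ```

      On the raw scale the long right tail pulls the mean above the bulk of the distribution, so the median z-score is negative; after the log transform the median sits close to zero.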

      Further questions<br /> As with all fresh ideas, this work raises many more questions than it can answer: while beyond the scope of this manuscript, I list some here so that they may be of use to the community.

      • Especially together with the paper by Ashwood et al., the obvious next question is the structure of mode fluctuations: are there two modes (as suggested here) or more (as suggested by Ashwood)? Do these switch in a discrete vs. continuous way?
      • At what timescale do states/modes change? How does this relate to even slower timescales in biology, e.g. at the level of circadian rhythms?
      • It is fantastic to see that large-scale databases are increasingly being used for cross-species comparison. In a way, it's a shame that these exist only for humans and mice. Are there plans or efforts to collect and publish similar databases from non-human primates (where many, many trials of perceptual decision-making tasks have been collected over the years)?

    1. On 2022-05-05 17:55:24, user Andrew Hooyman wrote:

      Hello!

      Very interesting paper!

      The authors may want to consider using a different formula for fitting their learning curves. Asadayoobi et al. (2021) seems specific to exponential learning, whereas your data appear logarithmic (Fig 1). I think if you consider modifying the formula slightly you will get a better fit.

      Consider the following

      fp = fatigue parameter<br /> l = learning rate<br /> T1 = Initial Performance<br /> n = trial number<br /> That(n) = predicted correct inputs at trial n

      To calculate fatigue, fg = exp(fp*(n^fp-1))<br /> To predict correct inputs at n, That(n) = l*log(n)-fg+T1

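      Example code that fits the session 1 performance in your data (Fig 1):

      ```r
      fp <- 0.3                        # fatigue parameter
      l  <- 6                          # learning rate
      T1 <- 15                         # initial performance
      n  <- 1:36                       # trial numbers
      fg <- exp(fp * (n^fp - 1))       # fatigue term
      That1 <- l * log(n) - fg + T1    # predicted correct inputs per trial
      plot(That1)
      ```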

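      Example code that fits the session 2 performance in your data (Fig 1):

      ```r
      fp <- 0.5                        # fatigue parameter
      l  <- 5                          # learning rate
      T1 <- 25                         # initial performance
      n  <- 1:36                       # trial numbers
      fg <- exp(fp * (n^fp - 1))       # fatigue term
      That2 <- l * log(n) - fg + T1    # predicted correct inputs per trial
      plot(That2)
      ```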

      This model is similar to Park and Schweighofer (2017) https://pubmed.ncbi.nlm.nih...<br /> However, they also model an exponential fit. If you change the model in that paper from exponential to logarithmic, you get a result similar to what I have done above, i.e. there is precedent for this type of learning/fatigue model in the literature for fitting human behavioral data.
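
      Rather than setting the parameters by hand, they could be estimated by least squares. The sketch below wraps the formula given earlier in this comment in a function and minimizes the squared error with `optim`. The function names, the starting values and the simulated data (standing in for the real per-trial scores, which are not reproduced here) are assumptions made for illustration only.

      ```r
      # Predicted number of correct inputs at trial n.
      # Note: in R, n^fp - 1 is evaluated as (n^fp) - 1.
      predict_correct <- function(n, fp, l, T1) {
        fg <- exp(fp * (n^fp - 1))   # fatigue term
        l * log(n) - fg + T1
      }

      # Sum of squared errors between observed and predicted performance.
      sse <- function(par, n, y) {
        pred <- predict_correct(n, fp = par[1], l = par[2], T1 = par[3])
        sum((y - pred)^2)
      }

      # Hypothetical data: simulated from the session 1 values plus noise.
      set.seed(1)
      n <- 1:36
      y <- predict_correct(n, fp = 0.3, l = 6, T1 = 15) + rnorm(length(n), sd = 0.5)

      # Nelder-Mead search (the optim default) from rough starting values.
      start <- c(fp = 0.2, l = 4, T1 = 10)
      fit   <- optim(start, sse, n = n, y = y)
      print(round(fit$par, 2))
      ```

      Because the fatigue term equals 1 at the first trial, the model predicts T1 - 1 correct inputs at n = 1, which is worth keeping in mind when interpreting T1 as initial performance.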

    1. On 2022-03-28 08:35:45, user Nikola Stikov wrote:

      Disclaimer: I was asked to review this paper by the author, focusing on the parts of the experiment where I consider myself to be an expert. Therefore, I will comment primarily on the MR physics/interpretation, whereas the morphometric analysis and anatomical interpretations are better suited for another reviewer.

      Recommendation: Major revision

      This is an ambitious article that helps 7T deliver on its promise of high-resolution MRI. The authors acquired T1 and T2* weighted images at 0.35mm isotropic resolution on 5 healthy participants. They implemented a correction for blood motion artifacts, as well as a cortical analysis toolbox for visualizing and characterizing the cortical surface. Additionally they fitted for T1 and T2* maps. The authors have shared the data and the code, making it easier to fill in some gaps that are present in the manuscript. The authors believe that their work will bring mesoscopic imaging to clinical practice because of (i) their protocol, (ii) their post-processing tools and (iii) their high-quality qMRI datasets.

      The images are impressive, even though the coverage is limited (~3cm in the inferior-superior direction). The correction works well, and the analysis appears robust, even though some basic SNR/CNR measurements are missing. I also believe that it is a little too early to perform quantitative MRI on such datasets (see points 3 and 4 below). I would suggest performing phantom studies before moving on to in vivo qMRI maps. Below are some comments/suggestions to improve the manuscript.

      MAJOR COMMENTS

      1. I was missing a comparison with other state-of-the-art methods. In particular, a discussion of SNR would have been very useful to evaluate the efficiency of the scans. While the protocol is relatively fast, the coverage is also limited, and a visual assessment is not enough to convince me of the quality of the images. A proper SNR analysis combined with histograms would be useful. For example, in Figure 2 it appears that compositing reduces the contrast. It is not clear what happens to the SNR and CNR; it would be important to report those numbers and how much the post-processing changes them.

      2. Related to the above point: the authors share the raw data, but these are very large datasets and OSF download speeds are low. After downloading the zipped data I received a message that I cannot expand it (screenshot), so I was not able to inspect the images. Also, the shared scripts appear well-documented, but I was not able to reproduce the analysis because of too many dependencies. A combination of Jupyter notebooks and containers (possibly via MyBinder) would make it much easier to reproduce the analysis.

      3. Figure 1: The quantitative MRI maps do not have units/colormaps, so it is very difficult to evaluate them. For example, there appears to be a strong B1 effect, so more details about the B1 transmit/receive would be important for qMRI validation. Why weren’t field maps acquired? Also, the 2D histograms are informative, but for qMRI it is essential to conduct scan-rescan to determine the stability of the images and accompanying maps.

      4. Related to the above point, in Figure 9 there seems to be a bias in the T1 values between the left and right Heschl’s gyrus. And in Figures 8 and 10 one can notice that the extreme percentiles (red lines) of one subject interact with the medians (white lines) of another subject. Is there a physiological interpretation for this, or could it be a field effect? In general, without reliable B0/B1 one could have significant non-uniformities and offsets that could translate into inaccurate quantitative MRI maps.

      5. p. 17. What was the reason for the chosen bandwidth and the 6/8 partial Fourier? Was there a time limit to the acquisition? On p. 17 the authors argue that they were hoping to reduce acquisition times, but neither of the two choices above is a big time-saver, and relaxing them would improve the quality of the images.
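
      The SNR/CNR measurement requested in major comment 1 could be sketched as follows. This is a minimal illustration under one common ROI-based definition (tissue mean divided by the standard deviation of a noise region). The function names and the voxel intensities are hypothetical and are not taken from the manuscript or the shared dataset. With parallel imaging the noise varies spatially, so a single background ROI is only a rough estimate.

```python
from statistics import mean, stdev


def snr(signal_roi, noise_roi):
    """Mean of a tissue ROI divided by the standard deviation of a noise ROI."""
    return mean(signal_roi) / stdev(noise_roi)


def cnr(roi_a, roi_b, noise_roi):
    """Absolute difference of two tissue means divided by the noise standard deviation."""
    return abs(mean(roi_a) - mean(roi_b)) / stdev(noise_roi)


# Hypothetical voxel intensities (arbitrary units), for illustration only.
gray_matter = [410.0, 395.0, 402.0, 420.0, 398.0]
white_matter = [300.0, 310.0, 295.0, 305.0, 290.0]
background = [4.0, 6.0, 5.0, 3.0, 7.0]

print(f"SNR (gray matter): {snr(gray_matter, background):.1f}")
print(f"CNR (gray vs. white): {cnr(gray_matter, white_matter, background):.1f}")
```

      Running the same two functions on the images before and after compositing would give the numbers needed to judge how much the post-processing changes SNR and CNR.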

      MINOR POINTS AND TYPOS

      The field strength and the manufacturer are not mentioned anywhere in the manuscript. The information is easy to tease out from the supporting documents, keywords and acknowledgements, but it would be useful to have it upfront.

      p.15 the triangular meshes is not well -> are not well

      p.15 We have implemented our geometric approach within the LayNii software suite (Huber et al., 2021) and have made them -> made it

    1. On 2022-01-05 12:44:29, user Renzo Huber wrote:

      This manuscript describes a layer-fMRI sequence comparison study to determine which experimental approach is best suited for cognitive working memory tasks with decoding analyses in the primary visual cortex.

      The research question of the best layer-fMRI sequence and corresponding ‘benchmarking’ of fMRI sequences has been one of the most debated topics in the field. Over the last three decades, there have been several dozen such studies in the context of sub-millimeter fMRI resolutions. Those studies focused on investigations of the biophysical properties (mechanisms) in preclinical models, as well as investigations of their utility in the experimental setup of human cognitive neuroscience experiments. Unfortunately, none of these studies has yielded generalisable results about the superiority of a given sequence over other ones, and none of these studies resulted in a general consensus in the field.

      Despite the abundance of many previous sequence comparison studies, I think the study of this manuscript is valuable to the emerging field of layer-fMRI in multiple aspects:<br /> -> The manuscript highlights that the overall main layer-dependent fMRI modulation is consistent across sequences. This is an informative confirmation that the results from previous application studies can be trusted, no matter which specific sequence was used. This confirmation is valuable to the field. <br /> -> The manuscript confirms that MVPA analyses might be viable in layer-fMRI for contrasts beyond GE-BOLD. This finding is not established so far and thus, such results represent valuable information in the field.<br /> -> The study uses a highly unconventional experimental design of 4 participants scanned across 6 sessions each. I find this way of ‘deep’ data acquisition very valuable. And I think it is a good reminder to the field that layer-fMRI should not be judged with the standards of common fMRI of many participants with very short scans.

      I have a few concerns about this manuscript, listed below:

      1.) Concerns about sequence comparison studies, in general: ###

      ##################################################

      Personally, I am an opponent of sequence comparison studies, which aim to find one single ‘winner sequence’ that is superior to other sequences. I find this way of thinking distracting and unhealthy for the field. My thoughts are laid out here:<br /> -> 5 min overview video: https://youtu.be/m6TMjm620JI<br /> -> Blog post: https://layerfmri.com/the-b...<br /> For the reasons mentioned in the links above, I do not agree with the conclusion that any given sequence can be most favorable for “typical cognitive neuroscience”. <br /> The crux in sequence comparison studies is that there is no such thing as a universal “the VASO” or “the SE-EPI”. Sequences are not like mysterious phenomena of mother nature that can be examined for generalizable conclusions. Instead, sequences are completely man-made engineering products and sequence implementations can differ substantially from one another. E.g., the most widely used VASO sequence of VE (Terra) is quite different from most VASO sequences on VB (‘old’ 7T). And similarly, a ‘classical’ single-shot SE-EPI sequence will provide a very different sensitivity-specificity compromise compared to a more modern segmented SE-EPI method (Han 2021). In fact, I believe that in this study, the favorable sensitivity of the SE-EPI sequence is mostly driven by T2’-contaminations of the long echo train length and the unwanted T2* weighting in outer k-space lines. And I expect that the conclusions of this study would be quite different if it were carried out with faster readout speeds (smaller matrix size, or modern head-gradients).

      2.) Concern about considering sequences as isolated entities ###

      ##################################################

      I have the impression that the manuscript’s reasoning inherently implies that for common application studies one single sequence would be used. I have the impression that this manuscript views sequences as competitors. This is against the trend in the field. I think it has become a standard in the field of layer-fMRI to not rely on one single contrast. Instead, it has become quite common in cognitive neuroscience studies to perform the main experiments with one specific sequence, while showing similar results (e.g. in the Supplementary Information) with another sequence. This is usually done to control for expected vascular biases. <br /> E.g., the following list of studies used high-level cognitive neuroscience tasks with multiple sequence contrasts.<br /> -> Muckli 2015 (SE based GRASE and GE)<br /> -> Moerel 2018 (SE based GRASE and GE)<br /> -> Finn 2019 (VASO and GE)<br /> -> Zamboni 2020 (SE based GRASE and GE)<br /> -> Liu 2020 (SE based bSSFP and GE)<br /> The manuscript here seems to imply that those studies are not “typical”. This manuscript does not highlight the possibility that a collection of sequences might be advantageous for neuroscience experiments. As such, the manuscript does not acknowledge that the SS-SI VASO approach provides more than just a single CBV contrast. In fact, it also provides a GE BOLD contrast “for free”. This BOLD signal that is concomitantly acquired with VASO is not shown here. <br /> -> Maybe it would be helpful to judge SS-SI VASO not based on CBV alone. Just disregarding all the desired sensitivity in the second contrast appears quite inefficient to me. Maybe the SS-SI VASO sequence should be judged based on a combination of both provided contrasts. <br /> -> Maybe it would be helpful to include the BOLD signal of the SS-SI VASO sequence in the manuscript. This might also be helpful to address potential confounds of comparing 2D vs. 3D readouts?<br /> -> Maybe it would be appropriate to explicitly mention how the authors define a “typical cognitive neuroscience experiment” and maybe they can discuss how representative this definition is within the entire landscape of functional neuroimaging studies and layer-fMRI? In my humble understanding of ‘typical neuroscience’, I believe it is quite common in cognitive neuroscience to utilize a large number of stimulation tasks. Most cognitive experiments combine the main task conditions with multiple ‘control’ task conditions. I do not understand why it should not be ‘typical’ to combine one main sequence contrast with an additional control sequence.

      3.) Concern about bias against VASO ###

      ###############################

      In my opinion, the focus of the study and the phrasings are not really balanced. It appears to me that the interpretation and the summaries are tuned against VASO.<br /> -> Acquisition choices are different for VASO and BOLD in ways that make VASO look worse than it is. As such, 2D-EPI and 3D-EPI are expected to have different sensitivities to physiological noise in superficial layers. This different physiological noise pattern is phrased as a weakness of VASO, while it might be a weakness of 3D readouts over 2D readouts?<br /> -> Analysis choices are different for VASO and BOLD in ways that make VASO look worse than it is. E.g. shorter time periods are included in the analysis (on top of the already longer TRs for VASO). This is done despite the fact that VASO is known to have a slower HRF. When using fewer time points in VASO, it is not unexpected that this comes along with reduced statistical power. This choice makes VASO perform worse.<br /> -> A lack of a known ground truth activation profile is used to interpret any difference between sequences in favor of BOLD. There are few differences of layer-profiles between contrasts. Without established consensus in the field about which layer-profile is expected, the authors imply that VASO provides a wrong answer and the other contrasts provide a right answer. E.g., the different significance levels at superficial layers are interpreted as a false negative in VASO rather than a false positive in SE-BOLD.<br /> -> The overall effects and layer-dependent modulations are very similar across contrasts. In the highlight section, this similarity is phrased as a proof for the usability of GE-BOLD. However, it could analogously be phrased as proof for the usability of VASO. <br /> -> Code development credit is given on analysis software and CMRR sequences, but not to the developers of VASO, nor to the developers of 3D-EPI.

      4.) Concern about missing sequence details ###

      ####################################

      Layer-fMRI is a relatively young field. There is no general consensus on how to optimize any of the tested sequences for a given application. While each acquisition parameter has huge implications on localization specificity and detection efficiency (sensitivity), the respective sequences are treated here rather as a ‘black box’. I think the generalizability claims of the conclusions in this study can be improved by including more details on the implementation and version number of the sequences compared here:<br /> -> The GE-EPI and SE-EPI sequences are described as CMRR-sequences with a reference to Moeller et al. Maybe it would be appropriate to mention the version number? <br /> -> It is not mentioned which MAGNETOM 7T scanner was used. The new “Terra MAGNETOM 7T”, or the “classical MAGNETOM 7T”?<br /> -> In order to give the chance to judge the magnitude of the T2’ contaminations, it would be helpful to explicitly state the echo train length. How long does it take for the EPI to get from the outermost k-space line to the k-space center? I believe that this is the single most important parameter that determines the T2’ contaminations and thus the SE-BOLD protocol's sensitivity. <br /> -> The VASO sequence is not accompanied by corresponding sequence developer credits, nor is it accompanied by a reference (like the other sequences). This suggests that it was developed by the authors. The code-availability statement declares that all the code will be made available and it is explicitly mentioned that the authors have an institutional agreement with SIEMENS. Thus, I would recommend that the authors state how they aim to share their sequence code. I am specifically mentioning this because I am (myself) an interested VASO user and would like to compare their sequence to my own. <br /> -> I am confused about the stated inversion time of 650ms in VASO. This is well below the blood nulling time. And it can be even below the GM-nulling time. At such short TIs, the VASO contrast can become positive (CBV increase comes along with a signal increase). Maybe the authors are confusing inversion time and inversion delay? Maybe the authors use a central acquisition ordering (not linear)?<br /> -> A flip angle of 26 degrees seems unconventionally large without the use of VFAs. It’s almost double the Ernst angle for such protocols. This is expected to significantly lower the sensitivity of VASO. Maybe this needs to be discussed when judging the sensitivity of the VASO results. Maybe the authors did implement a VFA-approach and are just not mentioning it?<br /> -> The MP2RAGE and the VASO data are acquired with a 3D-EPI readout. It is not described which sequence baseline is used here. Given that Nikolas Weiskopf is in the author list, it can be assumed that it is a 3D-EPI baseline developed by Lutti et al (2013). I think it would be appropriate to give credit to the developer. Personally, I think it would be similarly appropriate to also acknowledge the first 3D-EPI MP2RAGE study by van der Zwaag (2018). Given the extremely short echo time and bandwidth in the 3D-EPI MP2RAGE, I think it would be appropriate to explicitly mention the segmentation factor of the 3D-EPI MP2RAGE. Was the 3D-EPI still distortion-matched?
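
      The flip-angle point above can be checked with a small calculation. The sketch below uses the textbook Ernst-angle relation cos(θ_E) = exp(-TR/T1) and the ideal spoiled gradient-echo steady-state signal. The repetition time between excitations and the T1 value are hypothetical placeholders, since neither is stated in this comment, and the steady-state formula ignores the inversion preparation used in VASO, so the numbers are only indicative.

```python
import math


def ernst_angle_deg(tr_ms, t1_ms):
    """Flip angle (degrees) that maximizes the ideal spoiled gradient-echo signal."""
    return math.degrees(math.acos(math.exp(-tr_ms / t1_ms)))


def spoiled_gre_signal(flip_deg, tr_ms, t1_ms):
    """Relative steady-state signal of an ideal spoiled gradient echo (T2* decay ignored)."""
    e1 = math.exp(-tr_ms / t1_ms)
    alpha = math.radians(flip_deg)
    return math.sin(alpha) * (1.0 - e1) / (1.0 - e1 * math.cos(alpha))


# Hypothetical values: time between excitation pulses and tissue T1 (assumptions).
tr_ms = 60.0
t1_ms = 1900.0

theta_e = ernst_angle_deg(tr_ms, t1_ms)
ratio = spoiled_gre_signal(26.0, tr_ms, t1_ms) / spoiled_gre_signal(theta_e, tr_ms, t1_ms)
print(f"Ernst angle: {theta_e:.1f} degrees")
print(f"Signal at 26 degrees relative to the Ernst angle: {ratio:.2f}")
```

      Substituting the actual excitation spacing and an appropriate T1 would show how far the chosen 26 degrees lies from the optimum for this protocol.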

      5.) Concern about misalignment between discussion and abstract/conclusion ###

      ##############################################################

      In the discussion section, the authors state that their findings might be related to the fact that “the experimental regime {was} particularly suited for SE-BOLD”. Thus, the authors acknowledge the limited generalizability of their results. Yet still in the abstract and conclusion sections, the manuscript has general statements saying that SE-BOLD was favorable for typical cognitive neuroscience.

      6.) Concern about contradiction with previous literature ###

      #############################################

      There have been quite a few similar studies already that compared VASO, spin-echo based methods, and GE-BOLD within the same participants (Beckett 2020, Haenelt 2020, Huber 2017, Jin 2008). Unlike in this manuscript, none of these previous triple-contrast comparison studies recommends the usage of spin-echo based methods for neuroscience applications. These previous studies are not acknowledged here and the contradicting findings are not discussed.

      7.) Concern about lacking ground truth layer profiles ###

      ##########################################

      One important difference between the manuscript at hand compared to previous studies is the usage of an employed neuroscience task without an established ground truth of an expected spatial activation signature across cortical depth.<br /> Here, the authors used a working memory orientation discrimination task to evoke feed-back activity. However, it is not clear if the feedback is expected predominantly in the deeper layers (as suggested by Kok et al) or if feedback is predominantly expected in superficial layers (as suggested by Muckli et al.). Thus, in light of a different layer-profile across contrasts, it is not clear which one is the right one. It is not clear if a different significance level in the superficial layers of VASO and SE-BOLD (a) is due to a false negative result in VASO or (b) if it is due to a false positive result in SE-BOLD. Here VASO does not find significant differences in the superficial layers and the authors interpret this as a false negative result. This interpretation is taken despite the fact that such a layer-profile is expected from Kok et al.<br /> Personally, I would consider the possibility that an unwanted T2’-related vein bias in SE-BOLD results in a false positive result in superficial layers.

      8.) Concern about unclear effects of decoding analyses with novel sequence approaches ###

      ########################################################################

      Decoding analyses are not very common in layer-fMRI. In non-GE BOLD sequences, they have never been used beyond exploratory purposes, I think. I have a few concerns about potential mechanisms that might introduce biases with MVPA in layer-fMRI applications.<br /> -> In my humble understanding of MVPA, it is quite sensitive to both: (1) magnitude of the spatial signal patterns, and (2) magnitude of the noise. In layer-fMRI, each layer is expected to have a different noise level, and a different relative mixture of thermal noise and physiological noise. Thus, it is hard to interpret whether the different classification accuracy layer-profiles of the different sequences are dominated by actual brain information represented, or if they are rather dominated by the different sensitivity to physiological noise across cortical depth. Since the relationship of tSNR and classification accuracy is highly non-linear between chance-level and 100% accuracy, a straightforward subtraction analysis between task conditions is not viable. Maybe it would be appropriate to confirm that the tSNR profiles are the same across contrasts?<br /> -> In SS-SI VASO, the CBV time course is estimated by means of temporal interpolation of multiple surrounding control images. This means that information content is expected to be leaked across 2-3 neighboring time points (dependent on the temporal interpolation function). This might offset the baseline and significantly bias the results. Thus, MVPA might not be straightforwardly applicable on standard SS-SI VASO processing pipelines. Those standard pipelines have been solely optimized for univariate analyses. Alternative processing pipelines of TR-shifting across trials might be better suited (see Chaimow et al. https://layerfmri.com/baddi/). An even better suited MVPA-optimized VASO approach without temporal interpolation would be a non-SS-SI approach with multi-echo acquisition and in-plane segmentation.
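
      The tSNR check suggested above could be sketched as follows. This is a minimal illustration that takes tSNR as the temporal mean divided by the temporal standard deviation of a layer-averaged time course. The layer names and signal values are hypothetical, and real data would normally be detrended first.

```python
from statistics import mean, stdev


def tsnr(time_series):
    """Temporal SNR: mean over time divided by the standard deviation over time."""
    return mean(time_series) / stdev(time_series)


# Hypothetical layer-averaged time courses (arbitrary units), for illustration only.
layer_time_courses = {
    "deep": [100.0, 101.0, 99.5, 100.5, 99.0, 100.0],
    "middle": [120.0, 122.0, 118.0, 121.0, 119.0, 120.0],
    "superficial": [150.0, 156.0, 145.0, 153.0, 147.0, 149.0],
}

for layer, series in layer_time_courses.items():
    print(f"{layer}: tSNR = {tsnr(series):.1f}")
```

      Computing such a profile separately for each contrast would show whether differences in decoding accuracy across depth coincide with differences in noise level.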

      Conflict of interest statement ###

      ########################

      My career is perceived to be connected to the VASO sequence. While I invest about 90% of my work hours on non-VASO related layer-fMRI topics, my reputation in the field is tightly coupled to the VASO contrast. Thus, it might hurt the perception of my reputation in the field if the VASO contrast is described to be inferior. <br /> I also want to disclose that I have changed my mindset about sequence comparison studies, along the course of the last decade. Nowadays, I am an opponent of sequence comparison studies. Though, in earlier stages of my career, I have regretfully wasted way too much time on it myself.

      References ###

      ##########

      Beckett AJS, Dadakova T, Townsend J, Huber L, Park S, Feinberg DA. Comparison of BOLD and CBV using 3D EPI and 3D GRASE for cortical layer functional MRI at 7 T. Magn Reson Med. 2020;84(6):3128-3145. doi:10.1002/mrm.28347

      Finn ES, Huber L, Jangraw DC, Molfese PJ, Bandettini PA. Layer-dependent activity in human prefrontal cortex during working memory. Nat Neurosci. 2019;22:1687–1695. doi:10.1038/s41593-019-0487-z

      Haenelt D, Weiskopf N, Vaculciakova L, et al. Mapping Ocular Dominance Columns in Humans Using GE-EPI, SE-EPI and SS-SI-VASO at 7 T. In: Proc Intl Soc Mag Reson Med. ; 2020:1230. https://cds.ismrm.org/prote....

      Han S, Eun S, Cho H, Uludağ K, Kim S. Improvement of sensitivity and specificity for laminar BOLD fMRI with double spin-echo EPI in humans at 7 T. Neuroimage. 2021;241(241):118435. doi:10.1016/j.neuroimage.2021.118435

      Huber L, Handwerker DA, Jangraw DC, et al. High-Resolution CBV-fMRI Allows Mapping of Laminar Activity and Connectivity of Cortical Input and Output in Human M1. Neuron. 2017;96(6):1253-1263.e7. doi:10.1016/j.neuron.2017.11.005

      Jin T, Kim SG. Improved cortical-layer specificity of vascular space occupancy fMRI with slab inversion relative to spin-echo BOLD at 9.4 T. Neuroimage. 2008;40(1):59-67. doi:10.1016/j.neuroimage.2007.11.045

      Liu C, Guo F, Qian C, et al. Layer-dependent multiplicative effects of spatial attention on contrast responses in human early visual cortex. Prog Neurobiol. 2020;(July):101897. doi:10.1016/j.pneurobio.2020.101897

      Lutti A, Thomas DL, Hutton C, Weiskopf N. High-resolution functional MRI at 3 T: 3D/2D echo-planar imaging with optimized physiological noise correction. Magn Reson Med. 2013;69(6):1657-1664. doi:10.1002/mrm.24398

      Moerel M, De Martino F, Kemper VG, et al. Sensitivity and specificity considerations for fMRI encoding, decoding, and mapping of auditory cortex at ultra-high field. Neuroimage. 2018;164(March):18-31. doi:10.1016/j.neuroimage.2017.03.063

      Muckli L, Martino F De, Vizioli L, et al. Contextual Feedback to Superficial Layers of V1. Curr Biol. 2015;25:2690-2695. http://dx.doi.org/10.1098/r....

      Zamboni E, Kemper VG, Goncalves NR, et al. Fine-scale computations for adaptive processing in the human brain. Elife. 2020;9:1-21. doi:10.7554/eLife.57637

      van der Zwaag W, Buur PF, Fracasso A, et al. Distortion-matched T1 maps and unbiased T1-weighted images as anatomical reference for high-resolution fMRI. Neuroimage. 2018;176(January):41-55. doi:10.1016/j.neuroimage.2018.04.026

    1. On 2021-11-18 17:56:53, user Arindam (Andy) Bhattacharjee wrote:

      This study is now published in eNeuro under the title: Humans use a temporally local code for vibrotactile perception.<br /> Arindam Bhattacharjee, Christoph Braun and Cornelius Schwarz<br /> eNeuro 8 October 2021, 8 (6) ENEURO.0263-21.2021; DOI: https://doi.org/10.1523/ENE...

    1. On 2021-05-13 19:42:33, user Juraj Mesik wrote:

      A quick message to those who stumbled onto this pre-print but haven't read the final published version of the article in Frontiers in Neuroscience linked at the top of this bioRxiv page (doi: 10.3389/fnins.2021.635126):

      Firstly, if you plan to read/use/reference our work, make sure to use the final, up-to-date version of the paper published in Frontiers in Neuroscience (all Frontiers papers are open access).<br /> Secondly, for the sake of transparency, there are a few small but important differences between the published version and this pre-print (I missed the time window to submit "version 2" here on bioRxiv before the paper was accepted). Most notably, during the peer-review process we discovered a bug in the code that generated regressors for the Audibility feature, which specifically affected the TRFs/goodness-of-fit values for audibility responses to ignored speech (the timing of audibility features was misaligned in these regressors). Note that no other regressors were affected. The consequence of this bug was that in the bioRxiv pre-print, the ignored+audibility responses were much flatter than they should have been, and the associated goodness-of-fit contributions were weaker. This is fixed in the final published version.

      The final, peer-reviewed version also includes a word onset feature in the model, and more appropriate LME-based statistics. We also dropped some of the more arbitrary outlier detection criteria, which led to the inclusion of all data in the analyses.

      If you have any questions, feel free to contact me using the contact information in the manuscript.

      Sincerely, <br /> Juraj Mesik

    1. On 2021-03-04 13:55:35, user Johannes Franz wrote:

      Dear Tim van Mourik, Peter J. Koopmans, Lauren J. Bains, David G. Norris, Janneke F.M. Jehee,

      Thank you for posting your manuscript as a preprint. We enjoyed reading and discussing it in our layer fMRI journal club (Maastricht University). We would like to provide a few comments compiled from our discussion that we hope will be of use to you.

      The manuscript describes a layer-fMRI study with a spatial attention task. The behavioral protocol follows a long tradition in the psychophysics of spatial attention, and the layer fMRI predictions stem from a well-established literature on the neurophysiology of attentional modulation in visual cortex studied with single units. Thus, we think that the experiment is perfectly suited for applications with layer-fMRI. The acquisition and analysis procedures include cutting-edge methodologies, and both data and analysis code are claimed to be openly available.

      We believe a large readership will appreciate your investigation of the effect of spatial attention on laminar BOLD activation profiles in an orientation discrimination task, as well as your intention to drive the young field of laminar fMRI towards more thorough reporting of analysis choices and consequences. Furthermore, we are excited about the pipeline being publicly available.

      In this study you show, similar to previous findings, an increase in BOLD response for attended regions, with and without visual stimulation. Yet, unlike previous studies, you did not find an effect of spatial attention across layers.

      We believe the manuscript could be improved along the following points:

      1.) Data are hard to access:<br /> We fully agree with the lead author in his agenda that open sharing of data is mandatory for modern research. We think this is even more essential for replication studies that do not see the same layer-dependent effects compared to previous studies. Only when the data are available can the community employ their own set of tools and expertise to help tease out potential layer-specific attention effects and/or potential reasons for a disagreement between studies.<br /> Given the authors' stated support for open science, and the fact that the manuscript mentions more than 5 times (at the most prominent places) that all data are openly available, we were surprised how difficult it was to get access to the data. Many of us did not succeed in getting access to the MRI data straightforwardly. After reading IT manuals on how to use webdav.data, setting up our ORCID settings from scratch, and requesting a temporary Donders account, we succeeded in downloading the data of the single participant that is provided.<br /> The time course data are much easier to access. However, we were disappointed that those data do not refer to MRI data per se, but rather refer to model fits, which are highly processed, and upsampled to a temporal resolution that is three times that of the actual fMRI time series. The manuscript might benefit from adding a few details about the shared time course data.

      2.) Details on data acquisition:<br /> The acquisition of the functional data is described in one single sentence (line 354f). To aid reproducibility, we believe this section would benefit from further explanations. <br /> 2a) E.g. application of GRAPPA 8 is rather liberal and unconventional in the field. In fact, some of us first thought it was a typo. Maybe the authors can convince the reader that this is an appropriate choice of acquisition by explaining how this could be achieved (CAIPI = 1/4) and/or reporting basic quality metrics (e.g. tSNR) that allow judgement of the g-factor penalty.<br /> 2b) We were a bit surprised by the application of partial Fourier in both phase encoding directions. We believe that this might be an important piece of information to be reported in the manuscript and might help explain why no high-resolution attention effect was observed. As the MR-physicists in the author list know much better than us, the application of partial Fourier is based on the point-symmetry of the Hermitian k-space. This means that for applications of partial Fourier in both directions, it is not possible to synthesize (recover) the missing outer k-space data that represent the high spatial frequencies. With PF 6/8 for resolutions of 0.827x0.827x0.80mm^3, this results in an effective resolution of 1.15mm in the diagonal direction. Given that V1 has a cortical thickness of at most 2.5 mm, it is perhaps not surprising that the authors failed to observe differences between deep, middle, and superficial cortical layers with this effective spatial resolution.

      3.) Interaction of attention and orientation:<br /> Maybe the manuscript could benefit from including a (supplementary) figure of the behavioral data. What was the effect of the attentional manipulation on orientation discrimination? Were the behavioral effects similar in magnitude to previous studies of spatial attention?

      4.) Units of signal change:<br /> It was not clear to us why the values on the y-axis in Figures 1 and 2 are so small compared to the percent signal change reported in Figure 3. Do the arbitrary units in Figure 1 refer to the same scaling across task conditions and time steps?

      5.) Surprisingly short inter-trial intervals:<br /> We were surprised by the unconventionally short duration of the inter-trial intervals. We wondered whether this timing introduced an HRF-bias that might have confounded the characterization of layer-specific effects. Specifically, it is likely that the shape, and possibly, the linearity, of the HRF varies with cortical depth (Figure 2). Each trial has an average length of 4.7s, followed by a variable inter-trial interval of length 1 to 2.5s. Due to the variable hemodynamic response function across cortical depth (Yacoub 2006, Petridou 2017; full citation attached below), it is expected that the depth-dependent response interacts non-linearly for trials that follow in such quick succession. As such, the accumulating signal in the superficial layers might not return back to baseline as fast as the signal in the deeper layers. In addition to the draining effect, signals might be carried over to the next trial in a depth-dependent way. Specifically, the superficial signal might not only reflect processes across cortical depths from the current trials, but also processes from previous trials while the signal at lower depth could be expected to have less ‘memory’. This layer-dependent bias of non-linear HRF might diminish the attention effect in superficial layers more than in other layers. We feel that this concern could be addressed by additional control experiments with very long inter-trial intervals.

      Yacoub E, Ugurbil K, Harel N. The spatial dependence of the poststimulus undershoot as revealed by high-resolution BOLD- and CBV-weighted fMRI. 2006:634-644. doi:10.1038/sj.jcbfm.9600239

      Petridou N, Siero JCW. Laminar fMRI: What can the time domain tell us? NeuroImage. http://dx.doi.org/10.1016/j.... Published 2019.

      6.) The performance of the spatial GLM is unclear:<br /> Figure 3 has a very appealing layout that nicely conveys the relevant information. When comparing Figure 3 (main analysis with spatial GLM) to Figure 3-Figure supplement 4 (analysis with interpolated laminar signal) we noticed that the effect of ascending/draining veins (the slope of the lines) is comparable in both, if not flatter in the latter case, which is counter-intuitive (the spatial GLM should mitigate the impact of the vascular bias from pial vessels). We would be very interested in a discussion of how the spatial GLM is expected to handle potential carry-over effects between trials such as described in Point 5.

      7.) Voxel selections:<br /> We appreciate the additional analyses summarized in Table 1, repeating the analysis including different numbers of vertices. Specifically we wondered whether not using a selection threshold on the vertices of the main experiment but instead purely relying on the ROI definition of the retinotopic localizer would lead to similar conclusions as when imposing an activation threshold. Is there a danger that a statistical activation threshold in the voxel selection could have resulted in the final layer profiles coming from patches of the cortex that are more dominated by ascending and pial veins (blooming)? Could the lack of localization specificity from those veins be responsible for the lack of layer-specific attention effects? In fact, if we could access the data, we would be interested in repeating the analysis and specifically excluding the voxels with the largest responses (which the authors have focused on), as these are the very voxels that are most likely to be contaminated by a vascular bias.

      8.) Failed to replicate or a new research question?<br /> We were a bit surprised about the article type this manuscript is listed as. In previous public communication (e.g. workshops and thesis) with the lead author, the study was phrased in the context of a replication attempt. However, the article type chosen here is “New results”, as opposed to BioRxiv’s other available categories: “Confirmatory Results”, or “Contradictory Results”. <br /> While we believe that either category would be of interest to a large readership, we feel that the manuscript would benefit from an in-depth discussion of previous layer-fMRI studies that could indeed replicate a spatial attention effect in superficial layers. Maybe the authors can use these studies to estimate the expected effect size of the layer-specific attention effect in a power analysis explaining why the study at hand might not have been able to detect such modulations. Example studies are listed below:

      Liu C, Guo F, Qian C, et al. Layer-dependent multiplicative effects of spatial attention on contrast responses in human early visual cortex. Prog Neurobiol. 2020;(July):101897. doi:10.1016/j.pneurobio.2020.101897

      Gau R, Bazin P-L, Trampel R, Turner R, Noppeney U. Resolving multisensory and attentional influences across cortical depth in sensory cortices. Elife. 2020;9:1-26. doi:10.7554/elife.46856

      Hollander G De, Zwaag W Van Der, Qian C, Zhang P. Ultra-high resolution fMRI reveals origins of feedforward and feedback activity within laminae of human ocular dominance columns. Neuroimage. 2020. doi:10.1101/2020.05.19.102186

      Klein BP, Fracasso A, van Dijk JA, Paffen CLE, te Pas SF, Dumoulin SO. Cortical depth dependent population receptive field attraction by spatial attention in human V1. Neuroimage. 2018;176(October 2017):301-312. doi:10.1016/j.neuroimage.2018.04.055

      Lawrence SJD, Norris DG, de Lange FP. Dissociable laminar profiles of concurrent bottom-up and top-down modulation in the human visual cortex. Elife. 2019:1-28. https://doi.org/10.7554/eLi....

      Marquardt, I., De Weerd, P., Schneider, M., Gulban, O. F., Ivanov, D., Wang, Y., & Uludag, K. (2020). Feedback contribution to surface motion perception in the human early visual cortex. ELife, 9, 1–28. https://doi.org/10.7554/eLi...

      9.) How can a large number of participants account for head motion?<br /> Lastly, while we agree that it can be useful to include larger sample sizes for population statistics we fail to follow the reasoning: “For example, at a resolution this high, even the smallest movement of the participant may cause additional blurring of the data, with potentially detrimental effects on the signal-to-noise ratio. For this reason, we collected data from 17 participants”. It could be argued that to reduce the influence of measurement error, high-resolution fMRI experiments should repeatedly sample a small number of subjects. Given the large number of participants, we would be especially interested in a discussion of individual results, in relation to individual motion estimates.

      Stylistic suggestions:

      Line 10: “Directing spatial attention towards a particular stimulus location enhances cortical responses at corresponding regions in the cortex.” -> We would suggest specifying that BOLD responses increase with attention, not necessarily neural responses.

      Line 80: ‘histiological’ -> histological

      Line 356: ‘T2*-weigthed’ -> T2*-weighted

      Line 367: ‘3200 m’ -> 3200 ms

      Figure 3 and supplementary figures -> Could you elaborate on the gray diamonds?

      We would advise the authors to consider changing the color code in all time series figures. For example, the two types of red and the two types of blue in Figure 1 are indistinguishable. Should the reader infer which line refers to which condition based on the magnitude of the response? If so, it could be mentioned in the caption.

      The two types of red in Figure 2 are hardly distinguishable.

      In Figure 1–Figure supplement 1, the two panels have no description that distinguishes them. We assume one refers to right and one refers to left hemispheres? It is puzzling why the unattended (blue) line in the right panel has a larger response than the attended (red) line. Is it possible that trials are not labeled correctly for one of the hemispheres? Specifically, does the attention label reflect ‘attention to the left’ instead of ‘attention to the contra-lateral side w.r.t. hemisphere'?

      Overall, we find this work presents an important contribution to the field by attempting to replicate a previously observed effect and promoting a replicable pipeline. We hope that our thoughts and comments will be helpful. We are looking forward to seeing this manuscript published.

      With kind regards,<br /> Sebastian Dresbach, Lonike Faes, Johannes Franz, Omer Faruk Gulban, Renzo Huber, Miriam Heynckes, Eli Merriam, Alessandra Pizzuti, Yawen Wang

    1. On 2021-02-04 03:36:46, user Sara Sims wrote:

      Reviewer #3 (Minor Comments):

      P2, ¶3. The authors cite To et al. (2011) for the claim that foveal magnification is greater than peripheral magnification. However, to make this claim, To et al. rely on a number of other citations which would be more appropriate here (¶2 of their introduction). A clear example of this is Horton and Hoyt (1991). Additionally, it might be more appropriate to describe cortical magnification as having units of square-mm/square-degree rather than only mm/degree. <br /> We thank Reviewer 3 for this suggestion. We now cite Horton and Hoyt (1991) and Azzopardi and Cowey (1993) in the third paragraph of the Introduction on Page 3.

      P2, ¶3. Additionally, the final line of this paragraph addresses receptive field size. It might be of interest to review the finding of Harvey and Dumoulin (2011) [10.1523/JNEUROSCI.2572-11.2011], that the product of the pRF size and the cortical magnification factor is approximately constant across human V1 and nearby visual cortex. <br /> We added information on how receptive field size and cortical magnification factor change with increasing eccentricity, and on their approximately constant relationship across human V1 and nearby visual areas, in the third paragraph of the Introduction on Page 3.

      P4, continued ¶1. In order to understand what the FEF's inclusion in the Dorsal Attention Network means, it might be useful to introduce the Dorsal Attention Network briefly when discussing the DMN and the FPN. <br /> This sentence was reworded for clarity, including removing reference to the Dorsal Attention Network since it was not relevant to the sentence’s main point.

      P4, full ¶1. Given the amount of work that has been done on the fronto-occipital and inferior longitudinal fasciculi, the following sentence should probably include a citation or three. "Major white matter tracts that connect to the occipital lobe such as the inferior fronto-occipital fasciculus (connects occipital lobe to lateral prefrontal cortex) and the inferior longitudinal fasciculus (connects occipital lobe to anterior temporal lobe) have been well documented using tractography methods in humans." <br /> We have added this citation to the text in the introduction: “Major white matter tracts that connect to the occipital lobe such as the inferior fronto-occipital fasciculus (connects occipital lobe to lateral prefrontal cortex) and the inferior longitudinal fasciculus (connects occipital lobe to anterior temporal lobe) have been well documented using tractography methods in humans (Wu et al., 2016).” <br /> Here is the full citation: Wu, Y., Sun, D., Wang, Y., & Wang, Y. (2016). Subcomponents and Connectivity of the Inferior Fronto-Occipital Fasciculus Revealed by Diffusion Spectrum Imaging Fiber Tracking. Frontiers in Neuroanatomy, 10, 88.

      P4, full ¶2. This paragraph is a bit hard to follow and might be improved by breaking it up into shorter sentences. In particular, I'm not 100% sure what the authors mean by "direct and indirect structural connections". Additionally, I'm not sure why the end of this sentence follows from its beginning: "Since functional connectivity between two brain regions could come from both direct and indirect structural connections, we used DWI to examine direct connections between regions (Adachi et al., 2012; Honey et al., 2009) that were previously found to show functional connections." <br /> We have changed the wording of this paragraph to the following:<br /> “The goals of the current study are 1) to assess the reproducibility and generalizability of retinotopic effects on functional connections between V1 and functional networks that were found in prior work (Griffis et al., 2017). We aim to extend these findings in a new dataset collected under different task conditions (previous work used blocks of rest during a task with central fixation and the current data was collected as part of a resting-state only scan). 2) Extend prior work on the retinotopic connectivity difference to structural connections between V1 and functional networks. 3) Examine the relationship between functional and structural connections. Since functional connectivity between two brain regions could be derived from measurable structural connections, we used DWI to examine connections between regions (Adachi et al., 2012; Honey et al., 2009).”

      P4, full ¶3. Again, the concept of a "direct connection" versus an "indirect connection" appears prior to being introduced. Given that this paragraph marks the concept as critical to the point of the paper, the introduction needs to explain what these are. Additionally, it seems that the paper separates the idea of a direct/indirect "structural connection" from that of a direct/indirect "functional connection". This should all be clearer. <br /> In addition to the text added in response to the above comment, the following text has been added to the paragraph referenced in this comment for additional clarification: “the pattern of structural and functional connections is similar, suggesting that this lateral frontal functional connection pattern arises from a direct (uni-synaptic) structural connection.”

      P6, ¶3. "Previous work has shown that cortical anatomy is a reliable predictor of the retinotopic organization of V1 (O. Hinds et al., 2009; O. P. Hinds et al., 2008) so that the more posterior parts of the visual cortex represent more central portions of the visual field." At the risk of splitting hairs, the publications by Oliver Hinds show mainly that the V1 *boundaries* are reliably predicted by anatomy. A better citation for the V1 *retinotopic organization* is Benson et al. (2012) [10.1016/j.cub.2012.09.014], wherein we actually assessed the retinotopic maps and not just the boundaries. <br /> This citation has been added.

      P6, ¶3. "The average eccentricity of each segment was estimated from Benson and colleagues' probabilistic retinotopy template (Benson et al., 2012)..." The correct citation for the retinotopic template is Benson et al. (2014) [10.1371/journal.pcbi.1003538], along with Benson and Winawer (2018) [10.7554/eLife.40224] assuming you are using a recent version of the template, which appears to be the case based on Figure 2 (though given that you are using the FreeSurfer V1 boundary also, I can't really tell). Additionally, it isn't technically correct to call this a probabilistic template (such as might be said correctly of the visual area atlas by Wang et al., 2015). The retinotopic template is more accurately a model of retinotopic organization fit to the average retinotopic organization across many subjects; it does not explicitly express or depend on probabilities. <br /> Wording has been changed to retinotopic template.

      P6, ¶3. "These ROIs were defined in the gray matter on the cortical sheet for the freesurfer template, then moved into the individual anatomical space for each participant." I believe that the authors' intent here is to state that ROIs were defined on FreeSurfer's fsaverage brain using the eccentricity of the retinotopic template (which is also defined on the fsaverage brain) then were interpolated over to individual subject cortical surfaces using FreeSurfer's anatomical registration. However, I don't have a good prior for what the "freesurfer template" is here or what the "gray matter on the cortical sheet" of it might be, so this may all be wrong. Perhaps the implication is that the ROIs were hand-drawn in the voxels of the fsaverage subject's "ribbon," but if so, is the interpolation back to the individual subject done on the surface or using FreeSurfer's newish diffeomorphic volumetric alignment? <br /> The following text has been revised to further clarify for the reviewer: “These V1 eccentricity segment ROIs were defined on FreeSurfer's fsaverage brain using the eccentricity of the retinotopic template then were interpolated to individual subject cortical surfaces using FreeSurfer's anatomical registration. To avoid the potential for artifacts due to differences in ROI size when comparing probabilistic tractography results, the number of vertices were kept similar (on the Freesurfer fsaverage brain) between eccentricity segments.”

      P6, ¶3. "To avoid the potential for artifacts due to differences in ROI size, the number of segments per eccentricity region were assigned to more evenly distribute ROI size." Again, this is not at all clear. Earlier text in this paragraph implies that the segments *are* eccentricity regions. Does this sentence indicate that the segments were adjusted in each individual subject to be of a similar size? Or that the ROIs were split into several segments each before interpolation? Is there a material difference between what was done and simply starting with a larger number of segments? It's not clear to me why the process is described in terms of three segments whose eccentricities are reported then redescribed in terms of more segments whose eccentricities are not reported. <br /> We acknowledge that the reporting of the V1 ROI eccentricity segments was unclear. We have simplified the text so that it now reads: “Based on this template, 3 retinotopic regions were identified: central vision (mean eccentricity estimates of 0-2.2 degrees visual angle), mid-peripheral vision (mean eccentricity estimates of 4.1-7.3 degrees visual angle) and far-peripheral vision (mean eccentricity estimates of 14.1-25.5 degrees visual angle) (Figure 2).”

      P7, ¶1. "... voxels within the white matter corresponding to the network ROIs were used as track seeds." I found this initially confusing as immediately prior to this section, "ROI" refers to the ROIs of V1, which should have no truck with the white-matter (i.e., a white-matter voxel predicted to be in an ROI derived from the FreeSurfer's V1 label or the retinotopic template must by definition be erroneous). However, I suspect that this is intended to be about a separate set of network ROIs? This should be clearer. <br /> Yes, there are two sets of ROIs, the V1 ROIs and the Network ROIs. The term “network ROIs” has been changed to “network-ROIs” to emphasize this point further. Also, whenever the term “ROI” is used, the name of the set of ROIs being referred to is now stated.

      P7, Data Analysis. Again, citing the analysis methods is well and good, but this section should make very clear up front which data were collected/analyzed by the authors and which data were collected/analyzed by the HCP. I should be able to easily tell both what analysis steps were performed *and* which set of authors performed each step. <br /> See response to Reviewer #3 Major Comment #2.

      P7, ¶2. "Next, right-to-left and left-to-right acquisitions were concatenated into a single 4D volume for the functional connectivity analysis." While I understand from this sentence that the preprocessed images were transformed into single 4D volume files, I do not follow the significance of "right-to-left" and "left-to-right" in this context. <br /> The text of the article has been changed to clarify this: “Next, both the acquisitions (those collected right-to-left and those collected left-to-right) were concatenated into a single 4D volume for the functional connectivity analysis.”

      P8, ¶4. The text references a "2mm2 Gaussian kernel". Is this supposed to be 2 mm (not squared)? If so, does it refer to the FWHM or to the HWHM or to the parameter σ? It says the "surface maps" were smoothed, but was this done on the FreeSurfer cortical sphere (in which case, mm is a curious unit)? Volumetrically? Something else? <br /> This was a typo; it has been changed to “2mm” and the text now reads “Surface maps of the track termination probabilities were smoothed using a 2mm FWHM Gaussian filter and averaged across all subjects.”. This was done with the mri_glmfit “fwhm” flag.
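
      The three width parameters the reviewer distinguishes (FWHM, HWHM and σ) are related by standard Gaussian identities, which can be sketched as below. This is only an illustration, not the authors' code: the function names are hypothetical, and the only value taken from the text is the 2 mm FWHM.

```python
import math

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def fwhm_to_hwhm(fwhm):
    """Half width at half maximum is half of the FWHM."""
    return fwhm / 2.0

def gaussian(x, sigma):
    """Unnormalized Gaussian with peak value 1 at x = 0."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

fwhm_mm = 2.0  # kernel width quoted in the revised manuscript text
sigma_mm = fwhm_to_sigma(fwhm_mm)
hwhm_mm = fwhm_to_hwhm(fwhm_mm)

print(f"FWHM = {fwhm_mm} mm, HWHM = {hwhm_mm} mm, sigma = {sigma_mm:.4f} mm")
# Sanity check: the kernel falls to half of its peak at a distance of one HWHM.
print(f"kernel value at HWHM: {gaussian(hwhm_mm, sigma_mm):.4f}")
```

      Stating which of the three is meant matters because the same number implies kernels of quite different widths.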

      P9, ¶1. More information is needed about the t-tests that were used. Were these tests one-tailed or two-tailed? Corrected for multiple comparisons or not? How was mri_glmfit used to perform these tests? The help-file for mri_glmfit mentions t-tests only in the context that a certain use-case reduces to a t-test in some circumstances. <br /> We have added “two-tailed” to the text. The mri_glmfit function can perform a t-test when run as a one-sample group mean test with the --osgm flag. We did not correct for multiple comparisons because the analysis was designed around specific, planned comparisons.
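
      One way to read the response is that a one-sample group mean test applied to per-subject difference values is arithmetically a paired t-test. The sketch below illustrates that equivalence with the standard library only. It is an assumption about how the test was set up, not the authors' pipeline: the function names and the toy per-subject values are hypothetical, and no p-value is computed because that needs the t distribution's CDF.

```python
import math
import statistics

def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for testing mean(values) == popmean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - popmean) / (sd / math.sqrt(n)), n - 1

def paired_t(a, b):
    """Paired t-test, expressed as a one-sample test on per-subject differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs)

# Hypothetical per-subject connectivity values at a single vertex.
central = [0.30, 0.25, 0.40, 0.35]
far_peripheral = [0.20, 0.22, 0.31, 0.30]

t_stat, dof = paired_t(central, far_peripheral)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      In a surface-based analysis the same computation would be repeated at every vertex, which is why the question of multiple-comparison correction arises.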

      P9, Comparison of Functional and Structural Connectivity. Was only one correlation coefficient calculated? Were the authors not interested in these correlations for the non-central V1 regions? It seems irregular that only one of these would be examined given the experimental setup and the hypotheses of the manuscript. <br /> We have now included dice coefficients, per the reviewer’s suggestion, as well as adding non-central V1 regions in this new analysis.

      Methods, generally. In a couple of places, the authors refer to commands like "mri_vol2surf" (P8, ¶1). It would be ideal if the command lines or scripts were also provided with the manuscript. <br /> The code has now been added to the code repository.

      P9, ¶4. "The t-test comparing functional connectivity to different eccentricity segments in V1 revealed significant effects (p<.001) and brain regions belonging to FP, CO, and DMN functional networks (Figure 3)." Is the "and" here supposed to be "in"? <br /> This edit has been made.

      P9, ¶4. It's not clear to me how "preference" was evaluated here. For example, "central representing V1 was preferentially connected (over mid-peripheral and far-peripheral V1) to regions associated with the FP network". Was this assessed by visual inspection? A good quantitative metric would be nice to have here, such as the Dice coefficient for each ROI-network pair. <br /> We have added Dice coefficients to the analysis. See Tables 1 & 2.
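
      For readers unfamiliar with the metric, the Dice coefficient between two sets of surface vertices is twice the size of their overlap divided by the sum of their sizes. A minimal sketch follows. The function name, the vertex indices and the handling of two empty sets are assumptions made for illustration and are not taken from the manuscript.

```python
def dice_coefficient(a, b):
    """Dice overlap between two collections of vertex indices: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty sets are reported as zero overlap.
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))

# Hypothetical vertex indices: suprathreshold "central > far-peripheral" vertices
# and the vertices belonging to one network mask.
significant_vertices = [1, 2, 3, 4]
network_vertices = [3, 4, 5, 6]

print(dice_coefficient(significant_vertices, network_vertices))
```

      The value ranges from 0 (no shared vertices) to 1 (identical sets), so it gives a single number per ROI-network pair of the kind the reviewer requests.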

      P9, ¶4. "Those previous results had also shown differences in connectivity between mid-peripheral-representing regions and far-peripheral representing regions, which were not observed here, (Figure 3)" <br /> This text has been reworded for clarity: “However our results differ in that mid-peripheral-representing regions and far-peripheral representing regions differences were not observed here (Figure 3).”

      P10, Figure 3. "There, vertices in yellow showed stronger (z>3) connectivity to central V1 than to both Far peripheral and mid-peripheral regions." I do not understand the significance of "(z>3)" in this caption. Additionally, what is the significance of the gray color shown on all brains in the bottom row? <br /> Clarification has been added to the Figure legend, including “The grey regions indicate the location of the other networks.”

      P11, ¶1. "... we performed pairwise comparisons of functional connections... Results indicate that ... there are preferential connections between central V1 ..." Again, I'm not clear how preference is being assessed here, or what is being compared pairwise. Pairwise comparisons between segments and networks? What values exactly were compared? If these are referring to visual inspection, that is fine, but the language seems to suggest something more programmatic, and what that might be is not clear. <br /> The text has been clarified to now state “We performed statistical comparisons (t-test) of functional connections between central vs far-peripheral eccentricity segments of V1 and the FPN (Figure 4).”

      P11, Figure 4. Please tell us what exactly is being plotted. What value minus what value? <br /> The values being subtracted have now been added to all figures.

      P14, ¶2. "A comparison between structure and function showed overall agreement, indicating that the functional connections are likely mediated by direct structural connections (Figure 6, right column)." Depending on what the authors mean by "mediate" I'm not sure that this follows. Please elaborate. <br /> We acknowledge that this wording is unclear. We have therefore changed the wording of this statement to the following: “These relationships indicate that the overall pattern of connectivity of central V1 greater than far peripheral V1 is consistent across modalities with an especially high overlap within the FPN.”

      P14, Figure 6. "Far-peripheral and central V1 are statistically different within the FPN..." How was statistical difference within the FPN assessed? <br /> Please refer to the following section:<br /> “Tractography Analysis <br /> To test the hypothesis that patterns of functional connections previously found in V1 (Griffis et al., 2017) are similar to patterns of structural connections, comparisons were made between the central and far-peripheral eccentricity segments of V1 connectivity patterns to the FPN. Differences in track probabilities corresponding to V1 eccentricity segments connections were compared by paired, two-tailed t-test (using Freesurfer’s mri_glmfit with a one sample group mean test).”

      Style/Aesthetic Comments <br /> Throughout the manuscript, starting on P3, full ¶1, there are several mismatched parentheses that are distracting. These typically look like this: "some claim is made here (e.g., (Someone et al., 2010) then continues here". Almost all of these could be fixed by removing the "(e.g. ". That said, the use of "e.g.," makes me think that there are other citations that *should* appear here, but haven't been filled in yet, especially given that many of these are broad statements somewhat outside my particular expertise, such as "The fronto-parietal network (FPN) directs attentional control (e.g., (Zanto & Gazzaley, 2013)".

      P4, L8. "Markov et. al," should be "Markov et al.," <br /> This edit has been made.

      P4, full ¶2-3. The authors mix the style "Something listable: (1) first thing, (2) second thing..." and the style "Something listable: 1) first thing, 2) second thing." <br /> This formatting has been changed.

      P7, ¶1. The acronyms "FP" and "CO" were previously reported as "FPN" and "CON". This needs to be fixed throughout. I get that at times the intention is to represent the deduplication of the word "network," i.e., "the fronto-parietal and default mode network" becomes "the FP and DMN". I think this usage is less clear to readers than "the FPN and DMN" and, besides, the text sometimes says "the FP and DMN networks" (P8¶3L3, P9¶4L4). Alternately, introduce FP et al. as separate acronyms on P7: "Fronto-parietal (FP), cingulo-opercular (CO), and default mode (DM) networks...". <br /> Abbreviations have been edited for consistency.

      Reviewer #3 (Additional data files and statistical comments):

      As mentioned in the Major and Minor comments, most if not all of the statistical tests need to be more explicitly described. I could not currently reproduce the exact tests from the manuscript, even if I had the data.

      Additionally, because the project is a reanalysis of a large dataset, it would be particularly valuable to have the source code used for analysis. It is nearly impossible to reproduce or assess a project like this without such code.

      The code for the analysis has now been added to a repository and it is referenced in the paper.

    2. On 2021-02-04 03:35:58, user Sara Sims wrote:

      Reviewer #3 (General assessment and major comments (Required)):

      This manuscript examines data from the Young Adult Human Connectome Project's 900-subject release to compare both structural and functional connections between iso-eccentricity bands in striate cortex and the fronto-parietal, cingulo-opercular, and default mode networks. The authors find that central vision is most strongly connected to the fronto-parietal network, which is associated with attention, while the far periphery is more strongly connected to the default mode network. The questions asked in this manuscript are of considerable interest to the field, and this study has the potential to be impactful. However, substantial work is needed to make the methods and results sufficiently clear and reproducible to the reader.

      Major Comments <br /> A major problem throughout this paper is that the authors have not been very careful in documenting their methods, what they are plotting, or how they are supporting their assertions. Many small examples of this are documented in the Minor Comments, below, and together should be taken as a major shortcoming of the work. I do not believe there is sufficient detail in this paper as is to reproduce the methods, nor was I able to understand what precisely was calculated in the statistical tests reported.

      The amount of work that has been put into this project's quality control (at minimum, visual inspection and filtering of 900 MR images) is very impressive! This information should really be shared with the broader research community in order to make this manuscript more reproducible and in order to ensure that other researchers can simply use and cite the authors' efforts rather than repeating them. This could be as simple as a supplemental table or text-file that includes the subject IDs of those HCP subjects that were included in all analyses. <br /> This has been added to the repository with the code.

      It should be crystal-clear from the Methods section whether the manuscript's data were collected or reanalyzed by the authors. My understanding is that all of this manuscript's analyzed data are from the HCP database. However, had I read only the "Data Acquisition" section I would have been left with the strong impression that the authors collected the data themselves using the same kind of scanner and the same analysis pipelines as the HCP. Unless this is the case, the opening sentence of this section should probably be something like "All data were acquired and preprocessed by the Human Connectome Project (Van Essen et al., 2013)" [10.1016/j.neuroimage.2012.02.018]. It may also be wise to reference the HCP in the Acknowledgements section. Further information: https://www.humanconnectome.... This should apply equally to the data and the preprocessing methods-i.e., if the quality control mentioned in the above comment was performed by the HCP and not the authors, that should have been explicit. <br /> We have clarified what activities were done by HCP and which the authors did in the methods section. We have done this by including text such as “The authors applied additional preprocessing steps” and “We then censored the functional images”. We hope this has addressed this point for the reviewer. <br /> The following has been included in the acknowledgments section: “Data were provided [in part] by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.”<br /> The recommended citations for HCP data use and preprocessing analysis have now been added throughout the methods section.

      P3, ¶6. This paragraph is critical to the methods but is not at all clear (see the many specific points in the Minor Comments, below). In particular, the paragraph eventually describes seven eccentricity segments per subject, yet it does not explain what the eccentricity boundaries of these segments are, nor does Figure 2 show these segments. It isn't clear from the manuscript if these are ever used (rather than the 3 central/mid-peripheral/far-peripheral segments) or exclusively used. <br /> We have clarified the ROIs used in this analysis in this paragraph by referring only to the 3 central/mid-peripheral/far-peripheral segments and not to additional segments used in prior works.

      In looking at Figure 4, my first and strongest impression is that the central connectivity is very similar to the far-peripheral connectivity, and the z-score differences are incredibly small. Additionally, the legend does not make the quantities plotted very clear (these are based on the averaged z-scores across subjects?) so I'm left wondering how to assess any sort of significance. I have a similar reaction to Figure 5. More help is needed to understand these results.

      In this paper, we are looking at connectivity to V1, and heterogeneity in this connectivity across portions of V1. You are absolutely right that Figure 4 shows that the portions of cortex that are most strongly connected to central V1 are the same ones that are most strongly connected to far peripheral V1. We should have made this a more prominent point in the original text, and have now added the following sentence: ‘While central and peripheral representation portions are still part of the same V1 area, and therefore we would expect similarity in their connectivity patterns, our results indicate that eccentricity differences do exist and are consistent with previously reported differences in information processing on central and peripheral visual information.’ While visual information from all portions of V1 shares some similar network connections, as we motivate in the introduction, it is important to identify where in the brain there are differences between these two. These differences will, necessarily, be small relative to the size of the raw connection strength. However, they are quite reproducible, as evidenced by the Griffis et al. paper’s reproduction for functional connectivity and by the very significant p-value.

      For clarity of the plotted quantities, the figure legend of Figure 4 reads:<br /> “Top: Group average central minus far-peripheral differences in functional connections in relation to the FPN. Group average data was thresholded for significance (p<.001) and effect size (connectivity differences > .01). The FPN is outlined in green.”<br /> The figure legend for Figure 6 (previously Figure 5) reads:<br /> “Top: Central minus Far-peripheral V1 structural connection differences within the FPN. Group average p-track differences between central and far-peripheral V1 data were thresholded for significance (p<.001) and effect size (ptrack differences > .01) and masked for the FPN (outlined in green).”
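The two-criterion thresholding described in these legends can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `threshold_difference_map`, the flat-list data layout, and the example values are hypothetical, and the effect-size cutoff is assumed to apply to the absolute difference.

```python
def threshold_difference_map(differences, p_values, p_max=0.001, min_effect=0.01):
    """Zero out entries that fail either the significance or the effect-size cutoff.

    Assumption: the effect-size criterion is applied to |difference|.
    """
    if len(differences) != len(p_values):
        raise ValueError("differences and p_values must have the same length")
    return [
        d if (p < p_max and abs(d) > min_effect) else 0.0
        for d, p in zip(differences, p_values)
    ]


# Hypothetical central-minus-far-peripheral differences for four vertices.
example_diff = [0.020, 0.005, -0.030, 0.050]
example_p = [0.0001, 0.0001, 0.0005, 0.2000]
thresholded = threshold_difference_map(example_diff, example_p)
```

Only vertices that pass both cutoffs keep their value, which is why a map thresholded this way can look sparse even when the underlying raw connectivity is strong everywhere.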

      Given that this paper consists of a large analysis of a large existing dataset, it would be especially nice if the authors would make their source code and intermediate analysis files publicly available. Having access to the source code directly is virtually a requirement of making this kind of study reproducible and would mediate many of my concerns about the ambiguities of the methods. <br /> The following has been added to the paper: “Code has been made available at: https://github.com/Visscher...

    1. On 2020-08-07 11:57:09, user Omer Faruk Gulban wrote:

      Dear Michele Svanera, Dennis Bontempi, Sergio Benini, Lars Muckli,

      Congratulations on your work. I have enjoyed reading this manuscript. You can find my comments below. I think that addressing some of these issues would benefit the next iteration of the manuscript:

      This manuscript proposes a deep learning based method to segment canonical cortical tissue types (white matter, gray matter, cerebrospinal fluid) which seems to offer higher accuracy and precision -if not equivalent- compared to some other contemporary methods. Currently, image segmentation is one of the most time consuming bottlenecks in the analysis of high resolution ultra high field MRI data especially for whole-brain experiments. Researchers either apply automatic methods and accept the errors that come with them or attempt to manually fix such errors in slice-by-slice fashion which is extremely time consuming. Studies attempting to make progress on this front are of utmost importance to the field. Therefore, I applaud the authors’ efforts.

      As far as I am aware, the authors correctly claim that their deep learning based method is the first example of such an application in the high resolution ultra high field MRI domain. Therefore the methodological novelty element of this paper is high. As I am not an expert on deep learning, I will be unable to judge the technical prowess in this regard. However, I have enjoyed reading the deep learning related details and thought that these sections were well-written for a non-expert. Having said this, in what follows, I am going to focus on how the authors validated their new method in an attempt to evaluate the “practical value” of the method rather than its value based on “methodological novelty”:

      Major<br /> 1- I have found usage of the “ground truth” label unfounded. Fracasso et al. 2016 and Bergmann et al. 2019 are experiments that are not focused on tissue segmentation. Calling the pipelines developed for them to be used as “ground truth”, to me, makes no sense as there are no attempts at segmentation validation in those studies. I understand that the authors are choosing a select number of methods to test their algorithms against but labeling this specific segmentation “ground truth” is misleading for the general reader (e.g. compared to the “ground truth” segmentations shared by Gulban & Schneider et al. 2018, which are slice-by-slice manually edited and quality controlled by multiple experts for the whole brain). I do recommend the authors to consider adjusting the manuscript accordingly by replacing this label as they actually did in Figure 2 by calling it “Fracasso et al. 2016” rather than “ground truth”. I think this would increase the credibility of this manuscript and clarify a potential misinterpretation.

      2- It is unfortunate to see the code and data “will become available after publication”. This significantly hampers my ability to have a more in-depth review of the authors’ work as segmentation quality assessment depends on the slice-by-slice inspection of the 3D volumes of the segmented images. A few selected 2D slices presented in figures can be misleading for the overall segmentation quality of the whole volume.

      3- However, assuming that the segmented slices present in Figure 2 are the best examples, I am going to evaluate these. I think the white matter border looks good in CEREBRUM-7T. However, compared to the manual segmentation, it is clear that the thin white matter protrusions are incorrectly labeled as gray matter. It is nice to see that CEREBRUM-7T beats the so-called “ground truth” in Figure 8, but it is no surprise that it fails against the manual segmentation. Such errors on the white matter border in CEREBRUM-7T would be unacceptable for e.g. layer-(f)MRI analyses, therefore I would not consider performing better than the so-called “ground truth” a firm validation in favor of CEREBRUM-7T. Though I see that it can perform better than its training set, and agree that this is a valuable result to show.

      4- I appreciate that the authors present the resulting white-gray matter surface meshes in Figure 10 as this data structure is often used in advanced analyses. However, since the authors pitch CEREBRUM-7T as a tool for sub-millimeter imaging researchers, I would have been more interested in seeing the outer gray matter surface meshes. Accurate and precise segmentation of both inner and outer gray matter surfaces are of utmost importance for e.g. layer-(f)MRI. Currently the outer gray matter surfaces are not presented or discussed in the manuscript. Unfortunately this is a major consideration for the field and I think this manuscript would benefit significantly by having a discussion on this topic.

      5- Looking back to Figure 2, I would expect an outer gray matter mesh quality that is unusable for the advanced layer & column analyses not only for CEREBRUM-7T but for all other competing methods including the superior (according to Figure 8) manual segmentation. I say this by seeing the amount of “not gray matter” tissue forming touching/kissing gyri, and deep sulci not being separated. These are the main areas of concern for high resolution researchers, and they take a significant amount of time to manually edit. Interestingly, this aspect is not discussed in the manuscript. I do think that the authors can straightforwardly and substantially improve their discussion and results by showing and discussing the failings of their algorithm, together with the failings of the validation dataset. Currently, there is no extensive validation dataset for the high resolution ultra-high field MR images (aside from nine manually segmented subjects shared by Gulban & Schneider et al. 2018). Therefore if providing/sharing such extensively manually segmented and quality controlled data (both in white-gray matter border and outer gray matter border) is out of scope of this manuscript, the authors can at least address the issue by providing a substantial discussion on this matter.

      In summary, I think that the manuscript is in a decent state analysis-wise but misses the mark by cutting the discussion too quickly by not addressing the detailed considerations at the heart of everyday high resolution MRI research. However, I believe that if the major concerns above are addressed, it would be a timely and valuable manuscript that would direct future improvements to address the time consuming issue of tissue segmentation at high resolution ultra high field MRI.

      Minor<br /> - Figure 2 last column “manual segmentation” lacks consistency compared to the other columns. E.g. Top row does not show gray matter (blue) and the middle row does not show both white (red) and gray matter (blue). However these labels are visible in the bottom row.

      - “Uurbil et al. 2003” should be “Ugurbil et al. 2003”

      I hope that some of these comments will be helpful for you.<br /> Kind regards,<br /> Omer Faruk Gulban

    1. On 2020-06-19 12:34:29, user Renzo Huber wrote:

      Dear Gilles de Hollander, Wietske van der Zwaag, Chencan Qian, Peng Zhang, and Tomas Knapen<br /> Thank you for providing your preprint. We enjoyed reading and discussing it in our layer fMRI seminar (Maastricht University) and would like to provide a few comments that we hope will be of use to you.

      This is de Hollander’s and Knapen’s debut paper in the emerging field of layer-fMRI in humans. It is the most recent paper of a list of papers that utilize known submillimeter structures of columns and layers to validate cutting edge acquisition and analysis methods. The manuscript at hand is a particularly nice one and it stands out compared to similar papers because of its commendable data transparency, sophisticated analysis methods, and comprehensive rigor.

      We were impressed by the level of detail with which the analysis methods are described. The authors went through an odyssey of using many available open-source software tools to ensure the high analysis quality. Furthermore, we admire the interactive presentation of volume and surface data via interactive web tools. <br /> While the analysis documentation received a lot of attention to detail, this is not the case for other aspects of the study. As such, the acquisition parameter documentation, the functional contrast origin, and the corresponding assumptions did not receive the same attention to detail. We are particularly concerned about the combination of baseline uncertainties and the non-linear model-based de-veining and SNR constraints in the decoding scores.

      While we anticipate the impact of the paper to be on the domain of neuroscience application research, we would classify the paper as an analysis-focused methods-development paper.<br /> We enthusiastically anticipate the publication of the paper in a prestigious journal.

      As an effort to support our peers to improve their manuscript, we list some points below. These points were discussed in our Layer-fMRI Seminar (Maastricht Brain Imaging Center) on June 2nd 2020:

      1.) The authors seem to claim at multiple points in the manuscript (and abstract) that they would be the first to investigate columnar structures and laminar structures at the same time. We feel that this might be an overstatement. There are multiple studies (which we happen to be familiar with) that had previously looked at laminar and columnar features at the same time across brain areas and tasks: E.g. De Martino et al. (2015) in the auditory cortex, Huber et al (2020) in the sensorimotor system, and Schneider et al. (2019) in V5/MT+. Specifically, in the context of ODC, we would like to refer the authors to the work from Kemper et al. (2018, see Fig. 2 for simultaneous columnar and laminar features) and Feinberg et al. (2018, see Fig. 7-8 for ODC across depth). We would recommend to the authors, to rephrase their claim (e.g. make it more specific). Maybe the claim would be more appropriate, if it would be constrained to simultaneous modulations of laminar and columnar activity at the same time?

      2.) We are concerned about the conclusions of the layer-dependent results upon model-based deveining. While we are excited about the emergence of model-based deconvolution methods, they are just starting to be validated, their shortcomings are just starting to be understood, and their applications are not established except for exploratory use. Currently, it is still more established in the field to avoid unwanted venous signals by means of advanced acquisition strategies, rather than adding additional signal analysis filtering (deconvolution) steps. Specifically in the manuscript at hand, the application of model-based vascular vein removal can be complicated by the models' non-linear behaviour. As the authors are surely aware, the model-based vascular deconvolution is highly dependent on a very accurate knowledge of the (resting-state) baseline (the signal without any task). In this sense, the model-based deveining is different to common task-subtraction designs. For model-based deveining, a small error in the baseline-estimate can result in fundamentally different layer profiles. For example, two parallel layer-profiles can end up with qualitatively different layer-peaks upon vascular de-convolution, simply based on a different constant baseline offset. As such, the different fMRI activity modulations across depth in Fig. 5B that are seen as two parallel lines result in converging lines in Fig. 5D, simply based on a small 0.3% baseline offset. <br /> Due to this sensitivity to an accurate baseline estimate, previous studies tried to minimize this issue with corresponding optimizations of the task timing. As such, Markuerkiaga et al. (2016) used data in their model development that had 53 s rest (inter-stimulus interval) and the Uludag group usually pushed for very long rest periods too (e.g. see the recent Marquardt & de Weerd et al. 2020). Only such long rest-periods give the BOLD signal enough time to drift past the undershoot back to baseline. 
<br /> While a baseline-offset can be partly (but incompletely) accounted for with random alternating mixing of task conditions within runs that contain all possible task conditions, this was not done in the study at hand. Here, the baseline cannot be expected to be the same across runs, which used different task modulations. While the ‘stimulation-effect’ is estimated across temporal periods that are only 12s (ISI) apart in some cases, the ‘task-effect’ is estimated across temporal periods more than 4 minutes apart. These different modulations thus will be differently affected by vascular HRF-dependent post-stimulus undershoots and, thus, will bias the model-based-vascular deconvolution differently. Therefore, it is not clear to us if the two parallel lines in Fig 5B have a different offset due to baseline biases or due to neuronally-driven differences. <br /> We would advise the authors to estimate the baseline-related bias with their specific setup. E.g. time courses in Fig. 6 show that the inter-stimulus ‘baseline’ is highly negative compared to the pre-stimulus baseline at the beginning of each run. Maybe they can use a discussion of this discrepancy to argue whether there is a baseline-bias here? Maybe the authors can exchange baseline estimates across the different runs and see if their findings are still reproducible? (S_act_run1 - S_rest_run2)<br /> Related to this topic we would like to point the authors' attention to the fact that the references (Corbitt et al., 2018, Merola et al., 2018, Havlicek & Uludag, 2020) are in the reference list, but not mentioned in the main text.
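The baseline sensitivity described in this point can be illustrated with a minimal sketch. It uses a deliberately simple linear drainage model chosen for illustration only; it is not the Markuerkiaga model and not the authors' implementation. The function name `devein`, the leak fraction, and the depth profile values are hypothetical; only the 0.3 offset echoes the number quoted above.

```python
def devein(measured, leak=0.5):
    """Invert a toy drainage model, with depths ordered deep -> superficial.

    Toy assumption: measured[i] = local[i] + leak * sum(local[:i]),
    i.e. each depth also carries a fraction of the signal from all deeper depths.
    """
    local = []
    for value in measured:
        local.append(value - leak * sum(local))
    return local


# Hypothetical % signal change profile (deep -> superficial) and a shifted copy.
profile_a = [1.0, 1.6, 2.4, 3.4]
offset = 0.3  # constant baseline offset between the two conditions
profile_b = [v + offset for v in profile_a]

# Before deveining the two profiles are parallel (constant gap).
gap_before = [b - a for a, b in zip(profile_a, profile_b)]
# After deveining the gap shrinks toward the surface, so the lines converge.
gap_after = [b - a for a, b in zip(devein(profile_a), devein(profile_b))]
```

Even in this linear toy case, a constant offset between two parallel profiles turns into a depth-dependent difference after deveining, so the shape of the deveined difference cannot be interpreted without knowing how accurate the baseline is.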

      3.) We found it hard to follow the interpretation of the origin of the inverted-U shape of the decoding layer-profiles (Fig. 6C). If we understand the analysis methods correctly, the decodability score (correctly estimated task conditions) should be dependent on the relationship between the task-induced signal strengths vs. the noise floor. Thus, a low decodability score can be a sign of a low neurally-driven signal amplitude compared to a finite thermal noise (as expected in the deeper layers). Independently, however, a low decodability signal can also be a sign of particularly high physiologically driven signal fluctuations despite the presence of large neuronal signal strength. And correspondingly, the decodability score resembles a convolved measure of neural-driven information content and multiple sources of temporal variance.<br /> It is well established in the field of layer-fMRI that the physiological noise and the correspondingly noisy signal fluctuations are largest at the superficial layers (Polimeni et al., 2018), associated with pulsating vessels and partial voluming of CSF fluctuations. Thus, we would interpret an inverted U-shape in figure 6C as a feature of noise distribution across depth, and not necessarily as a feature of feed-forward driven thalamic input. <br /> We would advise the authors to state the underlying assumptions of interpreting layer-dependent decodability scores. Is it assumed that the noise is homogeneous? Maybe it would be appropriate to show profiles of layer-dependent RETROICOR estimates across cortical depth?
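A minimal sketch of why decodability mixes signal and noise, assuming a single feature with equal-variance Gaussian noise and equal class priors (an idealization, not the decoder used in the manuscript). Under those assumptions the expected two-class accuracy is Φ(d′/2) with d′ = signal difference / noise SD. The depth labels and the signal and noise values below are hypothetical.

```python
import math


def two_class_accuracy(signal_diff, noise_sd):
    """Expected accuracy of an ideal two-class decoder on one Gaussian feature.

    accuracy = Phi(d' / 2), where d' = signal_diff / noise_sd and Phi is the
    standard normal CDF, written here via math.erf.
    """
    d_prime = signal_diff / noise_sd
    return 0.5 * (1.0 + math.erf(d_prime / (2.0 * math.sqrt(2.0))))


# Hypothetical (signal difference, noise SD) per depth: the signal grows
# monotonically toward the surface, but the noise grows faster there.
depths = {
    "deep": (0.5, 1.0),
    "middle": (1.0, 1.0),
    "superficial": (1.5, 3.0),
}
accuracy = {name: two_class_accuracy(s, n) for name, (s, n) in depths.items()}
```

In this toy setting the accuracy profile peaks at the middle depth although the signal difference is largest at the surface, which is the kind of noise-driven inverted U that would need to be ruled out.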

      4.) The authors note that the power of the wavelength peaks in the middle layers and interpret this as an indication of neurally driven input. We think the manuscript would benefit from a discussion of the layer-dependent architecture of vein branches. E.g. based on ex-vivo work in (Duvernoy 1981, Fig. 43) it is expected that the deeper layers have more lateral signal leakage due to the longer branch lengths. And similarly, the superficial layers are expected to have a wider lateral signal leakage due to partial voluming of pial vessels. Thus, it is not clear to us if the cortical depth of the power-peak should be interpreted as a vascular feature or a neural feature.

      5.) Unlike the description of the analysis procedure, the data acquisition methods are not described in equal detail. E.g.: <br /> We were wondering if the acquisition did not use partial-Fourier imaging methods (asymmetric echoes)? <br /> Did the authors use any parallel-imaging methods (SENSE or GRAPPA)?<br /> Based on the matrix size and FOV, we would expect a resolution of 0.62 mm, instead of the stated 0.7mm. Was there some phase oversampling used? <br /> It would be interesting to the SIEMENS users how many shots (excitations) it took to obtain each k-space plane.
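The resolution check in this point is simple arithmetic: nominal in-plane voxel size = encoded FOV / matrix size. A minimal sketch follows; the FOV, matrix size, and oversampling fraction are hypothetical values picked only to reproduce a 0.62 mm versus 0.7 mm discrepancy and are not taken from the manuscript.

```python
def nominal_voxel_size_mm(fov_mm, matrix, oversampling=0.0):
    """Nominal voxel size along one axis.

    `oversampling` is the fractional phase oversampling (0.13 means 13%).
    Assumption: the reported matrix includes the oversampled lines while the
    reported FOV does not, so the encoded FOV is fov_mm * (1 + oversampling).
    """
    if matrix <= 0:
        raise ValueError("matrix must be positive")
    return fov_mm * (1.0 + oversampling) / matrix


# Hypothetical numbers, for illustration only.
without_oversampling = nominal_voxel_size_mm(140.0, 226)
with_oversampling = nominal_voxel_size_mm(140.0, 226, oversampling=0.13)
```

With these made-up values the same FOV and matrix give about 0.62 mm without oversampling and 0.7 mm with 13% oversampling, which is why stating the oversampling explicitly would resolve the ambiguity.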

      6.) Comment on Figure 5: Would it be possible that “model proposed by Marquardt” should have been Markuerkiaga?

      7.) Comments on Figure 4: The panel keys (A) and (B) seem to be missing. The x-axis labels on Panel D are cut off. It is not clear what the color code in panel C refers to. We assume it is power, but the authors might want to clarify this.

      8.) Comment on Figure 2: There are a few minor stylistic issues in this figure (text size, time arrow head being displaced (D) etc.). In addition, it is curious to see the darkening of the brain tissue in the cerebellum in Panel A, and in the right axial image on the lateral sides of the temporal lobe. The segmentation in those regions looks highly suboptimal. We highlight this because although it is clear that these areas were not close to the region of interest, the amplitude of these visual artifacts was concerning. A clarification on such artifacts would be appreciated.

      Minor comments: <br /> *Subsequent task runs was often -> subsequent task runs ‘that’ was often<br /> *In non-human primates ‘by’ Van Kerkoerle and colleagues. <br /> *Emoji in: Primary visual (; e.g., Muckli) <br /> *Emoji in: V1 (; Hubel) <br /> *We are not sure if “robust and consistent” are the best terms to describe an ODC prediction of 60-79%. Only every 5th sample (20% = 70-50) is predicted correctly above the chance level. <br /> *Text says: “see preprocessing anatomical data”, the section reads “Structural Preprocessing”<br /> *As far as we know, ‘SampleToSurface’ is not a Freesurfer command but a Nipype wrapper for mri_vol2surf. Might be unclear to readers.<br /> *Inconsistent spelling of MP2RAGE-ME as M2RAGE-ME<br /> *It should be Lanczos not Lancos in one of the occurrences.

      We do not strictly expect or desire these comments to be separately answered by the authors because we are aware of the potential workload of the authors with regards to the formal journal reviews. We are only hoping that our comments would be helpful to improve the quality of your article.

      With kind regards,<br /> Renzo Huber, Omer Faruk Gulban, Miriam Heynckes, Yawen Wang, Sebastian Dresbach, Johannes Franz, Lonike Faes, Sriranga Kashyap<br /> With helpful discussions from Martin Havlicek.

      References: <br /> -> De Martino F, Moerel M, Ugurbil K, Goebel R, Yacoub E, Formisano E. Frequency preference and attention effects across cortical depths in the human primary auditory cortex. Proc Natl Acad Sci U S A. 2015;112(52):16036-16041. doi:10.1073/pnas.1507552112<br /> -> Huber L, Finn ES, Handwerker DA, et al. Sub-millimeter fMRI reveals multiple topographical digit representations that form action maps in human motor cortex. Neuroimage. 2020;208:116463. doi:10.1101/457002<br /> -> Schneider M, Kemper VG, Emmerling TC, De Martino F, Goebel R. Columnar clusters in the human motion complex reflect consciously perceived motion axis. Proc Natl Acad Sci. 2019;116(11):5096-5101. doi:10.1073/pnas.1814504116<br /> -> Kemper VG, De Martino F, Emmerling TC, Yacoub E, Goebel R. High resolution data analysis strategies for mesoscale human functional MRI at 7 and 9.4 T. Neuroimage. 2018;164(March):48-58. doi:10.1016/j.neuroimage.2017.03.058<br /> -> Feinberg DA, Vu AT, Beckett A. Pushing the limits of ultra-high resolution human brain imaging with SMS-EPI demonstrated for columnar level fMRI. Neuroimage. 2018;164(February):155-163. doi:10.1016/j.neuroimage.2017.02.020<br /> -> Polimeni JR, Renvall V, Zaretskaya N, Fischl B. Analysis strategies for high-resolution UHF-fMRI data. Neuroimage. 2018;168(April):296-320. doi:10.1016/j.neuroimage.2017.04.053<br /> -> Duvernoy HM. Cortical Blood Vessels of the Human Brain. Brain Res Bull. 1981;7:519-579.

    1. On 2020-04-12 16:22:55, user Costas A wrote:

      Code availability: The automated staged optimization workflow is available as a Python package (github.com/AllenInstitute/All-active-Workflow) written on top of BluePyOpt. The workflow is easily configurable using nested .jsonschemas. The electrophysiology, reconstructed morphology as well as the perisomatic models can be downloaded using the AllenSDK API, which is therefore a dependency of our repository along with NEURON for biophysically detailed simulations. As a placeholder until a future data release where these new models are integrated into the Allen Institute Cell-types data portal (portal.brain-map.org/explore/models), we have shared the models used in this work in a separate GitHub repository (github.com/AllenInstitute/All-active-Manuscript).

    1. On 2019-12-17 16:36:21, user Johan S. Martinez-Fuentes wrote:

      NE 598 Group 3<br /> Introduction: We are university students enrolled in a course focused on understanding neural circuits, including factors important for their development and control of animal physiology. In an effort to promote constructive discourse of current research in this field, and to gain experience in the process of peer-review, we provide the following critique of the currently unpublished manuscript from Wallace et al. posted on biorxiv.org (version: July 25, 2019).

      Summary: There has been a growing appreciation for the role microglia play in regulating synaptic connectivity during brain development; however, how microglia regulate the circuit integration of neurons in neurogenesis in the healthy adult brain remains unclear. Wallace et al. focus on the effects of microglia on adult-born granule cells (abGCs) as part of the mechanisms underlying a previously reported increase in activity of principal neurons of the olfactory bulb (OB) after microglia ablation. Their general approach consists of combining genetic labeling methods with in vivo live-cell imaging of microglia and abGCs (both constitutive labeling and GCaMP indicator of activity-related calcium influx) under conditions of odor presentation and microglial ablation using the CSF-1R antagonist PLX-5622. Overall, the authors found evidence for specific microglial interaction with abGC spines, that the population-level dendritic GCaMP response of OB abGCs was significantly decreased, and that the excitatory inputs into the abGCs were selectively decreased with no change to their inhibitory input. These results further support the notion that microglia play an important role in sculpting the circuit connections of nascent/developing neurons in the context of adult neurogenesis, with new descriptions of potential molecular mechanisms that may be at play. Overall, we recommend more consistency with respect to experimental time courses to strengthen the overall conclusions, more consistent definitions of threshold values for the classification of evoked responses, and clearly articulated cohort numbers and ages. We recommend improving the labelling of figures in terms of defining the control and experimental groups using keys, and the sizes of the two groups should be more balanced. Further, we recommend consistency between written text, legends and the figures themselves, particularly in cases where the number of odorants stated and displayed do not match. 
The authors may elaborate on these points in-text for improved understanding of their findings.<br /> In Figure 1, to explore the nature of microglia interactions with abGCs, the authors employ viral-genetic labeling to target both cell populations and examine them under in vivo two-photon imaging. The authors confirmed the highly motile nature of microglial processes, as microglial interactions with abGC "mushroom" and "filopodial" spines were quantified by spatial overlap of the cell markers. While the overlap of both types of spines with microglial processes was not significantly greater than expected by chance ("offset" image analysis), there was an increase in the number of microglial interactions with mushroom spines, with an interaction time about two-fold longer than expected by chance. This was not seen with microglial interactions with filopodia, thus showing a preference for mushroom/potentially active spines.<br /> In an effort to investigate how microglia ablation affects odor-evoked responses of abGCs, Wallace et al. used two-photon imaging to observe abGC calcium activity over the entire time course of abGC development in anesthetized mice on PLX5622 (PLX) chow. Compared to controls, abGC neurons in PLX-treated mice were less responsive to odors, as quantified in Figure 2F by a cumulative distribution plot. Additionally, Figure 2H features a raincloud plot that quantifies a decrease in the median lifetime sparseness of the abGC dendrites in the OB of PLX mice. Moreover, Figure 2I quantifies median response amplitude across all dendrites, showing a significant decrease in median amplitude across dendrites of treated anesthetized mice. These results suggest a decrease in the dendrites’ temporal selectivity and likely reflect the developing abGCs’ decreased odor responsiveness. Another set of experiments testing these effects in awake mice was performed (Figure 3). 
The cumulative distribution of dendrite responses in Figure 3B affirms suppressed calcium transients under odor exposure in PLX-treated awake mice. There were no significant decreases in the median number of responsive odors (Fig. 3C), nor in the median lifetime sparseness of dendrites.<br /> In Figure 4 the investigators explore whether effects of microglial ablation were specific to developing versus mature abGCs. After following the experimental protocol shown in Figure 1a, the cumulative distribution of the responses is unchanged after PLX administration, and noise is not significant (Figure 4c). The cumulative distribution of the number of effective odors (exceeding an ROC threshold of 0.53) is also shown not to differ significantly (Figure s4d). Finally, this figure also includes a raincloud plot of lifetime sparseness, with control and PLX groups largely overlapping, and the kernel density estimates with box plots underneath showed insignificant differences (Figure 4d). Thus, the ablation of microglia did not significantly change the evoked responses of developed abGCs, highlighting the importance of microglia during abGC development.<br /> In contrast, in Figure 4.1 there is no administration of PLX chow. The abGCs are imaged twice, at 3 months post-injection and three weeks later, alongside control group imaging (Figure 4.1a). Dendrite-odor pair response comparisons in the images Before 1 and Before 2, as seen in the timeline, garnered similar results with an R2 value of 0.73 (Figure 4.1b). The distribution plots also show no significant difference between groups (Figure 4.1c-e). Overall, this suggests abGC cells do not display significant differences in their responses three weeks after the injection. Similar results are shown in Figure 4.2 with 9 weeks after the first imaging set (Figure 4.2a).<br /> In Figure 5, possible PLX-mediated structural changes to abGC spines in the EPL were assessed by quantification of spine number and volume in two-photon acquired images. 
The authors measured spine density per abGC after four weeks with or without drug treatment during abGC development and found no significant difference resulting from microglial ablation. When considering the total population of spine volumes, the PLX-treated condition revealed that spines were significantly smaller compared to those in control. However, this effect was not observed when cell-averaged spine volumes were compared between conditions.<br /> In Figure 6, the authors looked at electrophysiological correlates of the previously observed spine head sizes during abGC development. To do this they simultaneously recorded in vitro spontaneous excitatory postsynaptic currents (sEPSCs) using patch clamping and in vivo imaging (Figure 6a). They report that there are no differences in frequency of sEPSCs from control to PLX-treated mice, but observed reduced amplitude (Figure 6c-d). They then report their finding of the membrane properties as being the same across all mice, control and treated (Supplemental 1). To test potential changes to spontaneous inhibitory postsynaptic currents (IPSCs), the authors repeated the same experiment but tracing the IPSCs and found no difference between the control mice and the PLX-treated mice (Figure 6e-g). These results show that changes in abGC functional responses are due to the weaker excitatory inputs. Using the timeline in Fig. 4, the authors also tested electrophysiological effects of microglial ablation on matured abGCs (Fig. 7), and found that ablation after development has no effect on synaptic input, either excitatory or inhibitory.

      Major Issues: We believe there is a general lack of explicit tracking of the age of mice used in this study, which may potentially affect the significance of the findings. The authors list the age of mice used as 8-12 weeks from the beginning of experiments. This may be too large of a range given the lack of knowledge in the field regarding how factors regulating neurogenesis change with age (Kase et al., 2019). One suggestion is to explicitly list the n associated with the age of mouse used, and perhaps in supplementary figures color code certain quantified data points by age to show how measures may or may not be different. Figure 1 uses a low number of mice (n=3), and so it is unclear whether the significantly increased time of microglia-abGC interactions may be more related to earlier or later ages at this adult brain stage. The issue here may be summarized by the question, do developmentally perturbed abGCs recover activity after 12+ weeks? We invite the authors to consider addressing this timing discrepancy.<br /> In Figure 2, the timeline for development of abGCs could be improved: because there doesn't seem to be a fixed time point to anchor the data set, we are unsure to what extent the authors can be confident in their comparisons. Moreover, there is no mention of the timeline that the authors used in Figure 3. Additionally, the cumulative distribution plots used in Figures 2 and 3 do not do an adequate job of showing the discrepancy between the PLX and control groups. We suggest using another form of statistical analysis to depict the disparities between these two groups more effectively (e.g., consider a general histogram depicting counts per bin of response level). There are some more integral criticisms that can be made for Figure 4 even though it is useful and well-done. In the figure legend, 16 compounds are discussed; however, the figure itself only shows the combination of the compound, heatmap, and trace for 15 substances. 
Furthermore, while each set of experiments looks at a different aspect of the effects of microglial ablation, the different timelines used over the course of the study, and the changes to them seen in Figure 4 specifically, make it difficult to draw conclusions about the findings of the paper as a whole. Additionally, the age of the animals themselves is not mentioned. Furthermore, the ROC threshold indicated for treating evoked responses as effective is inconsistent between the primary figure, where it is listed as 0.53, supplementary Figure 4.1, where it is listed as 0.39, and supplementary Figure 4.2, where it is listed as 0.78. The supplementary figures and experiments are useful in their own right; however, changing the threshold value between analogous sets of experiments is problematic when considering the outcomes together, since all of them use the ROC threshold value in the same way. <br /> There are two main issues to address in Figure 5. One is the abrupt change in the timing of lentiviral labeling of abGCs and PLX feeding. Here, the two were simultaneous, such that experimental migrating abGCs are expected to interact with microglia not present in other developmental ablation experiments. This particular timing would make the experimental condition more similar to control, where microglia are intact. Thus, the synaptic findings in Figure 5 are not strictly transferable to the functional deficits seen in Figure 2. This also means that the authors may expect a more robust synaptic phenotype if they revert to the experimental timeline used in Figure 2. The second issue is the oversampling conducted in the experimental condition: there was an average of ~61 spines sampled from each control abGC, and ~101 spines from each PLX-treated abGC.
The authors may consider quantifying more control spine volumes to make a more balanced comparison.<br /> A significant issue with Figure 7 is that the authors use an experimental timeline different from that of Figure 4, in which the time from lentiviral labeling is shortened by one month, but the reason for this change is not explained. Besides the change in timeline, the recordings are completed a month after injection, so the difference in abGC age arising from the shorter experimental timeline makes it unclear what broader conclusions can be drawn.

      Minor Issues: In Figure 1B, the insets showing percent coverage are insightful for understanding microglial-abGC interaction dynamics; however, they lack x-axis labeling, which makes them harder to read. We suggest either moving all insets to their own panel with explicit time labeling, or making the x-axis reference clearer in the sub-panels. Regarding spine selection, it may be important to address how/whether spines not well described by the two classifications were considered (i.e., were stubby and cup-shaped spines considered/observed?). It would also be interesting to see whether there were any differences across the quantified measures as a function of time (1-4 weeks post-injection).<br /> It would be interesting if the authors addressed their rationale for picking the odors used in Figures 2 and 3. Panels 2D and 3A would benefit from providing the common names of the scents corresponding to each odor. Figure 3 in general could also be improved by distinguishing the PLX and control groups more effectively. This could be accomplished by adding clearer labels on each of the figure insets. We would also suggest increasing the overall number of experimental mice for this particular experiment to see whether the data currently trending towards significance can be bolstered above threshold. <br /> While overall Figure 4 is quite well done, there are some minor errors and possible areas of improvement. In the timeline in panel 4a, it would have been useful to label the Before (control) as a control imaging session, because at first glance it is not entirely clear that the control is the same mice imaged twice rather than another group of mice. Given that a timeline is used (which was a good idea), marking the first and second imaging sessions directly on it could be useful. It may be helpful to the reader if the names of the odorant compounds were included.
Furthermore, while one can eventually piece together that the control group is in purple and the PLX group is in orange, one cannot do so from the figure alone. In Figure 4.2d there is a key indicating that these groups are specified by these colors, but this cannot be determined from the primary figure; while a small thing to fix, this is integral to comprehending the results accurately. Furthermore, only three mice are used in the primary experiment. It may have been useful to include more mice.<br /> In reporting the results for Figure 5, it is not intuitive why cell-averaged spine volume does not differ significantly between control and experimental conditions, while the opposite holds when individual spine populations are analyzed. A short description to reconcile this conflicting finding is needed. We suspect this suggests that a relatively small population of PLX-treated abGCs harbors most of the spine volume changes. Furthermore, it is unclear in the discussion how well the authors can speak to a trend toward increased spine density when there seem to be two data points that may be driving much of the PLX population average.<br /> In Figure 6C-G, the panels all compare control to PLX-treated mice. These graphs all have two differently colored sets of data, and 6A shows what these two colors correspond to. However, the rest of the figure does not clarify which set of data corresponds to which color. We suggest the authors indicate which data set belongs to which condition on each of the graphs, or add a legend near these graphs, to make this clearer to readers.
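
To illustrate how the spine-level and cell-averaged analyses of Figure 5 can disagree, here is a minimal Python sketch. The cell names, spine counts, and volumes are invented for illustration and are not taken from the preprint. It only shows that pooling all spines weights each cell by the number of spines sampled from it, whereas cell averaging weights every cell equally.

```python
from statistics import mean

# Hypothetical spine volumes (arbitrary units), keyed by cell.
control = {
    "ctrl_1": [1.0, 1.0, 1.0, 1.0],
    "ctrl_2": [1.0, 1.0, 1.0, 1.0],
    "ctrl_3": [1.0, 1.0, 1.0, 1.0],
}
plx = {
    "plx_1": [1.0, 1.0],
    "plx_2": [1.0, 1.0],
    "plx_3": [0.5] * 12,  # one heavily sampled cell with small spines
}


def pooled_mean(cells):
    """Mean over every spine; cells with more spines carry more weight."""
    return mean(v for spines in cells.values() for v in spines)


def cell_averaged_mean(cells):
    """Mean of per-cell means; every cell carries equal weight."""
    return mean(mean(spines) for spines in cells.values())


def share_of_spines(cells, name):
    """Fraction of all sampled spines that come from one cell."""
    return len(cells[name]) / sum(len(s) for s in cells.values())


for label, cells in (("control", control), ("PLX", plx)):
    print(label, pooled_mean(cells), cell_averaged_mean(cells))
print("share of PLX spines from plx_3:", share_of_spines(plx, "plx_3"))
```

In this toy case a single cell contributes three quarters of the PLX spines, so the pooled PLX mean drops much further below control than the cell-averaged mean does. Reporting the per-cell spine counts alongside both analyses would let readers judge whether something similar is happening in the real data.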

      Merits: In Figure 1, the authors highlight evidence of cellular interactions that provides proper motivation for examining the effects that microglia may have on abGC functional development. The data acquisition and method of analysis are generally well described in their respective report sections, and the conservative nature of quantifying microglia-spine interactions lends more confidence to the data. The comparison of real data to its offset counterpart across many quantified measures is also a clever way to argue for preferential microglial interaction with mushroom spines.<br /> Figure 2 provides excellent histological confirmation of microglial ablation. In Figures 2 and 3, the authors showed the processed data for the GCaMP6s traces in panels 3A, 2D, and 2E in an easily interpretable manner. Moreover, the decision to use a raincloud plot for panel 2H and a bar graph in 2F showed significance more effectively than the cumulative distribution plots. <br /> There are several parts of the experiments associated with Figure 4 that are highly useful. The timeline makes the setup of the experiment highly understandable and creates a visual image that is easier to comprehend than the worded explanation. Furthermore, it is useful that the experimenters have chosen to include the actual chemical structures of the compounds used in the experiments. The raincloud plot shows very directly how the data compare between the groups. The kernel plot gives a sense of the individual data points, and the box plots give important information on statistical measures of the data. Additionally, including an experiment on whether already developed abGCs are affected by the ablation of microglia strengthens the overall argument and credibility of the paper as a whole.
Finally, including a supplemental section with experiments that looked both at a longer time post-injection in comparison to the three-month mark (Figure 4.1) and at an increased period of PLX administration (Figure 4.2) was also very useful in bolstering the results.<br /> Figure 5 is a valuable addition to the article as it brings a cell biological mechanism into discussion for the observed functional phenotypes in microglial ablation. We commend the authors for reporting different single-spine and single-cell perspectives of analysis on the same data set in Fig. 5D, even though the two analyses lay out a complex and seemingly conflicting picture. Combined with the rigor in the authors’ approach, this motivates the reader to ponder future experiments to explain the data.

      Future Directions: With respect to Figure 1, future experiments may further partition the subtypes of mushroom spines with which microglia interact based on different post-synaptic markers. For instance, microglia may preferentially interact with spines expressing certain receptors. It is also unclear how activity in the olfactory bulb may direct microglial interactions with abGC spines. Increasing olfactory activity in mice by housing them in an environment with prolonged exposure to stimulatory odors, and subsequently tracking microglial interactions, may result in a more robust phenotype and better reveal microglial attraction to certain spines.<br /> Concerning Figures 2 and 3, we suggest that future experiments should attempt to refine the timeline by providing a fixed time point for the development period of abGCs. We also suggest that more experimental mice be added to the cohort in Figure 3 to probe the validity of the non-significance of their statistical analyses. <br /> As in many other sets of experiments in this paper, in Figure 4 a number of different odorants were used to evoke responses. We think it would be interesting to take a closer look at the chemical composition of the compounds and at their differential effects on the responses of the abGCs. Additionally, regarding the fourth set of experiments in particular, it may have been interesting to place the before period even further along in the life of the microglia. Because microglia live for a couple of years in mice, it may be interesting to look at the effects of PLX administration on microglia that are not only fully mature, but also reaching the end of their life.<br /> An intriguing idea stemming from the data in Figure 5 is that there is a subpopulation of abGCs that is particularly susceptible to microglia-dependent spine volume enlargement.
Given the relatively low number of abGCs sampled per group, this may be a rather large subpopulation, perhaps representing dedicated GCs or periglomerular cells, both of which should be labeled indiscriminately here. Thus, using a cell-type resource of the olfactory bulb, such as that created by Tepe et al. (2018), to find leads for molecular markers that target susceptible adult-born neuron subpopulations may advance our understanding of the phenotypes reported here.<br /> With regard to future experiments building on the EPSC experiments (Figures 6, 7), it may be interesting to investigate potential changes in mini-EPSCs or -IPSCs to flesh out a fuller picture of the state of synaptic activity. The approach would effectively be the same, only with acute introduction of tetrodotoxin at the site of recording. These minis may behave differently depending on the ablation and can have an effect on EPSC frequency and amplitude. This difference could be a notable change that leads to what appears to be either no change or a change in amplitude.

      Works Cited: Kase, Y., Otsu, K., Shimazaki, T., & Okano, H. (2019). Involvement of p38 in Age-Related Decline in Adult Neurogenesis via Modulation of Wnt Signaling. Stem Cell Reports, 12(6), 1313–1328.<br /> Tepe, B., Hill, M. C., Pekarek, B. T., Hunt, P. J., Martin, T. J., Martin, J. F., & Arenkiel, B. R. (2018). Single-Cell RNA-Seq of Mouse Olfactory Bulb Reveals Cellular Heterogeneity and Activity-Dependent Molecular Census of Adult-Born Neurons. Cell Reports, 25(10), 2689–2703.e3. doi:10.1016/j.celrep.2018.11.034.

    1. On 2019-05-10 17:15:35, user Leslie Vosshall wrote:

      We discussed this interesting paper at the Vosshall Lab Olfaction and Behavior Journal Club on May 8, 2019. The use of GCaMP for peripheral antennal imaging is really exciting because it opens up Anopheles mosquitoes to comprehensive investigation of mechanisms of olfaction. The paper also takes on the thorny question of the mechanism of action of insect repellents. The big ideas out there are: 1. DEET smells bad and repels insects. 2. DEET soaks up human body odor, making you invisible to them. 3. DEET scrambles the mosquito odor code so that humans smell like pizza-vomit-coffee-gasoline or something rather than just humans. This paper provides more evidence for model 2. Syed and Leal (PMID: 18711137) first pointed to DEET binding odorants as the mechanism for blocking mosquito biting (e.g. if you coat yourself with DEET you become invisible) (model 2). Syed and Leal also provided evidence that DEET smells and repels, by activating olfactory neurons (model 1). Our group published a response to this and showed contrary evidence that, at the concentrations we tested, DEET did not prevent odorants from volatilizing (PMID: 21937991). Our current data in fly and mosquito are consistent with model 3. This paper shows that DEET does NOT activate olfactory neurons but binds odorants (model 2), so neither model 1 nor model 3.

      So which model is correct? DEET is a seductive molecule to study scientifically; we are still no closer to closure on its mechanism of action and this paper adds an additional wrinkle that is worthy of further investigation by the field.

      We had the following feedback and questions (in no particular order):

      1. In Figure 1, is it possible to do an overlay to estimate which of the 7 identified cells reliably respond to which of the 6 tested odorants? This would extract more information from the figure and give some initial glimpses into mosquito odor coding.
      2. The scale bar is missing in Figure S1.
      3. The graphics would be easier to “read” if the Tufte “chart junk” of background grids were removed to let the data take center stage (example Figure S3a)
      4. The 1-octen-3-ol images in Figure 1d and Figure 2a appear to be the same image, duplicated in different figures. It would be ideal to provide a new image or disclose this in the Figure 2 legend.
      5. We wondered if the odor activation code is really consistent across all segments, such that focusing on one segment would give a universal answer for all antennal segments? Or is there some zonal nature or functional specification that would alter the conclusions? The whole antenna images in Figure S1b suggest some variability in how many neurons are activated in a given segment.
      6. Do the natural repellents block/change odorant responses as DEET does to 1-octen-3-ol? We could not find this experiment in the paper.
      7. The paper focuses on one odor, 1-octen-3-ol, to build the case that DEET acts merely to drastically reduce the volatility of this odorant, thus reducing/eliminating the delivery of this odorant to olfactory neurons. Is this the case for other odorants? How would this one DEET molecule be able to reduce, mechanistically, the volatility of the hundreds of different molecules emitted by the skin? We are not chemists, but DEET does not seem to be particularly reactive. Is it a promiscuous covalent attachment to every odorant, or more of a hydrophobic van der Waals mechanism that blocks odorants from volatilizing when mixed with DEET? How could this work given the enormous range in the chemistry of human odor volatiles?
      8. Is Anopheles coluzzii repelled by DEET behaviorally?
      9. PIDs measure bulk ionized molecules but cannot identify them. What are the prospects for repeating this with GC-MS?
      10. Finally, if DEET acts by binding odorants on our skin rather than acting to repel (model 1) or confuse (model 3) wouldn’t you be bitten if you had a swath of skin that was not coated with DEET that was giving off human odor fumes?
    1. On 2018-10-11 14:05:32, user Luigi Antelmi wrote:

      Thanks for sharing this idea!<br /> Question 1: Why do you use the log-likelihood to compare the models rather than the ELBO, which should be a proxy for the data evidence?<br /> Q2: How do you compute the KL term in the non-Gaussian prior cases?<br /> Q3: Are you willing to publicly share your code?<br /> Thanks for any answer you can give!
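
As background to Q1 and Q2, the standard identity relating the log evidence to the ELBO is written out below. The notation is generic and assumed here rather than taken from the preprint: $q_\phi(z \mid x)$ is the approximate posterior, $p(z)$ the prior, and $p_\theta(x \mid z)$ the likelihood. The second line is the usual Monte Carlo estimate of the KL term when no closed form is available, as with a non-Gaussian prior; whether the authors used it is not stated.

```latex
\begin{align}
\log p_\theta(x)
  &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
     - \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{ELBO}}
     + \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)
   \;\ge\; \text{ELBO}, \\
\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
  &\approx \frac{1}{S}\sum_{s=1}^{S}
     \left[\log q_\phi\!\left(z^{(s)} \mid x\right) - \log p\!\left(z^{(s)}\right)\right],
  \qquad z^{(s)} \sim q_\phi(z \mid x).
\end{align}
```

The gap between the log evidence and the ELBO is the KL divergence from the approximate to the true posterior, so comparing models by ELBO can mix differences in fit with differences in how tight the bound is.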

    1. On 2018-09-05 08:55:11, user daniele marinazzo wrote:

      Dear Benedikt and Olaf

      I am posting this here since academickarma.org cannot find the preprint, but looks like the reviews are automatically ported there.

      thanks a lot for this really interesting work.

      Here are some suggestions I collected:

      general impression: the paper is very well written, and clearly specifies why the toolbox is needed, and the theory behind it.<br /> The limitations are also properly described.

      general questions:<br /> 1. would it be feasible to design an experiment in which stimuli are isolated as much as possible, and use these shapes as priors for the estimation of overlapping responses?<br /> 2. The paper "A Statistical Framework for Neuroimaging Data Analysis Based on Mutual Information Estimated via a Gaussian Copula" (https://onlinelibrary.wiley.com/doi/epdf/10.1002/hbm.23471) combines neuroelectrical data with stimuli (in the form of discrete data), and quantifies the interaction not only between different stimuli, but also, within the same ERP, between different ERP peaks (fig 13). Maybe you could use this info to estimate to what extent different stimuli overlap in the ERP, prior to your analysis?<br /> 3. would it be possible to extend the ANOVA to a MANOVA for multivariate analysis, as in the LIMO toolbox, within your deconvolution framework?<br /> 4. Do you consider effects of the combination of two responses, i.e. the fact that a "face" response could be different when combined with one color versus another? Here I am not talking about the overlap, which is the main motivation of your study, but about the fact that the separated responses could still be interacting. I refer to an interaction at the psychological/physiological level, not to the actual linear interaction, which you do model, so I am not sure whether this makes sense as a practical question.<br /> 5. How would the deconvolution look if we considered response-locked instead of stimulus-locked ERPs?

      technical details:

      1. maybe you could describe the effect of the number of available events and the sampling frequency. I tried the extreme case of a single "car" event, and all the rest "faces", and of course the estimation is much noisier, but still present.
      2. how would the estimate change if using reconstructed sources instead of sensor space?
      3. This question comes from my experience with our blind deconvolution toolbox for fMRI (https://www.nitrc.org/projects/rshrf), so maybe it's not relevant at all here: would it make sense to apply the toolbox when the timing of the stimuli is unknown, spotting a signature of a possible neural event in the data, and using the lag between the event and its signature as a parameter in the GLM? This lag would probably be very short for EEG data.
      4. related question: would it be possible to deconvolve also unknown (i.e. spontaneous) events?
      5. figure 6: why are the data padded with zeros before and after?
      6. You mention that truncating the spline has a filtering effect, it makes sense. Would this conflict with a filtering previously applied by the user?

      other small things:

      1. maybe you could add a sample of real EEG data to the code
      2. you could make the color scheme uniform (sometimes you have parula, other times the cbrewer one).
      3. there are a few formatting issues in the figures.
      4. page 18, there is a strange symbol replacing "pi" in "from 0 to 2X"

      code issues (more for github, but I put them here for consistency too):

      1. in the tutorial you use init_deconvolution, that does not seem to exist in the github version
      2. there is a typo "warning onW"
      3. the dependency on cbrewer could be made explicit, or you could check whether it's in the path, otherwise use a default color scheme
      4. simulate_data.m is called simulate_data2 in the function name

      thanks again, and looking forward to seeing more applications of this tool!

    1. On 2018-08-26 08:01:40, user ??? wrote:

      Hello!!! Thank you for your article very much.

      I’m a first-year doctoral student at Kyoto Institute of Technology, Japan, focusing on EEG spectrograms for emotion classification. As a beginner in deep learning, I’m not very good at constructing CNNs or adjusting network parameters. I searched online for networks to apply to my research case, but they didn’t work very well.

      The paper describes using a CNN and RNN to classify EEG spectrograms successfully.

      I am very interested in your research on CNNs and RNNs for spectrogram classification, since spectrograms have texture-like properties that differ from other classification problems. I have a request that may be inappropriate: could you share the method or the code for the CNN and RNN with me? If it is confidential, please feel free to refuse my request entirely.

      I am very sorry for any inconvenience. Thank you very much indeed.

    1. On 2018-08-01 21:10:02, user Caio Maximino wrote:

      The following comments are part of a PREreview of this pre-print (https://www.authorea.com/users/219701/articles/311510-lanec-journal-club-prereview-of-differential-encoding-of-predator-fear-in-the-ventromedial-hypothalamus-and-periaqueductal-grey), and are intended as a review of the preprint "Differential encoding of predator fear in the ventromedial hypothalamus and periaqueductal grey"

      Overview and take-home messages:<br /> Masferrer et al. have made a significant advance to the field of the neurophysiology of fear by showing that some neurons in the periaqueductal grey and in the ventromedial hypothalamus respond to threat level, while other neurons are associated with motor responses. This contradicts the hierarchical model that suggests that more rostral regions are responsible for detecting threat and selecting responses, while more caudal regions execute motor responses. In addition, they have bridged a gap in our knowledge of how neurons in those regions code fear independently, as a distributed network. Although this work is of significant interest to the field, there are some concerns that could be addressed in the next version. These are outlined below.

      Positive feedback:<br /> Currently, two non-exclusive hypotheses are provided in the field of the neurophysiology of fear to describe the circuitry involved. One of them (e.g., McNaughton & Corr, 2008) suggests a rostrocaudal hierarchy, with rostral structures detecting and processing potentially threatening stimuli and more caudal structures providing responses to proximate threats. Another model suggests that more caudal structures, such as the periaqueductal grey, provide the motor output of this fight/flight/freeze system, and threat detection and response selection occur at higher levels (e.g., Fanselow, 1991). The reported results present the fascinating concept that, instead of (or in addition to) forming a hierarchical circuit, both threat detection and motor output are distributed at both levels - at least for proximal threats. The authors develop this idea by recording single units from the dPAG and VMHdm, both regions which have been shown to be involved in antipredatory behavior, in awake, behaving animals, and temporally correlating cell firing with behaviors indicative of risk assessment or flight/escape responses. Exciting future directions for this research include simultaneous lesion or activation paradigms combined with the electrophysiological approach reported here, to try to understand whether VMHdm-dPAG projections modulate the activity of the latter.

      Major concerns:<br /> -Our major concern is the lack of adequate reporting of the statistics on which the conclusions rely. The authors report that the definition of units as "flight" or "assessment" cells was made via analysis of firing rate variation associated with the behavioral events. They show, in Figures 2 and 3, an apparent peaking of responses right before flight for "assessment+" cells, and right before flight initiation for "flight+" cells, and suggest, in the Methods section, that the definition of these categories was made by "Wilcoxon rank-sum test". Since the authors do not properly report the results of this statistical analysis, simply stating a p-value, it is hard to judge whether the classification is accurate. Perhaps using auto-correlograms would increase classification accuracy.<br /> -In addition to this issue, it is not clear whether the classification was made at the within-individual level (i.e., for each mouse) or at the between-individual level (i.e., for all mice). This is important because, at 4-8 mice per region, statistical power is considerably low, and can only reach an adequate level by pooling data from individual neurons at the between-individual level; however, this constitutes pseudo-replication, and can considerably inflate effect sizes and artificially lower p-values. This lack of clarity impairs judgments on the replicability and generalizability of the findings.<br /> -Even though it could be expected that datasets and analysis scripts were not shared due to concerns with scooping before publication, this information can be privately shared with journal referees only, allowing them to assess the computational reproducibility of the statistical model used to classify cells, and therefore the robustness of the findings. We strongly recommend that the authors do so when they submit the paper to a journal, and also that this information is shared with readers after publication.
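
The pseudo-replication concern can be illustrated with a minimal Python sketch. The firing-rate changes and mouse labels are invented for illustration and are not from the preprint. The sketch compares the standard error of the mean when every neuron is treated as an independent sample against the standard error when each mouse contributes one averaged value.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical firing-rate changes (Hz) for neurons recorded in four mice.
# Neurons from the same mouse resemble each other more than neurons
# from different mice, which is the situation pseudo-replication ignores.
rates_by_mouse = {
    "mouse_a": [2.0, 2.1, 1.9, 2.0, 2.0],
    "mouse_b": [0.1, 0.0, -0.1, 0.0, 0.0],
    "mouse_c": [1.0, 1.1, 0.9, 1.0, 1.0],
    "mouse_d": [-0.5, -0.4, -0.6, -0.5, -0.5],
}


def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    return stdev(values) / sqrt(len(values))


# Pooled analysis: every neuron counts as one observation (n = 20).
pooled = [r for rates in rates_by_mouse.values() for r in rates]

# Per-mouse analysis: one mean per animal (n = 4).
per_mouse = [mean(rates) for rates in rates_by_mouse.values()]

se_pooled = standard_error(pooled)
se_per_mouse = standard_error(per_mouse)

print(f"pooled:    n={len(pooled)}, SE={se_pooled:.3f}")
print(f"per mouse: n={len(per_mouse)}, SE={se_per_mouse:.3f}")
```

Both approaches give the same mean here, but the pooled standard error is less than half the per-mouse one because it treats correlated neurons as independent. A mixed-effects model with mouse as a random factor is one common way to use all neurons without overstating the sample size.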

      Minor concerns:<br /> -The abstract is sparse on information; it should include more details on the results (e.g., how were cells classified?) so that readers can comprehend what the authors mean by "Distinct correlates of threat intensity and motor responses were found in both structures". <br /> -There is a discrepancy as to the time that the animal remains in the apparatus during the final stage of the experiment (when the animal has already been removed from the apparatus). In the Methods section, as well as in Figure 1, it is mentioned that free exploration occurs for 5 min, while in the Results section it is described as 10 min.

    1. On 2018-07-18 02:48:03, user Jason D. Yeatman wrote:

      General summary:

      This is an important paper that carefully examines mechanisms that contribute to MRI-based measures of cortical thickness. The main question at hand is whether developmental decreases in MRI-based cortical thickness are picking up on synaptic pruning or are, in fact, driven by myelination of the underlying white matter. With hundreds of studies using MRI-based cortical thickness to study development, this is an extremely important methodological question to work out and this paper has important implications for the vast literature that uses these MRI-based measures to develop theories of cortical pruning during childhood.

      The paper demonstrates that multiple mechanisms contribute to MRI-based measures of cortical thickness and that one of the primary mechanisms is myelination as opposed to the thickness of cortex per se. Since MRI voxels are millimeter-scale measurements, myelin in the superficial white matter drives the voxels on the gray/white boundary to become brighter on a T1-weighted image, and this change pushes the gray/white boundary closer to the pial surface, leading to an apparent thinning of cortex. The findings and analytic approach are quite elegant, combining conventional T1-weighted images with quantitative MRI measures and post-mortem histology. However, while the paper has important implications for the MRI-based literature on development, it should not be taken as redefining what post-mortem studies have shown about pruning. Since the paper argues that MRI-based measures of cortical thickness are not in fact sensitive to synaptic pruning, it does not make sense to interpret these data as ruling out (or testing) the pruning hypothesis. In summary, the paper is a major and important contribution to the MRI literature, but I think that some of the interpretations and assertions need to be revised/clarified and that there are a few additional analyses that would help generalize the findings.

      Specific comments:

      -There is an extensive literature on synaptic pruning over development that is based on careful post-mortem measurements. Some of the claims in the paper seem to contradict that literature. The paper acknowledges that the various hypotheses about development are not mutually exclusive (introduction page 3), but then goes on to interpret the data as if it invalidates post-mortem studies on pruning. For example, paragraph 4 asks how MRI measures can differentiate these hypotheses, but I do not think that the paper shows that MRI measures can. Since the hypotheses are not mutually exclusive, it is possible that myelination and synaptic pruning happen together and that MRI is not sensitive to synaptic pruning and, thus, is primarily driven by myelination. MRI (at least the methods employed here) just might not be appropriate for differentiating these three potential mechanisms. To be clear, this point does not make the findings here any less important: the results clearly demonstrate that myelin and other factors contribute to apparent thinning of cortex measured with MRI. However, these findings should not be presented as refuting the idea that synapses are pruned during development – only that pruning is unlikely to be the mechanism that is driving MRI-based measures of cortical thinning. I think that this is an important distinction that should be carried throughout the paper and could be accomplished through some small changes in wording. For example, the first sentence of the abstract asserts that the mechanisms of cortical thinning during development are unknown – this assertion seems contrary to a multitude of careful post-mortem measurements (e.g. see work by Huttenlocher and others). If the statement were revised to be specific to the MRI literature then it would be true and consistent with the results of the paper.

      -The analysis showing that T1 values predict much of the variance in cortical thickness is a nice finding, elegantly demonstrating how individual differences in T1 relaxation rates (likely driven by myelin) affect segmentation algorithms. To generalize this finding it would be extremely useful to generate maps of variance explained across the whole cortical surface. In other words, expanding beyond the functionally defined ROIs, how general is this finding? Can most of the variance in cortical thickness be predicted across the whole cortical surface? If so, we should probably refrain from using the term “cortical thickness” and come up with a new term that more accurately conveys the mechanisms driving this measure.

      -The post mortem data is a beautiful example demonstrating how myelinated fibers entering the cortex can influence MRI-based measures of cortical thickness. To generalize this finding, synthetic T1-weighted images could be generated at millimeter resolution from the histology data and then passed through the freesurfer segmentation algorithm. This analysis would make it possible to directly test how differences in myelin push the gray/white boundary closer to the pial surface across the brain. If this is not possible, a similar analysis could be done based on the high resolution (50 micron) T2* weighted images.

      -Rather than simply dilating the ROIs into the white matter, would the results be the same if the surface normal from FreeSurfer were used to extend the white/gray surface into the white matter by a few millimeters?

      -Discussion page 14 – “First, we found no evidence of pruning after age 5 in any region of VTC.” I would suggest revising this sentence to state “First, we found no evidence that MRI-based measures of cortical thickness are sensitive to pruning in any region of VTC” or something along these lines. My read of the Results is that it is unlikely that pruning is a major driving mechanism for these MRI measures, but the fact that we are not measuring pruning does not mean that it isn’t occurring.

      -Discussion page 15 “Our data provide evidence that increased myelination of axons during childhood is a key source of cortical thinning in VTC after age 5”. I would suggest revising this sentence to “Our data provide evidence that increased myelination of axons during childhood is a key source of cortical thinning in VTC after age 5 measured with MRI.”

      -The carefully designed analytic approach in this paper would be useful to many other investigators. Moreover, given the complexity of the analysis it is difficult to ascertain how various methodological choices contribute to the overall findings. The impact and reproducibility of this work would be increased if the authors posted the code in an open repository. If it is too early to release all the data, then a single example dataset would suffice to demonstrate how these various measures are extracted and summarized.
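
The whole-surface map of variance explained suggested in the second point above could be sketched roughly as follows: at each vertex, regress thickness on T1 across subjects and keep the R². The subjects-by-vertices layout, the function names, and the toy values are assumptions made for illustration only; this is not the authors' pipeline.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variance at this vertex, so nothing to explain
    return (sxy * sxy) / (sxx * syy)


def variance_explained_map(t1, thickness):
    """t1 and thickness hold one row per subject and one value per vertex.

    Returns one R^2 per vertex from a simple linear regression across subjects.
    """
    n_vertices = len(t1[0])
    return [
        r_squared([row[v] for row in t1], [row[v] for row in thickness])
        for v in range(n_vertices)
    ]


# Toy data (hypothetical): 4 subjects, 2 vertices.
t1 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
thickness = [[3.0, 2.0], [2.5, 3.0], [2.0, 3.0], [1.5, 2.0]]
r2_map = variance_explained_map(t1, thickness)
```

In the toy data the first vertex is perfectly linear in T1 and the second has no linear relationship, so the map separates the two cases; projecting such a list onto the surface would show where the finding generalizes.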

    1. On 2018-07-03 15:17:25, user Dan Tracey wrote:

      This paper has a nice series of experiments that convincingly demonstrate that sensory input from the tarsi on the legs of Aedes aegypti mosquitos promotes the repellency of the well-known compound DEET when these mosquitos feed on human skin. Preventing the tarsal contact by providing very small patches of DEET nicely shows that the gustatory neurons on the mosquito feeding appendage (the stylet) are not sufficient to mediate avoidance of DEET. As well, preventing tarsal input by sealing the legs with glue blocks repellency with DEET, further suggesting that input from the tarsi is necessary for DEET avoidance. This is true even though other experiments indicate that DEET can function as a gustatory feeding repellent. For instance, DEET blocks feeding on sucrose in a CAFÉ assay and it also blocks feeding when mixed with blood in a glytube assay. The key findings in the study include some interesting differences that are found between the actions of DEET and several bitter tasting compounds. First, DEET can prevent feeding when it is only present on the surface of the glytube feeder while quinine cannot. Second, two bitter compounds (quinine and lobeline) fail to inhibit feeding when applied to human skin. Importantly, all of the experiments are carried out in orco mutant mosquitos, thus ruling out olfactory inputs in this repellency mechanism.

      The major conclusion of the study is that DEET inhibits feeding through a mechanism that involves contact with tarsi (and not the stylet), and that this effect cannot be replicated with the tested bitter tasting compounds. These conclusions are well supported by the data.

      The paper is short and “sweet”, yet there are areas where the paper could be substantially improved to help readers who do not have extensive expertise in the chemosensory system of insects.

      For instance, the manuscript is almost completely lacking in any citation of the prior literature on the study of tarsal taste. Instead, the authors create their own definitions, restricting the use of the word "taste" to describe sensation of the mouthparts. In doing so, they attempt to redefine the tarsal taste system as not a taste system. In places, the manuscript is written in a way that suggests to the reader the discovery of a new chemosensory system that was previously unknown. Yet the prior literature on tarsal taste is substantial and goes back at least 100 years. The field started with the early discovery that butterflies can taste the sweetness of nectar with their feet and that it is the sugar touching the feet that triggers the proboscis extension.

      It may be more interesting for a novice reader to simply learn that insects of all kinds are very well-known to taste with their feet and mosquitos are no exception. This suggested change to the manuscript would also allow the authors to provide a more thorough and rigorous treatment of the prior literature.

      Indeed, the prior literature might provide a potential explanation for how DEET could be acting through a tarsal “bad taste” system even if quinine and lobeline are not capable of blocking feeding through the tarsi in Aedes aegypti mosquitos.

      For instance, Ling, Dahanukar et al., while in the Carlson lab, performed detailed electrophysiological recordings of the tarsal gustatory sensilla in Drosophila. They found that some bitter compounds are able to activate tarsal gustatory neurons without activating gustatory neurons on the labellum. Other bitter tastants activate both labellar and tarsal gustatory neurons, and so on. Flies are also known to have a diverse repertoire of bitter taste receptors, and the expression patterns of particular receptors act as a combinatorial code that produces diverse responses to various chemicals across classes of bitter taste sensilla. It seems likely that a similar molecular logic will be found to be in place in the tarsal and stylet taste receptors of mosquitos.

      As always, the authors should be applauded for their important work studying the relatively intractable Aedes system. Yet, it is not clear that it is warranted for them to conclude that the repellent actions of DEET cannot be adequately studied in organisms such as Drosophila. The possibility that DEET acts on the tarsal taste system of flies has not been ruled out and remains a likely possibility. Indeed, the broad spectrum action of DEET makes the identification of its conserved molecular target(s) (whatever they may be) of even greater importance.

    1. On 2018-06-23 22:01:57, user Taylor Salo wrote:

      This preprint is really interesting! Thanks for putting it up on biorxiv.

      In reading it, I noticed some typos and possible points of ambiguity. I annotated them on hypothesis if you're interested (https://via.hypothes.is/https://www.biorxiv.org/content/biorxiv/early/2018/04/11/299024.full.pdf).

      I had a couple of questions about the preprint:

      1. Were the foci from Neurosynth convolved with binary spheres? Based on the values in Table 1, I would assume so, but I didn't see any mention of spheres or radii in the text.

      2. This is a pretty naive question, but would the same approach work with statistical images (e.g., unthresholded maps from NeuroVault or maps reconstructed from coordinates) instead of binary vectors?

      3. Did you consider consolidating synonyms in the abstracts using an ontology like the Cognitive Atlas? That would also make it possible to incorporate multiword expressions, which don't seem to be captured in the current approach.

      4. Is the code for the model available anywhere yet?

    1. On 2018-04-18 08:56:23, user Guillaume Rousselet wrote:

      Interesting result, but a more accurate title would be:

      "Spontaneous Pre-encoding Activation of Neural Patterns Pearson correlates (r=0.56, n=23) with Memory"

      You could also provide a bootstrap confidence interval for the correlation. See R and Matlab code here for instance:

      https://www.frontiersin.org...
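
For readers without R or Matlab at hand, a percentile bootstrap for a correlation could be sketched as below. This is a minimal illustration on synthetic data with n = 23, not the code behind the link and not the preprint's data; the function names, the number of resamples, and the seeds are all assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # degenerate resample: r is undefined
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic participants (hypothetical values, n = 23).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(23)]
y = [0.6 * a + rng.gauss(0, 1) for a in x]
r = pearson_r(x, y)
lo, hi = bootstrap_ci(x, y)
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With a sample this small the interval will typically be wide, which is the point of reporting it alongside r.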

      The bar graphs would be better replaced with scatterplots - see guidelines in these papers for instance:

      http://journals.plos.org/pl...

      https://onlinelibrary.wiley...

      I also strongly encourage you to read this paper, to go beyond the p<0.05 cutoff to make decisions and report results:

      https://arxiv.org/abs/1709....

    1. On 2018-03-28 11:47:56, user Tom Wallis wrote:

      This is an elegant and potentially important paper. It shows that at least for one domain (motion adaptation), one may be able to distinguish perceptual aftereffects from decisional aftereffects via confidence in perceptual decisions: putatively perceptual adaptation produces both decision and confidence shifts, whereas putatively post-perceptual aftereffects seem to be associated with decision but not confidence changes. If this finding is shown to hold in other domains, it has the potential to contribute some sorely-needed empirical discrimination to a longstanding debate in the literature regarding whether perception is "encapsulated" from cognition.

      I have a few very minor comments that might improve the paper:

      * I strongly encourage the authors to make their code and data publicly available in a reputable third-party repository. See https://opennessinitiative.org for details on how to do this.

      * Reporting of the Bayes Factor analyses is unclear and should be expanded (at least in an appendix). First, the authors should also report BFs for the direction decisions. For example, while statistically significant, the difference in PSE for Expt 2 is tiny compared to Expt 1. It would be good to see the Bayes Factors there, too. Also, I don't understand the "assuming minimal and maximal effect sizes +/- 50% from the decision effect size" statement. One should also show a sensitivity analysis (sensitivity to the strength of prior difference) in an appendix. This is easily computable in the JASP software package.

      * A slightly modified paradigm would presumably (?) yield the same substantive results but take less time. Provide four buttons: "left sure", "left unsure", "right unsure" and "right sure". Or do the authors believe that the result requires the clear separation of the decision and confidence judgments for the observers?

      * missing references to neural evidence for implied and real motion sharing a common neural substrate (p 22):

      Krekelberg, B., Dannenberg, S., Hoffmann, K., Bremmer, F., & Ross, J. (2003). Neural correlates of implied motion. Nature, 424(6949), 674–677.

      Krekelberg, B., Vatakis, A., & Kourtzi, Z. (2005). Implied motion from form in the human visual cortex. Journal of Neurophysiology, 94(6), 4373–4386.

      * Two very nitpicky points about the Declaration of Helsinki: (1) written consent is preferred if the participant is able (article 26), and (2) the study must be registered publicly before data collection (article 35). Therefore this study doesn't conform to the declaration, as stated. In my own papers that are not pre-registered, I state instead: "All procedures conformed to Standard 8 of the American Psychological Association’s “Ethical Principles of Psychologists and Code of Conduct” (2010)." [It is unfortunate that one must appeal to the pre-2015 ethical principles of this organisation, but there you have it].

    1. On 2017-09-12 10:59:35, user Max Losch wrote:

      Hi, I would like to shift your attention to a related work on investigating the relationship between representations and tasks: http://www.biorxiv.org/cont...

      Furthermore, I want to note that it seems ill-considered to simply set the weights to 0, as they are still interpretable as contrast information to their projected units. I would find it interesting to see whether your results remain consistent if you're using a Bayesian approach as in the aforementioned article. The code is available online btw: https://github.com/mlosch/F...

      Best regards

    1. On 2017-08-14 12:32:41, user Sebastian James wrote:

      Alex, I don't think you've stated what the dopamine value was for the results in this paper. If the value in my copy of the code is right, then DOPAMINE (referred to as d on page 8 of the manuscript) was 0.2.

    1. On 2017-07-13 19:35:56, user Gerard Rinkus, Ph.D wrote:

      This is an interesting article describing specific functions for the neocortex's layers and for its macrocolumnar tiling. Its main idea is that multiple codes in an 'input' layer, e.g., L4, can become associated over time with a single, more temporally stable, code in an 'output' layer, e.g., L2/3, which, as they point out, conforms with Hubel & Wiesel's original concept of simple and complex cells. They also propose layer 6a and 5 as another possible instance of this input-output circuit. As they state, this association principle applies equally well, whether the sequences of active codes in an input field arise due to movements of objects in the world or to movements of the receptive field with respect to the world (e.g., eye saccades).

      Readers (and actually, the authors too) might also be interested in another, quite similar, sparse distributed representation (SDR) based theory of sequence learning, recognition, recall, and inference, called Sparsey. As described in Rinkus (2014) (http://journal.frontiersin.org/article/10.3389/fncom.2014.00160/full), Sparsey uses the same concept of associating sequences of SDRs in one field with longer-lasting ("persisting") codes in another field, but posits the two fields in question as being at different (adjacent) levels of the cortical hierarchy (as opposed to different layers at the same cortical level (region)). The 2014 paper is based on earlier descriptions of the core SDR-based sequence memory model, e.g., Rinkus 1996 (http://www.sparsey.com/RinkusThesis.pdf), 2010 (http://journal.frontiersin.org/article/10.3389/fnana.2010.00017/full). It seems plausible that both instantiations of the basic principle for achieving invariance, associating multiple, more transient codes, with single, more persistent codes (which essentially underlies Hubel & Wiesel's explanation) could be operative in the brain. It's important to realize that this kind of mechanism allows learning of essentially arbitrary invariances, which amongst other things, could include the kinds of invariances that have for decades been hard-wired, i.e., translation, rotation, and size invariance (i.e., log polar Fourier).

      If the above links don't work, they are all available at www.sparsey.com/Publication....

    1. On 2017-04-14 11:35:05, user Olivia Guest wrote:

      This is a really exciting finding, especially from my perspective. In a preprint of our own now published in eLife (see: http://dx.doi.org/10.7554/e... — Guest, O., Love, B. C. (2017). What the Success of Brain Imaging Implies about the Neural Code. eLife. doi: 10.7554/eLife.21397) we had predicted that downstream areas are less likely to be easily decodable based on the behaviour of two neural network models.

      What do the authors think of our explanation/prediction with respect to why such downstream areas, like the prefrontal cortex, are less likely to be decodable, i.e., that representations become increasingly more orthogonal to each other and thus the richness of the similarity space is lost?

    1. On 2017-04-11 23:48:44, user Jean Lienard for David's lab wrote:

      We presented this preprint during our latest journal club, and we summarize below our main remarks, points of discussion and suggestions. Our approach was inspired by Prachee Avasthi’s preprint journal club: http://asapbio.org/preprint...

      We were very excited by the thought-provoking paradigm of studying multiplexing in neural activity. The manuscript is rich and packed with analyses, some of which could deserve a paper in their own right. We found overall that this is a great work addressing a wide-ranging and probably under-studied question, and is thus of general interest to a broad readership.

      We liked that the many analyses of this preprint were all made on data obtained with a single experimental setup. Statistical analyses are performed to address if and how neurons could perform multiplexing, and we enjoyed going through them. However, these analyses are complex and we sometimes found it hard to relate the very heavily processed data shown in figures with the underlying neural activity. We thus believe that this manuscript would benefit from showing more examples of cell activity along with the digested result of the statistical analysis. For example, the authors could display the Poisson distribution (Figure 4a) for the examples shown in Figure 3, along with its inferred Poisson/Poisson mixture distribution (as an "intermediate" step between Figures 3 and 4). In addition, showing neural data would be particularly useful to see how well the DAPP approach captures temporal dynamics (illustrating the time-dependent alpha(t) function in Figure 5B with reference data would be very useful). We felt that such illustrations of real spiking data would go a long way to make the paper more accessible and more convincing for non-statisticians.

      After having read and discussed the preprint, we agree that some analyses need to be done on a very limited proportion of the recorded neural data (less than 10%), due to the restrictions imposed by the statistical modeling. We believe that this is not a shortcoming of the analysis but rather a natural consequence of the chosen modeling approach. However, such small sample sizes still raise the question: how many neural recordings are required to reach the same conclusions? Or in other words, how many neural recordings would another lab need to reproduce the results presented here? One way to answer this question would be to redo the most important analyses using only a subset of the data, and see how many of them are required to reach similar conclusions.
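
The subsetting idea above could be sketched as follows: for increasing subset sizes, repeatedly draw random subsets of the recordings, recompute a summary statistic, and look at how much it varies. The statistic used here (the fraction of units flagged by the analysis) and the synthetic flags are hypothetical stand-ins, not the preprint's data or method.

```python
import random
import statistics


def subsample_spread(values, sizes, n_rep=500, seed=0):
    """For each subset size, return the standard deviation of the mean
    over n_rep random subsets drawn without replacement."""
    rng = random.Random(seed)
    spread = {}
    for k in sizes:
        means = [statistics.fmean(rng.sample(values, k)) for _ in range(n_rep)]
        spread[k] = statistics.pstdev(means)
    return spread


# Hypothetical per-recording outcome: 1 if the unit was flagged, else 0.
rng = random.Random(42)
flags = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
spread = subsample_spread(flags, sizes=[10, 50, 200])
```

The subset size at which the spread becomes acceptably small gives a rough answer to how many recordings another lab would need.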

      Among other minor remarks, the introduction does a good job at explaining the theoretical framework but the reader has no idea what the Results section will contain. This makes it harder to grasp the content of the Results during the first reading. It would be friendlier to the reader to outline a summary of the approach at the end of the Introduction or at the beginning of the Results section. We also found that the DAPP fitting method was difficult to fully understand (Supplementary Materials, subsection "Analyses of fluctuations in neural firing across and within-trials, and inclusion criteria"). A simple way to clarify the whole process, and enable reproducibility, may be to attach the source code of the statistical analyses. Finally, to conclude our minor remarks: we felt that some terminology was foreign to us, and translating some of it into auditory neuroscience terminology could help convey the message. For example, adding some biological examples of multiplexed code in the brain, as well as putting the problem in the context of the existing auditory streaming literature, would help the reader understand the "problem". Also, explaining how the place code (referred to as "map for sound frequency") would only "partially ameliorate this situation" would be beneficial to a reader who is more computationally oriented. In the same vein, the terms used when discussing Markov chain Monte Carlo slightly confused us: "hypothetical new trials", "hypothetical future draws". In our experience MCMC is typically explained as re-sampling from the observed probability distributions, rather than in terms of hypothetical future recordings. This is arguably a small distinction.

      Again, we were very positive about this preprint and enjoyed discussing it at our journal club. We also thank the authors for releasing a preprint of their study. We hope that these comments will help make this work even better.

    1. On 2017-03-01 23:44:59, user gedankenstuecke wrote:

      Just one thing I stumbled upon: "Licenses are also important to protect you from others misusing your code." This needs some explanation of what is meant, imho, because as far as I can tell there aren't any OSS licenses that place real limitations on potential "mis-re-use" (as in "research I don't approve of"); rather, they limit e.g. commercial uses?

    2. On 2016-04-05 06:18:51, user Chris Gorgolewski wrote:

      One of the complaints I often hear when encouraging people to share code concerns providing user support. Researchers are worried that, by publishing their code, they are obligated to provide time-consuming user support and reply to countless emails describing installation issues and requests for new features. Many people don't share their code to avoid that. My response to such concerns is: build a user community around the code whose members will be able to help each other. It is as easy as setting up a Google Groups mailing list and clearly stating that all announcements and user support will be made through that list (NeuroStars.org could also be used for this). This takes away the pressure of getting personal emails sent directly to you, creates a publicly searchable database of user issues and helps users to help other users. We have described this solution in our recent paper: http://biorxiv.org/content/...

    1. On 2016-12-22 20:18:01, user mauromanassi wrote:

      Dear Will and Peter,

      Congratulations on your new paper! We have read it with great interest. We have listed our concerns and comments below. We hope you will find them useful; we wrote them in a constructive spirit, hoping to improve the manuscript.

      General comments:

      1. You mentioned that three general classes of mechanism have been advanced to account for crowding (positional uncertainty, feature averaging and source confusion). How do you consider grouping? Another mechanism? When do you think it occurs? Any assumption would have strong constraints on the way the model is built.

      2. Lines 172-176. It is not clear why mixture modeling based on maximum likelihood would fail to predict the underlying distribution of a data set. This technique has been widely used in the visual short term memory literature, as the authors properly cited. Some of us have also been using it for explaining visual masking and its interaction with spatial attention (Agaoglu, Agaoglu, Breitmeyer, & Ogmen, 2015; Agaoglu, Breitmeyer, & Ogmen, 2016).

      3. Categorizing errors based on their distance to the nearest model prediction is technically equivalent to mixture modeling with three circular Gaussians, each sitting at the error predicted by each model (averaging, substitution etc.). So the method used here is qualitatively similar but quantitatively seems rather arbitrary. The current way of analysis implicitly assumes that the best way to account for crowded responses is a mixture model with (at least) three components, and then goes on to quantify the weight of each component as a function of target-flanker spacing.
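
To make point 3 concrete, here is a small sketch of the equivalence. It holds under two assumptions that we add for the illustration: the three components have equal weights and share one concentration parameter. The predicted error values, the `kappa` value, and the function names are hypothetical.

```python
import math
import random


def circ_dist(a, b, period=180.0):
    """Smallest absolute difference between two orientations (degrees)."""
    d = (a - b) % period
    return min(d, period - d)


def nearest_label(error, predictions):
    """Hard categorisation: label of the closest model prediction."""
    return min(predictions, key=lambda name: circ_dist(error, predictions[name]))


def mixture_label(error, predictions, kappa=4.0):
    """Most probable component of an equal-weight mixture of circular
    (von Mises-like) densities with a shared concentration kappa."""
    def log_density(mu):
        # Doubling the angle maps the 180-degree orientation space onto a full circle.
        return kappa * math.cos(math.radians(2.0 * (error - mu)))
    return max(predictions, key=lambda name: log_density(predictions[name]))


# Hypothetical predicted errors (degrees) for three accounts of a crowded response.
predictions = {"target": 0.0, "average": 20.0, "substitution": 45.0}
rng = random.Random(0)
errors = [rng.uniform(-90.0, 90.0) for _ in range(1000)]
agree = all(nearest_label(e, predictions) == mixture_label(e, predictions) for e in errors)
```

Once the weights or widths are allowed to differ between components, the two labelings can diverge, which is why fitting the mixture directly seems less arbitrary.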

      Minor comments:

      The novel contribution of this study is a bit unclear to us. If it is to show that a population code of orientation selectivity can generate all types of errors, what is exactly the difference between your previous paper (CB 2015) and this manuscript?

      The Poder & Wagemans (2007) study is highly relevant to this work. Ester and colleagues' studies also used a similar approach, and the differences in model parameters between similar and dissimilar flankers in Ester et al. (2015) and the differences between one-gap flanker and two-gap flanker conditions in this study would be very interesting to compare.

      In a recent study using the stimulus paradigm that you used previously (Agaoglu & Chung, 2016), we have shown that this particular stimulus paradigm is prone to eccentricity confounds. Perceptual errors are highly affected by the absolute orientation of the target and flankers, not just relative to each other. It is unclear how this affects the results reported here.

      Line 34. We think it is fair to ask you to cite our relevant work (Agaoglu, Chung, & Ogmen, 2016) where you describe previous work on crowding and eye movements, since we presented a different point of view. The same holds for Pachai, Doering & Herzog 2016 (you cited only the reply to the reply). As scientists, we can agree to disagree, we hope.

      Line 143. Except for N1, perceptual error does not seem to follow a linear trend. For A2 there is an increase in perceptual error only for the smallest flanker size. You may want to revise that sentence.

      Line 270. We have supporting evidence for this sentence. The role of masking is indeed increasing random guessing and slightly decreasing stimulus encoding precision (Agaoglu, Agaoglu, Breitmeyer, & Ogmen, 2015). However, ruling out metacontrast masking only because of this seems weak. Since the stimulus duration was 500 ms, we don't think there is any masking at all. You might also want to mention that to support the claim made in this sentence.

      Mauro Manassi
      Mehmet Agaoglu
      Michael Herzog
      Susana Chung

    1. On 2016-08-24 20:55:25, user Tal Yarkoni wrote:

      This is an innovative and very thought-provoking paper that will hopefully be widely read by researchers working with fMRI. I have two general comments with respect to the authors' main thesis:

      1. As far as I can tell, the authors don't motivate the decision to focus exclusively on sub-voxel representations. They point out that non-smooth sub-voxel representations would be impossible to detect with fMRI, which is an important observation. But surely non-smooth *supra-voxel* representations would still be easily detectable with fMRI. A priori, there doesn't seem to be a good reason to rule out this kind of representation in the brain. As far as I can tell, representational similarity analyses would still work successfully if the brain were composed of hundreds of functionally discrete tiles that were non-smooth at both the sub-voxel and supra-voxel levels. This doesn't seem like a far-fetched possibility; for example, suppose that when people think about penguins, they're somewhat more likely to think about the unusual climate in which penguins live. Representations of climate may be non-smooth, yet reside in fundamentally different brain circuits from representations of physical shape, size, etc. One consequence would be that neural representations of robins would almost certainly more closely resemble those of sparrows than those of penguins even if there were no spatially graded sub-voxel representations at all in the human brain--simply in virtue of sharing a larger number of salient properties with the former than the latter. Of course, I'm not suggesting that there _aren't_ smooth sub-voxel representations in the brain, but simply that the authors' conclusion that "the neural code must be smooth, both at the subvoxel and functional levels" doesn't necessarily follow.

      2. Even if one assumes that the signal detected by fMRI is in fact driven entirely by smooth sub-voxel representations, it still wouldn't follow that the neural code must be smooth at the sub-voxel level. All we would be able to conclude is that there is at least *some* component of the signal that is smooth. This would not preclude other neural codes from existing, and in fact, we already have abundant evidence of non-smooth sub-voxel representations. For example, ocular dominance columns clearly exist, and if fMRI is unable to detect them, that reflects a limitation of fMRI, not a generalizable claim about the way the brain represents information. While I'm not a systems neuroscientist, I would imagine that there are any number of examples in the systems neuroscience literature of non-smooth, but highly structured sub-voxel representations that would probably be completely undetectable with fMRI. So I think the authors may want to be more circumspect about the conclusions they draw. Their results don't really show that only a subset of neural coding schemes are plausible; rather they suggest that whatever neural representations fMRI is capable of detecting are likely to stem from either (a) smooth representations (either sub- or supra-voxel) or (b) non-smooth supra-voxel representations. This leaves open the possibility (and it seems like a very real one) that the vast majority of information represented in the brain is not represented in a way that is amenable to detection with fMRI.

      Setting these concerns aside, I think this is still a paper that should be of great interest to most cognitive neuroscientists. One point that is made very elegantly here is that the nature of neural representations does not have to be (and probably isn't) uniform across the brain. In particular, the authors put forward a compelling argument for the possibility that brain regions higher in the processing stream--and that are more likely to represent very abstract, multidimensional information--may not be amenable to imaging at all. This point should give many fMRI researchers pause when considering studying, e.g., the representational structure of prefrontal cortex. At the very least, the manuscript raises a number of important questions that should spur further discussion within the neuroimaging community.

    1. On 2016-07-28 02:16:01, user Patrick Mineault wrote:

      I appreciate the very narrow point that the authors are making - i.e. that a certain class of GLM-like models doesn't work very well when it comes to explaining RGC responses to natural stimuli -, but I feel like the claim that "retinal signaling under natural conditions cannot be captured by models that begin with linear filtering" is a bit too broad.

      First, a very simple two-subunit model qualifies as a "model which begins with linear filtering"; e.g. two linear subunits, followed by a pointwise nonlinearity, then combined linearly and followed by another pointwise nonlinearity. Dan Butts (McFarland et al. 2011, J Neurosci) showed that this goes a long way in explaining variability in LGN cells in response to natural stimuli (http://neurotheory.umd.edu/Publications_files/Butts2011.pdf), and one would presume it would resolve some of the deficiencies in the RGC models.
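
For concreteness, the cascade described above can be written in a few lines. This is only a toy sketch: the rectifying subunit nonlinearity, the softplus output, and the mirrored ON/OFF filters are assumptions chosen for illustration, not the fitted model from the cited paper or the NIM code.

```python
import math


def relu(v):
    return max(0.0, v)


def softplus(v):
    return math.log1p(math.exp(v))


def two_subunit_response(stimulus, filters, weights, bias=0.0):
    """LN-LN cascade: linear filter -> pointwise nonlinearity per subunit,
    then a weighted sum followed by an output nonlinearity."""
    drives = [sum(f * s for f, s in zip(filt, stimulus)) for filt in filters]
    subunit_out = [relu(d) for d in drives]
    pooled = sum(w * u for w, u in zip(weights, subunit_out)) + bias
    return softplus(pooled)  # non-negative firing rate (arbitrary units)


# Hypothetical ON and OFF subunits with mirrored filters.
filters = [[1.0, 0.5], [-1.0, -0.5]]
weights = [1.0, 1.0]
stim = [1.0, 1.0]
neg_stim = [-s for s in stim]
baseline = two_subunit_response([0.0, 0.0], filters, weights)
rate_pos = two_subunit_response(stim, filters, weights)
rate_neg = two_subunit_response(neg_stim, filters, weights)
```

Here both a stimulus and its contrast-reversed version drive the cell above baseline, something a single linear filter followed by a monotonic nonlinearity cannot do, even though every stage still begins with linear filtering.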

      Second, the claim that "[these findings] emphasize the importance of additional spatial nonlinearities, gain control, and/or peripheral effects in the first stage of visual processing" is both a bit strong and a lost opportunity. It makes it sound like GLMs and GLM+s are so deficient that you're going to need to throw the kitchen sink at the problem to explain natural image responses. Why not fit a simple subunit model and see if that helps? That would help us understand the limits of the current model and how much extra modeling really needs to be done to explain natural movie responses. The code is available for this (http://neurotheory.umd.edu/nimcode); it seems like pretty low-hanging fruit.

    1. On 2016-07-05 11:37:07, user Marius Pachitariu wrote:

      Dear Nicholas,

      Thank you very much for your feedback.

      Indeed we remove any rois that we detect on consecutive planes, but I realized after reading your message that the top two planes are not exactly parallel at 30um intervals, and because of that the algorithm that removes these doubles was not working correctly. As you probably well know, the momentum of the piezo during reversal at the top plane makes the top of plane 2 be very close to the top of plane 3. We shall correct these when we re-upload.

      Regarding the code neuro data, I have in fact spent quite some time with that data and have my own opinion of it. I believe the datasets with the red nuclear marker are not very valuable: they label on the order of 500 cells per volume, while only 5-10% of these are active in the short imaging period provided. As such, two of the algorithms we tried (including Suite2P) only found very few ROIs. In fact, using the mean image provides a much better score, and I have developed such an algorithm in the past:

      https://papers.nips.cc/pape...

      We have shifted away from such algorithms, because the majority of cells they find are silent (except for neuropil contamination). So we prefer to exclude such ROIs anyway, rather than severely bias our recordings towards neuropil-only ROIs.

      For the other comparisons you propose, we should certainly do some of those. One thing to note is that while the short segment of recording is brief, we always run the algorithm on a full recording session of 1-2 hours. Otherwise, weakly expressing and sparsely spiking cells would not be found. Note also that we greatly encourage users to check the results of the algorithm in the GUI after the automated results, and so we are not aiming for a fully automated algorithm, which seems hard and out of reach currently.

      Best wishes,
      Marius

    2. On 2016-07-02 13:17:26, user Nicholas Sofroniew wrote:

      Hi Marius

      Great to see a preprint on your algorithm and your code up on github! I look forward to trying it out on my data. I had a few questions first though.

      For the analysis of multiplane imaging data are you doing any post-processing to ensure that you are not detecting the same neuron in multiple planes (such as looking at the cross correlation between rois at the same location in neighboring z planes)? Just briefly visually inspecting the data in figure 2 it looks like you might have quite a lot of double counted neurons (see my figure below). Did you exclude these from your estimate of >10,000 simultaneously recorded neurons?
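The check suggested in this question (correlating ROIs at the same location in neighboring z planes) could look roughly like the sketch below. It is not the Suite2P procedure; the ROI record layout, the 5 um distance threshold and the 0.8 correlation threshold are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length traces (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def find_duplicates(rois, max_dist_um=5.0, min_corr=0.8):
    """Return index pairs of ROIs in adjacent planes that overlap and co-fluctuate."""
    pairs = []
    for i, a in enumerate(rois):
        for j in range(i + 1, len(rois)):
            b = rois[j]
            if abs(a["plane"] - b["plane"]) != 1:
                continue  # only compare neighboring z planes
            if math.dist(a["xy"], b["xy"]) > max_dist_um:
                continue  # centroids too far apart to be the same cell
            if pearson(a["trace"], b["trace"]) >= min_corr:
                pairs.append((i, j))
    return pairs

# Toy data: ROIs 0 and 1 sit on top of each other in planes 1 and 2.
rois = [
    {"plane": 1, "xy": (100.0, 200.0), "trace": [0, 1, 5, 2, 0, 0, 4, 1]},
    {"plane": 2, "xy": (101.0, 201.0), "trace": [0, 1, 4, 2, 0, 0, 3, 1]},
    {"plane": 2, "xy": (300.0, 50.0), "trace": [3, 0, 0, 1, 0, 2, 0, 0]},
]
duplicates = find_duplicates(rois)
```

As the reply above points out, a rule like this depends on knowing which planes really are adjacent, so unevenly spaced planes would need a plane-specific adjacency definition.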

      With this in mind, are you making any attempts to validate your algorithm against real ground-truth data (i.e. data where GCaMP activity has been recorded in neurons with a red nuclear marker, which enables easy automated segmentation)? If you cannot generate such data, there are some publicly available datasets that come close to that form at http://neurofinder.codeneur.... I would find analysis of such data much more informative than your analysis of the transplanted rois.

      I would also like to know more about how the choice of imaging parameters - pixels per um, frame rate, laser power (i.e. SNR), and duration of the imaging session - affects the segmentation accuracy (false positives, false negatives) of your algorithm. You chose to show data collected at 2.5 Hz over a ~900 x 930um FOV, with what looks like 512 lines and 512 pixels per line, for maybe 5 minutes. I would find it useful to know how the results of segmentation would have changed if you had changed these parameters (either by first acquiring higher resolution data and artificially down-sampling it, or by acquiring datasets with different imaging parameters and making comparisons across datasets).

      I think such numbers and a comparison with real ground truth data would be a real benefit to the calcium imaging community.

      Thanks,
      Nicholas Sofroniew

    1. On 2016-06-09 18:55:53, user Stephen Smith wrote:

      Very interesting article. The point that many of the techniques we use in neuroscience measure impossible-to-interpret epiphenomena is very clearly stated. What I'm not sure I understand is, what would the authors consider a success, or what's the "goal" of brain research?

      -You mentioned replacing a broken "unit" with an artificial one, but I think that would be possible for a chip. Perform "electrophysiology" on the inputs and outputs of a transistor, and you could probably figure out the input/output relationship and solder in a new transistor. Likewise, I think there are a few other solvable problems that you could discuss:

      -Can you figure out that the "point" of the system is to output a video game? For that, you would need to understand that the output involves a cathode-ray beam in a TV that is lighting up pixels by sweeping across the screen. You would then need to identify the correct bit of the circuitry as the output, figure out that the beam is sweeping at 60 Hz, figure out the coding for the RGB, ON/OFF of each pixel, number of pixels, etc. That would allow you to decode the picture. If you started with a complete system (i.e. an ATARI+joystick hooked up to a TV), this might not be too much of a stretch. So, how would you look for the output-level activity of the chip? How would you identify the 'input' vs the output vs internal processing?

      -Can you figure out the software? Would there be a possible way to reverse-engineer the code that's being input into the system, based on the behavior of the transistors? Even if we do 'map' the entire brain, the software needs to be understood too...

      Overall, very interesting paper. I think it could be improved if you better explore how we COULD answer these questions using the chip. This might inspire neuroscientists to think about analogous ways to answer those questions in the brain.

    1. On 2016-02-21 06:07:40, user Yaroslav O. Halchenko wrote:

      Well -- although it will sound like I am just trying to squeeze a citation in (http://www.gigasciencejournal.com/content/4/1/31), I guess I wrote that piece for a reason.

      You state that pillars of "Open" science are data, code and papers... but oh well -- none of those alone make it "Open". Even deposition of all those online and publishing in "open access" journals doesn't make them sustainably open, since those might (and do) go away in X years, and if no one had permissions to duplicate, reuse, improve upon your works -- what "open" is that? It is as open as a door of a limo standing at your doorstep but which you cannot ride.

      To guarantee that a science product is open, its widest dissemination and reuse must be allowed. For that, a clear statement of copyright and license terms must be made, and no "exclusive licenses" be provided to take away those freedoms and place them into a single hand (which is often the case with publications in some "open access" journals). But this manuscript doesn't even mention the word "license". IMHO it is ignoring an elephant in the room ;)

    1. On 2014-06-10 13:49:57, user Authors of the manuscript wrote:

      Dear Mike X Cohen,

      this kind of personal commenting is much more helpful and constructive for the authors than the anonymous peer-review process and we thank you for taking your time to write this comment. We respond to some of your points in the following:

      MXC: “It is not always clear whether the authors are criticizing the biophysical interpretation of CFC analyses, or the mathematical foundations of CFC methods. Perhaps it would be useful for the authors to define the situations under which CFC could be validly interpreted, and what exactly the neurobiologically meaningful interpretation would be.

      Concerning the former, the authors accurately state that relatively little is understood about the neural mechanisms that could produce CFC, and this may impede interpretations of empirical findings (the same criticism applies to most macroscopic measures of brain activity, including ERPs, time-frequency power, most measures of functional connectivity, the FMRI BOLD response, etc.).”

      Authors:

      We agree with this comment in the sense that indeed many measures in Neuroscience depend on an interpretational step. However, in contrast to the current handling of CFC, these aspects are well acknowledged for measures like BOLD and ERP. In addition there have been intense efforts to disentangle various generating mechanisms of BOLD signals and ERPs. (For the origin of the BOLD signal, the role of astrocytes, lactate, and calcium see for example: Niessing et al, Science, 2005; Logothetis et al., Nature, 2001; Barros, TINS, 2013; Petzold&Murthy, Neuron, 2011; Iadecola&Nedergaard, Nat Neurosci, 2007 . For generating principles of the ERP see for example: Mazaheri & Jensen, J Neurosci, 2008; Turi et al. NeuroImage, 2012; Telenczuk et al, J Neurophysiol, 2010, and references therein).

      In these fields, the variety of generating mechanisms is typically discussed and wording is carefully chosen. With respect to the interpretation of CFC measures, this care is often lacking. Moreover, the mathematical methods of CFC are more involved compared to standard BOLD-fMRI or ERP analyses. Therefore, plain technical errors in published work occur more frequently than in either ERP or BOLD fMRI studies.

      ____

      MXC: “Their suggestion for researchers to label their CFC analyses as relatively exploratory vs. confirmatory and as a marker vs. biophysical understanding (figure 5) is also sensible (this suggestion also could be applied to most or perhaps all measures of brain activity). The reliance on DCM should be cautioned against the over-parameterization and opaqueness of DCM models used in practice.”

      Authors:

      We agree with this comment insofar as the mathematics involved in DCMs is necessarily much more involved than that in the current standard CFC analyses. In our opinion however, this is outweighed by the advantage to be able to state the relative odds for and against the presence of a CFC mechanism in the data. Moreover, we also agree that the mathematical complexity of model specification indeed results in a certain opaqueness, especially to the lay.

      We disagree with the criticism of over-parametrization, as models selected by Bayesian model comparison need two properties: (1) the ability to explain the data well, and (2) generalizability. The latter is ensured by automatically favoring models that explain the data well without using an excessive number of parameters, thus implementing Occam's razor. However, it is indeed necessary to carefully specify models for comparison, that are plausible a priori, based on existing knowledge (Lohmann et al, NeuroImage, 2013; comments by Friston et al, NeuroImage, 2013; Breakspear, NeuroImage, 2013; reply by Lohmann, NeuroImage, 2013). This requirement may mean that DCMs of CFC will have to wait until the mechanisms underlying CFC are spelled out more explicitly using interventions.

      ____

      MXC: “the general point is that methods for assessing CFC are not necessarily confounded just because their results can be difficult to interpret from a neurophysiological perspective. Let me explain this by analogy: Imagine comparing ten randomly selected negative numbers with ten randomly selected positive numbers. A t-test would indicate statistical significance, but this significance is uninterpretable. However, the reason that the result is uninterpretable is not due to a confound of the t-test, but rather, due to the assumptions underlying the data collection. Imagine you received the same numbers but were told that they reflected measurements of relative alpha-band power in conditions A and B. Now the same result would be interpretable.”

      Authors:

      Indeed, in some sense the whole first part of our paper illustrates the variety of different but equally plausible reasons behind a CFC signature, or different possible interpretations if you wish. So, why do we call them "methodological confounds"?

      Taking an analogy with the t-test might help us here, though we think that the analogy provided by MXC is slightly misleading and prefer a different version of the analogy. Namely, when you make a t-test, the un-interpretability is not only about the "origin of the data" (as in the example of MXC), but also (and actually even more) about the "nature of the data".

      T-test makes specific assumptions on the underlying probability distribution (e.g. normality) and when these assumptions do not hold, the p-value obtained might very well just reflect the fact that the underlying distribution did not match well.

      This is similar to CFC - we do not claim that the CFC measures are wrong, but in some sense show that the underlying assumption that there is real coupling in the data might well be doubted (for several reasons explained in the text). We show how alternative assumptions (i.e. non-linearity, common drive etc) could as well account for high CFC values. I.e. the CFC measure describes the amount of coupling only if we already assume the existence of this coupling, and the absence of the other mechanisms, or their constancy over experimental conditions.

      Maybe "methodological confounds" sounds more appropriate if one keeps also this analogy in mind - if the methodology is applied in case of doubt with assumptions, the results are not interpretable. It is the same with the T-test - applying it to any distribution, one is not able to draw conclusions. This is not a fault of the T-test. However we would end up with a possible confound if we DID not know what the underlying distribution is, but still applied the T-test. In the case of CFC analysis we do not have a good understanding of underlying biophysics, but still apply the CFC measure and try to interpret it.

      It might be useful to compare two different possibilities of expanding the acronym CFC - either Cross-Frequency Correlation or Cross-Frequency Coupling. The latter indicates biophysical interaction and even causality and is the one used now in the literature. Our article discusses at length why in fact we should rather hold to Cross-Frequency Correlation. Moreover, we explain that even in this case it is important to try to partial out the effects that could diminish the specificity of CFC as a marker.

      ____

      MXC: “Their first example is the van der Pol oscillator. The authors claim that CFC here reflects a confound, because (page 3) “there is no simple physical interpretation for the different frequency components of the oscillator.” The interpretation depends entirely on the assumptions of the signal. If this were a neural signal, one might interpret that certain phases of the lower frequency oscillation regulate the variability of faster activity (as an aside, the lack of band-limited activity in Figure S1 is a classic situation of when *not* to interpret results as reflecting an oscillation; this has been discussed since the 1990’s by, among other researchers, Singer, Tallon-Baudry, Pfurtscheller, Miller). This is readily apparent by plotting the van der Pol signal along with its rectified derivative, which can be obtained with the Matlab code below:

      ode = @(t,y) vanderpoldemo(t,y,1);

      [t,y] = ode45(ode,[0 20],[2 0]);

      plot(t,y(:,1)), hold on

      plot(t(1:end-1),abs(diff(y(:,1)))*8,'r')

      The problem here is not with the measure of CFC. In fact, I do not see a problem at all; the authors simply tested a method on simulated data and got a result, much like a t-test on signed random numbers would produce a result. Here is another, even more striking, example:

      t=0:1/1000:1;

      plot(t,sin(2*pi*40*t) .*sin(2*pi*t))

      As with the van der Pol illustration, one can say that CFC here is uninterpretable because there is no interaction amongst subsystems; there is simply a 40-Hz sine wave multiplied by a 1-Hz sine wave (this could occur from two independent systems with wave cancelation at the recording electrode). Again, the problem is not with the CFC measure, but that the simulated data do not lend themselves to a neurobiological interpretation of CFC.”

      Authors:

      Indeed, “the simulated data do not lend themselves to a neurobiological interpretation of CFC”, and neither do the neurobiological data at the moment. This is one of the main points of the manuscript.

      The problem is that for now, the neurobiological measurements might not lend themselves to the “coupling” interpretation of CFC. The CFC analysis has been adopted and is used with a certain aim and interpretation. Thus it seems fair to say that if the methodology does not provide answers and interpretations it should, we deal with "methodological confounds".

      The examples brought up show that without further assumptions and knowledge of the underlying neurobiology, current methodology is unable to discriminate between various basic but very different interpretations. In analogy with the T-test example above, similar other toy examples treated with a T-test would illustrate what could happen if the underlying distribution did not match the assumptions (i.e. normality) - and why a T-test is not applicable without checking its assumptions first.

      As we mention in several places, this is not a problem when one tries to use the CFC measure only as a MARKER, however the problem comes when one goes one step further in the interpretation, trying to give a particular (physiological) meaning to CFC (“high frequency oscillations modulated by low frequency phase” or something along these lines).

      Also, notice that your second example (modulated sinusoids) does tell you something about which parameters (in terms of bandwidth) should be used so that the CFC measure would be closer to its desired interpretation.

      ____

      MXC: “Their other examples are also not compelling as identifying any confounds with CFC measures. Prime numbers are nonrandom sequences with a periodic structure (http://xxx.lanl.gov/pdf/cond-m...) and anyway, true random sequences can appear non-random at small N. A more serious concern is that the authors are interpreting CFC in random data or in ECoG data with non-linearity introduced (Figure S6) without performing any statistics to justify the interpretation of CFC. Analogously, a t-statistic on random numbers is unlikely to be exactly 0; it is only through evaluation of that t-statistic with respect to a null hypothesis distribution that a t-value of, say, 1.5 can be interpreted.”

      Authors:

      Interestingly enough, prime-numbers, when one partials out the fact that there is only one even prime number, one prime number that is divisible by three etc, seem to be best described as what are called pseudo-random numbers. (See for example any of Terence Tao’s blog posts or presentations on “primes and pseudorandomness”.) So at least for now, to our knowledge, there seems to be no reason to believe that there is cross-frequency coupling behind any process we might expect to generate prime numbers. ;) But of course this is just an illustration of how hard it is to conclude anything about mechanistic processes by just using a CFC measure. As a side note, one should also not forget that still some care is needed when interpreting such statistics, i.e. recall the numerical information on the change of sign between \pi(x) and li(x) and Skewes’ numbers. ;) But probably none of us is an expert on primes and knows exactly why they give rise to a high CFC index. We reason in the article that even in the case of the CFC measured from the brain, this “why” still continues to have a multitude of possible answers.

      Now, more seriously, in the ECoG or random data we use exactly the same procedure as is usual in the CFC analysis. Indeed, we used the code provided by Tort for the modulation index, and the code provided by Canolty et al. from their Science paper and hence, their respective surrogate analysis (and in our text it was indicated that the results were significant). In addition, for the non-linearity case we even provided a simple example (supplementary material) where we derived analytically that quadratic non-linearities lead to CFC.

      ____
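The remark above that quadratic non-linearities lead to CFC can be illustrated with a short calculation. This is a sketch for a sum of two cosines passed through a weak quadratic term, not necessarily the derivation in the authors' supplementary material; the symbols $A$, $B$, $\varepsilon$, $\omega_l$ and $\omega_h$ are introduced here for illustration.

```latex
\begin{aligned}
x(t) &= A\cos(\omega_l t) + B\cos(\omega_h t), \qquad y(t) = x(t) + \varepsilon\,x(t)^2,\\
x(t)^2 &= A^2\cos^2(\omega_l t) + B^2\cos^2(\omega_h t) + 2AB\cos(\omega_l t)\cos(\omega_h t),\\
y(t)\big|_{\text{near }\omega_h} &= B\,\bigl[\,1 + 2\varepsilon A\cos(\omega_l t)\,\bigr]\cos(\omega_h t).
\end{aligned}
```

The cross term places sidebands at $\omega_h \pm \omega_l$, so the envelope of the fast component varies with the phase of the slow one even though the two inputs are independent.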

      MXC: “Another issue identified by the authors is the potential confound of co-occurring but independent low-frequency phase and high-frequency power dynamics. This is a potential confound (discussed in Cohen, 2014, Analyzing Neural Time Series Data; figure 30.7) but is fairly easy to identify and address (including: avoiding interpreting CFC from immediate post-stimulus periods, removing the phase-locked time-domain signal before computing CFC, and inspecting whether the time-course of CFC differs from the time-course of phase clustering). Perhaps the authors have additional suggestions?”

      Authors:

      As we note in our manuscript “if a brain area under a recording electrode receives time-varying input from any other brain area, this input might generate similar dependencies across frequency components (Figure 4A). The problem is that usually one has no control over the timing of the internal input to the examined brain area (Figure 4B). Thus, phase-amplitude coupling measured anywhere in the brain can be potentially explained by common influence on the phase and amplitude, without the phase of a low frequency oscillation modulating the power of high frequency activity.” The improvements mentioned in your commentary do not help to identify and address the problems with INTERNAL input, where we have no idea about the onset time (see Figure 4).

      ____

      MXC: “Later, they write (pages 9-10 and figure 4) "If a brain area under a recording electrode receives time-varying input from any other brain area, this input might generate similar dependencies across frequency components." This does not seem to be a confound, but rather, a description of CFC: low-frequency oscillations from a distal brain region modulate local activity, as manifest in higher frequency oscillations. Perhaps if the authors would identify a mechanism/consequence of CFC for neural activity it would be easier to understand whether/how this is a confound.”

      Authors:

      There is a misunderstanding here. We would NOT agree with the interpretation that “low-frequency oscillations from a distal brain region modulate local activity, as manifest in higher frequency oscillations”. Instead we clearly write in our manuscript that “non-stationary input to a given area simultaneously affects the phase of a low frequency component and increases high-frequency activity (common drive to frequency components of the same signal).” This means that the low frequency phase is modulated and the high frequency component is influenced by the same common drive to the area. As we conclude: “In this case, high-frequency amplitude increases occur preferentially for certain phases of slow oscillations even without any need of interaction between the two rhythms.” (See also Figure 3). Again, we would agree on this point if CFC stood for Cross-Frequency Correlation rather than Cross-Frequency Coupling, as the latter indicates interaction or causality.

      ____

      MXC: “On page 6, the authors write “The main conclusion is – not that surprisingly - that a clear peak in the power spectrum of the low frequency component is a prerequisite for a meaningful interpretation of any CFC pattern.” The justification does not follow. If one is interested in *phase* dynamics, why does there need to be a peak in *power*? Assuming that phase reflects the timing of neural populations while power reflects their spatial coherence at the LFP level, why is spatial coherence considered a prerequisite for investigating timing? In real EEG data, power and phase dynamics are often independent of each other.”

      Authors:

      It is here not at all necessary to think about which neural processes the phase or power variable could reflect. The reason for why a peak in the power spectrum is a prerequisite for a meaningful interpretation of phase (as an index that is a parameter of an oscillation) is well known in the physics/electrical engineering community and simply comes from the signal processing perspective: phase can be meaningfully defined only for narrow-band (and slowly frequency-varying) oscillatory signals for which the phase grows monotonically (please see page 35 of the manuscript: Supplementary discussion - conditions for a meaningful phase). Note that although narrow-band filtering a signal enhances smooth dynamics of its phase, it does not improve its physical interpretability.

      ____

      MXC: “A related discussion is potential differences in power across conditions. CFC methods generally measure the relationship between power and phase, not the magnitude of power. Appropriate permutation-based statistical corrections will account for differences in the magnitude of power (Cohen, 2014, chapter 30).”

      Authors:

      Yes, we agree that this is something that one indeed can control for and just point out that this is not always done in the literature. (See literature review).

      ____

      MXC: “The potential confound of low power for estimating phase (Muthukumaraswamy & Singh, 2011) applies only for very low SNR; in real EEG data, power and phase dynamics are often easily disambiguated and unrelated to each other.”

      Authors:

      The level of SNR for EEG is dependent on the frequency band considered and stimulation elicited by the experimental protocol. Here the main point is that many studies compare CFC between conditions that elicit very different power in a given band (e.g. peak vs no peak). Thus there is straight away a bias in the reliability of the phase estimation and therefore of the phase-amplitude coupling. How big this effect is should be assessed for each dataset. In addition, the amplitude and phase defined by the analytical signal approach (using Hilbert transforms) are not fully independent and even a nominal change in one of them induces a perturbation in the other (Supplementary Figure 7B).

      ____

      MXC: “Table 1 should include citations of the papers surveyed; otherwise independent verification is not possible.”

      Authors:

      We feel that the description preceding the literature review enables anyone to find the respective papers (as the years, journals and search criteria have been mentioned, a simple PUBMED search can provide the explicit list of papers considered). The magic paper is the one we added manually, which we indeed can identify here - Saalmann et al., 2012 in Science. The literature review covers papers up to January 2014 (included).

    2. On 2014-06-06 04:37:20, user MikeXCohen wrote:

      This paper discusses some theoretical, mathematical, and practical issues concerning cross-frequency coupling (CFC). CFC is receiving increasing interest in neuroscience, and critical discussions of data analysis methods and their interpretations are always timely. I found this paper to be interesting but the arguments were not always compelling. Below are my comments that I hope will help the authors improve their arguments and perhaps help invigorate the critical discussions.

      It is not always clear whether the authors are criticizing the biophysical interpretation of CFC analyses, or the mathematical foundations of CFC methods. Perhaps it would be useful for the authors to define the situations under which CFC could be validly interpreted, and what exactly the neurobiologically meaningful interpretation would be.

      Concerning the former, the authors accurately state that relatively little is understood about the neural mechanisms that could produce CFC, and this may impede interpretations of empirical findings (the same criticism applies to most macroscopic measures of brain activity, including ERPs, time-frequency power, most measures of functional connectivity, the FMRI BOLD response, etc.). Their suggestion for researchers to label their CFC analyses as relatively exploratory vs. confirmatory and as a marker vs. biophysical understanding (figure 5) is also sensible (this suggestion also could be applied to most or perhaps all measures of brain activity). The reliance on DCM should be cautioned against the over-parameterization and opaqueness of DCM models used in practice.

      Concerning the latter, the authors need to explain and justify their criticisms of the few methods that they focused on. (Many methods of cross-frequency coupling have been reported; it is not clear why the authors focus only on a small number of methods without discussing (or mentioning/citing) other methods.) They variously conflate potential mathematical confounds with neurobiological interpretations.

      Below are some specific comments; the general point is that methods for assessing CFC are not necessarily confounded just because their results can be difficult to interpret from a neurophysiological perspective. Let me explain this by analogy: Imagine comparing ten randomly selected negative numbers with ten randomly selected positive numbers. A t-test would indicate statistical significance, but this significance is uninterpretable. However, the reason that the result is uninterpretable is not due to a confound of the t-test, but rather, due to the assumptions underlying the data collection. Imagine you received the same numbers but were told that they reflected measurements of relative alpha-band power in conditions A and B. Now the same result would be interpretable.
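The t-test analogy in this paragraph is easy to reproduce. The sketch below draws ten negative and ten positive numbers (the range and seed are arbitrary assumptions) and computes a pooled-variance two-sample t statistic with the standard library only.

```python
import math
import random
import statistics

def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))

rng = random.Random(0)  # fixed seed so the example is repeatable
negatives = [-rng.uniform(0.1, 1.0) for _ in range(10)]
positives = [rng.uniform(0.1, 1.0) for _ in range(10)]

t = t_statistic(positives, negatives)
# Two-sided 5% critical value for 18 degrees of freedom is about 2.101.
significant = abs(t) > 2.101
```

The statistic comes out far beyond the critical value, yet nothing in the calculation says what the numbers mean. That part has to come from how the data were collected, which is the point of the analogy.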

      Their first example is the van der Pol oscillator. The authors claim that CFC here reflects a confound, because (page 3) “there is no simple physical interpretation for the different frequency components of the oscillator.” The interpretation depends entirely on the assumptions of the signal. If this were a neural signal, one might interpret that certain phases of the lower frequency oscillation regulate the variability of faster activity (as an aside, the lack of band-limited activity in Figure S1 is a classic situation of when *not* to interpret results as reflecting an oscillation; this has been discussed since the 1990’s by, among other researchers, Singer, Tallon-Baudry, Pfurtscheller, Miller). This is readily apparent by plotting the van der Pol signal along with its rectified derivative, which can be obtained with the Matlab code below:

      ode = @(t,y) vanderpoldemo(t,y,1);

      [t,y] = ode45(ode,[0 20],[2 0]);

      plot(t,y(:,1)), hold on

      plot(t(1:end-1),abs(diff(y(:,1)))*8,'r')

      The problem here is not with the measure of CFC. In fact, I do not see a problem at all; the authors simply tested a method on simulated data and got a result, much like a t-test on signed random numbers would produce a result. Here is another, even more striking, example:

      t=0:1/1000:1;

      plot(t,sin(2*pi*40*t) .*sin(2*pi*t))

      As with the van der Pol illustration, one can say that CFC here is uninterpretable because there is no interaction amongst subsystems; there is simply a 40-Hz sine wave multiplied by a 1-Hz sine wave (this could occur from two independent systems with wave cancelation at the recording electrode). Again, the problem is not with the CFC measure, but that the simulated data do not lend themselves to a neurobiological interpretation of CFC.
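The product of sines in this example can be expanded with the standard product-to-sum identity, which makes its spectral content explicit:

```latex
\sin(2\pi\cdot 40\,t)\,\sin(2\pi\cdot 1\,t)
  = \tfrac{1}{2}\bigl[\cos(2\pi\cdot 39\,t) - \cos(2\pi\cdot 41\,t)\bigr]
```

The signal therefore contains only 39-Hz and 41-Hz components, with no power at 1 Hz or at 40 Hz. A phase-amplitude analysis of it depends on a band-pass around 40 Hz wide enough to contain both sidebands, and on a slow component being available from elsewhere.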

      Their other examples are also not compelling as identifying any confounds with CFC measures. Prime numbers are nonrandom sequences with a periodic structure (http://xxx.lanl.gov/pdf/cond-mat/0303110v4.pdf) and anyway, true random sequences can appear non-random at small N. A more serious concern is that the authors are interpreting CFC in random data or in ECoG data with non-linearity introduced (Figure S6) without performing any statistics to justify the interpretation of CFC. Analogously, a t-statistic on random numbers is unlikely to be exactly 0; it is only through evaluation of that t-statistic with respect to a null hypothesis distribution that a t-value of, say, 1.5 can be interpreted.
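Evaluating a statistic against a null-hypothesis distribution, as described in this paragraph, can be sketched with a label-shuffling permutation test. This is a generic illustration, not the surrogate procedure of any particular CFC toolbox; the group sizes, number of permutations and seeds are assumptions.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic; assumes both groups have non-zero variance."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value from a null distribution built by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

rng = random.Random(1)
group_a = [rng.gauss(0.0, 1.0) for _ in range(10)]
group_b = [rng.gauss(0.0, 1.0) for _ in range(10)]  # same distribution as group_a
shifted_b = [x + 5.0 for x in group_b]               # clearly different mean

p_same = permutation_p_value(group_a, group_b)
p_shifted = permutation_p_value(group_a, shifted_b)
```

The same recipe applies to a CFC index: compute it on the data, recompute it on surrogates that destroy the relationship of interest, and interpret the observed value only relative to that surrogate distribution.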

      Another issue identified by the authors is the potential confound of co-occurring but independent low-frequency phase and high-frequency power dynamics. This is a potential confound (discussed in Cohen, 2014, Analyzing Neural Time Series Data; figure 30.7) but is fairly easy to identify and address (including: avoiding interpreting CFC from immediate post-stimulus periods, removing the phase-locked time-domain signal before computing CFC, and inspecting whether the time-course of CFC differs from the time-course of phase clustering). Perhaps the authors have additional suggestions?

      Later, they write (pages 9-10 and figure 4) "If a brain area under a recording electrode receives time-varying input from any other brain area, this input might generate similar dependencies across frequency components." This does not seem to be a confound, but rather, a description of CFC: low-frequency oscillations from a distal brain region modulate local activity, as manifest in higher frequency oscillations. Perhaps if the authors would identify a mechanism/consequence of CFC for neural activity it would be easier to understand whether/how this is a confound.

      On page 6, the authors write “The main conclusion is – not that surprisingly - that a clear peak in the power spectrum of the low frequency component is a prerequisite for a meaningful interpretation of any CFC pattern.” The justification does not follow. If one is interested in *phase* dynamics, why does there need to be a peak in *power*? Assuming that phase reflects the timing of neural populations while power reflects their spatial coherence at the LFP level, why is spatial coherence considered a prerequisite for investigating timing? In real EEG data, power and phase dynamics are often independent of each other. A related discussion is potential differences in power across conditions. CFC methods generally measure the relationship between power and phase, not the magnitude of power. Appropriate permutation-based statistical corrections will account for differences in the magnitude of power (Cohen, 2014, chapter 30). The potential confound of low power for estimating phase (Muthukumaraswamy & Singh, 2011) applies only for very low SNR; in real EEG data, power and phase dynamics are often easily disambiguated and unrelated to each other.
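A permutation-based correction of this kind might be sketched as follows. The sketch assumes a mean-vector-length coupling measure and a trial-shuffling null (phase from one trial paired with power from another); these are illustrative assumptions, not necessarily the procedure in Cohen (2014), and the phase and power series are simulated directly rather than extracted from EEG.

```python
import cmath
import math
import random
import statistics


def mean_vector_length(phases, powers):
    """Length of the power-weighted mean phase vector."""
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, powers))
    return abs(total) / len(phases)


def pooled_mvl(phase_trials, power_trials):
    """Coupling measure computed over all trials concatenated."""
    phases = [p for trial in phase_trials for p in trial]
    powers = [a for trial in power_trials for a in trial]
    return mean_vector_length(phases, powers)


def pac_z_score(phase_trials, power_trials, n_perm=100, seed=0):
    """z-score of observed coupling against a trial-shuffled null."""
    rng = random.Random(seed)
    observed = pooled_mvl(phase_trials, power_trials)
    null = []
    for _ in range(n_perm):
        shuffled = power_trials[:]
        rng.shuffle(shuffled)  # breaks the phase-power pairing across trials
        null.append(pooled_mvl(phase_trials, shuffled))
    return (observed - statistics.mean(null)) / statistics.stdev(null)


def simulate(coupled, n_trials=30, n_samples=100, seed=1):
    """Hypothetical data: a slow phase series with a random start per trial."""
    rng = random.Random(seed)
    phase_trials, power_trials = [], []
    for _ in range(n_trials):
        start = rng.uniform(0.0, 2.0 * math.pi)
        phase = [start + 2.0 * math.pi * t / 50 for t in range(n_samples)]
        if coupled:
            power = [1.0 + 0.5 * math.cos(p) + rng.gauss(0.0, 0.2) for p in phase]
        else:
            power = [1.0 + rng.gauss(0.0, 0.2) for _ in phase]
        phase_trials.append(phase)
        power_trials.append(power)
    return phase_trials, power_trials


z_coupled = pac_z_score(*simulate(coupled=True))
z_uncoupled = pac_z_score(*simulate(coupled=False))
print(f"z (coupled) = {z_coupled:.2f}, z (uncoupled) = {z_uncoupled:.2f}")
```

Because the null distribution is built from the same power values, an overall difference in power magnitude shifts the observed value and the null together, which is why the z-score is comparable across conditions.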

      Table 1 should include citations of the papers surveyed; otherwise independent verification is not possible.

    1. Authors in Head et al. [30] described that "code gets horrible looking" as macros are added to it to specify augmentations.

      sentence that describes the obstacles that the proposed system is designed to help the intended user get around to reach their goals

    1. Frequently, code-generation systems focus on building and then refining a full working application, using visibility of the full underlying code as a fallback when users need to build understanding of the generated program.

      sentence that describes the obstacles that the proposed system is designed to help the intended user get around to reach their goals

    2. Ply uses a server program written in TypeScript to make code generation requests to a large language model and to execute the resulting code, which passes messages to and from sensors and actuators.

      sentence that describes the characteristics that define the proposed system

    3. Code generation offered by large language models can serve to author this glue code for trigger-action programs, allowing for data from triggers to be mapped to input data for actions automatically even when their native data formats or intended functionality do not match exactly.

      sentence that describes the conditions for which the system is designed

    4. It encourages program decomposition into "layer" abstractions, it automatically creates visualizations of event payloads at layer boundaries to help users understand layer behavior without having to read the underlying generated code, and it constructs ad hoc parametrization interfaces that allow users to configure important dimensions of the behavior of each layer without having to re-author it.

      sentence that describes the characteristics that define the proposed system

    5. However, such LLM-authored code, especially when implementing nontrivial logic, can be difficult to specify, understand or debug. Users need appropriate tools and handles to understand and make changes to the computation that is being performed in such code.

      sentence that describes the obstacles that the proposed system is designed to help the intended user get around to reach their goals

    6. Trigger-action programming has been a success in end-user programming. Traditionally, the simplicity of links between triggers and actions limits the expressivity of such systems. LLM-based code generation promises to enable users to specify more complex behavior in natural language. However, users need appropriate ways to understand and control this added expressive power.

      sentence that describes the conditions for which the system is designed

    1. Abstract: Herbarium collections are a vast but underutilized resource for ancient DNA research, containing over 400 million specimens with detailed metadata and spanning centuries of global biodiversity. Understanding patterns of DNA preservation in natural collections is crucial for optimizing ancient DNA studies and informing future curation practices. We analysed genomic data for 573 herbarium specimens from six plant species from the genera Hordeum and Oryza collected from the Americas and Eurasia over 220 years. Using standardized laboratory protocols and shotgun sequencing, we quantified DNA degradation and elucidated factors that accelerate it. We find significant age-dependent DNA fragmentation rates, indicating temporal degradation processes not detected in prehistoric samples. In our analysis, DNA decay rates in herbarium specimens were almost eight times faster than in moa bones, reflecting fundamental differences in tissue composition and preservation environments. Environmental conditions at the time of specimen collection emerged as the major determinants of post-mortem damage rates, with the interaction term between temperature and genus being the dominant driver of cytosine deamination. We find no effect of sample storage on DNA damage and degradation. These findings provide insights into how climatic origin, preservation environment, taxonomic identity and age influence DNA preservation while highlighting opportunities for improving institutional preservation practices. Due to standardised preservation conditions, museum collections can provide better insights into DNA damage and degradation over time than archaeological and paleontological samples.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag026), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2:

      Reproducibility report for: Patterns of aDNA Damage Through Time and Environments - lessons from herbarium specimens
      Journal: GigaScience
      ID number/DOI: GIGA-D-25-00447
      Reviewer(s):
      Laura Caquelin, Department of Clinical Neuroscience, Karolinska Institutet, Sweden [Wrote the report and reproduced the results]
      Gustav Nilsonne, Department of Clinical Neuroscience, Karolinska Institutet, Sweden [Reviewed the final report]


      1. Summary of the study

      The authors evaluated DNA preservation in herbarium collections by analyzing genomic data from 573 specimens of Hordeum and Oryza. They quantified DNA degradation and identified factors affecting decay, finding that specimen age and environmental conditions strongly influence DNA preservation.

      2. Scope of reproducibility

      According to our assessment the primary objective is: the regression analyses of aDNA damage metrics for Hordeum and Oryza.

      • Outcome: "Four metrics were selected to quantify patterns of aDNA damage: (i) the proportion of endogenous DNA content, (ii) the fragment length distribution, (iii) the damage fraction per site (λ), and (iv) the frequencies of 5' C>T substitutions." (lines 197-199)
      • Analysis method outcome: "The four metrics were analysed in linear models as a function of collection year and sample age using the 'lm' function in R" (lines 199-200)

      • Main result: The results of this outcome are presented in figure 2 "Regression analyses of aDNA damage metrics for Hordeum and Oryza" and in the related text lines 302 to 361 in the "Regression analysis" section: "Endogenous fraction […] The regression analyses revealed no statistically significant relationship between the proportion of endogenous DNA and the sample collection year in Hordeum (R² = 0.003, p = 0.451, N = 211), but a very weak yet significant relationship was observed in Oryza (R² = 0.04, p = 0.00167, N = 245; figure 2a).

      Fragment length […] We observed a statistically significant relationship between the log-mean fragment length and the sample collection year for both genera (figure 2b), with a stronger relationship for Hordeum (R² = 0.27, p = 5.33 × 10⁻¹⁶, N = 211) than Oryza (R² = 0.112, p = 8.58 × 10⁻⁸, N = 245).

      Damage fraction per site (λ) and DNA decay rate (k) […] We estimated the DNA decay rate per year (k) for Hordeum and Oryza from the slope of the linear relationship between λ and sample age (figure 2c). We observed a per nucleotide decay rate of k = 2.64 × 10⁻⁴ per year for Hordeum (R² = 0.208, p = 3.27 × 10⁻¹², N = 211), which was 1.5 times faster than the decay rate of Oryza of k = 1.79 × 10⁻⁴ per year (R² = 0.101, p = 3.65 × 10⁻⁷, N = 245) […].

      Nucleotide misincorporations […] (figure 2d), with Oryza starting from a higher baseline of damage when compared to Hordeum and displaying a stronger relationship (R² = 0.303, p = 8.62 × 10⁻²¹, N = 245 for Oryza, and R² = 0.207, p = 3.63 × 10⁻¹², N = 211 for Hordeum, respectively). […]"
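As a rough illustration of how a decay rate k can be read off as the slope of λ against sample age, the sketch below fits an ordinary least-squares line to simulated data. It is not the authors' R code (they used `lm`); the function name `ols`, the value `TRUE_K`, the baseline, and the noise level are all hypothetical.

```python
import random


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical data: damage fraction per site rising linearly with age.
TRUE_K = 2.5e-4  # assumed per-year decay rate, for illustration only
rng = random.Random(0)
ages = [rng.uniform(10.0, 220.0) for _ in range(200)]
damage = [0.01 + TRUE_K * age + rng.gauss(0.0, 0.005) for age in ages]

k_hat, intercept, r_squared = ols(ages, damage)
print(f"estimated k = {k_hat:.2e} per year, R^2 = {r_squared:.3f}")
```

The fitted slope plays the role of k, and R² summarises how much of the variation in λ the age term explains.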


      3. Availability of Materials

      a. Data
      - Data availability: Raw data are not yet publicly available but have been uploaded to the NCBI database. Processed data are shared via the private journal dropbox, and the intermediate file is available on the GitHub repository.
      - Data completeness: Complete processed data and intermediate file (all data necessary to reproduce main results are available).
      - Access Method: Private journal dropbox and GitHub repository
      - Repository: https://github.com/Stefano-Porrelli/Herbaria_aDNA_Damage
      - Data quality: Structured

      b. Code
      - Code availability: Open
      - Programming Language(s): R and Bash
      - Repository link: https://github.com/Stefano-Porrelli/Herbaria_aDNA_Damage
      - License: MIT license
      - Repository status: Public
      - Documentation: Clear Readme file. Additional details may be required to run the Bash code.

      4. Computational environment of reproduction analysis

      - Operating system for reproduction: MacOS 15.7.2
      - Programming Language(s): R
      - Code implementation approach: Using shared code
      - Version environment for reproduction: R version 4.5.1/RStudio 2025.05.1

      5. Results

      5.1 Original study results - Results 1: See screenshot figure 2:

      5.2 Steps for reproduction

      -> Run 01_Plant_aDNA_screening_prep.sh
      - Issue 1: The reviewer link provided for the bioprojects on NCBI did not allow downloading.
      -- Partial resolution: An email was sent to the authors requesting access to the raw data, or the sharing of processed data and intermediate files. Processed data were shared via the private journal dropbox, and the intermediate file (aDNA_damage_screening_MAIN.txt) was shared both on the dropbox and in the GitHub repository.

      The authors contacted NCBI to enable downloading the raw data with the reviewer link, but no response has yet been received. As the review needed to be performed within a set timeframe, the computational reproducibility review was performed first using the processed data and then directly with the intermediate file.

      Note: The two bash scripts were not run. Additional guidelines would be helpful for running these scripts, especially regarding terminal commands and manual steps (changing the repository name or the link to the data for example).

      -> Run the analysis from the processed data shared
      --> Run code aDNA_Dmg_Script00_collate_screening_results.r
      - Issue 2: The code expects data organized in two sub-folders: 4_mapping and 5_aDNA_characteristics. Processed data were received in several species-specific folders, each containing 4_mapping and 5_aDNA_characteristics.
      -- Resolved: All data were merged manually into single 4_mapping and 5_aDNA_characteristics folders to match the script's requirements. This detail should be added to the readme file.
      - Issue 3: The sample_metadata.txt file was not correctly merged with the results dataframe. Many columns (Batch_no to X) in aDNA_damage_screening_MAIN.txt contained NA values.
      -- Resolved: A message was sent to the authors to resolve the issue. They updated both sample_metadata.txt and aDNA_damage_screening_MAIN.txt on GitHub. Author's response: "I have realised the problem stems from inconsistencies between sample naming conventions in the screening output directories and the sample identifiers in the metadata file. Specifically, for the Hordeum samples, the directories are named using library IDs rather than the short sample names, and some of the Oryza samples were missing their expected suffixes. This meant the left_join step failed to match metadata for those samples. Thank you for flagging this up. I have now corrected this by updating the "Sample" column in the metadata file to reflect the actual directory names used in the screening outputs. The original short names are preserved in a "Sample_ID" column. I have uploaded the corrected sample_metadata.txt file to the GitHub repository, and also updated the aDNA_damage_screening_MAIN.txt dataset on the GitHub repo to reflect these changes. I have re-run the pipeline and it now works correctly. Please let me know if you encounter any further issues, and thank you again for catching this."

      The reproduced aDNA_damage_screening_MAIN.txt file no longer contains NA values.
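The merge problem in Issue 3 is the kind that a key check before or during the join can surface early. The sketch below is a Python analogue of a left join that also reports unmatched keys; it is not the authors' R code, and the sample names, column names, and the helper `left_join_with_report` are hypothetical.

```python
def left_join_with_report(results, metadata, key="Sample"):
    """Left-join metadata onto results and list the keys that found no match."""
    index = {row[key]: row for row in metadata}
    meta_columns = sorted({c for row in metadata for c in row if c != key})
    joined, unmatched = [], []
    for row in results:
        meta = index.get(row[key])
        if meta is None:
            unmatched.append(row[key])
            meta = {}
        merged = dict(row)
        for column in meta_columns:
            merged[column] = meta.get(column)  # None plays the role of NA
        joined.append(merged)
    return joined, unmatched


# Hypothetical rows: one result is keyed by a library ID, while the
# metadata uses a short sample name, so the join silently yields NA.
results = [
    {"Sample": "LIB_0042", "endogenous": 0.31},
    {"Sample": "Oryza_07_A", "endogenous": 0.12},
]
metadata = [
    {"Sample": "Hv_042", "Collection_year": 1890},
    {"Sample": "Oryza_07_A", "Collection_year": 1935},
]

joined, unmatched = left_join_with_report(results, metadata)
print("unmatched keys:", unmatched)
```

Failing loudly when `unmatched` is non-empty would turn a silent column of NA values into an immediate, diagnosable error.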

      --> Run code aDNA_Dmg_Script02_Regressions.r: The script was run without any issues.

      -> Run the analysis from the intermediate data file shared on GitHub
      --> Run code aDNA_Dmg_Script02_Regressions.r: Run the code after renaming the file to aDNA_damage_screening_MAIN_shared.txt.

      5.3 Statistical comparison Original vs Reproduced results
      - Reproduced results:
      -- Using the processed data and the reproduced aDNA_damage_screening_MAIN.txt, the results of Figure 2 were successfully reproduced (see screenshots below).
      -- Using the shared aDNA_damage_screening_MAIN.txt from GitHub, the results were also successfully reproduced (see screenshots below).

      • Comments: Supplementary Figure 1 was also reproduced using the same code. We confirmed that the reproduced values match the original results. Both the processed data and the intermediate data file reproduced Supplementary Figure 1 (see screenshots below).
      • Errors detected: One reporting error was detected in the "Fragment length" section (line 336): the p-value for Oryza should be 8.47 × 10⁻⁸, not 8.58 × 10⁻⁸ as reported in the text.
      • Statistical Consistency: All statistical results reproduced from both the processed data and the intermediate file are identical to those reported in the manuscript (see Comparison_reproduced_vs_original.csv and Comparison_two_reproductions.csv files attached with this report).
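A comparison of reported and reproduced statistics can be automated with a relative tolerance. The sketch below is not the reviewers' script; the helper `compare` and the tolerance are assumptions, and the two values used are the fragment-length p-values quoted in this report at their printed precision.

```python
import math


def compare(reported, reproduced, rel_tol=5e-3):
    """Return (name, reported, reproduced, matches) for each statistic."""
    rows = []
    for name, value in reported.items():
        got = reproduced[name]
        rows.append((name, value, got, math.isclose(value, got, rel_tol=rel_tol)))
    return rows


# Values as printed in the manuscript and in this report.
reported = {
    "Hordeum fragment length p": 5.33e-16,
    "Oryza fragment length p": 8.58e-8,
}
reproduced = {
    "Hordeum fragment length p": 5.33e-16,
    "Oryza fragment length p": 8.47e-8,
}

rows = compare(reported, reproduced)
mismatches = [name for name, _, _, ok in rows if not ok]
print("mismatches:", mismatches)
```

A relative rather than absolute tolerance matters here because the p-values span many orders of magnitude.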

      6. Conclusion

      - Summary of the computational reproducibility review: The computational reproducibility review shows that the results in Figure 2 and related text of the original study were fully reproducible using both the processed data and the intermediate data file shared (aDNA_damage_screening_MAIN.txt). The statistical results reproduced are identical to those presented in the manuscript. One minor reporting error was detected in the manuscript: the p-value for Oryza in the "Fragment length" section should be 8.47 × 10⁻⁸ instead of 8.58 × 10⁻⁸.

      - Recommendations for authors:
      -- Provide clearer instructions for running the Bash scripts, including terminal commands and any manual steps.
      -- Ensure consistent sample naming across metadata files and data directories to avoid merging issues for all analysis/scripts.
      -- Consider making raw data publicly available or provide clear guidance for reviewers to access it.
      -- Maintain clear documentation of file structure to facilitate future reproducibility.

    1. The AI Scientist-v2 eliminates the reliance on human-authored code templates

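      The most important leap from v1 to v2 is removing the dependence on human-written templates. v1 still needed humans to supply an initial code scaffold; v2 generates code and designs experiments autonomously, from scratch. The deeper significance of this difference: v1 is "AI completing a task designed by humans", whereas v2 is "AI designing the task itself and then completing it". Once this line is crossed, AI's role in scientific research shifts from tool to researcher.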

    1. an agent does not care about the structure, unless you specifically ask it to. But even in this case you have to review the changes.

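      [Insight] "AI does not naturally care about structure unless you explicitly ask it to": this finding defines the least replaceable responsibility of human engineers in the AI era, which is to act as the "gatekeeper" of code structure. It echoes the insight from the Every article that "everyone is a manager": human work shifts from "executing code" to "reviewing code quality and setting standards for the AI". The implication for engineering team culture: code review is becoming more important, not less, because the volume of code that needs review is now ten times what it used to be.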

    2. When you give a task to your agent, make sure you also explain how the code should be organized. Not only value, but also structure.

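      [Insight] This practical advice exposes a widely overlooked blind spot in prompting: when most people hand a programming task to an AI, they describe only "what to do" and never "how to organize it". That is like telling a new hire to "implement this feature" without ever telling them "what our coding conventions are". For everyone who does vibe coding, this advice should become part of the standard operating procedure: proactively include structural constraints in every task prompt.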

    3. Robert Martin in Clean Architecture talks about code as having two properties: value (it works, it's fast, etc.) and structure (how code is organised).

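      [Insight] Bringing Robert Martin's "value vs. structure" dichotomy into the age of AI agents is a very clever piece of theoretical grafting. AI naturally cares only about "value" (the code runs, the task gets done) and tends to neglect "structure" (whether the code is clean and maintainable). This means that in AI-driven development workflows, "guarding structure" must become a core responsibility of human engineers: it is work that AI will not do on its own initiative, and it is therefore where humans' irreplaceable value lies.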

    4. poorly organized code means agents need to read, "understand", and make changes to more files than necessary - polluting their context and costing you tokens.

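      [Insight] Technical debt has gone from "slowly eroding maintainability" to "immediately hurting your bill". This is an entirely new dimension for quantifying technical debt: it no longer has to be measured only in "future engineering hours" but can be calculated in real time as "token overspend on every AI call". That gives a new, financially quantifiable argument for "persuading management to take code quality seriously": bad code is not just a technical problem; it incurs extra cost directly every time an AI carries out a task.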

    5. their productivity is affected by the state of the codebase.

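      [Insight] The deeper significance of this sentence is that it places AI coding agents and human developers on the same evaluative footing. The question is not "can AI replace people" but "is AI affected by code quality in the same way humans are". The answer is yes, which means the code-quality practices software engineers have built up over decades are not invalidated by the arrival of AI; they become more important precisely because of it. Technical debt has gone from "slowly affecting people" to "immediately affecting the AI's token consumption".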

    1. *

      Yicong Wang:

      Coding was done with the help of: ChatGPT, and Visual Studio Code.

      Digital Platform was Supported By: Hypothes.is, and GitHub.

      Bibliography: Handian (The Chinese Dictionary). https://www.zdic.net.

      Hanyu Da Cidian (The Dictionary of Classical Chinese). https://homeinmists.ilotus.org/hd/hydcd.php. Accessed 9 Apr. 2026.

      Luo, Hui. “Mastering a Minor Tradition: Pu Songling and the Chinese Ghost Tale.” A Companion to World Literature, 19 Dec. 2019.

      Pu, Songling. Strange Tales from Liaozhai. Translated by Sidney L. Sondergard, Jain Publishing, 2008.

      Quan jiao hui zhu ji ping Liaozhai zhiyi (Strange Tales from the Studio of Leisure and The Anthology of Commentaries). Edited by Ren Duxing, 4 vols., People's Literature Publishing House, 2016.

      Werner, Sarah. Studying Early Printed Books, 1450–1800: A Practical Guide. Wiley, 2019.


    1. The other frequently cited use of Generative AI in research tasks was its use to generate programming code. Respondents cited using Generative AI to help with R programming language. Again, this was framed as time-saving: It speeds up programming (proposes new lines, which saves time on typing, can write a piece of generic code from a simple prompt). It can also sometimes solve errors. I have also used ChatGPT to help me write code (R and JavaScript). e.g. I wanted to create a nice-looking output table for some descriptive statistics and I didn’t know how to do this. Simply reducing the time it takes to type out code is a benefit, particularly if the prompt is short. In producing visually pleasing outputs that may be difficult or time-consuming to realise, Generative AI was felt to lessen learning time and speed up outputs. This further extended to Generative AI fixing code: [Fixing] my coding errors in a way that maximises amount of time that I can spend on my research rather than trying to fix coding bugs.

      Using LLMs to generate code may be useful for simple tasks, but for anything complex, the time needed to get error-free output from the AI rises sharply. Fixing those errors requires enough coding knowledge to write the code oneself and to know where an error might be. At that point, it would probably be faster to write the code directly than to prompt an AI for something that could well contain one, two, three or more errors. The time lost fixing those errors probably outweighs the time gained by asking an AI to write the code. Additionally, an LLM could choose an inefficient method that works in the short term but causes problems in the long term, which one might not notice until it is too late.

    2. The use of Generative AI in Higher Education has become an urgent issue since the launch of the freely available Open AI ChatGPT chatbot in November 2022. Generative AI are ‘deep-learning’ programmes that can generate human-like responses to user prompts (Lim et al., 2023). This includes the generation of images, computer code, or text. Generative AI is a significant advance on Conversational AI models. Rather than relying on pre-programmed responses to queries, Generative AI draws on information databases to create seemingly ‘new’ answers. While not the first Large Language Model (LLM), the arrival of ChatGPT-3 accelerated the debate around Generative AI and its potential for good or ill in education, as it provided a free service that was more accessible than most of its rivals.

      Large Language Models, unlike something like a calculator or a translation system, work less as a tool and more as something that will do all the work for the user if asked to. It is important to note that while these LLMs could be used as tools to help someone understand the gist of something, or get the ball rolling, when the option to simply have the LLM do the work for the user is there, many choose the easier route.

    1. For each respondent, we create a variable—proportion peaceful protest—indicating the share of BLM protest events defined as peaceful within 25 miles of their zip code between May 25, 2020, and our survey fielding.

      Again controlling, people closer to protests are more likely to do so
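One way such a variable could be computed is sketched below. It assumes each respondent's zip code is represented by a latitude/longitude centroid and that distance is great-circle distance; the coordinates, events, survey date, and function names are hypothetical and are not taken from the paper.

```python
import math
from datetime import date

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def proportion_peaceful(respondent, events, start, end, radius_miles=25.0):
    """Share of nearby events in the window that were coded as peaceful."""
    nearby = [
        e for e in events
        if start <= e["date"] <= end
        and haversine_miles(respondent["lat"], respondent["lon"], e["lat"], e["lon"]) <= radius_miles
    ]
    if not nearby:
        return None  # no events within range: the share is undefined
    return sum(e["peaceful"] for e in nearby) / len(nearby)


# Hypothetical respondent (zip-code centroid) and protest events.
respondent = {"lat": 40.0, "lon": -75.0}
events = [
    {"lat": 40.1, "lon": -75.0, "date": date(2020, 6, 1), "peaceful": True},
    {"lat": 40.2, "lon": -75.1, "date": date(2020, 6, 3), "peaceful": False},
    {"lat": 41.0, "lon": -75.0, "date": date(2020, 6, 5), "peaceful": True},   # too far
    {"lat": 40.0, "lon": -75.1, "date": date(2020, 5, 1), "peaceful": True},   # too early
]
share = proportion_peaceful(respondent, events, date(2020, 5, 25), date(2020, 7, 15))
print(share)
```

Returning `None` when no events fall in range keeps respondents with no nearby protests distinct from those whose nearby protests were all non-peaceful.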

    1. All experiments were carried out using Python, and the source code is available at https://github.com/kwkwon13/a-posteriori-conv-diff-siac.

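      A pure mathematics paper posted on arXiv provides its complete Python source code. This is still uncommon in numerical analysis, but it is becoming a trend. The scale of the experiments is impressive: uniform N×N grids (N up to 128), five different viscosity coefficients, and two polynomial degrees, amounting to a full parameter sweep in two spatial dimensions. Reproducibility is not only an issue for the AI field; mathematics papers deserve the same standard of transparency.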

    1. In the last year, we moved from manually editing files to working with agents that write most of our code.

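      Surprisingly, in just one year Cursor has gone from manually editing files to having agents write most of its code. This shows the astonishing pace at which AI coding assistants are developing and suggests that software development is undergoing an unprecedented paradigm shift.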

    1. Abstract: Integrating single-cell omics data at an atlas scale enhances our understanding of cell types and disease mechanisms. However, the integration of data processed by different normalisation methods can lead to biases, such as unexpected batch effects and gene expression distortion, leading to misinterpretations in downstream analysis. To address these challenges, we present scDenorm, an algorithm that reverts normalised single-cell omics data to raw counts, preserving the integrity of the original measurements and ensuring consistent data processing during integration. We evaluated scDenorm’s performance on large-scale datasets and benchmarked its impact on data integration and downstream analysis across three datasets.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag032), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 3:

      Reproducibility report for: scDenorm: a denormalisation tool for integrating single-cell transcriptomics data
      Journal: GigaScience
      ID number/DOI: GIGA-D-25-00209
      Reviewer(s): Laura Caquelin, Department of Clinical Neuroscience, Karolinska Institutet, Sweden


      1. Context

      This report corresponds to a second assessment of the computational reproducibility of the article GIGA-D-25-00209, following a revision by the authors after the first round of review.

      The scope of the computational reproducibility review is to reproduce the results in figure 5f related to the evaluation of whether scDenorm improves the biological relevance of gene expression analyses by comparing GO term enrichment from differentially expressed genes (DEGs), before and after denormalization against a gold standard.


      2. Changes since the first review

      The authors made several changes based on comments from the initial computational reproducibility review:
      - Reorganized and updated the code in Fig5.ipynb and R_goanalysis.ipynb,
      - Created a Docker environment,
      - Provided pre-computed GO enrichment results and intermediate files in Zenodo,
      - Added an environment.yaml file for Python and an installed_packages.csv file for R,
      - Improved the Readme file.


      3. Availability of Materials

      a. Data
      - Data availability: Open
      - Data completeness: Complete = all data necessary to reproduce main results are available
      - Access Method: Repository
      - Repository: https://zenodo.org/records/17275776 (new link)
      - Data quality: Completed, no metadata was shared.

      b. Code
      - Code availability: Open
      - Programming Language(s): R and Python
      - Repository link: https://github.com/rnacentre/scDenorm_reproducibility
      - License: -
      - Repository status: Public
      - Documentation: A Readme file is provided, but some improvements are needed.


      4. Computational environment of reproduction analysis

      - Operating system for reproduction: MacOS 15.6.1
      - Programming Language(s): R (jupyter notebook), Python (jupyter notebook)
      - Code implementation approach: Using shared code
      - Version environment for reproduction: Docker version 28.5.1, R version 4.5.1 (2025-06-13), Python 3.13.9

      5. Results

      5.1 Original study results
      - Results 1: In revised version 1 of the paper, Figure 5 does not appear in the PDF. We therefore assumed that the figure is identical to the one in the original submission, based in particular on the authors' comment stating that "We re-ran the analysis and obtained results consistent with those reported in the manuscript." Below is Figure 5f from the original paper:

      (See screenshot)

      The intermediate file "PBMC_go_analysis_result.csv" shared in Zenodo was used to run the authors' code and extract the numerical values of this graph, enabling direct comparison:

      (See screenshot)

      5.2 Steps for reproduction

      -> Follow the readme guidelines to set up the environment:
      --> Download the notebooks from GitHub. Note: the notebook list in the readme is not up to date.
      --> Install Docker and Jupyter. Note: the Jupyter installation is not specified in the readme file.
      --> Download data.
      --- Issue 1: No link to download the data was provided in the readme file in the GitHub repository. The Zenodo link in the manuscript was not updated in the "Availability of Data and Materials" section.
      ---- Resolved: The new link was provided in the authors' response to the reviewer but needs to be added to the manuscript and the readme file. The link is https://zenodo.org/records/17275776.
      --- Issue 2: Guidelines in the README file do not correspond to the actual procedure.
      ---- Resolved: From the Zenodo archive, download scDenorm_reproducibility.tar.gz, unzip it, and place the data into the data folder. It would be clearer if the authors explicitly specified which files should be placed in the data directory to avoid confusion.
      --> Run the docker image.
      --- Issue 3: The following Docker instructions provided by the authors do not work as written:

          tar -xzf scdenorm_v0.tar.gz
          docker load -i scdenorm_v0.tar
          docker run -p 8888:8888 -v /path/to/scDenorm_reproducibility:/app scdenorm_v0 \
            jupyter lab --ip=0.0.0.0 --no-browser --allow-root

      scdenorm_v0.tar.gz does not contain a standard Docker .tar image. After extraction, the result is a directory named scdenorm_v0, not a .tar file. docker load -i scdenorm_v0.tar fails because scdenorm_v0.tar does not exist. Docker must be running before executing docker load. The extraction step is sensitive to the current directory, but this is not documented.
      ---- Resolved: The image can be successfully loaded directly from the .tar.gz file using:

          docker load < scdenorm_v0.tar.gz

      After this, the image scdenorm_v0:latest is available.

      --- Issue 4: Two main issues appeared when running the docker run command:
      ----- "WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)"
      ----- "mounts denied: The path /path/to/scDenorm_reproducibility is not shared from the host".
      ---- Resolved: To be able to use the docker run command, two steps were needed:
      ----- Share the project folder with docker manually: Docker → Preferences → Resources → File Sharing → add the local project path
      ----- Update the docker run command with the local path and add linux/amd64:

          docker run --platform linux/amd64 \
            -p 8888:8888 \
            -v /path/to/scDenorm_reproducibility:/app \
            scdenorm_v0 \
            jupyter lab --ip=0.0.0.0 --no-browser --allow-root

      --- Issue 5: R was not connected to Jupyter.
      ---- Resolved: In the terminal, this made the R kernel available:

          R
          install.packages("IRkernel")
          IRkernel::installspec()

      -> Run the Fig5_R__goanalysis.ipynb script
      --- Issue 6: Docker image does not install the R packages. The file installed_packages.csv lists all required R packages, but they are not installed automatically.
      ---- Resolved: A solution was to install all required packages at the start of the notebook using the csv file:

          pkg_list <- read.csv("installed_packages.csv", stringsAsFactors = FALSE)

          for (pkg in pkg_list$Package) {
            if (!requireNamespace(pkg, quietly = TRUE)) {
              message(" Installing the package: ", pkg)
              tryCatch(
                { install.packages(pkg, dependencies = TRUE) },
                error = function(e) { message("Failed to install package: ", pkg) }
              )
            } else {
              message(" Already installed: ", pkg)
            }
          }

      Additional required packages from Bioconductor:

          if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
          # requireNamespace() checks one package at a time, so test each package separately
          for (pkg in c("enrichplot", "clusterProfiler", "org.Hs.eg.db")) {
            if (!requireNamespace(pkg, quietly = TRUE)) BiocManager::install(pkg, ask = FALSE)
          }

      After these steps, the R script ran without errors.

      -> Run the Fig5.ipynb script
      --- Issue 7: The same issue as no. 3 occurred again: the Docker image did not provide a working Python environment. An attempt was made to create the Python environment from the environment.yaml file:

          conda env create -f environment.yaml

      This failed because many packages do not exist for the system, for example: "ipyw_jlab_nb_ext_conf ==0.1.0 py39h06a4308_1 does not exist (perhaps a typo or a missing channel);" These errors seem to happen because the environment file contains many Linux-specific packages.
      ---- Unresolved: Authors should provide an environment file that works on all systems. A temporary solution was used: create a minimal clean environment:

          conda env create -f environment.yaml

      Environment.yaml:

          name: scdenorm_clean
          channels:
            - conda-forge
            - bioconda
            - defaults
          dependencies:
            - python=3.9
            - numpy
            - pandas
            - scipy
            - matplotlib
            - seaborn
            - tqdm
            - scanpy
            - anndata
            - tables
            - pip
            - pip:
                - scdenorm
                - SCCAF

      Then:

          conda activate scdenorm_clean
          conda install ipykernel
          python -m ipykernel install --user --name=scdenorm_clean --display-name "Python (scdenorm)"

      Select this kernel in Jupyter Notebook to run the python files.

      An additional issue was a conflict between matplotlib and scanpy. Resolved with:

          conda install matplotlib=3.6.3
          conda install -c conda-forge scanpy

      (Successfully installed scanpy-1.10.3)

      --> The script was executed starting from the HSPC section only.
      --- Issue 8: After filtering the dataframe tmp1 by go_terms, only two cell types remained (b0 and b1); b1n disappeared because no row corresponding to b1n matched the selected GO terms.
      ---- Unresolved: Fig5_R__goanalysis.ipynb was re-run multiple times to obtain a new version of PBMC_go_analysis_result.csv. However, the error persists.

      5.3 Statistical comparison Original vs Reproduced results - Reproduced results: Figure 5f

      (see screenshots)

      • Comments: The figure obtained does not show all go_terms nor all categories. Only categories b1 and b0 are shown.
      • Errors detected: -
      • Statistical Consistency: If there is no error, b0 would correspond to the gold standard and b1 to the before_scDenorm cell type. The -log10(adjusted p-value) values reproduced do not match the reported values.

      1. Conclusion
      2. Follow-up on previous recommendations: In the first round of review, we noted the following points:
      -- Add a requirement file that lists all the needed packages with their exact versions. The authors provided an installed_packages.csv, which allowed us to manually reconstruct the R environment. However, a functional environment.yaml is still required.
      -- Make sure all data files needed to reproduce the figures are available in the repository. The authors updated the Zenodo link and uploaded all relevant intermediate files.
      -- Clearly explain which parts of the results may vary due to randomness in the model and how much variation users should expect. This point remains insufficiently addressed.

      3. Summary of the second computational reproducibility review

      Both scripts used to reproduce Figure 5f were executed, but several issues were encountered. The results obtained differ from those reported in the manuscript. In particular:
      -- Several p-values could not be reproduced.
      -- Some discrepancies appeared in the GO enrichment analysis. The authors should clarify why some cell types are absent after filtering.

      Significant manual intervention was required. To improve reproducibility, we make the following new recommendations:
      -- Improve the readme file. The readme does not reflect the actual procedure needed to reproduce the results (incorrect Docker instructions, missing steps, outdated notebook list). Clear instructions should be added regarding:
      --- the required Jupyter installation,
      --- file paths and folder structure,
      --- the Zenodo link,
      --- how to run each notebook.
      -- Provide a functional environment.yaml. The provided Docker image fails to create the required Python and R environments.

    1. We also replaced Node.js APIs that Next.js had provided polyfills for (Buffer, url.parse, and others) with browser-native alternatives

      This is still the wrong way to do it. The right way to do it is to insulate your program from platform APIs entirely. It has the nice side benefit of making the code better, too.

    1. Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      Approximately 40% of the theoretical TFBS genotype space remains uncharacterized after quality filtering. The authors now discuss this limitation more explicitly and provide analyses suggesting that undersampling does not strongly bias their conclusions at the landscape level. Nevertheless, predictive modeling approaches could further extend these landscapes in future work.

      Simplified Regulatory Architecture:

      The study considers a minimal system consisting of a single TFBS upstream of a reporter gene. While this simplification allows clean interpretation and high-throughput measurement, natural promoters often involve combinatorial regulation and chromosomal context effects that may alter landscape topography.

      Lack of Experimental Evolution Validation:

      The evolutionary conclusions are based on simulations rather than direct experimental evolution. The authors provide a reasonable justification for this choice and frame their conclusions at the statistical level rather than for specific trajectories, but experimental validation would be a valuable future extension.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more complex biologically and more influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

    2. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of many (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBS, then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.
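The third point can be illustrated with a minimal sketch (not the authors' code). It scores sequences with a toy additive PWM and shows that pairwise epistasis, defined here as the deviation of the double mutant from the sum of the single-mutant effects, is zero by construction. The matrix values and the names `TOY_PWM`, `pwm_score`, and `pairwise_epistasis` are invented for illustration.

```python
# Toy position weight matrix: one dict of per-base scores per position.
# All numbers are made up for illustration; they are not RegulonDB values.
TOY_PWM = [
    {"A": 1.0, "C": 0.0, "G": -0.5, "T": 0.25},
    {"A": -1.0, "C": 0.5, "G": 2.0, "T": 0.0},
]


def pwm_score(seq, pwm):
    """Additive PWM score: the sum of independent per-position contributions."""
    return sum(pwm[i][base] for i, base in enumerate(seq))


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation."""
    return f_ab - f_a - f_b + f_wt


# Wild type, the two single mutants, and the double mutant.
wild_type, single_a, single_b, double = "AC", "GC", "AG", "GG"
epsilon = pairwise_epistasis(
    pwm_score(wild_type, TOY_PWM),
    pwm_score(single_a, TOY_PWM),
    pwm_score(single_b, TOY_PWM),
    pwm_score(double, TOY_PWM),
)
print(epsilon)
```

Because every position contributes independently, the same cancellation holds for any choice of matrix values and any pair of positions, which is why an additive model cannot represent a rugged landscape.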

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract and Introduction to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.
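For readers unfamiliar with the term, the sketch below shows one common way to test a pair of mutations for sign epistasis: the effect of a mutation changes sign depending on whether the other mutation is present. It is an illustrative assumption about the classification, not the procedure of the authors' Supplementary Methods 7.5, and the function names are hypothetical.

```python
def has_sign_epistasis(f_wt, f_a, f_b, f_ab):
    """Return True if mutation A or mutation B changes the sign of its effect
    depending on the genetic background (this includes reciprocal sign epistasis).

    f_wt, f_a, f_b, f_ab are the phenotypes of the wild type, the two single
    mutants, and the double mutant. Effects of exactly zero are not counted.
    """
    effect_a_in_wt = f_a - f_wt
    effect_a_in_b = f_ab - f_b
    effect_b_in_wt = f_b - f_wt
    effect_b_in_a = f_ab - f_a
    return (effect_a_in_wt * effect_a_in_b < 0) or (effect_b_in_wt * effect_b_in_a < 0)


def fraction_with_sign_epistasis(quadruples):
    """Fraction of (f_wt, f_a, f_b, f_ab) quadruples that show sign epistasis."""
    quadruples = list(quadruples)
    if not quadruples:
        raise ValueError("need at least one mutation pair")
    return sum(has_sign_epistasis(*q) for q in quadruples) / len(quadruples)
```

For example, `has_sign_epistasis(1.0, 2.0, 2.0, 1.5)` is true because each mutation is beneficial alone but deleterious in the presence of the other, whereas the additive case `(1.0, 2.0, 3.0, 4.0)` is not.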

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this raises new questions for future work—such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones, and how these differences shape evolutionary dynamics—which we now highlight in the Introduction and Discussion (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are at approximately 20bps twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
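The weighted average described above can be sketched as follows. This is a simplified illustration under the assumption that each sorting bin is summarized by a single representative fluorescence value (for instance its mean); the manuscript's exact normalization is not reproduced here, and `regulation_strength` is a hypothetical name.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Continuous regulation strength of one TFBS variant.

    read_counts[i] is the number of sequencing reads of the variant in bin i,
    and bin_fluorescence[i] is a representative fluorescence value of bin i
    (an assumption of this sketch).
    """
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("one fluorescence value is needed per bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("variant has no reads in any bin")
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads


# A variant with 10, 30, and 60 reads in three bins of increasing fluorescence.
print(regulation_strength([10, 30, 60], [100.0, 200.0, 400.0]))
```

Because the read fractions vary continuously, the resulting value is not restricted to the bin values themselves, even though the number of bins is finite.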

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addresses the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
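A permutation null model of this kind can be sketched as below. The sketch assumes that genotypes are equal-length DNA strings, that two genotypes are neighbors when they differ by one point mutation and both were measured, and that a peak is a genotype stronger than all of its measured neighbors. These definitions and all function names are illustrative assumptions, not the authors' implementation.

```python
import random

ALPHABET = "ACGT"


def measured_neighbors(genotype, landscape):
    """Yield the single-mutant neighbors of a genotype that are in the data."""
    for i, base in enumerate(genotype):
        for alt in ALPHABET:
            if alt != base:
                mutant = genotype[:i] + alt + genotype[i + 1:]
                if mutant in landscape:
                    yield mutant


def count_peaks(landscape):
    """Count genotypes whose value exceeds that of every measured neighbor.

    A genotype without measured neighbors counts as a peak in this sketch.
    """
    return sum(
        all(value > landscape[n] for n in measured_neighbors(g, landscape))
        for g, value in landscape.items()
    )


def permuted_peak_counts(landscape, n_permutations=100, seed=0):
    """Shuffle values across the same set of genotypes and recount peaks.

    The set of genotypes, and hence the network topology, stays fixed.
    """
    rng = random.Random(seed)
    genotypes = list(landscape)
    values = list(landscape.values())
    counts = []
    for _ in range(n_permutations):
        rng.shuffle(values)
        counts.append(count_peaks(dict(zip(genotypes, values))))
    return counts
```

Comparing `count_peaks(landscape)` with the distribution returned by `permuted_peak_counts(landscape)` indicates how close the empirical landscape is to a fully uncorrelated one on the same sampled network.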

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R=-0.1, -0.1, 0.01 for the CRP, Fis, and IHF landscapes, Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R=-0.05, -0.04, 0.06 for the CRP, Fis, and IHF landscapes). These weak correlations indicate that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).
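The relative connectivity defined above can be computed as in the following sketch. It assumes that every position of the genotype string is a variable position with four possible bases, which is a simplification; `relative_connectivity` is a hypothetical helper, not the authors' code.

```python
def relative_connectivity(genotype, landscape, alphabet="ACGT"):
    """Fraction of all possible single-mutant neighbors that were measured.

    landscape maps measured genotypes to their regulation strength.
    """
    possible = len(genotype) * (len(alphabet) - 1)
    measured = 0
    for i, base in enumerate(genotype):
        for alt in alphabet:
            if alt != base and genotype[:i] + alt + genotype[i + 1:] in landscape:
                measured += 1
    return measured / possible


# Toy data: three measured genotypes of length two (values are invented).
toy_landscape = {"AA": 1.0, "AC": 2.0, "CA": 3.0}
print(relative_connectivity("AA", toy_landscape))
```

Correlating this quantity with regulation strength across genotypes then tests whether strong regulators are systematically better or worse sampled than weak ones.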

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper, it would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph to state explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the abstract we now state:

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30).”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.
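      The shuffling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary representation of a landscape, and the simple random uphill walk (used here in place of the fixation-probability-based walks mentioned in the response) are all assumptions.

```python
import random

BASES = "ACGT"

def neighbors(genotype, landscape):
    """Single-mutation neighbors that were actually sampled in the landscape."""
    out = []
    for i, base in enumerate(genotype):
        for b in BASES:
            if b != base:
                mutant = genotype[:i] + b + genotype[i + 1:]
                if mutant in landscape:
                    out.append(mutant)
    return out

def shuffle_landscape(landscape, rng):
    """Permute strengths across genotypes; the set of sampled genotypes
    (and hence the neighborhood structure) is left unchanged."""
    genotypes = list(landscape)
    values = [landscape[g] for g in genotypes]
    rng.shuffle(values)
    return dict(zip(genotypes, values))

def adaptive_walk(start, landscape, rng):
    """Simplified walk (assumption): step to a random fitter neighbor
    until no fitter neighbor exists, then return the peak reached."""
    current = start
    while True:
        fitter = [n for n in neighbors(current, landscape)
                  if landscape[n] > landscape[current]]
        if not fitter:
            return current
        current = rng.choice(fitter)

def high_peak_fraction(landscape, threshold, n_walks, rng):
    """Fraction of walks from random starting genotypes that end on a
    peak whose strength is at least `threshold`."""
    genotypes = sorted(landscape)
    hits = 0
    for _ in range(n_walks):
        end = adaptive_walk(rng.choice(genotypes), landscape, rng)
        hits += landscape[end] >= threshold
    return hits / n_walks
```

      Under these simplifying assumptions, comparing `high_peak_fraction` on a measured landscape with its values on many shuffled copies gives the kind of accessibility comparison the response describes.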

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2<sup>8</sup>” to “4<sup>8</sup>” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.
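      The weighted average described in this response can be illustrated with a short Python sketch. It assumes that each sorting bin has a single representative fluorescence value and that the weights are simply the fraction of a TFBS's reads in each bin; the function and argument names are hypothetical, and the actual Methods may include further normalization steps.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-weighted mean fluorescence of one TFBS across sorting bins.

    read_counts      -- reads for this TFBS in each bin (hypothetical input)
    bin_fluorescence -- representative fluorescence of each bin (hypothetical input)
    """
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("one read count is needed per fluorescence bin")
    total = sum(read_counts)
    if total == 0:
        return float("nan")  # no reads: the strength is undefined
    # Weight each bin's fluorescence by the fraction of reads found in it.
    return sum(c / total * f for c, f in zip(read_counts, bin_fluorescence))
```

      For instance, read counts of 10, 30, and 60 across bins with fluorescence 1.0, 2.0, and 3.0 give a strength of 2.5, which shows why the result is a derived quantity rather than a rescaled copy of the raw fluorescence distribution.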

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. Each task includes a unified evaluation framework supporting sandboxed code and APIs, alongside a human reference trajectory annotated with stepwise checkpoints along dual-axis: S-axis and V-axis.

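      Most people assume AI evaluation can be done with simple automated tests. The authors, however, argue that it requires elaborate dual-axis (S-axis and V-axis) human reference trajectories and sandboxed environments, which suggests that evaluating AI agent capabilities is far more complex than the industry generally recognizes. This view challenges the reductionist tendency in AI evaluation and stresses that human involvement in evaluation is irreplaceable.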

    1. model alignment alone does not reliably guarantee the safety of autonomous agents.

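      Most people regard model alignment as the key factor in keeping AI systems safe, but the authors show experimentally that even well-aligned models (such as Claude Code) exhibit an attack success rate as high as 73.63% in computer-use agents. This challenges a core assumption of the current AI safety field and shows that relying on model alignment alone cannot solve the safety problem for autonomous agents.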

    2. model alignment alone does not reliably guarantee the safety of autonomous agents

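      Most people believe that model alignment can effectively guarantee the safety of AI agents, but the authors argue this is far from enough: their experiments show that even with the aligned Qwen3-Coder model, Claude Code still has an attack success rate of 73.63%. This challenges the mainstream view in AI safety that model alignment alone can solve the safety problem.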

    1. In practice, your Claude Code, Cursor, or any other MCP-capable AI agent can directly 'see' real-time data on 𝕏 and act on it, with no need to write your own API wrapper.

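      Most people assume API integration always requires developers to write custom wrapper code, but the author stresses that xAI achieves seamless integration through the MCP protocol. This suggests that API design may shift toward a more standardized direct-access model, challenging the complexity that is currently the norm in API integration.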

    1. This class of bug is insidious because it evades every layer of defense. It will not be caught in development testing — who runs a test for 50 days? It will not be flagged in code review — the logic looks perfectly reasonable.

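      Most people assume code review and testing catch most systemic defects, but the author argues that the peculiar nature of this bug lets it evade every routine means of detection. This challenges a basic assumption of software quality assurance and suggests that some defects surface only under extreme conditions that a normal development process does not cover.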

    1. Looking at the code and having opinions on architecture is seen as just as 'bad' as calling a compiled C module from an interpreted language was seen back in the day... it's not bad, it's actually quite practical, but it violates some strange 'purity'.

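      The author likens the extremism of 'vibe coding' to the 'purity' advocates found throughout the history of programming languages and frameworks, arguing that both insist on impractical standards of 'purity'. This view challenges the tradition of pursuing 'purity' in software development and suggests that the pursuit may actually be harmful, getting in the way of practicality and efficiency.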

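    1. The quota on Claude Max Pro accounts can no longer be used with third-party products; if you are not using a product built on the Agent SDK and Claude Code, you cannot use the quota in that account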

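      Most people assume that subscription quota from a cloud service provider should be usable anywhere, but Anthropic's decision to restrict the quota to specific products overturns that assumption. The strategy is effectively a form of 'lock-in' that pushes developers and users toward products in its own ecosystem. It reflects a shift among AI service providers from open to closed, and it may become the new industry standard.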

    1. The more important work happens before the agent even starts. An agent operating inside a well-designed system already has the context and constraints it needs to do good work. In Linear, that means project plans, issue backlogs, code, and documentation. These all shape what the agent does and how it does it.

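      Most people assume that responsibility for AI systems lies in real-time monitoring and intervention, but the author argues that the real responsibility lies in designing the system and building its environment beforehand. This view moves accountability from real-time interaction to the system-design stage and challenges conventional thinking about AI governance.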

    1. Code
       viewof reset_adoption = Inputs.button("Reset adoption defaults", {
         reduce: () => {
           // Set viewof values back to defaults
           viewof p_hydro.value = 0.75;
           viewof p_hydro.dispatchEvent(new Event("input", {bubbles: true}));
           viewof p_foodgrade.value = 0.65;
           viewof p_foodgrade.dispatchEvent(new Event("input", {bubbles: true}));
           viewof p_recfactors.value = 0.5;
           viewof p_recfactors.dispatchEvent(new Event("input", {bubbles: true}));
           viewof gf_progress.value = 50;
           viewof gf_progress.dispatchEvent(new Event("input", {bubbles: true}));
         }
       })
       Reset adoption defaults
       reset_adoption = 0
       Code
       viewof p_hydro = Inputs.range([0.3, 0.95], { value: 0.75, step: 0.05, label: "P(Hydrolysates for basal media)" })

      'reset adoption defaults' button is invisible -- too dark so too little contrast with the text.

      Make reset defaults buttons more prominent throughout. #implement

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an end-to-end pipeline, intended to accelerate EM-based connectomics by combining low-resolution imaging for large volumes with synapse-level imaging only in selected regions of interest. In principle, this strategy can substantially reduce imaging time, computational demands, analysis time, and overall cost.

      General note:

      Overall, I found the manuscript interesting and valuable, particularly as a description of how one laboratory has assembled and applied a practical workflow to reconstruct and analyze the central complex across multiple insect species. In that sense, the work is compelling as an account of a real, functioning strategy for comparative connectomics, and I appreciated reading it. My main reservation is not about the relevance of the biological problem or the utility of the pipeline in the authors' own hands, but about whether the manuscript, in its current form, fully meets the expectations of a paper that is focused on tools and resources. The expectation would be that this paper would be a venue for sharing new techniques, software tools, datasets, and other resources intended to be usable by the community. Here, because much of the pipeline appears to build on existing methods and software, the key value added should be a particularly clear demonstration of how these components were adapted, integrated, validated, and documented for this specific use case in a way that others could realistically reproduce and adopt. At present, that translational and reproducibility-oriented component does not yet seem sufficiently developed, despite the clear promise of the overall approach.

      Major comments:

      (1) The work is valuable as a practical integration and application of multiple existing tools into a coherent pipeline, together with a new multi-resolution imaging strategy. However, the manuscript at times reads as though it introduces an entirely novel workflow. I would encourage the authors to clarify the contribution more explicitly: which components are genuinely new (for example, the acquisition strategy and the end-to-end integration/validation), and which are adaptations of already established methods or software. This would make the scope and novelty of the paper easier to assess.

      (2) The most distinctive element is the multi-resolution acquisition strategy. However, as described, the selection of high-resolution regions seems to be decided a priori based on anatomy (guided by xCT localization of the CX), rather than being determined automatically from the data (i.e., ROI placement is anatomy-driven rather than data-driven). A more data-driven or machine learning-guided ROI strategy would strengthen the methodological contribution and the adaptability to new scenarios, along the lines of approaches such as SmartEM [1].

      (3) The manuscript emphasizes open-source availability and reduced barriers to entry, but the current software release, as referenced, does not yet appear to support straightforward external reuse. Since much of the pipeline builds on existing methods, the main added value lies in how these technologies were adapted, combined, and validated for the present problem. A clear and complete explanation of this adaptation is therefore essential, but is currently missing. I would suggest the following concrete improvements:<br /> a) Provide a single landing page or umbrella repository that links each pipeline step in the paper to the corresponding codebase, including version tags/commits and expected inputs/outputs for each step.<br /> b) Include step-by-step tutorials for each component.<br /> c) Provide an example dataset together with a full reproduction walkthrough in a controlled environment.<br /> d) Clearly explain the required parameters and configuration for each step, including how they should be adjusted for other datasets or scenarios.<br /> e) Follow packaging and distribution best practices (for example, PyPI/conda releases, Docker containers, and version pinning).

      (4) In my own attempt to set up and run parts of the released code, I encountered issues that currently limit reproducibility. For example, when creating an environment for EMalign (https://github.com/Heinze-lab/EMalign), the required Python version is not specified, and installation did not succeed under Python 3.12 due to dependency constraints. Additionally, synful_312 (https://github.com/Heinze-lab/synful_312) and SegToPCG (https://github.com/Heinze-lab/SegToPCG) appear to be empty despite being referenced in the manuscript. These are fixable issues, but addressing them is important if the paper is to deliver on its "low entry cost" claim.

      (5) Table 1 reports acquisition times, which is helpful. However, the multi-resolution approach adds essential processing steps that arise from the strategy itself (e.g., "XY alignment high-res" and "high-res to low-res alignment"). Please include registration/alignment (and other major post-processing) runtimes and resource requirements, such as storage, in a comparable table so that readers can assess the true end-to-end cost.

      References:

      [1] Meirovitch, Y., et al. "SmartEM: machine learning-guided electron microscopy." Nature Methods (2025).

    2. Author Response:

      Public Review:

      On behalf of all authors I would like to thank the reviewers for highly constructive and helpful comments, which, once addressed fully, will make the paper stronger and more useful as a tools and resources contribution.

      Besides addressing all minor issues that were pointed out by the reviewers, we see three main lines of changes we will need to pursue in order to address all major concerns. We plan to do all of these as fast as possible. Given that new alignments, segmentation and tracing are needed, this will take between one and three months.

      (1) Availability of code, software documentation and accessibility of pipeline. 

      Both reviewers and the editorial summary agreed that we need to improve the availability of our code, provide more instructions and examples of how to use the code, and make our methods more reusable to outsiders. To achieve this we will follow the suggestions made by the reviewers, in particular the list presented by reviewer 1 (point three of weaknesses in the public review).

      First, we would like to apologize for the faulty link to the SegToPCG (https://github.com/Heinzelab/SegToPCG) repository (the correct name and link are LSDtoPCG and https://github.com/Heinze-lab/LSDtoPCG) as well as the missing code in the https://github.com/Heinze-lab/synful_312 repository; these issues have already been fixed and will be included in an updated bioRxiv version.

      Second, we will generate an overarching umbrella page that will serve as a go-to site for any user who would like to implement our pipeline. To enable implementation, we will expand the documentation, provide detailed instructions, and include an example dataset with these instructions.

      (2) Quantification of analysis steps, including segmentation, alignment and manual tracing, to validate our claims of increased efficiency and transferability across species.

      As for point 1, both reviewers as well as the editorial summary highlighted the need for more comprehensive quantification of the workflow, especially with respect to segmentation quality as well as time investment into manual tracing and high resolution alignments. In particular, these data should validate the transferability of the segmentation models across species, and support the claims made about the time savings resulting from using our multiresolution workflow compared to a whole sample synaptic resolution approach.

      To this aim, we will generate all analyses according to the reviewer suggestions and incorporate the resulting data in new figures and tables. To make the data fully comparable across species, we will apply the latest version of our alignment and segmentation scripts to at least one high resolution data stack of each species, quantify manual tracing of a comparable, defined set of neurons in each species, and perform VOI analyses of each species segmentation against manually traced neurons in identically sized testing volumes in each dataset. Additionally, we will proof-read identical branches of homologous neurons in each species and quantify the required number of edits from raw segmentation output to completion.

      As the segmentation pipeline has evolved over the last years, a fair comparison between all datasets requires a fresh analysis based on the latest version of our machine learning models. This cannot be done with existing data and will therefore take a few weeks.
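      The VOI comparison planned above can be sketched as follows, assuming that VOI refers to variation of information, as is common in segmentation evaluation. This is an illustrative Python sketch rather than the authors' evaluation code; the function name and the flat label-sequence inputs are hypothetical.

```python
import math
from collections import Counter

def variation_of_information(seg, truth):
    """Return the (split, merge) VOI terms in bits for two label sequences.

    split = H(seg | truth): one true neuron divided among several segments.
    merge = H(truth | seg): several true neurons joined in one segment.
    Both inputs list one label per voxel (or skeleton node), in the same order.
    """
    if len(seg) != len(truth):
        raise ValueError("label sequences must have equal length")
    n = len(seg)
    joint = Counter(zip(seg, truth))   # co-occurrence counts of (segment, neuron)
    seg_counts = Counter(seg)
    truth_counts = Counter(truth)
    split = -sum(c / n * math.log2(c / truth_counts[t])
                 for (s, t), c in joint.items())
    merge = -sum(c / n * math.log2(c / seg_counts[s])
                 for (s, t), c in joint.items())
    return split, merge
```

      Reporting the two terms separately, rather than only their sum, shows whether a segmentation errs mainly by over-splitting or by falsely merging neurons, which relates to the number of proof-reading edits the response proposes to count.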

      (3) Clarification of aims for multi-resolution pipeline and how projectomes and connectomes inform each other.

      Reviewer 2 highlighted that there is not sufficient clarity about the aims of combining projectome and connectome. Judging from the reviewer comment, we might have inadvertently left the impression that we aimed at predicting a connectome from projectome data, by using spatial proximity of neurons as a proxy for connectivity. In fact, our data show that this is not possible, and that projection level data cannot predict connectivity. For instance, in the head direction system, the projectivity data suggests identical circuits for bees and flies (except at the edges of the ring), but connectivity data shows that the components of the ring attractor circuit are forming circuits that are distinctly different between the species (despite the same neurons with the same projection patterns being involved).

      What we aim to do is slightly different. We define global patterns of information flow using the projectome, and then define circuits in a part of this global circuit at synaptic level. Then, we extrapolate the global connectivity by assuming that the circuits identified in one or two computational units (columns) are repeated in each column. This rests on the assumption that the same neurons form the same connections in each repeated module, as long as the cellular repertoire is identical (verified by the projectome), but does not use proximity data to predict connectivity. This method thus only applies to brain regions that consist of repeated computational modules, i.e. where we can assume that knowing the connectivity in one of them allows extrapolation to the entire brain region. While this is a simplification, the Drosophila CX has in principle confirmed this assumption.

      We will generate a new figure that illustrates the process of combining local connectomes and global projectomes using examples from our data, and that also illustrates this schematically for other brain regions, e.g. the insect optic lobe or the cerebral cortex of mammals. We will also carefully rewrite the relevant text passages to avoid misunderstandings.

      Overall, we would like to thank the reviewers again for their thorough and detailed comments, which will help to make our connectomics workflow more accessible and reproducible.

    1. the code that admonishes killing only out of need

      What is need though? The "need" for a Christmas tree certainly is different from the need to eat (one is about survival, the other about "living" in a way).

    1. se. Valdes-Fallis found that when the message is sent in either Spanish or English, but not both, bilingual interlocutors use the last language used by the speaker in order to follow suit with a sequential response. On the other hand, when the message is sent as a blend of the Spanish and English code systems, the bilingual interlocutor responds symmetrically with a codeswitched response resembling that of the original speaker (1976, 70

      .

    2. s. In the case that one of the bilinguals was not as fluent as the other participants, researchers' data have shown that the nonfluent bilingual's native language would be selected as the appropriate linguistic code for that particular spee

      .

    3. It has been demonstrated that the direction of the language shift in Spanish/English codeswitching may or may not influence the outcome of the conversation. Notwithstanding, because most of the Spanish/English bilingual subjects under investigation live in Hispanic communities, they tend to use Spanish as their intimate and personal code in order to convey a sense of intimacy and community solidarity; the English code is often reserved for more objective and impersonal communicative exchanges associated with the external community. (

      .

    4. ilingual speakers have twice as many options to choose from as monolingual speakers when expressing their thoughts and ideas because their language repertoire is twofold; not only can bilinguals choose from a variety of styles of speech within the same language, but also they can switch from one language, or code, system to the other. A

      .

    1. Finally, applied research (e.g., Altarriba & Santiago-Rivera, 1994) has revealed that code switching is often used strategically in counseling settings, as clients choose to speak in a second language when trying to distance themselves from emotional events. Because the first language is often associated with a broader range of emotions than the second language, language switching becomes a defense mechanism

      ,

    2. n implication of this interpretation is that during early stages of bilingualism, when bilinguals tend to rely more on their first language, their codeswitching would mostly involve intrusions from their first language as they communicate in their second language. However, as the second language becomes the dominant language, their code switching would tend to consist of intrusions from the second language as they communicate in their first language.

      .

    3. mmunicate in their first language. This would be because of their limited knowledge of their second language. Although this may be the case for beginning bilinguals, Spanish-English bilinguals in south Texas report more English interference when they communicate in Spanish, and little or no interference from Spanish when they communicate in English. In other words, these bilinguals code-switch more when they communicate in Spanish than when they use English

      .

    4. One of the most frequent explanations of why bilinguals code-switch is that they do it to compensate for lack of language proficiency. The argument is that bilinguals codeswitch because they do not know either language completel

      .

    1. Now, there are many reasons one might be suspicious about utilitarianism as a cheat code for acting morally,

      This thought instantly came to my mind. Although I see utilitarianism as a helpful framework in some cases, I see it as a harmful one in this case. Why? I think it could be extremely subjective, with major consequences. With things like generative AI, a utilitarian perspective may see the benefit (utility) of AI as outweighing the negatives, such as the loss of critical thinking skills in students. Just because generative AI might help with efficiency doesn't mean its consequences should be ignored.

    1. In the rewrite, refactoring became the core of my workflow. After every large batch of generated code, I’d step back and ask “is this ugly?” Sometimes AI could clean it up. Other times there was a large-scale abstraction that AI couldn’t see but I could; I’d give it the direction and let it execute. If you have taste, the cost of a wrong approach drops dramatically because you can restructure quickly.

      this is tiring and i am only good at it after having read a lot of code

    2. AI basically let me put aside all my doubts on technical calls, my uncertainty of building the right thing and my reluctance to get started by giving me very concrete problems to work on. Instead of “I need to understand how SQLite’s parsing works”, it was “I need to get AI to suggest an approach for me so I can tear it up and build something better”. I work so much better with concrete prototypes to play with and code to look at than endlessly thinking about designs in my head, and AI lets me get to that point at a pace I could not have dreamed about before. Once I took the first step, every step after that was so much easier.

      blank page

    3. Being a maintainer is much more than just “throwing the code out there” and seeing what happens. It’s triaging bugs, investigating crashes, writing documentation, building a community, and, most importantly, having a direction for the project.

      huh, this is the first time open source has sounded appealing

    1. How are people’s expectations different for a bot and a “normal” user? Choose an example social media bot (find on your own or look at Examples of Bots (or apps).) What does this bot do that a normal person wouldn’t be able to, or wouldn’t be able to as easily? Who is in charge of creating and running this bot? Does the fact that it is a bot change how you feel about its actions

      Expectations for bots focus on efficiency, speed, and rigid adherence to code, whereas normal users are expected to possess empathy, social nuance, and accountability for their "intent." For example, a unit conversion bot can scan thousands of posts to provide instant metric conversions, a task that a human could not perform at that scale without extreme fatigue. These bots are typically managed by independent developers who use APIs to automate actions. Because a bot lacks personal will, we often view its errors as technical bugs rather than moral failings, shifting the ethical responsibility back to the person who created or ran the program.

    1. These visual aids invite readers to make code-meshing a "shared project, one that will not only inform instructional practices, but possibly intervene into the culture of prejudices against African American English as a mainstream language variety"

      Beyond the classroom and academic settings, introducing code meshing into modern "standard" linguistic practice can change how the world views AAVE and other dialects, so that no single dialect is seen as correct and every other as inferior.

    2. In Other People's English, the pedagogical imperative moves beyond solely teaching students what the languages of academic institutions are and how to use them. It also moves beyond Delpit's imperative to give students access to the "language of economic success" (Delpit 68). Rather than building a language curriculum that assumes a Standard Academic English code deficiency in students, educators can work with students from a space that emphasizes how their language experiences are already engaging with different linguistic codes, both standard and disenfranchised.

      This has so much potential to teach students to the best of their ability, allowing them to learn language in an entirely new way and to communicate in ways that don't discriminate against large groups of people.

    3. By sharing assignments, student writing, and most tellingly, conversations he has had with colleagues initially resistant to any code-meshing content in the curriculum, Lovejoy teases out the multifarious implications that African American English carries especially in a post-secondary education context.

      I imagine that, as an educator, it has to be hard to deny something is working when there is proof of it benefiting students, even young students, who are very hard to teach complex things. That suggests code meshing is not as complex as people are making it out to be, and that the transition to teaching these dialects alongside SAE would be seamless.

    4. Y'shanda Young-Rivera offers an elementary education perspective on how code-meshing works on the ground, within several classroom contexts. Young-Rivera, previously a skeptic of code-meshing, offers revealing articulations by fourth-, fifth-, and eighth-grade students of the terms "code-meshing" and "code-switching." She includes daily lesson plans, as well as images of the students' written homework responses, in which the young writers identify and interpret the code-meshing they encounter in their world. This chapter serves to not only emphasize how easily implemented the frame is but also how flexible the code-meshing curriculum can be, given the imperative of a state-wide accountability project like Common Core requirements.

      This is proof that teaching code switching from a young age works. Even in middle school settings, children understand code switching very well; they use examples of it in their own work and decipher other examples of it in the classroom. This also shows that the topic can be taught to many different people, with students ranging from young to old.

    5. no language, Standard English included, is a static, neutral, code. In "Code-Meshing or Code-Switching?", Young argues that code-switching, despite well-intended goals of inclusion, is in practice a vestige of legalized segregation, and "an educational strategy that forces African Americans to view their language culture and identity as antithetical to the U.S. mainstream"

      The forced assimilation is blatant and it is detrimental to these communities.

    6. Young continuously points to the elision of the ways in which code-switching is connected to racial self-understanding,

      Forcing a large part of the population to assimilate to SAE is harmful to their education and their level of success, as well as to their racial self-understanding. Being forced to speak like white people, for lack of better words, shouldn't be the only way African American students find success in an academic or professional setting.

    7. "not in your face, not in demand of a conversion, but a conversation based on personal experiences, classroom experiences, and decades of research and scholarship"

      I think it should be an in-your-face conversation! I don't think people should have to be as passive as can be when discussing an important cause they are passionate about, such as code meshing.

    8. Other People's English unpacks the fluidity, mobility, and heteroglossia of English through the possibilities of code-meshing, and outlines what structures of racism code-switching reconstitutes, despite the good intentions by which it is deployed

      I have questions about the word Heteroglossia so I had to google it!

    9. "But how can I let a student, who had come to see me for help, walk out without my having shown them the way the school wants them to write?" At least a dozen hands shoot up. The responses that follow, some echoing this anxiety, some responding critically to the implicated assumptions, are indicative of a tense, decades-old pedagogical impasse in language and writing studies.

      I find it interesting that we are all aware of how ever-evolving language really is, and yet educators are still scared to nudge the norm, let alone teach the concept to their students. I'm confused about why we have drawn a line in the sand, so to speak, and are absolutely refusing to let language evolve, even though doing so would benefit students across the board. Not teaching about code meshing also only negatively impacts minorities, which probably has something to do with it.

    10. Code-meshing, he explains, is an approach to writing and interpreting texts that advocates for blending language codes in the classroom, rather than switching from one set of linguistic codes to another, depending on the "appropriate" social and discursive contexts

      There should be meshing of dialects in the classroom, and it should also be acceptable in the professional setting. This would help to teach students a wide linguistic variety rather than a limited one perpetuated by the powers that be. The fact is we can change how we teach and view the American language model; most powerful people either don't understand that or, sadly, are too racist to change their views.

    11. that through a pedagogy of "code-switching," the burden of discourse assimilation invariably falls on African American students.

      The burden is absolutely on African American students when they are forced to assimilate to SAE. This is because different dialects are spoken in different households: white households tend to already use forms of SAE, while African American households usually use a different dialect, such as AAVE. This puts African American students at a massive disadvantage because they are not only learning how to use the language; they are learning an almost entirely different way to speak it.

    1. Finally, code-switching is also revealed to be highly effective in terms of teaching vocabulary and grammar for EFL students, as the findings revealed that students received better results when teachers code-switch than when teachers provide English-only instruction.

      final take and opinion

    2. Overall, these results imply that code-switching could be intentionally utilized as a tactic to inform and explain word meaning, leading to higher learning performance. Comprehending the effects of code-switching on foreign language vocabulary development begins with this work.

      final takeaway

    3. How frequently teachers utilize code-switching is also determined by their own personal perspectives and the personality of the teachers. For instance, Istifci's research (2019) found that while both novice and experienced teachers had favorable opinions toward the use of code-switching in the classroom, they rarely used it in the courses that were observed.

      personally i think a mix would be useful

    4. The widespread usage of code-switching is due to the importance for EFL teachers to give students a successful classroom experience with less stress.

      claim

    5. In addition to the previously mentioned two purposes, the interpersonal aspect is another important reason why teachers choose to code-switch

      another new proposed idea

    6. purpose and also effect of types of code-switching on learning, Kashi (2018) found that there is a substantial difference between learning the past tense when intra-sentential and inter-sentential code switching are utilized.

      new idea

    7. 1) What are the types of code-switching and the functions of code-switching in EFL English language teaching? 2) How do teachers perceive the use of code-switching in teaching Asian EFL tertiary-level students? 3) To what extent is code-switching implemented in teaching Asian EFL learners writing skill?

      rhetorical questions

    8. Concerning the use of code switching, there are two main perspectives. While one side is against code-switching and advocates for teaching entirely in the target language, the other is in favor of code-switching and advocates using CS to some extent.

      main idea

    1. well-crafted policy, a single professional development presentation or workshop isn't enough to bring about systemic change. The demands laid out in "This Ain't Another Policy Statement!" make clear that, as a WPA, I must continually ask myself how I can meet the demand to "do much better in [my] own self-work that must challenge the multiple institutional structures of anti-Black racism [I] have used to shape language politics."

      A simple solution to a complex problem is never going to work. This is an evolutionary process that will take a lot of work and convincing. At the very least, many educators should shift toward teaching code meshing itself, how it fits into our society and our communities, and why it is just as valid a language as SAE.