10,000 Matching Annotations
  1. Sep 2025
    1. If a process fails, most operating systems write the error information to a log file to alert system administrators or users that the problem occurred. The operating system can also take a core dump—a capture of the memory of the process—and store it in a file for later analysis. (Memory was referred to as the “core” in the early days of computing.) Running programs and core dumps can be probed by a debugger, which allows a programmer to explore the code and memory of a process at the time of failure.

      I like how the text explains core dumps; it’s interesting that “core” comes from older terminology for memory. It is also useful that a modern OS can capture a process’s memory when it fails, so that developers can inspect exactly what went wrong.

    2. Some computer systems use a multistage boot process: When the computer is first powered on, a small boot loader located in nonvolatile firmware known as BIOS is run. This initial boot loader usually does nothing more than load a second boot loader, which is located at a fixed disk location called the boot block. The program stored in the boot block may be sophisticated enough to load the entire operating system into memory and begin its execution. More typically, it is simple code (as it must fit in a single disk block) and knows only the address on disk and the length of the remainder of the bootstrap program.

      It’s interesting how the boot process is broken into stages. I wonder why the first boot loader has to be so tiny—just enough to load the next stage. Is it mainly because of space constraints in the BIOS firmware?

    3. It is possible to design, code, and implement an operating system specifically for one specific machine configuration. More commonly, however, operating systems are designed to run on any of a class of machines with a variety of peripheral configurations.

      This highlights a trade-off in OS design: you can build a system that is tuned for one specific machine, but most OS developers aim for flexibility so that the system works across many devices. I find it interesting to consider how this choice affects performance versus compatibility.

    4. Software designers for Android devices develop applications in the Java language, but they do not generally use the standard Java API. Google has designed a separate Android API for Java development. Java applications are compiled into a form that can execute on the Android RunTime ART, a virtual machine designed for Android and optimized for mobile devices with limited memory and CPU processing capabilities. Java programs are first compiled to a Java bytecode .class file and then translated into an executable .dex file. Whereas many Java virtual machines perform just-in-time (JIT) compilation to improve application efficiency, ART performs ahead-of-time (AOT) compilation. Here, .dex files are compiled into native machine code when they are installed on a device, from which they can execute on the ART. AOT compilation allows more efficient application execution as well as reduced power consumption, features that are crucial for mobile systems.

      It’s interesting that Android generally does not use the standard Java API but has its own instead. The ahead-of-time (AOT) compilation in ART seems really clever: it lets apps run more efficiently and reduces power consumption, which is important for mobile devices with limited resources.

    5. Layered systems have been successfully used in computer networks (such as TCP/IP) and web applications. Nevertheless, relatively few operating systems use a pure layered approach. One reason involves the challenges of appropriately defining the functionality of each layer. In addition, the overall performance of such systems is poor due to the overhead of requiring a user program to traverse through multiple layers to obtain an operating-system service. Some layering is common in contemporary operating systems, however. Generally, these systems have fewer layers with more functionality, providing most of the advantages of modularized code while avoiding the problems of layer definition and interaction.

      While the layered approach offers clarity and modularity, it is rarely used in its pure form in operating systems. Defining precise responsibilities for each layer is difficult, and performance can suffer because service requests must pass through multiple layers. Modern systems often use a compromise: fewer, broader layers that retain modular benefits while reducing overhead and complexity.

    6. A system as large and complex as a modern operating system must be engineered carefully if it is to function properly and be modified easily. A common approach is to partition the task into small components, or modules, rather than have one single system. Each of these modules should be a well-defined portion of the system, with carefully defined interfaces and functions. You may use a similar approach when you structure your programs: rather than placing all of your code in the main() function, you instead separate logic into a number of functions, clearly articulate parameters and return values, and then call those functions from main().

      Modern operating systems are extremely complex, so breaking them into modules keeps development and maintenance manageable. Each module handles a distinct, clearly defined function and interacts with other modules through explicit interfaces. This modular method resembles good programming practice, where code is separated into functions with specified inputs and outputs instead of being consolidated in main(). It improves readability and maintainability and reduces errors.
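
The structuring advice in the passage can be sketched in a few lines. This is only an illustration in Python; the function names (`parse_readings`, `mean`, `format_report`) and the sample input are hypothetical and do not come from the textbook.

```python
def parse_readings(text):
    """Turn a comma-separated string into a list of floats."""
    return [float(part) for part in text.split(",") if part.strip()]


def mean(values):
    """Return the arithmetic mean; reject empty input explicitly."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def format_report(values):
    """Build a one-line summary from already-parsed values."""
    return f"n={len(values)} mean={mean(values):.2f}"


def main():
    # main() only wires the pieces together; each step can be tested alone.
    print(format_report(parse_readings("1.5, 2.5, 4.0")))


if __name__ == "__main__":
    main()
```

Each function has one job and a clear parameter and return value, so a bug in parsing can be found without touching the reporting code.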

    7. As is true in other systems, major performance improvements in operating systems are more likely to be the result of better data structures and algorithms than of excellent assembly-language code. In addition, although operating systems are large, only a small amount of the code is critical to high performance; the interrupt handlers, I/O manager, memory manager, and CPU scheduler are probably the most critical routines. After the system is written and is working correctly, bottlenecks can be identified and can be refactored to operate more efficiently.

    8. The advantages of using a higher-level language, or at least a systems-implementation language, for implementing operating systems are the same as those gained when the language is used for application programs: the code can be written faster, is more compact, and is easier to understand and debug. In addition, improvements in compiler technology will improve the generated code for the entire operating system by simple recompilation. Finally, an operating system is far easier to port to other hardware if it is written in a higher-level language. This is particularly important for operating systems that are intended to run on several different hardware systems, such as small embedded devices, Intel x86 systems, and ARM chips running on phones and tablets.

      Using higher-level languages for operating system development offers several key benefits: code can be written more quickly, is easier to read and debug, and is generally more compact. Compiler improvements carry over to the whole OS through simple recompilation. Additionally, high-level languages make porting the OS to different hardware platforms much easier, a crucial advantage for systems designed to run on diverse devices, from embedded systems to desktop PCs and mobile ARM-based devices.

    9. Object files and executable files typically have standard formats that include the compiled machine code and a symbol table containing metadata about functions and variables that are referenced in the program. For UNIX and Linux systems, this standard format is known as ELF (for Executable and Linkable Format). There are separate ELF formats for relocatable and executable files. One piece of information in the ELF file for executable files is the program's entry point, which contains the address of the first instruction to be executed when the program runs. Windows systems use the Portable Executable (PE) format, and macOS uses the Mach-O format.

      Executable and object files follow standard formats that include both the machine code and metadata (such as details about functions and variables). On UNIX and Linux systems, this format is called ELF (Executable and Linkable Format), with separate formats for relocatable and executable files. An ELF executable also specifies the entry point, which is the address of the first instruction to run when the program starts. Other operating systems use different formats: Windows uses PE (Portable Executable), and macOS uses Mach-O.
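
To make the entry-point idea concrete, here is a minimal sketch that reads the file type and entry address from the start of an ELF header. The field layout follows the ELF specification (16 identification bytes, then `e_type`, `e_machine`, `e_version`, `e_entry`). The helper names `read_elf_header` and `sample_header` are hypothetical, and the sample bytes are synthetic rather than taken from a real binary.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}


def read_elf_header(data):
    """Return the file type, word size, and entry point from ELF header bytes."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    if data[4] not in (1, 2):
        raise ValueError("unknown ELF class")
    is_64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    # e_type, e_machine, e_version, then e_entry (4 or 8 bytes wide).
    fmt = endian + ("HHIQ" if is_64 else "HHII")
    e_type, _machine, _version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "type": ELF_TYPES.get(e_type, "other"),
        "bits": 64 if is_64 else 32,
        "entry": e_entry,
    }


def sample_header(entry=0x401000):
    """Build a synthetic 64-bit little-endian executable header for the demo."""
    ident = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
    return ident + struct.pack("<HHIQ", 2, 62, 1, entry)


if __name__ == "__main__":
    print(read_elf_header(sample_header()))
```

On a Linux system, the same function could be given the first bytes of a real binary, for example `read_elf_header(open(path, "rb").read(64))`, where `path` is whatever executable you want to inspect.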

    10. Source files are compiled into object files that are designed to be loaded into any physical memory location, a format known as a relocatable object file. Next, the linker combines these relocatable object files into a single binary executable file. During the linking phase, other object files or libraries may be included as well, such as the standard C or math library (specified with the flag -lm).

      When a program is compiled, the source code is first transformed into relocatable object files, which can be loaded at any memory address. The linker then merges these object files into a single executable file, pulling in other object files or libraries when necessary (for example, the math library with -lm). The result is a complete program that is ready to be loaded and run.

    11. Program loading and execution. Once a program is assembled or compiled, it must be loaded into memory to be executed. The system may provide absolute loaders, relocatable loaders, linkage editors, and overlay loaders. Debugging systems for either higher-level languages or machine language are needed as well.

      Program loading and execution services handle the process of getting compiled programs into memory so they can run. These include loaders (absolute, relocatable, overlay) and tools like linkage editors. Debugging support is also part of this category, helping programmers test and fix errors in either high-level code or machine language.

    12. Many operating systems provide a time profile of a program to indicate the amount of time that the program executes at a particular location or set of locations. A time profile requires either a tracing facility or regular timer interrupts. At every occurrence of the timer interrupt, the value of the program counter is recorded. With sufficiently frequent timer interrupts, a statistical picture of the time spent on various parts of the program can be obtained.

      Many operating systems can track how much time a program spends running at different points in its code. This is called a time profile. To create one, the system either traces the program or uses regular timer interrupts. Every time the timer interrupts, the system records the program’s current position (the program counter). With sufficiently frequent interrupts, this gives a statistical view of which parts of the program take the most time to execute.
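
The sampling idea can be shown with a small simulation. This sketch does not use real timer interrupts: it assumes a recorded list of "program counter" locations, one per clock tick, and samples it at a fixed interval the way a timer interrupt would. The name `sample_profile` and the trace contents are hypothetical.

```python
from collections import Counter


def sample_profile(pc_trace, interval):
    """Sample the program counter every `interval` ticks and return
    the fraction of samples that landed in each location."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Each sampled index stands in for one timer interrupt.
    samples = Counter(
        pc_trace[tick] for tick in range(interval - 1, len(pc_trace), interval)
    )
    total = sum(samples.values())
    if total == 0:
        return {}
    return {location: count / total for location, count in samples.items()}


if __name__ == "__main__":
    # Simulated run: 100 ticks in init, 800 in the main loop, 100 in cleanup.
    trace = ["init"] * 100 + ["loop"] * 800 + ["cleanup"] * 100
    print(sample_profile(trace, interval=10))
```

Only one tick in ten is observed, yet the sampled fractions match the true split of time. A larger interval means fewer samples and a coarser estimate, which is the trade-off behind "sufficiently frequent" in the passage.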

    13. We may need these same sets of operations for directories if we have a directory structure for organizing files in the file system. In addition, for either files or directories, we need to be able to determine the values of various attributes and perhaps to set them if necessary. File attributes include the file name, file type, protection codes, accounting information, and so on. At least two system calls, get_file_attributes() and set_file_attributes(), are required for this function. Some operating systems provide many more calls, such as calls for file move() and copy(). Others might provide an API that performs those operations using code and other system calls, and others might provide system programs to perform the tasks. If the system programs are callable by other programs, then each can be considered an API by other system programs.

      We often need the same operations for directories as we do for files, especially when a directory structure is used to organize files. For both files and directories, it’s important to be able to read attributes and change them when necessary. Attributes include the name, type, protection codes (access permissions), and accounting information. To handle this, operating systems provide at least two system calls, such as get_file_attributes() and set_file_attributes(). Some systems go further, offering extra calls for tasks like moving or copying files. In other cases, these actions are handled through APIs or system programs. If other programs can call these system programs, they effectively act as APIs themselves.
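
The textbook's get_file_attributes() and set_file_attributes() are generic names, not calls in any particular system. As a rough sketch, the same idea can be expressed with Python's `os.stat` and `os.chmod`, which wrap the corresponding system calls on POSIX-like systems. The wrapper names below are hypothetical, and the permission check assumes POSIX semantics (on Windows, `os.chmod` only controls the read-only flag).

```python
import os
import stat
import tempfile


def get_file_attributes(path):
    """Read a few attributes of a file or directory via os.stat."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": info.st_size,
        "mode": stat.S_IMODE(info.st_mode),    # permission bits only
        "is_dir": stat.S_ISDIR(info.st_mode),
        "modified": info.st_mtime,
    }


def set_file_permissions(path, mode):
    """Change the permission bits, for example 0o600 for owner read/write."""
    os.chmod(path, mode)


def demo():
    """Create a scratch file, change its permissions, and read them back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello")
        os.close(fd)
        set_file_permissions(path, 0o600)
        return get_file_attributes(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The same `get_file_attributes` call works on a directory path, which reflects the passage's point that files and directories share this set of operations.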

    1. Easier to work with new programming languages, and understand existing codebases.

      AI coding tools make it easy to adopt new programming languages and understand existing code bases.

  2. Aug 2025
    1. In one approach, the command interpreter itself contains the code to execute the command. For example, a command to delete a file may cause the command interpreter to jump to a section of its code that sets up the parameters and makes the appropriate system call. In this case, the number of commands that can be given determines the size of the command interpreter, since each command requires its own implementing code.

      This passage explains one way to implement commands in a command interpreter: the interpreter itself contains the code for executing each command. For instance, a delete-file command triggers a specific section of the interpreter’s code that sets up the parameters and makes the system call. The number of supported commands therefore determines the interpreter’s size, as each command needs its own dedicated code.
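
A minimal sketch of this built-in-command approach is shown below. The command names (`delete`, `echo`) and the helper `run_command` are hypothetical. The point is only that every command is a function living inside the interpreter, so adding a command means adding code to the interpreter itself.

```python
import os
import shlex


def cmd_delete(args):
    """Set up the parameters and make the system call (os.remove -> unlink)."""
    for path in args:
        os.remove(path)
    return f"deleted {len(args)} file(s)"


def cmd_echo(args):
    """Return the arguments joined by spaces."""
    return " ".join(args)


# Every supported command needs its own entry and implementing code here.
BUILTINS = {
    "delete": cmd_delete,
    "echo": cmd_echo,
}


def run_command(line):
    """Parse one command line and dispatch to the matching built-in."""
    parts = shlex.split(line)
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    handler = BUILTINS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(args)


if __name__ == "__main__":
    print(run_command("echo hello world"))
    print(run_command("frobnicate"))
```

The dispatch table grows with each new command, which mirrors the passage's observation that the number of commands determines the size of the interpreter.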

    2. Resource allocation. When there are multiple processes running at the same time, resources must be allocated to each of them. The operating system manages many different types of resources. Some (such as CPU cycles, main memory, and file storage) may have special allocation code, whereas others (such as I/O devices) may have much more general request and release code.

      This passage highlights that the operating system is responsible for resource allocation, distributing CPU time, memory, file storage, and I/O devices among multiple running processes to ensure fair and efficient usage.
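
The "general request and release code" mentioned for I/O devices can be pictured as a small pool that hands out identical devices and takes them back. This is a toy sketch, not how any real kernel does it: the class name `ResourcePool`, the device names, and the integer process IDs are all hypothetical, and there is no waiting or scheduling.

```python
class ResourcePool:
    """Track which process currently holds each interchangeable device."""

    def __init__(self, names):
        self._free = list(names)
        self._owner = {}

    def request(self, pid):
        """Give the caller a free device, or None if all are in use."""
        if not self._free:
            return None
        resource = self._free.pop(0)
        self._owner[resource] = pid
        return resource

    def release(self, pid, resource):
        """Return a device to the pool; only its current holder may do so."""
        if self._owner.get(resource) != pid:
            raise ValueError(f"process {pid} does not hold {resource}")
        del self._owner[resource]
        self._free.append(resource)


if __name__ == "__main__":
    pool = ResourcePool(["printer0", "printer1"])
    print(pool.request(1))   # first free device
    print(pool.request(2))   # second free device
    print(pool.request(3))   # nothing left
    pool.release(1, "printer0")
    print(pool.request(3))   # the released device is handed out again
```

Resources such as CPU cycles or main memory need more specialized allocation code (schedulers, page allocators), which is the distinction the passage draws.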

    3. The free-software movement is driving legions of programmers to create thousands of open-source projects, including operating systems. Sites like http://freshmeat.net/ and http://distrowatch.com/ provide portals to many of these projects. As we stated earlier, open-source projects enable students to use source code as a learning tool. They can modify programs and test them, help find and fix bugs, and otherwise explore mature, full-featured operating systems, compilers, tools, user interfaces, and other types of programs. The availability of source code for historic projects, such as Multics, can help students to understand those projects and to build knowledge that will help in the implementation of new projects.

      This passage highlights how the free-software movement motivates programmers to create numerous open-source projects, including operating systems. Portals like FreshMeat and DistroWatch provide access to these projects. Open-source code serves as a learning tool, allowing students to modify, test, and debug programs, explore full-featured systems, and study historic projects like Multics to gain knowledge useful for developing new software.

    4. Solaris is the commercial UNIX-based operating system of Sun Microsystems. Originally, Sun's SunOS operating system was based on BSD UNIX. Sun moved to AT&T's System V UNIX as its base in 1991. In 2005, Sun open-sourced most of the Solaris code as the OpenSolaris project. The purchase of Sun by Oracle in 2009, however, left the state of this project unclear.

      This passage outlines the history of Solaris, Sun Microsystems’ commercial UNIX-based OS. SunOS was initially based on BSD UNIX, but in 1991 it switched to System V UNIX. In 2005, most Solaris code was open-sourced as OpenSolaris, though Oracle’s acquisition of Sun in 2009 left the project’s future uncertain.

    5. As with many open-source projects, this source code is contained in and controlled by a version control system—in this case, “subversion” (https://subversion.apache.org/source-code). Version control systems allow a user to “pull” an entire source code tree to his computer and “push” any changes back into the repository for others to then pull. These systems also provide other features, including an entire history of each file and a conflict resolution feature in case the same file is changed concurrently. Another version control system is git, which is used for GNU/Linux, as well as other programs (http://www.git-scm.com).

      This text describes how open-source projects typically use version control systems to manage their source code. Subversion (used by BSD) and Git (used by GNU/Linux) let users pull the code, make modifications, and push the updates back to the repository. These systems keep a history of each file and help resolve conflicts when the same file is changed concurrently, which supports collaborative development and effective code management.

    6. Just as with Linux, there are many distributions of BSD UNIX, including FreeBSD, NetBSD, OpenBSD, and DragonflyBSD. To explore the source code of FreeBSD, simply download the virtual machine image of the version of interest and boot it within Virtualbox, as described above for Linux. The source code comes with the distribution and is stored in /usr/src/. The kernel source code is in /usr/src/sys. For example, to examine the virtual memory implementation code in the FreeBSD kernel, see the files in /usr/src/sys/vm. Alternatively, you can simply view the source code online at https://svnweb.freebsd.org.

      This passage explains that BSD UNIX, like Linux, has multiple distributions, such as FreeBSD, NetBSD, OpenBSD, and DragonflyBSD. FreeBSD’s source code is included with its distribution and can be explored locally (e.g., in /usr/src/ and /usr/src/sys) or online via the FreeBSD repository. Virtual machine images allow users to boot and examine the OS safely, making it accessible for learning and experimentation.

    7. The FSF uses the copyrights on its programs to implement “copyleft,” a form of licensing invented by Stallman. Copylefting a work gives anyone that possesses a copy of the work the four essential freedoms that make the work free, with the condition that redistribution must preserve these freedoms. The GNU General Public License (GPL) is a common license under which free software is released. Fundamentally, the GPL requires that the source code be distributed with any binaries and that all copies (including modified versions) be released under the same GPL license. The Creative Commons “Attribution Sharealike” license is also a copyleft license; “sharealike” is another way of stating the idea of copyleft.

      This passage explains “copyleft,” a licensing approach invented by Richard Stallman and used by the Free Software Foundation (FSF). Copyleft keeps software free by granting users the four essential freedoms while requiring that any redistribution preserve those freedoms. The GNU General Public License (GPL) is a widely used copyleft license, mandating that source code accompany binaries and that modified versions remain under the same license. Creative Commons’ “Attribution Sharealike” license follows a similar principle.

    8. To counter the move to limit software use and redistribution, Richard Stallman in 1984 started developing a free, UNIX-compatible operating system called GNU (which is a recursive acronym for “GNU's Not Unix!”). To Stallman, “free” refers to freedom of use, not price. The free-software movement does not object to trading a copy for an amount of money but holds that users are entitled to four certain freedoms: (1) to freely run the program, (2) to study and change the source code, and to give or sell copies either (3) with or (4) without changes. In 1985, Stallman published the GNU Manifesto, which argues that all software should be free. He also formed the Free Software Foundation (FSF) with the goal of encouraging the use and development of free software.

      This passage explains how Richard Stallman began developing the GNU operating system in 1984 to promote software freedom. “Free” refers to liberty, not price, granting users the rights to run, study, modify, and distribute software with or without changes. Stallman’s GNU Manifesto and the Free Software Foundation (FSF) advocate for these freedoms and encourage the development and use of free software.

    9. Computer and software companies eventually sought to limit the use of their software to authorized computers and paying customers. Releasing only the binary files compiled from the source code, rather than the source code itself, helped them to achieve this goal, as well as protecting their code and their ideas from their competitors. Although the Homebrew user groups of the 1970s exchanged code during their meetings, the operating systems for hobbyist machines (such as CPM) were proprietary. By 1980, proprietary software was the usual case.

      This passage explains how computer and software companies began restricting software use to authorized users and paying customers. By distributing only compiled binaries instead of source code, companies protected their intellectual property and ideas. While early hobbyist groups shared code freely, operating systems like CPM were proprietary, and by 1980, proprietary software had become the norm.

    10. In the early days of modern computing (that is, the 1950s), software generally came with source code. The original hackers (computer enthusiasts) at MIT's Tech Model Railroad Club left their programs in drawers for others to work on. “Homebrew” user groups exchanged code during their meetings. Company-specific user groups, such as Digital Equipment Corporation's DECUS, accepted contributions of source-code programs, collected them onto tapes, and distributed the tapes to interested members. In 1970, Digital's operating systems were distributed as source code with no restrictions or copyright notice.

      This passage describes the early history of software distribution from the 1950s to the 1970s. Software often came with source code, and communities of enthusiasts, such as the MIT hackers, Homebrew groups, and company user groups like DECUS, shared, modified, and distributed programs freely. Digital Equipment Corporation even distributed operating systems as unrestricted source code, highlighting the collaborative culture of early computing.

    11. There are many benefits to open-source operating systems, including a community of interested (and usually unpaid) programmers who contribute to the code by helping to write it, debug it, analyze it, provide support, and suggest changes. Arguably, open-source code is more secure than closed-source code because many more eyes are viewing the code. Certainly, open-source code has bugs, but open-source advocates argue that bugs tend to be found and fixed faster owing to the number of people using and viewing the code.

      This passage highlights the benefits of open-source operating systems. A community of programmers contributes by writing, debugging, analyzing, and improving the code. Open-source code is arguably more secure than closed-source software because more people examine it, and advocates argue that this helps bugs get found and fixed more quickly.

    12. Starting with the source code allows the programmer to produce binary code that can be executed on a system. Doing the opposite—reverse engineering the source code from the binaries—is quite a lot of work, and useful items such as comments are never recovered. Learning operating systems by examining the source code has other benefits as well. With the source code in hand, a student can modify the operating system and then compile and run the code to try out those changes, which is an excellent learning tool.

      This passage explains the advantages of studying operating systems using source code. Starting from the source allows programmers to compile executable binaries directly, whereas reverse-engineering binaries is difficult and loses valuable information like comments. Access to source code also lets students modify, compile, and test the OS, providing a hands-on learning experience.

    13. The study of operating systems has been made easier by the availability of a vast number of free software and open-source releases. Both free operating systems and open-source operating systems are available in source-code format rather than as compiled binary code. Note, though, that free software and open-source software are two different ideas championed by different groups of people (see http://gnu.org/philosophy/open-source-misses-the-point.html for a discussion on the topic).

      This passage highlights how studying operating systems is easier thanks to free and open-source software, which is available in source-code form. While both provide access to the code, free software and open-source software are distinct concepts promoted by different communities.

    1. Could you try changing tf.keras to keras and running the code again? I changed a few steps, such as using tf_keras/keras.Sequential instead of tf.keras.Sequential, and the code executed without errors. Kindly find the gist of it here. Thank you!
    1. The Lieber Code of 1863, the Union Army and President Abraham Lincoln’s laudable, if admittedly naïve, attempt to limit the ravages of the American Civil War, precipitated a paradigm shift away from the mere moral condemnation of the destruction and appropriation of cultural property toward express legal proscription. Article 35 of the code is unambiguous: “Classical works of art, libraries, scientific collections, or precious instruments . . . must be secured against all avoidable injury, even when they are contained in fortified places whilst besieged or bombarded.” The prescriptive, deterrent objective of the code is reflected in Article 44, which makes clear that the intent was not only to prohibit such conduct, but to actively ascribe a penal basis for individual responsibility.

      Lieber Code of 1863 (U.S. Civil War: first documented sanctions against intentional destruction of heritage)

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context. 

      Strengths: 

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      Response: We appreciate the reviewer’s positive evaluation.

      Weaknesses: 

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted. 

      We have incorporated the recommended discussion of technical limitations and addressed the physiological plausibility of our manipulations on Page 33 of the revised Discussion section. Specifically, we wrote: 

      “Judicious interpretation of the present data must consider the technical limitations of the various methods and circuit-level manipulations applied. Patchy neurons are distributed unevenly across the extensive structure of the striatum, and their targeted manipulation is constrained by viral spread in the dorsal striatum. Somatic calcium imaging using single-photon microscopy captures activity from only a subset of patchy neurons within a narrow focal plane beneath each implanted GRIN lens. Similarly, limitations in light diffusion from optical fibers may reduce the effective population of targeted fibers in both photometry and optogenetic experiments. For example, the more modest locomotor slowing observed with optogenetic activation of striatonigral fibers in the SNr compared to the stronger effects seen with Gq-DREADD activation across the dorsal striatum could reflect limited fiber optic coverage in the SNr. Alternatively, it may suggest that non-striatonigral mechanisms also contribute to generalized slowing. Our photometry data do not support a role for striatopallidal projections from patchy neurons in movement suppression. The potential contribution of intrastriatal mechanisms, discussed earlier, remains to be empirically tested. Although the behavioral assays used were naturalistic, many of the circuit-level interventions were not. Broad ablation or widespread activation of patchy neurons and their efferent projections represent non-physiological manipulations. Nonetheless, these perturbation results are interpreted alongside more naturalistic observations, such as in vivo imaging of patchy neuron somata and axon terminals, to form a coherent understanding of their functional role”.

      Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      We are grateful for the reviewer’s thorough summary of our main findings.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      We appreciate the reviewer’s overall positive assessment and have made substantial improvements to the revised manuscript in response to reviewers’ constructive suggestions.

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical role of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadel et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      We have added the recommended data interpretation and future perspectives on Page 30 of the revised Discussion section. Specifically, we wrote:

      “Potential mechanisms by which striatal patchy neurons reduce locomotion involve the suppression of dopamine availability within the striatum. Dopamine, primarily supplied by neurons in the SNc and VTA, broadly facilitates locomotion (Gerfen and Surmeier 2011, Dudman and Krakauer 2016). Recent studies have shown that direct activation of patchy neurons leads to a reduction in striatal dopamine levels, accompanied by decreased walking speed (Nadel, Pawelko et al. 2021, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Patchy neuron projections terminate in structures known as “dendron bouquets”, which enwrap SNc dendrites within the SNr and can pause tonic dopamine neuron firing (Crittenden, Tillberg et al. 2016, Evans, Twedell et al. 2020). The present work highlights a role for patchy striatonigral inputs within the SN in decelerating movement, potentially through GABAergic dendron bouquets that limit dopamine release back to the striatum (Dong, Wang et al. 2025). Additionally, intrastriatal collaterals of patch spiny projection neurons (SPNs) have been shown to suppress dopamine release and associated synaptic plasticity via dynorphin-mediated activation of kappa opioid receptors on dopamine terminals (Hawes, Salinas et al. 2017). This intrastriatal mechanism may further contribute to the reduction in striatal dopamine levels and the observed decrease in locomotor speed, representing a compelling avenue for future investigation.”

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCherry signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      We have included representative images of mCherry and MOR1 staining in Supplementary Fig. S1 of the revised manuscript.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      As suggested by the reviewer, we have added the following statement on Page 17 of the revised manuscript: “These data also indicate that PA elevates valence-specific speed without inducing general hyperactivity”.

      (4) In Line 536, where Figure 5A was cited, the authors mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherry), but I could not find the associated data in Figure 5. Please cite Figure S3 accordingly.

      We have added the citation for the now Fig. S4 on Page 25 of the revised manuscript.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      As suggested by the reviewer, we have now labeled each figure panel with a distinct single alphabetical letter.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

      We have made the correction on the figure.

      Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      Our photometry data implicate striatonigral neurons in locomotor slowing, as evidenced by a negative cross-correlation with acceleration and a negative lag, indicating that their activity reliably precedes—and may therefore contribute to—deceleration. In contrast, photometry results from striatopallidal neurons showed no clear correlation with speed or acceleration.

      Figure 6 demonstrates that optogenetic manipulation within the SNr of Sepw1-Cre<sup>+</sup> striatonigral axons recapitulated context-dependent locomotor changes seen with Gq-DREADD activation of both striatonigral and striatopallidal Sepw1-Cre<sup>+</sup> cells in the dorsal striatum but failed to produce the broader locomotor speed change observed when targeting all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum using either ablation or Gq-DREADD activation. The more subtle speed-restrictive phenotype resulting from ChR activation in the SNr could, as the reviewer suggests, implicate striatopallidal neurons in broad locomotor speed regulation. However, our photometry data indicate that this scenario is unlikely, as activity of striatopallidal Sepw1-Cre<sup>+</sup> fibers is not correlated with locomotor speed. Another plausible explanation is that the optogenetic approach may have affected fewer striatonigral fibers, potentially due to the limited spatial spread of light from the optical fiber within the SNr. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with optogenetics. We have added discussion of these technical limitations to the revised manuscript. Additionally, we now discuss the possibility that intrastriatal collaterals may contribute to reduced local dopamine levels by releasing dynorphin, which acts on kappa opioid receptors located on dopamine fibers (Hawes, Salinas et al. 2017), thereby suppressing dopamine release.

      The reviewer also suggests an interesting experiment involving optogenetic stimulation of striatonigral versus striatopallidal somata in Sepw1-Cre mice. While we agree that this approach would yield valuable insights, we have thus far been unable to achieve reliable results using retroviral vectors. Moreover, selectively targeting striatopallidal terminals optogenetically remains technically challenging, as striatonigral fibers also traverse the pallidum, and the broad anatomical distribution of the pallidum complicates precise targeting. This proposed work will need to be pursued in a future study, either with improved retrograde viral tools or the development of additional mouse lines that offer more selective access to these neuronal populations as we documented recently (Dong, Wang et al. 2025).

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      We believe the reviewer’s interpretation pertains to a statement in the Introduction rather than the Abstract: “Our findings reveal that patchy SPNs control the speed at which mice navigate the valence differential between high- and low-anxiety zones, without affecting valence perception itself.” Throughout our study, mice consistently preferred the dark zone in the Light/Dark box, indicating intact perception of the valence differential between illuminated areas. While our manipulations altered locomotor speed, they did not affect time spent in the dark zone, supporting the conclusion that valence perception remained unaltered. We appreciate the reviewer’s insight and agree it is an intriguing possibility that locomotor responses could, over time, influence internal states such as anxiety. We addressed this in the Discussion, noting that while dark preference was robust to our manipulations, future studies are warranted to explore the relationship between anxious locomotor vigor and anxiety itself. We report changes in scalar measures of animal speed across Light/Dark box conditions and under various experimental manipulations. Separately, we show that activity in both patchy neuron somata and striatonigral fibers is negatively correlated with acceleration—indicating a positive correlation with deceleration. Notably, the direction of the cross-correlational lag between striatonigral fiber activity and acceleration suggests that this activity precedes and may causally contribute to mouse deceleration, thereby influencing reductions in speed. To clarify this, we revised a sentence in the Results section:

      “Moreover, patchy neuron efferent activity at the SNr may causally contribute to deceleration, as indicated by the negative cross-correlational lag, thereby reducing animal speed.” We also updated the Discussion to read: “Together, these data specifically implicate patchy striatonigral neurons in slowing locomotion by acting within the SNr to drive deceleration.”

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      We were unable to perform this experiment; however, hM3Dq has previously been shown to be effective in striatal neurons (Alcacer, Andreoli et al. 2017). The lack of effect observed in Gi-DREADD mice serves as an unintended but valuable control, helping to rule out off-target effects of the DREADD agonist JHU37160 and thereby reinforcing the specificity of hM3Dq-mediated activation in our study. We have now included an important caveat regarding the Gi-DREADD results, acknowledging the possibility that they may not have worked effectively in our target cells:

      “Potential explanations for the negative results in Gi-DREADD mice include inherently low basal activity among patchy neurons or insufficient expression of GIRK channels in striatal neurons, which may limit the effectiveness of Gi coupling in suppressing neuronal activity (Shan, Fang et al. 2022).”

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

      We agree that this is an important potential implication of our work, especially given that we and others have shown that patchy striatonigral neurons provide strong inhibitory input to dopaminergic neurons involved in locomotor control (Nadel, Pawelko et al. 2021, Lazaridis, Crittenden et al. 2024, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Accordingly, we have expanded the discussion section to include potential mechanistic explanations that support and contextualize our main findings.

      Reviewer #1 (Recommendations for the authors):

      Here are some minor issues for the authors' reference:

      (1) This work supports the motor-suppressing effect of patchy SPNs, and >80% of them are direct pathway SPNs. This conclusion is not expected from the traditional basal ganglia direct/indirect pathway model. Most experiments were performed using nonphysiological approaches to suppress (i.e., ablation) or activate (i.e., continuous chemo-optogenetic stimulation). It remains uncertain if the reported observations are relevant to the normal biological function of patchy SPNs under physiological conditions. Particularly, under what circumstances an imbalanced patch/matrix activity may be induced, as proposed in the sections related to the data presented in Figure 6. A thorough discussion and clarification remain needed. Or it should be discussed as a limitation of the present work.

      We have added discussion and clarification of physiological limitations in response to reviewer feedback. Additionally, we revised the opening sentence of an original paragraph in the discussion section to emphasize that it interprets our findings in the context of more physiological studies reporting natural shifts in patchy SPN activity due to cognitive conflict, stress, or training. The revised opening sentence now reads: “Together with previous studies of naturally occurring shifts in patchy neuron activation, these data illustrate ethologically relevant roles for a subgroup of genetically defined patchy neurons in behavior.”

      (2) Lines 499-500: How striato-nigral cells encode speed and deceleration deserves a thorough discussion and clarification. These striatonigral cells can target both SNr GABAergic neurons and dendrites of the dopaminergic neurons. A discussion of microcircuits formed by the patchy SPNs axons in the SNr GABAergic and SNC DAergic neurons should be presented.

      We have added this point at lines 499–500, including a reference to a relevant review of microcircuitry. Additionally, we expanded the discussion section to address microcircuit mechanisms that may underlie our main findings.

      (3) Line 70: "BNST" should be spelled out at the first time it is mentioned.

      This has been done.

      (4) Line 133: only GCaMP6 was listed in the method, but GCaMP8 was also used (Figure 4). Clarification or details are needed.

      Thank you for your careful attention to detail. We have corrected the typographical errors in the Methods section. Specifically, in the Stereotaxic Injections section, we corrected “GCaMP83” to “GCaMP8s.” In the Fiber Implant section, we removed the incorrect reference to “GCaMP6s” and clarified that GCaMP8s was used for photometry, and hChR2 was used for optogenetics.

      (5) Line 183: Can the authors describe more precisely what "a moment" means in terms of seconds or minutes?

      This has been done.

      (6) Line 288: typo: missing / in ΔF

      Thank you; this has been fixed.

      (7) Line 301-302: the statement of "mCherry and MOR1 colocalization" does not match the images in Figure 1B.

      This has been corrected by providing a new Supplementary Figure S1.

      (8) Related to the statement between Lines 303-304: Figure 1c data may reflect changes in MOR1 protein or cell loss. Quantification of NeuN+ neurons within the MOR1 area would strengthen the conclusion of 60% of patchy cell loss in Figure 1C

      Since the efficacy of AAV-FLEX-taCasp3 in cell ablation has been well established in our previous publications and those of others (Yang, Chiang et al. 2013, Wu, Kung et al. 2019), we do not believe the observed loss of MOR1 staining in Fig. 1C merely reflects reduced MOR1 expression. Moreover, a general neuronal marker such as NeuN may not reliably detect the specific loss of patchy neurons in our ablation model, given the technical limitations of conventional cell-counting methods like MBF’s StereoInvestigator, which typically exhibit a variability margin of 15–20%.

      (9) Lines 313-314: "Similarly, PA mice demonstrated greater stay-time in the dark zone (Figure 1E)." Revision is needed to better reflect what is shown in Figure 1E and avoid misunderstandings.

      Thank you; this has been addressed.

      (10) The color code in Figure 2Gi seems inconsistent with the others? Clarifications are needed

      Color coding in Figure 2Gi differs from that in 2Eii out of necessity. For example, the "Light" cells depicted in light blue in 2Eii are represented by both light gray and light red dots in 2Gi. Importantly, Figure 2G does not encode specific speed relationships; instead, any association with speed is indicated by a red hue.

      (11) Lines 538-539: the statement of "Over half of the patch was covered" was not supported by Figure 5C. Clarification is needed.

      Thank you. For clarity, we updated the x-axis labels in Figures 1C and 5C from “% area covered” to “% DS area covered,” and defined “DS” as “dorsal striatal” in the corresponding figure legends. Additionally, we revised the sentence in question to read: “As with ablation, histological examination indicated that a substantial fraction of dorsal patch territories, identified through MOR1 staining, were impacted (Fig. 5C).”

      (12) Figure 3: statistical significance in Figure 3 should be labeled in various panels.

      We believe the reviewer's concern pertains to the scatter plot in panel F—specifically, whether the data points are significantly different from zero. In panel 3F, the 95% confidence interval clearly overlaps with zero, indicating that the results are not statistically significant.

      (13) Figures 6D-E: the absence of a difference in speed between control mice and ChR2 mice under continuous optical stimulation was not expected. It differs from the Gq-DREADD study in Figure 5E-F. Clarifications are needed.

      For mice undergoing constant ChR2 activation of Sepw1-Cre<sup>+</sup> SNr efferents, overall locomotor speed does not differ from controls. However, the BIL (bright-to-illuminated) effect on zone transitions is disrupted: activating Sepw1-Cre<sup>+</sup> fibers in the SNr blunts the typical increase in speed observed when mice flee from the light zone toward the dark zone. This impaired BIL-related speed increase upon exiting the light was similarly observed in the Gq-DREADD cohort. The reviewer is correct that this optogenetic manipulation within the SNr did not produce the more generalized speed reductions seen with broader Gq-DREADD activation of all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum. A likely explanation is the difference in targeting: ChR2 specifically activates SNr-bound terminals, whereas Gq-DREADD broadly activates entire Sepw1-Cre<sup>+</sup> cells. Notably, many of the generalized speed profile changes observed with chemogenetic activation are opposite to those resulting from broad ablation of Sepw1-Cre<sup>+</sup> cells. The more subtle speed-restrictive phenotype observed with ChR2 activation targeted to the SNr may suggest that fewer striatonigral fibers were affected by this technique, possibly due to the limited spread of light from the fiber optic. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with an optogenetic approach. Alternatively, it could indicate that non-striatonigral Sepw1-Cre<sup>+</sup> projections, such as striatopallidal or intrastriatal pathways, play a role in more generalized slowing. If striatopallidal fibers contributed to locomotor slowing, we would expect to see non-zero cross-correlations between neural activity and speed or acceleration, along with negative lag indicating that neural activity precedes the behavioral change. However, our fiber photometry data do not support such a role for Sepw1-Cre<sup>+</sup> striatopallidal fibers. We have also referenced the possibility that intrastriatal collaterals could suppress striatal dopamine levels, potentially explaining the stronger slowing phenotype observed when the entire striatal population is affected, as opposed to selectively targeting striatonigral terminals. These technical considerations and interpretive nuances have been incorporated and clarified in the revised discussion section.

      (14) Lines 632: "compliment": a typo?

      Yes, it should be “complement”.

      (15) Figure 4 legend: descriptions of panels A and B were swapped

      Thank you. This has been corrected.

      (16) Friedman (2020) was listed twice in the bibliography (Lines 920-929).

      Thank you. This has been corrected.

      Reviewer #3 (Recommendations for the authors):

      It will be helpful to label and add figure legends below each figure.

      Thank you for the suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. We noted some instances where only p values are reported.

      Readers would also benefit from coding individual data points by sex and noting N/sex

      We have included detailed statistical information in the revised manuscript. Both male and female mice were used in all experiments in approximately equal numbers. Since no sex-related differences were observed, we did not report the number of animals by sex.

      References

      Alcacer, C., L. Andreoli, I. Sebastianutto, J. Jakobsson, T. Fieblinger and M. A. Cenci (2017). "Chemogenetic stimulation of striatal projection neurons modulates responses to Parkinson's disease therapy." J Clin Invest 127(2): 720-734.

      Crittenden, J. R., P. W. Tillberg, M. H. Riad, Y. Shima, C. R. Gerfen, J. Curry, D. E. Housman, S. B. Nelson, E. S. Boyden and A. M. Graybiel (2016). "Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons." Proc Natl Acad Sci U S A 113(40): 11318-11323.

      Dong, J., L. Wang, B. T. Sullivan, L. Sun, V. M. Martinez Smith, L. Chang, J. Ding, W. Le, C. R. Gerfen and H. Cai (2025). "Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion." Nat Commun 16(1): 2710.

      Dudman, J. T. and J. W. Krakauer (2016). "The basal ganglia: from motor commands to the control of vigor." Curr Opin Neurobiol 37: 158-166.

      Evans, R. C., E. L. Twedell, M. Zhu, J. Ascencio, R. Zhang and Z. M. Khaliq (2020). "Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons." Cell Rep 32(11): 108156.

      Gerfen, C. R. and D. J. Surmeier (2011). "Modulation of striatal projection systems by dopamine." Annual review of neuroscience 34: 441-466.

      Hawes, S. L., A. G. Salinas, D. M. Lovinger and K. T. Blackwell (2017). "Long-term plasticity of corticostriatal synapses is modulated by pathway-specific co-release of opioids through kappa-opioid receptors." J Physiol 595(16): 5637-5652.

      Lazaridis, I., J. R. Crittenden, G. Ahn, K. Hirokane, T. Yoshida, A. Mahar, V. Skara, K. Meletis, K. Parvataneni, J. T. Ting, E. Hueske, A. Matsushima and A. M. Graybiel (2024). "Striosomes Target Nigral Dopamine-Containing Neurons via Direct-D1 and Indirect-D2 Pathways Paralleling Classic Direct-Indirect Basal Ganglia Systems." bioRxiv.

      Nadel, J. A., S. S. Pawelko, J. R. Scott, R. McLaughlin, M. Fox, M. Ghanem, R. van der Merwe, N. G. Hollon, E. S. Ramsson and C. D. Howard (2021). "Optogenetic stimulation of striatal patches modifies habit formation and inhibits dopamine release." Sci Rep 11(1): 19847.

      Okunomiya, T., D. Watanabe, H. Banno, T. Kondo, K. Imamura, R. Takahashi and H. Inoue (2025). "Striosome Circuitry Stimulation Inhibits Striatal Dopamine Release and Locomotion." J Neurosci 45(4).

      Shan, Q., Q. Fang and Y. Tian (2022). "Evidence that GIRK Channels Mediate the DREADD-hM4Di Receptor Activation-Induced Reduction in Membrane Excitability of Striatal Medium Spiny Neurons." ACS Chem Neurosci 13(14): 2084-2091.

      Wu, J., J. Kung, J. Dong, L. Chang, C. Xie, A. Habib, S. Hawes, N. Yang, V. Chen, Z. Liu, R. Evans, B. Liang, L. Sun, J. Ding, J. Yu, S. Saez-Atienzar, B. Tang, Z. Khaliq, D. T. Lin, W. Le and H. Cai (2019). "Distinct Connectivity and Functionality of Aldehyde Dehydrogenase 1a1-Positive Nigrostriatal Dopaminergic Neurons in Motor Learning." Cell Rep 28(5): 1167-1181 e1167.

    1. The ISIN code of the selected Instrument (the investment unit of the Series) and the Currency of the series are filled in automatically from the Instrument's master data.

      "The ISIN code of the selected Instrument (the Series investment unit) and the currency of the series are automatically filled from the Instrument's base data."

    2. For MNB 30 reports, the identifier code of the portfolio

      Correct English translation: The portfolio identifier code for MNB 30 reports

    3. For investment units, the ISIN code of the first series (even if that series is closed). For pension fund and insurer portfolios, the code provided by the fund or the insurer.

      Correct English translation: "In the case of investment fund units, the ISIN code of the first series (even if it is closed). In the case of pension fund or insurance portfolios, the code provided by the fund or the insurer."

    1. form

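      "What kind of magic field is form? Why can't I just pass todo directly?"

      Your question gets right to the point! The `form` field does look like "magic"; the reasons behind it involve how the HTTP protocol works and a design decision in the RPC framework.

      Let me explain it as plainly as I can: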


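      1. First, this comes from how HTTP works

      When you submit a form in the browser, the browser actually sends the data like this:

      ```http
      POST /api/todo HTTP/1.1
      Content-Type: application/x-www-form-urlencoded

      title=Write+code&id=123456789
      ```

      Note that the data is encoded as `key=value` pairs joined by `&`.

      Or, with JSON:

      ```http
      POST /api/todo HTTP/1.1
      Content-Type: application/json

      {"title": "Write code", "id": "123456789"}
      ```

      Note that the whole JSON document is the request body.

      Key point: the request body is one complete payload whose encoding is declared by `Content-Type`; the body itself has no outer "field name".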
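To make the two encodings concrete, here is a minimal sketch that serializes the same object both ways using only standard Web APIs (`URLSearchParams` and `JSON.stringify`). The `todo` object and the helper names `encodeAsForm` and `encodeAsJson` are illustrative assumptions, not part of any framework.

```typescript
// Hypothetical payload, mirroring the HTTP examples above.
const todo: Record<string, string> = { title: "Write code", id: "123456789" };

// application/x-www-form-urlencoded: key=value pairs joined by "&".
function encodeAsForm(data: Record<string, string>): string {
  return new URLSearchParams(data).toString();
}

// application/json: the whole object serialized as one JSON document.
function encodeAsJson(data: Record<string, string>): string {
  return JSON.stringify(data);
}

const formBody = encodeAsForm(todo);
const jsonBody = encodeAsJson(todo);
```

Both strings carry the same information; only the `Content-Type` header tells the server which parser to apply.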


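      2. Then there is the design choice of the Hono client

      The Hono client follows this design philosophy:

      "Make a client call look like calling a function, not like constructing an HTTP request."

      So it needs a way to say where in the request this object should go:

      ```typescript
      // What Hono wants you to write (pass only the keys the route expects):
      client.todo.$post({
        form: todoData,       // "form": send this object as form data in the body
        json: todoData,       // or "json": send this object as the whole JSON body
        query: { page: '1' }  // or "query": send it as query parameters (string values)
      });

      // Rather than this (the more primitive way):
      fetch('/api/todo', {
        method: 'POST',
        body: JSON.stringify(todoData) // pass the whole body directly; there is no notion of "position"
      });
      ```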
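The idea that each key selects a place in the request can be sketched with a small helper. This is a simplified illustration under stated assumptions, not Hono's implementation: `RequestArgs`, `BuiltRequest`, and `buildRequest` are hypothetical names, and the `form` branch uses URL encoding for brevity, whereas a real client may send `FormData` instead.

```typescript
// Hypothetical argument shape; a real RPC client accepts more keys than these.
type RequestArgs = {
  form?: Record<string, string>;
  json?: unknown;
  query?: Record<string, string>;
};

type BuiltRequest = {
  url: string;
  headers: Record<string, string>;
  body?: string;
};

// Illustrative helper: each key decides where the data goes and which
// Content-Type describes it.
function buildRequest(path: string, args: RequestArgs): BuiltRequest {
  const qs = args.query ? "?" + new URLSearchParams(args.query).toString() : "";
  const built: BuiltRequest = { url: path + qs, headers: {} };

  if (args.json !== undefined) {
    built.headers["Content-Type"] = "application/json";
    built.body = JSON.stringify(args.json);
  } else if (args.form !== undefined) {
    // Simplification: URL-encoded body instead of multipart FormData.
    built.headers["Content-Type"] = "application/x-www-form-urlencoded";
    built.body = new URLSearchParams(args.form).toString();
  }
  return built;
}

const asJson = buildRequest("/api/todo", { json: { title: "Write code" } });
const asForm = buildRequest("/api/todo", {
  form: { title: "Write code" },
  query: { page: "1" },
});
```

The same data ends up in a different place, with a different header, depending only on which key the caller used.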


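      3. Why design it this way? For better type safety

      This design lets the framework check at compile time whether you passed the right arguments. In Hono, the client infers the expected input from the validators declared on the route:

      ```typescript
      // If the backend expects a JSON body
      app.post('/todo', async (c) => {
        const data = await c.req.json(); // expects to receive the todo object directly
        return c.json(data);
      });

      // then the frontend must call it like this:
      client.todo.$post({
        json: todo // must use the json key; using form is a type error
      });

      // This catches the mistake at compile time instead of at runtime!
      ```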
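The compile-time check can be reproduced in plain TypeScript without any framework. In this sketch, `Todo`, `TodoPostInput`, and `postTodo` are hypothetical stand-ins for the types an RPC client would derive from the server route; the point is only that declaring the input as `{ json: Todo }` makes any other key a type error.

```typescript
type Todo = { title: string; id: string };

// Hypothetical route description: this endpoint reads its input from a JSON body.
type TodoPostInput = { json: Todo };

// Illustrative client method: it accepts exactly the shape the route declares.
function postTodo(input: TodoPostInput): string {
  return JSON.stringify(input.json);
}

const newTodo: Todo = { title: "Write code", id: "123456789" };

// Accepted: the key matches what the route declared.
const sent = postTodo({ json: newTodo });

// Rejected by the type checker: 'form' is not a key of TodoPostInput.
// @ts-expect-error
postTodo({ form: newTodo });
```

The `@ts-expect-error` comment documents that the last call is meant to fail type checking; removing the comment would surface the error in the editor or at build time.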


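      4. The annoying part: the choice of name

      The word `form` is easy to misread, because:

      • `form` usually brings to mind an HTML form or application/x-www-form-urlencoded
      • with the `form` key the body really is sent as form data, not JSON; JSON bodies use the `json` key
      • a name such as `body`, `data`, or `payload` might look clearer, although it would no longer say how the body is encoded

      ```typescript
      // Would this naming be clearer? (hypothetical API, not Hono's)
      client.todo.$post({
        body: todo,      // clearly the request body
        query: { ... }   // query parameters
      });

      // Or a more function-like style (also hypothetical):
      client.todo.$post(todo); // pass the data directly and let the framework decide where it goes
      ```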


      5. A side-by-side comparison

      What you see now (the Hono way): `const res = await client.todo.$post({ form: todo });` Here the "magic" key tells the framework where the data goes.

      A more intuitive way (idealized, hypothetical): `const res = await client.todo.create(todo);` Pass the data directly and do not make the caller think about the format.

      The most primitive way (fetch API): `const res = await fetch('/api/todo', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(todo) });` Everything is handled by hand.


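      Summary: why can't you just pass todo?

      1. How HTTP works: the request body is a single unit with no notion of "fields"; its encoding is declared separately
      2. Framework design choice: Hono chose to make the location of the data explicit
      3. Type safety: the contract between frontend and backend can be checked at compile time
      4. Naming: the word `form` is easy to misread

      Your confusion is entirely reasonable! This design does add cognitive load, especially for newcomers. Good API design should make common use cases simple and complex ones possible.

      The fact that you noticed this "magic field" shows that your intuition for API design is getting sharper, which is an important skill for a developer. 🔍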

    1. Author response:

      General Statements

      We are grateful for the reviewers’ constructive comments and criticisms and have thoroughly addressed all major and minor comments in the revised manuscript.

      Summary of new data.

      We have performed the following additional experiments to support our concept:

      (1) The kinetics of ROS production in B6 and B6.Sst1S macrophages after TNF stimulation (Fig. 3I and J, Suppl. Fig. 3G);

      (2) Time course of stress kinase activation (Fig.3K) that clearly demonstrated the persistent stress kinase (phospho-ASK1 and phospho-cJUN) activation exclusively in the B6.Sst1S macrophages;

      (3) New Fig.4 C-E panels include comparisons of the B6 and B6.Sst1S macrophage responses to TNF and effects of IFNAR1 blockade in both backgrounds.

      (4) We performed new experiments demonstrating that the synthesis of lipid peroxidation products (LPO) occurs in TNF-stimulated macrophages earlier than the IFNβ super-induction (Suppl.Fig.4A and B).

      (5) We demonstrated that the IFNAR1 blockade 12, 24 and 32 h after TNF stimulation still reduced the accumulation of LPO product (4-HNE) in TNF-stimulated B6.Sst1S BMDMs (Suppl.Fig.4 E-G).

      (6) We added comparison of cMyc expression between the wild type B6 and B6.Sst1S BMDMs during TNF stimulation for 6-24 h (Fig.5I-J).

      (7) New data comparing 4-HNE levels in Mtb-infected B6 wild type and B6.Sst1S macrophages and quantification of replicating Mtb was added (Fig.6B, Suppl.Fig.7C and D).

      (8) In vivo data described in Fig.7 was thoroughly revised and new data was included. We demonstrated increased 4-HNE loads in multibacillary lesions (Fig.7A, Suppl. Fig.9A) and the 4-HNE accumulation in CD11b+ myeloid cells (Fig.7B and Suppl.Fig.9B). We demonstrated that the Ifnb-expressing cells are activated iNOS+ macrophages (Fig.7D and Suppl.Fig.13A). Using new fluorescent multiplex IHC, we have shown that stress markers phospho-cJun and Chac1 in TB lesions are expressed by Ifnb- and iNOS-expressing macrophages (Fig.7E and Suppl.Fig.13D-F).

      (9) We performed an additional experiment to demonstrate that naïve (non-BCG vaccinated) lymphocytes did not improve Mtb control by Mtb-infected macrophages, in agreement with previously published data (Suppl.Fig.7H).

      Summary of updates

      Following the reviewers’ requests, we updated figures to include isotype control antibodies, effects of inhibitors on non-stimulated cells, positive and negative controls for labile iron pool, and additional images of 4-HNE and live/dead cell staining.

      Isotype controls for IFNAR1 blockade were included in Fig.3M, Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G, 7I.

      Positive and negative controls for labile iron pool measurements were added to Fig.3E, Fig.5D, Suppl.Fig.3B.

      Cell death staining images were added to Suppl.Fig.3H.

      Co-staining of 4-HNE with tubulin was added to Suppl.Fig.3A.

      High magnification images for Figure 7 were added in Suppl.Fig.8 to demonstrate paucibacillary and multibacillary image classification.

      Single-channel color images for individual markers were provided in Fig.7E and Suppl.Fig.13B-F.

      Inhibitor effects on non-stimulated cells were included in Fig.5D-H and Suppl.Fig.6A and B. Titration of CSF1R inhibitors to determine non-toxic concentrations is included in Suppl.Fig.6D.

      In addition, we updated the figure legends in the revised manuscript to include more details about the experiments. We also clarified our conclusions in the Discussion. Responses to every major and minor comment of the reviewers are provided below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The study by Yabaji et al. examines macrophage phenotypes in B6.Sst1S mice, a mouse strain with increased susceptibility to M. tuberculosis infection that develops necrotic lung lesions. Extending previous work, the authors specifically focus on delineating the molecular mechanisms driving aberrant oxidative stress in TNF-activated B6.Sst1S macrophages that has been associated with impaired control of M. tuberculosis. The authors use scRNAseq of bone marrow-derived macrophages to further characterize distinctions between B6.Sst1S and control macrophages and ascribe distinct trajectories upon TNF stimulation. Combined with results using inhibitory antibodies and small molecule inhibitors in in vitro experimentation, the authors propose that TNF-induced protracted c-Myc expression in B6.Sst1S macrophages disables the cellular defense against oxidative stress, which promotes intracellular accumulation of lipid peroxidation products, fueled at least in part by overexpression of type I IFNs by these cells. Using lung tissue sections from M. tuberculosis-infected B6.Sst1S mice, the authors suggest that the presence of a greater number of cells with lipid peroxidation products in lung lesions with high counts of stained M. tuberculosis is indicative of progressive loss of host control due to the TNF-induced dysregulation of macrophage responses to oxidative stress. In patients with active tuberculosis disease, the authors suggest that peripheral blood gene expression indicative of increased Myc activity was associated with treatment failure.

      Major comments

      The authors describe differences in protein expression, phosphorylation or binding when referring to Fig 2A-C, 2G, 3D, 5B, 5C. However, such differences are not easily apparent or very subtle and, in some cases, confounded by differences in resting cells (e.g. pASK1 Fig 3L; c-Myc Fig 5B) as well as analyses across separate gels/blots (e.g. Fig 3K, Fig 5B). Quantitative analyses across different independent experiments with adequate statistical analyses are required to strengthen the associated conclusions.

      We updated our Western blots as follows:

      (1) Densitometry of normalized bands is included above each lane (Fig.2A-C; Fig.3C-D and 3K; Fig.4A-B; Fig.5B,C,I,J). New data in Fig.3K were added to highlight differences between B6 and B6.Sst1S at individual timepoints after TNF stimulation. In Fig.5I we added new data comparing Myc levels in B6 and B6.Sst1S with and without JNK inhibitor and updated the results accordingly. New Fig.3K clearly demonstrates the persistent activation of p-cJun and pAsk1 at 24 and 36 h of TNF stimulation. In Fig.5B we clearly demonstrate that Myc levels were higher in B6.Sst1S after 12 h of TNF stimulation. Basal Myc levels, however, are consistently higher in B6.Sst1S, and at 6 h the induction by TNF is similar (1.6-fold) in both backgrounds. We noted this in the text.

      (2) A representative experiment is shown in individual panels and the corresponding figure legend contains information on the number of biological repeats. Each Western blot was repeated 2-4 times.
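The arithmetic behind densitometry values of this kind can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' analysis pipeline: the band intensities are invented, the use of tubulin as the loading control is assumed, and the helper names `normalized_density` and `fold_induction` are hypothetical. Each target band is divided by the loading-control band of the same lane, and fold induction is then expressed relative to the unstimulated lane.

```python
def normalized_density(target, loading):
    """Target band intensity divided by the loading-control intensity of the same lane."""
    if loading <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target / loading


def fold_induction(norm_values, baseline_index=0):
    """Express each normalized value relative to the baseline (e.g. unstimulated) lane."""
    base = norm_values[baseline_index]
    if base <= 0:
        raise ValueError("baseline value must be positive")
    return [v / base for v in norm_values]


# Hypothetical raw intensities for lanes at 0, 6 and 12 h of stimulation.
myc = [1200.0, 1920.0, 2400.0]       # target band (invented numbers)
tubulin = [1000.0, 1000.0, 1000.0]   # loading control (invented numbers)

norm = [normalized_density(t, l) for t, l in zip(myc, tubulin)]
folds = fold_induction(norm)
print([round(f, 2) for f in folds])
```

Reporting both the normalized value and the fold change keeps a difference in basal level between genotypes distinct from a difference in induction.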

      The representative images of fluorescence microscopy in Fig 3H, 4H, 5H, S3C, S3I, S5A, S6A seem to suggest that under some conditions the fluorescence signal is located just around the nucleus rather than absent or diminished from the cytoplasm. It is unclear whether this reflects selective translocation of targets across the cell, morphological changes of macrophages in culture in response to the various treatments, or variations in focal point at which images were acquired. Control images (e.g. cellular actin, DIC) should be included for clarification. If cell morphology changes depending on treatments, how was this accounted for in the quantitative analyses? In addition, negative controls validating specificity of fluorescence signals would be warranted.

      Our conclusion of higher LPO production is based on several parameters: 4-HNE staining, measurements of MDA in cell lysates and oxidized lipids using BODIPY C11. Taken together, they demonstrate a significant and reproducible increase in LPO accumulation in TNF-stimulated B6.Sst1S macrophages. This excludes an imaging artefact related to the unequal 4-HNE distribution noted by the reviewer. In fact, we also noted that 4-HNE was spread within the cell body of B6.Sst1S macrophages and confirmed it using co-staining with tubulin, as suggested by the reviewer (new Suppl.Fig.3A). Since low molecular weight LPO products, such as MDA and 4-HNE, traverse cell membranes, it is unlikely that they will be strictly localized to a specific membrane-bound compartment. However, we agree that at lower concentrations there might be some restricted localization, explaining a visible perinuclear ring of 4-HNE staining in B6 macrophages. This phenomenon may be explained simply by thicker cytoplasm surrounding the nucleus in activated macrophages spread on an adherent plastic surface, or by proximity to specific organelles involved in generation or clearance of LPO products, and definitely warrants further investigation.

      We also included images of non-stimulated cells in Fig.3H, Suppl.Fig.3A and 3E. We used multiple fields for imaging and quantified fluorescence signals (Suppl. Fig.3D and 3F, Suppl.Fig.4G, Suppl.Fig.6A and B).

      We used negative controls without primary antibodies for the initial staining optimization, but did not include them in every experiment.

      To interpret the evaluation on the hierarchy of molecular mechanisms in B6.Sst1S macrophages, comparative analyses with B6 control cells should be included (e.g. Fig 4C-I, Fig 5, Fig 6B, E-M, S6C, S6E-F). This will provide weight to the conclusions that the dysregulated processes are specifically associated with the susceptibility of B6.Sst1S macrophages.

      Understanding the sst1-mediated effects on macrophage activation is the focus of our previously published studies (Bhattacharya et al., JCI, 2021) and this manuscript. The data comparing B6 and B6.Sst1S macrophages are presented in Fig.1, Fig.2, Fig.3, Fig.4, Fig.5A-C, I and J, Fig.6A-C, 6J and corresponding supplemental figures 1, 2, 3, 4A and B, Suppl.Fig.5, Suppl.Fig.6C, Suppl.Fig.7A-D,7F.

      Once we identified the aberrantly activated pathways in the B6.Sst1S, we used specific inhibitors to correct the aberrant response in B6.Sst1S.

      All experiments using inhibitory antibodies require comparison to the effect of a matched isotype control in the same experiment (e.g. Fig 3J, 4F, G, I; 6L, 6M, S3G, S6F).

      Isotype controls for IFNAR1 blockade were included in Fig.3M, Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G and 7I.

      Experiments using inhibitors require inclusion of an inhibitor-only control to assess inhibitor effects on unstimulated cells (e.g. Fig 4I, 5D-I)

      Inhibitor effects on non-stimulated cells were included in Fig.5D-H and Suppl.Fig.6A and B.

      Fig 3K and Fig 5J appear to contain the same images for p-c-Jun and b-tubulin blots.

      Fig.3K and 5J partially overlapped but had a different focus: Fig.3K has been updated to reflect the time course of stress kinase activation. Fig.5J is updated (currently Fig.5I and J) to display B6 and B6.Sst1S macrophage data, including cMyc and p-cJun levels.

      Data of TNF-treated cells in Fig 3I appear to be replotted in Fig 3J.

      Currently these data are presented in Fig.3L and 3M and have been updated to include a comparison of B6 and B6.Sst1S cells (Fig.3L) and the effects of inhibitors (Fig.3M).

      It is stated that lungs from 2 mice with paucibacillary and 2 mice with multi-bacillary lesions were analysed. There is contradicting information on whether these tissues were collected at the same time post infection (week 14?) or whether the pauci-bacillary lesions were in lungs collected at earlier time points post infection (see Fig S8A). If the former, how do the authors conclude that multi-bacillary lesions are a progression from paucibacillary lesions and indicative of loss of M. tuberculosis control, especially if only one lesion type is observed in an individual host? If the latter, comparison between lesions will likely be dominated by temporal differences in the immune response to infection.

      In either case, it is relevant to consider density, location, and cellular composition of lesions (see also comments on GeoMx spatial profiling). Is the macrophage number/density per tissue area comparable between pauci-bacillary and multi-bacillary lesions?

      We did not collect lungs at the same time point. As described in greater detail in our preprints (Yabaji et al., https://doi.org/10.1101/2025.02.28.640830 and https://doi.org/10.1101/2023.10.17.562695), pulmonary TB lesions in our model of slow TB progression are heterogeneous between animals at the same timepoint, as observed in human TB patients and other chronic TB animal models. Therefore, we perform analyses of individual TB lesions that are classified by a certified veterinary pathologist in a blinded manner based on their morphology (H&E) and acid-fast staining of the bacteria, as depicted in Suppl.Fig.8. Currently it is impossible to monitor progression of individual lesions in mice. However, in mice TB is a progressive disease, and no healing or recovery from the disease has been observed in our studies or reported in the literature. Therefore, we assumed that paucibacillary lesions preceded the multibacillary ones, and not vice versa, thus reflecting the disease progression. In our opinion, this conclusion most likely reflects the natural course of the disease. However, we edited the text: instead of disease progression we refer to paucibacillary and multibacillary lesions.

      Does 4HNE staining align with macrophages and if so, is it elevated compared to control mice and driven by TNF in the susceptible vs more resistant mice?

      We performed additional staining and analyses to demonstrate the 4-HNE accumulation in CD11b+ myeloid cells of macrophage morphology. Non-necrotic lesions contain a negligible proportion of neutrophils (Fig.7B, Suppl.Fig.9B). B6 mice do not develop advanced multibacillary TB lesions containing 4-HNE+ cells. Also, 4-HNE staining was localized to TB lesions and was not found in uninvolved lung areas of the infected mice, as shown in Suppl.Fig.9A (left panel).

      It is well established that TNF plays a central role in the formation and maintenance of TB granulomas in humans and in all animal models. Therefore, TNF neutralization would lead to rapid TB progression, rapid Mtb growth and lesion destruction in both B6 and B6.Sst1S genetic backgrounds.

      Pathway analysis of spatial transcriptomic data (Suppl.Fig.11) identified TNF signaling via NFkB among the dominant pathways upregulated in multibacillary lesions, suggesting that the 4-HNE accumulation paralleled increased TNF signaling. In addition, other cytokines in vivo, including IFN-I, could activate macrophages, stimulate production of reactive oxygen and nitrogen species and lead to the accumulation of LPO products, as shown in this manuscript.

      It would be relevant to state how many independent lesions per host were sampled in both the multiplex IHC as well as the GeoMx data. Can the authors show the selected regions of interest in the tissue overview and in the analyses to appreciate within-host and across-host heterogeneity of lesions. The nature of the spatial transcriptomics platform used is such that the data are derived from tissue areas that contain more than just Iba1+ macrophages. At later stages of infection, the cellular composition of such macrophage-rich areas will be different when compared to lesions earlier in the infection process. Hence, gene expression profiles and differences between tissue regions cannot be attributed to macrophages in this tissue region but are more likely a reflection of a mix of cellular composition and per-cell gene expression.

      We used Iba1 staining to identify macrophages in TB lesions and programmed the GeoMx instrument to collect spatial transcriptomics probes from Iba1+ cells within ROIs. Also, we selected regions of interest (ROI) avoiding necrotic areas (depicted in Suppl.Fig.10). We agree that the Iba1+ macrophage population is heterogeneous: some Iba1+ cells are activated iNOS+ macrophages, others are iNOS-negative (Fig.7C and D, and Suppl.Fig.13A). Multibacillary lesions contain larger areas occupied by activated (iNOS+) macrophages (Fig.7D, Suppl.Fig.13B and 13F). Although the GeoMx spatial transcriptomic platform does not provide single cell resolution, it allowed us to compare populations of Iba1+ cells in paucibacillary and multibacillary TB lesions and to identify a shift in their overall activation pattern.

      It is stated that loss of control of M. tuberculosis in multibacillary lesions was associated with "downregulation of IFNg-inducible genes". If the authors base this on the tissue expression of individual genes, this requires further investigation to support such conclusion (also see comment on GeoMx above). Furthermore, how might this conclusion be compatible with significantly elevated iNOS+ cells (Fig 7D) in multibacillary lesions?

      We demonstrated that Ciita gene expression is specifically induced by IFN-gamma and is suppressed by IFN-I (Fig.6M). The expression of Ciita in paucibacillary lesions suggests the presence of IFN-gamma-activated cells, and its disappearance in the multibacillary lesions is consistent with massive activation of the IFN-I pathway (Fig.7C).

      It is appreciated that the human blood signature analyses contain Myc-signatures but the association with treatment failure is not very strong based on the data in Fig 13B and C (Suppl.Fig.15B and C now). The authors indicate that they have no information on disease severity, but it should perhaps not be assumed that treatment failure is indicative of poor host control of the infection. Perhaps independent analyses in separate cohort/data set can add strength and provide additional insights (e.g. PMID: 35841871; PMID: 32451443, PMID: 17205474, PMID: 22872737). In addition, the human data analyses could be strengthened by extension to additional signatures such as IFN, TNF, oxidative stress. Details of the human study design are not very clear and are lacking patient demographics, site of disease, time of blood collection relative to treatment onset, approving ethics committees.

      The X axis of Suppl.Fig.15A represents pre-defined molecular signature gene sets (MSigDB) in the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/msigdb). The Y axis shows the area under the curve (AUC) score for each gene set. The Myc upregulated gene set myc_up was identified among the top gene sets associated with treatment failure using the unbiased ssGSEA algorithm. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis.

      Pathway analysis of the differentially expressed genes revealed that treatment failures were associated with the following pathways relevant to this study: NF-kB Signaling, Flt3 Signaling in Hematopoietic Progenitor Cells (indicative of common myeloid progenitor cell proliferation), SAPK/JNK Signaling and Senescence (indicative of oxidative stress). The upregulation of these pathways in human patients with poor TB treatment outcomes correlates with our findings in TB susceptible mice. The detailed analysis of differentially regulated pathways in human TB patients is beyond the scope of this study and is presented in another manuscript entitled “Tuberculosis risk signatures and differential gene expression predict individuals who fail treatment” by Arthur VanValkenburg et al., submitted for publication.

      Blood collection for PBMC gene expression profiling of TB patients was prior to TB treatment or within the first week of treatment commencement. Boxplot of bootstrapped ssGSEA enrichment AUC scores from several oncogene signatures ranked from lowest to highest AUC score, with myc_up and myc_dn genes highlighted in red.
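For readers unfamiliar with bootstrapped AUC scores, the sketch below shows one generic way such scores can be computed for a single gene-set enrichment score. It is an illustration under explicit assumptions and not the ssGSEA procedure used in the study: the enrichment scores are invented, and `auc` and `bootstrap_auc` are hypothetical helper names. The AUC here is the probability that a randomly chosen treatment-failure sample scores higher than a randomly chosen cured sample, with ties counted as one half.

```python
import random


def auc(pos, neg):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(pos, neg, n_boot=200, seed=0):
    """Resample each group with replacement and recompute the AUC n_boot times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        scores.append(auc(boot_pos, boot_neg))
    return scores


# Invented per-sample enrichment scores for one gene set (illustration only).
failures = [0.42, 0.55, 0.61, 0.38, 0.70]
cures = [0.30, 0.45, 0.33, 0.50, 0.28, 0.41]

boot = bootstrap_auc(failures, cures)
print(f"point AUC = {auc(failures, cures):.2f}, "
      f"bootstrap range = {min(boot):.2f}-{max(boot):.2f}")
```

The spread of the bootstrap values is what a boxplot of this kind would display for each gene set, so a signature whose box sits clearly above 0.5 separates the two outcome groups more consistently than one whose box straddles it.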

      We agree with the reviewer that not every gene in the myc_up gene set correlates with the treatment outcome. But the association of the gene set is statistically significant, as presented in Suppl.Fig.15B – C.

      We updated the details of the study, including study sites and the ethics committee approval statement and references describing these cohorts.

      Other comments

      It is excellent that the authors provide individual data points. Choosing a colour other than black would increase clarity when black bars are used.

      We followed this useful suggestion and selected consistent color codes for B6 and B6.Sst1S groups to enhance clarity throughout the revised manuscript.

      Error bars are inconsistently depicted as either bi-directional or just unidirectional.

      We used bi-directional error bars in the revised manuscript.

      Fig 1E, G, H - please include a scale to clarify what the heat map is representing.

      We have included the expression key in Fig.1E,G and H and Suppl.Fig.1C and D in the revised version.

      Fig 2K, Fig S10A gene information cannot be deciphered.

      We increased the font in previous Fig.2K and moved to supplement to keep larger fonts (current Suppl.Fig.2G).

      Fig S4A,B please add error bars.

      These data are presented as Suppl.Fig.5 in the revised version. We performed one experiment to test the hypothesis. Because the data indicated no clear increase in transposon small RNAs in the sst1S macrophages, we did not pursue this hypothesis further, and therefore error bars were not included. However, we decided to include these negative data because they reject a very attractive and plausible hypothesis.

      Please use gene names as per convention (e.g. Ifnb1) to distinguish gene expression from protein expression in figures and text.

      We addressed the comment in the revised manuscript.

      Fig S8B. Contrary to the description of results, there seems to be minimal overlap between the signal for YFP and the Ifnb1 probe. Is the Ifnb1 reporter mouse a legacy reporter? If so, it is worth stating this and including such considerations in the data interpretation.

      The YFP reporter expresses YFP protein under the control of the Ifnb1 promoter. The YFP protein accumulates within the cells, whereas Ifnb protein is rapidly secreted and does not accumulate in the producing cells in appreciable amounts. So YFP is not a lineage-tracing reporter, but its accumulation marks Ifnb1 promoter activity in cells, although the YFP protein half-life is longer than that of the Ifnb1 mRNA, which is rapidly degraded (Witt et al., BioRxiv, 2024; doi:10.1101/2024.08.28.61018). Therefore, there is no precise spatiotemporal coincidence of these readouts.

      Please clarify what is meant by "normal interstitium" ? If the tissue is from uninfected mice, please state clearly.

      In this context we refer to the uninvolved lung areas of the infected lungs. In every sample we compare uninvolved lung areas and TB lesions of the same animal. Also, we performed staining of lungs from non-infected mice as additional controls.

      If macrophage cultures underwent media changes every 48h, how was loss of liberated Mtb taken into account especially if differences in cell density/survival were noted? The assessment of M. tuberculosis load by qPCR is not well described. In particular, the method of normalization applied within the experiments (not within the qPCR) here remains unclear, even with reference to the authors' prior publication.

      Our lab has many years of experience working with macrophage monolayers infected with virulent Mtb and uses optimized protocols to avoid cell losses and related artifacts. Recently we published a detailed protocol for this methodology in STAR Protocols (Yabaji et al., 2022; PMID 35310069). In brief, it includes preparation of single cell suspensions of Mtb by filtration to remove clumps, use of a low multiplicity of infection, preparation of healthy confluent monolayers, use of nutrient-rich culture medium and a medium change every 2 days. We also rigorously control for cell loss using whole well imaging, quantification of cell numbers and live/dead staining.

      Please add citation for the limma package.

      The reference has been added (Ritchie et al., NAR 2015; PMID 25605792).

      The description of methodology relating to the "oncogene signatures" is unclear.

      This signature was described in Bild et al., Nature, 2006 and McQuerry JA, et al., 2019, “Pathway activity profiling of growth factor receptor network and stemness pathways differentiates metaplastic breast cancer histological subtypes”, BMC Cancer 19: 881, and is cited in the Methods section “Oncogene signatures”.

      Please clearly state time points post infection for mouse analyses.

      We collected lung samples from Mtb-infected mice 12-20 weeks post infection. The lesions were heterogeneous and were individually classified using the criteria described above.

      Reference is made to "a list of genes unique to type I [interferon] genes [....]" (p29). Can the authors indicate the source of the information used for compiling this list?

      The lists were compiled from Reactome, EMBL's European Bioinformatics Institute and GSEA databases. The links for all datasets are provided in Suppl.Table 8 “Expression of IFN pathway genes in Iba1+ cells from pauci- and multi-bacillary lesions of Mtb infected B6.Sst1S mouse lungs” in the “Pool IFN I & II gene sets” worksheet.

      The discussion at present is very long, contains repetition of results and meanders on occasion.

      Thank you for this suggestion. We critically revised the text for brevity and clarity.

      Reviewer #1 (Significance):  

      Strengths and limitations  

      Strengths: multi-pronged analysis approaches for delineating molecular mechanisms of macrophage responses that might underpin susceptibility to M. tuberculosis infection; integration of mouse tissues and human blood samples  

      Weaknesses: not all conclusions supported by data presented; some concerns related to experimental design and controls; links between findings in human cohort and the mechanistic insights gained in mouse macrophage model uncertain

      The revised manuscript addresses every major and minor comment of the reviewers, including isotype controls and naïve T cells, to provide additional support for our conclusions. Our study revealed causal links between Myc hyperactivity, the deficiency of antioxidant defense and type I interferon pathway hyperactivity. We have shown that Myc hyperactivity in TNF-stimulated macrophages compromises antioxidant defense, leading to autocatalytic lipid peroxidation and interferon-beta superinduction that in turn amplifies lipid peroxidation, thus forming a vicious cycle of destructive chronic inflammation. This offers a plausible mechanistic explanation for the association of Myc hyperactivity with poorer treatment outcomes in TB patients and provides a novel target for host-directed TB therapy.

      Advance

      The study has the potential to advance molecular understanding of the TNF-driven state of oxidative stress previously observed in B6.Sst1S macrophages and possible implications for host control of M. tuberculosis in vivo.

      Audience

      Experts seeking understanding of host factors mediating M. tuberculosis control, or failure thereof, with appreciation for the utility of the featured mouse model in assessing TB disease progression and severe manifestation. Interest is likely extended to an audience more broadly interested in TNF-driven macrophage (dys)function in infectious, inflammatory, and autoimmune pathologies.

      Reviewer expertise

      In preparing this review, I am drawing on my expertise in assessing macrophage responses and host defense mechanisms in bacterial infections (incl. virulent M. tuberculosis) through in vitro and in vivo studies. This includes but is not limited to macrophage infection and stimulation assays, microscopy, intra-macrophage replication of M. tuberculosis, analyses of lung tissues using multi-plex IHC and spatial transcriptomics (e.g. GeoMx). I am familiar with the interpretation of RNAseq analyses in human and mouse cells/tissues, but can provide only limited assessment of appropriateness of algorithms and analysis frameworks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Yabaji et al. investigated the effects of BMDMs stimulated with TNF from both WT and B6.Sst1S mice, which have previously been identified to contain the sst1 locus conferring susceptibility to Mycobacterium tuberculosis. They identified that B6.Sst1S macrophages show a superinduction of IFNß, which might be caused by increased c-Myc expression, expanding on the mechanistic insights made by the same group (Bhattacharya et al. 2021). Furthermore, prolonged TNF stimulation led to oxidative stress, which WT BMDMs could compensate for by the activation of the antioxidant defense via NRF2. On the other hand, B6.Sst1S BMDMs lack the expression of SP110 and SP140, co-activators of NRF2, and were therefore subjected to maintained oxidative stress. Yabaji et al. could link those findings to in vivo studies by correlating the presence of stressed and aberrantly activated macrophages within granulomas to the failure of Mtb control, as well as the progression towards necrosis. As the knowledge regarding Mtb progression and necrosis of granulomas is not yet well understood, findings that might help provide novel therapy options for TB are crucial. Overall, the manuscript has interesting findings with regard to macrophage responses in Mycobacterium tuberculosis infection.

      However, in its current form there are several shortcomings, both with respect to the precision of the experiments and conclusions drawn.

      In particular a) important controls are often missing, e.g. T-cells from non-immune mice in Fig. 6J, in F, effectivity of BCG in B6 mice in 6N; b) single experiments are shown throughout the manuscript, in particular western blots and histology without proper quantification and statistics, this is absolutely not acceptable; c) very few repetitions are shown in in vitro experiments, where there is no evidence for limitation in resources (usually not more than 3), it is not clear what "independent experiment" means - i.e. the robustness of the findings is questionable; d) data are often normalized multiple times, e.g. in the case of qPCR, and the methods of normalization are not clear (what house-keeping gene exactly?);

      Moreover, experiments regarding IFN I signaling (e.g. short term TNF treatment of BMDMs to analyze LPO, making sure that the reporter mouse for IFNß works in vivo) and c-Myc (e.g. the increase after M-CSF addition might impact on other analysis as well and the experiments should be adjusted to control for this effect; MYC expression in the human samples) should be carefully repeated and evaluated to draw correct conclusions.

      In addition, we would like to strongly encourage the authors to more precisely outline the experimental set-ups and figure legends, so that the reader can easily understand and follow them. In other words: The legends are - in part very - incomplete. In addition, the authors should be mindful of gene names vs. protein names and italicize where appropriate.

      We appreciate a very thorough evaluation of our manuscript by this reviewer. Their insightful comments helped us improve the manuscript. As outlined below in the point-by-point responses, we added important controls, including isotype control antibodies in IFNAR blocking experiments and non-vaccinated T cells in T cell-macrophage interaction experiments, and updated figure legends to indicate the number of repeated experiments where a representative experiment is shown, the numbers of mouse lungs and individual lesions, and the methods of data normalization where these were missing. We also explained our in vitro experimental design and how we analyzed and excluded effects of media change and fresh CSF1 addition by using a rest period before TNF stimulation and Mtb infection. The data shown in Suppl. Fig. 6C (previously Suppl. Fig. 5B) demonstrate that Myc levels induced by CSF1 return to the basal level at 12 h after media change. Our detailed in vitro protocol that contains these details has been published (Yabaji et al., STAR Protocols, 2022). We added new data demonstrating ROS and LPO production at 6 h of TNF stimulation, while the Ifnb1 mRNA super-induction occurred at 16-18 h, and edited the text to highlight these dynamics. The upregulation of the Myc pathway in human samples does not necessarily mean the upregulation of Myc itself; it could be due to the dysregulation of downstream pathways. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis. The detailed analysis of these cell populations in human patients is suggested by our findings but is beyond the scope of this study.

      The reviewer’s comments also suggested that a summary of our findings was necessary. The main focus of our study was to untangle connections between oxidative stress and Ifnb1 superinduction. It revealed that Myc hyperactivity caused partial deficiency of antioxidant defense leading to type I interferon pathway hyperactivity that in turn amplifies lipid peroxidation, thus establishing a vicious cycle driving inflammatory tissue damage.

      Our laboratory has worked on mechanisms of TB granuloma necrosis for more than two decades using genetic, molecular and immunological analyses in vitro and in vivo. This work provided a mechanistic basis for independent studies in other laboratories that used our mouse model and further expanded our findings, thus supporting the reproducibility and robustness of our results and our lab’s expertise.

      Specific comments to the experiments and data:

      - Fig. 1E: Evaluation of differences in up- and downregulation between B6 and B6.Sst1S cells should highlight where these cells are within the heatmap, as it is only labelled with the clusters, or it should be depicted differently (in particular for cluster 1 and 2). Furthermore, a more simple labelling of the pathways would increase the readability of the data.

      For our scRNAseq data presentation, we used formats accepted by the computational community. To clarify Fig.1E, we added labels above the B6- and B6.Sst1S-specific clusters.

      - Fig. 2D, E: The staining legend is missing. For the quantification it is not clear what % total means. Is this based on the intensity or area? What do the dots represent in the bar chart? Is one data point pooled from several pictures? If not, the experiments need to be repeated, as three pictures might not be representative for evaluation.

      - Fig. 2E: Statistics comparing B6/B6.Sst1S with TNF (different) is required: Absence of induction is not a proof for a difference!

      We included staining with NRF2-specific antibodies and performed area quantification per field using ImageJ to calculate the NRF2 total signal intensity per field. Each dot in the graph represents the average intensity of 3 fields in a representative experiment. The experiment was repeated 3 times. We included pairwise comparison of TNF-stimulated B6 and B6.Sst1S macrophages and updated the figure legend.

      - Fig. 3E: Positive and negative control need to be depicted in the figure (see legend).

      We have added the positive and negative controls for the determination of labile iron pool to the data in Fig. 3E and related Suppl. Fig. 3B and to Fig. 5D that also demonstrates labile iron determination.

      - Fig. 3I: A quantification by flow cytometry or total cell counts are important, as 6% cell death in cell culture is a very modest observation. Otherwise, confocal images of the quantification would be a good addition to judge the specificity of the viability staining.

      To validate the specificity of the viability staining method, we have provided fluorescent images as Suppl.Fig.3H. The main point of this experiment was to demonstrate a modest, but reproducible, increase in cell death in the sst1-mutant macrophages that suggested IFN-dependent oxidative damage. In our study, we did not focus on mechanisms of cell death, but on a state of chronic oxidative stress in the live sst1-mutant cells during TNF stimulation.

      - Fig. 3I, J: What does one dot represent?

      We performed this assay in a 96-well format, and each dot represents the % cell death in an individual well.

      - Fig. 3K,L: For the B6 BMDMs it seems that p-cJun is highly increased at 12h in (L), while it is not in (K). On the other hand, for the B6.Sst1S BMDMs it peaks at 24h in (K), while in (L) it seems to peak at 12h. According to the data in (L) it seems that p-cJun is rather earlier and stronger activated in B6 BMDMs and has a weakened but prolonged activation in the B6.Sst1S BMDMs, which would not fit with your statement in the text that B6.Sst1S BMDMs show an upregulation.

      These experiments need repetitions and quantification and statistics.

      Fig. 3L: ASK1 seems to be higher at 12h for the B6 BMDMs and similar for both lines at 24h, which is not fitting to the statement in the text. ("Also, the ASK1 - JNK - cJun stress kinase axis was upregulated in B6.Sst1S macrophages, as compared to B6, after 12 - 36 h of TNF stimulation")

      These experiments were repeated, and new data were added to highlight differences in ASK1 and c-Jun phosphorylation between B6 and B6.Sst1S at individual timepoints after TNF stimulation (presented in new Fig.3K). It demonstrated that after TNF stimulation the activation of stress kinases ASK1 and c-Jun initially increased in both genetic backgrounds. However, their upregulation was maintained exclusively in the sst1-susceptible macrophages from 24 to 36 h of TNF stimulation, while in the resistant macrophages their upregulation was transient. Thus, during prolonged TNF stimulation, B6.Sst1S macrophages experience stress that cannot be resolved, as evidenced by this kinetic analysis. The quantification of the band intensity was added to Western blot images above individual lanes.

      Reviewer 2 pointed to missing isotype control antibodies in Fig.3 and Fig.4:

      - Figure 3J: the isotype control for the IFNAR antibody is missing

      - Figure 4E: It seems the isotype control itself has already an effect in the reduction of IFNb.

      - Fig. 4H: It seems that the Isotype control antibody had an effect to increase 4-HNE (compared to TNF stimulated only).

      We always include isotype control antibodies in our experiments because antibodies are known to modulate macrophage activation via binding to Fc receptors. To address the reviewer’s comments, we updated all panels that present the effects of IFNAR1 blockade to include isotype-matched non-specific control antibodies in the revised manuscript. Specifically, we included the isotype control in Fig. 3M (previously Fig.3J), Fig.4I, Suppl.Fig.4E-G, Fig.6L-M, and Suppl.Fig.7I (previously Suppl.Fig.6F).

      - Fig.4A - C: "IFNAR1 blockade, however, did not increase either the NRF2 and FTL protein levels, or the Fth, Ftl and Gpx1 mRNA levels above those treated with isotype control antibodies"

      Maybe not above the isotype but it is higher than the TNF alone stimulation at least for NRF2 at 8h and for Ftl at both time points. Why does the isotype already cause stimulation/induction of the cells? !These experiments need repetitions and quantification and statistics!

      To determine the specific effects of IFNAR blockade, we compared the effects of non-specific isotype control and IFNAR1-specific antibodies. In our experiments, the isotype control antibody modestly increased the Nrf2 and Ftl protein levels and the Fth and Ftl mRNA levels, but its effects were similar to the effect of the IFNAR-specific antibody. The non-IFN-specific effects of antibodies, although of potential biological significance, are modest in our model, and their analysis is beyond the scope of this study.

      - Fig.4H Was the AB added also at 12h post stimulation? Figure legend should be adjusted.

      The IFNAR1 blocking antibodies and isotype control antibodies were added at 2 h after TNF stimulation in Fig.4H and 4I, as described in the corresponding figure legend. The data demonstrating the effects of IFNAR blockade after 12, 24, and 33 h of TNF stimulation are presented in Suppl.Fig.4E-G.

      - Figure 4I: How was the data measured here, i.e. what is depicted? The isotype control is missing. It seems a two-way ANOVA was used, yet it is stated differently. The figure legend should be revised, as Dunnett's multiple comparison would only check for significances compared to the control.

      The microscopy images and bar graphs were updated to include isotype control and presented in Suppl. Fig.4E - G of the revised version. We also revised the statistical analysis to include correction for multiple comparisons.

      - Figure 4C and subsequent: How exactly was the experiment done (house-keeping gene)?

      We included the details in the figure legends of the revised version. We quantified the gene expression by the ΔΔCt method using β-actin (for Fig. 4C-E) and 18S (for Fig. 4F and G) as internal controls.
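
      For readers less familiar with the ΔΔCt (DDCt) calculation mentioned above, it can be sketched as follows. This is a minimal illustration that assumes 100% amplification efficiency (a doubling per cycle); the function name and all Ct values are hypothetical and are not data from this study.

```python
def ddct_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ΔΔCt) method.

    Assumes 100% PCR efficiency. 'ref' is the internal control
    (e.g. beta-actin or 18S); 'control' is the untreated sample.
    """
    # Step 1: normalize the target gene to the internal control in each sample
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Step 2: compare the treated sample with the untreated control
    dd_ct = d_ct_treated - d_ct_control
    # Step 3: convert the cycle difference into a fold change
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, chosen only to illustrate the arithmetic
fold = ddct_fold_change(ct_target_treated=24.0, ct_ref_treated=16.0,
                        ct_target_control=30.0, ct_ref_control=16.0)
print(fold)  # 64.0
```

      In this hypothetical example the target transcript crosses threshold six cycles earlier in the treated sample after normalization, which corresponds to a 64-fold induction.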

      - Figure 4D,E: Information on cells used is missing. Why the change in stimulation time? Did it not work after 12h? Then the experiments in A-C should be repeated for 16h.

      The updated Fig. 4D and E present a comparison of B6 and B6.Sst1S BMDMs, clearly demonstrating a significant difference between these macrophages in Ifnb1 mRNA expression 16 h after TNF stimulation, in agreement with our previous publication (Bhattacharya et al., 2021). There we studied the time course of responses of B6 and B6.Sst1S macrophages to TNF at 2 h intervals and demonstrated the divergence between their activation trajectories starting at 12 h of TNF stimulation. Therefore, to reveal the underlying mechanisms, we focused our analyses on this critical timepoint, i.e. as close to the divergence as possible. However, the difference between the strains in Ifnb1 mRNA expression achieved significance only by 16 h of TNF stimulation. That is why we used this timepoint for the Ifnb1 and Rsad2 analyses. These analyses clearly show that the superinduction was not driven by the positive feedback via IFNAR that was previously shown by the Ivashkiv lab for B6 wild type macrophages (PMID: 21220349).

      - Figure 4E: It would be helpful to see if these transcripts are actually translated into protein levels, e.g. perform an ELISA. Authors state that IFNAR blockages does not alter the expression but you statistic says otherwise.

      - The data for Ifnb expression (or better protein level) should be provided for B6 BMDMs as well.

      We have previously reported the differences in Ifnb protein secretion (He et al., PLoS Pathogens, 2013 and Bhattacharya et al., JCI, 2021). We use mRNA quantification by qRT-PCR as a more sensitive and direct measurement of the sst1-mediated phenotype. The revised Fig.4D and E include the responses of B6 in addition to B6.Sst1S to demonstrate that the IFNAR blockade does not reduce the Ifnb1 mRNA levels in the TNF-stimulated B6.Sst1S mutant to the B6 wild type levels. The slight reduction can be explained by a known positive feedback loop in the IFN-I pathway (see above). In this experiment we emphasized that the effect of the sst1 locus is substantially greater than the effect of the IFNAR blockade (Fig.4D), and updated the text accordingly.

      - Fig. 4F: To what does the fold induction refer to? If it is again to unstimulated cells, then why is the induction now so much higher than in (E) where it was only 50x (now to 100x).

      - Figure 4G: Again to what is the fold induction referring to? It seems your Fer-1 treatment only contains 2 data points. This needs to be fixed.

      Yes, the fold induction was calculated by normalizing mRNA levels to the untreated control incubated for the same time. Regarding the variation in Ifnb1 mRNA levels, a two-fold variation is not unusual in these experiments, which may result in the Ifnb1 mRNA superinduction ranging from 50- to 200-fold at this timepoint (16 h). The graph in Fig.4G was modified to make all data points more visible.

      - "These data suggest that type I IFN signaling does not initiate LPO in our model but maintains and amplifies it during prolonged TNF stimulation that, eventually, may lead to cell death". Data for a short term TNF stimulation are not shown, however, so it might impact also on the initiation of LPO.

      - The overall conclusion drawn from Fig. 3 and 4 is not really clear with regard that IFN does not initiate LPO. Where is that shown? Data on earlier stimulation time points should be added to make this clear.

      We demonstrated ROS production (new Suppl.Fig.3G) and the rate of LPO biosynthesis (new Suppl.Fig.4E-F) at 6 h post TNF stimulation, while the Ifnb1 superinduction occurs between 12-18 h post TNF stimulation. This temporal separation supports our conclusion that IFN-β superinduction does not initiate LPO. We clarified it in the text:

      “Thus, Ifnb1 super-induction and IFN-I pathway hyperactivity in B6.Sst1S macrophages follow the initial LPO production, and maintain and amplify it during prolonged TNF stimulation”. (Previously: These data suggest that type I IFN signaling does not initiate LPO in our model). We also edited the conclusion in this section to explain the hierarchy of the sst1-regulated AOD and IFN-I pathways better:

      “Taken together, the above experiments allowed us to reject the hypothesis that IFN-I hyperactivity caused the sst1-dependent AOD dysregulation. In contrast, they established that the hyperactivity of the IFN-I pathway in TNF-stimulated B6.Sst1S macrophages was itself driven by the initial dysregulation of AOD and iron-mediated lipid peroxidation. During prolonged TNF stimulation, however, the IFN-I pathway was upregulated, possibly via ROS/LPO-dependent JNK activation, and acted as a potent amplifier of lipid peroxidation”.

      We believe that these additional data and explanation strengthen our conclusions drawn from Figures 3 and 4.

      - "A select set of mouse LTR-containing endogenous retroviruses (ERV's) (Jayewickreme et al, 2021), and non-retroviral LINE L1 elements were expressed at a basal level before and after TNF stimulation, but their levels in the B6.Sst1S BMDMs were similar to or lower than those seen in B6". This sentence should be revised as the differences between B6 and B6.Sst1S BMDMs seem small and are not there after 48h anymore. Are these mild changes really caused by the mutation or could they result from different housing conditions and/or slowly diverging genetically lines. How many mice were used for the analysis? Is there already heterogeneity between mice from the same line?

      We agree with the reviewer that the data presented in Suppl.Fig.4 (Suppl.Fig.5 in the revised version) indicated no increase in single- and double-stranded transposon RNAs in the B6.Sst1S macrophages. The purpose of this experiment was to test the hypothesis that increased transposon expression might be responsible for triggering the superinduction of the type I interferon response in TNF-stimulated B6.Sst1S macrophages. In collaboration with a transposon expert, Dr. Nelson Lau (co-author of this manuscript), we demonstrated that transposon expression was not increased above the B6 level and, thus, rejected this attractive hypothesis. We explained the purpose of this experiment in the text and described our findings as “the levels in the B6.Sst1S BMDMs were similar to or lower than those seen in B6” and concluded that “the above analyses allowed us to exclude the overexpression of persistent viral or transposon RNAs as a primary mechanism of the IFN-I pathway hyperactivity” in the sst1-mutant macrophages.

      - Fig. 5A: Indeed, it even seems that Myc is upregulated for the mutant BMDMs. Yet, there are only 2 data points for B6 12h.

      These experiments need repetitions and quantification and statistics.

      We observed these differences in c-Myc mRNA levels by independent methods: RNAseq and qRT-PCR. The qRT-PCR experiments were repeated 3 times. A representative experiment in Fig.5A shows 3 data points for each condition. We reformatted the panel to make all data points clearly visible.

      - Fig. 5B: Why would the protein level decrease in the controls over 6h of additional cultivation? Is this caused by fresh M-CSF? In this case maybe cells should be left to settle for one day before stimulating them to properly compare c-Myc induction. Comment on two c-Myc bands is needed. At 12h only the upper one seems increased for TNF stimulated mutant BMDMs compared to B6 BMDMs.

      We agree with the reviewer’s point that cells need to be rested after a change to medium that contains fresh CSF-1. Indeed, in Suppl.Fig.6C, we show that after a change to medium containing 10% L929 supernatant (a source of CSF1) there is an increase in c-Myc protein levels that takes approximately 12 hours to return to baseline.

      Our protocol includes a resting period of 18-24 h after the medium change before TNF stimulation.

      We updated Methods to highlight this detail. Thus, the increase in c-Myc levels we observe at 12 h of TNF stimulation (Fig.5B) is induced by TNF, not the addition of growth factors, as further discussed in the text.

      The two c-Myc bands observed in Fig.5B,I and J, are similar to patterns reported in previous studies that used the same commercial antibodies (PMIDs: 24395249, 24137534, 25351955). Whether they correspond to different c-Myc isoforms or post-translational modifications is unknown.

      - Fig. 5A,B: It seems that not all the RNA is translated into protein, as c-Myc at 12h in the mutant BMDMs seems to be lower than at 6h, while the gene expression implicates it vice versa.

      In addition to Fig.5B, the time course of Myc protein expression up to 24 h is presented in new panels Fig. 5I-5J. It demonstrates the gradual decrease of Myc protein levels. The observed dissociation between the mRNA and protein levels in the sst1-mutant BMDMs at 12 and 24 h is most likely due to translation inhibition resulting from the development of the integrated stress response, ISR (as shown in our previous publication by Bhattacharya et al., JCI, 2021). Translation of Myc is known to be particularly sensitive to the ISR (PMIDs: 18551192, 25079319, 28490664). The IFN-driven ISR may thus serve as a backup mechanism for Myc downregulation. We are planning to investigate these regulatory mechanisms in greater detail in the future.

      - Fig. 5J: Indeed, the inhibitor seems to cause the downregulation of the proteins. Explanation?

      This experiment was repeated twice and the average normalized densitometry values are presented in the updated Fig.5J. The main question addressed in this experiment was whether hyperactivity of JNK in TNF-stimulated sst1 mutant macrophages contributed to Myc upregulation, as had been previously shown in cancer. Comparing the effects of JNK inhibition on phospho-cJun and c-Myc protein levels in TNF-stimulated B6.Sst1S macrophages (updated Fig.5J), we rejected the hypothesis that JNK activity might have a major role in c-Myc upregulation in sst1 mutant macrophages.

      - "TNF stimulation tended to reduce the LPO accumulation in the B6 macrophages and to increase it in the B6.Sst1S ones" However, this is not apparent in Sup. Fig. 6B. Here it seems that there might be a significant increase.

      Suppl.Fig.6B (currently Suppl.Fig.7B) shows the 4-HNE accumulation at day 3 post infection. The data obtained after 5 days of Mtb infection are shown in Fig.6A. We clarified this in the text: “By day 5 post infection, TNF stimulation induced significant LPO accumulation only in the B6.Sst1S macrophages (Fig.6A)”.

      - Fig. 6B: Mtb and 4-HNE should be shown in two different channels in order to really assign each staining correctly.

      What time point is this? Are the mycobacteria cleared at MOI1, since it looks that there are fewer than that? How does this look like for the B6 BMDMs? Are there even less mycobacteria?

      We added B6 infection data to the updated Fig.6B and added Suppl.Fig.7C and 7D to address this reviewer’s comment. The data represent day 5 of Mtb infection, as indicated in the updated Fig.6B and Suppl.Fig.7C and 7D legends. New Suppl.Fig.7D shows quantification of replicating Mtb using an Mtb replication reporter strain expressing a single-strand DNA-binding protein (SSB)-GFP fusion, as described in Methods. We observed fewer Mtb and a lower percentage of replicating Mtb in B6 macrophages, but we did not observe complete Mtb elimination in either background.

      We used red fluorescence for both Mtb::mCherry and 4-HNE staining to clearly visualize the SSB-GFP puncta in replicating Mtb DNA. In the revised manuscript, we have included the relevant channels in Suppl. Fig.7C and D to demonstrate clearly distinct patterns of Mtb::mCherry and 4-HNE signals. We did not aim to quantify the 4-HNE signal intensity in this experiment. For the 4-HNE quantification we use Mtb that expressed no reporter proteins (Fig.6A-B and Suppl.Fig.7A-B).

      - Fig 6E: In the context of survival a viability staining needs to be included, as well as the data from day 0. Then it needs to be analyzed whether cell numbers remain the same from D0 or if there is a change.

      We updated the Fig.6 legend to indicate that the cell number percentages were calculated based on the number of cells at Day 0 (immediately after Mtb infection). We routinely use fixable cell death staining to enumerate cell death and exclude artifacts due to cell loss. A brief protocol containing this information is included in the Methods section. The detailed protocol, including normalization using a BCG spike, has been published (Yabaji et al., STAR Protocols, 2022). Here we did not present the dead cell percentage, as it remained low and we did not observe damage to macrophage monolayers. The fold change of Mtb was calculated after normalization to the Mtb load at Day 0 after infection and washes.
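
      The Day 0 normalization described above amounts to the following arithmetic. This is only a sketch under simplifying assumptions: the function names and numbers are hypothetical, and the BCG spike normalization step of the published protocol is not modeled here.

```python
def percent_of_day0(cell_count, day0_cell_count):
    """Cell number at a later timepoint as a percentage of the Day 0 count."""
    if day0_cell_count <= 0:
        raise ValueError("Day 0 cell count must be positive")
    return 100.0 * cell_count / day0_cell_count


def fold_change_from_day0(mtb_load, day0_mtb_load):
    """Bacterial load at a later timepoint relative to the Day 0 load."""
    if day0_mtb_load <= 0:
        raise ValueError("Day 0 Mtb load must be positive")
    return mtb_load / day0_mtb_load


# Hypothetical values, for illustration only
print(percent_of_day0(9000, 10000))          # 90.0
print(fold_change_from_day0(4.0e5, 1.0e5))   # 4.0
```

      Expressing both readouts relative to Day 0 makes it possible to distinguish a change in bacterial load from a change in the number of host cells remaining in the well.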

      "The 3D imaging demonstrated that YFP-positive cells were restricted to the lesions, but did not strictly co-localize with intracellular Mtb, i.e. the Ifnb promoter activity was triggered by inflammatory stimuli, but not by the direct recognition of intracellular bacteria. We validated the IFNb reporter findings using in situ hybridization with the Ifnb probe, as well as anti-GFP antibody staining (Suppl.Fig.8B - E)." The colocalization is not present within the tissue sections. It seems that the reporter line does not show the same staining pattern in vivo as the IFNß probe or the anti GFP antibody staining. The reporter line has to be tested for the specificity of the staining. Furthermore, to state that it was restricted to the lesions, an uninvolved tissue area needs to be depicted.

      Ifnb-secreting cells are notoriously difficult to detect in vivo using direct staining of the protein. Therefore, lineage tracing of reporter expression is used as a surrogate. The Ifnb reporter used in our study was developed by the Locksley laboratory (Scheu et al., PNAS, 2008, PMID: 19088190) and has been validated in many independent studies. The reporter mice express the YFP protein under the control of the Ifnb1 promoter. The YFP protein accumulates within the cells, while Ifnb protein is rapidly secreted and does not accumulate in the producing cells in appreciable amounts. Also, the degradation of the YFP protein is much slower than that of the endogenous Ifnb1 mRNA, which was detected using in situ hybridization. Thus, there is no precise spatiotemporal coincidence of these readouts in Ifnb-expressing cells in vivo. However, this methodology more closely reflects the Ifnb-expressing cells in vivo than a Cre-lox-mediated lineage tracing approach. In the revised manuscript we demonstrate that the YFP and mRNA signals partially overlap (Suppl.Fig.12B). In Suppl.Fig.12B, we also included a new panel showing no YFP expression in the uninvolved area of the reporter mice infected with Mtb. The YFP expression by activated macrophages is demonstrated by co-staining with Iba1- and iNOS-specific antibodies (new Fig.7D and Suppl.Fig.13A). Our specificity control also included TB lesions in mice that do not carry the YFP reporter and did not express the YFP signal, as reported elsewhere (Yabaji et al., BioRxiv, https://doi.org/10.1101/2023.10.17.562695).

      - Are paucibacillary and multibacillary lesions different within the same animal or does one animal have one lesion phenotype? If that is the case, what is causing the differences between mice? Bacterial counts for the mice are required.

      The heterogeneity of pulmonary TB lesions has been widely acknowledged in the clinic and highlighted in recent experimental studies. In our model of chronic pulmonary TB (described in detail in Yabaji et al., https://doi.org/10.1101/2025.02.28.640830 and https://doi.org/10.1101/2023.10.17.562695) the development of pulmonary TB lesions is not synchronized, i.e. the lesions are heterogeneous between animals and within individual animals at the same timepoint. Therefore, we performed a lesion stratification in which individual lesions were classified by a certified veterinary pathologist in a blinded manner based on their morphology (H&E) and acid-fast staining of the bacteria, as depicted in Suppl.Fig.8.

      - "Among the IFN-inducible genes upregulated in paucibacillary lesions were Ifi44l, a recently described negative regulator of IFN-I that enhances control of Mtb in human macrophages (DeDiego et al, 2019; Jiang et al, 2021) and Ciita, a regulator of MHC class II inducible by IFNy, but not IFN-I (Suppl.Table 8 and Suppl.Fig.10 D-E)." Why is Sup. Fig. 10 D, E referred to? The figure legend is also not clear, e.g. what means "upregulated in a subset of IFN-inducible genes"? Input for the hallmarks needs to be defined.

      These data are now presented in Suppl.Fig.11 and, following the reviewer’s comment, we moved the reference to panels 11D – E up to the previous paragraph in the main text, where it naturally belongs. We also edited the figure legend to refer to the list of IFN-inducible genes compiled from the literature that is discussed in the text. We appreciate the reviewer’s suggestion, which helped us improve the clarity of the text. The inputs for the Hallmark pathway analysis are presented in Suppl.Tables 7 and 8, as described in the text.

      - Fig. 7C: Single channel pictures are required as it is hard to see the differences in staining with so many markers. Why is there no iNOS expression in the bottom row? What does the rectangle indicate on the bottom right? As black is chosen for DAPI, it is not visible at all. If the signal is needed, a visible color should be chosen.

      We thoroughly revised this figure to address the reviewer’s concern about the lack of clarity. We provide individual channels for each marker in Fig.7D – E and Suppl.Fig.13F. We had to present DAPI in gray scale in these panels to better visualize the other markers.

      - "In the advanced lesions these markers were primarily expressed by activated macrophages (Iba1+) expressing iNOS and/or Ifny (YFP+)(Fig.7D)" Iba1 is needed in the quantification. Based on the images, iNOS seems to be highly produced in Iba1 negative cells. Which cells do produce it then? Flow cytometry data for this quantification are required. This would allow you to specifically check which cells express the markers and allow for a more precise analysis of double positive cells.

      The data demonstrating the co-localization of the stress markers phospho-cJun and Chac1 with YFP are now presented in Fig.7E (images) and Suppl.Fig.13D (quantification). The co-localization of the stress markers phospho-cJun and Chac1 with iNOS is presented in Suppl.Fig.13F (images) and Suppl.Fig.13E (quantification). We agree that some iNOS+ cells are Iba1-negative (Fig.7D). We manually quantified the percentages of Iba1+iNOS+ double positive cells and demonstrated that they represent the majority of the iNOS+ population (Suppl.Fig.13A). Regarding the requested FACS analysis, we focus on spatial approaches because of the heterogeneity of the lesions, which would be lost if lungs were dissociated for FACS. We are working on spatial transcriptomics at a single cell resolution that preserves the spatial organization of TB lesions to address the reviewer’s comment and will present our results in the future.

      - Results part 6: In general, can you please state for each experiment at what time point mice were analyzed? You should include an additional macrophage staining (e.g. MerTK, F4/80), as alveolar macrophages are not staining well for Iba1 and you might therefore miss them in your IF microscopy. It would be very nice if you could perform flow cytometry to really check on the macrophages during infection and distinguish subsets (e.g. alveolar macrophages, interstitial macrophages, monocytes).

      We have included the details of time post infection in the figure legends for Fig.7 and Suppl.Figures 8, 9, 12B, 13, and 14A of the revised manuscript. We have performed staining with CD11b, CD206 and CD163 to differentiate the recruited and lung resident macrophages and determined that in chronic pulmonary TB lesions in our model the vast majority of macrophages are recruited CD11b+ macrophages, not resident (CD206+ and CD163+) macrophages. These data are presented in another manuscript (Yabaji et al., BioRxiv https://doi.org/10.1101/2023.10.17.562695).

      - Spatial sequencing: The manuscript would highly profit from more data on that. It would be very interesting to check for the DEGs and show differential spatial distribution. Expression of marker genes should be inferred to further define macrophage subsets (e.g. alveolar macrophages, interstitial macrophages, recruited macrophages) and see if these subsets behave differently within the same lesion but also between the lesions. Additional bioinformatic approaches might allow you to investigate cell-cell interactions. There is a lot of potential with such a dataset, especially from TB lesions, that would elevate your findings and prove interesting to the TB field.

      - "Thus, progression from the Mtb-controlling paucibacillary to non-controlling multibacillary TB lesions in the lungs of TB susceptible mice was mechanistically linked with a pathological state of macrophage activation characterized by escalating stress (as evidenced by the upregulation phospho-cJUN, PKR and Chac1), the upregulation of IFNβ and the IFN-I pathway hyperactivity, with a concurrent reduction of IFNγ responses." To really show the upregulation within macrophages and their activation, a more detailed IF microscopy with the inclusion of additional macrophage markers needs to be provided. Flow cytometry would enable analysis for the differences between alveolar and interstitial macrophages, as well as for monocytes. As however, it seems that the majority of iNOS, as well as the stress associated markers are not produced by Iba1+ cells. Analyzing granulocytes and T lymphocytes should be considered.

      We appreciate the reviewer’s suggestion. Indeed, our model provides an excellent opportunity to investigate macrophage heterogeneity and cell interactions within chronic TB lesions. We are working on spatial transcriptomics at a single cell resolution that would address the reviewer’s comment and will present our results in the future.

      In agreement with the classical literature, the overwhelming majority of myeloid cells in chronic pulmonary TB lesions are macrophages. Neutrophils are detected at the necrotic stage, but our study is focused on pre-necrotic stages to reveal the earlier mechanisms predisposing to necrotization. We never observed neutrophils or T cells expressing iNOS in our studies.

      - It's mentioned in the method section that controls in the IF staining were only fixed for 10min, while the infected cells were fixed for 30min. Consistency is important as the PFA fixation might impact on the fluorescence signal. Therefore, controls should be repeated with the same fixation time.

      We have carefully considered the impact of fixation time on fluorescence and have separately analyzed the non-infected and infected samples to address this concern. For the non-infected samples, we examined the effect of TNF in both B6 and B6.Sst1S backgrounds, ensuring that a consistent fixation protocol (10 min) was applied across all experiments without Mtb infection.

      For the Mtb infection experiments, we employed an optimized fixation protocol (30 min) to ensure that Mtb was killed before handling the plates, which is critical for preserving the integrity of the samples. In this context, we compared B6 and B6.Sst1S samples to evaluate the effects of fixation and Mtb infection on lipid peroxidation (LPO) induction.

      We believe this approach balances the need for experimental consistency with the specific requirements for handling infected cells, and we have revised the manuscript to reflect this clarification.

      - Reactive oxygen species levels should be determined in B6 and B6.Sst1S BMDMs (stimulated and unstimulated), as they are very important for oxidative stress.

      We have conducted experiments to measure ROS production in both B6 and B6.Sst1S BMDMs and demonstrated higher levels of ROS in the susceptible BMDMs after prolonged TNF stimulation (new Fig.3I-J and Suppl. Fig. 3G). Additionally, we have previously published a comparison of ROS production between B6 and B6.Sst1S by FACS (PMID: 33301427), which also supports the findings presented here.

      - Sup. Fig 2C: The inclusion of an unstimulated control would be advisable in order to evaluate if there are already difference in the beginning.

      We have added the untreated control to Suppl. Fig. 2C (currently Suppl. Fig. 2D) in the revised manuscript.

      - Sup. Fig. 3F: Why is the fold change now lower than in Fig. 4D (fold change of around 28 compared to 120 in 4D)?

      The data in Fig.4D (Fig.4E in the revised manuscript) and Suppl.Fig.3F (currently Suppl.Fig.4C) represent separate experiments. This variation between experiments is commonly observed in qRT-PCR, which is affected by slight variations in the expression in the unstimulated controls used for normalization and in the kinetics of the response. This 2- to 4-fold difference between the same treatments in separate experiments, as compared to the 30- to 100-fold and higher induction by TNF, does not affect the data interpretation.
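
      To illustrate why a small shift in the unstimulated control translates into a several-fold difference in the reported induction, consider the sketch below. It assumes 100% amplification efficiency, and the ΔCt values (target minus internal control) are hypothetical rather than taken from our experiments.

```python
def fold_induction(d_ct_control, d_ct_treated):
    """Fold induction relative to the unstimulated control (2^-ΔΔCt)."""
    return 2.0 ** (d_ct_control - d_ct_treated)


# Same hypothetical treated sample in both experiments; the unstimulated
# control differs by a single PCR cycle between experiment A and B.
experiment_a = fold_induction(d_ct_control=14.0, d_ct_treated=8.0)
experiment_b = fold_induction(d_ct_control=13.0, d_ct_treated=8.0)

print(experiment_a)                  # 64.0
print(experiment_b)                  # 32.0
print(experiment_a / experiment_b)   # 2.0
```

      Under these assumptions a one-cycle difference in the baseline alone halves or doubles the calculated fold induction, while the induction by TNF remains large in both cases.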

      - Sup. Fig. 5C, D: The data seems very interesting as you even observe an increase in gene expression. Data for the B6 mice should be evaluated for increase to a similar level as the TNF treated mutants. Data on the viability of the cells are necessary, as they no longer receive MCSF and might be dying at this point already.

      To ensure that the observed effects were not confounded by cytotoxicity, we determined non-toxic concentrations of the CSF1R inhibitors during 48h of incubation and used them in our experiments that lasted for 24h. To address this valid comment, we have included cell viability data in the revised manuscript to confirm that the treatments did not result in cell death (Suppl. Fig. 6D). This experiment rejected our hypothesis that CSF1-driven Myc expression could be involved in the Ifnb superinduction. Other effects of CSF1R inhibitors on the type I IFN pathway are intriguing but are beyond the scope of this study.

      - Sup. Fig 12: the phospho-c-Jun picture for (P) is not the same as in the merged one with Iba1. Double positive cells are mentioned to be analyzed, but from the staining it appears that P-c-Jun is expressed by other cells. You do not indicate how many replicates were counted and if the P and M lesions were evaluated within the same animal. What does the error bar indicate? It seems unlikely from the plots that the double positive cells are significant. Please provide the p values and statistical analysis.

      We thank the reviewer for bringing this inadvertent field replacement in the single phospho-cJun channel to our attention. However, the quantification of Iba1+phospho-cJun+ double positive cells in Suppl.Fig.12 and our conclusions were not affected. In the revised manuscript, images and quantification of phospho-cJun and Iba1 co-expression are shown in new Suppl.Fig.13B and C, respectively. We have also updated the figure legends to denote the number of lesions analyzed and statistical tests. Specifically, lesions from 6–8 mice per group (paucibacillary and multibacillary) were evaluated. Each dot in the panels of Suppl.Fig.13 represents an individual lesion.

      - Sup. Fig. 13D (suppl.Fig.15D now): What about the expression of MYC itself? Other parts of the signaling pathway should be analyzed(e.g. IFNb, JNK)?

      MYC mRNA expression tended to be higher in TB patients with poor outcomes, but the difference was not statistically significant after correction for multiple testing. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis. Pathway analysis of the differentially expressed genes revealed that treatment failures were associated with the following pathways relevant to this study: NF-kB Signaling, Flt3 Signaling in Hematopoietic Progenitor Cells (indicative of common myeloid progenitor cell proliferation), SAPK/JNK Signaling and Senescence (possibly indicative of oxidative stress). The upregulation of these pathways in human patients with poor TB treatment outcomes correlates with our findings in TB susceptible mice.

      - In the mfIHC, the usage of anti-mouse antibodies is mentioned. Pictures of sections incubated with the secondary antibody alone are required to exclude the possibility that the staining is not specific. Especially, as this data is essential to the manuscript and mouse-antimouse antibodies are notorious for background noise.

      We are well aware of the technical difficulties associated with using mouse on mouse staining. In those cases, we use rabbit anti-mouse isotype specific antibodies specifically developed to avoid non-specific background (Abcam cat#ab133469). Each antibody panel for fluorescent multiplexed IHC is carefully optimized prior to studies. We did not use any primary mouse antibodies in the final version of the manuscript and, hence, removed this mention from the Methods.

      - In order to tie the story together, it would be interesting to treat infected mice with an IFNAR antibody, as well as perform this experiment with a Myc antibody. According to your data, you might expect the survival of the mice to be increased or bacterial loads to be affected.

      In collaboration with the Vance laboratory, we tested the effects of type I IFN pathway inhibition on TB susceptibility in B6.Sst1S mice: either type I IFN receptor knockout or blocking antibodies increased their resistance to virulent Mtb (published in Ji et al., 2019; PMID 31611644). Unfortunately, blocking Myc using neutralizing antibodies in vivo is not currently achievable. Specifically blocking Myc using small molecule inhibitors in vivo is notoriously difficult, as recognized in the oncology literature. We will consider using small molecule inhibitors of either Myc translation or specific pathways downstream of Myc in the future.

      - It is surprising that you not even once cite or mention your previous study on bioRxiv considering the similarity of the results and topic (https://doi.org/10.1101/2020.12.14.422743). Is not even your Figure 1I and Figure 2 J, K the same as in that study depicted in Figure 4?

      The reviewer refers to the first version of this manuscript, which was uploaded to bioRxiv but has never been published. We continued this work and greatly expanded our original observations, as presented in the current manuscript. Therefore, we do not consider the previous version an independent manuscript and do not cite it.

      - Please revise spelling of the manuscript and pay attention to write gene names in italics

      Thank you, we corrected the gene and protein names according to current nomenclature.

      Minor points:

      - Fig. 1: Please provide some DEGs that explain why you used this resolution for the clustering of the scRNAseq data and that these clusters are truly distinct from each other.

      Differential gene expression in clusters is presented in Suppl.Fig.1C (interferon response) and Suppl.Fig.1D (stress markers and interferon response previously established in our studies).

      - Fig. 1F: What do the two lines represent (magenta, green)?

      The lines indicate pseudotime trajectories of B6 (magenta) and B6.Sst1S (green) BMDMs.

      - Fig. 1F, G: Why was cluster 6 excluded?

      This cluster was not different between B6 and B6.Sst1S, so it was not useful for drawing the strain-specific trajectories.

      - Fig. 1E, G, H: The intensity scales are missing. They are vital to understand the data.

      We have included the scale in revised manuscript (Fig.1E,G,H and Suppl.Fig.1C-D).

      - Fig. 2G-I: please revise order, as you first refer to Fig. 2H and I

      We revised the panels’ order accordingly.

      - Fig. 5: You say the data represents three samples but at least in D and E you have more. Please revise. Why do you only include at (G) the inhibitor only control?

      We added the inhibitor only controls to Fig. 5D - H. We also indicated the number of replicates in the updated Fig.5 legend.

      - Figure 7A, Sup. Fig. 8: Are these maximum intensity projection? Or is one z-level from the 3D stack depicted?

      Fig. 7A shows 3D images with all the stacks combined.

      - Fig. 7B: What do the white boxes indicate?

      We have removed this panel in the revised version and replaced it with better images.

      - Sup. Fig. 1A: The legend for the staining is missing

      Suppl. Fig.1A shows the relative proportions of either naïve (R and S) or TNF-stimulated (RT and ST) B6 or B6.Sst1S macrophages within individual single cell clusters depicted in Fig.1B. The color code is shown next to the graph on the right.

      - Sup. Fig. 1B: The feature plots are not clear: The legend for the expression levels is missing. What does the heading mean?

      We updated the headings, as in Fig.1C. The dots represent individual cells expressing Sp110 mRNA (upper panels) and Sp140 mRNA (lower panels).

      - Sup. Fig. 3C: The scale bar is barely visible.

      We resized the scale bar to make it visible; the panel is now presented as Suppl. Fig.3E (previously Suppl. Fig.3C).

      - Sup. Fig. 3D: There is no figure legend or the legend to C-E is wrong.

      - Sup. Fig. 3F, G: You do not state to what the data is relative to.

      We identified an error in the Suppl.Fig.3 legend referring to specific panels. The Suppl.Fig.3 legend has been updated accordingly. New panels were added, and the previous Suppl.Fig.3F-G panels are now Suppl.Fig.4C-D.

      - Sup. Fig. 3H: It seems you used a two-way ANOVA, yet state it differently. Please revise the figure legend, as Dunnett's multiple comparison would only check for significances compared to the control.

      Following the reviewer’s comment, we repeated statistical analysis to include correction for multiple comparisons and revised the figure and legend accordingly.

      - Sup. Fig. 4A, B: It is not clear what the lines depict as the legend is not explained. Names that are not required should be changed to make it clear what is depicted (e.g. "TE@" what does this refer to?)

      The previous Sup. Fig. 4 is now Sup. Fig. 5. The “TE@” is a leftover label from the bioinformatics pipeline, referring to “Transposable Element”. We apologize for this confusion and have removed these extraneous labels. We have also added the transposon names of the LTR (MMLV30 and RTLV4) and L1Md to the Suppl.Fig.5A and 5B legends, respectively.

      - Sup. 4B: What does the y-scale on the right refer to?

      We apologize for the missing label for the y-scale on the right, which represents the mRNA expression level of the SetDB1 gene. Because SetDB1 has a much lower steady-state level than the LINE L1Md, we plotted two Y-scales to allow both the gene and the transposon to be visualized on this graph.

      - Sup. 4C: Interpretation of the data is highly hindered by the fact that the scales differ between the B6 and B6.Sst1. The scales are barely visible.

      We apologize for the missing labels for the y-scales of these coverage plots, which were originally meant to just show a qualitative picture of the small RNA sequencing that was already quantitated by the total amounts in Sup. 4B. We have added the auto-scaled Y-scales to Sup. 4C and improved the presentation of this figure.

      - Sup. Fig. 5A, B: Is the legend correct? Did you add the antibody for 2 days or is the quantification from day 3?

      We recognize that the reviewer refers to Suppl.Fig.6A-B (Suppl.Fig.7A-B in the revised manuscript). We did not add antibodies to live cells. The figure legend describes staining with 4HNE-specific antibodies 3 days post Mtb infection.

      - Sup. Fig. 8A: Are the "early" and "intermediate" lesions from the same time points? What are the definitions for these stages?

      We discussed our lesion classification according to histopathology and bacterial loads above. Of note, in the revised manuscript we simplified our classification to denote paucibacillary and multibacillary lesions only. We agree with the reviewers that the designation of lesions as early, intermediate and advanced was based on our assumptions regarding the time course of their progression from low to high bacterial loads.

      - Sup. Fig. 8E: You should state that the bottom picture is an enlargement of an area in the top one. Scale bars are missing.

      We replaced this panel with clearer images in Suppl.Fig.12B.

      - Sup. Fig. 11A: The IF staining is only visible for Iba and iNOS. Please provide single channels in order to make the other staining visible.

      Suppl.Fig.11A (now Suppl.Fig.13B) shows the low-magnification images of TB lesions. In Fig. 7 and Suppl. Fig. 13F of the revised manuscript, we provide images for individual markers.

      - Sup. Fig. 13A (Suppl.Fig.15A now): Your axis label is not clear. What do the numbers behind the genes indicate? Why did you choose oncogene signatures and not inflammatory markers to check for a correlation with disease outcome?

      The X axis of Suppl.Fig.15A represents pre-defined molecular signature gene sets (MSigDB) in the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/msigdb). The Y axis shows the area under curve (AUC) score for each gene set.

      - Sup. 13D(Suppl.Fig.15D now): Maybe you could reorder the patients, so that the impression is clearer, as right now only the top genes seem to show a diverging gene signature, while the rest gives the impression of an equal distribution.

      The Myc upregulated gene set myc_up was identified among the top gene sets associated with treatment failure using the unbiased ssGSEA algorithm. We agree with the reviewer that not every gene in the myc_up gene set correlates with the treatment outcome. However, the association of the gene set is statistically significant, as presented in Suppl.Fig.15B – C.

      - The scale bars for many microscopy pictures are missing.

      We have added clearly visible scale bars to all the microscopy images in the revised version.

      - The black bar plots should be changed (e.g. in color), since the single data points cannot be seen otherwise.

      - It would be advisable that a consistent color scheme would be used throughout the manuscript to make it easier to identify similar conditions, as otherwise many different colours are not required and lead right now rather to confusion (e.g. sometimes a black bar refers to BMDMs with and sometimes without TNF stimulation, or B6 BMDMs). Furthermore, plot sizes and fonts should be consistent within the manuscript (including the supplemental data)

      We followed this useful suggestion and selected consistent color codes for B6 and B6.Sst1S groups to enhance clarity throughout the revised manuscript.

      Within the methods section:

      - At which concentration did you use the IFNAR antibody and the isotype?

      We updated the Methods section to include the respective concentrations in the revised manuscript.

      - Were mice maintained under SPF conditions? At what age were they used?

      Yes, the mice are specific pathogen free. We used 10 - 14 week old mice for Mtb infection.

      - The BMDM cultivation is not clear. According to your cited paper you use LCCM but can you provide how much M-CSF it contains? How do you make sure that amounts are the same between experiments and do not vary? You do not mention how you actually obtain this conditioned medium. Is there the possibility of contamination or transferred fibroblasts that would impact on the data analysis? Is LCCM also added during stimulation and inhibitor treatment?

      We obtain LCCM by collecting the supernatant from L929 cells that form a confluent monolayer, according to well-established protocols for LCCM collection. The supernatants are filtered through 0.22 micron filters to exclude contamination with L929 cells and bacteria. The medium is prepared in 500 ml batches that are sufficient for multiple experiments. Each batch of L929-conditioned medium is tested for biological activity using serial dilutions.

      - How was the BCG infection performed? How much bacteria did you use? Which BCG strain was used?

      We infected mice with M. bovis BCG Pasteur subcutaneously in the hock using 10<sup>6</sup> CFU per mouse.

      - At what density did you seed the BMDMs for stimulation and inhibitor experiments?

      In 96 well plates, we seed 12,000 cells per well and allow the cells to grow for 4 days to reach confluency (approximately 50,000 cells per well). For a 6-well plate, we seed 2.5 × 10<sup>5</sup> cells per well and culture them for 4 days to reach confluency. For a 24-well plate, we seed 50,000 cells per well and keep the cells in media for 4 days before starting any treatments. This ensures that the cells are in a proliferative or near-confluent state before beginning the stimulation or inhibitor treatments. Our detailed protocol is published in STAR Protocols (Yabaji et al., 2022; PMID 35310069).

      - What machine did you use to perform the bulk RNA sequencing? How many replicates did you include for the sequencing?

      For bulk sequencing, we used 3 RNA samples for each condition. The samples were sequenced at the Boston University Microarray & Sequencing Resource service using an Illumina NextSeq<sup>TM</sup> 2000 instrument.

      - How many replicates were used for the scRNA sequencing? Why is your threshold for the exclusion of mitochondrial DNA so high? A typical threshold of less than 5% has been reported to work well with mouse tissue.

      We used one sample per condition. For the mitochondrial cutoff, we usually base it off of the total distribution. There is no "universal" threshold that can be applied to all datasets. Thresholds must be determined empirically.
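
      To make concrete what a distribution-based cutoff can look like, here is a minimal sketch. It assumes a median plus three median absolute deviations (MAD) rule, which is one common convention and not necessarily the rule applied in this study. The function name and the per-cell percentages are hypothetical.

```python
from statistics import median


def mito_cutoff(percent_mito, n_mads=3.0):
    """Upper cutoff for percent mitochondrial reads: median + n_mads * MAD.

    The MAD rule is an assumption for illustration, not the authors' stated method.
    """
    med = median(percent_mito)
    mad = median([abs(x - med) for x in percent_mito])
    return med + n_mads * mad


# Hypothetical per-cell percentages of mitochondrial reads.
percent_mito = [2.0, 3.0, 4.0, 5.0, 6.0, 30.0]
cutoff = mito_cutoff(percent_mito)
kept = [x for x in percent_mito if x <= cutoff]
```

      Because the cutoff is derived from the dataset itself, it adapts to cell types with inherently higher mitochondrial content instead of imposing a fixed value such as 5%.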

      - You do not mention how many PCAs were considered for the scRNA sequencing analysis.

      We considered 50 principal components (PCs); this information was added to the Methods.

      - You should name all the package versions you used for the scRNA sequencing (e.g. for the slingshot, VAM package)

      The following package versions were used: Seurat v4.0.4, VAM v1.0.0, Slingshot v2.3.0, SingleCellTK v2.4.1, Celda v1.10.0. We added this information to the Methods.

      - You mention two batches for the human samples. Can you specify what the two batches are?

      Human blood samples were collected at five sites, as described in the updated Methods section, and were processed in two separate RNAseq batches, which required batch correction.

      - At which temperature was the IF staining performed?

      We performed the IF at 4 °C. We included the details in the revised version.

      Reviewer #2 (Significance):

      Overall, the manuscript has interesting findings with regard to macrophage responses in Mycobacterium tuberculosis infection.

      However, in its current form there are several shortcomings, both with respect to the precision of the experiments and conclusions drawn.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors use a mouse model designed to be more susceptible to M.tb (addition of sst1 locus) which has granulomatous lesions more similar to human granulomas, making this mouse highly relevant for M.tb pathogenesis studies. Using WT B6 macrophages or sst1B6 macrophages, the authors seek to understand how the sst1 locus affects macrophage response to prolonged TNFa exposure, which can occur during a pro-inflammatory response in the lungs. Single cell RNA-seq revealed clusters of mutant macrophages with upregulated genes associated with oxidative stress responses and IFN-I signaling pathways when treated with TNF compared to WT macs. The authors go on to show that mutant macrophages have decreased NRF2, decreased antioxidant defense genes and less Sp110 and Sp140. Mutant macrophages are also more susceptible to lipid peroxidation and iron-mediated oxidative stress. The IFN-I pathway hyperactivity is caused by the dysregulation of iron storage and antioxidant defense. These mutant macrophages are more susceptible to M.tb infection, showing they are less able to control bacterial growth even in the presence of T cells from BCG vaccinated mice. The transcription factor Myc is more highly expressed in mutant macs during TNF treatment and inhibition of Myc led to better control of M.tb growth. Myc is also more abundant in PBMCs from M.tb infected humans with poor outcomes, suggesting that Myc should be further investigated as a target for host-directed therapies for tuberculosis.

      Major Comments

      Isotypes for IF imaging and confocal IF imaging are not listed, or not performed. It is a concern that the microscopy images throughout the manuscript do not have isotype controls for the primary antibodies.

      Fig 4 (and later) the anti-IFNAR Ab is used along with the Isotype antibody, Fig 4I does not show the isotype. Use of the isotype antibody is also missing in later figures as well as Fig 3J. Why was this left off as the proper control for the Ab?

      We addressed the comment in revised manuscript as described above in summary and responses to reviewers 1 and 2. Isotype controls for IFNAR1 blockade were included in Fig.3M (previously 3J), Fig. 4I, Suppl.Fig.4G (previously Fig.4I), and updated Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G, 7I.

      Conclusions drawn by the authors from some of the WB data are worded strongly, yet by eye the blots don't look as dramatically different as suggested. It would be very helpful to quantify the density of bands when making conclusions. (for example, Fig 4A).

      We added the densitometry of Western blot values after normalization above each lane in Fig.2A-C, Fig.3C-D and 3K; Fig.4A-B, Fig.5B,C,I,J.

      Fig 5A is not described clearly. If the gene expression is normalized to untreated B6 macs, then the level of untreated B6 macs should be 1. In the graph the blue bars are slightly below 1, which would not suggest that levels "initially increased and subsequently downregulated" as stated in the text. It seems like the text describes the protein expression but not the RNA expression. Please check this section and more clearly describe the results.

      We appreciate the reviewer’s comment and modified the text to specify the mRNA and protein expression data, as follows:

      “We observed that Myc was regulated in an sst1-dependent manner: in TNF-stimulated B6 wild type BMDMs, c-Myc mRNA was downregulated, while in the susceptible macrophages c-Myc mRNA was upregulated (Fig.5A). The c-Myc protein levels were also higher in the B6.Sst1S cells in unstimulated BMDMs and 6 – 12 h of TNF stimulation (Fig.5B)”.

      Also, why look at RNA through 24h but protein only through 12h? If c-myc transcripts continue to increase through 24h, it would be interesting to see if protein levels also increase at this later time point.

      The time course of Myc expression up to 24 h is presented in new panels Fig. 5I-5J. It demonstrates the decrease of Myc protein levels at 24 h. In the wild type B6 BMDMs, the levels of Myc protein significantly decreased in parallel with the mRNA suppression presented in Fig.5A. In contrast, we observed the dissociation of the mRNA and protein levels in the sst1 mutant BMDMs at 12 and 24 h, most likely because the mutant macrophages develop an integrated stress response (as shown in our previous publication by Bhattacharya et al., JCI, 2021) that is known to inhibit Myc mRNA translation.

      Fig 5J the bands look smaller after D-JNK1 treatment at 6 and 12h though in the text it says no change. Quantifying the bands here would be helpful to see if there really is no difference.

      This experiment was repeated twice, and the average normalized densitometry values are presented in the updated Fig.5J. The main question addressed in this experiment was whether the hyperactivity of JNK in TNF-stimulated sst1 mutant macrophages contributed to Myc upregulation, as was previously shown in cancer. Comparing effects of JNK inhibition on phospho-cJun and c-Myc protein levels in TNF stimulated B6.Sst1S macrophages (updated Fig.5J), we concluded that JNK did not have a major role in c-Myc upregulation in this context.

      Section 4, third paragraph, the conclusion that JNK activation in mutant macs drives pathways downstream of Myc are not supported here. Are there data or other literature from the lab that supports this claim?

      This statement was based on evidence from the available literature, where JNK was shown to activate oncogenes, including Myc. In addition, inhibition of Myc in our model upregulated ferritin (Fig.5C), reduced the labile iron pool, prevented the LPO accumulation (Fig.5D - G) and inhibited stress markers (Fig.5H). However, we do not have direct experimental evidence in our model that Myc inhibition reduces ASK1 and JNK activities. Hence, we removed this statement from the text and plan to investigate this in the future.

      Fig 6N Please provide further rationale for the BCG in vivo experiment. It is unclear what the hypothesis was for this experiment.

      In the current version, the BCG vaccination data are presented in Suppl.Fig.14B. We demonstrate that stressed BMDMs do not respond to activation by BCG-specific T cells (Fig.6J) and that their unresponsiveness is mediated by type I interferon (Fig.6L and 6M). The observed accumulation of the stressed macrophages in pulmonary TB lesions of the sst1-susceptible mice (Fig.7E, Suppl.Fig.13 and 14A) and the upregulation of the type I interferon pathway (Fig.1E, 1G, 7C, Suppl.Fig.1C and 11) suggested that the effect of further boosting T lymphocytes using BCG in Mtb-infected mice would be neutralized due to the macrophage unresponsiveness. This experiment provides a novel insight explaining why BCG vaccine may not be efficient against pulmonary TB in susceptible hosts.

      The in vitro work is all concerning treatment with TNFa and how this exposure modifies the responses in B6 vs sst1B6 macrophages; however, this is not explored in the in vivo studies. Are there differences in TNFa levels in the pauci- vs multi-bacillary lesions that lead to (or correlate with) the accumulation of peroxidation products in the intralesional macrophages? How do the experiments with TNFa in vitro relate back to how the macrophages are responding in vivo during infection?

      Our investigation of mechanisms of necrosis of TB granulomas stems from and is supported by in vivo studies, as summarized below.

      This work started with the characterization of necrotic TB granulomas in C3HeB/FeJ mice in vivo followed by a classical forward genetic analysis of susceptibility to virulent Mtb in vivo.

      That led to the discovery of the sst1 locus and demonstration that it plays a dominant role in the formation of necrotic TB granulomas in mouse lungs in vivo. Using genetic and immunological approaches we demonstrated that the sst1 susceptibility allele controls macrophage function in vivo (Yan, et al., J.Immunol. 2007) and an aberrant macrophage activation by TNF and increased production of Ifn-b in vitro (He et al. Plos Pathogens, 2013). In collaboration with the Vance lab we demonstrated that the type I IFN receptor inactivation reduced the susceptibility to intracellular bacteria of the sst1-susceptible mice in vivo (Ji et al., Nature Microbiology, 2019). Next, we demonstrated that the Ifnb1 mRNA superinduction results from combined effects of TNF and JNK leading to integrated stress response in vitro (Bhattacharya et al., JCI, 2021). Thus, our previous work started with extensive characterization of the in vivo phenotype that led to the identification of the underlying macrophage deficiency that allowed for the detailed characterization of the macrophage phenotype in vitro presented in this manuscript. In a separate study, the Sher lab confirmed our conclusions and their in vivo relevance using Bach1 knockout in the sst1-susceptible B6.Sst1S background, where boosting antioxidant defense by Bach1 inactivation resulted in decreased type I interferon pathway activity and reduced granuloma necrosis. We have chosen TNF stimulation for our in vitro studies because this cytokine is most relevant for the formation and maintenance of the integrity of TB granulomas in vivo as shown in mice, non-human primates and humans. Here we demonstrate that although TNF is necessary for host resistance to virulent Mtb, its activity is insufficient for full protection of the susceptible hosts, because of altered macrophage responsiveness to TNF. Thus, our exploration of the necrosis of TB granulomas encompasses both in vitro and extensive in vivo studies.

      Minor comments

      Introduction, while well written, is longer than necessary. Consider shortening this section. Throughout figures, many graphs show a fold induction/accumulation/etc, but it is rarely specified what the internal control is for each graph. This needs to be added.

      Paragraph one, authors use the phrase "the entire IFN pathway was dramatically upregulated..." seems to be an exaggeration. How do you know the "entire" IFN pathway was upregulated in a dramatic fashion?

      (1) We shortened the introduction and discussion; (2) verified that the figure legends specify the internal controls that were used to calculate fold induction; (3) removed the word “entire” to avoid overinterpretation.

      Figures 1E, G and H and supp fig 1C, the heat maps are missing an expression key. Section 2 second paragraph refers to figs 2D, E as cytoplasmic in the text, but figure legend and y-axis of 2E show total protein.

      The expression keys were added to Fig.1E,G,H, Fig.7C, Suppl.Fig.1C and 1D and Suppl.Fig.11A of the revised manuscript.

      Section 3 end of paragraph 1 refers to Fig 3h. Does this also refer to Supp Fig 3E?

      Yes, Fig.3H shows microscopy of 4-HNE and Suppl.Fig.3H shows quantification of the image analysis. In the revised manuscript these data are presented in Fig.3H and Suppl.Fig.3F. The text was modified to reflect this change.

      Supplemental Fig 3 legend for C-E seems to incorrectly also reference F and G.

      We corrected this error in the figure legend. New panels were added to Suppl.Fig.3, and the previous Suppl.Fig.3F and G were moved to Suppl.Fig.4 panels C and D of the revised version.

      Fig 3K, the p-cJun was inhibited with the JNK inhibitor, however it’s unclear why this was done or the conclusion drawn from this experiment. Use of the JNK inhibitor is not discussed in the text.

      The JNK inhibitor was used to confirm that c-Jun phosphorylation in our studies is mediated by JNK and to compare effects of JNK inhibition on phospho-cJun and Myc expression. This experiment demonstrated that the JNK inhibitor effectively inhibited c-Jun phosphorylation but not Myc upregulation, as shown in Fig.5I-J of the revised manuscript.

      Fig 4 I and Supp Fig 3 H seem to have been swapped? The graph in Fig 4I matches the images in Supp Fig 3I. Please check.

      We reorganized the panels to provide microscopy images and corresponding quantification together in the revised panels Fig. 4H and Fig. 4I, as well as in Suppl. Fig. 4F and Suppl. Fig. 4G.

      Fig 6, it is unclear what % cell number means. Also for bacterial growth, the data are fold change compared to what internal control?

      We updated the Fig.6 legend to indicate that the cell number percentages were calculated relative to the number of cells at Day 0 (immediately after Mtb infection). We routinely use fixable cell death staining to enumerate cell death. A brief protocol containing this information is included in the Methods section. The detailed protocol, including normalization using a BCG spike, has been published (Yabaji et al., STAR Protocols, 2022). Here we did not present the dead cell percentage, as it remained low and we did not observe damage to macrophage monolayers. This allows us to exclude artifacts due to cell loss. The fold change of Mtb was calculated after normalization to the Mtb load at Day 0 after infection and washes.
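
      The two Day 0 normalizations described above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up counts. It does not model the BCG spike normalization step of the published protocol.

```python
def percent_of_day0(cell_count, day0_cell_count):
    """Cell number expressed as a percentage of the Day 0 cell number."""
    return 100.0 * cell_count / day0_cell_count


def fold_change_vs_day0(mtb_load, day0_mtb_load):
    """Mtb load expressed as fold change relative to the Day 0 load."""
    return mtb_load / day0_mtb_load


# Hypothetical values for one well, used for illustration only.
cells_day0, cells_day5 = 50000, 45000
mtb_day0, mtb_day5 = 1.0e5, 4.0e5

percent_cells = percent_of_day0(cells_day5, cells_day0)
mtb_fold = fold_change_vs_day0(mtb_day5, mtb_day0)
```

      In this hypothetical well, 90% of the Day 0 cells remain while the bacterial load has increased 4-fold, so the growth readout is not confounded by loss of the monolayer.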

      Fig 7B needs an expression key

      The expression key was added to Fig.7C (previously Fig. 7B).

      Supp Fig 7 and Supp Fig 8A, what do the arrows indicate?

      In Suppl.Fig.8 (previously Suppl.Fig.7) the arrows indicate acid fast bacilli (Mtb). In Fig.7A and Suppl.Fig.9A, arrows indicate Mtb expressing the fluorescent reporter mCherry. Corresponding figure legends were updated in the revised version.

      Supp Fig 9A, two ROI appear to be outlined in white, not just 1 as the legend says.

      We updated the figure legend.

      Methods:

      Certain items are listed in the Reagents section that are not used in the manuscript, such as necrostatin-1 or Z-VAD-FMK. Please carefully check the methods to ensure extra items or missing items does not occur.

      These experiments were performed, but not included in the final manuscript. Hence, we removed “necrostatin-1 or Z-VAD-FMK” from the Reagents section in the Methods of the revised version.

      Western blot, method of visualizing/imaging bands is not provided, method of quantifying density is not provided, though this was done for fig 5C and should be performed for the other WBs.

      We used GE ImageQuant LAS4000 Multi-Mode Imager to acquire the Western blot images and the densitometric analyses were performed by area quantification using ImageJ. We included this information in the method section. We added the densitometry of Western blot values after normalization above each lane in Fig.2A-C, Fig.3C-D and 3K; Fig.4A-B, Fig.5B,C,I,J.

      Reviewer #3 (Significance):

      The work of Yabaji et al is of high significance to the field of macrophage biology and M.tb pathogenesis in macrophages. This work builds from previously published work (Bhattacharya 2021) in which the authors first identified the aberrant response induced by TNF in sst1 mutant macrophages. Better understanding how macrophages with the sst1 locus respond not only to bacterial infection but stimulation with relevant ligands such as TNF will aid the field in identifying biomarkers for TB, biomarkers that can suggest a poor outcome vs. "cure" in response to antibiotic treatment or design of host-directed therapies.

      This work will be of interest to those who study macrophage biology and who study M.tb pathogenesis and tuberculosis in particular. This study expands the knowledge already gained on the sst1 locus to further determine how early macrophage responses are shaped that can ultimately determine disease progression.

      Strengths of the study include the methodologies, employing both bulk and single cell-RNA seq to answer specific questions. Data are analyzed using automated methods (such as HALO) to eliminate bias. The experiments are well planned and designed to determine the mechanisms behind the increased iron-related oxidative stress found in the mutant macrophages following TNF treatment. Also, in vivo studies were performed to validate some of the in vitro work. Examining pauci-bacillary lesions vs multi-bacillary lesions and spatial transcriptomics is a significant strength of this work. The inclusion of human data is another strength of the study, showing increased Myc in humans with poor response to antibiotics for TB.

      Limitations include the fact that the work is all done with BMDMs. Use of alveolar macrophages from the mice would be a more relevant cell type for M.tb studies. AMs are less inflammatory, therefore treatment with TNF of AMs could result in different results compared to BMDMs. Reviewer's field of expertise: macrophage activation, M.tb pathogenesis in human and mouse models, cell signaling.

      Limitations: not qualified to evaluate single cell or bulk RNA-seq technical analysis/methodology or spatial transcriptomics analysis.

    1. Within the context of computer systems and networks, “digital” refers to the encoding of information in binary digits (bits), which may occupy only two distinct states (on or off, 1 or 0)

      I can understand why coding uses only 1 and 0: people who write code for computers might get mixed up with different numbers if it had been decided to use all of the numbers. But I still wonder how a coder avoids mixing up different codes, or how they already know what a certain code means.

    1. Reviewer #3 (Public review):

      Summary:

      In this well-written manuscript, Unitt and colleagues propose a new, hierarchical nomenclature system for the pathogen Neisseria gonorrhoeae. The proposed nomenclature addresses a longstanding problem in N. gonorrhoeae genomics, namely that the highly recombinant population complicates typing schemes based on only a few loci and that previous typing systems, even those based on the core genome, group strains at only one level of genomic divergence without a system for clustering sequence types together. In this work, the authors have revised the core genome MLST scheme for N. gonorrhoeae and devised life identification numbers (LIN) codes to describe the N. gonorrhoeae population structure.

      Strengths:

      The LIN codes proposed in this manuscript are congruent with previous typing methods for Neisseria gonorrhoeae, like cgMLST groups, Ng-STAR, and NG-MAST. Importantly, they improve upon many of these methods as the LIN codes are also congruent with the phylogeny and represent monophyletic lineages/sublineages.

      The LIN code assignment has been implemented in PubMLST, allowing other researchers to assign LIN codes to new assemblies and put genomes of interest in context with global datasets.

      Weaknesses:

      The authors correctly highlight that cgMLST-based clusters can be fused due to "intermediate isolates" generated through processes like horizontal gene transfer. However, the LIN codes proposed here are also based on single linkage clustering of cgMLST at multiple levels. It is unclear if future recombination or sequencing of previously unsampled diversity within N. gonorrhoeae merges together higher-level clusters, and if so, how this will impact the stability of the nomenclature.

      The authors have defined higher resolution thresholds for the LIN code scheme. However, they do not investigate how these levels correspond to previously identified transmission clusters from genomic epidemiology studies. It would be useful for future users of the scheme to know the relevant LIN code thresholds for these investigations.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bacterial species that frequently undergo horizontal gene transfer events tend to have genomes that approach linkage equilibrium, making it challenging to analyze population structure and establish the relationships between isolates. To overcome this problem, researchers have established several effective schemes for analyzing N. gonorrhoeae isolates, including MLST and NG-STAR. This report shows that Life Identification Number (LIN) Codes provide for a robust and improved discrimination between different N. gonorrhoeae isolates.

      Strengths:

      The description of the system is clear, the analysis is convincing, and the comparisons to other methods show the improvements offered by LIN Codes.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      We thank the reviewer for their assessment of our paper.

      Reviewer #2 (Public review):

      Summary:

      This paper describes a new approach for analyzing genome sequences.

      Strengths:

      The work was performed with great rigor and provides much greater insights than earlier classification systems.

      Weaknesses:

      A minor weakness is that the clinical application of LIN coding could be articulated in a more in-depth way. The LIN coding system is very impressive and is certainly superior to other protocols. My recommendation, although not necessary for this paper, is that the authors expand their analysis to noncoding sequences, especially those upstream of open reading frames. In this respect, important cis-acting regulatory mutations that might help to further distinguish strains could be identified.

      We thank the reviewer for their comments. LIN code could be applied clinically, for example in the analysis of antibiotic resistant isolates, or to investigate outbreaks associated with a particular lineage. We will update the text to describe this more thoroughly.

      In regards to non-coding sequences: unfortunately, intergenic regions are generally unsuitable for use in typing systems as (i) they are subject to phase variation, which can occlude relationships based on descent; (ii) they are inherently difficult to assemble and therefore can introduce variation due to the sequencing procedure rather than biology. For the type of variant typing that LIN code represents, which aims to replicate phylogenetic clustering, protein encoding sequences are the best choice for convenience, stability, and accuracy. This is not to say that it is not a valid objective to base a nomenclature on intergenic regions, which might be especially suitable for predicting some phenotypic characters, but this will still be subject to problem (ii), depending on the sequencing technology used. Such a nomenclature system should stand beside, rather than be combined with or used in place of, phylogenetic typing. However, we could certainly investigate the relationship between an isolate's LIN code and regulatory mutations in the future.

      Reviewer #3 (Public review):

      Summary:

      In this well-written manuscript, Unitt and colleagues propose a new, hierarchical nomenclature system for the pathogen Neisseria gonorrhoeae. The proposed nomenclature addresses a longstanding problem in N. gonorrhoeae genomics, namely that the highly recombinant population complicates typing schemes based on only a few loci and that previous typing systems, even those based on the core genome, group strains at only one level of genomic divergence without a system for clustering sequence types together. In this work, the authors have revised the core genome MLST scheme for N. gonorrhoeae and devised life identification numbers (LIN) codes to describe the N. gonorrhoeae population structure.

      Strengths:

      The LIN codes proposed in this manuscript are congruent with previous typing methods for Neisseria gonorrhoeae, like cgMLST groups, Ng-STAR, and NG-MAST. Importantly, they improve upon many of these methods as the LIN codes are also congruent with the phylogeny and represent monophyletic lineages/sublineages.

      The LIN code assignment has been implemented in PubMLST, allowing other researchers to assign LIN codes to new assemblies and put genomes of interest in context with global datasets.

      Weaknesses:

      The authors correctly highlight that cgMLST-based clusters can be fused due to "intermediate isolates" generated through processes like horizontal gene transfer. However, the LIN codes proposed here are also based on single linkage clustering of cgMLST at multiple levels. It is unclear if future recombination or sequencing of previously unsampled diversity within N. gonorrhoeae merges together higher-level clusters, and if so, how this will impact the stability of the nomenclature.

      The authors have defined higher resolution thresholds for the LIN code scheme. However, they do not investigate how these levels correspond to previously identified transmission clusters from genomic epidemiology studies. It would be useful for future users of the scheme to know the relevant LIN code thresholds for these investigations.

      We thank the reviewer for their insightful comments. LIN codes do use multi-level single linkage clustering to define the cluster number of isolates. However, unlike previous applications of simple single linkage clustering such as N. gonorrhoeae core genome groups (Harrison et al., 2020), once assigned in LIN code, these cluster numbers are fixed within an unchanging barcode assigned to each isolate. Therefore, the nomenclature is stable, as the addition of new isolates cannot change previously established LIN codes.

      Cluster stability was considered during the selection of allelic mismatch thresholds. By choosing thresholds based on natural breaks in population structure (Figure 3), applying clustering statistics such as the silhouette score, and by assessing where cluster stability has been maintained within the previous core genome groups nomenclature, we can have confidence that the thresholds which we have selected will form stable clusters. For example, with core genome groups there has been significant group fusion with clusters formed at a threshold of 400 allelic differences, while clustering at a threshold of 300 allelic differences has remained cohesive over time (supported by a high silhouette score) and so was selected as an important threshold in the gonococcal LIN code. LIN codes have now been applied to >27000 isolates in PubMLST, and the nomenclature has remained effective despite the continual addition of new isolates to this collection. The manuscript will be revised to emphasise these points.

      Work is in progress to explore what LIN code thresholds are generally associated with transmission chains. These will likely be the last 7 thresholds (25, 10, 7, 5, 3, 1, 0) as previous work has suggested that isolates linked by transmission within one year are associated with <14 single nucleotide polymorphism differences (De Silva et al., 2016). The results of this analysis will be described in a future article, currently in preparation.
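The stability argument in this response can be illustrated with a small sketch. It is a simplified illustration of prefix-based code assignment, not the PubMLST implementation: the threshold list is shortened to the values mentioned in the response (the real scheme has more levels), and the function name and data layout are hypothetical.

```python
# Hypothetical, shortened list of allelic-mismatch thresholds, coarse to fine.
# Only these values appear in the response; the real scheme defines more levels.
THRESHOLDS = [300, 25, 10, 7, 5, 3, 1, 0]


def assign_lin_code(distances, existing_codes):
    """Return a LIN-style code (tuple of ints) for one new isolate.

    distances: {isolate_id: allelic mismatches between that isolate and the new one}
    existing_codes: {isolate_id: code tuple}; this mapping is read, never modified.
    """
    n = len(THRESHOLDS)
    if not existing_codes:
        return (0,) * n
    # Single linkage: only the closest already-coded isolate matters.
    nearest = min(existing_codes, key=lambda iso: distances[iso])
    d = distances[nearest]
    # Count the leading positions shared with the nearest isolate:
    # position i is shared while the distance is within THRESHOLDS[i].
    shared = 0
    while shared < n and d <= THRESHOLDS[shared]:
        shared += 1
    prefix = existing_codes[nearest][:shared]
    if shared == n:
        return prefix  # indistinguishable at the finest level
    # Open a new cluster number at the first position that differs.
    used = [code[shared] for code in existing_codes.values() if code[:shared] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (n - shared - 1)


codes = {}
codes["A"] = assign_lin_code({}, codes)
codes["B"] = assign_lin_code({"A": 5}, codes)
codes["C"] = assign_lin_code({"A": 400, "B": 400}, codes)
```

In this sketch a new isolate only ever receives its own code; codes already stored for A and B are left untouched when C arrives, which is the property the authors describe as a fixed barcode.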

      Harrison, O.B., et al. Neisseria gonorrhoeae Population Genomics: Use of the Gonococcal Core Genome to Improve Surveillance of Antimicrobial Resistance. The Journal of Infectious Diseases 2020.

      De Silva, D., et al. Whole-genome sequencing to determine transmission of Neisseria gonorrhoeae: an observational study. The Lancet Infectious Diseases 2016;16(11):1295-1303.

    1. 1996 was used because this was the year social justice moved to the forefront of the Code of Ethics.

      With the elevation of social justice driven specifically by the 1996 NASW Code of Ethics revision, social workers now have an explicit commitment to systemic advocacy for welfare reform as well as multicultural and civil rights struggles.

    1. On GitHub you will find the code base needed to complete the exercise, on the P3C2-exercice branch. The solution is on the P3C2-solution branch so you can check whether you got it right.

      Hello! I don't quite understand why, in the CSS code of the solution, the "cta" selector is given PADDING: 20PX 30PX when that is not asked for in Robbie Lens's exercise.

    1. Abstract. Background: The central bearded dragon (Pogona vitticeps) is widely distributed in central eastern Australia and adapts readily to captivity. Among other attributes, it is distinctive because it undergoes sex reversal from ZZ genotypic males to phenotypic females at high incubation temperatures. Here, we report an annotated telomere to telomere phased assembly of the genome of a female ZW central bearded dragon. Results: Genome assembly length is 1.75 Gbp with a scaffold N50 of 266.2 Mbp, N90 of 28.1 Mbp, 26 gaps and 42.2% GC content. Most (99.6%) of the reference assembly is scaffolded into 6 macrochromosomes and 10 microchromosomes, including the Z and W microchromosomes, corresponding to the karyotype. The genome assembly exceeds standard recommended by the Earth Biogenome Project (6CQ40): 0.003% collapsed sequence, 0.03% false expansions, 99.8% k-mer completeness, 97.9% complete single copy BUSCO genes and an average of 93.5% of transcriptome data mappable back to the genome assembly. The mitochondrial genome (16,731 bp) and the model rDNA repeat unit (length 9.5 Kbp) were assembled. Male vertebrate sex genes Amh and Amhr2 were discovered as copies in the small non-recombining region of the Z chromosome, absent from the W chromosome. This, coupled with the prior discovery of differential Z and W transcriptional isoform composition arising from pseudoautosomal sex gene Nr5a1, suggests that complex interactions between these genes, their autosomal copies and their resultant transcription factors and intermediaries, determines sex in the bearded dragon. Conclusion: This high-quality assembly will serve as a resource to enable and accelerate research into the unusual reproductive attributes of this species and for comparative studies across the Agamidae and reptiles more generally. Species Taxonomy: Eukaryota; Animalia; Chordata; Reptilia; Squamata; Iguania; Agamidae; Amphibolurinae; Pogona; Pogona vitticeps (Ahl, 1926) (NCBI:txid103695).

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf085), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Yuan Li

      The authors de novo assembled a telomere to telomere phased genome assembly of the Australian central bearded dragon Pogona vitticeps, using PacBio HiFi, ONT, HiC, and Illumina sequencing platforms. The assembly achieves remarkable contiguity (scaffold N50: 266.2 Mb) and completeness (97.9% BUSCO score), surpassing Earth Biogenome Project standards. The phased assembly of sex chromosomes (Z/W) and identification of candidate sex-determining genes (Amh, Amhr2, and Nr5a1) provide valuable insights into reptilian sex determination. Overall, the study is well-executed and provides a valuable resource for comparative genomics and reproductive biology.

      Major concerns:

      1. The description of read depth had errors at lines 401-402, such as 60.6x. In addition, "4 x promethION" and "2x150 bp" should be revised; please check and revise all similar descriptions in the manuscript.
      2. There are errors in the citation format of the journal references, such as the absence of the punctuation mark "." between the title and the journal name at lines 1005-1009, and mixed abbreviations (e.g., "PNAS" vs. "Proceedings of the National Academy of Sciences USA") (lines 988-990, 1005-1009). Please carefully check the format of all references.
      3. The scripts "calculateGC.py" and "processtrftelo.py" (lines 242 and 245) are mentioned without code availability or parameter details. Provide working links or repository access.
      4. "Gb" and "Gbp" are used inconsistently; it is recommended to adopt a single form.
      5. Units are missing in multiple places in Tables 1 and 2, such as the units for "Total Bases" and "Assembly length"; please include them.
      6. At lines 683-687, the conclusion that Amh/Amhr2 are sex-determining genes relies solely on positional evidence. Discuss the need for functional studies (e.g., CRISPR knockouts) to strengthen the claims.
      7. There were errors in "Vasimuddin et al. 2019" (line 238) and "Danecek et al. 2021" (line 239). Please check the format of all other references.
      8. At lines 476-481, BAC mappings are cited as validation but lack visual evidence (e.g., alignment plots in figures or supplements). Please verify the accuracy of Figure 7 at line 478, as it does not correspond with the description.

    1. Reviewer #2 (Public review):

      This paper describes an analysis of a commercially available panel for a spatial transcriptomic approach and introduces a computational tool to predict potential off-target binding sites for the type of probe used in the aforementioned panel. The performance of the prediction tool was validated by examining a dataset that profiled the same cancer tissue with multiple modalities. Finally, a detailed analysis of the potential pitfalls in a published study communicated by the company that commercialized the spatial transcriptomic platform in question is provided, along with best practice guidelines for future studies to follow.

      Strengths:

      The manuscript is clearly written and easy to follow.

      The authors provide clean, organized, and well-documented code in the associated GitHub repository.

      Weaknesses:

      The manuscript section on the software tool feels underdeveloped.

    1. ☑️ peer.gos.ck-editor needs to set title so that annotations can show it

      currently it is done by manually adding a title tag in the source code for the document that is saved in Peergos

      where all the HTML and JavaScript encloses the source of the HTML document, so the editor/capability gets loaded with the saved HTML content

    1. You need to claim ownership on Visual Studio Code's installation directory, by running this command:
       sudo chown -R $(whoami) "$(which code)"
       sudo chown -R $(whoami) /usr/share/code

      "claim ownership"

  3. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. 6 different JUnit tests for each function future-proofing development

      You definitely want more unit tests, and also, unit tests do not future-proof anything! It just makes developers know "oh crap, the code should follow this path instead of the other way around" if they make any dumb breaking changes. If your library changes, that can be potentially future-breaking. Same thing with deployments as well!

    2. Revised, modularized, and updated old code bases to modern code bases.

      What does this mean? I'd probably prefer you to talk about the infrastructure rather than "I wrote new codes and sometimes written stuff over pre-written code"

    1. but this is the driving range and nobody official from the club has mentioned anything to us.


      In this excerpt, the speaker argues that they are at the driving range and emphasizes that no official from the club has raised concerns about their dress code. This suggests they believe their attire is acceptable as it aligns with the informal setting of the driving range, highlighting a disconnect between personal views on dress and perceived official standards.


    1. (c) Code Sets.-- ``(1) In general.--The Secretary shall adopt standards that-- ``(A) select code sets for appropriate data elements for the transactions referred to in subsection (a)(1) from among the code sets that have been developed by private and public entities; or ``(B) establish code sets for such data elements if no code sets for the data elements have been developed. ``(2) Distribution.--The Secretary shall establish efficient and low-cost procedures for distribution (including electronic distribution) of code sets and modifications made to such code sets under section 1174(b).

      This is the HIPAA right to identify code sets etc.

    1. We find the stated advantages of a 10-position numeric identifier convincing. We have revised proposed § 142.402 (now § 162.406(a)) to provide that the NPI will be a 10-position numeric identifier, with the 10th position being an ISO standard check digit.

      NPI mandated to be a 10 digit code with a check digit as per the ISO standard.
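A check digit of this kind can be verified in a few lines. The sketch below assumes the check digit is the Luhn formula computed over the prefix 80840 followed by the first nine NPI digits; that prefix and the choice of Luhn are not stated in the quoted passage and are assumptions here, as are the function names.

```python
# Assumption: prefix prepended before the Luhn calculation (not in the quoted rule text).
NPI_PREFIX = "80840"


def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # rightmost payload digit, then every second digit leftward
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_valid_npi(npi: str) -> bool:
    """Check length, digits, and the 10th-position check digit."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    return luhn_check_digit(NPI_PREFIX + npi[:9]) == int(npi[9])
```

For instance, the nine digits 123456789 yield check digit 3 under these assumptions, so `is_valid_npi("1234567893")` is true while any other final digit fails.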

    2. We define two categories of health care providers for enumeration purposes. A data element, the “Entity type code,” in the NPS record for each health care provider will indicate the appropriate category. • NPIs with an “Entity type code” of 1 will be issued to health care providers who are individual human beings. Examples of health care providers with an “Entity type code” of 1 are physicians, dentists, nurses, chiropractors, pharmacists, and physical therapists. • NPIs with an “Entity type code” of 2 will be issued to health care providers other than individual human beings, that is, organizations. Examples of health care provider organizations with an “Entity type code” of 2 are: hospitals; home health agencies; clinics; nursing homes; residential treatment centers; laboratories; ambulance companies; group practices; health maintenance organizations; suppliers of durable medical equipment, supplies related to health care, prosthetics, and orthotics; and pharmacies

      Type 1 (individual) and Type 2 (organizational) are mandated here.

    1. Once the implementation is complete, instruct it to commit as shown in Listing 30; Claude Code analyzes the changes, writes an appropriate commit message, and performs the commit.

      I would also like to see the commit message that Claude Code wrote.

    1. Security-conscious configuration

      If nothing is configured, does every command simply get executed? And is the allow list something you build up as needed? That is what I was wondering about.

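      Incidentally, Kiro stops every time it meets a command it has not used before and, like Claude Code, follows an allowlist approach in which you add settings such as "mkdir *" as you go.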

    1. Essential derivations are contained within the main text, without reference to appendices. For a more interlinked reading experience, the thesis is also available in an online form at https://redeboer.github.io/phd-thesis, the source code of which can be easily reused and extended.

      So your motivation for the thesis is to write it in an awesome format so that everyone understands it? :D I would maybe generalize that to the framework instead. After all, your motivation was to build a framework that has a self-documenting workflow.

    1. https://bytes.vadeai.com/how-one-clojure-function-destroyed-agent-framework-completely/

      How Clojure's iteration Function Disrupts Agent Frameworks

      Agent Framework Pitfalls

      Traditional agent frameworks like CrewAI introduce significant complexity with configuration files, rigid agent definitions, and orchestration mechanics. They require developers to manage state, orchestrate tasks, handle errors, and manage resources, but still abstract away critical decisions, making debugging and customization difficult[1].

      The Simplicity of iteration

      Clojure 1.11 introduced the iteration function, which models any sequential, stateful process — including agentic workflows — far more simply than the framework approach. Its key parameters:

      • step: Does the work (e.g., LLM call, tool execution)
      • initk: Starting state (prompt, initial data)
      • vf: Extracts the meaningful result from each step
      • kf: Determines the next state for the following iteration
      • somef: Decides if the workflow continues or stops

      This aligns perfectly with agentic workflows:

      • step: agent action
      • initk: initial task/state
      • vf: extract agent output
      • kf: update agent context/state
      • somef: goal/termination checker[1]
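For readers who do not use Clojure, the roles of these five parameters can be sketched as a Python generator. This is only an analogue written for illustration, assuming the defaults are "result is not None" for somef and identity for vf and kf; the toy step function and its "echo" response are hypothetical stand-ins for a real LLM call.

```python
def iteration(step, *, initk=None,
              somef=lambda ret: ret is not None,
              vf=lambda ret: ret,
              kf=lambda ret: ret):
    """Yield vf(ret) for each step result until somef fails or kf returns None."""
    ret = step(initk)
    while somef(ret):
        yield vf(ret)
        k = kf(ret)
        if k is None:  # no continuation state: stop
            break
        ret = step(k)


def make_step(max_iterations):
    """Build a toy agent step; the response is a placeholder for an LLM call."""
    def step(state):
        if state["iteration"] >= max_iterations:
            return None  # no result, so somef fails and the loop ends
        response = f"echo:{state['prompt']}"
        return {
            "response": response,
            "next": {"iteration": state["iteration"] + 1, "prompt": response},
        }
    return step


results = list(iteration(
    make_step(3),
    initk={"iteration": 0, "prompt": "hi"},
    vf=lambda ret: ret["response"],   # what the caller sees
    kf=lambda ret: ret["next"],       # state fed to the next step
))
```

Because the whole loop is an ordinary generator, the state can be printed or inspected between steps, which is the debugging benefit the summary attributes to this pattern.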

      Real World Example

      A basic agent workflow with iteration in Clojure:

      ```clojure
      (defn simple-agent-workflow [initial-prompt max-iterations]
        (let [llm-instance (create-llm-instance)
              step-fn (fn [{:keys [iteration prompt response]}]
                        (when (< iteration max-iterations)
                          (let [messages [(create-message :user prompt)]
                                new-response (generate llm-instance messages)
                                next-prompt (extract-next-task new-response)]
                            {:iteration iteration
                             :prompt prompt
                             :response new-response
                             :next-token {:iteration (inc iteration)
                                          :prompt next-prompt
                                          :response new-response}})))]
          (iteration step-fn
                     :somef (fn [res] (some? res))
                     :vf identity
                     :kf :next-token
                     :initk {:iteration 0 :prompt initial-prompt :response {}})))
      ```

      Production variants in Vade AI simply expand this pattern for live API streaming, logging, and complex branching, without introducing unnecessary abstraction or opaque state[1].

      Benefits Over Frameworks

      • Complete control: Every workflow step and state transition is transparent and customizable.
      • Easy debugging: Print or inspect state at any moment; no special debugging tools needed.
      • Flexible termination: Workflow can halt based on any custom logic, not just predefined callbacks.
      • Resource efficiency: No framework overhead, predictable and low memory footprint.
      • Streaming and real-time: Can process operations incrementally as LLM responses stream in, with immediate visibility for the user[1].
      • Composability: Integrates natively with the rest of Clojure — no framework lock-in.

      When to Use This Approach

      The iteration pattern is ideal when you need:

      • Custom agent behaviors
      • Transparent workflows
      • Performance optimization
      • Complex branching or termination logic
      • Deep integration with Clojure systems

      Especially powerful for research and analysis, planning systems, validation pipelines, and unique business logic that standard frameworks struggle to express[1].

      Key Takeaways

      Frameworks often create more complexity than they solve. By embracing Clojure’s iteration, you implement agentic workflows with less code, greater clarity, and full control. This enables adaptive, resource-aware, and highly debuggable systems — proven at scale inside Vade AI[1].

      Citations: [1] How One Clojure Function Destroyed Agent Framework Completely https://bytes.vadeai.com/how-one-clojure-function-destroyed-agent-framework-completely/

  4. siraj-samsudeen.github.io siraj-samsudeen.github.io
    1. # This fails at compile time (missing required field) %User{name: "Alice"} # Error: missing required key :email

      It says %User{name: "Alice"} fails at compile time with a missing required field error, but when I tested it, it works fine and just sets the other fields to nil. There doesn't seem to be any required field validation happening here.

    2. Enum.reduce(numbers, fn x, acc -> x + acc end)

      The Python example uses reduce with a default value (0), but the Elixir translation uses Enum.reduce/2 without a default value. For a true equivalent, it should use Enum.reduce(numbers, 0, fn x, acc -> x + acc end) to match the Python behavior.
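The difference the note points at shows up on the Python side as well. The short sketch below (variable names are illustrative) compares functools.reduce with and without an initial value; the empty-list case is where the two forms diverge, much as Enum.reduce/2 raises on an empty enumerable while Enum.reduce/3 returns the initial accumulator.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# With an initial value: the accumulator starts at 0.
total_with_init = reduce(lambda acc, x: acc + x, numbers, 0)

# Without an initial value: the first element seeds the accumulator.
total_without_init = reduce(lambda acc, x: acc + x, numbers)

# On an empty list only the form with an initial value has an answer.
empty_with_init = reduce(lambda acc, x: acc + x, [], 0)
try:
    reduce(lambda acc, x: acc + x, [])
    empty_without_init_raises = False
except TypeError:
    empty_without_init_raises = True
```

Note the argument order: Python's callback receives (accumulator, element), whereas the Elixir callback receives (element, accumulator).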

    3. |> System.system_time()

      Logic error: Cannot pipe to System.system_time() - this function doesn't accept string input.

      • According to Elixir docs: System.system_time() :: integer() - takes no arguments and returns an integer
      • System.system_time(unit) variant takes only a time unit parameter, not piped string data
    4. Required for zero-argument functions data |> String.trim()

      String.trim() is NOT zero-argument

      • String.trim/1 requires a string argument
      • String.trim/2 optionally takes characters to trim

      Comment "zero-argument functions" is wrong

    5. .title()

      The Python example uses .title(), but the Elixir version uses String.capitalize(), and these don't produce the same output. The Python side should probably use .capitalize() for a true equivalent, since Elixir doesn't have a title() function.

      The outputs will also be different because of this.
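The mismatch is easy to see on the Python side with the same input string used in the tutorial's example. The variable names below are illustrative.

```python
name = " alice johnson "

# .title() capitalizes the first letter of every word.
titled = name.strip().title()

# .capitalize() capitalizes only the first character and lowercases the rest,
# which is the behavior String.capitalize/1 has in Elixir.
capitalized = name.strip().capitalize()

print(titled)       # Alice Johnson
print(capitalized)  # Alice johnson
```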

    6. def clean_name(raw_name) do raw_name |> String.trim() |> String.capitalize() end name = " alice johnson " result = clean_name(name) IO.puts(result) # "Alice Johnson"

      The def syntax keeps appearing in examples, but I can't replicate it in the iex terminal. It seems like this only works in .ex files, but there's no explanation on how to create and run those files.

    7. I can see you got 7 words instead of the expected 6, but let me check your logic… Looking at your final filtered list: ["Hello", "world", "Elixir", "love", "programming", "Code", "every"] You used >= 4 (4 or more characters), since exercise asked for filtering out words “shorter than 4 characters”, which means keeping words with 4 or more characters - so your logic is actually correct!

      In the exercise above, it says 'Expected: 7', but here it says 'you got 7 words instead of the expected 6'. This part seems unnecessary.

    8. iex(67)> sentences = Enum.map(sentences, fn x -> String.split(x) end) [ ["Hello", "world"], ["Elixir", "is", "fun"], ["I", "love", "programming"], ["Code", "every", "day"] ] iex(73)> words = Enum.filter(words, fn x -> String.length(x) >= 4 end) ["Hello", "world", "Elixir", "love", "programming", "Code", "every"] iex(74)> length(words) 7

      This example is missing the List.flatten step.

    9. # With parentheses (also valid) length([1, 2, 3]) # Also works IO.puts("hello") # Also works String.upcase("hello") # Also works

      The parentheses examples could be shown first, since that's what Python devs are already familiar with, then introduce the optional syntax.

    10. # def is also used to create named functions def add(x, y) do x + y end

      According to the explanation above, everything returns a value in Elixir. Isn't def in the example above a statement, just as it is in Python?

    11. This works in Elixir (but fails in Python):

      I got a CompileError when using the ^.

      ```elixir
      iex(7)> x = 6
      6
      iex(8)> 6 = x
      6
      iex(9)> 6 = 5 + 1
      6
      iex(10)> a = 5
      5
      iex(11)> ^x = ^a + 1
      error: misplaced operator ^a

      The pin operator ^ is supported only inside matches or inside custom macros. Make sure you are inside a match or all necessary macros have been required
      └─ iex:11

      ** (CompileError) cannot compile code (errors have been logged)
      ```

    12. a + 1 = x

      I tried the example, but a + 1 = x throws a compile error. The book says this should work 'both directions like algebra', but Elixir won't let me put an expression on the left side of the match. Am I missing something, or is this example incorrect?
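
      As far as I can tell, the left side of `=` must be a pattern, and arithmetic such as `a + 1` is not a valid pattern, which would explain the compile error. A sketch of forms that do compile, reusing `a` and `x` from the example:

      ```elixir
      a = 5
      x = 6

      # The expression goes on the right; a literal or pinned variable goes on the left.
      6 = a + 1
      ^x = a + 1

      # "Solving" for a has to be done explicitly; the match operator does not do algebra.
      a_from_x = x - 1
      IO.inspect(a_from_x == a)
      ```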

    13. go to this page, install elixir and do the steps mentioned there to say hello world interactively in iex and by running an elixir code file. Elixir Installation Guide

      The essential installation link is provided at the end after all the instructions. Is it okay?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors demonstrate that female Spodoptera littoralis moths prefer to oviposit on well-watered tomato plants and avoid drought-stressed plants. The study then recorded the sounds produced by drought-stressed plants and found that they produce 30 ultrasonic clicks per minute. Thereafter, the authors tested the response of female S. littoralis moths to clicks with a frequency of 60 clicks per minute in an arena with and without plants and in an arena setting with two healthy plants of which one was associated with 60 clicks per minute. These experiments revealed that in the absence of a plant, the moths preferred to lay eggs on the side of the arena in which the clicks could be heard, while in the presence of a plant the S. littoralis females preferred to oviposit on the plant where the clicks were not audible. In addition, the authors also tested the response of S. littoralis females in which the tympanic membrane had been pierced, making the moths unable to detect the click sounds. As hypothesised, these females placed their eggs equally on both sides of the arena.

      Finally, the authors explored whether the female oviposition choice might be influenced by the courtship calls of S. littoralis males which emit clicks in a range similar to a drought-stressed tomato plant. However, no effect was found of the clicks from ten males on the oviposition behaviour of the female moths, indicating that the females can distinguish between the two types of clicks. Besides these different experiments, the authors also investigated the distribution of egg clusters within a longer arena without a plant, but with a sugar-water feeder. Here it was found that the egg clusters were mostly aggregated around the feeder and the speaker producing 60 clicks per minute. Lastly, video tracking was used to observe the behaviour of the moths in the arena without a plant, which demonstrated that the moths gradually spent more time at the arena side with the click sounds.

      We thank the reviewers for their helpful comments. We agree with the summary, but would like to note that in the control experiment (Figure 2) we used a click rate of 30 clicks per minute—a design choice driven by the editor’s feedback. We have clarified this and, to further probe the system’s dynamics, added a second experiment employing the same click rate (30 clicks per minute) with a dehydrated plant (see details below). In both experiments, females again showed a clear tendency to oviposit nearer the speaker; these findings are described in the updated manuscript.

      (2) The study addresses a very interesting question by asking whether female moths incorporate plant acoustic signals into their oviposition choice. Unfortunately, I find it very difficult to judge how big the influence of the sound on the female choice really is, as the manuscript does not provide any graphs showing the real numbers of eggs laid on the different plants, but instead only provides graphs with the Bayesian model fittings for each of the experiments. In addition, the numbers given in the text seem to be relatively similar with large variations, e.g. Figure 1B3: 1.8 ± 1.6 vs. 1.1 ± 1.0. Furthermore, the authors do not provide access to any of the raw data or scripts of this study, which also makes it difficult to assess the potential impact of this study. Hence, I would very much like to encourage the authors to provide figures showing the measured values as boxplots including the individual data points, especially in Figure 1, and to provide access to all the raw data underlying the figures.

      We acknowledge that there are researchers who favor Bayesian graphical representation versus raw data visualization. Therefore, we have added chartplots of the raw data from Figure 1 in the supplementary section. We are aware of the duplication in presentation and apologize for this redundancy.  

      Regarding the variance and means we obtained in our experiment, we have analyzed all raw data using the statistical model presented, and if statistical significance was found despite a particular mean difference or variance, this is meaningful from a biological perspective. One can certainly discuss whether this difference has biological importance, but it should be remembered that in this experimental system, we are trying to isolate the acoustic signal from a complex system that includes multiple signals. Therefore, at no point have we suggested that this is a standalone factor; rather, we proposed it as an informative and significant component.

      In addition to the experiments described above, we conducted an experiment in which we counted both eggs and clusters. The results indicate that cluster counts are a reliable proxy for reproductive investment at a given location. In this experiment, we present cluster numbers alongside egg counts (Figure 2).

      Furthermore, we apologize for the technical error that prevented our uploaded data files from reaching the reviewers. We have also uploaded updated data and code.

      (3) Regarding the analysis of the results, I am also not entirely convinced that each night can be taken as an independent egg-laying event, as the number of eggs and the place where the eggs are laid by a female moth surely depend on the previous oviposition events. While I must admit that I am not a statistician, I would suggest, from a biological point of view, that each group of moths should be treated as a replicate and not each night. I would therefore also suggest analysing the sum of eggs laid over the different consecutive nights rather than taking the eggs laid in each night as an independent data point.

      We thank the reviewer for this question. This is a valid point that we will address in three parts:

      First, regarding our statistical approach, we used a model that takes into account the sequence of nights and examines whether there is an effect of the order of nights, i.e., we used GLMMs, with the night nested within the repetition. This is equivalent to addressing this as a repeated measure and is, to our best knowledge, the common way to treat such data. 

      Second, following the reviewer's comment, we also reran the statistics of the third experiment (i.e., “sound gradient experiments”, Figure 2 and Supplementary figure 4) when only taking the first night when the female/s laid eggs to avoid the concern of dependency. This analysis revealed the same result – i.e., a significant preference for the sound stimulus. We have now updated our methods and results section to clarify this point.  

      Third, an important detail that may not have been clearly specified in the methods: at the end of each night, we cleaned the arena of counted egg clusters using a cloth with ethanol, so that on the subsequent night there would be no evidence of previous oviposition, although this does not exclude some form of physiological or cognitive memory. We have now updated our methods section to clarify this important procedural point.

      (4) Furthermore, it did not become entirely clear to me why a click frequency of 60 clicks per minute was used for most experiments, while the plants only produce clicks at a range of 30 clicks per minute. Independent of the ecological relevance of these sound signals, it would be nice if the authors could provide a reason for using this frequency range. Besides this, I was also wondering about the argument that groups of plants might still produce clicks in the range of 60 clicks per minute and that the authors' tests might therefore still be reasonable. I would agree with this, but only in the case that a group of plants with these sounds would be tested. Offering the choice between two single plants while providing the sound from a group of plants is in my view not the most ecologically reasonable choice. It would be great if the authors could modify the argument in the discussion section accordingly and further explore the relevance of different frequencies and dB levels.

      This is an excellent point. We originally increased the click rate to generate a strong signal. However, it was important for us to verify that there was ecological relevance in the stimulus we implemented in the system. For this purpose, we recorded a group of dehydrated plants at a distance of ~20 cm and we measured a click rate of 20 clicks per minute (i.e., 0.33 Hz) (see Methods section). Therefore, as mentioned at the beginning of this letter, in the additional experiment described in Figure 2, we reduced the click frequency to 30 clicks per minute, and at this lower rate, the effect was maintained. Increasing plant density would probably lead to a higher rate of 30 clicks per minute.
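
      For reference, the click rates discussed in this response convert to events per second as follows. This is plain unit arithmetic on the values quoted above, not an additional measurement.

      ```latex
      \[
      20\ \tfrac{\text{clicks}}{\text{min}} = \tfrac{20}{60}\ \text{s}^{-1} \approx 0.33\ \text{Hz}, \qquad
      30\ \tfrac{\text{clicks}}{\text{min}} = \tfrac{30}{60}\ \text{s}^{-1} = 0.5\ \text{Hz}, \qquad
      60\ \tfrac{\text{clicks}}{\text{min}} = \tfrac{60}{60}\ \text{s}^{-1} = 1\ \text{Hz}.
      \]
      ```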

      (5) Finally, I was wondering how transferable the findings are towards insects and Lepidopterans in general. Not all insects possess a tympanic organ and might therefore not be able to detect the plant clicks that were recorded. Moreover, I would imagine that generalist herbivores like Spodoptera might be more inclined to use these clicks than specialists, which very much rely on certain chemical cues to find their host plants. It would be great if the authors would point more to the fact that their study only investigated a single moth species and that the results might therefore only hold true for S. littoralis and closely related species, but not necessarily for other moth species such as Sphingidae or even butterflies.

      Good point. Our research uses a specific model system of one moth species and one plant species in a particular plant-insect interaction where females select host plants for their offspring. As with any model-based research that attempts to draw broader conclusions, we've taken care to distinguish between our direct findings and potential wider implications. We believe our system may represent mechanisms relevant to a wider group of herbivorous insects with hearing capabilities, particularly considering that several moth families and other insect orders can detect ultrasound. However, additional research examining more moth and plant species is necessary to determine how broadly applicable these findings are. We have made these clarifications in the text.

      Reviewer #2 (Public review):

      (6) The results are intriguing, and I think the experiments are very well designed. However, if female moths use the sounds emitted by dehydrated plants as cues to decide where to oviposit, the hypothesis would predict that they would avoid such sounds. The discussion mentions the possibility of a multi-modal moth decision-making process to explain these contradictory results, and I also believe this is a strong possibility. However, since this remains speculative, careful consideration is needed regarding how to interpret the findings based solely on the direct results presented in the results section.  

      Thank you for this insightful observation. We agree that the apparent attraction of females to dehydrated-plant sounds contradicts our initial prediction. Having observed this pattern consistently across multiple setups, we have now added a targeted choice experiment to the revised manuscript, in which female moths were offered a choice between dehydrated plants broadcasting their natural ultrasonic emissions and a control. These results, detailed in the Discussion and presented in full in the Supplementary Materials (Supplementary Figure 4), show that when only a dehydrated plant is available, moths prefer it for oviposition, supporting our hypothesis that in the absence of a real plant, the plant’s sounds might represent a plant.

      (7) Additionally, the final results describing differences in olfactory responses to drying and hydrated plants are included, but the corresponding figures are placed in the supplementary materials. Given this, I would suggest reconsidering how to best present the hypotheses and clarify the overarching message of the results. This might involve reordering the results or re-evaluating which data should appear in the main text versus the supplementary materials.

      Thank you for this suggestion. We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues. We agree that a detailed investigation of multimodal interactions deserves a separate study, which we plan to pursue in future work. 

      (8) There were also areas where more detailed explanations of the experimental methods would be beneficial.

      Thank you for highlighting this point. We have expanded and clarified the Methods section to provide comprehensive detail on our experimental procedures.

      Reviewer #1 (Recommendations for the authors):

      (9) Line 1: Please include the name of the species you tested also in the title as your results might not hold true for all moth species.

      We do not fully agree with this comment. Please see comment 5.

      (10) Line 19-20: Please rephrase the sentence so that it becomes clear that the "dehydration stress" refers to the plant and not to the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (11) Line 31: Male moths might provide many different signals to the females, maybe better "male sound signals" or similar.

      Thank you for the suggestion; we have clarified the text accordingly.

      (12) Line 52-53: Maybe mention here that not all moth species have evolved these abilities.

      Thank you for the suggestion; we have clarified the text accordingly.

      (13) Line 77: add a space after 38.

      Thank you for the suggestion; we have clarified the text accordingly.

      (14) Line 88: Maybe change "secondary predators" to "natural enemies".

      Thank you for the suggestion; we have clarified the text accordingly.

      (15) Line 134: Why is "notably" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (16) Line 140-144: If you did perform the experiment also with the more ecological relevant playback rate, why not present these findings as your main results and use the data with the higher playback frequency as additional support?

      Thank you for this suggestion. We agree that the ecologically relevant playback data are important, as described in detail at the beginning of this letter and in comment 4. However, to preserve a clear and cohesive narrative, we have maintained the original ordering of this section. The various experiments conducted in Figure 1 differ in several components from Figure 2 and from the work that examined sounds in plant groups in the appendices. We therefore find it more appropriate to use them as supporting evidence for the main findings rather than to create a comparison between different experimental systems, and we chose to keep them as a separate description. The ecological playback findings (Lines 140–144) remain fully described in the Results and serve to reinforce the main observations without interrupting the manuscript's flow.

      (17) Line 146: Please explain already here how you deafened the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (18) Line 181: should it be "male moths' " ?

      Thank you for the suggestion; we have clarified the text accordingly.

      (19) Line 215: Why is "without a plant" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (20) Line 234: I do not understand why this type of statistic was used to analyse the electroantennogram (EAG) results. Would a rather simple Student's t-test or a Wilcoxon rank-sum test not have been sufficient? I would also like to caution you not to overinterpret the data derived from the EAG: as you combined the entire headspace into one mixture, it is no longer possible to derive information on the different volatiles in the blends. The differences you observe might therefore mostly be due to the amount of emitted volatiles.

      We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues (See comment 7). 

      (21) Line 268: It might be nice to add an additional reference here referring to the multimodal oviposition behaviour of the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (22) Line 284: If possible, please add another reference here referring to the different cues used by moths during oviposition.

      Thank you for the suggestion; we have clarified the text accordingly.

      (23) Line 336: What do you mean by "closed together"?

      Thank you for the suggestion; we have clarified the text accordingly.

      (24) Line 434-436: Please see my overall comments. I do not think that you can call it ecologically relevant if the signal emitted by multiple plants is played in the context of just a single plant.

      Please see comments 1 and 4.

      (25) Line 496: Please change "stats" to statistics.

      Thank you for the suggestion; we have clarified the text accordingly.

      (26) Line 522-524: I am not sure whether simply listing their names does give full credit to the work these people did for your study. Maybe also explain how they contributed to your work.

      Thank you for the suggestion; we have clarified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      (27) L54 20-60kHz --> 20Hz-60kHz or 20kHz - 60kHz?

      OK. We have replaced it.

      (28) L124 Are the results for the condition where nothing was placed and the condition where a decoy silent resistor was placed combined in the analysis? If so, were there no significant differences between the two conditions? Comparing these with a condition presenting band-limited noise in the same frequency range as the drought-stressed sounds might also have been an effective approach to further isolate the specific role of the ultrasonic emissions.

      We used both conditions due to technical constraints and pooled them together for analysis. Statistical tests confirmed no significant differences between them, and this clarification, including the results of the statistical test, has now been added to the Methods section.

      (29) L125 (Fig. 1A), see Exp. 1 in the Methods). -> (Fig.1B. See Exp.1 in the Methods).

      Thank you for the suggestion; we have clarified the text accordingly.

      (30) L132 "The opposite choice to what was seen in the initial experiment (Fig.1B)"

      Thank you for the suggestion; we have clarified the text accordingly.

      (31) L137-143 If you are writing about results, why not describe them with figures and statistics? The current description reads like a discussion.

      These findings were not among our primary research questions; however, we believe that including them in the Results section underscores the experimental differences. In our opinion, introducing an additional figure or expanding the statistical analysis at this point would disrupt the narrative flow and risk confusing the reader.

      (32) L141 "This is higher than the rate reported for a single young plant" Are you referring to the tomato plants used in the experiments? It might be helpful to include in the main text the natural click rate emitted by tomato plants, as this information is currently only mentioned in the Methods section.

      See comment 4.  

      (33) L191 Is the main point here to convey that the plant playback effect remained significant even when the sound presentation frequency was reduced to 30 clicks per minute? The inclusion of the feeder element, however, seems to complicate the message. To simplify the results, moving the content from lines 185-202 to the supplementary materials might be a better approach. Additionally, what is the rationale for placing the sugar solution in the arena? Is it to maintain the moths' vitality during the experiment? Clarifying this in the methods section would help provide context for this experimental detail.

      In this series of experiments, we manipulated four variables—single moths, ultrasonic click rate, arena configuration (from a two-choice design to an elongated enclosure), and the response metric (total egg counts rather than cluster counts)—to evaluate moth oviposition under more ecologically realistic conditions. We demonstrate the system’s robustness and validity in a more realistic setting (by tracking individual moths, counting single eggs, etc.).  

      As noted in the text, feeders were included to preserve the moths’ natural behavior and vitality. We have further clarified this in the revised manuscript.

      (34) L215 Is the click presentation frequency 30 or 60 per minute? Since Figure 3 illustrates examples of moth movement from the experiment described in Figure 1, it might be more effective to present Figure 3 when discussing the results of Figure 1 or to include it in the supplementary materials for better clarity and organization.

      See comments 1 and 4. As mentioned in the above 

      (35) L291 Please provide a detailed explanation of the experiments and measurements for the results shown in Figure S3 (and Figure S2). If the multi-modal hypothesis discussed in the study is a key focus, it might be better to include these results in the main results section rather than in the supplementary materials.

      Thank you for this suggestion. Figure S2 was removed (see comments above). We have now added context for Figure S3.

      (36) L303 It might be helpful to include information about the relationship between the moth species used in this study and tomato plants somewhere in the text. This would provide an important context for understanding the ecological relevance of the experiments.

      Thank you for the suggestion; we have clarified the text accordingly.

      (37) Table 1 The significant figures in the numbers presented in the tables should be consistent.

      Thank you for the suggestion; we have clarified the text accordingly.

      (38) L341 The text mentions that experiments were conducted in a greenhouse, but does this mean the arena was placed inside the greenhouse? Also, the term "arena" is used - does this refer to a sealed rectangular case or something similar? For the sound presentation experiments, it seems that the arena cage was placed inside a soundproof room. If the arena is indeed a case-like structure, were there any specific measures taken to prevent sound scattering within the case, such as the choice of materials or structural modifications?

      Here, “arena” refers to the plastic boxes used throughout this study. In this particular experiment, we presented plants alone—reflecting ongoing debate in the literature—and used these trials as a baseline for our subsequent sound-presentation experiments, during which we measured sound intensity as described in the Methods section. All sound-playback experiments were conducted in sound-proof rooms, and acoustic levels were measured beforehand—sound on the control side fell below our system’s detection threshold. 

      (39) L373 "resister similar to the speaker" Could you explain it in more detail? I think this would depend on the type of speaker used, particularly whether it includes magnets. From an experimental perspective, presenting different sounds such as white noise from the speaker might have been a better control. Was there a specific reason for not doing so? Additionally, the study does not clearly demonstrate whether the electric and magnetic field environments on both sides of the arena were appropriately controlled. Without this information, it is difficult to evaluate whether using a resistor as a substitute was adequate.

      Thank you for this comment. We have now addressed this point in the Discussion. We acknowledge that we did not account for the magnetic field, which might have differed between the speaker and the resistor. We agree that using an alternative control, such as white noise, could have been informative, and we now mention this as a limitation in the revised Methods.

      (40) L435 60Hz? The representation of frequencies in the text is inconsistent, with some values expressed in Hz and others as "clicks per second." It would be better to standardize these units for clarity, such as using Hz throughout the manuscript.

      We agree that this is confusing. We reviewed the text and made sure that "clicks per second" refers to how many clicks were produced, while Hz is used only in the context of sound frequencies.

      (41) L484 "we quantified how many times each individual crossed the center of the arena" Is this data being used in the results?

      Yes. It is mentioned in the text just before Figure 3 (L220).

    1. Author response:

      We thank the reviewers and editors for their careful and constructive assessment of our manuscript. We have provided a provisional response to the eLife assessment and the reviewer’s public comments below, addressing their main concerns and outlining our planned revisions that we believe will substantially strengthen our paper.  

      eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

      We plan to address both specific methodological weaknesses mentioned in the assessment in our forthcoming revision. First, the revision will include analyses of an early visual cortex ROI as an additional control region, allowing us to test whether the primary auditory cortex findings generalize to the sensory cortex across input modalities. Preliminary results indicate that the early visual cortex ROI exhibits a similar pattern of results, with evidence for coding both task-relevant and task-irrelevant visual dimensions across both tasks, as well as the context dimension specifically in the hierarchy task. Second, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks to mitigate concerns over performance-related confounds. In addition, we will include a set of control analyses that demonstrate that equating the amount of data for pattern analyses across the two tasks by subsampling from the hierarchy task, while reducing our overall power, does not appreciably alter our results. We note that our analyses of representational geometries relied only on neural data from correct trials and, in the first-level modelling of the fMRI data, already controlled for differences in trial-by-trial response times. Therefore, our analyses of decoding and representation similarity are not directly affected by differences in performance across the two tasks. Finally, we have provided clarifications regarding Reviewer 2’s questions about the size and construction of the regions of interest employed in the study, as well as about the language employed to discuss null results.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks, with one having a "flat" structure where subjects are asked to form rote memory of all the stimulus-action mappings in the task and one having a "hierarchical" task structure that allows clustering of task conditions and that renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behaviors as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, geometry in coding patterns is reflected in single-unit firings; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequence of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separation is low or task-irrelevant separation is high, and will this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      We appreciate the reviewers’ positive evaluation of our paper.

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks are different in accuracy, with the flat task having more errors. Will this difference in task difficulty/errors contribute to the task differences in results reported?

      To address the Reviewer’s concern about the difference in behavioural performance between the two tasks influencing our results, we will take several approaches. First, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks. This should ensure that any differences we observe across tasks are over and above those that can be explained by the difference in behavioral performance. Second, we will include a set of decoding analyses that control for differences in performance across the tasks. We note that all our analyses of representational geometries relied on neural data from correct trials only. In addition, the first-level modelling of the fMRI data already controlled for trial-by-trial variability in response times. Therefore, our decoding and representation similarity analyses should not directly be affected by differences in performance across the two tasks. However, one possible issue with this approach is that the larger number of errors in the flat task means that less data was available for estimating multivoxel patterns in the flat task compared to the hierarchy task, resulting in differential power to detect decoding effects across the two tasks. We note that this difference was not substantial: on average, 21.7 runs were available per participant for the flat task, while 23.8 runs per participant were available for the hierarchy task. Moreover, rerunning our analyses with the number of runs equated for each participant does not meaningfully alter the pattern of results. These additional analyses will be included in the supplement in the forthcoming revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset:

      The dataset is impressive and well-sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods:

      The choice of analyses is largely well-suited to the questions at hand - cross-condition generalization and RSA + regression, in combination with ANOVAs, are well-suited to characterizing task representations.

      (3) I found some of their results, in particular, those presented in Figures 4 and 5, to be particularly compelling.

      (4) The correlation analysis with behavior is also a nice result.

      We thank the reviewer for noting the strengths of the paper. We respond to the weaknesses noted below. 

      Weaknesses:

      (1) Choice of ROIs:

      A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary - AC is known to have worse SNR in fMRI data, and limiting a 'control' to a single region seems arbitrary. For example, why not also include visual regions, given that the task also involves two visual features?

      We agree with the reviewer that the whole-brain fMRI data certainly provide ample opportunities to explore the nature of these representations across the brain. Our focus in this paper is squarely on the principles of coding and flexibility in the lPFC. We believe that a whole-brain exploration addresses a separate question that would be out of the scope of this study. To clarify, we are not arguing that the lPFC is the only region in the brain that employs the coding principles that our study brings to light. Our contention is only that lPFC employs these principles, and it differs at least from the primary sensory cortex. The questions of whether these principles generalize beyond lPFC (quite likely) and, if so, how broadly, are distinct from the ones addressed in the manuscript. We intend to follow up with another manuscript that addresses these questions.

      Nevertheless, given the focus of this paper, we agree that a second control region, which allows one to test if the primary auditory cortex findings generalize to the sensory cortex more broadly, would strengthen our claims. We will include an early visual cortex ROI in our forthcoming revision. Preliminary results indicate that the early visual cortex ROI shows a similar set of findings – with evidence for coding of task-relevant and task-irrelevant visual dimensions across both tasks, but also specifically the context dimension in the hierarchy task. These results will be detailed in the forthcoming revision.

      (2) Construction of ROIs:

      The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed from a different methodology (neurosynth). Did both parcels have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      We defined the lPFC ROIs by selecting Schaefer parcels in the frontal lobe that were previously mapped onto the Control A resting state network identified by Yeo et al. (2011). This network aligns with the multiple-demand network, which has also been identified in the macaque, where it includes the lPFC regions that abut the principal sulcus. Prior results from these regions in the monkey brain provide the scientific premise for our hypotheses. The two lPFC ROIs in each hemisphere were constructed out of 5 Schaefer parcels in each hemisphere. These parcels cluster into the same functional network and tend to behave similarly in univariate analyses. Given that our hypotheses do not distinguish between the different parcels, we elected to improve power by merging them into left and right dlPFC ROIs. 

      On the other hand, the same approach could not be used to identify the primary auditory cortex. As Yeo et al. noted in their paper, the 17 resting state networks they identify did not adequately parcellate somatomotor and auditory cortices into distinct networks, likely due to their proximity (see Fig 14 and related text in Yeo et al. (2011)). We therefore relied on a different approach to define the primary auditory cortex, using an association test in Neurosynth to obtain a map of regions associated with the term “primary auditory”. In the revised manuscript, we will also include a primary auditory cortex ROI, defined again using a term-based association test in Neurosynth.

      Our lPFC ROIs and pAC ROIs are of similar size. In the left hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 128-thru-132) has, on average, 624.55 voxels. The left pAC ROI (defined with Neurosynth) has, on average, 628 voxels. In the right hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 330-thru-334) has, on average, 470.8 voxels. The right pAC ROI has, on average, 568 voxels. A table reporting the size of our parcels and ROIs was included in the supplement. In our forthcoming revision, we will additionally include a supplementary figure visualizing the ROI masks.

      (3) Task dimensionality:

      In some ways, the main findings - that representation dimensionality is tailored to the task - seem to obviously follow from the choice of two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task, and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task can have several strategies, an uncompressed (memorized) strategy, and a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will always necessarily have a higher dimensionality than the hierarchical task. Thus, is their hypothesis - that representational geometry is tailored to the task - actually falsifiable? I understand the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but perhaps some additional discussion around this would be helpful.

      We thank the reviewer for this comment, which gives us a chance to clarify our argument.

      As noted by the reviewer, whether a network takes advantage of the compressibility of a task depends on its learning regime (i.e. rich vs lazy). One way to frame our question regarding the lPFC’s coding strategy, then, is to ask whether it operates in a rich or a lazy learning regime (which would predict, respectively, task-tailored vs task-agnostic representations). The reviewer’s concern is that, because the two task structures we employed are differentially compressible, it is inevitable that we observe tailored representations, and our hypotheses are therefore not falsifiable.

      First, it is important to clarify the theoretical premise behind our design and how it relates logically to our hypotheses. Under a lazy learning regime, a network would encode high-dimensional representations of both tasks, regardless of their compressibility. On the other hand, under a rich learning regime, representational dimensionality will likely be shaped by the tasks’ structure. If the two tasks differ in their compressibility, only in the rich learning regime would the network learn representations of different dimensionality. Consequently, observing representations with dimensionality tailored to the task structure rules out the possibility that the lPFC is operating in a lazy regime. The hypotheses are therefore testable.

      The second point of clarification is that, contrary to the reviewer’s assertion, the flat task is, in fact, compressible – the task can be solved with a categorical representation of the response categories, with no sensitivity to the different specific stimuli within each category. Indeed, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer, demonstrating this compressibility. While we agree with the reviewer that, in the space of all possible architectures one might consider, the two tasks may differ in compressibility, particularly at the local levels, this does not imply, as we noted above, that our hypotheses are not testable.

      Finally, as a third point of clarification, our focus in this paper is on understanding the nature of coding in the lPFC in particular. Arguments based on a normative modelling perspective properly apply to the representations learned by an agent (such as an ANN or a human) as a whole. In a minimal feedforward ANN with a single hidden layer trained in a regime which encourages compression (i.e. a rich learning regime), it would indeed be the case that the representational dimensionality in that hidden layer would be higher for less compressible tasks. However, when applied to humans, such an argument applies to the brain as a whole rather than to an individual region of the brain like the lPFC. As such, it is less straightforward to predict how a single region might represent a task without additional information about the region’s inputs, outputs and broader position in a network. Even for a highly compressible task, a particular brain region may nevertheless be sensitive to all task dimensions. Conversely, even when a task is not compressible, a particular population within the brain may be invariant to some task features. For example, the primary auditory cortex is expected to be invariant to visual task dimensions.

      Therefore, how a task is represented in the lPFC in particular (as opposed to the whole brain) depends on its computational function and coding principles, which remain debated. For instance, if, as some accounts (such as the guided activation theory) posit, the primary function of the lPFC is to encode ‘context’ and shape downstream processing based on context, we might only expect to see the abstract coding of the auditory context in the hierarchy task (and, perhaps, the response categories across both tasks, as they encode the ‘context’ for the lower-level response decision), with invariance to lower-level features of the input. In our paper, we specifically contrast two accounts of lPFC coding that have emerged in the literature – one positing that the lPFC learns a representation tailored to the structure of the task, and another that the lPFC encodes a high-dimensional representation that privileges sensitivity to many task features and their non-linear mixture at the cost of generalization. Regardless of the compressibility of the tasks in question, how the lPFC encodes the two tasks is an empirical question.

      In our forthcoming revision, we will clarify these points in the discussion. We will also include the results of neural network simulations alluded to above.

      (4) Related to the above:

      The authors have a section on p. 27: "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction" - I understand a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to trivially follow that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      As explained above, there is no need for high local separability in the flat task. The lPFC could have completely abstracted over the individual trial-types that contributed to each response category, encoding only the response categories. Indeed, as also noted above, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer. The two hidden layer units code for each of the two response categories. 
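
      To make the compressibility argument concrete, the sketch below builds a three-layer feedforward network (input, 2 hidden units, output) that maps one-hot coded trial types onto two response categories. It is only an illustration under stated assumptions: the number of trial types (8) and their assignment to categories are hypothetical, and the weights are set by hand rather than learned, so this is not the simulation referred to above.

```python
# Hypothetical flat task: 8 trial types, each arbitrarily assigned to one of
# two response categories (the real task's trial types are not given here).
CATEGORY = [0, 1, 1, 0, 1, 0, 0, 1]
N_TRIALS = len(CATEGORY)


def one_hot(index, size):
    """One-hot input vector for a trial type."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, x) for x in vector]


# Input -> hidden: hidden unit k gets weight 1 from every trial type in category k.
W_HIDDEN = [[1.0 if CATEGORY[i] == k else 0.0 for i in range(N_TRIALS)]
            for k in range(2)]
# Hidden -> output: identity, so output unit k reads out hidden unit k.
W_OUT = [[1.0, 0.0], [0.0, 1.0]]


def forward(trial):
    """Return (hidden activity, predicted response category) for a trial type."""
    hidden = relu(matvec(W_HIDDEN, one_hot(trial, N_TRIALS)))
    output = matvec(W_OUT, hidden)
    return hidden, output.index(max(output))


predictions = [forward(t)[1] for t in range(N_TRIALS)]
accuracy = sum(p == c for p, c in zip(predictions, CATEGORY)) / N_TRIALS
hidden_patterns = {tuple(forward(t)[0]) for t in range(N_TRIALS)}
print(accuracy, len(hidden_patterns))
```

      In this hand-built case the hidden layer takes only two distinct activity patterns, one per response category, so the individual trial types within a category are not separable in the hidden layer even though the task is performed without error.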

      (5) Statistical inferences:

      Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance of task structure."

      We thank the reviewer for pointing these out. These sentences will be corrected in the revision. For instance, the sentence above will be modified to “Therefore, we find no evidence that the overall separability of pAC representations is shaped by either task-relevance or task structure.”

      Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to confound several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

      We thank the reviewer for highlighting the strengths of our paper and for their feedback on the writing. We have reviewed these helpful suggestions with an eye to which we may implement in our revision to improve clarity. Our forthcoming revision will 1) provide clearer scaffolding to aid the reader in understanding our design, analyses and our interpretation of the results, 2) incorporate the MDS-based visualization of the representational geometries, which is currently presented in the Supplement, as a figure panel in the main text, 3) provide a justification for the particular task structures we picked in the introduction, and 4) incorporate a new paragraph in the Discussion section to highlight the implications of our findings for cognitive control.

    1. Reviewer #3 (Public review):

      Summary:

      This paper presents a framework for a multilevel agent-based model of the Drosophila larva, using a simplified larval body and locomotor equations coupled to oscillators and sensory input. The model itself is built upon significant existing literature, particularly Wystrach, Lagogiannis, and Webb 2016 and Jürgensen et al. 2024. The aim is to generate an easily configurable, well-documented platform for organism-scale behavioral simulation in specific experiments. The authors demonstrate qualitative similarity between in vivo behavioral experiments and calibrated models.

      Strengths:

      The goal is excellent - a system to rapidly run computational experiments that align naturally with behavioral experiments would be well-suited to develop intuitions and cut through hypotheses. The authors provide quantitative descriptions that show that the best-fit parameters in their models produce results that agree with several properties of larval locomotion.

      The description of model calibration in the appendix is clear and explains several aspects of the model better than the main text.

      In addition, the code is well-organized using contemporary Python tooling and the documentation is nicely in progress (although it remains incomplete). However, see notes for difficulties with installation.

      Weaknesses:

      (1) As presented here, the modeling itself is described in an unclear fashion and without a particular scientific question. The majority of the effort appears to be calibrating modest extensions of existing models and applying them to very simple experiments. This could be an effective first part of a paper on the software tool, but the paper needs to point to a scientific question or, if it is a tool paper, a gap in the current state of modeling tools needed to address scientific goals. While the manuscript has a good overview of larval behavioral papers, the discussion of modeling is more of an afterthought. However, the paper is a modeling paper and its contribution is to modeling; particularly given this work's minor adaptations of existing models, it is unclear what the principal contribution is intended to be.

      (2) While the models presented do qualitatively agree with experimental data in specific situations, there is no effort to challenge the model assumptions or compare them to alternative models. Simply because the data is consistent in a small number of simple experiments does not mean that the models are correct. Moreover, given the highly empirical nature of the modeling, I wonder what results are largely the model putting out what was put in, particularly with regard to kinematic results like frequency and body length or the effect of learning simply changing the sensory gain constant. At this level of empirical modeling, it would appear quite difficult to integrate the type of cell-type-specific perturbation or functional observation that is common in larval experiments.

      (3) The central framing of a "layered control architecture" does not have a significant impact on the work presented here and the paper would do better with less emphasis on it. Given the limited empirical models, there are only so many parameters where different components can influence one another, and as best as I can tell from the paper there is only chemotaxis and modulation of a chemotactic gain constant that are incorporated so far. However, since these are empirical functions it says little about how the layers are actually controlled by the nervous system - indeed, the larval nervous system appears to have many levels of local and long-range modulation of circuits at both the sensory and motor layers. It is not clear how this aspect would contribute beyond the well-appreciated concept of a relatively finite set of behavioral primitives in an insect brain, particularly for the fly larva. What would be a contradictory model and how would the authors differentiate between that and the one they currently propose? If focusing only on olfactory learning and chemotaxis, how does the current framing add to the existing understanding?

      (4) The paper uses experimental data to calibrate the models, however, the experiments are not described at all in the text.

    2. Author response:

      We thank all three anonymous reviewers for their thoughtful evaluations of our manuscript and for recognizing the conceptual advance in combining agent-based behavioral simulations with systems neuroscience models. We are especially encouraged by the acknowledgement of the framework’s potential to support simulation of neural control of individual animal behavior in realistic sensory environments.

      Below, we respond to each reviewer’s public comments in turn. Throughout, we have aimed to clarify our rationale for modeling choices, acknowledge limitations, and outline concrete steps for improvement in the revised manuscript.

      Furthermore, the call for a better description of the model implementation, voiced by all three reviewers, together with additional requests from community members, has prompted us to write a separate, technically detailed description of the publicly available larvaworld software package and of its readily implemented models in the form of a preprint (Sakagiannis et al., 2025, bioRxiv, DOI: https://doi.org/10.1101/2025.06.15.659765).

      Reviewer #1:

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2:

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.
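
      For readers unfamiliar with the measure, the sketch below shows one way a group-level preference index could be computed from simulated end positions. The response does not spell out the formula, so the definition used here (animals on the odor side minus animals on the other side, divided by the total) is an assumption based on common usage in larval group assays; the function name, the arena coordinates, and the example positions are hypothetical.

```python
def preference_index(x_positions, odor_side=1):
    """Preference index for a group of larvae at the end of a test.

    x_positions: final x coordinate of each larva, with the arena midline at 0.
    odor_side: +1 if the odor is on the positive-x side, -1 otherwise.
    Assumed definition: (n_odor_side - n_other_side) / n_total.
    """
    total = len(x_positions)
    if total == 0:
        raise ValueError("need at least one larva")
    n_odor = sum(1 for x in x_positions if odor_side * x > 0)
    n_other = sum(1 for x in x_positions if odor_side * x < 0)
    return (n_odor - n_other) / total


# Hypothetical final positions (in metres) of ten simulated larvae.
final_x = [0.02, 0.03, -0.01, 0.04, 0.01, -0.02, 0.03, 0.02, 0.01, -0.03]
pi = preference_index(final_x)
print(round(pi, 2))  # 7 on the odor side, 3 on the other side -> 0.4
```

      Because the index is computed per simulated group, repeating the simulation with different individuals and starting orientations yields a distribution of values that can be set against the experimentally observed ones.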

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3:

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading the appendix. For the revised manuscript we thus aim to improve our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Comment 

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSC's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      We thank the reviewer for this positive assessment of our work.

      Comment

      NeuroSC provides an accessible and convenient platform. However, many of the characteristics of NeuroSC overlap with that of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely-used and shared platform with data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up-to-date, and if data formatting for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSC will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      We have added new language to more explicitly state the motivations for developing NeuroSC (Introduction, lines 98-111, and Discussion, lines 375-384). In a new discussion section, we also compare the features of NeuroSC with those of other existing tools, such as Neuroglancer and Webknossos (lines 393-417).

      Briefly, the functional features of NeuroSC are substantially different from (and do not exist in) other web-based tools for navigating EM datasets, including Neuroglancer. This is because the intended use of NeuroSC is substantially different from (and purposefully synergistic with) the intended use of, and the tools available in, Neuroglancer.

      Neuroglancer is a versatile tool designed primarily for web-based visualization and sharing of large EM datasets. NeuroSC was not designed to enable this type of access to the primary EM data (a deliberate choice, because these features were already available through tools like Neuroglancer).

      Instead, the explicit goal of NeuroSC is to provide a platform specifically optimized for examining neuronal relationships across connectomic datasets. NeuroSC builds on the segmentations emerging from programs like Neuroglancer, but the tools are tailored to explore relationships such as contact profiles in the context of neuronal morphologies and synaptic positions, and across datasets that represent different animals or different developmental stages.

      To achieve this, all datasets in NeuroSC were optimized to facilitate comparisons across different connectomes of segmented neuronal features, including: 1) alignment of the neurons that are compared upon the display of the segmentations; 2) synchronization of the 3D windows; 3) implementation of a ‘universal color code’ across datasets for each neuron and relationship for easy visual comparisons; 4) use of the specific neuronal names to label instances of the same cells across all available datasets. The use of precise neuronal names among separate data sets allows integration of these objects with other catalogued datasets, including genomic and neuronal activity profiles.

      The formatting and display of the datasets used in NeuroSC was accompanied by the development of new tools including: 1) Rendering of the contact profiles of all neurons in the context of the morphology of the cell and the synapses and 2) C-PHATE diagrams to inspect multidimensional relationship hierarchies based on these contact profiles. In NeuroSC, C-PHATEs can be navigated and compared across multiple stages of development while visualizing neuronal reconstructions, allowing users to compare neuronal relationships across individual datasets.

      We agree with the reviewer that these tools are most useful when integrated. With that intention in mind, we designed NeuroSC as a series of modular, open-source tools that could be integrated into other programs, including Neuroglancer. In that sense our intent was not to produce another free-standing tool, but a set of tools that, if useful, could be integrated into other existing web-based connectomic resources to enhance the user experience of navigating complex EM datasets and to draw biological meaning from the relationships between the neurons. Additionally, we intentionally designed NeuroSC so that new methods of understanding neuron relationships can be integrated as they arise. We have dedicated a more detailed section to the discussion (lines 369-417) to better convey this intention and directly address the unique abilities of NeuroSC as a complementary tool to the powerful existing tools, including Neuroglancer.

      Comment

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      We thank the reviewer for this insightful comment and have implemented several improvements to address these suggestions. Specifically, we have added new features to enhance user access to quantitative data within the NeuroDevSCAN viewer:

      Cell, Patch, and Synapse Statistics: Users can now see a statistics panel when clicking on a rendered neuron, contact patch, or synapse. These panels provide the following information, respectively (highlighted in lines 303-315):

      Cell Stats: Click on a cell rendering to show cell stats which displays the total volume and surface area of the selected neuron within the defined neuropil area of our datasets (see Methods). 

      Contact Stats: Click on a patch rendering to show ‘contact stats’. This pop-up displays quantifications of the selected contact relationship. Rank compares the summed surface area of contacts ("patches") between these two neurons with all other contact relationships, both for the primary neuron and for the whole nerve ring. A rank of 1, for example, means this neuron pair shares the largest contact surface area among the examined relationships. “Total surface area” is displayed in nanometers, and is the summed surface area of all patches of this identity. Contact percentages are presented in two ways: (1) as the proportion of the primary cell's total surface area occupied by the contact in question, and (2) as the proportion of the total surface area of the nerve ring occupied by that same contact. (Showcased in figure S5).

      Synapse Stats: A click on a synapse rendering now shows ‘synapse stats’, which displays the number of synapses of the selected identity within the primary neuron, including any polyadic synapse combinations involving the primary neurons. (Showcased in figure S7).

      (1) Grouping and Readability Improvements: While individual synapses are still visualized, their display has been improved for legibility. We have condensed the lengthy naming scheme to improve clarity and codified the synapse type by using superscript letters C, E, U to represent chemical, electrical and undefined synapses, respectively. This is explained and shown in figure S7. We also added arrows to indicate the directionality of presumed information flow at each synapse.

      (2) Developmental Linkage: We can link objects across datasets via cellular identity, but each synapse in the dataset does not yet have an identity attributed to its spatial coordinates. This prevents us from linking specific synapses across development beyond their connectivity (i.e., that a given synapse connects cell X to cell Y, for instance). This is also addressed in R1.11.

      Together, these improvements substantially enhance the utility of the viewer for hypothesis generation by making key quantitative data readily accessible.
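
      The rank and percentage values described under Contact Stats above can be illustrated with a short sketch. This is not the NeuroSC implementation: the function name, the input layout, and the numbers in the usage note are hypothetical, and ties are given the same rank as an assumption.

```python
def contact_stats(patch_areas, primary, partner,
                  cell_surface_area, nerve_ring_surface_area):
    """Summarize one contact relationship (hypothetical helper).

    patch_areas maps (neuron_a, neuron_b) -> list of patch surface areas.
    """
    # Sum all patches for each unordered neuron pair.
    totals = {}
    for (a, b), patches in patch_areas.items():
        key = frozenset((a, b))
        totals[key] = totals.get(key, 0.0) + sum(patches)

    area = totals[frozenset((primary, partner))]

    def rank(values):
        # Rank 1 means no other relationship has a larger summed area.
        return 1 + sum(1 for v in values if v > area)

    # Relationships that involve the primary neuron only.
    cell_values = [v for pair, v in totals.items() if primary in pair]

    return {
        "total_surface_area": area,
        "rank_in_cell": rank(cell_values),
        "rank_in_nerve_ring": rank(totals.values()),
        "percent_of_cell": 100.0 * area / cell_surface_area,
        "percent_of_nerve_ring": 100.0 * area / nerve_ring_surface_area,
    }
```

      With made-up areas such as {("AIM", "PVQ"): [30.0, 20.0], ("AIM", "AVF"): [10.0], ("PVQ", "AVF"): [80.0]}, the AIM-PVQ relationship ranks first among AIM's contacts and second across all listed pairs.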

      Comment

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or a number of DC iterations on the C-PHATE plot, nor is the selection process for the threshold for DC described in this manuscript. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSC in generating future hypotheses.

      There are several ways to visualize DC outputs, and one way to quantitatively compare DC clustering events of neurons is via Sankey diagrams. To make the inclusion of these resources more clear, we have highlighted them in lines 175-178 (Supplemental Tables 3-6). ‘DC outputs for each strata across animals can also be inspected using Sankey diagrams (Supplemental Tables 3-6). These spreadsheets detail the neuron members at each iteration of DC, allowing the user to derive quantitative comparisons of clustering events.’

      As the reviewer points out, DC is a deterministic algorithm that will iteratively cluster neurons based on the similarity of their contact profiles. To better explain the selection process for the threshold, the number of DC iterations and the quantitative metrics between the neurons, we have added new text in the Diffusion Condensation methods section.  Briefly:

      Number of DC iterations: During Diffusion Condensation (DC) we track the modularity of the resulting clusters at each iteration and select the iteration with the highest modularity to define the clusters that represent the strata (Moyle et al., 2021; Brugnone et al., 2019). Mathematically, modularity is calculated by comparing the actual number of edges within clusters to the expected number of such edges in a randomized network with the same degree distribution (Newman et al., 2006). A higher modularity value implies that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. We now better explain this in lines 562-567.
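
      As an illustration of the selection rule above, the following sketch computes the standard Newman modularity of a weighted, undirected graph and picks the clustering with the highest score. It is a minimal example and not the authors' code: the function names and the data layout (a dict of edges listed once each, and one node-to-cluster dict per DC iteration) are assumptions.

```python
def modularity(edges, cluster_of):
    """Newman modularity Q for an undirected weighted graph.

    edges: dict mapping (u, v) -> weight, each edge listed once.
    cluster_of: dict mapping node -> cluster label.
    """
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    within = {}   # edge weight inside each cluster
    degree = {}   # summed node degree of each cluster
    for (u, v), w in edges.items():
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            within[cu] = within.get(cu, 0.0) + w
    # Observed within-cluster fraction minus the fraction expected
    # for a random graph with the same degrees.
    return sum(within.get(c, 0.0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def best_iteration(edges, clusterings):
    """Return (index, scores) for the clustering with the highest modularity."""
    scores = [modularity(edges, c) for c in clusterings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

      For two triangles joined by a single edge, for example, the clustering that separates the triangles scores higher than either all-singleton clusters or one merged cluster, so that iteration would be the one selected.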

      Threshold for merging points: The threshold (epsilon) used to merge data points in each iteration is set as a small fraction of the spatial extent of the data: for each coordinate dimension (x, y, z), we compute the range (maximum minus minimum), take the maximum of these three values, and divide it by 10,000. This process is performed iteratively for each round of clustering until all data points cluster into a single point. We have updated the manuscript to clarify this threshold selection and included this information in the revised algorithm description and pseudocode. We now better explain this in lines 556-559.
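
      The threshold rule described above is simple enough to sketch directly. The function name and the representation of points as coordinate tuples are assumptions made for illustration. Only the arithmetic (largest per-dimension range divided by 10,000) comes from the text.

```python
def merge_threshold(points, divisor=10_000):
    """Epsilon for merging: largest coordinate range divided by `divisor`.

    points: iterable of coordinate tuples, e.g. (x, y, z).
    """
    # zip(*points) groups the values of each coordinate dimension together.
    ranges = [max(axis) - min(axis) for axis in zip(*points)]
    return max(ranges) / divisor
```

      For instance, if the points span 1000 units in x, 200 in y and 50 in z, the resulting epsilon is 1000 / 10,000 = 0.1.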

      Distances between neurons in DC C-PHATE: In our previous description in Box 1 algorithm 1, we had provided a general algorithm for DC for any high dimensional dataset. We have now revised the algorithm to indicate how we used DC for these EM datasets. 

      Distances between neurons are determined by the pixel overlap between their segmented shapes in the EM dataset. We use these distances to build a graph with weighted edges, in which the weight of the edge represents the pixel overlap (the adjacency in the actual EM segmentation). Affinities between neurons, which are a proxy for their distance in the graph, are then computed as now revised in Box 1, Algorithm 1. This process is done iteratively as neurons cluster. To better communicate this, we have changed the text in lines 533-538.  

      Comment

      R1.5. While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSC platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

      We now better explain in the manuscript that the selected case study, of the AVF neuron outgrowth, is not one of just correlation based solely on an EM dataset. Instead, the case study represents the NeuroSC-driven exploration of a biologically significant event supported by several independent datasets, as now explained in lines 257-276.

      Briefly, we agree with the reviewer that examining differences across individual EM datasets is insufficient evidence to make conclusions about developmental changes. But the strength of NeuroSC is in its ability to combine and compare multiple datasets, bolstering observations that are not possible by looking at just one dataset, and providing new insights on the way to new hypotheses. We now better explain that we are not looking at single connectomes in isolation and then deriving conclusions, but instead using NeuroSC to compare across 9 EM datasets. We better explain how the tools in NeuroSC, including C-PHATE, enabled comparisons across these multiple connectomes to identify apparent differences in neuronal relationships. We then explain that by using NeuroSC, we could examine these variations in neuronal relationships at the level of individual, cell biological differences of neuronal morphologies between the developmental datasets. These variations could, as pointed out by the reviewer, reflect development or simply differences between individual animals. In the case of AVF, the features are absent in all early specimens, then arise and persist in all specimens after a certain time point, which led us to hypothesize that they result from a developmental event. Because the segmented objects in NeuroSC are linked to neuronal identities, we are also able to cross-reference our observations from the EM datasets with information in other datasets and the literature. In the specific case of postembryonic development of AVF outgrowth, we can now tie the knowledge, from developmental lineage information and molecular profiles, that AVF is a postembryonically born neuron (Sulston et al. 1977, Sun et al 2022, Poole et al 2024, wormatlas.org) to the outgrowth dynamics of its neurites using the postembryonic EM datasets. Our findings using NeuroSC provide a proof of concept of the utility of the resource and extend our understanding of how the outgrowth of this neuron affects the relationships between the neural circuits in the nerve ring.

      Comment

      R1.6. Given that recent studies have also quantified contact area between neurons across multiple connectomes (Cook et al., Current Biology, 2023; Yim et al., Nature Communications, 2024), and that the authors use a slightly different approach to quantify contact area, a direct comparison between contact area values obtained in this study with prior studies seems appropriate.

      We acknowledge that there are multiple different approaches to calculate adjacencies. In the papers cited above, there are 3 different algorithms used:

      (1) Brittin 2019 (Python, parsing TrakEM2 segmentations, boundary thresholds), used in Cook et al 2023, Moyle 2021, and this study.

      (2) Witvliet 2021 (Matlab 2D masks), used in Cook et al 2023.

      (3) Yim 2024 (3D masks), used in Yim et al 2024.

      To briefly describe the different approaches, and the methods we chose for this paper:

      Algorithm 1 (used in this study) defines adjacency based on distances between boundary points in TrakEM2 segmentations, allowing threshold tuning to accommodate differences in resolution and image quality across datasets—an important feature for consistent cross-dataset comparisons.

      Algorithm 2 infers contact via morphological dilation of VAST segmentations, identifying adjacency through overlapping expanded boundaries. 

      Algorithm 3 uses voxelwise contact detection with directional surface area measurements and normalization to account for dataset size differences. 
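
      To make the threshold idea behind Algorithm 1 concrete, here is a minimal sketch for a single EM section. It is not the Brittin 2019 code, whose details may differ: the function name, the representation of boundaries as lists of (x, y) points, and the use of Euclidean distance are all assumptions.

```python
import math


def adjacent_boundary_points(boundary_a, boundary_b, threshold):
    """Count points on boundary_a within `threshold` of any point on boundary_b.

    boundary_a, boundary_b: lists of (x, y) boundary points in one section.
    threshold: maximum distance (same units as the points) to call two
    boundary points adjacent; tunable per dataset.
    """
    count = 0
    for ax, ay in boundary_a:
        if any(math.hypot(ax - bx, ay - by) <= threshold
               for bx, by in boundary_b):
            count += 1
    return count
```

      Raising or lowering the threshold changes how many boundary points count as adjacent, which is the tuning that lets the same procedure accommodate datasets of different resolution and image quality.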

      In NeuroSC, we use algorithm 1, mostly because we had tested the rigor of this method in (Moyle et al. 2021), where we have shown that results were robust across a range of thresholds. This flexibility enables tailored application across datasets of varying quality and scale, critical for NeuroSC’s mission of curating data sets across differing methodologies to allow for direct relationship comparisons. We detail the methodology for defining thresholds for each dataset in methods section lines 492-521, defined in Supplementary table 1. Another difference between our analysis and the previously cited work is that for our analysis we also chose to include all individually resolved neurons, including post-embryonic cells, without collapsing them into left/right or dorsal/ventral symmetry classes. In this way our approach retains the full cellular resolution of the nervous system. 

      Comment

      Neuroglancer is not mentioned at all in the manuscript, despite it being a very similar and widely accepted platform for vEM data visualization across model organisms. An explicit comparison of NeuroSC and Neuroglancer would be appropriate, given the similarity of the tools. Currently, published C. elegans data (Witvliet et al., 2021; Yim et al., 2024) use Neuroglancer-based viewers, and directly comparing NeuroSC and highlighting its strengths relative to Neuroglancer would strengthen the paper.

      In the original manuscript we had not mentioned tools like Neuroglancer because we envisioned them as distinct, in intended use and output, from NeuroSC. But, as explained in our response to comment R1.2, in the revised version we have included a section in the Introduction (lines 98-108) and in the Discussion (lines 369-417) that compares these types of web-based tools and highlights synergies.

      Comment

      Assigning shorthand names to strata, such as "shallow reflex circuit" (page 4, line 172), may oversimplify this group of neurons. Either more detailed support for shorthand names of C-PHATE modules should be included, or less speculative names for strata should be used.

      We appreciate this comment and understand that the original language used in the manuscript to describe strata categorizations may run the risk of oversimplification. We have now clarified the text to communicate that: 1) Strata are labeled by numbers (Strata 1, Strata 2, Strata 3 and Strata 4), rather than functional features of the neurons forming part of the strata, and that 2) the assignment of ‘strata’ is just one level of classification available via DC/CPHATE (as explained below). 

      To be sure, we have observed and published (Moyle et al., Nature 2021) that within a given stratum, many neurons share the functional identities that we have used as summary descriptors for the strata (e.g., shallow reflex circuits for Stratum 1; sensory and integrative circuits in Strata 3 and Strata 4; command interneurons in Strata 2, etc). However, those cell types are not the only members of the strata. We have adjusted the language in lines 197-204 to reflect this more clearly. “Stratum 1, which contains most neurons contributing to shallow reflex circuits that control aversive head movements in response to noxious stimuli, displayed the fewest changes among the developmental connectomes (Figure 3B–F; Supplementary Table 3). In contrast, C. elegans exhibit tractable behaviors that adapt to changing environmental conditions (Flavell et al., 2020). Strata 3 and 4 contain most neurons involved in circuits associated with such learned behaviors, including mechano- and thermo-sensation. This is reflected in Strata 3 and 4 showing the most change in neuronal relationships across postembryonic development.”

      Comment

      The authors state that NeuroSC can be applied to other model organisms. Since model organisms with greater neuron numbers include more individual neurons per cell class, the authors should support this by quantitatively demonstrating how DC/C-PHATE relationships correlate with shared functional roles among C. elegans neurons.

      We now clarify in the manuscript that, like in other organisms, C. elegans neurons are also grouped into functional classes with shared characteristics. In the context of the cylindrical nerve ring of the animal, these neuronal classes are sometimes bilaterally symmetric (forming left-right pairs), four-fold symmetric and six-fold symmetric. We now explain in the discussion that the DC/CPHATE analyses group these neuron classes and their relationships (lines 442-451). In the specific section mentioned by the reviewer, we now also add new text to contextualize this concept and how it might relate to the possible use of these tools in organisms with larger nervous systems: ‘However, our previous work has demonstrated that DC/CPHATE clustering of C. elegans neurons consistently pulls out clusters of shared neuron classes and shared functional roles Moyle et al. (2021). Building on this foundation, we envision applying similar clustering approaches to larger connectomes, aiming to identify classes and functionally related neuronal groups in more complex nervous systems. We suggest that contact profiles, along with neuron morphologies and synaptic partners, can act as ‘fingerprints’ for individual neurons and neuron classes. These ‘fingerprints’ can be aligned across animals of the same species to create identities for neurons. Frameworks for systematic connectomics analysis in tractable model systems such as C. elegans are critical in laying a foundation for future analyses in other organisms with up to a billion-fold increase in neurons (Toga et al., 2012).’

      Comment

      Lack of surface smoothing in NeuroSC leads to processes sometimes appearing to have gaps, which could be remedied by smoothing with a surface mesh. 

      We thank the reviewer for the suggestion, and understand the visibility of gaps in certain neuron processes can be distracting. But this was an intentional choice, with our main goal being to show the most accurate representation of the available data segmentation and avoid any rendering interpretations. In this way, we render the data with the highest fidelity we can and as close as possible to the ground truth of the EM segmentation. We have added language to describe this in the methods, lines 490-491, and in Figure legend 5b.

      Comment

      Toggling between time points while maintaining the same neurons and contact area in NeuroSC is a really valuable feature. The tool would be improved even more by extending this feature to synapses, specifically by allowing the user to add an entire group of synapses to the viewer at once (e.g. "all synapses between AIM and PVQ"), and to keep this synapse group invariant when toggling between developmental stages.

      We thank the reviewer for this suggestion. In response we have now implemented a new feature to ‘clone’ a rendered scene across time while preserving the original elements to ease comparisons. Once the user has rendered a scene, they can use the in-viewer developmental slider to clone the renderings and assigned colors, but display the renderings of the newly selected timepoint. These renderings populate a new window tab which can be dragged to align developmental stage windows side by side. We have added a sentence to account for this in lines 315-317 and to the legend of supplemental Figure S11. 

      Reviewer #2 (Public review)

      Comment

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication and they lack details of the complex architecture seen within each EM image. Koonce et al take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      We thank the reviewer for this positive assessment of our work.

      Comment

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data is readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      We thank the reviewer for this feedback and in response have now implemented displays with quantitative information in NeuroSC. Now, upon hovering over a contact patch or synapse, the user will see the quantitative data of the relationship. For contact patches, the user will see the total area shared between two neurons in that dataset. On hovering over a synapse, the user will see how many synapses with the same members there are in total throughout the dataset. We agree that this improves user analyses (see also our response to R1.3).

      Comment

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

      We appreciate that some of the more complex models can take a while to load. One of our core goals is to keep the high resolution of our models to most accurately represent the EM data, so we had to compromise between resolution and loading times. But to address this concern we have now added a ‘loading’ prompt that reassures the user when there is a wait. We also added, as suggested, text guidance throughout all of the supplemental videos (Supplemental Videos 1-4).

      Reviewer #3 (Public review)

      Comment

      A web-based app, NeuroSC, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development. In the opinion of this reviewer, only minor revisions are required.

      We thank the reviewer for this positive assessment of our work.

      Comment

      Contact is defined by length, why not contact area? How are these normalized for changes in the overall dimensions of neurons during development?

      To clarify our methodology: the adjacency algorithm that we use generates a 2D adjacency profile by summing the number of adjacent boundary points per EM section, which are then summed across all EM z slices.

      Contact area can be derived by multiplying the adjacency length in each slice by pixel resolution and z-thickness. Prompted by the reviewer we have now also calculated and display contact surface areas, along with their ranks among all contact relationships for a given neuron. These can be inspected directly via the interface by clicking on a rendered cell or contact patch (Figure S5 and lines 308-312). We believe these additional surface area metrics enhance the interpretability and utility of the viewer.
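
      The conversion from per-slice adjacency to contact surface area described above can be sketched as follows. The function and parameter names are hypothetical, and the assumption that each adjacent boundary point contributes one pixel of contact length is ours, not a statement about the authors' implementation.

```python
def contact_area(adjacent_points_per_slice, pixel_size_nm, z_thickness_nm):
    """Approximate contact surface area (nm^2) from per-slice adjacency counts.

    adjacent_points_per_slice: number of adjacent boundary points in each
    EM z slice for one neuron pair.
    """
    total = 0.0
    for n_points in adjacent_points_per_slice:
        # Contact length within this slice, in nm.
        length_nm = n_points * pixel_size_nm
        # Each slice contributes a strip of length x section thickness.
        total += length_nm * z_thickness_nm
    return total
```

      For example, adjacency counts of 3, 5, 0 and 2 across four slices, with an illustrative 2 nm pixel size and 50 nm section thickness, give (3 + 5 + 0 + 2) x 2 x 50 = 1000 nm^2.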

      We apply normalization at the level of the adjacency threshold to account for dataset-specific differences such as contrast, boundary definition, and age-related changes in neuropil packing density. This normalization is applied before running the adjacency algorithm. We do not normalize by individual neuron size, as the contact data are intended to reflect relational differences between neurons, rather than absolute morphological scaling. In fact, our addition of a scale-spheroid within each rendered model emphasizes the large increase in spatial scale that the nerve ring experiences during larval growth.  

      Comment

      Figure 1, C&D, explanation unclear for how the adjacency matrix is correlated with C-Phate schematic in D.

      We thank the reviewer for the comment and have clarified this section by adding greater detail to the explanation of how an adjacency matrix is computed (lines 149-155), as well as a description now in the figure legend 1C. Additionally, we revised Figure 1C and D to simplify neuron representations/colors and to simplify the adjacency heat map gradient. We also extended the area of contact between neurons on Figure 1C to better reflect what would be considered a “contact”. Lastly, in the figure, we changed the color and placement for the z plane arrow and label from black to white, to make it more visible, to highlight the method of computing adjacency for each z slice. 

      Comment

      Figure 4, panels F & G, unclear why AVF is shown in panel G (L3) but not panel F (L1). Explanation (see below) should be provided earlier, i.e., AVF is not generated until the end of the L1.

      We have now clarified this important point by adding labels to Figure 4 panels F and G, ‘Pre-AVF outgrowth’ and ‘Post-AVF outgrowth’ respectively. Briefly, the point is that AVF grows into the nerve ring after the L2 stage, and that is why it is absent in panel F (L1 stage, now with the label ‘Pre-AVF outgrowth’).  

      Comment

      Line 146 What is the justification for the statement: "By end of Larval Stage 1 (L1), neuronal differentiation has concluded...."? This statement is confusing since this sentence also states that "90% of neurons in the neuropil...have entered the nerve ring..." which would suggest that at least 10% additional NR neurons have NOT fully differentiated.

      We have fixed this sentence in the text. The sentence now reads: ‘By Larval stage 1 (L1), 90% of the neurons in the neuropil (161 neurons out of the 181 neurons) have grown into the nerve ring and adopted characteristic morphologies and positions.’

      Lines 171-175 What is meant by the statement that "degree of these changes mapped onto...plasticity? What are examples of "behavioral plasticity?"

      We have added the following new lines of text (lines 200-204) and now additionally cite a review discussing C. elegans behaviors to clarify and give context to behavioral plasticity. ‘C. elegans exhibit tractable behaviors which can adapt due to changing environmental conditions (Flavell et al., Genetics 2020). Strata 3 and 4 contain most neurons belonging to circuits associated with such learned behaviors, including chemo, mechano and thermo sensation. This is seemingly reflected by strata 3 and 4 harboring the most readily recognized set of changes in neuronal relationships across postembryonic development.’

      Comment

      Lines 189-190 The meaning of this sentence is unclear, "The logic in....merge events."

      This sentence has been deleted and we have instead refocused our descriptions of C-PHATE comparisons on neuronal clustering trajectories and cluster members (rather than iterations).

      Comment

      Lines 193-208 This section reports varying levels of convergence across larval development in C-Phate maps for the interneurons AIML and PVQL. Iterations leading to convergence varied: 16 (L1), 14 (L2), 22 (L3), 20 (L4), 14 (adult). The authors suggest that these differences are biologically significant and reflect the reorganization of AIML and PVQL contact relationships especially between the L4 and adult. Are these differences in iterations significant?

      We agree this could be confusing and instead of focusing on comparing the iteration at which each merging event occurs, we now focus on examining the differences in members of clusters, before and after the merge event. Cluster membership is easier to interpret than the differences in the number of DC iterations (lines 224-229).

      Lines 240-241 States that AVF neurons "terminally differentiate in the embryo" which is not correct. AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage which accounts for their outgrowth into the NR during the L2 stage. 

      We thank the reviewer for the correction and have edited the text to read: ‘AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage (Sulston et al. (1983); Sun and Hobert (2023); Poole et al. (2024); Hall and Altun (2008); Sulston and Horvitz (1977)). AVF neurons do not grow into the nerve ring until the L2 stage, and continue to grow until the Adult stage (lines 261-266).’

      Comment

      Lines 289-315. A detailed and highly technical description of website architecture would seem more appropriate for the Methods section.

      We agree and have moved this section to the methods as suggested (lines 663-690).

      Comment

      Line 307 "source data is" should be "source data are"

      Thank you. We have fixed this grammatical error.

      Comment

      Line 324 "circuits identities" should be "circuit identity".

      Thank you. We have fixed this grammatical error.

      Comment

      Trademark/copyright conflict with these sites? https://compumedicsneuroscan.com/about/ https://www.neuroscanai.com/

      We thank the reviewer for drawing our attention to this. To avoid potential conflicts, we have proactively altered the name to NeuroSC throughout the paper.

    1. 3. Build Trust in Your Evaluation Harness If you never rerun baselines, how do you know that your code, logging, and evaluation are sound, and that improvements aren’t due to hidden differences in implementation? By reproducing others’ results in your own harness, you ensure a fair playing field for comparing your new method. 👉 Otherwise, reviewers (and you yourself) can’t be sure whether gains are real or artifacts.

      Donggyun used to mention this to me. An apparent improvement in performance may be just noise or an artifact (e.g., randomness in the seed) rather than a real improvement.
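
      One rough way to check for this is to run both the baseline and the new method over several seeds and compare the gap in means with the seed-to-seed spread. The sketch below assumes each method yields one scalar score per seed; the function name and the "two standard deviations" cut-off are arbitrary illustrations, not an established test.

```python
import statistics


def seed_noise_summary(baseline_scores, new_scores, k=2.0):
    """Compare two methods, each evaluated once per random seed.

    Returns the gap in mean score, the larger of the two per-method
    sample standard deviations, and whether the gap exceeds k times
    that spread. k = 2.0 is an assumed rule of thumb.
    """
    gap = statistics.mean(new_scores) - statistics.mean(baseline_scores)
    noise = max(statistics.stdev(baseline_scores),
                statistics.stdev(new_scores))
    return {"gap": gap, "noise": noise,
            "gap_exceeds_noise": abs(gap) > k * noise}
```

      If `gap_exceeds_noise` is false, the reported gain is within what reseeding alone could produce, and a proper significance test over more seeds would be the next step.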

    1. Reviewer #1 (Public review):

      This study provides an integrative model of the visuomotor control in Drosophila melanogaster. This model presents an experimentally derived model based on visually evoked wingbeat pattern recordings of three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wing beat behavioral results and those recorded by others for these specific stimuli when presented individually. Yet, the novelty of this study and their model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the presented manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that enables predictions of Drosophila behavior in natural visual environments.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings. The quantitative models and validating behavioral experiments make this a valuable contribution to the field. This study is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      The associated code base is well documented and readily produces all figures in the document.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Drosophila Visuomotor Integration: An Integrative Model and Behavioral Evidence of Visual Efference Copy" provides an integrative model of the visuomotor control in Drosophila melanogaster. This model presents an experimentally derived model based on visually evoked wingbeat pattern recordings of three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wing beat behavioral results and those recorded by others for these specific stimuli when presented individually. Yet, the novelty of this study and their model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the presented manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that allows predictions of the behavior of Drosophila in natural visual environments.

      Strengths:

      Overall, the manuscript is well-structured and clear in its presentation, and the modeling and experimental research are methodically conducted and illustrated in visually appealing and easy-to-understand figures and their captions.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings.

      The associated code base is well documented and readily produces all figures in the document.

      Suggestions:

      However, while the experiments provide evidence for the use of a visual efference copy, the manuscript would be even more impressive if it presented specific predictions for the neural implementation or even neurophysiological data to support this model. Or, at the very least, a thorough discussion. Nonetheless, these models and validating behavioral experiments make this a valuable contribution to the field; it is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      We appreciate the reviewer’s thoughtful comments on the strengths and weaknesses of our manuscript. We agree that a biophysically realistic model reflecting the structure of neural circuits, together with physiological data from them, would be invaluable. However, we are currently unable to provide physiological evidence for EC-based suppression, or a circuit architecture for efference copy-based suppression of the stability circuit, because the neural pathway underlying this behavior remains unidentified. Extensive recordings from the HS/VS system have revealed cell-type-specific motor-related inputs during both spontaneous and loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). These studies predicted suppression of the optomotor stability response during such turns, and our new experiments confirmed this suppression specifically during loom-evoked turns (Figures 5, 6). However, these neurons are primarily involved in the head optomotor response, not the body optomotor response. We hope to extend our current model in future studies to incorporate more cellular-level detail, as the feedforward circuits underlying stability behavior become more clearly defined.

      Here are a few points that should be addressed:

      (1) The biomechanics block (Figure 2) should be elaborated on, to explain its relevance to behavior and relation to the underlying neural mechanisms.

      We appreciate this suggestion. The mathematical representation of the biomechanics block was developed by other groups in previous studies (Fry et al., 2003; Ristroph et al., 2010). We used exactly the same model, with parameters identical to those used in one of those studies, in which the parameters were estimated from the stabilizing response to magnetic “stumbling” pulses. In the previous version of the manuscript, the biomechanics block was described in the Methods section (see Equation 4). In response to the reviewer’s comment, we have made a few changes in Figure 2A and expanded the associated description in the main text, as follows.

      (Line 160) “To test the orientation behavior of the model, we developed an expanded model, termed “virtual fly model” hereafter. In this model, we added a biomechanics block that transforms the torque response of the fly to the actual heading change according to kinematic parameters estimated previously (Michael H Dickinson, 2005; Ristroph et al., 2010) (Figure 2A, see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks that are conditioned on the type of the visual pattern, can now change its body orientation, simulating the visual orientation behavior of flies in the free flight condition.”
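
      To make the role of such a block concrete, the sketch below integrates a generic damped rotational model, I·dω/dt = τ − C·ω with dθ/dt = ω. This form is an assumption chosen for illustration: Equation 4 is not reproduced in this response, and the inertia, damping, and time-step values are placeholders rather than the published parameters.

```python
def simulate_heading(torque, inertia=1.0, damping=10.0, dt=0.001):
    """Euler-integrate I*dw/dt = tau - C*w and dtheta/dt = w.

    torque: sequence of torque commands, one per time step.
    inertia, damping, dt: placeholder values, not fitted parameters.
    Returns the heading after each time step.
    """
    heading, velocity = 0.0, 0.0
    headings = []
    for tau in torque:
        accel = (tau - damping * velocity) / inertia
        velocity += accel * dt   # update angular velocity first
        heading += velocity * dt  # then advance the heading
        headings.append(heading)
    return headings
```

      In this picture the visuomotor blocks supply `torque`, and the biomechanics block turns it into the body orientation that the simulated fly actually adopts.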

      (2) It is unclear how the three integrative models with different strategies were chosen or what relevance they have to neural implementation. This should be explained and/or addressed.

      Thank you for this valuable comment. We selected the three models based on previous studies investigating visuomotor integration across multiple species, under conditions where multiple sensory cues are presented simultaneously.

      The addition-only model represents the simplest hypothesis, analogous to the “additive model” proposed by Tom Collett in his 1980 study (Collett, 1980). We used this model as a baseline to illustrate behavior in the absence of any efference copy mechanism. Notably, some modeling studies have proposed linear (additive) integration for multimodal sensory cues at the behavioral level (Liu et al., 2023; Van der Stoep et al., 2021). However, experimental evidence demonstrating strictly linear integration—either behaviorally or physiologically—remains limited. In our study, new data (Figure 5) show that bar-evoked and background movement-evoked locomotor responses are combined linearly, supporting the addition-only model.

      The graded efference copy model has been most clearly demonstrated in the cerebellum-like circuit of Mormyrid fish during electrosensation (Bell, 1981; Kennedy et al., 2014). In this system, the efference copy signal forms a negative image of the predicted reafferent input and undergoes plastic changes as the environment changes—an idea that inspired our modifiable efference copy model (Figure 4–figure supplement 1). The all-or-none efference copy model is exemplified in the sensory systems of smaller organisms, such as the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006). Notably, in crickets, the motor-related input is referred to as corollary discharge rather than efference copy. Typically, “efference copy” refers to a graded, subtractive motor-related signal, while “corollary discharge” denotes an all-or-none signal, both counteracting the sensory consequences of self-generated actions. In this manuscript, we use the term efference copy more broadly, encompassing both types of motor-related feedback signals (Sommer and Wurtz, 2008).

      In response to this comment, we have made the following changes in the main text to enhance its accessibility to general readers.

      (Line#268) “This integration problem has been studied across animal sensory systems, typically by analyzing motor-related signals observed in sensory neurons (Bell, 1981; Collett, 1980; Kim et al., 2017; Poulet and Hedwig, 2006). Building on the results of these studies, we developed three integrative models. The first model, termed the “addition-only model”, assumes that the outputs of the object (bar) and the background (grating) response circuits are summed to control the flight orientation (Figure 4B, see Equation 14 in Methods).”

      (Line#272) “In the second and third models, an EC is used to set priorities between different visuomotor circuits (Figure 4C,D). In particular, the EC is derived from the object-induced motor command and sent to the object response system to nullify visual input associated with the object-evoked turn (Bell, 1981; Collett, 1980; Poulet and Hedwig, 2006). These motor-related inputs fully suppress sensory processing in some systems (Poulet and Hedwig, 2006), whereas in others they selectively counteract only the undesirable components of the sensory feedback (Bell, 1981; Kennedy et al., 2014).”
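
      The difference between the three integration schemes can be caricatured with scalar motor commands. The sketch below is a deliberate simplification and is not Equation 14 or the authors' implementation: the function names, the `predicted_reafference` argument, and the boolean `turning` flag are all assumptions introduced for illustration.

```python
def addition_only(object_cmd, background_cmd):
    """Sum the object and background response outputs."""
    return object_cmd + background_cmd


def graded_ec(object_cmd, background_cmd, predicted_reafference):
    """Subtract the predicted self-generated component from the
    background response before summing (graded efference copy)."""
    return object_cmd + (background_cmd - predicted_reafference)


def all_or_none_ec(object_cmd, background_cmd, turning):
    """Fully silence the background response while an object-evoked
    turn is in progress (all-or-none efference copy)."""
    return object_cmd + (0.0 if turning else background_cmd)
```

      With a perfect prediction the graded and all-or-none schemes give the same output; they diverge only when the actual background response differs from the prediction, which is the situation the unexpected-background experiments are designed to probe.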

      (3) There should be a discussion of how the visual efference could be represented in the biological model and an evaluation of the plausibility and alternatives.

      Thank you for this helpful comment. We have now added the following discussion to share our perspective on the circuit-level implementation of the visual efference copy in Drosophila.

      (Line#481) “Efference copy in Drosophila vision

      Under natural conditions, various visual features in the environment may concurrently activate multiple motor programs. Because these may interfere with one another, it is crucial for the central brain to coordinate between the motor signals originating from different sensory circuits. Among such coordination mechanisms, the EC mechanisms were hypothesized to counteract so-called reafferent visual input, that is, input caused specifically by self-movement (Collett, 1980; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons during spontaneous as well as loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a group of wide-field visual motion-sensing neurons that were shown to control neck movement for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional depending on the direction of flight turns, and their amplitudes were quantitatively tuned to those of the expected visual input across cell types. Although amplitude varies among cell types, it remains inconclusive whether it also varies within a given cell type to match the amplitude of expected visual feedback, thereby implementing the graded EC signal. A more recent study examined EC-like signal amplitude in the same visual neurons for loom-evoked turns, across events (Fenk et al., 2021). Although the result showed a strong correlation between wing response and the EC-like inputs, the authors pointed out that this apparent correlation could stem from noisy measurement of all-or-none motor-related inputs.

      Thus, these studies did not completely disambiguate between graded vs. all-or-none EC signaling. Another type of EC-like signal, observed in the visual circuit tuned to a moving spot, exhibited characteristics consistent with all-or-none EC. That is, it entirely suppressed visual signaling, irrespective of the direction of the self-generated turn (Kim et al., 2015; Turner et al., 2022).

      Efference-copy (EC)–like signals have been reported in several Drosophila visual circuits, yet their behavioral role remains unclear. Indirect evidence comes from a behavioral study showing that the dynamics of spontaneously generated flight turns were unaffected by unexpected background motion (Bender and Dickinson, 2006a). Likewise, our behavioral experiments showed that, during loom-evoked turns, responses to background motion are suppressed in an all-or-none manner (Figures 6 and 7). Consistent with this, motor-related inputs recorded in visual neurons exhibit nearly identical dynamics during spontaneous and loom-evoked turns (Fenk et al., 2021). Together, these behavioral and physiological parallels support the idea that a common efference-copy mechanism operates during both spontaneous and loom-evoked flight turns.

      Unlike loom-evoked turns, bar-evoked turn dynamics changed in the presence of moving backgrounds (Figure 5), a result compatible with both the addition-only and graded EC models. However, when the static background was updated just before a bar-evoked turn—thereby altering the amplitude of optic flow—the turn dynamics remained unaffected (Figures 5 and 7), clearly contradicting the addition-only model. Thus, the graded EC model is the only one consistent with both findings. If a graded EC mechanism were truly at work, however, an unexpected background change should have modified turn dynamics because of the mismatch between expected and actual visual feedback (Figure 4–figure supplement 1)—yet we detected no such effect at any time scale examined (Figure 7–figure supplement 1). This mismatch would be ignored only if the amplitude of the graded EC adapted to environmental changes almost instantaneously—a mechanism that seems improbable given the limited computational capacity of the Drosophila brain. In electric fish, for example, comparable adjustments take more than 10 minutes (Bell, 1981; Muller et al., 2019). Further investigation is needed to clarify how reorienting flies ignore optic flow generated by static backgrounds, potentially by engaging EC mechanisms not captured by the models tested in this study.

      Why would Drosophila rely on the all-or-none EC mechanism instead of the graded one for loom-evoked turns? A graded EC must be adjusted adaptively depending on the environment, as the amplitude of visual feedback varies with both the dynamics of self-generated movement and environmental conditions (e.g., empty vs. cluttered visual backgrounds) (Figure 4—figure supplement 1). Recent studies on electric fish have suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small-sized brain, flies might opt for a more economical design for suppressing unwanted visual inputs regardless of the visual environment. Circuits mediating such a type of EC were identified in the cricket auditory system during stridulation (Poulet and Hedwig, 2006), for example. Our study strongly suggests the existence of a similar circuit in the Drosophila visual system. 

      We tested the hypothesis that efference-copy (EC) signals guide action selection by suppressing specific visuomotor reflexes when multiple visual features compete. An alternative motif with a similar function is mutual inhibition between motor pathways (Edwards, 1991; Mysore and Kothari, 2020). In Drosophila, descending neurons form dense lateral connections (Braun et al., 2024), offering a substrate for such competitive interactions. Determining whether—and how—EC and mutual inhibition operate will require recordings from the neurons that ensure visual stability, which remain unidentified. Mapping these pathways and assessing how they are modulated by visual and behavioral context are important goals for future work.”

      Reviewer #2 (Public Review):

      It has been widely proposed that the neural circuit uses a copy of motor command, an efference copy, to cancel out self-generated sensory stimuli so that intended movement is not disturbed by the reafferent sensory inputs. However, how quantitatively such an efference copy suppresses sensory inputs is unknown. Here, Canelo et al. tried to demonstrate that an efference copy operates in an all-or-none manner and that its amplitude is independent of the amplitude of the sensory signal to be suppressed. Understanding the nature of such an efference copy is important because animals generally move during sensory processing, and the movement would severely distort that processing without proper correction. The manuscript is concise and written very clearly. However, the experiments do not directly demonstrate whether the animal indeed uses an efference copy in the presented visual paradigms and whether such a signal is indeed non-scaled. As it is, it is not clear if the suppression of behavioral response to the visual background is due to the act of an efference copy (a copy of motor command) or due to an alternative, more global inhibitory mechanism, such as feedforward inhibition at the sensory level or attentional modulation. To directly uncover the nature of an efference copy, physiological experiments are necessary. If that is technically challenging, it requires finding a behavioral signature that unambiguously reports a (copy of) motor command and quantifying the nature of that behavior.

      We thank the reviewer for this insightful and constructive comment. We agree that our current behavioral evidence does not directly identify the underlying circuit mechanism, and that direct recordings from visual neurons modulated by an efference copy would be critical for distinguishing between potential mechanisms.

      A prerequisite for such physiological investigations would be the identification of both (1) the feedforward neurons directly involved in the optomotor response, and (2) the neurons conveying motor-related signals to the optomotor circuit. Despite efforts by several research groups, the location of the feedforward circuit mediating the optomotor response remains elusive. This limitation has prevented us from obtaining direct cellular evidence of flight turn-associated suppression of optomotor signaling.

      In light of the reviewer’s suggestion, we expanded our investigation to strengthen the behavioral evidence for efference copy (EC) mechanisms. In addition to our earlier experiments involving unexpected changes in the static background, we examined how object-evoked flight turns influence the optomotor stability reflex and vice versa (Figures 5 and 6). To quantify the interaction between different visuomotor behaviors, we systematically varied the temporal relationship between two types of visual motion—loom versus moving background, or moving bar versus moving background—and measured the resulting behavioral responses.

      Our findings support pattern- and time-specific suppressive mechanisms acting between flight turns associated with the different visual patterns. Specifically:

      The responses to a moving bar and a moving background add linearly, even when presented in close temporal proximity.

      Loom-evoked turns and the optomotor stability reflex mutually suppress each other in a time-specific manner.

      For both loom- and moving bar-evoked flight turns, changes in the static background had no measurable effect on the dynamics of the object-evoked responses.

      These results provide a detailed behavioral characterization of a suppressive interaction between distinct visuomotor responses. This, in turn, offers correlative evidence supporting the involvement of an efference copy-like mechanism acting on the visual system. While similar efference copy mechanisms have been documented in other parts of the visual system, we acknowledge that our findings do not exclude alternative explanations. In particular, it is still possible that lateral inhibition within the central brain or ventral nerve cord contributes to the suppression we observed.

      Ultimately, definitive proof will require identifying the specific neurons that convey efference copy signals and demonstrating that silencing these neurons abolishes the behavioral suppression. Until such experiments are feasible, our behavioral approach provides an important contribution toward understanding the nature of sensorimotor integration in this system.

      Reviewer #3 (Public Review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask whether flies use an all-or-none EC model or a graded EC model (in which the turn amplitude is modulated by wide-field optic flow). Particularly, the authors focus on the bar-ground discrimination problem, which has received significant attention in flies over the last 50-60 years. First, they use a model by Poggio and Reichardt to model flight response to moving small-field bars and spots and wide-field gratings. They then simulate this model and compare simulation results to flight responses in a yaw-free tether and find generally good agreement. They then ask how flies may do bar-background discrimination (i.e. complex visual environment) and invoke different EC models and an additive model (balancing torque production due to background and bar movement). The behavioral experiments and simulations support the notion that flies use an all-or-none EC, since flight turns are not influenced by the background optic flow. While the study is interesting, there are major issues with the conceptual framework.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The methods are well detailed and the data (and statistics) are presented clearly.

      The integration of behavioral experiments and mathematical modeling of flight behavior.

      The figures are overall very clear and salient.

      Weaknesses:

      Omission of saccades: While the authors ask a significant question related to the mechanism of bar-ground discrimination, they fail to integrate an essential component of the Drosophila visuomotor responses: saccades. Indeed, the Poggio and Reichardt model, which was developed almost 50 years ago, while appropriate to study body-fixed flight, has a severe limitation: it does not consider saccades. The authors identify this major issue in the Discussion by citing a recent switched, integrate-and-fire model (Mongeau & Frye, 2017). The authors admit that they "approximated" this model as a smooth pursuit movement. However, I disagree that it is an approximation; rather it is an omission of a motor program that is critical for volitional visuomotor behavior. Indeed, saccades are the main strategy by which Drosophila turn in free flight and prior to landing on an object (i.e. akin to a bar), as reported by the Dickinson group (Censi et al., van Breugel & Dickinson [not cited]). Flies appear to solve the bar-ground discrimination problem by switching between smooth movement and saccades (Mongeau & Frye, 2017; Mongeau et al., 2019 [not cited]). Thus, ignoring saccades is a major issue with the current study as it makes their model disconnected from flight behavior, which has been studied in a more natural context since the work of Poggio.

      Thank you for this helpful comment. We agree that including saccadic turns is essential and qualitatively improves the model. In the revised manuscript, we therefore expanded our bar-tracking model to incorporate an integrate-and-saccade strategy, now presented in Figure 2—figure supplement

      The manuscript now introduces this result as follows:

      (Line#190) “Finally, one important feature of the locomotion dynamics that a flying Drosophila exhibits while tracking an object is a rapid orientation change, called a “saccade” (Breugel and Dickinson, 2012; Censi et al., 2013; Heisenberg and Wolf, 1979). For example, while tracking a slowly moving bar, flies perform relatively straight flights interspersed with saccadic flight turns (Collett and Land, 1975; Mongeau and Frye, 2017). During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2--figure supplement 2). The overall structure of the modified model is akin to the one proposed in a previous study (Mongeau and Frye, 2017), and the amplitude of a saccadic turn was determined by the sum of the position and velocity functions (Figure 2--figure supplement 2A; see Equation 13 in Methods). When simulated, our model successfully reproduced experimental observations of saccade dynamics across different object velocities (Figure 2--figure supplement 2B-D) (Mongeau and Frye, 2017). Together, our models faithfully recapitulated the results of previous behavioral observations in response to singly presented visual patterns (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008; Mongeau and Frye, 2017).”
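
      The integrate-and-saccade strategy quoted above can be sketched as a threshold-and-reset loop. The version below is open loop (the bar position is given, and the fly's heading is not fed back into it) and is not Equation 13: the default `position_fn` and `velocity_fn`, the threshold, and the reset-to-zero rule are placeholder assumptions.

```python
def integrate_and_saccade(bar_positions, bar_velocities, threshold, dt,
                          position_fn=lambda p: p,
                          velocity_fn=lambda v: 0.0):
    """Accumulate bar position error and emit a saccade at threshold.

    bar_positions: bar angle relative to the frontal midline per step.
    bar_velocities: bar angular velocity per step.
    position_fn, velocity_fn: hypothetical stand-ins for the position
    and velocity functions whose sum sets the saccade amplitude.
    Returns a list of (step index, saccade amplitude) tuples.
    """
    integrated_error = 0.0
    saccades = []
    for i, (pos, vel) in enumerate(zip(bar_positions, bar_velocities)):
        integrated_error += pos * dt
        if abs(integrated_error) >= threshold:
            saccades.append((i, position_fn(pos) + velocity_fn(vel)))
            integrated_error = 0.0  # assumed reset after each saccade
    return saccades
```

      In a closed-loop simulation each saccade would also change the heading and therefore the subsequent bar position; that feedback is omitted here to keep the threshold logic visible.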

      Apart from Figures 1 and 2, most of our data—whether from simulations or behavioral experiments—use brief visual patterns lasting 200 ms or less. These stimuli trigger a single, rapid orientation change reminiscent of a saccadic flight turn. In this part of the paper, we essentially have examined how multiple visuomotor pathways interact to determine the direction of object-evoked turns when several visual patterns occur simultaneously.

      Critically, recent work showed that a group of columnar neurons (T3) appear specialized for saccadic bar tracking through integrate-and-fire computations, supporting the notion of parallel visual circuits for saccades and smooth movement (Frighetto & Frye, 2023 [not cited]).

      Thanks for bringing up this critical issue. We have now added this paper in the following part of the manuscript.

      (Line#193) “During this behavior, it has been proposed that visual circuits compute an integrated error of the horizontal bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau and Frye, 2017).”

      (Line#462) “Visual systems extract features from the environment by calculating spatiotemporal relationships of neural activities within an array of photoreceptors. In Drosophila, these calculations occur initially on a local scale in the peripheral layers of the optic lobe (Frighetto and Frye, 2023; Gruntman et al., 2018; Ketkar et al., 2020).”

      A major theme of this work is bar fixation, yet recent work showed that in the presence of proprioceptive feedback, flies do not actually center a bar (Rimniceanu & Frye, 2023). Furthermore, the same study found that yaw-free flies do not smoothly track bars but instead generate saccades. Thus prior work is in direct conflict with the work here. This is a major issue that requires more engagement by the authors.

      Thank you for your thoughtful comments and for drawing our attention to this important paper. In our experiments, bar fixation on oscillating vertical objects emerges during the “alignment” phase of the magneto-tether protocol. The pattern movement dynamics were similar to those used by Rimniceanu & Frye (2023), yet the two studies differ in a key respect: Rimniceanu & Frye employed a motion-defined bar, whereas we presented a dark vertical bar against a uniform or random-dot background. The alignment success rate (defined as the proportion of trials in which the fly’s body angle is within ±25° of the target) was about 50% (data not shown). Our alignment pattern consisted of three vertical stripes spanning ~40° horizontally; when we replaced it with a single, narrower stripe, the success rate decreased (data not shown). These observations suggest that bar fixation in the magnetically tethered assay is less robust than in the rigid-tethered assay, although flies still orient toward highly salient vertical objects.
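
      The success-rate definition above can be illustrated with a short sketch. The function names, the angle-wrapping convention, and the choice to pass one body angle per trial are assumptions made for illustration; the response does not say at which time point within the alignment phase the body angle was read out.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def alignment_success_rate(body_angles, target_angles, tolerance_deg=25.0):
    """Fraction of trials whose body angle lies within +/- tolerance of the target.

    body_angles and target_angles hold one value per trial, in degrees.
    The 25 degree default follows the definition quoted in the text;
    everything else here is a hypothetical illustration.
    """
    if len(body_angles) != len(target_angles):
        raise ValueError("need one target angle per trial")
    if not body_angles:
        raise ValueError("need at least one trial")
    hits = sum(
        abs(wrap_deg(body - target)) <= tolerance_deg
        for body, target in zip(body_angles, target_angles)
    )
    return hits / len(body_angles)
```

      Wrapping the angular difference matters because a fly at 350° is only 10° away from a target at 0°, so it should count as aligned.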

      We also observed that bar-evoked turns were elicited more reliably when the bar moved rapidly (45° in 200 ms) in the magneto-tether assay, although the turn magnitude was significantly smaller than the actual bar displacement (Figure 3).

      In response to the reviewer’s comment, we have now added the following description of the bar fixation behavior to the paper, citing Rimniceanu & Frye 2023.

      (Line#239) “Another potential explanation arises from recent studies demonstrating that proprioceptive feedback provided during flight turns in a magnetically tethered assay strongly dampens the amplitude of wing and head responses (Cellini and Mongeau, 2022; Rimniceanu et al., 2023).”

      Relevance of the EC model: EC-related studies by the authors linked cancellation signals to saccades (Kim et al, 2014 & 2017). Puzzlingly, the authors applied an EC model to smooth movement, when the authors' own work showed that smooth course stabilizing flight turns do not receive cancellation signals (Fenk et al., 2021). Thus, in Fig. 4C, based on the state of the field, the efference copy signal should originate from the torque commands to initiate saccades, and not from torque to generate smooth movement. As this group previously showed, cancellation signals are quantitatively tuned to that of the expected visual input during saccades. Importantly, this tuning would be to the anticipated saccadic turn optic flow. Thus the authors' results supporting an all-or-none model appear in direct conflict with the authors' previous work. Further, the addition-only model is not particularly helpful as it has been already refuted by behavioral experiments (Rimniceanu & Frye, Mongeau & Frye).

      Thank you for this constructive comment. Efference copy is best established for brief, discrete actions like flight saccades. While motor-related modulation of visual processing has been reported across short- and long-duration behaviours (Chiappe et al., 2010; Fujiwara et al., 2017; Kim et al., 2015, 2017; Maimon et al., 2010; Turner et al., 2022), only flight saccade-associated signals exhibit the temporal profile appropriate to cancel reafferent input. That said, von Holst & Mittelstaedt (1950) originally formulated efference copy to explain the smooth optomotor response of hoverflies. In our previous HS/VS recordings, we could not detect membrane-potential changes tied to baseline wing-beat amplitude (data not shown), although further work is needed.

      Note that visually evoked flight turns analyzed in this paper have relatively fast dynamics. Fenk et al. (2021) showed that HS cells carry EC-like motor signals during both loom-evoked turns and spontaneous saccades. Building on this, we tested whether object-evoked rapid turns modulate other visuomotor pathways. Although Fenk et al. also found that optomotor turns lack motor input to HS cells, the authors did not test whether the optomotor pathway suppresses other reflexes, such as loom-evoked turns. Our new behavioral data (Figure 6) show that optomotor turns indeed suppress loom-evoked turns, suggesting a potential EC signal arising from the optomotor pathway that inhibits loom-responsive visual neurons.

      In Kim et al. (2017), the authors argued that HS/VS neurons receive a “quantitatively tuned” efference copy that varies across cell types: yaw-sensitive LPTCs are strongly suppressed, roll-sensitive cells receive intermediate input, and pitch-sensitive cells receive little or none. We also showed that when the amplitude of ongoing visual drive changes, the amplitude of saccade-related potentials (SRPs) scales linearly. This proportionality does not imply a genuinely graded EC, however, because SRP amplitude could vary solely through changes in driving force (Vm – Vrest) with a fixed EC conductance. Crucially, SRPs do not fully suppress feed-forward visual signalling, arguing against an all-or-none EC mechanism.
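
      The driving-force argument can be made concrete with a minimal sketch. It assumes a single-compartment passive membrane in which the visual drive sets the pre-saccadic potential and the efference copy opens a fixed conductance with its own reversal potential. All parameter names and values below are hypothetical and are not taken from the recordings.

```python
def srp_amplitude(v_pre_mv, g_ec=1.0, g_leak=1.0, e_ec_mv=-60.0):
    """Steady-state potential change (mV) when a fixed EC conductance opens.

    v_pre_mv : pre-saccadic, visually driven membrane potential (mV)
    g_ec     : efference-copy conductance, held constant (arbitrary units)
    g_leak   : lumped conductance holding the cell at v_pre_mv (same units)
    e_ec_mv  : reversal potential of the EC conductance (mV); assumed here
               to sit near the resting potential

    With g_ec fixed, the result is a linear function of the driving force
    (e_ec_mv - v_pre_mv), so the SRP scales with the visual drive even
    though the EC input itself is not graded.
    """
    return g_ec / (g_leak + g_ec) * (e_ec_mv - v_pre_mv)


# Doubling the depolarization from the reversal potential doubles the SRP,
# while the fraction g_ec / (g_leak + g_ec) stays below 1, so the visual
# signal is never fully cancelled in this toy model.
weak_drive = srp_amplitude(-50.0)
strong_drive = srp_amplitude(-40.0)
```

      In this toy model the proportionality between SRP amplitude and visual drive falls out of Ohm's law alone, which is why linear scaling by itself cannot distinguish a graded EC from a fixed one.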

      How, then, can the cellular and behavioural data be reconciled? Silencing HS/VS neurons—or their primary inputs, the T4/T5 neurons—does not markedly diminish the optomotor response in flight (Fenk et al., 2014; Kim et al., 2017), indicating the presence of additional, as-yet-unidentified pathways.

      Physiological recordings from other visual neurons that drive the optomotor response in flying Drosophila are therefore needed to determine how strongly they are suppressed during loom-evoked turns.

      Behavioral evidence for all-or-none EC model: The authors state "unless the stability reflex is suppressed during the flies' object evoked turns, the turns should slow down more strongly with the dense background than the sparse one". This hypothesis is based on the fact that the optomotor response magnitude is larger with a denser background, as would be predicted by an EMD model (because there are more pixels projected onto the eye). However, based on the authors' previous work, the EC should be tuned to optic flow and thus the turning velocity (or amplitude). Thus the EC need not be directly tied to the background statistics, as they claim. For instance, I think it would be important to distinguish whether a mismatch in reafferent velocity (optic flow) links to distinct turn velocities (and thus position). This would require moving the background at different velocities (co- and anti-directionally) at the onset of bar motion. Overall, there are alternative hypotheses here that need to be discussed and more fully explored (as presented by Bender & Dickinson and in work by the Maimon group).

      We appreciate the reviewer’s important suggestion. In response, we performed the recommended experiment. In Figures 5 and 6 of the revised manuscript, we now present how bar- or loom-evoked flight turns affect the response to a moving background pattern. These experiments revealed that bar-evoked turns do not suppress the optic flow response, whereas loom-evoked turns strongly suppress it. Specifically, when background motion began 100 ms after the onset of loom expansion, the response to the background was significantly suppressed. Although weak residual responses to the background motion were observed in this case, this could be due to background motion occurring outside of the suppression interval, whose duration may correspond to that of the flight turns (Figure 6C,D).

      The lack of suppression of the optic flow response during and after bar-evoked turns appears to suggest that the responses are added linearly (Figure 5), seemingly contradicting the lack of dynamic change when the background dot density was altered (Figure 7, Figure 7–figure supplement 1). That is, the experimental result in Figure 5 supports either an addition-only or a graded efference copy (EC) model. However, the result in Figure 7 supports an all-or-none EC model. If a graded EC were used, the amplitude of the EC should be updated almost instantaneously when the static background changes.
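
      To make the three alternatives concrete, the sketch below computes the stability (optomotor) drive that opposes a self-generated turn against a static background under each model. This is a hypothetical simplification, not the model from the manuscript: `background_gain` stands in for the effect of dot density, and `expected_gain` is the gain the efference copy assumes when predicting the reafferent signal.

```python
def stability_drive(turn_velocity, background_gain, model, expected_gain=1.0):
    """Optomotor drive opposing a self-generated turn on a static background.

    turn_velocity   : angular velocity of the ongoing turn (deg/s)
    background_gain : strength of the optomotor response to the background;
                      assumed to grow with dot density
    model           : "addition_only", "graded_ec" or "all_or_none_ec"
    expected_gain   : gain used by the graded EC to predict the reafference
    """
    # Reafferent signal: the background slips against the turn direction.
    reafferent = -background_gain * turn_velocity
    if model == "addition_only":
        # No cancellation: the stability reflex simply adds to the turn.
        return reafferent
    if model == "graded_ec":
        # Subtract a prediction scaled to the commanded turn.
        prediction = -expected_gain * turn_velocity
        return reafferent - prediction
    if model == "all_or_none_ec":
        # The stability pathway is fully gated off during the turn.
        return 0.0
    raise ValueError(f"unknown model: {model}")
```

      Under these assumptions, the addition-only model predicts stronger slowing with a denser background, the all-or-none model predicts no density dependence, and the graded model removes the density dependence only if `expected_gain` tracks `background_gain`, which is the instantaneous-update requirement noted above.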

      Another possibility is that the optic flow during self-generated turns in a static background is extremely weak compared to the optic flow input generated by physically moving the pattern, perhaps due to the rapid nature of head movements. Indeed, detailed kinematic analysis of head movement during spontaneous saccades in blow flies revealed that the head reaches the target angle before the body completes the orientation change, making the effective speed of reafferent optic flow higher than the speed of body rotation (Hateren and Schilstra, 1999). To test these hypotheses, further experiments will be needed for bar-evoked flight turns.

      Publishing the reviewed preprint:

      (1) The Reviewed Preprint (including the full text of the preprint we reviewed, the eLife assessment, and public reviews) will typically be published in two weeks' time.

      Please let us know if you would like to provide provisional author responses to be posted at the same time (if so, please send these by email). Please do not resubmit within the next two/three weeks, as we will need to publish the first version of the Reviewed Preprint first.

      If there are any factual errors in the eLife assessment or public reviews, or other issues we should be aware of, please let us know as soon as possible.

      (2) After publication of the Reviewed Preprint, you can use the link below to submit a revised version. There is no deadline to resubmit. Before resubmitting, please ensure that you update the preprint at the preprint server to correspond with the revised version. Upon submitting a revised version, we will ask the editors and reviewers if it's appropriate to update their assessment and public reviews, which will be included alongside the revised Reviewed Preprint. At that time we will also post the recommendations to the authors and the author responses you provide with the revised version. In the author response, please respond to the public reviews (where relevant) and the recommendations to the authors.

      (3) Alternatively, you can proceed with the current version of the Reviewed Preprint (once published), without revisions, and request an eLife Version of Record. See the Author Guide for further information: https://elife-rp.msubmit.net/html/elife-rp_author_instructions.html#vor. However, most authors decide to request a Version of Record after a round of revision.

      (4) After publication of eLife's Reviewed Preprint, you also have the option to submit/publish in another journal instead: if you choose to do this, please let us know so we can update our records.

      The reviewers identified two key revisions that could improve the assessment of the paper:

      (1) Consideration of saccades within the model framework (outlined by reviewer 3).

      (2) Addition of physiology data to support the conclusions of the paper (outlined by reviewer 2). If this is not feasible within the timescale of revisions, the paper would need to be revised to clarify that the model leads to a hypothesis that would need to be tested with future physiology experiments.

      Thank you for these comments.

      Regarding revision point #1, we have added Figure 2–figure supplement 2, where we incorporated our position-velocity model (estimated in Figure 1) into the framework of the integrate-and-saccade model. A detailed description of this model is now provided in the main text (Lines 190–203).

      For revision point #2, obtaining electrophysiological evidence for efference copy remains challenging, as neither the visual neurons nor the efference-copy neuron has been identified for the wing optomotor response. As suggested by the reviewers, we have revised the title of the paper to reduce emphasis on efference copy and have noted electrophysiological recordings as a direction for future work.

      old title: A visual efference copy-based navigation algorithm in Drosophila for complex visual environments

      new title: Integrative models of visually guided steering in Drosophila

      Specific recommendations are detailed below.

      Reviewer #2 (Recommendations For The Authors):

      To directly demonstrate if an efference copy is non-scaled, the following experiments can be helpful: record from HS/VS cells and examine the relation between the amplitude of the saccade-suppression signal vs. saccade amplitude.

      Thanks for raising this important point. We previously carried out the suggested analysis for loom-evoked saccades in Fenk et al. (2021). There, significant correlations emerged between wing-response amplitude and saccade-related potentials (Figures 2F and 3C). However, we did not interpret the strong correlation (r ≈ 0.8) as evidence for a graded efference copy, because the amplitude of saccade-related potentials appeared to be bimodal. Upon presentation of the looming stimulus, flies either executed large evasive turns or showed minimal changes in wing-stroke amplitude. Large wing responses were accompanied by strong, saturated suppression of HS-cell membrane potential, whereas trials without wing responses produced only weak modulations—reflected in the bimodal distribution of saccade-related potential amplitudes (Figure 3C). 
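
      The statistical point, that a bimodal distribution can produce a strong overall correlation without any graded relationship, can be illustrated with simulated data. The numbers below are purely synthetic and are not the recordings from Fenk et al. (2021); the two modes, the noise level, and the seed are arbitrary choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_bimodal_trials(n_per_mode=500, noise_sd=0.2, seed=0):
    """Synthetic trials with two response modes and independent noise.

    Each trial is (mode, wing_response, srp). Within a mode, wing response
    and SRP share the same mean but their noise is independent, so there
    is no graded trial-by-trial relationship.
    """
    rng = random.Random(seed)
    trials = []
    for mode_level in (0.0, 1.0):  # 0 = no turn, 1 = full evasive turn
        for _ in range(n_per_mode):
            wing = mode_level + rng.gauss(0.0, noise_sd)
            srp = mode_level + rng.gauss(0.0, noise_sd)
            trials.append((mode_level, wing, srp))
    return trials


trials = simulate_bimodal_trials()
r_all = pearson_r([t[1] for t in trials], [t[2] for t in trials])
large = [t for t in trials if t[0] == 1.0]
r_within_large = pearson_r([t[1] for t in large], [t[2] for t in large])
```

      With these settings the pooled correlation is high (the population value is 0.25 / 0.29 ≈ 0.86), whereas the correlation within the large-turn mode is close to zero, so a strong pooled r does not by itself imply a graded efference copy.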

      Importantly, in rigidly tethered preparations—where these potentials are typically measured—the absence of proprioceptive feedback can itself drive wingbeat amplitudes to saturation during saccades. We therefore reasoned that the lack of intermediate-sized flight saccades would naturally yield correspondingly saturated saccade-related potentials, even if a graded EC system is in play. 

      In Kim et al. (2017), we also performed a comprehensive analysis of spontaneous saccade-related potentials across all HS/VS cell types. When we later examined the relationship between saccade amplitude and the corresponding saccade-related potentials in each cell type, we could not find any statistically significant correlation (unpublished data).

      measure how much a weak visual stimulus and a strong visual stimulus are suppressed by the suppression signal. If the signal is non-scaled, visual stimuli should always be suppressed independently of their intensities.

      Thank you for this important suggestion. As mentioned in our response to the previous comment, we believe it is not feasible to record from neurons responsible for the body optomotor response at this point, as their identity remains unknown. Regarding the HS/VS cells, our previous study showed that HS cells are not always fully suppressed. The changes in saccade-related potential amplitude can be described as a linear function of the pre-saccadic visually-evoked membrane potential (Figure 7 in Kim et al., 2017). 

      As suggested by Fenk et al. 2014 (doi: 10.1016/j.cub.2014.10.042), HS cells might also be responsive to a moving bar. If that is the case, and if you present a bar and background (either sparse or dense) in a closed-loop manner to a head-fixed fly, HS cells might be sensitive only to the bar but not to the background (independently of the density).

      Thanks for pointing out this important issue. HS cells indeed respond strongly to the horizontal movement of a vertical bar, as expected given that their receptive fields are formed by the integration of local optic flow vectors. In one of our previous studies (Supplemental Figure 1 in Kim et al., 2015), we showed that the response amplitude to a single vertical bar is roughly equivalent to that elicited by a vertical grating composed of 12 bars of the same size. Therefore, we believe that HS cells are likely to contribute to the head response to a moving vertical bar. In a body-fixed flight simulator, HS cells would respond only to the bar if the bar runs in a closed loop with a static background. In this scenario, HS cells are likely to play a role in the head optomotor response.

      Note also that the role of HS cells in the wing optomotor response remains unresolved. Unilateral activation of HS cells has been shown to elicit locomotor turns in walking Drosophila (Fujiwara et al., 2017), as well as in flying individuals (unpublished data from our lab). However, a previous study also showed that strong silencing of HS/VS cells significantly reduced the head optomotor response, but not the wing optomotor response (Kim et al., 2017).

      If neurophysiology is technically challenging, an alternative way might pay attention to a head movement that exclusively follows the background (Fox et al., 2014 (doi: 10.1242/jeb.080192)). Because HS cells are thought to promote head rotation to background motion, a non-scaled suppression signal on HS cells would always suppress the head rotation independently of the background density.

      Thanks for this helpful comment. We have analyzed head movements during bar-evoked flight turns (Figure 7–figure supplement 1B) and found no significant changes across different background dot densities. We think that this might suggest that HS cells are unlikely to receive suppressive inputs during bar-evoked turns, akin to the lack of modulation during optomotor turns.

      Another way to separate a potential efference copy from other mechanisms (more global inhibition) is the directionality. A global inhibition would suppress the response to the background even if the background moves in the same direction as self-motion, but the efference copy would not.

      Thanks for this important point. In Heisenberg and Wolf, 1979, it was proposed that modulation might be bidirectional, with behavioral effects observed only for perturbations in the “unexpected” direction. In our new data on loom-evoked turns (Figure 6), the suppression appears equally strong for background motion in either direction, supporting an all-or-none suppression mechanism.

      Besides, in general, it is unclear if you think an efference copy operates both in smooth pursuits and saccades or if such a signal is only present during saccades. Your previous neurophysiological work supports the latter. Are your behavioral results consistent with the previous saccade suppression idea, or do you propose a new type of efference copy that also operates in smooth pursuits?

      Thanks for raising this important point. von Holst and Mittelstaedt (1950) originally introduced the concept of efference copy to explain the smooth optomotor response. We previously analyzed electrophysiological recordings from HS cells for membrane-potential changes associated with slow deviations in wing-steering angle but found none. However, this negative result does not entirely rule out modulation of visual processing during smooth flight turns, given the slow drift in membrane potential observed in most whole-cell recordings.

      In this study, we examined only the interactions among visuomotor pathways during rapid flight turns, as the dynamics of visually evoked turns are almost as rapid as those of spontaneous saccades. Our data reveal that interactions between distinct visuomotor reflexes are more diverse than previously appreciated.

      Minor comments:

      Line 108, 109: match the description between here and the labels in Fig. 1F.

      Thank you for pointing out this issue. We have defined the general equation for obtaining the position and velocity components in the main text (lines 108-109), but because of a slight asymmetry in the data (Fig. 1E), we used the approach indicated in Fig. 1F and explained in lines 113-117.

      Fig.1 F: If the position-dependent component is due to fatigue, the tuning curve's shape is likely changed (shrunk or extended) depending on the stimulus speed. How can you generalize the tuning curve shown here? Does the result hold even if the stimulus speed/contrast/spatial frequency is changed?

      We appreciate this comment. We believe that fatigue may explain the significant decay in the wing response to the grating stimulus (Fig. 1E). As you mention, increasing the stimulus speed would increase the amplitude of the fly’s response up to a saturation point. We addressed this in our model by multiplying the derived value by the angular velocity of the grating.

      Regarding contrast and spatial frequency, we did not test them experimentally. Instead, we simulated our model with varying visual feedback (Fig. 4A, B), which can be seen as increasing or decreasing the contrast of a grating. An increase in contrast would increase the fly’s response to the grating and would therefore contribute to dampening the response to the foreground object (Fig. 4C).

      Line 233-255: Here, the description sounds like you will consider several parallel objects (e.g., two stripes) in the visual field instead of the combination of the figure and background (which is referred to in the following paragraph).

      Thank you for pointing it out. Indeed it was slightly ambiguous. We have addressed this by explaining the specific situation of a combination of an object and the background in lines 231-233.

      Figure 6C: you kept the foreground visual field between sparse and dense random dot backgrounds to keep the bar's saliency. Is it sure that this does not influence the difference in the fly's response to these two backgrounds (in Figure 6B)?

      This is a good point that we have also discussed internally. We also carried out similar experiments with a fully covered background and found no significant differences (Figure 7–figure supplement 1).

      Reviewer #3 (Recommendations For The Authors):

      Identify and analyze flight saccade dynamics in the raw trajectories (e.g., Fig. 3B). There should be some since the bar is near the 'sweet spot' for triggering saccades (see Mongeau & Frye, 2017).

      Thank you for bringing up this interesting point. Previous work reported that flies fixate on a vertical bar through saccadic turns rather than smooth tracking (Mongeau & Frye, 2017). When the bar was thin (<15 deg), there was barely one saccade per second (Mongeau & Frye, 2017, Fig. 4). In our magneto-tether assay (Fig. 3A, B), the object was 11.25 degrees wide and moved only for a short time window, so the fly generated only the saccade related to the onset of the object. Small turns of a few degrees, which are likely related to small perturbations, cannot be considered saccades when compared with those previously reported (Mongeau & Frye, 2017). Additionally, in our protocol (Fig. 3A), only a single object moved after the onset time (‘go’ mark), against an empty background, so in principle there is no trigger for a switch to smooth movement. We addressed this in lines x-x.

      Consider updating the Poggio model with flight saccades (switched, integrate-and-fire).

      We appreciate this suggestion. Following previous work (Mongeau et al., 2017), we expanded our model to include a saccade mechanism: the torque produced by the summed position- and velocity-dependent components is now replaced by an integrate-and-fire saccade (Figure 2—figure supplement 2). We optimized the saccade interval and amplitude so that both vary linearly with stimulus amplitude and faithfully reproduce the kinematic properties reported previously (Mongeau et al., 2017).  
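
      As a rough illustration of the integrate-and-fire idea described above, the sketch below integrates the bar's angular error until it crosses a threshold and then triggers an instantaneous saccade whose amplitude is the sum of a position term and a velocity term. The linear position and velocity functions, the threshold, and all gains are hypothetical placeholders; they are not the fitted functions or Equation 13 of the manuscript.

```python
def simulate_integrate_and_saccade(bar_velocity, duration=2.0, dt=0.001,
                                   threshold=5.0, k_pos=0.5, k_vel=0.1):
    """Simulate saccades toward a bar moving at constant velocity (deg/s).

    The bar starts at the frontal midline and the fly's heading starts at 0.
    Returns a list of (time_s, amplitude_deg) tuples, one per saccade.
    threshold is in deg*s; k_pos and k_vel are illustrative gains.
    """
    heading = 0.0
    bar = 0.0
    integral = 0.0
    saccades = []
    steps = int(round(duration / dt))
    for step in range(steps):
        bar += bar_velocity * dt
        error = bar - heading          # bar position relative to the midline
        integral += error * dt         # leak-free integration of the error
        if abs(integral) >= threshold:
            # Amplitude = position-dependent term + velocity-dependent term.
            amplitude = k_pos * error + k_vel * bar_velocity
            heading += amplitude       # saccade treated as instantaneous
            saccades.append(((step + 1) * dt, amplitude))
            integral = 0.0             # reset the integrator after firing
    return saccades
```

      For example, `simulate_integrate_and_saccade(75.0)` yields a series of saccades in the direction of bar motion, and a faster bar reaches the threshold sooner, so its first saccade occurs earlier. Treating saccades as instantaneous and omitting a refractory period are simplifications of this sketch.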

      Please engage more with the literature, especially work that directly conflicts with your conclusions (see above). Also, highly relevant work by Bender & Dickinson was not sufficiently discussed. Spot results presented in Fig. 3 should be contextualized in light of the work of Mongeau et al., 2019, who performed similar experiments and identified a switch in saccade valence.

      We appreciate your pointing out the relevant previous work. We have added references to the following papers and tried to describe the relationship between our data and previous ones.

      Bender & Dickinson 2006

      (Line#162) “This simulation experiment is reminiscent of the magnetically tethered flight assay, where a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023; Mongeau and Frye, 2017).”

      (Line#218) “We tested the predictions of our models with flies flying in an environment similar to that used in the simulation (Figure 3A). A fly was tethered to a short steel pin positioned vertically at the center of a vertically oriented magnetic field, allowing it to rotate around its yaw axis with minimal friction (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023).”

      (Line#238) “To determine if our assay imposes additional friction compared to other assays used in previous studies, we analyzed the dynamics of spontaneous saccades during the “freeze” phase (Figure 3–figure supplement 1A). We found their duration and amplitude to be within the range reported previously (Bender and Dickinson, 2006b; Mongeau and Frye, 2017) (Figure 3–figure supplement 1B-D).”

      Mongeau et al., 2019

      (Line#196) “During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2–figure supplement 2).”

      This paper shows that the dynamics of saccadic flight turns elicited by a rotating bar or spot determine whether flies display attraction or aversion. In that study, the visual stimulus—a bar or spot—rotated slowly at a constant 75 deg s⁻¹. By contrast, in our Figure 3 the object moves much faster, driving the neural “integrator” to saturation and triggering an almost immediate flight turn. In Mongeau et al. (2019), saccades occur at variable times and their amplitudes and directions are more stochastic, again reflecting the slower stimulus speed. Because these differences all arise from the disparity in object speed, we did not cite Mongeau et al. (2019) in Figure 3 or the associated text.

      In addition to the two papers cited above, we have incorporated several relevant studies on the Drosophila visuomotor control identified through the reviewers’ insightful comments. Examples include:

      Frighetto G, Frye MA. 2023 (Line#195, 464)

      Rimniceanu et al., 2023 (Line#241)

      Cellini & Mongeau 2020 (Line#91)

      Cellini & Mongeau 2022 (Line#241)

      Cellini et al., 2022 (Line#91, 162, 218)

      Many citations are not in the proper format (e.g. using numbers rather than authors' last name).

      Thank you for letting us know. We have changed the remaining citations to the proper format.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *The authors have a longstanding focus and reputation in single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community. *

      Thank you for your positive feedback.

      *There are several single-cell methodologies all claim to co-profile chromatin modifications and gene expression from the same individual cell, such as CoTECH, Paired-tag and others. Although T-ChIC employs pA-Mnase and IVT to obtain these modalities from single cells which are different, could the author provide some direct comparisons among all these technologies to see whether T-ChIC outperforms? *

      In a separate technical manuscript describing the application of T-ChIC in mouse cells (Zeller, Blotenburg et al 2024, bioRxiv, 2024.05.09.593364), we have provided a direct comparison of data quality between T-ChIC and other single-cell methods for chromatin-RNA co-profiling (please refer to Fig. 1C,D and Fig. S1D,E of the preprint). We show that compared to other methods, T-ChIC is able to better preserve the expected biological relationship between the histone modifications and gene expression in single cells.

      *In current study, T-ChIC profiled H3K27me3 and H3K4me1 modifications, these data look great. How about other histone modifications (eg H3K9me3 and H3K36me3) and transcription factors? *

      While we haven't profiled these other modifications using T-ChIC in Zebrafish, we have previously published high quality data on these histone modifications using the sortChIC method, on which T-ChIC is based (Zeller, Yeung et al 2023). In our comparison, we find that histone modification profiles between T-ChIC and sortChIC are very similar (Fig. S1C in Zeller, Blotenburg et al 2024). Therefore the method is expected to work as well for the other histone marks.

      *T-ChIC can detect full length transcription from the same single cells, but in FigS3, the authors still used other published single cell transcriptomics to annotate the cell types, this seems unnecessary? *

      We used the published scRNA-seq dataset, which has a larger number of cells, to homogenize our cell type labels with existing datasets, but we also cross-referenced our cluster-specific marker genes with ZFIN and homogenized the cell type labels with the ZFIN ontology. This way our annotation is in line with previous datasets but not biased by them. Due to the relatively small size of our data, we didn't expect to identify unique, rare cell types, but our full-length total RNA assay helps us identify non-coding RNAs such as miRNAs previously undetected in scRNA assays, which we have now highlighted in new figure S1c.

      *Throughout the manuscript, the authors found some interesting dynamics between chromatin state and gene expression during embryogenesis, independent approaches should be used to validate these findings, such as IHC staining or RNA ISH? *

      We appreciate that the ISH staining could be useful to validate the expression pattern of genes identified in this study. But to validate the relationships between the histone marks and gene expression, we need to combine these stainings with functional genomics experiments, such as PRC2-related knockouts. Due to their complexity, such experiments are beyond the scope of this manuscript (see also reply to reviewer #3, comment #4 for details).

      *In Fig2 and FigS4, the authors showed H3K27me3 cis spreading during development, this looks really interesting. Is this zebrafish specific? H3K27me3 ChIP-seq or CutTag data from mouse and/or human embryos should be reanalyzed and used to compare. The authors could speculate some possible mechanisms to explain this spreading pattern? *

      Thanks for the suggestion. In this revision, we have reanalysed a dataset of mouse ChIP-seq of H3K27me3 during mouse embryonic development by Xiang et al (Nature Genetics 2019) and find similar evidence of spreading of H3K27me3 signal from their pre-marked promoter regions at E5.5 epiblast upon differentiation (new Figure S4i). This observation, combined with the fact that the mechanism of pre-marking of promoters by PRC1-PRC2 interaction seems to be conserved between the two species (see (Hickey et al., 2022), (Mei et al., 2021) & (Chen et al., 2021)), suggests that the dynamics of H3K27me3 pattern establishment is conserved across vertebrates. But we think a high-resolution profiling via a method like T-ChIC would be more useful to demonstrate the dynamics of signal spreading during mouse embryonic development in the future. We have discussed this further in our revised manuscript.

      Reviewer #1 (Significance (Required)):

      *The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community. *

      Thank you very much for your supportive remarks.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *Joint analysis of multiple modalities in single cells will provide a comprehensive view of cell fate states. In this manuscript, Bhardwaj et al developed a single-cell multi-omics assay, T-ChIC, to simultaneously capture histone modifications and full-length transcriptome and applied the method on early embryos of zebrafish. The authors observed a decoupled relationship between the chromatin modifications and gene expression at early developmental stages. The correlation becomes stronger as development proceeds, as genes are silenced by the cis-spreading of the repressive marker H3k27me3. Overall, the work is well performed, and the results are meaningful and interesting to readers in the epigenomic and embryonic development fields. There are some concerns before the manuscript is considered for publication. *

      We thank the reviewer for appreciating the quality of our study.

      *Major concerns: *

        • A major point of this study is to understand embryo development, especially gastrulation, with the power of scMulti-Omics assay. However, the current analysis didn't focus on deciphering the biology of gastrulation, i.e., lineage-specific pioneer factors that help to reform the chromatin landscape. The majority of the data analysis is based on the temporal dimension, but not the cell-type-specific dimension, which reduces the value of the single-cell assay. *

      We focused on the lineage-specific transcription factor activity during gastrulation in Figure 4 and S8 of the manuscript and discovered several interesting regulators active at this stage. During our analysis of the temporal dimension for the rest of the manuscript, we also classified the cells by their germ layer and "latent" developmental time, taking full advantage of the single-cell nature of our data. Additionally, we have now added the cell-type-specific H3K27-demethylation results for 24hpf in response to your comment below. We hope that these results, together with our openly available dataset, demonstrate the advantage of the single-cell aspect of our dataset.

      1. *The cis-spreading of H3K27me3 with developmental time is interesting. Considering H3k27me3 could mark bivalent regions, especially in pluripotent cells, there must be some regions that have lost H3k27me3 signals during development. Therefore, it's confusing that the authors didn't find these regions (30% spreading, 70% stable). The authors should explain and discuss this issue. *

      Indeed, we see that ~30% of the bins enriched in the pluripotent stage spread, while 70% do not seem to spread. In line with earlier observations (Hickey et al., 2022; Vastenhouw et al., 2010), we find that H3K27me3 is almost absent in the zygote and is still being accumulated until 24hpf and beyond. Therefore, the majority of the sites in the genome still seem to be in the process of gaining H3K27me3 until 24hpf, explaining why we see mostly "spreading" and "stable" states. Considering most of these sites are at promoters and show signs of bivalency, we think that these sites are marked for activation or silencing at later stages. We have discussed this in the manuscript ("discussion"). However, in response to this and the earlier comment, we went back and searched for genes that show H3K27-demethylation in the most mature cell types (at 24 hpf) in our data, and found a subset of genes that show K27 demethylation after acquiring the mark earlier. Interestingly, most of the top genes in this list are well-known as developmentally important for their corresponding cell types. We have added this new result and discussed it further in the manuscript (Fig. 2d,e, Supplementary table 3).

      *Minors: *

        • The authors cited two scMulti-omics studies in the introduction, but there have been lots of single-cell multi-omics studies published recently. The authors should cite and consider them. *

      We have cited more single-cell chromatin and multiome studies focussed on early embryogenesis in the introduction now.

      *2. T-ChIC seems to have been presented in a previous paper (ref 15). Therefore, Fig. 1a is unnecessary to show. *

      Figure 1a shows a summary of our Zebrafish TChIC workflow, which contains the unique sample multiplexing and sorting strategy to reduce batch effects, which was not applied in the original TChIC workflow. We have now clarified this in "Results".

      1. *It's better to show the percentage of cell numbers (30% vs 70%) for each heatmap in Figure 2C. *

      We have added the numbers to the corresponding legends.

      1. *Please double-check the citation of Fig. S4C, which may not relate to the conclusion of signal differences between lineages. *

      The citation seems to be correct (Fig. S4C supplements Fig. 2C, but shows mesodermal lineage cells) but the description of the legend was a bit misleading. We have clarified this now.

      *5. Figure 4C has not been cited or mentioned in the main text. Please check. *

      Thanks for pointing it out. We have cited it in Results now.

      Reviewer #2 (Significance (Required)):

      *Strengths: This work utilized a new single-cell multi-omics method and generated abundant epigenomics and transcriptomics datasets for cells covering multiple key developmental stages of zebrafish. *

      *Limitations: The data analysis was superficial and mainly focused on the correspondence between the two modalities. The discussion of developmental biology was limited. *

      *Advance: The zebrafish single-cell datasets are valuable. The T-ChIC method is new and interesting. *

      *The audience will be specialized and from basic research fields, such as developmental biology, epigenomics, bioinformatics, etc. *

      *I'm more specialized in the direction of single-cell epigenomics, gene regulation, 3D genomics, etc. *

      Thank you for your remarks.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors. *

      *Major concerns *

      1. *Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset. *

      Thanks for pointing this out. We had, in fact, performed T-ChIC experiments in four rounds of biological replicates (independent clutches of embryos) and merged the data to create our resource. Although not all timepoints were profiled in each replicate, two timepoints (10 and 24hpf) are present in all four, and the cell type composition of the replicates at these two timepoints is very similar. We have added new plots in figure S2f and a new supplementary table (#1) to highlight the presence of biological replicates.

      2. *The TF‑activity regression model uses an arbitrary R² ≥ 0.6 threshold; cross‑validated R² distributions, permutation‑based FDR control, and effect‑size confidence intervals are needed to justify this cut‑off. *

      Thank you for this suggestion. We did use 10-fold cross validation during training and obtained the R2 values of TF motifs from the independent test set as an unbiased estimate. However, the cutoff of R2 > 0.6 to select the TFs for classification was indeed arbitrary. In the revised version, we now report the FDR-adjusted p-values for these R2 estimates based on permutation tests, and select TFs with a cutoff of padj supplementary table #4 to include the p-values for all tested TFs. However, we see that our arbitrary cutoff of 0.6 was in fact, too stringent, and we can classify many more TFs based on the FDR cutoffs. We also updated our reported numbers in Fig. 4c to reflect this. Moreover, supplementary table #4 contains the complete list of TFs used in the analysis to allow others to choose their own cutoff.
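
      The permutation-based FDR procedure mentioned in this reply can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy data, and the choice of Benjamini-Hochberg as the adjustment method are assumptions made here for illustration. The null distribution of R² is obtained by shuffling the observed values against fixed predictions, and the resulting p-values are then adjusted across all tested motifs.

```python
import random

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot

def permutation_pvalue(observed, predicted, n_perm=1000, seed=0):
    """P-value for R^2 under a null built by shuffling the observed values."""
    rng = random.Random(seed)
    actual = r_squared(observed, predicted)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(shuffled, predicted) >= actual:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: one p-value per TF motif, then FDR adjustment.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_padj = benjamini_hochberg(toy_pvalues)
```

      Reporting adjusted p-values for every tested TF, as the authors do in their supplementary table, lets readers apply a cutoff of their own choosing instead of relying on a fixed R² threshold.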

      3. *Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes. *

      We agree that independent validation of the functions of our predicted TFs on target gene activity would be important. During this revision, we analysed the recently published scRNA-seq data of Saunders et al. (2023), which includes CRISPR-mediated F0 knockouts of a couple of our predicted TFs, but the scRNA-seq was performed at later stages (24hpf onward) compared to our H3K4me1 analysis (which was 4-12 hpf). As a result, we saw off-target genes being affected in lineages where these TFs are clearly not expressed (attached Fig 1). We therefore didn't include these results in the manuscript. In future, we aim to systematically test the TFs predicted in our study with CRISPRi or similar experiments.

      4. *The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression. *

      We agree that PRC2 disruption followed by T-ChIC, or another form of validation, would be needed to confirm whether the H3K27me3 spreading is causally linked to the silencing of the identified target genes. However, performing this validation is complicated for several reasons. First, due to the EZH2 contribution from maternal RNA and the contradictory effects of various zygotic EZH2 mutations (depending on where the mutation occurs), the only properly validated PRC2-related mutant seems to be the maternal-zygotic mutant MZezh2, which requires germ cell transplantation (see Rougeot et al., 2019 and San et al., 2019 for details). Second, the use of inhibitors has been described in other studies (den Broeder et al., 2020; Huang et al., 2021), but these studies do not show a validation of the H3K27me3 loss or a phenotype similar to that of the MZezh2 mutants, and inhibitors can cause unwanted side effects and toxicity at a high dose, affecting gene expression results. Moreover, in an attempt to validate, we performed our own trials with the EZH2 inhibitor (GSK123) and saw that the time window up to 24hpf might be too short to see the effect (attached Fig. 2). Therefore, this validation is a more complex endeavor beyond the scope of this study. Nevertheless, our further analysis of H3K27me3 de-methylation on developmentally important genes (new Fig. 2e-f, Sup. table 3) adds more confidence that polycomb repression plays an important role, and provides enough ground for future follow-up studies.

      *Minor concerns *

      1. *Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development. *

      We agree that H3K27me3 alone would not be sufficient to fully understand the repressive chromatin state. Extension to other chromatin marks and DNA methylation would be the focus of our follow up works.

      *2. Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided. *

      As mentioned in the manuscript, we provide an open-source pre-processing pipeline "scChICflow" to perform all these steps (github.com/bhardwaj-lab/scChICflow). We have now also provided the configuration files on our zenodo repository (see below), which can simply be plugged into this pipeline together with the fastq files from GEO to obtain the processed dataset that we describe in the manuscript. Additionally, we have also clarified the peak calling and post-processing steps in the manuscript now.

      *3. Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added. *

      We have now publicly released the .h5ad files with raw counts, normalized counts, and complete gene and cell-level metadata, along with signal tracks (bigwigs) and peaks on GEO. Additionally, we now also released the source datasets and notebooks (.Rmarkdown format) on Zenodo that can be used to replicate the figures in the manuscript, and updated our statements on "Data and code availability".

      *4. Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference. *

      Thank you for spotting them. We have fixed these issues.

      Reviewer #3 (Significance (Required)):

      The method is technically innovative and the biological insights are valuable; however, several issues-mainly concerning experimental design, statistical rigor, and functional validation-must be addressed to solidify the conclusions.

      Thank you for your comments. We hope to have addressed your concerns in this revised version of our manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors.

      Major concerns

      1. Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset.
      2. The TF‑activity regression model uses an arbitrary R² ≥ 0.6 threshold; cross‑validated R² distributions, permutation‑based FDR control, and effect‑size confidence intervals are needed to justify this cut‑off.
      3. Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes.
      4. The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression.

      Minor concerns

      1. Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development.
      2. Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided.
      3. Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added.
      4. Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference.

      Significance

      The method is technically innovative and the biological insights are valuable; however, several issues-mainly concerning experimental design, statistical rigor, and functional validation-must be addressed to solidify the conclusions.

    1. the networks in 1951 enacted a Television Code closely modelled on the Hollywood Production Code

      Due to controversial topics and group opinions, television networks now had to follow a television code so that everyone was happy with what was being shown on TV.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “This is an exploratory study that doesn't explore quite enough. Critically, the authors make a point of mentioning that neuronal firing properties vary across cell types, but only use baseline firing rate as a proxy metric for cell type. This leaves several important explorations on the table, not limited to the following:”

      1a: “Do waveform shape features, which can also be informative of cell type, predict the effect of stimulation?”

      To address this question, we modeled our approach to cell type classification after Peyrache et al. 2012. More specifically, we extracted two features from the mean unit waveforms—the valley-to-peak time (VP) and the peak half-width (PHW). These features were then used to classify units into two distinct clusters (k-means, clusters = 2, based on a strong prior from existing literature), representing putative excitatory and inhibitory neurons. Our approach recapitulated many of the same observations in Peyrache et al. 2012, namely (1) identification of two clusters (low PHW/VP: inhibitory, high PHW/VP: excitatory), (2) an ~80/20 ratio of excitatory/inhibitory neurons, and (3) greater baseline firing rates in the inhibitory vs. excitatory neurons. However, we did not observe a preferential modulation of one cell type compared to another (see newly created Figure 4). A description of this analysis and its takeaways has been incorporated into the manuscript.

      Change to Text:

      Created Figure 4 (Separation of presumed excitatory and inhibitory neurons by waveform morphology).

      Caption: (A) Two metrics were calculated using the averaged waveforms for each detected unit: the valley-to-peak width (VP) and peak half-width (PHW). (B) Scatterplot of the relationship between VP and PHW; note that units with identical metrics are overlaid. Using k-means clustering, we identified two distinct response clusters, representing presumed excitatory (E, blue) and inhibitory (I, red) neurons. The units from which the example waveforms were taken are outlined in black. Probability distributions for each metric are shown along the axes. (C) Total number of units within each cluster, separated by region. (D) Comparison of baseline firing rates, separated by cluster. (E) Percent of modulated units in each cluster. * p < 0.05, NS = not significant.

      Added a description of clustering methodology to lines 132-137: “We calculated two metrics from the averaged waveform from each detected unit: the valley-to-peak-width (VP) and the peak half-width (PHW) (Figure 4A); previously, these two properties of waveform morphology have been used to discriminate pyramidal cells (excitatory) from interneurons (inhibitory) in human intracranial recordings (Peyrache et al., 2012). Next, we performed k-means clustering (n = 2 clusters) on the waveform metrics, in line with previous approaches to cell type classification.”

      Added a section in the Results titled “Theta Burst Stimulation Modulates Excitatory and Inhibitory Neurons Equally”. Lines 370-378: “Using k-means clustering, we grouped neurons into two distinct clusters based on waveform morphology, representing neurons that were presumed to be excitatory (E) and inhibitory (I) (Figure 4B). Inhibitory (fast-spiking) neurons exhibited shorter waveform VP and PHW, compared with excitatory (regular-spiking) neurons (I cluster centroid: VP = 0.50ms, PHW = 0.51ms; E cluster centroid: VP = 0.32ms, PHW = 0.31ms), and greater baseline firing rates (U(N<sub>I</sub> = 23, N<sub>E</sub> = 133) = 1074.50, p = 0.023) (Figure 4D). Although we observed a much greater proportion of excitatory vs. inhibitory neurons (E: 85.3%, I: 14.7%), stimulation appeared to affect excitatory and inhibitory neurons equally, suggesting that one cell type is not preferentially activated over another (Figure 4E).”

      Modified discussion of the effects of stimulation on different cell types. Lines 475-483: “…To test these hypotheses directly, we clustered neurons into presumed excitatory and inhibitory neurons based on waveform morphology. In doing so, we observed ~85% excitatory and ~15% inhibitory neurons, which is very similar to what has been reported previously in human intracranial recordings (Cowan et al. 2024, Peyrache et al., 2012). Interestingly, stimulation appeared to modulate approximately the same proportion of neurons for each cell type (~30%), despite the differently-sized groups. Recent reports, however, have suggested that the extent to which electrical fields entrain neuronal spiking, particularly with respect to phase-locking, may be specific to distinct classes of cells (Lee et al., 2024).”
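
      The two-feature clustering described in the response to 1a can be sketched as follows. This is a minimal illustration, not the authors' code: the `kmeans` function, the seeded initialization, and the toy (VP, PHW) values in milliseconds are all hypothetical. The sketch only separates a narrow-waveform group from a broad-waveform group; mapping each group to a presumed cell type follows the literature the authors cite and is not done here.

```python
import random

def kmeans(points, k=2, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical (VP, PHW) pairs in ms: three narrow and three broad waveforms.
waveform_features = [
    (0.30, 0.30), (0.32, 0.31), (0.31, 0.33),
    (0.50, 0.50), (0.52, 0.49), (0.49, 0.52),
]
labels, centroids = kmeans(waveform_features, k=2)
```

      Fixing the number of clusters at two encodes the prior from the literature that the authors mention; the algorithm itself does not test whether two clusters is the best description of the data.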

      1b:  “Is the autocorrelation of spike timing, which can be informative about temporal dynamics, altered by stimulation? This is especially interesting if theta-burst stimulation either entrains theta-rhythmic spiking or is more modulatory of endogenously theta-modulated units.”

      The reviewer is correct in suggesting that rate-modulation represents only one of many possible ways by which exogenous theta burst stimulation may influence neuronal activity. Indeed, intracranial theta burst stimulation has previously been shown to evoke theta-frequency oscillatory responses in local field potentials (Solomon et al. 2021), and other forms of stimulation (i.e., transcranial alternating current stimulation) may modulate the rhythm, rather than the rate, of neuronal spiking (Krause et al. 2019).

      To investigate whether stimulation altered rhythmicity in neuronal firing, we contrasted the spike timing autocorrelograms, as suggested. More specifically, we computed the pairwise differences in spike timing for each trial, separating spikes into the same pre-, during-, and post-stimulation epochs described in the manuscript (bin size = 5 ms, max lag = 250 ms), grouped neurons by whether they were modulated, and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs. Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates. Subsequent statistical testing of the peak latency differences between pre-/during- and pre-/post-stimulation did not reveal any group-level differences (Mann-Whitney U tests, p > 0.05). Thus, we were not able to identify neuronal responses suggestive of altered rhythmicity (see Figure S5). A description of this analysis and its takeaways has been incorporated into the manuscript.

      Of note, there are two elements of the data that constrain our ability to detect modulation in the rhythm of firing. First, the baseline activity recorded across neurons modulated by stimulation was relatively low (i.e., median firing rate = 1.77 Hz). Second, stimulation often resulted in a suppression, rather than an enhancement, of firing rate. Taken together, the sparse firing afforded limited opportunity to characterize changes to subtle patterns of spiking. 

      Change to Text:

      Created Figure S5 (Analysis of modulation in spiking rhythmicity)

      Caption: (A) Representative autocorrelograms (ACG) for a single neuron. The pairwise differences in spike timing were computed for each trial and epoch (bin size = 5 ms, max lag = 250 ms), then smoothed with a Gaussian kernel. The peak in the normalized ACG across trials was computed for each epoch. (B) Kernel density estimate of the peak ACG lag, separated by epoch. (C) The peak ACG lags were split by whether the neuron was modulated (Mod) or unaffected by stimulation (NS = not significant) for each of the two contrasts: pre- vs. during-stim (left) and pre- vs. post-stim (right).

      Details about the autocorrelation methodology have been incorporated. Lines 166-172: “To investigate whether stimulation altered rhythmicity in neuronal firing, we analyzed the spike timing autocorrelograms. More specifically, we computed the pairwise differences in spike timing for each trial (bin size = 5 ms, max lag = 250 ms) and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs (pre-, during-, post-stimulation). Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates.”

      The results from contrasting the autocorrelograms are now mentioned briefly. Lines 297-298: “Stimulation, however, did not appear to alter the rhythmicity in neuronal firing, as measured by spiking autocorrelograms (Figure S5).”
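
      The autocorrelogram computation described in the response to 1b can be sketched as follows, using the stated bin size (5 ms) and maximum lag (250 ms). This is an illustration under assumptions: the function names are hypothetical, the normalization (dividing by the total pair count) is one plausible choice since the response does not specify one, and the Gaussian smoothing mentioned in the figure caption is omitted.

```python
def autocorrelogram(spike_times_ms, bin_ms=5.0, max_lag_ms=250.0):
    """Histogram of positive pairwise spike-time lags below max_lag_ms."""
    n_bins = int(max_lag_ms / bin_ms)
    counts = [0] * n_bins
    spikes = sorted(spike_times_ms)
    for i, t_ref in enumerate(spikes):
        for t_other in spikes[i + 1:]:
            lag = t_other - t_ref
            if lag >= max_lag_ms:
                break  # spikes are sorted, so later lags are larger still
            counts[int(lag // bin_ms)] += 1
    total = sum(counts)
    # Assumption: normalize so that the bins sum to 1.
    return [c / total for c in counts] if total else [0.0] * n_bins

def peak_lag_ms(acg, bin_ms=5.0):
    """Center of the bin holding the largest autocorrelogram value."""
    peak_bin = max(range(len(acg)), key=lambda b: acg[b])
    return (peak_bin + 0.5) * bin_ms

# Hypothetical spike train firing once every 125 ms (an 8 Hz rhythm).
rhythmic_spikes = [0.0, 125.0, 250.0, 375.0, 500.0]
acg = autocorrelogram(rhythmic_spikes)
peak = peak_lag_ms(acg)  # 127.5, the center of the 125-130 ms bin
```

      With only a handful of spikes per epoch, most bins are empty and the peak location is dominated by chance, which is consistent with the authors' decision to restrict the analysis to neurons firing at ≥ 1 Hz.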

      1c: “The authors reference the relevance of spike-field synchrony (30-55 Hz) in animal work, but ignore it here. Does spike-field synchrony (comparing the image presentation to post-stimulation) change in this frequency range? This does not seem beyond the scope of investigation here.”

      We agree that a further characterization of spike-field and spike-phase relationships may provide rich insights into more complex regional and interregional dynamics that may be altered by stimulation. Given that many metrics are biased by sample size (e.g., number of spikes), which can vary considerably, computing the pairwise phase consistency (PPC) between spikes and LFP is a preferred metric (Vinck et al. 2010). Although PPC is unbiased, its variance nonetheless increases considerably with low spike counts; pooling spike counts across trials, however, decouples the temporal relationship between spiking and the LFP phase for each trial, confounding results and yielding an unstable estimate.

      To determine whether such an analysis is indeed possible, we calculated the percentage of stimulation trials with ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (a relatively low threshold for inclusion). Only a very small proportion of the total number of trials across all neurons met this criterion (2.5%). Thus, because of the sparse spiking in our data, we are unable to reliably characterize spike-field or spike-phase modulation in detected neurons.

      Change to Text:

      In the manuscript, we have added a description of why our data is not well-suited to investigate these relationships.

      Lines 532-538: “The present study did not investigate interactions between spiking activity and local field potentials because neuronal spiking was sparse at baseline and often further suppressed by stimulation; only a very small proportion of the total number of trials across all neurons exhibited ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (~2.5%). Although certain metrics are not biased by sample size (e.g., pairwise phase consistency), low spike counts can dramatically affect variance and, therefore, result in unstable estimates (Vinck et al., 2011).”
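
      For readers unfamiliar with pairwise phase consistency, the metric is the average cosine of the phase difference over all distinct pairs of spikes. A minimal sketch follows; the function name is hypothetical, and the spike phases are assumed to have been extracted already (for example, the LFP phase at each spike time). The implementation uses the identity |Σ exp(iθ)|² = N + 2 Σ_{j<k} cos(θ_j − θ_k) to avoid an explicit double loop.

```python
import cmath
import math

def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct spike pairs."""
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spike phases")
    resultant = sum(cmath.exp(1j * p) for p in phases)
    # |resultant|^2 = n + 2 * (sum of pairwise cosines), so the pairwise
    # mean is (|resultant|^2 - n) / (n * (n - 1)).
    return (abs(resultant) ** 2 - n) / (n * (n - 1))

# Perfectly locked spikes give 1; two spikes in antiphase give -1.
locked = pairwise_phase_consistency([0.3] * 10)
antiphase = pairwise_phase_consistency([0.0, math.pi])
```

      Because the estimate averages over N(N − 1)/2 pairs, a trial with fewer than 10 spikes contributes at most 36 pairs, which illustrates why the authors consider per-trial estimates from their sparse data unstable.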

      1d: “How does multi-unit activity respond to stimulation? At this somewhat low count of neurons (total n=156 included) it would be valuable to provide input on multi-unit responses to stimulation as well.”

      We thank the reviewer for this suggestion. We have incorporated an analysis of multiunit activity (MUA), which similarly identifies robust modulation via permutation-based statistical testing and characterizes the different profiles of responses (i.e., increased vs. decreased MUA threshold crossings pre- vs. post-stimulation).

      Change to Text:

      Created Figure S8 (Analysis of multiunit activity response to stimulation)

      Caption: (A) Example trace of multiunit activity (MUA) in one channel during a single stimulation trial. Threshold crossings are highlighted with a pink dot overlaid on the MUA signal with a corresponding hash below. (B) The percentage of channels with significantly modulated MUA, separated by the direction of effect. (C) The percentage of channels with significantly modulated MUA, separated by direction of effect and region. Inc (red; post > pre) vs. Dec (blue; post < pre). HIP = hippocampus, OFC = orbitofrontal cortex, AMY = amygdala, ACC = anterior cingulate cortex. *** p < 0.001, NS = not significant.

      Details about the MUA methodology have been incorporated. Lines 174-180: “Finally, we measured modulation in multiunit activity (MUA) by filtering the microelectrode signals in a 300-3,000 Hz window and counting the number of threshold crossings. Thresholds were determined on a per-channel basis and defined as -3.5 times the root mean square of the signal during the baseline period; activity during stimulation was excluded since stimulation artifact is difficult to separate from MUA in the absence of spike sorting.”

      MUA results are now incorporated. Lines 365-367: “Additional characterization of MUA revealed a dominant signature of increased activity post- vs. pre-stimulation, in line with these trends observed at the single-neuron level (Figure S8).”
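
      The thresholding step of the MUA analysis can be sketched as follows. This is an illustration under assumptions: the input is taken to be already band-pass filtered (300-3,000 Hz), the function names and toy signals are hypothetical, and a crossing is counted each time the signal moves from above the negative threshold to at or below it.

```python
import math

def rms(signal):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mua_threshold(baseline, multiplier=-3.5):
    """Per-channel threshold: multiplier times the baseline RMS."""
    return multiplier * rms(baseline)

def count_threshold_crossings(signal, threshold):
    """Count downward crossings of a negative threshold."""
    crossings = 0
    for prev, curr in zip(signal, signal[1:]):
        if prev > threshold >= curr:
            crossings += 1
    return crossings

# Hypothetical filtered samples; the baseline has an RMS of exactly 1.
baseline = [1.0, -1.0] * 50
threshold = mua_threshold(baseline)  # -3.5
trial = [0.0, -4.0, 0.0, -1.0, -5.0, -6.0, 0.0]
n_crossings = count_threshold_crossings(trial, threshold)  # 2
```

      Counting only the sample at which the signal first drops below threshold prevents a single deflection that stays below threshold for several samples from being counted more than once.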

      1e: “Several intracranial studies have implicated proximity to white matter in determining the effects of stimulation on LFPs; do the authors see an effect of white matter proximity here?”

      We thank the reviewer for the interesting question. Subsequent characterization revealed only small differences in the proximity of stimulation contacts to white matter (range 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9). Critically, this is not to suggest that white matter proximity has no interaction with the reported behavioral effects, but rather, that we could not identify such an association within our data.

      Change to Text:

      Created Figure S9 (The effect of stimulation proximity to white matter and distance to recorded neurons).

      Caption: (A) Kernel density estimate of the Euclidean distance from stimulation contacts to nearest WM structure (in mm); hash marks represent individual observations. (B) The change in memory performance (Δd’) was linearly regressed onto the distance from the stimulated contacts to white matter.

      The following has been added to lines 405-426: “Proximity to white matter has been shown to influence the effects of stimulation on behavior and the strength of evoked responses (Mankin et al., 2021; Mohan et al., 2020; Paulk et al., 2022). Across all stimulated contacts, we observed only small differences in the proximity of stimulation contacts to white matter (median = 4.5 mm, range = 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9).”

      Comment 2: “It is a little confusing to interpret stimulation-induced modulation of neuronal spiking in the absence of stimulation-induced change in behavior. How do the authors findings tell us anything about the neural mechanisms of stimulation-modulated memory if memory isn't altered? In line with point #1, I would suggest a deeper dive into behavior (e.g. reaction time? Or focus on individual sessions that do change in Figure 4A?) to make a stronger statement connecting the neural results to behavioral relevance.”

      We agree that the connection between the observed stimulation-induced neuronal modulation and effects on behavior is unclear and has proven challenging to elucidate. Per the reviewer’s suggestion, we further focused our analyses on the neuronal modulation effects in the individual sessions that resulted in a robust change in memory performance (stimulation vs. no-stimulation d’ difference threshold of ± 0.5, based on a moderate effect size for Cohen’s d); both a positive and negative threshold were used to capture robust changes in memory performance associated with firing rate modulation, whether enhancement or suppression. To this end, we contrasted the proportion of modulated neurons in the sessions where stimulation resulted in a robust behavioral change (Δd’) with those that did not (~d’). We did not observe a difference in the proportions between groups when collapsed across all sampled regions, or when separately evaluated (Fisher’s exact tests, p > 0.05; see Figure 5C).

      Given that this approach did not further clarify the connection between our neural and behavioral results, we believe it is most appropriate to deemphasize claims in the manuscript regarding the potential insights for behavioral modulation (e.g., memory enhancement), and have done so.

      Change to Text:

      Toned down reference to the memory-related effects of stimulation in the abstract by removing the following lines from the abstract: “Previously, we demonstrated that intracranial theta burst stimulation (TBS) of the basolateral amygdala (BLA) can enhance declarative memory, likely by modulating hippocampal-dependent memory consolidation…” and “…and motivate future neuromodulatory therapies that aim to recapitulate specific patterns of activity implicated in cognition and memory.”

      Changed Figure 4 to Figure 5

      Created Figure 5C (Interaction between behavioral effects and neuronal modulation).

      Caption: (C) Change in recognition memory performance was split into two categories using a d’ difference threshold of ± 0.5: responder (positive or negative; Δd’, pink) and non-responder (~d’, grey). Individual d’ scores are shown (left) with points colored by outcome category; dotted lines demarcate category boundaries, and the grey-shaded region represents negligible change. The number of sessions within each outcome category (middle) and the proportion of modulated units as a function of outcome category, separated by region (right). NS = not significant.

      The description of the behavioral results has been updated. Lines 394-403: “At the level of individual sessions, we observed enhanced memory (Δd’ > +0.5) in 36.7%, impaired memory (Δd’ < -0.5) in 20.0%, and negligible change (-0.5 ≤ Δd’ ≤ 0.5) in 43.3% when comparing performance between the stim and no-stim conditions; a threshold of Δd’ ± 0.5 was chosen for this classification based on the defined range of a “medium effect” for Cohen’s d. To test our hypothesis that neuronal modulation would be associated with changes in memory performance, we combined the sessions that resulted in either memory enhancement or impairment and contrasted the proportion of modulated units across regions sampled. We did not, however, observe a meaningful difference in the proportion of modulated units when grouped by behavioral outcome (all contrasts p > 0.05) (Figure 5C).”

      Lines 213-214 and 394-397 have been edited to reflect a change in the d’ threshold used for categorizing behavioral results (from Δd’ ± 0.2 to Δd’ ± 0.5).
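
      The session-level classification described above (enhanced, impaired, or negligible change at a Δd’ threshold of ± 0.5) can be sketched as follows. This is a minimal illustration: d’ is computed with the standard signal-detection formula z(hit rate) - z(false-alarm rate), which is assumed here rather than confirmed by the response, and the function names and example values are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are not handled; inv_cdf raises an error for them,
    so a correction would be needed in practice (not shown).
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def classify_session(delta_d, threshold=0.5):
    """Label one session by its stim minus no-stim d' difference."""
    if delta_d > threshold:
        return "enhanced"
    if delta_d < -threshold:
        return "impaired"
    return "negligible"  # -threshold <= delta_d <= threshold


def count_outcomes(delta_ds, threshold=0.5):
    """Tally sessions per outcome category."""
    counts = {"enhanced": 0, "impaired": 0, "negligible": 0}
    for delta_d in delta_ds:
        counts[classify_session(delta_d, threshold)] += 1
    return counts


# Hypothetical per-session d' differences (not the study's data).
example_deltas = [0.9, 0.6, -0.7, 0.1, -0.2, 0.5]
example_counts = count_outcomes(example_deltas)
```

      Note that a session sitting exactly on the boundary (Δd’ = 0.5) falls into the negligible category, matching the inclusive range quoted above.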

      Comment 3: “It is not clear to me why the assessment of firing rates after image onset and after stim offset is limited to one second - this choice should be more theoretically justified, particularly for regions that spike as sparsely as these.”

      We thank the reviewer for this question and acknowledge that no clear justification was provided for this decision in the manuscript. We limited each of the analysis epochs to 1 s for two reasons. First, the maximum possible length of the during-stimulation epoch was 1 s (stim on for 1 s). Although the pre- and post-stimulation epochs could be extended without issue, we were concerned that variable time windows could introduce a bias, for instance, resulting in different variances between epochs. Second, we anticipated, both from empirical observations and prior literature, that the neural response following stimulation or task features (e.g., image onset/offset) was likely to be transient, rather than sustained for a period of many seconds. By keeping the windows short, we ensured that our approach to detecting modulation (i.e., contrasting trial-wise spike counts between each pair of epochs) captured the intended effect rather than random noise. We have incorporated a discussion of this rationale in the Peri-Stimulation Modulation Analyses section.

      Change to Text:

      Lines 156-158 have been added: “Each epoch was constrained to 1 s to ensure that subsequent firing rate contrasts were unbiased and to capture potential transient effects (e.g., image onset/offset).”

      Comment 4: “This work coincides with another example of human intracranial stimulation investigating the effect on firing rates (doi: https://doi.org/10.1101/2024.11.28.625915). Given how incredibly rare this type of work is, I think the authors should discuss how their work converges with this work (or doesn't).”

      Thank you for bringing this highly relevant work to our attention. We were unaware of this recent preprint and have incorporated a discussion of its main findings into the manuscript.

      Change to Text:

      New citations: van der Plas et al. 2024 (bioRxiv), Cowan et al. 2024 (bioRxiv)

      The discussion of related studies has been updated. Lines 447-457: “Few studies, however, have characterized the impact of electrical stimulation via macroelectrodes on the spiking activity of human cortical neurons, none of which involve intracranial theta burst stimulation. One study reported a long-lasting reduction in neural excitability among parietal neurons, with variable onset time and recovery following continuous transcranial TBS in non-human primates (Romero et al., 2022). In a similar vein, it was recently shown that human neurons are largely suppressed by single-pulse electrical stimulation (Cowan et al., 2024; Plas et al., 2024). Other emerging evidence suggests that transcranial direct current stimulation may entrain the rhythm rather than rate of neuronal spiking (Krause et al., 2019) and that stimulation-evoked modulation of spiking may meaningfully impact behavioral performance on cognitive tasks (Fehring et al., 2024).”

      Comment 5: “What information does the pseudo-population analysis add? It's not totally clear to me.”

      We recognize the need to further contextualize the motivation for the exploratory pseudo-population analysis and appreciate the reviewer for bringing the lack of detail to our attention. In brief, the analysis allowed us to observe trends in activity across populations of neurons, which, in principle, are not visible by characterizing modulation solely in discrete neurons. Additional details have been incorporated into the manuscript, as suggested.

      Change to Text:

      Additional justification has been incorporated in the description of the methodology. Lines 185-187: “…This approach enables the identification of dominant patterns of coordinated neural activity that may not be apparent when examining individual neurons in isolation.”, lines 192-194: “…By collapsing across subjects into a common pseudo-population, this analysis provides a mesoscale view of how stimulation modulates shared activity patterns across anatomically distributed neural populations.”

      A summary interpretation has been added to the paragraph describing the results. Lines 326-328: “Taken together, these analyses reveal global structure in the state space of responses to BLA stimulation within hippocampal circuits.”

      Reviewer #2 (Public review):

      Comment 1 “Authors suggest that the units modulated by stimulation are largely distinct from those responsive to image offset during trials without stimulation. The subpopulation that responds strongly also tends to have a higher baseline of firing rate. It's important to add that the chosen modulation index is more likely to be significant in neurons with higher firing rates.”

      This is an important point that was not previously addressed in our manuscript. We suspect there are likely two factors at play worth considering with respect to our chosen nonparametric modulation index: neurons with lower activity require smaller changes in spike counts to be significantly modulated (easier to flip ranks), and neurons with higher activity empirically exhibit greater absolute shifts in the number of spikes. Our further use of permutation testing, while mitigating false positives, may also somewhat constrain the ability to detect modulation in sparsely active neurons. Nonetheless, given that many trials entailed few or no spikes, we believe this approach is preferable to alternatives that may be more susceptible to noise (e.g., percent change in trial-averaged firing rate from baseline).

      To better understand the tradeoffs with detection probability, we performed a sensitivity analysis. We generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz) and simulated the likelihood of detection with our given modulation index across neurons. The results of the simulation support the notion that the probability of detecting modulation is lower for sparsely active neurons (Figure S7C). Further discussion of this consideration for the chosen modulation index, as well as details regarding the sensitivity analysis, has been incorporated into the manuscript.

      Change to Text:

      Created Figure S7C (Detection probability analysis)

      Caption: The same permutation-based analyses reported in the manuscript were repeated under different control conditions… (C) Visualization of the predicted probability of detecting modulation across synthetic neurons with variable firing rates and modulation effect sizes; FR = firing rate.

      Lines 223-224 have been added to the Methods section titled “Firing Rate Control Analyses”: “We performed a series of control analyses to test whether our approach to firing rate detection was robust…”

      A description of the simulation has been incorporated into the same section as above. Lines 234-237: “Finally, to better understand the tradeoffs with our statistical approach, we generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz), then simulated the likelihood of detecting modulation across variable conditions (Figure S7C).”

      The description of the results from the control analyses has been updated. Lines 330-339: “Finally, we performed three supplementary analyses to evaluate the robustness of our approach to detecting firing rate modulation: a sensitivity analysis assessing the proportion of modulated units at different firing rate thresholds for inclusion/exclusion, a data dropout analysis designed to control for the possibility that non-physiological stimulation artifacts may preclude the detection of temporally adjacent spiking, and a synthetic detection probability analysis. These results recapitulate our observation that units with higher baseline firing are most likely to exhibit modulation (though the probability of detecting modulation is lower for sparsely active neurons) and suggest that suppression in firing rate is not solely attributable to amplifier saturation following stimulation (Figure S7).”
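
      The synthetic detection probability analysis described above can be approximated with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' method: spike counts in 1 s epochs are drawn from a Poisson distribution, and a paired sign-flip permutation test stands in for the manuscript's nonparametric modulation index, which is not specified in this response. The trial count, neuron count, number of permutations, and all function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's method (suitable for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sign_flip_p_value(pre, post, rng, n_perm=500):
    """Two-sided paired permutation test on trial-wise spike-count differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each paired difference under the null.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def detection_probability(base_hz, effect_hz, n_trials=40, n_neurons=50,
                          alpha=0.05, seed=0):
    """Fraction of synthetic neurons whose modulation is detected at level alpha."""
    rng = random.Random(seed)
    post_hz = max(base_hz + effect_hz, 0.0)  # firing rates cannot be negative
    detected = 0
    for _ in range(n_neurons):
        # With 1 s epochs, the rate in Hz equals the expected spike count.
        pre = [poisson(rng, base_hz) for _ in range(n_trials)]
        post = [poisson(rng, post_hz) for _ in range(n_trials)]
        if sign_flip_p_value(pre, post, rng) < alpha:
            detected += 1
    return detected / n_neurons


# Small illustrative run for a sparsely active synthetic neuron.
example_probability = detection_probability(0.1, 0.5, n_trials=20, n_neurons=10)
```

      Sweeping `base_hz` over 0.1-5.0 and `effect_hz` over ± 0.1-0.7 would produce a grid comparable in layout to the one described for Figure S7C, although the values would depend on the assumed test and trial counts.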

      Comment 2: “Readers can benefit from understanding with more details the locations chosen for stimulation - in light of previous studies that found differences between effects based on proximity to white matter (For example - PMID 32446925, Mohan et al, Brain Stimul. 2020 and PMID 33279717 Mankin et al Brain Stimul. 2021).”

      This has been addressed in the above response to Reviewer 1’s comment 1.1e.

      Change to Text:

      See changes related to Reviewer 1 comment 1.1e.

      Comment 3: “Missing information in the manuscript…”

      3a: “Images of stimulation anatomical locations for all subjects included in this study. Ideally information about the impedance of the contacts to be able to calculate the actual current used.”

      As requested, we have provided an image from the coronal T1 MRI sequence, which highlights the position of the stimulated contacts for each of the 16 patients. Though we did not measure the impedances directly, the stimulation was current-controlled, which ensured that the desired current and charge density were consistent regardless of the tissue or electrode impedance.

      Change to Text:

      Created Figure S1 (Anatomical location of stimulated electrodes).

      Caption: A coronal slice from the T1-weighted MRI scan is shown for each patient who participated in the study (n = 16). Electrode contacts within the same plane of the image are shown with blue circles, and the bipolar pair of stimulated contacts within the basolateral amygdala is highlighted in red.

      Lines 144-145 have been edited to reflect that the delivered stimulation was current-controlled: “Specifically, we administered current-controlled, charge-balanced, …”

      3b: “The studied population is epilepsy patients, and the manuscript lacks description of their condition, proximity to electrodes included in the study to pathological areas, and the number of units from each patient/hemisphere.”

      We agree that additional information regarding patient demographics, experimental details, and clinical characteristics would further contextualize this unique patient population. A new table has been included, which contains the following information: patient ID, sex, age, # experimental sessions, # SEEG leads (and # microelectrodes), # detected units (L vs. R hemisphere), and suspected seizure onset zone.

      Change to Text:

      Created Table S1 (Patient demographics and clinical characteristics).

      Lines 258-259 have been added: “…(see Table S1 for patient demographics).”

      3c: “I haven't seen any comments on code availability (calculating modulation indices and statistics) and data sharing.”

      For clarification, a section titled Resource Availability is already appended to the end of the manuscript following the Conclusion, which describes the data and code availability.

      Change to Text:

      None

      3d: “Small comment - Figure legend 3E - Define gray markers (non-modulated units?)”

      Thank you for highlighting this omission. We have updated the relevant figure caption.

      Change to Text:

      The following has been added to the Figure 3 caption: “…whereas units without a significant change in activity are shown in grey.”

    1. We gather requirements. We think through implementation details. We integrate with lots of different sources. And the ultimate thing that we produce is code

      the ultimate thing we produce is code

    2. the dream of the industry where you can write your code, your intentions, once and run them everywhere.

      write your intention

      run everywhere

      no code?

  5. clavis-nxt-user-guide-clavisnxt-erste-dev.apps.okd.dorsum.intra
    1. Author response:

      The following is the authors’ response to the previous reviews

      We have made the following small adjustments and resubmit the manuscript to be published as a Version of Record with eLife.

      Changes in main text of the manuscript:

      We have moved the “Proposed additional tests” subsection to the Discussion section as suggested by the referee. 

      We have added a link to a GitHub repository and a link to a Zenodo data repository at the beginning of the Materials and Methods section in the “Data and materials availability” subsection. The GitHub repository contains simulation code and data, and single-cell data analysis code. The Zenodo link contains our experimental data (we await your confirmation before we publish it officially on Zenodo).

      Changes in the supplemental information files

      We have fixed the typo on page 29 of the SI in which Eq. (8) was referred to in a derivation. It should be Eq. (5) instead. We thank the referee for catching this mistake which has now been corrected.

      We have fixed a typo on page 29 of SI, in which the word “evoke” is now “invoke”.  

      We have clarified the derivation on page 29 of the SI. The referee is correct that the limit condition was used to set the right-hand side of Eq. (5.11) to zero.

    1. The provocative essay written by James Heathers is a genuine attempt to quantify the current prevalence of two growing research malpractices, namely fabrication and falsification (FF for short), which are universally recognized as gross misconduct. The matter is of interest not only to researchers themselves (including meta-scientists), but also to general audiences, since taxpayers have a natural right to oversee the rewards of Science for society at large. The underlying assumption of the author is that the generally accepted figure of 2% of researchers involved at least once in FF should now be considered as a lower bound. This 2% rate appeared in an article authored by Daniele Fanelli in 2009, and made an impact in the scholarly community. However, a lot of water has flowed under the bridge since then, and new actors showed up: papermills, sophisticated digital tools (intended for both data fabrication and FF tracking), whistleblowers communicating via social networks, generative artificial intelligence, etc. The update proposed by James Heathers is thus certainly welcome.

      The other premise of the author is that the assessment of the proportion of faking scientists is not a suitable proxy. Instead, he preferred to address a tangential issue: the estimation of the rate of scholarly papers including fabricated or falsified data. According to the author, such an approach has more benefits than drawbacks, and could be, from an idealistic point of view, fully automated. One could agree, although the fear of seeing the building of an Orwellian machinery is never far away. At the end of the process, offending papers are retracted (assuming, again, an ideal world), while the authors of the flagged papers are jailed (metaphorically or not).

      A survey of more recent studies was thus carried out. Although the author acknowledges that the small sample size for his study (N = 12), as well as the large dispersion of FF estimates retrieved from this corpus, do not allow a proper meta-analysis, an alarming figure of 14.3% for the updated FF rate emerges. Moreover, this figure is consistent with independent data reported by other sleuths engaged in the fight against questionable research practices, which are mentioned in the “discussion” section of the paper. Even if estimated in a rough way, the increase of FF in less than 15 years, if confirmed by other studies, is a real threat to Science, and should be addressed urgently.

      The main value of this essay is thus to raise concerns about the fast growth of FF, rather than to provide an up-to-date FF rate, which is anyway probably impossible to obtain in a reliable manner. On the other hand, an obvious weakness of the study is the chosen target: by focusing his attention on papers, James Heathers is missing the human dimension of the academic endeavour. Indeed, authors and papers are entangled bodies, and like entangled particles, they are described by a single state involving both entities: a paper does not exist without authors, and authors are invisible if they do not publish on a regular basis.

      Nowadays, scientific papers are extremely complex, and almost always impenetrable to researchers outside of the involved field. However, Homo academicus (as coined by Pierre Bourdieu) is also a very complex being. This is why, although there is an unambiguous definition of FF, the false positive and negative rates of detecting FF are unknown, as recognized by James Heathers. In particular, false positive detections can be detrimental to authors. This point is mentioned en passant in the essay, but should be emphasized: it is more than just a drawback of the methodology used, since it is related to the very human dimension of the scholarly enterprise.

      Perhaps a complementary perspective of the work carried out by James Heathers could be based on the following example: James Ibers (1930-2021), an old-school chemist and influential crystallographer, wrote a memoir published by the American Crystallographic Association, shortly before his death.1 He describes how, as a freshman at Caltech, he attended a mandatory one-week orientation workshop. In his own words: “The most important message I took away was the Caltech Honor Code for all undergraduates. In its simplest terms: You can’t cheat in Science because you will eventually be found out. I have adhered to that Code as a husband, a father, a scientist, a teacher, a research director, and all others I have dealt with”. How many of us can affirm, without hesitation, that we stand next to Ibers? What is the tolerable threshold of cheaters in Science? 2%? 14.3%? More?

      James Heathers ends his article with a worrying sentence: “Priorities must change, or science will start to die”. Perhaps, however, Science is already as dead as a dodo.

      1 https://chemistry.northwestern.edu/documents/people/james_ibers.aca.memoir.2020.pdf

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for taking the time to review our manuscript and for providing valuable comments on how to improve it. We are pleased to see that both reviewers recognize the novelty and importance of our study, its conceptual advance and potential clinical significance. They also noted the novelty and value of our functional mechanistic approach using epigenetic editing. Below, we provide a point-by-point response to their questions and points raised. The changes introduced in response to their feedback are highlighted in yellow in the revised manuscript file.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This study by Prada et al. aimed to explore DNA methylation and gene expression in primary EpCAMhigh/PDPNlow cells, consisting (probably) for the largest part of AT2 cells, to understand the molecular mechanisms behind the impaired regeneration of alveolar epithelial progenitor cells in COPD. They found that higher or lower promoter methylation in COPD-associated cells was inversely correlated with changes in gene expression, with interferon signaling emerging as one of the most upregulated pathways in COPD. IRF9 was identified as the master regulator of interferon signaling in COPD. Targeted DNA demethylation of IRF9 in an A549 cell line resulted in a robust activation of its downstream target genes, including OAS1, OAS3, PSMB8, PSMB9, MX2 and IRF7, demonstrating that demethylation of IRF9 is sufficient to activate the IFN signaling pathway, validating IRF9 as a master regulator of IFN signaling in (alveolar) epithelial cells.

      Major comments:

      • To remove airways (and blood vessels) completely from the lung tissue is difficult, if not impossible. This means that the assumption that the sorted EpCAMpos/PDPNlow cells primarily consisted of AT2 cells remains valid only if a quantitative analysis is conducted on the proportion of HT2-280pos cells in all samples in cytospins to exclude any significant contamination from bronchial epithelial cells. If authors cannot demonstrate >95% pure HT2-280-positive cells, then the key conclusions suggesting that the epigenetic regulation of the IFN pathway might be crucial in AT2 progenitor cell regeneration could also potentially apply to bronchial progenitor cells. In addition, if >95% purity cannot be demonstrated, the data should be adjusted to account for differences in cell type composition.

      Response:

      We thank the reviewer for raising this important point. Although, as pointed out by the reviewer, we cannot guarantee that our sorted cells do not contain a minor contamination from respiratory / terminal bronchial cells, we carefully selected donors, tissue regions, and sorting strategy to ensure the highest possible enrichment of AT2 cells, as we explain below. We have now expanded the methods and results section and covered this point in the manuscript discussion.

      • The lung tissue pieces we received were distal, as evidenced by the presence of pleura. We collected representative tissue pieces for histology to validate sample quality. Our protocol includes a dissection of all visible airways and vessels using a dissecting microscope, which were cryopreserved separately from distal parenchyma. Hence, the starting material for tissue dissociation was depleted from airways and vessels. The importance of vessel/airway removal for enrichment of distal alveolar cells was established by Tata's group (PMID: 35712012).
      • We selected the AT2 sorting protocol (EpCAMpos/PDPNlow) based on previous publications that used tissue from both healthy and COPD lungs to separate AT2 cells from AT1 and airway basal cells, as AT1 and basal cells are both PDPNhigh (PMID: 22033268, PMID: 23117565; PMID: 35078977). This protocol was favoured due to the lack of information about HT2-280 expression and distribution in COPD lungs.
      • The sort quality for each sample was assessed by the FACS analysis (back sorting) of the sorted cells, where we observed 95-97% purity (EpCAMpos/PDPNlow, Fig. 1G shown below). In addition, we validated the sorting protocol and high AT2 enrichment from both no COPD and COPD tissues by immunostaining the FACS-sorted cells with HT2-280, an AT2 marker widely used in the field (strategy suggested by the reviewer) and observed that close to 100% of cells were positive for this marker (Fig. 1H shown below). However, we could not do this retrospectively for those patients for whom we did not have enough material. Sorting primary AT2 cells from small tissue pieces is challenging, and we need at least 20,000 cells to obtain high-quality methylation and RNA-seq data.
      • AT2 marker genes (ABCA3, LPCAT1, LAMP3 and the surfactant genes SFTPA2, SFTPB and SFTPC) were among the top highly expressed genes in our RNA-seq data and were not significantly changed in COPD (please see expression data in Fig. S2A in the manuscript, and below for convenience), as well as Table 6, providing further evidence that the sorted cells carry a strong AT2 transcriptional signature.

        Fig. 1G: FACS plot examples showing the analysis of sorted AT2 cells (back sorting) from control (blue) and COPD (green) donors displayed over total cell lung suspensions (grey). Fig. 1H: Representative IF staining of HT2-280 expression in sorted AT2 cells from no COPD (top) and COPD (bottom) donors. Nuclei (blue) were stained with DAPI, scale bars = 20 µm.

        Fig. S2A: Normalized read counts from RNA-seq data for AT2-specific genes in sorted AT2 cells from each donor (dots). Data points represent normalised counts from no COPD (blue), COPD I (light green) and COPD II-IV (dark green). Group median is shown as a black bar.

      • In agreement with a previous study which profiled bulk AT2 using expression arrays (PMID: 23117565), we also observed upregulation of IFN signaling pathway in COPD AT2s. The enrichment of IFNα/β signature was also observed in COPD in the inflammatory AT2 cluster (iAT2) in a recent scRNA-seq study (PMID: 36108172). As part of the revision, we compared the IFN gene signature identified in our bulk AT2 RNA-seq with a recent scRNA-seq study (published after the submission of our manuscript, PMID: 39147413) that profiled EpCAMpos cells from COPD and non-smoker donor lungs. We observed an upregulation of our IFN signature genes in AT2 in COPD (mostly in AT2c and rbAT2 subsets), suggesting that similar signatures were observed in COPD AT2s in this dataset as well (please see Fig. S4E-F below).

        Figure S4E: Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (Hu et al, 2024). Y-axis shows log-normalized gene expression levels. F: Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from Hu et al, 2024. The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      • We have also carefully examined DNA methylation profiles across all samples. The density plots of our T-WGBS DNA methylation data are very similar among the individual samples in all 3 groups, indicating that the sorted cells consist mostly of a single cell type, as there are no obvious intermediate (25-75%) methylation peaks, as observed in cell mixtures (Fig. 2A and the panel below). No reference DNA methylation profiles are available for respiratory or terminal bronchial cells; hence, we cannot compare how epigenetically different these cells would be from AT2 nor perform a deconvolution for potential minor contamination with distal airway cells.

        Figure: DNA methylation density plots of sorted EpCAMpos/PDPNneg cells from no COPD (blue, n=3), COPD I (light green, n=3) and COPD II-IV (dark green, n=5) showing a homogeneous methylation pattern and low abundance at intermediate (25%-75%) methylation values across all profiled samples, indicating that the sorted cells were mostly of a single cell type.

      • We have now added a sentence to the limitations section of the discussion to cover that point specifically.

      CHANGES IN THE MANUSCRIPT:

      AT2 cells were isolated by fluorescence-activated cell sorting (FACS) from cryopreserved distal lung parenchyma, depleted of visible airways and vessels, from three no COPD controls, three COPD I and five COPD II-IV patients, as previously described (24, 52, 53).

      The isolated cells were positive for HT2-280, a known AT2 marker (54), as confirmed by immunofluorescence (Fig. 1H), validating the identity and high enrichment of the isolated AT2 populations.

      *Known AT2-specific genes, including ABCA3, LAMP3 and surfactant genes (SFTPA2, SFTPB and SFTPC) were among the top highly expressed genes and were not significantly changed in COPD AT2s (Fig. S2A, Table 6), further confirming the AT2-characteristic transcriptional signature of our isolated cells. *

      However, 5-AZA is a global demethylating agent, and the observed effects may not be direct. To validate the epigenetic regulation of central AT2 pathways further, we took advantage of locus-specific epigenetic editing technology *(73). We focused on the IFN pathway because it was the most significantly enriched Gene Ontology (GO) term in our integrative analysis of TWGBS and RNA-seq data. Several IFN pathway members had associated hypomethylated DMRs within promoter-proximal regions and concomitant increased gene expression (Fig. 4C and S2C). Additionally, we confirmed the elevated expression of IFN-related genes with associated DMRs identified in our study in AT2 cells and AT2 cell subclusters from a recently published scRNA-seq cohort (74) (Fig. S4E-F). *

      We observed upregulation of multiple IFN genes in AT2 in COPD, consistent with a previous expression array study (24). IFNα/β signaling was also enriched in COPD patients in the inflammatory AT2 cluster (iAT2) in a recent scRNA-seq study (84), and our IFN signature genes were also upregulated in the AT2c and AT2rb subsets in COPD identified in another recent scRNA-seq study (74).

      Finally, despite careful removal of airways from distal lung tissue using a dissecting microscope, we cannot exclude the presence of some terminal/respiratory bronchiole cells in our FACS-isolated EpCAMpos/PDPNlow population. Recent scRNA-seq studies provided an unprecedented resolution and identified several epithelial subpopulations and transitional cells residing in the terminal/respiratory bronchioles and alveoli, including respiratory airway secretory cells (93), terminal airway-enriched secretory cells (28), terminal bronchiole-specific alveolar type-0 (AT0) (70), and emphysema-specific AT2 cells (74). These cells may contribute to alveolar repair in healthy and COPD lungs; however, with our bulk DNA methylation and RNA-seq study, we are unable to resolve all these subpopulations. Future development of single-cell methylation and non-reference-based algorithms for DNA methylation deconvolution will enable deeper epigenetic phenotyping of specific AT2 and bronchiolar cell subsets.

      (Methods) Validation of IFN gene upregulation in a published scRNA-seq dataset

      scRNA-seq data from (74), generously provided by M. Königshoff, were processed using the default Seurat workflow (117). Expression of IFN-related genes was extracted and plotted as log-normalised gene expression levels in AT2 cells from control and COPD donors. Seurat's AddModuleScore() function was used to compute a gene set score for a custom IFN program using the genes listed in Fig. S4E and to analyse the IFN gene set scores in AT2 cell subclusters identified in (74). Briefly, average gene expression scores were computed for the gene set of interest, and the expression of control features (randomly selected) was subtracted as described in (118).

      Fig. S4E and F: E. Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (74). Y-axis shows log-normalized gene expression levels. F. Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from (74). The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.
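The gene set scoring described in the Methods excerpt above can be illustrated with a minimal Python sketch. It is not Seurat's AddModuleScore() implementation: Seurat draws its control features from expression-matched bins, whereas this simplified version samples control genes uniformly at random from all genes outside the set. The function name `module_score`, the placeholder gene names, and the toy expression values are assumptions made for illustration only and are not data from the study.

```python
import random
from statistics import mean


def module_score(expr, gene_set, n_ctrl=5, seed=0):
    """Per-cell gene set score: mean expression of the gene set minus
    the mean expression of randomly selected control genes.

    expr: dict mapping gene name -> list of log-normalised values (one per cell).
    Simplification (assumption): controls are sampled uniformly at random,
    not from expression-matched bins as Seurat does.
    """
    rng = random.Random(seed)
    wanted = set(gene_set)
    present = [g for g in gene_set if g in expr]
    if not present:
        raise ValueError("none of the gene set genes are in the expression table")
    background = sorted(g for g in expr if g not in wanted)
    ctrl = rng.sample(background, min(n_ctrl, len(background)))
    n_cells = len(expr[present[0]])
    scores = []
    for c in range(n_cells):
        set_mean = mean(expr[g][c] for g in present)
        ctrl_mean = mean(expr[g][c] for g in ctrl) if ctrl else 0.0
        scores.append(set_mean - ctrl_mean)
    return scores


# Toy table: cells 0-1 stand in for control AT2 cells, cells 2-3 for COPD AT2 cells.
# All gene names except IRF9 are hypothetical placeholders.
toy_expr = {
    "IRF9": [0.2, 0.1, 1.5, 1.8],
    "IFN_GENE_2": [0.0, 0.3, 1.1, 1.4],
    "CTRL_1": [1.0, 1.0, 1.0, 1.0],
    "CTRL_2": [0.5, 0.5, 0.5, 0.5],
    "CTRL_3": [0.0, 0.0, 0.0, 0.0],
}
scores = module_score(toy_expr, ["IRF9", "IFN_GENE_2"], n_ctrl=3)
```

In this toy table the two cells standing in for COPD receive higher scores than the two control cells, which is the kind of comparison summarised per AT2 subcluster in Fig. S4F.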

      • The overrepresentation of several keratins (KRT5, KRT14, KRT16, KRT17), mucins (MUC12, MUC13, MUC16, MUC20) and the transcription factor FoxJ1 is now attributed by the authors to a possible dysregulation of AT2 identity and differentiation in COPD (lines 282 - 284), where they cite refs 28, 69, 70. The authors try to support this with IF double stains for KRT5 and HT2-280 to identify co-expression of KRT5 and HT2-280 in lung tissue (Figure S2H). However, the evidence for the co-expression of both markers could be presented more convincingly.

      __Response:__

      We found the potential co-expression of airway and alveolar markers in COPD lungs interesting and hence included it in the original manuscript. The initial discovery came from our bulk RNA-seq data, where we observed upregulation of several genes typically found in more proximal airways in COPD (mentioned above by the reviewer). Of note, some of them (e.g., FoxJ1) are expressed at very low levels. Following the reviewer's comments, we performed further IF analysis to validate the possible colocalization of AT2 and airway markers at the protein level. We took Z-stack images to demonstrate the co-localization of HT2-280 and KRT5 more convincingly and co-stained the same tissue regions with SCGB3A2 (a TASC/distal airway cell marker, PMID 36796082). Even though these are rare events, we were able to reproduce the existence of HT2-280/KRT5 positive, SCGB3A2 negative cells in the alveoli of COPD patients at the protein level (Fig. S2H and panels below). Although this finding is interesting, we decided to keep it in the supplement and did not include it in the discussion, in order to focus the story on the epigenetic regulation of the IFN pathway, which is the main discovery of our study. We will investigate this observation in future studies.

      Figure S2H and here: Examples of HT2-280/KRT5 double-positive cells. Top, immunofluorescence staining of the alveolar region of a COPD II donor showing the existence of AT2 cells (HT2-280 positive, red) that are SCGB3A2 negative (green, left) but KRT5 positive (green, right). In conclusion, double-positive HT2-280/KRT5 cells are rare but present in the alveoli of COPD patients. Magnification: 20x. Scale bar: 50 µm. Bottom, Z-stack images highlighting HT2-280 (red) and KRT5 (green) double-positive cells at 63x magnification. Scale bar: 5 µm.

      CHANGES IN THE MANUSCRIPT:

      In addition, we observed an upregulation of several keratins (KRT5, KRT14, KRT16, KRT17) and mucins (MUC12, MUC13, MUC16, MUC20), suggesting a potential dysregulation of alveolar epithelial cell differentiation programs in COPD (Table 6, Fig. S2F). Immunofluorescence staining confirmed the presence of KRT5-positive cells in the distal lung in COPD and identified cells positive for both KRT5 and HT2-280 (Fig. S2H). Collectively, these results indicate a dysregulation of stemness and identity in the alveolar epithelial cells in COPD.

      Fig. S2H legend: The zoomed-in panel (right corner, bottom) demonstrates the presence of rare HT2-280/KRT5 double-positive cells in the alveoli of COPD patients. Slides were counterstained with DAPI; scale bars = 50 µm, 20 µm or 5 µm, as displayed in images.

      • Double staining for KRT5 and HT2-280 did highlight the proximity of both cell types in lung tissue, underscoring the challenge of removing airways (including the smaller and terminal bronchi) from the tissue. In addition, HT2-280/KRT5 co-expression is not consistent with recent studies from refs 28, 69, 70, where other markers for distal airway cell transition, such as SCGB3A2 and BPIFB1, have been demonstrated, which were not investigated in this study.

      Response:

      We provided a general overview of the different signatures observed in our data, but we could not validate every deregulated pathway or gene. We include the relevant tables detailing all differentially expressed genes and differentially methylated regions to enable and encourage the community to follow up on the data in subsequent studies.

      As demonstrated above, we detect the co-occurrence of HT2-280/KRT5 staining at the protein level in the same cells in the alveoli of COPD patients. We would like to emphasize that alveolar epithelial cell identity in COPD lungs has not been investigated in detail at the protein or RNA level, and HT2-280/KRT5 co-expression/co-localization has not been directly tested in the studies mentioned by the reviewer since, among other reasons, the gene encoding HT2-280 has not been identified. Notably, a recent study (published after the submission of our manuscript) focusing on enriched epithelial cells from the distal lungs of COPD patients (PMID 35078977) identified an emphysema-specific AT2 subtype co-expressing the AT2 marker SFTPC and the distal airway cell transition marker SCGB3A2, indicating that disease-specific AT2 populations with possible co-occurrence of AT2 and airway markers exist. In our dataset, SCGB3A2 was not deregulated (log2 fold change=0.22, adj p-value=0.47), as shown in Table 6, and the HT2-280/KRT5 positive cells were negative for SCGB3A2 in our IF staining (see above).

      BPIFB1 is one of the antimicrobial peptide genes with an associated DMR and is significantly upregulated in COPD cells in our study (log2 fold change=1.17, adj p-value=0.0016), as shown in supplementary Fig. S4C and reproduced below for convenience.

      Figure S4C Fold-change in gene expression of BPIFB1 in AT2 cells in COPD (RNA-seq) and A549 cells treated with 0.5µM AZA (RT-qPCR) compared to control samples. Left, RNA-seq data from AT2 cells (no COPD, blue, n=3; COPD II-IV, green, n=5). Right, A549 treated with AZA (orange, n=3) compared to control DMSO-treated cells (grey, n=3). The group median is shown as a black bar.

      • The small (and not evenly divided) sample size of both COPD and non-COPD specimens may lead to a higher risk for false positive results as adjustments for multiple testing typically rely on the number of comparisons, and small sample sizes may not provide enough data points to adequately control for this.

      __Response:__

      We acknowledge the problem of testing for multiple traits with relatively small numbers of samples. The availability of donor tissue, especially from non-COPD and COPD-I donors, was limited, and we applied very strict donor matching and quality control criteria for sample inclusion to avoid additional variability and confounding factors. The importance of strict quality control in selecting appropriate control samples was highlighted in our previous study (PMID: 33630765), where we demonstrated that approximately 50% of distal lung tissue from cancer patients with normal spirometry has pathological changes. Hence, we believe that the quality of the tissue was paramount to the reliability of the data. Strict quality control and sample matching for multiple parameters, including age, BMI, smoking status and smoking history (critical for DNA methylation studies), and cancer type (for background tissue), is a key strength of our approach, but it inevitably limited our sample size.

      First, all samples were cryopreserved and then processed in parallel in groups of 1 non-COPD and 2-3 COPD samples. This process included tissue dissociation, FACS sorting, back sorting (always), and immunofluorescence staining (when enough material was available). Cell pellets were stored at -80 °C until the entire cohort was ready for sequencing. This was done to limit the potential variation introduced by processing and sorting. RNA and DNA isolations were performed in parallel for all the sorted cell pellets, which were then sequenced as a single batch.

      During data analysis, we applied stringent cutoffs for DMR detection to reduce the risk of false positives due to multiple comparisons and a small sample size. Specifically, we filtered for regions with at least 10% methylation difference and containing at least 3 CpGs. Additionally, we applied a non-parametric Wilcoxon test using average DMR methylation levels to remove potentially false-positive regions, as the t-statistic is not well suited for non-normally distributed values, as expected at very low/high (close to 0% / 100%) methylation levels. A significance level of 0.1 has been used. Therefore, we are confident that the rigorous analysis and strict criteria applied in this study allowed us to detect trustworthy DMRs that we could further functionally validate using epigenetic editing. All the details of the DMR analysis are provided in the methods section. To address this point and limitation, we have added the following paragraphs in the discussion section of the manuscript:

      CHANGE IN THE MANUSCRIPT:

      *The strengths of our study include the use of purified human alveolar type 2 epithelial progenitor cells from a well-matched and carefully validated cohort of human samples, including mild and severe COPD patients, providing high relevance to human COPD. *

      However, we acknowledge several limitations of our study that warrant further investigation. First, the sample size was small. The use of strict quality criteria for donor selection limited the available samples, particularly for the ex-smoker control group. This resulted in an unequal distribution of COPD and control samples. This impacts the power of statistical analysis, particularly in the WGBS analysis, where millions of regions genome-wide are tested. Nevertheless, the clear negative correlation between promoter methylation and corresponding gene expression highlights the robustness of the DMR selection. Additionally, we were able to experimentally validate interferon-associated DMRs using epigenetic editing, highlighting the power of integrated epigenetic profiling in identifying disease-relevant regulators.
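The DMR filtering criteria described in the response above (at least 10% methylation difference, at least 3 CpGs, and a non-parametric Wilcoxon test on average DMR methylation levels at a significance level of 0.1) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names `rank_sum_exact_p` and `keep_dmr` are hypothetical, the test is implemented as an exact two-sided rank-sum test by enumerating all group labellings (practical for 3 versus 5 samples), and the threshold is assumed to be applied as p < 0.1.

```python
from itertools import combinations
from statistics import mean


def rank_sum_exact_p(x, y):
    """Exact two-sided Wilcoxon rank-sum p-value via full enumeration."""
    pooled = list(x) + list(y)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):  # assign midranks so that ties are handled
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_x = len(x)
    expected = n_x * (len(pooled) + 1) / 2
    observed = abs(sum(ranks[:n_x]) - expected)
    sums = [sum(ranks[i] for i in idx)
            for idx in combinations(range(len(pooled)), n_x)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-12)
    return extreme / len(sums)


def keep_dmr(ctrl, copd, n_cpgs, min_diff=0.10, min_cpgs=3, alpha=0.1):
    """ctrl / copd: per-sample average methylation (0-1) within one region."""
    if n_cpgs < min_cpgs:
        return False
    if abs(mean(copd) - mean(ctrl)) < min_diff:
        return False
    return rank_sum_exact_p(ctrl, copd) < alpha  # assumed strict threshold


# Toy region: 3 control and 5 COPD samples (made-up values).
toy_ctrl = [0.80, 0.85, 0.78]
toy_copd = [0.40, 0.45, 0.50, 0.42, 0.48]
toy_p = rank_sum_exact_p(toy_ctrl, toy_copd)
```

With 3 versus 5 samples, the smallest two-sided p-value such an exact test can return is 2/56 (about 0.036), which gives some context for a significance level of 0.1 in a cohort of this size.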

      __Minor suggestions for improvement__

      __Introduction__ • In general, refer to the actual experimental studies rather than review papers where appropriate.

      Response:

      We have now carefully checked all the references and amended them to refer to experimental studies when required.

      • Clearly specify whether a study was conducted in mice or humans, as this distinction is crucial for understanding the relevance of the findings to COPD.

      __Response:__

      All our experiments were performed with human lung cells and tissues. No mouse samples were used. As suggested, we have now clearly stated that our study was performed using human tissue samples and cells in different parts of the manuscript, including the discussion, where we now explicitly highlight the strengths and limitations of our study.

      CHANGES IN THE MANUSCRIPT:

      ...we generated whole-genome DNA methylation and transcriptome maps of sorted human primary alveolar type 2 cells (AT2) at different disease stages.

      However, the regulatory circuits that drive aberrant gene expression programs in human AT2 cells in COPD are poorly understood

      Therefore, we set out to profile DNA methylation of human AT2 cells at single CpG-resolution across COPD stages.

      ...*suggesting that aberrant epigenetic changes may drive COPD phenotypes in human AT2. *

      To identify genome-wide DNA methylation changes associated with COPD in purified human AT2 cells...

      The similarity of the methylation and gene expression profiles in the PCAs suggested that epigenetic and transcriptomic changes in human AT2 cells during COPD might be interrelated ...

      *In this work, we demonstrate that genome-wide DNA methylation changes occurring in human AT2 cells may drive COPD pathology by dysregulating key pathways that control inflammation, viral immunity and AT2 regeneration. *

      *Using high-resolution epigenetic profiling, we uncovered widespread alterations of the DNA methylation landscape in human AT2 cells in COPD that were associated with global gene expression changes. *

      *Currently, it is unclear how cigarette smoking leads to changes in DNA methylation patterns in human AT2 *

      The strengths of our study include the use of purified human alveolar epithelial progenitor cells from a well-matched and carefully validated cohort of human samples, including mild and severe COPD patients, providing high relevance to human COPD.

      __Methods__ • Line 473: is "3 ex-smoker controls" meant here instead of "smoker controls"?

      __Response:__

      All donors (no COPD and COPD) used in our study are ex-smokers. Matching the samples with regard to smoking status and history is critical for epigenetic studies, as cigarette smoke profoundly affects DNA methylation genome-wide (PMID: 38199042, PMID: 27651444). This has now been clarified in the revised manuscript.

      CHANGE IN THE MANUSCRIPT:

      Of note, we included only ex-smokers in our profiling to avoid acute smoking-induced inflammation as a confounding factor (50).

      Importantly, we matched the smoking status and smoking history of all donors, which is key in epigenetic studies, as cigarette smoking profoundly impacts the DNA methylation landscape of tissues (96).

      In total, 3 ex-smoker controls (no COPD), 3 ex-smoker donors with mild COPD (GOLD I, COPD I) and 5 ex-smoker donors with moderate-to-severe COPD (GOLD II-IV, COPD II-IV) were profiled (Fig. 1A-C, Table 1).

      __Discussion__ • A list of limitations should be added to the discussion. One is the use of the alveolar cell line A549, which produces mucus, a characteristic more commonly associated with bronchial epithelial cells. (ref 43) l530:

      __Response:__

      The profiling was performed using purified primary human alveolar epithelial progenitor cells. For technical reasons, A549 cells were only used for validation of the results using epigenetic editing. The A549 phenotype depends on the growth medium used, in our case, Ham's F-12 medium, which is recommended for long-term A549 culture and promotes multilamellar body formation and differentiation toward an AT2-like phenotype (PMID: 27792742). We are developing epigenetic editing technology for use in primary lung cells; however, the approach currently relies on the high efficiency of transient transfections, which cannot yet be achieved with primary adult AT2 cells. We were positively surprised by how well the methylation data obtained from patient AT2s translated into mechanistic insights when using A549 cells, despite A549 being a cancer cell line. This suggests that the fundamental mechanisms of epigenetic regulation of IRF9 and the IFN signaling pathway are conserved between A549 and primary AT2 cells.

      • Another limitation to consider is that cells were isolated primarily from individuals with lung cancer, except for patients with COPD stage IV, particularly as COPD stage II and IV samples were taken together. The small and unevenly divided sample size should also be discussed.

      __Response:__

      We thank the reviewer for bringing up this important point, which we carefully considered when designing our study. To match our samples across the cohort, all the no-COPD, COPD I, and two of the COPD II-IV samples were obtained from cancer resections. In addition to other characteristics, like age, BMI and smoking status, we also matched the donors by cancer type (all profiled donors had squamous cell carcinoma). We collected lung tissue as far away from the carcinoma as possible and sent representative pieces for histological analysis by an experienced lung pathologist to confirm the absence of visible tumours. In addition, to ensure that our data represents COPD-relevant signatures, we intentionally included samples from three COPD donors undergoing lung resections (without a cancer background) in the profiling.

      Following the reviewer's suggestion, to investigate the potential impact of non-cancer samples on driving the observed differences, we carefully checked the PCAs for both DNA methylation and RNA-seq. We could not identify a clear separation of non-cancer COPD samples from the cancer COPD samples (or other cancer samples) in any examined PCs, indicating no confounding effect of cancer background in the samples. We observed that one sample contributing to PC2 is a non-cancer sample, but this was a rather sample-specific effect, as the other two non-cancer samples clustered together with the other severe COPD samples with a cancer background. Notably, in our DNA methylation data, we do not observe typical features of cancer methylomes, like global loss of DNA methylation or aberrant methylation of CpG islands (e.g., in tumour suppressor genes) (see Fig. 2A), further suggesting that we do not "pick up" confounding cancer signatures in our data.

      Following the comments from both reviewers, to clarify that point, we added the information about cancer and non-cancer samples to the PCA figures for DNA methylation (new Fig. 2B) and RNA-seq (new Fig. 3A) data in the revised manuscript, as shown below.

      CHANGE IN THE MANUSCRIPT:

      COPD samples from donors with a cancer background clustered together with the COPD samples from lung resections, confirming that we detected COPD-relevant signatures (Fig. 2B).

      Fig. 2B: *Principal component analysis (PCA) of methylation levels at CpG sites with > 4-fold coverage in all samples. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.*

      Unsupervised principal component analysis (PCA) on the top 500 variable genes revealed a clear influence of the COPD phenotype in separating no COPD and COPD II-IV samples, as previously observed with the DNA methylation analysis, irrespective of the cancer background of COPD samples (Fig. 3A, Fig. S2B).

      *Principal component analysis (PCA) of the 500 most variable genes in the RNA-seq analysis. PC1 and PC2 are shown in Fig. 3A, PC1 and PC4 in Fig. S2B. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.*

      __Response:__

      We thank the reviewer for suggestions on how to improve the discussion of our manuscript. We have now added a strength/limitation section to our discussion and included the points suggested by both reviewers.

      CHANGE IN THE MANUSCRIPT:

      The strengths of our study include the use of purified human alveolar epithelial progenitor cells from a well-matched and carefully validated cohort of human samples, including mild and severe COPD patients, providing high relevance to human COPD. Importantly, we matched the smoking status and smoking history of all donors, which is key in epigenetic studies, as cigarette smoking profoundly impacts the DNA methylation landscape of tissues (96). With the first genome-wide high-resolution methylation profiles of isolated cells across COPD stages, we offer novel insights into the epigenetic regulation of gene expression in epithelial progenitor cells in COPD, expanding our understanding of how alterations in regulatory regions and specific genes could contribute to disease development. We identified IRF9 as a key IFN transcription factor regulated by DNA methylation. Notably, by targeting IRF9 through epigenetic modifications, we modulated the activity of the IFN pathway, which plays a crucial role in the immune response and lung tissue regeneration. Epigenetic editing techniques could offer a novel therapeutic strategy for COPD by downregulating IFN pathway activation and promoting the regeneration of epithelial progenitor cells in the lungs. Further preclinical and clinical studies are needed to validate the efficacy and safety of epigenetic editing approaches in COPD treatment (33)*. *

      *However, we acknowledge several limitations to our study that warrant further investigation. First is the small sample size and replication difficulty due to the lack of available data, common challenges for studies working with sparse human material and hard-to-purify cell populations. The use of strict quality criteria in donor selection limited the available samples, especially for the ex-smoker control group, leading to an unequal distribution of COPD and control samples. Overall, this impacts the power of statistical analysis, especially in the WGBS analysis, where millions of regions genome-wide are tested. Nevertheless, the clear negative correlation of promoter methylation to the corresponding gene expression highlights the robustness of the DMR selection. Furthermore, we could experimentally validate interferon-associated DMRs using epigenetic editing, highlighting the power of integrated epigenetic profiling for the discovery of disease-relevant regulators. *

      Overall, we detected a higher number of correlated DMR-DEG associations using our simple promoter-proximal linkage compared to the GeneHancer approach. Assigning enhancers to their target genes with high confidence is a complex and challenging task. Enhancers are often located far from the genes they regulate and can interact with their target genes through three-dimensional chromatin loops. Furthermore, enhancers can operate in a highly context-dependent manner, with the same enhancer regulating different genes depending on the cell type, developmental stage, or environmental signals. Determining which enhancer is active under specific conditions remains a hurdle in the field, especially since the AT2-specific chromatin profiles of enhancer marks are not yet available.

      In addition, while WGBS provides unprecedented resolution and high coverage of the DNA methylation sites across the genome, it does not allow distinguishing 5-methylcytosine from 5-hydroxymethylcytosine. Therefore, we cannot exclude that some methylated sites we detected are 5-hydroxymethylated. However, as 5-hydroxymethylcytosine is present at very low levels in the lung tissue (97), its effect is likely marginal.

      Finally, despite careful removal of airways from distal lung tissue using a dissecting microscope, we cannot exclude the presence of some terminal/respiratory bronchiole cells in our FACS-isolated EpCAMpos/PDPNlow population. Recent scRNA-seq studies provided an unprecedented resolution and identified several epithelial subpopulations and transitional cells residing in the terminal/respiratory bronchioles and alveoli, including respiratory airway secretory cells (93), terminal airway-enriched secretory cells (28), terminal bronchiole-specific alveolar type-0 (AT0) (70), and emphysema-specific AT2 cells (74). These cells may contribute to alveolar repair in healthy and COPD lungs; however, with our bulk DNA methylation and RNA-seq study, we are unable to resolve all these subpopulations. Future development of single-cell methylation and non-reference-based algorithms for DNA methylation deconvolution will enable deeper epigenetic phenotyping of specific AT2 and bronchiolar cell subsets.

      __References__ • Check references. For instance, there is no reference in the text to ref 43.

      • Align format of references

      __Response:__

      We thank the reviewer for spotting this inconsistency. We have carefully checked and aligned the format of all references. The (old) reference 43 is now mentioned in the discussion part.

      __Reviewer #1 (Significance (Required)):__

      The strength of this study lies in its focus on the molecular mechanisms underlying the impaired regeneration of epithelial progenitor cells in COPD. The discovery of IRF9, which regulates IFN signaling and is prominently upregulated in COPD, together with the convincing validation of the epigenetic control of the IFN pathway by targeted DNA demethylation of the IRF9 gene, adds significant value to the COPD research field.

      Main limitations of the study are the relatively small sample size of both COPD and non-COPD specimens and the claim that the sorted EpCAMpos/PDPNlow cells primarily consisted of AT2 cells.

      __- Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.__

      The nature and significance of the advance in epigenetic editing of IRF9 in COPD can be described as both conceptual and potentially clinical:

      Conceptual Advance: The epigenetic editing of IRF9 enhances our understanding of the molecular mechanisms underlying COPD pathogenesis. By targeting IRF9 through epigenetic modifications, researchers were able to modulate the activity of the IFN pathway, which plays a crucial role in the immune response and lung tissue regeneration. This approach offers insights into the epigenetic regulation of gene expression in epithelial progenitor cells in COPD and expands our understanding of how alterations in specific gene methylation could contribute to disease progression.

      Clinical Significance: The potential clinical significance of epigenetic editing of IRF9 lies in its implications for COPD therapy. If successful, epigenetic editing techniques could offer a novel therapeutic strategy for COPD by downregulating IFN pathway activation and promoting regeneration of epithelial progenitor cells in the lungs. Obviously, further preclinical and clinical studies are needed to validate the efficacy and safety of epigenetic editing approaches in COPD treatment.

      __Response:__ We thank the reviewer for recognising the importance of our study, its conceptual advance and potential clinical significance. We are pleased to see that the reviewer highlights the promise of epigenetic editing in both furthering our basic understanding of molecular mechanisms of chronic diseases and its future potential as a therapeutic strategy.

      __- Place the work in the context of the existing literature (provide references, where appropriate).__ Few experimental papers have been published on epigenetic editing in lung diseases, with limited research available beyond the study referenced in citation 43. Song J, Cano-Rodriquez D, Winkle M, Gjaltema RA, Goubert D, Jurkowski TP, Heijink IH, Rots MG, Hylkema MN. Targeted epigenetic editing of SPDEF reduces mucus production in lung epithelial cells. Am J Physiol Lung Cell Mol Physiol. 2017 Mar 1;312(3):L334-L347. doi: 10.1152/ajplung.00059.2016. Epub 2016 Dec 23. PMID: 28011616.

      Response:

      We thank the reviewer for recognising the uniqueness and novelty of our study and the lack of research on the functional understanding of DNA methylation in the context of lung and lung diseases.

      - State what audience might be interested in and influenced by the reported findings.

      This study is of broad interest to researchers investigating the pathogenesis and treatment of COPD.

      __- Define your field of expertise with a few keywords to help the authors contextualize your point of view.__

      Expertise in: Lung pathology, Immunology, COPD, Epigenetics

      - Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Less expertise in: Epigenetic Editing

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):__

      __Summary:__

      This study aims to understand the molecular mechanisms underlying dysfunction in AT2 cells in COPD by profiling bulk genome-wide DNA methylation using tagmentation-based whole-genome bisulfite sequencing (T-WGBS) and RNA sequencing in selectively sorted primary AT2 cells. The study stands out in its sequencing breadth and use of an incredibly difficult cell population, and has the potential to add substantially to our mechanistic understanding of epigenetic contributions to COPD. A further highlight is the concluding aspect of the study, where the authors undertook targeted modification of specific CpG methylation, providing direct, site-specific evidence for transcriptional regulation by CpG methylation.

      Response:

      We thank the reviewer for recognizing the conceptual and methodological advance of our study and for noting the value of our functional mechanistic approach.

      __Major comments: __

      The authors clearly show that there is DNA methylation alteration in AT2 cells from COPD individuals that links functionally to gene expression at some level. However, I think the statement "to identify genome-wide changes associated with COPD development and progression..." and similar other references to disease development understanding is not accurate given the DNA methylation primary comparison is between control and moderate to severe COPD, with no temporal detail or evidence that they drive progression rather than are a result of COPD development. The paragraph starting on line 186, where this is addressed to some extent, is quite vague and doesn't really provide confidence that DNAm dysregulation occurs at an early stage in this context. This can be addressed by changing the focus/style of the text.

      __Response: __

      Thank you for raising this point. We agree with the reviewer that our cross-sectional study describes the association of methylation changes with either COPD I or more established disease (COPD II-IV) and that the observed changes may be either the driver or a result of COPD development. This has been clarified in the revised manuscript, and we removed the statements about disease initiation and progression. This is an important point; hence, we added an extra line to the discussion to make that clear.

      __CHANGE IN THE MANUSCRIPT:__

      Therefore, we set out to profile DNA methylation of human AT2 cells at single CpG-resolution across COPD stages to identify epigenetic changes associated with disease and combine this with RNA-seq expression profiles.

      To identify epigenetic changes associated with COPD, we collected lung tissue from patients with different stages of COPD,

      ....to identify methylation changes associated with mild disease, we included TWGBS data from AT2 isolated from COPD I patients (n=3) in the analysis.

      Currently, we do not know whether the identified DNA methylation changes are the cause or the consequence of the disease process and not much is known about the correlation of DNA methylation with disease severity.

      *However, our study is cross-sectional, our cohort included only 3 COPD I donors, and we did not have any follow-up data on the patients, so future large-scale profiling of mild disease (or even pre-COPD cohorts) in an extended patient cohort will be crucial for a better understanding of early disease and its progression trajectories. *

      __Results comments and suggestions: __

      For the integrated analysis, there is a focus on DMRs in promoters with very little analysis on other regions. The paragraph starting on line 317 describes some analysis on enhancers but is very brief, doesn't include information on how many/which DMRs were included, making it hard to interpret the impact of the 147 DMRs and 93 genes identified - is this nearly all DMRs and genes analysed or very few? A comparison to the promoter analysis would be of interest. Especially as the targeted region followed up with lovely functional assessment in the last sections is a gene body DMR, not a promoter DMR.

      __Response: __

      We thank the reviewer for pointing out the importance of changes in enhancers. We agree that extending the enhancer analysis is very interesting. However, assigning enhancers to their target genes with high confidence is a complex and challenging task. Enhancers are often located far from the gene they regulate, sometimes spanning hundreds of kilobases. They can interact with their target genes through three-dimensional chromatin loops, potentially bypassing nearby genes to activate more distant ones, making it difficult to confidently link specific enhancers to their target genes. Furthermore, enhancers can operate in a highly context-dependent manner. The same enhancer can regulate different genes depending on the cell type, developmental stage, or environmental signals. Another challenge is that enhancers often work in clusters or "enhancer landscapes," where multiple enhancers contribute to the regulation of a single gene. Disentangling the contribution of individual enhancers within such clusters and determining which enhancer is active under specific conditions remains an ongoing hurdle in the field, especially since the AT2-specific chromatin profiles of enhancer marks are not yet available.

      One approach we tried to account for more distal regulatory regions was to assign DMRs to the nearest gene within a maximum distance of 100 kb using GREAT (Genomic Regions Enrichment of Annotations Tool) and simultaneously perform gene enrichment analysis of the associated genes. The old Figure S1C (now S1D) shows the top 10 enriched terms of either hyper- or hypomethylated DMRs, and Table 4 shows the full list of enriched terms. However, in this analysis, we did not integrate the results of the RNA-seq analysis. To demonstrate that methylation can be correlated with gene expression in this analysis, we then took a closer look at the WNT/β-catenin pathway, which contains 147 DMRs associated with 93 genes from the respective pathway (old Figure S3D, now S3G). Here, we showed that distal DMRs up to 100 kb away from the TSS show a high correlation with gene expression. We are including the two figures below for convenience:

      *Left panels, functional annotation of genes located next to hypermethylated (top) and hypomethylated (bottom) DMRs using GREAT. Hits were sorted according to the binomial adjusted p-value and the top 10 hits are shown. The adjusted p-value is indicated by the color code and the number of DMR-associated genes is indicated by the node size. Right panel, scatter plot showing distal DMR-DEG pairs associated with Wnt signaling. Pairs were extracted from GREAT analysis (hypermethylated, DMR-DEG distance ...*

      Following the reviewer's suggestion, we have now extended the enhancer analysis using the GeneHancer database, the most comprehensive, integrated resource of enhancer/promoter-gene associations. We used GeneHancer version 5.14, which annotates 392,372 regulatory genomic elements (GeneHancer elements) on the hg19 reference genome. Of the 25,028 DMRs, 18,289 DMRs (73% of all DMRs) coincided with at least one GeneHancer element, resulting in 19,661 DMR-GeneHancer associations. Next, we extracted the GeneHancer elements associated with protein-coding or long non-coding RNA genes, which left us with 2,144 DMR-GeneHancer associations. We then kept only high-scoring gene-GeneHancer associations ("Elite"), leaving 1,485 DMR-GeneHancer associations. Of those, we selected the GeneHancer elements linked to genes differentially expressed in our RNA-seq analysis, resulting in a final table of 376 DMR-GeneHancer associations (Table 9 DMR_DEG_GeneHancer, Tab 2). Similar to the promoter-proximal analysis, we analysed the correlation of expression and methylation changes of the DMR-GeneHancer associations, demonstrating a high number of negatively and positively correlated events (Fig. S3E). Finally, we performed the gene enrichment analysis for positively and negatively correlating genes. We detected significant GO term enrichments only for negatively correlating genes (Fig. S3F and Table 10_Enrichment_results, Tab2).

      CHANGE IN THE MANUSCRIPT

      To harness the full resolution of our whole-genome DNA methylation data, we extended the analysis beyond promoter-proximal regions and assessed how epigenetic changes in distal regulatory regions (enhancers) may relate to transcriptional differences in COPD. As the assignment of enhancer elements to the corresponding genes is challenging, we tried two different approaches. First, we used the GeneHancer database (72) to link DMRs to regulatory genomic elements (GeneHancer elements). Of the 25,028 DMRs, 18,289 DMRs (73%) coincided with at least one GeneHancer element. Of those, 2,144 DMR-GeneHancer associations were linked either to protein-coding or lncRNA genes. Next, we filtered for high-scoring gene-GeneHancer associations ("Elite"), leaving 1,485 DMR-GeneHancer Elite associations. Of those, we selected the GeneHancer elements linked to genes differentially expressed in our RNA-seq analysis, resulting in 376 DMR-GeneHancer associations (Table 9). Similar to the promoter-proximal analysis, we assessed the correlation of expression and methylation changes of the DMR-GeneHancer associations, demonstrating a high proportion of negatively and positively correlated events (Fig. S3E). Finally, we performed gene enrichment analysis for positively and negatively correlated genes. We detected significant GO term enrichments for negatively correlating genes only (Fig. S3F and Table 10), with the most pronounced term being "regulation of tumor necrosis factor". In an alternative approach, we linked proximal and distal (within 100 kb from TSS) DMRs to the nearest gene using GREAT (57) (Fig S1C, Table 4) and calculated the Spearman correlation between DMRs and associated DEGs. 147 DMRs showed high correlation with 93 genes from the WNT/β-catenin pathway (Fig. S3G), suggesting that DNA methylation may also drive the expression of genes of the WNT/β-catenin family.

      Figure S3E and F: E. Spearman correlation between gene expression and DMR methylation of DMRs assigned to gene regulatory elements using the GeneHancer database. F. GO-Term over-representation analysis of DEGs negatively correlated to DMRs in gene regulatory elements. The adjusted p-value is indicated by the color code and the percentage number of associated DEGs is indicated by the node size.

      (Methods) For enhancer analysis, the GeneHancer database version 5.14, which annotates 392,372 regulatory genomic elements (GeneHancer elements) on the hg19 reference genome, was used (72). Of the 25,028 DMRs, 18,289 DMRs coincided with at least one GeneHancer element, resulting in 19,661 DMR-GeneHancer associations. Next, the GeneHancer elements were filtered for association with protein-coding or long non-coding RNA genes and for high-scoring gene-GeneHancer associations ("Elite"), leaving 1,485 DMR-GeneHancer associations. Of those, the GeneHancer elements linked to differentially expressed genes in COPD were selected, resulting in a final table of 376 DMR-GeneHancer associations. Similar to the promoter-proximal analysis, the Spearman correlation of expression and methylation changes of the DMR-GeneHancer associations was assessed. GO gene enrichment analysis for positively and negatively correlating genes was done using Metascape (111).
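      For illustration only, the Spearman correlation step described above can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the study: the function names and the toy donor values are hypothetical, and the 0.5 cut-off on the absolute coefficient is taken from the threshold quoted elsewhere in this response.

      ```python
      from math import sqrt

      def average_ranks(values):
          """Return 1-based ranks, giving tied values the mean of their ranks."""
          order = sorted(range(len(values)), key=lambda i: values[i])
          ranks = [0.0] * len(values)
          start = 0
          while start < len(order):
              end = start
              # Extend the block while the next sorted value is tied with this one.
              while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
                  end += 1
              mean_rank = (start + end) / 2 + 1
              for k in range(start, end + 1):
                  ranks[order[k]] = mean_rank
              start = end + 1
          return ranks

      def spearman(x, y):
          """Spearman's rho, computed as the Pearson correlation of the ranks."""
          rx, ry = average_ranks(x), average_ranks(y)
          mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
          cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
          var_x = sum((a - mx) ** 2 for a in rx)
          var_y = sum((b - my) ** 2 for b in ry)
          return cov / sqrt(var_x * var_y)

      # Hypothetical toy data: mean methylation of one DMR and expression of the
      # linked gene across five donors (values invented for illustration).
      methylation = [0.82, 0.75, 0.60, 0.41, 0.35]
      expression = [1.2, 1.9, 2.4, 3.8, 4.1]

      rho = spearman(methylation, expression)
      is_highly_correlated = abs(rho) > 0.5  # threshold quoted in this response
      ```

      In this toy case methylation falls as expression rises across all donors, so the pair would count as a negatively correlated DMR-DEG association.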

      A comparison to the promoter analysis would be of interest.

      Response:

      We detected more highly correlated (|correlation coefficient| > 0.5) DMR-DEG associations using our simple promoter proximal linkage (n=643) in comparison with the GeneHancer approach comprising annotated enhancer elements (n=327/2,144). Gene enrichment results pointed to the interferon pathway, which we could confirm using epigenetic editing. This pathway was not present in the GeneHancer analysis, indicating that regulation of the IFN pathway may be controlled by proximal elements.

      CHANGE IN THE MANUSCRIPT:

      Overall, we detected a higher number of correlated DMR-DEG associations using our simple promoter-proximal linkage compared to the GeneHancer approach. Assigning enhancers to their target genes with high confidence is a complex and challenging task. Enhancers are often located far from the genes they regulate and can interact with their target genes through three-dimensional chromatin loops. Furthermore, enhancers can operate in a highly context-dependent manner, with the same enhancer regulating different genes depending on the cell type, developmental stage, or environmental signals. Determining which enhancer is active under specific conditions remains a hurdle in the field, especially since the AT2-specific chromatin profiles of enhancer marks are not yet available.

      Especially as the targeted region followed up with lovely functional assessment in the last sections is a gene body DMR, not a promoter DMR.

      Response:

      We thank the reviewer for bringing up that point. To clarify, we defined the promoter regions for the analysis as regions located ±6 kb (upstream and downstream) from the transcriptional start site (TSS). Since the term "promoter" often refers to the region upstream of the transcriptional start site, its use may have been misleading. For clarity, we changed the text correspondingly to __promoter proximal methylation__ and explained in the methods how the regions for analysis were defined.

      __CHANGE IN THE MANUSCRIPT:__

      "DMR association per gene promoter" was changed to "Gene promoter proximal DMRs"

      Fig. S3B: "DMR in promoter" was changed to "promoter proximal DMR(s)"

      "by DNA methylation changes in promoters" was changed to "by DNA methylation changes in promoter proximity"

      "regulated by promoter methylation" was changed to "regulated by promoter-proximal methylation"

      "analysis of the promoter DMRs" was changed to "analysis of the promoter-proximal DMRs"

      "between promoter methylation" was changed to "between promoter proximal methylation"

      Cytoscape was used to analyse negatively or positively correlated DMR-DEG pairs. ClueGO (v2.5.6) analysis was conducted using all DEGs associated with a promoter-proximal DMR (±6 kb from TSS) and a Spearman correlation coefficient threshold of 0.5 (112).

      • Lines 299-301 - I'm not sure the graph in Fig S3A supports the conclusion that there was a preferential negative relationship between DNAm and gene expression. Looks like there are a substantial number of cases where a positive relationship is observed and this needs to be acknowledged.

      Response:

      In this part, we refer to Fig S3C. In the left panel, downregulated genes clearly show higher counts for the hypermethylated DMRs, whereas the hypomethylated DMRs are enriched at upregulated genes (right panel), indicating a preference for negative correlation: lower methylation, higher gene expression. If there were no preference, we would expect a 50:50 ratio of hypo- and hypermethylated DMRs, whereas we observed a 77:23 ratio. Nevertheless, we agree that there is a substantial number of cases (n=151) with a high positive correlation, which we now highlight in the text. For clarity, we also modified the figure legend to indicate that a stacked histogram is represented in the panel.
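      As a rough illustration of the 50:50 argument, the sketch below recomputes the percentages from the pair counts quoted in this response (492 negative, 151 positive) and evaluates an exact binomial tail under an equal-probability null. It is a minimal sketch and not part of the reported analysis; in particular, treating the DMR-DEG pairs as independent observations is a simplifying assumption made here.

      ```python
      from math import comb

      negative_pairs = 492  # negatively correlated DMR-DEG pairs (from the text)
      positive_pairs = 151  # positively correlated DMR-DEG pairs (from the text)
      total_pairs = negative_pairs + positive_pairs

      negative_pct = 100 * negative_pairs / total_pairs
      positive_pct = 100 * positive_pairs / total_pairs

      # Exact upper tail of Binomial(n=total_pairs, p=0.5): probability of at least
      # `negative_pairs` negative pairs if both directions were equally likely.
      upper_tail = sum(comb(total_pairs, k) for k in range(negative_pairs, total_pairs + 1))
      p_one_sided = upper_tail / 2 ** total_pairs
      p_two_sided = min(1.0, 2 * p_one_sided)  # the null is symmetric, so double the tail

      print(f"{negative_pct:.1f}% negative, {positive_pct:.1f}% positive")
      print(f"two-sided binomial p-value: {p_two_sided:.3g}")
      ```

      The percentages reproduce the 76.5% and 23.5% stated in the manuscript; the p-value should be read only as a sanity check under the stated independence assumption.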

      __CHANGE IN THE MANUSCRIPT:__

      L303: Interestingly, 23.5% of the identified DMR DEG pairs (n=151) showed a positive correlation between gene expression and DNA methylation.

      *Figure legend in Fig. S3C was changed to: C Stacked histogram showing location of hyper- and hypomethylated DMRs relative to the TSS of DEGs in downregulated (left) and upregulated (right) genes. *

      • Line 307 - what are the "analysed DEGs"? Are they the methylation associated genes?

      Response:

      Those are the DEGs we identified in RNA-seq analysis. To clarify, we changed the text to "identified DEGs".

      __CHANGE IN THE MANUSCRIPT:__

      • "analysed DEGs" was changed to "identified DEGs"

      • Line 307-309 - "Among the analyzed DEGs, 76.5% (492) displayed a negative correlation (16.8% of the total DEGs), indicating a possible direct regulation by DNA methylation, while 23.5% (151) showed a positive correlation between gene expression and DNA methylation" - are the authors suggesting the positive correlation doesn't indicate direct regulation?

      __Response: __

      Thank you for highlighting this point. We did not intend to suggest that negative correlation indicates direct regulation, while positive correlation suggests a lack thereof. To clarify that point, we have reformulated this sentence.

      __CHANGE IN THE MANUSCRIPT:__

      Among the identified DEGs, 76.5% (n=492) displayed a negative correlation (16.8% of the total DEGs), consistent with a repressive role of promoter DNA methylation. Interestingly, 23.5% of the identified DEGs (n=151) showed a positive correlation between gene expression and DNA methylation.

      • Line 313 - why did the authors focus on only negatively correlated genes to identify their top dysregulated pathway of IFN signalling? Why not do pathway analysis on the DNAm associated genes separately to identify DNAm associated pathways?

      Response:

      We also performed a pathway enrichment analysis using the positively correlated genes but did not identify any significantly enriched pathways, processes or terms. When we examined the top hit of the gene set enrichment analysis, the interferon signaling pathway, we observed only negatively correlated DMR-gene associations (Fig. 5B). Therefore, we decided to use only the negatively correlated DMRs, as using all correlated genes would give a higher background and dilute our results.

      CHANGE IN THE MANUSCRIPT:

      Cytoscape was used to analyse negatively or positively correlated DMR-DEG pairs. ClueGO (v2.5.6) analysis was conducted using all DEGs associated with a promoter-proximal DMR (±6 kb from TSS) and a Spearman correlation coefficient threshold of 0.5 (113).

      • A comparison of the gene expression data with previous data in AT2 cell/single cell data would strengthen the gene expression section.

      __Response: __

      We compared our gene expression signatures with the study of Fujino et al., who profiled sorted AT2 cells (EpCAMhighPDPNlow) from COPD/controls using expression arrays (PMID: 23117565). Consistent with our study, the authors also observed the upregulation of interferon signalling (among other pathways) in COPD AT2s. However, no raw data was available in the published manuscript for a more in-depth analysis.

      Several recent scRNA-seq studies identified transcriptional signatures of COPD and control cells (e.g., PMIDs: 36108172, 35078977, 36796082, 39147413). However, most studies did not match the smoking status of the control and COPD donors and looked at the whole lung tissue, with limited power to detect gene expression changes in distal alveolar cells. It is difficult to directly compare our data to the gene expression data from non-smokers vs COPD patients, as cigarette smoking profoundly remodels the epigenome and transcriptional signatures of cells. In addition, differences in technologies and depth of sequencing make such comparisons challenging. However, one study (PMID: 36108172) performed scRNA-seq analysis on 3 non-smokers, 4 ex-smokers and 7 COPD ex-smoker lungs. Despite relatively limited coverage of epithelial cells in the dataset (...

      We also compared the main AT2 IFN signature identified in the integration of our DNA methylation in promoter-proximal regions and RNA-seq with a recent study (published after the submission of our manuscript, PMID: 39147413) that profiled EpCAMpos cells from COPD and control lungs (non-smokers) using scRNA-seq. We observed an upregulation of our IFN signature genes in AT2 in COPD (specifically in AT2-c and rbAT2 subsets), suggesting that similar signatures were observed in this dataset as well. However, ex-smokers were not included in this study, making direct comparisons difficult. We have now included the panels shown below as Figure S4E and S4F:

      Figure S4E and F: E. Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (74). Y-axis shows log-normalized gene expression levels. F. Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from (74). The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      CHANGES IN THE MANUSCRIPT:

      However, 5-AZA is a global demethylating agent, and the observed effects may not be direct. To validate the epigenetic regulation of central AT2 pathways further, we took advantage of locus-specific epigenetic editing technology (73). We focused on the IFN pathway because it was the most significantly enriched Gene Ontology (GO) term in our integrative analysis of TWGBS and RNA-seq data. Several IFN pathway members had associated hypomethylated DMRs within promoter-proximal regions and concomitant increased gene expression (Fig. 4C and Fig. S2C). Additionally, we confirmed the elevated expression of IFN-related genes with associated DMRs identified in our study in AT2 cells and AT2 cell subclusters from a recently published scRNA-seq cohort (74) (Fig. S4E-F).

      (Methods) Validation of IFN gene upregulation in a published scRNA-seq dataset

      scRNA-seq data from (74), generously provided by M. Königshoff, were processed using the default Seurat workflow (117). Expression of IFN-related genes was extracted and plotted as log-normalised gene expression levels in AT2 cells from control and COPD donors. Seurat's AddModuleScore() function was used to compute a gene set score for a custom IFN program using the genes listed in Fig. S4E and to analyse the IFN gene set scores in AT2 cell subclusters identified in (74). Briefly, average gene expression scores were computed for the gene set of interest, and the expression of randomly selected control features was subtracted as described in (118).
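      The idea behind the gene set score (mean expression of the signature minus mean expression of randomly chosen control genes, per cell) can be sketched as follows. This is a deliberately simplified Python illustration, not Seurat's AddModuleScore() implementation: Seurat additionally draws control features from matching expression bins, which is omitted here. The function name, gene names and values are hypothetical.

      ```python
      import random
      from statistics import mean

      def module_score(expr, gene_set, n_control=4, seed=0):
          """Per-cell score: mean(signature genes) - mean(random control genes).

          expr maps gene name -> list of log-normalised values, one per cell.
          Controls are sampled uniformly from non-signature genes (no binning).
          """
          rng = random.Random(seed)
          pool = sorted(g for g in expr if g not in gene_set)
          controls = rng.sample(pool, min(n_control, len(pool)))
          n_cells = len(next(iter(expr.values())))
          scores = []
          for cell in range(n_cells):
              signal = mean(expr[g][cell] for g in gene_set)
              background = mean(expr[g][cell] for g in controls)
              scores.append(signal - background)
          return scores

      # Hypothetical toy matrix with two cells: index 0 = control, index 1 = COPD.
      toy_expr = {
          "IFN_GENE_A": [0.5, 2.0],
          "IFN_GENE_B": [0.7, 2.4],
          "BG_1": [1.0, 1.0],
          "BG_2": [1.2, 1.2],
          "BG_3": [0.8, 0.8],
          "BG_4": [1.1, 1.1],
          "BG_5": [0.9, 0.9],
      }

      scores = module_score(toy_expr, ["IFN_GENE_A", "IFN_GENE_B"])
      ```

      Because the background genes are identical in both toy cells, the difference between the two scores reflects only the signature genes, which is the behaviour the control subtraction is meant to achieve.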

      Fig. S4 E and F. E. Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (74). Y-axis shows log-normalized gene expression levels. F. Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from (74). The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      • The paragraph starting on line 173 feels a little redundant when we know there is RNA available to test if the differential DNAm links to altered gene expression - this selection of example regions/genes would be better placed after the gene expression has been reported, at which point you could say whether the linked genes displayed altered transcription.

      Response:

      The current structure (with DNA methylation, followed by RNA-seq and integration) is intentional and serves several important purposes. As this is the first genome-wide high-resolution COPD DNA methylation study of AT2, we aimed to describe the methylation landscape independently of gene expression (noting the limitation of current understanding of how DNA methylation regulates expression). This early focus on DMRs lays clear groundwork by highlighting potential regulatory elements and pathways that could be disrupted, independent of or even before corroborative transcriptional data. Additionally, positioning these examples early in the narrative helps to frame subsequent gene expression analyses. Once RNA data are introduced later, the reader can directly compare the methylation patterns with transcriptional outcomes, thereby enhancing the overall story. In other words, by first showcasing disease-relevant methylation changes, we underscore a hypothesis that these epigenetic modifications are functionally meaningful. The later integration of gene expression data then serves as a confirmatory or complementary layer, rather than the sole basis for inferring biological significance. This is important as we still do not fully understand the function of DNA methylation outside promoters, and its role is also important for splicing, 3D genome organisation, non-coding RNA regulation, enhancer regulation, etc.

      • Similarly, the TF enrichment analysis is great but maybe would have added value to be done on DNA regions later shown to be linked to differential expression - was there different enrichment at DNA regions that are vs are not associated with altered expression? And could you test in vitro whether changing methylation of DNA (maybe a blunt tool like 5-aza would be ok) alters TF binding (cut+run/ChIP?). Furthermore, it would be interesting to understand the TF sensitivity analysis within the context of positive versus negative DNA methylation:gene expression correlations.

      Response:

      As suggested by the reviewer, we now performed the TF enrichment analysis using the DMRs with a high correlation (|correlation coefficient|>0.5) between methylation and expression (Figure S3D) and expanded the method section to include TF analysis. We observed ETS domain motifs enriched at hypomethylated regions. They prefer unmethylated DNA (MethylMinus) and are therefore expected to bind with higher affinity to the respective DMRs in COPD. We agree with the reviewer that further verifying altered TF binding using cut&run or ChIP assays would be very interesting, but it is out of the scope of this manuscript. Such analysis is technically very challenging to perform with low numbers of primary AT2 cells and will be the focus of our follow-up mechanistic studies.

      CHANGE IN THE MANUSCRIPT:

      Additionally, motif analysis of DMRs that were highly correlated (|Spearman correlation coefficient| > 0.5) with DEGs revealed a prominent enrichment of the cognate motif for ETS family transcription factors, such as ELF5, SPIB, ELF1 and ELF2, at hypomethylated DMRs (Fig. S3D). Interestingly, SPIB was shown to facilitate the recruitment of IRF7, activating interferon signaling (71), and our WGBS data uncover SPIB motifs at hypomethylated DMRs, which aligns with its binding preference for unmethylated DNA (MethylMinus, Fig. S3D).

      Figure S3D: Enrichment of methylation-sensitive binding motifs at hypo- (right) and hypermethylated (left) DMRs, using DMRs with a high correlation (|Spearman correlation coefficient| > 0.5) between methylation and gene expression. Methylation-sensitive motifs were derived from Yin et al (64). Transcription factors, whose binding affinity is impaired upon methylation of their DNA binding motif, are shown in red (Methyl Minus), and transcription factors, whose binding affinity upon CpG methylation is increased, are shown in blue (Methyl Plus).

      (Methods) To obtain information about methylation-dependent binding for transcription factor motifs that are enriched at DMRs, the results of a recent SELEX study (64) were integrated into the analysis. That study categorised transcription factors by how CpG methylation of their DNA motif affects binding affinity: those whose affinity was impaired by methylation were categorised as MethylMinus, while those whose affinity increased were categorised as MethylPlus. A motif database of 1,787 binding motifs with associated methylation dependency was constructed. The log odds detection threshold for the HOMER motif search was calculated as follows: bases with a probability > 0.7 received a score of log(base probability/0.25); otherwise, the score was set to 0. The final threshold was the sum of the scores of all bases in the motif. Motif enrichment analysis was carried out against a sampled background of 50,000 random regions with matching GC content using the findMotifsGenome.pl script of the HOMER software suite, omitting CG correction and setting the generated SELEX motifs as the motif database.
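      The threshold rule described above can be written out as a short Python sketch. It is illustrative only and rests on two assumptions that the text does not state: that "log" means the natural logarithm, and that the motif is given as a position probability matrix with one row per position and columns in A, C, G, T order. The function name and the example matrix are hypothetical.

      ```python
      import math

      def log_odds_threshold(ppm, min_prob=0.7, background=0.25):
          """Sum log(p / background) over all bases with probability > min_prob.

          Bases at or below min_prob contribute 0, as described in the methods.
          Natural log is an assumption made for this sketch.
          """
          total = 0.0
          for position in ppm:
              for prob in position:
                  if prob > min_prob:
                      total += math.log(prob / background)
          return total

      # Hypothetical 3-position motif (columns: A, C, G, T).
      example_ppm = [
          [0.97, 0.01, 0.01, 0.01],  # strongly A: contributes log(0.97 / 0.25)
          [0.25, 0.25, 0.25, 0.25],  # uninformative: contributes 0
          [0.10, 0.80, 0.05, 0.05],  # strongly C: contributes log(0.80 / 0.25)
      ]

      threshold = log_odds_threshold(example_ppm)
      ```

      Since 0.7 exceeds 0.5, at most one base per position can pass the cut-off, so the sum effectively runs over the well-defined positions of the motif.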

      __Methods: __ • The authors should include more detail of the TWGBS rather than directing the reader to a previous publication. Also DNA concentration post bisulfite conversion would be a useful metric to provide.

      __Response: __

      Following the suggestion, we have now expanded the details of TWGBS in the methods part of the manuscript. Due to limited space, we did not include a detailed protocol but instead referred to a published step-by-step protocol (55). Of note, we do not measure DNA concentration post-bisulfite conversion but consistently use the starting input of 30 ng of genomic DNA across all samples.

      __CHANGE IN THE MANUSCRIPT:__

      (Methods): 15 pg of unmethylated phage lambda DNA was spiked in as a control for bisulfite conversion. Tagmentation was performed in TAPS buffer using an in-house purified Tn5 assembled with load adapter oligos (55) at 55 °C for 8 min. Tagmentation was followed by purification using AMPure beads, oligo replacement and gap repair as described (55). Bisulfite treatment was performed using the EZ DNA Methylation kit (Zymo) following the manufacturer's protocol.

      *The T-WGBS library preparations were performed for all donors in parallel and sequenced in a single batch to minimize batch effects and technical variability. *

      • Differential DNA methylation analysis: It is stated that DNA regions had to contain 3 CpG sites but was this within a defined DNA size range?

      Response:

      The maximum distance between individual CpGs within a DMR was set to 300 bp. To clarify, we added that information to the methods part.

      __CHANGE IN THE MANUSCRIPT:__

      *"regions with at least 10% methylation difference and containing at least 3 CpGs with a maximum distance of 300 bp between them."*
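      To make the three criteria concrete, a minimal Python sketch of such a filter is shown below. It is not the DMR-calling code used in the study: the function name is hypothetical, the methylation difference is expressed as a fraction (0.10 = 10%), and interpreting the 300 bp limit as the maximum gap between neighbouring CpGs is an assumption.

      ```python
      def passes_dmr_criteria(cpg_positions, methylation_diff,
                              min_cpgs=3, max_gap_bp=300, min_abs_diff=0.10):
          """Check a candidate region against the three stated DMR criteria."""
          positions = sorted(cpg_positions)
          if len(positions) < min_cpgs:
              return False  # fewer than 3 CpGs
          gaps = [b - a for a, b in zip(positions, positions[1:])]
          if any(gap > max_gap_bp for gap in gaps):
              return False  # neighbouring CpGs more than 300 bp apart
          return abs(methylation_diff) >= min_abs_diff  # at least 10% difference

      # Hypothetical candidate regions (genomic coordinates invented).
      kept = passes_dmr_criteria([100, 250, 400], -0.15)
      too_few_cpgs = passes_dmr_criteria([100, 250], -0.30)
      gap_too_wide = passes_dmr_criteria([100, 250, 900], 0.20)
      diff_too_small = passes_dmr_criteria([100, 250, 400], 0.05)
      ```

      Only the first candidate satisfies all three conditions; each of the others fails exactly one of them.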

      • Reference genome only provided for RNAseq not TWGBS?

      __Response: __We used hg19 as the reference genome. The information on the reference genome for DNA methylation analysis was provided in the methods, L574 of the original manuscript: "The reads were aligned to the transformed strands of the hg19 reference genome using BWA MEM".

      • The tables do not appear in the PDF and I struggled to tally to the "Dataset" files provided if that is what they were referring to?

      Response:

      Full tables (uploaded as Datasets in the manuscript central due to their size) were uploaded together with the manuscript files. They are quite large and will not convert to pdf, so they may not have been included in the merged pdf file. We assume that they should be available to the reviewers with the other files and will clarify that with the editorial staff in the resubmission cover letter.

      • For the gene expression analysis, can it be made clearer that a full analysis was done on COPD I samples? It is a little confusing to the reader as this was not done for DNAm, so it might be assumed that the same targeted analysis was done on only genes found to be differentially expressed between control and COPD II-IV, but that cannot be the case as an overlap of COPD I vs COPD II-IV genes is provided. For this overlap, do genes show the same effect direction?

      __Response: __

      To clarify, for the RNA-seq analysis, we performed DEG analysis for no-COPD versus COPD II-IV, as well as no-COPD versus COPD I. We then took all differentially expressed genes (presented in the Venn diagram) and plotted them for all samples as a heatmap. To split the genes into groups displaying similar effect directions, we applied a clustering approach and identified 3 main signatures. Cluster 3 primarily comprises genes unique to COPD I samples, which are associated with the adaptive immune system and hemostasis (Fig. 4E). In the other two clusters, we mainly observe a transitioning pattern from control to severe COPD samples, correlating with the FEV1 values of the patients. This has now been clarified in the manuscript.

      • Replication is difficult on these studies as the samples are so difficult to come by. Also limited by sample size for the same reason. It doesn't mean the study is not worth doing and the data are still valuable. However, it may be pertinent to include technical validation of a few regions of interest, acknowledge the limitation (alongside strengths) in the discussion, and perhaps provide actual p value rather than blanket ...

      Response:

      We thank the reviewer for acknowledging the replication challenges for studies working with sparse human material and hard-to-purify cell populations. Following the reviewer's suggestion, we have now included a strengths and limitations section in the discussion where we summarised the points highlighted by both reviewers.

      Regarding technical validation, we would like to note that the whole genome bisulfite sequencing (WGBS) technology, as well as the tagmentation-based WGBS (T-WGBS), have been validated in the past few years in several publications (e.g., PMID: 24071908) and shown to yield reliable DNA methylation quantification in comparison to other technologies (PMID: 27347756). For us, technical validation using alternative methods (e.g. bisulfite sequencing or pyrosequencing) is difficult as it requires significantly more input DNA than the low-input T-WGBS we have performed and obtaining sufficient amounts of material from primary human AT2 cells (especially from severe COPD) is not possible with the size of tissue we can access. However, while establishing the T-WGBS for this project, we initially validated our approach using Mass Array, a sequencing-independent method. For this, we performed T-WGBS on the commercially available smoker and COPD lung fibroblasts and selected 9 regions with different methylation levels for validation using a Mass Array. We obtained an excellent correlation between both methods, providing technical validation of T-WGBS and our analysis workflow. This validation was published in our earlier manuscript (PMID: 37143403), but we provided the data below for convenience.

      Scatter plots showing correlation of average methylation obtained with T-WGBS and Mass Array from COPD and smoker fibroblasts. Each dot represents one region with varying methylation levels. The blue diagonal represents the linear regression. Shaded areas are confidence intervals of the correlation coefficient at 95%. Correlation coefficients and P values were calculated by the Pearson correlation method.
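
A minimal sketch of how a Pearson correlation between the two methods could be computed, assuming the per-region average methylation values are available as paired lists. The helper `pearson_r` is hypothetical and the numbers are made-up placeholders for 9 regions, not the data from the study; the P value and the confidence band mentioned in the caption are not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Placeholder values (fraction methylated per region), NOT the study's data
twgbs = [0.05, 0.12, 0.25, 0.38, 0.50, 0.61, 0.74, 0.88, 0.95]
mass_array = [0.07, 0.10, 0.28, 0.35, 0.52, 0.65, 0.70, 0.90, 0.93]

r = pearson_r(twgbs, mass_array)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would correspond to the agreement between T-WGBS and Mass Array described above.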

      To enable further validation and follow-up by the community, we included the full list of DMRs, associated p-values and additional information for DNA methylation analysis (DMR width, n.CpGs, MethylDiff, etc) in Table 3 (Table_3_wgbs_dmr_info.xlsx) and the information about DEGs from RNA-seq in Table 6 (Table_6_RNAseq_DEG_info.xlsx).

      • It isn't clear to me if DNA and RNA are from the same cells? The results say "cells matching those used for T-WGBS" but the methods suggest separate extractions so not the same cells? If they are not the same cells a comment on the implications of this should be included in the discussion for example, potentially some differences in cell type composition, storage time etc.

      Response:

      Lung tissue samples were freshly cryopreserved, and H&E slides derived from exemplary pieces of the tissue were analyzed. Once we had a group of at least 3 samples comprising one non-COPD and 2 COPD samples, we processed them in parallel to limit sorting variation between control and disease samples. The sorted cells were counted, aliquoted and pelleted at 4 °C before flash freezing and storing at -80 °C. The storage time of the cell pellets varied between the donors. RNA and DNA were isolated from cell pellets collected from the same FACS sorting experiment; therefore, we do not expect differences in cell type composition. In addition, RNA and DNA isolation were performed for all sorted pellets in parallel. All library preparations for TWGBS and RNA-seq were performed for all donors in parallel and sequenced in a single batch to minimise batch effects and technical variability. This has now been clarified in the methods part of the manuscript.

      CHANGE IN THE MANUSCRIPT:

      To minimize potential technical bias, samples from no COPD and COPD donors were processed in parallel in groups of 3 (one no COPD and 2 COPD samples).

      RNA and genomic DNA for RNA-seq and TWGBS were isolated from identical aliquots of sorted cell pellets.

      Genomic DNA was extracted from 1-2 × 10⁴ sorted alveolar epithelial cells isolated from cryopreserved lung parenchyma from 11 different donors in parallel using QIAamp Micro Kit

      The TWGBS library preparations were performed for all donors in parallel and sequenced in a single batch to minimize batch effects and technical variability.

      RNA was isolated from flash-frozen pellets of 2 × 10⁴ sorted AT2 cells from 11 different donors in parallel.

      The RNA-seq library preparation for all donors was performed in parallel and all samples were sequenced in a single batch to minimize batch effects and technical variability.

      • Line 193 the authors say "Since DMRs were overrepresented at cis-regulatory sites...." - "cis" needs to be defined. If you link DNAm regions to genes via "closest gene", does this not automatically mean your outputs will be cis? Just needs better definition/explanation.

      Response:

      The term "cis-regulatory sites" in our manuscript is intended to denote regulatory elements (such as enhancers, promoters, and other nearby control regions) that reside on the same chromosome as, and close to, the genes they regulate. While it is true that linking a DMR to its closest gene captures a cis association, our phrasing emphasises that the DMRs are enriched specifically at these functional regulatory elements (Fig. 2E) rather than being randomly distributed. This usage aligns with established conventions in the field. To avoid any misunderstandings, we have now changed the term to gene regulatory sites.

      CHANGE IN THE MANUSCRIPT:

      We changed the "cis-regulatory sites" to "gene regulatory sites".

      Minor comments:

      Line 157: "we identified site-specific differences....". Change to region specific?

      Response:

      This has now been corrected as suggested.

      Line 102-103: needs a reference for the statement "Alterations in DNA methylation patterns have been implicated......"

      Response:

      Following the reviewer's suggestion, we added the relevant references (34-36) to this statement.

      Line 266 - what does "strong dysregulation" mean? Large fold change, very significant?

      Response:

      We removed the word "strong" from this sentence.

      Lines 423-425 - statement needs a reference

      Response:

      Following the reviewer's suggestion, we added the relevant reference to this statement.

      Line 428 - word missing between "epigenetic , we"?

      Response:

      This has now been corrected. The text reads: "Through treatment with a demethylating drug and targeted epigenetic editing, we demonstrated the ability to modulate..."

      Prior studies are well referenced, text and figures are clear and accurate.

      Reviewer #2 (Significance (Required)):

      This study has several strengths:

      1) Sample collection and characterisation. AT2 cells are incredibly hard to come by and the authors should be commended for generating the samples. However, proximity to cancer is always a potential issue, especially in epigenetic studies. Is it feasible to include any analysis to show the samples derived from those with cancer don't drive the changes observed? Even a high level PCA or an edit of fig 2A with non-cancer in a different colour in supplemental - looks like there is one outlier, is that a non-cancer? Or a correlation of change in beta between control and cancer/COPD and control and non-cancer:COPD (for want of a better phrase!). Just an indicator that the non-cancer COPD samples are not driving differences.

      Response:

      We thank the reviewer for highlighting the value of generating data from hard-to-work-with AT2 populations and bringing up the important point of cancer proximity, which we considered very carefully when designing our study. To match our samples across the cohort, all the no-COPD, COPD I, and two of the COPD II-IV distal lung samples were obtained from cancer resections. In addition to other characteristics, like age, BMI and smoking status, we also matched the donors by cancer type (all profiled donors had squamous cell carcinoma). We collected lung tissue as far away from the carcinoma as possible and sent representative pieces for histological analysis by an experienced lung pathologist to confirm the absence of visible tumours. In addition, to ensure that our data represents COPD-relevant signatures, we intentionally included samples from three COPD donors undergoing lung resections (without a cancer background) in the profiling.

      Following the reviewer's suggestion, to investigate the potential impact of non-cancer samples on driving the observed differences, we carefully checked the PCAs for both DNA methylation and RNA-seq. We could not identify a clear separation of no-cancer COPD samples from the cancer COPD samples (or other cancer samples) in any examined PCs, indicating no confounding effect of cancer samples. We observed that one sample contributing to PC2 is a non-cancer sample, but this was a rather sample-specific effect, as the other two non-cancer samples clustered together with the other severe COPD samples with a cancer background. Notably, in our DNA methylation data, we do not observe typical features of cancer methylomes, like global loss of DNA methylation or aberrant methylation of CpG islands (e.g., in tumour suppressor genes) (see Fig. 2A), further suggesting that we do not "pick up" confounding cancer signatures in our data.

      Following the comments from both reviewers, to clarify that point, we added the information about cancer and non-cancer samples to the PCA figures for DNA methylation (new Fig. 2B) and RNA-seq (new Fig. 3A) data in the revised manuscript, as shown below.

      CHANGE IN THE MANUSCRIPT:

      COPD samples from donors with a cancer background clustered together with the COPD samples from lung resections, confirming that we detected COPD-relevant signatures (Fig. 2B).

      Fig. 2B. Principal component analysis (PCA) of methylation levels at CpG sites with > 4-fold coverage in all samples. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.

      Unsupervised principal component analysis (PCA) on the top 500 variable genes revealed a clear influence of the COPD phenotype in separating no COPD and COPD II-IV samples, as previously observed with the DNA methylation analysis, irrespective of the cancer background of COPD samples (Fig.3A, Fig. S2B).

      Principal component analysis (PCA) of 500 most variable genes in RNA-seq analysis. PCA 1 and 2 are shown in Fig. 3A, PCA 1 and 4 in Fig. S2B. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.
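
The selection of the most variable genes before a PCA can be sketched as follows. This is only an illustration under stated assumptions: expression values are taken to be already normalised, the helper `top_variable_genes` and the gene labels are hypothetical, and the response does not specify which variance measure or transformation was actually used.

```python
import statistics

def top_variable_genes(expr, n_top=500):
    """Return the n_top gene names with the highest sample variance.

    expr: dict mapping gene name -> list of expression values, one per sample.
    """
    variances = {gene: statistics.variance(values) for gene, values in expr.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_top]

# Toy matrix with placeholder gene labels and values (not study data)
toy_expr = {
    "gene_a": [5.0, 5.1, 4.9, 5.0],   # nearly constant across samples
    "gene_b": [1.0, 6.0, 2.0, 9.0],   # highly variable
    "gene_c": [3.0, 4.0, 3.5, 4.5],   # moderately variable
}
print(top_variable_genes(toy_expr, n_top=2))
```

The PCA itself would then be run on the reduced matrix containing only the selected genes.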

      2) This is the first time DNAm has been profiled in AT2 cells. It is incredibly difficult, valuable and novel data that will increase the field's capability technically, their understanding of functional mechanisms and potential translation considerably. Its audience will be primarily translational respiratory; however, the fundamental science aspect of gene expression regulation by DNA methylation will have wider reach across developmental and disease science.

      Response:

      We thank the reviewer for recognising the uniqueness and novelty of our study and highlighting the value and potential impact of our datasets for the lung field.

      3) the functional analysis using targeted CRISPR-Cas9 is very well done and adds impact.

      Response:

      We thank the reviewer for recognising the strengths and added value of the functional analysis using epigenetic editing.

      Potential weaknesses/areas for development

      I feel the main weakness is in the section integrating DNA methylation and gene expression. The rationale for a focus on various aspects, for example inversely related DNAm/gene expression pairs, the IFN pathway and IRF9, is not clear. Also, further understanding of the differences between DNAm associated genes and non-DNAm associated genes could be expanded, at the pathway level, TF regulation level, effect size level (are DNAm associated changes to gene expression larger, enriched for earlier differential expression)

      Response:

      Our rationale for focusing on the inversely related DNAm/gene expression pairs in promoter-proximal regions is purely data-driven, as they represent the biggest group in our data (Fig. 4A-B). Among those negatively correlated genes, we observed the strongest enrichment for the IFN pathway (Fig. C), making it an obvious, data-driven target for further studies. The negative correlation of expression and methylation for IFN pathway genes could be validated in 5-AZA assays in A549 cells (Fig. 5A). Next, we performed an interaction network analysis showing IRF9 and STAT2 as master regulators (Fig. 5B) of the negatively correlated IFN genes. As IRF9 itself displayed a negative correlation between DNA methylation and expression (Fig. 5C), we used the associated DMR for further epigenetic editing (Fig. 5D-E). We performed the additional requested analyses of the enhancer-associated changes and genes, as described above. We fully agree with the reviewer that our data sets are a great resource and can be further used to elaborate on other relationships of DNA methylation and RNA expression or other pathways, but this is out of the scope of this study. To enable further studies by the research community, we provide all necessary information about DMRs and DEGs in the associated supplementary tables and the raw data through the EGA, as well as the CRISPRa editing assay.

      The authors could comment on potential masking of differences between 5hmC and mC and the implications it may have

      Response:

      We thank the reviewer for bringing up this important point. Indeed, bisulfite sequencing cannot differentiate between methylated and hydroxymethylated cytosines; hence, some of the methylated sites may be hydroxymethylated. However, the overall levels of hydroxymethylation in differentiated adult tissues are very low (except for the brain), orders of magnitude lower compared to DNA methylation. Following the reviewer's suggestion, we have added a sentence in the limitation section of the discussion to clarify that point.

      CHANGE IN THE MANUSCRIPT:

      In addition, while WGBS provides unprecedented resolution and high coverage of the DNA methylation sites across the genome, it does not allow distinguishing 5-methylcytosine from 5-hydroxymethylcytosine. Therefore, we cannot exclude that some methylated sites we detected are 5-hydroxymethylated. However, the 5-hydroxymethylcytosine is present at very low levels in the lung tissue (97).

      Furthermore, while the rationale for looking at DMRs is clear, especially given the sample number, I am interested to understand what proportion of the assayed CpGs "fit" within the cut off stipulations of the DMR analysis - that is, are there potentially COPD effects at sparse CpG regions/individual CpG sites that are not being identified? A comment on this would be useful and seems the strength of profiling genome wide. I'm happy genome wide is beneficial, it just feels a little circular that the authors have chosen whole genome to avoid the bias of the Illumina array and a focus on promoters, but have primarily reported promoter DNAm. This caught my attention again in the discussion where the authors state that cis-regulatory regions were also identified in their fibroblast data .....is this finding a factor of the analysis performed? (also a comparison of regions identified in AT2 cells versus fibroblasts would be really interesting for a future paper)

      Response:

      We decided to focus our analysis on regions rather than individual CpG sites when looking at differential methylation, as DNA methylation is spatially correlated, and methylation changes in larger regions are more likely to have a biological function. Extending the analysis to single CpG sites would require a higher number of samples for a reliable analysis compared to the DMR analysis (as mentioned by the reviewer).

      Of note, we addressed the platform comparison between Illumina array technology and WGBS in our previous fibroblast study (PMID: 37143403), where we compared our WGBS data with the published 450k array data of COPD parenchymal fibroblasts (Clifford et al., 2018). We observed only a marginal overlap between the CpGs from our DMRs and the CpGs probes available on the array (which was due to the differences in technologies used and the limited coverage of the 450K array in comparison to our genome-wide approach, in which we covered 18 million CpGs). Out of the 6279 DMRs identified in our fibroblast study, only 1509 DMRs overlapped with at least one CpG probe on the 450K array, and after removing low-quality CpGs from the array data, only 1419 DMRs were left. This comparison highlighted the increased resolution of the WGBS compared to Illumina arrays.
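
As a worked check of the proportions implied by the counts quoted above, the following sketch computes the share of fibroblast DMRs that overlap the array. The helper `dmr_has_probe` and its coordinates are hypothetical illustrations of an "overlaps at least one probe" test, not the authors' pipeline.

```python
# Counts quoted in the response above (fibroblast study)
total_dmrs = 6279        # DMRs identified by WGBS
dmrs_with_probe = 1509   # DMRs overlapping at least one CpG probe on the 450K array
dmrs_after_qc = 1419     # DMRs left after removing low-quality array CpGs

frac_with_probe = dmrs_with_probe / total_dmrs
frac_after_qc = dmrs_after_qc / total_dmrs
print(f"{frac_with_probe:.1%} of DMRs overlap an array probe")
print(f"{frac_after_qc:.1%} remain after array quality filtering")

def dmr_has_probe(dmr_start, dmr_end, probe_positions):
    """Hypothetical helper: True if any probe position lies inside the DMR (inclusive)."""
    return any(dmr_start <= pos <= dmr_end for pos in probe_positions)

# Illustrative coordinates only, not real DMRs or probes
print(dmr_has_probe(1000, 1500, [200, 1250, 9000]))
```

In other words, fewer than a quarter of the DMRs (about 24%, falling to about 23% after array quality filtering) would have been visible to the array at all.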

      The reasons why we focused on promoter-proximal DMRs are the following: 1) the assignment of the enhancer elements in AT2 to the corresponding gene is still too inaccurate in the absence of AT2-specific enhancer chromatin maps; 2) regulation at enhancers by DNA methylation might be more complex and might change (increase or attenuate) binding affinities of certain transcription factors (Fig. 2H), which might lead to gene expression changes; or 3) methylation changes might be an indirect effect of differential TF binding (PMID: 22170606). However, we agree with the reviewer that despite these limitations, expanding the analysis beyond promoters adds value to the manuscript; hence, as described above, we expanded the analysis of non-promoter regions, including enhancers, in the revised manuscript.

      We thank the reviewer for the suggestion to compare the regions identified in AT2 cells and fibroblasts in a future paper.

      My expertise: Respiratory, cell biology, epigenetics.

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Gerken et al examined how neurons in the human medial temporal lobe respond to and potentially code dynamic movie content. They had 29 patients watch a long-form movie while neurons within their MTL were monitored using depth electrodes. They found that neurons throughout the region were responsive to the content of the movie. In particular, neurons showed significant responses to people, places, and to a lesser extent, movie cuts. Modeling with a neural network suggests that neural activity within the recorded regions was better at predicting the content of the movies as a population, as opposed to individual neural representations. Surprisingly, a subpopulation of unresponsive neurons performed better than the responsive neurons at decoding the movie content, further suggesting that while classically nonresponsive, these neurons nonetheless provided critical information about the content of the visual world. The authors conclude from these results that low-level visual features, such as scene cuts, may be coded at the neuronal level, but that semantic features rely on distributed population-level codes.

      Strengths:

      Overall, the manuscript presents an interesting and reasonable argument for their findings and conclusions. Additionally, the large number of patients and neurons that were recorded and analyzed makes this data set unique and potentially very powerful. On the whole, the manuscript was very well written, and as it is, presents an interesting and useful set of data about the intricacies of how dynamic naturalistic semantic information may be processed within the medial temporal lobe.

      Weaknesses:

      There are a number of concerns I have based on some of the experimental and statistical methods employed that I feel would help to improve our understanding of the current data.

      In particular, the authors do not address the issue of superposed visual features very well throughout the manuscript. Previous research using naturalistic movies has shown that low-level visual features, particularly motion, are capable of driving much of the visual system (e.g, Bartels et al 2005; Bartels et al 2007; Huth et al 2012; Çukur et al 2013; Russ et al 2015; Nentwich et al 2023). In some of these papers, low-level features were regressed out to look at the influence of semantics, in others, the influence of low-level features was explicitly modeled. The current manuscript, for the most part, appears to ignore these features with the exception of scene cuts. Based on the previous evidence that low-level features continue to drive later cortical regions, it seems like including these as regressors of no interest or, more ideally, as additional variables, would help to determine how well MTL codes for semantic features over top of these lower-order variables.

      Following on this, much of the current analyses rely on the training of deep neural networks to decode particular features. The results of these analyses are illuminating, however, throughout the manuscript, I was increasingly wondering how the various variables interact with each other. For example, separate analyses were done for the patients, regions, and visual features. However, the logistic regression analysis that was employed could have all of these variables input together, obtaining beta weights for each one in an overall model. This would potentially provide information about how much each variable contributes to the overall decoding in relation to the others.

      A few more minor points that would help to clarify the current results involve the selection of data for particular analyses. For some analyses, the authors chose to appropriately downsample their data sets to compare across variables. However, there are a few places where similar downsampling would be informative, but was not completed. In particular, the analyses for patients and regions may have a more informative comparison if the full population were downsampled to match the size of the population for each patient or region of interest. This could be done with the Monte Carlo sampling that is used in other analyses, thus providing a control for population size while still sampling the full population.

    2. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Gerken et al examined how neurons in the human medial temporal lobe respond to and potentially code dynamic movie content. They had 29 patients watch a long-form movie while neurons within their MTL were monitored using depth electrodes. They found that neurons throughout the region were responsive to the content of the movie. In particular, neurons showed significant responses to people, places, and to a lesser extent, movie cuts. Modeling with a neural network suggests that neural activity within the recorded regions was better at predicting the content of the movies as a population, as opposed to individual neural representations. Surprisingly, a subpopulation of unresponsive neurons performed better than the responsive neurons at decoding the movie content, further suggesting that while classically nonresponsive, these neurons nonetheless provided critical information about the content of the visual world. The authors conclude from these results that low-level visual features, such as scene cuts, may be coded at the neuronal level, but that semantic features rely on distributed population-level codes.

      Strengths:

      Overall, the manuscript presents an interesting and reasonable argument for their findings and conclusions. Additionally, the large number of patients and neurons that were recorded and analyzed makes this data set unique and potentially very powerful. On the whole, the manuscript was very well written, and as it is, presents an interesting and useful set of data about the intricacies of how dynamic naturalistic semantic information may be processed within the medial temporal lobe.

      We thank the reviewer for their comments on our manuscript and for describing the strengths of our presented work.

      Weaknesses:

      There are a number of concerns I have based on some of the experimental and statistical methods employed that I feel would help to improve our understanding of the current data.

      In particular, the authors do not address the issue of superposed visual features very well throughout the manuscript. Previous research using naturalistic movies has shown that low-level visual features, particularly motion, are capable of driving much of the visual system (e.g, Bartels et al 2005; Bartels et al 2007; Huth et al 2012; Çukur et al 2013; Russ et al 2015; Nentwich et al 2023). In some of these papers, low-level features were regressed out to look at the influence of semantics, in others, the influence of low-level features was explicitly modeled. The current manuscript, for the most part, appears to ignore these features with the exception of scene cuts. Based on the previous evidence that low-level features continue to drive later cortical regions, it seems like including these as regressors of no interest or, more ideally, as additional variables, would help to determine how well MTL codes for semantic features over top of these lower-order variables.

      We thank the reviewer for this insightful comment and for the relevant literature regarding visual motion in not only the primary visual system but in cortical areas as well. While we agree that the inclusion of visual motion as a regressor of no interest or as an additional variable would be overall informative in determining if single neurons in the MTL are driven by this level of feature, we would argue that our analyses already provide some insight into its role and that only the parahippocampal cortical neurons would robustly track this feature.

      As noted by the reviewer, our model includes two features derived from visual motion: Camera Cuts (directly derived from frame-wise changes in pixel values) and Scene Cuts (a subset of Camera Cuts restricted to changes in scene). As shown in Fig. 5a, decoding performance for these features was strongest in the parahippocampal cortex (~20%), compared to other MTL areas (~10%). While the entorhinal cortex also showed some performance for Scene Cuts (15%), we interpret this as being driven by the changes in location that define a scene, rather than by motion itself.

      These findings suggest that while motion features are tracked in the MTL, the effect may be most robust in the parahippocampal cortex. We believe that quantifying more complex 3D motion in a naturalistic stimulus like a full-length movie is a significant challenge that would likely require a dedicated study. We agree this is an interesting future research direction and will update the manuscript to highlight this for the reader.

      A few more minor points that would help to clarify the current results involve the selection of data for particular analyses. For some analyses, the authors chose to appropriately downsample their data sets to compare across variables. However, there are a few places where similar downsampling would be informative, but was not completed. In particular, the analyses for patients and regions may have a more informative comparison if the full population were downsampled to match the size of the population for each patient or region of interest. This could be done with the Monte Carlo sampling that is used in other analyses, thus providing a control for population size while still sampling the full population.

      We thank the reviewer for raising this important methodological point. The decision not to downsample the patient- and region-specific analyses was deliberate, and we appreciate the opportunity to clarify our rationale.

      Generally, we would like to emphasize that due to technical and ethical limitations of human single-neuron recordings, it is currently not possible to record large populations of neurons simultaneously in individual patients. The limited and variable number of recorded neurons per subject (Fig. S1) generally requires pooling neurons into pseudo-populations for decoding, which is a well-established standard in human single-neuron studies (see, e.g., Jamali et al., 2021; Kamiński et al., 2017; Minxha et al., 2020; Rutishauser et al., 2015; Zheng et al., 2022).

      For the patient-specific analysis, our primary goal was to show that no single patient's data could match the performance of the complete pseudo-population. Crucially, we found no direct relationship between the number of recorded neurons and decoding performance; patients with the most neurons (patients 4, 13) were not top performers, and those with the fewest (patients 11, 14) were not the worst (see Fig. 4). This indicates that neuron count was not the primary limiting factor and that downsampling would be unlikely to provide additional insight.

      Similarly, for the region-specific analysis, regions with larger neural populations did not systematically outperform those with fewer neurons (Fig. 5). Given the inherent sparseness of single-neuron data, we concluded that retaining the full dataset was more informative than excluding neurons simply to equalize population sizes.

      We agree that this methodological choice should be transparent and explicitly justified in the text. We will add an explanation to the revised manuscript to justify why this approach was taken and how it differs from the analysis in Fig. 6.
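
For reference, the population-size control suggested by the reviewer could be sketched as below. This is a hedged illustration only: `downsampled_scores` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever decoder training and evaluation would be applied to a subset of neurons; it is not the analysis used in the manuscript.

```python
import random
import statistics

def downsampled_scores(population, target_size, score_fn, n_draws=100, seed=0):
    """Monte Carlo control for population size.

    Repeatedly draws `target_size` units without replacement from the full
    pseudo-population, scores each draw with `score_fn`, and returns the
    mean and standard deviation of the scores across draws.
    """
    if target_size > len(population):
        raise ValueError("target_size exceeds population size")
    if n_draws < 2:
        raise ValueError("need at least 2 draws to estimate spread")
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    scores = [score_fn(rng.sample(population, target_size)) for _ in range(n_draws)]
    return statistics.mean(scores), statistics.stdev(scores)

# Toy usage: 200 placeholder neuron IDs and a dummy score
# (fraction of even IDs in the subset) in place of a real decoder.
neuron_ids = list(range(200))
dummy_score = lambda subset: sum(1 for n in subset if n % 2 == 0) / len(subset)
mean_score, sd_score = downsampled_scores(neuron_ids, target_size=20, score_fn=dummy_score)
print(f"mean={mean_score:.3f}, sd={sd_score:.3f}")
```

Setting `target_size` to the neuron count of a given patient or region would give a size-matched reference distribution from the full population against which that patient's or region's own decoding performance could be compared.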

      Reviewer #2 (Public review):

      Summary:

      This study introduces an exciting dataset of single-unit responses in humans during a naturalistic and dynamic movie stimulus, with recordings from multiple regions within the medial temporal lobe. The authors use both a traditional firing-rate analysis as well as a sophisticated decoding analysis to connect these neural responses to the visual content of the movie, such as which character is currently on screen.

      Strengths:

      The results reveal some surprising similarities and differences between these two kinds of analyses. For visual transitions (such as camera angle cuts), the neurons identified in the traditional response analysis (looking for changes in firing rate of an individual neuron at a transition) were the most useful for doing population-level decoding of these cuts. Interestingly, this wasn't true for character decoding; excluding these "responsive" neurons largely did not impact population-level decoding, suggesting that the population representation is distributed and not well-captured by individual-neuron analyses.

      The methods and results are well-described both in the text and in the figures. This work could be an excellent starting point for further research on this topic to understand the complex representational dynamics of single neurons during naturalistic perception.

      We thank the reviewer for their feedback and for summarizing the results of our work.

      (1) I am unsure what the central scientific questions of this work are, and how the findings should impact our understanding of neural representations. Among the questions listed in the introduction is "Which brain regions are informative for specific stimulus categories?". This is a broad research area that has been addressed in many neuroimaging studies for decades, and it's not clear that the results tell us new information about region selectivity. "Is the relevant information distributed across the neuronal population?" is also a question with a long history of work in neuroscience about localist vs distributed representations, so I did not understand what specific claim was being made and tested here. Responses in individual neurons were found for all features across many regions (e.g., Table S1), but decodable information was also spread across the population.

      We thank the reviewer for this important point, which gets to the core of our study's contribution. While concepts like regional specificity are well-established from studies on the blood-flow level, their investigation at the single-neuron level in humans during naturalistic, dynamic stimulation remains a critical open question. The type of coding (sparse vs. distributed) on the other hand cannot be investigated with blood-flow studies as the technology lacks the spatial and temporal resolution.

      Our study addresses this gap directly. The exceptional temporal resolution of single-neuron recordings allows us to move beyond traditional paradigms and examine cellular-level dynamics as neuronal responses unfold, frame by frame, to a more naturalistic and ecologically valid stimulus. It cannot be assumed that findings from other modalities or simplified stimuli will generalize to this context.

      To meet this challenge, we employed a dual analytical strategy: combining a classic single-unit approach with a machine learning-based population analysis. This allowed us to create a bridge between prior work and our more naturalistic data. A key result is that our findings are often consistent with the existing literature, which validates the generalizability of those principles. However, the differences we observe between these two analytical approaches are equally informative, providing new insights into how the brain processes continuous, real-world information.

      We will revise the introduction and discussion to more explicitly frame our work in this context, emphasizing the specific scientific question driving this study, while also highlighting the strengths of our experimental design and recording methods.

      (2) The character and indoor/outdoor labels seem fundamentally different from the scene/camera cut labels, and I was confused by the way that the cuts were put into the decoding framework. The decoding analyses took a 1600ms window around a frame of the video (despite labeling these as frame "onsets" like the feature onsets in the responsive-neuron analysis, I believe this is for any frame regardless of whether it is the onset of a feature), with the goal of predicting a binary label for that frame. Although this makes sense for the character and indoor/outdoor labels, which are a property of a specific frame, it is confusing for the cut labels since these are inherently about a change across frames. The way the authors handle this is by labeling frames as cuts if they are in the 520ms following a cut (there is no justification given for this specific value). Since the input to a decoder is 1600ms, this seems like a challenging decoding setup; the model must respond that an input is a "cut" if there is a cut-specific pattern present approximately in the middle of the window, but not if the pattern appears near the sides of the window. A more straightforward approach would be, for example, to try to discriminate between windows just after a cut versus windows during other parts of the video. It is also unclear how neurons "responsive" to cuts were defined, since the authors state that this was determined by looking for times when a feature was absent for 1000ms to continuously present for 1000ms, which would never happen for cuts (unless this definition was different for cuts?).

      We thank the reviewer for the valuable comment on the cut labels specifically. The choice to label frames within a 520ms window following a cut as positive was based on prior research and is intended to include the response onsets across all regions within the MTL (Mormann et al., 2008). We agree that this explanation is currently missing from the manuscript, and we will add a brief clarification in the revised version.

      As correctly noted, the decoding analysis does not rely on feature onset but instead continuously decodes features throughout the entire movie. Thus, all frames are included, regardless of whether they correspond to a feature onset.

      Our treatment of cut labels as sustained events is a deliberate methodological choice. Neural responses to events like cuts often unfold over time, and by extending the label, we provide our LSTM network with the necessary temporal window to learn this evolving signature. This approach not only leverages the sequential processing strengths of the LSTM (Hochreiter et al., 1997) but also ensures a consistent analytical framework for both event-based (cuts) and state-based (character or location) features.
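      The labeling scheme described in this response can be sketched as follows. This is only an illustrative sketch, not the authors' code: the frame duration of 40 ms, the inclusion of the cut frame itself in the positive window, and the names `label_cut_frames` and `window_bounds` are all assumptions made for the example.

```python
def label_cut_frames(n_frames, cut_frames, frame_ms=40.0, post_cut_ms=520.0):
    """Return one 0/1 label per frame.

    A frame is labeled 1 if it falls within post_cut_ms after a cut.
    frame_ms is an assumed frame duration, not a value from the manuscript.
    """
    n_post = int(round(post_cut_ms / frame_ms))  # frames covered by the post-cut window
    labels = [0] * n_frames
    for cut in cut_frames:
        # Assumption: the cut frame itself is the first positive frame.
        for f in range(cut, min(cut + n_post, n_frames)):
            labels[f] = 1
    return labels


def window_bounds(frame, frame_ms=40.0, half_window_ms=800.0):
    """Time span (ms) of the 1600 ms decoder input centred on a frame."""
    centre = frame * frame_ms
    return centre - half_window_ms, centre + half_window_ms


# Hypothetical movie of 100 frames with cuts at frames 10 and 60.
labels = label_cut_frames(n_frames=100, cut_frames=[10, 60])
print(sum(labels), "positive frames;", "input window for frame 20:", window_bounds(20))
```

      With these assumed values, the input window for a frame is about three times longer than the positive-label window, which is the mismatch the reviewer raises: a single decoder input can span frames with opposite labels.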

      (3) The architecture of the decoding model is interesting but needs more explanation. The data is preprocessed with "a linear layer of same size as the input" (is this a layer added to the LSTM that is also trained for classification, or a separate step?), and the number of linear layers after the LSTM is "adapted" for each label type (how many were used for each label?). The LSTM also gets to see data from 800 ms before and after the labeled frame, but usually LSTMs have internal parameters that are the same for all timesteps; can the model know when the "critical" central frame is being input versus the context, i.e., are the inputs temporally tagged in some way? This may not be a big issue for the character or location labels, which appear to be contiguous over long durations and therefore the same label would usually be present for all 1600ms, but this seems like a major issue for the cut labels since the window will include a mix of frames with opposite labels.

      We thank the reviewer for their insightful comments regarding the decoding architecture. The model consists of an LSTM followed by 1–3 linear readout layers, where the exact number of layers is treated as a hyperparameter and selected based on validation performance for each label type. The initial linear layer applied to the input is part of the trainable model and serves as a projection layer to transform the binned neural activity into a suitable feature space before feeding it into the LSTM. The model is trained in an end-to-end fashion on the classification task.

      Regarding temporal context, the model receives a 1600 ms window (800 ms before and after the labeled frame), and as correctly pointed out by the reviewer, LSTM parameters are shared across time steps. We do not explicitly tag the temporal position of the central frame within the sequence. While this may have limited impact for labels that persist over time (e.g., characters or locations), we agree this could pose a challenge for cut labels, which are more temporally localized.

      This is an important point. We will clarify this limitation in the revised manuscript and consider incorporating positional encoding in future work to better guide the model’s focus within the temporal window. Hyperparameters were optimized individually for each feature and split; to improve methodological transparency, we will also add a supplementary table detailing the hyperparameter ranges used in our decoding networks and how the final values were chosen.

      (4) Because this is a naturalistic stimulus, some labels are very imbalanced ("Persons" appears in almost every frame), and the labels are correlated. The authors attempt to address the imbalance issue by oversampling the minority class during training, though it's not clear this is the right approach since the test data does not appear to be oversampled; for example, training the Persons decoder to label 50% of training frames as having people seems like it could lead to poor performance on a test set with nearly 100% Persons frames, versus a model trained to be biased toward the most common class. [...]

      We thank the reviewer for this critical and thoughtful comment. We agree that the imbalanced and correlated nature of labels in naturalistic stimuli is a key challenge.

      To address this, we follow a standard machine learning practice: oversampling is applied exclusively to the training data. This technique helps the model learn from underrepresented classes by creating more balanced training batches, thus preventing it from simply defaulting to the majority class. Crucially, the test set remains unaltered to ensure our evaluation reflects the model's true generalization performance on the natural data distribution.

      For the “Persons” feature, which appears in nearly all frames, defining a meaningful negative class is particularly challenging. The decoder must learn to identify subtle variations within a highly skewed distribution. Oversampling during training helps provide a more balanced learning signal, while keeping the test distribution intact ensures proper evaluation of generalization.

      The reviewer’s comment—that we are “training the Persons decoder to label 50% of training frames as having people”—may suggest that labels were modified. We want to emphasize this is not the case. Our oversampling strategy does not alter the labels; it simply increases the exposure of the rare, underrepresented class during training to ensure the model can learn its pattern despite its low frequency.

      We will revise the Methods section to describe this standard procedure more explicitly, clarifying that oversampling is a training-only strategy to mitigate class imbalance.
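      A minimal sketch of training-only oversampling, as described in this response, is given below. It is not the authors' implementation: the function name `oversample_minority`, the binary 0/1 labels, and the toy data are assumptions. The point it illustrates is that minority-class samples are duplicated with their original labels, and that the test split is never passed through this step.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Balance a binary training set by resampling the minority class.

    Labels are never modified: minority samples are drawn again (with
    replacement) until both classes have the same number of entries.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    if not by_class[minority]:
        raise ValueError("minority class has no samples to resample")
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    balanced = [(x, majority) for x in by_class[majority]]
    balanced += [(x, minority) for x in by_class[minority] + extra]
    rng.shuffle(balanced)
    return balanced


# Toy training split: 95 positive frames, 5 negative frames (hypothetical numbers).
train_x = list(range(100))
train_y = [0 if i < 5 else 1 for i in range(100)]
balanced_train = oversample_minority(train_x, train_y)

# The test split is left untouched so evaluation uses the natural distribution.
test_x = list(range(100, 120))
test_y = [1] * 19 + [0]
```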

      (5) Are "responsive" neurons defined as only those showing firing increases at a feature onset, or would decreased activity also count as responsive? If only positive changes are labeled responsive, this would help explain how non-responsive neurons could be useful in a decoding analysis.

      We define responsive neurons as those showing increased firing rates at feature onset; we did not test for decreases in activity. We thank the reviewer for this valuable comment and will address this point in the revised manuscript by assessing responsiveness without a restriction on the direction of the firing rate change.

      (6) Line 516 states that the scene cuts here are analogous to the hard boundaries in Zheng et al. (2022), but the hard boundaries are transitions between completely unrelated movies rather than scenes within the same movie. Previous work has found that within-movie and across-movie transitions may rely on different mechanisms, e.g., see Lee & Chen, 2022 (10.7554/eLife.73693).

      We thank the reviewer for pointing out this distinction and for including the relevant work from Lee & Chen (2022), which further contextualizes it. Indeed, the hard boundaries defined in the cited paper differ slightly from ours. The study distinguishes between (1) hard boundaries—transitions between unrelated movies—and (2) soft boundaries—transitions between related events within the same movie. While our camera cuts resemble their soft boundaries, our scene cuts do not fully align with either category. We defined scene cuts to be more similar to the study’s hard boundaries, but we recognize this correspondence is not exact. We will clarify the distinctions between our scene cuts and the hard boundaries described in Zheng et al. (2022) in the revised manuscript, and will update our text to include the finding from Lee & Chen (2022).

      Reviewer #3 (Public review):

      This is an excellent, very interesting paper. There is a groundbreaking analysis of the data, going from typical picture presentation paradigms to more realistic conditions. I would like to ask the authors to consider a few points in the comments below.

      (1) From Figure 2, I understand that there are 7 neurons responding to the character Summer, but then in line 157, we learn that there are 46. Are the other 39 from other areas (not parahippocampal)? If this is the case, it would be important to see examples of these responses, as one of the main claims is that it is possible to decode as good or better with non-responsive compared to single responsive neurons, which is, in principle, surprising.

      We thank the reviewer for pointing out this ambiguity in the text. Yes, the other 39 units are responsive neurons from other areas. We will clarify to which neuronal sets the number of responsive neurons corresponds. We will also include response plots depicting the unit activity for the mentioned units.

      (2) Also in Figure 2, there seem to be relatively very few neurons responding to Summer (1.88%) and to outdoor scenes (1.07%). Is this significant? Isn't it also a bit surprising, particularly for outdoor scenes, considering a previous paper of Mormann showing many outdoor scene responses in this area? It would be nice if the authors could comment on this.

      We thank the reviewer for this insightful point. While a low response to the general 'outdoor scene' label seems surprising at first, our findings align with the established role of the parahippocampal cortex (PHC) in processing scenes and spatial layouts. In previous work using static images, each image introduces a new spatial context. In our movie stimulus, new spatial contexts specifically emerge at scene cuts. Accordingly, our data show a strong PHC response precisely at these moments. We will revise the discussion to emphasize this interpretation, highlighting the consistency with prior work.

      Regarding the first comment, we did not originally test if the proportion of the units is significant using e.g. a binomial test. We will include the results of a binomial test for each region and feature pair in the revised manuscript.
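      The kind of binomial test mentioned here can be sketched as a one-sided exact test of whether the number of responsive units exceeds what a given false-positive rate would produce. This is only a sketch: the chance level of 0.05, the unit counts, and the name `binomial_tail` are assumptions for illustration and are not taken from the manuscript.

```python
from math import exp, lgamma, log


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    The probability mass is computed in log space so that large n
    does not overflow.
    """
    def log_pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_choose + i * log(p) + (n - i) * log(1 - p)

    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


# Hypothetical example: 20 of 200 recorded units pass the response criterion,
# tested against an assumed chance level of 5% false positives.
p_value = binomial_tail(k=20, n=200, p=0.05)
print(f"one-sided p-value: {p_value:.4g}")
```

      Because one test would be run per region and feature pair, a correction for multiple comparisons would normally be applied to the resulting p-values.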

      (3) I was also surprised to see that there are many fewer responses to scene cuts (6.7%) compared to camera cuts (51%) because every scene cut involves a camera cut. Could this have been a result of the much larger number of camera cuts? (A way to test this would be to subsample the camera cuts.)

      The decrease in responsive units for scene cuts relative to camera cuts could indeed be due to the overall decrease in “trials” from one label to the other. To test this, we will follow the reviewer’s suggestion and perform tests using sets of randomly subsampled camera cuts and will include the results in the revised manuscript.
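      The subsampling control suggested by the reviewer could look roughly like the sketch below: repeatedly draw random subsets of camera cuts, each matched in size to the number of scene cuts, and rerun the response detection on every subset. The cut counts, the number of repeats, and the name `subsample_cuts` are hypothetical; the response-detection step itself is not shown.

```python
import random


def subsample_cuts(camera_cuts, n_target, n_repeats=100, seed=0):
    """Draw n_repeats random subsets of camera cuts, each of size n_target.

    Sampling is without replacement within a subset, so every subset
    contains n_target distinct cuts.
    """
    if n_target > len(camera_cuts):
        raise ValueError("cannot draw more cuts than are available")
    rng = random.Random(seed)
    return [sorted(rng.sample(camera_cuts, n_target)) for _ in range(n_repeats)]


# Hypothetical numbers: 500 camera-cut times (in frames) and 60 scene cuts.
camera_cuts = list(range(0, 5000, 10))
subsets = subsample_cuts(camera_cuts, n_target=60)

# Each subset would then go through the same response-detection procedure
# used for scene cuts, giving a distribution of responsive-unit counts.
print(len(subsets), "subsets of", len(subsets[0]), "cuts each")
```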

      (4) Line 201. The analysis of decoding on a per-patient basis is important, but it should be done on a per-session basis - i.e., considering only simultaneously recorded neurons, without any pooling. This is because pooling can overestimate decoding performances (see e.g. Quian Quiroga and Panzeri NRN 2009). If there was only one session per patient, then this should be called 'per-session' rather than 'per-patient' to make it clear that there was no pooling.

      The per-patient decoding was indeed also a per-session decoding, as each patient contributed only a single session to the dataset. We will make note of this explicitly in the text to resolve the ambiguity.

      (6) Lines 406-407. The claim that stimulus-selective responses to characters did not account for the decoding of the same character is very surprising. If I understood it correctly, the response criterion the authors used gives 'responsiveness' but not 'selectivity'. So, were people's responses selective (e.g., firing only to Summer) or non-selective (firing to a few characters)? This could explain why they didn't get good decoding results with responsive neurons. Again, it would be nice to see confusion matrices with the decoding of the characters. Another reason for this is that what are labelled as responsive neurons have relatively weak and variable responses.

      We thank the reviewer for pointing out the importance of selectivity in addition to responsiveness. Indeed, our response criterion does not take stimulus selectivity into account and exclusively measures increases in firing activity after feature onsets for a given feature irrespective of other features.

      We will adjust the text to reflect this shortcoming of the response-detection approach used here. To clarify the relationship between neural populations, we will add visualizations of the overlap of responsive neurons across labels for each subregion. These figures will be included in the revised manuscript.

      In our approach, we trained a separate network for each feature to mitigate the issue of correlated feature labels within the dataset (see earlier discussion). While this strategy deals effectively with the correlated features, it precludes the generation of standard confusion matrices, because classification was performed independently for each feature.

      To directly assess the feature selectivity of responsive neurons, we will fit generalized linear models to predict their firing rates from the features. This approach will enable us to quantify their selectivity and compare it to that of the broader neuronal population.

      (7) Line 455. The claim that 500 neurons drive decoding performance is very subjective. 500 neurons gives a performance of 0.38, and 50 neurons gives 0.33.

      We agree with the reviewer that the phrasing is unclear. We will adjust our summary of this analysis as given in Line 455 to reflect that the logistic regression-derived neuronal rankings produce a subset that achieves comparable performance.

      (8) Lines 492-494. I disagree with the claim that "character decoding does not rely on individual cells, as removing neurons that responded strongly to character onset had little impact on performance". I have not seen strong responses to characters in the paper. In particular, the response to Summer in Figure 2 looks very variable and relatively weak. If there are stronger responses to characters, please show them to make a convincing argument. It is fine to argue that you can get information from the population, but in my view, there are no good single-cell responses (perhaps because the actors and the movie were unknown to the subjects) to make this claim. Also, an older paper (Quian Quiroga et al J. Neurophysiol. 2007) showed that the decoding of individual stimuli in a picture presentation paradigm was determined by the responsive neurons and that the non-responsive neurons did not add any information. The results here could be different due to the use of movies instead of picture presentations, but most likely due to the fact that, in the picture presentation paradigm, the pictures were of famous people for which there were strong single neuron responses, unlike with the relatively unknown persons in this paper.

      This is an important point and we thank the reviewer for highlighting a previous paradigm in which responsive neurons did drive decoding performance. Indeed, the fact that the movie, its characters and the corresponding actors were novel to patients could explain the disparity in decoding performance by way of weaker and more variable responses. We will include additional examples in the supplement of responses to features. Additionally, we will modify the text to emphasize the point that reliable decoding is possible even in the absence of a robust set of neuronal responses. It could indeed be the case that a decoder would place more weight on responsive units if they were present (as shown in the mentioned paper and in our decoding from visual transitions in the parahippocampal cortex).

    1. Summary of the Discussion on SwiftUI: Understanding Identity, Lifetime, and Dependencies

      1. Introduction to SwiftUI and Its Declarative Nature

      2. SwiftUI operates as a declarative UI framework where you describe UI states, and the framework manages their actualization. "That means that you describe what you want for your app at a high level, and SwiftUI decides exactly how to make it happen."

      3. Understanding Identity in SwiftUI

      4. SwiftUI views have identity to distinguish elements as the same or different across updates, critical for UI transitions and state management. "Identity is how SwiftUI recognizes elements as the same or distinct across multiple updates of your app."

      5. Concept of View Identity Using Practical Examples

      6. Demonstrated using the "Good Dog, Bad Dog" app example, explaining how identity influences view transitions and behavior. "That distinction actually matters a great deal because it changes how our interface transitions from one state to another."

      7. Explicit vs. Structural Identity

      8. Discussed two types of identity:

        • Explicit identity is assigned using identifiers like tags. "Explicit identity is powerful and flexible, but does require that someone, somewhere keeps track of all those names."
        • Structural identity is derived from the view's type and position in the hierarchy. "SwiftUI uses the structure of your view hierarchy to generate implicit identities for your views so you don't have to."
      9. Role of Lifetime in SwiftUI

      10. Explained how SwiftUI manages the life cycle of views and data by associating views' identity over time. "Lifetime is how SwiftUI tracks the existence of views and data over time."

      11. Impact of Dependencies on UI Updates

      12. Dependencies are inputs like state variables or environmental settings that trigger UI updates when they change. "Dependencies are how SwiftUI understands when your interface needs to be updated and why."

      13. How SwiftUI Manages State and Identity

      14. Discussed how State and StateObject help preserve state across the lifetime of views tied to their identity. "State and StateObject are the persistent storage associated with your view's identity."

      15. Advanced Use of Identity with SwiftUI's ForEach

      16. ForEach leverages identifiers for efficient updates and animations, showing how identity can impact performance and correctness. "Choosing a good identifier is your opportunity to control the lifetime of your views and data."

      17. Best Practices for Using Identity

      18. Emphasized the importance of stable and unique identifiers to improve performance and prevent state loss. "An identifier that isn't stable can result in a shorter view lifetime."

      19. Troubleshooting and Optimization Techniques

        • Discussed common pitfalls with AnyView and alternatives using view builders to optimize SwiftUI’s understanding and performance. "Having too many AnyViews will often make code harder to read and understand."

      Key Takeaways

      • Identity, lifetime, and dependencies are core concepts that determine how SwiftUI manages and updates the UI.
      • Effective management of these properties can significantly enhance the performance and predictability of SwiftUI applications.
      • Developers are encouraged to use stable and unique identifiers and understand the implications of explicit and structural identities on their code.

      This summary focuses on the critical aspects discussed in the tech talk, ensuring a comprehensive understanding of the primary themes and practical implications for SwiftUI developers.

    1. Eric Normand
      • Introduction

      • Eric Normand introduces himself and the purpose of the talk: "The title of this talk is building composable abstractions...to develop a process to do that and also I'd like to start a discussion about how we can do that better."

      • Importance of Abstractions

      • Abstractions are critical for creating complex applications from small, simple problems. "A lot of people are able to solve small problems like Fibonacci...when they finally want to create an app...they don't know how to take the tools that they've learned and turn them into software."

      • Map of the Talk

      • The talk covers the importance of abstractions, the process of developing them, an example, and concluding thoughts. "Here's sort of the map of the talk: why focus on abstractions, the process, an example abstraction, and concluding thoughts."

      • Why Focus on Abstractions?

      • Refactoring introduces the distinction between the behavior of the code and its implementation. "In the general industry we now have this idea that there's a difference between the behavior of the code and the actual implementation."

      • Example of Newtonian mechanics replacing Aristotelian physics illustrates that some systems can't be refactored but need to be redesigned from scratch. "You can't refactor Aristotle into Newton."

      • Objectives of the Abstraction Process

      • The process should produce good, Newtonian-style abstractions, be iterative, accessible to all, and foster collaboration. "It has to consistently produce good abstractions...an iterative process...anyone can do it...fosters collaboration."

      • Example of Vector Graphics System

      • Normand uses a simple vector graphics system as an example to demonstrate the process of building abstractions. "This is the example we're going to develop: a vector graphics system."

      • Step 1: Physical Metaphor

      • Choose a metaphor to capture important information. "The idea behind this is to choose a metaphor that will capture the important information in your program."

      • Shapes and construction paper is chosen as the metaphor. "Shapes and construction paper...I cut out shapes like rectangles and ellipses...and then I can move them around."

      • Step 2: Meaning Construction

      • Convert physical intuition into precise mathematical language, focusing on the interface. "We're going to be focusing on the interface right now...precise mathematical language."

      • Definitions in Clojure for different components like color, shape, and transformations. "We're defining two types here: cutout and shape...defining a function that takes a cutout and returns a shape."
      • Importance of preserving shape and color, overlay order, and rotation and translation independence. "Preservation of shape...preservation of color...overlay order...rotation and translation independence."

      • Step 3: Implementation

      • Implement the system based on the constructed meaning, ensuring it can be refactored to different requirements like SVG output. "Implementation...we already know what to do...refactor from Quil to SVG."

      • Summary of Process

      • Use a physical metaphor, define the parts and their relationships in mathematical language, and refactor for implementation details. "Use a physical metaphor...define the parts and their relationships...refactor to get all the meta properties."

      • Corollaries for the Process

      • Know your domain, constructs, and refactoring techniques. "Know your domain...know your constructs...know your refactoring."

      • Conclusion

      • Encourages further learning and provides resources. "Please go to my site...download the slides...sign up for my newsletter."

    1. Configure the Test Application in Python
      • Copy and save the code into your project folder using a text editor. Be sure to save the filename with a .py extension.
      • Modify the code to use your Google Sheets ID.
      • Execute the program: python3 <your-file>.py
    1. Jacques Rancière: Intellectual Emancipation Today - Synthesis and Analysis

      This synthesis presents the main themes and essential ideas developed by Jacques Rancière in his talk on "intellectual emancipation today" ("L'émancipation intellectuelle aujourd'hui"), drawing on direct quotations to illuminate his remarks.

      Introduction to the Lecture: The Trainer's Schizophrenia and the Link to Social Work

      Élisabeth Lefort's introduction to the lecture immediately builds a bridge between academic theory and the concrete practice of social work, notably by evoking the difficulty of reconciling her university training with her duties as a training manager. She uses two "clinical vignettes" to illustrate the problems of intellectual inequality and pedagogical domination.

      Clinical Vignette 1: Madame P and the Mother Tongue (1990)

      Madame P, a Vietnamese immigrant, is summoned to the school of her 3-year-old daughter, who does not speak in class. The headteacher insists that French must be spoken at home so that the child does not fall behind at school, even though Madame P expresses her feelings in Vietnamese. This vignette highlights the institutional pressure toward linguistic uniformity and the negation of a situated identity and situated knowledge.

      Clinical Vignette 2: Madame T and Multilingualism (2010)

      Madame T, a Turkish postdoctoral researcher fluent in five languages, faces a similar situation concerning her 3-year-old son in Brussels. The school's headteacher is worried about his isolation and about the fact that he speaks only Turkish at home. Madame T, however, refutes the headteacher's arguments by citing scientific studies that demonstrate the benefits of multilingualism. This vignette illustrates the persistence of the "crushing pedagogical relationship" despite changes in social contexts and in mothers' levels of education.

      These two situations, twenty years apart, are presented as manifestations of the same "crushing pedagogical relationship that asserts itself and negates situated identities and knowledge". They serve as a starting point for introducing the thought of Jacques Rancière, and more specifically his thesis of the "equality of intelligences", drawn from his book "Le Maître ignorant" (The Ignorant Schoolmaster).

      The Thought of Joseph Jacotot and the Subversion of the Educational Order

      Jacques Rancière, responding to the introduction, underlines the paradox of giving his lecture within an educational institution, he who has always maintained that "no institution ever emancipates anyone". He sets out to clarify the thought of Joseph Jacotot (1820s-1830s), often seen as an eccentric pedagogue, but whose ideas are a "radical provocation against a whole order of things and thoughts, an order at once intellectual and political".

      Pedagogical Logic as an Instrument of Social Order

      Jacotot belongs to a post-revolutionary context in which instruction is conceived as a means to "complete the revolution", that is, to put an end to disorder while accompanying progress. Pedagogy becomes a model for society: "the exercise of the masters' authority and the submission of the pupils has, in principle, no other aim than the progress of the pupils as far as they can go". Instruction is seen as a way to train the elites and to "govern society through educated people", while offering the people the "necessary and sufficient" knowledge to integrate peacefully. This approach presupposes a measured progression adapted to the people's "supposedly somewhat primitive or somewhat crude level of intelligence".

      The Jacotist Thesis: Equality Is Not a Goal but a Starting Point

      Jacotot's radical break is summed up in the following statement: "The distance that the school and the pedagogized society modeled on it claim to reduce is in reality the distance they live off and never cease to reproduce." Equality is not a result to be reached by reducing inequality, but a "fundamental opinion", a starting point. Inequality is not a state of fact, but a "position of principle" of pedagogical logic.

      Explanation: The Vector of Stultification

      Jacotot denounces the fundamental pedagogical act, explanation. It is not merely a practical procedure but a metaphor for the unequal relationship: "explanation presents itself somewhat as the kind of procedure that will lift the veil a little... but at the same time, let's say, this knowledge is put in its place within the totality of knowledge and delivered, well, at the right moment." By claiming to fill a gap in knowledge, explanation "continually constructs and reproduces the fundamental inegalitarian presupposition", postulating two intelligences: an "inferior" one (that of the ignorant person, the child, the common man) and a "superior" one (that of the master who holds the knowledge and knows how to learn it). Thus "the transmission of knowledge is always at the same time the transmission of the feeling of the inequality of intelligences", a process that Jacotot calls "stultification" ("abrutissement").

      Emancipation: Awareness of an Intellectual Capacity Equal in All

      Emancipation, the opposite of stultification, is a positive act: "becoming aware of an intellectual capacity that belongs to you, but that belongs to you insofar as, let's say, you presuppose it equally in everyone else". Jacotot asserts that there is only one intelligence, the one that discovers the unknown starting from the known, from learning one's mother tongue to the hypotheses of scientists. Emancipation consists in leading the pupil to become aware of this universal capacity. Jacotot's two key formulas are "Everything is in everything" and "Learn something and relate everything else to it". This means that learning has no obligatory starting point or order of progression; any element of knowledge can serve as the "whole" from which to establish connections and trace new paths.

      Intellectual Emancipation Today: Saturation and Confirmation of Inequality

      Rancière analyzes the evolution of our contemporary societies, in which the "pedagogized society" has been perfected to the extreme, making the "wastelands and interstices suited to egalitarian self-teaching" increasingly rare. The fabric of knowledge has come to coincide with the whole of the social fabric, and the time of learning with the time of life, confirming inequality.

      The Shrinking of Egalitarian Ambitions in Education

      Rancière observes, particularly since the 1960s, a "reduction of egalitarian ambitions to something like a process of confirming inequality". He cites the debate on education in France in the 1980s, which opposed a "sociological" position (adapting culture to children from disadvantaged backgrounds) to a "republican" position (distributing the same knowledge universally). These two approaches, although contradictory, both rest on the "inegalitarian presupposition".

      According to Rancière, the overall institution has ultimately imposed a "bureaucratic programming of a presupposed concordance between three times": the development of individuals, the time of educational institutions, and the overall economic and social process. Tools such as Parcoursup or the Bologna Process aim at an ideal uniformity, but this harmony is "constantly deferred in practice", turning the discordance into "their own failure" for individuals. This machine for manufacturing the concordance of times in fact manufactures a "hierarchy of time", distinguishing the homogeneous time of experts from the chaotic time of individuals unable to fall in step with it.

      The Annexation of Spaces of Knowledge by Social Science

      The "interstices" and "vacant spaces" conducive to emancipation, of which Jacotot spoke, have been "annexed to the territory of this new science... social science". Rancière traces three ages of social science:

      The age of the "medicine of society" (utopian socialism): social science proposes laws for organizing society and curing its ills.

      The Marxist age: criticizing the first approach, science analyzes the laws of society's historical evolution in order to provide the means for transformative collective action. The failure of communism, however, brought about a "lasting dissociation of the links between knowledge of the laws of society and transformative political action".

      The third (current) stage: social science has become "the knowledge of its necessary reproduction", that is, of the reproduction of inequality. Rancière illustrates this with the evolution of Bourdieu and Passeron's thought, from the optimism of "Les Héritiers" (1964), which proposed remedies for inequalities in schooling, to the confirmation of "La Reproduction" (1970), in which science "no longer proposes to remedy this violence but essentially to show the objective necessity of the mechanisms of its reproduction".

      Social science "confirms in its own way the axiom of the pedagogized society, namely that those who are dominated are so because of their lack of knowledge". Knowledge has become coextensive with the life of society: "every aspect of the world we live in has become an object of knowledge", but it is a knowledge that "ultimately no longer promises any form of liberation" and that goes round in circles to "show how much knowledge differs from ignorance". This knowledge has replaced the "self-taught knowledges" rooted in alternative social experiences, such as "workers' knowledge".

      The Journalistic Institution and the Logic of "Decoding"

      The journalistic institution also embodies this explanatory logic. Whereas it once justified itself by "giving readers the information necessary for them to exercise... their rights as free and equal citizens", it has now inverted its doctrine. Faced with an excess of information, the task is to "select the necessary and sufficient information and accompany it with its explanation".

      The key word of this new approach is "decoded" (décrypté). The term, once reserved for spies translating a coded message into plain language, has undergone a "strange hijacking". Today, "decoding is the operation that shows that an apparently clear message is in reality an obscure text". Every obvious fact is turned into an "enigma that calls for a specialist to reveal its hidden meaning". This reinforces the coextensivity of knowledge and life, in which "we are told everything, but at the same time, careful: you won't understand any of it unless it is explained to you and unless the analysts are brought in".

      Consensus: The Negation of Discussion and of the Equality of Intelligences

      "Consensus", which emerged in the 1990s, is denounced as far more than a simple agreement of good will. It is in reality "agreement on the fact that, well, precisely, there is nothing to discuss, or not much to discuss. Why? Because, well, if you take stock of the data, you will immediately see the solutions they dictate, which always leads in the last resort to the finding that in the end there is only one". Consensus is "agreement on the fact that, well, things are as they are and cannot be otherwise". Any attempt to "give another name to things, another topography to events" is ignored, because it is "not recognized" and even "not heard". Consensus is a "procedure for saturating what there is", intended to exclude the very possibility of seeing and naming otherwise, making "the interminably explained inequality of intelligences... look like the normal order of things".

      The Paradox of Emancipation and the Mood of Contempt

      In this context of saturated knowledge and confirmed inequality, intellectual emancipation takes on a "particular and somewhat daunting" meaning. It requires a "break with the inegalitarian presupposition" and the "putting into practice of the equality of intelligences". However, inequality is no longer a mere "feeling of individual incapacity"; it is "incorporated into the very texture of descriptions of the world".

      "De-explanation" and the Break with Imposed Reality

      Emancipation cannot be a knowledge that illuminates reality, because that reality is "woven by the words of domination and by its maps". It must be a "blurring of this reality", a "will to ignore what it imposes". Rancière proposes the term "de-explanation" (désexplication): "what is happening today? Well, we have the maps, so to speak, the maps of a world where everything is in place, things well identified, events well explained, paths well traced, and so, in a certain way, we have to begin to give up all these identifications and all these links". This implies "relearning to walk without the landmarks that told us where we were and where we were going".

      The Ignorant Schoolmaster: Dissociating Cause and Effect

      The paradox of emancipation is that its "method... cannot be a program of emancipation that replaces the explanatory method". Emancipation implies the "dissociation of cause and effect". The "ignorant schoolmaster" is not the one who knows nothing, but the one "who is ignorant of the effect of knowledge that he produces, and above all of the effect of emancipation that he or she produces". It is therefore contradictory to try to imagine a parallel "emancipated education system", because emancipation cannot be planned or verified within the time of the institution. It requires "times and spaces free enough for them to be able to trace paths through the mesh of the explanatory logic".

      The Personal Experience of Self-Teaching and Collective Solidarity

      Rancière shares his experience as an amateur historian in the archives of the workers' movement, where his "lack of preparation" and "absence of a guide" allowed him to "find... the manifestation of a thought, of an intelligence equal to any other". He stresses that intellectual emancipation, although individual, is not opposed to collective struggle. It is not "personal development" or "self-esteem", but "the awareness that one is intelligent only through the equal intelligence one attributes to everyone else". Collective movements (the French Revolution, 1830, May 68, the square-occupation movements) can create spaces conducive to these emancipatory encounters. Emancipation "does not like to be planned; it does not like the separation of means and ends".

      The Mood of Contempt: The Sadness of Powerless Knowledge

      Rancière concludes on the importance of the "mood" that accompanies knowledge. Today's knowledge, omnipresent and interpretive, is "powerless" and "vexed by this powerlessness". It is caught in a "sad mood, which is the mood of contempt". This mood characterizes the "progressive minds" who, frightened by the rise of "antidemocratic, identitarian, racist and obscurantist" forces (embodied by Donald Trump), explain them by the "ignorance" and "backwardness" of populations ("populism").

      Rancière criticizes this explanation, however, because it reproduces the explanatory logic of contempt. Whether it is Trump's supporters despising the "useless" or Trump's critics despising the "ignorant" who let themselves be seduced, both camps share "this feeling of knowledge that is essentially a belief, well, in inequality".

      Toward a Community of Equals

      Jacotot reminds us that "the same word, intelligence, can mean two opposite things":

      Those who feel intelligent "by virtue of the stupidity they identify in others".

      Those who feel intelligent "by virtue of the intelligence they recognize in others".

      Intelligence is not only a capacity but "a way of being, of feeling and of communicating, in short a way of making a world". Today's society is a "society of contempt", whereas the "society of equals" is a "continuous creation" of acts of equality, of singular experiments that "scramble the maps of consensus" and "above all recreate confidence in the capacity of all".

      He calls for a "virtue of attention", for asking ourselves "what kind of world we are building through our words and our actions", and for "changing the direction of our gaze and the mood that accompanies it". We must take an interest in "experiments in equality", in the "practices that try, here and there, to put the egalitarian presupposition into practice", and "breathe the air of equality and circulate it as much as we can".

    1. Rewrote the keyboard shortcut code to remove an unmaintained dependency — PR #35425 (by diondiondion)

      GitHub Copilot's explanation of this checkout seemed rather impressive to me:


      At checkout 35425 in the Mastodon repository, the former implementation of keyboard shortcuts, which relied on the third-party library react-hotkeys, has been replaced by a custom React-based solution. Here is a detailed breakdown of the technologies and architectural patterns now used:


      1. Custom React Hook and Component

      • File Introduced:
      • app/javascript/mastodon/components/hotkeys/index.tsx
      • Key Elements:
      • A custom React hook (useHotkeys) and a component (Hotkeys) have been implemented to handle keyboard shortcut logic directly within the application.
      • This solution leverages React's built-in hooks (useEffect, useRef) for event handling and state management.

      2. TypeScript for Type Safety

      • TypeScript Usage:
      • The new implementation is written in TypeScript, providing stronger type safety and maintainability.
      • For example, handler maps are strongly typed, and key matcher logic uses TypeScript types for clarity and correctness.

      3. Utility Functions for Key Handling

      • File Introduced:
      • app/javascript/mastodon/components/hotkeys/utils.ts
      • Features:
      • Utility functions such as normalizeKey and type guards like isKeyboardEvent help standardize and robustly handle keyboard events across browsers.

      4. Storybook Integration for Testing

      • File Introduced:
      • app/javascript/mastodon/components/hotkeys/hotkeys.stories.tsx
      • Purpose:
      • A Storybook story is included for interactive demonstration and testing of the hotkeys component.
      • This enables developers to visually and functionally verify the keyboard shortcut system in isolation.

      5. Removal of react-hotkeys

      • The dependency on react-hotkeys is removed from package.json and corresponding imports are replaced by the new local implementation throughout the codebase.

      6. Architectural Improvements

      • Matching Logic:
      • Hotkey matching logic now supports single-key, combo (modifier + key), and sequence (multi-key) shortcuts, with prioritization for conflict resolution.
      • Global vs. Local Scoping:
      • The custom component allows for both globally-scoped shortcuts (listening on the document) and locally-scoped ones (within a React component subtree).
      • Accessibility:
      • The new approach explicitly avoids triggering shortcuts in <input>, <textarea>, and <select> elements, enhancing usability and accessibility.

      7. Modern React Patterns

      • The hook-based and functional component design aligns with modern React development best practices, reducing reliance on class components and third-party abstractions.

      Summary Table

      | Old Tech (Removed)         | New Tech (Introduced)                       |
      |----------------------------|---------------------------------------------|
      | react-hotkeys (NPM lib)    | Custom React hook/component (Hotkeys)       |
      | JavaScript (in places)     | TypeScript                                  |
      | Third-party event handling | Native DOM/React event handling & utilities |
      | External matching          | App-specific matcher logic (TypeScript)     |
      | No isolated test UI        | Storybook-based playground                  |


      In summary: The Mastodon project at this commit has migrated from a third-party keyboard shortcut library (react-hotkeys) to a modern, TypeScript-based, locally developed React solution for keyboard shortcut management, offering improved flexibility, maintainability, and testability.

    1. Guidance on National Provider Identifier (NPI) Enumeration; 45 Code of Federal Regulations (CFR) § 162.412(b)

      This guidance details the rules for "getting additional NPIs" in order to work with non-government payers.

      It emphasizes that payers may not require an individual to get an additional NPI.

      It highlights the rules under which payers may require organizational subpart NPIs (a feature of organizational NPIs).

    1. Reviewer #1 (Public review):

      Summary:

      The authors note that while many software packages exist for spike sorting, these do not automatically differentiate with known accuracy between excitatory and inhibitory neurons. Moreover, most existing spike sorting packages are for in vivo use, where the majority of electrodes are separated from each other by several hundred microns or more. There is a need for spike sorting packages that can take advantage of high-density electrode arrays where all electrodes are within a few tens of microns from other electrodes. Here, the authors offer such a software package with SpikeMAP, and they validate its performance in identifying parvalbumin interneurons that were optogenetically stimulated.

      Strengths:

      The main strength of this work is that the authors use ground truth measures to show that SpikeMAP can take features of spike shapes to correctly identify known parvalbumin interneurons against a background of other neuron types. They use spike width and peak to peak distance as the key features for distinguishing between neuron types, a method that has been around for many years (Barthó, Peter, et al. "Characterization of neocortical principal cells and interneurons by network interactions and extracellular features." Journal of neurophysiology 92.1 (2004): 600-608.), but whose performance has not been validated in the context of high-density electrode arrays.

      Another strength of this approach is that it is automated - a necessity if your electrode array has 4096 electrodes. Hand-sorting or even checking such a large number of channels is something even the cruellest advisor would not wish upon a graduate student. With such large channel counts, it is essential to have automated methods that are known to work accurately. Hence, the combination of validation and automation is an important advance.

      A nice feature of this work is that with high-density electrode arrays, the spike waveforms appear on multiple nearby electrodes simultaneously. And since spike amplitudes fall off with distance, this allows triangulation of neuron locations within the regular electrode array. Thus, spike correlations between neuron types, or within neuron types, can be plotted as a function of distance. While SpikeMAP is not the first to do this (Peyrache, Adrien, et al. "Spatiotemporal dynamics of neocortical excitation and inhibition during human sleep." Proceedings of the national academy of sciences 109.5 (2012): 1731-1736.), it is a welcome capability of this package.

      It is also good that the code for this package is open-source, allowing a community of people (I expect in vitro labs will especially want to use this) to use the code and further improve it.

      Weaknesses:

      As this code was developed for use with a 4096-electrode array, it is important to be aware of double counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas: First, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code. Second, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      Appraisal:

      This work addresses the need for an automated spike sorting software package for high density electrode arrays. Although no spike sorting software is flawless, the package presented here, SpikeMAP, has been validated on PV interneurons, inspiring a degree of confidence. This is a good start, and further validation on other neuron types could increase that confidence. Groups doing in vitro experiments, where 4096 electrode arrays are more common, could find this system particularly helpful.

      Comments on revised version:

      I appreciate the dialogue that has occurred over this submission. I have seen how the authors have taken into account the issues that I have raised, as well as those brought up by reviewer 2. I am satisfied that the paper has improved and is now a novel and useful contribution in the area of spike sorting.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      Thank you for this comment. We have added a routine to SpikeMAP to remove highly correlated spikes detected within a given spatial radius of each other. The following was added to the main text (line 149):

      “As an additional verification step, SpikeMAP allows the computation of spike-count correlations between putative neurons located within a user-defined radius. Signals that exceed a defined threshold of correlation can be rejected as they likely reflect the same underlying cell.”
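To make the quoted verification step concrete, here is a minimal pure-Python sketch of flagging probable duplicate units from spike-count correlations within a radius. It is not SpikeMAP's code: the function names, the unit dictionary layout (`xy`, `spike_times`), the bin width, and the threshold are all assumptions chosen for illustration.

```python
import math


def bin_spike_counts(spike_times, duration, bin_width):
    """Count spikes in consecutive bins of bin_width seconds."""
    n_bins = int(math.ceil(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        idx = int(t / bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def find_duplicate_pairs(units, duration, bin_width, radius, threshold):
    """Return index pairs of units that lie within `radius` of each other
    and whose binned spike counts correlate above `threshold`."""
    counts = [bin_spike_counts(u["spike_times"], duration, bin_width) for u in units]
    pairs = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            close = math.dist(units[i]["xy"], units[j]["xy"]) <= radius
            if close and pearson(counts[i], counts[j]) > threshold:
                pairs.append((i, j))
    return pairs
```

A caller would then keep one unit from each flagged pair, for example the one with the larger waveform amplitude. That tie-breaking rule is a design choice of the caller and is not described in the response.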

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      We have added a routine to SpikeMAP that computes population spike rates to verify stationarity over time. We have also added a routine to identify putative bursting neurons through a Hartigan statistical dip test applied to the inter-spike distribution of individual cells.

      We added the following (line 204):

      “Further, SpikeMAP contains a routine to perform a Hartigan statistical dip test on the inter-spike distribution of individual cells to detect putative bursting neurons.”

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We have added the following (line 326):

      “future work could include different inhibitory interneurons such as somatostatin (SOM) and vasoactive intestinal polypeptide (VIP) neurons to improve the classification of inhibitory cell types. Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #2 (Public review)

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      Thank you for your insightful comment. A full comparison between SpikeMAP and related methods is provided in Table 1. As can be seen, SpikeMAP is the only method listed that performs E/I sorting on large-scale multielectrodes. Nonetheless, several aspects of SpikeMAP included in the spike sorting pipeline do overlap with existing methods, as these constitute necessary steps prior to performing E/I identification. These steps are not novel to the current work, nor do they constitute rigid options that cannot be substituted by the user. Rather, we aim to offer SpikeMAP users the option to combine E/I identification with preliminary steps performed either through our software or through another package of their choosing. For instance, preliminary spike sorting could be done through Kilosort before importing the spike data into SpikeMAP for E/I identification. To allow greater flexibility, we have now modularized our suite so that E/I identification can be performed as a stand-alone module. We have clarified the text accordingly (line 317):

      “While SpikeMAP is the only known method to enable the identification of putative excitatory and inhibitory neurons on high-density multielectrode arrays (Table 1), several aspects of SpikeMAP included in the spike sorting pipeline (Figure 1) overlap with existing methods, as these constitute required steps prior to performing E/I identification. To enable users the ability to integrate SpikeMAP with existing toolboxes, we provide a modularized suite of protocols so that E/I identification can be performed separately from preliminary spike sorting steps. In this way, a user could carry out spike sorting through Kilosort or another package before importing their data to SpikeMAP for E/I identification.”

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2017 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      The paper by Hilgen et al. is reported in Table 1. As seen, while this paper employs optogenetics, it does not target inhibitory (e.g., PV) cells. We have added the following clarification (line 82):

      “Despite evidence showing differences in action potential kinetics for distinct cell-types as well as the use of optogenetics (Hilgen et al., 2017), there exists no large-scale validation efforts, to our knowledge, showing that extracellular waveforms can be used to reliably distinguish cell-types.”

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      We thank the reviewer for this comment, and have amended the title as follows:

      “SpikeMAP: An unsupervised pipeline for the identification of cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation”

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might perform better. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer that the center-of-mass algorithm carries limitations that are addressed by other methods. To address this issue, we have included two additional protocols in SpikeMAP to perform monopolar triangulation and grid-based convolution, offering additional options for users of the package. The text has been clarified as follows (line 429):

      “In addition to center-of-mass triangulation, SpikeMAP includes protocols to perform monopolar triangulation and grid-based convolution, offering additional options to estimate putative soma locations based on waveform amplitudes.”
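As an illustration of the center-of-mass estimate discussed in this exchange, the sketch below computes an amplitude-weighted average of electrode positions. It is a generic formulation, not SpikeMAP's implementation; the function name and the 40 µm pitch in the usage note are hypothetical.

```python
def center_of_mass(electrode_xy, amplitudes):
    """Estimate a source position as the amplitude-weighted mean of electrode positions.

    electrode_xy: list of (x, y) electrode coordinates.
    amplitudes:   spike amplitude seen on each electrode (sign is ignored).
    """
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero; position is undefined")
    x = sum(w * xy[0] for w, xy in zip(weights, electrode_xy)) / total
    y = sum(w * xy[1] for w, xy in zip(weights, electrode_xy)) / total
    return (x, y)
```

For four electrodes on a square with a hypothetical 40 µm pitch and equal amplitudes, the estimate is the center of the square. Because the weights are non-negative, the result is always a convex combination of the electrode positions, which is the bias toward the array interior that the reviewer points out.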

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We clarified the text as follows (line 183):

      “While we found that a resolution of 90 kHz provided a reasonable estimate of spike waveforms, this value can be adjusted as a parameter in SpikeMAP.”
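Going from 18 kHz to 90 kHz corresponds to an integer upsampling factor of 90 / 18 = 5. The source does not say which spline SpikeMAP uses, so the sketch below swaps in a Catmull-Rom cubic spline, chosen only because it fits in a few lines of pure Python and passes through every original sample. The function name and the end-point handling (repeating the first and last sample) are assumptions.

```python
def upsample_catmull_rom(samples, factor):
    """Upsample a waveform by an integer factor with a Catmull-Rom cubic spline.

    The curve passes through every original sample; `factor - 1` new points
    are inserted between each pair of neighbours. End segments reuse the
    first/last sample as the missing outer control point.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n - 1):
        p0 = samples[max(i - 1, 0)]
        p1 = samples[i]
        p2 = samples[i + 1]
        p3 = samples[min(i + 2, n - 1)]
        for j in range(factor):
            t = j / factor
            out.append(0.5 * (
                2 * p1
                + (-p0 + p2) * t
                + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
            ))
    out.append(samples[-1])
    return out
```

With `factor=5`, a waveform of N samples at 18 kHz becomes 5(N - 1) + 1 samples on a 90 kHz grid. Features such as spike width can then be measured at roughly 11 µs resolution instead of roughly 56 µs.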

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      We agree with the reviewer that it would be useful to have the option of performing PCA on several channels at once, since spikes can occur at several channels at the same time. We have now added a routine to SpikeMAP that allows users to define a radius around individual channels prior to performing PCA. The text was clarified as follows (line 131):

      “The SpikeMAP suite also offers a routine to select a radius around individual channels in order to enter groups of adjacent channels in PCA.”

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one can not pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one can not find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      We clarified the text as follows (line 135):

      “In SpikeMAP, the optimal number of k-means clusters can be chosen by a Calinski-Harabasz criterion (Calinski and Harabasz, 1974) or pre-selected by the user.”
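
As an illustration of how a Calinski-Harabasz criterion can select the number of clusters, the sketch below scores k-means solutions for several values of k with scikit-learn. The function `choose_k`, the candidate range, and the synthetic features are assumptions made for this example and do not reproduce the SpikeMAP routine.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def choose_k(features, k_values=range(2, 7), seed=0):
    """Return the k with the highest Calinski-Harabasz score, plus all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = calinski_harabasz_score(features, labels)
    return max(scores, key=scores.get), scores

# Synthetic PCA scores for three well-separated units (hypothetical data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in centers])
best_k, scores = choose_k(features)
```

Note that the index is undefined for k=1, so a criterion of this kind cannot by itself conclude that an electrode records a single unit.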

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We added Supplemental Figure 1 showing the drop in voltage over all putative somas (N=1,950) of one recording, after excluding somas with an increase in voltage away from the electrode peak and computing normed values V/max(V). We see a distribution of slopes as well as intercepts across somas, showing some variability across recording sites. As the reviewer suggests, it is possible that a power law describes these data better than a linear function, and this would need to be investigated further by quantitatively comparing the fit of these functions.
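
One way such a comparison could be set up is sketched below: fit a line and a power law to the normed amplitudes and compare the residuals. The helper `compare_decay_fits` and the example amplitudes are hypothetical, and the power law is fitted in log-log space, which weights errors differently from a direct nonlinear fit.

```python
import numpy as np

def compare_decay_fits(dist_um, v_norm):
    """Return sum of squared errors for a linear and a power-law fit."""
    # Linear model: v = a * d + b
    a, b = np.polyfit(dist_um, v_norm, 1)
    sse_lin = np.sum((v_norm - (a * dist_um + b)) ** 2)
    # Power law: v = c * d**p, fitted as a straight line in log-log space
    p, log_c = np.polyfit(np.log(dist_um), np.log(v_norm), 1)
    sse_pow = np.sum((v_norm - np.exp(log_c) * dist_um ** p) ** 2)
    return {"linear": sse_lin, "power_law": sse_pow, "exponent": p}

# Hypothetical amplitudes that follow an inverse square-root decay
d = np.array([20.0, 40.0, 60.0, 80.0, 100.0])  # distance in micrometres, must be > 0
v = (d / d[0]) ** -0.5
v = v / v.max()                                 # normed values V/max(V)
fits = compare_decay_fits(d, v)
```

Both models have two free parameters, so the residuals can be compared directly. Over a narrow range of distances the two curves can be hard to tell apart, which may be why the decay in Figure 2H looks linear.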

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      The reviewer is correct to point out that a number of stringent criteria were employed to exclude some putative cells. We now outline these criteria directly in the text (line 161):

      “ At different steps in the process, conditions for rejecting spikes can be tailored by applying: (1) a stringent threshold to filtered voltages; (2) a minimal cut-off on the signal-to-noise ratio of voltages (see Supplemental Figure 2); (3) an LDA for cluster separability; (4) a minimal spike rate to putative neurons; (5) a Hartigan statistical dip test to detect spike bursting; (6) a decrease in voltage away from putative somas; and (7) a maximum spike-count correlation for nearby channels. Together, these criteria allow SpikeMAP users the ability to precisely control parameters relevant to automated spike sorting.”

      Further, we provide SNRs of individual channels (Supplemental Figure 2), and added to the SpikeMAP software the ability to apply a minimal criterion based on SNR.

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.

      We have added figures showing the distribution of E and I firing rates across a population of N=1,950 putative cells (Supplemental Figure 3). Firing rates of inhibitory neurons are marginally higher than excitatory neurons, and both E and I follow an approximately exponential distribution of rates.

      The reviewer may be right that there are more I neurons at borders in Fig. 3B because injections were done in medial prefrontal cortex, so this may reflect an experimental artefact related to a high probability of activating I neurons in locations where the opsin was activated. We added a sentence to the text to clarify this point (line 201):

      “It is possible that the spatial location of putative I cells reflects the site of injection of the opsin in medial prefrontal cortex.”

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      The reviewer is correct to point out that the spike-sorting portion of our pipeline shares similarities with related approaches. Other aspects, however, are unique to SpikeMAP. We have clarified the text accordingly:

      “In sum, SpikeMAP provides an end-to-end pipeline to perform spike-sorting on high-density multielectrode arrays. Some elements of this pipeline are similar to related approaches (Table 1), including the use of voltage filtering, PCA, and k-means clustering. Other elements are novel, including the use of spline interpolation, LDA, and the ability to identify putative excitatory and inhibitory cells.”

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      Again, we apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data. I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      Details of the open access data are now provided in Supplemental Table 1. We also clarified Figure 5B:

      “Quantification of change in firing rate following optogenetic stimulation. Average firing rates are taken over four recordings obtained from 3 mice.”

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We agree with the reviewer that it would be worthwhile for future work to apply SpikeMAP to artificially generated spike trains, and have added the following (line 328):

      “Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 154 seems to include a parenthetical expression left over from editing: "sensitive to noise (contamination? Better than noise?) generated by the signal of proximal units." See also line 186: "use (reliance?) of light-sensitive" and line 245: "In the absence of synaptic blockers (right?)," and line 270: "the size of the data prevents manual intervention (curation?)." Check carefully for all parentheses like that, which should be removed.

      Thank you for pointing this out. We have revised the text and removed parenthetical expressions left over from editing.

      (2) In lines 285-286, you state that: "k-mean clustering of spike waveform properties best differentiated the two principal classes of cells..." But I could not find where you compared k-means clustering to other methods. I think you just argued that k-means seemed to work well, but not better than, another method. If that is so, then you should probably rephrase those lines.

      The reviewer is correct that direct comparisons are not performed here, hence we removed this sentence.

      (3) Methods section, E/I classification, lines 396-405: You give us figures on what fraction was E and I (PV subtype) (94.75% and 5.25%), but there is more that you could have said. First of all, what is the expected fraction of parvalbumin-sensitive interneurons in the cortex - is it near 5%?

      We clarified the text as follows (line 444): “This number is close to the expected percentage of PV interneurons in cortex (4-6%) (Markram et al. 2004).”

      Second, how would these percentages change if you altered the threshold from 3 s.d. to something lower, like 2 s.d.? Giving us some idea of how the threshold affects the fraction of PV interneurons could give us an idea of whether this method agrees with our expectations or not.

      While SpikeMAP offers the flexibility to set the voltage threshold manually, we opted for a stringent threshold to demonstrate the capabilities of the software. As seen in Figure 2D, at 2 and 3 s.d., the signal is largely accounted for by Gaussian noise, while deviation from noise arises around 4 s.d. We clarified the text as follows (line 120):

      “At a threshold of -3 s.d., the signal could be largely accounted for by Gaussian noise, while a separation between signal and noise began around a threshold of -4 s.d.”
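
The threshold argument can be illustrated by comparing the fraction of samples that fall below -k s.d. with the fraction expected from a Gaussian. This is only a sketch under assumed conventions: the s.d. is taken over the whole trace, and the function names and the synthetic noise trace are hypothetical.

```python
import math
import numpy as np

def gaussian_tail_fraction(k):
    """Expected fraction of Gaussian samples below -k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def observed_tail_fraction(voltage, k):
    """Fraction of samples in the trace that fall below -k s.d."""
    z = (voltage - voltage.mean()) / voltage.std()
    return float(np.mean(z < -k))

# Synthetic noise-only trace (hypothetical, in arbitrary voltage units)
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 10.0, size=1_000_000)
for k in (2, 3, 4):
    print(k, gaussian_tail_fraction(k), observed_tail_fraction(noise, k))
```

On a real filtered trace, an observed fraction well above the Gaussian expectation at a given k would indicate that threshold crossings are dominated by spikes and not by noise.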

      Third, did the inhibitory neurons identified by this optogenetic method also have narrow spike widths at half amplitude? Could you do a scatterplot of all the spike widths and inter-peak distances that had color-coded dots for E and I based on your optogenetic method?

      We have added a scatterplot (Supplemental Figure 5).

      (4) Can you compare your methods with others now widely in use, like, for example, Spiking Circus or Kilosort? You do that in Table 1 in terms of features, but not in terms of performance. For example, you could have applied Kilosort4 to your data from the 4096 electrode array and seen how often it sorted the same neurons that SpikeMAP did. I realize this could not give you a comparison of how many were E/I, but it could tell you how close your numbers of neurons agreed with their numbers. Were your numbers within 5% of each other? This would be helpful for groups who are already using Kilosort4.

      As mentioned earlier, packages listed in Table 1 do not provide an identification of putative E/I neurons on high-density electrode arrays. To facilitate the integration of SpikeMAP with other spike sorting packages, our suite now provides a stand-alone module to perform E/I identification. This is now mentioned in the text (see earlier comment).

      Reviewer #2 (Recommendations for the authors):

      I would encourage the authors to decide what the paper is about: is it about a new sorting method (and if yes, more tests/benchmarks are needed to explain the pros and the cons of the pipelines, and the Methods need to be expanded). Or is it about the new data for Ground Truth validation, and again, if yes, then maybe explain more what they are, how many slices/mice/cells, ... Maybe also consider making the data available online as an open dataset.

      We agree with the reviewer that the paper is best slated toward ground truth validation of E/I identification. We now specify how many slices/mice/cells etc. (see Supplemental Table 1) and make the data available online as open source.

    1. Reviewer #2 (Public review):

      Summary:

      The main goal of this study is to examine how information about odor concentration is encoded by second-order neurons in the invertebrate and vertebrate olfactory system. In many animal models, the overall mean firing rates across the second-order neurons appear to be relatively flat or near constant with increasing odor intensity. While such compression of concentration information could aid in achieving concentration invariant recognition of odor identity, how this observation could be reconciled with the need to preserve information about the changes in stimulus intensity is a major focus of the study. The authors show that second-order neurons have 'diverse' dose-response curves and that the combinations of neurons activated (particularly the rank-order) differ with concentration. Further, they argue that a single circuit-level computation, termed 'divisive normalization,' where the individual neural response is normalized by the total activity across all neurons, could help explain the coding properties of neurons at this stage of processing in all model organisms examined. They present approaches to read out the concentration information using spike rates or timing-based approaches. Finally, the authors reveal that tufted cells in the mouse olfactory bulb provide an exception to this coding approach and encode concentration information with a monotonic increase in firing rates.

      Strengths:

      (1) Comparative analysis of odor intensity coding across four different species, revealing the common features in encoding stimulus-driven features, is highly valuable.

      (2) Showing how mitral and tufted cells differ in encoding odor intensity is potentially very important to the field.

      (3) How to preserve concentration information while compressing the same with divisive normalization is also a novel and important problem in the field of sensory coding.

      Weaknesses:

      (1) The encoding problem:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. The authors acknowledge this as part of their analysis in Figure 3.

      "Therefore, divisive normalization mostly does not alter the relative contribution (rank order) of each neuron in the ensemble." (Page 4, last paragraph, lines 6-8).

      The analysis in this figure indicates that divisive normalization does what it is supposed to do, i.e., compresses concentration information and not alter the rank-order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This was the necessary condition, and not the divisive normalization for changes in the combinatorial code.

      There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration?
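
The argument above can be checked with a toy example. The sketch assumes the simplest form of normalization, dividing each response by the summed population activity with no nonlinearity, so it is not the authors' model; all response values are hypothetical.

```python
import numpy as np

def divisive_normalization(x):
    """Divide each neuron's response by the summed population activity."""
    return x / x.sum()

def rank_order(x):
    """Neuron indices sorted from strongest to weakest response."""
    return np.argsort(-x)

# Hypothetical first-order responses of three neurons at two concentrations
monotonic_low, monotonic_high = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
crossover_low, crossover_high = np.array([1.0, 2.0, 3.0]), np.array([5.0, 4.0, 3.5])

# No crossovers: the normalized patterns are identical at both concentrations
same_pattern = np.allclose(divisive_normalization(monotonic_low),
                           divisive_normalization(monotonic_high))

# Crossovers: the rank order differs across concentrations, before and after DN
ranks_low = rank_order(divisive_normalization(crossover_low))
ranks_high = rank_order(divisive_normalization(crossover_high))
```

Because every neuron is divided by the same positive number, the rank order within one concentration is unchanged by this form of normalization. Any change in rank order across concentrations therefore has to come from the input.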

      Naturally, the question that arises is whether many of the important features of the second-order neuron's response simply seem to follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified.

      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      (2) The decoding problem.

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      (3) Analysis of existing data.

      I had a couple of issues related to the presentation and analysis of prior results.

      i) Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      ii) A lot of neurons seem to have responses that flat line closer to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of neurons is reducing the trend in the data significantly.

      iii) I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.

      (4) Simulated vs. Actual data.

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

    2. Author response:

      (1) Explore the temporal component of neural responses (instead of collapsing responses to a single number, i.e., the average response over 4s), and determine which of the three models can recapitulate the observed dynamics.

      (2) Expand the polar plot visualization to show all three slopes (changes in responses across all three successive concentrations) instead of only two slopes.

      (3) Attempt to collect and analyze, from published papers, data of: (a) first-order neuron responses to odors to determine the role of first-order inhibition towards generating non-monotonic responses, and (b) PN responses in Drosophila to properly compare with corresponding first-order neuron responses.

      (4) Further discuss: (a) why the brain may need to encode absolute concentration, (b) the distinction between non-monotonic responses and cross-over responses, and (c) potential limitations of the primacy model.

      (5) Expand the divisive normalization model by evaluating different values of k and R, and study the effects of divisive normalization on tufted cells.

      (6) Add discussion of other potential inhibitory mechanisms that could contribute towards the observed effects.

      Reviewer #1:

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      We thank the Reviewer for the insightful input and agree that gradients across time and space are important for various olfactory behaviors, such as tracking. At the same time, we think that absolute concentration is also needed for two reasons. First, in order to extract changes in concentration, the absolute concentration needs to be normalized out; i.e., change needs to be encoded with respect to some baseline, which is what divisive normalization computes. Second, while it is true that representing the exact number of odor molecules present is not important, this number directly relates to distance from the odor source, which does provide ethological value (e.g., is the tiger 100m or 1000m away?). Indeed, our decoding experiments focused on discriminating relative, and not on absolute, concentrations by classifying between each pair of concentrations (i.e., relative distances), which is effectively an assessment of the gradient. In our revision, we will make all of these points clearer.

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      To our knowledge, it is not well understood how downstream brain regions read out mitral cell responses to guide olfactory behavior. The olfactory bulb projects to more than a dozen brain regions, and different regions could decode signals in different ways. We focused on the mean response because it is a simple, natural construct.

      The datasets we analyzed may not include all relevant timing information; for example, the mouse data is from calcium imaging studies that did not track sniff timing. Nonetheless, we plan to address this comment within our framework by binning time into smaller-sized windows (e.g., 0-0.2s, 0.2-0.4s, etc.) and repeating our analysis for each of these windows. Specifically, we will determine how each normalization method fares in recapitulating statistics of the population responses of each window, beyond simply assessing the population mean.

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      We agree that mean activity is only one measure to summarize a rich data set and will perform the suggested analysis.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      Primacy coding does provide one plausible mechanism to decode concentration. Our manuscript demonstrated how such a code could emerge in second-order neurons with the help of divisive normalization, though it does require maintaining at least partial rank invariance across concentrations, which may not be robust. We also showed how concentration could be decoded via spike rates, even if average rates are constant, which provides an alternative hypothesis to that of ref 23.

      Further, ref 23 only considers the piriform cortex, which, as mentioned above, is one of many targets of the olfactory bulb, and it remains unclear what the decoding mechanisms are of each of these targets. In addition, work from the same authors of ref 23 found multiple potential decoding strategies in the piriform cortex itself, including changes in firing rate (see Fig. 2E of ref. 23 - Bolding & Franks, 2017; as well as Fig. 4 in Roland et al., 2017).

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      We will add this explanation to the manuscript.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      We agree and will add this information to the manuscript.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

      We agree that the two changes in the reviewer’s example will be categorized in the same quadrant in our analysis. We did not focus on the absolute changes because our analysis covers many log ratios of concentrations. Instead, we focused on the relative shapes of the concentration response curves, and more specifically, the direction of the change (i.e., the sign of the slope). We will better motivate this style of analysis in the revision. Moreover, in response to comments by Reviewer 2, we will compare response shapes between all three successive levels of concentration changes, as opposed to only two levels.
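
The reviewer's example can be checked numerically. The sketch assumes the angle is computed with `atan2` on the pair of slopes, which is one reading of the Methods as described by the reviewer; the function name is hypothetical.

```python
import math

def slope_angle_deg(slope_1, slope_2):
    """Angle of the point (slope_1, slope_2) in degrees, using atan2."""
    return math.degrees(math.atan2(slope_2, slope_1))

# The reviewer's two cases: a near-flat curve and a steeply rising one
shallow = slope_angle_deg(0.0001, 0.0002)
steep = slope_angle_deg(1.0, 2.0)

# The magnitude of the slope pair is what separates the two cases
shallow_size = math.hypot(0.0001, 0.0002)
steep_size = math.hypot(1.0, 2.0)
```

Both cases give an angle of about 63.4 degrees and land in the same bin, while their magnitudes differ by four orders of magnitude. The angle alone captures the direction of change, as the authors state, but not whether the neuron responded at all.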

      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 M years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      We are in the process of requesting PN response data in Drosophila from groups that have collected such data and will repeat the analysis once we get access to the data.

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      While we agree that these manuscripts do study the effects of divisive normalization in insects and fish, here we show that this computation also generalizes to rodents. In addition, these previous studies do not focus on the role of divisive normalization in concentration encoding/decoding, which is our focus. We will clarify this difference in the revision.

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      We apologize for the mistake in the subtractive normalization equation and will correct it. Thank you for catching it.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k) , and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. The "subtractive normalization" we don't know. But if one wanted to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.

      Our intent was not to place all the methods on the “same footing” but rather to isolate the two primary components of normalization methods – non-linearity and lateral inhibition – and determine which of these, and in which combination, could generate the desired effects. Divisive normalization incorporates both components, whereas intraglomerular gain control and subtractive normalization only incorporate one of these components. We will clarify this reasoning in the revision.
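The split into "non-linearity" and "lateral inhibition" can be sketched as follows. The manuscript's equations are not reproduced in this exchange, so the functional forms here are assumptions: a widely used sigmoid form with exponent `n` and half-saturation `sigma`, a pooled term scaled by `k`, and a rectified subtractive variant that is purely hypothetical.

```python
def intraglomerular_gain(x, n=1.5, sigma=10.0):
    # Non-linearity only: each neuron is transformed independently.
    return [xi ** n / (sigma ** n + xi ** n) for xi in x]

def divisive_normalization(x, k=0.1, n=1.5, sigma=10.0):
    # Non-linearity plus lateral inhibition: a pooled term, shared by all
    # neurons, is added to the denominator (assumed form).
    pool = (k * sum(x)) ** n
    return [xi ** n / (sigma ** n + xi ** n + pool) for xi in x]

def subtractive_normalization(x, k=0.1):
    # Lateral inhibition only (hypothetical form): subtract the pooled
    # activity and rectify at zero.
    pool = k * sum(x)
    return [max(xi - pool, 0.0) for xi in x]

def rank_order(values):
    # Neuron indices sorted from weakest to strongest response.
    return sorted(range(len(values)), key=lambda i: values[i])

inputs = [5.0, 40.0, 20.0, 80.0]  # toy first-order responses, must be >= 0
for model in (intraglomerular_gain, divisive_normalization, subtractive_normalization):
    out = model(inputs)
    print(model.__name__, [round(v, 3) for v in out], rank_order(out))
```

Under these assumed forms the divisive model compresses every response relative to the gain-control model while leaving the within-concentration rank order of the inputs unchanged, because all neurons share the same pooled term.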

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      We thank the Reviewer for the input. Instead of fixing k for all second-order neurons, we will apply different k values for different neurons. We will also systematically vary the percentage of neurons used for the divisive normalization calculation in the denominator, and determine the regime under which the experimentally observed effects are reproduced. This approach takes into account the scenario that inter-glomerular inhibitory interactions are sparse.

      There are questions raised by the following statements: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption.". Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes way more energy than a mitral cell).

      The question of what mitral cells are “good for”, compared to tufted cells, remains unclear in our view. We speculate that mitral cells provide superior context-dependent processing and are better for determining stimulus-reward contingencies, but this remains far from settled experimentally.

      We believe the mitral cell pathway evolved earlier than tufted cells, since the former appear akin to projection neurons in insects. Nonetheless, we agree that differences in energy consumption are unlikely to be the primary distinguishing factor, and in the revision, we will drop this argument.

      Reviewer #2:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. … The analysis in [Figure 3] indicates that divisive normalization does what it is supposed to do, i.e., compresses concentration information and not alter the rank-order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This was the necessary condition, and not the divisive normalization for changes in the combinatorial code. There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration? Naturally, the question that arises is whether many of the important features of the second-order neuron's response simply seem to follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified. 
      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      It appears that there is confusion about the definitions of “non-monotonicity” and “crossovers”. These are two independent concepts – one does not necessarily lead to the other. Non-monotonicity concerns the response of a single neuron to different concentration levels. A neuron’s response is considered non-monotonic if it goes up then down, or down then up, across increasing concentrations. A “cross-over” is defined based on the responses of multiple neurons. A cross-over occurs when the response of one neuron is lower than that of another neuron at one concentration, but higher at a different concentration. For example, the responses of both neurons could increase monotonically with increasing concentration, but one neuron might start lower and grow faster, hence creating a cross-over. We will clarify this in the manuscript, which we believe will resolve the questions raised above.
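The two definitions can be made concrete with a minimal sketch. The helper names and the toy response values are illustrative assumptions, not data from the study.

```python
def is_monotonic(responses):
    # Single-neuron property: responses never reverse direction as
    # concentration increases.
    diffs = [b - a for a, b in zip(responses, responses[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def has_crossover(resp_a, resp_b):
    # Pairwise property: neuron A is below B at one concentration and
    # above B at another.
    signs = {(a > b) - (a < b) for a, b in zip(resp_a, resp_b)}
    return 1 in signs and -1 in signs

# Both neurons increase monotonically, but B starts lower and grows faster.
neuron_a = [1.0, 2.0, 3.0, 4.0]
neuron_b = [0.2, 1.5, 3.5, 6.0]
print(is_monotonic(neuron_a), is_monotonic(neuron_b), has_crossover(neuron_a, neuron_b))

# A non-monotonic neuron that never crosses a stronger neighbour.
neuron_c = [1.0, 3.0, 2.0, 1.0]
neuron_d = [5.0, 6.0, 5.5, 5.0]
print(is_monotonic(neuron_c), has_crossover(neuron_c, neuron_d))
```

The first pair shows a cross-over without any non-monotonicity; the second shows non-monotonicity without a cross-over.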

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Yes, we used a simple classification scheme, logistic regression with a linear kernel, which is essentially a Euclidean distance-based classification. This scheme works better for tufted cells because they are more monotonic; i.e., if neuron A and B both increase their responsiveness with concentration, then Euclidean distance would be fine. But if neuron A’s response amplitude goes up and neuron B’s response goes down – as often happens for mitral cells – then Euclidean distance does not work as well. We will add intuition about this in the manuscript.

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.

      As suggested, we will compute, for each odor, the correlation coefficient between neural responses across trials. We will repeat this analysis for both mitral and tufted cells. To determine the effect of adaptation, we will compare the correlation coefficient between the 1st and 2nd trials with that between the 1st and final trials.
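One way to set up this comparison is sketched below, assuming each trial is summarized as a population response vector (one value per neuron) and that Pearson correlation is the coefficient intended; the trial values are toy numbers chosen for illustration.

```python
import math

def pearson(x, y):
    # Pearson correlation between two equal-length response vectors.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def adaptation_summary(trials):
    # trials[t] is the population response on trial t for one odor.
    early = pearson(trials[0], trials[1])   # 1st vs 2nd trial
    late = pearson(trials[0], trials[-1])   # 1st vs final trial
    return early, late

# Toy data: the final trial is a uniformly scaled-down copy of the first.
toy_trials = [
    [1.0, 4.0, 2.0, 8.0],
    [1.2, 3.8, 2.1, 7.5],
    [0.5, 2.0, 1.0, 4.0],
]
print(adaptation_summary(toy_trials))
```

In the toy case a uniform reduction in amplitude leaves the correlation with the first trial essentially unchanged, which is the sense in which a combinatorial pattern could be robust to adaptation.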

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.

      We agree that divisive normalization should not alter the rank order, but the rank order may change in first-order neurons, which carries through to second-order neurons. This confusion may be related to the one mentioned above re: cross-overs vs non-monotonicity. Moreover, in the simulated data (Fig. 4D-H), the Jaccard similarity was calculated based on only the 50 neurons with the highest affinity, not the entire population of neurons. As shown in Fig. 4H, most of the rank-order change happens in the remaining 150 neurons.

      Note that in response to a comment by Reviewer 3, we will change the presentation of Fig. 4H in the revision.
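The distinction between an invariant top-50 set and a changing rank order elsewhere can be illustrated with a toy example. It assumes 200 neurons (50 highest-ranked plus the remaining 150 mentioned above) and uses made-up scores, not the simulation from the manuscript.

```python
def jaccard(a, b):
    # Jaccard similarity of two collections treated as sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def top_k(scores, k):
    # Indices of the k neurons with the highest scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def ranks(scores):
    # rank[i] = position of neuron i when sorted from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rank = [0] * len(scores)
    for position, neuron in enumerate(order):
        rank[neuron] = position
    return rank

# Toy scores at two concentrations: the top 50 neurons are untouched,
# the order of the remaining 150 is reversed.
low = [200.0 - i for i in range(200)]
high = low[:50] + low[50:][::-1]

similarity = jaccard(top_k(low, 50), top_k(high, 50))
changed = sum(1 for r_low, r_high in zip(ranks(low), ranks(high)) if r_low != r_high)
print(similarity, changed)
```

A Jaccard similarity computed on the top 50 neurons alone therefore says nothing about rank changes among the other 150.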

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      In the Discussion, we wrote about how downstream circuits will need to learn which set of neurons are to be associated with each distinct concentration level. We will expand upon this point and include experimentally testable predictions.

      Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      It appears there is some confusion here; we will clarify in the text and figure captions that we did not average across different odors in our analysis. We will also add figure panels showing some representative neural responses as suggested by the Reviewer.

      A lot of neurons seem to have responses that flat line closer to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of neurons is reducing the trend in the data significantly.

      Yes, if a neuron responds to at least one concentration level in at least 50% of the trials, it is considered responsive. So it is possible that some neurons respond to one concentration level and otherwise flatline near zero. We will highlight a few example neurons to visualize this scenario.
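The responsiveness criterion can be written down as a short sketch. How a response is detected on a single trial is not stated here, so the input is assumed to be a table of per-trial booleans; the function name and the example values are hypothetical.

```python
def is_responsive(trial_hits, min_fraction=0.5):
    # trial_hits[c][t] is True if the neuron responded on trial t at
    # concentration level c (per-trial detection is assumed to be done upstream).
    return any(
        sum(hits) / len(hits) >= min_fraction
        for hits in trial_hits
        if hits
    )

# Responds reliably at one concentration only, flat elsewhere.
one_level = [
    [False, False, False, False],
    [False, False, False, False],
    [True, True, False, True],
    [False, False, False, False],
]
# Responds occasionally at every concentration, never in half of the trials.
sporadic = [[True, False, False, False]] * 4

print(is_responsive(one_level), is_responsive(sporadic))
```

A neuron like `one_level` counts as responsive even though its concentration response curve sits near zero at three of the four levels.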

      I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.

      Your 2 cents are valuable! Thank you for raising this point. Instead of computing two slopes (C1-C3 and C2-C4), we will expand our analysis to include all three slopes (C1-C2, C2-C3, C3-C4). Consequently, there are 2^3 = 8 different response shapes, and we will list them and quantify the fraction of the responses that fall into each shape category.
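A minimal sketch of the three-slope categorization follows. It assumes each neuron is summarized by four values (C1 to C4) and, to keep exactly 2^3 = 8 categories, treats a zero slope as non-increasing; both choices are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def shape_category(responses):
    # One symbol per consecutive pair (C1-C2, C2-C3, C3-C4):
    # "+" for an increase, "-" otherwise (zero slope counted as "-").
    return "".join("+" if b > a else "-" for a, b in zip(responses, responses[1:]))

def shape_fractions(population):
    # Fraction of neurons falling into each of the 8 shape categories.
    counts = Counter(shape_category(r) for r in population)
    all_shapes = ["".join(p) for p in product("+-", repeat=3)]
    return {shape: counts[shape] / len(population) for shape in all_shapes}

# The reviewer's example: C3 > C1 and C4 > C2, but C3 < C2.
reviewer_example = [1.0, 3.0, 2.0, 4.0]
print(shape_category(reviewer_example))

toy_population = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 3.0, 2.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0, 5.0],
]
print(shape_fractions(toy_population))
```

With three slopes the reviewer's example is labelled as up, down, up rather than being grouped with monotonically increasing curves.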

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

      We believe the Reviewer is referring to Figs. 4D and 4E, since Fig. 3D does not show a first-order neuron simulation, and there is no Fig. 3E. In Fig. 4D there is no change of rank order because the simulation is for a single odor and single concentration level, and the change of rank-order (i.e., cross-overs) as we define it occurs between concentration levels. We will clarify this in the manuscript.

      Reviewer #3:

      While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      We thank the Reviewer for the suggestion. We will request datasets of first-order neuron responses from the groups who acquired them. We will analyze this data to determine the role of inhibition or antagonistic binding and quantify what percentage of first-order neurons respond less strongly with larger concentrations.

      Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      We will perform the suggested analysis: specifically, we will set the negative mitral cell responses to 0 and assess whether the population mean remains flat.

      The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      We thank the reviewer for providing additional mechanisms to consider. As suggested, we will add discussion of these alternatives to divisive normalization.

      It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which in Figure 3 we see depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It's possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes), and more broadly, it would be extremely helpful in evaluating the latency coding within the broader framework proposed if the authors clarified whether the simulated first-order neuron response timecourses, when factoring in potentially different amplitudes (R) and averaging across the entire response window, reproduces the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear if concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. 
      It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      As suggested by the Reviewer, we will add another simulation scenario where the response amplitudes (R) are different for different neurons. For each concentration, we will then average each neuron’s response across the entire response window and determine if the simulation reproduces the cross-overs as observed experimentally.

      How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across all latencies, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.

      We apologize that Fig. 4H was a poor choice for visualization. What is plotted in Fig. 4H is the sorted identity of neurons under low and high concentrations, and points on the y=x line indicate that the two corresponding neurons have the same rank under the two concentrations. We will replace this panel with a more intuitive visualization, where the x and y axes are the ranks of the neurons, and deviation from the y=x line indicates how much a neuron's rank differs between the two concentrations.

      In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      The original primacy model states that the latency of a neuron decreases with increasing concentration, while the ranks of neurons remain unaltered. Our results, on the other hand, suggest that the ranks do at least partially change across concentrations. This leads to two possible decoding mechanisms. First, if the top K responding neurons remain invariant across concentrations (even if their individual ranks change within the top K), then the brain could learn to associate a population of K neurons with a response latency; lower response latency means higher concentration. Second, if the top K responding neurons do not remain invariant across concentrations, then the brain would need to learn to associate a different set of neurons with each concentration level. The latter imposes additional constraints on the robustness of the primacy model and the corresponding read-out mechanism. We will include more discussion of these possibilities in the revision.

      Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      We agree that the word “generating” is faulty. We thank the reviewer for their more precise wording, which we will adopt.

      Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

      We agree that tufted cells are subject to divisive normalization as well, albeit probably to a lesser degree than mitral cells. To determine the effect of this, we will alter the strength (and degree of sparseness of interglomerular interactions) of divisive normalization and determine if there is a regime where response features of tufted cells match those observed experimentally.

      • ClojureScript has excelled in standard UI patterns but now aims to harness modern browser APIs for high-performance use cases.

      "The majority of Clojurescript application development and community discussions seems to be focused on improving standard UI implementation patterns and the general workflow of how we can build web applications better, easier and faster." (Medium)

      • The workshop’s goal was to probe ClojureScript’s internals, identify bottlenecks, and introduce technologies like WebGL, WebRTC, WebWorkers, and Emscripten.

      "So for this workshop I chose to look more below the surface of Clojurescript, analyze problem areas, examine possible optimization strategies and above all introduce people to a number of modern web technologies (WebGL, WebRTC, WebWorkers, Emscripten), techniques & tools offering possible routes to use the language in a sound and elegant way to work with these features." (Medium)

      • A six-step implementation of Conway’s Game of Life, from naive to optimized, achieved a speedup from 10,840 ms to 16.5 ms per frame (\~650× faster) on a 1024×1024 grid.

      "Six implementations of Conway’s Game of Life — from naive (but idiomatic & slow) to optimized Clojurescript using typed arrays and direct pixel manipulations (10,840 ms / frame vs 16.5 ms / frame = \~650× faster for a 1024×1024 grid)." (Medium)

      • A compile-time macro version of get-in eliminated temporary vector allocations and reduce calls, cutting lookup time from 205.18 ns to 43.61 ns (\~5× faster) when benchmarked with criterium under Clojure.

      "Benchmarking this example with criterium under Clojure (which has somewhat different/faster protocol dispatch than in Clojurescript), the macro version results in 43.61ns vs 205.18ns for the default get-in (\~5× faster)." (Medium)

      • Switching from nested vectors to a flat 1D vector enabled nth-based indexing (\~6× speed-up), before Typed Arrays and loop-based pixel updates removed millions of function calls for the full \~650× gain.

      "The more obvious improvement to speed up the simulation was using a flat 1D vector to encode the grid and calculate cell indices for the 2D coordinates ... gain a \~6× speed up ... Since all our data ... are stored in typed arrays ... and altogether gained a \~650× speedup compared to the original." (Medium)
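The flat-grid idea can be sketched in Python, with a `bytearray` standing in for a typed array. The index formula `y * width + x` is the standard row-major mapping; treating cells outside the grid as dead is an assumption, since the workshop's edge handling is not described here.

```python
def cell_index(x, y, width):
    # Row-major mapping from 2D coordinates to a position in the flat grid.
    return y * width + x

def count_alive_neighbors(grid, x, y, width, height):
    # Sum the 8 surrounding cells; cells outside the grid count as dead.
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                total += grid[cell_index(nx, ny, width)]
    return total

def step(grid, width, height):
    # One Game of Life generation written into a fresh flat buffer.
    new_grid = bytearray(width * height)
    for y in range(height):
        for x in range(width):
            idx = cell_index(x, y, width)
            n = count_alive_neighbors(grid, x, y, width, height)
            new_grid[idx] = 1 if n == 3 or (grid[idx] and n == 2) else 0
    return new_grid

# A horizontal "blinker" on a 5x5 grid flips to vertical after one step.
width = height = 5
grid = bytearray(width * height)
for x in (1, 2, 3):
    grid[cell_index(x, 2, width)] = 1
next_grid = step(grid, width, height)
print([i for i, alive in enumerate(next_grid) if alive])
```

Keeping the grid in one contiguous buffer replaces nested lookups with a single multiply-add per cell, which is the kind of change the \~6× figure above refers to.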

      • Adopting transduce for neighbor counting proved \~15–20% slower than map & reduce, highlighting that idiomatic functions can sometimes underperform.

      "One of the intermediate steps ... was using transduce instead of map & reduce to compute the number of alive neighbor cells, however this ended up actually being \~15–20% slower in this case." (Medium)

      • Effective WebGL programming demands deep knowledge of geometry, linear algebra, the OpenGL state machine, GPU pipelines, and GLSL, making it daunting for newcomers.

      "To anyone interested in directly utilizing the GPU in the browser, WebGL is a huge & fascinating topic, but it can also be very daunting for newcomers to graphics programming, since efficient use of it requires a multitude of prerequisite knowledge and terminology about 2D/3D geometry, linear algebra, spatial thinking in multiple spaces (coordinate systems), low-level data organization, the OpenGL state machine ... GPU processing pipelines, knowledge of the GLSL shading language, color theory etc." (Medium)

      • The thi.ng/geom library employs Clojure maps for semi-declarative OpenGL/WebGL buffer and shader specifications, while preserving explicit control over the GL state machine.

      "The thi.ng/geom library takes a semi-declarative approach to working with OpenGL/WebGL in that it’s extensively using Clojure maps to define various geometry and shader specifications, which are then compiled into the required data buffers & GLSL programs ... but at no point is it hiding the underlying layer, giving advanced users full control over the GL state machine." (Medium)

      • Shadergraph addresses GLSL code reuse by offering transitive dependency resolution, a library of common functions, compile-time minification, and metadata extraction for tooling.

      "To address this in Clojurescript from early on, we can use the thi.ng/shadergraph library, which provides us with: a transitive dependency resolution mechanism for GLSL code ... a growing library of pure, commonly used GLSL functions (lighting, color conversion, matrix math, rotations, effects etc.) ... and a basic compile-time shader minifier ... Clojure meta data extraction of the defined GLSL functions ..." (Medium)

      • A hands-on WebRTC demo showed how to stream a camera feed into Shadertoy-style WebGL shaders for real-time video FX processing.

      "I prepared a small example combining a WebRTC camera stream with Shadertoy-like WebGL image processing using a bunch of effect options." (Medium)

      • True parallelism in the browser comes from WebWorkers—unlike core.async’s simulated concurrency—and relies on isolated modules, message passing, and transferable ArrayBuffers for efficient data sharing.

      "However, the currently only way to obtain real extra compute resources of a multi-core CPU in JavaScript is to use WebWorkers ... WebWorker code needs to be loaded from a separate source file and can only communicate with the main process via message passing. By default, the data passed ... is copied, but some types (e.g. ArrayBuffers) can also be transferred ..." (Medium)

      • Emscripten’s LLVM-based compiler targets asm.js (and soon WebAssembly), enabling C/C++ modules to outperform idiomatic ClojureScript for math-heavy and mutable-data tasks.

      "Emscripten ... a LLVM-based transpiler for C and C++ to asm.js ... the resulting asm.js code almost always performs noticeably faster than the Clojurescript version ... With WebAssembly on the horizon, it’s maybe a good time to invest some time into some “upskilling” ..." (Medium)

      • The workshop’s capstone was a C-based 3D particle system demo, using Emscripten’s JavaScript ArrayBuffer heap and typed arrays to pack 36-byte particle structs tightly and avoid copying overhead.

      "For the final exercise ... we implemented a simple 3D particle system in C, compiled it with Emscripten and learned how to integrate it into a Clojurescript WebGL demo ... The Emscripten runtime emulates the C heap as a single, large JS ArrayBuffer ... Each particle only takes up 36 bytes ... all particles in this array are tightly packed ..." (Medium)

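      To make the "36 bytes, tightly packed" point concrete, here is a small Python sketch using only the standard library. The field layout (position, velocity and colour as three 32-bit floats each) is an assumption chosen because it adds up to 36 bytes; the quote does not spell out the fields. The `bytearray` stands in for the single large ArrayBuffer, and the helper names are hypothetical.

```python
import struct

# Assumed layout: position, velocity and colour, each three 32-bit floats.
# "=" selects native byte order with standard sizes and no padding.
PARTICLE = struct.Struct("=9f")  # 9 floats * 4 bytes = 36 bytes


def make_heap(num_particles: int) -> bytearray:
    """Allocate one contiguous buffer holding all particles back to back."""
    return bytearray(PARTICLE.size * num_particles)


def write_particle(heap: bytearray, i: int, pos, vel, col) -> None:
    """Pack particle i in place at byte offset i * 36."""
    PARTICLE.pack_into(heap, i * PARTICLE.size, *pos, *vel, *col)


def read_particle(heap: bytearray, i: int):
    """Unpack particle i into (pos, vel, col) tuples."""
    v = PARTICLE.unpack_from(heap, i * PARTICLE.size)
    return v[0:3], v[3:6], v[6:9]


def float_view(heap: bytearray) -> memoryview:
    """View the same bytes as 32-bit floats without copying, like a typed array."""
    return memoryview(heap).cast("f")
```

      The `float_view` helper plays the role of a Float32Array over the heap: particle `i` starts at float index `i * 9`, so rendering code can read positions directly from the shared buffer instead of copying them out.
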
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Response to the Reviews

      We thank the reviewers for their input and detailed feedback, which has helped us improve both the manuscript and the Microscopy Nodes software. Based on the comments, we have implemented new features, currently available as version 2.2.1 of Microscopy Nodes. We have edited the text and figures of the manuscript to reflect these changes and add clarification where needed.

      Reviewer #1

      Evidence, reproducibility and clarity

      *The work by Gros et al. presents a paper introducing Microscopy Nodes, a new plugin for Blender 3D visualization software designed to import and visualize multi-dimensional (up to 5D) light and electron microscopy datasets. Given that Blender is not directly suited for such tasks, this plugin significantly simplifies the process, making its visualization engine accessible to a wide range of researchers without prior knowledge of Blender. The plugin supports importing volumes and labels from generic TIF or modern OME-Zarr image formats and includes supplementary video tutorials on YouTube to facilitate basic understanding of the visualization workflows.

      Major comments: - The manuscript suggests that Microscopy Nodes can easily handle large datasets, as evidenced by the showcases. However, in my personal tests, I was unable to import a moderate TIF stack of about 5GB, which is considerably smaller than the showcased datasets. Post-import, a data cube was displayed, but the Blender interface became unresponsive. The manuscript should include a section stating limitations and addressing issues and providing suggestions for visualization of large datasets.*

      We want to thank the reviewer for this valuable comment, which led us to find a core issue in Blender’s large data handling. Specifically, Blender’s rasterized pipeline causes issues with > 4 GiB of data loaded. This issue does not occur in the raytraced (Cycles) renderer, which is why we had not previously encountered it.

      As a workaround, we have extended the reloading workflow of Microscopy Nodes. If the data are larger than 4 gibibytes (GiB) (per timepoint, or per timepoint per channel), Microscopy Nodes now automatically downsamples them during import. While these downsampled versions are recommended for adjusting the visualization settings, the user can still make their animation and then reload the data at the largest scale for the final render with the raytraced (Cycles) renderer. Additionally, we have raised this bug with the core Blender developers and hope to resolve it in the long term (blender/blender#136263).

      We reflect these changes in the manuscript in the segment:

      “Blender currently has a notable limitation that its default ‘quick’ rasterized rendering engines (such as ‘EEVEE’, but also the viewport ‘Surface’ and ‘Wireframe’ modes) do not support more than 4 Gibibytes (GiB) of volumetric data. The raytracing render mode ‘Cycles’, however, can handle large volumetric data. To allow users with large data to flexibly use Microscopy Nodes, we implemented a reloading scheme, where one first loads a smaller version of the data (under 4 GiB per timeframe for all loaded channels combined) - and only upon final render in Cycles, exchange it for the full/larger scale copy (Fig 3A). This downscaling of data offers additional benefits as it allows for fast adjustment of the render settings on e.g. a personal computer which can eventually be transferred to a larger workstation or HPC cluster for the final render at full resolution. This feature is critical as working in Cycles with larger files requires sufficient RAM to fit the (temporary) VDB files comfortably. For example, multiple figures in this manuscript were made on a 32GB RAM M1 Macbook Pro (Fig 1A, Video SV1, Fig 1D, Figure 2A-D, Fig S2A-B), but for larger data or long movies the movies were made on workstations or prepared on a laptop and then transferred to an HPC cluster for final rendering.”
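
      The size check described above can be illustrated with a short calculation. This is a hedged sketch, not the add-on's actual code: the function names are hypothetical, and halving every spatial axis until the data fits is an assumption modelled on typical multiscale pyramids. The 4 GiB limit and the 32-bit float storage are taken from the response.

```python
GIB = 1024 ** 3
LIMIT_BYTES = 4 * GIB   # limit of the rasterized engines, per the response
BYTES_PER_VOXEL = 4     # data is resaved as 32-bit floats


def float32_bytes(shape_zyx, channels=1):
    """Size of one timepoint once converted to 32-bit floats."""
    z, y, x = shape_zyx
    return z * y * x * channels * BYTES_PER_VOXEL


def downscale_factor(shape_zyx, channels=1):
    """Smallest power-of-two factor (assumed scheme) that fits the limit."""
    factor = 1
    while True:
        # Ceiling division keeps at least one voxel per axis.
        scaled = tuple(max(1, -(-n // factor)) for n in shape_zyx)
        if float32_bytes(scaled, channels) <= LIMIT_BYTES:
            return factor
        factor *= 2
```

      For instance, an 8-bit stack of 1250 × 2000 × 2000 voxels is about 5 GB on disk but 20 GB as 32-bit floats, so under these assumptions `downscale_factor((1250, 2000, 2000))` returns 2.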

      * - The feature of importing Zarr-datasets over HTTP is great, but the import process was very slow in my tests, even on a robust network. For reference, loading 1.8 GB of the PRPE1_4x dataset at s1 level took 52 minutes. This raises concerns about potential code issues and general usability of the suggested workflow.*

      We believe that this loading time may have been caused by the same issue that affected all of our datasets of >4GB outside of the raytraced mode, as we have not seen comparable loading times ourselves. Moreover, Microscopy Nodes now supports Zarr versions up to Zarr 3/OME-Zarr 0.5, which allows ‘sharded’ Zarr datasets; these should be even faster when loading large blocks of data at once, as Microscopy Nodes does.

      - The onsite documentation is a bit outdated and fails to fully describe the plugin settings.

      We have updated our documentation with new written tutorials, which include full start-up tutorials as well as some key additional instructions.

      - The YouTube tutorials feature an outdated version of the plugin, which could confuse the general microscopy audience. These should be updated to better align with the current plugin functionality. Additionally, using smaller, easily accessible datasets for these tutorials would improve user testing experiences. Hosting complete (downsampled) demo project folder on platforms like zenodo.org could also enhance usability of such tutorials.

      We have made a new series of YouTube tutorials that align with the current interface of Microscopy Nodes. These tutorials include public datasets, allowing users to follow along easily. We have chosen to also retain the older tutorials for users running legacy versions of the plugin, as they cover different workflows.

      - The manuscript describes a novel dataset used in Fig. 2, but no reference is provided. Additionally, practical implementation of the coloring description for Fig. 2D can be unclear for inexperienced users, necessitating either step-by-step instructions or the provision of downsampled Blender files to aid understanding.

      We have now shared the OME-Zarr address in the text (https://uk1s3.embassy.ebi.ac.uk/idr/share/microscopynodes/FIBSEM_dino_masks.zarr), and included this both in the manuscript and the tutorials. Additionally, to guide the implementation and explain the logic behind the coloring we introduced additional panels in Fig S1 and Fig S2 to showcase the shader setups used for this image.

      [OPTIONAL] When importing labels, they can be assigned to individual materials only if initially split into multiple color channels. It would be great if the same logic is implemented when those materials are provided as indices within a single color channel. There can be a switch to define the logic used during the import process: e.g. the current one, when the objects are just colored based on a color map, or when they are arranged as individual materials as done when labels are imported from multiple color channels.

      We agree with the reviewer. To address this concern, the update to version 2.2 includes a new color-picking system (see Fig 3B, inset 3, and Fig 3C) that allows users to choose between a single color and various continuous or categorical color maps.

      Minor comments: - The manuscript shows nice visualizations of time series, light, and electron microscopy datasets, but in its current state, it is targeted more for light microscopy, where the signal is white. On the other hand, many EM datasets are rendered in inverted contrast (TEM-like), where the signal is black. To render such volume properly, it is needed to go into the Shading tab and flip the color ramp. Would it be possible to perhaps define the data type during import to accommodate various data types or perhaps select the flipped color ramp when the emission mode is switched off? It could make it easier for inexperienced EM users to use the plugin.

      To address this, we have added new default settings, with an ‘invert colormaps on load’ option in the preferences and default colors per channel (see Fig S4). We have also implemented a new color-picking system in version 2.2 (see Fig 3B, inset 3, and Fig 3C) that should make it easier to change colors both before and after loading.

      - It was not completely clear to me whether it is possible to render a single/multiple EM slices using the inverted (TEM-like) contrast. For example, XY, XZ, YZ ortho slices across the volume. The manuscript contains: "This visualization is also supported in Blender, allowing for arbitrary selections of viewing angles (Fig 2B).", but it is not clear how to achieve that.

      We introduced an additional explanation in Fig S1A and added a separate density window in the default shader to make this opaque view easier. To get a single slicing plane, users can reduce the scale of the slicing cube along one axis, as is now also explained in Fig S2B.

      - In 3D microscopy, it is quite common to have data with anisotropic voxels. As a result, the surfaces may require smoothing. I was not able to quickly find a way to smooth the surfaces (at least smooth modifiers for surfaces did not work for me). Is it possible to apply smoothing during the import of labels, or alternatively, smoothing of the generated surfaces can be a topic for an additional YouTube video.

      The smoothness of the loaded masks can be indirectly affected in the preferences by changing the mesh resolution (changing the relative amount of vertices per pixel), but can be further affected by operations such as the Blender “Smooth” or e.g. the “Smooth by Laplacian” modifiers. To guide the users in doing so, we have included instructions for smoothing in the written tutorials on the website https://aafkegros.github.io/MicroscopyNodes/tutorials/surface_smoothing/ .

      - It is also typical to have somewhat custom color maps for materials. It would be great if the plugin remembers the previously used color map for labels.

      We have implemented new Preference settings, which include default colors and colormaps per channel, improving customization and reproducibility. This new option is described in Figure S4.

      * - The pixel size edit box rounds up the values to 2 digits after the dot. Could it be changed to accommodate 3 or 4 digits as the units are um.*

      Blender’s interface truncates the display but stores higher-precision values internally, which become visible when users click or edit the values. We have added support for alternative pixel units to reduce the impact of the truncation.

      - Import is not working when: - Start Blender - Select Data storage: with project - Overwrite files: on, set env: on, chunked: on - Select a file to import - Save Blender file - Pressing the Load button gives an error: "Empty data directory - please save the project first before using With Project saving."

      We thank the reviewer for finding this bug which is now fixed in version 2.2.

      - I was not able to play the downloaded supplementary video 3 using my VLC media player, while it was working fine in a browser. The video can be opened but looks distorted and heavily zoomed in. It may need to be re-saved from a video editor.

      We have recompiled this video.

      - References 12 and 16 are URL links instead of proper references to articles.

      Thanks for catching this mistake in our bibliography. We have corrected this.

      Significance

      *This work effectively bridges a gap in the availability of tools for 3D microscopy dataset visualization. While many visualization programs exist, the high-quality ones are often expensive and thus not accessible to all researchers. The integration of Blender with Microscopy Nodes democratizes access to high-quality 3D visualization, enabling researchers to explore datasets and models from multiple perspectives, potentially leading to new discoveries and enhancing the understanding of key study findings. Despite its limitations, my experience with the plugin was engaging and useful. I would like to thank the authors for such useful work!

      Limitations: - There remains a steep learning curve associated with using Microscopy Nodes, primarily due to Blender's complexity. More comprehensive tutorials could help mitigate this. - The conversion of imported images to Blender's internal 32-bit format results in a 4x increase in data size for 8-bit datasets. - Managing moderate-sized volumes (5-10 GB) can be challenging without clear strategies for effective handling. - The import of Zarr-datasets over the net is notably slow.

      Audience: The plugin is suitable for a broad audience with a basic understanding of 3D visualization concepts, providing a solid foundation for exploring Blender's extensive features and options for optimal visualizations.

      Reviewer expertise: Light microscopy, electron microscopy, image segmentation and analysis, software development, no experience with Blender*

      Reviewer #2

      *Evidence, reproducibility and clarity *

      *Summary:

      The article introduces Microscopy Nodes, a Blender add-on designed to simplify the loading and visualization of 3D microscopy data. It supports TIF and OME-Zarr images, handling datasets with up to five dimensions. The authors present different visualization modes, including volumetric rendering, isosurfaces, and label masks, demonstrating the application in light and electron microscopy. They provide examples using expansion microscopy, electron microscopy, and real-time imaging, highlighting how the tool enhances scientific communication and interactive visualization.

      Comments:

      However, some key aspects could be improved to enhance usability and reproducibility:

      Example datasets: The images used in the YouTube tutorials were not accessible, making it difficult to reproduce the workflows shown in the figures and tutorials. It would be helpful if the authors provided direct links to the datasets or ensured that the same examples used in the tutorials were readily available for replication.*

      We created new and updated tutorials and for all new tutorials, the data is now easily available from an S3 server.

      Input file specifications: The article does not clearly detail how input files should be formatted. Many users will pre-visualize images in Fiji to convert their original images to a compatible format. It would be beneficial to specify which formats are supported for hyperstack creation, including details on bit depth, dimension ordering, label formats, and metadata compatibility, if applicable.

      We have added new documentation on this on the website and in the manuscript. The addon can take 8, 16, and 32 bit data, and any dimension order (with the letters tzcyx) and pixel size. Dimension order and pixel size can be edited in the GUI. This is reflected in the manuscript in the rewritten section in Design and Implementation:

      “It can handle 8bit to 32bit integer and floating point data, although all data types will be resaved into 32bit floating point VDB files, which can cause temporary files to take up more space than the original. Microscopy Nodes loads 2D to 5D files containing data across time, z, y, x and channels, in arbitrary order (can be remapped in the user interface as well, Fig 3B, inset 2). To focus on relevant data, users can clip the time axis, which can be useful for long videos.”
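
      As an illustration of what "arbitrary order" remapping involves, the sketch below computes how a user-supplied axis string maps onto a fixed internal order. It is not taken from the add-on: the function name is hypothetical, and the canonical order `tczyx` is an assumption chosen for the example.

```python
CANONICAL = "tczyx"  # assumed internal order, for illustration only


def axis_permutation(axes: str, shape: tuple[int, ...]):
    """Return (transpose order, full 5D shape) for data labelled with `axes`.

    `axes` uses the letters t, c, z, y, x, one per dimension of `shape`.
    Missing axes get length 1 in the returned 5D shape.
    """
    axes = axes.lower()
    if len(axes) != len(shape):
        raise ValueError("need one axis letter per dimension")
    if len(set(axes)) != len(axes) or not set(axes) <= set(CANONICAL):
        raise ValueError("axes must be unique letters from 'tczyx'")
    # Position in the input of each canonical axis that is present.
    order = [axes.index(a) for a in CANONICAL if a in axes]
    full_shape = tuple(shape[axes.index(a)] if a in axes else 1 for a in CANONICAL)
    return order, full_shape
```

      The returned `order` could be passed to an array library's transpose function (for example `numpy.transpose`), after which reshaping to `full_shape` gives a uniform 5D array regardless of how the file was saved.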

      * Hardware requirements: The article does not discuss RAM or hardware constraints in detail. In testing, attempting to load two images into the same project caused the program to freeze (tested on Mac M1). Specifying hardware requirements and limitations would help users manage expectations when working with large datasets.*

      We have since found a limitation in the Blender engine that indeed limits the amount of data loaded (see also comment by Reviewer 1). Currently, rasterized engines are capped at 4 GiB, and only the raytraced engine can handle larger data. As such, the Microscopy Nodes pipeline, where one works with small images until it is time to render a final version, and the data is only exchanged for the final render, is still viable. To make this easier, we now also included optional downscaling for Tif images. This is described in the rewritten section on Design and Implementation:

      “Blender currently has a notable limitation that its default ‘quick’ rasterized rendering engines (such as ‘EEVEE’, but also the viewport ‘Surface’ and ‘Wireframe’ modes) do not support more than 4 Gibibytes (GiB) of volumetric data. The raytracing render mode ‘Cycles’, however, can handle large volumetric data. To allow users with large data to flexibly use Microscopy Nodes, we implemented a reloading scheme, where one first loads a smaller version of the data (under 4 GiB per timeframe for all loaded channels combined) - and only upon final render in Cycles, exchange it for the full/larger scale copy (Fig 3A). This downscaling of data offers additional benefits as it allows for fast adjustment of the render settings on e.g. a personal computer which can eventually be transferred to a larger workstation or HPC cluster for the final render at full resolution. This feature is critical as working in Cycles with larger files requires sufficient RAM to fit the (temporary) VDB files comfortably. For example, multiple figures in this manuscript were made on a 32GB RAM M1 Macbook Pro (Fig 1A, Video SV1, Fig 1D, Figure 2A-D, Fig S2A-B), but for larger data or long movies the movies were made on workstations or prepared on a laptop and then transferred to an HPC cluster for final rendering.”

      Significance

      *General Assessment:

      One of the major strengths of this work is its seamless compatibility with Blender, a powerful and widely used animation and 3D rendering tool. Integrating advanced visualization techniques from the animation and graphics industry into scientific imaging opens new possibilities for presenting complex microscopy data in an intuitive and accessible way. Additionally, the support for OME-Zarr is particularly valuable, as this format represents a major shift in bioimaging towards scalable, cloud-compatible, and standardized data storage solutions. The adoption of OME-Zarr facilitates large-scale data handling and improves interoperability across imaging platforms, making this integration a significant step forward for the field. Overall, the greatest strength of the tool lies in its flexibility for rendering microscopy data, but its accessibility for users without Blender experience might be a challenge.

      Advance in the Field This work introduces a novel solution to the visualization challenges in microscopy by leveraging Blender's advanced rendering capabilities.

      Audience This paper will be of interest to: Bioimage researchers seeking to enhance their microscopy data visualization. Image analysis tool developers interested in integrating advanced visualization into their workflows.

      Field of Expertise This review is based on expertise in image analysis, segmentation, and 3D biological data visualization.*

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The paper "Microscopy Nodes: Versatile 3D Microscopy Visualization with Blender" presents an easy and accessible approach for microscopists and microscopy users to visualize their data in a different and more controlled way. The authors have developed a plug-in script that enables the integration of complex 3D datasets into Blender, a widely used software for 3D visualization and illustration. By leveraging Blender's advanced rendering engine, the plug-in provides greater control over the scene, environment and presentation of the 3D data.

      I believe that this development, especially when combined with additional analysis tools, can be of great value to microscopists and advanced users in presenting their 3D data sets.

      However, at this stage, the paper does not seem to fully demonstrate the benefits of using Microscopy Nodes. To enhance the paper's impact, it would be helpful for the authors to further emphasize and provide examples of how Blender's rendering specifically improves data presentation and, in turn, enhances the understanding of the data compared to existing solutions. Specifically, the authors claim at the end of the introduction that their development provides powerful tools for high-quality, visually compelling presentations, enabling "more effective communication of 3D biological data." I believe this statement should be supported by a figure comparing currently available visualization methods and demonstrating how using Blender enhances data presentation and thereby the communication of the results. *

      *Additionally, at the end of the first paragraph of the results, the authors say: "These options allow us to combine the data and its analyzed interpretation in the same representation with Microscopy Nodes." However, this capability already exists in currently available software. Aside from now being able to achieve this in Blender, what additional benefits does it offer? *

      We now include a new Table 1 to showcase which requirements for visualizing complex biological data are met by different visualization software, and discuss this in the text:

      “Although several tools for 3D visualization of bioimages already exist and offer essential features for microscopy data (Table 1), many are proprietary, and open-source alternatives often struggle to deliver a comprehensive user experience, such as advanced animation and annotation controls. Proprietary solutions may offer some of these capabilities, but they are frequently limited by licensing costs, platform restrictions, and a lack of customizability. In contrast, Blender is a mature, well-supported open-source platform with a large community of developers that excels in both animation and visualization. By integrating microscopy-specific functionality through Microscopy Nodes, Blender becomes a uniquely powerful solution that bridges the gap between high-end graphics capabilities and the specialized needs of bioimage visualization.”

      Additionally, we attempted to remake Figure 2C and 2D in Amira, the standard software in the EM field, but were not able to. Without an advanced light-scattering algorithm, it is very hard to see depth in the nucleus; the semi-transparent masks do show the masks behind them, but cannot interact with the volume.

      We chose not to include this comparison in the manuscript, as we are not experts in the Amira software and, by the nature of this manuscript, would be presenting a challenge that Blender is especially well suited to, such as the combination of scattered light and semi-transparent masks shown here.

      * In the last sentence of the second paragraph of the results, it is stated: "Blender powered by Microscopy Nodes: the ability to combine microscopy data with any 3D illustration in the same 3D environment." Could you please elaborate on the accuracy of the models that can be built and provide guidelines for achieving this using the data coordinates imported by Microscopy Nodes? If the illustrations are purely freehand and do not require specific accuracy, it would be helpful to clarify the advantages of creating them within the same environment rather than separately, as many scientists currently do. Additionally, if the inclusion of 3D model illustrations is one of the key advantages of using Blender, I believe it would be beneficial to present this in a figure rather than only in the supplementary video. *

      We thank the reviewer for this comment and agree that in the previously submitted version of Microscopy Nodes, it was very difficult to align objects accurately, as the coordinate space was not transparent. A hurdle in this was the fact that Blender only works well with the unit ‘meters’. To address this issue, we now provide a choice of mapping the physical size to meters, as shown in the new interface (See Fig 3B, inset 5). Here the user can choose from the default ‘px -> cm’ (this will always look fine for a quick look) to options such as ‘nm -> m’ or ‘µm -> m’, which, combined with the new choice for adjusting the object origin upon load, allow users to treat the Blender coordinate space as based on the actual physical scales. Additionally, other Blender addons, such as Molecular Nodes (Reference 25 of the manuscript), also allow for accurate localization for cryo-EM datasets.
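
      The unit mappings named above can be read as simple scale factors. The following sketch shows one possible interpretation and is not the add-on's implementation: the function and table names are hypothetical, and the reading of ‘px -> cm’ as "each pixel is 0.01 Blender metres" and of ‘nm -> m’ as "one nanometre is drawn as one Blender metre" are assumptions.

```python
# Length of each physical unit in metres.
UNIT_IN_METERS = {"nm": 1e-9, "µm": 1e-6, "mm": 1e-3}


def blender_voxel_size(pixel_size: float, pixel_unit: str, mapping: str) -> float:
    """Edge length of one voxel in Blender metres under a given mapping.

    mapping examples (labels from the response): "px -> cm", "nm -> m", "µm -> m".
    """
    if mapping == "px -> cm":
        # Physical size is ignored: every pixel becomes one centimetre.
        return 0.01
    source_unit = mapping.split("->")[0].strip()
    # Express the pixel size in the mapping's source unit; that many
    # source units are then drawn as the same number of Blender metres.
    return pixel_size * UNIT_IN_METERS[pixel_unit] / UNIT_IN_METERS[source_unit]
```

      Under this reading, a 0.1 µm pixel becomes 0.1 m with ‘µm -> m’ and 100 m with ‘nm -> m’, so an illustration modelled at a known physical size can be placed in the same coordinate space as the data.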

      We appreciate the note that we should more clearly display the ability to show our illustrations and the data together in the figure and have added a visualization to show this in Figure 1C.

      * Reviewer #3 (Significance (Required)):

      The significance of the paper at this stage is primarily technical and mainly relevant to the field of microscopy.

      My field of expertise is microscopy and 3D visualization of models using mainly Maya3D and AMIRA.*

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The work by Gros et al. presents a paper introducing Microscopy Nodes, a new plugin for Blender 3D visualization software designed to import and visualize multi-dimensional (up to 5D) light and electron microscopy datasets. Given that Blender is not directly suited for such tasks, this plugin significantly simplifies the process, making its visualization engine accessible to a wide range of researchers without prior knowledge of Blender. The plugin supports importing volumes and labels from generic TIF or modern OME-Zarr image formats and includes supplementary video tutorials on YouTube to facilitate basic understanding of the visualization workflows.

      Major comments:

      • The manuscript suggests that Microscopy Nodes can easily handle large datasets, as evidenced by the showcases. However, in my personal tests, I was unable to import a moderate TIF stack of about 5GB, which is considerably smaller than the showcased datasets. Post-import, a data cube was displayed, but the Blender interface became unresponsive. The manuscript should include a section stating limitations and addressing issues and providing suggestions for visualization of large datasets.
      • The feature of importing Zarr-datasets over HTTP is great, but the import process was very slow in my tests, even on a robust network. For reference, loading 1.8 GB of the PRPE1_4x dataset at s1 level took 52 minutes. This raises concerns about potential code issues and general usability of the suggested workflow.
      • The onsite documentation is a bit outdated and fails to fully describe the plugin settings.
      • The YouTube tutorials feature an outdated version of the plugin, which could confuse the general microscopy audience. These should be updated to better align with the current plugin functionality. Additionally, using smaller, easily accessible datasets for these tutorials would improve user testing experiences. Hosting complete (downsampled) demo project folder on platforms like zenodo.org could also enhance usability of such tutorials.
      • The manuscript describes a novel dataset used in Fig. 2, but no reference is provided. Additionally, practical implementation of the coloring description for Fig. 2D can be unclear for inexperienced users, necessitating either step-by-step instructions or the provision of downsampled Blender files to aid understanding.

      [OPTIONAL] When importing labels, they can be assigned to individual materials only if initially split into multiple color channels. It would be great if the same logic is implemented when those materials are provided as indices within a single color channel. There can be a switch to define the logic used during the import process: e.g. the current one, when the objects are just colored based on a color map, or when they are arranged as individual materials as done when labels are imported from multiple color channels.

      Minor comments:

      • The manuscript shows nice visualizations of time series, light, and electron microscopy datasets, but in its current state, it is targeted more for light microscopy, where the signal is white. On the other hand, many EM datasets are rendered in inverted contrast (TEM-like), where the signal is black. To render such volume properly, it is needed to go into the Shading tab and flip the color ramp. Would it be possible to perhaps define the data type during import to accommodate various data types or perhaps select the flipped color ramp when the emission mode is switched off? It could make it easier for inexperienced EM users to use the plugin.
      • It was not completely clear to me whether it is possible to render a single/multiple EM slices using the inverted (TEM-like) contrast. For example, XY, XZ, YZ ortho slices across the volume. The manuscript contains: "This visualization is also supported in Blender, allowing for arbitrary selections of viewing angles (Fig 2B).", but it is not clear how to achieve that.
      • In 3D microscopy, it is quite common to have data with anisotropic voxels. As a result, the surfaces may require smoothing. I was not able to quickly find a way to smooth the surfaces (at least smooth modifiers for surfaces did not work for me). Is it possible to apply smoothing during the import of labels? Alternatively, smoothing of the generated surfaces could be a topic for an additional YouTube video.
      • It is also typical to have somewhat custom color maps for materials. It would be great if the plugin remembers the previously used color map for labels.
      • The pixel size edit box rounds the values to 2 digits after the decimal point. Could it be changed to accommodate 3 or 4 digits, as the units are µm?

      • Import is not working when:

      • Start Blender
      • Select Data storage: with project
      • Overwrite files: on, set env: on, chunked: on
      • Select a file to import
      • Save Blender file
      • Pressing the Load button gives an error: "Empty data directory - please save the project first before using With Project saving."
      • I was not able to play the downloaded supplementary video 3 using my VLC media player, while it was working fine in a browser. The video can be opened but looks distorted and heavily zoomed in. It may need to be re-saved from a video editor.
      • References 12 and 16 are URL links instead of proper references to articles.

      Significance

      This work effectively bridges a gap in the availability of tools for 3D microscopy dataset visualization. While many visualization programs exist, the high-quality ones are often expensive and thus not accessible to all researchers. The integration of Blender with Microscopy Nodes democratizes access to high-quality 3D visualization, enabling researchers to explore datasets and models from multiple perspectives, potentially leading to new discoveries and enhancing the understanding of key study findings. Despite its limitations, my experience with the plugin was engaging and useful. I would like to thank the authors for such useful work!

      Limitations:

      • There remains a steep learning curve associated with using Microscopy Nodes, primarily due to Blender's complexity. More comprehensive tutorials could help mitigate this.
      • The conversion of imported images to Blender's internal 32-bit format results in a 4x increase in data size for 8-bit datasets.
      • Managing moderate-sized volumes (5-10 GB) can be challenging without clear strategies for effective handling.
      • The import of Zarr-datasets over the net is notably slow.
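
      The 4x figure in the second limitation follows directly from the per-voxel storage size. A minimal sketch, assuming a hypothetical 512 x 512 x 100 volume (the shape and the helper name `volume_bytes` are illustrative and not taken from the plugin):

      ```python
      from array import array


      def volume_bytes(n_voxels: int, typecode: str) -> int:
          """Bytes needed to store n_voxels values of the given array typecode."""
          return n_voxels * array(typecode).itemsize


      # Hypothetical volume shape, chosen only for illustration.
      n_voxels = 512 * 512 * 100

      size_u8 = volume_bytes(n_voxels, "B")   # 8-bit unsigned integers
      size_f32 = volume_bytes(n_voxels, "f")  # 32-bit floats
      ratio = size_f32 // size_u8

      print(f"8-bit: {size_u8} bytes, 32-bit: {size_f32} bytes, ratio: {ratio}x")
      ```

      By the same ratio, an 8-bit dataset at the lower end of the 5-10 GB range mentioned above would occupy roughly 20 GB once converted.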

      Audience: The plugin is suitable for a broad audience with a basic understanding of 3D visualization concepts, providing a solid foundation for exploring Blender's extensive features and options for optimal visualizations.

      Reviewer expertise: Light microscopy, electron microscopy, image segmentation and analysis, software development, no experience with Blender

  6. clavis-nxt-user-guide-clavisnxt-erste-dev.apps.okd.dorsum.intra clavis-nxt-user-guide-clavisnxt-erste-dev.apps.okd.dorsum.intra
    1. Bevezet

      I would add the following below this. SSI stands for Standard Settlement Instruction. The term means that capital-market securities transactions between settlement counterparties must be carried out according to a defined standard settlement procedure, on the so-called place accounts that institutions providing investment and custody services hold at the clearing house of the given capital market. This SSI data must make the settlement counterparties and the place of settlement identifiable, which is typically done with a BIC code or with a so-called proprietary code (an own identifier). Below we discuss how to record this SSI data: for example, how to record a BIC code for settlement counterparties, how to record a BIC code as a place of settlement, and how to link an account opened at a place of settlement to a given client.

    2. Add new SSI PSET

      Supplementing the line with further explanation: "Using this function button, I can set an already recorded BIC code, e.g. KELRHUHBXXX, as the PSET = place of settlement."

    1. You will surely have noticed it in the HTML code: it is a little confusing, but it is advisable to write the tags in the following order:

      I don't see any change when the order is reversed. Why is this order required?

    1. Visual Studio Code

      To get the French version of Visual Studio Code: - Click on "Extensions" - Type "French" in the search bar - Choose the first result (or otherwise check whether it is something "official") - Download … and it's ready!

    1. find out that I didn't have the whole picture, the problem was messier than it first appeared, and there were perfectly valid reasons for the code being that way

      I've tried using a hiking metaphor to describe a similar phenomenon (specifically, and perversely, as a preface when trying to explain second panel syndrome).

    2. I'd be looking at some code at my desk, and it made no sense. Why would anyone write it like this? There's an obvious and cleaner way to approach the same problem. So I'd go down the hall to the person who wrote it in the first place and start asking questions...and find out that I didn't have the whole picture, the problem was messier than it first appeared, and there were perfectly valid reasons for the code being that way. This happened again and again. Sometimes I did find a real flaw, but even then it may have only occurred with data that wasn't actually possible (because, for example, it was filtered by another part of the system). Talking face to face changed everything, because they could draw diagrams, pull out specs, and give concrete examples. I think that initial knee-jerk "I've been looking at this for ten seconds and now let me explain the critical flaws" reaction is a common one among people with engineering mindsets.

      Humility!

    1. Reviewer #2 (Public review):

      Summary:

      Yamashiro et al. investigated how the transient absence of visual input (i.e., darkness) impacts tactile neural encoding in the rat primary somatosensory cortex (S1). They recorded local field potentials (LFPs) using a 32-channel array implanted in forelimb and hindlimb primary somatosensory cortex while rats walked on smooth or rough textures under illuminated and dark conditions. Employing a convolutional neural network (CNN), they successfully decoded both texture and lighting conditions from the LFPs. The authors conclude that the subtle differences in LFP patterns underlie tactile representation of surface roughness and become more distinct in darkness, suggesting a rapid cross-modal reorganization of the neural code for this sensory feature.

      Strengths:

      (1) The manuscript addresses a valuable question regarding how sensory cortices adapt dynamically to changes in sensory context.

      (2) Utilization of machine learning (CNNs) allowed the authors to go beyond conventional amplitude-based analyses, potentially uncovering a subtle but interesting phenomenon.

      Weaknesses:

      (1) Despite applying explainability techniques to the CNN-based decoder, the study does not clearly demonstrate the precise "subtle, high-dimensional patterns" exploited by the CNN for surface roughness decoding, limiting the physiological interpretability of the results. Additional analyses (e.g., detailed waveform morphology analysis on grand averages, time-frequency decompositions, or further use of explainability methods) are necessary to clarify the exact nature of the discriminative activity features enabling the CNN to decode surface roughness and how these change with the sensory context (i.e., in light or darkness).

      (2) The claim regarding cross-modal representation reorganization heavily relies on a silhouette analysis (Figure 5C), which shows a modest effect size and borderline statistical significance (p≈0.05 with n=9+2). More rigorous statistical quantification, such as permutation tests and reporting underlying cluster distances for all animals, would strengthen confidence in this finding.

      (3) While the authors recorded in the somatosensory cortex, primarily known for its tactile responsivity, I would be cautious not to rule out a priori the presence of crossmodal (visual) responses in the area. In this case, the stronger texture separation in darkness might be explained by the absence of some visually-evoked potentials (VEPs) rather than genuine cross-modal reorganization. Clarification is needed to rule out visual interference and this would strengthen the claim.

      (4) Behavioural controls are limited to gross gait parameters; more detailed analyses of locomotor behavior and additional metrics (e.g., pupil size or locomotor variance) would robustly rule out potential arousal or motor confounds.

      (5) The consistent ordering of trials (10 minutes of light then 10 minutes of dark) could introduce confounds such as fatigue or satiation (and also related arousal state), which should be controlled by analyzing sessions with reversed condition ordering.

      (6) The focus on forelimb-aligned LFP analyses raises the possibility that hindlimb-aligned data might yield different conclusions, suggesting alignment effects might bias the results.

      (7) The authors' dismissal of amplitude-based metrics as ineffective is inadequately substantiated. A clearer demonstration (e.g., event-related waveforms averaged by conditions, presented both spatially and temporally) would support this claim.

      (8) Wording ambiguity regarding "attribution score" versus "activation amplitude" (Figure 5) complicates the interpretation of key findings. This distinction must be clarified for proper assessment of the results.

      (9) Generalization across animals remains unaddressed. The current within-subject decoding setup limits conclusions regarding shared neural representations across individuals. Adopting cross-validation strategies and exploring between-animal analyses would add significant value to the manuscript.
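
      The permutation test suggested in weakness (2) could be sketched as an exact paired sign-flip test on per-animal silhouette scores. This is only an illustration under stated assumptions: the function name `paired_sign_flip_test` and the three pairs of scores are hypothetical placeholders, not values from the manuscript, and the authors' actual analysis may differ.

      ```python
      from itertools import product


      def paired_sign_flip_test(light, dark):
          """Exact two-sided sign-flip permutation test on paired differences.

          Under the null hypothesis, the sign of each per-animal difference
          (dark - light) is arbitrary, so every sign pattern is equally likely.
          Returns the observed mean difference and the exact p-value.
          """
          diffs = [d - l for l, d in zip(light, dark)]
          n = len(diffs)
          observed = sum(diffs) / n
          extreme = 0
          total = 0
          for signs in product((1, -1), repeat=n):
              stat = sum(s * d for s, d in zip(signs, diffs)) / n
              # Small tolerance guards against floating-point ties.
              if abs(stat) >= abs(observed) - 1e-12:
                  extreme += 1
              total += 1
          return observed, extreme / total


      # Hypothetical silhouette scores for three animals (illustration only).
      light_scores = [0.10, 0.12, 0.08]
      dark_scores = [0.15, 0.14, 0.13]

      mean_diff, p_value = paired_sign_flip_test(light_scores, dark_scores)
      print(f"mean difference = {mean_diff:.3f}, exact p = {p_value:.3f}")
      ```

      With 11 animals the full enumeration covers only 2^11 = 2048 sign patterns, so an exact test is cheap and avoids distributional assumptions.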

    1. Reviewer #2 (Public review):

      Summary:

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice.

      Strengths:

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high-throughput procedure (without the need of human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach, as mice will develop odd strategies when given complete freedom.

      Weaknesses:

      A limitation to this approach is that it requires mice to be individually housed for days to months. This is now adequately addressed in the discussion.

      A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). The authors now provide information regarding task engagement of the mice across a 24 hour cycle (e.g., trials started, trials finished across a 24 h period).

      Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not. The new videos adequately address these concerns.

      The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of impact and significance of this paper would go down significantly. This information is now available to readers.

      Minor concerns

      Learning rate is confusing for Figure 3 results as it actually refers to trials to reach criterion, and not the actual rate of learning (e.g., slope). This has been modified in the manuscript.

      Comments on revisions:

      The authors have addressed all my concerns regarding this very exciting manuscript.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory. 

      Strengths: 

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching. 

      Weaknesses: 

      I find no major problems with this report. 

      Minor weaknesses: 

      (1)  Line 219: Water consumption per day remained the same, but the number of trials triggered was higher as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail on how more trials can be triggered but no more water is consumed despite training? 

      Thanks for the comment. We would like to clarify the phenomenon described in Line 219: As the training advanced, the number of trials triggered by mice per day gradually decreased (rather than increased, as you mentioned in the comment) for both manual and autonomous groups of mice (Fig. 2H left). The performance, as you mentioned, improved over time (Fig. 2D and 2E), leading to an increased probability of obtaining water and thus relatively stable daily water intake (Fig. 2H middle). We believe the stable daily intake is the minimum amount of water required by the mice under the conditions of autonomous behavioral training. To make the statement clearer, we have indicated the corresponding figure numbers in the text.

      Results “… As shown in Fig. 2H, autonomous training yielded a significantly higher number of trials/day (980 ± 25 vs. 611 ± 26, Fig. 2H left) and a larger volume of water consumption/day (1.65 ± 0.06 vs. 0.97 ± 0.03 ml, Fig. 2H middle), which resulted in a monotonic increase of body weight that was even comparable to the free water group (Fig. 2H right). In contrast, the body weight in the manual training group experienced a sharp drop at the beginning of training and was constantly lower than that of the autonomous group throughout the training stage (Fig. 2H right).”
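
      To relate the two quantities the reviewer asks about, the group means quoted above can be converted into an average water volume per triggered trial. This is a rough sketch: it divides one reported mean by another, ignores the reported ± errors, and averages over all triggered trials (rewarded or not), so it is not the reward size per correct trial. The helper name `water_per_trial_ul` is illustrative.

      ```python
      def water_per_trial_ul(water_ml_per_day: float, trials_per_day: float) -> float:
          """Average water per triggered trial, in microliters."""
          return water_ml_per_day * 1000.0 / trials_per_day


      # Group means quoted in the Results excerpt above (Fig. 2H).
      autonomous = water_per_trial_ul(1.65, 980)
      manual = water_per_trial_ul(0.97, 611)

      print(f"autonomous: {autonomous:.2f} ul/trial, manual: {manual:.2f} ul/trial")
      ```

      Daily intake is the product of trials per day and water per trial, which is why intake can stay stable when the trial count falls while the proportion of rewarded trials rises.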

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      Thanks for the suggestion. The labels with corresponding colors for x-axis have been added for Fig. 2J.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      Thanks for the suggestion. The legend has been added for Fig. 2K.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better. 

      We thank the reviewer for the valuable suggestion. The phrase has been changed to ‘predicted by’ for better suitability.

      Figure S2 “(D), percentage of trials significantly predicted by different regressors during task learning. …”

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice. 

      Strengths: 

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom. 

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth. 

      Thank you for raising this important point. We agree that the requirement for individual housing of mice during the training period is a limitation of our approach, and we appreciate the opportunity to discuss this in more depth. In the revised manuscript, we have added a section to the Discussion to address this limitation, including the potential impact of individual housing on the mice, the rationale for individual housing in our study, and efforts or alternatives to mitigate the effects of individual housing.

      Discussion “… Firstly, our experiments were confined to single-housed mice, and single housing is known to influence murine behavior and physiology, potentially affecting social interaction and stress levels [76]. In our study, individual housing was necessary to ensure precise behavioral tracking, eliminate competitive interactions during task performance, and maintain consistent training schedules without disruptions from cage-mate disturbances. However, the potential of group-housed training has been explored with technologies such as RFID [28,29,32–34] to distinguish individual mice, potentially improving training efficiency and facilitating research on social behaviors [77]. Notably, it has been shown that simultaneous training of group-housed mice, without individual differentiation, can still achieve criterion performance [25].”

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals. 

      Thank you for your insightful comment regarding the variability in inter-trial intervals and its potential impact on data analysis. We agree that this is an important consideration for continuous self-paced tasks.

      In our original manuscript, we showed the general task engagement across the 24-hour cycle (Fig. 2K), which revealed two peaks of engagement during the dark cycle and relatively fewer trials during the light cycle. To facilitate analyses requiring consistent trial durations, we defined trial blocks as sequences between two no-response trials. Notably, approximately 66.6% of trials occurred within blocks of >5 consecutive trials (Fig. 2L), which may be particularly suitable for such analyses.

      In the revised manuscript, we also added the analysis of the histogram of inter-trial-interval for both the autonomous and manual training paradigms in HABITS (Fig. S2H), which shows that around 55.2% and 77.5% of the intervals are less than 2 seconds in autonomous and manual training, respectively.

      Results “… We found that more than two-thirds of the trials were done in >5-trial blocks (Fig. 2L left), which resulted in more than 55% of the trials having an inter-trial interval of less than 2 seconds (Fig. S2H).”
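
      The block definition described above (sequences of trials bounded by no-response trials) could be computed along the following lines. This is a minimal sketch, not the authors' code: the outcome labels, the function names, and the choices to treat session start and end as block boundaries and to exclude no-response trials from the denominator are all assumptions.

      ```python
      def block_lengths(outcomes, no_response="no_response"):
          """Lengths of consecutive runs of responded trials.

          A run ends at each no-response trial; the start and end of the
          session are also treated as boundaries (an assumption).
          """
          lengths = []
          current = 0
          for outcome in outcomes:
              if outcome == no_response:
                  if current:
                      lengths.append(current)
                  current = 0
              else:
                  current += 1
          if current:
              lengths.append(current)
          return lengths


      def fraction_in_long_blocks(outcomes, min_len=6):
          """Fraction of responded trials that fall in blocks of at least min_len trials."""
          lengths = block_lengths(outcomes)
          total = sum(lengths)
          if total == 0:
              return 0.0
          return sum(n for n in lengths if n >= min_len) / total


      # Hypothetical session: blocks of 7, 2, and 6 responded trials.
      session = (
          ["correct"] * 7
          + ["no_response"]
          + ["error", "correct"]
          + ["no_response"]
          + ["correct"] * 6
      )

      print(block_lengths(session))
      print(fraction_in_long_blocks(session))  # share of trials in blocks of >5 trials
      ```

      Restricting downstream analyses such as reinforcement learning fits to the long blocks is one way to obtain trials with comparable inter-trial intervals.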

      Regarding approaches to mitigate the issue of varying inter-trial intervals, we observed that manual training (i.e., manually transferring mice to HABITS for ~2 hr/day) in Fig. S2H resulted in more trials with short inter-trial intervals, suggesting that constrained access time promotes task engagement and reduces interval variability. Fig. 2L also indicated that the average correct rate increased and the early-lick rate decreased as the length of the block increased. Constraining access time could therefore be valuable for studies where consistent trial timing is critical. In the context of our study, we could introduce a light, for example, to serve as a cue that prompts the animals to engage during a fixed time window each day.

      Discussion “… In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial-intervals, which could be problematic for data analysis relying on consistent intervals and/or engagements. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not. 

      Thanks for the reminder. We have added subtitles to both of the videos. Since Supplementary Video 1 was not recorded with sound, the correctness of the trials was hard to judge. We replaced the video with another one with clear sound recordings, and the subtitles now provide detailed commentary.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly. 

      Thanks for this important comment. We would like to clarify that the construction methods, GUI, code for our system, and PCB and CAD files (newly uploaded) have already been made publicly available on https://github.com/Yaoyao-Hao/HABITS. Additionally, we have open-sourced all the code and raw data for all training protocols (https://doi.org/10.6084/m9.figshare.27192897). We will continue to maintain these resources in the future.

      Minor concerns: 

      (5) Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

      Thanks for pointing this out. The term ‘learning rate’, which referred to the number of trials to reach criterion, has been changed to ‘the number of trials to reach criterion’.

      Reviewer #3 (Public review): 

      Summary: 

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task. 

      Strengths: 

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning. 

      Weaknesses: 

      (1) Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength. 

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome. 

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Thank you for your insightful comments and for highlighting the importance of considering both within-subject and between-subject questions in cognitive training and testing in rodent models. We acknowledge that our study primarily focused on highly controlled within-subject questions. However, the datasets we provided did show preliminary evidence for the ‘between-subject’ questions. Key observations include:

      The large variability in learning rates among mice observed in Fig. 2I;

      The overall learning rate difference between male and female subjects (Fig. 2D vs. Fig. S2G);

      The varying nocturnal behavioral patterns (Fig. 2K), etc.

      We recognize the value of exploring between-subject differences in mouse models and have discussed this in more detail in the Discussion.

      Discussion “Our study was designed to standardize behavior for the precise interrogation of neural mechanisms, specifically addressing within-subject questions. However, investigators are often interested in between-subject differences—such as sex differences or genetic variants—which can have long-term behavioral and cognitive implications [72,74]. This is particularly relevant in mouse models due to their genetic tractability [75]. Although our primary focus was not on between-subject differences, the dataset we generated provides preliminary evidence for such investigations. Several behavioral readouts revealed individual variability among mice, including large disparities in learning rates across individuals (Fig. 2I), differences in overall learning rates between male and female subjects (Fig. 2D vs. Fig. S2G), variations in nocturnal behavioral patterns (Fig. 2K), etc.”

      (2) Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      Thank you for the insightful comments. We acknowledge that the extensive training experience, particularly through the algorithmic machine teaching approach, could potentially influence the ability to observe cognitive differences between groups of mice with relevant genetic variants. However, our study design and findings suggest that this approach can still provide valuable insights into individual differences and strategies used by the animals during training. First, the behavioral readouts mentioned above (including learning rate, engagement pattern, etc.) can reveal a number of differences among mice. Second, detailed modelling analysis (with logistic regression) can further dissect the strategies that mice use along the training process (Fig. S2B). We have highlighted some variables selected by the regression that are associated with individual strategies in performing the tasks (Fig. S2C), and these strategies could differ between manual and autonomous training groups (Fig. S2D). We included these comments in the Discussion for further clarity.

      Discussion “… Furthermore, a detailed logistic regression analysis dissected the strategies mice employed during training (Fig. S2B). Notably, the regression identified variables associated with individual task-performance strategies (Fig. S2C), which also differed between manually and autonomously trained groups (Fig. S2D). Thus, our system could facilitate high-throughput behavioral studies exploring between-subject differences in the future.”

      (3) A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

      Thank you for your insightful comments. We agree that the finding that manual training led to significantly faster learning compared to self-paced training is both intriguing and important. One possible reason is the limited duration of engagement imposed by the experimenter in manual training, which forced the mice to concentrate more on the trials (and thus omit fewer trials) than in autonomous training. Your suggestion that experimenter interactions might activate an "occasion setting" process is particularly interesting. In the context of our study, we could introduce, for example, a light that serves as a cue prompting the animals to engage; when the light is off, the task would no longer be accessible to the mice, simulating the manual training situation. We agree that this could be an interesting topic for future investigation that might create a more conducive environment for learning, thereby accelerating the learning rate.

      Discussion “… Lastly, while HABITS achieves criterion performance in a similar or even smaller number of days compared to manual training, it requires more trials to reach the same learning criterion (Fig. 2G). We hypothesize that this difference in trial efficiency may stem from the constrained engagement duration imposed by the experimenter in manual training, which could compel mice to focus more intensely on task execution, resulting in fewer trial omissions (Fig. 2F). In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial intervals, which could be problematic for data analyses relying on consistent intervals and/or engagement. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      Reviewer #2 (Recommendations for the authors):

      As I mentioned in the weaknesses, I did not see code or CAD drawings for their home cages and how these interact with a computer.

      Thanks for the comment. We would like to clarify that the construction methods, GUI, code for our system, PCB and CAD files (newly uploaded) have already been made publicly available on https://github.com/Yaoyao-Hao/HABITS.

    1. From NavBar Mess to React Router Success! Fixing Layouts, Routing & AIOHA in One Go 🚀

      This post transforms a chaotic navigation bar into a functional component using React Router, integrates AIOHA for user authentication, and creates a reusable layout to streamline React app development. The update emphasizes improvements in user experience and code organization as foundational steps for the ongoing project.

    1. Level Up Your Imports: Using Path Aliases for Cleaner React + TypeScript Projects 🚀

      A guide to using path aliases in React and TypeScript projects to simplify and clean up import statements, making code easier to maintain. It walks through a step-by-step process for setting up path aliases, moving from cumbersome relative imports to more readable ones.

    1. Day 6 🚀 - Learning ReactJS: Calling APIs & Creating Context Providers | Distriator Feature Update

      This post covers learning ReactJS to call APIs and implement Context Providers for managing business data. It introduces a new feature in the Distriator project that displays the Hive Power of businesses, and emphasizes the importance of organized code structure and TypeScript for predictable data handling.

    1. Abstract: Uncovering the epigenomic regulation of immune responses is essential for a comprehensive understanding of host defence mechanisms, though remains poorly investigated in farmed fish. We report the first annotation of the innate immune regulatory response in the turbot genome (Scophthalmus maximus), integrating RNA-Seq with ATAC-Seq and ChIP-Seq (H3K4me3, H3K27ac and H3K27me3) data from head kidney (in vivo) and primary leukocyte cultures (in vitro) 24 hours post-stimulation with viral (poly I:C) and bacterial (inactive Vibrio anguillarum) mimics. Among the 8,797 differentially expressed genes (DEGs), we observed enrichment of transcriptional activation pathways in response to Vibrio and immune pathways - including interferon stimulated genes - for poly I:C. We identified notable differences in chromatin accessibility (20,617 in vitro, 59,892 in vivo) and H3K4me3-bound regions (11,454 in vitro, 10,275 in vivo) between stimulations and controls. Overlap of DEGs with promoters showing differential accessibility or histone mark binding revealed significant coupling of the transcriptome and chromatin state. DEGs with activation marks in their promoters were enriched for similar functions to the global DEG set, but not always, suggesting key regulatory genes being in poised state. Active promoters and putative enhancers were enriched in specific transcription factor binding motifs, many common to viral and bacterial responses. Finally, an in-depth analysis of immune response changes in chromatin state surrounding key DEGs encoding transcription factors was performed. This multi-omics investigation provides an improved understanding of the epigenomic basis for the turbot immune responses and provides novel functional genomic information, leverageable for disease resistance selective breeding.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf077), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer name: Laura Caquelin

      1. Summary of the Study This study provides the first multi-omics investigation of the innate immune response in turbot (Scophthalmus maximus). By integrating RNA-Seq, ATAC-Seq, and ChIP-Seq data, researchers identified changes in gene expression, chromatin accessibility, and histone modifications after viral and bacterial stimulation. The findings reveal a significant coupling between the transcriptome and chromatin state, offering insights for the selection of disease resistance in aquaculture.

      2. Scope of reproducibility

      According to our assessment the primary objective is: Association of ATAC-Seq and ChIP-Seq data with RNA-Seq data

      ● Outcome: Overlap of promoter DARs and DHMRs with DEG promoters ● Analysis method outcome: Hypergeometric test ● Main result: "DARs and DHMRs were much more overrepresented at the promoter regions of upregulated rather than downregulated DEGs" (Table 4, Supplementary Table 11; Lines 403-405, Page 9)

      3. Availability of Materials
      a. Data ● Data availability: Raw data are available, but generated data from the study are shared with the journal and not yet publicly available ● Data completeness: Complete ● Access Method: Manuscript's supplementary files/Private journal dropbox ● Repository: - ● Data quality: Structured, but lacks variable definitions in supplementary files, making it difficult to interpret and use.
      b. Code ● Code availability: Not available for the primary result ● Programming Language(s): Excel ● Repository link: - ● License: - ● Repository status: - ● Documentation: README lacks information on hypergeometric test.

      4. Computational environment of reproduction analysis

      ● Operating system for reproduction: MacOS 14.7.4 ● Programming Language(s): Excel ● Code implementation approach: Excel formulas based on methodology description provided by authors ● Version environment for reproduction: Excel version 16.94

      5. Results

      5.1 Original study results ● Results 1: Table 4 and supplementary table 11

      5.3 Steps for reproduction

       Reproduce supplementary table 11 to perform hypergeometric test
      * Issue 1: No code or instructions for constructing Table 4 in manuscript and README text.
        ▪ Resolved: Authors shared methodology upon request
        Authors' Clarification: "The hypergeometric test wasn't carried out with any particular script but with the following public online tool, that can be replicated in excel: https://systems.crump.ucla.edu/hypergeometric/ The tool basically runs the following excel formulas:
        Cumulative distribution function (CDF) of the hypergeometric distribution in Excel
        =IF(k>=expected,1-HYPGEOM.DIST(k-1,s,M,N,TRUE),HYPGEOM.DIST(k,s,M,N,TRUE))
        =IF(k>=((s*M)/N),1-HYPGEOM.DIST(k-1,s,M,N,TRUE),HYPGEOM.DIST(k,s,M,N,TRUE))
        expected = (s*M)/N
        direction =IF(k=expected,"match",IF(k<expected,"de-enriched","enriched"))
        fold change =IF(k<expected,expected/k,k/expected)

      where k is the number of successes (intersection of DAR/DHMR in promoters + DEG), s the sample size (DEG), M the number of successes in the population (DAR/DHMR in promoters) and N the population size (28.602 genes). For each condition, the count of downregulated and upregulated DEG (s) was taken from supplementary table 4. Similarly, the count of downregulated and upregulated DAR/DHMR (M) was taken from supplementary table 10, considering only differential peaks that are annotated as "promoter-TSS" in the annotation column (column M). The population size (N) was the total list of genes that were DEG, DAR or DHMR (combining the data on supplementary tables 4 and 11, eliminating duplicates). Finally, the intersection of DAR and DEG (k) for each condition was retrieved with the following venn diagram online tool: https://bioinformatics.psb.ugent.be/webtools/Venn/"
      * Issue 2: Discrepancies in DEG counts from supplementary table 11
        ▪ Resolved: Investigated variable definitions (the wrong variable, strand, had been used) and confirmed that log2FoldChange determines up/down-regulation
      * Issue 3: Filling in DAR/DHMR values
        ▪ Unresolved: Unclear correspondence between "promoters" rows and excel file sheets. Does H3K27me3 correspond to the promoters?
      * Issue 4: Using the Venn diagram tool to find intersections
        ▪ Unresolved: Worked for one condition (ATAC vivo poly (down)) but failed for ATAC vitro-vibrio and ATAC-vivo-vibrio. Tool returns a "Request Entity Too Large" error.
      * Issue 5: Define the population size
        ▪ Unresolved: The instructions for defining the population size are not clear. In supplementary table 4, it seems that the variable "Gene ID (ENSEMBL)" should be used, but in supplementary table 10, should the variable "Nearest PromoterID" or "Gene symbol" be used?

       Using supplementary table 11 values to perform hypergeometric test
      Having failed to obtain the values required to reproduce supplementary table 11, the data already provided were used to obtain the "enrichment" and "p-value" values using the excel function provided.
      * Issue 1: Comparison of p-values
        ▪ Resolved: For the Up condition, extremely small p-values are not displayed correctly due to Excel's limitations in scientific notation. Excel may either display them as zero or in an incomplete scientific format (e.g., 0.00E+00). The web tool was used for these cases.

      5.4 Statistical comparison Original vs Reproduced results ● Results: Based on the available data in supplementary table 11, the "enrichment" and "p-value" values have been successfully reproduced in most cases. ● Comments: The full table could not be reproduced, particularly the data corresponding to DAR/DHMR, DAR/DHMR+DEG and population size values, due to missing information or unclear definitions in the supplementary files. ● Errors detected: The enrichment value for the Up condition of promoters-vitro-vibrio was incorrectly reported in the manuscript/table. Based on the Excel formula and the online tool used, the correct value appears to be 2.28 instead of 2.82. ● Statistical Consistency: All the values that could be reproduced from the available data matched the original results, except for the detected error.
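
      The Excel formulas quoted in the steps for reproduction can also be sketched in a few lines of Python, which avoids the display problem with very small p-values. This is a minimal illustration, not the authors' or the reviewer's code: the function names are hypothetical, the upper tail is summed directly instead of computing 1 - CDF(k-1) (mathematically the same quantity, but without the cancellation error), and the handling of k = 0 and of a zero expected count is an added assumption, since the Excel formula would return an error there.

```python
from math import comb


def upper_tail(k, s, M, N):
    """P(X >= k); same quantity as 1 - HYPGEOM.DIST(k-1, s, M, N, TRUE)."""
    lo = max(k, 0, s - (N - M))
    hi = min(s, M)
    num = sum(comb(M, i) * comb(N - M, s - i) for i in range(lo, hi + 1))
    return num / comb(N, s)


def lower_tail(k, s, M, N):
    """P(X <= k); same quantity as HYPGEOM.DIST(k, s, M, N, TRUE)."""
    lo = max(0, s - (N - M))
    hi = min(k, s, M)
    num = sum(comb(M, i) * comb(N - M, s - i) for i in range(lo, hi + 1))
    return num / comb(N, s)


def enrichment(k, s, M, N):
    """k: overlap, s: sample size, M: successes in population, N: population size."""
    expected = s * M / N
    # Same branching as the quoted formula: upper tail when k >= expected.
    p_value = upper_tail(k, s, M, N) if k >= expected else lower_tail(k, s, M, N)
    if k == expected:
        direction = "match"
    elif k < expected:
        direction = "de-enriched"
    else:
        direction = "enriched"
    if k < expected:
        # Assumption: report infinity where Excel would fail with a division by zero.
        fold_change = expected / k if k > 0 else float("inf")
    else:
        # Assumption: report NaN when the expected count itself is zero.
        fold_change = k / expected if expected > 0 else float("nan")
    return {
        "expected": expected,
        "p_value": p_value,
        "direction": direction,
        "fold_change": fold_change,
    }


# Toy numbers for illustration only; they are not taken from the study.
print(enrichment(k=2, s=3, M=4, N=10))
```

      Because the sums are carried out on exact integers and divided only once, the p-value stays accurate until it falls below the smallest representable floating-point number.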

      6. Conclusion

      Summary of the computational reproducibility review: The study's results were partially reproduced. Key values such as enrichment and p-values were successfully replicated, but some dataset elements (DAR/DHMR, DAR/DHMR+DEG, and population size) could not be verified due to insufficient methodological details provided in the manuscript. An error in the enrichment value for the Up condition of promoters-vitro-vibrio was identified (2.28 instead of 2.82). The p-values used for statistical inference were, however, successfully reproduced.

      7. Recommendations for authors o Improve data documentation: Define variables in supplementary files. o Provide all code and scripts: Share the excel formulas used for table 4/supplementary table 11. o Clarify statistical methodology: Include detailed methods description for the hypergeometric test. o Enhance reproducibility workflow: Provide a structured README with all necessary steps.

  7. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Content or information on social media that may hurt your chance of being hired includes: Inappropriate or provocative pictures, videos, or comments. Evidence of drinking or using recreational drugs. Discriminatory comments. Negative or overly critical comments about previous employers or co-workers. Evidence of sub-par communication skills.

      This list would make a great discussion starter in a high school English class, especially when we talk about ethos and credibility. It’s a real-world example of how communication (tone, content, even grammar) can affect how others perceive you. I could see using this list to help students evaluate sample social media posts and consider how their own digital presence contributes to or undermines the persona they want to build. It also offers a bridge to teach professionalism and code-switching across contexts.

  8. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
  9. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. And that’s my exact argument, that we all should know everybody’s dialect, at least as many as we can, and be open to the mix of them in oral and written communication (Young)

      Young argues that we should respect others' backgrounds, and he questions code-switching. Educational institutions support a single academic standard because of its simplicity.

  10. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Created LLM extension tools to help translate complex internal wikipedia pages to hyperlinked code snippets to help internal customers use the project at low-level logic, increasing efficiency by 300%.

      Provide context on what 'efficiency' means here. What specific tasks were made easier or faster?

    2. Automated robust CI/CD by building custom pipelines to unit, load, and integration test the code with 100% code coverage, enhancing safety in deployment into production waves.

      Specify how this automation improved deployment frequency or reduced errors in production.

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript uses single-molecule run-off experiments and TASEP/HMM models to estimate biophysical parameters, i.e., ribosomal initiation and elongation rates. Combining inferred initiation and elongation rates, the authors quantify ribosomal density. TASEP modeling was used to simulate the mechanistic dynamics of ribosomal translation, and the HMM is used to link ribosomal dynamics to microscope intensity measurements. The authors' main conclusions and findings are:

      (1) Ribosomal elongation rates and initiation rates are strongly coordinated.

      (2) Elongation rates were estimated between 1-4.5 aa/sec. Initiation rates were estimated between 0.5-2.5 events/min. These values agree with previously reported values.

      (3) Ribosomal density was determined below 12% for all constructs and conditions.

      (4) eIF5A-perturbations (KO and GC7 inhibition) resulted in non-significant changes in translational bursting and ribosome density.

      (5) eIF5A perturbations resulted in increases in elongation and decreases in initiation rates.

      Strengths:

      This manuscript presents an interesting scientific hypothesis to study ribosome initiation and elongation concurrently. This topic is highly relevant for the field. The manuscript presents a novel quantitative methodology to estimate ribosomal initiation rates from Harringtonine run-off assays. This is relevant because run-off assays have been used to estimate, exclusively, elongation rates.

      Weaknesses:

      The conclusion of the strong coordination between initiation and elongation rates is interesting, but some results are unexpected, and further experimental validation is needed to ensure this coordination is valid.

      (1) eIF5A perturbations resulted in a non-significant effect on the fraction of translating mRNA, translation duration, and bursting periods. Given the central role of eIF5A, I would have expected a different outcome. I would recommend that the authors expand the discussion and review more literature to justify these findings.

      (2) The AAG construct leading to slow elongation is very surprising. It is the opposite of the field consensus, where codon-optimized gene sequences are expected to elongate faster. More information about each construct should be provided. I would recommend more bioinformatic analysis on this, for example, calculating CAI for all constructs, or predicting the structures of the proteins.

      (3) The authors should consider using their methodology to study the effects of modifying the 5'UTR, resulting in changes in initiation rate and bursting, such as previously shown in reference Livingston et al., 2023. This may be outside of the scope of this project, but the authors could add this as a future direction and discuss if this may corroborate their conclusions.

      (4) The mathematical model and parameter inference routines are central to the conclusions of this manuscript. In order to support reproducibility, the computational code should be made available and well-documented, with a requirements file indicating the dependencies and their versions.

  11. Jul 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na<sup>+</sup>/K<sup>+</sup>-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na<sup>+</sup>/K<sup>+</sup>-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na<sup>+</sup>/K<sup>+</sup>-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na<sup>+</sup>/K<sup>+</sup>-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the pump. By integrating the net pump current over time and dividing by one elementary charge, one can find the amount of ATP that is consumed by the Na<sup>+</sup>/K<sup>+</sup> pump over that period for either compensatory mechanism, because each pump cycle hydrolyzes one ATP molecule and exports one net positive charge. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na<sup>+</sup>/K<sup>+</sup> pump for each proposed compensatory mechanism. The Na<sup>+</sup>/K<sup>+</sup> pump is, however, not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We have now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide ballpark estimates for the influence of the compensatory mechanisms on the total metabolic costs of the cell and membrane space occupation. Although we argue that, according to these estimates, the impact of the discussed compensatory mechanisms could be significant, a plausible quantitative cost approximation at the whole-cell level remains beyond the scope of this article due to the absence of more detailed experimental quantification.
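
      The ATP estimate described in this response (integrate the net pump current over time, then divide by one elementary charge) can be sketched as follows. This is an illustrative calculation only, not the authors' simulation code: the function names, the trapezoidal integration, and the toy current trace are assumptions, and it relies on the pump stoichiometry of one net charge exported per ATP.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per elementary charge (exact SI value)


def atp_consumed(times_s, pump_current_a):
    """Number of ATP molecules used by the pump over the sampled interval.

    times_s: sample times in seconds (increasing).
    pump_current_a: net pump current in amperes at those times.
    Assumes one ATP per net elementary charge moved (3 Na+ out, 2 K+ in).
    """
    if len(times_s) != len(pump_current_a) or len(times_s) < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    charge = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Trapezoidal rule: mean current over the step times the step length.
        charge += 0.5 * (pump_current_a[i] + pump_current_a[i - 1]) * dt
    return charge / ELEMENTARY_CHARGE


def atp_rate(times_s, pump_current_a):
    """Average ATP consumption in molecules per second."""
    duration = times_s[-1] - times_s[0]
    return atp_consumed(times_s, pump_current_a) / duration


# Hypothetical trace: a constant net pump current of 1 nA sampled for 10 ms.
times = [i * 1e-3 for i in range(11)]
current = [1e-9] * len(times)
print(atp_consumed(times, current), atp_rate(times, current))
```

      Comparing `atp_rate` between two simulated conditions would give the kind of relative pump cost comparison the response refers to; it says nothing about the other ATP-consuming processes in the cell.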

      Reviewer #1 (Recommendations for the authors):

      (1)  For the f-I curves in Figures 1 and 6, the firing rate increases as the input current increases. I am curious to know: (a) whether the amplitudes of the action potentials (APs) vary with increased input current; (b) whether the waveform of APs (such as in Fig. 1I) transitions into smaller amplitude oscillations at higher input currents; and (c) if the waveform does change at higher input currents, how do the "current contributions," "current," and "ion exchanges per action potential" in Figures 1HJ and 6AB respond?

      To fully answer these questions, we added a supplemental figure with accompanying text in section 11.1 (Fig. A1). We also added a reference to this figure in the main text (section 4.1). Here, it is shown that, as previously illustrated in [1], AP amplitude decreases when the input current increases (Fig. A1 A, left). This effect remains upon addition of either a pump with constant pump rate and co-expressed sodium leak channels (Fig. A1 A, center), or a voltage-dependent pump (Fig. A1 A, right). Interestingly, even though the shape of the current contributions (Fig. A1 B) and the APs (Fig. A1 C) look very different for low (Fig. A1 C, top) and high inputs (Fig. A1 C, bottom), the total sodium and potassium displacement per AP, and thus the pump rate, is roughly the same (Fig. A1 D). Under the assumption that voltage-gated sodium channel (NaV) expression is adjusted to facilitate fixed-AP amplitudes, however, (as in [1]) more NaV channels would be expressed in fish with higher synaptic drives. This would then result in an additional sodium influx per AP and result in higher energetic requirements per AP for electrocytes with higher firing rates (also shown in [1]).

      (2) Could the authors clarify what the vertical dashed line represents in Figures 1B and 1F? Does it correspond to an input current of 0.63uA?

      (Reviewer comment refers to Fig. 1C and 1F in new version): Yes, it corresponds to the input current that is also used in figures 1D and 1G. We clarified this by adding an additional tick label on the x-axis in 1F. The current input of 0.63uA was chosen as a representative input for this cell as follows: we first modeled an electrocyte with a periodic synaptic drive as in [1]. The frequency of this drive was set to 400 Hz, which is an intermediate value in the range of reported EODfs (and thus presumably pacemaker firing rates) of 200-600Hz [2]. Then, acetylcholine receptor currents I<sub>AChRNa</sub> and I<sub>AChRK</sub> were summed and averaged to obtain the average input current of 0.63uA. This is now also explained in new Methods section 6.2.1.

      (3) What input current was used for Figures 1H, 1I, and 1J?

      Response: In a physiological setting, where the electrocyte is electrochemically coupled to the pacemaker nucleus, stimulation of the electrocyte occurs through neurotransmitter release in the synaptic cleft, which then leads to the opening of acetylcholine receptor channels. As figures 1H-J concern different ion fluxes, we aimed to also include currents stemming from acetylcholine receptor channels. We therefore did not stimulate the electrocyte with a constant input current as in Fig. 1C and F, but simulated elevated constant neurotransmitter levels in the synaptic cleft, which then leads to elevated acetylcholine receptor currents. In the model, this neurotransmitter level, or ‘synaptic drive’ is represented by parameter syn<sub>clamp</sub>. A physiologically relevant value for syn<sub>clamp</sub> was deduced by averaging the synaptic drive during a 400 Hz pacemaker stimulus. This is now also explained in new Methods section 6.2.1.

      (4) In Figure 4A, there is a slight delay between the PN spikes (driver) and the EO (receiver), and no EO spikes occur without PN spikes. However, the firing rate of EO (receiver) appears to decrease before the chirp initiations in Fig 4B; and this delay seems to disappear in Fig 4C. Could the authors explain these observations?

      As shown in the bottom right of figure 4A, when plotting the instantaneous firing rate as one over the inter-spike-interval (1/ISI), the firing rate of a cell is only plotted at the end of every ISI. Therefore, even though the PN drives the electrocyte and thus spikes earlier in time than the electrocyte, when it initiates chirps, these will only be plotted as an instantaneous firing rate at the end of the chirp. If the electrocyte fires spontaneously within this chirp, its instantaneous firing rate will appear earlier in time than the initiation of the chirp of the PN. The PN did, however, initiate the chirp before that and causality between the PN and electrocyte is not disturbed.
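
      The plotting effect described above can be reproduced with a small sketch. The spike times below are invented for illustration (they are not data from the paper), and the function names are hypothetical; the only convention taken from the response is that the instantaneous rate 1/ISI is placed at the end of each inter-spike interval.

```python
def instantaneous_rate(spike_times_s):
    """Return (time, rate) pairs, with 1/ISI assigned to the END of each interval."""
    points = []
    for prev, cur in zip(spike_times_s, spike_times_s[1:]):
        isi = cur - prev
        if isi <= 0:
            raise ValueError("spike times must be strictly increasing")
        points.append((cur, 1.0 / isi))
    return points


def first_time_below(points, threshold_hz):
    """Time of the first plotted point whose rate is below the threshold, or None."""
    for t, rate in points:
        if rate < threshold_hz:
            return t
    return None


# Hypothetical pacemaker (PN) train at 400 Hz with a pause ("chirp") after 5 ms.
pn_spikes = [0.0, 0.0025, 0.005, 0.015]
# Hypothetical electrocyte train: follows the PN with a 0.5 ms delay and fires
# once spontaneously (at 10 ms) during the pause.
eo_spikes = [0.0005, 0.003, 0.0055, 0.010, 0.0155]

pn_rate = instantaneous_rate(pn_spikes)
eo_rate = instantaneous_rate(eo_spikes)

# The PN's long interval is only plotted when it ends, so the electrocyte's
# rate drop shows up earlier on the time axis even though the PN paused first.
print(first_time_below(pn_rate, 300.0), first_time_below(eo_rate, 300.0))
```

      In this toy example the pause starts in the pacemaker train, yet its low rate value is drawn later than the electrocyte's, which is the apparent reversal of order the reviewer noticed.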

      (5) Regarding Figure 6, could the authors specify the input current used in Figures 6A and 6B?

      Figure 6A and 6B have the same synaptic drive as Fig. 1 H, I and J (syn<sub>clamp</sub>=0.13).

      (6) In Section 6, I would recommend that the authors provide a table of parameters and their corresponding values for clarity.

      Thank you for your suggestion. We have now reorganized the Methods section and added two tables with parameters for clarity. Table 1 (see Methods 6.1) includes all parameters that differ from the parameters reported in [1], and parameters that arise from the additionally modeled equations to simulate ion concentration dynamics and the pump. We also added the parameters used to simulate the different stimulus protocols (and corresponding tuned parameters) that are presented in the article in Table 2 (see Methods 6.2).

      Reviewer #2 (Public review):

      Summary:

      The paper 'The electrogenicity of the Na<sup>+</sup>/K<sup>+</sup>-ATPase poses challenges for computation in highly active spiking cells' by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes-specialized highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions out of, these cells for each spike. This ion imbalance must be restored after each spike, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular volume. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. This does not pose an issue in most cells since the firing rate is much slower, and other compensatory mechanisms and other pumps can effectively restore the ion imbalances. In electrocytes of weakly electric fish, however, which operate under very different circumstances, the firing rate is exceptionally high. On top of this, these cells are also involved in critical communication and survival behaviors, which makes their reliable functioning essential.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Additionally, their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Second, they study the implications for this cell in the context of chirps, a means of communication between individual fish. Here, an upstream pacemaker neuron entrains the electrocyte to spike, and a brief cessation of this drive produces a so-called chirp, a brief pause in the sustained activity of the electrocytes. In their model, the authors show that it is necessary to include the extracellular potassium buffer to have a reliable chirp signal. Third, they test another means of communication in which there is a sudden increase in the firing rate of the electrocyte followed by a decay to the baseline. For this to occur reliably, they emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is warranted. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of Na and K currents to include the dynamics of the NaK pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for exploring and testing in in vivo experiments which of these proposed solutions the fish use and their relative importance.

      Weaknesses:

      The modeling work makes assumptions and simplifications that should be listed explicitly. For example, it assumes only potassium ions constitute the leak current, which may not be true as other ions (chloride and calcium) may also cross the cell membrane. This implies that the leak channels' reversal potential may differ from that of potassium. Additionally, the spikes are composed of sodium and potassium currents only and no other ion type (no calcium). Further, these ion channels are static and do not undergo any post-translational modifications. For instance, a sodium-dependent potassium pump could fine-tune the potassium leak currents and modulate the spike amplitude (Markham et al., 2013).

      This model considers only NaK pumps. In many cell types, several other ion pumps/exchangers/symporters are simultaneously present and actively participate in restoring the ion gradients. It may be true that only NaK pumps are expressed in the weakly electric fish Eigenmannia virescens. This limits the generalizability of the results to other cell types. While this does not invalidate the results of the present study, biological processes may find many other solutions to address the non-electroneutral nature of the NaK pump. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      Finally, including testable hypotheses for these computational models would strengthen this work.

      We thank the reviewer for the detailed summary and the identified weaknesses according to which we improved our article. Our model assumptions and simplifications are now mentioned in more detail in the introduction of the article (section 3), and justified in the Methods (section 6.1).

      Furthermore, we added a discussion section (section 5.1) where we outline the conditions under which the present study can be extended to other cell types. We now also state more clearly that the pump current will be present for any excitable cell with significant sodium flux (assuming that the NaK pump carries out the majority of its active transport), but that compensatory mechanisms (if employed at all in a particular cell) could also be implemented via other ionic currents and transporters. We furthermore now highlight the testable hypotheses that we put forward with our computational study on the weakly electric fish electrocyte more explicitly in the first paragraph of the discussion.

      Reviewer #2 (Recommendations for the authors):

      Main text

      Please explicitly state this model's assumptions in the introduction and elaborate on them in the discussion if necessary. For example, some assumptions that I find relevant to mention are: - The Na and K channels are classic HH conductance-based channels, with no post-translational modifications or beta subunit modifications as seen in other high-frequency firing cells (10.1523/JNEUROSCI.23-12-04899.2003).

      Neither calcium nor chloride ions are considered in the spike generation. Nor are Na-dependent K channels (10.1152/jn.00875.2012).

      Only the Na-K pump (and not the Na-Ca exchanger, Ca-pump, or Cl pumps) is modeled,

      Calmodulin, which can buffer calcium, is highly expressed in electric eels, but it is not considered. If some of these assumptions have valid justifications in weakly electric fish electrocytes, please state so with the citations. I recognize that including these in your models is beyond the scope of the current paper.

      We thank the reviewer for pointing out this issue. We now specified in the introduction that the model only contains sodium and potassium ions and only classic HH conductance-based channels. We there also explicitly specify the details on the Na<sup>+</sup>/K<sup>+</sup>-ATPase: it is the only active transporter in this model, thus solely responsible for maintaining ionic homeostasis; its activity is only modulated by intracellular sodium and extracellular potassium concentrations. In the discussion (6.1), we now elaborate on how ion-channel-related aspects (i.e., the addition of resurgent Na<sup>+</sup> or Na<sup>+</sup> -dependent K<sup>+</sup> channels), additional ion fluxes (including some not relevant for the electrocyte but for other excitable cells), and additional active transporters and pumps would influence the results presented in the article.

      In addition, there might be other factors that the authors and the reviewers have yet to consider. The model is a specific case study about the weakly electric fish electrocyte with high-frequency firing. It is almost guaranteed that biology will find other compensatory ways in different cell types, systems, and species (auditory nerve, for example). Given this, it would be prudent to use phrases such as 'this model suggests,' 'perhaps,' 'could,' 'may,' and 'alludes to,' etc., to accommodate other possible solutions to ion homeostasis in rapidly spiking neurons. The solutions the authors are proposing are some of many.

      We rephrased some of the statements to highlight more the hypothetical nature of the compensatory mechanisms in specific cells and to draw attention to the fact that there can be many more such factors. This fact is now also explicitly mentioned in discussion section 5.2.

      Figures

      Some of my comments on the figures are stylistic, others are to improve clarity, and some are critical for accuracy.

      The research problem concerns weakly electric fish E. virescens. I suggest introducing a picture of an electric fish in the beginning (such as that in Figure 3, but not exactly; see specific comments on this fish figure) along with a schema of the research question. 

      We agree, and added an overview schema in Fig. 1A.

      Font sizes change between the panels in all the figures. Please maintain consistency. The figure panel titles and axis labels should start with a capital letter.

      Thank you for pointing this out; both issues have been resolved in the new version of the article.

      Figure 1:

      Please rearrange the figure - BCFG belong together and should appear in the same order. The x-axis labels could be better placed.

      Consider using fewer pump current f-I curves (B, D, E, F). Five is sufficient to make the point. Having 10 curves adds to the clutter. The placement of the color bar could be better. Similarly, the placement of the panel titles 'without co-expression' and 'with co-expression' and the panel labeling (BCFG) makes it confusing. The panel labels should be above the panel title.

      Response (C, D, F, G in new version): We improved the layout of figure 1. Panels B, C, F, G are now C, D, F, G. We opted to include panel E before panels F and G, because it shows the co-expression mechanism before its effect on the tuning curve. We did move the colorbar, added x-axis labels to B and C, and adjusted the location of the panel labels for clarity. We also plotted fewer pump currents.

      B, F: What does the dashed line indicate?

      Response (C, F in new version): The dashed line indicates the input current that was used in figures 1D and 1G. We now clarified this by adding this value on the x-axis.

      C: Any reason not to show the lower firing rates?

      Response (B in new version): In the previous version of the article, pump currents were estimated for electrocytes that were stimulated with the mean synaptic drive that stems from periodic stimulation in the 200-600 Hz regime. We now extended the range of synaptic inputs to obtain lower (and higher) firing rates. The linear relationship between firing rate and pump current also holds for these additional firing rates.
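      One way to see why a linear relationship would be expected is sketched below. This is an illustration under assumptions that are not stated in the response: the net sodium charge entering per spike, written here as the hypothetical symbol Q_Na, is taken to be roughly independent of the firing rate r, a rate-independent sodium leak is written as the hypothetical term \bar{I}_{Na,leak}, and the pump is assumed to export three sodium ions per net charge moved.

```latex
% Magnitude of the time-averaged sodium influx at firing rate r
% (Q_Na and \bar{I}_{Na,leak} are hypothetical symbols for this sketch)
\left|\overline{I_{\mathrm{Na}}}\right| \approx r\,Q_{\mathrm{Na}} + \bar{I}_{\mathrm{Na,leak}}

% Sodium homeostasis with an assumed 3 Na^+ : 2 K^+ pump stoichiometry
I_{\mathrm{pump}} \approx \tfrac{1}{3}\left(r\,Q_{\mathrm{Na}} + \bar{I}_{\mathrm{Na,leak}}\right)
```

      Under these assumptions the pump current grows linearly with r; any rate dependence of the sodium entry per spike would bend that line.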

      D: There is no difference between the curves at the top and the bottom. One fills the area between the curve and the zero line; the other shows the curve itself. Please use only one of the two representations.

      Response (panel I in new version): In the previous version, the difference between the plots was that one showed the absolute values of the currents (the curves), and the other plot showed the contributions of the currents to the total (area between the curves). We now only depict the current contributions.

      The I and H orders can be swapped.

      Thank you, they are now swapped.

      The colors used for Na and K are very dull (light blue and pink).

      We now use darker colors in the new version of the article.

      Figure 2:

      Please verify that without the synaptic input perturbations (i.e., baseline in A, D), the firing rate (B, E) and pump current (C, F) converge to the baseline. There is a noticeable drift (downward for firing rate and upward for pump currents) at the 10-second time point.

      Thanks to your observation, we identified a version mismatch in the code that estimates the pump current required for ionic homeostasis (see Methods 6.1.2). We have now corrected the code and made sure to start the simulation in the steady state so that there is no drift at baseline firing. We also used this corrected code to present tuned parameters for different stimulus protocols in Table 2 (Methods 6.2).

      Figure 3:

      A. The dipole orientation with respect to the fish in panel B needs to be corrected. Consider removing this as this work is not about the dipole.

      This panel has been removed.

      B. This figure has already been overused in multiple papers; please redraw it. Localized expressions of different pumps and ion channels are present within each electrocyte, which generates the dipole. Either show this correctly or don't at all (the subfigure pointed out by the red arrow).

      This panel has been moved to Fig. 1A. We opted to remove the localized expressions.

      C and D belong together; please place them next to each other. Consider introducing panel D first since it follows a similar protocol to the last figure.

      Response (A in new version): Panel placement has been adjusted. We opted to maintain the order to maintain the flow of the text, but we do now combine them in one panel.

      E and F are very similar in that they are swapped on the x and y axes. Either that or I have severely misunderstood something, in which case it needs to be shown better.

      Response (B and C in new version): We adjusted the placement of these panels. They are not the same: panel B shows the mean of physiological periodic inputs, and panel C shows that when this mean is fed to the electrocyte, it also induces tonic firing. The range of mean currents that result from periodic synaptic stimulation in the physiological regime (panel B, y-axis) is now indicated in panel C by a grey box along the x-axis.

      G. Why show the lines with double arrow ends? The curves are diverging - that's enough.

      Good point, we updated this panel accordingly (now panel D).

      Figure 4

      Please verify the time units in these plots. Something seems amiss. B and D lower plots-perhaps this is seconds? B could use an inset box/ background gray color (t1, t2) indicating the plots of the C panel (left, right). Likewise, for D (t1, t2), connect to E (left, right).

      You are right: the x-axes were supposed to be in seconds, and we have updated this. We indicated the relations between B-C and D-E by gray backgrounds and by adding the corresponding panel label on the x-axis.

      A: Indicate the perturbation in the schematic, i.e., extracellular K buffer.

      The perturbation is now indicated.

      D: Even with the extracellular K buffer, there is a decay (slower than in B) of the pump current over time. Please verify (you do not have to show in your paper) that this decay saturates.

      After the ten chirps are initiated, pacemaker firing goes back to baseline. In both cases (panel B and panel D), the pump current goes back to baseline after some time. With extracellular potassium buffering, this happens more slowly due to a decreased reaction speed of the pump to changes in firing rate (in comparison to the case without extracellular potassium buffer).

      The decrease in reaction speed however merely delays the effects of changes in firing rates on the pump current in time. Therefore, even with an extracellular potassium buffer, when more chirps are initiated in a short period of time, the pump current can still decrease to an extent that impairs entrainment. Using the same protocol as in panel B and D, we increased the number of chirps and found that with an extracellular potassium buffer, a maximum of 13 chirps could be encoded without entrainment failure (as opposed to 2 chirps without the buffer as shown in panel B).

      Figure 5

      Please verify the time units in these plots, as for Figure 4. B and E lower plots-perhaps this is seconds? B could use an inset box/ background gray color (t1, t2) indicating the plots of the panels C and D. Likewise, for E (t1, t2), connect to F and G.

      The time axis in this figure was indeed also in seconds, which we corrected here. The relations between plots B-C/D and E-F/G are now indicated through gray backgrounds and corresponding panel references on the x-axis.

      A: Indicate the perturbation in the schematic, i.e., the synapse's strength. There is no need to include the arrow or to mention freq. rise. The placement of the time scale can be misinterpreted as a current clamp. Instead, plot it as a zoomed inset.

      The arrow is removed and we now also show a zoomed inset. Also, the perturbation is now indicated.

      E: Verify that the pump current in the strong synapse case already starts at 1.25

      We verified this and noticed that the pump current in the strong synapse case is indeed lower than that in the weak synapse case. This is because to ensure a fair comparison for this stimulation protocol, voltage-gated sodium channel conductance was tuned to maintain a spike amplitude of 13 mV in both cases (see Methods 6.2). In this case, a weak synapse leads to a lower influx of sodium via AChR channels, but a higher influx via voltage-gated sodium channels. The total sodium influx in this case is larger than that for a stronger synapse with relatively less voltage-gated sodium currents, and thus a larger pump current. In the previous version of the article, this was wrongly commented on in the figure captions, and we removed the erroneous statement.

      This is not critical, but because the R-value here can be obtained as a continuous value, it would be appropriate to show it for the whole duration of the weak and strong synapses in B and E. Maybe consider including a schema that shows how R is calculated in panel A. The caption has a typo, 'during frequency rises before (D) and after (E)'. It should be before (C) and after (D) instead.

      The caption typo has been corrected. The R-value for the whole duration of the weak and strong synapses in B and E is 1.000. This is because the R-value is the variance of all phase relations between the PN and the electrocyte, and for the entire duration of the stimulus protocol, there are only a few outliers in phase relations at the maxima of the frequency rises. We decided to include this R-value to show that in general, synchronization between the PN and the electrocyte is very stable. The schema that explains how R is calculated has not been included in favor of not overcrowding the figure. We did add a reference in the figure caption to the methods section in which the calculation of R is explained.
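      As a rough illustration of how a synchronization index of this kind can be computed from phase relations, the sketch below uses the mean resultant length (often called vector strength), which equals 1 when all phases coincide and approaches 0 when they are spread evenly. Treating the article's R-value as this quantity is an assumption; the function names `vector_strength` and `spike_phases` are hypothetical and do not come from the authors' code.

```python
import cmath
import math
from typing import List, Sequence


def vector_strength(phases: Sequence[float]) -> float:
    """Mean resultant length of phases in radians (1 = identical phases)."""
    if not phases:
        raise ValueError("need at least one phase")
    # Average the unit vectors exp(i*phi); the length of the mean is the index.
    resultant = sum(cmath.exp(1j * phi) for phi in phases) / len(phases)
    return abs(resultant)


def spike_phases(spike_times: Sequence[float], period: float) -> List[float]:
    """Phase of each spike relative to a periodic drive with the given period."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [2.0 * math.pi * ((t % period) / period) for t in spike_times]
```

      With a few outliers among many tightly locked phases, an index of this type stays very close to 1, which is consistent with the value reported for the whole stimulus duration.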

      Figure 6:

      A: The top and bottom plots are redundant. Use one of the two. They show the same thing. It may be better to plot Na, K, pump, and net currents on the top panels and the Na leak, which is of smaller magnitude, in a different panel.

      We now only show current contributions.

      B: Please change the color schema. It is barely visible on my prints.

      D: Pump current, instantaneous case, is barely visible

      Color schemes were adjusted.

      Figure A1: It's all good.

      Methods:

      Please provide some internal citations for where specific equations were used in the results/figures. You do this for sections 6.2.3, referencing Figure 5 (c,d,e,g), and 6.2.4, referencing Fig 5 C-E.

      There are now internal references in each methods section to where in the figures they were used. We also included a table with stimulus parameters for each figure with a stimulus protocol (Table 2).

      Also, the methods could be ordered in the same order as the results are presented. Please consider if some details in the methods could be moved to the appendix.

      The ordering of the methods has now been changed to separately explain the model expansions (6.1) and the stimulus protocols (6.2). Both sections are in corresponding order of the figures presented in the article. We opted to maintain all details in the methods.

      6.1.1 Please cite 26 after the first line. Where was this used? In Figure 3C, 4, 5?

      We added the citation. The effects of co-expressed leak channels are shown in Fig. 1 EG, and were used to compensate for pump currents at baseline firing in figures 1 D, H-J (left, with pump), 2, 4, 5, and 6 A-B (left), C (top). This is now also added to the text for clarity.

      Traditionally (Hodgkin, A. L. and Huxley, A. F. (1952). J. Physiol. (Lond.), 117:500-544. Table 3; & Hodgkin, A. L. and Huxley, A. F. (1952). J. Physiol. (Lond.), 116:473-496 Table 5 and the paragraph around it), leak potential is set such that it accounts for all leak from all ions. While in your work, this potential is equal to the reversal of potassium - it need not be so in the animal. There may be leaks from other ions as well, particularly sodium and chloride. Please verify that assuming the leak reversal is the same as that of potassium (Ek, in Equation 3) does not lead to having to model Na leak currents separately.

      In the original model [1], it was assumed that the reversal potential of the leak was the same as that of potassium, which contains the implicit assumption that only potassium ions contribute to the leak. In our article, we also assume that sodium ions contribute to the leak. This can be modeled by adjusting the leak reversal potential accordingly, or by adding an additional leak current that solely models the sodium leak. We opted for the latter in order to track all sodium and potassium ions separately so that ion concentration dynamics could also be modeled properly. Chloride ions were neglected in this study; in our model they do not contribute to the leak. If one were to also model chloride currents and chloride concentration dynamics, it would be beneficial to model these as an additional separate leak current.
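      The equivalence of the two options mentioned above can be written out as a short worked equation. The conductance symbols g_{L,K} and g_{L,Na} are introduced here only for illustration and are not taken from the article; v is the membrane potential.

```latex
% Two separate leak currents (hypothetical conductances g_{L,K}, g_{L,Na})
I_L = g_{L,\mathrm{K}}\,(v - E_{\mathrm{K}}) + g_{L,\mathrm{Na}}\,(v - E_{\mathrm{Na}})

% The same total current written as one leak with an effective reversal potential
I_L = \left(g_{L,\mathrm{K}} + g_{L,\mathrm{Na}}\right)(v - E_L),
\qquad
E_L = \frac{g_{L,\mathrm{K}}\,E_{\mathrm{K}} + g_{L,\mathrm{Na}}\,E_{\mathrm{Na}}}{g_{L,\mathrm{K}} + g_{L,\mathrm{Na}}}
```

      Both forms give the same membrane current, but only the first keeps the sodium and potassium fluxes separate, which is what tracking ion concentrations requires.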

      The notation of I_pump_0 needs to be more convenient. Please consider another notation instead of the _0 (pump at baseline). Similarly for [Na<sup>+</sup>]_in_0, [Na<sup>+</sup>]_out_0, [K<sup>+</sup>]_in_0, and [K<sup>+</sup>]_out_0.

      We changed the notation for baseline similarly to [3], with ‘0’ as a superscript instead of a subscript.

      Equation 11: Please mention why AChRs do not let calcium ions through. Please cite a justification for this. If this is an assumption of the model, please state this explicitly.

      The AChR channels that were found in the E. virescens electrocytes are muscle-type acetylcholine nicotinic receptors [4], which are non-selective cation channels that could indeed support calcium flux [5]. No calcium currents were, however, modeled in the original electrocyte model [1], presumably due to the lack of significant contributions of calcium currents or extracellular calcium concentrations to electrocyte action potentials of a similar weakly electric electrogenic wave-type fish Sternopygus macrurus [6].

      Due to the lack of calcium currents in the original electrocyte model, and due to the limitation of this study to sodium and potassium ions, we chose not to include calcium currents stemming from AChR channels. This assumption is now explicitly stated in Methods 6.1.

      Equation 12: V_in is the intracellular volume. If possible, avoid the notation 'V', since you already use a lowercase v for membrane potential.

      We changed the notation for volume to ‘ω’ similarly to [3]. As we previously used ω as a notation for the firing rate, we changed the notation for firing rate to ‘r’.

      Equation 17: Does this have any assumptions? Would the I_AchRNa, and thus Sum(mean(I_Na))) not change depending on the synaptic drive?

      The assumptions of this equation are the following (now also mentioned in Methods 6.1.2):

      The sum of all sodium currents also includes sodium currents through acetylcholine channels (I_AChRNa).

      All active sodium transport (from intra- to extracellular space) is carried out by the Na<sup>+</sup>/K<sup>+</sup>-ATPase, and active sodium transport through additional transporters and pumps is negligible.

      The time-average of sodium currents is taken either in a tonic firing regime, where the time interval that is averaged over is a multiple of the spiking period, nT, or, for a more variable firing regime, over an averaging window that is sufficiently large to properly sample all firing statistics.

      Under these assumptions, Eq. 17 can be used to compute suitable pump currents for different synaptic drives (as Sum(mean(I_Na)), and thus I_pump0, indeed change with the synaptic drive; see Table 2 in Methods 6.2).
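      The assumptions listed above can be sketched in code. This is a minimal illustration, not the authors' implementation: Eq. 17 itself is not reproduced in this response, so the sketch assumes a 3 Na<sup>+</sup> : 2 K<sup>+</sup> pump stoichiometry (one net outward charge per three exported sodium ions), the sign convention that inward sodium currents are negative, and uniformly sampled current traces. The names `mean_over_whole_periods` and `homeostatic_pump_current` are hypothetical.

```python
from typing import Sequence

# Assumption: 3 Na+ exported per pump cycle, one net charge moved outward.
NA_PER_PUMP_CYCLE = 3


def mean_over_whole_periods(samples: Sequence[float], dt: float, period: float) -> float:
    """Average a uniformly sampled trace over an integer number of spiking periods (nT)."""
    n_per_period = round(period / dt)
    if n_per_period <= 0:
        raise ValueError("period must span at least one sample")
    n_periods = len(samples) // n_per_period
    if n_periods == 0:
        raise ValueError("trace is shorter than one period")
    # Drop the incomplete trailing period so the window is exactly n * T.
    window = samples[: n_periods * n_per_period]
    return sum(window) / len(window)


def homeostatic_pump_current(
    na_currents: Sequence[Sequence[float]], dt: float, period: float
) -> float:
    """Pump current that balances the summed mean Na+ currents (channels, leak, AChR)."""
    total_mean_na = sum(mean_over_whole_periods(trace, dt, period) for trace in na_currents)
    # Inward Na+ current is negative; the pump current is outward (positive).
    return -total_mean_na / NA_PER_PUMP_CYCLE
```

      Because the summed mean sodium current includes the AChR contribution, a different synaptic drive changes the input traces and therefore the returned pump current, which is the dependence described in the response.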

      6.2: Please rewrite the first sentence of this paragraph.

      The first sentence of this paragraph, which has been moved to section 6.2.2 for improved structuring of the text, has been rewritten.

      6.2.1: The text section could use a rewrite.

      Please elaborate on what t_p is. If it is not time, please do not use 't.' What is p here? What are the units of the equation (22), t_p < 0.05 (?)

      This section has now also been moved to 6.2.2. It has been rewritten to improve clarity and t_p has been renamed to t_pn (as it does reflect time, which is now better explained). The units have now also been added to the equation (which is now Eq. 26).

      6.2.4: Please rewrite this.

      This section has been rewritten (and has been moved to section 6.1.4).

      Bibliography

      Some references are omitted (left anonymous) or inconsistent on multiple occasions.

      Thank you for pointing this out! It is now rectified.

      References used for author response

      (1) Joos B, Markham MR, Lewis JE, Morris CE. A model for studying the energetics of sustained high frequency firing. PLOS ONE. 2018 Apr;13:e0196508.

      (2) Hopkins CD. Electric communication: Functions in the social behavior of Eigenmannia virescens. Behaviour. 1974;50(3-4):270–304.

      (3) Hübel N, Dahlem MA. Dynamics from seconds to hours in Hodgkin-Huxley model with time-dependent ion concentrations and buffer reservoirs. PLoS Computational Biology. 2014;10(12):e1003941.

      (4) Ban Y, Smith BE, Markham MR. A highly polarized excitable cell separates sodium channels from sodium-activated potassium channels by more than a millimeter. Journal of Neurophysiology. 2015;114(1):520–30.

      (5) Vernino S, Rogers M, Radcliffe KA, Dani JA. Quantitative measurement of calcium flux through muscle and neuronal nicotinic acetylcholine receptors. Journal of Neuroscience. 1994;14(9):5514–5524.

      (6) Ferrari M, Zakon H. Conductances contributing to the action potential of Sternopygus electrocytes. Journal of Comparative Physiology A. 1993;173:281–92.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough review of our manuscript and your valuable suggestions. Here are our responses to each point you raised:

      (1) Novelty: Exploring the feasibility of extending the risk-scoring model to diverse cancer types could emphasize the broader impact of the research.

      Thank you so much for your thoughtful and insightful feedback. Your suggestion to explore extending the risk-scoring model to diverse cancer types is truly valuable and demonstrates your broad vision in this field. We deeply appreciate your interest in our research and the effort you put into providing such constructive input.

      After careful consideration, we have decided to focus our current study on the specific cancer type(s) we initially set out to explore. This decision was made to ensure that we can thoroughly address the research questions at hand, given our current resources, time constraints, and the complexity of the topic. By maintaining this focused approach, we aim to achieve more in-depth and reliable results that can contribute meaningfully to the understanding of this particular area.

      However, we fully recognize the potential significance of your proposed direction and firmly believe that it could be an excellent avenue for future research. We will definitely keep your suggestion in mind and may explore it in subsequent studies as our research progresses and evolves.

      (2) Improvement in Figure Presentation: The inconsistency in font formatting across figures, particularly in Figure 2 (A-D, E, F-H, I), Figure 3 (A-C, D-J, H, K), and the distinct style change in Figure 5, raises concerns about the professionalism of the visual presentation. It is recommended to standardize font sizes and styles for a more cohesive and visually appealing layout. This ensures that readers can easily follow and comprehend the graphical data presented in the article.

      The text in the picture has been revised as requested.

      (3) Enhancing Reliability of Immune Cell Infiltration Data: Address the potential limitations associated with relying solely on RNASeq data for immune cell infiltration analysis between ICD and ICD high groups in Figure 2. It is advisable to discuss the inherent challenges and potential biases in this methodology. To strengthen the evidence, consider incorporating bladder cancer single-cell sequencing data, which could provide a more comprehensive and reliable understanding of immune cell dynamics within the tumor microenvironment.

      Thank you very much for your meticulous review and the highly constructive suggestions. Your insight regarding the limitations of relying on RNASeq data for immune cell infiltration analysis and the proposal to incorporate bladder cancer single-cell sequencing data truly reflect your profound understanding of the field. We deeply appreciate your efforts in guiding our research and the valuable perspectives you've offered.

      After careful deliberation, given our current research scope, timeline, and available resources, we've decided to focus on further discussing and addressing the challenges and biases inherent in RNASeq-based immune cell infiltration analysis. By delving deeper into the methodological limitations and conducting more in-depth statistical validations, we aim to provide a comprehensive and reliable interpretation of the data within our study framework. This focused approach allows us to maintain the integrity of our original research design and deliver robust findings on the relationship between immune cell infiltration and ICD in the current context.

      However, we fully acknowledge the significant value of your proposed single-cell sequencing approach. It is indeed a powerful method that could offer more detailed insights into immune cell dynamics, and we believe it holds great promise for future research in this area. We will keep your suggestion in mind as an important direction for potential future studies, especially when we plan to expand and deepen our exploration of the tumor microenvironment.

      (4) Clarity in Data Sources and Interpretation of Figure 5: In the results section, provide a detailed and transparent explanation of the sources of data used in Figure 5. This includes specifying the databases or platforms from which the chemotherapy, targeted therapy, and immunotherapy data were obtained. Additionally, elucidate the rationale behind the chosen data sources and how they contribute to the overall interpretation of the study's findings. And, strangely, these immune-related genes are associated with cancer sensitivities to different targeted therapies.

      Thank you very much for your detailed and valuable feedback on Figure 5. We sincerely appreciate your careful review and insightful suggestions, which have provided us with important directions for improvement.

      Regarding the data sources in Figure 5, we used the pRRophetic algorithm to conduct a drug sensitivity analysis on the TCGA database. The reason for choosing these data sources is multifaceted. Firstly, these databases and platforms are well-established and widely recognized in the field. They have strict data collection and verification processes, ensuring the accuracy and reliability of the data. For example, TCGA has a large-scale, long-term-accumulated chemotherapy case database, which can comprehensively reflect the clinical application and treatment effects of various chemotherapeutic drugs.

      Secondly, these data sources cover a wide range of cancer types and patient information, which can meet the requirements of our study's diverse sample size and variety. This comprehensiveness enables us to conduct a more in-depth and representative analysis of the relationships between different therapies and immune-related genes.

      In terms of the overall interpretation of the study's findings, the use of these data sources provides a solid foundation. The accurate chemotherapy, targeted therapy, and immunotherapy data help us clearly demonstrate the associations between immune-related genes and cancer sensitivities to different treatments. This allows us to draw more reliable conclusions and provides a scientific basis for understanding the complex mechanisms of cancer treatment from the perspective of immune-gene-therapy interactions.

      As for the unexpected association between immune-related genes and cancer sensitivities to different targeted therapies, this is indeed a fascinating discovery. In our analysis, we hypothesized that immune-related genes may affect the tumor microenvironment, thereby influencing the response of cancer cells to targeted therapies. Although this finding is currently beyond our initial expectations, it has opened up a new research direction for us. We will further explore and verify the underlying mechanisms in future research.

      Once again, thank you for your guidance. We will make corresponding revisions and improvements according to your suggestions to make our research more rigorous and complete.

      (5) Legends and Methods: Address the brevity and lack of crucial details in the figure legends and methods section. Expand the figure legends to include essential information, such as the number of samples represented in each figure. In the methods section, provide comprehensive details, including the release dates of databases used, versions of coding packages, and any other pertinent information that is crucial for the reproducibility and reliability of the study.

      We would like to express our sincere gratitude for your valuable feedback on the figure legends and methods section of our study. We highly appreciate your sharp observation of the issues regarding the brevity and lack of key details, which are crucial for further improving our research.

      We have supplemented the methods section with data including the number of samples, the release dates of the databases used, and the versions of the coding packages, etc. For TCGA samples: 421 tumor samples and 19 normal samples. Database release date: March 29, 2022, version v36. Coding package version: R version 4.1.1. We will immediately proceed to supplement these key details, making the research process and methods transparent. This will allow other researchers to reproduce our study more accurately and enhance the persuasiveness of our research conclusions.

      (6) Evidence Supporting Immunotherapy Response Rates: The importance of providing a robust foundation for the conclusion regarding lower immunotherapy response rates. Strengthen this section by offering a more detailed description of sample parameters, specifying patient demographics, and presenting any statistical measures that validate the observed trends in Figure 5Q-T. More survival data are required to conclude. Avoid overinterpretation of the results and emphasize the need for further investigation to solidify this aspect of the study.

      Thank you very much for your professional and meticulous feedback on the content related to immunotherapy response rates in our study! Your suggestions, such as providing a solid foundation for the conclusions and supplementing key information, are of great value in enhancing the quality of our research, and we sincerely appreciate them.

      The data in Figures 5Q to T are from the TCGA database, which has already been provided. The statistical measure used for Figures 5Q to T is the P-value, which has been marked in the figures. The survival data have been provided in Figure 3D.

      Reviewer #2 (Recommendations for the authors):

      Thank you for your thorough review of our manuscript and your valuable suggestions. Here are our responses to each point you raised:

      (1) There is no information on the samples studied. Are all TCGA bladder cancer samples studied? Are these samples all treatment naïve? Were any excluded? Even simply, how many samples were studied?

      Thank you so much for pointing out the lack of sample-related information. Your attention to these details has been extremely helpful in identifying areas for improvement in our study.

      All the samples in our study were sourced from the TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Immunome Atlas) databases. It should be noted that the patient data in the TCIA database are originally from the TCGA database. Regarding whether the patients received prior treatment, this information was not specifically mentioned in our current report. Instead, we mainly relied on the scores of the prediction model for evaluation. Since all samples were obtained from publicly available databases, we understand the importance of clarifying their origin and characteristics.

      We sincerely apologize for the omission of the sample size and other relevant details. We will promptly supplement this crucial information in the revised version, including a detailed description of the sample sources and any relevant characteristics. This will ensure greater transparency and help readers better understand the basis of our research.

      For TCGA samples: 421 tumor samples and 19 normal samples. Database release date: March 29, 2022, version v36. Coding package version: R version 4.1.1.

      (2) What clustering method was used to divide patients into ICD high/low? The authors selected two clusters from their "unsupervised" clustering of samples with respect to the 34 gene signatures. A Delta area curve showing the relative change in area under the cumulative distribution function (CDF) for k clusters is omitted, but looking at the heatmap one could argue there are more than k=2 groups in that data. Why was k=2 chosen? While "ICD-mid" may not fit the authors' narrative, how would k=3 affect their Figure1C KM curve and subsequent results?

      Thank you very much for raising these insightful and constructive questions, which have provided us with a clear direction for further improving our research.

      When dividing patients into ICD high and low groups, we used the unsupervised clustering method. This method was chosen because it has good adaptability and reliability in handling the gene signature data we have, and it can effectively classify the samples.

      Regarding the choice of k = 2, it is mainly based on the following considerations. Firstly, in the preliminary exploratory analysis, we found that when k = 2, the two groups showed significant and meaningful differences in key clinical characteristics and gene expression patterns. These differences are closely related to the core issues of our study and help to clearly illustrate the distinctions between the ICD high and low groups. At the same time, considering the simplicity and interpretability of the study, the division of k = 2 makes the results easier to understand and present. Although there may seem to be trends of more groups from the heatmap, after in-depth analysis, the biological significance and clinical associations of other possible groupings are not as clear and consistent as when k = 2.

      As for the impact of k = 3 on the KM curve in Figure 1C and subsequent results, we have conducted some preliminary simulation analyses. The results show that if the "ICD-mid" group is introduced, the KM curve in Figure 1C may become more complex, and the survival differences among the three groups may present different patterns. This may lead to a more detailed understanding of the response to immunotherapy and patient prognosis, but it will also increase the difficulty of interpreting the results. Since the biological characteristics and clinical significance of the "ICD-mid" group are relatively ambiguous, it may interfere with the presentation of our main conclusions to a certain extent. Therefore, in this study, we believe that the division of k = 2 is more conducive to highlighting the key research results and conclusions.

      Thank you again for your valuable comments. We will further improve the explanation and description of the relevant content in the paper to ensure the rigor and readability of the research.

      (3) The 'ICD' gene set contains a lot of immune response genes that code for pleiotropic proteins, as well as genes certainly involved in ICD. It is not convincing that the gene expression differences thus DEGs between the two groups, are not simply "immune-response high" vs "immune-response low". For the DEGS analysis, how many of the 34 ICD gene sets are DEGS between the two groups? Of those, which markers of ICD are DEGs vs. those that are related to immune activation?

      a. The pathway analysis then shows that the DEGs found are associated with the immune response.

      b. Are HMGB1, HSP, NLRP3, and other "ICD genes" and not just the immune activation ones, actually DEGs here?

      c. Figures D, I-J are not legible in the manus.

      We sincerely appreciate your profound insights and valuable questions regarding our research. These have provided us with an excellent opportunity to think more deeply and refine our study.

      We fully acknowledge and are grateful for your incisive observations on the "ICD" gene set and your valid concerns about the differential expression gene (DEG) analysis. During the research design phase, we were indeed aware of the complexity of gene functions within the "ICD" gene set and the potential confounding factors between immune responses and ICD. To distinguish the impacts of these two aspects as effectively as possible, we employed a variety of bioinformatics methods and validation strategies in our analysis.

      Regarding the DEG analysis, among the 34 genes in the ICD gene set, 30 showed significant differential expression between the groups; the exceptions were HMGB1, HSP90AA1, ATG5, and PIK3CA. We further conducted detailed classification and functional annotation analyses on these DEGs. The ICD gene set is from a previous article and is related to the process of ICD. Relevant literature is in the materials section. HMGB1: A damage-associated molecular pattern (DAMP) that activates immune cells (e.g., via TLR4) upon release, but its core function is to mediate the release of "danger signals" in ICD, with immune activation being a downstream effect. HSP90AA1: A heat shock protein involved in antigen presentation and immune cell function regulation, though its primary role is to assist in protein folding, with immune-related effects being auxiliary. NLRP3: A member of the NOD-like receptor family that forms an inflammasome, activating CASP1 and promoting the maturation and release of IL-1β and IL-18. Among the DEGs, the majority are associated with immune activation, such as IL1B, IL6, IL17A/IL17RA, IFNG/IFNGR1, etc.

      (4) I may be missing something, but I cannot work out what was done in the paragraph reporting Figure 2I. Where is the ICB data from? How has this been analysed? What is the cohort? Where are the methods?

      The samples used in the analysis corresponding to Figure 2I were sourced from the TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Immunome Atlas) databases. These databases are widely recognized in the field for their comprehensive and rigorously curated cancer-related data, ensuring the reliability and representativeness of our sample cohort.

      Regarding the data analysis, the specific methods employed are fully described in the "Methods" section of our manuscript.

      (5) How were the four genes for your risk model selected? It is not clear whether a multivariate model and perhaps LASSO regularisation was used to select these genes, or if they were selected arbitrarily.

      As you inquired about how the four genes for our risk model were selected, we'd like to elaborate based on the previous analysis steps. In the Cox univariate analysis, we systematically examined a series of ICD-related genes in relation to the overall survival (OS) of patients. Through this analysis, we identified four ICD-related genes, namely CALR (p = 0.003), IFNB1 (p = 0.037), IFNG (p = 0.022), and IL1R1 (p = 0.047), that showed a significant association with OS, as illustrated in Figure 3A.

      Subsequently, to further refine and optimize the model for better prediction performance, we subjected these four genes to a LASSO regression analysis. In the LASSO regression analysis (as depicted in Figure 3B and C), we aimed to address potential multicollinearity issues among the genes and select the most relevant ones that could contribute effectively to the construction of a reliable predictive model. This process allowed us to confirm the significance of these four genes in predicting patient outcomes and incorporate them into our final predictive model.

      (6) How related are the high-risk and ICD-high groups? It is not clear. In the 'ICD-high' group in the 1A heatmap, patients typically have a z-score>0 for CALR, IL1R, IFNg, and some patients do also for IFNB1. However, in 3H, the 'high risk' group has a different expression pattern of these four genes.

      Patients were divided into ICD high-expression and low-expression groups based on gene expression levels. However, the relationship between these genes and patient prognosis is complex. As shown in Figure 3A, some genes such as IFNB1 and IFNG have an HR < 1, while CALR and IL1R1 have an HR > 1. Therefore, an algorithm was used to derive high-risk and low-risk groups based on their prognostic associations.
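      The difference between the two groupings can be illustrated with a minimal sketch. It assumes the common convention that a Cox-based risk score is the coefficient-weighted sum of expression values, followed by a median split into high- and low-risk groups. The response does not state the actual algorithm, coefficients, or cutoff, so every number and name below is hypothetical.

```python
from statistics import median

# Hypothetical Cox coefficients (log hazard ratios), NOT values from the manuscript.
# Negative coefficient <=> HR < 1 (protective); positive coefficient <=> HR > 1 (risk).
COEFS = {"CALR": 0.30, "IL1R1": 0.20, "IFNB1": -0.40, "IFNG": -0.25}


def risk_score(expression):
    """Coefficient-weighted sum of the four gene expression values for one patient."""
    return sum(COEFS[gene] * expression[gene] for gene in COEFS)


def split_by_median(patients):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression values (arbitrary units), invented for illustration only.
patients = {
    "P1": {"CALR": 2.0, "IL1R1": 1.5, "IFNB1": 0.1, "IFNG": 0.2},
    "P2": {"CALR": 2.0, "IL1R1": 1.5, "IFNB1": 2.0, "IFNG": 2.5},  # all four genes high
    "P3": {"CALR": 0.2, "IL1R1": 0.1, "IFNB1": 0.1, "IFNG": 0.1},  # all four genes low
    "P4": {"CALR": 0.3, "IL1R1": 0.2, "IFNB1": 1.8, "IFNG": 2.0},
}
groups = split_by_median(patients)
```

      In this toy data, P2 has high expression of all four genes yet falls in the low-risk group, because the protective genes outweigh the risk genes. This is the kind of divergence between expression-based and risk-based grouping that the reviewer observed between Figures 1A and 3H.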

      (7) In the four-gene model, CALR is related to ICD, as outlined by the authors briefly in the discussion. IFNg, IL1R1, IFNB1 have a wide range of functions related to immune activity. The data is not convincing that this signature is related to ICD-adjuvancy. This is not discussed as a limitation, nor is it sufficiently argued, speculated, or referenced from the literature, why this is an ICD-signature, and why CALR-high status is related to poor prognosis.

      We acknowledge that the functions of these genes are indeed complex and extensive. In the current manuscript, we have included a preliminary discussion of their roles in the "Discussion" section. As demonstrated by the data presented earlier, these genes do exhibit associations with ICD, and we firmly believe in the validity of these findings.

      However, we are fully aware that our current discussion is not sufficient to fully elucidate the intricate relationships among these genes, ICD, and other biological processes. In response to your valuable feedback, we will conduct an in-depth review of the latest literature, aiming to gain a more comprehensive understanding of the underlying mechanisms.

      (8) Score is spelt incorrectly in Figures 3F-J.

      Figures 3F-J have been revised as requested.

      (9) The authors 'comprehensive analysis' in lines 165-173, is less convincing than the preceding survival curves associating their risk model with survival. Their 'correlations' have no statistics.

      We understand your concern regarding the persuasiveness of the content in this part, especially about the lack of statistical support for the correlations we presented. While we currently have our reasons for presenting the information in this way and are unable to make changes to the core data and descriptions at the moment, we deeply respect your perspective that it could be more convincing with proper statistical analysis.

      (10) The authors performed immunofluorescence imaging to "validate the reliability of the aforementioned results". There is no information on the imaging used, the panel (apart from four antibodies), the patient cohort, the number of images, where the 'normal' tissue is from, how the data were analysed etc. This data is not interpretable without this information.

      a. Is CD39 in the panel? CD8, LAG3? It's not clear what this analysis is.

      The color of each antibody has been marked in Fig 2B. The cohort information and its source have been supplemented. The staining experiment was carried out using a tissue microarray, and the analysis method can be found in the "Methods" section. Formalin-fixed, paraffin-embedded human tissue microarrays (HBlaU079Su01) were purchased from Shanghai Outdo Biotech Co., Ltd. (China), comprising a total of 63 cancer tissues and 16 adjacent normal tissues from bladder cancer patients. Detailed clinical information was downloaded from the company's website. The Remmele and Stegner semiquantitative immunoreactive score (IRS) scale was employed to assess the expression levels of each marker, as detailed in Methods 2.5. CD39, CD8, and LAG3 were also stained, but the results were not presented.
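      For readers unfamiliar with the IRS, the sketch below assumes the commonly cited form of the Remmele and Stegner score: staining intensity (0 to 3) multiplied by a category for the percentage of positive cells (0 to 4), giving a range of 0 to 12. The exact cutoffs used in Methods 2.5 are not reproduced in this response, so the thresholds and function names here are assumptions.

```python
def positive_cell_category(percent_positive):
    """Map percentage of positive cells to a 0-4 category (assumed cutoffs)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive < 10:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 80:
        return 3
    return 4


def immunoreactive_score(intensity, percent_positive):
    """IRS = staining intensity (0-3) x positive-cell category (0-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity * positive_cell_category(percent_positive)


# Example: strong staining (3) in 90% of cells gives the maximum score.
example_score = immunoreactive_score(3, 90)
```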

      (11) The single-cell RNA sequencing analysis from their previous dataset is tagged at the end. CALR expression in most identified cells is interesting. Not clear what this adds to the work beyond 'we did scRNA-seq'. How were these data analysed? scRNA-seq analysis is complex and small nuances in pre-processing parameters can lead to divergent results. The details of such analysis are required!

      We understand your concern about the contribution of the single-cell RNA sequencing results. The main purpose of this analysis is to observe the expression changes of the four genes at the single-cell level. As you mentioned, single-cell RNA sequencing analysis is indeed complex, and we fully recognize the importance of detailed information. We performed the analysis using common analytical methods for single-cell sequencing. It has been supplemented in the Methods section.

      • Core Concept: The paper introduces Universalis, an AI-first, program-synthesis framework and language designed to be read by human knowledge workers but generated by AI models (LLMs). > "this paper outlines the high-level design of an AI-first, program-synthesis framework built around a new programming language, Universalis, designed for knowledge workers to read, optimized for our neural computer (Automind) to execute, and ready to be analyzed and manipulated by an accompanying set of tools."

      • Design Philosophy: Universalis prioritizes readability for domain experts over writability for professional developers, making code intuitive and easier for AI to generate accurately. > "Unlike traditional programming languages, which prioritize syntax and structure optimized for writing by professional developers, Universalis is designed with the philosophy that code should be read by domain experts and written by machines."

      • Structure and Syntax: The language syntax is analogous to "literate Excel spreadsheet formulas," embedding logical predicates inside [...] hedges surrounded by natural language explanations. > "Think of Universalis clauses as some kind of literate Excel spreadsheet formulas such as [@D is (@S-@B)] over named tables, or relations, enclosed in hedges surrounded by natural language explanation, where cells in the table correspond to variables such as @B, @S, and @D..."

      • AI Safety through Contracts: It natively embeds pre- and post-conditions (contracts) into the language, providing a formal and extensible method to ensure the logical correctness and safety of AI-generated computations. > "By embedding pre- and post-conditions directly into the language semantics, Universalis provides a pragmatic and extensible method for implementing AI safety."

      • Readable Conditionals: Conditional logic is structured as simple checklists, making decision-making processes clear and intuitive for human readers while still being executable by the AI interpreter, Automind. > "By structuring conditionals as checklists and explaining each branch in natural language, Universalis ensures that the logic is clear and intuitive for the human reader, while the Universalis interpreter Automind can still recognize them as control-flow decision points..."

      • Loopless Bulk Processing: The framework handles operations on data collections via implicit broadcasting, similar to NumPy or modern Excel, removing the need for explicit loops that can be a distraction for non-programmers. > "In Universalis, this is handled by implicitly broadcasting operations on single elements to collections, similar to how NumPy or Pandas operate in Python or how dynamic array formulas and spilled array behavior in Excel allow for loopless programming."

      • Accessible Data Queries: It features query comprehensions that use a structured, natural-language style for complex data operations like filtering and aggregation, making them more approachable than traditional query languages like SQL. > "By focusing on a structured natural-language approach for comprehending queries, Universalis ensures that even those with minimal experience in programming can perform advanced data manipulations."

      • Simplified Data Extraction: The language includes powerful pattern-matching capabilities, allowing users to easily extract specific information from complex, nested data structures like JSON without writing complicated parsing code. > "This is where Universalis excels with its pattern-matching capability. Users can simply specify the patterns they want to match within the JSON structure..."

      • Intentional Representation: Rather than generating final, concrete syntax, the Automind system produces an abstract, intentional representation of the user's intent, which can then be rendered into different formats. > "Since Universalis programs are trying to capture the high-level intent of the user as well, Automind does not generate the concrete syntax seen in the examples so far but instead creates an abstract, intentional representation of the Universalis code..."

      • Minimalist Language Design: The language is intentionally kept minimal, focusing on core features like sequential composition, implicit looping, and nested dataframe queries to maintain readability and compatibility with formal verification tools. > "For Universalis, we intentionally keep the language minimal, focusing on sequential composition, implicit looping by lifting singleton operations over collections, and fully nested dataframe queries."
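      Two of the ideas summarized above, contracts and loopless bulk processing, can be sketched in ordinary Python. This is only an analogue of the quoted clause [@D is (@S-@B)], not Universalis syntax or Automind behaviour; the helper names lift and with_contract and the particular pre- and post-conditions are hypothetical.

```python
def lift(op):
    """Lift a function on single values so it also applies element-wise to lists."""
    def lifted(*args):
        lengths = {len(a) for a in args if isinstance(a, list)}
        if not lengths:
            return op(*args)  # all arguments are single values
        if len(lengths) != 1:
            raise ValueError("collections must have the same length")
        n = lengths.pop()
        # Broadcast single values so every argument becomes a column of length n.
        columns = [a if isinstance(a, list) else [a] * n for a in args]
        return [op(*row) for row in zip(*columns)]
    return lifted


def with_contract(pre, post):
    """Wrap a function so its pre- and post-conditions are checked on every call."""
    def decorate(fn):
        def checked(*args):
            if not pre(*args):
                raise ValueError("pre-condition failed")
            result = fn(*args)
            if not post(result, *args):
                raise ValueError("post-condition failed")
            return result
        return checked
    return decorate


# Python analogue of [@D is (@S-@B)]; the conditions are invented for illustration.
@lift
@with_contract(pre=lambda s, b: s >= 0 and b >= 0,
               post=lambda d, s, b: d + b == s)
def difference(s, b):
    return s - b


s_values = [120, 95, 40]
b_values = [100, 100, 30]
d_values = difference(s_values, b_values)  # no explicit loop at the call site
```

      Because the contract wraps the single-element operation and the lifting wraps the contract, each row is checked individually, so a single bad row fails loudly instead of silently producing a wrong value.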

  12. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Abstract: Chevreul is an open-source R Bioconductor package and interactive R Shiny app for processing and visualization of single cell RNA sequencing (scRNA-seq) data. It differs from other scRNA-seq analysis packages in its ease of use, its capacity to analyze full-length RNA sequencing data for exon coverage and transcript isoform inference, and its support for batch correction. Chevreul enables exploratory analysis of scRNA-seq data using Bioconductor SingleCellExperiment or Seurat objects. Simple processing functions with sensible default settings enable batch integration, quality control filtering, read count normalization and transformation, dimensionality reduction, clustering at a range of resolutions, and cluster marker gene identification. Processed data can be visualized in an interactive R Shiny app with dynamically linked plots. Expression of gene or transcript features can be displayed on PCA, tSNE, and UMAP embeddings, heatmaps, or violin plots while differential expression can be evaluated with several statistical tests without extensive programming. Existing analysis tools do not provide specialized tools for isoform-level analysis or alternative splicing detection. By enabling isoform-level expression analysis for differential expression, dimensionality reduction and batch integration, Chevreul empowers researchers without prior programming experience to analyze full-length scRNA-seq data. Data availability: A test dataset formatted as a SingleCellExperiment object can be found at https://github.com/cobriniklab/chevreuldata.

      Reviewer 1. Dr. Luyi Tian and Dr. Hongke Peng

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is? Yes. Thus, the statement of need is well-defined, addressing both the problem (complexity of scRNA-seq data analysis without programming skills) and the intended audience (non-programming researchers in the field).

      Additional Comments: This study provides Chevreul, a Bioconductor package, for analysis and visualization of single-cell sequencing data. This package contains a Shiny app. It also provides functions, implemented with a set of Bioconductor packages for standard scRNA-seq analysis, that generate the necessary input for the Shiny app. I believe that this app can provide an additional option for researchers who work with single-cell data. However, a few comments need addressing.

      While the title emphasizes "exploratory analysis of full-length single-cell sequencing," the authors do not explicitly mention the analysis of full-length data (e.g., isoform detection or quantification). For instance, the “sce_process(...)” pipeline figure lacks specific steps addressing full-length sequencing workflows. To strengthen this claim, the authors might need to mention or summarize the methods for isoform detection and quantification, for both annotated and novel isoforms. It would be better to specify recommended tools for transcript-level analysis (e.g., transcript assembly or differential isoform usage) that integrate with Chevreul's visualization features. Meanwhile, the manuscript focuses on Smart-seq as the representative full-length method. It might also be helpful to discuss other full-length methods such as ONT nanopore sequencing or PacBio, with respect to data processing, transcript assembly, de novo usage, or potential challenges in adapting Chevreul to these platforms.

      There is another minor suggestion. Functions mentioned in the text and Figure 1 (e.g., “sce_process”, “sce_integrate”) should include parentheses (e.g., “sce_process()”) to align with R syntax conventions and clarify their roles as package functions.

      Re-review: I am happy with the revision and the authors have fully addressed my concerns.

      Reviewer 2. Dr. Tianhang Lv

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is? Yes. Chevreul provides tools for exploratory analysis of single-cell data and offers essential tools for the analysis and visualization of single-cell full-length transcriptomes. In several sections of the article, the authors discuss the key computational challenges addressed by this software. However, in the abstract, they need to emphasize the advantages of Chevreul in single-cell full-length transcript analysis (the current version lacks sufficient description). In the "Statement of Need" section, the authors could also highlight the limitations of existing single-cell full-length transcript analysis tools and introduce the advantages of Chevreul in this regard.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Yes. Although the authors have provided installation documentation, the current documentation on GitHub is not user-friendly. For example, the page at https://github.com/cobriniklab/chevreul does not include code for importing seuratTools, yet it runs the built-in function clustering_workflow from seuratTools. Additionally, the current documentation is overly simplistic and not accessible to those without programming experience.

      Is the documentation provided clear and user friendly?

      No. The authors have separated the example workflows for SingleCellExperiment objects and Seurat objects into two different GitHub projects, which is not conducive for users to understand the structure of Chevreul or to facilitate learning. Additionally, the batch integration mentioned in the article lacks specific implementation examples. The authors should at least provide implementation examples for the results mentioned in the manuscript. Furthermore, the current documentation needs further refinement to truly enable individuals without programming expertise to easily analyze single-cell data.

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      No. The authors have developed an excellent Shiny app for single-cell visualization, enabling users without programming expertise to freely export visualization results from single-cell analysis. The installation commands provided by the authors on https://github.com/cobriniklab/chevreul do indeed allow for the installation of Chevreul. However, Chevreul involves nearly 300 dependency packages, including sub-libraries developed by the authors (seuratTools, chevreulPlot, chevreuldata, chevreulProcess, chevreulShiny) as dependencies. Relying solely on the installation commands provided by the authors to install all dependency packages may result in some packages (especially large ones) failing to install due to network bandwidth issues, which is not user-friendly for those without programming experience. Additionally, could the numerous dependency packages of Chevreul potentially cause dependency conflicts with existing R environments? Should the authors recommend that users deploy Chevreul in a new R environment? It is recommended that the authors provide a step-by-step installation guide, explaining potential issues and solutions during the installation process based on the dependencies of Chevreul and its sub-libraries. By installing dependency packages step by step, users can gradually complete the installation of Chevreul. The current installation documentation is clearly not user-friendly for non-programmers and does not align with the authors' statement in the manuscript: "It differs from other scRNAseq analysis packages in its ease of installation and use." At present, the installation documentation provided by the authors may not meet the original design intent of Chevreul. Additionally, the authors should specify that Chevreul supports Seurat version V5.

      Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

      No. The authors could provide specifications for the minimum hardware requirements needed to run Chevreul, such as the number of CPU cores and the amount of memory. Additionally, the authors could offer data on the runtime of Chevreul as the volume of data increases.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified? No.

      Additional Comment. The authors have developed an R Shiny app for single-cell exploratory data analysis, which will significantly expand the application scenarios of single-cell data analysis and bring great benefits to a wide range of biology practitioners. The large size of Chevreul's installation package indicates the considerable difficulty in its development, reflecting the immense wisdom and effort the authors have invested in creating this package. Chevreul's advantages in visualization and analysis are evident, and if further developed and refined, it is certain to attract even more users in the future. To ensure that such an excellent package as Chevreul can be easily and quickly adopted by users, several suggestions for improving the documentation and enhancing user-friendliness are provided. We hope the authors can refine the package based on the reviewers' feedback and recommendations.

      Re-review: I have carefully reviewed the revised manuscript and am satisfied that all my comments have been adequately addressed. The authors have resolved the software errors reported in the original submission by updating the relevant shiny app modules. They have also enhanced the package documentation to assist users without programming experience in installing and using Chevreul. In the manuscript itself, the authors have provided detailed responses and explanations to each of my points.

      Overall, they have addressed all of my comments thoroughly. That said, a few minor issues remain in the manuscript (revised version with tracked changes) that should be corrected to ensure consistency with academic publishing standards and to help readers better learn how to use Chevreul: 1. On line 52, the placeholder “(doi reference for Shayler et al. data to be provided)” appears—did the authors forget to insert the citation or data link? 2. On line 96, would it be more appropriate to replace “SingleCellExperiments” with “SingleCellExperiment objects”? 3. On line 119, please add a space so that “databases[19–21]used” reads “databases [19–21] used.” 4. For consistency, should the second occurrence of “batchelor” on line 132 be italicized? 5. The Chevreul link is already cited in the “Availability & Implementation” section and need not be repeated in the Figure 1 legend. 6. On line 184, the gene symbol “NRL” should be set in italic Latin script. 7. On the GitHub page (https://github.com/cobriniklab/chevreul), the phrase “A demo with a developing human retina scRNA-seq dataset from Shayler et al. is available here” points to an inaccessible web demo. Restoring this demo in a future update would greatly facilitate experimental biologists in learning and using Chevreul.

    1. Summary of the essay

      In this essay, the author seeks to explain the ‘firehose’ problem in academic research, namely the rapid growth in the number of articles but also the seemingly concurrent decline in quality. The explanation, he concludes, lies in the ‘superstructure’ of misaligned incentives and feedback loops that primarily drive publisher and researcher behaviour, with the current publish or perish evaluation system at the core. On the publisher side, these include commercial incentives driving both higher acceptance rates in existing journals and the launch of new journals with higher acceptance rates. At the same time, publishers seek to retain reputational currency by maintaining consistency and therefore brand power of scarcer, legacy-prestige journals. The emergence of journal cascades (automatic referrals from one journal to another journal within the same publisher) and the introduction of APCs (especially for special issues) also contribute to commercial incentives driving article growth. On the researcher side, he argues that there is an apparent demand from researchers for more publishing outlets and simultaneous salami slicing by researchers because authors feel they have to distribute relatively more publications among journals that are perceived to be of lower quality (higher acceptance rates) in order to gain equivalent prestige to that of a higher impact paper. The state of peer review also impacts the firehose. The drain of PhD qualified scientists out of academia, compounded by a lack of recognition for peer review, further contributes to the firehose problem because there are insufficient reviewers in the system, especially for legitimate journals. Moreover, what peer review is done is no guarantee of quality (in highly selective journals as well as ‘predatory’). One of his conclusions is that there is not just a crisis in scholarly publishing but in peer review specifically and it is this crisis that will undermine science the most. 
      Add AI into the mix of this publish or perish culture, and he predicts the firehose will burst.

      He suggests that the solution lies in researchers taking back power themselves by writing more but ‘publishing’ less. By writing more he means outputs beyond traditional journal publications such as policy briefs, blogs, preprints, data, code and so on, and that these should count as much as peer-reviewed publications. He places special emphasis on the potential role of preprints and on open and more collegiate preprint review acting as a filter upstream of the publishing firehose. He ends with a call for more collegiality across all stakeholders to align the incentives and thus alleviate the pressure causing the firehose in the first place.

      General Comment

      I enjoyed reading the essay and think the author does a good job of exposing multiple incentives and competing interests in the system. Although perverse incentives have been discussed in many articles and blog posts, the author specifically focuses on some of the key commercial drivers impacting publishing and the responses of researchers to those drivers. I found the essay compellingly written and thought-provoking, although it took me a while to work through the various layers of incentives. In general, I agree with the incentives and drivers he has identified and especially his call for stakeholders to avoid polarization and work together to repair the system. Although I appreciate the need to have a focused argument, I did miss a more in-depth discussion about the equally complex layers of incentives for institutions, funders and other organisations (such as Clarivate) that also feed the firehose.

      I note that my perspective comes from a position of being deeply embedded in publishing for most of my career. This will have also impacted what I took away from the essay and the focus of my comments below.

      Main comments

      1. I especially liked the idea of a ‘superstructure’ of incentives as I think that gives a sense of the size and complexity of the problem. At the same time, by focusing on publisher incentives and researchers’ response to them he has missed out important parts of the superstructure contributing to the firehose, namely the role of institutions and funders in the system. Although this is implicit, I think it would have been worth noting more, in particular:

        • He mentions institutions and the role of tenure and promotion towards the end but not the extent of the immense and immobilizing power this wields across the system (despite initiatives such as DORA and CoARA).

        • Most review panels (researchers) assessing grants for funders are also still using journal publications as a proxy for quality, even if the funder policy states journal name and rank should not be used.

        • Many institutions/universities still rely on number and venue of publications. Although some notable institutions are moving away from this, the impact factor/journal rank is still largely relied on. This seems especially the case in China and India, for example, which have shown a huge growth in research output. Although the author discusses the firehose, it would have been interesting to see a regional breakdown of this.

        • Libraries also often negotiate with publishers based on volume of articles – i.e. they want evidence that they are getting more articles as they renegotiate a specific contract (e.g. transformative agreements), rather than e.g. also considering the quality of service.

        • Institutions are also driven by rankings in a parallel way to researchers being assessed based on journal rank (or impact factor). How university rankings are calculated is also often opaque (apart from the Leiden rankings) but publications form a core part. This further incentivises institutions to select researchers/faculty based on the number and venue of their publications in order to promote their own position in the rankings (and obtain funding).

      2. The essay is also about power dynamics and where power in the system lies. The implication in the essay is that power lies with the publishers and this can be taken back by researchers. Publishers do have power, especially those in possession of high prestige journals and yet publishers are also subject to the power of other parts of the system, such as funder and institutional evaluation policies. Crucially, other infrastructure organisations, such as Clarivate, that provide indexing services and citation metrics also exert a strong controlling force on the system, for example:

        • Only a subset of journals are ever indexed by Clarivate. Funders and institutions also use the indexing status of a journal as a proxy of quality. A huge number of journals are thus excluded from the evaluation system (primarily in the arts and humanities but also many scholar-led journals from low and middle income countries and also new journals). This further exacerbates the firehose problem because researchers often target only indexed journals. I’d be interested to see if the firehose problem also exists in journals that are not traditionally indexed (although I appreciate this is also likely to be skewed by discipline).

        • Indexers also take on the role of arbiters of journal quality and can choose to delist or list journals accordingly. Listing or delisting has a huge impact on the submission rates to journals that can be worth millions of dollars to a publisher, but it is often unclear how quality is assessed and there seems to be a large variance in who gets listed or not.

        • Clarivate are also paid large fees by publishers to use their products, which creates a potential conflict of interest for the indexer as delisting journals from major publishers could potentially cause a substantial loss of revenue if they withdraw their fees. Also Clarivate relies on publishers to create the journals on which their products are based which may also create a conflict if Clarivate wishes to retain the in-principle support of those publishers.

        • The recent delisting of eLife, even though it is an innovator and of established quality, shows the precariousness of journal indexing.

      3. All the stakeholders in the system seem to be essentially ‘following the money’ in one way or another – it’s just that the currency for researchers, institutions, publishers and others varies. Publishers – both commercial and indeed most not-for-profit – follow the requirements of the majority of their ‘customers’ (and that’s what authors, institutions, subscribers etc. are in this system) in order to ensure both sustainability and revenue growth. This may be a legacy of the commercialisation of research in the 20th century but we should not be surprised that growth is a key objective for any company. It is likely that commercial players will continue to play an important role in science and science communication; what needs to be changed are the requirements of the customers.

      4. The root of the problem, as the author notes, is what is valued in the system, which is still largely journal publications. The author’s solution is for researchers to write more – and for value to be placed on this greater range of outputs by all stakeholders. I agree with this sentiment – I am an ardent advocate for Open Science. And yet, I also think the focus on outputs per se and not practice or services is always going to lead to the system being gamed in some way in order to increase the net worth of a specific actor in the system. Preprints and preprint review itself could be subject to such gaming if value is placed on e.g. the preprint server or the preprint-review platform as a proxy of preprint and then researcher quality.

      5. I think the only way to start to change the system is to start placing much more value on both the practices of researchers (as well as outputs) and on the services provided by publishers. Of course saying this is much easier than implementing it.

      Other comments

      1. A key argument is that higher acceptance rates actually create a perverse incentive for researchers to submit as many manuscripts as possible because they are more likely to get accepted in journals with higher acceptance rates. I disagree that higher acceptance rates per se are the main incentive for researchers to publish more. More powerful is the fact that those responsible for grants and promotion continue to use quantity of journal articles as a proxy for research quality.

      2. Higher acceptance rates are not necessarily an indicator of low quality or a bad thing if it means that null, negative and inconclusive results are also published.

      3. The author states that Journal Impact Factors might have been an effective measure of quality in the past. I take issue with this because the JIF has, as far as I know, always been driven by relatively few outliers (papers with very high citations) and I don’t know of evidence to show that this wasn’t also true in the past. It also makes the assumption that citations = quality.

      4. The author asks at one point “Why would field specialization need a lower threshold for publication if the merits of peer review are constant?” I can see a case for lower thresholds, however, when the purpose of peer review is primarily to select for high impact, rather than rigour, of the science conducted. A similar case might be made for multidisciplinary research, where peer reviewers tend to assess an article from their discipline’s perspective and reject it because the part that is relevant to them is not interesting enough. Of course, this all points to the inherent problems with peer review (on which I agree with the author).

      5. The author puts his essay in appropriate context, drawing on a range of sources to support his argument. I particularly like that he tried to find source material that was openly available.

      6. He cites two papers by Bjoern Brembs to substantiate the claim that there is potentially poorer review in higher prestige journals than in lower ranked journals. These papers were published in 2013 and 2018 and the conclusions relied, in part, on the fact that higher ranked journals had more retractions. Apart from a potential reporting bias, given the flood of retractions across multiple journals in more recent years, I doubt this correlation still exists.

      7. The author works out submission rates from the published acceptance rates of journals. The author acknowledges this is only approximate and discusses several factors that could inflate or deflate it. I can add a few more variables that could impact the estimate, including: 1) the number of articles a publisher/journal rejects before articles are assigned to any editor (e.g. because of plagiarism, reporting issues or other research integrity issues), 2) the extent to which articles are triaged and rejected by editors before peer review (e.g. because they are out of scope or not sufficiently interesting to peer review), 3) the number of articles rejected after peer review, and 4) the extent to which authors independently withdraw an article at any stage of the process. When publishers publish acceptance rates, they don’t make it clear what goes into the numerator or the denominator and there are no community standards around this. The author rightly notes this process is too opaque.
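
      The effect of the undefined denominator in comment 7 can be sketched with a few lines of Python. This is only an illustration: the class `EditorialCounts`, the function `acceptance_rates` and all of the counts are hypothetical, and withdrawals are treated (as a simplification) as happening before any decision. It shows that the same journal can report quite different 'acceptance rates', and that dividing the number of accepted articles by each rate implies very different submission volumes.

```python
from dataclasses import dataclass


@dataclass
class EditorialCounts:
    """Hypothetical one-year counts for a single journal (made-up numbers)."""
    integrity_rejects: int    # 1) rejected before assignment to any editor
    desk_rejects: int         # 2) triaged and rejected by editors before peer review
    post_review_rejects: int  # 3) rejected after peer review
    withdrawn: int            # 4) withdrawn by the authors (assumed: before any decision)
    accepted: int

    @property
    def received(self) -> int:
        # Every manuscript ends up in exactly one of the five categories.
        return (self.integrity_rejects + self.desk_rejects
                + self.post_review_rejects + self.withdrawn + self.accepted)


def acceptance_rates(c: EditorialCounts) -> dict:
    """Three plausible 'acceptance rates' that differ only in the denominator."""
    decided = c.received - c.withdrawn            # manuscripts that got a decision
    reviewed = c.post_review_rejects + c.accepted  # manuscripts that were peer reviewed
    return {
        "accepted / all received": c.accepted / c.received,
        "accepted / editorial decisions": c.accepted / decided,
        "accepted / peer reviewed": c.accepted / reviewed,
    }


counts = EditorialCounts(integrity_rejects=200, desk_rejects=400,
                         post_review_rejects=150, withdrawn=50, accepted=200)

for label, rate in acceptance_rates(counts).items():
    # Submissions an outsider would infer from the accepted count and the reported rate.
    implied = counts.accepted / rate
    print(f"{label:32s} rate={rate:.1%}  implied submissions={implied:.0f}")
```

      With these made-up counts, a rate computed over peer-reviewed manuscripts only would lead an outside observer to infer far fewer submissions than the journal actually received, which is why a shared definition of numerator and denominator matters for the author's estimate.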

      Catriona J. MacCallum

      As is my practice, I do not wish to remain anonymous. Please also note that I work for a large commercial publisher and am writing this review in an independent capacity such that this review reflects my own opinions, which are not necessarily those of my employer.

    2. This is a well written and clear enough piece that may be helpful for a reader new to the topic. To people familiar with the field there is not so much which is new here. The final recommendation is not well expressed. As currently put it is, I think, wrong. But it is a provocative idea. I comment section by section below.

      The first paragraphs repeat well established facts that there are too many papers. Seppelt et al.’s contribution is missing here. It also reproduces the disingenuous claim, by a publisher’s employee, that publishers ‘only’ respond to demand. I do not think that is true. They create demand. They encourage authors to write and submit papers, as anyone who has been emailed by MDPI recently can testify. Why repeat something which is so inaccurate?

      The section on ‘upstream of the nozzle’ is rather confusing. I think the author is trying to establish if more work is being submitted. But this cannot be deduced from the data presented. No trends are given. Rejection rates will be a poor guide if the same paper is being rejected by several journals. I was also confused by the sources used to track growth in papers – why not just use Dimensions data? The final paragraph again repeats well known facts about the proliferation of outlets and salami slicing. Thus far the article has not introduced new arguments.

      Minor points in this section:

      • There are some unsupported claims, e.g. ‘This is a practice that is often couched within the seemingly innocuous guise of field specialty journals.’

      • I also do not understand the logic of this rather long sentence: ‘The expansion of journals with higher acceptance rates alters the rational calculus for researchers - all things being equal higher acceptance rates create a perverse incentive to submit as many manuscripts as possible since the underlying probability of acceptance is simply higher than if those same publications were submitted to a journal with a lower acceptance rate, and hence higher prestige.’ I suggest it be rephrased.

      The section on peer review (Who’s testing the water) is mostly a useful review of the issues. But there are some problems which need addressing. Bizarrely, when discussing whether there are enough scientists, it fails to mention Hanson et al.’s global study, despite linking to its preprint in the opening lines. Instead the author adopts a parochial North American approach and refers only to PhDs coming from the US. Trends in one country cannot adequately explain an international publishing scene. These are not the ‘good data’ the author claims. Likewise the value of data on doctorates not going on to a post-doc hinges on how many post-docs there are. That trend is not supplied. This statement ‘Almost everyone getting a doctorate goes into a non-university position after graduation’ may be true, but no supporting data are supplied to justify it. Nor do we know what country, or countries, the author is referring to.

      The section ‘A Sip from the Spring’ makes the mistaken claim that researchers hold market power. This is not true. Researchers’ institutions, their libraries and governments are the main source of publisher income. It is here that the key proposal for improvement is made: researchers can write more and publish less. But if the problem is that there is too much poorly reviewed literature then this cannot be the solution. Removing all peer review would mean there is even more material to read whose appearance is not slowed up by peer review at all. If peer review is becoming inadequate, evading it entirely is hardly a solution.

      This does not mean we should not release pre-prints. The author is right to advocate them, but the author is mistaken to think that this will reduce publishing pressures. The clue is in their name ‘pre-print’. Publication is intended.

      Missing from the author’s argument is recognition of the important role that communities of researchers form, and the roles that journals play in providing venues for conversation, disagreement and discussion. They provide a filter. Yes, researchers produce other material than publications, as the author states: ‘grant proposals, editorials, policy briefs, blog posts, teaching curricula and lectures, software code and documentation, dataset curation, and labnotes and codebooks.’ I would add email and WhatsApp messages to that list. But adding all that to our reading lists will not reduce the volume of things to be read. It must increase it. And it would make it harder to marshal and search all those words.

      But the idea is provocative nonetheless. Running through this paper, and occasionally made explicit, is the fact that publishers earn billions from their ‘service’ to academia. They have a strong commercial interest in our publishing more, and in competing with each other to produce a larger share of the market. If writing more, and publishing less, means we need to find ways of directing our thoughts so that they earn less money for publishers, then that could bring real change to the system.

      A minor point: the fire hose analogy is fully exploited and rather laboured in this paper. But it is a North American term and image that does not travel so easily.

    1. Reviewer #1 (Public review):

      Wojcik et al. conducted a working memory (WM) experiment in which participants had to press the right or left button after being presented with a square (upright) or diamond stimulus. The response mapping ('context') depended on a colour cue presented at the start of each trial. This results in an XOR task, requiring participants to integrate colour and shape information. Importantly, multiple colours could map onto the same context, allowing the authors to disentangle the (neural) representations of context from those of colour.

      The authors report that participants learn the appropriate context mappings quickly over the course of the experiment. Neural context representation is evident in the WM delay and emerges later in the experiment, unlike colour representation, which is present only during colour presentation and does not evolve over experimental time. There are furthermore results on neural geometry (averaged cross-generalized decoding) and neural dimensionality (averaged decoding after shattering all task dimensions), which are somewhat harder to interpret.

      Overall, the findings are likely Important, as they highlight the flexible and future-oriented nature of WM. The strength of support at the moment is incomplete: there are some loose ends on the context/colour generalization, and the evidence for the XOR neural representation is not (yet) well-established.

      I have one (major) concern and several suggestions for improvement.

      (1a) As the authors also acknowledge in several places, the XOR dimension is strongly correlated with motor responses, in any case toward the end of the task (and by definition for all correct trials). This should be dealt with properly. Right now, e.g. Figures 2g/i, 2h/j, 3e/g, 3f/h are highly similar, respectively, because of this strong collinearity. I would remove the semi-duplicate graphs and/or deal with this explicitly through some partial regression, trial selection, or similar (and report these correlations).

      (1b) Most worrisome in this respect is that one of the key results presented is that XOR decoding increases with learning. But task accuracy also increases, meaning that the proportion of correct trials increases with learning, meaning that the XOR and motor regressors become more similar over experimental time. This means that any classifier picking up on motor signals will be better able to do so later on in the task than earlier on. (In other words, the XOR regressor may be a noisy version of the motor regressor early on, and a more precise version of the motor regressor later on.) Therefore, the increase in XOR decoding over experimental time may be (entirely) due to an increase in similarity between the XOR and motor dimensions. The authors should either rule out this explanation, and/or remove/tone down the conclusions regarding the XOR coding increase. (Note that the takeaway regarding colour/context generalization does not depend on this analysis, fortunately.) The absence of a change in motor decoding with learning (as reported on page 11) does not remove this potential confound; if anything, it makes the confound more likely.

      (2) Bayes factors would be valuable in several places, especially with null results (p. 5) or cases with borderline-significant p-values.

      (3) The authors' interpretation of the key results implies that the abstract coding learned over the task should be relevant for behaviour. The current results do not show a particularly strong behavioural relevance of coding, to put it mildly. It might be worth exploring whether neural coding expresses itself in reaction times, rather than (in)correct responses, and reflecting on the (lack of) behavioural relevance in the Discussion.

      (4) All data and experiment/analysis code should be made available, in public repositories (i.e., not "upon request").
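
      The collinearity concern in (1b) can be made concrete with a small simulation. The sketch below is not the authors' analysis: it assumes balanced binary XOR labels, errors that occur independently at a fixed rate, and hypothetical helper names (`phi`, `simulate_block`). Under those assumptions the correlation between the XOR and motor regressors is expected to approach 2 × accuracy − 1, so it rises as participants learn even if nothing about the neural code changes.

```python
import random


def phi(xs, ys):
    """Pearson correlation between two equally long sequences of 0/1 labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5


def simulate_block(accuracy, n_trials=20000, seed=0):
    """Correlation between XOR labels and button presses at a given accuracy."""
    rng = random.Random(seed)
    xor_labels = [rng.randint(0, 1) for _ in range(n_trials)]
    # On a correct trial the button press equals the XOR label; an error flips it.
    motor = [x if rng.random() < accuracy else 1 - x for x in xor_labels]
    return phi(xor_labels, motor)


for acc in (0.5, 0.7, 0.9, 1.0):
    print(f"accuracy={acc:.1f}  XOR/motor correlation={simulate_block(acc):.2f}")
```

      Reporting this correlation per learning stage, or restricting decoding to trial subsets in which XOR and motor labels are decorrelated, would be one way to address the point.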

    1. Reviewer #1 (Public review):

      This is a theoretical study addressing the problem of constructing integrator networks for which the activity state and integrated variables display non-trivial topologies. Historically, researchers in theoretical neuroscience have focused on models with simple underlying geometries (e.g., circle, torus), for which analytical models could be more easily constructed. How these models can be generalised to complex scenarios is, however, a non-trivial question. This is furthermore a time-sensitive issue, as population recordings from the brain in complex tasks and environments increasingly require the ability to construct such models.

      I believe the authors do a good job of explaining the challenges related to this problem. They also propose a class of models that, although not fully general, overcome many of these difficulties while appearing solid and well-functioning. This requires some non-trivial mathematics, which is nevertheless conveyed in a reasonably accessible form. The manuscript is well written, and both the methodology and the code are well documented.

      That said, I believe the manuscript has two major limitations, which could be addressed in a revision. First, some of the assumptions underlying this class of models are somewhat restrictive but are not sufficiently discussed. Second, although the stated goal of the manuscript is to provide practical recipes for constructing integrator networks, the methods section is not very explicit about the specific steps required for different geometries. I elaborate on these limitations below.


      (1) The authors repeatedly describe MADE as a technique for constructing integrators of specified "topologies and geometries." What do they mean by "geometries"? Intuitively, I would associate geometry with properties beyond topology, such as embedding dimensionality or curvature. However, it is unclear to me to what extent these aspects are explicitly specified or controlled in MADE. It seems that geometry is only indirectly defined via the connectivity kernel, which itself obeys certain constraints (e.g., limited spatial scale; see below). I believe it is important for the authors to clarify what they mean by "geometry." They should also specify which aspects are under their control, and whether, in fact, all geometries can be realized.


      (2) The authors make two key assumptions: that connectivity is purely inhibitory and that the connectivity kernel has a small spatial scale. They state that under these conditions, the homogeneous fixed point becomes unstable, leading to a non-periodic state. However, it seems to me that they do not demonstrate that this emergent state is necessarily a bump localized in all manifold dimensions -- although this is assumed throughout the manuscript. Are other solutions possible or observed? For example, might the network converge to states that are localized in one dimension but extended in another, yielding e.g., stripe-like activity in the plane rather than bumps? In other words, does the proposed recipe guarantee convergence to bumps? This is a critical point and should be clarified.


      (3) Related to the question above: What are the failure modes when these two assumptions are violated? Does the network always exhibit runaway activity (as suggested in the text), or can other types of solutions emerge? It would be useful if the authors could briefly discuss this.


      (4) Again, related to the question above: can this formalism be extended to activity profiles beyond bumps? For example, periodic fields as seen in grid cells, or irregular fields as observed in many biological datasets -- particularly in naturalistic environments? These activity profiles are of key importance to neuroscientists, so I believe this is an important point that should at least be addressed in the Discussion. Can MADE be naturally extended to these scenarios? What are the challenges involved?


      (5) Line 119: "Since σ is the only spatial scale being introduced in the dynamics, we qualitatively expect that a localized bump state within the ball will have a spatial scale of O(σ)."
      Is this statement always true? I understand that the spatial scale of the synaptic inputs exchanged via recurrent interactions (i.e., the argument of the function f in Equation 1) is characterised by the spatial scale σ. But the non-linear function f could modify that spatial scale -- for example, by "cutting" the bump close to its tip. Where am I wrong? Could the authors clarify?


      (6) The authors provide beautiful intuition about the problem of constructing integrators on non-trivial topologies and propose a mathematically grounded solution using Killing vectors. Of course, solutions based on Killing vectors are more complex than those with constant offsets, which raises the question: Is the brain capable of learning and handling such complex structures? Perhaps the authors could speculate in the Discussion about the biological plausibility of these mechanisms.


      (7) A great merit of this paper is that it provides mathematical tools for neuroscience researchers to build integrators on non-trivial geometries. I found that, although all the necessary information is present in the Methods, the authors could improve the presentation by schematizing the steps required to build each type of model. It would be extremely useful if, for each considered geometry, the authors provided a short list of required components: the manifold P, the choice of distance, and the connectivity offsets defined by the Killing vectors. Currently, this information is presented, but scattered (not grouped by geometry).
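
      The question in point (5) can be illustrated with a toy calculation that is not taken from the manuscript. Assume a Gaussian input profile exp(−x²/(2σ²)) with unit peak and a rectifying nonlinearity that is zero below a threshold θ (both are assumptions made here, as is the helper name `suprathreshold_width`). The supra-threshold region then has width 2σ·sqrt(2·ln(1/θ)): proportional to σ for a fixed θ, but with a prefactor that shrinks towards zero as θ approaches the peak, which corresponds to "cutting" the bump close to its tip.

```python
import math


def suprathreshold_width(sigma, theta):
    """Width of the region where exp(-x**2 / (2 * sigma**2)) exceeds theta.

    Assumes a unit-peak Gaussian profile and a threshold 0 < theta < 1.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 / theta))


for theta in (0.1, 0.5, 0.9, 0.99):
    # The ratio width / sigma depends on theta only, not on sigma.
    ratios = [suprathreshold_width(s, theta) / s for s in (0.5, 1.0, 2.0)]
    print(f"theta={theta:<4}  width/sigma=" + ", ".join(f"{r:.3f}" for r in ratios))
```

      In this toy setting the bump scale stays proportional to σ, but the proportionality constant is set by the nonlinearity, so whether the O(σ) estimate is useful in practice depends on how close the effective threshold sits to the bump's peak.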

    1. Reviewer #1 (Public review):

      Summary of the paper:

      The paper presents an elegant task designed to investigate humans' ability to generalize knowledge of learned graph structures to new experiences that share the same structure but are built from different stimuli. Using behavior and MEG recordings, the authors test evidence for neural representation and application of structural knowledge.

      Review overview:

      While the task design is elegant, it isn't clear to me that the data support all the claims made in the paper. I have detailed my concerns below.

      Major concerns

      (1) The authors claim that their findings reveal "striking learning and generalization abilities based on factorization of complex experiences into underlying structural elements, parsing these into distinct subprocesses derived from past experience, and forming a representation of the dynamical roles these features play within distinct subprocesses." And "neural dynamics that support compositional generalisation, consistent with a structural scaffolding mechanism that facilitates efficient adaption within new contexts".

      a. First, terms used in these example quotes (but also throughout the paper) do not seem to be well supported by data or the task design. For example, terms such as 'compositional generalisation' and 'building blocks' have important relevance in other papers by (some of) the same authors (e.g., Schwartenbeck et al., 2023), but in the context of this experiment, what is 'composition'? Can the authors demonstrate clear behavioural or neural evidence for compositional use of multiple graph structures, or alternatively remove reference to these terms? In the current paper, it seems to me that the authors are investigating abstract knowledge for singular graph structures (together with the influence of prior learning), as opposed to knowledge for the compound, more complex graph formed from the product of two simpler graphs.

      b. While I would like to be convinced that this data provides evidence for the transfer of abstract, structural knowledge, I think the authors either need to provide more convincing evidence or tone down their claims.

      Specifically:

      (i) Can the increase in neural similarity between stimuli mapping to the same abstract structural sub-process not be explained by temporal proximity in experiencing the transitions (e.g., Cai et al., 2016)? Indeed, behavior seems to be dominated by direct experience of the structure as opposed to applying abstract knowledge of equivalent structures (and, as a result, there is little difference in behavioural performance between experience and inference probes).

      (ii) The strongest evidence for neural representation of abstract task structures seems to be the increase in similarity by transition type. But this common code for 'transition type' is only observed for 6-bridge graphs and only for experienced transitions. There was no significant effect in inference probes. Therefore, there doesn't seem to be evidence for the application of a knowledge scaffold to facilitate transfer learning. Instead, the data reflects learning from direct experience and not generalisation.

      (iii) The authors frequently suggest that they are providing insight into temporal dynamics, but there is no mention of particular oscillations or particular temporal sequences of neural representation that support task performance.

      (2) Regardless of point (b), can the authors provide more convincing evidence for a graph structure being represented per se (regardless of whether this representation is directly experienced or inferred)? From Figure 3C, it seems that the model RDM doesn't account for relative distance within the graph. Do they see evidence for distance coding? Can they reconstruct the graph from representational patterns using MDS?

      (3) In general, the figures are not very clear, and the outcome from statistical tests is not graphically shown. The paper would be easier to digest if, for example, Figures 1-2 were made clearer and statistical significance relative to chance were indicated throughout. To give two examples: (i) Figure 1 should clearly indicate what is meant by observed and held-out transitions and whether it is just the transition or also the compound that is new to the participant. (ii) Figure 2D-E could be shown with relevant comparisons and simpler statistical comparisons. Currently, it is hard to follow without carefully reading the legend.
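
      The distance-coding check asked for in point (2) could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the six-node ring is a hypothetical toy graph (not one of the graphs used in the study), the 'neural' dissimilarities are made-up numbers standing in for pairwise MEG pattern distances, and all function names are invented here. The idea is to build a model RDM from shortest-path distances and rank-correlate it with the neural RDM.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency):
    """All-pairs shortest-path lengths of an unweighted graph (breadth-first search)."""
    dist = {}
    for source in adjacency:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        dist[source] = seen
    return dist


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson correlation of tie-aware ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy graph: a ring of six states.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
dist = shortest_path_lengths(ring)
pairs = list(combinations(ring, 2))

# Model RDM: graph distance for every pair of states (upper triangle only).
model_rdm = [dist[a][b] for a, b in pairs]

# Made-up 'neural' RDM: distance-dependent dissimilarity plus a small perturbation.
toy_neural_rdm = [0.1 * d + 0.03 * (-1) ** k for k, d in enumerate(model_rdm)]

print(f"Spearman(model, toy neural) = {spearman(model_rdm, toy_neural_rdm):.2f}")
```

      A clearly positive rank correlation on real data, assessed against a permutation null, would be evidence for distance coding beyond a binary 'same versus different transition type' model; the same neural RDM could then be passed to MDS to visualise the recovered graph.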

    1. Joint Public Review:

      Summary:

      This manuscript couples a 32-parameter model with simulation-based inference (SBI) to identify parameter changes that can compensate for three canonical hyperexcitability perturbations (interneuron loss, recurrent-excitatory sprouting, and intrinsic depolarisation). The study demonstrates a careful implementation of SBI and offers a practical ranking of "compensatory levers" that could, in principle, guide therapeutic strategies for epilepsy and related network disorders.

      Strengths:

      (1) By analysing three mechanistically distinct hyper-excitable regimes within the same modelling and inference framework, the work reveals how different perturbations require different compensatory interventions.

      (2) The authors adopt posterior estimation to systematically rank the efficiency of different mechanisms in balancing hyperexcitability.

      (3) Code and data are available.

      Weaknesses:

      (1) A highly dense presentation of the simulated models and undefined symbols makes it hard for readers outside the modelling community to follow the biological message. An illustration of the models, accompanied by some explanations and references to the main equations and parameters discussed in this paper, would make the first section much more straightforward.

      (2) This methodology appears to be a brute-force approach, requiring millions of simulations to tune 32 parameters in a network of 500-700 cells. It isn't scalable. Moreover, the authors did not use cross-validation, which, with a relatively low increase in computational cost, would provide a quantitative measure as to how well it generalizes; this combination raises doubts about both scalability and reliability.

      (3) Several parameters remain so broadly distributed after fitting that the model cannot say with confidence which specific changes matter. Therefore, presenting them as "compensatory levers" is somewhat questionable.

      (4) Every conclusion is drawn from simulated data; without testing the predictions on recordings, we have no evidence that the proposed interventions would work in real neural tissue. Because today we cannot diagnose which of the three modelled pathological regimes is actually present in vivo, the paper's recommendations cannot yet be used to guide therapy.

  13. content.ebscohost.com content.ebscohost.com
    1. This study examined the frequency of code-switching by Spanish–English-speaking children as a function of language proficiency in each language and diagnosis (developmental language disorder [DLD] or typical language development [TLD]).

      Main Idea of the article.

    2. However, it has been well documented that bilingual children with DLD and their peers with TLD will switch between the two languages in formal and informal language

      code switching or "broken" English

    3. Purpose: This study examined the frequency of code-switching by Spanish–English-speaking children as a function of language proficiency in each language and diagnosis (developmental language disorder [DLD] or typical language development [TLD]). Method: Sixty-two Spanish–English-speaking children, 5–7 years of age, participated in this study (24 with DLD and 38 with TLD). Language samples were used to determine the level of language proficiency in each language as a continuum of performance. Correlational analyses were conducted to evaluate the relationship between Spanish and English language proficiency and the frequency of code-switching considering the total number of code-switched words, intrasentential code-switching (i.e., number of sentences/C-units with code-switched parts), and intersentential code-switching (i.e., code-switched sentences/C-units). Negative binomial and zero-inflated Poisson regressions were conducted to evaluate whether the frequency of code-switched words, code-switched parts, and code-switched sentences is different across children with DLD and their peers with TLD, controlling for Spanish and English proficiency. Results: When Spanish was the target language, lower proficiency in Spanish was associated with higher frequency of code-switches to English for the group with DLD, but not for their peers with TLD who code-switched to English regardless of their Spanish and English proficiency. There were no statistically significant effects of diagnosis on the frequency of code-switches. Conclusion: Results indicate that code-switching occurs similarly across children with DLD and their peers with TLD; therefore, the frequency and type of code-switches should not be used as an indicator of DLD.

      The purpose of this research and the results from the research.

    1. University of Texas finance professor John M. Griffin and his doctoral student Amin Shams detailed Tether’s activities in a 2018 paper. For the period of March 1, 2017 to March 31, 2018, Griffin and Shams found plausible evidence to conclude that a few actors printed tethers without real dollar backing to artificially rescue Bitcoin (BTC) when its price fell and stimulate its overall growth. The trading activity was concentrated on Bitfinex with trading patterns not seen on other exchanges. Griffin and Shams also noted the dubious nature of Tether’s reserves and demonstrated unbacked issuance. So long as no one could tell the difference between a tether token and a real dollar, these unbacked tokens could be traded as if they were real dollars. Think of it as a cheat code in a video game for unlimited gold when every other player must grind quests to get them.

      Perfect, expansionary fiscal bitcoin issuance

    1. The Utopian Vision of Smalltalk

      Smalltalk as a vision, not just a language:

      “Smalltalk is a vision of the world as it should be, not necessarily as it is but the way it's supposed to be.”

      A future imagined where Smalltalk has won:

      “As we all know, Smalltalk has won and a new age, a Utopia is upon us.”


      Language and System Design Philosophy

      Smalltalk wasn’t originally a full language, but a live object system:

      “Small Talk was never a programming language it was a programming system… there is no way to express a class as a linguistic construct.”

      Early Smalltalk used a clever hack for class definition:

      “Yes a class just happens to be this object that knows how to make other objects… but it is a hack and it's important to recognize that.”

      Textual syntax was later added to enable integration with tools like source control:

      “We came up with an actual textual Syntax for Smalltalk… interacting with Source Control Systems became really easy.”


      Resolving Dialect & Library Fragmentation

      Overcoming dialect fragmentation:

      “We realized all this and came together to define a Common Language.”

      Library standardization was harder but necessary:

      “The situation with the libraries was a bit of a mess… but we realized that the benefits of having a standard… outweighed [vendor-specific advantages].”


      Reflection and Mirrors

      Original reflection API mixed base and meta levels:

      “These class objects… are playing two roles… that architecture was replaced with mirrors.”

      Mirrors cleanly separate base-level and meta-level:

      “We clearly separated the base level and the meta level… that has all kinds of advantages for deployment, distribution, security.”


      Mobile and UI Integration

      Early design enabled seamless remote mobile dev:

      “We were already running images that could manipulate other images… fantastic development experiences almost immediately.”

      Binding to native UI was essential for adoption:

      “We realized early on that we should… bind to Native stuff… they run natively on all the platforms.”

      Tool evolution toward navigation-based UIs:

      “We wouldn’t dream of using the 40-year-old design of a browser… we evolved our tools.”


      Foreign Function Interfaces and “Alien” Objects

      Unified model for system integration:

      “These things outside of Smalltalk… they're second class objects… but they were objects.”

      Alien objects replaced clunky primitive syntax:

      “We didn’t need these hacky things like the Primitive syntax… everything was an object.”


      Modularity and Deployment

      Object ecosystems are difficult to transfer without structure:

      “Fish and objects live in an environment… if you want the fish to be transferred and survive you have to bring its friends.”

      Deployment as serialized object graphs:

      “We realized that an application… was also an object… we just had to serialize it.”

      Avoided “extraneous concepts like packages”:

      “We used the concepts of classes and objects to solve our distribution and modularity problems.”


      Optional Typing and Pluggable Type Systems

      Types as useful but not mandatory:

      “Types can be useful at times… we certainly don’t want them telling us how to live.”

      Static analyses used for optimization:

      “Extra information helps… we could have multiple analyses that didn’t conflict.”


      Web and Live Environments

      Ahead of the curve with web-based IDEs:

      “We built complete programming environments with all their features running in the browser long before anyone else.”

      True liveness beyond class browsers:

      “We never want to look at Dead code… environments evolved to show Exemplar values.”

      Integration of ML into the environment:

      “We started to incorporate [ML] in our live programming environments.”


      Performance and JIT

      Ahead of AOT compilation trend:

      “We had systems that would… have a database of compiled methods… ready to apply them instantly on Startup.”

      Addressed Apple’s no-JIT constraint early:

      “We already had techniques for that… just deploy that and turn off the jit.”


      Security through Object Capabilities

      Reinvention of capability-based security:

      “The capability you want is an object… the only damage you can do is through a message send.”

      Enforced encapsulation and access control:

      “We needed an access control model… public, protected, and private messages.”


      Education and Longevity

      Revolutionary ideas in education:

      “Smalltalk had these ideals of Education from day one.”

      Avoided student drop-off by staying dominant:

      “We short circuited [students leaving] because of all our previous successes.”

      Hypertext-based live documentation:

      “We could put live widgets embedded in text long before the worldwide web.”


      Satirical Punchline

      Entire talk is satirical utopia:

      “This is the world as it should be… imagine if the world wasn’t like the way I described it…”

      Ironic jab at real-world language use:

      “Think if one of the world’s largest… sites ran on PHP… we all know Facebook runs on Smalltalk.”

    1. Unlike many programmers, I love code review. I find that it is one of the best parts of the process of programming. I can help people learn, and develop their skills, and learn from them, and appreciate the decisions they made, develop an impression of a fellow programmer’s style. It’s a great way to build a mutual theory of mind. Of course, it can still be really annoying; people make mistakes, often can’t see things I find obvious, and in particular when you’re reviewing a lot of code from a lot of different people, you often end up having to repeat explanations of the same mistakes. So I can see why many programmers, particularly those more introverted than I am, hate it. But, ultimately, when I review their code and work hard to provide clear and actionable feedback, people learn and grow and it’s worth that investment in inconvenience. The process of coding with an “agentic” LLM appears to be the process of carefully distilling all the worst parts of code review, and removing and discarding all of its benefits. The lazy, dumb, lying robot asshole keeps making the same mistakes over and over again, never improving, never genuinely reacting, always obsequiously pretending to take your feedback on board.

      a junior engineer you don't keep around

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa.

      Strengths:

      The experimental methods and data analysis appear appropriate. The authors promote their study as unprecedented in its size and technical precision.

      We do not understand the statement "the authors promote" as if there was a doubt about this. If there is a doubt, we would welcome seeing it specified.

      Weaknesses:

      The manuscript does not present a clear set of novel evolutionary conclusions. The major findings recapitulate many previous comparative transcriptomics studies - gene expression variation is prevalent between individuals, sexes, and species; and genes with sex-biased expression evolve more rapidly than genes with unbiased expression - but it is not clear how the study extends our understanding of gene expression or its evolution.

      There have been no "previous comparative transcriptomics studies" at a micro-evolutionary scale in animals, hence, we do not "replicate" these. And our contrast between somatic and gonadal patterns reveals insights that have not been recognized before, namely that gonadal sex-specific expression turnover is actually not faster than the corresponding non-sex-specific turnover. We have now further clarified this distinction throughout the text and have also adapted the title of the paper accordingly.

      We agree with the overall statement that "gene expression variation is prevalent between individuals, sexes, and species" but the aspect of "sex-biased gene expression between individuals" has not been systematically analysed before in such a context.

      Concerning the statement that "genes with sex-biased expression evolve more rapidly than genes with unbiased expression", we note that this is mostly derived from gonadal data and that there is no study that has quantified this so far at a population level and between subspecies in comparison to somatic data.

      Our results show further that previous assumptions of a substantial set of genes with sex-biased expression conserved between mice and humans are due to underestimating the convergence issues when there is an extremely fast turnover of sex-biased gene expression. This has a major implication for using mice as a model for gender-specific medicine questions in humans.

      Many gene expression differences between individual animals are selectively neutral, because these differences in mRNA concentration are buffered at the level of translation, or differences in protein abundance have no effect on cellular or organismal function. The hypothesis that sex-biased genes are enriched for selectively neutral expression differences is supported by the excess of inter-individual expression variance and inter-specific expression differences in sex-biased genes.

      This statement repeats a statement from the first round of reviews. We had added new data and extensive discussion on this topic. We do not understand why this has not been taken into account. In fact, a major strength of our paper is that it shows that most sex-biased gene expression differences are not neutral!

      There are two major issues here. First, to identify sex-biased gene expression in the first place, we (and all other papers in the field) use the neutral model as null-hypothesis. Genes that are not compatible with this null-hypothesis are considered sex-biased. In contrast to most previous papers, we have the possibility to take into account the variances between individuals to add an additional significance test. Hence, we can apply a much more rigorous two-step process: a ratio-cutoff plus a Wilcoxon rank sum test with correction for multiple testing to identify significant deviations from the null-hypothesis. We have added some additional statements in the Results and Discussion sections to emphasize this. Second, by focusing on the genes that are not following a neutral model, the variance and divergence data support the action of selection, rather than neutral drift.
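The two-step call described in this response (ratio cutoff, then a rank-sum test with multiple-testing correction) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the 1.25 cutoff is the value quoted in the review, while the use of mean expression for the ratio, the Benjamini-Hochberg correction, the 0.05 threshold, and all gene names and values are hypothetical. The rank-sum p-value is computed exactly by enumerating group splits, which is only practical for small samples.

```python
from itertools import combinations
from statistics import mean

RATIO_CUTOFF = 1.25  # cutoff quoted in the review text
ALPHA = 0.05         # assumed significance threshold (not stated in the source)


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    positions = {}
    for idx, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(idx)
    return [mean(positions[v]) for v in values]


def rank_sum_p(x, y):
    """Exact two-sided Wilcoxon rank-sum p-value via full enumeration."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    centre = n1 * (n + 1) / 2  # expected rank sum of group x under the null
    observed = abs(sum(ranks[:n1]) - centre)
    sums = [sum(c) for c in combinations(ranks, n1)]
    return sum(abs(s - centre) >= observed for s in sums) / len(sums)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_sex_biased(expression):
    """expression maps gene -> (female values, male values)."""
    names = list(expression)
    qvals = benjamini_hochberg([rank_sum_p(*expression[g]) for g in names])
    calls = {}
    for gene, q in zip(names, qvals):
        females, males = expression[gene]
        ratio = mean(females) / mean(males)
        if q < ALPHA and ratio >= RATIO_CUTOFF:
            calls[gene] = "female-biased"
        elif q < ALPHA and ratio <= 1 / RATIO_CUTOFF:
            calls[gene] = "male-biased"
        else:
            calls[gene] = "unbiased"
    return calls


# Hypothetical expression values, five individuals per sex.
example = {
    "gene1": ([10, 11, 12, 13, 14], [5, 6, 7, 8, 9]),               # clear shift
    "gene2": ([10, 12, 9, 11, 10.5], [10.2, 11.5, 9.5, 10.8, 10.1]),  # ratio near 1
    "gene3": ([5, 20, 6, 30, 7], [6, 8, 7, 9, 10]),                 # high ratio, noisy
}
calls = call_sex_biased(example)
```

In this toy input, `gene3` passes the ratio cutoff but fails the significance test, which is the kind of case the second step is meant to filter out.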

      A higher rate of adaptive coding evolution is inferred among sex-biased genes as a group, but it is not clear whether this signal is driven by many sex-biased genes experiencing a little positive selection, or a few sex-biased genes experiencing a lot of positive selection, so the relationship between expression and protein-coding evolution remains unclear.

      Again, there are two major issues here. First, the distribution of alpha-values shown in Figure 3B is rather homogeneous, i.e. there is no support for a scenario in which the average is driven by only a few genes.

      Second, it seems that the referee wants to see an analysis where dn/ds ratios are broken down for every single gene. This has been done in previous papers, but it is now understood that this procedure is fraught with error because of the demographic contingencies inherent to natural populations that can yield wrong results for individual loci. We have added some statements to the text to clarify this further.

      It is likely that only a subset of the gene expression differences detected here will have phenotypic effects relevant for fitness or medicine, but without some idea of how many or which genes comprise this subset, it is difficult to interpret the results in this context.

      It is the basic underlying assumption for the whole research field that significantly sex-biased genes are phenotypically relevant for fitness, since they would otherwise not be sex-biased in the first place.

      Throughout the paper the concepts of sexual selection and sexually antagonistic selection are conflated; while both modes of selection can drive the evolution of sexually dimorphic gene expression, the conditions promoting and consequence of both kinds of selection are different, and the manuscript is not clear about the significance of the results for either mode of selection.

      We had explained in our previous response that our data collection was not designed to distinguish between these two processes. But given that the issue is being brought up again, we have now added some discussion on this issue.

      The manuscript's conclusion that "most of the genetic underpinnings of sex-differences show no long-term evolutionary stability" is not supported by the data, which measured gene expression phenotypes but did not investigate the underlying genetic variation causing these differences between individuals, sexes, or species.

      We agree that - under a strict definition - our use of the term "genetic underpinning" in this conclusion sentence can be criticized. The most correct term would be "transcriptional underpinnings", but of course, given that it is the current practice of the whole field to assume that "transcriptional" is part of the overall genetics, we do not consider our initial statement as incorrect. Still, we have changed the term accordingly.

      Furthermore, most of the gene expression differences are observed between sex-specific organs such as testes and ovaries, which are downstream of the sex-determination pathway that is conserved in these four mouse species, so these conclusions are limited to gene expression phenotypes in somatic organs shared by the sexes.

      Yes - correct. But the whole focus of the paper is on somatic expression, i.e. organs that share the same cell compositions. Of course, the comparison between gonadal organs is conflated by being composed of different cell types. We have extended the discussion of this point.

      The differences between sex-biased expression in mice and humans are attributed to differences in the two species effective population sizes; but the human samples have significantly more environmental variation than the mouse samples taken from age-matched animals reared in controlled conditions, which could also explain the observed pattern.

      These are indeed the two alternative explanations that we had discussed (last paragraph of the discussion section, now the penultimate paragraph).

      The smoothed density plots in Figure 5 are confusing and misleading. Examining the individual SBI values in Table S9 reveals that all of the female and male SBI values for each species and organ are non-overlapping, with the exception of the heart in domesticus and mammary gland in musculus, where one male and one female individual fall within the range of the other sex. The smoothed plots therefore exaggerate the overlap between the sexes;

      Smoothing across discrete values is an entirely standard procedure for continuous variables. It allows one to visualize the inherent data trends that cannot easily be gleaned from simple inspection of the actual values. This is a mathematical procedure, not an "exaggeration". We used the same smoothing procedure for all the comparisons, and it is clear that the distributions between females and males of the sex organs and a few somatic organs are well separated (non-overlapping), which serves as a control.

      in particular, the extreme variation shown in the SBI in the mammary glands in spretus females and spicilegus males is hard to understand given the normalized values in Table S3. The R code used to generate the smoothed plots is not included in the Github repository, so it is not possible to independently recreate those plots from the underlying data.

      We apologize that there was indeed an error in the Figure - the columns for SPR and SPI were accidentally interchanged. We have corrected this figure. Generally, the smoothened patterns we show are easily verified by looking up the respective primary values. We apologize that the code lines for the plots were accidentally omitted. We have used a standard function from ggplot2: geom_density, with "adjust=3, alpha=0.5" for all plots and included this description in the Methods. We have now added this to the R code in the GitHub repository.
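To make the role of the `adjust` setting concrete, here is a minimal sketch of a Gaussian kernel density estimate in which the bandwidth is a rule-of-thumb value multiplied by `adjust`. It assumes the plotting function falls back to a Silverman-style default bandwidth when none is given; the two groups of nine values are hypothetical and are not taken from Table S9.

```python
import math
from statistics import quantiles, stdev


def rule_of_thumb_bandwidth(values, adjust=1.0):
    """Silverman-style bandwidth scaled by `adjust` (assumed default rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = min(stdev(values), (q3 - q1) / 1.34)
    return adjust * 0.9 * spread * len(values) ** -0.2


def kde(values, grid, adjust=1.0):
    """Gaussian kernel density estimate evaluated at each grid point."""
    h = rule_of_thumb_bandwidth(values, adjust)
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]


def density_overlap(a, b, adjust, lo=-1.0, hi=2.0, step=0.005):
    """Approximate the shared area under the two smoothed densities."""
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    fa, fb = kde(a, grid, adjust), kde(b, grid, adjust)
    return step * sum(min(p, q) for p, q in zip(fa, fb))


# Hypothetical index values: nine per sex, with no overlap in the raw data.
females = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
males = [0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

overlap_default = density_overlap(females, males, adjust=1.0)
overlap_smooth = density_overlap(females, males, adjust=3.0)
```

With these toy values the shared area grows as `adjust` increases, even though the raw values of the two groups never overlap, so the amount of visual overlap in a smoothed plot depends on the chosen bandwidth as well as on the data.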

      The correlations provided in Table S9 are confusing - most of the reported correlations are 1.0, which are not recovered when using the SBI values in Table S9, and which does not support the manuscript's assertion that sex-biased gene expression can vary between organs within an individual. Indeed, using the SBI values in Table S9, many correlations across organs are negative, which is expected given the description of the result in the text.

      There is a misunderstanding here. The tables do not report correlations, but only p-values for correlations, the raw ones and the ones after corrections for multiple testing. P = 1.0 means no significant correlation. We have adjusted the caption of this table to clarify this further.

      Reviewer #3 (Public review):

      This manuscript reports interesting data on sex differences in expression across several somatic and reproductive tissues among 4 mice species or subspecies. The focus is on sex-biased expression in the somatic tissues, where the authors report high rates of turnover such that the majority of sex-biased genes are only sex-biased in one or two taxa. The authors show sex-biased genes have higher expression variance than unbiased genes but also provide some evidence that sex-bias is likely to evolve from genes with higher expression variance. The authors find that sex-biased genes (both female- and male-biased) experience more adaptive evolution (i.e., higher alpha values) than unbiased genes. The authors develop a summary statistic (Sex-Bias Index, SBI) of each individual's degree of sex-bias for a given tissue. They show that the distributions of SBI values often overlap considerably for somatic (but not reproductive) tissues and that SBI values are not correlated across tissues, which they interpret as indicating an individual can be relatively "male-like" in one tissue and relatively "female-like" in another tissue.

      This is a good summary of the data, but we are puzzled that it does not include the completely new module analysis and the finding of extremely fast evolution of sex-biased somatic gene expression compared to the gonadal one.

      Though the data are interesting, there are some disappointing aspects to how the authors have chosen to present the work. For example, their criterion for sex-bias requires an expression ratio of one sex to the other of 1.25. A reasonably large fraction of the "sex-biased genes" have ratios just beyond this cut-off (Fig. S1). A gene which has a ratio of 1.27 in taxa 1 can be declared as "sex-biased" but which has a ratio of 1.23 in taxa 2 will not be declared as "sex-biased". It is impossible to know from how the data are presented in the main text the extent to which the supposed very high turnover represents substantial changes in dimorphic expression. A simple plot of the expression sex ratio of taxa 1 vs taxa 2 would be illuminating but the authors declined this suggestion.

      Choosing a cutoff is the standard practice when dealing with continuously distributed data. As we have pointed out, we looked at various cutoff options and decided to use the present one, based on the observed data distributions. Note that some studies have used even lower ones (e.g. 1.1). To visualize the data distribution, we had provided the overall distribution of ratios, because one would have to look at many more plots otherwise. But we have now also added individual plots as Figure 1, Figure supplement 2, as requested. They confirm what is also evident from the overall plots, namely that most ratio changes are larger than the incremental values suggested by the reviewer. Note that the original data are of course also available for inspection.

      I was particularly intrigued by the authors' inference of the proportion of adaptive substitutions ("alpha") in different gene sets. They show alpha is higher for sex-biased than unbiased genes and nicely show that the genes that are unbiased in focal taxa but sex-biased in the sister taxa also have low alpha. It would be even stronger evidence that sex-bias is associated with adaptive evolution to estimate alpha for only those genes that are sex-biased in the focal taxa but not in the sister taxa (the current version estimates alpha on all sex-biased genes within the focal taxa, both those that are sex-biased and those that are unbiased in the sister taxa).

      We have added the respective values in the results section, but since fewer genes are involved, they are less comparable to the other sets of genes. Still, the tendencies remain.

      The authors' Sex Bias Index is measured in an individual sample as: SBI = median(TPM of female-biased genes) - median(TPM of male-biased genes). This index has some strange properties when one works through some toy examples (though any summary statistic will have limitations). The authors do little to jointly discuss the merits and limitations of this metric. It would have been interesting to examine their two key points (degree of overlapping distributions between sexes and correlation across tissues) using other individual measures of sex-bias.

      We had responded to this comment before (including the explanation that it has no strange properties when one applies the normalization that is now implemented) and we have added a whole section devoted to the discussion of the merits of the SBI. We do not know which other "individual measures of sex-bias" this should be compared to. Still, we have now added a paragraph in the discussion about using PCA as an alternative to show that this would result in similar conclusions, but is technically less suitable for this purpose.
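The index as written in the reviewer's comment can be computed in a few lines. This sketch follows only that formula; the normalization mentioned in the response is not specified here and is therefore omitted, and the gene names and TPM values are hypothetical.

```python
from statistics import median


def sex_bias_index(tpm, female_biased, male_biased):
    """SBI for one sample: median TPM of female-biased genes minus
    median TPM of male-biased genes (no normalization applied)."""
    return (median(tpm[g] for g in female_biased)
            - median(tpm[g] for g in male_biased))


# Hypothetical TPM values for a single individual and tissue.
sample_tpm = {"geneA": 10.0, "geneB": 14.0, "geneC": 12.0,
              "geneD": 3.0, "geneE": 5.0}
sbi = sex_bias_index(sample_tpm,
                     female_biased=["geneA", "geneB", "geneC"],
                     male_biased=["geneD", "geneE"])
```

A positive value means the female-biased set is more highly expressed in that sample than the male-biased set; because both terms are medians, the result depends on how many genes are in each set and on their expression scale, which is one reason a normalization step matters.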

      Figure 5 shows symmetric gaussian-looking distributions of SBI but it makes me wonder to what extent this is the magic of model fitting software as there are only 9 data points underlying each distribution. Whereas Figure 5 shows many broadly overlapping distributions for SBI, Figure 6 seems to suggest the sexes are quite well separated for SBI (e.g., brain in MUS, heart in DOM).

      We use a standard fitting function in R (see above), which tries to fit a normalized distribution, but this function can also add an additional peak when the data are too heterogeneous (e.g. Mammary in Figure 7).

      Fig. S1 should be shown as the log(F/M) ratio so it is easier to see the symmetry, or lack thereof, of female and male-biased genes.

      The log will work differently for values <1, compared to values >1 when used in a single plot. We have now generated combined plots with symmetric values to allow a better comparability.

      It is important to note that for the variance analysis that IQR/median was calculated for each gene within each sex for each tissue. This is a key piece of information that should be in the methods or legend of the main figure (not buried in Supplemental Table 17).

      We have now moved these descriptions into the Methods section.
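The variance measure discussed in this exchange, IQR divided by the median, computed per gene within each sex and tissue, can be sketched as below. The quartile convention (linear interpolation over the sample) is an assumption, since the source does not state one, and the grouping keys and values are hypothetical.

```python
from statistics import median, quantiles


def iqr_over_median(values):
    """Normalized spread of one gene's expression: IQR / median."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / median(values)


# Hypothetical expression values keyed by (gene, sex, tissue).
expression = {
    ("geneA", "female", "liver"): [8.0, 10.0, 12.0, 14.0, 16.0],
    ("geneA", "male", "liver"): [11.0, 12.0, 12.0, 12.0, 13.0],
}
spread = {key: iqr_over_median(vals) for key, vals in expression.items()}
```

Dividing by the median makes the spread comparable between genes with very different expression levels, which is presumably why a ratio rather than the raw IQR is used.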

    1. Reviewer #1 (Public review):

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal to noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved.

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. Additionally, the model code is readily available in Matlab and Python, which should allow for open access to a larger community.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal-to-noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved. 

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in Matlab, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters that are determined in a specific way by the authors, and it is not clear how vastly other biological tissue and microscope setups may differ from the values used by the authors.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties.

      We appreciate the comments and helpful suggestions. We now also include FLiSimBA simulation code in Python in addition to Matlab to make it more accessible to the community.

      One advantage of FLiSimBA is that the simulation package is flexible and adaptable, allowing users to input parameters based on the specific sensors, hardware, and autofluorescence measurements for their biological and optical systems. We used parameters based on a FRET-based sensor, measured autofluorescence from mouse tissue, and measured dark count/afterpulse of our specific GaAsP PMT in this manuscript as examples. In Discussion and Materials and methods, we now emphasize this advantage and further clarify how these parameters can be adapted to diverse tissues, imaging systems, and sensors based on individual experiments. We further explain that these input parameters will not affect the conclusions of our study, but the specific input parameters would alter the quantitative thresholds.

      Reviewer #2 (Public review): 

      Summary: 

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging. 

      Strengths: 

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations. 

      Weaknesses: 

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to only use the "empirical lifetime" which is not a state of the art algorithm. The method used to deconvolve two multiplexed exponential signals is not given. 

      We appreciate the comments and constructive feedback. Our revision based on the reviewer’s suggestions has made our manuscript clearer and more user friendly. We originally described the detail of the fitting methods in Materials and methods. Given the importance of these methodological details for evaluating the conclusions of this study, we have moved the description of the fitting method from Materials and methods to Results. In addition, we provide further clarification and more details of the rationale of using these different methods of lifetime estimates in Discussion to aid users in choosing the best metric for evaluating fluorescence lifetime data.

      More specifically, we modified our writing to highlight the following.

      (1) In Results, we describe that lifetime histograms were fitted to Equation 3 with the Gauss-Newton nonlinear least-square fitting algorithm and the fitted P<sub>1</sub> was used as lifetime estimation.

      (2) In Results, we clarify that our simulation of multiplexed imaging was modeled with two sensors, each displaying a single exponential decay, but the two sensors have different decay constants. We also describe that Equation 3 with the Gauss-Newton nonlinear least-square fitting algorithm was used to deconvolve the two multiplexed exponential signals (Fig. 8).
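
      As a rough illustration of the fitting step described in (1) and (2), the sketch below runs a hand-written Gauss-Newton iteration on a two-component exponential decay with fixed time constants. Equation 3 is not reproduced in this response, so the model here is an assumed, simplified stand-in (no instrument response function, no background or time shift), and the names `double_exp` and `gauss_newton_p1` are hypothetical rather than taken from the FLiSimBA code.

```python
import math


def double_exp(t, amplitude, p1, tau1, tau2):
    """Assumed two-component decay; p1 is the fraction on the tau1 component."""
    return amplitude * (p1 * math.exp(-t / tau1) + (1.0 - p1) * math.exp(-t / tau2))


def gauss_newton_p1(times, counts, tau1, tau2, p1_init=0.3, n_iter=20):
    """Fit amplitude and p1 by Gauss-Newton with tau1 and tau2 held fixed."""
    amplitude, p1 = max(counts), p1_init
    for _ in range(n_iter):
        # Accumulate the 2x2 normal equations (J^T J) step = J^T residual.
        jtj_aa = jtj_ap = jtj_pp = jtr_a = jtr_p = 0.0
        for t, y in zip(times, counts):
            e1, e2 = math.exp(-t / tau1), math.exp(-t / tau2)
            d_amp = p1 * e1 + (1.0 - p1) * e2   # d(model)/d(amplitude)
            d_p1 = amplitude * (e1 - e2)        # d(model)/d(p1)
            resid = y - amplitude * d_amp
            jtj_aa += d_amp * d_amp
            jtj_ap += d_amp * d_p1
            jtj_pp += d_p1 * d_p1
            jtr_a += d_amp * resid
            jtr_p += d_p1 * resid
        det = jtj_aa * jtj_pp - jtj_ap * jtj_ap
        if det == 0.0:
            break  # singular system, e.g. tau1 == tau2
        # Solve the 2x2 system by Cramer's rule and update the parameters.
        amplitude += (jtr_a * jtj_pp - jtr_p * jtj_ap) / det
        p1 += (jtj_aa * jtr_p - jtj_ap * jtr_a) / det
    return amplitude, p1


# Noise-free synthetic histogram: 256 channels over a 12.5 ns laser cycle.
times = [k * 12.5 / 256 for k in range(256)]
counts = [double_exp(t, 1000.0, 0.5, 2.14, 0.69) for t in times]
print(gauss_newton_p1(times, counts, tau1=2.14, tau2=0.69))
```

      With the time constants fixed, the model is linear in the two component amplitudes, which is why this iteration converges in a few steps on clean data. Real histograms with photon noise and an instrument response would call for the full model the authors describe.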

      Reviewer #3 (Public review): 

      Summary: 

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible. 

      Strengths: 

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible. 

      Weaknesses: 

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as two-component discrete exponential decay. This means that the experimenter should ensure that this is the right model for their fluorophores a priori and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also couldn't find how good the fits were for each simulation and experimental data to the given fitting equation (Equation 2, for example, for Figure 2C data). 

      We thank the reviewer for the constructive feedback. We agree that the FLiSimBA users should ensure that the right decay equations are used to describe the fluorescent sensors. In this study, we used a FRET-based PKA sensor FLIM-AKAR to provide proof-of-principle demonstration of the capability of FLiSimBA. The donor fluorophore of FLIM-AKAR, truncated monomeric enhanced GFP, displays a single exponential decay. FLIM-AKAR, a FRET-based sensor, displays a double exponential decay. The time constants of the two exponential components were determined and reported previously (Chen, et al, Neuron (2017)). Thus, a double exponential decay equation with known τ<sub>1</sub> and τ<sub>2</sub> was used for both simulation and fitting. The goodness of fit is now provided in Supplementary Fig. 1 for both simulated and experimental data. In addition to referencing our prior study characterizing the double exponential decay model of FLIM-AKAR in Materials and methods, we have emphasized in Discussion the versatility of FLiSimBA to adapt to different sensors, tissues, and analysis methods, and the importance of using the right mathematical models to describe the fluorescence decay of specific sensors. 

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.  

      In the original Fig 2C, the sensor fluorescence was much higher than the contributions from autofluorescence, afterpulse, and background signals, resulting in minimal effects of these other factors, as the reviewer noted. This original figure was based on photon counts from single neurons expressing FLIM-AKAR. For the rest of the manuscript, photon counts were based on whole fields of view (FOV). Since the FOV includes cells that do not express fluorescent sensors, the influence of autofluorescence, dark currents, and background is much more pronounced, as shown in Fig. 2B. 

      Both approaches – using photon counts from the whole FOV or from individual neurons – have their justifications. Photon counts from the whole FOV simulate data from fluorescence lifetime photometry (FLiP), whereas photon counts from individual neurons simulate data from fluorescence lifetime imaging microscopy (FLIM). However, the choice of approach does not affect the conclusions of the manuscript, as a range of photon count values are simulated. To maintain consistency throughout the manuscript, we have revised the photon counts in this figure (now Supplementary Fig. 1C) to match those from the whole FOV.

      Additionally, we have made some modifications in our analyses of Supplementary Fig. 1C and Fig. 2B, detailed in the “FLIM analysis” section of Materials and methods. For instance, to minimize system artifact interference at the histogram edges, we now use a narrower time range (1.8 to 11.5 ns) for fitting and empirical lifetime calculation.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors report how autofluorescence was measured from "imaged brain slices from mice at postnatal 15 to 19 days of age without sensor expression." However, it remains unclear how many acute slices and animals were used (for example, were all 15um x 15um FOV from a single slice) and if mouse age affects autofluorescence quantification. Furthermore, would in vivo measurements have different autofluorescence conditions given that blood flow would be active? It would help if the authors more clearly explained how reliable their autofluorescence measurement is by clarifying how they obtained it, whether this would vary across brain areas, and whether in vitro vs in vivo conditions would affect autofluorescence. 

      We have added description in Materials and methods that for autofluorescence ‘Fluorescence decay histograms from 19 images of two brain slices from a single mouse were averaged.’ We have added in Discussion that users should carefully ‘measure autofluorescence that matches the age, brain region, and data collection conditions (e.g., ex vivo or in vivo) of their tissue…’, and emphasize that FLiSimBA offers customization of inputs, and it is important for users to adapt the inputs such as autofluorescence to their experimental conditions. We also clarify in Discussion that the change of input parameters such as autofluorescence across age and brain region would not affect the general insights from this study, but will affect quantitative values.

      (2) Do sensor expression level issues arise more with in-utero electroporation compared to AAV-based delivery of biosensors? A brief comment on this in the discussion may help as most users in the field today may be using AAV strategies to deliver biosensors.

      In our experience, in-utero electroporation results in higher sensor expression than AAV-based delivery, and so poses less concern for expression-level dependence. However, both delivery methods can result in expression level dependence, especially with a sensor that is not bright. We have added in Discussion ‘For a sensor with medium brightness delivered via in utero electroporation, adeno-associated virus, or as a knock-in gene, the brightness may not always fall within the expression level-independent regime.’

      (3) Figure 1. Should the x-axis on the top figures be "Time (ns)" instead of "Lifetime (ns)"?

      Similarly in Figure 8A&B, wouldn't it make more sense to have the x-axis be Time not Lifetime?

      The x-axis labels in Fig. 1 and Fig. 8A-8B have been changed to ‘Time (ns)’.   

      (4) Figure 2b: why is the empirical lifetime close to 3.5ns? Shouldn't it be somewhere between 2.14 and 0.69? 

      In our empirical lifetime calculation, we did not set the peak channel to have a time of 0.0488 ns (i.e. the laser cycle 12.5 ns divided by 256 time channels). Rather, we set the first time channel within a defined calculation range (i.e. 1.8 ns in Supplementary Fig. 1B) to have a time of 0.0488 ns. Thus, the empirical lifetime exceeds 2.14 ns and depends on the time range of the histogram used for calculation. 

      For Fig. 2B and Supplementary Fig. 1C, we have now adjusted the range to 1.8-11.5 ns to eliminate FLIM artifacts at the histogram edges in our experimental data, resulting in an empirical lifetime around 2.255 ns. In contrast, the range for calculating the empirical lifetime of simulated data in the rest of the study (e.g. Fig. 4D) is 0.489-11.5 ns, yielding a larger lifetime of ~3.35 ns. 

      We have clarified these details and our rationale in Materials and methods.
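
      The dependence of the empirical lifetime on the calculation range can be illustrated with a small sketch. It assumes the empirical lifetime is the photon-weighted mean time of the histogram, with the first channel inside the range assigned one bin width (12.5 ns / 256, about 0.0488 ns), as described in the response above. The decay is an idealized two-component curve with no autofluorescence, afterpulse, background, or instrument response, so the printed values are not expected to match the figures quoted above; `empirical_lifetime` and `ideal_decay` are hypothetical names.

```python
import math

LASER_PERIOD_NS = 12.5
N_CHANNELS = 256
BIN_NS = LASER_PERIOD_NS / N_CHANNELS  # about 0.0488 ns per time channel


def empirical_lifetime(counts, t_start, t_end):
    """Photon-weighted mean time; the first channel in range is assigned BIN_NS."""
    in_range = [c for k, c in enumerate(counts) if t_start <= k * BIN_NS <= t_end]
    total = sum(in_range)
    weighted = sum(c * (i + 1) * BIN_NS for i, c in enumerate(in_range))
    return weighted / total


def ideal_decay(p1, tau1, tau2):
    """Idealized histogram (assumption): pure double exponential starting at channel 0."""
    return [
        p1 * math.exp(-k * BIN_NS / tau1) + (1.0 - p1) * math.exp(-k * BIN_NS / tau2)
        for k in range(N_CHANNELS)
    ]


decay = ideal_decay(p1=0.5, tau1=2.14, tau2=0.69)
wide = empirical_lifetime(decay, 0.489, 11.5)
narrow = empirical_lifetime(decay, 1.8, 11.5)
print(f"range 0.489-11.5 ns: {wide:.3f} ns")
print(f"range 1.8-11.5 ns:   {narrow:.3f} ns")
```

      The two ranges give different values for the same underlying decay, which is the point made in the response: the empirical lifetime is a range-dependent metric and should only be compared between measurements computed over the same window.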

      (5) Figure 2b: how come the afterpulse+background contributes more to the empirical lifetime than the autofluorescence (shorter lifetime)? It was unclear from the results text why autofluorescence photons did not alter empirical lifetime as much as the afterpulse/background did.

      With a histogram range from 1.8 ns to 11.5 ns used in Fig. 2B, the empirical lifetime for FLIM-AKAR sensor fluorescence, autofluorescence, and background/afterpulse are: 2-2.3 ns, around 1.69 ns, and around 4.90 ns. The larger difference of background/afterpulse from FLIM-AKAR sensor fluorescence leads to larger influence of afterpulse+background than autofluorescence. We have added an explanation of this in Results.

      (6) One overall suggestion for an improvement that could help active users of lifetime biosensors understand the consequences would be to show either a real or simulated example of a "typical experiment" conducted using FLIM-AKAR and how an incorrect interpretation could be drawn as a consequence of these artifacts. For example, do these confounds affect experiments involving comparisons across animals more than within-subject experiments such as washing a drug onto the brain slice, and the baseline period is used to normalize the change in signal? I think this type of direct discussion will help biosensor users more deeply grasp how these factors play out in common experiments being conducted.

      We have added the following in Discussion, ‘…While this issue is less problematic when the same sample is compared over short periods (e.g. minutes), It can lead to misinterpretation when fluorescence lifetime is compared across prolonged periods or between samples when comparison is made across chronic time periods or between samples with different sensor expression levels. For example, apparent changes in fluorescence lifetime observed over days, across cell types, or subcellular compartments may actually reflect variations in sensor expression levels rather than true differences in biological signals (Fig. 6), Therefore, considering biologically realistic factors in FLiSimBA is essential, as it qualitatively impacts the conclusions.’

      Reviewer #2 (Recommendations for the authors): 

      The paper would be improved with more detail on the fitting methods, and the use of state-of-the-art methods. Consult for example the introduction of this paper where many methods are listed: https://www.mdpi.com/1424-8220/22/19/7293

      We have moved the description of the Gauss-Newton nonlinear least-square fitting algorithm from Materials and methods to Results to enhance clarity. We appreciate the reviewer’s suggestion to combine FLiSimBA with various analysis methods. However, the primary focus of our manuscript is to call attention to how specific contributing factors in biological experiments influence FLIM data, and to provide a tool that rigorously considers these factors to simulate FLIM data, which can then be used for fitting. Therefore, we did not expand the scope of our manuscript. Instead, we have added in the Discussion that ‘FLiSimBA can be used to test multiple fitting methods and lifetime metrics as an exciting future direction for identifying the best analysis method for specific experimental conditions’, citing relevant references.

      I would also improve the content of the GitHub repository as it is very hard to identify the source code used for simulation and fitting. 

      We have reorganized and relabeled our GitHub repository and now have three folders labeled as ‘Simulation_inMatlab’, ‘DataAnalysis_inMatlab’, and ‘SimulationAnalysis_inPython’. We also updated the clarification of the contents of each folder in the README file.

      Reviewer #3 (Recommendations for the authors): 

      (1) P. 10 "For example, to detect a P1 change of 0.006 or a lifetime change of 5 ps with one sample measurement in each comparison group, approximately 300,000 photons are needed." If I am reading the graphs in Figures 3B and C, this sentence is talking about the red line. However, the intersection of 0.006 in the MDD of P1 in 3B and red is not 3E5 photons. And the intersection of 0.005 ns and red in 3C is not 3E5 photons either. Are you sure you are talking about n=1? Maybe the values are correct for the blue curve with n=5.

      Thank you for catching our error. We have corrected the text to ‘with five sample measurements’.

      (2) Figure 2 (B) legend: It would be helpful to specify what is being compared in the legend. For example, consider revising "* p < 0.05 vs sensor only; n.s. not significant vs sensor + autoF; # p < 0.05 vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test" to "* p < 0.05 for sensor + autoF (cyan) vs sensor only; n.s. not significant for final simulated data (purple) vs sensor + autoF; # p < 0.05 for final simulated data (purple) vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test".

      We’ve made the change and thanks for the suggestion to make it clearer.

      (3) Figure 2 (c) Can you please show the same Two-way ANOVA test values for Experimental vs. Sensor only and for Experimental vs. Sensor + autoF? Currently, the value (n.s.) is marked only for Experimental vs. Final simulation. Given that the experimental data are sparse (compared to the simulations), it seems likely that there may be no significant difference among the 3 different simulations regarding how well they match the experimental data. Also, can you specify the P1 and P2 of the experimental data  used to generate the simulated data on this panel? Also, what is the reason why P1=0.5 was used for panels A and B, instead of the value matching the experimental value?

      As the reviewer suggested, we have included statistical tests in the figure (now Supplementary Fig. 1C). Please see our response to the Public Review of Reviewer 3’s comments as well as our changes in Materials and Methods on other changes and their rationale for this figure. We have now specified the P<sub>1</sub> value of the experimental data used to generate the simulated data on this panel both in Figure Legends and Materials and Methods. Based on the suggestion, we have now used the same P<sub>1</sub> value in Fig. 2B.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      This manuscript presents a large-scale comparative genomics analysis of Salmonella genomes to identify and characterize the repertoire of Type VI Secretion System (T6SS) effectors. The authors combine bioinformatic predictions with experimental validation of one novel toxin domain (Tox-Act1), revealing a unique catalytic activity not previously reported in bacterial toxins. While the study is comprehensive and offers valuable insights into T6SS diversity, the insufficient description of computational methods and limited accessibility of underlying data reduce reproducibility and impact.

      Major comments

      1. The computational methods are inadequately described in the Materials and Methods section, and the authors did not provide the underlying datasets. These omissions make it impossible to reproduce the analysis or to apply the approach to other organisms.
      2. The criteria used to distinguish between T6SS effectors and non-effectors are unclear. The reliance on proximity to structural genes ("guilt-by-association") is insufficient and may have led to the omission of cargo effectors not proximal to these structural genes.
      3. No information is provided in the Materials and Methods section about the graph-based clustering strategy mentioned in the main text (Rows 109-111), including the Jaccard index and Louvain algorithm.
      4. The definition and identification of T6SS subtypes, including the use of the term "orphan," are not explained (Rows 111-112).
      5. The phylogenetic analysis of the newly identified domain Tox-Act1 lacks consistency and detail. For example, Rows 324-326 state: "To predict the function of Tox-Act1, we sought to understand its evolutionary relationship by constructing a phylogenetic tree using the sequences of Tox-Act1, TseH and additional permuted members, such as LRAT and YiiX." However, this contradicts Rows 342-344 and Figure 4A, which describe the phylogenetic tree as being built from permuted NlpC/P60 members, and indicate that a single query was used for PSI-BLAST, marked with a red star. It is unclear whether Tox-Act1, TseH, or another sequence was used as the initial PSI-BLAST query.
      6. The Tox-Act1 domain investigated is labeled as an acyltransferase, but the evidence presented supports only phospholipid-degrading activity. In my opinion, the naming should better reflect the activity demonstrated by the data.
      7. Table S1 should include representative protein accessions for each T6SS toxin domain. This is essential for evaluating the novelty of the identified domains and for enabling their use in future analyses. The repeated use of "This study" (96 times) as a reference, without further detail, is confusing and unhelpful. In my view, referencing the current study is appropriate only when the manuscript provides sufficient information on the corresponding domain.
      8. In general, the authors should place greater emphasis on ensuring that the proteins and genomes analyzed in this study can be reliably identified. Genomic accessions and locus tags should be traceable in public databases such as NCBI, and the supplemental information must correspond accurately to the main text. For example, I was unable to find information on FD01543424_00914, which was used as the query for the alignment of STox_15 (the name used in the supplemental information, while in the main text it is referred to as Tox-Act1; see related comment below).
      9. A supplementary table listing all Salmonella effectors and their domain annotations is missing. This is essential for transparency, reproducibility, and future use of the data.
      10. The GitHub repository contains a large volume of data and code but lacks detailed documentation and clear instructions, including example files. This greatly limits reproducibility and usability. The current organization of the repository makes it difficult to locate specific results; for example, Tox-Act1 is referred to as STox_15 in the GitHub files, but this is not mentioned in the manuscript. The authors should improve data organization and provide a README file for clarity.

      Minor comments

      1. The introduction should discuss previous work on Salmonella T6SS effectors, including Blondel et al. (2023) (ref 71 in the manuscript), Amaya et al. (2022), and Amaya et al. (2024).
      2. In Figure 1C, genomic examples should include strain names and locus tags.
      3. In Figure 1F, 'ND' should be replaced with 'Unknown' or 'Not Determined'.
      4. Figure 1E is overly complex and, in my opinion, does not add value, especially since the accompanying text is sufficient on its own. Moreover, the authors acknowledge that their initial analysis missed the similarity between Tox-Act1 and both DUF4105 and the TseH effector, which raises concerns about the accuracy and usefulness of this graph.
      5. Figure 3D lacks information about the number of replicates (n=?).
      6. Discrepancies in domain annotations:
        • Row 232: STox_47 is missing from Table S1.
        • Row 233: STox_18 is pore-forming and STox_53 is a nuclease (per Table S1), which contradicts the main text.
      7. Multiple grammatical and typographical errors exist throughout the text, including:
        • Row 41: "provide" should be "provides"
        • Rows 131, 222: "immunities" should be "immunity proteins"
        • Rows 170, 253, 288: "thee" should be "three"
        • Row 388: "corresponds" should be "correspond"
        • Row 389: "chomatogram" should be "chromatogram"
      8. Rows 257-259: The claim that PAAR and RHS domains assist in translocation across the bacterial inner membrane is presented as fact, but this is only a hypothesis and should be stated more cautiously.
      9. Figure 3A: The selection of representative genomic loci is unclear. For example, FD01843896 is shown in the figure, but cloning was performed using FD01848827, and the HHPred analysis was based on FD01543424. The rationale for using different sequences at each step should be clarified.
      10. Rows 296-299: The absence of a secretion assay in the study is notable. If this is due to the inability to activate the SPI-6 T6SS of Salmonella enterica serovar Typhimurium, as discussed in these lines, it should be explicitly mentioned in the text.
      11. Figure 4C (sequence logo) is not described in the Materials and Methods section.
      12. Row 467: The retrieval date of the gff files from the 10KSG database is missing.
      13. Rows 474-476: The domain models used for T6SS cluster prediction are not described.

      Significance

      This is a comprehensive study involving a large number of Salmonella genomes, potentially identifying many new T6SS effectors and toxic activities. One new domain analyzed in this work is experimentally investigated and shown to have a unique catalytic activity not previously observed in toxins. However, the bioinformatic methods are not described in sufficient detail, making it difficult to assess or reproduce the work. Protein accession numbers are missing, even for representative toxins, and locus tags are not traceable, making the identified effectors not readily accessible. There are many inaccuracies throughout the text and supplemental data. The Tox-Act1 domain investigated is labeled as an acyltransferase, but the evidence only supports phospholipid-degrading activity. While the study includes many graphs and histograms, they often obscure the main findings. Consequently, the audience is likely to be limited.

      Nevertheless, despite these concerns, I believe this is an important work that could be valuable to the broad community once a more thorough revision is undertaken, not only by addressing the specific comments raised, but also by rechecking the analyses, reorganizing the presentation, and ensuring that all data and annotations are clearly accessible and traceable.

    1. This makes a lot of sense: with AIs generating a large amount of code, the challenge moves towards validation and testing. If you generate towards a higher level of abstraction, there's less code and less moving parts to be reviewed and validated. Think of it as low-code/DSLs for GenAI, still code, but with less technical details to review and test.
    2. Looks neat! As humans augmented with agents write more code, solutions that require less context shifting to get stuff done will win. A common web stack may include API handlers, OpenAPI spec, generated TypeScript definitions, generated TypeScript client, React logic and effects code, TSX code, HTML, and CSS. This generally needs filesystem watchers, code generators, transpilers, and compilers to get stuff done. Something that can go from backend handlers straight to terse markup for reactive UI would be a massive simplification to all this, and a big productivity boost.
    1. Reviewer #3 (Public review):

      In the study, the authors performed longitudinal 1P calcium imaging of mouse mPFC across 8 weeks during learning of an olfactory-guided task, including habituation, training, and sleep periods. The task had 3 arms. Odor was sampled at the end of the middle arm (named the "Sample" period). The animal then needed to run to one of the two other arms (R or L) based on the odor. The whole period until they reached the end of one of the choice arms was the "Outward" period. The time at the reward end was the "Reward" period. They noted several changes from the learning condition to the learned condition (there are some questions for the authors interspersed):

      (1) They classified cells in a few ways. First, each cell was classified as SI (spatially informative) if it had significantly more spatial information than shuffled activity, and ~50% of cells ended up being SI cells. Then, among the SI cells, they classified a cell as a TC (task cell) if it had statistically similar activity maps for R versus L arms, and a GC (goal arm cell) otherwise. Note that there are 4 kinds of these cells: outer arm TCs and GCs, and middle arm TCs and GCs (with middle arm GCs essentially being like "splitter cells" since they are not similarly active in the middle arm for R versus L trials). There was an increase in TCs from the learning to the learned condition sessions.
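
      To make the SI criterion in (1) concrete, the sketch below computes a Skaggs-style spatial information score and compares it with a null distribution obtained by circularly shifting the activity relative to position. The review does not state which information measure or shuffling scheme the authors used, so both are assumptions here, and `spatial_information` and `shuffle_p_value` are hypothetical names.

```python
import math
import random


def spatial_information(activity, position_bins, n_bins):
    """Skaggs-style spatial information, in bits per unit of activity."""
    occupancy = [0] * n_bins
    summed = [0.0] * n_bins
    for a, b in zip(activity, position_bins):
        occupancy[b] += 1
        summed[b] += a
    n_frames = len(position_bins)
    mean_rate = sum(summed) / n_frames  # occupancy-weighted mean activity
    if mean_rate <= 0:
        return 0.0
    info = 0.0
    for occ, s in zip(occupancy, summed):
        if occ == 0 or s <= 0:
            continue  # unvisited or silent bins contribute nothing
        rate = s / occ
        info += (occ / n_frames) * (rate / mean_rate) * math.log2(rate / mean_rate)
    return info


def shuffle_p_value(activity, position_bins, n_bins, n_shuffles=200, seed=0):
    """Compare observed information with circularly shifted activity (assumed null)."""
    rng = random.Random(seed)
    observed = spatial_information(activity, position_bins, n_bins)
    n = len(activity)
    exceed = 0
    for _ in range(n_shuffles):
        shift = rng.randrange(1, n)
        shifted = activity[shift:] + activity[:shift]
        if spatial_information(shifted, position_bins, n_bins) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_shuffles + 1)


# Synthetic cell that is active only in position bin 3.
rng = random.Random(1)
positions = [rng.randrange(10) for _ in range(2000)]
tuned = [1.0 if p == 3 else 0.0 for p in positions]
print(shuffle_p_value(tuned, positions, n_bins=10))
```

      A cell would be labeled SI when its p-value falls below a chosen threshold. The subsequent TC versus GC split would then compare the activity maps for R and L trials, which is not sketched here.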

      (2) They analyze activity sequences across cells. They extracted 500 ms duration bursts (defined as periods of activity > 0.5 standard deviations over what I assume is the mean - if so, the authors can add "over the mean" to the burst definition in the methods). They first noted that the resulting "Burst rates were significantly larger during behavioral epochs than during sleep and during periods of habituation to the arena", and "Moreover, burst rates during correct trials were significantly lower than during error trials". For the sequence analysis, they only considered bursts consisting of at least 5 active cells. A cell's activity within the burst was set to the center of mass of calcium activity. Then they took all the sequences from all learned and learning sessions together and hierarchically clustered them based on Spearman's rank correlation between the order of activity in each pair of sequences (among the cells active in both). The iterative hierarchical clustering process produces groups (clusters) of sequences such that there are multiple repeats of sequences within a cluster. Different sequences are expressed across all the longitudinally recorded sessions. They noted "large differences of sequence activation between learning and learned condition, both in the spatial patterns (example animal in Figure 3D) and the distribution of the sequences (Figures 3D, E). Rastermap plots (Figure 3D) also reveal little similarity of sequence expression between task and habituation or sleep condition." They also note that the difference in the sequences between learning and learned conditions was larger than the difference between correct and error trials within each condition. They conclude that during task learning, new representations are established, as measured by the burst sequence content. They do additional analyses of the sequence clusters by assessing the spatial informativeness (SI) of each sequence cluster. 
      Over learning, they find an increase in clusters that are spatially informative (clusters that tend to occur in specific locations). Finally, they analyzed the SI clusters in a similar manner to SI cells and classified them as task phase selective sequences (TSs) and goal arm selective sequences (GSs), and did some further analysis. However, they themselves conclude that the frequency of TSs and GSs is limited (I believe because most sequence clusters were non-SI - the authors can verify this and write it in the text?). In the discussion, they say, "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior".
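
      The pairwise similarity that feeds the hierarchical clustering in (2) can be sketched as follows: each burst is reduced to one center-of-mass time per active cell, and two bursts are compared by Spearman's rank correlation over the cells active in both. The `min_shared` threshold and the function names are assumptions for illustration, and the rank formula used here assumes no tied times.

```python
def center_of_mass(trace, dt):
    """Activity-weighted mean time of one cell's calcium trace within a burst."""
    total = sum(trace)
    return sum(i * dt * v for i, v in enumerate(trace)) / total


def ranks(values):
    """Rank positions (0-based) of the values, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def sequence_rank_correlation(seq_a, seq_b, min_shared=3):
    """Spearman correlation of activation order over cells active in both bursts.

    seq_a and seq_b map cell id -> center-of-mass time. Returns None when
    fewer than min_shared cells are shared (hypothetical threshold).
    """
    shared = sorted(set(seq_a) & set(seq_b))
    n = len(shared)
    if n < max(min_shared, 2):
        return None
    rank_a = ranks([seq_a[c] for c in shared])
    rank_b = ranks([seq_b[c] for c in shared])
    d2 = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


burst_1 = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.30, 7: 0.40}
burst_2 = {1: 0.02, 2: 0.15, 3: 0.22, 4: 0.35, 7: 0.45, 9: 0.10}
print(sequence_rank_correlation(burst_1, burst_2))
```

      A matrix of these correlations over all burst pairs could then be passed to a hierarchical clustering routine. How the authors handled pairs with few shared cells is not described in this review.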

      (3) As an alternative to analyzing individual cells and sequences of individual cells, they then look for trajectory replay using Bayesian population decoding of location during bursts. They analyze TS bursts, GS bursts, and non-SI bursts. They say "we found correlations of decoded position with time bin (within a 500 ms burst) strongly exceeding chance level only during outward and reward phase, for both GSs and TSs (Fig 4H)." Figure 4H shows distributions indicating statistically significant bias in the forward direction (using correlations of decoded location versus time bin across 10 bins of 50 ms each within each 500-ms burst). They find that the Outward trajectories appear to reflect the actual trajectory during running itself, so they are likely not replay. But the sequences at the Reward are replay as they do not reflect the current location. Furthermore, replay at the Reward is in the forward direction (unlike the reverse replay at Reward seen in the hippocampus), and this replay is only seen in the learned and not the learning condition. At the same time, they find that replay is not seen during odor Sampling, from which they conclude there is no evidence of replay used for planning. Instead, they say the replay at the Reward could possibly be for evaluation during the Reward phase, though this would only be for the learned condition. They conclude "Together with our finding of strong changes in sequence expression after learning (Figure 3E) these findings suggest that a representation of task develops during learning, however, it does not reflect previous network structure." I am not sure what is meant here by the second part of this sentence (after "however ..."). Is it the idea that the replay represents network structure, and the lack of Reward replay in the learning condition means that the network structure must have been changed to get to the learned condition? Please clarify.
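
      The direction test summarized in (3), correlating decoded position with time bin across the 10 bins of a burst, can be sketched as follows. This is an illustration only: the decoded positions are made-up numbers, the function names are hypothetical, and the Bayesian decoding step itself is not shown.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat sequence has no defined direction
    return cov / (var_x * var_y) ** 0.5

def replay_score(decoded_positions):
    """Correlate decoded position with time-bin index within one burst."""
    time_bins = list(range(len(decoded_positions)))
    return pearson(time_bins, decoded_positions)

# Hypothetical decoded positions (arbitrary units) for 10 bins of 50 ms each.
forward = [5, 12, 18, 30, 41, 47, 60, 66, 79, 85]
reverse = list(reversed(forward))

print(replay_score(forward))  # strongly positive: forward sweep
print(replay_score(reverse))  # strongly negative: reverse sweep
```

      A positive score corresponds to a forward sweep and a negative score to a reverse sweep; a chance level would still have to be estimated separately, for example by shuffling the time bins.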

      This study provides valuable new information about the evolution of mPFC activity during the learning of an odor-based 2AFC T-maze-like task. They show convincing evidence of changes in single-cell tuning, population sequences, and replay events. They also find novel forward replay at the Reward, and find that this is present only after the animal has learned the task. In the discussion, the authors note "To our knowledge, this study identified for the first time fast recurring neural sequence activity from 1-p calcium data, based on correlation analysis."

      (1) There are some statements that are not clear, such as at the end of the introduction, where the authors write, "Both findings suggest that the mPFC task code is locally established during learning." What is the reasoning behind the "locally established" statement? Couldn't the learning be happening in other areas and be inherited by the mPFC? Or are the authors assuming that newly appearing sequences within a 500-ms burst period must be due to local plasticity? I have also pointed out a question about the statement "however, it does not reflect previous network structure" in (3) above.

      (2) The threshold for extracting burst events (0.5 standard deviations, presumably above the mean, but the authors should verify this) seems lower than what one usually sees as a threshold for population burst detection. What fraction of all data is covered by 500 ms periods around each such burst? However, it is potentially a strength of this work that their results are found by using this more permissive threshold.
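
      The coverage question in (2) can be made concrete with a small sketch. It assumes the threshold is the mean plus 0.5 standard deviations (which the authors still need to confirm) and that a fixed window is placed around every supra-threshold sample; both the function names and the toy trace are hypothetical.

```python
from statistics import mean, pstdev

def burst_mask(activity, n_sd=0.5):
    """Mark samples above mean + n_sd * SD (assumed burst criterion)."""
    threshold = mean(activity) + n_sd * pstdev(activity)
    return [a > threshold for a in activity]

def coverage_fraction(activity, half_window, n_sd=0.5):
    """Fraction of samples lying within half_window samples of a burst sample."""
    mask = burst_mask(activity, n_sd)
    n = len(activity)
    covered = [False] * n
    for i, above in enumerate(mask):
        if above:
            lo = max(0, i - half_window)
            hi = min(n, i + half_window + 1)
            for j in range(lo, hi):
                covered[j] = True
    return sum(covered) / n

# Toy population trace: silence with a single transient at sample 10.
trace = [0.0] * 20
trace[10] = 10.0
print(coverage_fraction(trace, half_window=2))
```

      Reporting this fraction for the real recordings would show how much of the data a 0.5 SD criterion selects compared with stricter thresholds.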

    1. Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "non-local effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

    2. Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pre-training on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

    3. Author response:

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The revised paragraph in the manuscript’s Results section follows:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the patterns produced by SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The per-site Pearson correlations between SpliceAI and OSAI<sub>MANE</sub> importance scores had an overall median of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub> and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation across all splice sites was 0.857. Ten additional representative zoomed-in DNA logo comparisons of splice sites are provided in Supplementary Figure S23.
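
      The procedure described in this Methods text can be sketched in a few lines. This is a minimal illustration, not the authors' code: the two scoring functions are hypothetical stand-ins for the real models, and the window is 10 nt rather than 5,001 nt.

```python
from statistics import mean

BASES = "ACGT"

def ism_profile(seq, predict):
    """Mean change in predicted score over the three substitutions per position."""
    ref_score = predict(seq)
    profile = []
    for i, ref_base in enumerate(seq):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            deltas.append(predict(mutant) - ref_score)
        profile.append(mean(deltas))
    return profile

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical stand-in models: both reward a GT dinucleotide at positions 4-5.
def toy_model_a(seq):
    return 0.9 if seq[4:6] == "GT" else 0.1

def toy_model_b(seq):
    return 0.8 if seq[4:6] == "GT" else 0.2

seq = "ACCAGTAAGT"
profile_a = ism_profile(seq, toy_model_a)
profile_b = ism_profile(seq, toy_model_b)
r = pearson(profile_a, profile_b)
print(r)  # close to 1: both toy models depend on the same two positions
```

      Repeating this for each sampled site and taking the median of the per-site correlations gives a summary of the kind reported above.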

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e., a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we did not measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
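
      For readers unfamiliar with temperature scaling, the idea can be sketched as follows. This is an illustration under stated assumptions: the logits and labels are invented, the class order (non-splice, acceptor, donor) is hypothetical, and the temperature is chosen by a simple grid search rather than the gradient-based fitting an actual implementation would more likely use.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature minimizing held-out NLL (grid search stand-in)."""
    grid = grid or [0.5 + 0.05 * k for k in range(71)]  # 0.5 to 4.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Invented held-out logits; the second row is confidently wrong.
logit_rows = [[4, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4]]
labels = [0, 1, 1, 2]

T = fit_temperature(logit_rows, labels)
print(T)  # above 1 here, because the toy model is overconfident
```

      Because every logit in a row is divided by the same positive constant, the top-scoring class at each position is unchanged; only the confidence attached to it moves. A fitted temperature close to 1, as the authors report, means the rescaling changes the probabilities very little.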

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. To position our contribution more clearly, we have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, that it is designed to analyze cross-species performance gains, and that it is supported by a thorough benchmark and the release of several pretrained models.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result suggests that our architecture captures all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.

      Figure S7. (C) Scatter plots of DNA sequence alignments between validation and training sets for Human-MANE, mouse, honeybee, zebrafish, and Arabidopsis. Each dot represents an alignment, with the x-axis showing alignment identity and the y-axis showing alignment coverage. Alignments exceeding 80% for both identity and coverage are highlighted in the red-shaded region and were excluded from the test sets.
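
      The threshold part of this filter can be sketched as follows. The alignment records, field names, and sequence identifiers are hypothetical, and the separate removal of annotated paralogs and homologs is not shown.

```python
def leaks_into_training(identity, coverage, threshold=0.80):
    """True when an alignment exceeds the threshold on both identity and coverage."""
    return identity > threshold and coverage > threshold

# Hypothetical alignments of held-out sequences against the training set.
alignments = [
    {"query": "seqA", "identity": 0.95, "coverage": 0.90},
    {"query": "seqB", "identity": 0.95, "coverage": 0.40},
    {"query": "seqC", "identity": 0.60, "coverage": 0.99},
]

excluded = {
    a["query"]
    for a in alignments
    if leaks_into_training(a["identity"], a["coverage"])
}

held_out = ["seqA", "seqB", "seqC", "seqD"]
kept = [name for name in held_out if name not in excluded]
print(kept)
```

      Only sequences that pass both thresholds at once are dropped, which matches the shaded upper-right region described in the caption.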

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with: conda install openspliceai

      The conda package homepage is at https://anaconda.org/khchao/openspliceai. We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceaimane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context.

      Strengths:

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted.

      We have incorporated the recommended discussion of technical limitations and addressed the physiological plausibility of our manipulations on Page 33 of the revised Discussion section. Specifically, we wrote:

      “Judicious interpretation of the present data must consider the technical limitations of the various methods and circuit-level manipulations applied. Patchy neurons are distributed unevenly across the extensive structure of the striatum, and their targeted manipulation is constrained by viral spread in the dorsal striatum. Somatic calcium imaging using single-photon microscopy captures activity from only a subset of patchy neurons within a narrow focal plane beneath each implanted GRIN lens. Similarly, limitations in light diffusion from optical fibers may reduce the effective population of targeted fibers in both photometry and optogenetic experiments. For example, the more modest locomotor slowing observed with optogenetic activation of striatonigral fibers in the SNr compared to the stronger effects seen with Gq-DREADD activation across the dorsal striatum could reflect limited fiber optic coverage in the SNr. Alternatively, it may suggest that non-striatonigral mechanisms also contribute to generalized slowing. Our photometry data does not support a role for striatopallidal projections from patchy neurons in movement suppression. The potential contribution of intrastriatal mechanisms, discussed earlier, remains to be empirically tested. Although the behavioral assays used were naturalistic, many of the circuit-level interventions were not. Broad ablation or widespread activation of patchy neurons and their efferent projections represent non-physiological manipulations. Nonetheless, these perturbation results are interpreted alongside more naturalistic observations, such as in vivo imaging of patchy neuron somata and axon terminals, to form a coherent understanding of their functional role”.

      Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using the Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in environments of different valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found that ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that striatonigral neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      We are grateful for the reviewer’s thorough summary of our main findings.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      We appreciate the reviewer’s overall positive assessment and have made substantial improvements to the revised manuscript in response to reviewers’ constructive suggestions. 

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical roles of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadel et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      We have added the recommended data interpretation and future perspectives on Page 30 of the revised Discussion section. Specifically, we wrote:

      “Potential mechanisms by which striatal patchy neurons reduce locomotion involve the suppression of dopamine availability within the striatum. Dopamine, primarily supplied by neurons in the SNc and VTA, broadly facilitates locomotion (Gerfen and Surmeier 2011, Dudman and Krakauer 2016). Recent studies have shown that direct activation of patchy neurons leads to a reduction in striatal dopamine levels, accompanied by decreased walking speed (Nadel, Pawelko et al. 2021, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Patchy neuron projections terminate in structures known as “dendron bouquets”, which enwrap SNc dendrites within the SNr and can pause tonic dopamine neuron firing (Crittenden, Tillberg et al. 2016, Evans, Twedell et al. 2020). The present work highlights a role for patchy striatonigral inputs within the SN in decelerating movement, potentially through GABAergic dendron bouquets that limit dopamine release back to the striatum (Dong, Wang et al. 2025). Additionally, intrastriatal collaterals of patch spiny projection neurons (SPNs) have been shown to suppress dopamine release and associated synaptic plasticity via dynorphin-mediated activation of kappa opioid receptors on dopamine terminals (Hawes, Salinas et al. 2017). This intrastriatal mechanism may further contribute to the reduction in striatal dopamine levels and the observed decrease in locomotor speed, representing a compelling avenue for future investigation.”

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCheery signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      We have included representative images of mCherry and MOR1 staining in Supplementary Fig. S1 of the revised manuscript.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      As suggested by the reviewer, we have added the following statement on Page 17 of the revised manuscript: “These data also indicate that PA elevates valence-specific speed without inducing general hyperactivity”.

      (4) In Line 536, where Figure 5A was cited, the author mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherrry), but I could not find associated data on Figure 5. Please cite Figure S3, accordingly.

      We have added the citation for the now Fig. S4 on Page 25 of the revised manuscript.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      As suggested by the reviewer, we have now labeled each figure panel with a distinct single alphabetical letter.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

      We have made the correction on the figure.

      Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      Our photometry data implicate striatonigral neurons in locomotor slowing, as evidenced by a negative cross-correlation with acceleration and a negative lag, indicating that their activity reliably precedes—and may therefore contribute to—deceleration. In contrast, photometry results from striatopallidal neurons showed no clear correlation with speed or acceleration.

      Figure 6 demonstrates that optogenetic manipulation within the SNr of Sepw1-Cre<sup>+</sup> striatonigral axons recapitulated context-dependent locomotor changes seen with Gq-DREADD activation of both striatonigral and striatopallidal Sepw1-Cre<sup>+</sup> cells in the dorsal striatum but failed to produce the broader locomotor speed change observed when targeting all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum using either ablation or Gq-DREADD activation. The more subtle speed-restrictive phenotype resulting from ChR activation in the SNr could, as the reviewer suggests, implicate striatopallidal neurons in broad locomotor speed regulation. However, our photometry data indicate that this scenario is unlikely, as activity of striatopallidal Sepw1-Cre<sup>+</sup> fibers is not correlated with locomotor speed. Another plausible explanation is that the optogenetic approach may have affected fewer striatonigral fibers, potentially due to the limited spatial spread of light from the optical fiber within the SNr. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with optogenetics. We have added discussion of these technical limitations to the revised manuscript. Additionally, we now discuss the possibility that intrastriatal collaterals may contribute to reduced local dopamine levels by releasing dynorphin, which acts on kappa opioid receptors located on dopamine fibers (Hawes, Salinas et al. 2017), thereby suppressing dopamine release.

      The reviewer also suggests an interesting experiment involving optogenetic stimulation of striatonigral versus striatopallidal somata in Sepw1-Cre mice. While we agree that this approach would yield valuable insights, we have thus far been unable to achieve reliable results using retroviral vectors. Moreover, selectively targeting striatopallidal terminals optogenetically remains technically challenging, as striatonigral fibers also traverse the pallidum, and the broad anatomical distribution of the pallidum complicates precise targeting. This proposed work will need to be pursued in a future study, either with improved retrograde viral tools or the development of additional mouse lines that offer more selective access to these neuronal populations as we documented recently (Dong, Wang et al. 2025).

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      We believe the reviewer’s interpretation pertains to a statement in the Introduction rather than the Abstract: “Our findings reveal that patchy SPNs control the speed at which mice navigate the valence differential between high- and low-anxiety zones, without affecting valence perception itself.” Throughout our study, mice consistently preferred the dark zone in the Light/Dark box, indicating intact perception of the valence differential between illuminated areas. While our manipulations altered locomotor speed, they did not affect time spent in the dark zone, supporting the conclusion that valence perception remained unaltered. We appreciate the reviewer’s insight and agree it is an intriguing possibility that locomotor responses could, over time, influence internal states such as anxiety. We addressed this in the Discussion, noting that while dark preference was robust to our manipulations, future studies are warranted to explore the relationship between anxious locomotor vigor and anxiety itself.

      We report changes in scalar measures of animal speed across Light/Dark box conditions and under various experimental manipulations. Separately, we show that activity in both patchy neuron somata and striatonigral fibers is negatively correlated with acceleration—indicating a positive correlation with deceleration. Notably, the direction of the cross-correlational lag between striatonigral fiber activity and acceleration suggests that this activity precedes and may causally contribute to mouse deceleration, thereby influencing reductions in speed. To clarify this, we revised a sentence in the Results section: “Moreover, patchy neuron efferent activity at the SNr may causally contribute to deceleration, as indicated by the negative cross-correlational lag, thereby reducing animal speed.”. We also updated the Discussion to read: “Together, these data specifically implicate patchy striatonigral neurons in slowing locomotion by acting within the SNr to drive deceleration.”
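
      For illustration only, the lag convention used in this response (a negative lag means that neural activity precedes the behavioral change) can be sketched on synthetic data. This is a minimal sketch, not the analysis pipeline of the study: the signal names, the 5-sample lead, and the noise level are all hypothetical, and the cross-correlation is defined here as the Pearson correlation between neural[t + lag] and accel[t].

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(neural, accel, max_lag):
    """Return {lag: r} with r = corr(neural[t + lag], accel[t]).

    Under this (assumed) convention a negative lag means the neural
    signal leads the acceleration signal.
    """
    n = len(neural)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        t0 = max(0, -lag)        # first t with a valid neural[t + lag]
        t1 = min(n, n - lag)     # one past the last valid t
        result[lag] = pearson(neural[t0 + lag:t1 + lag], accel[t0:t1])
    return result


# Synthetic example: acceleration is the inverted neural signal,
# delayed by `lead` samples, plus noise (all values hypothetical).
random.seed(0)
n_samples, lead = 500, 5
neural = [random.gauss(0, 1) for _ in range(n_samples)]
accel = [
    (-neural[t - lead] if t >= lead else 0.0) + 0.3 * random.gauss(0, 1)
    for t in range(n_samples)
]

xc = cross_correlation(neural, accel, max_lag=20)
peak_lag = min(xc, key=xc.get)   # lag with the most negative correlation
print(peak_lag, round(xc[peak_lag], 2))
```

      In this toy case the most negative correlation falls at a lag of -5 samples, the pattern one would expect if the neural signal leads deceleration. A lead of this kind is compatible with a causal contribution but does not by itself establish one.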

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      We were unable to perform this experiment; however, hM3Dq has previously been shown to be effective in striatal neurons (Alcacer, Andreoli et al. 2017). The lack of effect observed in Gi-DREADD mice serves as an unintended but valuable control, helping to rule out off-target effects of the DREADD agonist JHU37160 and thereby reinforcing the specificity of hM3Dq-mediated activation in our study. We have now included an important caveat regarding the Gi-DREADD results, acknowledging the possibility that they may not have worked effectively in our target cells: “Potential explanations for the negative results in Gi-DREADD mice include inherently low basal activity among patchy neurons or insufficient expression of GIRK channels in striatal neurons, which may limit the effectiveness of Gi-coupling in suppressing neuronal activity (Shan, Fang et al. 2022).”

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

      We agree that this is an important potential implication of our work, especially given that we and others have shown that patchy striatonigral neurons provide strong inhibitory input to dopaminergic neurons involved in locomotor control (Nadel, Pawelko et al. 2021, Lazaridis, Crittenden et al. 2024, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Accordingly, we have expanded the discussion section to include potential mechanistic explanations that support and contextualize our main findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some minor issues for the authors' reference:

      (1) This work supports the motor-suppressing effect of patchy SPNs, and >80% of them are direct pathway SPNs. This conclusion is not expected from the traditional basal ganglia direct/indirect pathway model. Most experiments were performed using nonphysiological approaches to suppress (i.e., ablation) or activate (i.e., continuous chemo-optogenetic stimulation). It remains uncertain if the reported observations are relevant to the normal biological function of patchy SPNs under physiological conditions. Particularly, under what circumstances an imbalanced patch/matrix activity may be induced, as proposed in the sections related to the data presented in Figure 6. A thorough discussion and clarification remain needed. Or it should be discussed as a limitation of the present work.

      We have added discussion and clarification of physiological limitations in response to reviewer feedback. Additionally, we revised the opening sentence of an original paragraph in the discussion section to emphasize that it interprets our findings in the context of more physiological studies reporting natural shifts in patchy SPN activity due to cognitive conflict, stress, or training. The revised opening sentence now reads: “Together with previous studies of naturally occurring shifts in patchy neuron activation, these data illustrate ethologically relevant roles for a subgroup of genetically defined patchy neurons in behavior.”

      (2) Lines 499-500: How striato-nigral cells encode speed and deceleration deserves a thorough discussion and clarification. These striatonigral cells can target both SNr GABAergic neurons and dendrites of the dopaminergic neurons. A discussion of microcircuits formed by the patchy SPNs axons in the SNr GABAergic and SNC DAergic neurons should be presented.

      We have added this point at lines 499–500, including a reference to a relevant review of microcircuitry. Additionally, we expanded the discussion section to address microcircuit mechanisms that may underlie our main findings.

      (3) Line 70: "BNST" should be spelled out at the first time it is mentioned.

      This has been done.

      (4) Line 133: only GCaMP6 was listed in the method, but GCaMP8 was also used (Figure 4). Clarification or details are needed.

      Thank you for your careful attention to detail. We have corrected the typographical errors in the Methods section. Specifically, in the Stereotaxic Injections section, we corrected “GCaMP83” to “GCaMP8s.” In the Fiber Implant section, we removed the incorrect reference to “GCaMP6s” and clarified that GCaMP8s was used for photometry, and hChR2 was used for optogenetics.

      (5) Line 183: Can the authors describe more precisely what "a moment" means in terms of seconds or minutes?

      This has been done.

      (6) Line 288: typo: missing / in ΔF.

      Thank you. This has been fixed.

      (7) Line 301-302: the statement of "mCherry and MOR1 colocalization" does not match the images in Figure 1B.

      This has been corrected by providing a new Supplementary Figure S1.

      (8) Related to the statement between Lines 303-304: Figure 1c data may reflect changes in MOR1 protein or cell loss. Quantification of NeuN+ neurons within the MOR1 area would strengthen the conclusion of 60% of patchy cell loss in Figure 1C.

      Since the efficacy of AAV-FLEX-taCasp3 in cell ablation has been well established in our previous publications and those of others (Yang, Chiang et al. 2013, Wu, Kung et al. 2019), we do not believe the observed loss of MOR1 staining in Fig. 1C merely reflects reduced MOR1 expression. Moreover, a general neuronal marker such as NeuN may not reliably detect the specific loss of patchy neurons in our ablation model, given the technical limitations of conventional cell-counting methods like MBF’s StereoInvestigator, which typically exhibit a variability margin of 15–20%.

      (9) Lines 313-314: "Similarly, PA mice demonstrated greater stay-time in the dark zone (Figure 1E)." Revision is needed to better reflect what is shown in Figure 1E and avoid misunderstandings.

      Thank you. This has been addressed.

      (10) The color code in Figure 2Gi seems inconsistent with the others? Clarifications are needed.

      Color coding in Figure 2Gi differs from that in 2Eii out of necessity. For example, the "Light" cells depicted in light blue in 2Eii are represented by both light gray and light red dots in 2Gi. Importantly, Figure 2G does not encode specific speed relationships; instead, any association with speed is indicated by a red hue.

      (11) Lines 538-539: the statement of "Over half of the patch was covered" was not supported by Figure 5C. Clarification is needed.

      Thank you. For clarity, we updated the x-axis labels in Figures 1C and 5C from “% area covered” to “% DS area covered,” and defined “DS” as “dorsal striatal” in the corresponding figure legends. Additionally, we revised the sentence in question to read: “As with ablation, histological examination indicated that a substantial fraction of dorsal patch territories, identified through MOR1 staining, were impacted (Fig. 5C).”

      (12) Figure 3: statistical significance in Figure 3 should be labeled in various panels.

      We believe the reviewer's concern pertains to the scatter plot in panel F—specifically, whether the data points are significantly different from zero. In panel 3F, the 95% confidence interval clearly overlaps with zero, indicating that the results are not statistically significant.

      (13) Figures 6D-E: the absence of a difference in speed between control mice and ChR2 mice under continuous optical stimulation was not expected. It differs from the Gq-DREADD study in Figure 5E-F. Clarifications are needed.

      For mice undergoing constant ChR2 activation of Sepw1-Cre<sup>+</sup> SNr efferents, overall locomotor speed does not differ from controls. However, the BIL (bright-to-illuminated) effect on zone transitions is disrupted: activating Sepw1-Cre<sup>+</sup> fibers in the SNr blunts the typical increase in speed observed when mice flee from the light zone toward the dark zone. This impaired BIL-related speed increase upon exiting the light was similarly observed in the Gq-DREADD cohort. The reviewer is correct that this optogenetic manipulation within the SNr did not produce the more generalized speed reductions seen with broader Gq-DREADD activation of all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum. A likely explanation is the difference in targeting—ChR2 specifically activates SNr-bound terminals, whereas Gq-DREADD broadly activates entire Sepw1-Cre<sup>+</sup> cells. Notably, many of the generalized speed profile changes observed with chemogenetic activation are opposite to those resulting from broad ablation of Sepw1-Cre<sup>+</sup> cells.

      The more subtle speed-restrictive phenotype observed with ChR2 activation targeted to the SNr may suggest that fewer striatonigral fibers were affected by this technique, possibly due to the limited spread of light from the fiber optic. Broad locomotor speed change in the LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with an optogenetic approach. Alternatively, it could indicate that non-striatonigral Sepw1-Cre<sup>+</sup> projections, such as striatopallidal or intrastriatal pathways, play a role in more generalized slowing. If striatopallidal fibers contributed to locomotor slowing, we would expect to see non-zero cross-correlations between neural activity and speed or acceleration, along with negative lag indicating that neural activity precedes the behavioral change. However, our fiber photometry data do not support such a role for Sepw1-Cre<sup>+</sup> striatopallidal fibers.

      We have also referenced the possibility that intrastriatal collaterals could suppress striatal dopamine levels, potentially explaining the stronger slowing phenotype observed when the entire striatal population is affected, as opposed to selectively targeting striatonigral terminals.

      These technical considerations and interpretive nuances have been incorporated and clarified in the revised discussion section.

      (14) Line 632: "compliment": a typo?

      Yes, it should be “complement”.

      (15) Figure 4 legend: descriptions of panels A and B were swapped.

      Thank you. This has been corrected.

      (16) Friedman (2020) was listed twice in the bibliography (Lines 920-929).

      Thank you. This has been corrected.

      Reviewer #3 (Recommendations for the authors):

      It will be helpful to label and add figure legends below each figure.

      Thank you for the suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. We noted some instances where only p values are reported.

      Readers would also benefit from coding individual data points by sex and noting N/sex.

      We have included detailed statistical information in the revised manuscript. Both male and female mice were used in all experiments in approximately equal numbers. Since no sex-related differences were observed, we did not report the number of animals by sex.

      References

      Alcacer, C., L. Andreoli, I. Sebastianutto, J. Jakobsson, T. Fieblinger and M. A. Cenci (2017). "Chemogenetic stimulation of striatal projection neurons modulates responses to Parkinson's disease therapy." J Clin Invest 127(2): 720-734.

      Crittenden, J. R., P. W. Tillberg, M. H. Riad, Y. Shima, C. R. Gerfen, J. Curry, D. E. Housman, S. B. Nelson, E. S. Boyden and A. M. Graybiel (2016). "Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons." Proc Natl Acad Sci U S A 113(40): 11318-11323.

      Dong, J., L. Wang, B. T. Sullivan, L. Sun, V. M. Martinez Smith, L. Chang, J. Ding, W. Le, C. R. Gerfen and H. Cai (2025). "Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion." Nat Commun 16(1): 2710.

      Dudman, J. T. and J. W. Krakauer (2016). "The basal ganglia: from motor commands to the control of vigor." Curr Opin Neurobiol 37: 158-166.

      Evans, R. C., E. L. Twedell, M. Zhu, J. Ascencio, R. Zhang and Z. M. Khaliq (2020). "Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons." Cell Rep 32(11): 108156.

      Gerfen, C. R. and D. J. Surmeier (2011). "Modulation of striatal projection systems by dopamine." Annu Rev Neurosci 34: 441-466.

      Hawes, S. L., A. G. Salinas, D. M. Lovinger and K. T. Blackwell (2017). "Long-term plasticity of corticostriatal synapses is modulated by pathway-specific co-release of opioids through kappa-opioid receptors." J Physiol 595(16): 5637-5652.

      Lazaridis, I., J. R. Crittenden, G. Ahn, K. Hirokane, T. Yoshida, A. Mahar, V. Skara, K. Meletis, K. Parvataneni, J. T. Ting, E. Hueske, A. Matsushima and A. M. Graybiel (2024). "Striosomes Target Nigral Dopamine-Containing Neurons via Direct-D1 and Indirect-D2 Pathways Paralleling Classic Direct-Indirect Basal Ganglia Systems." bioRxiv.

      Nadel, J. A., S. S. Pawelko, J. R. Scott, R. McLaughlin, M. Fox, M. Ghanem, R. van der Merwe, N. G. Hollon, E. S. Ramsson and C. D. Howard (2021). "Optogenetic stimulation of striatal patches modifies habit formation and inhibits dopamine release." Sci Rep 11(1): 19847.

      Okunomiya, T., D. Watanabe, H. Banno, T. Kondo, K. Imamura, R. Takahashi and H. Inoue (2025). "Striosome Circuitry Stimulation Inhibits Striatal Dopamine Release and Locomotion." J Neurosci 45(4).

      Shan, Q., Q. Fang and Y. Tian (2022). "Evidence that GIRK Channels Mediate the DREADD-hM4Di Receptor Activation-Induced Reduction in Membrane Excitability of Striatal Medium Spiny Neurons." ACS Chem Neurosci 13(14): 2084-2091.

      Wu, J., J. Kung, J. Dong, L. Chang, C. Xie, A. Habib, S. Hawes, N. Yang, V. Chen, Z. Liu, R. Evans, B. Liang, L. Sun, J. Ding, J. Yu, S. Saez-Atienzar, B. Tang, Z. Khaliq, D. T. Lin, W. Le and H. Cai (2019). "Distinct Connectivity and Functionality of Aldehyde Dehydrogenase 1a1-Positive Nigrostriatal Dopaminergic Neurons in Motor Learning." Cell Rep 28(5): 1167-1181 e1167.

      Yang, C. F., M. C. Chiang, D. C. Gray, M. Prabhakaran, M. Alvarado, S. A. Juntti, E. K. Unger, J. A. Wells and N. M. Shah (2013). "Sexually dimorphic neurons in the ventromedial hypothalamus govern mating in both sexes and aggression in males." Cell 153(4): 896-909.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer 1 (Public Review):

      Many thanks for the positive and constructive feedback on the manuscript.

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      Thank you for the interesting comment. We now discuss the limitation of task-irrelevant prediction. In brief, some studies that showed sharpening found that task demands were relevant, while some studies that showed dampening were based on task-irrelevant predictions, but it is unlikely that task relevance (which was not manipulated in the current study) would explain the switch between sharpening and dampening that we observe within and across trials.

      The behavioural data that is displayed (from a post-recording behavioural session) shows that these predictions do influence perceptual choice - leading to faster reaction times when expectations are valid. In broad strokes, we may think that such a result is broadly consistent with a 'sharpening' view of perceptual prediction, and the fact that sharpening effects are found in the study to be larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening' a.k.a., prediction error advantage?)

      Thank you for the suggestion. We calculated Pearson’s correlation coefficients for behavioural responses (difference in mean reaction times), neural responses during the sharpening effect (difference in decoding accuracy), and neural responses during the dampening effect for each participant, which resulted in null findings.
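
      For illustration only, a brain-behaviour correlation of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: the per-participant difference scores below are invented placeholders, not data from the study, and the variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-participant difference scores (placeholders only):
# behavioural effect = mean RT(invalid) - mean RT(valid)
# neural effect      = decoding accuracy(valid) - decoding accuracy(invalid)
rt_difference = [1.0, 2.0, 3.0, 4.0]
decoding_difference = [1.0, 3.0, 2.0, 4.0]

r = pearson_r(rt_difference, decoding_difference)
t = t_statistic(r, len(rt_difference))
print(round(r, 3), round(t, 3))
```

      The t statistic would then be compared against a t distribution with n - 2 degrees of freedom. With the small samples typical of EEG studies, a null result of this kind is weak evidence for the absence of a relationship.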

      Reviewer 2 (Public Review):

      Thank you for your helpful and constructive comments on the manuscript.

      The strength in controlling for repetition effects by introducing a neutral (50% expectation) condition also adds a weakness to the current version of the manuscript, as this neutral condition is not integrated into the behavioral (reaction times) and EEG (ERP and decoding) analyses. This procedure remained unclear to me. The reported results would be strengthened by showing differences between the neutral and expected (valid) conditions on the behavioral and neural levels. This would also provide a more rigorous check that participants had implicitly learned the associations between the picture category pairings.

      Following the reviewer's suggestion, we have included the neutral condition in the behavioural analysis and performed a repeated measures ANOVA on all three conditions.
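
      For illustration only, a one-way repeated measures ANOVA over three conditions (for example valid, neutral, and invalid) partitions the variance as sketched below. This is a minimal sketch, not the analysis code of the study: the reaction-time values are invented placeholders, and no sphericity correction is applied.

```python
def rm_anova(data):
    """One-way repeated measures ANOVA.

    data: list of rows, one row per participant, one value per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(data)          # participants
    k = len(data[0])       # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    # Residual variance after removing condition and participant effects
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Hypothetical data: rows are participants, columns are the three
# conditions (placeholder values in arbitrary units).
reaction_times = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 5.0, 6.0],
]
f_value, df_cond, df_error = rm_anova(reaction_times)
print(round(f_value, 2), df_cond, df_error)
```

      Removing the between-participant sum of squares from the error term is what distinguishes this design from a between-subjects ANOVA, and it is why consistent within-participant differences can yield a large F even when participants differ widely in overall speed.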

      It is not entirely clear to me what is actually decoded in the prediction condition and why the authors did not perform decoding over trial bins in prediction decoding as potential differences across time could be hidden by averaging the data. The manuscript would generally benefit from a more detailed description of the analysis rationale and methods.

      In the original version of the manuscript, prediction decoding aimed at testing whether the upcoming stimulus category can be decoded from the response to the preceding (leading) stimulus. However, in response to the other Reviewers’ comments, we have decided to remove the prediction decoding analysis from the revised manuscript, as it is now apparent that prediction decoding cannot be separated from category decoding based on pixel information.

      Finally, the scope of this study should be limited to expectation suppression in visual perception, as the generalization of these results to other sensory modalities or to the action domain remains open for future research.

      We have clarified the scope of the study in the revised manuscript.

      Reviewer 3 (Public Review):

      Thank you for the thought-provoking and interesting comments and suggestions.

      (1) The results in Figure 2C seem to show that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, Figure 2E suggests the prediction (surprisingly, valid or invalid) during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Unless I am misinterpreting the analyses, it seems implausible to me that a prediction, but not actually shown image, can be better decoded using EEG than an image that is presented on-screen.

      Following this and the remaining comments by the Reviewer (see below), we have decided to remove the prediction analysis from the manuscript. Specifically, we have focused on the Reviewer’s concern that it is implausible that image prediction would be better decoded than an image that is presented on-screen. This led us to perform a control analysis, in which we tried to decode the leading image category based on pixel values alone (rather than on EEG responses). Since this decoding was above chance, we could not rule out the possibility that EEG responses to leading images reflect physical differences between image categories. This issue does not extend to trailing images, as the results of the decoding analysis based on trailing images are based on accuracy comparisons between valid and invalid trials, and thus image features are counterbalanced. We would like to thank the Reviewer for raising this issue.
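
      For illustration only, the logic of a pixel-based control analysis can be sketched as follows: a classifier is trained on raw pixel values, with no neural data involved, and its cross-validated accuracy is compared against chance. This is a minimal sketch under stated assumptions, not the control analysis of the study: the nearest-centroid classifier, the leave-one-out scheme, and the synthetic images (whose categories differ in mean luminance) are all hypothetical choices.

```python
import random


def nearest_centroid_loo(images, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier on pixel vectors."""
    correct = 0
    for i, (image, label) in enumerate(zip(images, labels)):
        # Build one centroid per category from all images except image i
        sums, counts = {}, {}
        for j, (other, other_label) in enumerate(zip(images, labels)):
            if j == i:
                continue
            if other_label not in sums:
                sums[other_label] = [0.0] * len(other)
                counts[other_label] = 0
            sums[other_label] = [s + p for s, p in zip(sums[other_label], other)]
            counts[other_label] += 1

        def squared_distance(category):
            return sum(
                (p - s / counts[category]) ** 2
                for p, s in zip(image, sums[category])
            )

        predicted = min(sums, key=squared_distance)
        correct += predicted == label
    return correct / len(images)


# Synthetic "images": each category has a different mean pixel value,
# mimicking low-level physical differences between image categories.
random.seed(1)
n_categories, n_per_category, n_pixels = 4, 20, 16
images, labels = [], []
for category in range(n_categories):
    for _ in range(n_per_category):
        images.append([category + random.gauss(0, 0.5) for _ in range(n_pixels)])
        labels.append(category)

accuracy = nearest_centroid_loo(images, labels)
chance = 1 / n_categories
print(accuracy, chance)
```

      Above-chance accuracy in a control of this kind indicates that category information is available in the stimulus pixels themselves, so that category decoding from neural responses to those images cannot be attributed to prediction alone.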

      (2) The "prediction decoding" analysis is described by the authors as "decoding the predictable trailing images based on the leading images". How this was done is however unclear to me. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there were only 2 possible trailing image categories: 1 valid, 1 invalid). How is it then possible that the analysis is performed separately for valid and invalid trials? If the authors simply decode which leading image category was shown, but combine L1+L2 and L4+L5 into one class respectively, the resulting decoder would in my opinion not decode prediction, but instead dissociate the representation of L1+L2 from L4+L5, which may also explain why the time-course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding predictions (e.g. Kok et al. 2017). Instead for the prediction analysis to be informative about the prediction, the decoder ought to decode the representation of the trailing image during the leading image and inter-stimulus interval. Therefore I am at present not convinced that the utilized analysis approach is informative about predictions.

      In this analysis, we attempted to decode (from the response to leading images) which trailing categories ought to be presented. The analysis was split between trials where the expected category was indeed presented (valid) vs. those in which it was not (invalid). The separation of valid vs. invalid trials in the prediction decoding analysis served as a sanity check as no information about trial validity was yet available to participants. However, as mentioned above, we have decided to remove the “prediction decoding” analysis based on leading images as we cannot disentangle prediction decoding from category decoding.

      (3) I may be misunderstanding the reported statistics or analyses, but it seems unlikely that >10 of the reported contrasts have the exact same statistic of Tmax = 2.76. Similarly, it seems implausible, based on visual inspection of Figure 2, that the Tmax for the invalid condition decoding (reported as Tmax = 14.903) is substantially larger than for the valid condition decoding (reported as Tmax = 2.76), even though the valid condition appears to have superior peak decoding performance. Combined these details may raise concerns about the reliability of the reported statistics.

      Thank you for bringing this to our attention. This copy error has now been rectified.

      (4) The reported analyses and results do not seem to support the conclusion of early learning resulting in dampening and later stages in sharpening. Specifically, the authors appear to base this conclusion on the absence of a decoding effect in some time-bins, while in my opinion a contrast between time-bins, showing a difference in decoding accuracy, is required. Or better yet, a non-zero slope of decoding accuracy over time should be shown (not contingent on post-hoc and seemingly arbitrary binning).

      Thank you for the helpful suggestion. We have performed an additional analysis to address this issue. For each participant, we calculated the trial-by-trial time-series of the decoding accuracy benefit for valid vs. invalid trials and averaged this benefit across time points within each of the two significant time windows. We then fitted a logarithmic model to quantify the change of this benefit over trials and identified the trial index at which the change of the logarithmic fit fell below 0.1% (i.e., accuracy had stabilized). Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2 to directly assess the effects of learning. This is explained in more detail in the revised manuscript.
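
      A minimal sketch of this kind of learning-curve analysis is given below. It assumes a model of the form benefit(t) = a + b ln(t), a single synthetic time-series, and a stabilization criterion defined as the fitted change from one trial to the next falling below 0.1 (in the units of the benefit); the function names, the parameter values and the exact form of the criterion are illustrative assumptions rather than the implementation used for the manuscript.

```python
import math
import random


def fit_log_model(y):
    """Least-squares fit of y[t] = a + b * ln(t) for trials t = 1..len(y)."""
    x = [math.log(t) for t in range(1, len(y) + 1)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def stabilization_trial(b, n_trials, threshold):
    """First trial t whose fitted change to trial t + 1 is below threshold.

    For a + b * ln(t), the change between consecutive trials is
    b * ln((t + 1) / t). Returns None if the criterion is never met.
    """
    for t in range(1, n_trials):
        if abs(b) * math.log((t + 1) / t) < threshold:
            return t
    return None


# Synthetic valid-minus-invalid decoding benefit (percentage points); hypothetical values.
rng = random.Random(1)
n_trials = 1000
benefit = [-2.0 + 1.5 * math.log(t) + rng.gauss(0.0, 1.0)
           for t in range(1, n_trials + 1)]

a, b = fit_log_model(benefit)
t_stable = stabilization_trial(b, n_trials, threshold=0.1)
print(f"a = {a:.2f}, b = {b:.2f}, stabilization trial = {t_stable}")
```

      Because the fit uses every trial, the estimated stabilization point does not depend on how the trials are later grouped into bins.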

      (5) The present results both within and across trials are difficult to reconcile with previous studies using MEG (Kok et al., 2017; Han et al., 2019), single-unit and multi-unit recordings (Kumar et al., 2017; Meyer & Olson 2011), as well as fMRI (Richter et al., 2018), which investigated similar questions but yielded different results; i.e., no reversal within or across trials, as well as dampening effects after more training. The authors do not provide a convincing explanation as to why their results should differ from previous studies, arguably further compounding doubts about the present results raised by the methods and results concerns noted above.

      The discussion of these findings has been expanded in the revised manuscript. In short, the experimental design of the above studies did not allow for an assessment of these effects prior to learning. Several of them also used repeated stimuli (albeit some studies changed the pairings of stimuli between trials), potentially allowing for RS to confound their results.

      Recommendations for the Authors:

      Reviewer 1 (Recommendations for the authors):

      (1) On a first read, I was initially very confused by the statement on p.7 that each stimulus was only presented once - as I couldn't then work out how expectations were supposed to be learned! It became clear after reading the Methods that expectations are formed at the level of stimulus category (so categories are repeated multiple times even if exemplars are not). I suspect other readers could have a similar confusion, so it would be helpful if the description of the task in the 'Results' section (e.g., around p.7) was more explicit about the way that expectations were generated, and the (very large) stimulus set that examples are being drawn from.

      Following your suggestion, we have clarified the paradigm by adding details about the categories and the manner in which expectations are formed.

      (2) p.23: the authors write that their 1D decoding images were "subjected to statistical inference amounting to a paired t-test between valid and invalid categories". What is meant by 'amounting to' here? Was it a paired t-test or something statistically equivalent? If so, I would just say 'subjected to a paired t-test' to avoid any confusion, or explaining explicitly which statistic inference was done over.

      We have rephrased this as “subjected to (1) a one-sample t-test against chance-level, equivalent to a fixed-effects analysis, and (2) a paired t-test”.

      Relatedly, this description of an analysis amounting to a 'paired t-test' only seems relevant for the sensory decoding and memory decoding analyses (where there are validity effects) rather than the prediction decoding analysis. As far as I can tell the important thing is that the expected image category can be decoded, not that it can be decoded better or worse on valid or invalid trials.

      In the previous version of the manuscript, the comparison of prediction decoding between valid and invalid trials was meant as a sanity check. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript due to confounds.

      It would be helpful if authors could say a bit more about how the statistical inferences were done for the prediction decoding analyses and the 'condition against baseline' contrasts (e.g., when it is stated that decoding accuracy in valid trials *in general* is above 0 at some cluster-wise corrected value). My guess is that this amounts to something like a one-sample t-test - but it may be worth noting that one-sample t-tests on information measures like decoding accuracy cannot support population-level inference, because these measures cannot meaningfully be below 0 (see Allefeld et al, 2016).

      When testing for decoding accuracy against baseline, we used one-sample t-tests against chance level (rather than against 0) throughout the manuscript. We now clarify in the manuscript that this corresponds to a fixed-effects analysis (Allefeld et al., 2016). In contrast, when testing for differences in decoding accuracy between valid and invalid conditions, we used paired-sample t-tests. As mentioned above, the prediction decoding analysis has been removed from the analysis.
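
      The distinction between the two tests can be illustrated with a short sketch. The accuracy values below are hypothetical, the chance level of 0.25 assumes four-class decoding, and the example covers a single time point only; it does not reproduce the cluster-based FWE correction applied across time points in the manuscript.

```python
import math
import statistics


def one_sample_t(values, mu):
    """t-statistic for the mean of values against a fixed reference mu."""
    n = len(values)
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))


def paired_t(a, b):
    """Paired t-statistic: a one-sample test on within-participant differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)


# Hypothetical per-participant decoding accuracies at one time point.
valid = [0.29, 0.31, 0.27, 0.30, 0.28, 0.32]
invalid = [0.27, 0.28, 0.26, 0.29, 0.27, 0.29]
chance = 0.25  # assumed four-class chance level

t_valid_vs_chance = one_sample_t(valid, chance)    # condition against chance
t_valid_vs_invalid = paired_t(valid, invalid)      # validity effect
print(f"t(valid vs. chance) = {t_valid_vs_chance:.2f}")
print(f"t(valid vs. invalid) = {t_valid_vs_invalid:.2f}")
```

      Both statistics have n - 1 degrees of freedom. The first asks whether accuracy exceeds chance in this sample, which is the fixed-effects reading noted above, while the second compares conditions within participants.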

      (3) By design, the researchers focus on implicit predictive learning which means the expectations being formed are (by definition) task-irrelevant. I thought it could be interesting if the authors might speculate in the discussion on how they think their results may or may not differ when predictions are deployed in task-relevant scenarios - particularly given that some studies have found sharpening effects do not seem to depend on task demands (e.g., Kok et al, 2012; Yon et al, 2018) while other studies have found that some dampening effects do seem to depend on what the observer is attending to (e.g., Richter et al, 2018). Do these results hint at a possible explanation for why this might be? Even if the authors think they don't, it might be helpful to say so!

      Thank you for the interesting comment. We have expanded on this in the revised manuscript.

      Reviewer 2 (Recommendations for the authors):

      Methods/results

      (1) The goal of this study is the assessment of expectation effects during statistical learning while controlling for repetition effects, one of the common confounds in prediction suppression studies (see, Feuerriegel et al., 2021). I agree that this is an important aspect and I assume that this was the reason why the authors introduced the P=0.5 neutral condition (Figure 1B, L3). However, I completely missed the analyses of this condition in the manuscript. In the figure caption of Figure 1C, it is stated that the reaction times of the valid, invalid, and neutral conditions are shown, but only data from the valid and invalid conditions are depicted. To ensure that participants had built up expectations and had learned the pairing, one would not only expect a difference between the valid and invalid conditions but also between the valid and neutral conditions. Moreover, it would also be important to integrate the neutral condition in the multivariate EEG analysis to actually control for repetition effects. Instead, the authors constructed another control condition based on the arbitrary pairings. But why was the neutral condition not compared to the valid and invalid prediction decoding results? Besides this, I also suggest calculating the ERP for the neutral condition and adding it to Figure 2A to provide a more complete picture.

      As mentioned above, we have included the neutral condition in the behavioural analysis, as outlined in the revised manuscript. We have also included a repeated measures ANOVA on all 3 conditions. The purpose of the neutral condition was not to avoid RS, but rather to provide a control condition. We avoided repetition by using individual, categorised stimuli. Figure 1C has been amended to include the neutral condition. In response to the remaining comments, we have decided to remove the prediction decoding analysis from the manuscript.

      (2) One of the main results that is taken as evidence for the OPT is that there is higher decoding accuracy for valid trials (indicate sharpening) early in the trial and higher decoding accuracy for invalid trials (indicate dampening) later in the trial. I would have expected this result for prediction decoding that surprisingly showed none of the two effects. Instead, the result pattern occurred in sensory decoding only, and partly (early sharpening) in memory decoding. How do the authors explain these results? Additionally, I would have expected similar results in the ERP; however, only the early effect was observed. I missed a more thorough discussion of this rather complex result pattern. The lack of the opposing effect in prediction decoding limits the overall conclusion that needs to be revised accordingly.

      Since sharpening vs. dampening rests on the comparison between valid and invalid trials, evidence for sharpening vs. dampening could only be obtained from decoding based on responses to trailing images. In prediction decoding (removed from the current version), information about the validity of the trial is not yet available. Thus, our original plan was to compare this analysis with the effects of validity on the decoding of trailing images (i.e. we expected valid trials to be decoded more accurately after the trailing image than before). The results of the memory decoding did mirror the sensory decoding of the trailing image in that we found significantly higher decoding accuracy of the valid trials from 123-180 ms. As with the sensory decoding, there was a tendency towards a later flip (280-296 ms) where decoding accuracy of invalid trials became nominally higher, but this effect did not reach statistical significance in the memory decoding.

      (3) To increase the comprehensibility of the result pattern, it would be helpful for the reader to clearly state the hypotheses for the ERP and multivariate EEG analyses. What did you expect for the separate decoding analyses? How should the results of different decoding analyses differ and why? Which result pattern would (partly, or not) support the OPT?

      Our hypotheses are now stated in the revised manuscript.

      (4) I was wondering why the authors did not test for changes during learning for prediction decoding. Despite the fact that there were no significant differences between valid and invalid conditions within-trial, differences could still emerge when the data set is separated into bins. Please test and report the results.

      As mentioned above, we have decided to remove the prediction decoding analysis from the current version of the manuscript.

      (5) To assess the effect of learning the authors write: 'Given the apparent consistency of bins 2-4, we focused our analyses on bins 1-2.' Please explain what you mean by 'apparent consistency'. Did you test for consistency or is it based on descriptive results? Why do the authors not provide the complete picture and perform the analyses for all bins? This would allow for a better assessment of changes over time between valid and invalid conditions. In Figure 3, were valid and invalid trials different in any of the QT3 or QT4 bins in sensory or memory encoding?

      We have performed an additional analysis to address this issue. The reasoning behind the decision to focus on bins 1-2 is now explained in the revised manuscript. In short, fitting a learning curve to trial-by-trial decoding estimates indicates that decoding stabilizes within <50% of the trials. To quantify changes in decoding occurring within these <50% of the trials while ensuring a sufficient number of trials for statistical comparisons, we decided to focus on bins 1-2 only.

      (6) Please provide the effect size for all statistical tests.

      Effect sizes have now been provided.

      (7) Please provide exact p-values for non-significant results and significant results larger than 0.001.

      Exact p-values have now been provided.

      (8) Decoding analyses: I suppose there is a copy/paste error in the T-values as nearly all T-values on pages 11 and 12 are identical (2.76) leading to highly significant p-values (0.001) as well as non-significant effects (>0.05). Please check.

      Thank you for bringing this to our attention. This error has now been corrected.

      (9) Page 12: There were some misleading phrases in the result section. To give one example: 'control analyses was slightly above change' - this sounds like a close to non-significant effect, but it was indeed a highly significant effect of p<0.001. Please revise.

      This phrase was part of the prediction decoding analysis and has therefore been removed.

      (10) Sample size: How was the sample size of the study determined (N=31)? Why did only a subgroup of participants perform the behavioral categorization task after the EEG recording? With a larger sample, it would have been interesting to test if participants who showed better learning (larger difference in reaction times between valid and invalid conditions) also showed higher decoding accuracies.

      This has been clarified in the revised manuscript. In short, the larger sample size of N=31 was based on previous research; ten participants were initially tested as part of a pilot which was then expanded to include the categorisation task.

      (11) I assume catch trials were removed before data analyses?

      We have clarified that catch trials were indeed removed prior to analyses.

      (12) Page 23, 1st line: 'In each, the decoder...' Something is missing here.

      Thank you for bringing this to our attention. This sentence has now been rephrased as “In both valid and invalid analyses” in the revised manuscript.

      Discussion

      (1) The analysis over multiple trials showed dampening within the first 15 min followed by sharpening. I found the discussion of this finding very lengthy and speculative (page 17). I recommend shortening this part and providing only the main arguments that could stimulate future research.

      Thank you for the suggestion. Since Reviewer 3 has requested additional details in this part of the discussion, we have opted to keep this paragraph in the manuscript. However, we have also made it clearer that this section is relatively speculative and the arguments provided for the across trials dynamics are meant to stimulate further research.

      (2) As this task is purely perceptual, the results support the OPT for the area of visual perception. For action, different results have been reported. Suppression within-trial has been shown to be larger for expected than unexpected features of action targets and suppression even starts before the start of the movement without showing any evidence for sharpening (e.g., Fuehrer et al., 2022, PNAS). For suppression across trials, it has been found that suppression decreases over the course of learning to associate a sensory consequence to a specific action (e.g., Kilteni et al., 2019, ELife). Therefore, expectation suppression might function differently in perception and action (an area that still requires further research). Please clarify the scope of your study and results on perceptual expectations in the introduction, discussion, and abstract.

      We have clarified the scope of the study in the revised manuscript.

      Figures

      (1) Figure 1A: Add 't' to the arrow to indicate time.

      This has been rectified.

      (2) Figure 3: In the figure caption, sensory and memory decoding seem to be mixed up. Please correct. Please add what the dashed horizontal line indicates.

      Thank you for bringing this to our attention. This has been rectified.

      Reviewer 3 (Recommendations for the authors):

      I applaud the authors for a well-written introduction and an excellent summary of a complicated topic, giving fair treatment to the different accounts proposed in the literature. However, I believe a few additional studies should be cited in the Introduction, particularly time-resolved studies such as Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011. This would provide the reader with a broader picture of the current state of the literature, as well as point the reader to critical time-resolved studies that did not find evidence in support of OPT, which are important to consider in the interpretation of the present results.

      The introduction has been expanded to include the aforementioned studies in the revised manuscript.

      Given previous neuroimaging studies investigating the present phenomenon, including with time-resolved measures (e.g. Kok et al., 2017; Han et al., 2019; Kumar et al., 2017; Meyer & Olson 2011), why do the authors think that their data, design, or analysis allowed them to find support for OPT but not previous studies? I do not see obvious modifications to the paradigm, data quantity or quality, or the analyses that would suggest a superior ability to test OPT predictions compared to previous studies. Given concerns regarding the data analyses (see points below), I think it is essential to convincingly answer this question to convince the reader to trust the present results.

      The most obvious alteration to the paradigm is the use of non-repeated stimuli. Each of the above time-resolved studies utilised repeated stimuli (either repeated, identical stimuli, or paired stimuli where pairings are changed but the pool of stimuli remains the same), allowing for RS to act as a confound as exemplars are still presented multiple times. By removing this confound, it is entirely plausible that we may find different time-resolved results given that it has been shown that RS and ES are separable in time (Todorovic & de Lange, 2012). We also test during learning rather than training participants on the task beforehand. By foregoing a training session, we are better equipped to assess OPT predictions as they emerge. In our across-trial results, learning appears to take place after approximately 15 minutes or 432 trials, at which point dampening reverses to sharpening. Had we trained the participants prior to testing, this effect would have been lost.

      What is actually decoded in the "prediction decoding" analysis? The authors state that it is "decoding the predictable trailing images based on the leading images" (p.11). The associated chance level (Figure 2E) is indicated as 50%. This suggests that the classes separated by the SVM are T6 vs T7. How this was done is however unclear. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there are only 2 possible trailing images, where one is the valid and the other the invalid image). How is it then possible that the analysis is performed separately for valid and invalid trials? Are the authors simply decoding which leading image was shown, but combine L1+L2 and L4+L5 into one class respectively? If so, this needs to be better explained in the manuscript. Moreover, the resulting decoder would in my opinion not decode the predicted image, but instead learn to dissociate the representation of L1+L2 from L4+L5, which may also explain why the time course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding (prestimulus) predictions (e.g. Kok et al. 2017). If this is indeed the case, I find it doubtful that this analysis relates to prediction. Instead for the prediction analysis to be informative about the predicted image the authors should, in my opinion, train the decoder on the representation of trailing images and test it during the prestimulus interval.

      As mentioned above, the prediction decoding analysis has been removed from the manuscript. The prediction decoding analysis was intended as a sanity check, as validity information was not yet available to participants.

      Related to the point above, were the leading/trailing image categories and their mapping to L1, L2, etc. in Figure 1B fixed across subjects? I.e. "'beach' and 'barn' as 'Leading' categories would result in 'church' as a 'Trailing' category with 75% validity" (p.20) for all participants? If so, this poses additional problems for the interpretation of the analysis discussed in the point above, as it may invalidate the control analyses depicted in Figure 2E, as systematic differences and similarities in the leading image categories could account for the observed results.

      Image categories and their mapping were indeed fixed across participants. While this may result in physical differences and similarities between images influencing results, counterbalancing categories across participants would not have addressed this issue. For example, had we swapped “beach” with “barn” in another participant, physical differences between images may still be reflected in the prediction decoding. On the other hand, counterbalancing categories across trials was not possible given our aim of examining the initial stages of learning over trials. Had we changed the mappings of categories throughout the experiment for each participant, we would have introduced reversal learning and nullified our ability to examine the initial stages of learning under flat priors. In any case, the prediction decoding analysis has been removed from the manuscript, as outlined above.

      Why was the neutral condition L3 not used for prediction decoding? After all, if during prediction decoding both the valid and invalid image can be decoded, as suggested by the authors, we would also expect significant decoding of T8/T9 during the L3 presentation.

      In the neutral condition, L3 was followed by T8 vs. T9 with 50% probability, precluding prediction decoding. While this could have served as an additional control analysis for EEG-based decoding, we have opted for removing prediction decoding from the analysis. However, in response to the other Reviewers’ comments, the neutral condition has now been included in the behavioral analysis.

      The following concern may arise due to a misunderstanding of the analyses, but I found the results in Figures 2C and 2E concerning. If my interpretation is correct, then these results suggest that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, the predicted (valid or invalid) image during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Does this seem reasonable? Unless I am misinterpreting the analyses, it seems implausible to me that a prediction but not actually shown image can be better decoded than an on-screen image. Moreover, to my knowledge studies reporting decoding of predictions can (1) decode expectations just above chance level (e.g. Kok et al., 2017; which is expected given the nature of what is decoded) and (2) report these prestimulus effects shortly before the anticipated stimulus onset, and not coinciding with the leading image onset ~800ms before the predicted stimulus onset. For the above reasons, the key results reported in the present manuscript seem implausible to me and may suggest the possibility of problems in the training or interpretation of the decoding analysis. If I misunderstood the analyses, the analysis text needs to be refined. If I understood the analyses correctly, at the very least the authors would need to provide strong support and arguments to convince the reader that the effects are reliable (ruling out bias and explaining why predictions can be decoded better than on-screen stimuli) and sensible (in the context of previous studies showing different time-courses and results).

      As explained above, we have addressed this concern by performing an additional analysis, implementing decoding based on image pixel values. Indeed we could not rule out the possibility that “prediction” decoding reflected stimulus differences between leading images.

      Relatedly, the authors use the prestimulus interval (-200 ms to 0 ms before predicted stimulus onset) as the baseline period. Given that this period coincides with prestimulus expectation effects (Kok et al., 2017), would this not result in a bias during trailing image decoding? In other words, the baseline period would contain an anticipatory representation of the expected stimulus (Kok et al., 2017), which is then subtracted from the subsequent EEG signal, thereby allowing the decoder to pick up on this "negative representation" of the expected image. It seems to me that a cleaner contrast would be to use the 200ms before leading image onset as the baseline.

      The analysis of trailing images aimed at testing specific hypotheses related to differences between decoding accuracy in valid vs. invalid trials. Since the baseline was by definition the same for both kinds of trials (since information about validity only appears at the onset of the trailing image), changing the baseline would not affect the results of the analysis. Valid and invalid trials would have the same prestimulus effect induced by the leading image.

      Again, maybe I misunderstood the analyses, but what exactly are the statistics reported on p. 11 onward? Why is the reported Tmax identical for multiple conditions, including the difference between conditions? Without further information this seems highly unlikely, further casting doubts on the rigor of the applied methods/analyses. For example: "In the sensory decoding analysis based on leading images, decoding accuracy was above chance for both valid (Tmax= 2.76, pFWE < 0.001) and invalid trials (Tmax= 2.76, pFWE < 0.001) from 100 ms, with no significant difference between them (Tmax= 2.76, pFWE > 0.05) (Fig. 2C)" (p.11).

      Thank you for bringing this to our attention. As previously mentioned, this copy error has been rectified in the revised manuscript.

      Relatedly, the statistics reported below in the same paragraph also seem unusual. Specifically, the Tmax difference between valid and invalid conditions seems unexpectedly large given visual inspection of the associated figure: "The decoding accuracy of both valid (Tmax = 2.76, pFWE < 0.001) and invalid trials (Tmax = 14.903, pFWE < 0.001)" (p.12). In fact, visual inspection suggests that the largest difference should probably be observed for the valid not invalid trials (i.e. larger Tmax).

      This copy error has also been rectified in the revised manuscript.

      Moreover, multiple subsequent sections of the Results continue to report the exact same Tmax value. I will not list all appearances of "Tmax = 2.76" here but would recommend the authors carefully check the reported statistics and analysis code, as it seems highly unlikely that >10 contrasts have exactly the same Tmax. Alternatively, if I misunderstand the applied methods, it would be essential to better explain the utilized method to avoid similar confusion in prospective readers.

      This error has also now been rectified. As mentioned above the prediction decoding analysis has been removed.

      I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible.

      Thank you for the helpful suggestion. As previously mentioned, we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2. This is explained in more detail in the revised manuscript.

      Relatedly, based on the literature there is no reason to assume that the dampening effect disappears with more training, thereby placing more burden of proof on the present results. Indeed, key studies supporting the dampening account (including human fMRI and MEG studies, as well as electrophysiology in non-human primates) usually seem to entail more learning than has occurred in bin 2 of the present study. How do the authors reconcile the observation that more training in previous studies results in significant dampening, while here the dampening effect is claimed to disappear with less training?

      The discussion of these findings has been expanded in the revised manuscript. As previously outlined, many of the studies supporting dampening did not explicitly test the effects of learning as they emerge, nor did they control for RS to the same extent.

      The Methods section is quite bare bones. This makes an exact replication difficult or even impossible. For example, the sections elaborating on the GLM and cluster-based FWE correction do not specify enough detail to replicate the procedure. Similarly, how exactly the time points for significant decoding effects were determined is unclear (e.g., p. 11). Relatedly, the explanation of the decoding analysis, e.g. the choice to perform PCA before decoding, is not well explained in the present iteration of the manuscript. Additionally, it is not mentioned how many PCs the applied threshold on average resulted in.

      Thank you for this suggestion, we have described our methods in more detail.

      To me, it is unclear whether the PCA step, which to my knowledge is not the default procedure for most decoding analyses using EEG, is essential to obtain the present results. While PCA is certainly not unusual, to my knowledge decoding of EEG data is frequently performed on the sensor level, as SVMs are usually capable of dealing with the (relatively low) dimensionality of EEG data. In isolation this decision may not be too concerning; however, in combination with other doubts concerning the methods and results, I would suggest the authors replicate their analyses using a conventional decoding approach on the sensor level as well.

      Thank you for this suggestion, we have explained our decision to use PCA in the revised manuscript.
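The reviewer's related question about how many PCs the applied threshold retains can be answered with a short count like the one below. This is a minimal sketch under assumptions: the data are taken to be a trials × channels array, the threshold is taken to be a cumulative explained-variance fraction, and the default of 0.99 and the function name are hypothetical, since the manuscript's actual threshold is not given here.

```python
import numpy as np

def n_components_for_variance(data, threshold=0.99):
    """Number of principal components needed to reach `threshold` variance.

    `data` is assumed to be a 2-D array of shape (observations, features).
    """
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional to
    # the variance explained by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # Index of the first component at which the cumulative share reaches
    # the threshold, converted to a count.
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, cumulative.size)
```

Reporting this count per participant and time point (or its mean and range) would address the request for the average number of retained components.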

      Several choices, like the binning and the focus on bins 1-2, seem rather post-hoc. Consequently, frequentist statistics may, strictly speaking, not be appropriate. This further compounds the above-mentioned concerns regarding the reliability of the results.

      The reasoning behind our decision to focus on bins 1-2 is now explained in more detail in the revised manuscript.

      A notable difference in the present study, compared to most studies cited in the introduction motivating the present experiment, is that categories instead of exemplars were predicted.

      This seems like an important distinction to me, which surprisingly goes unaddressed in the Discussion section. This difference might be important, given that exemplar expectations allow for predictions across various feature levels (i.e., even at the pixel level), while category predictions only allow for rough (categorical) predictions.

      The decision to use categorical predictions over exemplars lies in the issue of RS, as it is impossible to control for RS while repeating stimuli over many trials. This has been discussed in more detail in the revised manuscript.

      While individually minor, I noticed multiple issues across several figures or associated figure texts. For example: Figure 1C only shows valid and invalid trials, but the figure text mentions the neutral condition. Why is the neutral condition not depicted but mentioned here? Additionally, the figure text lacks critical information, e.g. what the asterisk represents. The error shading in Figure 2 would benefit from transparency settings to not completely obscure the other time-courses. Increasing the figure content and font size within the figure (e.g. axis labels) would also help with legibility (e.g. consider compressing the time-course and thereby increasing the overall size of the figure). I would also recommend using more common methods to indicate statistical significance, such as a bar at the bottom of the time-course figure, as typically used for cluster permutation results, instead of a box. Why is there no error shading in Figure 2A but in all other panels? Fig. 2C-F has the y-axis label "Decoding accuracy (%)" but certainly the y-axis, ranging roughly from 0.2 to 0.7, is not in %. The Figure 3 figure text gives no indication of what the error bars represent, making it impossible to interpret the depicted data. In general, I would recommend that the authors carefully revisit the figures and figure text to improve the quality and complete the information.

      Thank you for the suggestions. Figure 1C now includes the neutral condition. Asterisks denote significant results. The font size in Figure 2C-E has been increased. The y-axis on Figure 2C-E has been amended to accurately reflect decoding accuracy in percentage. Figure 2A has error shading, however, the error is sufficiently small that the error shading is difficult to see. The error bars in Figure 3 have been clarified.

      Given the choice of journal (eLife), which aims to support open science, I was surprised to find no indication of (planned) data or code sharing in the manuscript.

      Plans for sharing code/data are now outlined in the revised manuscript.

      While it is explained in sufficient detail later in the Methods section, it was not entirely clear to me, based on the method summary at the beginning of the Results section, whether categories or individual exemplars were predicted. The manuscript may benefit from clarifying this at the start of the Results section.

      Thank you for this suggestion. Following this and suggestions from other reviewers, the experimental paradigm and the mappings between categories have been further explained in the revised manuscript, to make it clearer that predictions are made at the categorical level.

      "Unexpected trials resulted in a significantly increased neural response 150 ms after image onset" (p.9). I assume the authors mean the more pronounced negative deflection here. Interpreting this, especially within the Results section, as "increased neural response" without additional justification may stretch the inferences we can make from ERP data; i.e. to my knowledge more pronounced ERPs could also reflect increased synchrony. That said, I do agree with the authors that it is likely to reflect increased sensory responses; it would just be useful to be more cautious in the inference.

      Thank you for the interesting comment, this has been rephrased as a “more pronounced negative deflection” in the revised manuscript.

      Why was the ERP analysis focused exclusively on Oz? Why not a cluster around Oz? For object images, we may expect a rather wide dipole.

      Feuerriegel et al. (2021) have outlined issues questioning the robustness of univariate analyses for ES; as such, we opted for a targeted ROI approach on the channel showing peak amplitude of the visually evoked response (Fig. 2B). More details on this are in the revised manuscript.

      How exactly did the authors perform FWE? The description in the Method section does not appear to provide sufficient detail to replicate the procedure.

      FWE as implemented in SPM is a cluster-based method of correcting for multiple comparisons using random field theory. We have explained our thresholding methods in more detail in the revised manuscript.

      If I misunderstand the authors and they did indeed perform standard cluster permutation analyses, then I believe the results of the timing of significant clusters cannot be so readily interpreted as done here (e.g. p.11-12); see: Maris & Oostenveld 2007; Sassenhagen & Draschkow 2019.

      All statistics were based on FWE under random field theory assumptions (as implemented in SPM) rather than on cluster permutation tests (as implemented in e.g. FieldTrip).

      Why did the authors choose not to perform spatiotemporal cluster permutation for the ERP results?

      As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021).

      Some results, e.g. on p.12 are reported as T29 instead of Tmax. Why?

      As mentioned above, prediction decoding analyses have been removed from the manuscript.

    1. Reviewer #1 (Public review):

      Summary:

      The authors introduce a novel algorithm for the automatic identification of long-range axonal projections. This is an important problem as modern high-throughput imaging techniques can produce large amounts of raw data, but identifying neuronal morphologies and connectivities requires large amounts of manual work. The algorithm works by first identifying points in three-dimensional space corresponding to parts of labelled neural projections, these are then used to identify short sections of axon using an optimisation algorithm and the prior knowledge that axonal diameters are relatively constant. Finally, a statistical model that assumes axons tend to be smooth is used to connect the sections together into complete and distinct neural trees. The authors demonstrate that their algorithm is far superior to existing techniques, especially when a dense labelling of the tissue means that neighbouring neurites interfere with the reconstruction. Despite this improvement, however, the accuracy of reconstruction remains below 90%, so manual proof-reading is still necessary to produce accurate reconstructions of axons.

      Strengths:

      The new algorithm combines local and global information to make a significant improvement on the state-of-the-art for automatic axonal reconstruction. The method could be applied more broadly and might have applications to reconstructions of electron microscopy data, where similar issues of high-throughput imaging and relatively slow or inaccurate reconstruction remain.

      Weaknesses:

      There are three weaknesses with the algorithm and manuscript.

      (1) The best reconstruction accuracy is below 90%, which does not fully solve the problem of needing manual proof-reading.

      (2) The 'minimum information flow tree' model the authors use to construct connected axonal trees has the potential to bias data collection. In particular, the assumption that axons should always be as smooth as possible is not always correct. This is a good rule-of-thumb for reconstructions, but real axons in many systems can take quite sharp turns and this is also seen in the data presented in the paper (Fig 1C). I would like to see explicit acknowledgement of this bias in the current manuscript and ideally a relaxation of this rule in any later versions of the algorithm.

      (3) The writing of the manuscript is not always as clear as it could be. The manuscript would benefit from careful copy editing for language, and the Methods section in particular should be expanded to more clearly explain what each algorithm is doing. The pseudo code of the Supplemental Information could be brought into the Methods if possible as these algorithms are so fundamental to the manuscript.

      Comments on revisions: I have no further comments or recommendations.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce a novel algorithm for the automatic identification of long-range axonal projections. This is an important problem as modern high-throughput imaging techniques can produce large amounts of raw data, but identifying neuronal morphologies and connectivities requires large amounts of manual work. The algorithm works by first identifying points in three-dimensional space corresponding to parts of labelled neural projections, these are then used to identify short sections of axons using an optimisation algorithm and the prior knowledge that axonal diameters are relatively constant. Finally, a statistical model that assumes axons tend to be smooth is used to connect the sections together into complete and distinct neural trees. The authors demonstrate that their algorithm is far superior to existing techniques, especially when dense labelling of the tissue means that neighbouring neurites interfere with the reconstruction. Despite this improvement, however, the accuracy of reconstruction remains below 90%, so manual proofreading is still necessary to produce accurate reconstructions of axons.

      Strengths:

      The new algorithm combines local and global information to make a significant improvement on the state-of-the-art for automatic axonal reconstruction. The method could be applied more broadly and might have applications to reconstructions of electron microscopy data, where similar issues of high-throughput imaging and relatively slow or inaccurate reconstruction remain.

      We thank the reviewer for their positive comments and for taking the time to review our manuscript. We are truly grateful that the reviewer recognized the value of our method in automatically reconstructing long-range axonal projections. While we report that our method achieves reconstruction accuracy of approximately 85%, we fully acknowledge that manual proofreading is still necessary to ensure accuracy greater than 95%. We also appreciate the reviewer’s insightful suggestion regarding the potential adaptation of our algorithm for reconstructing electron microscopy (EM) data, where similar challenges in high-throughput imaging and relatively slow or inaccurate reconstruction persist. We look forward to exploring ways to integrate our method with EM data in future work.

      Weaknesses:

      There are three weaknesses in the algorithm and manuscript.

      (1) The best reconstruction accuracy is below 90%, which does not fully solve the problem of needing manual proofreading.

      We sincerely appreciate the reviewer's valuable insights regarding reconstruction accuracy. Indeed, as illustrated in Figure S4, our current best automated reconstruction accuracy on fMOST data is still below 90%. This indicates that manual proofreading remains essential to ensure reliability.

      For the reconstruction of long-range axonal projections, ensuring the accuracy of the reconstruction process necessitates manual revision of the automatically generated results. Existing literature has demonstrated that a higher accuracy in automatic reconstruction correlates with a reduced need for manual revisions, thereby facilitating an accelerated reconstruction process (Winnubst et al., Cell 2019; Liu et al., Nature Methods 2025).

      As the reviewer rightly points out, achieving an accuracy exceeding 95% currently necessitates manual proofreading. Although our method does not completely eliminate this requirement, it significantly alleviates the proofreading workload by: 1) Minimizing common errors in regions with dense neuron distributions; 2) Providing more reliable initial reconstructions; and 3) Reducing the number of corrections needed during the proofreading process.

      In the future, we will continue to enhance our reconstruction framework. As imaging systems achieve higher signal-to-noise ratios and deep learning techniques facilitate more accurate foreground detection, we anticipate that our method will attain even greater reconstruction accuracy. Furthermore, we plan to develop a software system capable of predicting potential error locations in our automated reconstruction results, thereby streamlining manual revisions. This approach distinguishes itself from existing models by obviating the need for individual traversal of the brain regions associated with each neuron reconstruction.

      (2) The 'minimum information flow tree' model the authors use to construct connected axonal trees has the potential to bias data collection. In particular, the assumption that axons should always be as smooth as possible is not always correct. This is a good rule-of-thumb for reconstructions, but real axons in many systems can take quite sharp turns and this is also seen in the data presented in the paper (Figure 1C). I would like to see explicit acknowledgement of this bias in the current manuscript and ideally a relaxation of this rule in any later versions of the algorithm.

      We appreciate the reviewer's insightful comment regarding the potential bias introduced by our minimum information flow tree model. The reviewer is absolutely correct in noting that while axon smoothness serves as a useful reconstruction heuristic, it should not be treated as an absolute constraint given that real axons can exhibit sharp turns (as shown in Figure 1C). In response to this valuable feedback, we have added an explicit discussion of this limitation to the Discussion section, as follows: “Finally, the minimal information flow tree’s fundamental assumption, that axons should be as smooth as possible does not always hold true. In fact, real axons can take quite sharp turns leading the algorithm to erroneously separate a single continuous axon into disjoint neurites.”

      In our reconstruction process, the post-processing approach partially mitigates erroneous reconstructions derived from this rule. Specifically, the minimum information flow tree will decompose such structures into two separate branches (Fig. S7A), but the decomposition node is explicitly recorded. The newly decomposed branches attempt to reconnect by searching for plausible neurites starting from their head nodes (determined by the minimum information flow tree). If no connectable neurites are found, the branch is automatically reconnected to its originally recorded decomposition node (Fig. S7B). In Fig. S7C, two reconstruction examples demonstrate the effectiveness of the post-processing approach.
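The fallback logic described in this paragraph can be sketched roughly as below. This is a hypothetical illustration rather than PointTree's implementation: the `Branch` fields, the `reconnect` function, and the `find_connectable_node` callback (standing in for the unspecified neurite search) are all assumed names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Branch:
    branch_id: int
    head_node: int            # start node determined by the tree model
    decomposition_node: int   # node recorded when the branch was split off
    parent_node: Optional[int] = None  # attachment chosen in post-processing

def reconnect(branch: Branch,
              find_connectable_node: Callable[[int], Optional[int]]) -> Branch:
    """Attach a decomposed branch to a plausible neurite, or fall back.

    `find_connectable_node` is a placeholder for the search that starts at
    the branch's head node; it returns a node id, or None if nothing
    connectable is found.
    """
    candidate = find_connectable_node(branch.head_node)
    if candidate is not None:
        branch.parent_node = candidate
    else:
        # No plausible neurite found: restore the original connection.
        branch.parent_node = branch.decomposition_node
    return branch
```

Recording the decomposition node at split time is what makes the fallback possible, so a sharp turn that was wrongly cut can be restored when no better partner exists.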

      As pointed out by the reviewers, the proposed rule for revising neuron reconstruction does not encompass all scenarios. Relaxing the constraints of this rule may lead to numerous new erroneous connections. Currently, the proposed rule is solely based on the positions of neurite centerlines and does not integrate information regarding the intensity of the original images or segmentation data. Incorporating these elements into the rule could potentially reduce reconstruction errors. 

      (3) The writing of the manuscript is not always as clear as it could be. The manuscript would benefit from careful copy editing for language, and the Methods section in particular should be expanded to more clearly explain what each algorithm is doing. The pseudo-code of the Supplemental Information could be brought into the Methods if possible as these algorithms are so fundamental to the manuscript.

      We sincerely thank the reviewer for these valuable suggestions to improve our manuscript’s clarity and methodological presentation. We have implemented the following revisions:

      (1) Language Enhancement: we have conducted rigorous internal linguistic reviews to address grammatical inaccuracies and improve textual clarity.

      (2) Methods Expansion and Pseudo-code Integration: we have incorporated all relevant derivations from the Supplementary Materials into the Methods section, with additional explanatory text to clarify the purpose and implementation of each algorithm. All mathematical formulations have been systematically rederived, with variable names and subscript/superscript notation revised and the errors identified in the original submission corrected. All pseudocode from the Supplementary Materials has been integrated into the corresponding Methods subsections.

      Reviewer #2 (Public review):

      In this manuscript, Cai et al. introduce PointTree, a new automated method for the reconstruction of complex neuronal projections. This method has the potential to drastically speed up the process of reconstructing complex neurites. The authors use semi-automated manual reconstruction of neurons and neurites to provide a 'ground-truth' for comparison between PointTree and other automated reconstruction methods. The reconstruction performance is evaluated for precision, recall, and F1-score and positions. The performance of PointTree compared to other automated reconstruction methods is impressive based on these 3 criteria.

      As an experimentalist, I will not comment on the computational aspects of the manuscript. Rather, I am interested in how PointTree's performance decreases in noisy samples. This is because many imaging datasets contain some level of background noise for which the human eye appears essential for the accurate reconstruction of neurites. Although the samples presented in Figure 5 represent an inherent challenge for any reconstruction method, the signal-to-noise ratio is extremely high (also the case in all raw data images in the paper). It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

      We thank the reviewer for their time reviewing our manuscript and for their interest in how PointTree performs on noisy samples. It is important to clarify that PointTree is solely responsible for the reconstruction of neurons from the foreground regions of neural images. The foreground regions of these neuronal images are obtained through a deep learning segmentation network. In cases where the image has a low signal-to-noise ratio, if the segmentation network can accurately identify the foreground areas, then PointTree will be able to accurately reconstruct neurons. In fact, existing deep learning networks have demonstrated their capability to effectively extract foreground regions from low signal-to-noise ratio images; therefore, PointTree is well-suited for processing neuronal images characterized by low signal-to-noise ratios.

      In the revised manuscript, we conducted experiments on datasets with varying signal-to-noise ratios (SNR). The results demonstrate that Unet3D is capable of identifying the foreground regions in low-SNR images, thereby supporting the assertion that PointTree has broad applicability across diverse neuronal imaging datasets. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

      We extend our heartfelt gratitude to the reviewer for their insightful suggestion concerning experiments involving different noisy samples. Here are the details of the datasets used:

      LSM dataset: Mean SNR = 5.01, with 25 samples, and a volume size of 192×192×192.

      fMOST dataset: Mean SNR = 8.68, with 25 samples, and a volume size of 192×192×192.

      HD-fMOST dataset: Mean SNR = 11.4, with 25 samples, and a volume size of 192×192×192.

      The experimental results reveal that, thanks to the deep learning network's robust feature extraction capabilities, even when working with low-SNR data (as depicted in Figure 4B, first two columns of the top row), satisfactory segmentation results (Figure 4B, first two columns of the third row) were achieved. These results laid a solid foundation for subsequent accurate reconstruction.

      PointTree demonstrated consistent mean F1-scores of 91.0%, 90.0%, and 93.3% across the three datasets, respectively. This underscores its reconstruction robustness under varying SNR conditions when supported by the segmentation network. For more in-depth information, please refer to the manuscript section titled "Reconstruction of data with different signal-to-noise ratios" and Figure 4.
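For readers unfamiliar with the reported metric, the F1-score is the harmonic mean of precision and recall. The sketch below uses the standard definitions from counts of true positives, false positives, and false negatives; how a reconstructed node or segment is matched to the ground truth is specific to the manuscript and is not modelled here, and the function name is hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall, and F1 from match counts.

    Returns 0.0 for any quantity whose denominator would be zero.
    """
    predicted = true_positives + false_positives   # everything reconstructed
    actual = true_positives + false_negatives      # everything in ground truth
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Because the harmonic mean is pulled towards the smaller of the two values, a high F1 requires both few spurious connections and few missed ones.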

    1. How about we use Python to process real-world data and then draw a few charts? Okay sure, let's fire up our trusty 1960s-era text editor (not Microsoft Word) and write some code. Wait, first we need to install the proper add-on libraries such as NumPy and Matplotlib. [an hour of troubleshooting later, especially for Windows users ...] Okay, let's write some code. [type, type, type] Yeah, isn't this fun and intuitive? Python makes it all so easy ...