  1. Apr 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      The manuscript by K.H. Lee et al. presents Spyglass, a new open-source framework for building reproducible pipelines in systems neuroscience. The framework integrates the NWB (Neurodata Without Borders) data standard with the DataJoint relational database system to organize and manage analysis workflows. It enables the construction of complete pipelines, from raw data acquisition to final figures. The authors demonstrate its capabilities through examples, including spike sorting, LFP filtering, and sharp-wave ripple (SWR) detection. Additionally, the framework supports interactive visualizations via integration with Figurl, a platform for sharing neuroscience figures online.

      Strengths:

      Reproducibility in data analysis remains a significant challenge within the neuroscience community, posing a barrier to scientific progress. While many journals now require authors to share their data and code upon publication, this alone does not ensure that the code will execute properly or reproduce the original results. Recognizing this gap, the authors aim to address the community's need for a robust tool to build reproducible pipelines in systems neuroscience.

      We appreciate the summary and the recognition of the key need for maximally reproducible scientific workflows.

      Weaknesses:

      The issues identified here may serve as a foundation for future development efforts.

      (1) User-friendliness:

      The primary concern is usability. The manuscript does not clearly define the intended user base within a modern systems neuroscience lab. Improving user experience and lowering the barrier to entry would significantly enhance the framework's potential for broad adoption. The authors provide an online example notebook and a local setup notebook. However, the local setup process is overly complex, with many restrictive steps that could discourage new users. A more streamlined and clearly documented onboarding process is essential. Additionally, the lack of Windows support represents a practical limitation, particularly if the goal is widespread adoption across diverse research environments.

      We agree that usability is critical, and we now clarify that Spyglass

      “… is designed to be used by everyone in a laboratory who works with the data, both as a general-purpose tool to enable the development of new analysis pipelines and a tool that allows those pipelines and associated results to be frozen and packaged to enable reproducibility…”

      To address the local setup issue, we have now created an interactive quick start program to guide new users through the setup (scripts/install.py). It leads the user through a few prompts with sensible defaults to reduce the complexity of the setup. It helps the user install the Spyglass dependencies and create the DataJoint configuration file. We also validate the configuration to make sure the setup was successful (scripts/validate.py). Combined, these should reduce the complexity and setup time for most users while allowing expert users to configure Spyglass as they need. We thank the reviewer for the suggestion.

      We also agree that the lack of support for Windows is a key issue, and that is something we plan to address in the coming years. We note that it may be possible to run Spyglass under the Windows Subsystem for Linux (WSL 2), which allows users to run Linux programs on a Windows machine without the need for a virtual machine or dual boot setup.

      (2) Dependency management and long-term sustainability:

      The framework depends on numerous external libraries and tools for data processing. This raises concerns about long-term maintainability, especially given the short lifespan of many academic software projects and the instability often associated with Python's backward compatibility. It would be helpful for the authors to clarify how flexible and modular the pipeline is, and whether it can remain functional if upstream dependencies become deprecated or change substantially.

      This is a very good point that reflects a broad challenge to maintainability and reproducibility. We now explicitly raise this point in our Limitations section, and note that

      “…even in cases where reproducing a result would require installing older versions of software, the results themselves remain accessible within NWB files referenced in Spyglass, ensuring that previous results can be built on even as packages evolve.”

      The merge table pattern also allows us to update (version) our pipelines as software changes. For example, we have already done so for changes in SpikeInterface versions for the version 1 pipeline for spike sorting. New and older versions of the pipeline (v0 and v1) are accessed through the merge table SpikeSortingOutput. This allows the user to have consistent results despite the version change.
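
      The merge pattern described above can be sketched as follows. This is a minimal illustration in plain Python, not the Spyglass implementation: the class `MergeTable`, its methods, the session keys, and the unit counts are all hypothetical. It shows only the idea that downstream code fetches results through a single entry point, whichever pipeline version produced them.

```python
import uuid


class MergeTable:
    """Hypothetical stand-in for a merge table such as SpikeSortingOutput."""

    def __init__(self):
        # merge_id -> (source pipeline name, key within that pipeline)
        self._master = {}
        # source pipeline name -> {key: result}
        self._sources = {}

    def insert(self, source, key, result):
        """Register a result from one pipeline version and return its merge ID."""
        merge_id = str(uuid.uuid4())
        self._sources.setdefault(source, {})[key] = result
        self._master[merge_id] = (source, key)
        return merge_id

    def fetch(self, merge_id):
        """Look up a result without the caller knowing which version made it."""
        source, key = self._master[merge_id]
        return self._sources[source][key]

    def source_of(self, merge_id):
        """Report which pipeline version produced the entry."""
        return self._master[merge_id][0]


output = MergeTable()
old_id = output.insert("v0", "session_a", {"n_units": 12})  # invented values
new_id = output.insert("v1", "session_b", {"n_units": 15})

# Downstream analysis uses the same call for either pipeline version.
for merge_id in (old_id, new_id):
    print(output.source_of(merge_id), output.fetch(merge_id))
```

      Because the version is recorded alongside each entry, an analysis can still report which pipeline generated its inputs.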

      (3) Extensibility for custom pipelines:

      A further limitation is the insufficient documentation regarding the creation of custom pipelines. It is unclear how a user could adapt Spyglass to implement their own analysis workflows, especially if these differ from the provided examples (e.g., spike sorting, LFP analysis that are very specific to the hippocampal field). A clearer explanation or example of how to extend the framework for unrelated or novel analyses would greatly improve its utility and encourage community contributions.

      Here we failed to provide the required links to the documentation. We now explicitly refer to the documentation on Custom Pipelines, which includes a link to a YouTube video walking users through the creation of such a pipeline:

      Specifically, Spyglass uses DataJoint syntax to define tables as Python classes (see online documentation on Custom Pipelines and this video for examples).

      (4) Flexibility vs. Standardization:

      The authors may benefit from more explicitly defining the intended role of the framework: is Spyglass designed as a flexible, general-purpose tool for developing custom data analysis pipelines, or is its primary goal to provide a standardized framework for freezing and preserving pipelines post-publication to ensure reproducibility? While both goals are valuable, attempting to fully support both may introduce unnecessary complexity and result in a tool that is not well-suited for either purpose. The manuscript briefly touches on this tradeoff in the introduction, and the latter (pipeline preservation) may be the more natural fit for the package. If so, this intended use should be clearly communicated in the documentation to help users understand its scope and strengths.

      We appreciate this point, and have now clarified in the beginning of the Results section that

      It is both a general-purpose tool to enable the development of new analysis pipelines and a tool that allows those pipelines and associated results to be frozen and packaged to enable reproducibility.

      In practice, our lab uses Spyglass to systematize analyses to enable rapid application across many datasets. Then, once a paper has been finalized, we can export the data and the code in a package that enables reproduction. Being able to do both things is, in our view, a key strength of Spyglass. More broadly, we feel it is critical that there be a clear path for users to take their analysis code and make it reproducible. That process normally involves a very substantial amount of work, and our goal was to reduce the burden on users and make this a straightforward extension of how analyses are carried out.

      Impact:

      This work represents a significant milestone in advancing reproducible data analysis pipelines in neuroscience. Beyond reproducibility, the integration of cloud-based execution and shareable, interactive figures has the potential to transform how scientific collaboration and data dissemination are conducted. The authors are at the forefront of this shift, contributing valuable tools that push the field toward more transparent and accessible research practices.

      We appreciate this positive assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) "The authors write: ‘the relational database, a well-known data structure that uses tables to organize data.’ This phrasing may be misleading… It would be more accurate to describe them as ‘well-established’ rather than ‘well-known.’"

      We have made this change.

      (2) The statement "It makes it easy to apply the same analysis to multiple datasets, as users need to specify only the data and parameters for computation ("what") rather than the execution details ("how")." would benefit from further elaboration. Specifically, how does this approach compare in practice to using a simple configuration file (e.g., YAML or JSON) to manage parameters and execution logic? A comparison or example would help ground the claim.

      We agree one could in principle do something similar with configuration files, but this is a discipline that the user must impose on themselves, as configuration files in general have no constraint on how they are to be used. On the other hand, a system like Spyglass enforces the separation of data from parameters by design. We have now added a brief comment on this point in the Results:

      “It provides a structure to organize and systematize the analysis parameters, data, and outputs into different tables. This contrasts with user-generated configuration files where each user could adopt their own idiosyncratic approach to specifying parameters and data.”

      We also come back to this point in the Discussion:

      Other approaches do away with the relational database altogether. For example, DataLad uses version control tools such as git and git-annex to manage both code and data as files [39]. This enables the creation of a data analysis environment and decentralized data sharing. For building analysis pipelines, it may be combined with other tools for managing the sequential execution of scripts. For example, Snakemake [40] (and related projects such as Cobrawap [41]) allows the users to gather and define the input, output, and the associated scripts to execute for each analysis step, thereby tracking the dependency between steps. But because these tools do not provide any formal structure for data analysis or parameter specification, they lack the advantages of the relational database that we discussed, such as being able to easily organize or search for the records of previous analysis based on specific parameters, efficient data sharing and access management to multiple users, and built-in data integrity checks based on constraints native to the database (e.g. primary keys).
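
      As an illustration of the integrity checks mentioned above, the following sketch uses Python's built-in `sqlite3` module rather than DataJoint or Spyglass, and its table and column names are hypothetical. It shows a relational database rejecting a duplicate primary key and a reference to a parameter set that does not exist, two mistakes that a free-form configuration file would not catch without extra tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parameter table: one row per named parameter set.
conn.execute(
    "CREATE TABLE params (params_name TEXT PRIMARY KEY, bin_size_ms REAL NOT NULL)"
)
# Hypothetical selection table: each row must point at an existing parameter set.
conn.execute(
    """
    CREATE TABLE selection (
        nwb_file TEXT,
        params_name TEXT REFERENCES params (params_name),
        PRIMARY KEY (nwb_file, params_name)
    )
    """
)
conn.execute("INSERT INTO params VALUES ('default', 2.0)")

errors = []

# Attempt 1: silently redefining an existing parameter set is refused.
try:
    conn.execute("INSERT INTO params VALUES ('default', 5.0)")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

# Attempt 2: selecting a misspelled, nonexistent parameter set is refused.
try:
    conn.execute("INSERT INTO selection VALUES ('session_a.nwb', 'typo_name')")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

for message in errors:
    print("rejected:", message)
```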

      (3) The sentence ‘It enables easy access to multiple datasets via queries’ may overstate the benefit… clarify what specific advantages database queries offer.

      We agree that this is an important feature and we added the following as an example of the advantage of being able to query the database:

      It enables easy access to multiple datasets via queries (e.g. to find all datasets with recordings from a particular brain region or that used a particular behavioral paradigm)
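
      A query of this kind might look like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the Spyglass database. The table layout, file names, task names, and brain-region labels are invented for illustration and do not reflect the actual Spyglass schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (nwb_file TEXT PRIMARY KEY, task TEXT)")
conn.execute("CREATE TABLE electrode (nwb_file TEXT, electrode_id INTEGER, region TEXT)")

# Invented example records.
conn.executemany(
    "INSERT INTO session VALUES (?, ?)",
    [("a.nwb", "w_track"), ("b.nwb", "open_field"), ("c.nwb", "w_track")],
)
conn.executemany(
    "INSERT INTO electrode VALUES (?, ?, ?)",
    [
        ("a.nwb", 0, "CA1"),
        ("a.nwb", 1, "CA1"),
        ("b.nwb", 0, "mPFC"),
        ("c.nwb", 0, "CA1"),
        ("c.nwb", 1, "mPFC"),
    ],
)


def sessions_with(region, task):
    """Return files that have an electrode in `region` and used `task`."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.nwb_file
        FROM session AS s
        JOIN electrode AS e ON e.nwb_file = s.nwb_file
        WHERE e.region = ? AND s.task = ?
        ORDER BY s.nwb_file
        """,
        (region, task),
    )
    return [row[0] for row in rows]


matches = sessions_with("CA1", "w_track")
print(matches)
```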

      (4) ‘Specifically, Spyglass uses DataJoint syntax to define tables as Python classes’ lacks clarity… Expanding this explanation with a brief, concrete example would

      We agree that this sentence does not provide information on how to use DataJoint syntax to define a table. We carefully considered adding that syntax to the manuscript, but we are concerned that doing so here and in other places where syntax examples could be used would decrease the readability of the document. We also noted that other papers that present analysis frameworks typically provide much less information.

      Nevertheless, it is clear that users would benefit from a concrete example, and as we mentioned above, we have added a link to the documentation describing how to make custom schema and pipelines, as well as a YouTube video that we created to walk users through this process.

      (5) The authors write: "Selection tables associate parameter entries with data object entries." This terminology is confusing. From a naming perspective, it is not immediately obvious what a "selection table" is or how it differs from other components. Moreover, shouldn't parameter entries be associated with a specific pipeline rather than directly with data objects? Further clarification is needed.

      We appreciate that our terminology was not clear. The idea behind a selection table is that there are many data entries and many potential sets of parameters that can be used to analyze each of those entries. We have now revised this section of the text and added an explanatory paragraph:

      An analysis pipeline consists of sets of tables downstream of the Common tables. In each step in the analysis, the user populates one of four table types (Figure 2A):

      Data tables contain pointers to data objects in either the original NWB file or ones generated by an upstream analysis.

      Parameter tables contain a list of the parameters needed to fully specify the desired analysis.

      Selection tables allow users to select and pair a data entry and a parameter entry, defining the input to the Compute table.

      Compute tables execute the computations to carry out the analysis using the Data and Parameters specified in the Selection table entry. These results are then stored and can serve as Data for downstream analysis.

      This design has multiple features that we have found to be beneficial. First, Parameter tables store the full set of parameters needed to specify a given analysis. For example, a Parameter table entry for a firing rate analysis of a single neuron might specify the bin size and smoothing to be used for that analysis. Multiple such entries can be defined, allowing a user to select the most appropriate one for the question being addressed. Second, because Selection tables specify which Parameter table entry was used for a given analysis on the associated Data table entry, they provide the key information needed to know which parameters were used to generate the entry in the downstream Compute table. Third, it is simple to associate a given Data table entry with multiple Parameter table entries and then re-run the analysis on those pairs. This enables a user to understand how their choice of parameters impacts their results, something that is otherwise difficult to manage and track.
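
      The four table types described above can be sketched in plain Python as follows. This is a simplified illustration rather than Spyglass or DataJoint code: the dictionaries stand in for database tables, and the spike times, parameter names, and the `binned_rate` and `populate` functions are hypothetical.

```python
# Data table: pointers to data objects (here, invented spike times in seconds).
data_table = {"unit_1": [0.1, 0.4, 0.5, 1.2, 1.9]}

# Parameter table: named, complete parameter sets for the analysis.
parameter_table = {
    "coarse": {"bin_size_s": 1.0},
    "fine": {"bin_size_s": 0.5},
}

# Selection table: explicit pairings of one data entry with one parameter entry.
selection_table = [("unit_1", "coarse"), ("unit_1", "fine")]

# Compute table: results keyed by the selection entry that produced them.
compute_table = {}


def binned_rate(spike_times, bin_size_s, duration_s=2.0):
    """Return the firing rate (spikes/s) in consecutive bins of width bin_size_s."""
    n_bins = int(round(duration_s / bin_size_s))
    counts = [0] * n_bins
    for t in spike_times:
        index = min(int(t / bin_size_s), n_bins - 1)
        counts[index] += 1
    return [count / bin_size_s for count in counts]


def populate():
    """Run the analysis for every selection entry that has no result yet."""
    for data_key, params_key in selection_table:
        if (data_key, params_key) in compute_table:
            continue  # already computed; nothing to redo
        params = parameter_table[params_key]
        compute_table[(data_key, params_key)] = binned_rate(
            data_table[data_key], **params
        )


populate()
for key, rates in compute_table.items():
    print(key, rates)
```

      Each result is stored under its (data, parameter) pair, so the parameters behind any entry can be looked up later, and adding a new pairing to the selection list re-runs the analysis only for that pairing.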

      (6) Including ‘fitting state-space models’ as a standard example may be misleading… Presenting it as a routine task might set unrealistic expectations.

      We agree and have changed “standard” to “a diverse range of”.

      (7) Figure 2 would benefit from clearer sequential logic. For example, the object ‘LFPSelection’ appears after a method call referencing it.

      We agree that the figure was not explained adequately. We now make it clear in the caption that the method call creates the entry in the LFPSelection table, and is thus upstream of the picture of the table entry that was created.

      (8) Example 3 would be strengthened by a comparison to SpikeInterface, a framework increasingly adopted by the community.

      Here we clearly did not explain the spike sorting pipeline sufficiently thoroughly. As we now clarify in the text:

      This pipeline uses SpikeInterface [19] to perform the operations critical for spike sorting, but also tracks all of the parameters used and provides a system for tracking multiple sorting curations.

      Thus, Spyglass takes advantage of the special purpose routines within SpikeInterface, but also provides an organizational framework for the outputs, and, equally critically, allows direct use of the outputs of sorting in downstream analyses with the ability to go back and know which sorting parameters were used for that analysis.

      (9) The authors state: "These are saved as Docker containers and optionally uploaded to DANDI." However, it is unclear how end users are expected to interact with these containers. Additional guidance or an example interaction would be valuable.

      We agree that this interaction was not described in the text, and we have now added the following to explain how a user might interact with these containers:

      ...This can be done by (i) hosting the database on the cloud and granting access to users outside the lab; or (ii) exporting and sharing parts of the database that were used by the project. Spyglass facilitates the second option by providing functions that automatically log the table entries and NWB files used for creating figures of a manuscript in a Python environment (Table 1, 05_Export). The dependencies of these entries are traced through the database to compile the complete set of raw, intermediate, and plotted NWB files and their corresponding database entries. These are stored in the `Export` table, which also generates a bash script to create SQL dumps of the identified database entries.

      To upload these files to DANDI, users must first register a new dandiset for their project and record their API and dandiset ID. With this information, they can then use the method `DandiPath.compile_dandiset()` to automatically validate, organize, and upload all project files to the DANDI archive. Additionally, this process stores the archive information for each file in the `DandiPath` table, allowing `fetch_nwb` to automatically stream data from the DANDI cloud storage when not available locally.

      To create a sharable Docker image of the project, we provide a template repository, spyglass-export-docker. Users first download a local copy of this repo and copy the SQL dump file, environment YAML, and figure-generating notebooks generated during Spyglass export into the appropriate folders. Running the provided Docker Compose scripts then generates two linked Docker containers: one running the reconstructed Spyglass SQL database, and a second connected to this database and running a JupyterHub with a Python environment matching that used when generating the figures. These can be readily shared with new users to provide them immediate access to all steps of the analysis process and the corresponding data through DANDI streaming.
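
      The dependency tracing used during export, in which logged entries are followed upstream to collect everything they were derived from, can be sketched as a graph traversal. The example below is a simplified illustration in plain Python, not the Spyglass implementation. The entry names and the `upstream` mapping are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: each entry maps to the entries it was computed from.
upstream = {
    "figure_panel": ["decoding_result"],
    "decoding_result": ["sorted_spikes", "position"],
    "sorted_spikes": ["raw_recording"],
    "position": ["raw_recording"],
    "raw_recording": [],
    "unused_lfp": ["raw_recording"],
}


def trace_upstream(logged_entries):
    """Return every entry reachable by following upstream links (breadth-first)."""
    needed = set(logged_entries)
    queue = deque(logged_entries)
    while queue:
        entry = queue.popleft()
        for parent in upstream[entry]:
            if parent not in needed:
                needed.add(parent)
                queue.append(parent)
    return needed


# Only entries that contributed to the logged figure are collected for export.
export_set = trace_upstream(["figure_panel"])
print(sorted(export_set))
```

      Entries that did not contribute to a logged figure (here `unused_lfp`) are left out of the export set.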

      (10) The phrase "not requiring a central location to track available files and providing a user-friendly Python API" is somewhat vague. Does this imply that multiple sources can exist for the same NWB file? How does the system handle potential version conflicts, such as when an NWB file is modified locally? A clearer explanation would help users understand the system's behavior in collaborative scenarios.

      This is an important point that we now explain in the manuscript:

      Critically, the downloaded files are never modified locally within Spyglass, and any attempt to access a modified file would result in a DataJoint error. This ensures that each user is working on the same underlying data even if they are at different sites.

      To provide interested readers with more details, we also now point them to the repo for more information:

      We point interested readers to the Kachery GitHub repo (https://github.com/magland/kachery) for further descriptions.
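
      One common way to detect a locally modified file is to compare a checksum recorded at registration time against the file's current contents. The sketch below illustrates that general idea using only the Python standard library. It is an assumption made for illustration and does not describe how Spyglass, DataJoint, or Kachery actually implement the check. The function names, the `FileModifiedError` exception, and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


class FileModifiedError(Exception):
    """Raised when a file no longer matches its registered checksum."""


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


registry = {}  # file path -> checksum recorded at registration time


def register(path):
    """Record the checksum of a file as its reference state."""
    registry[str(path)] = sha256_of(path)


def open_verified(path):
    """Return the file's bytes, refusing if its contents have changed."""
    if sha256_of(path) != registry[str(path)]:
        raise FileModifiedError(f"{path} differs from the registered copy")
    return Path(path).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    nwb_path = Path(tmp) / "session_a.nwb"  # placeholder file, not real NWB data
    nwb_path.write_bytes(b"original contents")
    register(nwb_path)
    first_read = open_verified(nwb_path)

    # Simulate a local edit, then try to read the file again.
    nwb_path.write_bytes(b"edited contents")
    try:
        open_verified(nwb_path)
        was_rejected = False
    except FileModifiedError:
        was_rejected = True

print(first_read, was_rejected)
```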

      (11) "The concept of a ‘kachery zone’ in Figure 4 is ambiguous. Is this storage local or in the cloud? If a third-party storage system is involved, it should be explicitly labeled and described in the diagram."

      We agree that the depiction of a Kachery zone in Figure 4 is hard to understand. For the reviewer’s reference, a Kachery zone defines a list of users that have permissions to upload and download a particular set of files that have been linked to that zone. This is explained in the tutorials, and to simplify the figure we have replaced the Kachery zone with a remote computer.

      (12) If one of the manuscript's goals is to showcase the functionality of the pipeline, Figure 5 would be more informative if it also illustrated the workflow or steps involved in generating the displayed figures.

      We have added a supplementary figure (Supplementary Figure 1) related to Figure 5 that illustrates the main data workflow used in generating the figure. In addition, we note that the code for generating Figure 5 and the supplementary figure is included in the code repository for the paper (https://github.com/LorenFrankLab/spyglass-paper/).

      (13) In the conclusion, the authors write: "By contrast, Spyglass begins with a shared data format that includes the raw data and offers both transparent data management and reproducible analysis pipelines using a formal data structure." However, the tools discussed in the previous paragraph seem to offer similar capabilities. The real challenge in transparent data management often lies in the technical overhead associated with setting up and maintaining a database, particularly when collaborating across labs.

      Here we may not have explained the differences between Spyglass and these other approaches sufficiently clearly. The various tools mentioned in the paragraph above this one do not begin with a shared format nor do they include a formal data structure. That said, we agree that maintaining a database accessible across labs is a key challenge. We note here that we provide tutorials to ease this process, which are linked and described in the manuscript (e.g. Table 1).

      (14) Specifying a preferred IDE… may not be necessary. This recommendation could be made optional or omitted.

      We agree that it may not be necessary, but we have also noted that users come to Spyglass with a very wide range of expertise, and in our lab it has been helpful to specify the IDE.

      Reviewer #2 (Public review):

      Summary:

      This valuable paper presents Spyglass, a comprehensive software framework designed to address the critical challenges of reproducibility and data sharing in neuroscience.

      The authors have developed a robust ecosystem built on community standards such as NWB and DataJoint, and demonstrate its utility by applying it to datasets from two independent labs, successfully validating the framework's ability to reproduce and extend published findings. While the framework offers a powerful blueprint for modern, reproducible research, its immediate broad impact may be tempered by the significant upfront investment required for adoption and its current focus on electrophysiological data. Nevertheless, Spyglass stands as an important and practical contribution, providing a well-documented and thoughtfully designed path toward more transparent and collaborative science.

      Strengths:

      (1) Principled solution to a foundational challenge:

      The work offers a concrete and comprehensive framework for reproducibility in neuroscience, moving beyond abstract principles to provide an implemented, end-to-end ecosystem.

      (2) Pragmatic and robust architectural design:

      Features such as the "cyclic iteration" motif for spike-sorting curation and the "merge" motif for pipeline consolidation demonstrate deep, practical experience with neurophysiological analysis and address real-world challenges.

      (3) Cross-laboratory validation:

      The successful replication and extension of published hippocampal decoding findings across independent datasets strongly support the framework's utility and underscore its potential for enabling reproducible science.

      (4) Accessibility through documentation and demos:

      Extensive tutorials and the availability of a public demo environment lower some of the barriers to adoption.

      We appreciate the Reviewer’s recognition of these strengths.

      Weaknesses:

      (1) High barrier to adoption:

      The requirement to convert all data into NWB, maintain a relational database, and train users in structured workflows is a significant hurdle, particularly for smaller labs.

      We agree that this is a significant hurdle, but we also believe that it comes with many advantages. It is also increasingly easy to do given the many community-supported tools, regardless of a lab's resources. These points are discussed in detail in the “Why NWB?” section.

      We also note that, to our knowledge, there is no simpler alternative that provides the key features of Spyglass.

      (2) Limited tool integration:

      The current pipelines, while useful, still resemble proof-of-principle demonstrations.

      Closer integration with established analysis libraries such as Pynapple and others could broaden the toolkit and reduce duplication of effort.

      Here we clearly failed to explain that we have integrated other libraries, including Pynapple. We now make this clear in the Results section:

      Our goal was to take advantage of other open source packages, and we have therefore integrated support for Pynapple [21], a general purpose neural data analysis package. We also built our pipelines to take advantage of other community-developed, open-source packages, like GhostiPy [20], SpikeInterface [19], DeepLabCut [2] and Moseq [29].

      We also have added a specific reference to the relevant function call in the Practical use cases and extensions section:

      For example, the user can conveniently read specific data types from the NWB file by first ingesting it into Spyglass and accessing database tables with Spyglass functions (e.g. fetch_nwb) or even load those objects in a format compatible with Pynapple [21] (fetch_pynapple).

      Pynapple support is actually aided by our design choice of relying on NWB. Because NWB files can be loaded by Pynapple, any analysis that uses an NWB file that can be read by Pynapple can be loaded as a Pynapple object. We have provided methods to do so.

      (3) Experimental metadata support:

      While NWB provides a solid foundation for storing neurophysiology data streams, it still lacks broad and standardized support for experimental metadata, including descriptions of conditions, subject details, and procedures, as well as links across datasets. This limitation constrains one of Spyglass's key promises: enabling reproducible, cross-laboratory science. The authors should clarify how Spyglass plans to address or mitigate this gap - for example, by adopting or contributing to metadata extensions, providing templates for experimental conditions, or integrating with complementary systems that manage metadata across datasets.

      This is an important point. First, NWB provides methods for creating new metadata extensions, and our laboratory has contributed to multiple such extensions and has adopted metadata extensions as they come to exist (for example, we are currently integrating the ndx-pose extension, which has broader support for pose estimation algorithms such as DLC and SLEAP, enabling us to capture relationships between body parts). These extensions, once incorporated into NWB, make it easy to create parallel Spyglass tables that read in the associated metadata. Second, we note that by storing the metadata from the NWB file in a database, Spyglass naturally supports searches across datasets where the metadata is the same (e.g. all the datasets from a given subject or using a given behavioral apparatus).

      That said, for these searches to be easy, the underlying NWB files need to use the same ontologies (naming systems). Creating shared naming systems within and across labs is very challenging, but even here having a database helps greatly, as it provides a way to find all the names used for a given field and to thereby make an effort to standardize them.

      Finally, while Spyglass aims to enable reproducibility, it will not be possible to solve all standardization issues of the field. We believe that Spyglass is an important step forward in standardization and reproducibility in that it encourages users to use the same data format and processing. To our knowledge, there is no software like it in the field of systems neuroscience. Limitations of the field and of current progress do not invalidate the contribution of Spyglass as a framework.

      We now mention all these issues in the Limitations section of the Discussion.

      (4) Cross-laboratory interoperability:

      While demonstrated across two datasets, the manuscript does not fully address how Spyglass will handle the diversity of metadata standards, acquisition systems, and lab-specific practices that remain major obstacles to reproducibility.

      We agree that the current version of Spyglass does not fully address this diversity. Nevertheless, we note that the NWB standard is increasingly widely adopted in our field, and that by building on this standard, it is much simpler to create structures that store relevant data across labs.

      (5) Visualization limitations:

      Beyond the export system and Figurl, NWB offers relatively few options for interactive data exploration. The ability to explore data flexibly and discover new phenomena remains limited, which constrains one of the potential strengths of standardized pipelines.

      We agree that there are many other tools, and we have considered additional integrations. We have chosen not to proceed in this direction because the various visualization tools are well constructed, and therefore already easy to use with data retrieved from Spyglass. Thus, users can choose to use Matplotlib, Seaborn, or any of many other visualization tools and apply those to data accessed through Spyglass without the need for more explicit integration.

      Spyglass is well-positioned to become a community framework for reproducible neuroscience workflows, with the potential to set new standards for transparency and data sharing. With expanded modality coverage, tighter integration of existing community tools, stronger solutions for cross-lab interoperability, and richer visualization capabilities, it could have a transformative impact on the field.

      We appreciate this summary and will continue to try to make Spyglass more powerful, generalizable, and accessible to the community.

      Reviewer #2 (Recommendations for the authors):

      (1) Documentation/User onboarding:

      While extensive documentation exists, new users may feel overwhelmed. A single Quickstart or "golden path" guide and a one-command validation script would substantially improve usability.

      As mentioned in the response to reviewer 1, we have added an interactive quickstart program to walk users through installation and setup (scripts/install.py) and validate the install (scripts/validate.py). This should greatly reduce the complexity of the set-up process and allow new users to use Spyglass quickly and confidently. We thank the reviewer for the suggestion.

      (2) Permission handling and multi-user scaling:

      Current ad hoc solutions (like cautious deletes) may not scale well in large collaborations. This should be acknowledged, but it is not a fatal weakness given the framework's early stage.

      This is a fair point and we now mention this when cautious delete is introduced in the Methods:

      Though this is not a formal permission-management system, it serves to prevent accidental deletions. We note that this system does incur additional overhead, and while that has not been an issue for us, it is possible that this would become problematic in use for much larger cross-laboratory collaborations.
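
      The idea of a guarded deletion can be sketched as follows. This is a simplified illustration in plain Python, not the Spyglass implementation: the team-membership rule, the function `guarded_delete`, and all user, team, and session names are assumptions introduced for the example.

```python
class PermissionDenied(Exception):
    """Raised when a user tries to delete data owned by another team."""


# Hypothetical records: team membership and the team that owns each session.
team_members = {"team_a": {"alice", "bob"}, "team_b": {"carol"}}
session_team = {"session_1": "team_a", "session_2": "team_b"}
session_entries = {"session_1": ["lfp", "spikes"], "session_2": ["lfp"]}


def guarded_delete(user, session):
    """Delete a session's entries only if the user belongs to the owning team."""
    owner = session_team[session]
    # This membership lookup is the extra work performed on every delete.
    if user not in team_members[owner]:
        raise PermissionDenied(f"{user} is not a member of {owner}")
    return session_entries.pop(session)


# A team member can delete the team's own session.
removed = guarded_delete("alice", "session_1")

# The same user is stopped from deleting another team's session.
try:
    guarded_delete("alice", "session_2")
    blocked = False
except PermissionDenied:
    blocked = True

print(removed, blocked, sorted(session_entries))
```

      The check guards against accidents rather than enforcing security, since nothing here prevents a user from bypassing the function.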

      (3) Benchmarking and performance evaluation:

      "More systematic testing (e.g., reproducibility across independent users, computational efficiency) would be reassuring, but the lack of it does not invalidate the proof-of-principle demonstration. "

      We agree. So far at least two other labs have adopted this system and we are working with a consortium funded by the Simons Foundation to use Spyglass as a data sharing system across a larger number of labs.

      (4) Support for Cloud solution:

      To lower the barrier to adoption, the authors should consider cloud integration, such as preconfigured Docker/Cloud templates or hosted options, so end-users do not need to maintain databases and storage locally.

      We agree that cloud-based solutions could be a good option for some labs, although we note that the cost of cloud-based computing can be very high. There is also the burden of moving and storing the data to where it needs to be processed, which can be particularly time intensive with the large-scale data being generated by many laboratories.

      At the reviewer’s suggestion, we have added docker-compose support to lower the barrier to adoption. This includes:

      docker-compose.yml with health checks and persistent storage

      .env.example configuration template

      This allows one-command database setup: `docker compose up -d`
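
      Because `docker compose up -d` returns before the database is necessarily ready to accept connections, a script that follows it may want to wait for the port to open. The sketch below uses only the Python standard library. The function name `wait_for_port` is hypothetical, and the host and port to pass in (for example `localhost` and MySQL's default 3306) are assumptions that depend on the local configuration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection could be made before `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not ready yet; retry shortly
    return False
```

      An open port only shows that the server process is listening; the container health checks mentioned above remain the more reliable readiness signal.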

      (5) Integration of greater modalities:

      The authors should consider expanding support to other major data types, particularly calcium imaging, photometry, and other optical physiology data.

      We entirely agree that pipelines to ingest and process these datatypes would be very valuable, and we would welcome collaborations with experts and the general community to build these pipelines. We are, for example, working with a collaborating lab on a photometry pipeline. However, we only have so many people to build and maintain Spyglass, so we are limited by the capacity and expertise of our developers.

      (6) Integrate more community tools:

      Closer integration with community tools such as Pynapple, Neurosift, and SpikeInterface would broaden functionality and position Spyglass as a hub rather than a parallel ecosystem.

      As we mentioned in our responses to Reviewer 1, we entirely agree, and in fact we have already integrated Pynapple support into Spyglass. Because we store files in the NWB format and Pynapple supports NWB, it was easy for us to convert any data we have into the Pynapple format upon request, thus making it easily analyzable by the Pynapple package. Moreover, we use SpikeInterface for the SpikeSorting pipeline, and similarly provide pipelines built on other open source projects. As we now clarify in the text:

      Spyglass includes pipelines for a diverse range of analysis tasks in systems neuroscience, such as the analysis of LFP, spike sorting, video and position processing, and fitting state-space models for decoding neural data. Tutorials for all pipelines are available on the Spyglass documentation website (Table 1). Our goal was to take advantage of other open source packages, and we have therefore integrated support for Pynapple [21], a general purpose neural data analysis package. We also built our pipelines to take advantage of other community-developed, open-source packages, like GhostiPy [20], SpikeInterface [19], DeepLabCut [2] and Moseq [29].

      (7) Direct Dandi archive upload functionality:

      Scripts and tutorials for uploading data directly from Spyglass to DANDI, with validation of metadata completeness, would provide users with a direct pipeline from raw data to a public archive.

      The tutorials for DANDI upload are included as part of the export tutorial notebook (https://lorenfranklab.github.io/spyglass/latest/notebooks/05_Export/). We agree that this was not apparent from the manuscript before and have noted this within the Manuscript table describing these notebooks.

  2. Jan 2026
    1. I switched from VSCode to Zed
      • Original author switched from VSCode to Zed in December and now uses Zed as the primary editor for Python and Go.
      • Main reason for leaving VSCode was increasingly intrusive AI features (Copilot prompts, inline terminal suggestions) and perceived increase in crashes and slowness.
      • Author still likes VSCode overall but feels rapid AI integration harmed stability and usability, and hopes it becomes less intrusive in the future.
      • JetBrains IDEs were rejected as feeling too heavy, and Vim/Emacs as too time‑intensive to configure; Zed was attractive as a modern, lightweight Rust-based IDE.
      • Transition from VSCode was smooth: similar UI, mostly compatible keybindings, and ability (unused by author) to import some VSCode settings.
      • Zed felt significantly faster and more responsive than VSCode, with no glitches or crashes over a couple of weeks, restoring a sense of “joy of programming”.
      • Initial Zed setup was minimal: adjust fonts, theme, disable inline git blame, and enable autosave; Go worked out of the box.
      • Python setup required more work because Zed uses language servers and defaults to Basedpyright instead of Pylance (which is VSCode-only and closed source).
      • The author hit unexpected strict type-checking because projects with a [tool.pyright] section in pyproject.toml effectively force Basedpyright’s recommended mode.
      • Attempting to set typeCheckingMode in Zed’s settings.json did not help; the fix was explicitly setting typeCheckingMode = "standard" inside each project’s [tool.pyright] config.
      • Another issue was delayed type diagnostics across files, fixed by setting "disablePullDiagnostics": true in Zed’s Basedpyright initialization options.
      • Virtualenv handling and other Python-specific behavior worked smoothly; the author also tried the new ty language server, found it good, but stayed with Basedpyright to match CI’s Pyright.
      • Zed is now the author’s default IDE: fast, stable, familiar, with enough extensions despite a much smaller ecosystem than VSCode.
      • The main missing feature is a powerful side‑by‑side git diff viewer comparable to GitLens.
      • Zed’s AI features are present but easy to ignore; paid plans for AI edit predictions seem like a reasonable way to fund development while keeping the core editor free.
      • The author views Zed as a serious competitor that pressures VSCode to improve, especially around AI integration and performance.
      • The post ends with sharing a minimal settings.json showcasing autosave, disabled inline blame, VSCode keymap, fonts, light theme, and customized Basedpyright LSP options.

      Hacker News Discussion

      • A VS Code team member acknowledges that AI-related features sometimes ignore the “disable” settings but states they try to ship fixes quickly and appreciate feedback.
      • Several commenters recommend VSCodium as a way to get the open‑source VS Code experience without Microsoft’s telemetry and aggressive AI integration, while clarifying that both VS Code and VSCodium build from the same upstream repo.
      • Many users express frustration with VS Code becoming bloated, unreliable, or “enshittified,” particularly around Copilot and complex configuration/remote setups, and are looking at Zed or classic editors as alternatives.
      • Emacs and Vim/Neovim advocates argue that investing in these longstanding editors avoids churn and AI/UX regressions, with some describing decades-long Emacs usage and others praising Neovim plus LSPs as a lightweight yet powerful setup.
      • Sublime Text is often cited as the spiritual predecessor of Zed in terms of speed and snappiness, with some saying Zed is the closest modern successor focused on performance.
      • Zed users highlight positives like fast AI/MCP integration, good Nix/Direnv support, and pleasant design, but note pain points such as font rendering on low‑DPI or non‑GPU setups, Linux packaging gaps, missing REPLs for Lisps, and weaker debugging/extension ecosystems compared to VS Code or JetBrains.
      • Some comments mention concrete bugs and annoyances in Zed, including format‑on‑save occasionally deleting the first line of Python classes, unwanted newline insertion at EOF, and missing small quality-of-life features (e.g., indentation autodetection, drag‑and‑drop markdown link insertion).
      • A few developers describe hybrid workflows: using JetBrains IDEs on powerful machines, Zed on lower‑power devices, and Vim/Neovim or Sublime for quick one‑off edits, emphasizing that Zed is not yet at JetBrains’ level for deep refactoring and code understanding.
      • Several participants discuss Zed’s business model as an AI “reseller”: core editor remains free while Pro users pay for pooled tokens across multiple AI providers, which some see as a relatively benign and sustainable way to monetize.
      • There is concern that Zed’s extension ecosystem is still small and that Rust-based extension development may limit growth relative to VS Code; suggestions include better guidance for porting VS Code extensions and addressing collaboration/chat self‑hosting and security concerns.
  3. Feb 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors have created a system for designing and running experimental pipelines to control and coordinate different programs and devices during an experiment, called Heron. Heron is based around a graphical tool for creating a Knowledge Graph made up of nodes connected by edges, with each node representing a separate Python script, and each edge being a communication pathway connecting a specific output from one node to an input on another. Each node also has parameters that can be set by the user during setup and runtime, and all of this behavior is concisely specified in the code that defines each node. This tool tries to marry the ease of use, clarity, and self-documentation of a purely graphical system like Bonsai with the flexibility and power of a purely code-based system like Robot Operating System (ROS).

      Strengths:

      The underlying idea behind Heron, of combining a graphical design and execution tool with nodes that are made as straightforward Python scripts, seems like a great way to get the relative strengths of each approach. The graphical design side is clear, self-explanatory, and self-documenting, as described in the paper. The underlying code for each node tends to also be relatively simple and straightforward, with a lot of the complex communication architecture successfully abstracted away from the user. This makes it easy to develop new nodes, without needing to understand the underlying communications between them. The authors also provide useful and well-documented templates for each type of node to further facilitate this process. Overall this seems like it could be a great tool for designing and running a wide variety of experiments, without requiring too much advanced technical knowledge from the users.

      The system was relatively easy to download and get running, following the directions and already has a significant amount of documentation available to explain how to use it and expand its capabilities. Heron has also been built from the ground up to easily incorporate nodes stored in separate Git repositories and to thus become a large community-driven platform, with different nodes written and shared by different groups. This gives Heron a wide scope for future utility and usefulness, as more groups use it, write new nodes, and share them with the community. With any system of this sort, the overall strength of the system is thus somewhat dependent on how widely it is used and contributed to, but the authors did a good job of making this easy and accessible for people who are interested. I could certainly see Heron growing into a versatile and popular system for designing and running many types of experiments.

      Weaknesses:

      (1) The number one thing that was missing from the paper was any kind of quantification of the performance of Heron in different circumstances. Several useful and illustrative examples were discussed in depth to show the strengths and flexibility of Heron, but there was no discussion or quantification of performance, timing, or latency for any of these examples. These seem like very important metrics to measure and discuss when creating a new experimental system.

      Heron is practically a thin layer of obfuscation of signal passing across processes. Given its design approach it is up to the code of each Node to deal with issues of timing, synching and latency and thus up to each user to make sure the Nodes they author fulfil their experimental requirements. Having said that, Heron provides a large number of tools to allow users to optimise the generated Knowledge Graphs for their use cases. To showcase these tools, we have expanded on the third experimental example in the paper with three extra sections, two of which relate to Heron’s performance and synching capabilities. One is focusing on Heron’s CPU load requirements (and existing Heron tools to keep those at acceptable limits) and another focusing on post experiment synchronisation of all the different data sets a multi Node experiment generates.   

      (2) After downloading and running Heron with some basic test Nodes, I noticed that many of the nodes were each using a full CPU core on their own. Given that this basic test experiment was just waiting for a keypress, triggering a random number generator, and displaying the result, I was quite surprised to see over 50% of my 8-core CPU fully utilized. I don’t think that Heron needs to be perfectly efficient to accomplish its intended purpose, but I do think that some level of efficiency is required. Some optimization of the codebase should be done so that basic tests like this can run with minimal CPU utilization. This would then inspire confidence that Heron could deal with a real experiment that was significantly more complex without running out of CPU power and thus slowing down.

      The original Heron allowed the OS to choose how to manage resources across the required processes. We were aware that this could lead to significant use of CPU time, as well as an occasionally significant drop of packets (which was dependent on the OS and its configuration). This drop happened mainly when the Node was running a secondary process (like in the Unity game process in the 3rd example). To mitigate these problems, we have now implemented a feature allowing the user to choose the CPU that each Node’s worker function runs on as well as any extra processes the worker process initialises. This is accessible from the Saving secondary window of the node. This stops the OS from swapping processes between CPUs and eliminates the dropping of packages due to the OS behaviour. It also significantly reduces the utilised CPU time. To showcase this, we first ran the simple example mentioned by the reviewer. The computer running only background services was using 8% of CPU (8 cores). With Heron GUI running but with no active Graph, the CPU usage went to 15%. With the Graph running and Heron’s processes running on OS attributed CPU cores, the total CPU was at 65% (so very close to the reviewer’s 50%). By choosing a different CPU core for each of the three worker processes the CPU went down to 47% and finally when all processes were forced to run on the same CPU core the CPU load dropped to 30%. So, Heron in its current implementation running its GUI and 3 Nodes takes 22% of CPU load (the 30% total minus the 8% background load). This is still not ideal but is a consequence of the overhead of running multiple processes vs multiple threads. We believe that, given Heron’s latest optimisation, offering more control of system management to the user, the benefits of multi process applications outweigh this hit in system resources.
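
      For readers unfamiliar with CPU pinning, the general mechanism can be sketched with the Python standard library. This is an illustration of the technique only and not Heron's implementation; the function name `pin_current_process` is hypothetical, and `os.sched_setaffinity` is available only on Linux.

```python
import os


def pin_current_process(core: int) -> set:
    """Pin the calling process to one CPU core and return the new affinity.

    On platforms without os.sched_setaffinity (e.g. Windows, macOS) the
    affinity is left alone and an empty set is returned.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)
```

      On Linux, child processes inherit the affinity mask of their parent, so a worker pinned in this way also keeps any secondary processes it starts on the same core unless they change the mask themselves.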

      We have also increased the scope of the third example we provide in the paper and there we describe in detail how a full-scale experiment with 15 Nodes (which is the upper limit of number of Nodes usually required in most experiments) impacts CPU load. 

      Finally, we have added on Heron’s roadmap projects extra tasks focusing only on optimisation (profiling and using Numba for the time critical parts of the Heron code).

      (3) I was also surprised to see that, despite being meant specifically to run on and connect diverse types of computer operating systems and being written purely in Python, the Heron Editor and GUI must be run on Windows. This seems like an unfortunate and unnecessary restriction, and it would be great to see the codebase adjusted to make it fully cross-platform compatible.

      This point was also mentioned by reviewer 2. This was a mistake on our part and has now been corrected in the paper. Heron (GUI and underlying communication functionality) can run on any machine on which the underlying Python libraries run, which is Windows, Linux (both for x86 and Arm architectures) and MacOS. We have tested it on Windows (10 and 11, both x64), Linux PC (Ubuntu 20.04.6, x64) and Raspberry Pi 4 (Debian GNU/Linux 12 (bookworm), aarch64). The Windows and Linux versions of Heron have undergone extensive debugging and all of the available Nodes (that are not OS specific) run on those two systems. We are in the process of debugging the Nodes’ functionality for RasPi. The MacOS version, although functional, requires further work to make sure all of the basic Nodes are functional (which is not the case at the moment). We have also updated our manuscript (Multiple machines, operating systems and environments) to include the above information.

      (4) Lastly, when I was running test experiments, sometimes one of the nodes, or part of the Heron editor itself would throw an exception or otherwise crash. Sometimes this left the Heron editor in a zombie state where some aspects of the GUI were responsive and others were not. It would be good to see a more graceful full shutdown of the program when part of it crashes or throws an exception, especially as this is likely to be common as people learn to use it. More problematically, in some of these cases, after closing or force quitting Heron, the TCP ports were not properly relinquished, and thus restarting Heron would run into an "address in use" error. Finding and killing the processes that were still using the ports is not something that is obvious, especially to a beginner, and it would be great to see Heron deal with this better. Ideally, code would be introduced to carefully avoid leaving ports occupied during a hard shutdown, and furthermore, when the address in use error comes up, it would be great to give the user some idea of what to do about it.

      A lot of effort has been put into Heron to achieve graceful shut down of processes, especially when these run on different machines that do not know when the GUI process has closed. The code that is being suggested to avoid leaving ports open has been implemented and this works properly when processes do not crash (Heron is terminated by the user) and almost always when there is a bug in a process that forces it to crash. In the version of Heron available during the reviewing process there were bugs that caused the above behaviour (Node code hanging and leaving zombie processes) on MacOS systems. These have now been fixed. There are, though, rare instances, especially during Node development, in which crashing processes will hang and need to be terminated manually. We have taken on board the reviewer’s comments that users should be made more aware of these issues and have also described this situation in the Debugging part of Heron’s documentation. There we explain the logging and other tools Heron provides to help users debug their own Nodes and how to deal with hanging processes.
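
      The reviewer's request for a more helpful "address in use" message can be illustrated with a small sketch. Heron's communication is built on ZeroMQ, so the plain TCP socket used here is a stand-in for the general pattern rather than Heron's code, and the function name `open_listener` and the wording of the hint are hypothetical.

```python
import errno
import socket


def open_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening TCP socket, turning 'address in use' into a hint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding a port that is still in TIME_WAIT after a shutdown.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError as err:
        sock.close()
        if err.errno == errno.EADDRINUSE:
            raise RuntimeError(
                f"Port {port} is still held by another process, possibly "
                "one left over from a crash. Find and terminate that "
                "process, then try again."
            ) from err
        raise
    sock.listen()
    return sock
```

      Closing the socket in a `finally` block or a context manager on every exit path is what releases the port on a normal shutdown; the hint only helps in the remaining cases where a hung process still holds it.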

      Heron is still in alpha (usable but with bugs) and the best way to debug it and iron out all the bugs in all use cases is through usage from multiple users and error reporting (we would be grateful if the errors the reviewer mentions could be reported in Heron’s github Issues page). We are always addressing and closing any reported errors, since this is the only way for Heron to transition from alpha to beta and eventually to production code quality.

      Overall I think that, with these improvements, this could be the beginning of a powerful and versatile new system that would enable flexible experiment design with a relatively low technical barrier to entry. I could see this system being useful to many different labs and fields. 

      We thank the reviewer for the positive and supportive words and for the constructive feedback. We believe we have now addressed all the raised concerns.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide an open-source graphic user interface (GUI) called Heron, implemented in Python, that is designed to help experimentalists to

      (1) design experimental pipelines and implement them in a way that is closely aligned with their mental schemata of the experiments,

      (2) execute and control the experimental pipelines with numerous interconnected hardware and software on a network.

      The former is achieved by representing an experimental pipeline using a Knowledge Graph and visually representing this graph in the GUI. The latter is accomplished by using an actor model to govern the interaction among interconnected nodes through messaging, implemented using ZeroMQ. The nodes themselves execute user-supplied code in, but not limited to, Python.

      Using three showcases of behavioral experiments on rats, the authors highlighted three benefits of their software design:

      (1) the knowledge graph serves as a self-documentation of the logic of the experiment, enhancing the readability and reproducibility of the experiment,

      (2) the experiment can be executed in a distributed fashion across multiple machines that each has a different operating system or computing environment, such that the experiment can take advantage of hardware that sometimes can only work on a specific computer/OS, a commonly seen issue nowadays,

      (3) the users supply their own Python code for node execution that is supposed to be more friendly to those who do not have a strong programming background.

      Strengths:

      (1) The software is light-weight and open-source, provides a clean and easy-to-use GUI,

      (2) The software answers the need of experimentalists, particularly in the field of behavioral science, to deal with the diversity of hardware that becomes restricted to run on dedicated systems.

      (3) The software has a solid design that seems to be functionally reliable and useful under many conditions, demonstrated by a number of sophisticated experimental setups.

      (4) The software is well documented. The authors pay special attention to documenting the usage of the software and setting up experiments using this software.

      Weaknesses:

      (1) While the software implementation is solid and has proven effective in designing the experiment showcased in the paper, the novelty of the design is not made clear in the manuscript. Conceptually, both the use of graphs and visual experimental flow design have been key features in many widely used software packages as suggested in the background section of the manuscript. In particular, contrary to the authors’ claim that only pre-defined elements can be used in Simulink or LabView, Simulink introduced MATLAB Function Block back in 2011, and Python code can be used in LabView since 2018. Such customization of nodes is akin to what the authors presented.

      In the Heron manuscript we have provided an extensive literature review of existing systems from which Heron has borrowed ideas. We never wished to say that graphs and visual code is what sets Heron apart since these are technologies predating Heron by many years and implemented by a large number of software. We do not believe also that we have mentioned that LabView or Simulink can utilise only predefined nodes. What we have said is that in such systems (like LabView, Simulink and Bonsai) the focus of the architecture is on prespecified low level elements while the ability for users to author their own is there but only as an afterthought. The difference with Heron is that in the latter the focus is on the users developing their own elements. One could think of LabView style software as node-based languages (with low level visual elements like loops and variables) that also allow extra scripting while Heron is a graphical wrapper around python where nodes are graphical representations of whole processes. To our knowledge there is no other software that allows the very fast generation of graphical elements representing whole processes whose communication can also be defined graphically. Apart from this distinction, Heron also allows a graphical approach to writing code for processes that span different machines which again to our knowledge is a novelty of our approach and one of its strongest points towards ease of experimental pipeline creation (without sacrificing expressivity). 

      (2) The authors claim that the knowledge graph can be considered as a self-documentation of an experiment. I found it to be true to some extent. Conceptually it’s a welcoming feature and the fact that the same visualization of the knowledge graph can be used to run and control experiments is highly desirable (but see point 1 about novelty). However, I found it largely inadequate for a person to understand an experiment from the knowledge graph as visualized in the GUI alone. While the information flow is clear, and it seems easier to navigate a codebase for an experiment using this method, the design of the GUI does not make it a one-stop place to understand the experiment. Take the Knowledge Graph in Supplementary Figure 2B as an example, it is associated with the first showcase in the result section highlighting this self-documentation capability. I can see what the basic flow is through the disjoint graph where 1) one needs to press a key to start a trial, and 2) camera frames are saved into an avi file presumably using FFMPEG. Unfortunately, it is not clear what the parameters are and what each block is trying to accomplish without the explanation from the authors in the main text. Neither is it clear about what the experiment protocol is without the help of Supplementary Figure 2A.

      In my opinion, text/figures are still key to documenting an experiment, including its goals and protocols, but the authors could take advantage of the fact that they are designing a GUI where this information, with properly designed API, could be easily displayed, perhaps through user interaction. For example, in Local Network -> Edit IPs/ports in the GUI configuration, there is a good tooltip displaying additional information for the "password" entry. The GUI for the knowledge graph nodes can very well utilize these tooltips to show additional information about the meaning of the parameters, what a node does, etc, if the API also enforces users to provide this information in the form of, e.g., Python docstrings in their node template. Similarly, this can be applied to edges to make it clear what messages/data are communicated between the nodes. This could greatly enhance the representation of the experiment from the Knowledge graph.

      In the first showcase example in the paper “Probabilistic reversal learning. Implementation as self-documentation” we go through the steps that one would follow in order to understand the functionality of an experiment through Heron’s Knowledge Graph. The Graph is not just the visual representation of the Nodes in the GUI but also their corresponding code bases. We mention that the way Heron’s API limits the way a Node’s code is constructed (through an Actor based paradigm) allows for experimenters to easily go to the code base of a specific Node and understand its 2 functions (initialisation and worker) without getting bogged down in the code base of the whole Graph (since these two functions never call code from any other Nodes). Newer versions of Heron facilitate this easy access to the appropriate code by also allowing users to attach to Heron their favourite IDE and open in it any Node’s two scripts (worker and com) when they double click on the Node in Heron’s GUI. On top of this, Heron now (in the versions developed as answers to the reviewers’ comments) allows Node creators to add extensive comments on a Node but also separate comments on the Node’s parameters and input and output ports. Those can be seen as tooltips when one hovers over the Node (a feature that can be turned off or on by the Info button on every Node).

      As Heron stands at the moment we have not made the claim that the Heron GUI is the full picture in the self-documentation of a Graph. We take note, though, of the reviewer’s desire to have the GUI be the only tool a user would need to use to understand an experimental implementation. The solution to this is the same as the one described by the reviewer of using the GUI to show the user the parts of the code relevant to a specific Node without the user having to go to a separate IDE or code editor. The reason this has not been implemented yet is the lack of a text editor widget in the underlying GUI library (DearPyGUI). This is in their roadmap for their next large release and when this exists we will use it to implement exactly the idea the reviewer is suggesting, but also with the capability to not only read comments and code but also directly edit a Node’s code (see Heron’s roadmap). Heron’s API at the moment is ideal for providing such a text editor straight from the GUI.

      (3) The design of Heron was primarily with behavioral experiments in mind, in which highly accurate timing is not a strong requirement. Experiments in some other areas that this software is also hoping to expand to, for example, electrophysiology, may need very strong synchronization between apparatus, for example, the record timing and stimulus delivery should be synced. The communication mechanism implemented in Heron is asynchronous, as I understand it, and the code for each node is executed once upon receiving an event at one or more of its inputs. The paper, however, does not include a discussion, or example, about how Heron could be used to address issues that could arise in this type of communication. There is also a lack of information about, for example, how nodes handle inputs when their ability to execute their work function cannot keep up with the frequency of input events. Does the publication/subscription handle the queue intrinsically? Will it create problems in real-time experiments that make multiple nodes run out of sync? The reader could benefit from a discussion about this if they already exist, and if not, the software could benefit from implementing additional mechanisms such that it can meet the requirements from more types of experiments.

      In order to address the above lack of explanation (that also the first reviewer pointed out) we expanded the third experimental example in the paper with three more sections. One focuses solely on explaining how in this example (which acquires and saves large amounts of data from separate Nodes running on different machines) one would be able to time align the different data packets generated in different Nodes to each other. The techniques described there are directly implementable on experiments where the requirements of synching are more stringent than the behavioural experiment we showcase (like in ephys experiments). 
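
      The response does not reproduce the alignment techniques themselves. As a generic illustration of post hoc time alignment, the sketch below matches each timestamp in one stream to the nearest timestamp in another. It assumes that every packet carries a timestamp and that both streams have already been mapped onto a common clock; the function name `align_nearest` is hypothetical and this is not necessarily the method used in the manuscript.

```python
from bisect import bisect_left


def align_nearest(reference: list, other: list) -> list:
    """For each reference timestamp, return the index of the nearest
    timestamp in `other`.

    Both lists must be sorted in ascending order and `other` must be
    non-empty. Ties are resolved in favour of the earlier timestamp.
    """
    indices = []
    for t in reference:
        pos = bisect_left(other, t)  # first index with other[pos] >= t
        if pos == 0:
            indices.append(0)
        elif pos == len(other):
            indices.append(len(other) - 1)
        else:
            before, after = other[pos - 1], other[pos]
            indices.append(pos if after - t < t - before else pos - 1)
    return indices
```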

      Regarding what happens to packages when the worker function of a Node is too slow to handle its traffic, this is mentioned in the paper (Code architecture paragraph): “Heron is designed to have no message buffering, thus automatically dropping any messages that come into a Node’s inputs while the Node’s worker function is still running.” This is also explained in more detail in Heron’s documentation. The reasoning for a no buffer system (as described in the documentation) is that for the use cases Heron is designed to handle we believe there is no situation where a Node would receive large amounts of data in bursts while very little data during the rest of the time (in which case a buffer would make sense). Nodes in most experiments will either be data intensive but with a constant or near constant data receiving speed (e.g. input from a camera or ephys system) or will have variable data load reception but always with small data loads (e.g. buttons). The second case is not an issue and the first case cannot be dealt with by a buffer but only with the appropriate code design, since buffering data coming in a Node too slow for its input will just postpone the inevitable crash. Heron’s architecture principle in this case is to allow these ‘mistakes’ (i.e. package dropping) to happen so that the pipeline continues to run and transfer the responsibility of making Nodes fast enough to the author of each Node. At the same time Heron provides tools (see the Debugging section of the documentation and the time alignment paragraph of the “Rats playing computer games” example in the manuscript) that make it easy to detect package drops and either correct them or allow them but also allow time alignment between incoming and outgoing packets. In the very rare case where a buffer is required Heron’s do-it-yourself logic makes it easy for a Node developer to implement their own Node specific buffer.
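
      The no-buffering behaviour quoted above can be made concrete with a small simulation. This is a sketch of the stated policy only and not Heron's code: it assumes messages carry a sequence number so that drops can be detected afterwards, and the function names `run_unbuffered` and `find_gaps` are hypothetical.

```python
def run_unbuffered(arrivals, work_duration):
    """Simulate a node that has no input buffer.

    `arrivals` is a time-sorted list of (time, sequence_number) messages.
    A message is processed only if the worker is idle when it arrives;
    otherwise it is dropped. Returns (processed, dropped) sequence numbers.
    """
    busy_until = float("-inf")
    processed, dropped = [], []
    for time, seq in arrivals:
        if time >= busy_until:
            processed.append(seq)
            busy_until = time + work_duration
        else:
            dropped.append(seq)  # arrived while the worker was still running
    return processed, dropped


def find_gaps(received):
    """Return the sequence numbers missing between consecutive entries of
    an ascending list of received sequence numbers."""
    missing = []
    for previous, current in zip(received, received[1:]):
        missing.extend(range(previous + 1, current))
    return missing
```

      With messages arriving every 10 ms and a worker that needs 25 ms per message, only every third message is processed. A buffer would not change that ratio; it would only grow without bound, which is the argument made in the response.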

      (4) The authors mentioned in "Heron GUI’s multiple uses" that the GUI can be used as an experimental control panel where the user can update the parameters of the different Nodes on the fly. This is a very useful feature, but it was not demonstrated in the three showcases. A demonstration could greatly help to support this claim.

      As the reviewer mentions, we have found the double role of Heron’s GUI as an on-line experimental controller to be a very useful capability during our experiments. We have expanded the last experimental example to showcase this, showing how in the “Rats playing computer games” experiment we used the parameters of two Nodes to change the arena’s behaviour while the experiment was running, depending on how the subject was behaving at the time (thus exploring a much larger set of parameter combinations more quickly during the exploratory periods in which we constructed our shaping protocols).

      (5) The API for node scripts can benefit from having a better structure as well as having additional utilities to help users navigate the requirements, and provide more guidance to users in creating new nodes. A more standard practice in the field is to create three abstract Python classes, Source, Sink, and Transform that dictate the requirements for initialisation, work_function, and on_end_of_life, and provide additional utility methods to help users connect between their code and the communication mechanism. They can be properly docstringed, along with templates. In this way, the com and worker scripts can be merged into a single unified API. A simple example that can cause confusion in the worker script is the "worker_object", which is passed into the initialise function. It is unclear what this object this variable should be, and what attributes are available without looking into the source code. As the software is also targeting those who are less experienced in programming, setting up more guidance in the API can be really helpful. In addition, the self-documentation aspect of the GUI can also benefit from a better structured API as discussed in point 2 above.

      The reviewer is right that using abstract classes to expose to users the required API would be a more standard practice. The reason we did not choose to do this was to keep Heron easily accessible to entry level Python programmers who do not have familiarity yet with object oriented programming ideas. So instead of providing abstract classes we expose only the implementation of three functions which are part of the worker classes but the classes themselves are not seen by the users of the API. The point about the users’ accessibility to more information regarding a few objects used in the API (the worker object for example) has been taken on board and we have now addressed this by type hinting all these objects both in the templates and more importantly in the automatically generated code that Heron now creates when a user chooses to create a Node graphically (a feature of Heron not present in the version available in the initial submission of this manuscript).  
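
      The trade-off discussed in this exchange can be made concrete with a short comparison. The sketch below is illustrative only: the class names `Transform` and `GainTransform`, the function signatures, and the parameter layout are assumptions and do not reproduce Heron's real API. Style 1 is the abstract-class approach the reviewer proposes; Style 2 is the approach the response describes, three plain functions with type hints and module-level state.

```python
from abc import ABC, abstractmethod
from typing import Any, List


# Style 1: class-based API, as the reviewer proposes (hypothetical names).
class Transform(ABC):
    @abstractmethod
    def initialise(self, parameters: List[Any]) -> bool:
        """Read the Node's parameters; return True once ready."""

    @abstractmethod
    def work_function(self, data: Any) -> List[Any]:
        """Turn one input message into a list of output messages."""

    def on_end_of_life(self) -> None:
        """Optional clean-up hook; does nothing by default."""


class GainTransform(Transform):
    def initialise(self, parameters: List[Any]) -> bool:
        self.gain = float(parameters[0])
        return True

    def work_function(self, data: Any) -> List[Any]:
        return [data * self.gain]


# Style 2: three plain, type-hinted functions sharing module-level state.
gain: float = 1.0


def initialise(parameters: List[Any]) -> bool:
    global gain
    gain = float(parameters[0])
    return True


def work_function(data: Any) -> List[Any]:
    return [data * gain]


def on_end_of_life() -> None:
    pass


# Both styles produce the same result for the same parameters and input.
node = GainTransform()
node.initialise([2.0])
initialise([2.0])
class_result = node.work_function(3)
function_result = work_function(3)
print(class_result, function_result)
```

      The class version lets the interpreter refuse an incomplete Node (instantiating `Transform` directly raises `TypeError`), while the function version asks nothing of the user beyond writing functions, which is the accessibility argument made in the response.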

      (6) The authors should provide more pre-defined elements. Even though the ability for users to run arbitrary code is the main feature, the initial adoption of a codebase by a community, in which many members are not so experienced with programming, is the ability for them to use off-the-shelf components as much as possible. I believe the software could benefit from a suite of commonly used Nodes.

      There are currently 12 Node repositories in the Heron-repositories project on GitHub with more than 30 Nodes, 20 of which are general use (not implementing a specific experiment’s logic). This list will continue to grow, but we fully appreciate the truth of the reviewer’s comment that adoption will depend on the existence of a large number of commonly used Nodes (for example NumPy and OpenCV Nodes) and are working towards this goal.

      (7) It is not clear to me if there is any capability or utilities for testing individual nodes without invoking a full system execution. This would be critical when designing new experiments and testing out each component.

      There is no capability to run the code of an individual Node outside Heron’s GUI. A user could potentially design and test parts of the code before they are added into a Node, but we have found this to be a highly inefficient way of developing new Nodes. In our hands, the best approach to Node development was to quickly generate test inputs and/or outputs using the “User Defined Function 1I 1O” Node, where one can quickly write a function and make it accessible from a Node. Those test outputs can then be pushed into the Node under development, or its outputs can be pushed into the test function, to allow incremental development without having to connect it to the Nodes it would be connected to in an actual pipeline. For example, one can easily create a small function that, when the user presses a key, generates the same output (if run from a “User Defined Function 1I 1O” Node) as an Arduino Node reading some buttons. This output can then be passed into an experiment logic Node under development that needs to do something with this input. In this way, during Node development Heron allows the generation of simulated hardware inputs and outputs without running the actual hardware. We have also added this way of developing Nodes to our manuscript (Creating a new Node).
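
      The development pattern described here, replacing a hardware Node with a small function that produces the same output, can be sketched as follows. Everything in the example is assumed for illustration: the key-to-button mapping, the two-element list standing in for the Arduino Node's output, and the `choice_logic` function standing in for the experiment logic under development. It does not use the “User Defined Function 1I 1O” Node's real interface.

```python
from typing import Dict, List, Optional

# Assumption: the real hardware Node emits one 0/1 state per button (left, right).
KEY_TO_BUTTONS: Dict[str, List[int]] = {
    "a": [1, 0],   # left button pressed
    "d": [0, 1],   # right button pressed
    "s": [1, 1],   # both buttons pressed
}


def simulated_button_node(key: Optional[str]) -> List[int]:
    """Stand-in for the hardware Node: map a key press to button states."""
    return list(KEY_TO_BUTTONS.get(key, [0, 0]))


def choice_logic(button_states: List[int]) -> str:
    """Experiment logic under development, fed by real or simulated buttons."""
    left, right = button_states
    if left and right:
        return "invalid"
    if left:
        return "left"
    if right:
        return "right"
    return "none"


# Drive the logic with simulated input instead of the physical buttons.
outcomes = [choice_logic(simulated_button_node(key)) for key in ["a", "d", "s", None]]
print(outcomes)
```

      Because the simulated function returns data in the same shape as the hardware would, the logic can later be connected to the real Node without modification.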

      Reviewer #3 (Public Review):

      Summary:

      The authors present a Python tool, Heron, that provides a framework for defining and running experiments in a lab setting (e.g. in behavioural neuroscience). It consists of a graphical editor for defining the pipeline (interconnected nodes with parameters that can pass data between them), an API for defining the nodes of these pipelines, and a framework based on ZeroMQ, responsible for the overall control and data exchange between nodes. Since nodes run independently and only communicate via network messages, an experiment can make use of nodes running on several machines and in separate environments, including on different operating systems.

      Strengths:

      As the authors correctly identify, lab experiments often require a hodgepodge of separate hardware and software tools working together. A single, unified interface for defining these connections and running/supervising the experiment, together with flexibility in defining the individual subtasks (nodes) is therefore a very welcome approach. The GUI editor seems fairly intuitive, and Python as an accessible programming environment is a very sensible choice. By basing the communication on the widely used ZeroMQ framework, they have a solid base for the required non-trivial coordination and communication. Potential users reading the paper will have a good idea of how to use the software and whether it would be helpful for their own work. The presented experiments convincingly demonstrate the usefulness of the tool for realistic scientific applications.

      Weaknesses:

      (1) In my opinion, the authors somewhat oversell the reproducibility and "self-documentation" aspect of their solution. While it is certainly true that the graph representation gives a useful high-level overview of an experiment, it can also suffer from the same shortcomings as a "pure code" description of a model - if a user gives their nodes and parameters generic/unhelpful names, reading the graph will not help much.

      This is a problem that, to our understanding, no software solution can possibly address. Yet we argue that having a visual representation of how different inputs and outputs connect to each other is a substantial benefit over “pure code”, especially when the developer of the experiment has used poorly chosen variable names.

      (2) Making the link between the nodes and the actual code is also not straightforward, since the code for the nodes is spread out over several directories (or potentially even machines), and not directly accessible from within the GUI. 

      This is not accurate. The obligatory code of a Node always exists within a single folder and Heron’s API makes it rather cumbersome to spread scripts relating to a Node across separate folders. The Node folder structure can potentially be copied over different machines but this is why Heron is tightly integrated with git practices (and even politely asks the user with popup windows to create git repositories of any Nodes they create whilst using Heron’s automatic Node generator system). Heron’s documentation is also very clear on the folder structure of a Node which keeps the required code always in the same place across machines and more importantly across experiments and labs. Regarding the direct accessibility of the code from the GUI, we took on board the reviewers’ comments and have taken the first step towards correcting this. Now one can attach to Heron their favourite IDE and then they can double click on any Node to open its two main scripts (com and worker) in that IDE embedded in whatever code project they choose (also set in Heron’s settings windows). On top of this, Heron now allows the addition of notes both for a Node and for all its parameters, inputs and outputs which can be viewed by hovering the mouse over them on the Nodes’ GUIs. The final step towards GUI-code integration will be to have a Heron GUI code editor but this is something that has to wait for further development from Heron’s underlying GUI library DearPyGUI.

      (3) The authors state that "[Heron’s approach] confers obvious benefits to the exchange and reproducibility of experiments", but the paper does not discuss how one would actually exchange an experiment and its parameters, given that the graph (and its json representation) contains user-specific absolute filenames, machine IP addresses, etc, and the parameter values that were used are stored in general data frames, potentially separate from the results. Neither does it address how a user could keep track of which versions of files were used (including Heron itself).

      Heron’s Graphs, like any experimental implementation, must contain machine specific strings. These are accessible either from Heron’s GUI when a Graph json file is opened or from the json file itself. Heron in this regard does not do anything different to any other software, other than saving the graphs into human readable json files that users can easily manipulate directly.

      Heron provides a method for users to save every change of the Node parameters that might happen during an experiment so that it can be fully reproduced. The dataframes are written to the folders specified by the user in each of the Nodes (and all those paths are saved in the json file of the Graph). Heron offers the user a certain degree of freedom (this versatility is Heron’s main reason to exist) to generate data files wherever they want, but it makes sure every file path gets recorded for subsequent reproduction. So, Heron behaves pretty much exactly like any other open source software. What we wanted to focus on as the benefit of Heron for exchange and reproducibility was the ability of experimenters to take a Graph from another lab (with its machine-specific file paths and IP addresses) and, by examining its graphical interface, quickly tweak it to make it run on their own systems. This is achievable because a Heron experiment will be constructed from a small number of Nodes (usually 5 to 15) whose file paths can be trivially changed in the GUI or directly in the json file, while the LAN setup of the machines used can be easily reconstructed from the information saved in the secondary GUIs.
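
      Editing the machine-specific strings of a shared Graph directly in its json file can also be scripted. The sketch below is a minimal illustration under loud assumptions: the json layout (`nodes`, `save_path`, `ip`) is invented for the example and is not Heron's actual Graph schema, and `retarget` is a hypothetical helper. Because it walks any json structure and rewrites string prefixes, it does not depend on knowing the real key names.

```python
import json
from typing import Any, Dict


def retarget(value: Any, replacements: Dict[str, str]) -> Any:
    """Recursively replace machine-specific string prefixes in json data."""
    if isinstance(value, dict):
        return {key: retarget(item, replacements) for key, item in value.items()}
    if isinstance(value, list):
        return [retarget(item, replacements) for item in value]
    if isinstance(value, str):
        for old_prefix, new_prefix in replacements.items():
            if value.startswith(old_prefix):
                return new_prefix + value[len(old_prefix):]
    return value


# Hypothetical Graph file from another lab (not Heron's real schema).
graph_text = """
{
  "nodes": [
    {"name": "Camera", "save_path": "C:/Users/alice/data/video", "ip": "192.168.1.10"},
    {"name": "Logic", "save_path": "C:/Users/alice/data/trials", "ip": "192.168.1.11"}
  ]
}
"""

replacements = {
    "C:/Users/alice/data": "/home/bob/experiments",  # data root on the new machine
    "192.168.1.": "10.0.0.",                         # subnet of the new LAN
}

ported = retarget(json.loads(graph_text), replacements)
print(json.dumps(ported, indent=2))
```

      Node names and any other strings that do not start with a listed prefix are left untouched, so the same script can be rerun safely on an already ported file.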

      Where Heron needs to improve (and this is a major point in Heron’s roadmap) is in better integrating the different saved experiments with the git versions of Heron and of the Nodes that were used for that specific save. We appreciate that this is very important for full reproducibility of the experiment, and it is a feature we will soon implement. More specifically, users will save together with a graph the versions of all the repositories used, and during load the code base utilised will come from the recorded versions and not from the current head of the different repositories. We are currently working on this feature and, as our roadmap suggests, it will be implemented by the release of Heron 1.0.
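
      Until such a feature exists, the idea of saving repository versions alongside a graph can be approximated by hand. The following is a rough sketch of one possible approach, not the planned Heron implementation: `current_commit` and `build_manifest` are hypothetical helpers, the repository paths are placeholders, and the commit hashes in the usage example are fake values used so that the example runs without any git repository present. The only external command relied on is `git rev-parse HEAD`.

```python
import json
import subprocess
from typing import Callable, Dict


def current_commit(repo_path: str) -> str:
    """Return the commit hash currently checked out in repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_manifest(
    graph_file: str,
    repo_paths: Dict[str, str],
    get_commit: Callable[[str], str] = current_commit,
) -> dict:
    """Pair a saved graph with the commit of every repository it relies on."""
    versions = {name: get_commit(path) for name, path in repo_paths.items()}
    return {"graph": graph_file, "versions": versions}


# Usage with placeholder paths and fake hashes (no git call is made here).
fake_commits = {"/repos/Heron": "a1b2c3", "/repos/SomeNode": "d4e5f6"}
manifest = build_manifest(
    "experiment.json",
    {"Heron": "/repos/Heron", "SomeNode": "/repos/SomeNode"},
    get_commit=fake_commits.__getitem__,
)
manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```

      Calling `build_manifest` without the `get_commit` argument queries git for each path; storing the resulting text next to the Graph's json file would record which code produced a given dataset.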

      (4) Another limitation that in my opinion is not sufficiently addressed is the communication between the nodes, and the effect of passing all communications via the host machine and SSH. What does this mean for the resulting throughput and latency - in particular in comparison to software such as Bonsai or Autopilot? The paper also states that "Heron is designed to have no message buffering, thus automatically dropping any messages that come into a Node’s inputs while the Node’s worker function is still running."- it seems to be up to the user to debug and handle this manually?

      There are a few points raised here that require addressing. The first is Heron’s requirement to pass all communication through the main (GUI) machine. We understand (and also state in the manuscript) that this is a limitation that needs to be addressed. We plan to do this by adding to Heron the feature of running headless (see our roadmap). This will allow us to run whole Heron pipelines on a second machine, which will communicate with the main pipeline (run on the GUI machine) through special Nodes. That will allow experimenters to define whole pipelines on secondary machines where the data passed between their Nodes stay on the machine running the pipeline. This is an important feature for Heron and it will be one of the first features to be implemented next (after the integration of the saving system with git).

      The second point regards Heron’s throughput and latency. In our original manuscript we did not have any description of Heron’s capabilities in this respect and both other reviewers mentioned this as a limitation. As mentioned above, we have now addressed this by adding a section to our third experimental example that fully describes how much CPU is required to run a full experimental pipeline running on two machines and also utilising non-Python executables (a Unity game). This gives an overview of how heavy pipelines can run on normal computers given adequate optimisation and by utilising Heron’s feature of forcing some Nodes to run their Worker processes on a specific core. At the same time, Heron’s use of the 0MQ protocol makes sure there are no other delays or speed limitations to message passing. So, message passing within the same machine is just an exchange of memory pointers, while messages passing between different machines face the standard speed limitations of the Local Area Network’s ethernet card speeds.
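
      Forcing a process onto a specific core, the optimisation mentioned above, can be illustrated with the standard library. This sketch makes no claim about how Heron implements the feature: `pin_current_process` is a hypothetical helper, and it relies on `os.sched_setaffinity`, which Python only provides on some platforms (notably Linux), so the function returns `None` elsewhere.

```python
import os
from typing import Optional, Set


def pin_current_process(core: int) -> Optional[Set[int]]:
    """Restrict the current process to one core; return the new affinity.

    Returns None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})      # 0 means "the calling process"
    return set(os.sched_getaffinity(0))


if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)   # cores the process may currently use
    chosen = min(original)               # pick a core that is known to be allowed
    pinned = pin_current_process(chosen)
    os.sched_setaffinity(0, original)    # restore the original affinity
else:
    chosen, pinned = 0, None

print("requested core:", chosen, "resulting affinity:", pinned)
```

      Pinning a heavy worker to its own core keeps the scheduler from migrating it and from interleaving it with other heavy processes, which is the kind of tuning a multi-process pipeline on an ordinary computer can benefit from.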

      Finally, regarding the message dropping feature of Heron, as mentioned above this is an architectural decision given the use cases of message passing we expect Heron to come in contact with. For a full explanation of the logic here please see our answer to the 3rd comment by Reviewer 2.

      (5) As a final comment, I have to admit that I was a bit confused by the use of the term "Knowledge Graph" in the title and elsewhere. In my opinion, the Heron software describes "pipelines" or "data workflows", not knowledge graphs - I’d understand a knowledge graph to be about entities and their relationships. As the authors state, it is usually meant to make it possible to "test propositions against the knowledge and also create novel propositions" - how would this apply here?

      We have described Heron as a Knowledge Graph instead of a pipeline, data workflow or computation graph in order to emphasise Heron’s distinct operation in contrast to what one would consider a standard pipeline or data workflow generated by other visual software (like LabView and Bonsai). The difference lies in what a user should think of as the base element of a graph, i.e. the Node. In all other visual programming paradigms, the Node is defined as a low-level computation, usually a language keyword, a flow control construct or some simple function. The logic in this case is generated by composing together the visual elements (Nodes). In Heron the Node is to be thought of as a process which can be of arbitrary complexity, and the logic of the graph is composed by the user both within each Node and by the way the Nodes are combined together. This is an important distinction in Heron’s basic operation logic and it is, we argue, the main way Heron allows flexibility in what can be achieved while retaining ease of graph composition (by users defining their own level of complexity and functionality encompassed within each Node). We have found that calling this approach a computation graph (which it is) or a pipeline or data workflow would not accentuate this difference. The term Knowledge Graph was the most appropriate as it captures the essence of variable information complexity (even in terms of length of shortest string required) defined by a Node.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      -  No buffering implies dropped messages when a node is busy. It seems like this could be very problematic for some use cases... 

      This is a design principle of Heron. We have now provided a detailed explanation of the reasoning behind it in our answer to Reviewer 2 (Paragraph 3) as well as in the manuscript. 

      -  How are ssh passwords stored, and is it secure in some way or just in plain text?  

      For now they are plain text in an unencrypted file that is not part of the repo (if one gets Heron from the repo). Eventually, we would like to go to private/public key pairs but this is not a priority due to the local nature of Heron’s use cases (all machines in an experiment are expected to connect in a LAN).  

      Minor notes / copyedits:

      -  Figure 2A: right and left seem to be reversed in the caption. 

      They were. This is now fixed. 

      -  Figure 2B: the text says that proof of life messages are sent to each worker process but in the figure, it looks like they are published by the workers? Also true in the online documentation.  

      The Figure caption was wrong. This is now fixed.

      -  psutil package is not included in the requirements for GitHub

      We have now included psutil in the requirements.

      -  GitHub readme says Python >=3.7 but Heron will not run as written without python >= 3.9 (which is alluded to in the paper)

      The new Heron updates require Python 3.11. We have now updated GitHub and the documentation to reflect this.

      -  The paper mentions that the Heron editor must be run on Windows, but this is not mentioned in the Github readme.  

      This was an error in the manuscript that we have now corrected.

      -  It’s unclear from the readme/manual how to remove a node from the editor once it’s been added.  

      We have now added an X button on each Node to complement the Del button on the keyboard (for MacOS users, who most of the time do not have this button).

      -  The first example experiment is called the Probabilistic Reversal Learning experiment in text, but the uncertainty experiment in the supplemental and on GitHub.  

      We have now used the correct name (Probabilistic Reversal Learning) in both the supplemental material and on GitHub.

      -  Since Python >=3.9 is required, consider using fstrings instead of str.format for clarity in the codebase  

      Thank you for the suggestion. The latest Heron development has been using f-strings and we will refactor the older code in the near future.
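
      For readers unfamiliar with the two styles being compared, the difference is shown below. The variable names and the message text are made up for the example and are not taken from Heron's codebase.

```python
node_name = "Camera"
port = 5556

# str.format: placeholders are filled positionally, away from the text.
old_style = "Node {} listening on port {}".format(node_name, port)

# f-string: the expressions appear inline where their values end up.
new_style = f"Node {node_name} listening on port {port}"

print(old_style)
print(new_style)
```

      Both lines produce the same string; the f-string version is easier to read because each value sits at the position where it is printed.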

      -  Grasshopper cameras can run on linux as well through the spinnaker SDK, not just Windows.  

      Fixed in the manuscript. 

      -  Figure 4: Square and star indicators are unclear.

      Increased the size of the indicators to make them clear.

      -  End of page 9: "an of the self" presumably a typo for "off the shelf"?  

      Corrected.

      -  Page 10 first paragraph. "second root" should be "second route"

      Corrected.

      -  When running Heron, the terminal constantly spams Blowfish encryption deprecation warnings, making it difficult to see the useful messages.  

      The solution to this problem is to either update paramiko or install Heron through pip. This possible issue is mentioned in the documentation.

      -  Node input /output hitboxes in the GUI are pretty small. If they could be bigger it would make it easier to connect nodes reliably without mis-clicks.

      We have redone the Node GUI, also increasing the size of the In/Out points.

      Reviewer #2 (Recommendations For The Authors):

      (1) There are quite a few typos in the manuscript, for example: "one can accessess the code", "an of the self", etc.  

      Thanks for the comment. We have now screened the manuscript for possible typos.

      (2) Heron’s GUI can only run on Windows! This seems to be the opposite of the key argument about the portability of the experimental setup.  

      As explained in the answers to Reviewer 1, Heron can run on most machines that the underlying python libraries run, i.e. Windows and Linux (both for x86 and Arm architectures). We have tested it on Windows (10 and 11, both x64), Linux PC (Ubuntu 20.04.6, x64) and Raspberry Pi 4 (Debian GNU/Linux 12 (bookworm), aarch64). We have now revised the manuscript and the GitHub repo to reflect this.

      (3) Currently, the output is displayed along the left edge of the node, but the yellow dot connector is on the right. It would make more sense to have the text displayed next to the connectors.  

      We have redesigned the Node GUI and have now placed the Out connectors on the right side of the Node.

      (4) The edges are often occluded by the nodes in the GUI. Sometimes it leads to some confusion, particularly when the number of nodes is large, e.g., Fig 4.

      This is something that is dependent on the capabilities of the DearPyGUI module. At the moment there is no way to control the way the edges are drawn.

      Reviewer #3 (Recommendations For The Authors):

      A few comments on the software and the documentation itself:

      - From a software engineering point of view, the implementation seems to be rather immature. While I get the general appeal of "no installation necessary", I do not think that installing dependencies by hand and cloning a GitHub repository is easier than installing a standard package.

      We have now added a pip install capability which also creates a Heron command line command to start Heron with. 

      -The generous use of global variables to store state (minor point, given that all nodes run in different processes), boilerplate code that each node needs to repeat, and the absence of any kind of automatic testing do not give the impression of a very mature software (case in point: I had to delete a line from editor.py to be able to start it on a non-Windows system).  

      As mentioned, the use of global variables in the worker scripts is fine partly due to the multi-process nature of the development, and we have found it is a friendly approach for Matlab users who are just starting with Python (a serious consideration for Heron). Also, the parts of the code that would require a singleton (the Editor for example) are treated as scripts with global variables while the parts that require the construction of objects are fully embedded in classes (the Node for example). A future refactoring might also make all the parts of the code not seen by the user fully object oriented, but this is a decision with pros and cons that need to be weighed first.

      Absence of testing is an important issue we recognise, but Heron is a GUI app and nontrivial unit tests would require some keystroke/mouse movement emulator (like QTest or pytest-qt for Qt-based GUIs). This will be dealt with in the near future (using more general solutions like PyAutoGUI), but it needs a serious amount of effort (quite a bit more than writing unit tests for non-GUI software) and, more importantly, it is nowhere near as robust as standard unit tests (due to the variable nature of the GUI through development), making automatic test authoring almost as laborious a process as the one it is supposed to automate.

      -  From looking at the examples, I did not quite see why it is necessary to write the ..._com.py scripts as Python files, since they only seem to consist of boilerplate code and variable definitions. Wouldn’t it be more convenient to represent this information in configuration files (e.g. yaml or toml)?  

      The com is not a configuration file; it is a script that launches the communication process of the Node. We could move the variable definitions to a separate toml file (which the com script would then have to read). The pros and cons of such a setup should be considered in a future refactoring.

      Minor comments for the paper:

      -  p.7 (top left): "through its return statement" - the worker loop is an infinite loop that forwards data with a return statement?  

      This is now corrected. The worker loop is an infinite loop and does not return anything, but at each iteration it pushes data to the Node’s output.

      -  p.9 (bottom right): "of the self" → "off-the-shelf"  

      Corrected.

      -  p.10 (bottom left): "second root" → "second route"  

      Corrected.

      -  Supplementary Figure 3: Green star and square seem to be swapped (the green star on top is a camera image and the green star on the bottom is value visualization - inversely for the green square).

      The star and square have been swapped around.

      -  Caption Supplementary Figure 4 (end): "rashes to receive" → "rushes to receive"  

      Corrected.

  4. Jul 2022
    1. Yes, it’s making it easier than ever to write code collaboratively in the browser with zero configuration and setup. That’s amazing! I’m a HUGE believer in this mission.

      Until those things go away.

      A case study: DuckDuckHack used Codio, which "worked" until DDG decided to call it a wrap on accepting outside contributions. DDG stopped paying for Codio, and because of that, there was no longer an easy way to replicate the development environment—the DuckDuckHack repos remained available (still do), but you can't pop over into Codio and play around with it. Furthermore, because Codio had been functioning as a sort of crutch to paper over the shortcomings in the onboarding/startup process for DuckDuckHack, there was never any pressure to make sure that contributors could easily get up and running without access to a Codio-based development environment.

      It's interesting that, no matter how many times cloud-based Web IDEs have been attempted and failed to displace traditional, local development, people keep getting suckered into it, despite the history of observable downsides.

      What's also interesting is the conflation of two things:

      1. software that works by treating the Web browser as a ubiquitous, reliable interpreter (in a way that neither /usr/local/bin/node nor /usr/bin/python3 are reliably ubiquitous)—NB: and running locally, just like Node or Python (or go build or make run or...)—and

      2. the idea that development toolchains aiming for "zero configuration and setup" should defer to and depend upon the continued operation of third-party servers

      That is, even though the Web browser is an attractive target for its consistency (in behavior and availability), most Web IDE advocates aren't actually leveraging its benefits—they still end up targeting (e.g.) /usr/local/bin/node and /usr/local/python3—except the executables in question are expected to run on some server(s) instead of the contributor's own machine. These browser-based IDEs aren't so browser-based after all, since they're just shelling out to some non-browser process (over RPC over HTTP). The "World Wide Wruntime" is relegated to merely interpreting the code for a thin client that handles its half of the transactions to/from said remote processes, which end up handling the bulk of the computing (even if that computing isn't heavyweight and/or the client code on its own is full of bloat, owing to the modern trends in Web design).

      It's sort of crazy how common it is to encounter this "mental slippery slope": "We can lean on the Web browser, since it's available everywhere!" → "That involves offloading it to the cloud (because that's how you 'do' stuff for the browser, right?)".

      So: want to see an actual boom in collaborative development spurred by zero-configuration dev environments? The prescription is straightforward: make all these tools truly run in the browser. The experience we should all be shooting for resembles something like this:

      Step 1: clone the repo
      Step 2: double click README.html
      Step 3: you're off to the races, because project upstream has given you all the tools you need to nurture your desire to contribute

      You can also watch this space for more examples of the need for an alternative take on working to actually manage to achieve the promise of increased collaboration through friction-free (or at least friction-reduced) development:

      • https://hypothes.is/search?q=%22the+repo+is+the+IDE%22
      • https://hypothes.is/search?q=%22builds+and+burdens%22

  5. Jun 2022
  6. Mar 2022
  7. Feb 2022
    1. Also, this would be a good trick to use to realize "README.html":

      If folks were really committed to improving the developer experience, [...] development would work like this:

      1. Download the project source tree
      2. Open README.html
      3. Drag and drop the project source onto README.html

      This is also a ripe place for the toolbench pattern to manifest.

      The README can both appear to take care of the ABCs and also act as the entry point to any other shell stowed away in the project. For example, in atticus.js, the README contains a line that says to run the tests "use tests/harness.app.htm on the project repo". We could kick off the build process, open up Contribute.app.htm, squirt the contents of README.txt.htm over there, and then display that to the user, making that region "live" (so actually getting to the test runner and running the tests is even easier).

  8. Oct 2021
  9. Aug 2021
    1. There's a lot of cruft here. Consider that while a project might have a prominently named file like "README" that is meant to be the first thing a wanderer encounters, the true first encounter is the file listing in the project source tree:

      • build/
      • config/
      • src/
      • .babelrc
      • .dockerignore
      • .editorconfig
      • .gitignore
      • .stylelintrc
      • .travis.yml
      • Dockerfile
      • Gruntfile.js
      • LICENSE
      • Procfile
      • README.md
      • aldine.sublime-project
      • aldine.sublime-workspace
      • circle.yml
      • package.json
      • tsconfig.json
      • tslint.json
      • yarn.lock

      Imagine a commit (or a pull request) with the summary "Remove cruft". Why might it be rejected? Let's get more specific.

      There's a Dockerfile here. There's also a package.json. We can ask of each of these, "Why is this here?" The answer is, "Because someone found them useful." Consider, then, that here's a strong case for a contrib/ directory† for this project and where these things should be kept, ill-conceived tooling conventions notwithstanding.

      † This link points to a particular blog post that explains the purpose of a contrib/ directory, but this is not an endorsement of Mr DeVault's other positions or demeanor. Ignore any stridence, arrogance, or other obnoxiousness that you might encounter in your pursuit to pull at any threads from that corner of the Web.

  10. May 2021