2,413 Matching Annotations
  1. Jul 2025
    1. Run a generative AI chatbot on Jetson Orin Nano Super Developer Kit. This chatbot features Ollama with Open WebUI, a widely used, open-source chatbot server interface that connects to locally running LLMs.

      deploying Omi - could Open WebUI be used to run a local LLM through API calls on the T8 server?

    1. npm is a couple of things. First and foremost, it is an online repository for the publishing of open-source Node.js projects. Second, it is a CLI tool that helps you install those packages and manage their versions and dependencies.


    1. Thankfully, the development of AI technologies, especially Large Language Models (LLMs) [8, 9, 10] with strong reasoning, adequate knowledge reserve and excellent coding capabilities [11], is reshaping the paradigms and precepts of how people leverage bioinformatics data.

      introduction

    1. many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints
    2. we present a comprehensive survey on domain specification techniques for large language models, an emerging direction critical for large language model applications
    3. LLMs, significantly outperforming smaller models in understanding and generating human-like text, have emerged as a promising AI research trend
    4. domain specialization of Large Language Models (LLMs) is defined as the process of customizing general-purpose LLMs according to specific domain contextual data, augmented by domain-specific knowledge, optimized by the domain’s objective, and regulated by domain-specific constraints

      for introduction

    1. sketching, a popular data compression technique, can serve as an efficient adaptation strategy for LLMs while avoiding low-rank assumptions
    1. predefined workflows and rigid models, SpatialAgent employs adaptive reasoning and dynamic tool integration, allowing it to adjust to new datasets, tissue types, and biological questions
    2. Key modules. The action module (left) executes tasks such as retrieving reference datasets, converting gene names, verifying ligand–receptor interactions using existing databases, processing data with established software packages (e.g., numpy) or generating and executing custom code, while reasoning over and aggregating information from multiple sources
    3. Refer to the original/live annotation in Zotero/note

      This tool does something very similar to omi and has a lot of desirable qualities + evaluation methods we can learn from. #omi-relevance

      What it can do

      SpatialAgent employs adaptive reasoning and dynamic tool integration, allowing it to adjust to new datasets, tissue types, and biological questions. It processes multimodal inputs, incorporates external databases, and supports human-in-the-loop interactions, enabling both fully automated and collaborative discovery

      tasks such as gene panel design, cell and tissue annotation, and pattern inference in cell-cell communication and pathway analysis

    1. open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use.
    2. performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency
    3. using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage
    1. anvi’o75 was used to profile and visualize the different Turicibacter strain DNA sequences to locate putative bile salt hydrolase and 7α-HSDH homologs in contig groups, generate variability profiles, and measure gene coverage and detection statistics.
    1. Yet, taxonomic insights offer limited utility to understand functional drivers of biological systems, a pinnacle desire that brings together many corners of microbiology
    1. introduce Lyra, a subquadratic architecture for sequence modeling, grounded in the biological framework of epistasis for understanding sequence-to-function relationships
    1. we propose the use of a pangenome graph, built from assembly graphs produced by assembling short reads of the same sample with different assemblers
    2. highlights similarities between contigs from different assemblies while retaining information on contigs that appear only in one of the input assemblies
    3. Assembly graphs produced by different tools from the same data may differ significantly, posing a challenge to tools for downstream processing tasks

      This could be a useful tool to integrate post-assembly if it improves compatibility with subsequent tools such as plasmid binning in #SOMAteM


      (not relevant, since this paper solves this issue) How can the LLM help solve this by suggesting the correct downstream tool or by converting outputs to be compatible?

    1. present the first assembly-free and mapping-free approach for augmenting an existing pangenome graph using unassembled long reads from an individual not already present in the pangenome.
    1. agentic technology uses tool calling on the backend to obtain up-to-date information, optimize workflows and create subtasks autonomously to achieve complex goals.
    2. ability to store past interactions in memory and plan future actions encourages a personalized experience and comprehensive responses
    1. scaffold information generated by Bambus 2 allows us to integrate multiple sources of information and obtain more accurate annotations of the resulting assembly
    2. provide additional functionality made possible by the integration of different analyses

      Need to understand details of this: What specific integration does MetAMOS really do?

    3. INSTALL script. This will automatically configure the pipeline to run within the user's environment and also fetch all required data

      data => databases?

    1. A multi-centre study evaluating the use of nanopore-16S for clinical microbial detection using shared mock samples (looking for consistency, LODs, etc.?)

      This study does nanopore on 16S. Compares two bioinformatic pipelines and uses Emu

      Todd: Emu holding its own against a commercial tool, fewer species classified (likely DB issue) but better precision wrt discriminating species

      • Only shortcoming is that the Emu pipeline (GMS-16S) classified fewer species

        • Todd says this is likely a database issue.

        • Can be fixed when implementing #SOMAteM?

        • Check methods for details on the Emu pipeline: “Bioinformatic data analysis and identification of pathogen

      Evaluation of two bioinformatic pipelines: 1928-16S and GMS-16S. The performance of the two separate bioinformatic pipelines was compared: the commercial 16S pipeline developed by 1928 Diagnostics (1928-16S) and the gms_16S bioinformatics analysis pipeline that uses the EMU classification tool (GMS-16S). Overall, 1928-16S identified a higher number of species in comparison to GMS-16S (Supplementary FigS2, Supplementary file 2 and 3). However, significant differences were observed at species level, particularly for Streptococcus and Staphylococcus. GMS-16S demonstrated high accuracy of species level classification, effectively discriminating S. intermedius from S. anginosus in sample G4, as well as separating S. aureus from Staphylococcus argenteus in sample Q3 (Fig. 3a). GMS-16S also more accurately classified members of the Enterobacteriaceae family (Q7, Q5), and was able to identify Serratia marcescens at species level with greater precision in sample Q1 compared to 1928-16S. Conversely, 1928-16S classified a larger proportion of reads as C. acnes in sample G6 (laboratory k), whereas GMS-16S distributed the reads between C. acnes and the closely related C. namnetense.

      <annotations in Public group>

    2. commercial 16S bioinformatic pipeline from 1928 Diagnostics (1928-16S) was evaluated and compared with the open-sourced gms_16S pipeline that is based on the EMU classification tool (GMS-16S).

      Emu is more accurate; Todd is happy :)

      • more annotations in Public group
    1. RapidONT, a workflow designed for cost-effective and accessible WGS-based pathogen analysis

      Includes both a lab protocol and bioinformatic pipeline

    1. choice of the right algorithm for a given dataset has become difficult due to numerous comparative reports on these different assemblers [88, 89]

      What does the choice of algorithm depend on?

    2. major advantage of De Bruijn graphs is that assembled reads contain fewer errors and errors can be easily corrected prior to assembly
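      The quoted point about De Bruijn assembly can be made concrete with a toy sketch. This is an invented illustration (the reads, k, and function name are mine); real assemblers additionally compact non-branching paths and correct errors before traversal:

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# overlapping toy reads collapse onto shared k-mers, which is what
# makes per-k-mer error correction possible before assembly
graph = de_bruijn_edges(["ATGGC", "TGGCA", "GGCAT"], k=3)
```

      Because every read is decomposed into the same k-mer vocabulary, a sequencing error shows up as a rare k-mer/edge that can be pruned before traversing the graph.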
  2. amos.sourceforge.net
    1. small, circular nature of the mitochondrial genome allows reads to span the start and end positions, leading to incomplete exclusion of mtDNA
    1. MADRe, a modular and scalable pipeline for long-read strain-level metagenomic classification, enhanced with Metagenome Assembly-Driven Database Reduction.
    2. contig-to-reference mapping reassignment based on an expectation-maximization algorithm for database reduction,

      EM method similar to EMU?
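      For orientation, the read-reassignment idea behind EMU-style expectation-maximization (and possibly MADRe's, per the quote above) can be sketched as below. This is a generic toy, not the actual MADRe or EMU implementation:

```python
def em_reassign(compat, n_iters=100):
    """compat maps read ID -> list of reference IDs it aligns to.
    Returns estimated relative abundances per reference."""
    refs = sorted({r for cands in compat.values() for r in cands})
    abund = {r: 1.0 / len(refs) for r in refs}
    for _ in range(n_iters):
        # E-step: fractionally assign each read in proportion to current abundances
        counts = {r: 0.0 for r in refs}
        for cands in compat.values():
            total = sum(abund[r] for r in cands)
            for r in cands:
                counts[r] += abund[r] / total
        # M-step: re-estimate abundances as normalised expected read counts
        abund = {r: counts[r] / len(compat) for r in refs}
    return abund

# two reads unique to reference A pull the ambiguous read r2 toward A
abund = em_reassign({"r1": ["A"], "r2": ["A", "B"], "r3": ["A"]})
```

      The key behaviour — and why these tools gain precision over pure k-mer counting — is that unambiguous reads anchor the abundance estimates, which then resolve multi-mapping reads.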

    3. mapping-based tools such as MetaMaps [24], PathoScope2 [25], EMU [26] and MORA [27], which rely on read alignments and reassignment algorithms, offer higher precision at a greater computational cost.
    4. range of metagenomic classification tools have been developed, which can be broadly categorized into marker-based, DNA-to-protein and DNA-to-DNA approaches, as described in [4].
    5. K-mer-based tools such as Kraken2 [14], KrakenUniq [15], Bracken [16], Centrifuge [17], CLARK/CLARKS [18, 19], Ganon [20, 21], Taxor [22], and Sylph [23] are known for their speed and scalability to large databases, but often trade precision for speed

      This whole paragraph has good knowledge that could be incorporated into LLM-RAG - could ask the user about their need for speed vs. accuracy

    6. MADRe achieves high precision and strain-level resolution while maintaining lower memory usage and runtime compared to existing tools
    1. assembly tools remain prone to large-scale errors caused by repeats in the genome, leading to inaccurate detection of AMR gene content
    2. the fact that multiple consecutive genes lie within a single read to construct gene-space de Bruijn graphs where the k-mer alphabet is the set of genes in the pan-genome of the species under study
    3. reads corresponding to different copies of AMR genes can be effectively separated based on the genomic context of the AMR genes, and used to infer the nucleotide sequence of each copy
    1. We present Autocycler, a command-line tool for generating accurate bacterial genome assemblies by combining multiple alternative long-read assemblies of the same genome
    2. Autocycler builds a compacted De Bruijn graph from the input assemblies, clusters and filters contigs, trims overlaps and resolves consensus sequences by selecting the most common variant at each locus
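      Autocycler's "most common variant at each locus" step is conceptually a majority vote across the input assemblies. A simplified column-wise illustration (my own toy; Autocycler actually resolves variants on a compacted De Bruijn graph, not on a plain column alignment):

```python
from collections import Counter

def consensus(aligned_seqs):
    """Majority vote at each column of equal-length aligned sequences."""
    return "".join(
        Counter(col).most_common(1)[0][0]
        for col in zip(*aligned_seqs)
    )

# three alternative assemblies of the same toy locus; one disagrees at position 4
assemblies = ["ATGCC", "ATGTC", "ATGCC"]
print(consensus(assemblies))  # → ATGCC
```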
    1. To migrate this code to DSL2, you need to move all of your channel logic throughout the script into a workflow definition

      seqscreen was written in DSL1 and needs to be migrated (Todd)

    1. driving the development of community-centric tools on Seqera.io, empowering scientists worldwide to leverage modern software capabilities on demand
    1. Programmed with a deep understanding of Nextflow, common bioinformatics tools, and the overarching scientific community.

      by "overarching scientific community" do you mean some discussions on nf-core forums?

    2. has deep knowledge of the errors

      What could be the source of this knowledge? - Maybe a human in the loop training with automated code gen + linter use? - Grazing on forums?

      able to identify the root cause of errors, help troubleshoot, and suggest edits

    3. not only give you the initial conversion, but also run the stages of the code that it generates with sample data and iteratively correct any code that yields runtime errors
    4. convert a pipeline from Bash/CWL/WDL to Nextflow

      use cases

      can not only give you the initial conversion, but also run the stages of the code that it generates with sample data and iteratively correct any code that yields runtime errors

    5. Seqera AI – a bioinformatics agent purpose-built for the scientific lifecycle

      Seqera-AI can:

      • Suggest pipelines (tested and validated)

      • Answer bioinformatics questions with context

      • Generate nextflow code + validate/self-correct (when would someone use this?)

      Context retrieved:

      • Can retrieve context for writing and testing nextflow code

      • Context of pipeline results to aid interpretation

      source: Summarized from text below

    1. importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering
    2. increased theoretical and empirical research on data provenance, error propagation, and on understanding the impact of errors on analytic pipelines
    3. we focus specifically on concerns that lie at the interface of biological data and computational inference with the goal of inspiring increased research and educational activities in this space
    1. how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery
    1. When given well-crafted instructions, these chatbots hold the potential to significantly augment bioinformatics education and research
    2. role prompting that assigns a role to the chatbot, few-shot prompting that provides relevant examples, and chatbot self-reflection that improves responses based on task feedbacks
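      The three strategies quoted above can be combined into a single prompt template. A minimal sketch — the role, example Q/A pairs, and feedback text are invented placeholders, not from the paper:

```python
def build_prompt(role, examples, task, feedback=None):
    """Compose role prompting + few-shot examples + optional self-reflection."""
    parts = [f"You are {role}."]
    for question, answer in examples:            # few-shot prompting
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {task}\nA:")
    if feedback:                                  # self-reflection on prior output
        parts.append(f"Feedback on your previous answer: {feedback}\nRevise your answer.")
    return "\n\n".join(parts)

prompt = build_prompt(
    role="an expert bioinformatician",
    examples=[("What does samtools flagstat report?",
               "Summary alignment statistics for a BAM file.")],
    task="How do I filter out secondary alignments?",
)
```

      On a second round, the chatbot's first answer plus task feedback would be passed as `feedback`, implementing the self-reflection loop the authors describe.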
    1. In addition, varying study designs will require project-specific statistical analyses.

      how is this addressed? - helpful for #SOMAteM

    2. use of isolated Conda environments for Hecatomb minimizes package version conflicts, minimizes overhead when rebuilding environments for updated dependencies, and allows maintenance and customization of different Hecatomb versions.
    3. While Hecatomb is a Snakemake pipeline, it uses the Snaketool command line interface to make running the pipeline as simple as possible [95]. Snaketool populates required file paths and configuration files, allowing Hecatomb to be configured and run with a simple command
    1. An opt-in feature for now, strict syntax enables consistent behavior between the Nextflow CLI and language server, and enables numerous new features
    1. This new specification enables more specific error reporting, ensures more consistent code, and will allow the Nextflow language to evolve independently of Groovy.
    2. strict syntax will eventually become the only way to write Nextflow code, and new language features will be implemented only in the strict syntax
    1. omi feature idea: minor CLI tools - not pipelines

      • Thought process: What does this tool need as input: MSA.

      • Can this CLI tool make the MSA as well if the user tells it stuff? That’s too specialized -- would be nice to make an LLM tool like omi for that though

      • I think omi can beat seqera AI and chatGPT in this space where we identify and wrap essential CLI tools to be run by text prompts

      • Leave the nextflow part to seqera AI, if it’s good enough for running pipelines