Based on ~1,000 GEs at 10 ng, they have around 60% losses
- Nov 2025
-
-
-
Capping at 60 ng input was performed for some of the cohort, explaining the peak at this value;
They capped input at 60 ng (20,000 GEs), and the estimated molecule counts per position plateau around 8,000, which indicates ~60% losses during processing.
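The arithmetic behind the note above, as a quick sketch (assuming the usual ~3.3 pg of DNA per haploid genome equivalent; the 8,000-molecule plateau is from the note):

```python
# Assumed conversion: ~3.3 pg of human DNA per haploid genome equivalent.
PG_PER_GE = 3.3

def genome_equivalents(input_ng: float) -> float:
    """Convert DNA input mass (ng) to haploid genome equivalents."""
    return input_ng * 1000 / PG_PER_GE

def fractional_loss(input_ng: float, observed_molecules: float) -> float:
    """Fraction of input genome equivalents not recovered as molecules."""
    return 1 - observed_molecules / genome_equivalents(input_ng)

print(round(genome_equivalents(60)))        # ~18,000 GEs at 60 ng
print(round(fractional_loss(60, 8000), 2))  # ~0.56, i.e. roughly 60% losses
```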
-
-
-
We next investigated how increasing sequencing depth affects gene detection (Data S1). For multi-exon genes, we defined "detection" as having more than 50 total reads with at least two junction-spanning reads. Single-exon genes required more than 100 total reads. These thresholds were chosen based on junction ratios of genes at different read counts (Figure S5) and manual inspection of the raw data through the Integrative Genomics Viewer. Overall, iPSCs yielded the highest number of detected genes among the four CATs (Figure 1A), consistent with previous findings that iPSCs express a wide variety of genes.28 Detection performance in LCLs was modest at lower depths but converged with that of blood and fibroblasts at higher depths, likely due to the larger number of low-expressing genes in LCLs (Figure S6). Across all four CATs, each additional million reads uncovered 10–30 new genes at 100M reads. At 1,000M reads, the detection rate slowed, reaching 1–2 new genes per million reads (Figure 1A), suggesting a saturation effect for gene
Basically, "1B reads is enough to detect most things"
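The detection rule from the quote, sketched as a function (thresholds as stated: >50 total reads with ≥2 junction-spanning reads for multi-exon genes, >100 total reads for single-exon genes):

```python
def gene_detected(total_reads: int, junction_reads: int, multi_exon: bool) -> bool:
    """Detection rule from the quoted paper: multi-exon genes need >50 total
    reads with at least 2 junction-spanning reads; single-exon genes need
    >100 total reads."""
    if multi_exon:
        return total_reads > 50 and junction_reads >= 2
    return total_reads > 100

print(gene_detected(60, 2, multi_exon=True))    # True
print(gene_detected(60, 1, multi_exon=True))    # False (too few junction reads)
print(gene_detected(101, 0, multi_exon=False))  # True
```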
-
Most reference datasets and diagnostic protocols employ relatively modest sequencing depths (∼50–150 million reads), which may fail to detect low-abundance transcripts and rare splicing events critical for accurate diagnosis.
Typical RNA-seq read depths (>200M is considered high)
-
-
-
We conclude that cDNA sequencing with 30–40 million read measurements readily detects major splice isoforms for abundant and moderately abundant transcripts, whereas splice detection for the lowest-abundance RNA classes and isoforms is sporadic.
Core conclusion: 40M is enough unless you want rare transcripts or isoforms
-
- Oct 2025
-
-
The variants detected in the plasma were characterized by comparing the plasma and buffy coat VF and sequencing results. A variant is characterized as being of tumor origin if the lower 95% confidence interval (CI) of the plasma VF is greater than the upper bound of the 95% CI of the buffy coat VF. The variant is characterized as being of CH origin if the buffy coat VF is less than 20% and the 95% CI overlaps with the plasma VF, or if the buffy coat VF is less than 20% and this VF is greater than the plasma VF. It is characterized as germline if the buffy coat VF is 30% or higher and the plasma VF is less than the buffy coat VF. The variant source is considered unknown if the buffy coat VF was between 20% and 30%, to account for potential postzygotic mosaicism resulting in the presence of certain variants only in a subset of cells (13). Variants detected in genes that are known to commonly harbor CH mutations (e.g., ASXL1, JAK2, DNMT3A, and TET2) are characterized as CH if the buffy coat VF is greater than the plasma VF, as outlined above and depicted in Fig. 1.
Same as in Abraham et al. 2025
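The classification rules above, sketched in Python. This is my illustrative reading of the quoted text, not the authors' code; the rule ordering and the fallback for cases no rule covers (e.g. buffy coat VF ≥ 30% but plasma VF ≥ buffy VF) are my assumptions:

```python
def classify_variant(plasma_vf, plasma_ci, buffy_vf, buffy_ci):
    """Classify a plasma variant as tumor / CH / germline / unknown.

    VFs are fractions (0.20 == 20%); CIs are (low, high) 95% intervals.
    Rules follow the quoted description; ordering is assumed.
    """
    plasma_lo, _ = plasma_ci
    buffy_lo, buffy_hi = buffy_ci
    # Tumor: lower plasma CI bound exceeds upper buffy coat CI bound.
    if plasma_lo > buffy_hi:
        return "tumor"
    # CH: buffy VF < 20% and (buffy CI overlaps plasma VF, or buffy VF > plasma VF).
    if buffy_vf < 0.20 and (buffy_lo <= plasma_vf <= buffy_hi or buffy_vf > plasma_vf):
        return "CH"
    # Germline: buffy VF >= 30% and plasma VF below buffy VF.
    if buffy_vf >= 0.30 and plasma_vf < buffy_vf:
        return "germline"
    # Unknown: buffy VF in the 20-30% mosaicism zone (also used as fallback).
    return "unknown"

print(classify_variant(0.05, (0.04, 0.06), 0.005, (0.0, 0.01)))  # tumor
print(classify_variant(0.02, (0.01, 0.03), 0.05, (0.02, 0.08)))  # CH
print(classify_variant(0.45, (0.42, 0.48), 0.50, (0.47, 0.53)))  # germline
print(classify_variant(0.10, (0.08, 0.12), 0.25, (0.22, 0.28)))  # unknown
```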
-
-
-
we interrogated 15 mutation hot spots in blood DNA from 4,219 individuals using ultra-deep sequencing.
Healthy individuals (no blood disorder) only
-
-
-
For a sample, they calculate the number of total genomes using QCTs. Then, they calculate the fetal fraction using polymorphic loci. Based on the fetal fraction and the number of total genomes, they estimate the expected number of fetal antigen molecules (AEM) if the fetus is positive. If the AEM is lower than the threshold, they discard.
Then, they calculate the number of antigen molecules detected (ADM) using the read counts and QCTs. Then, they calculate the detected fetal antigen fraction (CFAF) as the number of detected molecules over the number of expected molecules. Across the 5 RhD loci, they take the second highest CFAF value and they compare it to the detection ranges to make the "antigen detected" call.
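A sketch of the calling logic as I read it. The AEM floor, the detection range, and the AEM formula (total genomes × fetal fraction) are illustrative assumptions, not values from the paper:

```python
# Assumed QC/calling constants (hypothetical, for illustration only).
AEM_FLOOR = 10                         # minimum expected fetal molecules to proceed
DETECTED_RANGE = (0.6, float("inf"))   # assumed CFAF range for an "antigen detected" call

def expected_fetal_molecules(total_genomes: float, fetal_fraction: float) -> float:
    """AEM: expected fetal antigen molecules if the fetus is antigen-positive."""
    return total_genomes * fetal_fraction

def call_antigen(cfafs_per_locus: list) -> str:
    """Take the second-highest CFAF across the RhD loci and compare it to the
    detection range (as described in the note)."""
    second_highest = sorted(cfafs_per_locus, reverse=True)[1]
    lo, hi = DETECTED_RANGE
    return "antigen detected" if lo <= second_highest <= hi else "not detected"

aem = expected_fetal_molecules(total_genomes=2000, fetal_fraction=0.08)  # 160.0
if aem < AEM_FLOOR:
    print("discard: too few expected fetal molecules")
else:
    adm = [130, 150, 20, 140, 120]   # detected antigen molecules per RhD locus (toy)
    cfafs = [d / aem for d in adm]   # CFAF = detected / expected, per locus
    print(call_antigen(cfafs))       # antigen detected
```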
-
-
resume2
-
UNIVERSITY OF CALIFORNIA, Los Angeles, California. Volunteer Bioinformatics Analyst. 2021-2025.
Why only a volunteer position at UCLA?
-
UNIVERSITY OF SOUTHERN CALIFORNIA, Los Angeles, California, Assistant Professor at the Keck School of Medicine, 2008-2015.
Likely missed tenure
-
-
-
We observed distinct transcriptional contributions from solid tissue-specific cell types from the intestine, liver, lungs, pancreas, heart, and kidney (Fig. 1d and Extended Data Fig. 4).
They showed that the cfRNA data included transcriptional signatures that did not match cell type-specific signatures from hematopoietic cells but did match signatures from intestine, liver, lung, pancreas, heart, and kidney cells.
-
We used this matrix to deconvolve the cell types of origin in the plasma cell-free transcriptome
Then, they quantified each marker transcript in the public cfRNA data from healthy patients and used these to deconvolve how much each cell type contributed to each cfRNA sample.
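A toy version of the deconvolution step (plain least squares against a marker-gene basis matrix; the paper's actual solver is likely more sophisticated, and all numbers here are made up):

```python
import numpy as np

# Basis matrix B: rows = marker genes, columns = cell types (toy values).
B = np.array([[10.0, 0.0],    # marker gene 1: high in cell type A only
              [0.0,  8.0],    # marker gene 2: high in cell type B only
              [2.0,  2.0]])   # gene expressed in both
true_frac = np.array([0.7, 0.3])
y = B @ true_frac             # synthetic "cfRNA sample" mixing the two types

# Solve y ≈ B x for the per-cell-type contributions.
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
coef = np.clip(coef, 0, None)     # contributions cannot be negative
fractions = coef / coef.sum()     # normalize to fractional contributions
print(np.round(fractions, 2))     # recovers [0.7 0.3]
```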
-
We used Tabula Sapiens version 1.0 (TSP)12, a multiple-donor whole-body cell atlas spanning 24 tissues and organs, to define a basis matrix whose gene set accurately and simultaneously resolved the distinct cell types in TSP.
Then, they took public scRNA-seq data (Tabula Sapiens) from cells with labeled types and used an SVM to identify the smallest set of transcripts that discriminated cells of each type from one another.
-
Marker genes for blood, brain, and liver cell types were readily detected, as previously observed at tissue level1,3,4,6,7, as well as the kidney, gastrointestinal tract, and pancreas (Fig. 1b).
Basically, they took public cfRNA data from healthy patients and identified transcripts. Then they showed that the transcripts they detected included marker transcripts for certain cell types.
-
-
-
Barcoded nanopore sequencing libraries were created using a custom low-input protocol for the Oxford Nanopore Technologies SQK-NBD-114.24 and LSK-114 kits. Briefly, cDNA was end-repaired and A-tailed as specified, but incubated for 30 minutes, followed by a 30-minute deactivation. Barcodes from the SQK-NBD-114.24 kit were incubated at 20 °C for 4.5 hours.
No UMIs
-
Novel transcript discovery
Quote GPT: "Let’s break this down step by step and look carefully at the computational procedure they used, assessing both its strengths and weaknesses.
What They Did Right
-
Choice of Reference and Annotation
-
They used hg38 as the reference genome and the GENCODE v39 comprehensive transcript annotation set.
-
This is good practice, since hg38 is the current gold-standard reference, and GENCODE is the most complete and high-quality human transcript annotation resource.
-
Tool for Isoform Discovery (IsoQuant v3.3.1)
-
IsoQuant is designed for long-read transcriptome analysis and is well-suited to Oxford Nanopore data, which has unique error profiles compared to Illumina.
-
They set -d nanopore, which ensures that IsoQuant applies the correct error model and alignment heuristics.
-
Quality Control (SQANTI3)
-
They validated the assembled transcript models with SQANTI3, which is one of the standard tools for assessing structural and quality metrics of novel isoforms (junction accuracy, coverage, splice motif analysis, etc.).
-
This helps filter out false positives and adds confidence to reported novel isoforms.
-
Quantification Strategy
-
Using --gene_quantification all and --transcript_quantification all ensures they didn't limit themselves to annotated transcripts, which is important for discovery.
Potential Weaknesses / Flaws
-
Batch Analysis of All FASTQs Together
-
They ran IsoQuant on all FASTQs in one batch.
- This risks pooling artifacts across samples: rare or sample-specific isoforms may look more convincing when aggregated, but they might not be real if not supported in individual samples.
-
Ideally, one should assemble transcripts per sample first and then merge, to track sample-specific expression and minimize false discoveries.
-
Strandedness Setting (--stranded none)
-
For cfRNA, strandedness is often important to distinguish sense from antisense transcription.
-
By setting it to "none," they lose orientation information, which could inflate the number of apparent novel transcripts or misclassify antisense transcription as novel.
-
Loose Matching Strategy (--matching_strategy loose)
-
This increases sensitivity but at the cost of specificity.
-
Nanopore reads have higher error rates, and loose matching risks aligning spurious sequences as novel isoforms when they might be alignment artifacts.
-
Novel Transcript Reporting (--report_novel_unspliced true)
-
Allowing unspliced transcripts to be reported as novel might overcount pre-mRNA fragments or degraded cfRNA, which are abundant in plasma.
-
Without careful filtering, many of these could be biologically irrelevant.
-
SQANTI3 with Default Parameters
-
While SQANTI is robust, running it with defaults only may not be enough. For cfRNA (which is fragmented and low-abundance), more stringent filters (on junction support, read coverage, canonical splice motifs) are usually needed.
- Otherwise, false positives could pass QC.
Overall Reliability
- Strengths: They used well-regarded tools (IsoQuant + SQANTI), a proper reference (hg38 + GENCODE), and applied the Nanopore-specific model. That’s a solid foundation.
- Weaknesses: They pooled samples, did not use strandedness, allowed loose matching, and reported unspliced transcripts—all of which tend to inflate the number of “novel” transcripts and may misclassify artifacts as biology. For cfRNA, which is already fragmented and noisy, these issues are magnified.
Conclusion
Their method is adequate for generating hypotheses about novel cfRNA transcripts, but the reliability of their discoveries is questionable without stronger filtering and sample-level validation. The key weaknesses (pooling, unstranded analysis, permissive matching, unspliced reporting) suggest their reported “novel transcripts” are likely overestimates.
If I were evaluating this, I’d say:
- Good first pass, but high false discovery rate.
- They should have assembled per-sample, used stranded protocols if possible, tightened alignment criteria, and reported only spliced, reproducible isoforms across samples."
-
-
Optimal lambda values for the logistic regression model were identified using 5-fold cross validation, and the value 1 standard error away from the minimum cross-validated value of lambda was selected for final model training. The final model was trained utilizing the entire dataset, and further feature reduction was performed using LASSO regression.
They didn't do any validation or testing on a held-out validation/test set; they just trained, subsetted features, and reported sens/spec.
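For reference, the "1 standard error from the minimum" rule (glmnet's lambda.1se) looks roughly like this in scikit-learn terms. Synthetic data; and note this still involves no held-out test set, which is the criticism in the note:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
lambdas = np.logspace(-2, 2, 10)   # candidate penalty strengths (C = 1/lambda)

means, ses = [], []
for lam in lambdas:
    clf = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear")
    scores = cross_val_score(clf, X, y, cv=5)
    means.append(scores.mean())
    ses.append(scores.std(ddof=1) / np.sqrt(len(scores)))

means, ses = np.array(means), np.array(ses)
best = means.argmax()   # minimum CV error == maximum CV score
# lambda.1se: the largest lambda (strongest shrinkage) whose CV score
# stays within 1 SE of the best score.
within_1se = means >= means[best] - ses[best]
lambda_1se = lambdas[within_1se].max()
print(lambda_1se >= lambdas[best])   # True: never less shrinkage than the optimum
```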
-
- Sep 2025
-
-
Microbial read counts for thyroid carcinoma tissue samples
-
Microbial read counts per million for thyroid carcinoma tissue samples
-
Summary statistics of filtering procedure
-
-
-
We compared 48 cases of Alphapapillomavirus detection in WGS data against the current gold standard test of mRNA PCR for high-risk/tumorigenic subtypes of HPV. The performance using WGS data was excellent, with only one sample not matching the gold standard (n = 48; sensitivity = 100%, specificity = 97.3%; Fig. 3A). This sample had high HPV burden as detected by WGS and was likely a false-negative result for the PCR-based test.
Somehow alphapapillomavirus leads to head and neck cancer
-
-
-
The associations may be based on batch differences
-
-
-
Seems like batch differences
-
-
-
relative abundance thresholds40,41.
DNA/RNA targets are low-abundance,
-
while robustly preserving biological features42. Instead, in many cases, key biological features are inadvertently removed from datasets, while contaminants and artifacts are kept36
discovery of low-abundance biosignatures
-
informative diagnostic markers18-22.
predictive disease assessment
-
-
-
sparse1-3, noisy4,5, contain low-abundance features2,6, and come from disparate datasets7,8
However, SoA approaches do not work in many critical applications
-
directly quantifying information loss/gain by SOA approaches and MIRACLe using well-established information-theoretic, divergence, and distance metrics26,
- Third, we will perform preliminary experiments to quantify the actual amount of
-
currently impossible25 training of predictive AI models on low-abundance biomarkers
However, SoA approaches do not work in many critical applications
-
Performance of models trained on simulated amplicon-sequencing data with SOA approaches15,28-33
Current SoA pipelines map reads to read counts
-
SOA11,20
Current SoA pipelines map reads to read counts
-
-
-
Long-probe quantitative amplified signal (LQAS) assay master
LQAS is often used as an alternative to methylation-specific qPCR for methylation-specific quantification.
-
- Aug 2025
-
ppmSeq21
-
Duplex recovery is assessed by the final number of dsDNA bases sequenced (in Gb) and total dsDNA coverage (in genome equivalents) across total bases sequenced
Basically, "duplex recovery" is calculated as: identify every dsDNA read (a read which has at least 1 read from the opposite strand); recovery is "the number of bases on all dsDNA reads divided by the number of bases on all reads". Max is 100%.
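The recovery calculation from the note, as a small sketch (read-level duplex flags are assumed to be given):

```python
def duplex_recovery(reads):
    """reads: list of (n_bases, is_duplex) tuples, where is_duplex is True
    when the read is covered by at least one read from the opposite strand.
    Returns bases on duplex reads divided by bases on all reads (max 1.0)."""
    total_bases = sum(n for n, _ in reads)
    duplex_bases = sum(n for n, is_dup in reads if is_dup)
    return duplex_bases / total_bases

reads = [(150, True), (150, True), (150, False), (150, False)]
print(duplex_recovery(reads))  # 0.5
```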
-
-
-
Quantitative real-time PCR (RT-qPCR) was used for quantification of cfRNA. An RNA-specific primer was designed to cover a 97-bp amplicon spanning 2 exon boundaries in the housekeeping gene GAPDH (forward 5′-GATCATCAGCAATGCCTCCT-3′, reverse 5′-TGTGGTCATGAGTCCTTCCA-3′). A DNA-specific primer was designed to cover a 78-bp transcriptionally silent region of chromosome 12 (forward 5′-TACGGTTGGTCCTTTCTTCG-3′, reverse 5′-TTTCCTTTGGGTCTGAATGC-3′). Reverse transcription was first performed using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems). qPCR was then run using 2X Power SYBR Green PCR Master Mix (Thermo Fisher Scientific) on an Applied Biosystems 7500 Fast Real-Time PCR or QuantStudio 7 Pro instruments. Universal Human Reference RNA (Thermo Fisher Scientific) was run in parallel to generate a standard curve, and cfRNA concentrations were calculated by comparing the RNA-specific Ct value of the sample to the standard curve.
They quant cfRNA after extraction but before library prep & capture. They use RT-qPCR of a GAPDH region vs a standard curve made using Universal Human Reference RNA. Therefore their actual RNA quant might be off - it's more like a rough overall expression estimate.
-