Hypothesis

34 Matching Annotations

Nov 2025
Local file Local file

Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA

2
1. Shruteek 23 Nov 2025
  
  in Public
  
  Based on 1,0000 GEs at 10 ng, they have around 60% losses
2. Shruteek 23 Nov 2025
  
  in Public
  
  Capping at 60 ng input was performed for some of the cohort explaining the peak at this value;
  
  They capped input at 60 ng (20,000 GEs), and the estimated molecule count at each position tops out around 8,000, which indicates 60% losses during processing.
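The loss estimate in the note above is simple arithmetic. A minimal sketch, assuming roughly 3.3 pg of DNA per haploid genome equivalent (a common approximation that is not stated in the excerpt, which is why 60 ng comes out slightly under the rounded 20,000 GEs used in the note):

```python
# Assumption: ~3.3 pg of DNA per human haploid genome equivalent (GE).
PG_PER_HAPLOID_GENOME = 3.3


def ng_to_genome_equivalents(nanograms, pg_per_genome=PG_PER_HAPLOID_GENOME):
    """Convert a DNA mass in ng to an approximate number of haploid GEs."""
    return nanograms * 1000.0 / pg_per_genome


def fractional_loss(input_molecules, recovered_molecules):
    """Fraction of input molecules that were not recovered after processing."""
    if input_molecules <= 0:
        raise ValueError("input_molecules must be positive")
    return 1.0 - recovered_molecules / input_molecules


# Figures from the note: ~20,000 GEs in, ~8,000 molecules observed per position.
print(round(ng_to_genome_equivalents(60)))
print(fractional_loss(20_000, 8_000))
```

With the note's rounded figures, 8,000 of 20,000 molecules recovered corresponds to a loss of about 60%.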
Annotators

Shruteek
Local file Local file

The utility of ultra-deep RNA sequencing in Mendelian disorder diagnostics

2
1. Shruteek 12 Nov 2025
  
  in Public
  
  We next investigated how increasing sequencing depth affects gene detection (Data S1). For multi-exon genes, we defined “detection” as having more than 50 total reads with at least two junction-spanning reads. Single-exon genes required more than 100 total reads. These thresholds were chosen based on junction ratios of genes at different read counts (Figure S5) and manual inspection of the raw data through the Integrative Genomics Viewer. Overall, iPSCs yielded the highest number of detected genes among the four CATs (Figure 1A), consistent with previous findings that iPSCs express a wide variety of genes.28 Detection performance in LCLs was modest at lower depths but converged with that of blood and fibroblasts at higher depths, likely due to the larger number of low-expressing genes in LCLs (Figure S6). Across all four CATs, each additional million reads uncovered 10–30 new genes at 100M reads. At 1,000M reads, the detection rate slowed, reaching 1–2 new genes per million reads (Figure 1A), suggesting a saturation effect for gene
  
  Basically, "1B reads is enough to detect most things"
2. Shruteek 12 Nov 2025
  
  in Public
  
  Most reference datasets and diagnostic protocols employ relatively modest sequencing depths (∼50–150 million reads), which may fail to detect low-abundance transcripts and rare splicing events critical for accurate diagnosis.
  
  Typical RNA seq read depths (>200M is considered high)
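The detection thresholds quoted in the first annotation of this document can be written as a small predicate. This is a sketch of the stated rule only; the function and parameter names are hypothetical, not taken from the paper's code:

```python
def is_detected(total_reads, junction_reads, is_multi_exon):
    """Apply the quoted detection rule to one gene.

    Multi-exon genes: more than 50 total reads AND at least 2 junction-spanning reads.
    Single-exon genes: more than 100 total reads (junction reads are not applicable).
    """
    if is_multi_exon:
        return total_reads > 50 and junction_reads >= 2
    return total_reads > 100


# Hypothetical read counts, for illustration only.
print(is_detected(total_reads=60, junction_reads=3, is_multi_exon=True))
print(is_detected(total_reads=60, junction_reads=0, is_multi_exon=False))
```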
Annotators

Shruteek
Local file Local file

Untitled document

1
1. Shruteek 12 Nov 2025
  
  in Public
  
  We conclude that cDNA sequencing with 30–40 million read measurements readily detects major splice isoforms for abundant and moderately abundant transcripts, whereas splice detection for the lowest-abundance RNA classes and isoforms is sporadic.
  
  Core conclusion: 40M is enough unless you want rare transcripts or isoforms
Annotators

Shruteek
Oct 2025
Local file Local file

Characterization of Plasma Cell-Free DNA Variants as of Tumor or Clonal Hematopoiesis Origin in 16,812 Advanced Cancer Patients

1
1. Shruteek 15 Oct 2025
  
  in Public
  
  The variants detected in the plasma were characterized by comparing the plasma and buffy coat VF and sequencing results. A variant is characterized as being of tumor origin if the lower 95% confidence interval (CI) of the plasma VF is greater than the upper bound of the 95% CI of the buffy coat VF. The variant is characterized as being of CH origin if the buffy coat VF is less than 20% and the 95% CI overlaps with the plasma VF or if the buffy coat VF is less than 20% and this VF is greater than the plasma VF. It is characterized as germline if the buffy coat VF is 30% or higher and the plasma VF is less than the buffy coat VF. The variant source is considered unknown if the buffy coat VF was between 20% and 30% to account for potential postzygotic mosaicism resulting in the presence of certain variants only in a subset of cells (13). Variants detected in genes that are known to commonly harbor CH mutations (e.g., ASXL1, JAK2, DNMT3A, and TET2) are characterized as CH if the buffy coat VF is greater than the plasma VF, as outlined above and depicted in Fig. 1.
  
  Same as in Abraham et al. 2025
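The quoted decision rules can be sketched as a single function. Several points are assumptions on my part and not confirmed by the excerpt: "the 95% CI overlaps with the plasma VF" is read as the buffy coat CI containing the plasma VF, cases the text does not cover fall through to "unknown", and the gene-specific CH rule (ASXL1, JAK2, DNMT3A, TET2) is left out because its interaction with the 20% cutoff is not spelled out. VFs are fractions between 0 and 1.

```python
def classify_variant_origin(plasma_vf, plasma_ci, buffy_vf, buffy_ci):
    """Classify a plasma variant as 'tumor', 'CH', 'germline', or 'unknown'.

    plasma_ci and buffy_ci are (lower, upper) bounds of the 95% CI of each VF.
    """
    plasma_lo, _plasma_hi = plasma_ci
    buffy_lo, buffy_hi = buffy_ci

    # Tumor: plasma lower bound sits above the buffy coat upper bound.
    if plasma_lo > buffy_hi:
        return "tumor"

    # CH: buffy coat VF under 20%, and either its CI covers the plasma VF
    # (assumed reading of "overlaps") or it exceeds the plasma VF.
    if buffy_vf < 0.20:
        ci_covers_plasma = buffy_lo <= plasma_vf <= buffy_hi
        if ci_covers_plasma or buffy_vf > plasma_vf:
            return "CH"
        return "unknown"  # assumption: not covered by the quoted rules

    # Germline: buffy coat VF at or above 30% and higher than plasma VF.
    if buffy_vf >= 0.30 and plasma_vf < buffy_vf:
        return "germline"

    # 20-30% buffy coat VF (possible mosaicism) and any remaining cases.
    return "unknown"


# Hypothetical values, for illustration only.
print(classify_variant_origin(0.05, (0.04, 0.06), 0.0, (0.0, 0.005)))
```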
Annotators

Shruteek
Local file Local file

Leukemia-Associated Somatic Mutations Drive Distinct Patterns of Age-Related Clonal Hemopoiesis

1
1. Shruteek 15 Oct 2025
  
  in Public
  
  we interrogated 15 mutation hot spots in blood DNA from 4,219 individuals using ultra-deep sequencing.
  
  Healthy individuals (no blood disorder) only
Annotators

Shruteek
Local file Local file

Untitled document

1
1. Shruteek 09 Oct 2025
  
  in Public
  
  For a sample, they calculate the number of total genomes using QCTs. Then, they calculate the fetal fraction using polymorphic loci. Based on the fetal fraction and the number of total genomes, they estimate the expected number of fetal antigen molecules (AEM) if the fetus is positive. If the AEM is lower than the threshold, they discard.
  
  Then, they calculate the number of antigen molecules detected (ADM) using the read counts and QCTs. Then, they calculate the detected fetal antigen fraction (CFAF) as the number of detected molecules over the number of expected molecules. Across the 5 RhD loci, they take the second highest CFAF value and they compare it to the detection ranges to make the "antigen detected" call.
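The calling procedure summarized in this note can be sketched as follows. This is a rough illustration, not the assay's implementation: the AEM formula (total genomes times fetal fraction), the single detection threshold standing in for the "detection ranges", and all numeric values are assumptions.

```python
def antigen_call(total_genomes, fetal_fraction, detected_by_locus,
                 min_aem, detect_threshold):
    """Return 'no-call', 'detected', or 'not detected' for one sample.

    detected_by_locus: antigen molecules detected (ADM) at each locus.
    min_aem, detect_threshold: hypothetical cutoffs, not from the source.
    """
    if len(detected_by_locus) < 2:
        raise ValueError("need at least two loci to take the second-highest CFAF")

    # Expected fetal antigen molecules if the fetus is positive (assumed formula).
    aem = total_genomes * fetal_fraction
    if aem < min_aem:
        return "no-call"  # too few expected molecules; sample is discarded

    # CFAF per locus = detected molecules / expected molecules.
    cfafs = sorted((adm / aem for adm in detected_by_locus), reverse=True)
    second_highest = cfafs[1]
    return "detected" if second_highest >= detect_threshold else "not detected"


# Hypothetical sample with five loci.
print(antigen_call(10_000, 0.1, [900, 50, 800, 20, 10],
                   min_aem=100, detect_threshold=0.5))
```

Taking the second-highest CFAF means a single outlier locus cannot produce a positive call by itself.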
Annotators

Shruteek
Local file Local file

resume

2
1. Shruteek 07 Oct 2025
  
  in Public
  
  UNIVERSITY OF CALIFORNIA, Los Angeles, California. Volunteer Bioinformatics Analyst. 2021-2025.
  
  Why not join at UCLA?
2. Shruteek 07 Oct 2025
  
  in Public
  
  UNIVERSITY OF SOUTHERN CALIFORNIA, Los Angeles, California, Assistant Professor at the Keck School of Medicine, 2008-2015.
  
  Likely missed tenure
Annotators

Shruteek
Local file Local file

Cell types of origin of the cell-free transcriptome

4
1. Shruteek 06 Oct 2025
  
  in Public
  
  We observed distinct transcriptional contributions from solid tissue-specific cell types from the intestine, liver, lungs, pancreas, heart, and kidney (Fig. 1d and Extended Data Fig. 4).
  
  They showed that cfRNA data included transcriptional signatures which did not match type-specific transcriptional signatures from hematopoietic cells but did match signatures from intestine, liver, lungs, pancreas, heart, & kidney cells.
2. Shruteek 06 Oct 2025
  
  in Public
  
  We used this matrix to deconvolve the cell types of origin in the plasma cell-free transcriptome
  
  Then, they quantified each marker transcript in the public cfRNA data from healthy patients, and they used those values to deconvolve the relative contribution of each cell type to each sample in the cfRNA data.
3. Shruteek 06 Oct 2025
  
  in Public
  
  We used Tabula Sapiens version 1.0 (TSP)12, a multiple-donor whole-body cell atlas spanning 24 tissues and organs, to define a basis matrix whose gene set accurately and simultaneously resolved the distinct cell types in TSP.
  
  Then, they took public scRNA-seq data (Tabula Sapiens) from cells with labeled types and used SVM to identify the smallest set of transcripts in those samples which discriminated cells of each type from each other.
4. Shruteek 06 Oct 2025
  
  in Public
  
  Marker genes for blood, brain, and liver cell types were readily detected, as previously observed at tissue level 1,3,4,6,7, as well as the kidney, gastrointestinal tract, and pancreas (Fig. 1b).
  
  Basically, they took public cfRNA data from healthy patients and identified transcripts. Then they showed that the transcripts they detected included marker transcripts for certain cell types.
Annotators

Shruteek
Local file Local file

RNA liquid biopsy via nanopore sequencing for novel biomarker discovery and cancer early detection

3
1. Shruteek 06 Oct 2025
  
  in Public
  
  Barcoded nanopore sequencing libraries were created using a custom low-input protocol for the Oxford Nanopore Technologies SQK-NBD-114.24 and LSK-114 kits. Briefly, cDNA was end-repaired and A-tailed as specified, but incubated for 30 minutes, followed by a 30-minute deactivation. Barcodes from the SQK-NBD-114.24 kit were incubated at 20 °C for 4.5 hours.
  
  No UMIs
2. Shruteek 06 Oct 2025
  
  in Public
  
  Novel transcript discovery
  
  Quote GPT: "Let’s break this down step by step and look carefully at the computational procedure they used, assessing both its strengths and weaknesses.
  
  What They Did Right
  
  Choice of Reference and Annotation
  
  They used hg38 as the reference genome and the GENCODE v39 comprehensive transcript annotation set.
  
  This is good practice, since hg38 is the current gold-standard reference, and GENCODE is the most complete and high-quality human transcript annotation resource.
  
  Tool for Isoform Discovery (IsoQuant v3.3.1)
  
  IsoQuant is designed for long-read transcriptome analysis and is well-suited to Oxford Nanopore data, which has unique error profiles compared to Illumina.
  
  They set -d nanopore, which ensures that IsoQuant applies the correct error model and alignment heuristics.
  
  Quality Control (SQANTI3)
  
  They validated the assembled transcript models with SQANTI3, which is one of the standard tools for assessing structural and quality metrics of novel isoforms (junction accuracy, coverage, splice motif analysis, etc.).
  
  This helps filter out false positives and adds confidence to reported novel isoforms.
  
  Quantification Strategy
  
  Using --gene_quantification all and --transcript_quantification all ensures they didn’t limit themselves to annotated transcripts, which is important for discovery.
  
  Potential Weaknesses / Flaws
  
  Batch Analysis of All FASTQs Together
  
  They ran IsoQuant on all FASTQs in one batch.
  
  This risks pooling artifacts across samples: rare or sample-specific isoforms may look more convincing when aggregated, but they might not be real if not supported in individual samples.
  
  Ideally, one should assemble transcripts per sample first and then merge, to track sample-specific expression and minimize false discoveries.
  
  Strandedness Setting (--stranded none)
  
  For cfRNA, strandedness is often important to distinguish sense from antisense transcription.
  
  By setting it to "none," they lose orientation information, which could inflate the number of apparent novel transcripts or misclassify antisense transcription as novel.
  
  Loose Matching Strategy (--matching_strategy loose)
  
  This increases sensitivity but at the cost of specificity.
  
  Nanopore reads have higher error rates, and loose matching risks aligning spurious sequences as novel isoforms when they might be alignment artifacts.
  
  Novel Transcript Reporting (--report_novel_unspliced true)
  
  Allowing unspliced transcripts to be reported as novel might overcount pre-mRNA fragments or degraded cfRNA, which are abundant in plasma.
  
  Without careful filtering, many of these could be biologically irrelevant.
  
  SQANTI3 with Default Parameters
  
  While SQANTI is robust, running it with defaults only may not be enough. For cfRNA (which is fragmented and low-abundance), more stringent filters (on junction support, read coverage, canonical splice motifs) are usually needed.
  
  Otherwise, false positives could pass QC.
  
  Overall Reliability
  
  Strengths: They used well-regarded tools (IsoQuant + SQANTI), a proper reference (hg38 + GENCODE), and applied the Nanopore-specific model. That’s a solid foundation.
  
  Weaknesses: They pooled samples, did not use strandedness, allowed loose matching, and reported unspliced transcripts—all of which tend to inflate the number of “novel” transcripts and may misclassify artifacts as biology. For cfRNA, which is already fragmented and noisy, these issues are magnified.
  
  Conclusion
  
  Their method is adequate for generating hypotheses about novel cfRNA transcripts, but the reliability of their discoveries is questionable without stronger filtering and sample-level validation. The key weaknesses (pooling, unstranded analysis, permissive matching, unspliced reporting) suggest their reported “novel transcripts” are likely overestimates.
  
  If I were evaluating this, I’d say:
  
  Good first pass, but high false discovery rate.
  
  They should have assembled per-sample, used stranded protocols if possible, tightened alignment criteria, and reported only spliced, reproducible isoforms across samples."
3. Shruteek 06 Oct 2025
  
  in Public
  
  Optimal lambda values for the logistic regression model were identified using 5-fold cross validation, and the value 1 standard error away from the minimum cross-validated value of lambda was selected for final model training. The final model was trained utilizing the entire dataset, and further feature reduction was performed using LASSO regression.
  
  They didn't evaluate on a held-out validation/test set; they just trained on everything, subset the features, and reported sens/spec.
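The lambda selection described in the quoted passage resembles the common "one-standard-error" rule: pick the largest (most regularizing) lambda whose mean cross-validated error is within one standard error of the minimum. A minimal sketch, assuming that is what the authors meant; the function name and the numbers are hypothetical:

```python
def lambda_one_se(lambdas, cv_means, cv_ses):
    """Pick lambda by the one-standard-error rule.

    lambdas:  candidate regularization strengths
    cv_means: mean cross-validated error for each lambda
    cv_ses:   standard error of that mean for each lambda
    """
    if not (len(lambdas) == len(cv_means) == len(cv_ses)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")

    # Index of the lambda with the lowest mean CV error.
    best = min(range(len(lambdas)), key=lambda i: cv_means[i])
    cutoff = cv_means[best] + cv_ses[best]

    # Largest lambda whose error is still within one SE of the minimum.
    return max(lam for lam, err in zip(lambdas, cv_means) if err <= cutoff)


# Hypothetical cross-validation summary.
print(lambda_one_se([0.001, 0.01, 0.1, 1.0],
                    [0.30, 0.25, 0.27, 0.40],
                    [0.03, 0.03, 0.03, 0.03]))
```

Note that this only tunes the penalty. Because the reported sensitivity and specificity come from the same data used for training and feature selection, they remain optimistic without a separate held-out set.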
Annotators

Shruteek
Sep 2025
Local file Local file

Comprehensive analysis of microbial content in whole-genome sequencing samples from The Cancer Genome Atlas project

3
1. Shruteek 16 Sep 2025
  
  in Public
  
  Microbial read counts for thyroid carcinoma tissue samples
2. Shruteek 16 Sep 2025
  
  in Public
  
  Microbial read counts per million for thyroid carcinoma tissue samples
3. Shruteek 16 Sep 2025
  
  in Public
  
  Summary statistics of filtering procedure
Annotators

Shruteek
Local file Local file

The landscape of microbial associations in human cancer

1
1. Shruteek 16 Sep 2025
  
  in Public
  
  We compared 48 cases of Alphapapillomavirus detection in WGS data against the current gold standard test of mRNA PCR high-risk/tumorigenic subtypes of HPV. The performance using WGS data was excellent, with only one sample not matching the gold standard (n = 48; sensitivity = 100%, specificity = 97.3%; Fig. 3A). This sample had high HPV burden as detected by WGS and was likely a false-negative result for the PCR-based test.
  
  Somehow alphapapillomavirus leads to head and neck cancer
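As a sanity check on the quoted performance figures, sensitivity and specificity can be recomputed from a confusion matrix. The counts below are not reported in the excerpt; they are one set of counts inferred here to be consistent with n = 48, 100% sensitivity, 97.3% specificity, and a single discordant sample, so treat them as an assumption:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Inferred (assumed) counts, with PCR as the gold standard:
# 11 concordant positives, 36 concordant negatives,
# and 1 WGS-positive / PCR-negative sample.
tp, fn, tn, fp = 11, 0, 36, 1

print(tp + fn + tn + fp)
print(round(100 * sensitivity(tp, fn), 1))
print(round(100 * specificity(tn, fp), 1))
```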
Annotators

Shruteek
Local file Local file

Intratumoral Bacteria Dysbiosis Is Associated with Human Papillary Thyroid Cancer and Correlated with Oncogenic Signaling Pathways

1
1. Shruteek 11 Sep 2025
  
  in Public
  
  The associations may be based on batch differences
Annotators

Shruteek
Local file Local file

Untitled document

1
1. Shruteek 11 Sep 2025
  
  in Public
  
  Seems like batch differences
Annotators

Shruteek
Local file Local file

Untitled document

3
1. Shruteek 08 Sep 2025
  
  in Public
  
  relative abundance thresholds40,41.
  
  DNA/RNA targets are low-abundance,
2. Shruteek 08 Sep 2025
  
  in Public
  
  while robustly preserving biological features42. Instead, in many cases, key biological features are inadvertently removed from datasets, while contaminants and artifacts are kept36
  
  discovery of low-abundance biosignatures
3. Shruteek 08 Sep 2025
  
  in Public
  
  formative diagnostic markers18-22.
  
  predictive disease assessment
Annotators

Shruteek
Local file Local file

Untitled document

5
1. Shruteek 08 Sep 2025
  
  in Public
  
  sparse1-3, noisy4,5 contain low-abundance features2,6 and come from disparate datasets7,8
  
  However, SoA approaches do not work in many critical applications
2. Shruteek 08 Sep 2025
  
  in Public
  
  directly quantifying information loss/gain by SOA approaches and MIRACLe using well-established information-theoretic, divergence, and distance metrics26,
  
  Third, we will perform preliminary experiments to quantify the actual amount of
3. Shruteek 08 Sep 2025
  
  in Public
  
  currently impossible25 training of predictive AI models on low abundance biomarkers
  
  However, SoA approaches do not work in many critical applications
4. Shruteek 08 Sep 2025
  
  in Public
  
  performance of models trained on simulated amplicon-sequencing data with SOA approaches15,28-33
  
  Current SoA pipelines map reads to read counts
5. Shruteek 08 Sep 2025
  
  in Public
  
  SOA11,20
  
  Current SoA pipelines map reads to read counts
Annotators

Shruteek
Local file Local file

Untitled document

1
1. Shruteek 03 Sep 2025
  
  in Public
  
  Long-probe quantitative amplified signal (LQAS) assay master
  
  LQAS is often used for methylation-specific quantification, as an alternative to methylation-specific qPCR.
Annotators

Shruteek
Aug 2025
Local file Local file

ppmSeq2

1
1. Shruteek 29 Aug 2025
  
  in Public
  
  Duplex recovery is assessed by the final number of dsDNA bases sequenced (in Gb) and total dsDNA coverage (in genome equivalents) across total bases sequenced
  
  Basically, "duplex recovery" is calculated as: identify every dsDNA read (a read which has at least 1 read from the opposite strand); recovery is "the number of bases on all dsDNA reads divided by the number of bases on all reads". Max is 100%.
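The calculation described in the note above can be sketched directly. This follows the note's reading of "duplex recovery" (bases on dsDNA reads over bases on all reads), which is my interpretation of the quoted sentence; the input representation is hypothetical:

```python
def duplex_recovery(reads):
    """Fraction of sequenced bases that sit on dsDNA (duplex) reads.

    reads: iterable of (length_in_bases, has_opposite_strand_read) tuples.
    Returns a value between 0.0 and 1.0 (1.0 means every read is duplex).
    """
    reads = list(reads)  # allow generators; the data is iterated twice
    total_bases = sum(length for length, _ in reads)
    if total_bases == 0:
        return 0.0
    duplex_bases = sum(length for length, is_duplex in reads if is_duplex)
    return duplex_bases / total_bases


# Hypothetical reads: two duplex reads of 150 bases, one single-strand read of 100.
print(duplex_recovery([(150, True), (150, True), (100, False)]))
```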
Annotators

Shruteek
Local file Local file

An ultrasensitive method for detection of cell-free RNA

1
1. Shruteek 28 Aug 2025
  
  in Public
  
  Quantitative real-time PCR (RT-qPCR) was used for quantification of cfRNA. An RNA-specific primer was designed to cover a 97-bp amplicon spanning 2 exon boundaries in the housekeeping gene GAPDH (forward 5′-GATCATCAGCAATGCCTCCT-3′, reverse 5′-TGTGGTCATGAGTCCTTCCA-3′). A DNA-specific primer was designed to cover a 78-bp transcriptionally silent region of chromosome 12 (forward 5′-TACGGTTGGTCCTTTCTTCG-3′, reverse 5′-TTTCCTTTGGGTCTGAATGC-3′). Reverse transcription was first performed using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems). qPCR was then run using 2X Power SYBR Green PCR Master Mix (Thermo Fisher Scientific) on an Applied Biosystems 7500 Fast Real-Time PCR or QuantStudio 7 Pro instruments. Universal Human Reference RNA (Thermo Fisher Scientific) was run in parallel to generate a standard curve, and cfRNA concentrations were calculated by comparing the RNA-specific Ct value of the sample to the standard curve.
  
  They quant cfRNA after extraction but before library prep & capture. They use RT-qPCR of a GAPDH region vs a standard curve made using Universal Human Reference RNA. Therefore their actual RNA quant might be off - it's more like a rough overall expression estimate.
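Standard-curve quantification of this kind usually fits Ct against log10 of the known standard concentrations and then inverts the line for each sample. A minimal sketch under that assumption; the dilution series and Ct values are hypothetical, and the paper's exact fitting procedure is not given in the excerpt:

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate a sample's concentration."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series of reference RNA (arbitrary units).
standards = [100.0, 10.0, 1.0, 0.1]
cts = [20.0, 23.32, 26.64, 29.96]

slope, intercept = fit_standard_curve(standards, cts)
print(slope, intercept)
print(concentration_from_ct(23.32, slope, intercept))
```

Because the curve is built from GAPDH signal in reference RNA, the result is a GAPDH-equivalent estimate, which is why the note treats it as a rough measure of total cfRNA.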
Annotators

Shruteek
