  1. Feb 2023
    1. This tool expects large numbers of variant sites in order to achieve decent modeling with the Gaussian mixture model. It's difficult to put a hard number on minimum requirements because it depends a lot on the quality of the data (clean, well-behaved data requires fewer sites because the clustering tends to be less noisy), but empirically we find that in humans, the procedure tends to work well enough with at least one whole genome or 30 exomes. Anything smaller than that scale is likely to run into difficulties, especially for the indel recalibration.


  2. Jul 2022
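    1. Reads are aligned to the reference genome with STAR. Junction reads (a single read that contains the fusion breakpoint between two genes) and spanning reads (read pairs whose R1 and R2 align to different genes) are then selected as candidate fusion gene sequences.

      Definition of fusion-supporting reads in STAR-Fusion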

    1. Small variants were called using the previously established GATK protocol [48]. Briefly, raw sequencing reads were trimmed and filtered using the Trimmomatic software [54]. Paired-end reads passing processing were then aligned to the GRCh38 human reference genome using Burrows-Wheeler Aligner, duplicates were marked with Picard, and alignment quality was improved using the Genome Analysis Toolkit [51] local realigner and base quality score recalibrator. Short somatic variants were then called using MuTect2 [55]. The analysis protocol, with version information of all tools and details of reference datasets used in the analyses, has been explained earlier [48]. Following variant calling, variants were annotated with Annovar [56]. Variants were then removed if they met any of the following criteria: not passing all MuTect2 filters; falling into intronic or intergenic regions; classified as synonymous or non-frameshift variation; an ExAC [57], ESP [58], or 1KG [59] minor allele frequency higher than 1%; a variant calling quality less than 40; residing in sites covered by fewer than 10 reads; a variant allele frequency less than 2% or higher than 30%; or an SNV strand odds ratio higher than 3 or an indel strand odds ratio higher than 11.

      A useful reference for WES SNV filtering
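
The filter criteria in the quoted methods text can be read as a single keep-or-drop predicate. The sketch below is only an illustration of that reading: the `Variant` class and all of its field names are hypothetical and do not correspond to Annovar or MuTect2 output columns, while the thresholds are the ones stated in the quote.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Every field name here is hypothetical; map them to your own annotated table.
    passed_filters: bool   # True if the variant passed all MuTect2 filters
    region: str            # e.g. "exonic", "intronic", "intergenic"
    consequence: str       # e.g. "nonsynonymous", "synonymous", "nonframeshift"
    max_pop_maf: float     # highest MAF across ExAC, ESP and 1KG, as a fraction
    qual: float            # variant calling quality
    depth: int             # number of reads covering the site
    vaf: float             # variant allele frequency, as a fraction
    sor: float             # strand odds ratio
    is_indel: bool


def keep(v: Variant) -> bool:
    """Return True if the variant survives every filter listed in the quote."""
    if not v.passed_filters:
        return False
    if v.region in {"intronic", "intergenic"}:
        return False
    if v.consequence in {"synonymous", "nonframeshift"}:
        return False
    if v.max_pop_maf > 0.01:            # population MAF higher than 1%
        return False
    if v.qual < 40:
        return False
    if v.depth < 10:
        return False
    if v.vaf < 0.02 or v.vaf > 0.30:    # VAF outside the 2% to 30% range
        return False
    sor_limit = 11 if v.is_indel else 3
    if v.sor > sor_limit:
        return False
    return True
```

Writing each criterion as its own early return keeps the thresholds easy to audit against the methods text and easy to relax one at a time.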

  3. Nov 2021
    1. For example we can identify that, for a given run, whenever we called two A nucleotides in a row, the next base we called had a 1% higher rate of error. So any base call that comes after AA in a read should have its quality score reduced by 1%.


    1. Base quality score recalibration (BQSR): This step modifies the quality scores assigned to individual read bases of the sequence read data. This action removes experimental biases caused by the sequencing methodology.
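
The idea behind recalibration can be sketched as counting how often base calls disagree with the reference in each sequence context and converting that empirical error rate to a Phred score. This is a deliberately simplified illustration, not GATK's model, which combines several covariates rather than context alone. The function name, the input format, and the add-one smoothing are all assumptions of this sketch.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each context to an empirical Phred score.

    observations: iterable of (context, is_mismatch) pairs, one per base call
    at sites assumed not to be real variants. Hypothetical input format.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [mismatches, total calls]
    for context, is_mismatch in observations:
        counts[context][0] += int(is_mismatch)
        counts[context][1] += 1

    table = {}
    for context, (mismatches, total) in counts.items():
        # Add-one smoothing (an assumption here) keeps the rate away from zero.
        error_rate = (mismatches + 1) / (total + 2)
        table[context] = -10 * math.log10(error_rate)
    return table


# Bases following "AA" mismatch more often than bases following "CG",
# so they end up with a lower empirical quality.
calls = [("AA", True)] * 9 + [("AA", False)] * 989 + [("CG", False)] * 998
qualities = empirical_quality(calls)
```

In this toy input the smoothed error rate is 10/1000 after `AA` and 1/1000 after `CG`, so the two contexts differ by about 10 Phred units.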


    1. -K $BWA_K_SIZE \


    2. The --umi_post_process option is used to instruct the tool to perform the necessary post-processing of the consensus reads.


    3. util sort


    4. identify errors raised during sample preparation.


  4. Oct 2021
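    1. It is best to provide a peripheral blood sample as a control, which ensures that the detected variants are specific to the tumor cells. Some companies filter against public SNP databases instead, such as 1000g, ExAC_ALL, and gnomAD, but the vast majority of that SNP data comes from non-Chinese populations, and very little comes from Chinese populations. To make sure germline variants are filtered out accurately, it is still best to provide peripheral blood as a control.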


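    2. For example, if the ctDNA input is 33 ng (10,000 copies) and the library conversion rate is 50%, that leaves 5,000 copies. No matter how much the sequencing depth is increased, at most 5,000 copies can be detected in theory, so the LOD cannot reach 1 in 10,000.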
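
The arithmetic in this note can be written out as a short calculation. The 3.3 pg per haploid genome figure is the value implied by "33 ng = 10,000 copies"; the function names and the requirement of at least one mutant molecule are assumptions of this sketch.

```python
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in picograms


def max_detectable_copies(input_ng, conversion_rate):
    """Genome copies that survive library construction, an upper bound on unique molecules."""
    copies_in = input_ng * 1000 / HAPLOID_GENOME_PG  # convert ng to pg, then to copies
    return copies_in * conversion_rate


def best_case_lod(input_ng, conversion_rate, min_mutant_molecules=1):
    """Lowest allele fraction that could be observed, ignoring sequencing error and sampling noise."""
    return min_mutant_molecules / max_detectable_copies(input_ng, conversion_rate)


copies = max_detectable_copies(33, 0.5)  # about 5,000 copies
lod = best_case_lod(33, 0.5)             # about 1 in 5,000, i.e. 0.02%
```

Because the bound depends only on input mass and conversion rate, extra sequencing depth resamples the same molecules and cannot lower it.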


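    1. In 2020 there were 4.57 million new cancer cases in China. Breast cancer ranks first in number of new cases worldwide, but in China it ranks fourth, after lung, colorectal, and stomach cancer.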


  5. Jul 2021
  6. May 2021
    1. For example, performing downstream analyses with only 5 PCs does significantly and adversely affect results.


  7. Apr 2021
    1. This protocol uses a window of 1500 variants, shifted by 10% for each new round of comparisons, and a threshold of R² > 0.2. The window size of 1500 variants corresponds to the large, high LD chromosome 8 inversion, while the shift of 10% represents a trade-off between efficiency and thoroughness.

      Tested this: after LD pruning, no associated loci remained. Were the original hits possibly false positives?
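
The windowed pruning described in the quote can be sketched in plain Python. This is an illustration under stated assumptions, not the protocol's implementation: the function names are hypothetical, genotypes are assumed to be 0/1/2 dosages with no missing values, and the sketch arbitrarily keeps the earlier variant of each correlated pair. In PLINK, this kind of pruning is normally run with `--indep-pairwise`, which takes a window size, a step size, and an r² threshold.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no correlation information
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, window=1500, step_fraction=0.10, r2_threshold=0.2):
    """Return indices of variants kept after greedy sliding-window pruning.

    genotypes: list of per-variant dosage lists, in genomic order.
    """
    n = len(genotypes)
    step = max(1, int(window * step_fraction))  # 10% of 1500 gives a shift of 150
    removed = set()
    start = 0
    while start < n:
        idx = [i for i in range(start, min(start + window, n)) if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j not in removed and r_squared(genotypes[i], genotypes[j]) > r2_threshold:
                    removed.add(j)  # arbitrary choice: keep the earlier variant
        if start + window >= n:
            break  # the last window already reached the end of the data
        start += step
    return [i for i in range(n) if i not in removed]


# Variants 0 and 1 are identical; variant 2 is uncorrelated with variant 0.
kept = ld_prune([[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [1, 0, 1, 2, 0, 2]], window=3)
```

Pruning only removes variants from the set used for things like PCA or relatedness estimation; association tests are usually still run on the full set, so a hit that is absent from the pruned set is not by itself evidence of a false positive.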

    2. It is necessary to remove rare variants from GWAS because the certainty of the genotype call is reduced by their low minor allele count. Even in common variants, however, genotyping and genotype recalling are subject to technical error, with the result that a proportion of variants and samples are of low quality, and should be removed from the analysis.


    3. For the smallest studies, where fewer than 1000 individuals are investigated, a cut-off of 5% should be considered; this is in line with the analysis program GenABEL, for example, which uses a minor allele count of 5 as its cut-off [18].
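
A frequency cut-off and a count cut-off are related through the sample size, which a short calculation makes concrete. The sketch below assumes biallelic variants coded as 0/1/2 dosages with `None` for missing calls; the function names are hypothetical.

```python
def minor_allele_stats(dosages):
    """Return (minor allele frequency, minor allele count) for one variant."""
    called = [d for d in dosages if d is not None]  # ignore missing genotypes
    if not called:
        return 0.0, 0
    alt_count = sum(called)
    total_alleles = 2 * len(called)
    mac = min(alt_count, total_alleles - alt_count)  # the minor allele may be ref or alt
    return mac / total_alleles, mac


def passes_maf(dosages, min_maf=0.05):
    """Apply the 5% frequency cut-off suggested for small studies."""
    maf, _ = minor_allele_stats(dosages)
    return maf >= min_maf


# 100 individuals: 20 heterozygotes pass at MAF 0.10, a single heterozygote does not.
common = [0] * 80 + [1] * 20
rare = [0] * 99 + [1]
```

With 100 individuals, a 5% frequency cut-off corresponds to a minor allele count of 10, so frequency and count thresholds only coincide at particular sample sizes.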


  8. Mar 2021
    1. The expectation is that IBD = 1 for duplicates or monozygotic twins, IBD = 0.5 for first-degree relatives, IBD = 0.25 for second-degree relatives and IBD = 0.125 for third-degree relatives. Due to genotyping error, LD and population structure there is often some variation around these theoretical values and it is typical to remove one individual from each pair with an IBD > 0.1875, which is halfway between the expected IBD for third- and second-degree relatives. For these same reasons an IBD > 0.98 identifies duplicates.
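
The thresholds in this passage follow directly from the expected values, as the sketch below shows. The constant and function names are hypothetical; the input is assumed to be a pairwise IBD proportion such as the PI_HAT column PLINK reports.

```python
EXPECTED_IBD = {
    "duplicate or monozygotic twin": 1.0,
    "first-degree": 0.5,
    "second-degree": 0.25,
    "third-degree": 0.125,
}

# Halfway between the expected values for third- and second-degree relatives.
RELATED_THRESHOLD = (EXPECTED_IBD["third-degree"] + EXPECTED_IBD["second-degree"]) / 2
DUPLICATE_THRESHOLD = 0.98


def flag_pair(ibd):
    """Classify a pair of samples by its estimated IBD proportion."""
    if ibd > DUPLICATE_THRESHOLD:
        return "duplicate"
    if ibd > RELATED_THRESHOLD:
        return "related"  # remove one individual from the pair
    return "keep"
```

For example, `flag_pair(0.5)` flags a first-degree pair as related, while a pair at the third-degree expectation of 0.125 is kept.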


    2. The method works best when only independent SNPs are included in the analysis. To achieve this, regions of extended linkage disequilibrium (LD) (such as the HLA) are entirely removed from the dataset [8] and remaining regions are typically pruned so that no pair of SNPs within a given window (say, 50 kb) is correlated (typically taken as r² > 0.2).


    1. Our primary aim was to generate a cohort large enough to examine the heritability of prognostic therapy outcomes. However, the meta-analysis estimate of SNP heritability was low and non-significant (h²SNP = 0.09, SE = 0.17). A sample size of 2724 has 80% and 99% power to detect a SNP-heritability of 33% and 50%, respectively [94]. To achieve 80% power to detect a heritability of 20%, a sample of 4500 individuals will be required. A meta-analysis of 2799 individuals was sufficient to detect a significant heritability estimate for therapy outcome to antidepressant drugs (h²SNP = 0.42, SE = 0.18) and this was the first evidence of a genetic component for treatment outcome of any kind.


    2. The meta-analysis sample (n = 2724) had 80% power to detect variants explaining 1.5% of the variance and 42% power to detect variants explaining 1% of the variance. Therefore, it is not especially surprising that we do not detect any variants at genome-wide significance. Typically, GWAS of psychological traits have required tens of thousands of participants to detect SNPs at genome-wide significance.
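
Power figures of this kind can be approximated with a standard one-degree-of-freedom calculation. The sketch below is not the authors' method: it assumes a quantitative trait, a genome-wide significance level of 5e-8, and a normal approximation in which the test statistic has mean sqrt(n·q²/(1−q²)) for a variant explaining a fraction q² of the variance. The function name is hypothetical.

```python
from statistics import NormalDist


def gwas_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a single-variant test in a sample of size n."""
    std_normal = NormalDist()
    # Two-sided critical value; using the lower tail avoids precision loss near 1.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    ncp = n * variance_explained / (1 - variance_explained)  # non-centrality parameter
    mean_z = ncp ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - mean_z)
    lower_tail = std_normal.cdf(-z_crit - mean_z)
    return upper_tail + lower_tail


power_1_percent = gwas_power(2724, 0.01)
power_1_5_percent = gwas_power(2724, 0.015)
```

Under these assumptions the result for 1% of variance is close to the quoted 42%, and the result for 1.5% comes out slightly above the quoted 80%, which suggests the authors used a similar but not identical calculation.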
