21 Matching Annotations
  1. Feb 2023
    1. This tool expects large numbers of variant sites in order to achieve decent modeling with the Gaussian mixture model. It's difficult to put a hard number on minimum requirements because it depends a lot on the quality of the data (clean, well-behaved data requires fewer sites because the clustering tends to be less noisy), but empirically we find that in humans, the procedure tends to work well enough with at least one whole genome or 30 exomes. Anything smaller than that scale is likely to run into difficulties, especially for the indel recalibration.

      VQSR needs at least one WGS sample or 30 WES samples to work.

  2. Jul 2022
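    1. Reads are aligned to the reference genome with STAR, and junction reads (a single read that contains the breakpoint between the two fused genes) and spanning reads (read pairs whose R1 and R2 align to different genes) are selected as candidate fusion-gene sequences.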

      How STAR-Fusion defines fusion-supporting reads.
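The two read categories in the passage above can be sketched as a small classifier. This is only an illustration of the definitions, not STAR-Fusion's actual implementation; the function name and the input representation (a list of gene names, one per aligned segment of each mate) are assumptions made for the example.

```python
from typing import List


def classify_read_pair(r1_genes: List[str], r2_genes: List[str]) -> str:
    """Classify a read pair as 'junction', 'spanning', or 'none'.

    r1_genes / r2_genes are hypothetical inputs: the gene hit by each aligned
    segment of the mate (a split, chimeric alignment yields several segments).
    """
    # Junction read: one read is split across two different genes,
    # so the read itself contains the fusion breakpoint.
    if len(set(r1_genes)) > 1 or len(set(r2_genes)) > 1:
        return "junction"
    # Spanning pair: each mate maps within one gene, but R1 and R2 disagree.
    if r1_genes and r2_genes and set(r1_genes) != set(r2_genes):
        return "spanning"
    return "none"


print(classify_read_pair(["BCR", "ABL1"], ["ABL1"]))  # junction
print(classify_read_pair(["BCR"], ["ABL1"]))          # spanning
print(classify_read_pair(["BCR"], ["BCR"]))           # none
```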

    1. Small variants were called using the previously established GATK protocol [48]. Briefly, raw sequencing reads were trimmed and filtered using the Trimmomatic software [54]. Paired-end reads passing processing were then aligned to the GRCh38 human reference genome using Burrows-Wheeler Aligner, duplicates were marked with Picard, and alignment quality was improved using the Genome Analysis Toolkit [51] local realigner and base quality score recalibrator. Short somatic variants were then called using MuTect2 [55]. The analysis protocol with version information of all tools and details of reference datasets used in analyses have been explained earlier [48]. Following variant calling, variants were annotated with Annovar [56] and variants not passing all MuTect2 filters, falling into intronic and intergenic regions, classified as synonymous or non-frameshift variation, with an ExAC [57], ESP [58], 1KG [59] minor allele frequency higher than 1%, with a variant calling quality less than 40, residing in sites covered by less than 10 reads, with variant allele frequency less than 2% or higher than 30%, and with SNV strand-odd-ratio higher than 3 or indel strand-odd-ratio higher than 11 were removed.

      A useful reference for WES SNV filtering.
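The filter chain in the quoted passage can be written down as one predicate. The sketch below assumes each variant has already been parsed into a dictionary; all field names (`mutect2_pass`, `region`, `effect`, `max_pop_maf`, `qual`, `depth`, `vaf`, `sor`, `is_indel`) are hypothetical and would need to be mapped from the real Annovar/MuTect2 output. Only the thresholds come from the passage.

```python
EXCLUDED_REGIONS = {"intronic", "intergenic"}
EXCLUDED_EFFECTS = {"synonymous", "nonframeshift"}


def keep_variant(v: dict) -> bool:
    """Return True if the variant survives every filter listed in the passage."""
    if not v["mutect2_pass"]:              # must pass all MuTect2 filters
        return False
    if v["region"] in EXCLUDED_REGIONS:    # drop intronic / intergenic variants
        return False
    if v["effect"] in EXCLUDED_EFFECTS:    # drop synonymous / non-frameshift
        return False
    if v["max_pop_maf"] > 0.01:            # ExAC / ESP / 1KG MAF above 1%
        return False
    if v["qual"] < 40:                     # variant calling quality below 40
        return False
    if v["depth"] < 10:                    # site covered by fewer than 10 reads
        return False
    if v["vaf"] < 0.02 or v["vaf"] > 0.30:  # VAF outside the 2%..30% range
        return False
    sor_limit = 11 if v["is_indel"] else 3  # strand odds ratio limits
    if v["sor"] > sor_limit:
        return False
    return True


example = {
    "mutect2_pass": True, "region": "exonic", "effect": "missense",
    "max_pop_maf": 0.0002, "qual": 60, "depth": 120, "vaf": 0.12,
    "sor": 1.4, "is_indel": False,
}
print(keep_variant(example))  # True
```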

  3. Nov 2021
    1. For example we can identify that, for a given run, whenever we called two A nucleotides in a row, the next base we called had a 1% higher rate of error. So any base call that comes after AA in a read should have its quality score reduced by 1%.

      Bases that follow a run of repeated nucleotides are more error-prone.
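To see what such an adjustment does to a quality score, it helps to convert between Phred scores and error probabilities. The sketch below assumes the "1%" in the passage means one extra percentage point of error probability, which is only one possible reading; it illustrates the score arithmetic and is not how GATK's BQSR fits its covariate model.

```python
import math


def phred_to_prob(q: float) -> float:
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Error probability -> Phred quality score."""
    return -10 * math.log10(p)


def adjust_quality(q: float, extra_error: float) -> float:
    """Add an absolute error-rate penalty (assumed interpretation) to a score."""
    return prob_to_phred(min(1.0, phred_to_prob(q) + extra_error))


# A reported Q30 base (0.1% error) after an "AA" context with +1% error:
print(round(adjust_quality(30, 0.01), 1))  # 19.6
```

Under this reading, a base reported as Q30 drops to roughly Q20, because the context-specific error dominates the originally reported error rate.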

    1. Base quality score recalibration (BQSR): This step modifies the quality scores assigned to individual read bases of the sequence read data. This action removes experimental biases caused by the sequencing methodology.

      Should UMI data go through this step?

    1. -K $BWA_K_SIZE \

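      If this is not set, the default is derived automatically from the number of threads given. To keep results identical across thread counts, the official documentation recommends setting it to 10000000 (https://support.sentieon.com/manual/DNAseq_usage/dnaseq/#map-reads-to-reference). However, when I actually set it to 10000000, not a single read aligned in the output... The cause is still to be investigated.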

    2. The --umi_post_process option is used to instruct the tool to perform the necessary post-processing of the consensus reads.

      The official documentation does not say which operations are actually performed. Worth asking them.

    3. util sort

      This tool supports multithreading; the vendor recommends using the same thread count as for bwa.

    4. identify errors raised during sample preparation.

      Errors introduced during sample preparation are detected by comparing the two strands of the duplex.

  4. Oct 2021
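    1. It is best to provide peripheral blood as a control, which ensures that the detected variants are all specific to tumor cells. Some companies filter against public SNP databases such as 1000g, ExAC_ALL, and gnomAD, but the vast majority of those SNP data come from non-Chinese populations, and SNP data for the Chinese population are very scarce. To make sure germline variants are filtered out accurately, it is still best to provide peripheral blood as a control.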

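      When computing TMB it is best to provide a control: tumor mutations minus control mutations gives the somatic mutations, which is more accurate than filtering against databases built from foreign populations.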

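    2. For example, with a ctDNA input of 33 ng (10,000 copies) and a library conversion rate of 50%, only 5,000 copies remain. However deep the sequencing, in theory at most 5,000 copies can be observed, so the LOD cannot reach 1 in 10,000.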

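      The fundamental limits on the detection limit are total DNA input and library conversion rate. 33 ng of DNA yields about 10,000 copies; at a 50% conversion rate 5,000 copies remain, and increasing sequencing depth can detect at most those 5,000 copies. The LOD is therefore 2 in 10,000 and cannot reach 1 in 10,000.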
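The arithmetic behind this note can be reproduced in a few lines. The sketch assumes roughly 3.3 pg of DNA per haploid human genome copy, which is the figure implied by "33 ng is about 10,000 copies"; the function names are made up for the example.

```python
HAPLOID_GENOME_PG = 3.3  # assumed mass of one haploid human genome copy, in pg


def usable_copies(input_ng: float, conversion_rate: float) -> float:
    """Genome copies that survive library construction."""
    input_copies = input_ng * 1000 / HAPLOID_GENOME_PG  # ng -> pg -> copies
    return input_copies * conversion_rate


def best_case_lod(input_ng: float, conversion_rate: float) -> float:
    """Lowest detectable allele fraction: a single mutant copy among all copies."""
    return 1 / usable_copies(input_ng, conversion_rate)


copies = usable_copies(33, 0.5)
lod = best_case_lod(33, 0.5)
print(f"{copies:.0f} copies, LOD = {lod:.2e}")  # 5000 copies, LOD = 2.00e-04
```

Sequencing depth does not appear in the calculation at all: beyond the point where every surviving molecule has been sampled, extra depth only re-reads the same 5,000 copies.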

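    1. In 2020 China had 4.57 million new cancer cases. Breast cancer ranks first worldwide by number of new cases, but in China it ranks fourth, after lung, colorectal, and stomach cancer.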

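      With a population of 1.4 billion, that is an annual cancer incidence of about 0.326%. According to the seventh national census, China has 260 million people aged 60 and over, of whom 190 million are aged 65 and over.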
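The incidence figure in the note is a single division, reproduced here with the two numbers given above (4.57 million new cases, 1.4 billion people) as a quick check.

```python
new_cases_2020 = 4.57e6   # new cancer cases in China in 2020, from the passage
population = 1.4e9        # population assumed in the note

annual_incidence = new_cases_2020 / population
print(f"{annual_incidence:.3%}")  # 0.326%
```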

  5. Jul 2021
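    1. In fact, the more important reason for assigning Huang Yuanji's inner-alchemy thought to the Middle School (中派) is his theory of the "Mysterious Pass" (玄关): all of Huang Yuanji's alchemical instructions pivot on the "one aperture of the Mysterious Pass". In the empty, still, and dim state of the Mysterious Pass there is neither yin nor yang yet both yin and yang, neither void nor substance yet both void and substance; this zero point of the Middle Way is precisely the standard that all the classical schools of thought held in common. Moreover, in the 《乐育堂语录》 (Leyutang Yulu) Huang Yuanji pointed out something that the alchemical classics of all ages had left unsaid: the "living zi hour" (活子时) of yang arising amid everyday life and ordinary human relations. One dwells in the dust yet leaves the dust, lives in the mundane world without being obscured by it, and transcends real life without departing from it; every action is exactly right, following the heart's desire yet never overstepping the rules; whether walking, standing, sitting, or lying down, yang arises everywhere and at all times, and this shore is itself the other shore. This view of yang arising reflects the core spirit of Chinese culture: to find the mean according to the time and to act according to the mean. The living zi hour of yang arising was originally a phenomenon of inner alchemy, but Huang Yuanji fused yang arising with the "joy of Confucius and Yan Hui", with the chaste woman's and the martyr's sacrifice of life for righteousness, and with the solitary radiance of the spirit once selfish desire has been shed. This idea overturned the narrow alchemical notion that only quiet sitting deep in the mountains can trigger yang arising, and its significance is extraordinary in the history of inner alchemy and indeed in the history of Chinese thought.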

      Brilliant! Brilliant!

  6. May 2021
    1. For example, performing downstream analyses with only 5 PCs does significantly and adversely affect results

      Note this: surprisingly, using only 5 PCs is not enough.

  7. Apr 2021
    1. This protocol uses a window of 1500 variants, shifted by 10% for each new round of comparisons, and a threshold of R² > 0.2. The window size of 1500 variants corresponds to the large, high LD chromosome 8 inversion, while the shift of 10% represents a trade-off between efficiency and thoroughness

      Tested this: after LD pruning, no associated loci remained. Were the original hits possibly false positives?
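The windowed pruning described in the passage can be sketched in plain Python. This is a simplified greedy version written for illustration: it uses the squared Pearson correlation of genotype dosages as R² and always drops the later variant of a correlated pair, which is an assumption and may differ from what PLINK or other tools do. The defaults (1500 variants, 150-variant shift, R² > 0.2) are the values from the passage.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic variant: no correlation defined
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes: List[Sequence[float]], window: int = 1500,
             step: int = 150, r2_max: float = 0.2) -> List[int]:
    """Return indices of variants kept after sliding-window LD pruning."""
    n = len(genotypes)
    removed = set()
    start = 0
    while True:
        idx = [i for i in range(start, min(start + window, n)) if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                # Simplification: drop the later variant of each correlated pair.
                if j not in removed and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    removed.add(j)
        if start + window >= n:
            break
        start += step  # shift the window (10% of 1500 by default)
    return [i for i in range(n) if i not in removed]


toy = [
    [0, 1, 2, 0, 1, 2],  # variant 0
    [0, 1, 2, 0, 1, 2],  # variant 1: perfectly correlated with variant 0
    [2, 0, 1, 1, 0, 2],  # variant 2: uncorrelated with variant 0
]
print(ld_prune(toy))  # [0, 2]
```

Pruning of this kind is usually applied to the marker set used for ancestry or relatedness checks; it removes correlated markers regardless of whether they carry an association signal.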

    2. It is necessary to remove rare variants from GWAS because the certainty of the genotype call is reduced by their low minor allele count. Even in common variants, however, genotyping and genotype recalling are subject to technical error, with the result that a proportion of variants and samples are of low quality, and should be removed from the analysis.

      So genotype calls for rare variants are not very reliable?

    3. For the smallest studies, where fewer than 1000 individuals are investigated, a cut-off of 5% should be considered; this is in line with the analysis program GenAbel, for example, which uses a minor allele count of 5 as its cut-off [18].

      A MAF cut-off of 5% is recommended below 1000 samples; test this when time allows.
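The passage mentions both a frequency cut-off (5%) and a count cut-off (minor allele count of 5), which are different quantities. A minimal sketch of how both are computed from one variant's genotypes follows; the encoding (0/1/2 alternate-allele counts, `None` for a missing call) and the function names are assumptions for the example.

```python
from typing import Optional, Sequence, Tuple


def minor_allele_stats(genotypes: Sequence[Optional[int]]) -> Tuple[int, float]:
    """Return (minor allele count, minor allele frequency) for one variant."""
    called = [g for g in genotypes if g is not None]  # ignore missing calls
    total_alleles = 2 * len(called)                   # diploid samples
    if total_alleles == 0:
        return 0, 0.0
    alt = sum(called)
    mac = min(alt, total_alleles - alt)               # the rarer of the two alleles
    return mac, mac / total_alleles


def passes_maf(genotypes: Sequence[Optional[int]], min_maf: float = 0.05) -> bool:
    """Keep the variant only if its MAF reaches the cut-off (5% by default)."""
    return minor_allele_stats(genotypes)[1] >= min_maf


common = [0] * 8 + [1, 1]        # 10 samples, 2 minor alleles out of 20
rare = [0] * 19 + [1]            # 20 samples, 1 minor allele out of 40
print(minor_allele_stats(common), passes_maf(common))  # (2, 0.1) True
print(minor_allele_stats(rare), passes_maf(rare))      # (1, 0.025) False
```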

  8. Mar 2021
    1. The expectation is that IBD = 1 for duplicates or monozygotic twins, IBD = 0.5 for first-degree relatives, IBD = 0.25 for second-degree relatives and IBD = 0.125 for third-degree relatives. Due to genotyping error, LD and population structure there is often some variation around these theoretical values and it is typical to remove one individual from each pair with an IBD > 0.1875, which is halfway between the expected IBD for third- and second-degree relatives. For these same reasons an IBD > 0.98 identifies duplicates.

      IBD filtering criteria. IBD here corresponds to PI_HAT in the PLINK output.
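The thresholds in the passage can be applied to a list of pairwise PI_HAT values as follows. The sketch assumes the pairs have already been read into `(id1, id2, pi_hat)` tuples, for example from a PLINK `.genome` file; which member of a related pair to drop is a free choice, and this version simply drops the second one.

```python
from typing import Iterable, Set, Tuple

RELATED_THRESHOLD = 0.1875    # halfway between third- and second-degree relatives
DUPLICATE_THRESHOLD = 0.98    # duplicates or monozygotic twins


def classify_pi_hat(pi_hat: float) -> str:
    """Label a sample pair using the cut-offs quoted in the passage."""
    if pi_hat > DUPLICATE_THRESHOLD:
        return "duplicate or MZ twin"
    if pi_hat > RELATED_THRESHOLD:
        return "related"
    return "unrelated"


def samples_to_remove(pairs: Iterable[Tuple[str, str, float]],
                      threshold: float = RELATED_THRESHOLD) -> Set[str]:
    """Greedily pick one individual from each related pair still intact."""
    removed: Set[str] = set()
    for id1, id2, pi_hat in pairs:
        # Skip pairs already broken up by an earlier removal.
        if pi_hat > threshold and id1 not in removed and id2 not in removed:
            removed.add(id2)  # arbitrary choice: drop the second sample
    return removed


pairs = [("A", "B", 0.50), ("B", "C", 0.25), ("C", "D", 0.05)]
print(samples_to_remove(pairs))  # {'B'}
print(classify_pi_hat(0.99))     # duplicate or MZ twin
```

In the toy data, removing B resolves both the A-B and the B-C relationship, so only one sample is lost.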

    2. The method works best when only independent SNPs are included in the analysis. To achieve this, regions of extended linkage disequilibrium (LD) (such as the HLA) are entirely removed from the dataset [8] and remaining regions are typically pruned so that no pair of SNPs within a given window (say, 50 kb) is correlated (typically taken as r² > 0.2)

      LD filtering; in practice, though, very few loci remained after filtering and the association signal was weak.

    1. Our primary aim was to generate a cohort large enough to examine the heritability of prognostic therapy outcomes. However, the meta-analysis estimate of SNP heritability was low and non-significant (h²SNP = 0.09, SE = 0.17). A sample size of 2724 has 80% and 99% power to detect a SNP-heritability of 33% and 50%, respectively [94]. To achieve 80% power to detect a heritability of 20%, a sample of 4500 individuals will be required. A meta-analysis of 2799 individuals was sufficient to detect a significant heritability estimate for therapy outcome to antidepressant drugs (h²SNP = 0.42, SE = 0.18) and this was the first evidence of a genetic component for treatment outcome of any kind

      The statistical approach is worth studying.

    2. The meta-analysis sample (n = 2724) had 80% power to detect variants explaining 1.5% of the variance and 42% power to detect variants explaining 1% of the variance. Therefore, it is not especially surprising that we do not detect any variants at genome-wide significance. Typically, GWAS of psychological traits have required tens of thousands of participants to detect SNPs at genome-wide significance

      GWAS of psychological disorders is hard: even a meta-analysis of 2724 samples came back negative.