59 Matching Annotations
  1. Jun 2018
    1. For our data snippet (again, high quality 30X coverage PCR-free 2x150 WGS), the effects of removing indel realignment are subtle. However, for data that deviate towards lower quality and for data that benefit greatly from BQSR, the effects could be more substantial. Given this, we ask researchers who are considering our new workflows to take into account properties of their data and consider the impact omitting indel realignment may have for their analyses. A first step towards this could be to examine the MAPQ distribution of the primary alignments.

      Use indel realignment (IndelRealigner) if data quality is questionable, e.g., FFPE samples, libraries with PCR amplification, or low-pass sequencing.
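      The quote suggests examining the MAPQ distribution of primary alignments as a first step. Below is a minimal sketch that does this over SAM text lines, such as the output of `samtools view`. It assumes "primary" means records without the secondary (0x100), supplementary (0x800) or unmapped (0x4) FLAG bits. The function names are illustrative, not part of any GATK tool.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits that exclude a record from the "primary, mapped" set
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def primary_mapq_histogram(sam_lines: Iterable[str]) -> Counter:
    """Count MAPQ values (column 5) of primary, mapped alignments."""
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # header or blank line
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # not a primary, mapped alignment
        counts[mapq] += 1
    return counts


def fraction_below(counts: Counter, threshold: int = 20) -> float:
    """Fraction of counted alignments with MAPQ strictly below `threshold`."""
    total = sum(counts.values())
    below = sum(n for q, n in counts.items() if q < threshold)
    return below / total if total else 0.0
```

      Usage: pass `sys.stdin` while piping `samtools view input.bam` into the script. Alternatively, drop those records at the source with `samtools view -F 0x904` (0x904 = 0x800 | 0x100 | 0x4).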

    2. I’m told these influences should be subtle for high quality data but make a difference for lower quality data

      Should get an update on this from the GATK team.

      High-quality data = ? (>75 bp read length, 30x or more, PCR-free library, and MAPQ > 20)

    3. Any discernible patterns from misaligned reads around indels will be small compared to that of flowcell failures

      especially for high-quality data (>75 bp read length, 30x or more, PCR-free library) and MAPQ > 20

  2. Nov 2017
    1. wgs_hg19_125_cancer_blood_normal_panel.vcf. This PON was created with 125 whole genome samples derived using 2012 technology. The sample libraries were from the blood normal tissue of cancer patients. We do NOT use matched normal tissue samples, as matched normal tissue samples can be contaminated with tumor/pre-tumor tissue, as they are typically derived from tissue adjacent to the tumor. The samples are deep coverage samples, ~30x, aligned to hg19. The libraries were paired end and approximately ~101 bp reads (2x101).

      Summary of PON creation by the Broad.

    2. you should at the least use samples prepared in the same manner (prep and tool-chain) as your samples and sequenced by the same center as your samples. The PON is meant to capture sequencing artifacts that may be different for different centers/sample-prep/tool-chains, and so matching its constituents closely to the provenance of your samples is ideal.


      PON creation should match sample origin, prep and sequencing center to maximize capture of sequencing artifacts.

    3. See this document for how to create your PON. Be sure to search our forum also if you have questions as there are multiple threads that help people with their PON creation.

      PON creation is a two-step approach using Mutect2.

      PS: Mutect2 can call variants in tumor-only mode, but this is not yet a supported feature. Check the Mutect2 documentation for future support of tumor-only variant calls.
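      The second step of PON creation can be pictured roughly as a site-recurrence filter. This is a conceptual sketch only, under two assumptions:
      - step one produced one call set per normal sample, reduced here to (chrom, pos, ref, alt) tuples;
      - step two keeps sites seen in at least `min_samples` normals.

      The value 2 is an assumption, not a documented default. This is not the Mutect2 or GATK implementation.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def combine_normals(per_normal_sites: Iterable[Iterable[Site]],
                    min_samples: int = 2) -> Set[Site]:
    """Keep sites that recur in at least `min_samples` normal call sets."""
    counts = Counter()
    for sites in per_normal_sites:
        counts.update(set(sites))  # a site counts at most once per normal
    return {site for site, n in counts.items() if n >= min_samples}
```

      Requiring recurrence across normals is what lets the panel capture systematic artifacts rather than one sample's private germline variants.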

    4. We do NOT use matched normal tissue samples, as matched normal tissue samples can be contaminated with tumor/pre-tumor tissue

      Exclude matched normal tissue and prefer blood-derived normals, since the former can be contaminated with tumor.

  3. Oct 2017
    1. lowest MAPQ of reads indel realignment then increases is at MAPQ26 and in fact over 94% of increases are for MAPQ60 reads.


      Good to get summary stats for the indel realignment step and check the lowest MAPQ of the reads it realigned. It is likely safe if the majority of realigned reads already have MAPQ of 20 or more, as HC would include those reads anyway (see the table in the article).


      Where to get these stats?

      I can see target_intervals.list from the realignment step and the recalibration summary from the BQSR step in recal.data.table. How should I interpret these files to check whether the majority of indel-realigned reads have MAPQ of 20 or more, or not?

      Read "Changes to alignment records" under https://software.broadinstitute.org/gatk/documentation/article.php?id=7156. Realigned reads carry an OC tag (original CIGAR), so one can extract those reads from both the original and the realigned BAM and compare the change in MAPQ.

      To subset only the realigned reads into a valid BAM, as shown in the screenshots, first create a list of read names:

      samtools view 7088_snippet_indelrealigner.bam | grep -P '\tOC:Z:' | cut -f1 | sort -u > 7088_OC.txt

      Matching the tag itself, rather than any "OC" substring, avoids false hits in read names or quality strings. Then follow the directions in the blog post "SAM flags down a boat" on how to create a valid BAM using FilterSamReads.
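      The OC-tag approach above can also be done in one pass without building an intermediate BAM. The minimal sketch below assumes both BAMs have been converted to SAM text (e.g. with `samtools view`). It collects read names carrying an `OC:Z:` tag in the realigned file, then pairs each mate's primary-alignment MAPQ before and after realignment. Both mates of a tagged read name are compared, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

SKIP_FLAGS = 0x100 | 0x800  # secondary and supplementary alignments
MATE_BITS = 0x40 | 0x80     # first/second-in-pair bits distinguish mates


def _records(sam_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield split SAM fields, skipping header and blank lines."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if line and not line.startswith("@"):
            yield line.split("\t")


def realigned_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Names of reads carrying an OC:Z: (original CIGAR) tag."""
    names = set()
    for f in _records(sam_lines):
        # optional TAG:TYPE:VALUE fields start at column 12 (index 11)
        if any(tag.startswith("OC:Z:") for tag in f[11:]):
            names.add(f[0])
    return sorted(names)


def mapq_by_mate(sam_lines: Iterable[str],
                 names: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """MAPQ of primary alignments for the given names, keyed by (name, mate bits)."""
    wanted = set(names)
    out = {}
    for f in _records(sam_lines):
        flag = int(f[1])
        if f[0] in wanted and not flag & SKIP_FLAGS:
            out[(f[0], flag & MATE_BITS)] = int(f[4])
    return out


def mapq_changes(original_lines: Iterable[str],
                 realigned_lines: Iterable[str]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """(name, mate bits) -> (MAPQ before, MAPQ after) for OC-tagged read names."""
    realigned = list(realigned_lines)  # iterated twice below
    names = realigned_read_names(realigned)
    before = mapq_by_mate(original_lines, names)
    after = mapq_by_mate(realigned, names)
    return {k: (before[k], after[k]) for k in after if k in before}
```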

    2. although they do not impact genotyping any more than uninformative reads, they may have a disproportionate impact on metrics that use exclusively informative reads

      From https://www.broadinstitute.org/gatk/guide/article?id=6005 :

      We call a read “uninformative” when it passes the quality filters, but the likelihood of the most likely allele given the read is not significantly larger than the likelihood of the second most likely allele given the read. Specifically, the difference between the Phred scaled likelihoods must be greater than 0.2 to be considered significant (i.e., \(\log_{10} L_1 - \log_{10} L_2 > 0.2\), so \(L_1 / L_2 > 10^{0.2} \approx 1.585\), where \(L_1\) and \(L_2\) are the likelihoods of the most and second most likely alleles). In other words, that means the most likely allele must be 60% more likely than the second most likely allele.

      If a read is considered informative, it gets counted toward the AD and DP of the variant allele in the output record. If a read is considered uninformative, it is counted towards the DP, but not the AD. That way, the AD value reflects how many reads actually contributed support for a given allele at the site.


      Does this also affect SNP calls at sites covered by the reads added through the indel realignment step?
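      The informative/uninformative rule and its effect on AD and DP can be sketched as follows. This assumes each read comes with a log10 likelihood per allele, and reads the 0.2 threshold as a log10 difference, which is what makes the ~60% figure work out. This is an illustration of the quoted rule, not GATK's code.

```python
from typing import Dict, Iterable, Optional, Tuple

INFORMATIVE_LOG10_THRESHOLD = 0.2  # 10**0.2 ~= 1.585, i.e. ~60% more likely


def informative_allele(log10_likelihoods: Dict[str, float],
                       threshold: float = INFORMATIVE_LOG10_THRESHOLD) -> Optional[str]:
    """Return the best allele if it beats the runner-up by more than `threshold`, else None."""
    if len(log10_likelihoods) < 2:
        raise ValueError("need likelihoods for at least two alleles")
    ranked = sorted(log10_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_ll), (_, second_ll) = ranked[0], ranked[1]
    return best if best_ll - second_ll > threshold else None


def count_ad_dp(reads: Iterable[Dict[str, float]],
                alleles: Iterable[str]) -> Tuple[Dict[str, int], int]:
    """Every read passing filters counts toward DP; only informative reads add to AD."""
    ad = {a: 0 for a in alleles}
    dp = 0
    for lk in reads:
        dp += 1
        best = informative_allele(lk)
        if best is not None:
            ad[best] += 1
    return ad, dp
```

      Under this rule, a realigned read that newly passes the MAPQ filter always raises DP at a SNP site it covers. It raises AD only if it clearly supports one allele.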

    3. altered annotation metrics (QD, FS, SOR, MQ & MQRankSum for the snippet) impact cohort level annotation metrics, the variant quality score recalibration (VQSR) model and the resulting VQSLOD score for the site. These changes to the distribution of the variant scores impact variant site filtering


      Potential impact of including indel realignment on downstream VCF annotations and filtering.

    4. HaplotypeCaller reassembles and calls this insertion with or without indel realignment

      Here, HC detects a >30 bp insertion with or without prior indel realignment. So it should be OK to use legacy BAM files, realigned following the GATK indel realignment Best Practices, as inputs to HC and MuTect2.

      However, see the paragraph further down in the article on how prior indel realignment may affect HC and MuTect2 efficiency, especially for low-quality data.

    5. Indel realigned reads that cross above HaplotypeCaller's MAPQ threshold alter the allelic and total depth counts for a variant site and can affect genotyping and genotype probability and quality scores as well as other annotation metrics.


      Impact of indel-realigned reads that now pass the MAPQ filter on HC-called variants.

    6. HaplotypeCaller gains previously MAPQ10–19 reads now over the default MAPQ20 threshold of the HCMappingQualityFilter.

      Use of indel realignment: additional (realigned) reads are included at the HaplotypeCaller step when their updated MAPQ is 20 or more.
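      Which reads HaplotypeCaller gains or loses can be tallied from (before, after) MAPQ pairs, such as those collected from the original and realigned BAMs. This is a minimal sketch assuming the filter keeps reads with MAPQ of 20 or more, per the default threshold quoted above. The helper names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

HC_MIN_MAPQ = 20  # default HCMappingQualityFilter threshold quoted above


def passes_hc_filter(mapq: int, min_mapq: int = HC_MIN_MAPQ) -> bool:
    """Reads below the threshold are filtered out; the rest are kept."""
    return mapq >= min_mapq


def threshold_crossings(pairs: Iterable[Tuple[int, int]],
                        min_mapq: int = HC_MIN_MAPQ) -> Dict[str, int]:
    """Classify (MAPQ before, MAPQ after realignment) pairs by filter-status change."""
    summary = {"gained": 0, "lost": 0, "unchanged": 0}
    for before, after in pairs:
        was = passes_hc_filter(before, min_mapq)
        now = passes_hc_filter(after, min_mapq)
        if now and not was:
            summary["gained"] += 1
        elif was and not now:
            summary["lost"] += 1
        else:
            summary["unchanged"] += 1
    return summary
```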

    7. HaplotypeCaller and FreeBayes,

      MuTect v2 (built on the HaplotypeCaller engine) and SpeedSeq's SNV calling (based on FreeBayes) are among the well-tested variant callers.

    8. Longer short reads, deeper coverage and increased library complexity with PCR-free preparations give us higher quality sequence data.

      >75 bp reads, >30x coverage, PCR-free library and a BWA-MEM-aligned BAM ~= high-quality data

      Validate with QC tools on both the raw reads (FASTQ) and the aligned BAM file.
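      The rule of thumb in these notes can be written down as an explicit check. Expected mean depth comes from the standard C = N * L / G estimate. The `looks_high_quality` helper, its parameter names, and the use of a median MAPQ are hypothetical choices that encode the thresholds above; they are not a GATK criterion.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G; ignores duplicates and unmapped reads."""
    return n_reads * read_length / genome_size


def looks_high_quality(read_length: int, coverage: float, pcr_free: bool,
                       median_mapq: float, min_read_length: int = 75,
                       min_coverage: float = 30.0, min_mapq: float = 20.0) -> bool:
    """Hypothetical rule of thumb from the notes: >75 bp, 30x+, PCR-free, MAPQ > 20."""
    return (read_length > min_read_length
            and coverage >= min_coverage
            and pcr_free
            and median_mapq > min_mapq)
```

      For example, 600 million 150 bp reads over a 3 Gb genome give an expected mean depth of 30x.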

    9. We still recommend indel realignment for these legacy workflows

      For MuTect v1

  4. Jul 2017
    1. some “padding” to the intervals in order to include the flanking regions (typically ~100 bp). No need to modify your target list; you can have the GATK engine do it for you automatically using the interval padding argument.

      Use padding (e.g., -ip 100) rather than editing the target list.

    2. This excludes off-target sequences and sequences that may be poorly mapped, which have a higher error rate. Including them could lead to a skewed model and bad recalibration.

      Important to restrict to well-mappable regions to avoid over- or under-correction of base quality scores.

    3. There are even some analyses which should be restricted to the capture targets because failing to do so can lead to suboptimal results.

      Useful to pass the captured regions with -L.

    4. you could also provide a list of "bad" intervals with -XL, which does the exact opposite of -L:

      Useful for blacklist regions
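      What interval padding (-ip) and exclusion (-XL) do to the target set can be illustrated on plain (chrom, start, end) tuples. This sketch assumes 1-based, inclusive coordinates as in GATK-style interval lists. It only shows the idea (pad, merge overlaps, subtract blacklisted regions) and is not the GATK engine's implementation.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


def pad_and_merge(intervals: Iterable[Interval], padding: int = 100) -> List[Interval]:
    """Pad each interval on both sides, then merge overlapping or adjacent ones."""
    padded = sorted((c, max(1, s - padding), e + padding) for c, s, e in intervals)
    merged: List[Interval] = []
    for c, s, e in padded:
        if merged and merged[-1][0] == c and s <= merged[-1][2] + 1:
            merged[-1] = (c, merged[-1][1], max(merged[-1][2], e))
        else:
            merged.append((c, s, e))
    return merged


def subtract(intervals: Iterable[Interval], excluded: Iterable[Interval]) -> List[Interval]:
    """Remove excluded regions (like -XL), splitting intervals where needed."""
    excluded = list(excluded)
    result: List[Interval] = []
    for c, s, e in intervals:
        pieces = [(s, e)]
        for xc, xs, xe in excluded:
            if xc != c:
                continue
            kept = []
            for ps, pe in pieces:
                if xe < ps or xs > pe:  # no overlap: keep the piece whole
                    kept.append((ps, pe))
                    continue
                if ps < xs:
                    kept.append((ps, xs - 1))  # part left of the excluded region
                if xe < pe:
                    kept.append((xe + 1, pe))  # part right of the excluded region
            pieces = kept
        result.extend((c, ps, pe) for ps, pe in pieces)
    return result
```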

    1. Unless your database of variation is so poor and/or variation so common in your organism that most of your mismatches are real snps, you should always perform recalibration on your bam file.

      Especially relevant for non-human species alignments, where databases of known variation may be sparse.

  5. Apr 2017
    1. Custom formula is: =MOD(INT((ROW(1:1)-1)/4),2)=0    Color: Gray   Range:  A1:Z1000 (or whole sheet...Ctrl+A)

      This works
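      The formula shades alternating bands of four rows. Applied to a range starting at row 1, the relative reference ROW(1:1) evaluates to each row's own number. INT floors the quotient, and MOD(...,2)=0 is true for every other band. A small Python mirror of the same arithmetic, for checking which rows end up gray:

```python
def is_gray(row: int, band: int = 4) -> bool:
    """Mirror of =MOD(INT((ROW(1:1)-1)/4),2)=0 for a 1-based row number."""
    return ((row - 1) // band) % 2 == 0
```

      Rows 1-4 and 9-12 come out gray, and rows 5-8 do not. Changing 4 in the formula (or `band` here) changes the band height.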

    1. while those that had been moved to 25 degrees had largely lost the methylation tags. Importantly, they still maintained this reduced histone methylation when moved back to the cooler temperature, suggesting that it is playing an important role in locking the memory into the transgenes.

      Could the genetic background (or, importantly, the surrounding environment or cross-talk among cells) of the first generation exposed to 25 °C have changed and conditioned subsequent generations?

      After 14 generations, secondary changes in the propagated genetic makeup (including cross-talk) could have reverted the phenotype and reset histone marks to repress gene expression.

      Check how often C. elegans acquires mutations or copy-number changes, i.e., background rate vs. stress induced by temperature changes?

    2. Worms are very short-lived, so perhaps they are transmitting memories of past conditions to help their descendants predict what their environment might be like in the future,

      ? As long as the cost (e.g., energy consumption) of propagating such beneficial features does not exceed what is needed for survival.

    1. cd /usr/local/RepeatMasker
       perl ./configure

      Make sure to configure RepeatMasker to use RMBlast and TRF which were installed above.

    2. human, mouse, zebrafish, fruit fly, and nematode

      For other organisms, use RepeatMasker_open_407 (or the latest version); unpack it and read the INSTALL file for installing custom libraries.

    3. RepeatMasker was developed using TRF version 4.0.4

      Downloaded v4.0.9, Linux command line (legacy GLIBC, <= 2.12)


    4. For RMBlast ( NCBI Blast modified for use with RepeatMasker/RepeatModeler )

      Used RMBlast pre-compiled binaries provided by NCBI;

      Previous Release: 2.2.28

      Download Pre-compiled Package: Download both the BLAST+ and RMBlast packages from NCBI for your platform:

      1. RMBlast binaries: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28

      2. BLAST+ binaries: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/

      Extract both tarballs and symlink or copy rmblastn (RMBlast) into the BLAST+ bin/ directory, so that all binaries are in one place.
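      The symlink step above can be scripted. This minimal sketch assumes both tarballs are already extracted, and that `rmblast_bin` and `blast_bin` point at their respective bin/ directories. The actual directory names depend on the packages downloaded, so the paths in any call are placeholders. It replaces any existing rmblastn in the BLAST+ bin/ with a link to the RMBlast binary.

```python
from pathlib import Path


def link_rmblastn(rmblast_bin: Path, blast_bin: Path) -> Path:
    """Symlink rmblastn from the RMBlast package into the BLAST+ bin directory."""
    src = rmblast_bin / "rmblastn"
    if not src.is_file():
        raise FileNotFoundError(f"rmblastn not found in {rmblast_bin}")
    dest = blast_bin / "rmblastn"
    if dest.is_symlink() or dest.exists():
        dest.unlink()  # replace a stale link or an older copy
    dest.symlink_to(src.resolve())
    return dest
```

      A plain copy (e.g. with `shutil.copy2`) works as well if symlinks are not wanted. Either way, point RepeatMasker's configure step at the BLAST+ bin/ directory afterwards.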

    1. The Administration could also exercise its regulatory authority—most potently, to direct the Centers for Medicare and Medicaid Services (CMS) to allow reimbursement for molecular profiling of cancers

      Perhaps the most important measure to keep the Precision Medicine Initiative alive. The surge in genomic assays for risk and treatment-response prediction is of little value without a practical, affordable means of molecularly profiling a patient's tumor or, more importantly, of pre-diagnosis genomic screening.

    1. assuming the rate of false positives and false negatives are the same for all mutations

      Not really true, but acceptable; results can be filtered in downstream analysis.

    2. cells that are hetero- or homozygous for a given mutation.

      Important limitation, but the analysis could be run as two separate instances by splitting the case cohort according to validated gene loss/gain of function through single- vs. double-copy loss.

    3. While it seems like such multiple mutations should be unlikely, computational biologist Niko Beerenwinkel of ETH Zurich in Basel, Switzerland, recently posted a preprint to bioRxiv suggesting that recurrent mutations can and do occur, and not infrequently (bioRxiv, doi:10.1101/094722, 2016).

      Fig. 1 of the preprint: a mutated allele may be lost through an LOH event or, less likely, through a recurrent mutation that reverts it to the original base.

      In the former case, the SNV could be an earlier dominant-negative loss-of-function event, later followed by an LOH event that removes the beneficial trait (the SNV itself). This makes sense for a haploinsufficient gene. However, strictly speaking, that is not a recurrent mutation event, as LOH is a copy-number-level change rather than a mutation.

    1. So I quit. While my wife worked, I stayed home with our daughter Kriya, who was born in 1993, and became a house husband. I really didn't know what I wanted to do

      Quit Bell Labs in 1994 after ~6 years in postdoc/staff positions.

      From an article in The New York Times:

      Astonishingly, a couple of months after quitting, an insight came to me about how to make the microscope finally work. It came while I was pushing my child’s stroller. The idea involved isolating individual molecules and measuring their distance. I wrote this up in a three-page paper, which would later be noted by the Nobel Committee as one reason for giving me the prize.

      Funny thing about that paper[1]: It wasn’t much cited, probably only a hundred times in 20 years. That tells you something about the value of citations as a metric of impact.

      For the next eight years, I worked in private industry, and I discovered it was even harder to succeed there than in science.

      1. https://www.ncbi.nlm.nih.gov/pubmed/19859146
    2. I felt like every good result I had provided justification for a hundred lousy papers to follow, and that was a waste of people's time and taxpayers' money.

      Anticipated herd-like followers of the high-impact research, and the accompanying stagnation in the advancement of the field.

      know when to move away from the fad and follow the trend

    3. I tried the technology everywhere I could think of. Sometimes it worked and sometimes it didn't, but the papers came quick. In 1992, we applied it to data storage - at one time we held the world record for storage density - and in the following year I demonstrated super-resolution fluorescence imaging of cells for the first time.

      beginning of successful applications in various fields

    4. I would come into work at 4:30 in the morning, and if I saw Harald's car, I would put my hand on the hood to find out if the engine was still warm. He did exactly the same thing. We were both really competitive, but we played tennis every morning and ate dinner together every night. We were best friends and still are.

      compete but be friendly

    5. I'm also lucky in that I have a second chance to be a better husband and father. While I'm close with Kriya and Ravi, one of my regrets is that I didn't spend more time with them when they were growing up

      Importance of keeping family and social life together as best one can.

    6. My group at Janelia has never been larger than five postdocs, and has averaged three

      Small labs can be fun and focused to work with too!

    7. Everywhere I've been, I've been able to focus 100% on my work - I've never written a grant in my life. I doubt I would have been as successful in a more traditional academic career path.

      Not getting a grant is not the end of the world, though it is important to be hard-working and focused on the most compelling question, where one has domain expertise or can gain it in the short term (months, not years).

    8. That brought me back full circle to the optical lattice theory I had published in 2005.

      Not losing the insight gained from previous domain expertise.

    9. The field was getting crowded, and I've always found it most productive to go where the people aren't. It was time to do something new.

      Importance of knowing when a field gets crowded and low-quality work starts to crowd out innovation.

    10. Marty had told Gerry that I was interested in that "biological Bell Labs," HHMI's new Janelia Farm Research Campus. The campus wasn't built yet, but I was invited to interview in a little building off site in August, and was on the payroll in October 2005.

      All set for HHMI yet-to-be built Janelia campus, and finally back to the labs!

    11. By early 2006, we had 20 nanometer resolution images of actin filaments, focal adhesions, mitochondria, and lysosomes. We submitted the work to Science in March, and it was published that August, after a lengthy fight with a reviewer who demanded correlative EM data, and then pushed for rejection even after we supplied it.

      Efforts paid off at ~46 years of age, after a decade or more of unemployment or a so-called non-standard career!

      PS: Ignore comments of the third reviewer :-)

    12. Harald and I built the first PALM microscope in his living room in La Jolla (Figure 6). We were both unemployed, but Harald had some of his equipment from Bell. We pulled that out of storage, and each put in $25,000 to cover everything else we needed. We worked hard, and in September shipped all the parts to rebuild the microscope in the darkroom of Jennifer's lab at the NIH. The first time we put a cover slip coated with molecules into the microscope and turned on the photoactivating light, the first subset popped up and we knew we had it.

      Unemployed for a decent time but had savings!

    13. Harald and I didn't know any biology, so we needed help.

      Reaching out to experts where you do not have deeper domain knowledge

    14. Mike had one of the biggest libraries of fluorescent protein fusions in the world, and that's where we learned about photoactivatable fluorescent proteins. In the Tallahassee airport on our way home, it became obvious to Harald and me that this was the missing link for the idea that I had pitched after I left Bell

      Importance of networking and collaborations

    15. He invited me to present the idea to the biology department there in April 2005. Marty Chalfie was one of my hosts during that visit, and he turned to me in the cab on the way to dinner and said, "It sounds like you really believe in this idea. How are you going to get back in the lab?" I said, "I have no idea, but I read in Physics Today that there's a guy named Gerry Rubin who wants to make a biological Bell Labs," and we left it at that.

      Year 2005 (age 45): a firm idea that works, but still looking for a lab in which to pursue it.

    16. you know something so well that you love it and hate it and it's part of you. Within three months, I grokked diffraction and light and formation of foci.

      tenacity and perseverance

    17. I wanted to take advantage of GFP to do live cell imaging, but my physics knowledge had atrophied. So I pulled out my old textbooks and started redoing old homework problems. I was really motivated to understand it this time around, because I figured this was my last chance to make a scientific career.

      never too late to study and learn a new skill

    18. I started reading the scientific literature again, and quickly came across Marty Chalfie's paper on green fluorescent protein, which he had published in 1994 as I was leaving Bell. It was like a religious revelation to me.

      Importance of keeping up with the literature, especially across different but related fields.

    19. I reconnected with Harald, who had also gone into industry in San Diego, but wasn't completely satisfied.

      getting back with the old and trusted friend

    20. So in 2002 I quit. That was probably the hardest part of my life. I had pissed away my academic career and I had pissed away my backup plan of working for my Dad. Once again, I didn't have any idea what I wanted to do. Fortunately, with money in the bank, I had some time to think of a solution.

      Quit industry again in 2002, on top of already dried-up academic ties.

    21. Michigan in 1997 and set down roots

      Joined Dad's company in 1997; spent a significant amount of money on a research project (FAST) that earned hardly any revenue.

    22. When Kriya was three, and started speaking with a Jersey accent, I knew we had to get out of New Jersey.

      :-) wonder which Jersey accent: the original or one with the Indian flavour!

    23. During my third year, I stopped just trying one thing after another, and started thinking like a physicist about why things weren't working.

      important to look at the question from different perspective, and reevaluate the failing experiments

    24. Two years in, I wrote in my self-evaluation that if I didn't have a breakthrough in the next year, they wouldn't need to fire me because I would quit.

      setbacks with new project

    25. I knew next to nothing about semiconductor physics, but Horst thought near-field was a really cool idea and it could go places. He wanted to hire me, even though what I was doing was completely outside of what everyone else in the department was doing. Except for one guy - Harald Hess, who I also met during that visit. Harald had built a low temperature scanning tunneling microscope to study superconductors, and he and I hit it off immediately.

      new domain; joint work with Harald Hess

    26. I built this crazy, elaborate, expensive microscope that kind of worked (Figure 2). I never looked at much beyond test patterns, but it was enough to prove that the idea was valid.

      PhD thesis, ~1988

  6. Mar 2017
    1. the vulnerability of cancer cells that have lost one copy of an essential gene might create a therapeutic window

      Genes showing haploinsufficiency might be more susceptible to a further decrease in their expression (and resulting protein activity).

      1. https://www.ncbi.nlm.nih.gov/pubmed/22628553
      2. http://www.cell.com/abstract/S0092-8674(13)01287-7