- Dec 2017
royalsocietypublishing.org royalsocietypublishing.org
The natural selection of bad science
www.sciencedirect.com www.sciencedirect.com
Transcription factor–DNA binding: beyond binding site motifs
alleledb.gersteinlab.org alleledb.gersteinlab.org
AlleleDB
AlleleDB is a repository, providing genomic annotation of cis-regulatory single nucleotide variants (SNVs) associated with allele-specific binding (ASB) and expression (ASE).
- Nov 2017
www.rittmanmead.com www.rittmanmead.com
Linux cluster sysadmin -- OS metric monitoring with colmux
SSH keys, pdsh distributed execution, collectl and colmux monitoring
stackoverflow.com stackoverflow.com
find . -type d \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -o -print
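A minimal sketch of the prune idiom on a throwaway tree, to check which paths survive. The directory names (dir1, dir2, keep) and the temporary location are made up for illustration. Note that -path is compared against the path exactly as find prints it, so with "find ." the patterns need the ./ prefix.

```shell
# Build a scratch tree: dir1 and dir2 should be skipped, keep/ should be listed.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2" "$tmp/keep"
touch "$tmp/dir1/a.txt" "$tmp/dir2/b.txt" "$tmp/keep/c.txt"

# -prune stops descent into the matched directories; everything else falls
# through to the -print on the right-hand side of -o.
listing=$(cd "$tmp" && find . -type d \( -path ./dir1 -o -path ./dir2 \) -prune -o -print | LC_ALL=C sort)
printf '%s\n' "$listing"

rm -rf "$tmp"
```

The pruned directories themselves are not printed either, because -print only runs for paths that did not match the left-hand side.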
rsanderlin.com rsanderlin.com
BASH: Create User Accounts with Random Password
www.cyberciti.biz www.cyberciti.biz
How to Change a USER and GROUP ID on Linux For All Owned Files
stackoverflow.com stackoverflow.com
while IFS='=' read -r col1 col2; do echo "$col1"; echo "$col2"; done < testprop.properties
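A small sketch of the same loop run against a sample file, to show how values containing "=" behave. The file contents and the variable names (key, value, keys, values) are hypothetical; the temporary file stands in for testprop.properties.

```shell
# Hypothetical sample file standing in for testprop.properties.
props=$(mktemp)
printf 'host=localhost\nquery=a=b\n' > "$props"

keys=""
values=""
# IFS='=' splits each line at '='; the last variable receives the rest of the
# line, so '=' characters inside a value are preserved.
while IFS='=' read -r key value; do
    keys="$keys$key,"
    values="$values$value,"
done < "$props"
rm -f "$props"

echo "$keys"
echo "$values"
```

Setting IFS only on the read command keeps the change local to that command, so the rest of the script still splits on whitespace.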
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
In silico modeling predicts drug sensitivity of patient-derived cancer cells
www.lrjournal.com www.lrjournal.com
what about missense mutations of unknown significance?
Computational drug treatment simulations on projections of dysregulated protein networks derived from the myelodysplastic mutanome match clinical response in patients
www.gnu.org www.gnu.org
Finding Files
with find
forums.plex.tv forums.plex.tv
find /volume1/Movies /volume1/Music /volume1/Photos "/volume1/Home Videos" "/volume1/Music Videos" -type d -exec chmod 755 {} \;
recursively changing permissions with find, specifically for directories versus files
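A sketch of the directories-versus-files split on a scratch tree. The 644 mode for regular files is an assumption (a common choice for media files), not something stated in the forum post, and the paths are made up.

```shell
# Scratch tree with a space in the path, as in "/volume1/Home Videos".
tmp=$(mktemp -d)
mkdir -p "$tmp/Home Videos/2017"
touch "$tmp/Home Videos/2017/clip.mp4"

# Directories need the execute bit to be traversable; regular files usually do not.
# "{} +" batches many paths into one chmod call; "{} \;" runs chmod once per path.
find "$tmp" -type d -exec chmod 755 {} +
find "$tmp" -type f -exec chmod 644 {} +

# Read the modes back from the first ten characters of the ls -l output.
dir_mode=$(ls -ld "$tmp/Home Videos" | cut -c1-10)
file_mode=$(ls -l "$tmp/Home Videos/2017/clip.mp4" | cut -c1-10)
echo "$dir_mode $file_mode"

rm -rf "$tmp"
```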
rafalab.github.io rafalab.github.io
harvardx
HarvardX Biomedical Data Science Open Online Training
www.genome.gov www.genome.gov
Introduction to Population Genetics
epigeneticsandchromatin.biomedcentral.com epigeneticsandchromatin.biomedcentral.com
Genome-wide methylation data mirror ancestry information
www.nature.com www.nature.com
Two independent modes of chromatin organization revealed by cohesin removal
f1000research.com f1000research.com
RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR [version 2; referees: 3 approved]
genomebiology.biomedcentral.com genomebiology.biomedcentral.com
Matthews correlation coefficient
novel method developed within the MAQC-III project utilizing the expression distributions, corrected for noise and batch effects, and assisted by random resampling, to compute DEG scores related to the Wilcoxon U test (Magic, see Additional file 1: Supplementary Note 2)
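For reference, the standard definition of the Matthews correlation coefficient highlighted above, written for a binary confusion matrix. The symbols TP, TN, FP and FN (true/false positives/negatives) are the usual ones and are not taken from the paper's notation.

```latex
\[
\mathrm{MCC} =
\frac{TP \cdot TN - FP \cdot FN}
     {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\]
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction); it is conventionally set to 0 when any factor in the denominator is zero.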
genome.cshlp.org genome.cshlp.org
These results suggest that deep sequencing is necessary for accurate determination of the expression level of genes
or better quantification methods
academic.oup.com academic.oup.com
EGR2 peaks overlapped with a SOX10 peak when allowing separation distance as large as 1000 bp and 11.09% of the SOX10 peaks overlapped with an EGR2 peak with the same separation distance
Using 40 sets of randomized peak sequences, the occurrence of the motif never exceeded 74%
MOSAiCS implements a model-based approach where the background distribution for unbound regions take into account systematic biases such as mappability and GC content and the peak regions are described with a two component Negative Binomial mixture model
www.sciencedirect.com www.sciencedirect.com
pairwise overlaps using Fisher’s test and mutual exclusion (Leiserson et al., 2016)
CRISPR screening has emerged as a powerful method for identifying critical functional dependencies in vitro (Koike-Yusa et al., 2014; Shalem et al., 2014)
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
EXCAVATOR: detecting copy number variants from whole-exome sequencing data
www.nature.com www.nature.com
plasma cells
B cells
polymorphonuclear (PMN) cell
we determined the number of histologies needed to identify genes with maximal prognostic power
All microarray studies in PRECOG were consistently normalized and pre-processed
no RNA-seq
CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Robust enumeration of cell subsets from tissue expression profiles
bmcbioinformatics.biomedcentral.com bmcbioinformatics.biomedcentral.com
AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Clone 1 is the founding clone; 12.74% of the tumour cells contain only this set of mutations
derivation unclear; not provided in supplemental information
images.nature.com images.nature.com
Clonal evolution in relapsed acute myeloid leukemia revealed by whole genome sequencing
Comparison of SNVs detected in the whole genome sequencing data with SNPs genotyped using arrays
inference of "%diploid coverage"
gmt.genome.wustl.edu gmt.genome.wustl.edu
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
tier 1 contains all changes in the amino acid coding regions of annotated exons, consensus splice-site regions, and RNA genes (including microRNA genes). Tier 2 contains changes in highly conserved regions of the genome or regions that have regulatory potential. Tier 3 contains mutations in the nonrepetitive part of the genome that does not meet tier 2 criteria, and tier 4 contains mutations in the remainder of the genome
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Gene Set Enrichment Analysis Made Simple
using aggregate t or chi^2 statistic to test if a set of genes is on aggregate differentially expressed
www.laptop-lcd-screen.co.uk www.laptop-lcd-screen.co.uk
UK Based LCD Screen Replacement Experts
onlinelibrary.wiley.com onlinelibrary.wiley.com
The background puzzle: how identical mutations in the same gene lead to different disease symptoms
www.larkinweb.co.uk www.larkinweb.co.uk
Mounting file systems over two SSH hops
www.nature.com www.nature.com
To stay young, kill zombie cells
- Oct 2017
genomebiology.biomedcentral.com genomebiology.biomedcentral.com
Comparison of RNA-seq and microarray-based models for clinical endpoint prediction
genome.cshlp.org genome.cshlp.org
RNA-sequence analysis of human B-cells
academic.oup.com academic.oup.com
Genome-wide analysis of EGR2/SOX10 binding in myelinating peripheral nerve
www.sciencedirect.com www.sciencedirect.com
Genetic and Functional Drivers of Diffuse Large B Cell Lymphoma
www.nature.com www.nature.com
The prognostic landscape of genes and infiltrating immune cells across human cancers
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
RNA sequencing of cancer reveals novel splicing alterations
www.nature.com www.nature.com
Chromatin H3K27me3/H3K4me3 histone marks define gene sets in high-grade serous ovarian cancer that distinguish malignant, tumour-sustaining and chemo-resistant ovarian tumour cells
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing
genomebiology.biomedcentral.com genomebiology.biomedcentral.com
Using their expression data and the same fold-change categories, we investigated the influence of both affinity and cooperative effects based on GraphProt predictions of Ago2 binding sites in comparison to the available CLIP-seq data.
Could do the same since expression microarray data are available, but they show complete lack of differential expression when over-expressing our proteins of interest.
allows the evaluation of putative binding sites with a meaningful score that reflects the biological functionality
score = prediction margin. Part of standard GraphProt output?
Prediction margins
Part of standard GraphProt output?
TIA-1 has been described as an ARE-binding protein and binds both U-rich and AU-rich elements.
logos are a mere visualization aid and do not represent the full extent of the information captured by GraphProt models
tenfold cross-validation technique
How do AUROCs look for our proteins of interest compared to the AUROCs for the iCLIP'ed proteins in Additional File 2?
The following describes a typical biological application of computational target detection. A published CLIP-seq experiment for a protein of interest is available for kidney cells, but the targets of that protein are required for liver cells. The original CLIP-seq targets may have missed many correct targets due to differential expression in the two tissues and the costs for a second CLIP-seq experiment in liver cells may not be within the budget or the experiment is otherwise not possible. We provide a solution that uses an accurate protein-binding model from the kidney CLIP-seq data, which can be used to identify potential targets in the entire transcriptome. Transcripts targeted in liver cells can be identified with improved specificity when target prediction is combined with tissue-specific transcript expression data.
use case
Peak detection leads to high-fidelity binding sites; however, it again increases the number of false negatives. Therefore, to complete the RBP interactome, computational discovery of missing binding sites is essential.
iCLIP data are not comprehensive
GraphProt: modeling binding preferences of RNA-binding proteins
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Single-Cell Analysis of Human Pancreas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns
www.purplemath.com www.purplemath.com
Solving Quadratic Inequalities
www.restore.ac.uk www.restore.ac.uk
An Introduction to Odds, Odds Ratios and Exponents
www.biostars.org www.biostars.org
What Tools/Libraries Do You Use To Visualize Genomic Feature Data?
discussion of tools; recently revived
asciigenome.readthedocs.io asciigenome.readthedocs.io
ASCIIGenome is a command-line genome browser running from terminal window
www.sanger.ac.uk www.sanger.ac.uk
Artemis is a free genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation
github.com github.com
Genome Maps is a modern and high-performance web-based HTML5 genome browser
liorpachter.wordpress.com liorpachter.wordpress.com
How not to perform a differential expression analysis (or science)
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Salmon provides fast and bias-aware quantification of transcript expression
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Analyzing Copy Number Variation using SNP Array Data: Protocols for Calling CNV and Association Tests
bmcmedgenomics.biomedcentral.com bmcmedgenomics.biomedcentral.com
CNVassoc: Association analysis of CNV data using R
zzz.bwh.harvard.edu zzz.bwh.harvard.edu
PLINK
Rare copy number variant (CNV) data
journals.plos.org journals.plos.org
A New Method for Detecting Associations with Rare Copy-Number Variants
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Chromatin marks and ambient temperature-dependent flowering strike up a novel liaison
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Intragenic Enhancers Attenuate Host Gene Expression
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
A genome-wide analysis of loss of heterozygosity and chromosomal copy number variation in Proteus syndrome using high-density SNP microarrays
support.illumina.com support.illumina.com
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
CG dinucleotide suppression enables antiviral defence targeting non-self RNA
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
argyle: An R Package for Analysis of Illumina Genotyping Arrays
www.biorxiv.org www.biorxiv.org
equal to the frequency of the higher expressed eQTL allele in the population
should be equal to product of frequency of high eQTL allele and major coding allele, though the latter will be close to 1 for the rare coding mutations studied here
211,575 rare (MAF < 1%) coding variants at thousands of genes
not necessarily pathogenic
inversely proportional?
Modified penetrance of coding variants by cis-regulatory variation shapes human traits
- Sep 2017
www.nature.com www.nature.com
Parental influence on human germline de novo mutations in 1,548 trios from Iceland
www.malacards.org www.malacards.org
Maternal Uniparental Disomy of Chromosome X
www.malacards.org www.malacards.org
Paternal Uniparental Disomy of Chromosome X
stackoverflow.com stackoverflow.com
Perl script in bash's HereDoc
clubmate.fi clubmate.fi
Associative arrays in bash
www.maketecheasier.com www.maketecheasier.com
How to Set Up Bluetooth in Linux
www.statnews.com www.statnews.com
Gut Germs Appear to Play Role in Multiple Sclerosis
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
RNA binding protein CPEB1 remodels host and viral RNA landscapes
remodel = alt splicing
www.biorxiv.org www.biorxiv.org
Massive Mining of Publicly Available RNA-seq Data from Human and Mouse
chronologicaldot.wordpress.com chronologicaldot.wordpress.com
Restoring The Menu Panel on Linux Mint Cinnamon
when text has disappeared etc.
bmcbioinformatics.biomedcentral.com bmcbioinformatics.biomedcentral.com
The projection score - an evaluation criterion for variable subset selection in PCA visualization
"variable" typically means gene or locus in the context of biological data.
www.perlmonks.org www.perlmonks.org
How can I capture STDERR from an external command?
problem arises when using backticks to execute external commands
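Perl's backticks hand the command string to the shell and return only its standard output, so the usual fix is a shell redirection inside the command. A sketch of the three common cases in plain shell; the function name noisy is made up for the demonstration.

```shell
# A stand-in for an external command that writes to both streams.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

# STDOUT only, discarding STDERR.
out_only=$(noisy 2>/dev/null)

# Both streams merged into the captured output.
both=$(noisy 2>&1)

# STDERR only: redirections apply left to right, so stderr is first pointed at
# the capture, then stdout is sent to /dev/null.
err_only=$(noisy 2>&1 >/dev/null)

printf '%s\n' "$out_only" "$both" "$err_only"
```

The same redirections can be placed inside Perl backticks or qx//, since the string is interpreted by the shell.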
www.biorxiv.org www.biorxiv.org
Major flaws in "Identification of individuals by trait prediction using whole-genome sequencing data"
re Venter study in PNAS, claiming to be able to identify people based on whole genome data
www.nature.com www.nature.com
Plot a course through the genome: Inspired by Google Maps, a suite of tools is allowing researchers to chart the complex conformations of chromosomes.
mentioned tools are focused on (capture) Hi-C data
askubuntu.com askubuntu.com
Multiple Boot Systems Time Conflicts
raspberrypi.stackexchange.com raspberrypi.stackexchange.com
scan all blocks of your partitions
checking an SD card for bad sectors
www.raymond.cc www.raymond.cc
4 Tools to Test and Detect Fake or Counterfeit USB Flash Drives
for Windows
www.scientificamerican.com www.scientificamerican.com
One Test May Spot Cancer, Infections, Diabetes and More
based on cell free DNA fragments in blood; DNA methylation patterns and fragment length distributions can inform on organ of origin.
www.nature.com www.nature.com
A genome-wide analysis of putative functional and exonic variation associated with extremely high intelligence
www.sciencemag.org www.sciencemag.org
myIDP
collection of articles on why IDPs are useful
biosciences.stanford.edu biosciences.stanford.edu
IDP Forms and Documentation
for PhD students
www.med.upenn.edu www.med.upenn.edu
Individual Development Plans (IDPs)
for PhD students
- Aug 2017
www.springer.com www.springer.com
Wood Structure and Environment
impacts of environmental conditions on wood anatomy; not climate reconstruction
iopscience.iop.org iopscience.iop.org
Diverse growth trends and climate responses across Eurasia's boreal forest
implies limitations of using macroscopic tree ring features for climate reconstructions, which are influenced by many different factors
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
A Technical Perspective in Modern Tree-ring Research - How to Overcome Dendroecological and Wood Anatomical Challenges
microtome-generated sections along the entire length of wood core samples for anatomical studies
- Jul 2017
academic.oup.com academic.oup.com
RCP: a novel probe design bias correction method for Illumina Methylation BeadChip
better than BMIQ
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
global dye-bias equalization step to control for the different average intensities in the red and green channels. This procedure scales the background-corrected intensities, dividing by the average intensity of the positive control probes in the same channel, red or green, and multiplying by the average intensity of all positive controls in a reference array.
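The scaling described in the highlight can be written as one equation. The notation is my own, not the paper's: I-bg is the background-corrected intensity of a probe in channel c, P-bar-c is the mean positive-control intensity in that channel on the same array, and P-bar-ref is the mean intensity of all positive controls on the reference array.

```latex
\[
I^{\mathrm{norm}}_{c}
  = I^{\mathrm{bg}}_{c} \times
    \frac{\bar{P}_{\mathrm{ref}}}{\bar{P}_{c}},
\qquad c \in \{\mathrm{red}, \mathrm{green}\}
\]
```

After this step the positive controls in both channels have the same average intensity as on the reference array, which removes the global red/green difference.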
bluishcoder.co.nz bluishcoder.co.nz
firefox -ProfileManager
better: firefox --no-remote -ProfileManager
allanmcrae.com allanmcrae.com
Just use SSH’s support for SOCKS5 proxy.
simple alternative to (open)VPN for web content
pubs.acs.org pubs.acs.org
Mass Spectrometry of Structurally Modified DNA
extensive review
wiki2.dovecot.org wiki2.dovecot.org
Welcome to the Dovecot Wiki
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
The ISMARA client
genome.cshlp.org genome.cshlp.org
ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs
askubuntu.com askubuntu.com
Create a script in your encrypted home directory: ~/scripts/mount_storage.sh
How to auto-mount LUKS encrypted drive upon login
www.napoleome.ch www.napoleome.ch
Napoleon oak genome sequencing project web site: example of public engagement in tree genomics
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Quercus robur Genome sequencing and assembly
data (not yet live) associated with https://hyp.is/4DerrmIaEeehDPNBqsRcxQ/www.biorxiv.org/content/biorxiv/early/2017/06/13/149203.full.pdf
www.edx.org www.edx.org
Introduction to R for Data Science
Data analysis course using R
www.datacamp.com www.datacamp.com
Learn Data Science Online
Data analysis courses using R and Python
runestoneinteractive.org runestoneinteractive.org
Runestone Interactive
Interactive textbooks on programming
- Jun 2017
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Canonical Poly(A) Polymerase Activity Promotes the Decay of a Wide Variety of Mammalian Nuclear RNAs
Cordycepin, a modified adenosine produced by a species of fungus, inhibits polyA tail elongation; the RNA-seq data in this paper do NOT include cordycepin-treated samples
molossinus.lab.nig.ac.jp molossinus.lab.nig.ac.jp
Takada et al. 2013
NIG Mouse Genome Database: JF1 and MSM SNPs; effective genome browser
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
Disruption of a novel imprinted zinc-finger gene, ZNF215, in Beckwith-Wiedemann syndrome
demonstrates imprinted expression, but ICR is unknown
journals.plos.org journals.plos.org
Fig 5. Highest ranked ASE genes from (A) brain and (B) liver.
observation of allele-specific expression in brain
www.placentajournal.org www.placentajournal.org
expression is sensitive to tobacco smoke exposure
eprints.bbk.ac.uk eprints.bbk.ac.uk
associated with ASM
genome.cshlp.org genome.cshlp.org
triple-hit: H3K4me2, DNAm and CTCF. skew in SERPINB10 expression consistent with parental allele-specific expression.
journals.plos.org journals.plos.org
haplotype imbalance of expression observed for SERPINB10, but not MEST (imprinted)
- Apr 2017
www.biostars.org www.biostars.org
I'm the developer of pyGeno. Here's a little script that does just that for the Gene TPST2, by using segment trees
recipe for merging transcripts of a gene into a single compound transcript
- Sep 2016
www.nature.com www.nature.com
78% coding density
MAC genome
journals.plos.org journals.plos.org
The P. tetraurelia MAC genome [1] was assembled from 13× Sanger sequencing reads from different insert size libraries of strain d4-2 DNA. Strain d4-2 only differs from strain 51 at a few loci.
SRA accession ERR138952
software.broadinstitute.org software.broadinstitute.org
You must quit IGV and restart for this preference to take effect. The genome should appear in the drop-down list.
restart may be insufficient; had to modify prefs.properties in ~/igv (removing old cached genome values) before I could see my genomes
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
MACS2, an updated version of MACS that is specifically designed to process mixed signal types
ccb.jhu.edu ccb.jhu.edu
HISAT2
use to collect coordinates of identified IES
set to 1000: max IES length <1000bp
default acceptable: 26bp is minimal IES length, 20bp is minimal intron length
set to zero
www.asmscience.org www.asmscience.org
90,000 tiny introns (between 20 and 34 nt in length)
MIC and MAC determination during the P. tetraurelia sexual cycle
def. maternal: recipient of gametic nucleus
onlinelibrary.wiley.com onlinelibrary.wiley.com
Paramecium IESs are unique sequence elements between 26 and 882 bp in length
hypothetical pathways for scnRNA-mediated recruitment of the endonuclease in Paramecium
nucleotide modifications: possibly 6mA
The “genome-scanning” model, as envisioned in Paramecium
subtraction of MAC RNA from MIC small RNA = targets (IES) for excision
Nuclear dimorphism and DNA rearrangements in the ciliates Paramecium tetraurelia
tetraurelia: imprecise repeat v precise (splicing-like) IES excision
thomas-cokelaer.info thomas-cokelaer.info
conda and bioconda channel
channel configuration, including for specific python version
www.cell.com www.cell.com
Detection of 6mA Peaks from 6mA-IP-Seq
bowtie (v1), MACS
- Aug 2016
Local file Local file
One current project in Dr Schulz’s lab is to characterise a selection of interesting loci in detail using isoform specific primers and qRT-PCR.
Would be better to end with making an explicit connection between the ENCODE tissue-specific RNAseq data and the Setd2 knock-down RNAseq data. Would it make sense to focus on loci showing evidence for tissue-specific polyA as well as being dependent on Setd2 for correct splicing?
not significant (<0.0274)
I would call that marginally significant
Also these DNA damages as
DNA damage like
For some loci even the used tissues can differ in terms of strain and developmental stage between the qRT-PCR and bisulfite sequencing.
German sentence structure: splitting the predicate (differ ... between). Not done in English. very awkward to read.
a different and relatively unclear pattern
different and inconsistent patterns
Presumed that the mechanism of poly(A) site selection/alternative polyadenylation may operate genome-wide in a tissue-specific manner, and thus, contribute to the complexity of the mammalian transcriptome,
use of very long prepositional phrases at the start of sentences makes reading difficult. stick to simple subject-predicate-object sentence structure.
. This is seen in a different way
really low
no or low (<10%)
The data displayed that it is roughly possible to
My data suggest that it is possible to qualitatively
a totally reliable method
considered quantitative
determined using
Assuming the
AAA indicates poly(A) site
nice and useful figure but: you primed the cDNA synthesis with random hexamers. the qRT-PCR results are therefore not specific to polyadenylated transcripts. so, above figure shows models consistent with the data rather than summarisations of the data (you did not directly measure polyA).
random hexamers
beware that this implies non-polyadenylated transcripts also are represented in the sample
for the
in detail in drafted simplified images
based on direct
normalised tissue
reference tissue
unexpected based on the RNA-seq data
not if you look at the UCSC data (see comment above)
for example
Based on the RNA-seq data
depends in this case on whether you look at the scatter plot or the UCSC genome browser: they are not telling the same story for some reason. my corrections below reflect what UCSC shows, which results in flipping of placenta and thymus.
, compromised
resulted in
liver adult
adult liver
the different
opposed to liver adult
relative to adult liver
and stretched
more transcripts terminate across
see above; will stop pointing this out
conversion rate was calculated as 72.55%
low conversion rate leads to over-estimation of methylation, which could explain the 45% methylation seen in placenta
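A rough worked correction for the comment above, under assumptions that are mine and not the thesis's: methylated cytosines are always read as C, unmethylated cytosines escape conversion with probability 1 - c regardless of position, and the 45% placenta value was measured at the reported conversion rate c = 0.7255. Here m-obs is the observed and m the true methylation fraction.

```latex
\[
m_{\mathrm{obs}} = m + (1 - m)(1 - c)
\quad\Longrightarrow\quad
m = \frac{m_{\mathrm{obs}} - (1 - c)}{c}
\]
\[
m = \frac{0.45 - (1 - 0.7255)}{0.7255}
  = \frac{0.1755}{0.7255}
  \approx 0.24
\]
```

Under these assumptions the 45% reading would correspond to roughly 24% true methylation, so a sizeable part of the apparent signal could be unconverted cytosine.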
used for measurements of heart
primer sets are not tissue-specific; you used them for all tissues; only the measurements themselves are tissue-specific
arose from
Adck2 encodes for a kinase
The host transcript of the CGI is non-coding. Your upstream primers amplify both the coding transcript and the non-coding host transcript. That is a limitation. Could explain the inconsistencies re the RNAseq data.
the expression of transcripts
qRT-PCR cannot show transcription termination: all it can do is verify the RNAseq data, i.e., relatively more transcription upstream of an active CGI compared to transcription across the CGI. It is important to be precise about what qRT-PCR can and cannot do.
transcripts terminating
transcripts terminate across
transcription extends across
transcripts terminating
1) Tissue with high CGI activity and more transcripts terminating upstream than across and 2) Tissue with low CGI activity and more transcripts terminating across than upstream, as described in the chapter ‘Loci selection’.
The data do not show transcripts terminating upstream or downstream of the CGI, they are merely consistent with the hypothesis.
s a