- Jul 2018
europepmc.org
On 2014 Oct 06, Leonid Teytelman commented:
Dear Authors,
We have published an analysis in S. cerevisiae, showing expression-dependent artifactual ChIP enrichment at highly expressed loci (Teytelman L, 2013 "Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins"). As you know, our finding raises the question of whether HOT regions may also be influenced by the same artifact.
It is great that you have considered our work and have thoughtfully responded to our analysis. Below, I would like to continue this discussion in an effort to better understand the artifact, its causes, and whether it may be contributing to the enrichment at the HOT loci.
1. “we have demonstrated that there is no correlation between our non-specific binding controls (IgG) and our measured transcription factor occupancy;”
Given our results with no-tag control experiments, an IgG control may fail to account for the artifact. It would be valuable if you could instead perform a GFP ChIP-seq, as we did in yeast.
2. The regions determined in ref. 41 have very low enrichment (twofold or less) of non-specific immunoprecipitation in anti-GFP antibody controls over input DNA evaluated using a non-standard sliding-window approach. Importantly, immunoprecipitation/input ratios at this level are typically not considered enriched for binding in modern peak-calling procedures. For example, the median immunoprecipitation/input ratio for our human RNA Pol II experiments is 20-fold, and only 0.033% of human RNA Pol II peaks contain an immunoprecipitation/input ratio ≤ twofold.
The mean enrichment is low, but both anti-GFP experiments contain loci with 3-5x enrichment (Figure 4D). More importantly, although the anti-GFP enrichment at the hyper-ChIPable loci is low, the level of enrichment varies from protein to protein (2-5x for the Sir proteins, but often >10x for Cse4).
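To make the ratios in point 2 concrete, here is a minimal Python sketch of a library-size-normalized IP/input ratio computed over sliding windows of binned read counts. The function name, the bin layout, the window and step parameters and the pseudocount are illustrative assumptions. This is not the windowing procedure used in ref. 41 or in our paper.

```python
from typing import List


def window_fold_enrichment(
    ip_counts: List[int],
    input_counts: List[int],
    window: int,
    step: int,
    pseudocount: float = 1.0,
) -> List[float]:
    """IP/input ratio per sliding window, normalized by total reads in each library."""
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input must cover the same bins")
    ip_total = sum(ip_counts)
    input_total = sum(input_counts)
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")
    ratios = []
    for start in range(0, len(ip_counts) - window + 1, step):
        ip = sum(ip_counts[start:start + window])
        inp = sum(input_counts[start:start + window])
        # Fraction of each library's reads falling in this window; a positive
        # pseudocount keeps windows with zero input reads from dividing by zero.
        ip_frac = (ip + pseudocount) / ip_total
        input_frac = (inp + pseudocount) / input_total
        ratios.append(ip_frac / input_frac)
    return ratios


# Hypothetical binned read counts (one value per bin), not data from either study.
ip = [12, 15, 40, 44, 14, 11]
inp = [10, 12, 11, 13, 12, 10]
print([round(r, 2) for r in window_fold_enrichment(ip, inp, window=2, step=1)])
```

With this normalization, a window scoring 2.0 holds twice the share of IP reads that it holds of input reads, which is the kind of ratio the 2-fold, 3-5x and >10x figures in this exchange describe.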
3. Thus, it is essential to note that the term ‘hyper-ChIPable’, coined by ref. 41, is quite misleading, as a correctly performed ChIP experiment will evaluate statistically enriched regions, with higher immunoprecipitation/input ratios. The so-called hyper-ChIPable regions in ref. 41 are not binding regions as determined under ChIP-seq best practices. Hence, when statistical peak-calling was performed in ref. 41 (using the established MACS peak-caller) to evaluate signals only at significantly enriched regions (Supplementary Table 1), only 17 (<7.5%) of the 238 claimed ‘hyper-ChIPable’ regions were called significant by all three Sir proteins. In fact, 68% of their 238 regions do not contain a binding site for any Sir protein as determined by MACS, despite the very liberal settings used (P < 10^-5, no fold-enrichment cut-off). Thus, the data of ref. 41 contradict its own major claim that all three Sir proteins showed enrichment at the 238 sites.
By reporting the 238 sites with >2-fold enrichment of Sir2, Sir3 and Sir4, we are in fact applying an especially demanding threshold: we require all three proteins to be enriched above the cutoff at the same locus. A target with 5x enrichment of Sir2 but 1.8x enrichment of Sir3 would therefore not pass. A typical ChIP study focuses on a single factor at a time. Had we done that, we would have found many more artifactual targets for each silencing protein, many of them at 5x or higher enrichment. Furthermore, the level of the artifactual signal varies from protein to protein and from experiment to experiment. For example, the Cse4 signal at highly expressed loci can reach 10x or higher enrichment.
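The difference between the joint three-protein requirement and a typical single-factor analysis can be sketched in a few lines of Python. The function names, the dictionary layout and the Sir4 value are hypothetical; only the 2-fold cutoff and the 5x Sir2 / 1.8x Sir3 example come from the paragraph above.

```python
from typing import Dict, List


def passes_all(enrichment: Dict[str, float], factors: List[str], cutoff: float = 2.0) -> bool:
    """Joint criterion: every listed factor must exceed the cutoff at this locus."""
    return all(enrichment.get(f, 0.0) > cutoff for f in factors)


def passes_any(enrichment: Dict[str, float], factors: List[str], cutoff: float = 2.0) -> bool:
    """Single-factor view: the locus is reported if any one factor exceeds the cutoff."""
    return any(enrichment.get(f, 0.0) > cutoff for f in factors)


SIR = ["Sir2", "Sir3", "Sir4"]

# Hypothetical locus matching the example in the text: 5x Sir2 but only 1.8x Sir3.
# The Sir4 value is invented for illustration.
locus = {"Sir2": 5.0, "Sir3": 1.8, "Sir4": 3.0}

print(passes_all(locus, SIR))  # False: Sir3 is below the 2-fold cutoff
print(passes_any(locus, SIR))  # True: a Sir2-only analysis would report it
```

Every locus that passes the joint criterion also passes the single-factor one, so the joint list can only be shorter, which is why it understates the number of artifactual targets a single-factor study would see.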
4. Furthermore, as indicated in Supplementary Table 3 of ref. 41, the Sir2, Sir3 and Sir4 ChIP-seq experiments were performed only once each, which raises the question as to whether enrichment of Sir proteins at the 238 sites is reproducible. More rigorously, even for the remaining 17 genomic loci, their status as hyper-ChIPable is questionable as each region would first have to be established as a reproducible binding site in replicate experiments for each individual Sir protein. If one treats the Sir2, Sir3 and Sir4 ChIP-seq experiments as three replicates of Sir protein ChIP, their data show that most of their claimed sites were not reproducibly enriched.
Most of our analysis of the artifact's causes relies on genome-wide data rather than on the 238 sites. The 238 Sir-enriched euchromatic loci were the starting point, but most of the paper examines the relationship between expression and ChIP signal comprehensively. Figures 3, 4 and 5 all show genome-wide correlations between Pol II/III levels and ChIP signal.
As for reproducibility, we see the same peaks, often with 10x enrichment, in the Ste12 and Cse4 data, in two distinct GFP experiments, and in each of the three Sir ChIP-seq datasets. The same loci also appear in the Sir3 study from Oliver Rando’s group (Radman-Livaja M, 2011).
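To illustrate what cross-dataset reproducibility means here, a minimal Python sketch can count, for each locus, how many independent datasets call it enriched. The dataset keys follow the list above, but the locus IDs, the per-dataset calls and the 5-dataset cutoff are invented for illustration and are not results from any of these studies.

```python
from collections import Counter
from typing import Dict, Set


def support_counts(calls: Dict[str, Set[str]]) -> Counter:
    """Number of datasets in which each locus is called enriched."""
    counts: Counter = Counter()
    for loci in calls.values():
        counts.update(loci)  # each locus counted at most once per dataset
    return counts


# Hypothetical enriched-locus calls; dataset names follow the text, locus IDs are invented.
calls = {
    "Ste12": {"locusA", "locusB"},
    "Cse4": {"locusA", "locusB"},
    "GFP_1": {"locusA"},
    "GFP_2": {"locusA", "locusC"},
    "Sir2": {"locusA", "locusB"},
    "Sir3": {"locusA"},
    "Sir4": {"locusA", "locusB"},
}

MIN_DATASETS = 5  # arbitrary illustrative cutoff
counts = support_counts(calls)
reproducible = sorted(locus for locus, n in counts.items() if n >= MIN_DATASETS)
print(reproducible)  # → ['locusA']
```

Because the datasets come from different proteins and different labs, a locus supported by most of them is hard to explain as a one-off fluctuation of a single experiment.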
5. In addition to the analytical differences outlined above, other potential sources for the marked differences between our data and the Sir-enriched regions of ref. 41 are deviations from a typical ChIP protocol. In particular, ref. 41 employed a significantly longer cross-link time (1 h as opposed to the typical 10–20 min). This might contribute to formation of large non-specific protein–DNA complexes, which can in turn increase non-specific immunoprecipitation.
Though not discussed in the manuscript, we have in fact performed experiments to investigate whether the formaldehyde concentration contributed to the misleading signal. We performed ChIP with 1-hour crosslinking at room temperature at formaldehyde concentrations of 0.0625%, 0.125%, 0.25%, 0.5% and 1%, but did not find a proportionate decrease in the hyper-ChIPable signal as the formaldehyde concentration decreased. Moreover, the presence of hyper-ChIPability in the Snyder datasets (Cse4, Ste12), in ours (Sir2, Sir3, Sir4, GFP) and in Rando's (Sir3) makes it clear that the problem does not stem from some unusual protocol step in our hands.
We also note that we initially performed the Sir ChIP-seq experiments because of our interest in Sir protein biology. Because the Sir proteins do not interact directly with DNA, we used longer crosslinking times. This practice is not unique to our work.
In summary, much more work is needed to pinpoint the cause of the artifact, to evaluate whether some or all of the signal at highly expressed genes in many other published ChIP studies could be artifactual, and to develop the best controls and corrections for it. However, the artifact we report is not minor and is not a consequence of the methodological details of our manuscript.
Also, please note the following papers, published almost in parallel with ours, on this topic:
Park D, 2013 "Widespread Misinterpretable ChIP-seq Bias in Yeast" (Different analysis methods but the same conclusions in S. cerevisiae, analyzing an entirely different set of factors with ChIP-Seq experiments.)
Kasinathan S, 2014 "High-resolution mapping of transcription factor binding sites on native chromatin" (Questions specificity of standard ChIP in S. cerevisiae and at HOT regions of Drosophila. This work possibly provides a solution to the artifact with a modification of the ChIP technique.)
Also, the following discussion of our work on PubPeer may be useful.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.