- Jul 2018
-
europepmc.org
-
On 2015 May 31, Shin Lin commented:
This response discusses the manuscript submitted by Gilad and Mizrahi-Man to F1000Research, as well as our two responses in that journal. For details, we encourage interested readers to consult those pieces.
The batch effects that Gilad and Mizrahi-Man present as confounders of our findings are not the result of experimental protocols. Our sequencing libraries were largely prepared in one sitting by the same person, and we used matched primer indices/barcodes to minimize the variation observed in 't Hoen PA, 2013. The potential batch effect to which Gilad and Mizrahi-Man refer arises from sequencing on different lanes/flow cells/sequencing machines. In our experience, these effects are small (as also found in 't Hoen PA, 2013), and we did not observe any such effects in the original published data. However, to settle the issue further, we reconstructed the sequencing libraries under a different multiplexing scheme to address their concerns, and we found the same clustering pattern as originally presented by Lin et al. Thus, we emphatically disagree with Gilad and Mizrahi-Man's conclusion that our conclusions are “not warranted”; rather, we argue that objective normalization procedures allow the discovery of the clustering of transcriptomes by species.
Gilad and Mizrahi-Man found clustering by tissue after normalization because, in their attempt to account for lane/flow cell/sequencing machine effects, they normalized away the species effect. In that set of experiments, tissues of the same species were multiplexed on the same sequencing lane; accounting for primer indices would not have been possible otherwise. That normalizing the data for each species separately causes clustering by tissue was known to the authors of Lin S, 2014, as this observation was presented in the main Mouse ENCODE paper (Yue F, 2014).
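A minimal synthetic sketch of the mechanism described above (Python/NumPy; not the authors' pipeline or data). It assumes additive species and tissue effects, a hypothetical four-tissue panel, and lanes that each carry only one species, so per-lane centering is the same operation as per-species centering. Under that confounding, centering each gene within species removes every between-species difference by construction, whatever its origin (biology or lane), and the nearest neighbor of each sample switches from a same-species sample to the same tissue in the other species. The helper names (`simulate`, `center_within_species`, `nearest_neighbor`, `species_share_of_pc1`) and all effect sizes are hypothetical.

```python
import numpy as np

SPECIES = ("human", "mouse")
TISSUES = ("brain", "heart", "liver", "kidney")  # hypothetical tissue panel


def simulate(n_genes=200, seed=0):
    """Synthetic log-expression matrix (rows = species x tissue samples, cols = genes).

    Assumption: additive species effect (sd 3), additive tissue effect shared by
    both species (sd 1), and small Gaussian noise (sd 0.2).
    """
    rng = np.random.default_rng(seed)
    species_effect = {sp: rng.normal(0.0, 3.0, n_genes) for sp in SPECIES}
    tissue_effect = {t: rng.normal(0.0, 1.0, n_genes) for t in TISSUES}
    rows, labels = [], []
    for sp in SPECIES:
        for t in TISSUES:
            noise = rng.normal(0.0, 0.2, n_genes)
            rows.append(species_effect[sp] + tissue_effect[t] + noise)
            labels.append((sp, t))
    return np.array(rows), labels


def center_within_species(X, labels):
    """Subtract each species' per-gene mean ("normalize each species separately").

    If every lane carries only one species, per-lane centering is the same operation.
    """
    out = X.astype(float)  # astype returns a copy, so X is left untouched
    species = np.array([sp for sp, _ in labels])
    for sp in np.unique(species):
        mask = species == sp
        out[mask] -= out[mask].mean(axis=0)
    return out


def nearest_neighbor(X, labels):
    """Map each sample to its closest other sample (Euclidean distance over genes)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # exclude each sample from matching itself
    return {labels[i]: labels[int(np.argmin(D[i]))] for i in range(len(labels))}


def species_share_of_pc1(X, labels):
    """Fraction of PC1-score variance explained by species (between-group / total)."""
    Xc = X - X.mean(axis=0)  # center genes before PCA
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * S[0]  # sample coordinates on the first principal component
    species = np.array([sp for sp, _ in labels])
    total = np.sum((scores - scores.mean()) ** 2)
    between = sum(
        np.sum(species == sp) * (scores[species == sp].mean() - scores.mean()) ** 2
        for sp in np.unique(species)
    )
    return between / total


if __name__ == "__main__":
    X, labels = simulate()
    Xn = center_within_species(X, labels)
    print("raw:       ", species_share_of_pc1(X, labels), nearest_neighbor(X, labels)[("human", "brain")])
    print("normalized:", species_share_of_pc1(Xn, labels), nearest_neighbor(Xn, labels)[("human", "brain")])
```

Plain per-gene mean-centering is used only because it makes the confounding argument exact; it is not claimed to be the normalization used by either group.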
Gilad and Mizrahi-Man's work focused on one particular dataset in Lin S, 2014. However, that paper contains a principal component analysis (PCA) of data from multiple sources: Stanford (human, mouse), Salk (human), HBM (human), LICR (mouse), and CSHL (mouse). There are undoubtedly many technical differences among these sources. Yet clustering by species was seen in the higher-order principal components (PCs) (see Figure 1A in Lin S, 2014), whereas clustering by tissue was seen in lower components (Figure 1B in Lin S, 2014) or after normalizing each species separately (Extended Data Fig. 1C of Yue F, 2014). The same behavior is seen in the Stanford-only data, both in Lin S, 2014 (Figures 1C and 1D), where primer index effects were minimized, and in the newly generated results, which account for lane effects. The latter are consistent with our earlier observation that experimental batch did not drive the species-specific clustering.
The latest criticisms, concerning sample collection, fall outside the scope of the manuscript by Gilad and Mizrahi-Man. We note that our procurement practices are consistent with what other investigators have done and continue to do. When we limit our analyses to the small number of tissues examined by recent studies showing tissue-specific clustering (i.e., tissues with a large number of tissue-specific genes), we also find tissue-specific clustering (see Figure 1F in Lin S, 2014). Thus, there are no inherent biases in our data that account for the species-specific clustering. Rather, it is our complete dataset, with many additional tissue types, that produces the different clustering pattern. This evaluation of a broad tissue set is the critical difference that led to our finding of species-specific clustering. Indeed, when we examine the broad dataset of mouse and human CAGE expression data from the RIKEN FANTOM5 project (FANTOM Consortium and the RIKEN PMI and CLST (DGT), 2014), we confirm species-specific clustering.
Finally, as stated in the F1000 comment, we reiterate our enthusiasm for the mouse as a vital model system for experimental research, because of its many similarities to humans, which we show in our PNAS paper. However, an appreciation of the differences that exist between human and mouse will allow investigators to better interpret the disparities encountered when applying findings from the mouse model to humans.
***New data mentioned herein is available for download at the Mouse ENCODE website.
Shin Lin<sup>1,2</sup>, Yiing Lin<sup>3</sup>, Michael A. Beer<sup>4</sup>, Thomas R. Gingeras<sup>5</sup>, Joseph R. Ecker<sup>6,7</sup>, Michael Snyder<sup>1</sup>
<sup>1</sup> Department of Genetics, Stanford University, 300 Pasteur Drive, M-344, Stanford, California 94305; <sup>2</sup> Division of Cardiovascular Medicine, Stanford University, Falk Building, 870 Quarry Road, Stanford, California 94304; <sup>3</sup> Department of Surgery, Washington University School of Medicine, 660 S. Euclid Ave., Campus Box 8109, St. Louis, Missouri 63110; <sup>4</sup> McKusick-Nathans Institute of Genetic Medicine and the Department of Biomedical Engineering, Johns Hopkins University, 733 N. Broadway, BRB 573, Baltimore, Maryland 21205; <sup>5</sup> Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Road, Cold Spring Harbor, New York 11742; <sup>6</sup> Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037; and <sup>7</sup> Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, California 92037.
Acknowledgement: We thank the other members of the Mouse ENCODE consortium for their help in formulating this response.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2015 May 26, Steven Salzberg commented:
Serious technical questions have been raised about the main conclusion of this paper. Specifically, Yoav Gilad and Orna Mizrahi-Man described how the human and mouse samples were processed separately in several ways, any of which could lead to a significant "batch effect." They published some of their findings in F1000Research, at http://f1000research.com/articles/4-121/v1. They showed that after removing the batch effects, the finding that human and mouse genes cluster separately completely disappeared. In the discussion of that paper on the F1000 site, further sources of batch effects were identified. Thus it appears that the main finding of this paper cannot be supported by the data, because the samples from human and mouse were handled and processed in distinct ways, confounding the batch effect and any possible biological effect.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.