- Jul 2018
-
academic.oup.com
-
On 2014 Dec 19, Scott Federhen commented:
I am Scott Federhen, author of this article, and would like to report some aspects of our genomes from type that were discovered too late to include in the manuscript. A visual inspection of our k-mer tree revealed two genomes from type for Bacillus subtilis subsp. spizizenii that are placed quite distantly from each other.
The average nucleotide identity (ANI) statistics from the alignments of the type genomes from the relevant subspecies of Bacillus subtilis are as follows:
ANI (%)   Genome 1   Genome 2   Genome 1 aligned (%)   Genome 2 aligned (%)   Comparison
96.7997   CP002905   ADGS01     89.7316                94.8698                spizizenii vs. spizizenii
94.6427   CP002905   AMXN01     89.4248                86.6265                spizizenii vs. inaquosorum
93.2854   CP002905   AL009126   88.7053                88.4756                spizizenii vs. subtilis
That is, 89.7% of CP002905 aligns with 94.9% of ADGS01, and the parts that align share 96.8% average nucleotide identity. These two genomes are likely to be from the same species, but probably not from the same subspecies, and certainly not from co-identical strains.
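As a reading aid, the rows above can be unpacked programmatically. The following is a minimal sketch, assuming the five numeric/text fields are ordered as ANI, first accession, second accession, percent of the first genome aligned, and percent of the second genome aligned (an ordering inferred from the sentence above, not stated by any tool). The names `AniRow`, `parse_row`, and `describe` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AniRow:
    ani: float          # average nucleotide identity over the aligned regions (%)
    genome_1: str       # accession of the first genome
    genome_2: str       # accession of the second genome
    aligned_1: float    # percent of genome 1 that aligns (assumed column meaning)
    aligned_2: float    # percent of genome 2 that aligns (assumed column meaning)


def parse_row(line: str) -> AniRow:
    """Parse one whitespace-separated row; trailing description text is ignored."""
    fields = line.split()
    return AniRow(float(fields[0]), fields[1], fields[2],
                  float(fields[3]), float(fields[4]))


def describe(row: AniRow) -> str:
    """Render a row as a sentence, rounding to one decimal place."""
    return (f"{row.aligned_1:.1f}% of {row.genome_1} aligns with "
            f"{row.aligned_2:.1f}% of {row.genome_2}; "
            f"aligned parts share {row.ani:.1f}% ANI")


# Rows copied from the comment (numeric fields only).
ROWS = [
    "96.7997 CP002905 ADGS01 89.7316 94.8698",
    "94.6427 CP002905 AMXN01 89.4248 86.6265",
    "93.2854 CP002905 AL009126 88.7053 88.4756",
]

for line in ROWS:
    print(describe(parse_row(line)))
```

The first row prints the same figures quoted in the text: 89.7%, 94.9%, and 96.8%.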
To examine this problem systematically, we looked for all of the cases where we had more than one genome from type from the same species (or subspecies) and sorted them by pairwise ANI. These genomes were usually from different culture collections and often sequenced in different labs. The 274 pairwise combinations break down like this:
4 pairs of genomes < 90% identical.
5 pairs of genomes 96% - 99% identical.
8 pairs of genomes 99% - 99.9% identical.
112 pairs of genomes 99.9% - 99.99% identical.
135 pairs of genomes 99.99% - 99.999% identical.
10 pairs of genomes > 99.999% identical.
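The breakdown above amounts to sorting pairwise ANI values into fixed ranges. Here is a minimal sketch of that binning, not the procedure actually used for the comment. Two points are assumptions: values falling exactly on a boundary are placed in the higher range, and a "90% - 96%" range is added only so that every value has a bin (no pairs are reported there). The names `EDGES`, `LABELS`, `ani_bin`, and `tally` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Range boundaries in percent identity, taken from the breakdown above.
EDGES = [90.0, 96.0, 99.0, 99.9, 99.99, 99.999]
LABELS = [
    "< 90%",
    "90% - 96%",          # assumed gap bin; the comment reports no pairs here
    "96% - 99%",
    "99% - 99.9%",
    "99.9% - 99.99%",
    "99.99% - 99.999%",
    "> 99.999%",
]


def ani_bin(ani: float) -> str:
    """Return the range label for one pairwise ANI value (boundaries go up)."""
    return LABELS[bisect_right(EDGES, ani)]


def tally(ani_values) -> Counter:
    """Count how many pairwise ANI values fall into each range."""
    return Counter(ani_bin(a) for a in ani_values)


# Counts as reported in the comment; they should add up to 274 pairs.
REPORTED = {
    "< 90%": 4,
    "96% - 99%": 5,
    "99% - 99.9%": 8,
    "99.9% - 99.99%": 112,
    "99.99% - 99.999%": 135,
    "> 99.999%": 10,
}
assert sum(REPORTED.values()) == 274

# 96.7997 is the spizizenii vs. spizizenii value from the comment;
# the other inputs are illustrative only, not real measurements.
print(tally([96.7997, 89.0, 99.95, 99.9995]))
```

With real data, `tally` would be called on the full list of pairwise ANI values for genomes from type that share a species or subspecies.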
This is a wide range for genomes from strains that are supposed to be co-identical. The four most divergent pairs are likely to be from different species. The problem could lie in many places: the annotation in the sequence entries, a strain mix-up in the sequencing lab, or contamination, misannotation, or misidentification in the culture collection. We are working with submitters and culture collections to resolve the most egregious discrepancies and improve the reliability of our subset of genomes and sequences from type. At some point the community is going to have to come to a consensus as to what constitutes identity between co-identical strains.
We look forward to the day when every described species of prokaryote has a complete genome sequence, and a genome is included with every new species description. At that point it would be useful for each culture collection holding strains from type to sequence at least a low-coverage genome to verify the identity of the strain.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-