Hypothesis

2 Matching Annotations

Jul 2018
academic.oup.com

https://www.ncbi.nlm.nih.gov/pubmed/25398905

1
1. PubMedCommonsArchive 02 Jul 2018
  
  in Europe PMC
  
  On 2014 Dec 19, Scott Federhen commented:
  
  I am Scott Federhen, author of this article, and would like to report some aspects of our genomes from type that were discovered too late to include in the manuscript. A visual inspection of our k-mer tree revealed two genomes from type for Bacillus subtilis subsp. spizizenii that are placed quite distantly from each other.
  The average nucleotide identity (ANI) statistics from the alignments of the type genomes from the relevant subspecies of Bacillus subtilis are as follows:
  ANI (%)   Genome 1   Genome 2   Genome 1 aligned (%)   Genome 2 aligned (%)   Comparison
  96.7997   CP002905   ADGS01     89.7316                94.8698                spizizenii vs. spizizenii
  94.6427   CP002905   AMXN01     89.4248                86.6265                spizizenii vs. inaquosorum
  93.2854   CP002905   AL009126   88.7053                88.4756                spizizenii vs. subtilis
  Reading the first row: 89.7% of CP002905 aligns with 94.9% of ADGS01, and the parts that align share 96.8% average nucleotide identity. These two genomes are likely to be from the same species, but probably not from the same subspecies - and certainly not from co-identical strains.
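As a rough illustration of how each row of these statistics reads, here is a minimal Python sketch. The column order (ANI, first genome, second genome, percent of the first genome aligned, percent of the second genome aligned) is inferred from the sentence above; the names `AniRow`, `parse_row` and `describe` are hypothetical and are not part of any NCBI tool.

```python
from dataclasses import dataclass


@dataclass
class AniRow:
    ani: float           # average nucleotide identity of the aligned parts (%)
    genome1: str         # accession of the first genome
    genome2: str         # accession of the second genome
    genome1_cov: float   # percent of genome 1 that aligns
    genome2_cov: float   # percent of genome 2 that aligns
    comparison: str      # free-text label for the pair


# The three rows reported in the comment (assumed column order, see lead-in).
RAW = """\
96.7997 CP002905 ADGS01 89.7316 94.8698 spizizenii vs. spizizenii
94.6427 CP002905 AMXN01 89.4248 86.6265 spizizenii vs. inaquosorum
93.2854 CP002905 AL009126 88.7053 88.4756 spizizenii vs. subtilis
"""


def parse_row(line: str) -> AniRow:
    """Split one whitespace-separated row into typed fields."""
    ani, g1, g2, cov1, cov2, *label = line.split()
    return AniRow(float(ani), g1, g2, float(cov1), float(cov2), " ".join(label))


def describe(row: AniRow) -> str:
    """Render a row as a sentence, rounded to one decimal place."""
    return (
        f"{row.genome1_cov:.1f}% of {row.genome1} aligns with "
        f"{row.genome2_cov:.1f}% of {row.genome2}; "
        f"the aligned parts share {row.ani:.1f}% ANI"
    )


rows = [parse_row(line) for line in RAW.splitlines() if line.strip()]

for row in rows:
    print(f"{row.comparison}: {describe(row)}")
```

The sketch only restates the reported numbers; it does not compute ANI, which requires aligning the genomes themselves.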
  To examine this problem systematically, we looked for all of the cases where we had more than one genome from type from the same species (or subspecies) and sorted them by pairwise ANI. These were usually from different culture collections and often sequenced in different labs. The 274 pairwise combinations break down like this:
  4 pairs of genomes < 90% identical.
  5 pairs of genomes 96% - 99% identical.
  8 pairs of genomes 99% - 99.9% identical.
  112 pairs of genomes 99.9% - 99.99% identical.
  135 pairs of genomes 99.99% - 99.999% identical.
  10 pairs of genomes > 99.999% identical.
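The breakdown above can be reproduced for any list of pairwise ANI values with a small binning routine. The following Python sketch is an assumption-laden illustration: the bin edges are taken from the list above, the "90% - 96%" bin is added only to cover the gap in which no pairs were reported, and values that fall exactly on an edge are assigned to the higher bin. The names `bin_label` and `histogram` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in percent identity, taken from the breakdown in the comment.
EDGES = [90.0, 96.0, 99.0, 99.9, 99.99, 99.999]
LABELS = [
    "< 90%",
    "90% - 96%",          # assumption: no pairs were reported in this range
    "96% - 99%",
    "99% - 99.9%",
    "99.9% - 99.99%",
    "99.99% - 99.999%",
    "> 99.999%",
]


def bin_label(ani: float) -> str:
    """Return the label of the bin containing an ANI value (in percent)."""
    return LABELS[bisect_right(EDGES, ani)]


def histogram(values):
    """Count ANI values per bin, keeping empty bins and the bin order."""
    counts = Counter(bin_label(v) for v in values)
    return {label: counts.get(label, 0) for label in LABELS}


# Counts as reported in the comment; they should add up to 274 pairs.
REPORTED = {
    "< 90%": 4,
    "96% - 99%": 5,
    "99% - 99.9%": 8,
    "99.9% - 99.99%": 112,
    "99.99% - 99.999%": 135,
    "> 99.999%": 10,
}

print(sum(REPORTED.values()))                       # total number of pairs
print(histogram([96.7997, 94.6427, 93.2854]))       # the three B. subtilis rows
```

Note that the three Bacillus subtilis comparisons quoted earlier include pairs from different subspecies, so they are used here only as sample input, not as members of the 274 same-taxon pairs.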
  This is a wide range for genomes from strains that are supposed to be co-identical. The four most divergent pairs (those less than 90% identical) are likely to be from different species. The problem could lie in many places: the annotation in the sequence entries, a strain mix-up in the sequencing lab, or contamination, misannotation or misidentification in the culture collection. We are working with submitters and culture collections to resolve the most egregious discrepancies and improve the reliability of our subset of genomes and sequences from type. At some point the community is going to have to come to a consensus as to what constitutes identity between co-identical strains.
  We look forward to the day when every described species of prokaryote has a complete genome sequence, and a genome is included with every new species description. At that point it would be useful for each culture collection holding strains from type to sequence at least a low-coverage genome to verify the identity of the strain.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:25398905

Tags

PMID:25398905

PubMedCommonsArchive

Annotators

PubMedCommonsArchive

URL

academic.oup.com/nar/article/43/D1/D1086/2438106
