As genome sequencing has gotten faster and cheaper, the pace of
whole-genome sequencing has accelerated, dramatically increasing the
number of genomes deposited in public archives. Although these genomes
are a valuable resource, problems can arise when researchers misapply
computational methods to assemble them, or accidentally introduce
contamination that goes unnoticed during sequencing. The first complete bacterial
genome, Haemophilus influenzae, appeared in 1995, and today
the public GenBank database contains over 27,000 prokaryotic and 1,600
eukaryotic genomes. The vast majority of these are draft genomes that
contain gaps in their sequences, and researchers often rely on these draft
sequences in subsequent analyses.
Each genome sequencing project begins with a DNA source, which varies
depending on the species. For animals, blood is a common source, while
for smaller organisms such as insects the entire organism or a
population of organisms may be required to yield enough DNA for
sequencing. Throughout the process of DNA isolation and sequencing,
contamination remains a possibility. Computational filters applied to
the raw sequencing reads are usually effective at removing common
laboratory contaminants such as E. coli, but other contaminants may be more difficult to identify.
In a new study in PeerJ, authors from Johns Hopkins
University discovered contaminating bacterial and viral sequences in
"draft" assemblies of animal and plant genomes that had been deposited
in GenBank. These contaminants may cause particular problems for the rapidly
growing field of microbiome analysis, where sequences labeled as animal in origin
actually turn out to be microbial.
In an even more surprising finding, the authors discovered the
presence of cow and sheep DNA in the supposedly finished genome of a
pathogenic bacterium, Neisseria gonorrhoeae. Although deposited
in GenBank as a finished genome, the sequence was apparently a draft
assembly that was submitted as complete, with erroneous DNA inserted in
five places. If taken at face value, these data would appear to be a
startling case of lateral gene transfer, but the correct explanation
appears to be more mundane.
These findings highlight the importance of careful screening of DNA
sequence data both at the time of release and, in some cases, for many
years after publication.