Beal J, Clore A, Manthey J. Studying pathogens degrades BLAST-based pathogen identification. Sci Rep 2023; 13:5390. [PMID: 37012314 PMCID: PMC10068195 DOI: 10.1038/s41598-023-32481-z]
[Received: 08/23/2022] [Accepted: 03/28/2023] [Indexed: 04/05/2023]
Abstract
As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using the BLAST algorithm to determine the best match with sequences in the NCBI nucleic acid and protein databases. Neither BLAST nor any of the NCBI databases, however, are actually designed for biosafety determination. Critically, taxonomic errors or ambiguities in the NCBI nucleic acid and protein databases can also cause errors in BLAST-based taxonomic categorization. With heavily studied taxa and frequently used biotechnology tools, even low frequency taxonomic categorization issues can lead to high rates of errors in biosecurity decision-making. Here we focus on the implications for false positives, finding that BLAST against NCBI's protein database will now incorrectly categorize a number of commonly used biotechnology tool sequences as the pathogens or toxins with which they have been used. Paradoxically, this implies that problems are expected to be most acute for the pathogens and toxins of highest interest and for the most widely used biotechnology tools. We thus conclude that biosecurity tools should shift away from BLAST against general purpose databases and towards new methods that are specifically tailored for biosafety purposes.
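The mechanism the abstract describes can be illustrated with a toy sketch. It is not the authors' method and does not call BLAST. The `Hit` record, the taxon labels, the bit scores, and the per-record error rate of 0.001 are all hypothetical values chosen for illustration, and the probability formula assumes that labeling errors are independent across records. The sketch shows two things: a best-match rule returns whatever label the top-scoring record carries, and the chance that at least one matching record is mislabeled grows with the number of records deposited for a sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One hypothetical database match for a query sequence."""
    subject_id: str
    taxon_label: str  # taxonomy as recorded in the database, possibly wrong
    bit_score: float


def best_hit_label(hits):
    """Return the taxon label of the highest-scoring hit (best-match rule)."""
    return max(hits, key=lambda h: h.bit_score).taxon_label


def prob_any_mislabeled(error_rate, n_records):
    """Chance that at least one of n records is mislabeled.

    Assumes errors are independent with the same rate for every record.
    """
    return 1.0 - (1.0 - error_rate) ** n_records


# Hypothetical hits for a widely used biotechnology tool sequence.
# rec_C stands for a record deposited under the taxonomy of the pathogen
# that the tool was used to study.
hits = [
    Hit("rec_A", "synthetic construct", 410.0),
    Hit("rec_B", "synthetic construct", 405.0),
    Hit("rec_C", "regulated pathogen", 412.0),
]

# The best-match rule follows the single mislabeled record: a false positive.
flagged = best_hit_label(hits) == "regulated pathogen"

# Same assumed error rate, different numbers of deposited records.
rare = prob_any_mislabeled(0.001, 10)      # rarely studied sequence
common = prob_any_mislabeled(0.001, 5000)  # heavily studied sequence

print(flagged, round(rare, 4), round(common, 4))
```

Under these assumptions, the heavily studied sequence is almost certain to have at least one mislabeled record, while the rarely studied one is not, which matches the paradox stated in the abstract.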