Sharma GK, Sharma R, Joshi K, Qureshi S, Mathur S, Sinha S, Chatterjee S, Nunia V. Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection. Brief Bioinform 2024; 25:bbae545. [PMID: 39441245 PMCID: PMC11497845 DOI: 10.1093/bib/bbae545] [Received: 01/06/2024] [Revised: 09/21/2024] [Accepted: 10/11/2024] [Indexed: 10/25/2024]
Abstract
Sequences derived from organisms sharing common evolutionary origins exhibit similarity, while unique sequences, absent in related organisms, act as good diagnostic marker candidates. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we have developed NAUniSeq, a biologically inspired universal algorithm that finds unique sequences for microorganism diagnosis by traversing the phylogeny of life. Mapping through a phylogenetic tree ensures a low level of cross-contamination and few false positives. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI RefSeq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of RefSeq sequences in a MongoDB database using the tax-id and path of the FASTA files, and queried that collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window, k-mer-based procedure that quickly compares the k-mers of target and non-target sequences and returns unique sequences that are not present in the non-target. We validated the algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences for each. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
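The alignment-free comparison described in the abstract can be illustrated with a minimal sketch. It is not the NAUniSeq implementation: the toy taxonomy, the sequences, and the function names `kmers`, `descendants`, and `unique_kmers` are all hypothetical, a plain dictionary stands in for the NetworkX graph, and details such as reverse complements and merging adjacent k-mers into longer sequences are left out.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Slide a window of width k over seq and collect every k-mer."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def descendants(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Iterative depth-first traversal: root plus every node below it."""
    seen: Set[str] = set()
    stack, order = [root], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(tree.get(node, [])))
    return order


def unique_kmers(
    tree: Dict[str, List[str]],
    sequences: Dict[str, List[str]],
    root: str,
    target: str,
    k: int,
) -> Set[str]:
    """Return k-mers found under `target` but nowhere else under `root`."""
    target_nodes = set(descendants(tree, target))
    target_set: Set[str] = set()
    non_target_set: Set[str] = set()
    for node in descendants(tree, root):
        bucket = target_set if node in target_nodes else non_target_set
        for seq in sequences.get(node, []):
            bucket |= kmers(seq, k)
    return target_set - non_target_set


# Hypothetical toy data: one parent node with two sibling species.
toy_tree = {"genus": ["species_a", "species_b"]}
toy_sequences = {"species_a": ["ACGTTGCA"], "species_b": ["ACGTAGCA"]}

if __name__ == "__main__":
    print(sorted(unique_kmers(toy_tree, toy_sequences, "genus", "species_a", 4)))
```

Choosing `root` closer to the target restricts the comparison to near relatives, while a root higher in the tree widens the non-target set; the sketch leaves that choice to the caller.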
Affiliation(s)
- Gulshan Kumar Sharma
  - Malaviya National Institute of Technology, Jawahar Lal Nehru Marg, Jhalana Gram, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Rakesh Sharma
  - Centre for Converging Technologies, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India
- Kavita Joshi
  - Department of Zoology, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India
- Sameer Qureshi
  - Department of Zoology, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India
- Shubhita Mathur
  - Department of Zoology, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India
- Sharad Sinha
  - Department of Mathematics, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India
- Samit Chatterjee
  - Department of Zoology, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India
- Vandana Nunia
  - Department of Zoology, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India