Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tangirala K, Herndon N, Caragea D. A Comparative Analysis Between k-Mers and Community Detection-Based Features for the Task of Protein Classification. IEEE Trans Nanobioscience 2016;15:84-92. [PMID: 26863669 PMCID: PMC6245644 DOI: 10.1109/tnb.2016.2523501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]

Cited by Other Article(s)

Sharma GK, Sharma R, Joshi K, Qureshi S, Mathur S, Sinha S, Chatterjee S, Nunia V. Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection. Brief Bioinform 2024;25:bbae545. [PMID: 39441245 PMCID: PMC11497845 DOI: 10.1093/bib/bbae545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Revised: 09/21/2024] [Accepted: 10/11/2024] [Indexed: 10/25/2024]

Abstract

Sequences derived from organisms that share a common evolutionary origin are similar, whereas unique sequences that are absent from related organisms are good diagnostic marker candidates. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we developed NAUniSeq, a biologically inspired universal algorithm that finds unique sequences for microorganism diagnosis by traversing the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI-Refseq) and used NetworkX to create a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of Refseq sequences in a MongoDB database using the tax-id and the path of the FASTA files, and queried that collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window k-mer-based procedure that quickly compares the k-mers of target and non-target sequences and returns the unique sequences that are not present in the non-target. We validated our algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
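
The alignment-free comparison described in the abstract can be illustrated with a minimal sketch. This is not the NAUniSeq implementation: the function names `kmers` and `unique_kmers`, the window step of 1, and the use of in-memory Python sets are all assumptions made for illustration. The abstract does not specify the value of k, the data structures used, or how the unique k-mers are assembled into longer diagnostic sequences.

```python
def kmers(seq, k):
    """Yield every length-k substring of seq (sliding window, step 1).

    Hypothetical helper: the step size and upper-casing are assumptions.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_kmers(targets, non_targets, k):
    """Return target k-mers that occur in no non-target sequence.

    The result keeps first-occurrence order and contains no duplicates.
    """
    # Index every k-mer seen in any non-target sequence.
    non_target_set = set()
    for seq in non_targets:
        non_target_set.update(kmers(seq, k))

    # Keep target k-mers that are absent from the non-target index.
    seen = set()
    result = []
    for seq in targets:
        for kmer in kmers(seq, k):
            if kmer not in non_target_set and kmer not in seen:
                seen.add(kmer)
                result.append(kmer)
    return result


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    print(unique_kmers(["ACGTTT"], ["ACGTAA"], 3))
```

Holding the non-target k-mers in a set makes each membership check constant time on average, which is one plausible reason a k-mer comparison avoids the cost of multiple sequence alignment. For genome-scale data the set would likely not fit in memory, which is consistent with the abstract's mention of a database-backed alternative.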
