Dhifli W, Diallo AB. ProtNN: fast and accurate protein 3D-structure classification in structural and topological space.
BioData Min 2016;
9:30. [PMID:
27688811 PMCID:
PMC5034655 DOI:
10.1186/s13040-016-0108-2]
[Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2016] [Accepted: 08/22/2016] [Indexed: 11/30/2022] Open
Abstract
Background
Studying the functions and structures of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has increasingly become extremely large. Still, the classification of a protein structure remains a difficult, costly, and time consuming task. The difficulties are often due to the essential role of spatial and topological structures in the classification of protein structures.
Results
We propose ProtNN, a novel classification approach for protein 3D-structures. Given an unannotated query protein structure and a set of annotated proteins, ProtNN assigns to the query protein the class with the highest number of votes across the k nearest neighbor reference proteins, where k is a user-defined parameter. The search of the nearest neighbor annotated structures is based on a protein-graph representation model and pairwise similarities between vector embedding of the query and the reference protein structures in structural and topological spaces.
Conclusions
We demonstrate through an extensive experimental evaluation that ProtNN is able to accurately classify several datasets in an extremely fast runtime compared to state-of-the-art approaches. We further show that ProtNN is able to scale up to a whole PDB dataset in a single-process mode with no parallelization, with a gain of thousands order of magnitude in runtime compared to state-of-the-art approaches.
Collapse