Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gherardini PF, Helmer-Citterich M. Structure-based function prediction: approaches and applications. Brief Funct Genomic Proteomic 2008;7:291-302. [PMID: 18599513 DOI: 10.1093/bfgp/eln030] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]

Cited by Other Article(s)

Jang YJ, Qin QQ, Huang SY, Peter ATJ, Ding XM, Kornmann B. Accurate prediction of protein function using statistics-informed graph networks. Nat Commun 2024;15:6601. [PMID: 39097570 PMCID: PMC11297950 DOI: 10.1038/s41467-024-50955-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 07/15/2024] [Indexed: 08/05/2024]

Nordquist E, Zhang G, Barethiya S, Ji N, White KM, Han L, Jia Z, Shi J, Cui J, Chen J. Incorporating physics to overcome data scarcity in predictive modeling of protein function: A case study of BK channels. PLoS Comput Biol 2023;19:e1011460. [PMID: 37713443 PMCID: PMC10529646 DOI: 10.1371/journal.pcbi.1011460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 09/27/2023] [Accepted: 08/24/2023] [Indexed: 09/17/2023]

Abstract

Machine learning has played transformative roles in numerous chemical and biophysical problems, such as protein folding, where large amounts of data exist. Nonetheless, many important problems remain challenging for data-driven machine learning approaches because of data scarcity. One approach to overcome data scarcity is to incorporate physical principles, for example through molecular modeling and simulation. Here, we focus on the big potassium (BK) channels that play important roles in cardiovascular and neural systems. Many mutants of the BK channel are associated with various neurological and cardiovascular diseases, but the molecular effects are unknown. The voltage gating properties of BK channels have been characterized experimentally for 473 site-specific mutations over the last three decades; yet, these functional data by themselves remain far too sparse to derive a predictive model of BK channel voltage gating. Using physics-based modeling, we quantify the energetic effects of all single mutations on both open and closed states of the channel. Together with dynamic properties derived from atomistic simulations, these physical descriptors allow the training of random forest models that could reproduce unseen experimentally measured shifts in gating voltage, ΔV1/2, with an RMSE of ~32 mV and a correlation coefficient of R ~ 0.7. Importantly, the model appears capable of uncovering nontrivial physical principles underlying the gating of the channel, including a central role of hydrophobic gating. The model was further evaluated using four novel mutations of L235 and V236 on the S5 helix, mutations of which are predicted to have opposing effects on V1/2 and suggest a key role of S5 in mediating voltage sensor-pore coupling. The measured ΔV1/2 values agree quantitatively with the predictions for all four mutations, with a high correlation of R = 0.92 and RMSE = 18 mV. Therefore, the model can capture nontrivial voltage gating properties in regions where few mutations are known. The success of predictive modeling of BK voltage gating demonstrates the potential of combining physics and statistical learning for overcoming data scarcity in nontrivial protein function prediction.
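
The abstract above reports model quality as an RMSE and a correlation coefficient R between predicted and measured shifts in gating voltage. As a minimal sketch of how these two metrics are computed, the following uses only the Python standard library. The function names and the four value pairs are hypothetical and chosen for illustration; they are not data or code from the study.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical shifts in gating voltage (mV); NOT values from the paper.
measured_mv = [-40.0, -10.0, 15.0, 60.0]
predicted_mv = [-30.0, -20.0, 25.0, 50.0]

print("RMSE (mV):", rmse(predicted_mv, measured_mv))
print("Pearson R:", pearson_r(predicted_mv, measured_mv))
```

RMSE is in the same units as the measurement (mV here), whereas R is unitless and only reflects how well the predictions track the trend, which is why the abstract reports both.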

Nordquist E, Zhang G, Barethiya S, Ji N, White KM, Han L, Jia Z, Shi J, Cui J, Chen J. Incorporating physics to overcome data scarcity in predictive modeling of protein function: a case study of BK channels. bioRxiv 2023:2023.06.24.546384. [PMID: 37425916 PMCID: PMC10327070 DOI: 10.1101/2023.06.24.546384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]

Author Summary

Deep machine learning has brought many exciting breakthroughs in chemistry, physics and biology. These models require large amounts of training data and struggle when data are scarce, as is the case for predictive modeling of the function of complex proteins such as ion channels, where only a few hundred mutational data points may be available. Using the big potassium (BK) channel as a biologically important model system, we demonstrate that a reliable predictive model of its voltage gating property could be derived from only 473 mutational data points by incorporating physics-derived features, which include dynamic properties from molecular dynamics simulations and energetic quantities from Rosetta mutation calculations. We show that the final random forest model captures key trends and hotspots in mutational effects of BK voltage gating, such as the important role of pore hydrophobicity. A particularly curious prediction is that mutations of two adjacent residues on the S5 helix would always have opposite effects on the gating voltage, which was confirmed by experimental characterization of four novel mutations. The current work demonstrates the importance and effectiveness of incorporating physics in predictive modeling of protein function with scarce data.

Winker M, Chauveau A, Smieško M, Potterat O, Areesanan A, Zimmermann-Klemd A, Gründemann C. Immunological evaluation of herbal extracts commonly used for treatment of mental diseases during pregnancy. Sci Rep 2023;13:9630. [PMID: 37316493 DOI: 10.1038/s41598-023-35952-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 05/26/2023] [Indexed: 06/16/2023]

Abstract

Nonpsychotic mental diseases (NMDs) affect approximately 15% of pregnant women in the US. Herbal preparations are perceived as a safe alternative to placenta-crossing antidepressants or benzodiazepines in the treatment of NMDs. But are these drugs really safe for mother and foetus? This question is of great relevance to physicians and patients. Therefore, this study investigates the in vitro immunomodulatory effects of St. John's wort, valerian, hops, lavender, and California poppy and of their compounds hyperforin, hypericin, protopine, valerenic acid, valtrate, and linalool. For this purpose, a variety of methods was applied to assess the effects on viability and function of human primary lymphocytes. Viability was assessed via spectrometric assessment, flow cytometric detection of cell death markers and comet assay for possible genotoxicity. Functional assessment was conducted via flow cytometric assessment of proliferation, cell cycle and immunophenotyping. For California poppy, lavender, hops, and the compounds protopine, linalool, and valerenic acid, no effect was found on the viability, proliferation, and function of primary human lymphocytes. However, St. John's wort and valerian inhibited the proliferation of primary human lymphocytes. Hyperforin, hypericin, and valtrate inhibited viability, induced apoptosis, and inhibited cell division. The calculated maximum concentrations of the compounds in body fluid, as well as concentrations calculated from pharmacokinetic data in the literature, were low and suggested that the effects observed in vitro would probably have no relevance for patients. In silico analyses comparing the structures of the studied substances with those of relevant control substances and known immunosuppressants revealed structural similarities of hyperforin and valerenic acid to the glucocorticoids. Valtrate showed structural similarities to drugs that modulate T-cell signaling.

Lysine Methyltransferase EhPKMT2 Is Involved in the In Vitro Virulence of Entamoeba histolytica. Pathogens 2023;12:pathogens12030474. [PMID: 36986396 PMCID: PMC10058465 DOI: 10.3390/pathogens12030474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 03/06/2023] [Accepted: 03/11/2023] [Indexed: 03/19/2023]

Sicilia C, Corral-Lugo A, Smialowski P, McConnell MJ, Martín-Galiano AJ. Unsupervised Machine Learning Organization of the Functional Dark Proteome of Gram-Negative "Superbugs": Six Protein Clusters Amenable for Distinct Scientific Applications. ACS Omega 2022;7:46131-46145. [PMID: 36570227 PMCID: PMC9774411 DOI: 10.1021/acsomega.2c04076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 10/06/2022] [Indexed: 06/17/2023]

In Silico Evaluation of Nonsynonymous SNPs in Human ADAM33: The Most Common Form of Genetic Association to Asthma Susceptibility. Comput Math Methods Med 2022;2022:1089722. [DOI: 10.1155/2022/1089722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 09/09/2022] [Accepted: 10/07/2022] [Indexed: 11/13/2022]

Sengupta K, Saha S, Halder AK, Chatterjee P, Nasipuri M, Basu S, Plewczynski D. PFP-GO: Integrating protein sequence, domain and protein-protein interaction information for protein function prediction using ranked GO terms. Front Genet 2022;13:969915. [PMID: 36246645 PMCID: PMC9556876 DOI: 10.3389/fgene.2022.969915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022]

Lee J, Song SB, Chung YK, Jang JH, Huh J. BoostSweet: Learning molecular perceptual representations of sweeteners. Food Chem 2022;383:132435. [PMID: 35182866 DOI: 10.1016/j.foodchem.2022.132435] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/05/2021] [Revised: 09/16/2021] [Accepted: 02/09/2022] [Indexed: 11/28/2022]

Bajaj P, Manjunath K, Varadarajan R. Structural and functional determinants inferred from deep mutational scans. Protein Sci 2022;31:e4357. [PMID: 35762712 PMCID: PMC9202547 DOI: 10.1002/pro.4357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Revised: 04/04/2022] [Accepted: 05/11/2022] [Indexed: 11/08/2022]

Xia C, Feng SH, Xia Y, Pan X, Shen HB. Fast protein structure comparison through effective representation learning with contrastive graph neural networks. PLoS Comput Biol 2022;18:e1009986. [PMID: 35324898 PMCID: PMC8982879 DOI: 10.1371/journal.pcbi.1009986] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/30/2021] [Revised: 04/05/2022] [Accepted: 03/03/2022] [Indexed: 12/03/2022]

Abstract

Protein structure alignment algorithms are often time-consuming, which makes large-scale retrieval based on protein structure similarity challenging. There is an urgent need for more efficient structure comparison approaches as the number of protein structures increases rapidly. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and length-scaling cosine distance are introduced. We objectively evaluate our method GraSR on SCOPe v2.07 and a newly released independent test set from the PDB database with a specially designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves about 7%-10% improvement on two benchmark datasets. GraSR is also much faster than alignment-based methods. We dig into the model and observe that the superiority of GraSR mainly comes from the learned discriminative residue-level and global descriptors. The web-server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.

Author Summary

The size and shape of protein structures vary considerably. Accurate protein structure comparison usually relies on structure alignment algorithms. However, superimposing two protein structures is relatively time-consuming, which makes it inappropriate for large-scale protein structure retrieval. Alignment-free algorithms have been proposed for efficient protein structure comparison over the last few decades. These algorithms first transform the coordinates of atoms in two proteins to fixed-length vectors. Then, the comparison can be done by measuring the distance or similarity between two vectors, which is much faster than alignment. In this study, we propose a novel protein structure representation method for efficient structure comparison. Compared with other state-of-the-art alignment-free methods, our method achieves better performance on both ranking and multi-class classification tasks due to the powerful representation ability of deep graph neural networks. We dig into the model and observe that the superiority of our method mainly comes from the learned discriminative residue-level and global descriptors.
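
The alignment-free idea described above, in which each structure is reduced to a fixed-length vector and structures are compared by vector similarity, can be sketched as follows. This is an illustrative example only: it uses plain cosine similarity rather than the length-scaling cosine distance introduced for GraSR, and the descriptor values, structure names, and function names are hypothetical. In practice the vectors would come from a trained encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, database):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 4-dimensional descriptors, for illustration only.
database = {
    "struct_A": [1.0, 0.0, 0.0, 0.0],
    "struct_B": [0.0, 1.0, 0.0, 0.0],
    "struct_C": [1.0, 1.0, 0.0, 0.0],
}
query = [1.0, 0.1, 0.0, 0.0]

ranking = rank_by_similarity(query, database)
for name, score in ranking:
    print(name, round(score, 3))
```

Each comparison costs time linear in the descriptor length and is independent of the number of residues, which is the source of the speed advantage over pairwise structure superposition.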

Mabonga L, Masamba P, Kappo AP. Inhibitory potential of a benzoxazole derivative, 4FI against SNRPG∼RING finger domain protein complex as a lead compound in the discovery of anti-cancer drugs: A molecular dynamics simulation approach. Inform Med Unlocked 2022. [DOI: 10.1016/j.imu.2022.100993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Questing functions and structures of hypothetical proteins from Campylobacter jejuni: a computer-aided approach. Biosci Rep 2021;40:225019. [PMID: 32458979 PMCID: PMC7284324 DOI: 10.1042/bsr20193939] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/20/2019] [Revised: 05/17/2020] [Accepted: 05/26/2020] [Indexed: 12/12/2022]

Bhasin M, Varadarajan R. Prediction of Function Determining and Buried Residues Through Analysis of Saturation Mutagenesis Datasets. Front Mol Biosci 2021;8:635425. [PMID: 33778004 PMCID: PMC7991590 DOI: 10.3389/fmolb.2021.635425] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/30/2020] [Accepted: 01/25/2021] [Indexed: 11/13/2022]

Zohra Smaili F, Tian S, Roy A, Alazmi M, Arold ST, Mukherjee S, Scott Hefty P, Chen W, Gao X. QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs. Genomics Proteomics Bioinformatics 2021;19:998-1011. [PMID: 33631427 PMCID: PMC9403031 DOI: 10.1016/j.gpb.2021.02.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/11/2018] [Revised: 04/03/2019] [Accepted: 05/17/2019] [Indexed: 11/25/2022]

Supo-Escalante RR, Médico A, Gushiken E, Olivos-Ramírez GE, Quispe Y, Torres F, Zamudio M, Antiparra R, Amzel LM, Gilman RH, Sheen P, Zimic M. Prediction of Mycobacterium tuberculosis pyrazinamidase function based on structural stability, physicochemical and geometrical descriptors. PLoS One 2020;15:e0235643. [PMID: 32735615 PMCID: PMC7394417 DOI: 10.1371/journal.pone.0235643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2020] [Accepted: 06/19/2020] [Indexed: 12/02/2022]

Abstract

BACKGROUND

Pyrazinamide is an important drug against the latent stage of tuberculosis and is used in both first- and second-line treatment regimens. Pyrazinamide susceptibility testing usually takes a week to yield a result that can guide initial therapy, which delays appropriate treatment. The continued increase in multi-drug resistant tuberculosis and the prevalence of pyrazinamide resistance in several countries make the development of assays for prompt identification of resistance necessary. The main cause of pyrazinamide resistance is the impairment of pyrazinamidase function attributed to mutations in the promoter and/or pncA coding gene. However, not all pncA mutations necessarily affect pyrazinamidase function.

OBJECTIVE

To develop a methodology to predict pyrazinamidase function from detected mutations in the pncA gene.

METHODS

We measured the catalytic constant (kcat), KM, enzymatic efficiency, and enzymatic activity of 35 recombinant mutated pyrazinamidases and the wild type (Protein Data Bank ID = 3pl1). From all the 3D modeled structures, we extracted several predictors in three categories: structural stability (estimated by normal mode analysis and molecular dynamics), physicochemical characteristics, and geometrical characteristics. We used a stepwise Akaike's information criterion forward multiple log-linear regression to model each kinetic parameter with each category of predictors. We also developed weighted models combining the three categories of predictive models for each kinetic parameter. We tested the robustness of the predictive ability of each model by 6-fold cross-validation against random models.

RESULTS

The stability, physicochemical, and geometrical descriptors explained most of the variability (R²) of the kinetic parameters. Our models are best suited to predict kcat, efficiency, and activity based on the root-mean-square error of prediction of the 6-fold cross-validation.

CONCLUSIONS

This study presents a quick approach for predicting pyrazinamidase function from the pncA sequence alone when point mutations are present. This can be an important tool to detect pyrazinamide resistance.
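
The 6-fold cross-validation of a log-linear regression mentioned in the Methods and Results above can be sketched as follows. This is a simplified illustration under stated assumptions: it fits log(y) against a single predictor by ordinary least squares, assigns folds by index rather than at random, and omits the stepwise AIC selection, the weighted combination of models, and the comparison against random models. The function names and the synthetic data are hypothetical and are not taken from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor must not be constant")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def kfold_rmse(xs, ys, k=6):
    """Cross-validated RMSE (on the log scale) of log(y) = a + b * x."""
    if k < 2 or k > len(xs):
        raise ValueError("k must be between 2 and the number of samples")
    log_ys = [math.log(y) for y in ys]
    squared_errors = []
    for fold in range(k):
        # Deterministic fold assignment: sample i belongs to fold i % k.
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        a, b = fit_line([xs[i] for i in train_idx], [log_ys[i] for i in train_idx])
        squared_errors.extend((a + b * xs[i] - log_ys[i]) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic, noise-free data that follow the model exactly (illustration only).
xs = [float(i) for i in range(12)]
ys = [math.exp(0.5 + 0.2 * x) for x in xs]

print("6-fold CV RMSE (log scale):", kfold_rmse(xs, ys, k=6))
```

Because every sample is predicted by a model that never saw it during fitting, the cross-validated RMSE estimates out-of-sample error rather than goodness of fit.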

Affiliation(s)

Rydberg Roman Supo-Escalante: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Aldhair Médico: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Eduardo Gushiken: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Gustavo E. Olivos-Ramírez: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Yaneth Quispe: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Fiorella Torres: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Melissa Zamudio: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Ricardo Antiparra: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
L. Mario Amzel: Department of Biophysics and Biophysical Chemistry, Johns Hopkins University, Baltimore, MD, United States of America
Robert H. Gilman: International Health Department, Johns Hopkins School of Public Health, Baltimore, MD, United States of America
Patricia Sheen: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
Mirko Zimic: Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru

In Silico Elucidation of Deleterious Non-synonymous SNPs in SHANK3, the Autism Spectrum Disorder Gene. J Mol Neurosci 2020;70:1649-1667. [DOI: 10.1007/s12031-020-01552-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/01/2019] [Accepted: 04/13/2020] [Indexed: 12/11/2022]

Pu L, Govindaraj RG, Lemoine JM, Wu HC, Brylinski M. DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput Biol 2019;15:e1006718. [PMID: 30716081 PMCID: PMC6375647 DOI: 10.1371/journal.pcbi.1006718] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 08/13/2018] [Revised: 02/14/2019] [Accepted: 12/16/2018] [Indexed: 01/19/2023]

Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018;19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023]

Mills CL, Garg R, Lee JS, Tian L, Suciu A, Cooperman GD, Beuning PJ, Ondrechen MJ. Functional classification of protein structures by local structure matching in graph representation. Protein Sci 2018;27:1125-1135. [PMID: 29604149 PMCID: PMC5980557 DOI: 10.1002/pro.3416] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/21/2017] [Revised: 03/21/2018] [Accepted: 03/26/2018] [Indexed: 11/08/2022]

Survey of Computational Approaches for Prediction of DNA-Binding Residues on Protein Surfaces. Methods Mol Biol 2018. [PMID: 29536446 DOI: 10.1007/978-1-4939-7717-8_13] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]

Jiao D, Han W, Ye Y. Functional association prediction by community profiling. Methods 2017;129:8-17. [PMID: 28454776 PMCID: PMC5643221 DOI: 10.1016/j.ymeth.2017.04.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/18/2017] [Revised: 03/31/2017] [Accepted: 04/20/2017] [Indexed: 11/27/2022]

Abstract

Recent years have witnessed an unprecedented accumulation of DNA sequences, and therefore of protein sequences predicted from them, due to advances in sequencing technology. One of the major sources of hypothetical proteins is metagenomics research. Current annotation of metagenomes (collections of short metagenomic sequences or assemblies) relies on similarity searches against known gene/protein families, based on which functional profiles of microbial communities can be built. This practice, however, leaves out the hypothetical proteins, which may outnumber the known proteins for many microbial communities. On the other hand, we may ask: what can we gain from the large number of metagenomes made available by metagenomic studies, for the annotation of metagenomic sequences as well as the functional annotation of hypothetical proteins in general? Here we propose a community profiling approach for predicting functional associations between proteins: two proteins are predicted to be associated if they share similar presence and absence profiles (called community profiles) across microbial communities. Community profiling is conceptually similar to the phylogenetic profiling approach to functional prediction, but with fundamental differences. We tested different profile construction methods, selections of reference metagenomes, and correlation metrics, among others, to optimize the performance of this new approach. We demonstrated that the community profiling approach alone slightly outperforms the phylogenetic profiling approach for associating proteins in species that are well represented by sequenced genomes, and that combining phylogenetic and community profiling further improves (though only marginally) the prediction of functional association. Further, we showed that the community profiling method significantly outperforms phylogenetic profiling, revealing more functional associations, when applied to a more recently sequenced bacterial genome.
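
The core rule of community profiling stated above, that two proteins are predicted to be associated when their presence and absence profiles across communities are similar, can be sketched as follows. This is a minimal illustration, assuming binary profiles, Pearson correlation as the similarity metric, and an arbitrary threshold of 0.8. The abstract notes that several profile constructions and correlation metrics were tested, so none of these choices should be read as the authors' final settings. The protein names and profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / math.sqrt(var_x * var_y)


def predict_associations(profiles, threshold=0.8):
    """Return (protein_a, protein_b, r) for pairs with correlation >= threshold."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson_r(prof_a, prof_b)
        if r >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical profiles: 1 = present, 0 = absent in each of six communities.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 0, 1, 1, 0, 1],
    "protD": [1, 0, 1, 0, 1, 1],
}

associations = predict_associations(profiles)
for name_a, name_b, r in associations:
    print(name_a, name_b, round(r, 3))
```

In this toy example only protA and protB co-occur across all six communities, so they form the single predicted association; protC is perfectly anti-correlated with both and is therefore excluded.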

Zinati Z, Alemzadeh A, KayvanJoo AH. Computational approaches for classification and prediction of P-type ATPase substrate specificity in Arabidopsis. Physiol Mol Biol Plants 2016;22:163-174. [PMID: 27186030 PMCID: PMC4840148 DOI: 10.1007/s12298-016-0351-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/02/2015] [Revised: 03/15/2016] [Accepted: 03/28/2016] [Indexed: 06/05/2023]

Parasuram R, Mills CL, Wang Z, Somasundaram S, Beuning PJ, Ondrechen MJ. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases. Methods 2016;93:51-63. [DOI: 10.1016/j.ymeth.2015.11.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/01/2015] [Revised: 11/05/2015] [Accepted: 11/09/2015] [Indexed: 01/07/2023]

Carvalho HF, Roque ACA, Iranzo O, Branco RJF. Comparison of the Internal Dynamics of Metalloproteases Provides New Insights on Their Function and Evolution. PLoS One 2015;10:e0138118. [PMID: 26397984 PMCID: PMC4580569 DOI: 10.1371/journal.pone.0138118] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/03/2015] [Accepted: 08/25/2015] [Indexed: 11/20/2022]

Abstract

Metalloproteases have evolved in a vast number of biological systems, being one of the most diverse types of proteases and presenting a wide range of folds and catalytic metal ions. Given the increasing understanding of protein internal dynamics and its role in enzyme function, we are interested in assessing how the structural heterogeneity of metalloproteases translates into their dynamics. Therefore, the dynamical profile of the clan MA type protein thermolysin, derived from an Elastic Network Model of protein structure, was evaluated against those obtained from a set of experimental structures and molecular dynamics simulation trajectories. A close correspondence was obtained between modes derived from the coarse-grained model and the subspace of functionally relevant motions observed experimentally, the latter being shown to be encoded in the internal dynamics of the protein. This prompted the use of dynamics-based comparison methods that employ such coarse-grained models in a representative set of clan members, allowing for its quantitative description in terms of structural and dynamical variability. Although members show structural similarity, they nonetheless present distinct dynamical profiles, with no apparent correlation between structural and dynamical relatedness. However, previously unnoticed dynamical similarity was found between the relevant members Carboxypeptidase Pfu, Leishmanolysin, and Botulinum Neurotoxin Type A, despite sharing no structural similarity. Inspection of the respective alignments shows that dynamical similarity has a functional basis, namely the need for maintaining proper intermolecular interactions with the respective substrates. These results suggest that distinct selective pressure mechanisms act on metalloproteases at structural and dynamical levels through the course of their evolution. This work shows how new insights on metalloprotease function and evolution can be assessed with comparison schemes that incorporate information on protein dynamics. The integration of these newly developed tools, if applied to other protein families, can lead to more accurate and descriptive protein classification systems.

Khan IK, Wei Q, Chapman S, KC DB, Kihara D. The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches. Gigascience 2015;4:43. [PMID: 26380077 PMCID: PMC4570625 DOI: 10.1186/s13742-015-0083-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 12/31/2014] [Accepted: 08/27/2015] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

Functional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple, in-house AFP methods. Here, we report benchmark results of our methods obtained in the course of preparation for CAFA2 prior to submitting function predictions for CAFA2 targets.

RESULTS

For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performances using the original (older) and updated databases. Performance evaluations for PFP with different settings and for ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performances of our prediction methods by enriching the predictions with prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.

CONCLUSIONS

Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms did not make much improvement. Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive powers of sequence-based function prediction methods in general.
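
As a rough illustration of the Fmax score reported above, the following Python sketch computes a protein-centric Fmax in the usual CAFA sense: for each score threshold, precision is averaged over proteins with at least one prediction at that threshold, recall is averaged over all benchmark proteins, and Fmax is the largest harmonic mean found over the thresholds. The function name `fmax`, the dictionary-based input layout, and the 0.01 threshold grid are assumptions made for this example and are not taken from the abstract; propagation of GO terms to their ancestors is omitted.

```python
def fmax(predictions, truths, thresholds=None):
    """Protein-centric Fmax (illustrative sketch).

    predictions: dict protein -> dict GO term -> confidence score in [0, 1]
    truths:      dict protein -> non-empty set of true GO terms
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []   # one value per protein that has a prediction at t
        recall_sum = 0.0  # accumulated over every benchmark protein
        for protein, true_terms in truths.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms)
        if not precisions:
            continue  # no protein has any prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truths)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Hypothetical toy data, for illustration only.
toy_predictions = {
    "P1": {"GO:1": 0.9, "GO:2": 0.4},
    "P2": {"GO:3": 0.8},
}
toy_truths = {"P1": {"GO:1"}, "P2": {"GO:3", "GO:4"}}
print(round(fmax(toy_predictions, toy_truths), 3))
```

Because recall is averaged over all proteins while precision only counts proteins with predictions, a method that predicts confidently for few proteins can reach high precision but is penalized in recall.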

Mudgal R, Sandhya S, Chandra N, Srinivasan N. De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods. Biol Direct 2015;10:38. [PMID: 26228684 PMCID: PMC4520260 DOI: 10.1186/s13062-015-0069-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 03/10/2015] [Accepted: 07/20/2015] [Indexed: 12/23/2022] Open

Abstract

Background

In the post-genomic era where sequences are being determined at a rapid rate, we are highly reliant on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Family” (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families in search for both structure and function.

Results

We applied a ‘computational structural genomics’ approach using five state-of-the-art remote similarity detection methods to detect the relationship between uncharacterized DUFs and domain families of known structures. The association with a structural domain family could serve as a starting point in elucidating the function of a DUF. Amongst these five methods, searches in SCOP-NrichD database have been applied for the first time. Predictions were classified into high, medium and low confidence based on the consensus of results from various approaches and also annotated with enzyme and Gene Ontology terms. 614 uncharacterized DUFs could be associated with a known structural domain, of which high confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed on-line at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659.

Conclusions

This study provides insights into the structure and potential function for nearly 20% of the DUFs. Use of different computational approaches enables us to reliably recognize distant relationships, especially when they converge to a common assignment because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role is still ‘non-trivial’ with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners.

Reviewers

This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.

Mills CL, Beuning PJ, Ondrechen MJ. Biochemical functional predictions for protein structures of unknown or uncertain function. Comput Struct Biotechnol J 2015;13:182-91. [PMID: 25848497 PMCID: PMC4372640 DOI: 10.1016/j.csbj.2015.02.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 12/03/2014] [Revised: 02/06/2015] [Accepted: 02/11/2015] [Indexed: 01/07/2023] Open

Bianchi V, Mangone I, Ferrè F, Helmer-Citterich M, Ausiello G. webPDBinder: a server for the identification of ligand binding sites on protein structures. Nucleic Acids Res 2013;41:W308-13. [PMID: 23737450 PMCID: PMC3692056 DOI: 10.1093/nar/gkt457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022] Open

Chitale M, Khan IK, Kihara D. In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment. BMC Bioinformatics 2013;14 Suppl 3:S2. [PMID: 23514353 PMCID: PMC3584938 DOI: 10.1186/1471-2105-14-s3-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/23/2023] Open

Nam HJ, Han SK, Bowie JU, Kim S. Rampant exchange of the structure and function of extramembrane domains between membrane and water soluble proteins. PLoS Comput Biol 2013;9:e1002997. [PMID: 23555228 PMCID: PMC3605051 DOI: 10.1371/journal.pcbi.1002997] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/18/2012] [Accepted: 02/04/2013] [Indexed: 11/19/2022] Open

Khan I, Chitale M, Rayon C, Kihara D. Evaluation of function predictions by PFP, ESG, and PSI-BLAST for moonlighting proteins. BMC Proc 2012;6 Suppl 7:S5. [PMID: 23173871 PMCID: PMC3504920 DOI: 10.1186/1753-6561-6-s7-s5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 01/03/2023] Open

Abstract

Background

Advancements in function prediction algorithms are enabling large scale computational annotation for newly sequenced genomes. With the increase in the number of functionally well characterized proteins it has been observed that there are many proteins involved in more than one function. These proteins characterized as moonlighting proteins show varied functional behavior depending on the cell type, localization in the cell, oligomerization, multiple binding sites, etc. The functional diversity shown by moonlighting proteins may have significant impact on the traditional sequence based function prediction methods. Here we investigate how well diverse functions of moonlighting proteins can be predicted by some existing function prediction methods.

Results

We have analyzed the performances of three major sequence based function prediction methods, PSI-BLAST, the Protein Function Prediction (PFP), and the Extended Similarity Group (ESG) on predicting diverse functions of moonlighting proteins. In predicting discrete functions of a set of 19 experimentally identified moonlighting proteins, PFP showed overall highest recall among the three methods. Although ESG showed the highest precision, its recall was lower than PSI-BLAST. Recall by PSI-BLAST greatly improved when BLOSUM45 was used instead of BLOSUM62.

Conclusion

We have analyzed the performances of PFP, ESG, and PSI-BLAST in predicting the functional diversity of moonlighting proteins. PFP shows overall better performance in predicting diverse moonlighting functions as compared with PSI-BLAST and ESG. Recall by PSI-BLAST greatly improved when BLOSUM45 was used. This analysis indicates that considering weakly similar sequences in prediction enhances the performance of sequence based AFP methods in predicting functional diversity of moonlighting proteins. The current study will also motivate development of novel computational frameworks for automatic identification of such proteins.

Wass MN, Barton G, Sternberg MJE. CombFunc: predicting protein function using heterogeneous data sources. Nucleic Acids Res 2012;40:W466-70. [PMID: 22641853 PMCID: PMC3394346 DOI: 10.1093/nar/gks489] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 12/19/2022] Open

Roy A, Yang J, Zhang Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res 2012;40:W471-7. [PMID: 22570420 PMCID: PMC3394312 DOI: 10.1093/nar/gks372] [Citation(s) in RCA: 460] [Impact Index Per Article: 38.3] [Indexed: 11/12/2022] Open

Bianchi V, Gherardini PF, Helmer-Citterich M, Ausiello G. Identification of binding pockets in protein structures using a knowledge-based potential derived from local structural similarities. BMC Bioinformatics 2012;13 Suppl 4:S17. [PMID: 22536963 PMCID: PMC3434446 DOI: 10.1186/1471-2105-13-s4-s17] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023] Open

Sehnal D, Vařeková RS, Huber HJ, Geidl S, Ionescu CM, Wimmerová M, Koča J. SiteBinder: an improved approach for comparing multiple protein structural motifs. J Chem Inf Model 2012;52:343-59. [PMID: 22296449 DOI: 10.1021/ci200444d] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]

Abstract

There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite their binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings come to support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
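
The abstract above mentions computing RMSD values for atom pairings by quaternion algebra. A minimal sketch of one well-known quaternion formulation (Horn's closed-form method) is given below in pure Python; whether SiteBinder uses exactly this formulation is not stated in the abstract, so this is an assumption. The helper names `centered` and `superposed_rmsd`, the power-iteration eigenvalue solver, and its fixed starting vector are likewise choices made for this example.

```python
import math


def centered(points):
    """Translate a list of 3D points so that its centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def superposed_rmsd(a, b, iterations=500):
    """RMSD between paired point lists a and b after optimal rigid superposition."""
    x, y = centered(a), centered(b)
    n = len(x)
    # Cross-covariance sums S[j][k] = sum_i x_i[j] * y_i[k]
    S = [[sum(p[j] * q[k] for p, q in zip(x, y)) for k in range(3)]
         for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix; its largest eigenvalue equals the maximum
    # over all proper rotations R of sum_i y_i . (R x_i).
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by the Frobenius norm so that N + shift*I has no negative
    # eigenvalues, then run power iteration to find the top eigenvector.
    shift = math.sqrt(sum(v * v for row in N for v in row))
    lam = 0.0
    if shift > 0:
        v = [1.0, 0.7, 0.4, 0.2]  # arbitrary starting vector (assumption)
        for _ in range(iterations):
            w = [sum(N[r][c] * v[c] for c in range(4)) + shift * v[r]
                 for r in range(4)]
            norm = math.sqrt(sum(t * t for t in w))
            v = [t / norm for t in w]
        # Rayleigh quotient of the unit vector v with respect to N
        lam = sum(v[r] * N[r][c] * v[c] for r in range(4) for c in range(4))
    sq = sum(t * t for p in x for t in p) + sum(t * t for p in y for t in p)
    return math.sqrt(max(0.0, sq - 2.0 * lam) / n)


# Two atoms 1 apart versus two atoms 2 apart: each atom ends up 0.5 off.
print(round(superposed_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)]), 6))
```

Only the largest eigenvalue is needed to obtain the RMSD, which is why no rotation matrix is built here; a production implementation would normally use a library eigen-solver rather than power iteration.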

Sael L, Chitale M, Kihara D. Structure- and sequence-based function prediction for non-homologous proteins. J Struct Funct Genomics 2012;13:111-23. [PMID: 22270458 DOI: 10.1007/s10969-012-9126-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/12/2011] [Accepted: 01/10/2012] [Indexed: 01/14/2023]

Schmidt T, Haas J, Gallo Cassarino T, Schwede T. Assessment of ligand-binding residue predictions in CASP9. Proteins 2011;79 Suppl 10:126-36. [PMID: 21987472 DOI: 10.1002/prot.23174] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 05/23/2011] [Revised: 07/29/2011] [Accepted: 08/04/2011] [Indexed: 11/06/2022]

Gamliel R, Kedem K, Kolodny R, Keasar C. A library of protein surface patches discriminates between native structures and decoys generated by structure prediction servers. BMC Struct Biol 2011;11:20. [PMID: 21542935 PMCID: PMC3114701 DOI: 10.1186/1472-6807-11-20] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/03/2010] [Accepted: 05/04/2011] [Indexed: 11/10/2022]

Abstract

Background

Protein surfaces serve as an interface with the molecular environment and are thus tightly bound to protein function. On the surface, geometric and chemical complementarity to other molecules provides interaction specificity for ligand binding, docking of bio-macromolecules, and enzymatic catalysis.

As of today, there is no accepted general scheme to represent protein surfaces. Furthermore, most of the research on protein surface focuses on regions of specific interest such as interaction, ligand binding, and docking sites. We present a first step toward a general purpose representation of protein surfaces: a novel surface patch library that represents most surface patches (~98%) in a data set regardless of their functional roles.

Results

Surface patches, in this work, are small fractions of the protein surface. Using a measure of inter-patch distance, we clustered patches extracted from a data set of high quality, non-redundant, proteins. The surface patch library is the collection of all the cluster centroids; thus, each of the data set patches is close to one of the elements in the library.

We demonstrate the biological significance of our method through the ability of the library to capture surface characteristics of native protein structures as opposed to those of decoy sets generated by state-of-the-art protein structure prediction methods. The patches of the decoys are significantly less compatible with the library than their corresponding native structures, allowing us to reliably distinguish native models from models generated by servers. This trend, however, does not extend to the decoys themselves, as their similarity to the native structures does not correlate with compatibility with the library.

Conclusions

We expect that this high-quality, generic surface patch library will add a new perspective to the description of protein structures and improve our ability to predict them. In particular, we expect that it will help improve the prediction of surface features that are apparently neglected by current techniques.

The surface patch libraries are publicly available at http://www.cs.bgu.ac.il/~keasar/patchLibrary.

Bertran AGM, Oliveira AS, Nagata T, Resende RO. Molecular characterization of the RNA-dependent RNA polymerase from groundnut ringspot virus (genus Tospovirus, family Bunyaviridae). Arch Virol 2011;156:1425-9. [PMID: 21442231 DOI: 10.1007/s00705-011-0973-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/11/2011] [Accepted: 03/07/2011] [Indexed: 11/26/2022]

Somarowthu S, Yang H, Hildebrand DG, Ondrechen MJ. High-performance prediction of functional residues in proteins with machine learning and computed input features. Biopolymers 2011;95:390-400. [DOI: 10.1002/bip.21589] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]

Grant MA. Integrating computational protein function prediction into drug discovery initiatives. Drug Dev Res 2010;72:4-16. [PMID: 25530654 DOI: 10.1002/ddr.20397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/24/2022]

Venner E, Lisewski AM, Erdin S, Ward RM, Amin SR, Lichtarge O. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities. PLoS One 2010;5:e14286. [PMID: 21179190 PMCID: PMC3001439 DOI: 10.1371/journal.pone.0014286] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 06/29/2010] [Accepted: 11/10/2010] [Indexed: 12/24/2022] Open

Moll M, Bryant DH, Kavraki LE. The LabelHash algorithm for substructure matching. BMC Bioinformatics 2010;11:555. [PMID: 21070651 PMCID: PMC2996407 DOI: 10.1186/1471-2105-11-555] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 07/08/2010] [Accepted: 11/11/2010] [Indexed: 08/30/2023] Open

Abstract

Background

There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity.

Results

We present LabelHash, a novel algorithm for matching substructural motifs to large collections of protein structures. The algorithm consists of two phases. In the first phase the proteins are preprocessed in a fashion that allows for instant lookup of partial matches to any motif. In the second phase, partial matches for a given motif are expanded to complete matches. The general applicability of the algorithm is demonstrated with three different case studies. First, we show that we can accurately identify members of the enolase superfamily with a single motif. Next, we demonstrate how LabelHash can complement SOIPPA, an algorithm for motif identification and pairwise substructure alignment. Finally, a large collection of Catalytic Site Atlas motifs is used to benchmark the performance of the algorithm. LabelHash runs very efficiently in parallel; matching a motif against all proteins in the 95% sequence identity filtered non-redundant Protein Data Bank typically takes no more than a few minutes. The LabelHash algorithm is available through a web server and as a suite of standalone programs at http://labelhash.kavrakilab.org. The output of the LabelHash algorithm can be further analyzed with Chimera through a plugin that we developed for this purpose.
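
The two-phase idea described above (build a lookup structure once, then expand partial matches into complete ones) can be illustrated with a toy index-then-expand matcher. This is not the LabelHash implementation: the pair-based index, the distance-only geometric check, and the names `build_index`, `match_motif`, `max_dist` and `tol` are all assumptions made for this sketch.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_index(residues, max_dist=10.0):
    """Phase 1: index residue pairs closer than max_dist by their label pair.

    residues: list of (label, (x, y, z)) tuples for one structure.
    """
    index = defaultdict(list)
    for i, j in combinations(range(len(residues)), 2):
        (li, pi), (lj, pj) = residues[i], residues[j]
        if math.dist(pi, pj) <= max_dist:
            index[(li, lj)].append((i, j))
            index[(lj, li)].append((j, i))
    return index


def match_motif(motif, residues, index, tol=0.5):
    """Phase 2: look up seeds for the first two motif points, then extend.

    motif: list of at least two (label, (x, y, z)) tuples.
    Returns tuples of residue indices, one index per motif point.
    """
    (l0, m0), (l1, m1) = motif[0], motif[1]
    target = math.dist(m0, m1)
    partial = [
        [i, j]
        for i, j in index.get((l0, l1), [])
        if abs(math.dist(residues[i][1], residues[j][1]) - target) <= tol
    ]
    for k in range(2, len(motif)):
        label, point = motif[k]
        extended = []
        for match in partial:
            for r, (r_label, r_point) in enumerate(residues):
                if r_label != label or r in match:
                    continue
                # Every distance to an already matched residue must agree
                # with the corresponding motif distance within tol.
                if all(
                    abs(math.dist(r_point, residues[match[q]][1])
                        - math.dist(point, motif[q][1])) <= tol
                    for q in range(k)
                ):
                    extended.append(match + [r])
        partial = extended
    return [tuple(m) for m in partial]


# Hypothetical structure and motif, for illustration only.
structure = [
    ("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("SER", (0, 4, 0)),
    ("HIS", (20, 20, 20)), ("GLY", (1, 1, 1)),
]
triad = [("HIS", (10, 10, 10)), ("ASP", (13, 10, 10)), ("SER", (10, 14, 10))]
print(match_motif(triad, structure, build_index(structure)))
```

Because this sketch compares only inter-residue distances, it also accepts mirror-image arrangements; a real matcher would typically verify candidates with a superposition step.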

Conclusions

LabelHash is an efficient, versatile algorithm for large-scale substructure matching. When LabelHash is running in parallel, motifs can typically be matched against the entire PDB on the order of minutes. The algorithm is able to identify functional homologs beyond the twilight zone of sequence identity and even beyond fold similarity. The three case studies presented in this paper illustrate the versatility of the algorithm.

Parca L, Gherardini PF, Helmer-Citterich M, Ausiello G. Phosphate binding sites identification in protein structures. Nucleic Acids Res 2010;39:1231-42. [PMID: 20974634 PMCID: PMC3045618 DOI: 10.1093/nar/gkq987] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023] Open

Unmet challenges of structural genomics. Curr Opin Struct Biol 2010;20:587-97. [PMID: 20810277 DOI: 10.1016/j.sbi.2010.08.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 06/05/2010] [Revised: 07/30/2010] [Accepted: 08/03/2010] [Indexed: 11/22/2022]

Li GH, Huang JF. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation. BMC Bioinformatics 2010;11:439. [PMID: 20796320 PMCID: PMC2936402 DOI: 10.1186/1471-2105-11-439] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 11/26/2009] [Accepted: 08/27/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The rapid development of structural genomics has resulted in many proteins of unknown function being deposited in the Protein Data Bank (PDB); the functional prediction of these proteins has thus become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need further improvement in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on local protein structural similarity. The algorithm was evaluated on a test set of 164 enzyme families and compared with other methods.

RESULTS

The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active site convergence. Compared with other local structure comparison-based methods, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites were suggested that are not recorded in the Catalytic Site Atlas (CSA).

CONCLUSIONS

CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. CMASA can be used in large-scale enzyme active site annotation and is available through a mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm).

Gherardini PF, Ausiello G, Helmer-Citterich M. Superpose3D: a local structural comparison program that allows for user-defined structure representations. PLoS One 2010;5:e11988. [PMID: 20700534 PMCID: PMC2916828 DOI: 10.1371/journal.pone.0011988] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 04/27/2010] [Accepted: 07/08/2010] [Indexed: 11/19/2022] Open

Wass MN, Kelley LA, Sternberg MJE. 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 2010;38:W469-73. [PMID: 20513649 PMCID: PMC2896164 DOI: 10.1093/nar/gkq406] [Citation(s) in RCA: 457] [Impact Index Per Article: 32.6] [Indexed: 12/01/2022] Open

Cilia E, Passerini A. Automatic prediction of catalytic residues by modeling residue structural neighborhood. BMC Bioinformatics 2010;11:115. [PMID: 20199672 PMCID: PMC2844391 DOI: 10.1186/1471-2105-11-115] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 02/06/2009] [Accepted: 03/03/2010] [Indexed: 02/05/2023] Open