1
Pathira Kankanamge L, Mora A, Ondrechen MJ, Beuning PJ. Biochemical Activity of 17 Cancer-Associated Variants of DNA Polymerase Kappa Predicted by Electrostatic Properties. Chem Res Toxicol 2023; 36:1789-1803. [PMID: 37883788 PMCID: PMC10664756 DOI: 10.1021/acs.chemrestox.3c00233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/28/2023]
Abstract
DNA damage and repair have been widely studied in relation to cancer and therapeutics. Y-family DNA polymerases can bypass DNA lesions, which may result from external or internal DNA damaging agents, including some chemotherapy agents. Overexpression of the Y-family polymerase human pol kappa can result in tumorigenesis and drug resistance in cancer. This report describes the use of computational tools to predict the effects of single nucleotide polymorphism variants on pol kappa activity. Partial Order Optimum Likelihood (POOL), a machine learning method that uses input features from Theoretical Microscopic Titration Curve Shapes (THEMATICS), was used to identify amino acid residues most likely involved in catalytic activity. The μ4 value, a metric obtained from POOL and THEMATICS that serves as a measure of the degree of coupling between one ionizable amino acid and its neighbors, was then used to identify which protein mutations are likely to impact the biochemical activity. Bioinformatic tools SIFT, PolyPhen-2, and FATHMM predicted most of these variants to be deleterious to function. Along with computational and bioinformatic predictions, we characterized the catalytic activity and stability of 17 cancer-associated DNA pol kappa variants. We identified pol kappa variants R48I, H105Y, G147D, G154E, V177L, R298C, E362V, and R470C as having lower activity relative to wild-type pol kappa; the pol kappa variants T102A, H142Y, R175Q, E210K, Y221C, N330D, N338S, K353T, and L383F were identified as being similar in catalytic efficiency to WT pol kappa. We observed that POOL predictions can be used to predict which variants have decreased activity. Predictions from bioinformatic tools like SIFT, PolyPhen-2, and FATHMM are based on sequence comparisons and therefore are complementary to POOL but are less capable of predicting biochemical activity. 
These bioinformatic and computational tools can be used to identify SNP variants with deleterious effects and altered biochemical activity from a large data set.
Affiliation(s)
- Lakindu S. Pathira Kankanamge
  - Department of Chemistry and Chemical Biology and Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
- Alexandra Mora
  - Department of Chemistry and Chemical Biology and Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
- Mary Jo Ondrechen
  - Department of Chemistry and Chemical Biology and Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
- Penny J. Beuning
  - Department of Chemistry and Chemical Biology and Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
2
Khan K, Alhar MSO, Abbas MN, Abbas SQ, Kazi M, Khan SA, Sadiq A, Hassan SSU, Bungau S, Jalal K. Integrated Bioinformatics-Based Subtractive Genomics Approach to Decipher the Therapeutic Drug Target and Its Possible Intervention against Brucellosis. Bioengineering (Basel) 2022; 9:633. [PMID: 36354544 PMCID: PMC9687753 DOI: 10.3390/bioengineering9110633] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/16/2022] [Revised: 10/28/2022] [Accepted: 10/29/2022] [Indexed: 11/16/2023] Open
Abstract
Brucella suis, one of the causative agents of brucellosis, is Gram-negative intracellular bacteria that may be found all over the globe and it is a significant facultative zoonotic pathogen found in livestock. It may adapt to a phagocytic environment, reproduce, and develop resistance to harmful environments inside host cells, which is a crucial part of the Brucella life cycle making it a worldwide menace. The molecular underpinnings of Brucella pathogenicity have been substantially elucidated due to comprehensive methods such as proteomics. Therefore, we aim to explore the complete Brucella suis proteome to prioritize the novel proteins as drug targets via subtractive proteo-genomics analysis, an effort to conjecture the existence of distinct pathways in the development of brucellosis. Consequently, 38 unique metabolic pathways having 503 proteins were observed while among these 503 proteins, the non-homologs (n = 421), essential (n = 350), drug-like (n = 114), virulence (n = 45), resistance (n = 42), and unique to pathogen proteins were retrieved from Brucella suis. The applied subsequent hierarchical shortlisting resulted in a protein, i.e., isocitrate lyase, that may act as potential drug target, which was finalized after the extensive literature survey. The interacting partners for these shortlisted drug targets were identified through the STRING database. Moreover, structure-based studies were also performed on isocitrate lyase to further analyze its function. For that purpose, ~18,000 ZINC compounds were screened to identify new potent drug candidates against isocitrate lyase for brucellosis. It resulted in the shortlisting of six compounds, i.e., ZINC95543764, ZINC02688148, ZINC20115475, ZINC04232055, ZINC04231816, and ZINC04259566 that potentially inhibit isocitrate lyase. 
However, the ADMET profiling showed that all compounds fulfill ADMET properties except for ZINC20115475 showing positive Ames activity; whereas, ZINC02688148, ZINC04259566, ZINC04232055, and ZINC04231816 showed hepatoxicity while all compounds were observed to have no skin sensitization. In light of these parameters, we recommend ZINC95543764 compound for further experimental studies. According to the present research, which uses subtractive genomics, proteins that might serve as therapeutic targets and potential lead options for eradicating brucellosis have been narrowed down.
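The hierarchical shortlisting described in this abstract can be pictured as a chain of filters, each applied to the survivors of the previous one. The following is only a minimal sketch of that idea: the `Protein` record, its field names, the filter order, and the toy data are hypothetical and are not taken from the paper, and reading "non-homologs" as "no homolog in the host" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    # Hypothetical annotation flags; a real pipeline would derive these
    # from homology searches and curated databases.
    name: str
    host_homolog: bool
    essential: bool
    druggable: bool
    virulence: bool


def shortlist(proteins):
    """Apply the filters in sequence; return survivors and per-step counts."""
    steps = [
        ("non-homologous to host", lambda p: not p.host_homolog),
        ("essential", lambda p: p.essential),
        ("drug-like", lambda p: p.druggable),
        ("virulence-associated", lambda p: p.virulence),
    ]
    remaining = list(proteins)
    counts = []
    for label, keep in steps:
        remaining = [p for p in remaining if keep(p)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Toy input (invented for illustration only).
toy = [
    Protein("P1", host_homolog=True, essential=True, druggable=True, virulence=True),
    Protein("P2", host_homolog=False, essential=False, druggable=True, virulence=True),
    Protein("P3", host_homolog=False, essential=True, druggable=True, virulence=False),
    Protein("P4", host_homolog=False, essential=True, druggable=True, virulence=True),
]
survivors, counts = shortlist(toy)
for label, n in counts:
    print(f"{label}: {n} remaining")
```

Recording the count after every step mirrors the way the abstract reports a shrinking set of candidates at each stage.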
Affiliation(s)
- Kanwal Khan
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi City 75270, Pakistan
- Muhammad Naseer Abbas
  - Department of Pharmacy, Kohat University of Science and Technology, Kohat 26000, Pakistan
- Syed Qamar Abbas
  - Department of Pharmacy, Sarhad University of Science and Technology, Peshawar 25000, Pakistan
- Mohsin Kazi
  - Department of Pharmaceutics, College of Pharmacy, P.O. Box-2457, King Saud University, Riyadh 11451, Saudi Arabia
- Saeed Ahmad Khan
  - Department of Pharmacy, Kohat University of Science and Technology, Kohat 26000, Pakistan
  - Division of Molecular Pharmaceutics and Drug Delivery, The University of Texas at Austin, 2409 University Ave., Austin, TX 78712, USA
- Abdul Sadiq
  - Department of Pharmacy, Faculty of Biological Sciences, University of Malakand, Chakdara 18000, Pakistan
- Syed Shams ul Hassan
  - Shanghai Key Laboratory for Molecular Engineering of Chiral Drugs, School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
  - Department of Natural Product Chemistry, School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Simona Bungau
  - Department of Pharmacy, Faculty of Medicine and Pharmacy, University of Oradea, 410028 Oradea, Romania
- Khurshid Jalal
  - HEJ Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi City 75270, Pakistan
3
Ngu L, Winters JN, Nguyen K, Ramos KE, DeLateur NA, Makowski L, Whitford PC, Ondrechen MJ, Beuning PJ. Probing remote residues important for catalysis in Escherichia coli ornithine transcarbamoylase. PLoS One 2020; 15:e0228487. [PMID: 32027716 PMCID: PMC7004355 DOI: 10.1371/journal.pone.0228487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/14/2019] [Accepted: 01/16/2020] [Indexed: 12/14/2022] Open
Abstract
Understanding how enzymes achieve their tremendous catalytic power is a major question in biochemistry. Greater understanding is also needed for enzyme engineering applications. In many cases, enzyme efficiency and specificity depend on residues not in direct contact with the substrate, termed remote residues. This work focuses on Escherichia coli ornithine transcarbamoylase (OTC), which plays a central role in amino acid metabolism. OTC has been reported to undergo an induced-fit conformational change upon binding its first substrate, carbamoyl phosphate (CP), and several residues important for activity have been identified. Using computational methods based on the computed chemical properties from theoretical titration curves, sequence-based scores derived from evolutionary history, and protein surface topology, residues important for catalytic activity were predicted. The roles of these residues in OTC activity were tested by constructing mutations at predicted positions, followed by steady-state kinetics assays and substrate binding studies with the variants. First-layer mutations R57A and D231A, second-layer mutation H272L, and third-layer mutation E299Q, result in 57- to 450-fold reductions in kcat/KM with respect to CP and 44- to 580-fold reductions with respect to ornithine. Second-layer mutations D140N and Y160S also reduce activity with respect to ornithine. Most variants had decreased stability relative to wild-type OTC, with variants H272L, H272N, and E299Q having the greatest decreases. Variants H272L, E299Q, and R57A also show compromised CP binding. In addition to direct effects on catalytic activity, effects on overall protein stability and substrate binding were observed that reveal the intricacies of how these residues contribute to catalysis.
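The fold reductions in kcat/KM quoted in this abstract are ratios of catalytic efficiencies: the wild-type value divided by the variant value. A minimal sketch of that arithmetic follows; the kinetic constants are invented for illustration and are not measurements from the paper.

```python
def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/KM (units follow the inputs, e.g. s^-1 mM^-1)."""
    if km <= 0:
        raise ValueError("KM must be positive")
    return kcat / km


def fold_reduction(wild_type_eff, variant_eff):
    """How many times lower the variant's efficiency is than wild type."""
    if variant_eff <= 0:
        raise ValueError("variant efficiency must be positive")
    return wild_type_eff / variant_eff


# Hypothetical constants, chosen only to make the arithmetic easy to follow.
wt = catalytic_efficiency(kcat=100.0, km=0.5)    # 200.0
variant = catalytic_efficiency(kcat=2.0, km=2.0)  # 1.0
print(fold_reduction(wt, variant))                # 200.0
```

Because the ratio combines kcat and KM, a large fold reduction can come from slower turnover, weaker apparent substrate binding, or both, which is why the study also reports separate binding measurements.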
Affiliation(s)
- Lisa Ngu
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
- Jenifer N. Winters
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
- Kien Nguyen
  - Department of Physics, Northeastern University, Boston, MA, United States of America
- Kevin E. Ramos
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
- Nicholas A. DeLateur
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
- Lee Makowski
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
  - Department of Bioengineering, Northeastern University, Boston, MA, United States of America
- Paul C. Whitford
  - Department of Physics, Northeastern University, Boston, MA, United States of America
- Mary Jo Ondrechen
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
  - * E-mail: (MJO); (PJB)
- Penny J. Beuning
  - Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA, United States of America
  - * E-mail: (MJO); (PJB)
4
Jimenez-Rosales A, Flores-Merino MV. Tailoring Proteins to Re-Evolve Nature: A Short Review. Mol Biotechnol 2018; 60:946-974. [DOI: 10.1007/s12033-018-0122-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 02/02/2023]
5
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023] Open
Abstract
Background: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking.
Results: In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions.
Conclusions: Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
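The two figures of merit quoted in this abstract, the Matthews correlation coefficient (MCC) and recall, are both computed from a binary confusion matrix. The sketch below uses the standard definitions; the per-residue labels are a toy example and do not come from the paper's CASP targets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def recall(tp, fn):
    """Fraction of true positives that were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels: 1 = functional-site residue, 0 = other residue.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(round(mcc(tp, tn, fp, fn), 3), round(recall(tp, fn), 3))
```

MCC uses all four cells of the confusion matrix, so it stays informative when functional-site residues are a small minority, whereas recall alone ignores false positives.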
Affiliation(s)
- Min Han
  - Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Yifan Song
  - Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Jiaqiang Qian
  - Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Dengming Ming
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China
6
Mills CL, Garg R, Lee JS, Tian L, Suciu A, Cooperman GD, Beuning PJ, Ondrechen MJ. Functional classification of protein structures by local structure matching in graph representation. Protein Sci 2018; 27:1125-1135. [PMID: 29604149 PMCID: PMC5980557 DOI: 10.1002/pro.3416] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/21/2017] [Revised: 03/21/2018] [Accepted: 03/26/2018] [Indexed: 11/08/2022]
Abstract
As a result of high‐throughput protein structure initiatives, over 14,400 protein structures have been solved by Structural Genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP‐Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP‐Func method to our previously reported method, Structurally Aligned Local Sites of Activity (SALSA), using the Ribulose Phosphate Binding Barrel (RPBB), 6‐Hairpin Glycosidase (6‐HG), and Concanavalin A‐like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP‐Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP‐Func methods to predict function. Forty‐one SG proteins in the RPBB superfamily, nine SG proteins in the 6‐HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community.
Affiliation(s)
- Caitlyn L Mills
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
- Rohan Garg
  - College of Computer and Information Science, Northeastern University, Boston, Massachusetts
- Joslynn S Lee
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
- Liang Tian
  - Department of Mathematics, Northeastern University, Boston, Massachusetts
- Alexandru Suciu
  - Department of Mathematics, Northeastern University, Boston, Massachusetts
- Gene D Cooperman
  - College of Computer and Information Science, Northeastern University, Boston, Massachusetts
- Penny J Beuning
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
- Mary Jo Ondrechen
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
7
Du Y, Wu NC, Jiang L, Zhang T, Gong D, Shu S, Wu TT, Sun R. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis. mBio 2016; 7:e01801-16. [PMID: 27803181 PMCID: PMC5090041 DOI: 10.1128/mbio.01801-16] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/27/2016] [Accepted: 10/07/2016] [Indexed: 11/28/2022] Open
Abstract
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available.
IMPORTANCE: To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available.
Affiliation(s)
- Yushen Du
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Cancer Institute, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, ZJU-UCLA Joint Center for Medical Education and Research, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Nicholas C Wu
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, USA
- Lin Jiang
  - Department of Neurology, University of California Los Angeles, Los Angeles, California, USA
- Tianhao Zhang
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, USA
- Danyang Gong
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
- Sara Shu
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
- Ting-Ting Wu
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
- Ren Sun
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Cancer Institute, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, ZJU-UCLA Joint Center for Medical Education and Research, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, USA
8
Fakhar Z, Naiker S, Alves CN, Govender T, Maguire GEM, Lameira J, Lamichhane G, Kruger HG, Honarparvar B. A comparative modeling and molecular docking study on Mycobacterium tuberculosis targets involved in peptidoglycan biosynthesis. J Biomol Struct Dyn 2016; 34:2399-417. [PMID: 26612108 DOI: 10.1080/07391102.2015.1117397] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
Abstract
An alarming rise of multidrug-resistant Mycobacterium tuberculosis strains and the continuous high global morbidity of tuberculosis have reinvigorated the need to identify novel targets to combat the disease. The enzymes that catalyze the biosynthesis of peptidoglycan in M. tuberculosis are essential and noteworthy therapeutic targets. In this study, the biochemical function and homology modeling of MurI, MurG, MraY, DapE, DapA, Alr, and Ddl enzymes of the CDC1551 M. tuberculosis strain involved in the biosynthesis of peptidoglycan cell wall are reported. Generation of the 3D structures was achieved with Modeller 9.13. To assess the structural quality of the obtained homology modeled targets, the models were validated using PROCHECK, PDBsum, QMEAN, and ERRAT scores. Molecular dynamics simulations were performed to calculate root mean square deviation (RMSD) and radius of gyration (Rg) of MurI and MurG target proteins and their corresponding templates. For further model validation, RMSD and Rg for selected targets/templates were investigated to compare the close proximity of their dynamic behavior in terms of protein stability and average distances. To identify the potential binding mode required for molecular docking, binding site information of all modeled targets was obtained using two prediction algorithms. A docking study was performed for MurI to determine the potential mode of interaction between the inhibitor and the active site residues. This study presents the first accounts of the 3D structural information for the selected M. tuberculosis targets involved in peptidoglycan biosynthesis.
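The two quantities tracked in the molecular dynamics validation, root mean square deviation (RMSD) and radius of gyration (Rg), have simple definitions that can be written down directly. The sketch below is a plain-Python illustration under stated simplifications: the RMSD assumes the two coordinate sets are already superimposed (simulation packages normally perform a least-squares fit first), Rg defaults to equal atomic masses, and the coordinates are toy values rather than data from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points, no fitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of the atoms from their centre of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses
    total_mass = sum(masses)
    centre = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    weighted = sum(
        m * sum((c[i] - centre[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Toy two-atom "structure" and a copy shifted by 1 unit along z.
frame_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame_b = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(frame_a, frame_b), radius_of_gyration(frame_a))
```

Evaluating both functions frame by frame along a trajectory gives the kind of time series that is compared between a homology model and its template.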
Affiliation(s)
- Zeynab Fakhar
  - Catalysis and Peptide Research Unit, School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Suhashni Naiker
  - Catalysis and Peptide Research Unit, School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Claudio N Alves
  - Laboratório de Planejamento de Fármacos, Instituto de Ciências Exatas e Naturais, Instituto de Ciências Biológicas, Universidade Federal do Pará, CEP 66075-110, Belém, Pará, Brazil
- Thavendran Govender
  - Catalysis and Peptide Research Unit, School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Glenn E M Maguire
  - Catalysis and Peptide Research Unit, School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
  - School of Chemistry and Physics, University of KwaZulu-Natal, 4001 Durban, South Africa
- Jeronimo Lameira
  - Laboratório de Planejamento de Fármacos, Instituto de Ciências Exatas e Naturais, Instituto de Ciências Biológicas, Universidade Federal do Pará, CEP 66075-110, Belém, Pará, Brazil
- Gyanu Lamichhane
  - Division of Infectious Diseases, Center for Tuberculosis Research, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Hendrik G Kruger
  - Catalysis and Peptide Research Unit, School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Bahareh Honarparvar
  - Catalysis and Peptide Research Unit, School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
9
Aubailly S, Piazza F. Cutoff lensing: predicting catalytic sites in enzymes. Sci Rep 2015; 5:14874. [PMID: 26445900 PMCID: PMC4597221 DOI: 10.1038/srep14874] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2015] [Accepted: 09/10/2015] [Indexed: 01/12/2023] Open
Abstract
Predicting function-related amino acids in proteins with unknown function or unknown allosteric binding sites in drug-targeted proteins is a task of paramount importance in molecular biomedicine. In this paper we introduce a simple, light and computationally inexpensive structure-based method to identify catalytic sites in enzymes. Our method, termed cutoff lensing, is a general procedure consisting in letting the cutoff used to build an elastic network model increase to large values. A validation of our method against a large database of annotated enzymes shows that optimal values of the cutoff exist such that three different structure-based indicators allow one to recover a maximum of the known catalytic sites. Interestingly, we find that the larger the structures the greater the predictive power afforded by our method. Possible ways to combine the three indicators into a single figure of merit and into a specific sequential analysis are suggested and discussed with reference to the classic case of HIV-protease. Our method could be used as a complement to other sequence- and/or structure-based methods to narrow the results of large-scale screenings.
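The core operation this abstract describes, rebuilding an elastic network model while the interaction cutoff grows, can be illustrated with the connectivity (Kirchhoff) matrix of a simple network in which two nodes are linked whenever they lie within the cutoff. This is only a sketch of the network-building step: the abstract does not name the three structure-based indicators, so none are implemented here, and the node degree is reported purely to show how the network changes with the cutoff. The coordinates and cutoff values are hypothetical.

```python
import math


def kirchhoff(coords, cutoff):
    """Connectivity matrix: -1 for linked pairs, node degree on the diagonal."""
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


def degrees_by_cutoff(coords, cutoffs):
    """Node degrees (matrix diagonal) for each cutoff in the scan."""
    result = {}
    for rc in cutoffs:
        gamma = kirchhoff(coords, rc)
        result[rc] = [gamma[i][i] for i in range(len(coords))]
    return result


# Four hypothetical nodes spaced 4 units apart on a line.
nodes = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0)]
scan = degrees_by_cutoff(nodes, cutoffs=[5, 9, 13])
for rc, deg in scan.items():
    print(rc, deg)
```

As the cutoff increases the network approaches full connectivity, so any indicator computed on it has to be read as a function of the cutoff rather than at a single fixed value.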
Affiliation(s)
- Simon Aubailly
  - Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
- Francesco Piazza
  - Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
10
Xiao X, Hui MJ, Liu Z, Qiu WR. iCataly-PseAAC: Identification of Enzymes Catalytic Sites Using Sequence Evolution Information with Grey Model GM (2,1). J Membr Biol 2015; 248:1033-41. [DOI: 10.1007/s00232-015-9815-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/23/2015] [Accepted: 06/06/2015] [Indexed: 11/25/2022]
11
Tiwari AK, Srivastava R. A survey of computational intelligence techniques in protein function prediction. International Journal of Proteomics 2014; 2014:845479. [PMID: 25574395 PMCID: PMC4276698 DOI: 10.1155/2014/845479] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/10/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 02/08/2023]
Abstract
During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.
Affiliation(s)
- Arvind Kumar Tiwari
  - Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
- Rajeev Srivastava
  - Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
12
Chetty S, Soliman MES. Possible allosteric binding site on Gyrase B, a key target for novel anti-TB drugs: homology modelling and binding site identification using molecular dynamics simulation and binding free energy calculations. Med Chem Res 2014. [DOI: 10.1007/s00044-014-1279-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023]
13
Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011; 79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]
Abstract
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
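The class-imbalance steps (2) and (3) in the abstract above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `undersample` and `class_weights`, the `ratio` parameter, the toy data, and the inverse-frequency weighting heuristic are all assumptions chosen for illustration.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every catalytic sample (label 1) and a random subset of the rest.

    `ratio` is the desired number of non-catalytic samples per catalytic one
    (a hypothetical parameter, not taken from the paper).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep_n = min(len(negatives), int(round(ratio * len(positives))))
    kept = sorted(positives + rng.sample(negatives, keep_n))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def class_weights(labels):
    """Weight each class inversely to its frequency, so that errors on the
    rare (catalytic) class are penalized more during training."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


# Toy data: 5 catalytic residues among 100 (assumed numbers).
labels = [1] * 5 + [0] * 95
samples = list(range(100))

sub_x, sub_y = undersample(samples, labels, ratio=3.0)
weights = class_weights(sub_y)
print(len(sub_y), sum(sub_y), weights)
```

The resulting per-class weights could then be passed to any classifier that accepts per-class error penalties; which penalties the paper actually used is not stated in the abstract.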
Affiliation(s)
- Ran Yahalom
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
|
14
|
Parasuram R, Lee JS, Yin P, Somarowthu S, Ondrechen MJ. Functional classification of protein 3D structures from predicted local interaction sites. J Bioinform Comput Biol 2011; 8 Suppl 1:1-15. [PMID: 21155016 DOI: 10.1142/s0219720010005166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/15/2010] [Revised: 08/25/2010] [Accepted: 09/10/2010] [Indexed: 11/18/2022]
Abstract
A new approach to the functional classification of protein 3D structures is described with application to some examples from structural genomics. This approach is based on functional site prediction with THEMATICS and POOL. THEMATICS employs calculated electrostatic potentials of the query structure. POOL is a machine learning method that utilizes THEMATICS features and has been shown to predict accurate, precise, highly localized interaction sites. Extension to the functional classification of structural genomics proteins is now described. Predicted functionally important residues are structurally aligned with those of proteins with previously characterized biochemical functions. A 3D structure match at the predicted local functional site then serves as a more reliable predictor of biochemical function than an overall structure match. Annotation is confirmed for a structural genomics protein with the ribulose phosphate binding barrel (RPBB) fold. A putative glucoamylase from Bacteroides fragilis (PDB ID 3eu8) is shown to be in fact probably not a glucoamylase. Finally a structural genomics protein from Streptomyces coelicolor annotated as an enoyl-CoA hydratase (PDB ID 3g64) is shown to be misannotated. Its predicted active site does not match the well-characterized enoyl-CoA hydratases of similar structure but rather bears closer resemblance to those of a dehalogenase with similar fold.
Affiliation(s)
- Ramya Parasuram
- Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA 02115, USA
|
15
|
Somarowthu S, Yang H, Hildebrand DG, Ondrechen MJ. High-performance prediction of functional residues in proteins with machine learning and computed input features. Biopolymers 2011; 95:390-400. [DOI: 10.1002/bip.21589] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
|
16
|
Structural bioinformatics: deriving biological insights from protein structures. Interdiscip Sci 2010; 2:347-66. [PMID: 21153779 DOI: 10.1007/s12539-010-0045-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/23/2010] [Revised: 06/18/2010] [Accepted: 06/21/2010] [Indexed: 12/27/2022]
Abstract
Structural bioinformatics can be described as an approach that will help decipher biological insights from protein structures. As an important component of structural biology, this area promises to provide a high resolution understanding of biology by assisting comprehension and interpretation of a large amount of structural data. Biological function of protein molecules can be inferred from their three-dimensional structures by comparing structures, classifying them and transferring function from a related protein or family. It is well known now that the structure space of protein molecules is more conserved than the sequence space, making it important to seek functional associations at the structural level. An added advantage of structural bioinformatics over simpler sequence-based methods is that the former also provides ultimate insights into the mechanisms by which various biological events take place. A bird's eye-view of the different aspects of structural bioinformatics is given here along with various recent advances in the area including how knowledge obtained from structural bioinformatics can be applied in drug discovery.
|
17
|
Sridhar GR, Rao AA, Srinivas K, Nirmala G, Lakshmi G, Suryanarayna D, Rao PVN, Kaladhar DGSVGL, Kumar SV, Devi TU, Nitesh T, Hanuman T. Butyrylcholinesterase in metabolic syndrome. Med Hypotheses 2010; 75:648-51. [PMID: 20797821 DOI: 10.1016/j.mehy.2010.08.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 04/30/2010] [Accepted: 08/01/2010] [Indexed: 01/21/2023]
Abstract
Butyrylcholinesterase may have a role in a number of metabolic functions and could affect the expression of insulin resistance syndrome. We present our integrated work using clinical, biochemical and bioinformatic approaches to delineate the possible function of this enzyme. Initially, we constructed a phylogenetic tree with nucleotide and amino acid sequences and showed the existence of similar sequences in bacteria, plants and other animals. We also demonstrated, by an in silico method, a possible pathogenic role for BChE in the common existence of insulin resistance, type 2 diabetes and Alzheimer's disease, and followed it up with a diabetic mouse study where cognition was slowed along with changes in BChE levels. In the next group of in silico studies, we employed the THEMATICS method to identify the amino acids at the active site and later performed docking studies with drugs. THEMATICS predicted two clusters of ionisable amino acid residues in proximity, one with two residues and another with 11, that showed perturbation in the THEMATICS curves. Using ISIS/Draw 2.5SP4, ARGUSLAB 4.0.1 and HEX 5.1 software, 3-D ligands were docked with the BChE motif (from PDB). We did not find any of the ligands studied with significant docking distance, indicating they did not have direct interaction with the active site. Subsequently we performed in silico studies to compare the secondary structure and domain of BChE. Protein-protein interaction showed the following intersections with BChE: UBE21, CHAT, APOE, AATF, DF ALDH9A1, PDHX, PONI PSME3 and ATP6VOA2. The integrative physiological roles of proteins with poorly known functions can be approached by generating leads in silico, which can be studied in vivo, setting into movement an iterative process.
Affiliation(s)
- Gumpeny R Sridhar
- Endocrine and Diabetes Centre, 15-12-15 Krishananagar, Visakhapatnam 530 002, India.
|
18
|
Xin F, Myers S, Li YF, Cooper DN, Mooney SD, Radivojac P. Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease. Bioinformatics 2010; 26:1975-82. [PMID: 20551136 DOI: 10.1093/bioinformatics/btq319] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/07/2023]
Abstract
MOTIVATION: Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. RESULTS: We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. AVAILABILITY AND IMPLEMENTATION: Source code for the structural kernel is available at www.informatics.indiana.edu/predrag/.
Affiliation(s)
- Fuxiao Xin
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
|
19
|
Du S, Sakurai M. Multivariate analysis of properties of amino acid residues in proteins from a viewpoint of functional site prediction. Chem Phys Lett 2010. [DOI: 10.1016/j.cplett.2010.02.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
|
20
|
Sankararaman S, Sha F, Kirsch JF, Jordan MI, Sjölander K. Active site prediction using evolutionary and structural information. Bioinformatics 2010; 26:617-24. [PMID: 20080507 PMCID: PMC2828116 DOI: 10.1093/bioinformatics/btq008] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/17/2022]
Abstract
Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites. Results: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting. Contact: kimmen@berkeley.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
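The precision and recall definitions given in the abstract above can be made concrete with a short worked example. This is only an illustrative sketch: the function name `precision_recall` and the residue labels are hypothetical and are not taken from the Discern paper or its benchmarks.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of residue identifiers.

    precision = fraction of predicted catalytic residues that are catalytic
    recall    = fraction of catalytic residues that were identified
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical enzyme: 3 annotated catalytic residues, 8 predictions, 2 correct.
actual = {"H57", "D102", "S195"}
predicted = {"H57", "S195", "G193", "D194", "C42", "C58", "A55", "L99"}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because catalytic residues are a small minority of all residues, a predictor can reach moderate recall while precision stays low, which is the regime the abstract describes for the earlier method.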
|
21
|
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023] Open
Abstract
Background: The rate at which protein structures are deposited in the Protein Data Bank surpasses the capacity to characterise them experimentally, and computational methods to analyse these structures have therefore become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional sites, but many are not made available via a publicly-accessible application. Results: Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion: SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
|
22
|
Alterovitz R, Arvey A, Sankararaman S, Dallett C, Freund Y, Sjölander K. ResBoost: characterizing and predicting catalytic residues in enzymes. BMC Bioinformatics 2009; 10:197. [PMID: 19558703 PMCID: PMC2713229 DOI: 10.1186/1471-2105-10-197] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/15/2008] [Accepted: 06/27/2009] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. RESULTS: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). CONCLUSION: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
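The trade-off between sensitivity and false positive rate mentioned in the abstract above can be sketched by sweeping a decision threshold over per-residue scores. This is a generic illustration, not ResBoost itself: the function name `sensitivity_and_fpr`, the scores, and the labels are invented for the example.

```python
def sensitivity_and_fpr(scores, labels, threshold):
    """Sensitivity and false positive rate when a residue is predicted
    catalytic if its score is at least `threshold`."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return sensitivity, fpr


# Hypothetical scores for 8 residues; label 1 marks a catalytic residue.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

results = {t: sensitivity_and_fpr(scores, labels, t) for t in (0.75, 0.5)}
for threshold, (sens, fpr) in results.items():
    print(f"threshold={threshold}: sensitivity={sens:.2f} FPR={fpr:.2f}")
```

Lowering the threshold recovers more catalytic residues at the cost of more false positives, which is the same kind of operating-point choice behind the two sensitivity/false-positive pairs reported in the abstract.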
Affiliation(s)
- Ron Alterovitz
- Department of Computer Science, University of North Carolina at Chapel Hill, USA
- Aaron Arvey
- Department of Computer Science and Engineering, University of California, San Diego, USA
- Sriram Sankararaman
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
- Carolina Dallett
- Department of Bioengineering, University of California, Berkeley, USA
- Yoav Freund
- Department of Computer Science and Engineering, University of California, San Diego, USA
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, USA
|
23
|
Nimrod G, Schushan M, Steinberg DM, Ben-Tal N. Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 2009; 16:1755-63. [PMID: 19081051 DOI: 10.1016/j.str.2008.10.017] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 04/16/2008] [Revised: 10/16/2008] [Accepted: 10/19/2008] [Indexed: 10/21/2022]
Abstract
Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.
Affiliation(s)
- Guy Nimrod
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
|
24
|
Tong W, Wei Y, Murga LF, Ondrechen MJ, Williams RJ. Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D Structure and sequence properties. PLoS Comput Biol 2009; 5:e1000266. [PMID: 19148270 PMCID: PMC2612599 DOI: 10.1371/journal.pcbi.1000266] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 06/11/2008] [Accepted: 12/04/2008] [Indexed: 11/24/2022] Open
Abstract
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
Genome sequencing has revealed the codes for thousands of previously unknown proteins for humans and for hundreds of other species. Many of these proteins are of unknown or unclear function. The information contained in the genome sequences holds tremendous potential benefit to humankind, including new approaches to the diagnosis and treatment of disease. In order to realize these benefits, a key step is to understand the functions of the proteins for which these genes hold the code. A first step in understanding the function of a protein is to identify the functional site, the local area on the surface of a protein where it affects its functional activity. This paper reports on a new computational methodology to predict protein functional sites from protein 3D structures. A new machine learning approach called Partial Order Optimum Likelihood (POOL) is introduced here. It is shown that POOL outperforms previous methods for the prediction of protein functional sites from 3D structures.
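The abstract above describes combining THEMATICS features with multidimensional isotonic regression to obtain monotonicity-constrained probability estimates. As a much simplified illustration, the sketch below implements one-dimensional isotonic regression with the pool-adjacent-violators procedure. It is not the POOL algorithm: POOL works over a partial order on several features, whereas this example assumes a single feature, and the function name `isotonic_fit` and the toy labels are hypothetical.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit to `values` (pool adjacent violators).

    `values` must already be ordered by the feature assumed to be monotonically
    related to the outcome. For 0/1 labels the fitted values can be read as
    monotone probability estimates.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical residues sorted by an increasing electrostatic feature;
# 1 = annotated active-site residue, 0 = other residue.
labels_sorted_by_feature = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
probabilities = isotonic_fit(labels_sorted_by_feature)
print([round(p, 3) for p in probabilities])
```

Residues can then be ranked by the fitted value, which mirrors in one dimension the likelihood ranking of residues that the abstract describes.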
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Ying Wei
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Leonel F. Murga
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Mary Jo Ondrechen
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- * E-mail: (MO); (RJW)
- Ronald J. Williams
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- * E-mail: (MO); (RJW)
|