1
|
Nikam R, Jemimah S, Gromiha MM. DeepPPAPredMut: deep ensemble method for predicting the binding affinity change in protein-protein complexes upon mutation. Bioinformatics 2024; 40:btae309. [PMID: 38718170 PMCID: PMC11112046 DOI: 10.1093/bioinformatics/btae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 04/08/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024] Open
Abstract
MOTIVATION Protein-protein interactions underpin many cellular processes and their disruption due to mutations can lead to diseases. With the evolution of protein structure prediction methods like AlphaFold2 and the availability of extensive experimental affinity data, there is a pressing need for updated computational tools that can efficiently predict changes in binding affinity caused by mutations in protein-protein complexes. RESULTS We developed a deep ensemble model that leverages protein sequences, predicted structure-based features, and protein functional classes to accurately predict the change in binding affinity due to mutations. The model achieved a correlation of 0.97 and a mean absolute error (MAE) of 0.35 kcal/mol on the training dataset, and maintained robust performance on the test set with a correlation of 0.72 and a MAE of 0.83 kcal/mol. Further validation using Leave-One-Out Complex (LOOC) cross-validation exhibited a correlation of 0.83 and a MAE of 0.51 kcal/mol, indicating consistent performance. AVAILABILITY AND IMPLEMENTATION https://web.iitm.ac.in/bioinfo2/DeepPPAPredMut/index.html.
Collapse
Affiliation(s)
- Rahul Nikam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Sherlyn Jemimah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Biomedical Engineering, Khalifa University, P.O. Box: 127788 , Abu Dhabi, United Arab Emirates
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Computer Science, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan
| |
Collapse
|
2
|
Nikam R, Yugandhar K, Gromiha MM. DeepBSRPred: deep learning-based binding site residue prediction for proteins. Amino Acids 2023; 55:1305-1316. [PMID: 36574037 DOI: 10.1007/s00726-022-03228-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2022] [Accepted: 12/15/2022] [Indexed: 12/28/2022]
Abstract
MOTIVATION Proteins-protein interactions (PPIs) are important to govern several cellular activities. Amino acid residues, which are located at the interface are known as the binding sites and the information about binding sites helps to understand the binding affinities and functions of protein-protein complexes. RESULTS We have developed a deep neural network-based method, DeepBSRPred, for predicting the binding sites using protein sequence information and predicted structures from AlphaFold2. Specific sequence and structure-based features include position-specific scoring matrix (PSSM), solvent accessible surface area, conservation score and amino acid properties, and residue depth, respectively. Our method predicted the binding sites with an average F1 score of 0.73 in a dataset of 1236 proteins. Further, we compared the performance with other existing methods in the literature using four benchmark datasets and our method outperformed those methods. AVAILABILITY AND IMPLEMENTATION The DeepBSRPred web server can be found at https://web.iitm.ac.in/bioinfo2/deepbsrpred/index.html , along with all datasets used in this study. The trained models, the DeepBSRPred standalone source code, and the feature computation pipeline are freely available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html .
Collapse
Affiliation(s)
- Rahul Nikam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
| | - Kumar Yugandhar
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Department of Computational Biology, Cornell University, New York, NY, USA
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India.
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan.
| |
Collapse
|
3
|
Pandey M, Anoosha P, Yesudhas D, Gromiha MM. Identification of potential driver mutations in glioblastoma using machine learning. Brief Bioinform 2022; 23:6764546. [PMID: 36266243 DOI: 10.1093/bib/bbac451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 12/14/2022] Open
Abstract
Glioblastoma is a fast and aggressively growing tumor in the brain and spinal cord. Mutation of amino acid residues in targets proteins, which are involved in glioblastoma, alters the structure and function and may lead to disease. In this study, we collected a set of 9386 disease-causing (drivers) mutations based on the recurrence in patient samples and experimentally annotated as pathogenic and 8728 as neutral (passenger) mutations. We observed that Arg is highly preferred at the mutant sites of drivers, whereas Met and Ile showed preferences in passengers. Inspecting neighboring residues at the mutant sites revealed that the motifs YP, CP and GRH, are preferred in drivers, whereas SI, IQ and TVI are dominant in neutral. In addition, we have computed other sequence-based features such as conservation scores, Position Specific Scoring Matrices (PSSM) and physicochemical properties, and developed a machine learning-based method, GBMDriver (GlioBlastoma Multiforme Drivers), for distinguishing between driver and passenger mutations. Our method showed an accuracy and AUC of 73.59% and 0.82, respectively, on 10-fold cross-validation and 81.99% and 0.87 in a blind set of 1809 mutants. The tool is available at https://web.iitm.ac.in/bioinfo2/GBMDriver/index.html. We envisage that the present method is helpful to prioritize driver mutations in glioblastoma and assist in identifying therapeutic targets.
Collapse
Affiliation(s)
- Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - P Anoosha
- Division of Medical Oncology, Department of Internal Medicine, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, USA
| | - Dhanusha Yesudhas
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| |
Collapse
|
4
|
PCA-MutPred: Prediction of binding free energy change upon missense mutation in protein-carbohydrate complexes. J Mol Biol 2022; 434:167526. [DOI: 10.1016/j.jmb.2022.167526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 11/22/2022]
|
5
|
L-Type Calcium Channel: Predicting Pathogenic/Likely Pathogenic Status for Variants of Uncertain Clinical Significance. MEMBRANES 2021; 11:membranes11080599. [PMID: 34436362 PMCID: PMC8399957 DOI: 10.3390/membranes11080599] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2021] [Revised: 08/01/2021] [Accepted: 08/04/2021] [Indexed: 11/25/2022]
Abstract
(1) Background: Defects in gene CACNA1C, which encodes the pore-forming subunit of the human Cav1.2 channel (hCav1.2), are associated with cardiac disorders such as atrial fibrillation, long QT syndrome, conduction disorders, cardiomyopathies, and congenital heart defects. Clinical manifestations are known only for 12% of CACNA1C missense variants, which are listed in public databases. Bioinformatics approaches can be used to predict the pathogenic/likely pathogenic status for variants of uncertain clinical significance. Choosing a bioinformatics tool and pathogenicity threshold that are optimal for specific protein families increases the reliability of such predictions. (2) Methods and Results: We used databases ClinVar, Humsavar, gnomAD, and Ensembl to compose a dataset of pathogenic/likely pathogenic and benign variants of hCav1.2 and its 20 paralogues: voltage-gated sodium and calcium channels. We further tested the performance of sixteen in silico tools in predicting pathogenic variants. ClinPred demonstrated the best performance, followed by REVEL and MCap. In the subset of 309 uncharacterized variants of hCav1.2, ClinPred predicted the pathogenicity for 188 variants. Among these, 36 variants were also categorized as pathogenic/likely pathogenic in at least one paralogue of hCav1.2. (3) Conclusions: The bioinformatics tool ClinPred and the paralogue annotation method consensually predicted the pathogenic/likely pathogenic status for 36 uncharacterized variants of hCav1.2. An analogous approach can be used to classify missense variants of other calcium channels and novel variants of hCav1.2.
Collapse
|
6
|
Furman CM, Elbashir R, Pannafino G, Clark NL, Alani E. Experimental exchange of paralogous domains in the MLH family provides evidence of sub-functionalization after gene duplication. G3 (BETHESDA, MD.) 2021; 11:jkab111. [PMID: 33871573 PMCID: PMC8495741 DOI: 10.1093/g3journal/jkab111] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/08/2021] [Accepted: 03/31/2021] [Indexed: 01/24/2023]
Abstract
Baker's yeast contains a large number of duplicated genes; some function redundantly, whereas others have more specialized roles. We used the MLH family of DNA mismatch repair (MMR) proteins as a model to better understand the steps that lead to gene specialization following a gene duplication event. We focused on two highly conserved yeast MLH proteins, Pms1 and Mlh3, with Pms1 having a major role in the repair of misincorporation events during DNA replication and Mlh3 acting to resolve recombination intermediates in meiosis to form crossovers. The baker's yeast Mlh3 and Pms1 proteins are significantly diverged (19% overall identity), suggesting that an extensive number of evolutionary steps, some major, others involving subtle refinements, took place to diversify the MLH proteins. Using phylogenetic and molecular approaches, we provide evidence that all three domains (N-terminal ATP binding, linker, C-terminal endonuclease/MLH interaction) in the MLH protein family are critical for conferring pathway specificity. Importantly, mlh3 alleles in the ATP binding and endonuclease domains improved MMR functions in strains lacking the Pms1 protein and did not disrupt Mlh3 meiotic functions. This ability for mlh3 alleles to complement the loss of Pms1 suggests that an ancestral Pms1/Mlh3 protein was capable of performing both MMR and crossover functions. Our strategy for analyzing MLH pathway specificity provides an approach to understand how paralogs have evolved to support distinct cellular processes.
Collapse
Affiliation(s)
- Christopher M Furman
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703, USA
| | - Ryan Elbashir
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703, USA
| | - Gianno Pannafino
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703, USA
| | - Nathan L Clark
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84132, USA
| | - Eric Alani
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703, USA
| |
Collapse
|
7
|
Jemimah S, Sekijima M, Gromiha MM. ProAffiMuSeq: sequence-based method to predict the binding free energy change of protein-protein complexes upon mutation using functional classification. Bioinformatics 2020; 36:1725-1730. [PMID: 31713585 DOI: 10.1093/bioinformatics/btz829] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2019] [Revised: 10/23/2019] [Accepted: 11/11/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Protein-protein interactions are essential for the cell and mediate various functions. However, mutations can disrupt these interactions and may cause diseases. Currently available computational methods require a complex structure as input for predicting the change in binding affinity. Further, they have not included the functional class information for the protein-protein complex. To address this, we have developed a method, ProAffiMuSeq, which predicts the change in binding free energy using sequence-based features and functional class. RESULTS Our method shows an average correlation between predicted and experimentally determined ΔΔG of 0.73 and mean absolute error (MAE) of 0.86 kcal/mol in 10-fold cross-validation and correlation of 0.75 with MAE of 0.94 kcal/mol in the test dataset. ProAffiMuSeq was also tested on an external validation set and showed results comparable to structure-based methods. Our method can be used for large-scale analysis of disease-causing mutations in protein-protein complexes without structural information. AVAILABILITY AND IMPLEMENTATION Users can access the method at https://web.iitm.ac.in/bioinfo2/proaffimuseq/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Sherlyn Jemimah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Masakazu Sekijima
- Advanced Computational Drug Discovery Unit, Tokyo Institute of Technology, Midori-ku, Kanagawa 226-8503, Yokohama, Japan
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India.,Advanced Computational Drug Discovery Unit, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Kanagawa 226-8503, Yokohama, Japan
| |
Collapse
|
8
|
Kulandaisamy A, Zaucha J, Frishman D, Gromiha MM. MPTherm-pred: Analysis and Prediction of Thermal Stability Changes upon Mutations in Transmembrane Proteins. J Mol Biol 2020; 433:166646. [PMID: 32920050 DOI: 10.1016/j.jmb.2020.09.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 09/04/2020] [Accepted: 09/04/2020] [Indexed: 01/06/2023]
Abstract
The stability of membrane proteins differs from globular proteins due to the presence of nonpolar membrane-spanning regions. Using a dataset of 929 membrane protein mutations whose effects on thermal stability (ΔTm) were experimentally determined, we found that the average ΔTm due to 190 stabilizing and 232 destabilizing mutations occurring in membrane-spanning regions are 2.43(3.1) °C and -5.48(5.5) °C, respectively. The ΔTm values for mutations occurring in solvent-exposed regions are 2.56(2.82) and - 6.8(7.2) °C. We have systematically analyzed the factors influencing the stability of mutants and observed that changes in hydrophobicity, number of contacts between Cα atoms and frequency of aliphatic residues are important determinants of the stability change induced by mutations occurring in membrane-spanning regions. We have developed structure- and sequence-based machine learning predictors of ΔTm due to mutations specifically for membrane proteins. They showed a correlation and mean absolute error (MAE) of 0.72 and 2.85 °C, respectively, between experimental and predicted ΔTm for mutations in membrane-spanning regions on 10-fold group-wise cross-validation. The average correlation and MAE for mutations in aqueous regions are 0.73 and 3.7 °C, respectively. These MAE values are about 50% lower than standard deviations from the mean ΔTm values. The reliability of the method was affirmed on a test set of mutations occurring in evolutionary independent protein sequences. The developed MPTherm-pred server for predicting thermal stability changes upon mutations in membrane proteins is available at https://web.iitm.ac.in/bioinfo2/mpthermpred/. Our results provide insights into factors influencing the stability of membrane proteins and can aid in designing mutants that are more resistant to thermal stress.
Collapse
Affiliation(s)
- A Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
| | - Jan Zaucha
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - Dmitrij Frishman
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany; Department of Bioinformatics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russian Federation
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India.
| |
Collapse
|
9
|
Zaucha J, Heinzinger M, Kulandaisamy A, Kataka E, Salvádor ÓL, Popov P, Rost B, Gromiha MM, Zhorov BS, Frishman D. Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Brief Bioinform 2020; 22:5872174. [PMID: 32672331 DOI: 10.1093/bib/bbaa132] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Revised: 05/26/2020] [Accepted: 05/28/2020] [Indexed: 12/18/2022] Open
Abstract
Membrane proteins are unique in that they interact with lipid bilayers, making them indispensable for transporting molecules and relaying signals between and across cells. Due to the significance of the protein's functions, mutations often have profound effects on the fitness of the host. This is apparent both from experimental studies, which implicated numerous missense variants in diseases, as well as from evolutionary signals that allow elucidating the physicochemical constraints that intermembrane and aqueous environments bring. In this review, we report on the current state of knowledge acquired on missense variants (referred to as to single amino acid variants) affecting membrane proteins as well as the insights that can be extrapolated from data already available. This includes an overview of the annotations for membrane protein variants that have been collated within databases dedicated to the topic, bioinformatics approaches that leverage evolutionary information in order to shed light on previously uncharacterized membrane protein structures or interaction interfaces, tools for predicting the effects of mutations tailored specifically towards the characteristics of membrane proteins as well as two clinically relevant case studies explaining the implications of mutated membrane proteins in cancer and cardiomyopathy.
Collapse
Affiliation(s)
- Jan Zaucha
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
| | - Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
| | - A Kulandaisamy
- Department of Biotechnology of the IIT Bhupat and Jyoti Mehta School of BioSciences in Madras, India
| | - Evans Kataka
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
| | - Óscar Llorian Salvádor
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
| | - Petr Popov
- Center for Computational and Data-Intensive Science and Engineering of the Skolkovo Institute of Science and Technology in Moscow, Russia
| | - Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology at the TUM Faculty of Informatics in Garching, Germany
| | | | - Boris S Zhorov
- Department of Biochemistry and Biomedical Sciences, McMaster University in Hamilton, Canada
| | - Dmitrij Frishman
- Department of Bioinformatics at the TUM School of Life Sciences Weihenstephan in Freising, Germany
| |
Collapse
|
10
|
MacGowan SA, Madeira F, Britto‐Borges T, Warowny M, Drozdetskiy A, Procter JB, Barton GJ. The Dundee Resource for Sequence Analysis and Structure Prediction. Protein Sci 2020; 29:277-297. [PMID: 31710725 PMCID: PMC6933851 DOI: 10.1002/pro.3783] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2019] [Revised: 11/07/2019] [Accepted: 11/07/2019] [Indexed: 11/06/2022]
Abstract
The Dundee Resource for Sequence Analysis and Structure Prediction (DRSASP; http://www.compbio.dundee.ac.uk/drsasp.html) is a collection of web services provided by the Barton Group at the University of Dundee. DRSASP's flagship services are the JPred4 webserver for secondary structure and solvent accessibility prediction and the JABAWS 2.2 webserver for multiple sequence alignment, disorder prediction, amino acid conservation calculations, and specificity-determining site prediction. DRSASP resources are available through conventional web interfaces and APIs but are also integrated into the Jalview sequence analysis workbench, which enables the composition of multitool interactive workflows. Other existing Barton Group tools are being brought under the banner of DRSASP, including NoD (Nucleolar localization sequence detector) and 14-3-3-Pred. New resources are being developed that enable the analysis of population genetic data in evolutionary and 3D structural contexts. Existing resources are actively developed to exploit new technologies and maintain parity with evolving web standards. DRSASP provides substantial computational resources for public use, and since 2016 DRSASP services have completed over 1.5 million jobs.
Collapse
Affiliation(s)
- Stuart A. MacGowan
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| | - Fábio Madeira
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| | - Thiago Britto‐Borges
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| | - Mateusz Warowny
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| | - Alexey Drozdetskiy
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| | - James B. Procter
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| | - Geoffrey J. Barton
- Division of Computational BiologyCollege of Life Sciences, University of DundeeUK
| |
Collapse
|
11
|
Kulandaisamy A, Zaucha J, Sakthivel R, Frishman D, Michael Gromiha M. Pred‐MutHTP: Prediction of disease‐causing and neutral mutations in human transmembrane proteins. Hum Mutat 2019; 41:581-590. [DOI: 10.1002/humu.23961] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2019] [Revised: 11/05/2019] [Accepted: 11/20/2019] [Indexed: 12/24/2022]
Affiliation(s)
- A. Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciencesIndian Institute of Technology MadrasChennai Tamilnadu India
| | - Jan Zaucha
- Department of Bioinformatics, Wissenschaftszentrum WeihenstephanTechnische Universität MünchenFreising Germany
| | - Ramasamy Sakthivel
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciencesIndian Institute of Technology MadrasChennai Tamilnadu India
| | - Dmitrij Frishman
- Department of Bioinformatics, Wissenschaftszentrum WeihenstephanTechnische Universität MünchenFreising Germany
- Department of BioinformaticsPeter the Great St. Petersburg Polytechnic UniversitySt. Petersburg Russian Federation
| | - M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciencesIndian Institute of Technology MadrasChennai Tamilnadu India
- Advanced Computational Drug Discovery Unit, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative ResearchTokyo Institute of TechnologyYokohama Japan
| |
Collapse
|
12
|
Tikhomirova TS, Galzitskaya OV. Functionally Significant Amino Acid Motifs of Heat Shock Proteins: Structural and Bioinformatics Analyses of Hsp60/Hsp10 in Five Classes of Chordata. Mol Biol 2018. [DOI: 10.1134/s0026893318050138] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
|
13
|
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023] Open
Abstract
Background Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. Results In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. Conclusions Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
Collapse
Affiliation(s)
- Min Han
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Yifan Song
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Jiaqiang Qian
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China.
| |
Collapse
|
14
|
Troshin PV, Procter JB, Sherstnev A, Barton DL, Madeira F, Barton GJ. JABAWS 2.2 distributed web services for Bioinformatics: protein disorder, conservation and RNA secondary structure. Bioinformatics 2018; 34:1939-1940. [PMID: 29390042 PMCID: PMC5972556 DOI: 10.1093/bioinformatics/bty045] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2017] [Revised: 12/22/2017] [Accepted: 01/29/2018] [Indexed: 01/08/2023] Open
Abstract
Summary JABAWS 2.2 is a computational framework that simplifies the deployment of web services for Bioinformatics. In addition to the five multiple sequence alignment (MSA) algorithms in JABAWS 1.0, JABAWS 2.2 includes three additional MSA programs (Clustal Omega, MSAprobs, GLprobs), four protein disorder prediction methods (DisEMBL, IUPred, Ronn, GlobPlot), 18 measures of protein conservation as implemented in AACon, and RNA secondary structure prediction by the RNAalifold program. JABAWS 2.2 can be deployed on a variety of in-house or hosted systems. JABAWS 2.2 web services may be accessed from the Jalview multiple sequence analysis workbench (Version 2.8 and later), as well as directly via the JABAWS command line interface (CLI) client. JABAWS 2.2 can be deployed on a local virtual server as a Virtual Appliance (VA) or simply as a Web Application Archive (WAR) for private use. Improvements in JABAWS 2.2 also include simplified installation and a range of utility tools for usage statistics collection, and web services querying and monitoring. The JABAWS CLI client has been updated to support all the new services and allow integration of JABAWS 2.2 services into conventional scripts. A public JABAWS 2 server has been in production since December 2011 and served over 800 000 analyses for users worldwide. Availability and implementation JABAWS 2.2 is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws. Contact g.j.barton@dundee.ac.uk.
Collapse
Affiliation(s)
- Peter V Troshin
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - James B Procter
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Alexander Sherstnev
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Daniel L Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Fábio Madeira
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Geoffrey J Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| |
Collapse
|
15
|
Tikhomirova TS, Ievlev RS, Suvorina MY, Bobyleva LG, Vikhlyantsev IM, Surin AK, Galzitskaya OV. Search for Functionally Significant Motifs and Amino Acid Residues of Actin. Mol Biol 2018. [DOI: 10.1134/s0026893318010193] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
16
|
Uddin MS, Naider F, Becker JM. Dynamic roles for the N-terminus of the yeast G protein-coupled receptor Ste2p. BIOCHIMICA ET BIOPHYSICA ACTA-BIOMEMBRANES 2017; 1859:2058-2067. [PMID: 28754538 DOI: 10.1016/j.bbamem.2017.07.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Revised: 07/13/2017] [Accepted: 07/24/2017] [Indexed: 12/11/2022]
Abstract
The Saccharomyces cerevisiae α-factor receptor Ste2p has been used extensively as a model to understand the molecular mechanism of signal transduction by G protein-coupled receptors (GPCRs). Single and double cysteine mutants of Ste2p were created and served as surrogates to detect intramolecular interactions and dimerization of Ste2p using disulfide cross-linking methodology. When a mutation was introduced into the phylogenetically conserved tyrosine residue at position 26 (Y26C) in the N-terminus of Ste2p, dimerization was increased greatly. The amount of dimer formed by this Y26C mutant was greatly reduced by ligand binding even though the ligand binding site is far removed from the N-terminus; the lowering of the dimer formation was consistent with a conformational change in the N-terminus of the receptor upon activation. Dimerization was decreased by double mutations Y26C/V109C or Y26C/T114C indicating that Y26 is in close proximity to V109 and T114 of extracellular loop 1 in native Ste2p. Combined with earlier studies, these results indicate previously unrecognized roles for the N-terminus of Ste2p, and perhaps of GPCRs in general, and reveal a specific N-terminus residue or region, that is involved in GPCR signaling, intrareceptor interactions, and receptor dimerization.
Collapse
Affiliation(s)
- M Seraj Uddin
- Department of Microbiology, University of Tennessee, Knoxville, Tennessee 37996, United States
| | - Fred Naider
- Department of Chemistry and Macromolecular Assemblies Institute, College of Staten Island, CUNY, New York, New York 10314, United States; Ph.D. Programs in Biochemistry and Chemistry, The Graduate Center of the City University of New York, New York, NY 10016, United States
| | - Jeffrey M Becker
- Department of Microbiology, University of Tennessee, Knoxville, Tennessee 37996, United States.
| |
Collapse
|
17
|
Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017; 17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS Identity and number of arrestin paralogs were determined searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.
Collapse
Affiliation(s)
- Henrike Indrischek
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
| | - Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
| | - Vsevolod V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
| | - Eugenia V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria
- Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
| |
Collapse
|
18
|
Hou Q, Dutilh BE, Huynen MA, Heringa J, Feenstra KA. Sequence specificity between interacting and non-interacting homologs identifies interface residues--a homodimer and monomer use case. BMC Bioinformatics 2015; 16:325. [PMID: 26449222 PMCID: PMC4599308 DOI: 10.1186/s12859-015-0758-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2015] [Accepted: 09/30/2015] [Indexed: 11/17/2022] Open
Abstract
Background Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from right binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. Results We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9 % recall and up to 25.1 % precision. Conclusions To our best knowledge, this is the first large scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0758-y) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Qingzhen Hou
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
| | - Bas E Dutilh
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands. .,Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands. .,Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
| | - Martijn A Huynen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
| | - Jaap Heringa
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
| | - K Anton Feenstra
- Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands.
| |
Collapse
|
19
|
Lee TW, Yang ASP, Brittain T, Birch NP. An analysis approach to identify specific functional sites in orthologous proteins using sequence and structural information: application to neuroserpin reveals regions that differentially regulate inhibitory activity. Proteins 2015; 83:135-52. [PMID: 25363759 DOI: 10.1002/prot.24711] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2014] [Revised: 10/22/2014] [Accepted: 10/27/2014] [Indexed: 01/12/2023]
Abstract
The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution-matrix-based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface-exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site-directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins.
Collapse
Affiliation(s)
- Tet Woo Lee
- School of Biological Sciences and Centre for Brain Research, University of Auckland, Auckland, New Zealand
| | | | | | | |
Collapse
|
20
|
Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014; 16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
21
|
Chakraborty C, Agrawal A. Computational analysis of C-reactive protein for assessment of molecular dynamics and interaction properties. Cell Biochem Biophys 2013; 67:645-56. [PMID: 23494263 PMCID: PMC3874389 DOI: 10.1007/s12013-013-9553-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Serum C-reactive protein (CRP) is used as a marker of inflammation in several diseases including autoimmune disease and cardiovascular disease. CRP, a member of the pentraxin family, is comprised of five identical subunits. CRP has diverse ligand-binding properties which depend upon different structural states of CRP. However, little is known about the molecular dynamics and interaction properties of CRP. In this study, we used SAPS, SCRATCH protein predictor, PDBsum, ConSurf, ProtScale, Drawhca, ASAView, SCide and SRide server and performed comprehensive analyses of molecular dynamics, protein-protein and residue-residue interactions of CRP. We used 1GNH.pdb file for the crystal structure of human CRP which generated two pentamers (ABCDE and FGHIJ). The number of residues involved in residue-residue interactions between A-B, B-C, C-D, D-E, F-G, G-H, H-I, I-J, A-E and F-J subunits were 12, 11, 10, 11, 12, 11, 10, 11, 10 and 10, respectively. Fifteen antiparallel β sheets were involved in β-sheet topology, and five β hairpins were involved in forming the secondary structure. Analysis of hydrophobic segment distribution revealed deviations in surface hydrophobicity at different cavities present in CRP. Approximately 33 % of all residues were involved in the stabilization centers. We show that the bioinformatics tools can provide a rapid method to predict molecular dynamics and interaction properties of CRP. Our prediction of molecular dynamics and interaction properties of CRP combined with the modeling data based on the known 3D structure of CRP is helpful in designing stable forms of CRP mutants for structure-function studies of CRP and may facilitate in silico drug design for therapeutic targeting of CRP.
Collapse
Affiliation(s)
- Chiranjib Chakraborty
- Department of Bioinformatics, School of Computer and Information Sciences, Galgotias University, Greater Noida, India,
| | | |
Collapse
|
22
|
Chakraborty C, George Priya Doss C, Bandyopadhyay S, Sarkar BK, Syed Haneef SA. Mapping the Structural Topology of IRS Family Cascades Through Computational Biology. Cell Biochem Biophys 2013; 67:1319-31. [DOI: 10.1007/s12013-013-9664-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
23
|
Comparison of different ranking methods in protein-ligand binding site prediction. Int J Mol Sci 2012; 13:8752-8761. [PMID: 22942732 PMCID: PMC3430263 DOI: 10.3390/ijms13078752] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2012] [Revised: 06/19/2012] [Accepted: 07/02/2012] [Indexed: 11/17/2022] Open
Abstract
In recent years, although many ligand-binding site prediction methods have been developed, there has still been a great demand to improve the prediction accuracy and compare different prediction algorithms to evaluate their performances. In this work, in order to improve the performance of the protein-ligand binding site prediction method presented in our former study, a comparison of different binding site ranking lists was studied. Four kinds of properties, i.e., pocket size, distance from the protein centroid, sequence conservation and the number of hydrophobic residues, have been chosen as the corresponding ranking criterion respectively. Our studies show that the sequence conservation information helps to rank the real pockets with the most successful accuracy compared to others. At the same time, the pocket size and the distance of binding site from the protein centroid are also found to be helpful. In addition, a multi-view ranking aggregation method, which combines the information among those four properties, was further applied in our study. The results show that a better performance can be achieved by the aggregation of the complementary properties in the prediction of ligand-binding sites.
Collapse
|
24
|
Dai T, Liu Q, Gao J, Cao Z, Zhu R. A new protein-ligand binding sites prediction method based on the integration of protein sequence conservation information. BMC Bioinformatics 2011; 12 Suppl 14:S9. [PMID: 22373099 PMCID: PMC3287474 DOI: 10.1186/1471-2105-12-s14-s9] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Background Prediction of protein-ligand binding sites is an important issue for protein function annotation and structure-based drug design. Nowadays, although many computational methods for ligand-binding prediction have been developed, there is still a demanding to improve the prediction accuracy and efficiency. In addition, most of these methods are purely geometry-based, if the prediction methods improvement could be succeeded by integrating physicochemical or sequence properties of protein-ligand binding, it may also be more helpful to address the biological question in such studies. Results In our study, in order to investigate the contribution of sequence conservation in binding sites prediction and to make up the insufficiencies in purely geometry based methods, a simple yet efficient protein-binding sites prediction algorithm is presented, based on the geometry-based cavity identification integrated with sequence conservation information. Our method was compared with the other three classical tools: PocketPicker, SURFNET, and PASS, and evaluated on an existing comprehensive dataset of 210 non-redundant protein-ligand complexes. The results demonstrate that our approach correctly predicted the binding sites in 59% and 75% of cases among the TOP1 candidates and TOP3 candidates in the ranking list, respectively, which performs better than those of SURFNET and PASS, and achieves generally a slight better performance with PocketPicker. Conclusions Our work has successfully indicated the importance of the sequence conservation information in binding sites prediction as well as provided a more accurate way for binding sites identification.
Collapse
Affiliation(s)
- Tianli Dai
- College of Life Science and Biotechnology, Tongji University, 200092, Shanghai, China
| | | | | | | | | |
Collapse
|
25
|
Tungtur S, Parente DJ, Swint-Kruse L. Functionally important positions can comprise the majority of a protein's architecture. Proteins 2011; 79:1589-608. [PMID: 21374721 DOI: 10.1002/prot.22985] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2010] [Revised: 12/08/2010] [Accepted: 12/15/2010] [Indexed: 01/13/2023]
Abstract
Concomitant with the genomic era, many bioinformatics programs have been developed to identify functionally important positions from sequence alignments of protein families. To evaluate these analyses, many have used the LacI/GalR family and determined whether positions predicted to be "important" are validated by published experiments. However, we previously noted that predictions do not identify all of the experimentally important positions present in the linker regions of these homologs. In an attempt to reconcile these differences, we corrected and expanded the LacI/GalR sequence set commonly used in sequence/function analyses. Next, a variety of analyses were carried out (1) for the entire LacI/GalR sequence set and (2) for a subset of homologs with functionally-important "YxPxxxAxxL" motifs in their linkers. This strategy was devised to determine whether predictions could be improved by knowledge-based sequence sorting and-for some analyses-did increase the number of linker positions identified. However, two functionally important linker positions were not reliably identified by any analysis. Finally, we compared the new predictions to all known experimental data for E. coli LacI and three homologous linkers. From these, we estimate that >50% of positions are important to the functions of the LacI/GalR homologs. In corollary, neutral positions might occur less frequently and might be easier to detect in sequence analyses. Although analyses have successfully guided mutations that partially exchange protein functions, a better experimental understanding of the sequence/function relationships in protein families would be helpful for uncovering the remaining rules used by nature to evolve new protein functions.
Collapse
Affiliation(s)
- Sudheer Tungtur
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, MSN 3030, Kansas City, Kansas 66160, USA
| | | | | |
Collapse
|
26
|
Kc DB, Livesay DR. Topology improves phylogenetic motif functional site predictions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:226-233. [PMID: 21071810 DOI: 10.1109/tcbb.2009.60] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Prediction of protein functional sites from sequence-derived data remains an open bioinformatics problem. We have developed a phylogenetic motif (PM) functional site prediction approach that identifies functional sites from alignment fragments that parallel the evolutionary patterns of the family. In our approach, PMs are identified by comparing tree topologies of each alignment fragment to that of the complete phylogeny. Herein, we bypass the phylogenetic reconstruction step and identify PMs directly from distance matrix comparisons. In order to optimize the new algorithm, we consider three different distance matrices and 13 different matrix similarity scores. We assess the performance of the various approaches on a structurally nonredundant data set that includes three types of functional site definitions. Without exception, the predictive power of the original approach outperforms the distance matrix variants. While the distance matrix methods fail to improve upon the original approach, our results are important because they clearly demonstrate that the improved predictive power is based on the topological comparisons. Meaning that phylogenetic trees are a straightforward, yet powerful way to improve functional site prediction accuracy. While complementary studies have shown that topology improves predictions of protein-protein interactions, this report represents the first demonstration that trees improve functional site predictions as well.
Collapse
Affiliation(s)
- Dukka B Kc
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA.
| | | |
Collapse
|
27
|
Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010; 6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022] Open
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
Collapse
|
28
|
Guharoy M, Chakrabarti P. Conserved residue clusters at protein-protein interfaces and their use in binding site identification. BMC Bioinformatics 2010; 11:286. [PMID: 20507585 PMCID: PMC2894039 DOI: 10.1186/1471-2105-11-286] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2010] [Accepted: 05/27/2010] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Biological evolution conserves protein residues that are important for structure and function. Both protein stability and function often require a certain degree of structural co-operativity between spatially neighboring residues and it has previously been shown that conserved residues occur clustered together in protein tertiary structures, enzyme active sites and protein-DNA interfaces. Residues comprising protein interfaces are often more conserved compared to those occurring elsewhere on the protein surface. We investigate the extent to which conserved residues within protein-protein interfaces are clustered together in three-dimensions. RESULTS Out of 121 and 392 interfaces in homodimers and heterocomplexes, 96.7 and 86.7%, respectively, have the conserved positions clustered within the overall interface region. The significance of this clustering was established in comparison to what is seen for the subsets of the same size of randomly selected residues from the interface. Conserved residues occurring in larger interfaces could often be sub-divided into two or more distinct sub-clusters. These structural cluster(s) comprising conserved residues indicate functionally important regions within the protein-protein interface that can be targeted for further structural and energetic analysis by experimental scanning mutagenesis. Almost 60% of experimental hot spot residues (with DeltaDeltaG > 2 kcal/mol) were localized to these conserved residue clusters. An analysis of the residue types that are enriched within these conserved subsets compared to the overall interface showed that hydrophobic and aromatic residues are favored, but charged residues (both positive and negative) are less common. The potential use of this method for discriminating binding sites (interfaces) versus random surface patches was explored by comparing the clustering of conserved residues within each of these regions--in about 50% cases the true interface is ranked among the top 10% of all surface patches. CONCLUSIONS Protein-protein interaction sites are much larger than small molecule biding sites, but still conserved residues are not randomly distributed over the whole interface and are distinctly clustered. The clustered nature of evolutionarily conserved residues within interfaces as compared to those within other surface patches not involved in binding has important implications for the identification of protein-protein binding sites and would have applications in docking studies.
Collapse
Affiliation(s)
- Mainak Guharoy
- Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata, India
| | | |
Collapse
|
29
|
Xu Y, Tillier ERM. Regional covariation and its application for predicting protein contact patches. Proteins 2010; 78:548-58. [PMID: 19768681 DOI: 10.1002/prot.22576] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we were interested in seeing whether covarying sites were clustered, and whether this could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in a MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD(http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally nearby in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.
Collapse
Affiliation(s)
- Yongbai Xu
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | | |
Collapse
|
30
|
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023] Open
Abstract
Background The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional site, but many are not made available via a publicly-accessible application. Results Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Collapse
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
| | | | | | | | | | | |
Collapse
|
31
|
Comparing the functional roles of nonconserved sequence positions in homologous transcription repressors: implications for sequence/function analyses. J Mol Biol 2009; 395:785-802. [PMID: 19818797 DOI: 10.1016/j.jmb.2009.10.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2009] [Revised: 10/01/2009] [Accepted: 10/02/2009] [Indexed: 11/21/2022]
Abstract
The explosion of protein sequences deduced from genetic code has led to both a problem and a potential resource: Efficient data use requires interpreting the functional impact of sequence change without experimentally characterizing each protein variant. Several groups have hypothesized that interpretation could be aided by analyzing the sequences of naturally occurring homologues. To that end, myriad sequence/function analyses have been developed to predict which conserved, semi-conserved, and nonconserved positions are functionally important. These positions must be discriminated from the nonconserved positions that are functionally silent. However, the assumptions that underlie sequence analyses are based on experimental results that are sparse and usually designed to address different questions. Here, we use three homologues from a test family common to bioinformatics-the LacI/GalR transcription repressors-to test a common assumption: If a position is functionally important for one family member, it has similar importance in all homologues. We generated experimental sequence/function information for each nonconserved position in the 18 amino acids that link the DNA-binding and regulatory domains of three LacI/GalR homologues. We find that the functional importance of each position is preserved among the three linkers, albeit to different degrees. We also find that every linker position contributes to function, which has twofold implications. (1) Since the linker positions range from highly conserved to semi-conserved to nonconserved and contribute to affinity, selectivity, and allosteric response, we assert that sequence/function analyses must identify positions in the LacI/GalR linkers to be qualified as "successful". Many analyses overlook this region since most of the residues do not directly contact ligand. (2) No position in the LacI/GalR linker is functionally silent. This finding is inconsistent with another underlying principle of many analyses: Using sequence sets to discriminate important from non-contributing positions obligates silent positions, which denotes that most homologues tolerate a variety of amino acid substitutions at the position without functional change. Instead, additional combinatorial mutants in the LacI/GalR linkers show that particular substitutions can be silent in a context-dependent manner. Thus, specific permutations of sequence change (rather than change at silent positions) would facilitate neutral drift during evolution. Finally, the combinatorial mutants also reveal functional synergy between semi- and nonconserved positions. Such functional relationships would be missed by analyses that rely primarily upon co-evolution.
Collapse
|
32
|
Kalinina OV, Gelfand MS, Russell RB. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 2009; 10:174. [PMID: 19508719 PMCID: PMC2709924 DOI: 10.1186/1471-2105-10-174] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2009] [Accepted: 06/09/2009] [Indexed: 11/16/2022] Open
Abstract
Background Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. Results Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs), as those occupied by conserved residues within sub-groups of proteins in a family having a common specificity, but differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discusse several illustrative examples. Conclusion The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
Collapse
|
33
|
Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008; 18:394-402. [PMID: 18554899 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]
Abstract
Advances in protein structure determination, led by the structural genomics initiatives have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.
Collapse
Affiliation(s)
- Oliver C Redfern
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
| | | | | |
Collapse
|
34
|
Dukka BKC, Livesay DR. Improving position-specific predictions of protein functional sites using phylogenetic motifs. ACTA ACUST UNITED AC 2008; 24:2308-16. [PMID: 18723520 DOI: 10.1093/bioinformatics/btn454] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate computational prediction of protein functional sites is critical to maximizing the utility of recent high-throughput sequencing efforts. Among the available approaches, position-specific conservation scores remain among the most popular due to their accuracy and ease of computation. Unfortunately, high false positive rates remain a limiting factor. Using phylogenetic motifs (PMs), we have developed two combined (conservation + PMs) prediction schemes that significantly improve prediction accuracy. RESULTS Our first approach, called position-specific MINER (psMINER), rank orders alignment columns by conservation. Subsequently, positions that are also not identified as PMs are excluded from the prediction set. This approach improves prediction accuracy, in a statistically significant way, compared to the underlying conservation scores. Increased accuracy is a general result, meaning improvement is observed over several different conservation scores that span a continuum of complexity. In addition, a hybrid MINER (hMINER) that quantitatively considers both scoring regimes provides further improvement. More importantly, it provides critical insight into the relative importance of phylogeny versus alignment conservation. Both methods outperform other common prediction algorithms that also utilize phylogenetic concepts. Finally, we demonstrate that the presented results are critically sensitive to functional site definition, thus highlighting the need for more complete benchmarks within the prediction community.
Collapse
Affiliation(s)
- Bahadur K C Dukka
- Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | | |
Collapse
|
35
|
Capra JA, Singh M. Characterization and prediction of residues determining protein functional specificity. ACTA ACUST UNITED AC 2008; 24:1473-80. [PMID: 18450811 PMCID: PMC2718669 DOI: 10.1093/bioinformatics/btn214] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs. RESULTS We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs. AVAILABILITY Datasets and GroupSim code are available online at http://compbio.cs.princeton.edu/specificity/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- John A Capra
- Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | | |
Collapse
|