1
Kotb HM, Davey NE. xProtCAS: A Toolkit for Extracting Conserved Accessible Surfaces from Protein Structures. Biomolecules 2023; 13:906. [PMID: 37371487 DOI: 10.3390/biom13060906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 06/29/2023] [Open access]
Abstract
The identification of protein surfaces required for interaction with other biomolecules broadens our understanding of protein function, their regulation by post-translational modification, and the deleterious effect of disease mutations. Protein interaction interfaces are often identifiable as patches of conserved residues on a protein's surface. However, finding conserved accessible surfaces on folded regions requires an understanding of the protein structure to discriminate between functional and structural constraints on residue conservation. With the emergence of deep learning methods for protein structure prediction, high-quality structural models are now available for any protein. In this study, we introduce tools to identify conserved surfaces on AlphaFold2 structural models. We define autonomous structural modules from the structural models and convert these modules to a graph encoding residue topology, accessibility, and conservation. Conserved surfaces are then extracted using a novel eigenvector centrality-based approach. We apply the tool to the human proteome identifying hundreds of uncharacterised yet highly conserved surfaces, many of which contain clinically significant mutations. The xProtCAS tool is available as open-source Python software and an interactive web server.
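The graph-and-centrality idea in this abstract can be illustrated with a minimal sketch. It is not the xProtCAS implementation: the toy contact list, the conservation and accessibility values, and the choice to weight each edge by the product of the two residues' conservation × accessibility scores are all assumptions made for illustration. Eigenvector centrality is computed by plain power iteration.

```python
def eigenvector_centrality(adjacency, iterations=500, tol=1e-12):
    """Power iteration on a symmetric, non-negative weighted adjacency matrix."""
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Iterating on (A + I) keeps the same eigenvectors and avoids oscillation.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in updated) ** 0.5
        updated = [v / norm for v in updated]
        converged = max(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


# Hypothetical toy module of five residues (indices 0..4).
contacts = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]   # residue pairs in contact
conservation = [0.9, 0.9, 0.9, 0.9, 0.2]             # assumed scores in [0, 1]
accessibility = [0.8, 0.8, 0.8, 0.1, 0.9]            # assumed relative accessibility

# Assumed node weight: conserved AND exposed residues score high.
node_weight = [c * a for c, a in zip(conservation, accessibility)]

n = len(node_weight)
adjacency = [[0.0] * n for _ in range(n)]
for i, j in contacts:
    adjacency[i][j] = adjacency[j][i] = node_weight[i] * node_weight[j]

scores = eigenvector_centrality(adjacency)
ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy graph the three mutually contacting, conserved, exposed residues (0, 1, 2) dominate the ranking, while the buried residue 3 and the variable residue 4 score low, which is the qualitative behaviour a conserved-surface detector needs.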
Affiliation(s)
- Hazem M Kotb: Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Norman E Davey: Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
2
Lara Ortiz MT, Martinell García V, Del Rio G. Saturation Mutagenesis of the Transmembrane Region of HokC in Escherichia coli Reveals Its High Tolerance to Mutations. Int J Mol Sci 2021; 22:ijms221910359. [PMID: 34638709 PMCID: PMC8509063 DOI: 10.3390/ijms221910359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022] [Open access]
Abstract
Cells adapt to different stress conditions, such as the presence of antibiotics. This adaptation is sometimes achieved by changing relevant protein positions, the mutability of which is limited by structural constraints. Understanding the basis of these constraints represents an important challenge for both basic science and potential biotechnological applications. To study these constraints, we performed a systematic saturation mutagenesis of the transmembrane region of HokC, a toxin used by Escherichia coli to control its own population, and observed that 92% of single-point mutations are tolerated and that all the non-tolerated mutations have compensatory mutations that reverse their effect. We provide experimental evidence that HokC accumulates multiple compensatory mutations that are found as correlated mutations in the HokC family multiple sequence alignment. In agreement with these observations, transmembrane proteins show a higher probability of presenting correlated mutations and are less densely packed locally than globular proteins; previous mutagenesis results on transmembrane proteins further support our observations on the high tolerance of transmembrane regions of proteins to mutations. Thus, our experimental results reveal the high tolerance of the HokC transmembrane region to loss-of-function mutations, which is associated with low sequence conservation and a high rate of correlated mutations in the HokC family sequence alignment, features shared with other transmembrane proteins.
3
Summers TJ, Daniel BP, Cheng Q, DeYonker NJ. Quantifying Inter-Residue Contacts through Interaction Energies. J Chem Inf Model 2019; 59:5034-5044. [DOI: 10.1021/acs.jcim.9b00804] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Affiliation(s)
- Thomas J. Summers: The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
- Baty P. Daniel: The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
- Qianyi Cheng: The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
- Nathan J. DeYonker: The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
4
Saldaño TE, Tosatto SCE, Parisi G, Fernandez-Alberti S. Network analysis of dynamically important residues in protein structures mediating ligand-binding conformational changes. Eur Biophys J 2019; 48:559-568. [PMID: 31273390 DOI: 10.1007/s00249-019-01384-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/11/2019] [Revised: 05/31/2019] [Accepted: 07/01/2019] [Indexed: 11/26/2022]
Abstract
According to the generalized conformational selection model, ligand binding involves the co-existence of at least two conformers with different ligand affinities in a dynamical equilibrium. Conformational transitions between them should be guaranteed by intramolecular vibrational dynamics associated with each conformation. These motions are, therefore, related to the biological function of a protein. Positions whose mutations are found to alter these vibrations the most can be defined as key positions, that is, dynamically important residues that mediate the ligand-binding conformational change. In a previous study, we have shown that these positions are evolutionarily conserved. They correspond to buried aliphatic residues mostly localized in regular structured regions of the protein like β-sheets and α-helices. In the present paper, we perform a network analysis of these key positions for a large dataset of paired protein structures in the ligand-free and ligand-bound form. We observe that interactions between these key positions form larger and more integrated networks with faster transmission of information. In addition, the resulting residue networks are robust to conformational changes. Our results reveal that the conformational diversity of proteins seems to be guaranteed by a network of strongly interconnected key positions rather than individual residues.
Affiliation(s)
- Tadeo E Saldaño: Universidad Nacional de Quilmes/CONICET, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina
- Silvio C E Tosatto: Department of Biomedical Sciences, University of Padova, Viale G. Colombo 3, 5131, Padua, Italy
- Gustavo Parisi: Universidad Nacional de Quilmes/CONICET, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina
5
Gil N, Fiser A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics 2019; 35:12-19. [PMID: 29947739 PMCID: PMC6298051 DOI: 10.1093/bioinformatics/bty523] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/23/2018] [Revised: 04/20/2018] [Accepted: 06/26/2018] [Indexed: 11/12/2022] [Open access]
Abstract
Motivation The analysis of sequence conservation patterns has been widely utilized to identify functionally important (catalytic and ligand-binding) protein residues for over a half-century. Despite decades of development, on average state-of-the-art non-template-based functional residue prediction methods must predict ∼25% of a protein's total residues to correctly identify half of the protein's functional site residues. The overwhelming proportion of false positives results in reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs). Results The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis. Supplementary information Supplementary data are available at Bioinformatics online.
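The link between "predicting ∼25% of residues to recover half of the functional residues" and an F-score of ∼0.3 can be checked with a short worked example. The protein length (300) and the share of truly functional residues (10%) are assumptions chosen for illustration; they are not figures from the abstract.

```python
def f_score(true_positives, predicted, actual):
    """F1 score from counts of true positives, predicted positives and actual positives."""
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


protein_length = 300                          # assumed
functional = int(0.10 * protein_length)       # assumed: 30 functional residues
predicted = int(0.25 * protein_length)        # 75 residues flagged (25% of the protein)
true_positives = functional // 2              # half of the functional residues recovered

score = f_score(true_positives, predicted, functional)
print(round(score, 2))
```

Under these assumptions precision is 15/75 = 0.2 and recall is 0.5, so the F-score is about 0.29, in line with the ∼0.3 quoted in the abstract.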
Affiliation(s)
- Nelson Gil: Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Andras Fiser: Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
6
Yan W, Hu G, Liang Z, Zhou J, Yang Y, Chen J, Shen B. Node-Weighted Amino Acid Network Strategy for Characterization and Identification of Protein Functional Residues. J Chem Inf Model 2018; 58:2024-2032. [PMID: 30107728 DOI: 10.1021/acs.jcim.8b00146] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/15/2022]
Abstract
The study of functional residues (FRs) is essential for understanding protein functions and biological processes. The amino acid network (AAN) has become an emerging paradigm for studying FRs during the past decade. Current AAN models ignore the heterogeneity of nodes and treat amino acids in the AAN as the same. However, the properties of each amino acid node are of fundamental importance. We here proposed a node-weighted AAN strategy termed the node-weighted amino acid contact energy network (NACEN) to characterize and predict three types of FRs, namely, hot spots, catalytic residues, and allosteric residues. We first constructed NACENs with their nodes weighted based on structural, sequence, physicochemical, and dynamical properties of the amino acids and then characterized the FRs with the NACEN parameters. We finally built machine learning predictors to identify each type of FR. The results revealed that residues characterized with NACEN parameters are more distinguishable between FRs and non-FRs than those with unweighted network ones. With few features for classification, NACEN yields comparable performance for FR identification and provides residue level prediction for allosteric regulation. The proposed strategy can be easily implemented to other functional residue identification. An R package is also provided for NACEN construction and analysis at http://sysbio.suda.edu.cn/NACEN/index.html .
Affiliation(s)
- Wenying Yan: Center for Systems Biology, Soochow University, Suzhou 215006, China
- Guang Hu: Center for Systems Biology, Soochow University, Suzhou 215006, China
- Zhongjie Liang: Center for Systems Biology, Soochow University, Suzhou 215006, China
- Jianhong Zhou: Center for Systems Biology, Soochow University, Suzhou 215006, China
- Yang Yang: School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Jiajia Chen: School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, Suzhou 215011, China
- Bairong Shen: Center for Systems Biology, Soochow University, Suzhou 215006, China
7
Gil N, Fiser A. Identifying functionally informative evolutionary sequence profiles. Bioinformatics 2018; 34:1278-1286. [PMID: 29211823 PMCID: PMC5905606 DOI: 10.1093/bioinformatics/btx779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/28/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023] [Open access]
Abstract
Motivation Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact andras.fiser@einstein.yu.edu. Supplementary information Supplementary data are available at Bioinformatics online.
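The selection criterion described here, choosing the alignment whose columns share the most mutual information, can be sketched as follows. This is an illustrative toy and not the SAMMI code: the function names, the use of an unweighted mean over all column pairs, and the two four-sequence alignments are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    joint = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def mean_pairwise_mi(alignment):
    """Average mutual information over all pairs of columns (assumed scoring)."""
    columns = list(zip(*alignment))
    pairs = list(combinations(range(len(columns)), 2))
    total = sum(column_mutual_information(columns[i], columns[j]) for i, j in pairs)
    return total / len(pairs)


# Two hypothetical candidate alignments of the same three positions.
covarying = ["AKL", "AKL", "GEL", "GEL"]      # columns 1 and 2 change together
independent = ["AKL", "AEL", "GKL", "GEL"]    # columns 1 and 2 vary independently

candidates = {"covarying": covarying, "independent": independent}
best = max(candidates, key=lambda name: mean_pairwise_mi(candidates[name]))
print(best)
```

The covarying alignment scores one bit for its coupled column pair and zero for the pairs involving the invariant third column, so it is selected over the alignment whose columns vary independently.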
Affiliation(s)
- Nelson Gil: Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Andras Fiser: Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
8
Choudhary P, Kumar S, Bachhawat AK, Pandit SB. CSmetaPred: a consensus method for prediction of catalytic residues. BMC Bioinformatics 2017; 18:583. [PMID: 29273005 PMCID: PMC5741869 DOI: 10.1186/s12859-017-1987-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/19/2017] [Accepted: 12/05/2017] [Indexed: 01/27/2023] [Open access]
Abstract
Background Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is ranked positions of putative catalytic residues among all ranked residues. In order to improve ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach based method CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in meta-predictor, CSmetaPred_poc. Results Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that meta-predictors outperform their constituent methods and CSmetaPred_poc is the best of evaluated methods. For instance, on CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves highest Mean Average Specificity (MAS), a scalar measure for ROC curve, of 0.97 (0.96). Importantly, median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 classified as true positive in binary classification, CSmetaPred_poc achieves prediction accuracy of 0.94 on CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than only sequence based predictions. 
These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. Conclusions The benchmarking studies showed that employing meta-approach in combining residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalist to prioritize residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as webserver at: http://14.139.227.206/csmetapred/. Electronic supplementary material The online version of this article (10.1186/s12859-017-1987-z) contains supplementary material, which is available to authorized users.
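The core of the meta-approach, ranking residues by the mean of normalized scores from several predictors, can be sketched in a few lines. The abstract does not say how scores are normalized, so min-max scaling is an assumption here, and the four score lists stand in for hypothetical predictors that report on different scales.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def meta_rank(predictor_scores):
    """Return (mean normalized score per residue, residue indices best-first)."""
    normalized = [min_max_normalize(scores) for scores in predictor_scores]
    n_residues = len(normalized[0])
    mean_scores = [
        sum(scores[i] for scores in normalized) / len(normalized)
        for i in range(n_residues)
    ]
    order = sorted(range(n_residues), key=lambda i: mean_scores[i], reverse=True)
    return mean_scores, order


# Hypothetical raw scores for four residues from four predictors.
predictor_scores = [
    [0.1, 0.4, 0.9, 0.2],      # probability-like scale
    [10.0, 30.0, 80.0, 5.0],   # percentage-like scale
    [-2.0, 0.5, 1.5, -1.0],    # z-score-like scale
    [0.3, 0.2, 0.7, 0.1],
]

mean_scores, order = meta_rank(predictor_scores)
print(order[0], mean_scores[order[0]])
```

Normalizing before averaging keeps a predictor with a wide numeric range from dominating the consensus; a residue ranked first by every predictor ends up with a mean score of 1.0.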
Affiliation(s)
- Preeti Choudhary: Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Shailesh Kumar: Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India; Laboratory of Biochemistry and Genetics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Anand Kumar Bachhawat: Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Shashi Bhushan Pandit: Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
9
Systematic Identification of Machine-Learning Models Aimed to Classify Critical Residues for Protein Function from Protein Structure. Molecules 2017; 22:molecules22101673. [PMID: 28991206 PMCID: PMC6151554 DOI: 10.3390/molecules22101673] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/14/2017] [Revised: 09/24/2017] [Accepted: 09/24/2017] [Indexed: 12/14/2022] [Open access]
Abstract
Protein structure and protein function should be related, yet the nature of this relationship remains unsolved. Mapping the critical residues for protein function with protein structure features represents an opportunity to explore this relationship, yet two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, here we introduce an index to quantify the protein-function criticality of a residue based on experimental data and a strategy aimed to optimize both, descriptors of protein structure (physicochemical and centrality descriptors) and machine learning algorithms, to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is considered as a binary attribute (i.e., residues are considered critical or not critical). Using this binary annotation for critical residues 8 models rendered accurate and non-overlapping classification of critical residues, confirming the multi-factorial character of the structure-function relationship of proteins.
10
Metagenome Analysis: a Powerful Tool for Enzyme Bioprospecting. Appl Biochem Biotechnol 2017; 183:636-651. [PMID: 28815469 DOI: 10.1007/s12010-017-2568-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Received: 03/17/2017] [Accepted: 07/24/2017] [Indexed: 01/05/2023]
Abstract
Microorganisms are found in every corner of nature, and a vast number of them are difficult to cultivate by classical microbiological techniques. The advent of metagenomics has revolutionized the field of microbial biotechnology. Metagenomics allows the recovery of genetic material directly from environmental niches without any cultivation techniques. Currently, metagenomic tools are widely employed as powerful tools to isolate and identify enzymes with novel biocatalytic activities from the uncultivable component of microbial communities. The employment of next-generation sequencing techniques for metagenomics resulted in the generation of large sequence data sets derived from various environments, such as soil, the human body and ocean water. This review article describes the state-of-the-art techniques and tools in metagenomics and discusses the potential of metagenomic approaches for the bioprospecting of industrial enzymes from various environmental samples. We also describe the unusual novel enzymes discovered via metagenomic approaches and discuss the future prospects for metagenome technologies.
11
Biosynthesis of therapeutic natural products using synthetic biology. Adv Drug Deliv Rev 2016; 105:96-106. [PMID: 27094795 DOI: 10.1016/j.addr.2016.04.010] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 11/25/2015] [Revised: 03/24/2016] [Accepted: 04/10/2016] [Indexed: 02/08/2023]
Abstract
Natural products are a group of bioactive structurally diverse chemicals produced by microorganisms and plants. These molecules and their derivatives have contributed to over a third of the therapeutic drugs produced in the last century. However, over the last few decades traditional drug discovery pipelines from natural products have become far less productive and far more expensive. One recent development with promise to combat this trend is the application of synthetic biology to therapeutic natural product biosynthesis. Synthetic biology is a young discipline with roots in systems biology, genetic engineering, and metabolic engineering. In this review, we discuss the use of synthetic biology to engineer improved yields of existing therapeutic natural products. We further describe the use of synthetic biology to combine and express natural product biosynthetic genes in unprecedented ways, and how this holds promise for opening up completely new avenues for drug discovery and production.
12
Ferrer M, Martínez-Martínez M, Bargiela R, Streit WR, Golyshina OV, Golyshin PN. Estimating the success of enzyme bioprospecting through metagenomics: current status and future trends. Microb Biotechnol 2016; 9:22-34. [PMID: 26275154 PMCID: PMC4720405 DOI: 10.1111/1751-7915.12309] [Citation(s) in RCA: 127] [Impact Index Per Article: 15.9] [Received: 03/12/2015] [Revised: 06/26/2015] [Accepted: 07/02/2015] [Indexed: 12/01/2022] [Open access]
Abstract
Recent reports have suggested that the establishment of industrially relevant enzyme collections from environmental genomes has become a routine procedure. Across the studies assessed, a mean number of approximately 44 active clones were obtained in an average size of approximately 53,000 clones tested using naïve screening protocols. This number could be significantly increased in shorter times when novel metagenome enzyme sequences obtained by direct sequencing are selected and subjected to high-throughput expression for subsequent production and characterization. The pre-screening of clone libraries by naïve screens followed by the pyrosequencing of the inserts allowed for a 106-fold increase in the success rate of identifying genes encoding enzymes of interest. However, a much longer time, usually on the order of years, is needed from the time of enzyme identification to the establishment of an industrial process. If the hit frequency for the identification of enzymes performing at high turnover rates under real application conditions could be increased while still covering a high natural diversity, the very expensive and time-consuming enzyme optimization phase would likely be significantly shortened. At this point, it is important to review the current knowledge about the success of fine-tuned naïve- and sequence-based screening protocols for enzyme selection and to describe the environments worldwide that have already been subjected to enzyme screen programmes through metagenomic tools. Here, we provide such estimations and suggest the current challenges and future actions needed before environmental enzymes can be successfully introduced into the market.
Affiliation(s)
- Manuel Ferrer: Institute of Catalysis, Consejo Superior de Investigaciones Científicas (CSIC), Marie Curie 2, 28049, Madrid, Spain
- Mónica Martínez-Martínez: Institute of Catalysis, Consejo Superior de Investigaciones Científicas (CSIC), Marie Curie 2, 28049, Madrid, Spain
- Rafael Bargiela: Institute of Catalysis, Consejo Superior de Investigaciones Científicas (CSIC), Marie Curie 2, 28049, Madrid, Spain
- Wolfgang R Streit: Biozentrum Klein Flottbek, Universität Hamburg, Ohnhorststraße 18, D-22609, Hamburg, Germany
- Olga V Golyshina: School of Biological Sciences, Bangor University, LL57 2UW, Gwynedd, UK
- Peter N Golyshin: School of Biological Sciences, Bangor University, LL57 2UW, Gwynedd, UK
13
Aubailly S, Piazza F. Cutoff lensing: predicting catalytic sites in enzymes. Sci Rep 2015; 5:14874. [PMID: 26445900 PMCID: PMC4597221 DOI: 10.1038/srep14874] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2015] [Accepted: 09/10/2015] [Indexed: 01/12/2023] [Open access]
Abstract
Predicting function-related amino acids in proteins with unknown function or unknown allosteric binding sites in drug-targeted proteins is a task of paramount importance in molecular biomedicine. In this paper we introduce a simple, light and computationally inexpensive structure-based method to identify catalytic sites in enzymes. Our method, termed cutoff lensing, is a general procedure consisting in letting the cutoff used to build an elastic network model increase to large values. A validation of our method against a large database of annotated enzymes shows that optimal values of the cutoff exist such that three different structure-based indicators allow one to recover a maximum of the known catalytic sites. Interestingly, we find that the larger the structures the greater the predictive power afforded by our method. Possible ways to combine the three indicators into a single figure of merit and into a specific sequential analysis are suggested and discussed with reference to the classic case of HIV-protease. Our method could be used as a complement to other sequence- and/or structure-based methods to narrow the results of large-scale screenings.
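To make the role of the cutoff concrete: in an elastic network model, two residues are connected by a spring when their distance falls within a cutoff, so raising the cutoff changes the connectivity of every residue. The sketch below only shows that dependence on a toy chain. The coordinates (five points spaced 3.8 Å apart on a line) and the cutoff values are assumptions, and the three structure-based indicators used by the authors are not reproduced because the abstract does not define them.

```python
from math import dist


def contact_counts(coordinates, cutoff):
    """Number of neighbours within `cutoff` (in Å) for each residue."""
    n = len(coordinates)
    return [
        sum(
            1
            for j in range(n)
            if j != i and dist(coordinates[i], coordinates[j]) <= cutoff
        )
        for i in range(n)
    ]


# Hypothetical C-alpha trace: five residues on a straight line, 3.8 Å apart.
coordinates = [(3.8 * k, 0.0, 0.0) for k in range(5)]

# Connectivity of the network as the cutoff is increased.
profile = {cutoff: contact_counts(coordinates, cutoff) for cutoff in (4.0, 8.0, 20.0)}
for cutoff, counts in profile.items():
    print(cutoff, counts)
```

At the smallest cutoff only chain neighbours are connected; at the largest every residue is connected to every other, and the intermediate values show how central residues gain contacts faster than terminal ones.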
Affiliation(s)
- Simon Aubailly: Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
- Francesco Piazza: Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
14
PINGU: PredIction of eNzyme catalytic residues usinG seqUence information. PLoS One 2015; 10:e0135122. [PMID: 26261982 PMCID: PMC4532418 DOI: 10.1371/journal.pone.0135122] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/28/2015] [Accepted: 07/17/2015] [Indexed: 11/19/2022] [Open access]
Abstract
Identification of catalytic residues can help unveil interesting attributes of enzyme function for various therapeutic and industrial applications. Based on their biochemical roles, the number of catalytic residues and sequence lengths of enzymes vary. This article describes a prediction approach (PINGU) for such a scenario. It uses models trained using physicochemical properties and evolutionary information of 650 non-redundant enzymes (2136 catalytic residues) in a support vector machines architecture. Independent testing on 200 non-redundant enzymes (683 catalytic residues) in predefined prediction settings, i.e., with non-catalytic per catalytic residue ranging from 1 to 30, suggested that the prediction approach was highly sensitive and specific, i.e., 80% or above, over the incremental challenges. To learn more about the discriminatory power of PINGU in real scenarios, where the prediction challenge is variable and susceptible to high false positives, the best model from independent testing was used on 60 diverse enzymes. Results suggested that PINGU was able to identify most catalytic residues and non-catalytic residues properly with 80% or above accuracy, sensitivity and specificity. The effect of false positives on precision was addressed in this study by application of predicted ligand-binding residue information as a post-processing filter. An overall improvement of 20% in F-measure and 0.138 in Correlation Coefficient with 16% enhanced precision could be achieved. On account of its encouraging performance, PINGU is hoped to have eventual applications in boosting enzyme engineering and novel drug discovery.
15
Piégu B, Bire S, Arensburger P, Bigot Y. A survey of transposable element classification systems--a call for a fundamental update to meet the challenge of their diversity and complexity. Mol Phylogenet Evol 2015; 86:90-109. [PMID: 25797922 DOI: 10.1016/j.ympev.2015.03.009] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 12/24/2014] [Revised: 03/11/2015] [Accepted: 03/12/2015] [Indexed: 10/25/2022]
Abstract
The increase of publicly available sequencing data has allowed for rapid progress in our understanding of genome composition. As new information becomes available we should constantly be updating and reanalyzing existing and newly acquired data. In this report we focus on transposable elements (TEs) which make up a significant portion of nearly all sequenced genomes. Our ability to accurately identify and classify these sequences is critical to understanding their impact on host genomes. At the same time, as we demonstrate in this report, problems with existing classification schemes have led to significant misunderstandings of the evolution of both TE sequences and their host genomes. In a pioneering publication Finnegan (1989) proposed classifying all TE sequences into two classes based on transposition mechanisms and structural features: the retrotransposons (class I) and the DNA transposons (class II). We have retraced how ideas regarding TE classification and annotation in both prokaryotic and eukaryotic scientific communities have changed over time. This has led us to observe that: (1) a number of TEs have convergent structural features and/or transposition mechanisms that have led to misleading conclusions regarding their classification, (2) the evolution of TEs is similar to that of viruses by having several unrelated origins, (3) there might be at least 8 classes and 12 orders of TEs including 10 novel orders. In an effort to address these classification issues we propose: (1) the outline of a universal TE classification, (2) a set of methods and classification rules that could be used by all scientific communities involved in the study of TEs, and (3) a 5-year schedule for the establishment of an International Committee for Taxonomy of Transposable Elements (ICTTE).
Affiliation(s)
- Benoît Piégu
  - UMR INRA-CNRS 7247, PRC, Centre INRA de Nouzilly, 37380 Nouzilly, France
- Solenne Bire
  - UMR INRA-CNRS 7247, PRC, Centre INRA de Nouzilly, 37380 Nouzilly, France; Institute of Biotechnology, University of Lausanne, Center for Biotechnology UNIL-EPFL, 1015 Lausanne, Switzerland
- Peter Arensburger
  - UMR INRA-CNRS 7247, PRC, Centre INRA de Nouzilly, 37380 Nouzilly, France; Biological Sciences Department, California State Polytechnic University, Pomona, CA 91768, United States
- Yves Bigot
  - UMR INRA-CNRS 7247, PRC, Centre INRA de Nouzilly, 37380 Nouzilly, France
16
Basis for substrate recognition and distinction by matrix metalloproteinases. Proc Natl Acad Sci U S A 2014; 111:E4148-55. [PMID: 25246591 DOI: 10.1073/pnas.1406134111] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 01/12/2023] Open
Abstract
Genomic sequencing and structural genomics produced a vast amount of sequence and structural data, creating an opportunity for structure-function analysis in silico [Radivojac P, et al. (2013) Nat Methods 10(3):221-227]. Unfortunately, only a few large experimental datasets exist to serve as benchmarks for function-related predictions. Furthermore, currently there are no reliable means to predict the extent of functional similarity among proteins. Here, we quantify structure-function relationships among three phylogenetic branches of the matrix metalloproteinase (MMP) family by comparing their cleavage efficiencies toward an extended set of phage peptide substrates that were selected from ∼64 million peptide sequences (i.e., a large unbiased representation of substrate space). The observed second-order rate constants [k(obs)] across the substrate space provide a distance measure of functional similarity among the MMPs. These functional distances directly correlate with MMP phylogenetic distance. There is also a remarkable and near-perfect correlation between the MMP substrate preference and sequence identity of 50-57 discontinuous residues surrounding the catalytic groove. We conclude that these residues represent the specificity-determining positions (SDPs) that allowed for the expansion of MMP proteolytic function during evolution. A transmutation of only a few selected SDPs proximal to the bound substrate peptide, and contributing the most to selectivity among the MMPs, is sufficient to enact a global change in the substrate preference of one MMP to that of another, indicating the potential for the rational and focused redesign of cleavage specificity in MMPs.
17
EXIA2: web server of accurate and rapid protein catalytic residue prediction. Biomed Res Int 2014; 2014:807839. [PMID: 25295274 PMCID: PMC4177735 DOI: 10.1155/2014/807839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/21/2014] [Revised: 05/27/2014] [Accepted: 06/11/2014] [Indexed: 11/18/2022]
Abstract
We propose EXIA2, a method for predicting catalytic residues from protein structure without needing homology information. The method exploits the distinctive side chain orientation of catalytic residues: their side chains usually point toward the center of the catalytic site, whereas the side chains of noncatalytic residues are usually randomly oriented. When combined with PSI-BLAST sequence conservation, the method is shown to be the most accurate catalytic residue prediction method currently available. It performs better than competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets range from 0.934 to 0.968.
18
Aran M, Smal C, Pellizza L, Gallo M, Otero LH, Klinke S, Goldbaum FA, Ithurralde ER, Bercovich A, Mac Cormack WP, Turjanski AG, Cicero DO. Solution and crystal structure of BA42, a protein from the Antarctic bacterium Bizionia argentinensis comprised of a stand-alone TPM domain. Proteins 2014; 82:3062-78. [DOI: 10.1002/prot.24667] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/13/2014] [Revised: 08/01/2014] [Accepted: 08/06/2014] [Indexed: 11/11/2022]
Affiliation(s)
- Martin Aran
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Clara Smal
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Leonardo Pellizza
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Mariana Gallo
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Lisandro H. Otero
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
  - Plataforma Argentina de Biología Estructural y Metabolómica PLABEM, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Sebastián Klinke
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
  - Plataforma Argentina de Biología Estructural y Metabolómica PLABEM, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Fernando A. Goldbaum
  - Fundación Instituto Leloir, IIBBA-CONICET, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
  - Plataforma Argentina de Biología Estructural y Metabolómica PLABEM, Patricias Argentinas 435 (C1405BWE); Buenos Aires Argentina
- Esteban R. Ithurralde
  - Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales; Universidad de Buenos Aires, e INQUIMAE-CONICET, Intendente Güiraldes 2160 (C1428EGA); Buenos Aires Argentina
- Andrés Bercovich
  - Biosidus S.A., Constitución 4234 (C1254ABX); Buenos Aires Argentina
- Adrián G. Turjanski
  - Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales; Universidad de Buenos Aires, e INQUIMAE-CONICET, Intendente Güiraldes 2160 (C1428EGA); Buenos Aires Argentina
- Daniel O. Cicero
  - Dipartimento di Scienze e Tecnologie Chimiche; Università di Roma “Tor Vergata”, via della Ricerca Scientifica SNC (00133); Rome Italy
19
Mahalingam R, Peng HP, Yang AS. Prediction of fatty acid-binding residues on protein surfaces with three-dimensional probability distributions of interacting atoms. Biophys Chem 2014; 192:10-9. [PMID: 24934883 DOI: 10.1016/j.bpc.2014.05.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/08/2014] [Revised: 05/22/2014] [Accepted: 05/22/2014] [Indexed: 10/25/2022]
Abstract
Protein-fatty acid interaction is vital for many cellular processes, and understanding this interaction is important for functional annotation as well as drug discovery. In this work, we present a method for predicting fatty acid (FA)-binding residues using three-dimensional probability density distributions of interacting FA atoms on protein surfaces, derived from known protein-FA complex structures. A machine learning algorithm was established to learn the characteristic patterns of the probability density maps specific to FA-binding sites. The predictor was trained with five-fold cross validation on a non-redundant training set and then evaluated on an independent test set as well as on a dataset of holo-apo pairs. The results showed good accuracy in predicting FA-binding residues. The predictor is implemented as a freely accessible online server at http://ismblab.genomics.sinica.edu.tw/.
Affiliation(s)
- Hung-Pin Peng
  - Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- An-Suei Yang
  - Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
20
Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative. Proc Natl Acad Sci U S A 2014; 111:3733-8. [PMID: 24567391 DOI: 10.1073/pnas.1321614111] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022] Open
Abstract
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins (including proteins for which reliable homology models can be generated) on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.