1
|
Walther D. Specifics of Metabolite-Protein Interactions and Their Computational Analysis and Prediction. Methods Mol Biol 2023; 2554:179-197. [PMID: 36178627 DOI: 10.1007/978-1-0716-2624-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Computational approaches to the characterization and prediction of compound-protein interactions have a long research history and are well established, driven primarily by the needs of drug development. While, in principle, many of the computational methods developed in the context of drug development can also be applied directly to the investigation of metabolite-protein interactions, the interactions of metabolites with proteins (enzymes) are characterized by a number of particularities that result from their natural evolutionary origin and their biological and biochemical roles, as well as from a different problem setting when investigating them. In this review, these special aspects will be highlighted and recent research on them and developed computational approaches presented, along with available resources. They concern, among others, binding promiscuity, allostery, the role of posttranslational modifications, molecular steering and crowding effects, and metabolic conversion rate predictions. Recent breakthroughs in the field of protein structure prediction and newly developed machine learning techniques are being discussed as a tremendous opportunity for developing a more detailed molecular understanding of metabolism.
Affiliation(s)
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany.
|
2
|
Brysbaert G, Lensink MF. Centrality Measures in Residue Interaction Networks to Highlight Amino Acids in Protein–Protein Binding. Frontiers in Bioinformatics 2021; 1:684970. [PMID: 36303777 PMCID: PMC9581030 DOI: 10.3389/fbinf.2021.684970] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Accepted: 05/17/2021] [Indexed: 12/21/2022]
Abstract
Residue interaction networks (RINs) describe a protein structure as a network of interacting residues. Central nodes in these networks, identified by centrality analyses, highlight those residues that play a role in the structure and function of the protein. However, little is known about the capability of such analyses to identify residues involved in the formation of macromolecular complexes. Here, we performed six different centrality measures on the RINs generated from the complexes of the SKEMPI 2 database of changes in protein–protein binding upon mutation in order to evaluate the capability of each of these measures to identify major binding residues. The analyses were performed with and without the crystallographic water molecules, in addition to the protein residues. We also investigated the use of a weight factor based on the inter-residue distances to improve the detection of these residues. We show that for the identification of major binding residues, closeness, degree, and PageRank result in good precision, whereas betweenness, eigenvector, and residue centrality analyses give a higher sensitivity. Including water in the analysis improves the sensitivity of all measures without losing precision. Applying weights only slightly raises the sensitivity of eigenvector centrality analysis. We finally show that a combination of multiple centrality analyses is the optimal approach to identify residues that play a role in protein–protein interaction.
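The idea of ranking residues by centrality can be illustrated with a minimal sketch. It is not the authors' pipeline: the coordinates, the residue labels, the 7.0 Å cutoff and the function names `build_rin`, `degree_centrality` and `closeness_centrality` are all hypothetical, and only two of the six measures named in the abstract (degree and closeness) are shown.

```python
from collections import deque
from math import dist

def build_rin(coords, cutoff=7.0):
    """Link residues whose representative atoms lie within `cutoff` angstroms.

    The cutoff is an illustrative assumption, not a value taken from the paper.
    """
    names = list(coords)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(coords[a], coords[b]) <= cutoff:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Fraction of the other residues that each residue contacts."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable residues divided by the summed shortest-path lengths (BFS)."""
    result = {}
    for source in graph:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in seen:
                    seen[w] = seen[v] + 1
                    queue.append(w)
        total = sum(seen.values())
        result[source] = (len(seen) - 1) / total if total else 0.0
    return result

# Hypothetical coordinates: chain A (A1-A3) and chain B (B1-B2) meet at A2/B1.
toy_coords = {
    "A1": (0.0, 0.0, 0.0),
    "A2": (5.0, 0.0, 0.0),
    "A3": (10.0, 0.0, 0.0),
    "B1": (5.0, 5.0, 0.0),
    "B2": (5.0, 10.0, 0.0),
}
rin = build_rin(toy_coords)
degree = degree_centrality(rin)
closeness = closeness_centrality(rin)
top_by_degree = max(degree, key=degree.get)
top_by_closeness = max(closeness, key=closeness.get)
```

In this toy network both measures rank the interface residue `A2` highest. Real analyses would start from atomic contacts in an experimental structure and compare several measures, as the abstract describes.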
|
3
|
Lugo-Martinez J, Zeiberg D, Gaudelet T, Malod-Dognin N, Przulj N, Radivojac P. Classification in biological networks with hypergraphlet kernels. Bioinformatics 2021; 37:1000-1007. [PMID: 32886115 DOI: 10.1093/bioinformatics/btaa768] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/09/2019] [Revised: 06/13/2020] [Accepted: 08/26/2020] [Indexed: 11/15/2022]
Abstract
MOTIVATION Biological and cellular systems are often modeled as graphs in which vertices represent objects of interest (genes, proteins and drugs) and edges represent relational ties between these objects (binds-to, interacts-with and regulates). This approach has been highly successful owing to the theory, methodology and software that support analysis and learning on graphs. Graphs, however, suffer from information loss when modeling physical systems due to their inability to accurately represent multiobject relationships. Hypergraphs, a generalization of graphs, provide a framework to mitigate information loss and unify disparate graph-based methodologies. RESULTS We present a hypergraph-based approach for modeling biological systems and formulate vertex classification, edge classification and link prediction problems on (hyper)graphs as instances of vertex classification on (extended, dual) hypergraphs. We then introduce a novel kernel method on vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The method is based on exact and inexact (via hypergraph edit distances) enumeration of hypergraphlets; i.e. small hypergraphs rooted at a vertex of interest. We empirically evaluate this method on fifteen biological networks and show its potential use in a positive-unlabeled setting to estimate the interactome sizes in various species. AVAILABILITY AND IMPLEMENTATION https://github.com/jlugomar/hypergraphlet-kernels. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jose Lugo-Martinez
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Daniel Zeiberg
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Thomas Gaudelet
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Natasa Przulj
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
|
4
|
Barot M, Gligorijević V, Cho K, Bonneau R. NetQuilt: Deep Multispecies Network-based Protein Function Prediction using Homology-informed Network Similarity. Bioinformatics 2021; 37:2414-2422. [PMID: 33576802 PMCID: PMC8388039 DOI: 10.1093/bioinformatics/btab098] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/18/2020] [Revised: 02/04/2021] [Accepted: 02/09/2021] [Indexed: 02/02/2023]
Abstract
Motivation Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. Results In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance. Availability and implementation The code is freely available at https://github.com/nowittynamesleft/NetQuilt. The data, including sequences, PPI networks and GO annotations are available at https://string-db.org/.
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meet Barot
- Center for Data Science, New York University, New York, 10011, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, 10011, USA
- Richard Bonneau
- Center for Data Science, New York University, New York, 10011, USA; Center for Computational Biology, Flatiron Institute, New York, 10010, USA
|
5
|
Newaz K, Wright G, Piland J, Li J, Clark PL, Emrich SJ, Milenković T. Network analysis of synonymous codon usage. Bioinformatics 2020; 36:4876-4884. [PMID: 32609328 DOI: 10.1093/bioinformatics/btaa603] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/06/2019] [Revised: 05/05/2020] [Accepted: 06/22/2020] [Indexed: 12/25/2022]
Abstract
MOTIVATION Most amino acids are encoded by multiple synonymous codons, some of which are used more rarely than others. Analyses of positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding. RESULTS We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the structural core are network-central, and those on the surface are not. Then, we study potential differences between network centralities and thus structural positions of amino acids encoded by conserved rare, non-conserved rare and commonly used codons. We find that in 84% of proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different codon centrality trends, i.e. different relationships between structural positions of the three codon categories. We see several cases of all proteins from our data with some structural or functional property being in the same group. Also, we see a case of all proteins in some group having the same property. Our work shows that codon usage is linked to the final protein structure and thus possibly to co-translational protein folding. AVAILABILITY AND IMPLEMENTATION https://nd.edu/~cone/CodonUsage/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
- Gabriel Wright
- Department of Computer Science and Engineering; Eck Institute for Global Health
- Jacob Piland
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
- Jun Li
- Department of Applied and Computational Mathematics and Statistics
- Patricia L Clark
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Tijana Milenković
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
|
6
|
Newaz K, Ghalehnovi M, Rahnama A, Antsaklis PJ, Milenković T. Network-based protein structural classification. Royal Society Open Science 2020; 7:191461. [PMID: 32742675 PMCID: PMC7353965 DOI: 10.1098/rsos.191461] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/23/2019] [Accepted: 05/05/2020] [Indexed: 06/11/2023]
Abstract
Experimental determination of protein function is resource-consuming. As an alternative, computational prediction of protein function has received attention. In this context, protein structural classification (PSC) can help, by allowing for determining structural classes of currently unclassified proteins based on their features, and then relying on the fact that proteins with similar structures have similar functions. Existing PSC approaches rely on sequence-based or direct three-dimensional (3D) structure-based protein features. By contrast, we first model 3D structures of proteins as protein structure networks (PSNs). Then, we use network-based features for PSC. We propose the use of graphlets, state-of-the-art features in many research areas of network science, in the task of PSC. Moreover, because graphlets can deal only with unweighted PSNs, and because accounting for edge weights when constructing PSNs could improve PSC accuracy, we also propose a deep learning framework that automatically learns network features from weighted PSNs. When evaluated on a large set of approximately 9400 CATH and approximately 12 800 SCOP protein domains (spanning 36 PSN sets), the best of our proposed approaches are superior to existing PSC approaches in terms of accuracy, with comparable running times. Our data and code are available at https://doi.org/10.5281/zenodo.3787922.
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Mahboobeh Ghalehnovi
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Arash Rahnama
- Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Panos J. Antsaklis
- Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
|
7
|
Yan W, Hu G, Liang Z, Zhou J, Yang Y, Chen J, Shen B. Node-Weighted Amino Acid Network Strategy for Characterization and Identification of Protein Functional Residues. J Chem Inf Model 2018; 58:2024-2032. [PMID: 30107728 DOI: 10.1021/acs.jcim.8b00146] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/15/2022]
Abstract
The study of functional residues (FRs) is essential for understanding protein functions and biological processes. The amino acid network (AAN) has become an emerging paradigm for studying FRs during the past decade. Current AAN models ignore the heterogeneity of nodes and treat amino acids in the AAN as the same. However, the properties of each amino acid node are of fundamental importance. We here proposed a node-weighted AAN strategy termed the node-weighted amino acid contact energy network (NACEN) to characterize and predict three types of FRs, namely, hot spots, catalytic residues, and allosteric residues. We first constructed NACENs with their nodes weighted based on structural, sequence, physicochemical, and dynamical properties of the amino acids and then characterized the FRs with the NACEN parameters. We finally built machine learning predictors to identify each type of FR. The results revealed that residues characterized with NACEN parameters are more distinguishable between FRs and non-FRs than those with unweighted network ones. With few features for classification, NACEN yields comparable performance for FR identification and provides residue level prediction for allosteric regulation. The proposed strategy can be easily implemented to other functional residue identification. An R package is also provided for NACEN construction and analysis at http://sysbio.suda.edu.cn/NACEN/index.html .
Affiliation(s)
- Wenying Yan
- Center for systems biology, Soochow University, Suzhou 215006, China
- Guang Hu
- Center for systems biology, Soochow University, Suzhou 215006, China
- Zhongjie Liang
- Center for systems biology, Soochow University, Suzhou 215006, China
- Jianhong Zhou
- Center for systems biology, Soochow University, Suzhou 215006, China
- Yang Yang
- School of computer science and technology, Soochow University, Suzhou 215006, China
- Jiajia Chen
- School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, Suzhou 215011, China
- Bairong Shen
- Center for systems biology, Soochow University, Suzhou 215006, China
|
8
|
Gu S, Johnson J, Faisal FE, Milenković T. From homogeneous to heterogeneous network alignment via colored graphlets. Sci Rep 2018; 8:12524. [PMID: 30131590 PMCID: PMC6104050 DOI: 10.1038/s41598-018-30831-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/05/2018] [Accepted: 08/07/2018] [Indexed: 11/19/2022]
Abstract
Network alignment (NA) compares networks with the goal of finding a node mapping that uncovers highly similar (conserved) network regions. Existing NA methods are homogeneous, i.e., they can deal only with networks containing nodes and edges of one type. Due to increasing amounts of heterogeneous network data with nodes or edges of different types, we extend three recent state-of-the-art homogeneous NA methods, WAVE, MAGNA++, and SANA, to allow for heterogeneous NA for the first time. We introduce several algorithmic novelties. Namely, these existing methods compute homogeneous graphlet-based node similarities and then find high-scoring alignments with respect to these similarities, while simultaneously maximizing the amount of conserved edges. Instead, we extend homogeneous graphlets to their heterogeneous counterparts, which we then use to develop a new measure of heterogeneous node similarity. Also, we extend S3, a state-of-the-art measure of edge conservation for homogeneous NA, to its heterogeneous counterpart. Then, we find high-scoring alignments with respect to our heterogeneous node similarity and edge conservation measures. In evaluations on synthetic and real-world biological networks, our proposed heterogeneous NA methods lead to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts. The software and data from this work is available at https://nd.edu/~cone/colored_graphlets/.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- John Johnson
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
|
9
|
Cannoodt R, Ruyssinck J, Ramon J, De Preter K, Saeys Y. IncGraph: Incremental graphlet counting for topology optimisation. PLoS One 2018; 13:e0195997. [PMID: 29698494 PMCID: PMC5919487 DOI: 10.1371/journal.pone.0195997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/02/2017] [Accepted: 04/04/2018] [Indexed: 01/22/2023]
Abstract
MOTIVATION Graphlets are small network patterns that can be counted in order to characterise the structure of a network (topology). As part of a topology optimisation process, one could use graphlet counts to iteratively modify a network and keep track of the graphlet counts, in order to achieve certain topological properties. Up until now, however, graphlets were not suited as a metric for performing topology optimisation; when millions of minor changes are made to the network structure it becomes computationally intractable to recalculate all the graphlet counts for each of the edge modifications. RESULTS IncGraph is a method for calculating the differences in graphlet counts with respect to the network in its previous state, which is much more efficient than calculating the graphlet occurrences from scratch at every edge modification made. In comparison to static counting approaches, our findings show IncGraph reduces the execution time by several orders of magnitude. The usefulness of this approach was demonstrated by developing a graphlet-based metric to optimise gene regulatory networks. IncGraph is able to quickly quantify the topological impact of small changes to a network, which opens novel research opportunities to study changes in topologies in evolving or online networks, or develop graphlet-based criteria for topology optimisation. AVAILABILITY IncGraph is freely available as an open-source R package on CRAN (incgraph). The development version is also available on GitHub (rcannood/incgraph).
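The benefit of updating counts incrementally instead of recounting can be sketched for the simplest case. The example below is not IncGraph's algorithm and does not use the `incgraph` R package: it tracks only triangles (one 3-node graphlet), and the class `TriangleTracker` and the helper `count_triangles_from_scratch` are hypothetical names. It relies on the fact that adding or removing an edge (u, v) changes the triangle count by the number of neighbours u and v share.

```python
from itertools import combinations

class TriangleTracker:
    """Maintain a triangle count under single-edge insertions and deletions."""

    def __init__(self, nodes):
        self.adj = {v: set() for v in nodes}
        self.triangles = 0

    def add_edge(self, u, v):
        """Insert edge (u, v) and return the change in the triangle count."""
        if u == v or v in self.adj[u]:
            return 0
        delta = len(self.adj[u] & self.adj[v])  # each shared neighbour closes a triangle
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles += delta
        return delta

    def remove_edge(self, u, v):
        """Delete edge (u, v) and return the change in the triangle count."""
        if v not in self.adj[u]:
            return 0
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        delta = -len(self.adj[u] & self.adj[v])  # triangles that used this edge
        self.triangles += delta
        return delta

def count_triangles_from_scratch(adj):
    """Reference recount over all node triples, cubic in the number of nodes."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

tracker = TriangleTracker(range(5))
for edge in [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3), (3, 4)]:
    tracker.add_edge(*edge)
count_after_inserts = tracker.triangles
delta_on_removal = tracker.remove_edge(0, 2)
```

Each update costs one set intersection, whereas the reference recount visits every node triple. Extending the same idea to larger graphlets and orbit-level counts requires the more involved bookkeeping that the paper addresses.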
Affiliation(s)
- Robrecht Cannoodt
- Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Joeri Ruyssinck
- IDLab, Department of Information Technology, Ghent University – imec, Ghent, Belgium
- Jan Ramon
- Department of Computer Science, KU Leuven, Belgium
- Katleen De Preter
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Yvan Saeys
- Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium
|
10
|
Lugo-Martinez J, Pejaver V, Pagel KA, Jain S, Mort M, Cooper DN, Mooney SD, Radivojac P. The Loss and Gain of Functional Amino Acid Residues Is a Common Mechanism Causing Human Inherited Disease. PLoS Comput Biol 2016; 12:e1005091. [PMID: 27564311 PMCID: PMC5001644 DOI: 10.1371/journal.pcbi.1005091] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/06/2015] [Accepted: 08/02/2016] [Indexed: 01/12/2023]
Abstract
Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies were however limited in scope to either individual molecular functions or were concerned with functional effects (e.g. deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies to consider the structural environment of a residue of interest, to hypothesize specific molecular effects of sequence variants and to statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attribute potentially underlying pathogenic molecular alterations. We first propose a formal model to assess probabilistically function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures suggesting that it can be reliably used to guide experimental validation. Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption. 
Identifying the molecular changes caused by mutations is a major challenge in understanding and treating human genetic disease. To address this problem, we have developed a wide range of profiling tools designed to predict specific types of functional site from protein 3D structures. We then apply these tools to data sets of inherited disease-associated and putatively neutral amino acid substitutions and estimate the relative contribution of the loss and gain of functional residues in disease. Our results suggest that alterations of molecular function are involved in a significant number of cases of human genetic disease and are over-represented as compared to putatively neutral variants. Additionally, we use experimental data to show that it is possible to computationally identify the loss of specific functional events in disease pathogenesis. Finally, our methodology can be used to reliably identify the potential molecular consequences of disease-causing genetic variants and hence prioritize experimental validation.
Affiliation(s)
- Jose Lugo-Martinez
- Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Vikas Pejaver
- Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Kymberleigh A. Pagel
- Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Shantanu Jain
- Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- David N. Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Sean D. Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- * E-mail: (SDM); (PR)
- Predrag Radivojac
- Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- * E-mail: (SDM); (PR)
|
11
|
Huwe PJ, Xu Q, Shapovalov MV, Modi V, Andrake MD, Dunbrack RL. Biological function derived from predicted structures in CASP11. Proteins 2016; 84 Suppl 1:370-91. [PMID: 27181425 DOI: 10.1002/prot.24997] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/11/2015] [Revised: 01/10/2016] [Accepted: 01/18/2016] [Indexed: 12/26/2022]
Abstract
In CASP11, the organizers sought to bring the biological inferences from predicted structures to the fore. To accomplish this, we assessed the models for their ability to perform quantifiable tasks related to biological function. First, for 10 targets that were probable homodimers, we measured the accuracy of docking the models into homodimers as a function of GDT-TS of the monomers, which produced characteristic L-shaped plots. At low GDT-TS, none of the models could be docked correctly as homodimers. Above GDT-TS of ∼60%, some models formed correct homodimers in one of the largest docked clusters, while many other models at the same values of GDT-TS did not. Docking was more successful when many of the templates shared the same homodimer. Second, we docked a ligand from an experimental structure into each of the models of one of the targets. Docking to the models with two different programs produced poor ligand RMSDs with the experimental structure. Measures that evaluated similarity of contacts were reasonable for some of the models, although there was not a significant correlation with model accuracy. Finally, we assessed whether models would be useful in predicting the phenotypes of missense mutations in three human targets by comparing features calculated from the models with those calculated from the experimental structures. The models were successful in reproducing accessible surface areas but there was little correlation of model accuracy with calculation of FoldX evaluation of the change in free energy between the wild-type and the mutant. Proteins 2016; 84(Suppl 1):370-391. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Peter J Huwe
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Qifang Xu
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Vivek Modi
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Mark D Andrake
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
|
12
|
Aubailly S, Piazza F. Cutoff lensing: predicting catalytic sites in enzymes. Sci Rep 2015; 5:14874. [PMID: 26445900 PMCID: PMC4597221 DOI: 10.1038/srep14874] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2015] [Accepted: 09/10/2015] [Indexed: 01/12/2023]
Abstract
Predicting function-related amino acids in proteins with unknown function or unknown allosteric binding sites in drug-targeted proteins is a task of paramount importance in molecular biomedicine. In this paper we introduce a simple, light and computationally inexpensive structure-based method to identify catalytic sites in enzymes. Our method, termed cutoff lensing, is a general procedure consisting in letting the cutoff used to build an elastic network model increase to large values. A validation of our method against a large database of annotated enzymes shows that optimal values of the cutoff exist such that three different structure-based indicators allow one to recover a maximum of the known catalytic sites. Interestingly, we find that the larger the structures the greater the predictive power afforded by our method. Possible ways to combine the three indicators into a single figure of merit and into a specific sequential analysis are suggested and discussed with reference to the classic case of HIV-protease. Our method could be used as a complement to other sequence- and/or structure-based methods to narrow the results of large-scale screenings.
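The core operation described here, rebuilding an elastic-network topology while the cutoff grows, can be sketched as follows. The abstract does not specify the three structure-based indicators, so this example uses per-node connectivity (the diagonal of the Kirchhoff matrix) purely as a stand-in. The coordinates, the cutoff values and the names `kirchhoff` and `degree_profile` are hypothetical.

```python
from math import dist

def kirchhoff(coords, cutoff):
    """Connectivity (Kirchhoff) matrix of an elastic network at a given cutoff.

    Off-diagonal entries are -1 for node pairs within `cutoff` angstroms,
    and each diagonal entry holds the number of contacts of that node.
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                matrix[i][j] = matrix[j][i] = -1
                matrix[i][i] += 1
                matrix[j][j] += 1
    return matrix

def degree_profile(coords, cutoffs):
    """Per-node connectivity at each cutoff (a stand-in indicator, see lead-in)."""
    profile = {}
    for cutoff in cutoffs:
        matrix = kirchhoff(coords, cutoff)
        profile[cutoff] = [matrix[i][i] for i in range(len(coords))]
    return profile

# Hypothetical toy "structure": four nodes on a line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
profile = degree_profile(toy_coords, cutoffs=[4.0, 8.0, 12.0])
```

As the cutoff increases the network densifies until every node contacts every other one. A real application would scan many cutoffs on an enzyme structure and evaluate the paper's own indicators at each value.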
Affiliation(s)
- Simon Aubailly
- Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
- Francesco Piazza
- Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
|
13
|
Hulovatyy Y, Chen H, Milenković T. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 2015; 31:i171-80. [PMID: 26072480 PMCID: PMC4765862 DOI: 10.1093/bioinformatics/btv227] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION With the increasing availability of temporal real-world networks, how can these data be studied efficiently? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. Then, one can use established methods for static analysis on the resulting aggregate network(s), but this loses valuable temporal information, either completely or at the interface between different snapshots, respectively. Here, we develop a novel approach for studying a temporal network more explicitly, by capturing inter-snapshot relationships. RESULTS We base our methodology on well-established graphlets (subgraphs), which have been proven in numerous contexts in static network research. We develop new theory to allow for graphlet-based analyses of temporal networks. Our new notion of dynamic graphlets is different from existing dynamic network approaches that are based on temporal motifs (statistically significant subgraphs). The latter have limitations: their results depend on the choice of a null network model that is required to evaluate the significance of a subgraph, and choosing a good null model is non-trivial. Our dynamic graphlets overcome the limitations of the temporal motifs. Also, when we aim to characterize the structure and function of an entire temporal network or of individual nodes, our dynamic graphlets outperform the static graphlets. Clearly, accounting for temporal information helps. We apply dynamic graphlets to temporal age-specific molecular network data to deepen our limited knowledge about human aging. AVAILABILITY AND IMPLEMENTATION http://www.nd.edu/~cone/DG.
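The two static representations mentioned in the abstract, a single aggregate network and a series of time-window snapshots, can be illustrated with a minimal Python sketch. It does not implement dynamic graphlets themselves. The event format `(u, v, t)`, the function names, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict

def aggregate_network(events):
    """Collapse all timestamped edges into one static, undirected edge set."""
    return {frozenset((u, v)) for u, v, _ in events}

def snapshots(events, window):
    """Group timestamped edges into static snapshots of a fixed time window.

    Windows that contain no events are skipped rather than returned empty.
    """
    buckets = defaultdict(set)
    for u, v, t in events:
        buckets[int(t // window)].add(frozenset((u, v)))
    return [buckets[k] for k in sorted(buckets)]

# Toy temporal network: (node, node, timestamp).
events = [("a", "b", 0), ("b", "c", 1), ("a", "b", 5), ("c", "d", 6)]

static = aggregate_network(events)
snaps = snapshots(events, window=5)
print(len(static), [len(s) for s in snaps])
```

The aggregate set no longer records that the edge between `a` and `b` occurs twice, and the snapshot list does not say how edges in one window relate to edges in the next, which is the information loss the abstract describes.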
Affiliation(s)
- Y Hulovatyy
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- H Chen
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- T Milenković
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
|
14
|
Meysman P, Zhou C, Cule B, Goethals B, Laukens K. Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns. BioData Min 2015; 8:4. [PMID: 25657820 PMCID: PMC4318390 DOI: 10.1186/s13040-015-0038-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/09/2014] [Accepted: 01/18/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains.
Affiliation(s)
- Pieter Meysman
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Cheng Zhou
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Boris Cule
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Bart Goethals
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Kris Laukens
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
|
15
|
Singh O, Sawariya K, Aparoy P. Graphlet signature-based scoring method to estimate protein-ligand binding affinity. ROYAL SOCIETY OPEN SCIENCE 2014; 1:140306. [PMID: 26064572 PMCID: PMC4448774 DOI: 10.1098/rsos.140306] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/15/2014] [Accepted: 10/31/2014] [Indexed: 06/04/2023]
Abstract
Over the years, various computational methodologies have been developed to understand and quantify receptor-ligand interactions. Protein-ligand interactions can also be explained in the form of a network and its properties. Ligand binding at the protein active site is stabilized by the formation of new interactions such as hydrogen bonds and hydrophobic and ionic contacts. These non-covalent interactions, when considered as links, cause non-isomorphic sub-graphs in the residue interaction network. This study aims to investigate the relationship between these induced sub-graphs and ligand activity. Graphlet signature-based analysis of networks has been applied in various biological problems; the focus of this work is to analyse protein-ligand interactions in terms of neighbourhood connectivity and to develop a method in which the information from residue interaction networks, i.e. graphlet signatures, can be applied to quantify ligand affinity. A scoring method was developed, which depicts the variability in signatures adopted by different amino acids during inhibitor binding, and was termed GSUS (graphlet signature uniqueness score). The score is specific for every individual inhibitor. Two well-known drug targets, COX-2 and CA-II, and their inhibitors were considered to assess the method. Residue interaction networks of COX-2 and CA-II with their respective inhibitors were used. Only the hydrogen bond network was considered to calculate GSUS and quantify protein-ligand interaction in terms of graphlet signatures. The correlation of the GSUS with pIC50 was consistent in both proteins and better in comparison to the Autodock results. The GSUS scoring method was better in activity prediction of molecules with similar structure and diverse activity and vice versa. This study can be a major platform in developing approaches that can be used alone or together with existing methods to predict ligand affinity from protein-ligand complexes.
|
16
|
Stock M, Fober T, Hüllermeier E, Glinca S, Klebe G, Pahikkala T, Airola A, De Baets B, Waegeman W. Identification of Functionally Related Enzymes by Learning-to-Rank Methods. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:1157-1169. [PMID: 26357052 DOI: 10.1109/tcbb.2014.2338308] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
|
17
|
Dhifli W, Saidi R, Nguifo EM. Smoothing 3D Protein Structure Motifs Through Graph Mining and Amino Acid Similarities. J Comput Biol 2014; 21:162-72. [DOI: 10.1089/cmb.2013.0092] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] Open
Affiliation(s)
- Wajdi Dhifli
- LIMOS, Blaise Pascal University, Clermont University, Clermont-Ferrand, France
- LIMOS, CNRS UMR 6158, Aubière, France
- Rabie Saidi
- European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
- Engelbert Mephu Nguifo
- LIMOS, Blaise Pascal University, Clermont University, Clermont-Ferrand, France
- LIMOS, CNRS UMR 6158, Aubière, France
|
18
|
Eksi R, Li HD, Menon R, Wen Y, Omenn GS, Kretzler M, Guan Y. Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data. PLoS Comput Biol 2013; 9:e1003314. [PMID: 24244129 PMCID: PMC3820534 DOI: 10.1371/journal.pcbi.1003314] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 02/18/2013] [Accepted: 09/19/2013] [Indexed: 12/13/2022] Open
Abstract
Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. In mammalian genomes, a single gene can be alternatively spliced into multiple isoforms which greatly increase the functional diversity of the genome. In the human, more than 95% of multi-exon genes undergo alternative splicing. 
It is hard to computationally differentiate the functions for the splice isoforms of the same gene, because they are almost always annotated with the same functions and share similar sequences. In this paper, we developed a generic framework to identify the ‘responsible’ isoform(s) for each function that the gene carries out, and therefore predict functional assignment on the isoform level instead of on the gene level. Within this generic framework, we implemented and evaluated several related algorithms for isoform function prediction. We tested these algorithms through both computational evaluation and experimental validation of the predicted ‘responsible’ isoform(s) and the predicted disparate functions of the isoforms of Cdkn2a and of Anxa6. Our algorithm represents the first effort to predict and differentiate isoforms through large-scale genomic data integration.
Affiliation(s)
- Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yuchen Wen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Matthias Kretzler
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, United States of America
|
19
|
He H, Wang S, Li X, Wang H, Zhang W, Yuan L, Liu X. A novel metabolic balance model for describing the metabolic disruption of and interactions between cardiovascular-related markers during acute myocardial infarction. Metabolism 2013; 62:1357-66. [PMID: 23702382 DOI: 10.1016/j.metabol.2013.04.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/18/2012] [Revised: 04/09/2013] [Accepted: 04/13/2013] [Indexed: 12/21/2022]
Abstract
OBJECTIVE After acute myocardial infarction (AMI), an integral evaluation of risk using multimarker approach and the understanding of the pathophysiological processes involved have recently received much attention. This study aimed to develop a model to integrally evaluate the metabolic disruption of cardiovascular-related markers and unveil their interactions after AMI. METHODS AMI was induced in rats by coronary artery ligation. Several cardiovascular-related markers in plasma and the heart were determined during AMI. A metabolic balance model was developed using matrix equations to assess the metabolic disturbance of, and interactions between, these markers. RESULTS Metabolic balance maps intuitively depicted the metabolic disruption of cardiovascular-related markers after AMI. The deviation and magnitude of the disruption were quantitatively and integrally described by φ and k (the dynamic parameter of metabolic balance disruption), respectively. The metabolic balance was disturbed in both the circulatory system and the heart post-AMI. All of the measured markers appeared to be interactional. Among these markers, kidney function and dimethylarginine dimethylaminohydrolase (DDAH) activity in the heart showed a potent effect on the other markers, whereas asymmetric dimethylarginine (ADMA) levels in plasma and adenosine triphosphate (ATP) contents in the heart were susceptible to the effects of the other markers. CONCLUSION A metabolic balance model was developed to integrally evaluate the disruption of cardiovascular-related markers after AMI, which proposes a new method for evaluating the disease state post-AMI using a multimarker approach. The unveiled interactions between these cardiovascular-related markers are helpful in understanding the pathophysiological processes.
Affiliation(s)
- Hua He
- Center of Drug Metabolism and Pharmacokinetics, China Pharmaceutical University, Nanjing 210009, China
|
20
|
Rahman M, Bhuiyan MA, Rahman M, Hasan M. GUISE: a uniform sampler for constructing frequency histogram of graphlets. Knowl Inf Syst 2013. [DOI: 10.1007/s10115-013-0673-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
|
21
|
Fober T, Mernberger M, Klebe G, Hüllermeier E. Fingerprint Kernels for Protein Structure Comparison. Mol Inform 2012; 31:443-52. [PMID: 27477463 DOI: 10.1002/minf.201100149] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/03/2011] [Accepted: 04/03/2012] [Indexed: 11/06/2022]
Abstract
A key task in structural biology is to define a meaningful similarity measure for the comparison of protein structures. Recently, the use of graphs as modeling tools for molecular data has gained increasing importance. In this context, kernel functions have attracted a lot of attention, especially since they allow for the application of a rich repertoire of methods from the field of kernel-based machine learning. However, most of the existing graph kernels have been designed for unlabeled and/or unweighted graphs, although proteins are often more naturally and more exactly represented in terms of node-labeled and edge-weighted graphs. Here we analyze kernel-based protein comparison methods and propose extensions to existing graph kernels to exploit node-labeled and edge-weighted graphs. Moreover, we propose an instance of the substructure fingerprint kernel suitable for the analysis of protein binding sites. By using fuzzy fingerprints, we solve the problem of discontinuity on bin-boundaries arising in the case of labeled graphs.
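The bin-boundary discontinuity mentioned in the abstract, and one way a fuzzy assignment can smooth it, can be shown with a small Python sketch. The abstract does not describe the membership function actually used, so the linear (triangular) split between the two nearest bin centres below is an assumption, and the names `crisp_bin` and `fuzzy_bin` are hypothetical.

```python
def crisp_bin(value, width, n_bins):
    """Hard assignment: the whole weight goes to the single bin containing value."""
    hist = [0.0] * n_bins
    idx = min(int(value // width), n_bins - 1)
    hist[idx] = 1.0
    return hist

def fuzzy_bin(value, width, n_bins):
    """Soft assignment: weight is split linearly between the two nearest bin centres.

    The triangular membership function is an illustrative assumption.
    """
    hist = [0.0] * n_bins
    pos = value / width - 0.5               # position measured from the centre of bin 0
    pos = min(max(pos, 0.0), n_bins - 1.0)  # clamp to the first and last bin centres
    lo = int(pos)
    frac = pos - lo
    hist[lo] += 1.0 - frac
    if frac > 0.0:
        hist[lo + 1] += frac
    return hist

# Two edge weights that straddle the boundary between bins 1 and 2 (bin width 1.0).
for value in (1.99, 2.01):
    print(value, crisp_bin(value, 1.0, 4), fuzzy_bin(value, 1.0, 4))
```

With hard binning the two nearly identical values land in different bins, whereas the fuzzy version yields almost the same fingerprint for both.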
Affiliation(s)
- Thomas Fober
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, 35032 Marburg, Germany. The first two authors should be regarded as joint First Authors
- Marco Mernberger
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, 35032 Marburg, Germany. Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, 35032 Marburg, Germany. The first two authors should be regarded as joint First Authors
- Gerhard Klebe
- Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, 35032 Marburg, Germany
- Eyke Hüllermeier
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, 35032 Marburg, Germany.
|
22
|
Xin F, Myers S, Li YF, Cooper DN, Mooney SD, Radivojac P. Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease. Bioinformatics 2010; 26:1975-82. [PMID: 20551136 DOI: 10.1093/bioinformatics/btq319] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/07/2023]
Abstract
MOTIVATION Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. RESULTS We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. AVAILABILITY AND IMPLEMENTATION Source code for the structural kernel is available at www.informatics.indiana.edu/predrag/.
Affiliation(s)
- Fuxiao Xin
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
|