101
|
Betts MJ, Lu Q, Jiang Y, Drusko A, Wichmann O, Utz M, Valtierra-Gutiérrez IA, Schlesner M, Jaeger N, Jones DT, Pfister S, Lichter P, Eils R, Siebert R, Bork P, Apic G, Gavin AC, Russell RB. Mechismo: predicting the mechanistic impact of mutations and modifications on molecular interactions. Nucleic Acids Res 2014; 43:e10. [PMID: 25392414 PMCID: PMC4333368 DOI: 10.1093/nar/gku1094] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Indexed: 01/21/2023] Open
Abstract
Systematic interrogation of mutation or protein modification data is important to identify sites with functional consequences and to deduce global consequences from large data sets. Mechismo (mechismo.russellab.org) enables simultaneous consideration of thousands of 3D structures and biomolecular interactions to predict rapidly mechanistic consequences for mutations and modifications. As useful functional information often only comes from homologous proteins, we benchmarked the accuracy of predictions as a function of protein/structure sequence similarity, which permits the use of relatively weak sequence similarities with an appropriate confidence measure. For protein–protein, protein–nucleic acid and a subset of protein–chemical interactions, we also developed and benchmarked a measure of whether modifications are likely to enhance or diminish the interactions, which can assist the detection of modifications with specific effects. Analysis of high-throughput sequencing data shows that the approach can identify interesting differences between cancers, and application to proteomics data finds potential mechanistic insights for how post-translational modifications can alter biomolecular interactions.
Affiliation(s)
- Matthew J Betts
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- Qianhao Lu
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- YingYing Jiang
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- Armin Drusko
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- Oliver Wichmann
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- Mathias Utz
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- Ilse A Valtierra-Gutiérrez
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
- Matthias Schlesner
- Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Natalie Jaeger
- Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- David T Jones
- Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Stefan Pfister
- Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Peter Lichter
- Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Roland Eils
- Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany; Department for Bioinformatics and Functional Genomics, Institute for Pharmacy and Molecular Biotechnology (IPMB), University of Heidelberg, Heidelberg, Germany
- Reiner Siebert
- Institut für Humangenetik, Universitätsklinikum Schleswig-Holstein, Christian-Albrechts-Universität zu Kiel, Arnold Heller Straße 3, 24105 Kiel, Germany
- Peer Bork
- EMBL, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Gordana Apic
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Cambridge Cell Networks Ltd, St John's Innovation Centre, Cowley Road, CB3 0WS, Cambridge, UK
- Robert B Russell
- Cell Networks, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Bioquant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
|
102
|
Cukuroglu E, Engin HB, Gursoy A, Keskin O. Hot spots in protein–protein interfaces: Towards drug discovery. Progress in Biophysics and Molecular Biology 2014; 116:165-73. [DOI: 10.1016/j.pbiomolbio.2014.06.003] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Received: 03/06/2014] [Revised: 05/30/2014] [Accepted: 06/12/2014] [Indexed: 11/16/2022]
|
103
|
Berliner N, Teyra J, Çolak R, Garcia Lopez S, Kim PM. Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation. PLoS One 2014; 9:e107353. [PMID: 25243403 PMCID: PMC4170975 DOI: 10.1371/journal.pone.0107353] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 05/09/2014] [Accepted: 07/21/2014] [Indexed: 12/04/2022] Open
Abstract
Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability and help us better understand the molecular causes of diseases.
Affiliation(s)
- Niklas Berliner
- Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Joan Teyra
- Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Recep Çolak
- Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Sebastian Garcia Lopez
- Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Universidad Nacional de Colombia, Manizales, Colombia
- Philip M. Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
|
104
|
Sudha G, Nussinov R, Srinivasan N. An overview of recent advances in structural bioinformatics of protein-protein interactions and a guide to their principles. Progress in Biophysics and Molecular Biology 2014; 116:141-50. [PMID: 25077409 DOI: 10.1016/j.pbiomolbio.2014.07.004] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 05/16/2014] [Accepted: 07/13/2014] [Indexed: 12/20/2022]
Abstract
Rich data bearing on the structural and evolutionary principles of protein-protein interactions are paving the way to a better understanding of the regulation of function in the cell. This is particularly the case when these interactions are considered in the framework of key pathways. Knowledge of the interactions may provide insights into the mechanisms of crucial 'driver' mutations in oncogenesis. They also provide the foundation toward the design of protein-protein interfaces and inhibitors that can abrogate their formation or enhance them. The main features to learn from known 3-D structures of protein-protein complexes and the extensive literature which analyzes them computationally and experimentally include the interaction details which permit undertaking structure-based drug discovery, the evolution of complexes and their interactions, the consequences of alterations such as post-translational modifications, ligand binding, disease causing mutations, host pathogen interactions, oligomerization, aggregation and the roles of disorder, dynamics, allostery and more to the protein and the cell. This review highlights some of the recent advances in these areas, including design, inhibition and prediction of protein-protein complexes. The field is broad, and much work has been carried out in these areas, making it challenging to cover it in its entirety. Much of this is due to the fast increase in the number of molecules whose structures have been determined experimentally and the vast increase in computational power. Here we provide a concise overview.
Affiliation(s)
- Govindarajan Sudha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
- Ruth Nussinov
- Cancer and Inflammation Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., National Cancer Institute, Frederick, MD 21702, USA; Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
|
105
|
Advances in Human Biology: Combining Genetics and Molecular Biophysics to Pave the Way for Personalized Diagnostics and Medicine. ACTA ACUST UNITED AC 2014. [DOI: 10.1155/2014/471836] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/17/2022]
Abstract
Advances in several biology-oriented initiatives such as genome sequencing and structural genomics, along with the progress made through traditional biological and biochemical research, have opened up a unique opportunity to better understand the molecular effects of human diseases. Human DNA can vary significantly from person to person and determines an individual’s physical characteristics and their susceptibility to diseases. Armed with an individual’s DNA sequence, researchers and physicians can check for defects known to be associated with certain diseases by utilizing various databases. However, for unclassified DNA mutations or in order to reveal molecular mechanism behind the effects, the mutations have to be mapped onto the corresponding networks and macromolecular structures and then analyzed to reveal their effect on the wild type properties of biological processes involved. Predicting the effect of DNA mutations on individual’s health is typically referred to as personalized or companion diagnostics. Furthermore, once the molecular mechanism of the mutations is revealed, the patient should be given drugs which are the most appropriate for the individual genome, referred to as pharmacogenomics. Altogether, the shift in focus in medicine towards more genomic-oriented practices is the foundation of personalized medicine. The progress made in these rapidly developing fields is outlined.
|
106
|
Lu HC, Fornili A, Fraternali F. Protein-protein interaction networks studies and importance of 3D structure knowledge. Expert Rev Proteomics 2014; 10:511-20. [PMID: 24206225 DOI: 10.1586/14789450.2013.856764] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/18/2023]
Abstract
Protein-protein interaction networks (PPINs) are a powerful tool to study biological processes in living cells. In this review, we present the progress of PPIN studies from abstract to more detailed representations. We will focus on 3D interactome networks, which offer detailed information at the atomic level. This information can be exploited in understanding not only the underlying cellular mechanisms, but also how human variants and disease-causing mutations affect protein functions and complexes' stability. Recent studies have used structural information on PPINs to also understand the molecular mechanisms of binding partner selection. We will address the challenges in generating 3D PPINs due to the restricted number of solved protein structures. Finally, some of the current use of 3D PPINs will be discussed, highlighting their contribution to the studies in genotype-phenotype relationships and in the optimization of targeted studies to design novel chemical compounds for medical treatments.
Affiliation(s)
- Hui-Chun Lu
- Randall Division of Cell and Molecular Biophysics, King's College London, New Hunt's House, London SE1 1UL, UK
|
107
|
Abstract
Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs) present in proteins, the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants, and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod performance is comparable to an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions, including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod.
Affiliation(s)
- Morena Pappalardo
- Centre for Molecular Processing, School of Biosciences, University of Kent, CT2 7NH, UK
- Mark N Wass
- Centre for Molecular Processing, School of Biosciences, University of Kent, CT2 7NH, UK
|
108
|
Villoutreix BO, Kuenemann MA, Poyet JL, Bruzzoni-Giovanelli H, Labbé C, Lagorce D, Sperandio O, Miteva MA. Drug-Like Protein-Protein Interaction Modulators: Challenges and Opportunities for Drug Discovery and Chemical Biology. Mol Inform 2014; 33:414-437. [PMID: 25254076 PMCID: PMC4160817 DOI: 10.1002/minf.201400040] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Received: 03/24/2014] [Accepted: 04/21/2014] [Indexed: 12/13/2022]
Abstract
Fundamental processes in living cells are largely controlled by macromolecular interactions, and among them protein-protein interactions (PPIs) have a critical role, while their dysregulation can contribute to the pathogenesis of numerous diseases. Although PPIs were already considered attractive pharmaceutical targets some years ago, they have thus far been largely unexploited for therapeutic interventions with low molecular weight compounds. Several limiting factors, from technological hurdles to conceptual barriers, are known, which, taken together, explain why research in this area has been relatively slow. However, over the last decade, the scientific community has challenged this dogma and become more enthusiastic about the modulation of PPIs with small drug-like molecules. In fact, several success stories have been reported at both the preclinical and clinical stages. In this review article, written for the 2014 International Summer School in Chemoinformatics (Strasbourg, France), we discuss in silico tools (essentially post 2012) and databases that can assist the design of low molecular weight PPI modulators (these tools can be found at www.vls3d.com). We first introduce the field of protein-protein interaction research, discuss key challenges and comment on recently reported in silico packages, protocols and databases dedicated to PPIs. Then, we illustrate how in silico methods can be used and combined with experimental work to identify PPI modulators.
Affiliation(s)
- Bruno O Villoutreix
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Melaine A Kuenemann
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- Jean-Luc Poyet
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- IUH, Hôpital Saint-Louis, Paris, France
- CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Heriberto Bruzzoni-Giovanelli
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- CIC, Clinical investigation center, Hôpital Saint-Louis, Paris, France
- Céline Labbé
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- David Lagorce
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- Olivier Sperandio
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
- CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Maria A Miteva
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
- Inserm, U973, Paris 75013, France
|
109
|
Xie L, Ge X, Tan H, Xie L, Zhang Y, Hart T, Yang X, Bourne PE. Towards structural systems pharmacology to study complex diseases and personalized medicine. PLoS Comput Biol 2014; 10:e1003554. [PMID: 24830652 PMCID: PMC4022462 DOI: 10.1371/journal.pcbi.1003554] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 12/11/2022] Open
Abstract
Genome-Wide Association Studies (GWAS), whole genome sequencing, and high-throughput omics techniques have generated vast amounts of genotypic and molecular phenotypic data. However, these data have not yet been fully explored to improve the effectiveness and efficiency of drug discovery, which continues along a one-drug-one-target-one-disease paradigm. As a partial consequence, both the cost to launch a new drug and the attrition rate are increasing. Systems pharmacology and pharmacogenomics are emerging to exploit the available data and potentially reverse this trend, but, as we argue here, more is needed. To understand the impact of genetic, epigenetic, and environmental factors on drug action, we must study the structural energetics and dynamics of molecular interactions in the context of the whole human genome and interactome. Such an approach requires an integrative modeling framework for drug action that leverages advances in data-driven statistical modeling and mechanism-based multiscale modeling and transforms heterogeneous data from GWAS, high-throughput sequencing, structural genomics, functional genomics, and chemical genomics into unified knowledge. This is not a small task, but, as reviewed here, progress is being made towards the final goal of personalized medicines for the treatment of complex diseases.
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Ph.D. Program in Computer Science, Biology, and Biochemistry, The Graduate Center, The City University of New York, New York, New York, United States of America
- Xiaoxia Ge
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Hepan Tan
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
- Yinliang Zhang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
- Thomas Hart
- Department of Biological Sciences, Hunter College, The City University of New York, New York, New York, United States of America
- Xiaowei Yang
- School of Public Health, Hunter College, The City University of New York, New York, New York, United States of America
- Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
|
110
|
Yates CM, Filippis I, Kelley LA, Sternberg MJE. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 2014; 426:2692-701. [PMID: 24810707 PMCID: PMC4087249 DOI: 10.1016/j.jmb.2014.04.026] [Citation(s) in RCA: 168] [Impact Index Per Article: 16.8] [Received: 01/31/2014] [Revised: 04/23/2014] [Accepted: 04/28/2014] [Indexed: 11/16/2022]
Abstract
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. Bioinformatics approaches are key for identification of disease-causing variants. SAV phenotype prediction can be improved using network information. A method including these features, SuSPect, outperforms tested methods. SuSPect is available to use at www.sbg.bio.ic.ac.uk/suspect.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Ioannis Filippis
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
|
111
|
Zhao N, Han JG, Shyu CR, Korkin D. Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning. PLoS Comput Biol 2014; 10:e1003592. [PMID: 24784581 PMCID: PMC4006705 DOI: 10.1371/journal.pcbi.1003592] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 10/07/2013] [Accepted: 03/13/2014] [Indexed: 12/31/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. 
The accurate and balanced performance of the SNP-IN tool makes it readily applicable to studying the rewiring of large-scale protein-protein interaction networks, and it can be useful for functional annotation of disease-associated SNPs. The SNP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/.
Affiliation(s)
- Nan Zhao
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Jing Ginger Han
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Chi-Ren Shyu
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America
- Dmitry Korkin
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America
- Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America
|
112
|
Preeprem T, Gibson G. SDS, a structural disruption score for assessment of missense variant deleteriousness. Front Genet 2014; 5:82. [PMID: 24795746 PMCID: PMC4001065 DOI: 10.3389/fgene.2014.00082] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 02/08/2014] [Accepted: 03/26/2014] [Indexed: 11/17/2022] Open
Abstract
We have developed a novel structure-based evaluation for missense variants that explicitly models protein structure and amino acid properties to predict the likelihood that a variant disrupts protein function. A structural disruption score (SDS) is introduced as a measure to depict the likelihood that a case variant is functional. The score is constructed using characteristics that distinguish between causal and neutral variants within a group of proteins. The SDS score is correlated with standard sequence-based deleteriousness, but shows promise for improving discrimination between neutral and causal variants at less conserved sites. The prediction was performed on 3-dimensional structures of 57 gene products whose homozygous SNPs were identified as case-exclusive variants in an exome sequencing study of epilepsy disorders. We contrasted the candidate epilepsy variants with scores for likely benign variants found in the EVS database, and for positive control variants in the same genes that are suspected to promote a range of diseases. To derive a characteristic profile of damaging SNPs, we transformed continuous scores into categorical variables based on the score distribution of each measurement, collected from all possible SNPs in this protein set, where extreme measures were assumed to be deleterious. A second epilepsy dataset was used to replicate the findings. Causal variants tend to receive higher sequence-based deleterious scores, induce larger physico-chemical changes between amino acid pairs, locate in protein domains, buried sites or on conserved protein surface clusters, and cause protein destabilization, relative to negative controls. These measures were agglomerated for each variant. A list of nine high-priority putative functional variants for epilepsy was generated. Our newly developed SDS protocol facilitates SNP prioritization for experimental validation.
Affiliation(s)
- Greg Gibson
- School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
|
113
|
Das J, Lee HR, Sagar A, Fragoza R, Liang J, Wei X, Wang X, Mort M, Stenson PD, Cooper DN, Yu H. Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks. Hum Mutat 2014; 35:585-93. [PMID: 24599843 DOI: 10.1002/humu.22534] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/26/2013] [Accepted: 02/14/2014] [Indexed: 01/24/2023]
Abstract
With the rapid growth of structural genomics, numerous protein crystal structures have become available. However, the parallel increase in knowledge of the functional principles underlying biological processes, and more specifically the underlying molecular mechanisms of disease, has been less dramatic. This notwithstanding, the study of complex cellular networks has made possible the inference of protein functions on a large scale. Here, we combine the scale of network systems biology with the resolution of traditional structural biology to generate a large-scale atomic-resolution interactome-network comprising 3,398 interactions between 2,890 proteins with a well-defined interaction interface and interface residues for each interaction. Within the framework of this atomic-resolution network, we have explored the structural principles underlying variations causing human-inherited disease. We find that in-frame pathogenic variations are enriched at both the interface and in the interacting domain, suggesting that variations not only at interface "hot-spots," but in the entire interacting domain can result in alterations of interactions. Further, the sites of pathogenic variations are closely related to the biophysical strength of the interactions they perturb. Finally, we show that biochemical alterations consequent to these variations are considerably more disruptive than evolutionary changes, with the most significant alterations at the protein interaction interface.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853

114
Schwede T. Protein modeling: what happened to the "protein structure gap"? Structure 2013; 21:1531-40. [PMID: 24010712 DOI: 10.1016/j.str.2013.08.007] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 07/25/2013] [Revised: 08/12/2013] [Accepted: 08/12/2013] [Indexed: 11/27/2022]
Abstract
Computational modeling of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing vision in structural biology. Over the last 2 decades, a paradigm shift has occurred: starting from a large "structure knowledge gap" between the huge number of protein sequences and small number of known structures, today, some form of structural information, either experimental or template-based models, is available for the majority of amino acids encoded by common model organism genomes. With the scientific focus of interest moving toward larger macromolecular complexes and dynamic networks of interactions, the integration of computational modeling methods with low-resolution experimental techniques allows the study of large and complex molecular machines. One of the open challenges for computational modeling and prediction techniques is to convey the underlying assumptions, as well as the expected accuracy and structural variability of a specific model, which is crucial to understanding its limitations.
Affiliation(s)
- Torsten Schwede
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056 Basel, Switzerland.

115
Acuner Ozbabacan SE, Gursoy A, Nussinov R, Keskin O. The structural pathway of interleukin 1 (IL-1) initiated signaling reveals mechanisms of oncogenic mutations and SNPs in inflammation and cancer. PLoS Comput Biol 2014; 10:e1003470. [PMID: 24550720 PMCID: PMC3923659 DOI: 10.1371/journal.pcbi.1003470] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 09/12/2013] [Accepted: 12/25/2013] [Indexed: 01/21/2023] Open
Abstract
Interleukin-1 (IL-1) is a large cytokine family closely related to innate immunity and inflammation. IL-1 proteins are key players in signaling pathways such as apoptosis, TLR, MAPK, NLR and NF-κB. The IL-1 pathway is also associated with cancer, and chronic inflammation increases the risk of tumor development via oncogenic mutations. Here we illustrate that the structures of interfaces between proteins in this pathway bearing the mutations may reveal how. Proteins are frequently regulated via their interactions, which can turn them ON or OFF. We show that oncogenic mutations are significantly at or adjoining interface regions, and can abolish (or enhance) the protein-protein interaction, making the protein constitutively active (or inactive, if it is a repressor). We combine known structures of protein-protein complexes and those that we have predicted for the IL-1 pathway, and integrate them with literature information. In the reconstructed pathway there are 104 interactions between proteins whose three dimensional structures are experimentally identified; only 15 have experimentally-determined structures of the interacting complexes. By predicting the protein-protein complexes throughout the pathway via the PRISM algorithm, the structural coverage increases from 15% to 71%. In silico mutagenesis and comparison of the predicted binding energies reveal the mechanisms of how oncogenic and single nucleotide polymorphism (SNP) mutations can abrogate the interactions or increase the binding affinity of the mutant to the native partner. Computational mapping of mutations on the interface of the predicted complexes may constitute a powerful strategy to explain the mechanisms of activation/inhibition. It can also help explain how an oncogenic mutation or SNP works.
Affiliation(s)
- Saliha Ece Acuner Ozbabacan
- Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Sariyer Istanbul, Turkey
- Attila Gursoy
- Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Sariyer Istanbul, Turkey
- Ruth Nussinov
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., National Cancer Institute, Frederick National Laboratory, Frederick, Maryland, United States of America
- Sackler Inst. of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ozlem Keskin
- Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Sariyer Istanbul, Turkey

116
Espinosa O, Mitsopoulos K, Hakas J, Pearl F, Zvelebil M. Deriving a mutation index of carcinogenicity using protein structure and protein interfaces. PLoS One 2014; 9:e84598. [PMID: 24454733 PMCID: PMC3893166 DOI: 10.1371/journal.pone.0084598] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 07/22/2013] [Accepted: 11/16/2013] [Indexed: 11/29/2022] Open
Abstract
With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/.
Affiliation(s)
- Octavio Espinosa
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
- Konstantinos Mitsopoulos
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
- Jarle Hakas
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
- Frances Pearl
- UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Translational Drug Discovery Group, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Marketa Zvelebil
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom

117
Cutting GR. Annotating DNA variants is the next major goal for human genetics. Am J Hum Genet 2014; 94:5-10. [PMID: 24387988 PMCID: PMC3882730 DOI: 10.1016/j.ajhg.2013.12.008] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 09/11/2013] [Indexed: 12/29/2022] Open
Abstract
Clinical genetic testing has undergone a dramatic transformation in the past two decades. Diagnostic laboratories that previously tested for well-established disease-causing DNA variants in a handful of genes have evolved into sequencing factories identifying thousands of variants of known and unknown medical consequence. Sorting out what does and does not cause disease in our genomes is the next great challenge in making genetics a central feature of healthcare. I propose that closing the gap in our ability to interpret variation responsible for Mendelian disorders provides a grand and unprecedented opportunity for geneticists. Human geneticists are well placed to coordinate a systematic evaluation of variants in collaboration with basic scientists and clinicians. Sharing of knowledge, data, methods, and tools will aid both researchers and healthcare workers in achieving their common goal of defining the pathogenic potential of variants. Generation of variant annotations will inform genetic testing and will deepen our understanding of gene and protein function, thereby aiding the search for molecular targeted therapies.
Affiliation(s)
- Garry R Cutting
- McKusick-Nathans Institute of Genetic Medicine and Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.

118
Preeprem T, Gibson G. An association-adjusted consensus deleterious scheme to classify homozygous Mis-sense mutations for personal genome interpretation. BioData Min 2013; 6:24. [PMID: 24365473 PMCID: PMC3892026 DOI: 10.1186/1756-0381-6-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/08/2013] [Accepted: 12/17/2013] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND: Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious.
METHOD: This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults.
RESULTS: Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals.
CONCLUSIONS: The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.
Affiliation(s)
- Greg Gibson
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA

119
Engin HB, Guney E, Keskin O, Oliva B, Gursoy A. Integrating structure to protein-protein interaction networks that drive metastasis to brain and lung in breast cancer. PLoS One 2013; 8:e81035. [PMID: 24278371 PMCID: PMC3838352 DOI: 10.1371/journal.pone.0081035] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/22/2013] [Accepted: 10/05/2013] [Indexed: 11/18/2022] Open
Abstract
Blocking specific protein interactions can lead to human diseases. Accordingly, protein interactions and the structural knowledge on interacting surfaces of proteins (interfaces) have an important role in predicting the genotype-phenotype relationship. We have built the phenotype specific sub-networks of protein-protein interactions (PPIs) involving the relevant genes responsible for lung and brain metastasis from primary tumor in breast cancer. First, we selected the PPIs most relevant to metastasis causing genes (seed genes), by using the "guilt-by-association" principle. Then, we modeled structures of the interactions whose complex forms are not available in Protein Databank (PDB). Finally, we mapped mutations to interface structures (real and modeled), in order to spot the interactions that might be manipulated by these mutations. Functional analyses performed on these sub-networks revealed the potential relationship between immune system-infectious diseases and lung metastasis progression, but this connection was not observed significantly in the brain metastasis. Besides, structural analyses showed that some PPI interfaces in both metastasis sub-networks are originating from microbial proteins, which in turn were mostly related with cell adhesion. Cell adhesion is a key mechanism in metastasis, therefore these PPIs may be involved in similar molecular pathways that are shared by infectious disease and metastasis. Finally, by mapping the mutations and amino acid variations on the interface regions of the proteins in the metastasis sub-networks we found evidence for some mutations to be involved in the mechanisms differentiating the type of the metastasis.
Affiliation(s)
- H. Billur Engin
- Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Istanbul, Turkey
- Emre Guney
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra
- Ozlem Keskin
- Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Istanbul, Turkey
- Baldo Oliva
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra
- Attila Gursoy
- Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Istanbul, Turkey

120
Agius R, Torchala M, Moal IH, Fernández-Recio J, Bates PA. Characterizing changes in the rate of protein-protein dissociation upon interface mutation using hotspot energy and organization. PLoS Comput Biol 2013; 9:e1003216. [PMID: 24039569 PMCID: PMC3764008 DOI: 10.1371/journal.pcbi.1003216] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 03/01/2013] [Accepted: 07/25/2013] [Indexed: 12/21/2022] Open
Abstract
Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used for the estimation of off-rate mutations to any residue type and also multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthews Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects.
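For readers unfamiliar with the two metrics reported in this abstract, the following Python sketch computes them from their standard definitions. It is illustrative only: the function names and the confusion-matrix counts are made up and have no connection to the SKEMPI benchmark or to the authors' models. The Matthews coefficient is commonly preferred when one class is rare because it uses all four cells of the confusion matrix.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def matthews(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts for a rare positive class (8 positives among 100 cases).
print(matthews(tp=6, fp=2, tn=90, fn=2))
```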
Affiliation(s)
- Rudi Agius
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, United Kingdom
- Mieczyslaw Torchala
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, United Kingdom
- Iain H. Moal
- Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, Barcelona, Spain
- Juan Fernández-Recio
- Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, Barcelona, Spain
- Paul A. Bates
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, United Kingdom

121
Yates CM, Sternberg MJE. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol 2013; 425:3949-63. [PMID: 23867278 DOI: 10.1016/j.jmb.2013.07.012] [Citation(s) in RCA: 152] [Impact Index Per Article: 13.8] [Received: 04/28/2013] [Revised: 07/02/2013] [Accepted: 07/09/2013] [Indexed: 12/23/2022]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Sir Ernst Chain Building, Imperial College London, South Kensington, SW7 2AZ, UK.

122
Nishi H, Tyagi M, Teng S, Shoemaker BA, Hashimoto K, Alexov E, Wuchty S, Panchenko AR. Cancer missense mutations alter binding properties of proteins and their interaction networks. PLoS One 2013; 8:e66273. [PMID: 23799087 PMCID: PMC3682950 DOI: 10.1371/journal.pone.0066273] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 12/20/2012] [Accepted: 05/02/2013] [Indexed: 11/18/2022] Open
Abstract
Many studies have shown that missense mutations might play an important role in carcinogenesis. However, the extent to which cancer mutations might affect biomolecular interactions remains unclear. Here, we map glioblastoma missense mutations on the human protein interactome, model the structures of affected protein complexes and decipher the effect of mutations on protein-protein, protein-nucleic acid and protein-ion binding interfaces. Although some missense mutations over-stabilize protein complexes, we found that the overall effect of mutations is destabilizing, mostly affecting the electrostatic component of binding energy. We also showed that mutations on interfaces resulted in more drastic changes of amino acid physico-chemical properties than mutations occurring outside the interfaces. Analysis of glioblastoma mutations on interfaces allowed us to stratify cancer-related interactions, identify potential driver genes, and propose two dozen additional cancer biomarkers, including those specific to functions of the nervous system. Such an analysis also offered insight into the molecular mechanism of the phenotypic outcomes of mutations, including effects on complex stability, activity, binding and turnover rate. As a result of mutated protein and gene network analysis, we observed that interactions of proteins with mutations mapped on interfaces had higher bottleneck properties compared to interactions with mutations elsewhere on the protein or unaffected interactions. Such observations suggest that genes with mutations directly affecting protein binding properties are preferably located in central network positions and may influence critical nodes and edges in signal transduction networks.
Affiliation(s)
- Hafumi Nishi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Manoj Tyagi
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Shaolei Teng
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, United States of America
- Benjamin A. Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Emil Alexov
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, United States of America
- Stefan Wuchty
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America

123
Nevin Gerek Z, Kumar S, Banu Ozkan S. Structural dynamics flexibility informs function and evolution at a proteome scale. Evol Appl 2013; 6:423-33. [PMID: 23745135 PMCID: PMC3673471 DOI: 10.1111/eva.12052] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 10/23/2012] [Accepted: 01/13/2013] [Indexed: 01/04/2023] Open
Abstract
Protein structures are dynamic entities with a myriad of atomic fluctuations, side-chain rotations, and collective domain movements. Although the importance of these dynamics to proper functioning of proteins is emerging in the studies of many protein families, there is a lack of broad evidence for the critical role of protein dynamics in shaping the biological functions of a substantial fraction of residues for a large number of proteins in the human proteome. Here, we propose a novel dynamic flexibility index (dfi) to quantify the dynamic properties of individual residues in any protein and use it to assess the importance of protein dynamics in 100 human proteins. Our analyses involving functionally critical positions, disease-associated and putatively neutral population variations, and the rate of interspecific substitutions per residue produce concordant patterns at a proteome scale. They establish that the preservation of dynamic properties of residues in a protein structure is critical for maintaining the protein/biological function. Therefore, structural dynamics needs to become a major component of the analysis of protein function and evolution. Such analyses will be facilitated by the dfi, which will also enable the integrative use of structural dynamics with evolutionary conservation in genomic medicine as well as functional genomics investigations.
Affiliation(s)
- Zeynep Nevin Gerek
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ, USA; Department of Physics, Center for Biological Physics, Bateman Physical Sciences F-Wing, Arizona State University, Tempe, AZ, USA

124
Yates CM, Sternberg MJE. Proteins and domains vary in their tolerance of non-synonymous single nucleotide polymorphisms (nsSNPs). J Mol Biol 2013; 425:1274-86. [PMID: 23357174 DOI: 10.1016/j.jmb.2013.01.026] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 09/10/2012] [Revised: 01/11/2013] [Accepted: 01/19/2013] [Indexed: 02/05/2023]
Abstract
The widespread application of whole-genome sequencing is identifying numerous non-synonymous single nucleotide polymorphisms (nsSNPs), many of which are associated with disease. We analyzed nsSNPs from Humsavar and the 1000 Genomes Project to investigate why some proteins and domains are more tolerant of mutations than others. We identified 311 proteins and 112 Pfam families, corresponding to 2910 domains, as disease-susceptible and 32 proteins and 67 Pfam families (10,783 domains) as disease-resistant based on the relative numbers of disease-associated and neutral polymorphisms. Proteins with no significant difference from expected numbers of disease and polymorphism nsSNPs are classified as other. This classification takes into account the phenotypes of all known mutations in the protein or domain rather than simply classifying based on the presence or absence of disease nsSNPs. Of the two hypotheses suggested, our results support the model that disease-resistant domains and proteins are more able to tolerate mutations rather than having more lethal mutations that are not observed. Disease-resistant proteins and domains show significantly higher mutation rates and lower sequence conservation than disease-susceptible proteins and domains. Disease-susceptible proteins are more likely to be encoded by essential genes, are more central in protein-protein interaction networks and are less likely to contain loss-of-function mutations in healthy individuals. We use this classification for nsSNP phenotype prediction, predicting nsSNPs in disease-susceptible domains to be disease and those in disease-resistant domains to be polymorphism. In this way, we achieve higher accuracy than SIFT, a state-of-the-art algorithm.
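The three-way classification described in this abstract (disease-susceptible, disease-resistant, other) can be sketched in Python as a comparison of observed counts against an expected proportion. The abstract does not name the statistical test, so the one-sided exact binomial test, the background disease rate, the 0.05 threshold, and all function names below are assumptions for illustration, not the authors' procedure.

```python
from math import comb


def upper_tail(k, n, p):
    """Exact binomial probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify(disease, neutral, background_rate, alpha=0.05):
    """Label a protein or domain from its disease and neutral nsSNP counts.

    background_rate is the assumed overall fraction of nsSNPs that are
    disease-associated; it is a hypothetical input to this sketch.
    """
    total = disease + neutral
    if total == 0:
        return "other"
    # Significantly more disease nsSNPs than expected.
    if upper_tail(disease, total, background_rate) < alpha:
        return "disease-susceptible"
    # Significantly more neutral nsSNPs than expected.
    if upper_tail(neutral, total, 1 - background_rate) < alpha:
        return "disease-resistant"
    return "other"


print(classify(disease=9, neutral=1, background_rate=0.5))
print(classify(disease=1, neutral=9, background_rate=0.5))
print(classify(disease=5, neutral=5, background_rate=0.5))
```

A real analysis over hundreds of proteins and Pfam families would also need a multiple-testing correction, which this sketch omits.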
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Sir Ernst Chain Building, South Kensington, London SW7 2AZ, UK.

125
Mosca R, Céol A, Aloy P. Interactome3D: adding structural details to protein networks. Nat Methods 2013; 10:47-53. [DOI: 10.1038/nmeth.2289] [Citation(s) in RCA: 339] [Impact Index Per Article: 30.8] [Received: 07/18/2012] [Accepted: 10/30/2012] [Indexed: 01/13/2023]

126
Wei Q, Xu Q, Dunbrack RL. Prediction of phenotypes of missense mutations in human proteins from biological assemblies. Proteins 2012; 81:199-213. [PMID: 22965855 DOI: 10.1002/prot.24176] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 05/22/2012] [Revised: 08/16/2012] [Accepted: 08/17/2012] [Indexed: 11/11/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins.
Affiliation(s)
- Qiong Wei
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA
|