1
Hoehe MR, Herwig R. Analysis of 1276 Haplotype-Resolved Genomes Allows Characterization of Cis- and Trans-Abundant Genes. Methods Mol Biol 2023; 2590:237-272. [PMID: 36335503 DOI: 10.1007/978-1-0716-2819-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many methods for haplotyping have materialized, but their application on a significant scale has been rare to date. Here we summarize analyses that were carried out in 1092 genomes from the 1000 Genomes Consortium and validated in an unprecedented number of 184 PGP genomes that have been experimentally haplotype-resolved by application of the Long-Fragment Read (LFR) technology. These analyses provided first insights into the diplotypic nature of human genomes and its potential functional implications. Thus, protein-changing variants were not randomly distributed between the two homologues of 18,121 autosomal protein-coding genes but occurred significantly more frequently in cis than in trans configurations in virtually each of the 1276 phased genomes. This resulted in global cis/trans ratios of ~60:40, establishing "cis abundance" as a universal characteristic of diploid human genomes. This phenomenon was based on two different classes of genes, a larger one exhibiting cis configurations of protein-changing variants in excess, so-called "cis-abundant" genes, and a smaller one of "trans-abundant" genes. These two gene classes, which together constitute a common diplotypic exome, were further functionally distinguished by means of gene ontology (GO) and pathway enrichment analysis. Moreover, they were distinguishable in terms of their effects on the human interactome, where they constitute distinct cis and trans modules, as shown with network propagation on a large integrated protein-protein interaction network. These analyses, recently performed with updated database and analysis tools, further consolidated the characterization of cis- and trans-abundant genes while expanding previous results. In this chapter, we present the key results along with the materials and methods to motivate readers to investigate these findings independently and gain further insights into the diplotypic nature of genes and genomes.
Affiliation(s)
- Margret R Hoehe
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Ralf Herwig
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
2
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022]
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning (DMS) is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-022-01005-w.
Affiliation(s)
- Nadezhda Azbukina
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- Anastasia Zharikova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
- Vasily Ramensky
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
3
Fu T, Li F, Zhang Y, Yin J, Qiu W, Li X, Liu X, Xin W, Wang C, Yu L, Gao J, Zheng Q, Zeng S, Zhu F. VARIDT 2.0: structural variability of drug transporter. Nucleic Acids Res 2021; 50:D1417-D1431. [PMID: 34747471 PMCID: PMC8728241 DOI: 10.1093/nar/gkab1013] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 09/08/2021] [Revised: 10/08/2021] [Accepted: 11/04/2021] [Indexed: 12/20/2022]
Abstract
The structural variability data of drug transporters (DTs) are key for research on precision medicine and rational drug use. However, these valuable data are not sufficiently covered by the available databases. This study therefore describes a major update of VARIDT (a database previously constructed to provide DTs' variability data). First, the experimentally resolved structures of all DTs reported in the original VARIDT were identified from PubMed and the Protein Data Bank. Second, the structural variability data of each DT were collected by literature review, which included: (a) mutation-induced spatial variations in the folded state, (b) differences among DT structures of human and model organisms, (c) outward/inward-facing DT conformations and (d) xenobiotics-driven alterations in the 3D complexes. Third, for those DTs without experimentally resolved structural variabilities, homology modeling was further applied as a well-established protocol to enrich such valuable data. As a result, 145 mutation-induced spatial variations of 42 DTs, 1622 inter-species structures originating from 292 DTs, 118 outward/inward-facing conformations belonging to 59 DTs, and 822 xenobiotics-regulated structures in complex with 57 DTs were added to VARIDT (https://idrblab.org/varidt/ and http://varidt.idrblab.net/). All in all, the newly collected structural variabilities will be indispensable for explaining drug sensitivity/selectivity, bridging preclinical research with clinical trials, and revealing the mechanisms underlying drug-drug interactions.
Affiliation(s)
- Tingting Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Changchun 130023, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Jiayi Yin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Wenqi Qiu
  - Department of Surgery, HKU-SZH & Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Xuedong Li
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Xingang Liu
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Wenwen Xin
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Chengzhao Wang
  - Department of Pharmacology, Hebei Medical University, Shijiazhuang 050017, China
- Lushan Yu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Qingchuan Zheng
  - Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Changchun 130023, China
- Su Zeng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
4
Serrano C, Teixeira CSS, Cooper DN, Carneiro J, Lopes-Marques M, Stenson PD, Amorim A, Prata MJ, Sousa SF, Azevedo L. Compensatory epistasis explored by molecular dynamics simulations. Hum Genet 2021; 140:1329-1342. [PMID: 34173867 DOI: 10.1007/s00439-021-02307-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/16/2021] [Accepted: 06/20/2021] [Indexed: 11/24/2022]
Abstract
A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations' otherwise pathogenic effects. Examples of compensated variants have been described in the literature but the interacting residue(s) postulated to play a compensatory role have rarely been ascertained. In this study, the examination of five human X-chromosomally encoded proteins (FIX, GLA, HPRT1, NDP and OTC) allowed us to identify several candidate compensated variants. Strong evidence for a compensated/compensatory pair of amino acids in the coagulation FIXa protein (involving residues 270 and 271) was found in a variety of mammalian species. Both amino acid residues are located within the 60-loop, spatially close to the 39-loop that performs a key role in coagulation serine proteases. To understand the nature of the underlying interactions, molecular dynamics simulations were performed. The predicted conformational change in the 39-loop consequent to the Glu270Lys substitution (associated with hemophilia B) appears to impair the protein's interaction with its substrate but, importantly, such steric hindrance is largely mitigated in those proteins that carry the compensatory residue (Pro271) at the neighboring amino acid position.
Affiliation(s)
- Catarina Serrano
  - i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
  - IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Carla S S Teixeira
  - UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, Porto, Portugal
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- João Carneiro
  - CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
- Mónica Lopes-Marques
  - i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
  - IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- António Amorim
  - i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
  - IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Maria J Prata
  - i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
  - IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Sérgio F Sousa
  - UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, Porto, Portugal
- Luísa Azevedo
  - i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
  - IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
5
Sharma V, Hiller M. Losses of human disease-associated genes in placental mammals. NAR Genom Bioinform 2019; 2:lqz012. [PMID: 33575564 PMCID: PMC7671337 DOI: 10.1093/nargab/lqz012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/04/2019] [Revised: 08/24/2019] [Accepted: 10/08/2019] [Indexed: 02/07/2023]
Abstract
We systematically investigate whether losses of human disease-associated genes occurred in other mammals during evolution. We first show that genes lost in any of 62 non-human mammals generally have a lower degree of pleiotropy, and are highly depleted in essential and disease-associated genes. Despite this under-representation, we discovered multiple genes implicated in human disease that are truly lost in non-human mammals. In most cases, traits resembling human disease symptoms are present but not deleterious in gene-loss species, exemplified by losses of genes causing human eye or teeth disorders in poor-vision or enamel-less mammals. We also found widespread losses of PCSK9 and CETP genes, where loss-of-function mutations in humans protect from atherosclerosis. Unexpectedly, we discovered losses of disease genes (TYMP, TBX22, ABCG5, ABCG8, MEFV, CTSE) where deleterious phenotypes do not manifest in the respective species. A remarkable example is the uric acid-degrading enzyme UOX, which we found to be inactivated in elephants and manatees. While UOX loss in hominoids led to high serum uric acid levels and a predisposition for gout, elephants and manatees exhibit low uric acid levels, suggesting alternative ways of metabolizing uric acid. Together, our results highlight numerous mammals that are 'natural knockouts' of human disease genes.
Affiliation(s)
- Virag Sharma
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
6
Hoehe MR, Herwig R, Mao Q, Peters BA, Drmanac R, Church GM, Huebsch T. Significant abundance of cis configurations of coding variants in diploid human genomes. Nucleic Acids Res 2019; 47:2981-2995. [PMID: 30698752 PMCID: PMC6451136 DOI: 10.1093/nar/gkz031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/27/2018] [Revised: 12/05/2018] [Accepted: 01/15/2019] [Indexed: 12/12/2022]
Abstract
To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
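The cis/trans distinction in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the abstract: each gene is given as a list of VCF-style phased genotype strings (`0|1`, `1|0`) for its heterozygous protein-changing variants, a gene counts as cis when all variant alleles lie on the same homologue and as trans when they are spread over both, and genes with fewer than two such variants are treated as uninformative. The function names, gene names, and toy genotypes are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Phased heterozygous genotypes in VCF-style notation (assumed input format).
HET = {"0|1", "1|0"}


def configuration(phased_genotypes: List[str]) -> str:
    """Classify one gene in one genome as 'cis', 'trans', or 'uninformative'."""
    hets = [g for g in phased_genotypes if g in HET]
    if len(hets) < 2:
        # Phase is only informative with two or more heterozygous variants.
        return "uninformative"
    # All variant alleles on one homologue -> cis; on both homologues -> trans.
    return "cis" if len(set(hets)) == 1 else "trans"


def cis_fraction(genome: Dict[str, List[str]]) -> float:
    """Fraction of informative genes that are in cis configuration."""
    counts = Counter(configuration(gts) for gts in genome.values())
    informative = counts["cis"] + counts["trans"]
    if informative == 0:
        raise ValueError("no gene with two or more heterozygous variants")
    return counts["cis"] / informative


# Toy genome: hypothetical gene names and genotypes, for illustration only.
toy_genome = {
    "GENE_A": ["0|1", "0|1"],         # both variants on the second homologue: cis
    "GENE_B": ["1|0", "1|0", "1|0"],  # all variants on the first homologue: cis
    "GENE_C": ["0|1", "1|0"],         # one variant on each homologue: trans
    "GENE_D": ["0|1"],                # single variant: uninformative
    "GENE_E": ["1|0", "1|0"],         # cis
}

if __name__ == "__main__":
    print(f"cis fraction: {cis_fraction(toy_genome):.2f}")
```

Applied per genome, a fraction of this kind is what a global cis/trans ratio such as the reported ~60:40 would summarize; the published analysis may define configurations and filters differently.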
Affiliation(s)
- Margret R Hoehe
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Ralf Herwig
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Qing Mao
  - Complete Genomics, Inc., San Jose, CA 95112, USA
- Brock A Peters
  - Complete Genomics, Inc., San Jose, CA 95112, USA
  - BGI-Shenzhen, Shenzhen 518083, China
- Radoje Drmanac
  - Complete Genomics, Inc., San Jose, CA 95112, USA
  - BGI-Shenzhen, Shenzhen 518083, China
- George M Church
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Thomas Huebsch
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
7
Marín Ò, Aguirre J, de la Cruz X. Compensated pathogenic variants in coagulation factors VIII and IX present complex mapping between molecular impact and hemophilia severity. Sci Rep 2019; 9:9538. [PMID: 31267011 PMCID: PMC6606640 DOI: 10.1038/s41598-019-45916-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/07/2018] [Accepted: 06/18/2019] [Indexed: 01/07/2023]
Abstract
Compensated pathogenic deviations (CPDs) are sequence variants that are pathogenic in humans but neutral in other species. In recent years, our molecular understanding of CPDs has advanced substantially. For example, it is known that their impact on human proteins is generally milder than that of average pathogenic mutations and that their impact is suppressed in non-human carriers by compensatory mutations. However, prior studies have ignored the evolutionarily relevant relationship between molecular impact and organismal phenotype. Here, we explore this topic using CPDs from FVIII and FIX and data concerning carriers' hemophilia severity. We find that, regardless of their molecular impact, these mutations can be associated with either mild or severe disease phenotypes. Only a weak relationship is found between protein stability changes and severity. We also characterize the population variability of hemostasis proteins, which constitute the genetic background of FVIII and FIX, using data from the 1000 Genomes Project. We observe that genetic background can vary substantially between individuals in terms of both the amount and nature of genetic variants. Finally, we discuss how these results highlight the need to include new terms in present models of protein evolution to explain the origin of CPDs.
Affiliation(s)
- Òscar Marín
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035, Barcelona, Spain
- Josu Aguirre
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035, Barcelona, Spain
- Xavier de la Cruz
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035, Barcelona, Spain
  - ICREA, Barcelona, Spain
8
Storz JF. Compensatory mutations and epistasis for protein function. Curr Opin Struct Biol 2018; 50:18-25. [PMID: 29100081 PMCID: PMC5936477 DOI: 10.1016/j.sbi.2017.10.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/06/2017] [Revised: 10/05/2017] [Accepted: 10/12/2017] [Indexed: 01/09/2023]
Abstract
Adaptive protein evolution may be facilitated by neutral amino acid mutations that confer no benefit when they first arise but which potentiate subsequent function-altering mutations via direct or indirect structural mechanisms. Theoretical and empirical results indicate that such compensatory interactions (intramolecular epistasis) can exert a strong influence on trajectories of protein evolution. For this reason, assessing the form and prevalence of intramolecular epistasis and characterizing biophysical mechanisms of compensatory interaction are important research goals at the nexus of structural biology and molecular evolution. Here I review recent insights derived from protein-engineering studies, and I describe an approach for identifying and characterizing mechanisms of epistasis that integrates experimental data on structure-function relationships with analyses of comparative sequence data.
Affiliation(s)
- Jay F Storz
  - University of Nebraska, School of Biological Sciences, Lincoln, NE 68588-0114, United States
9
Tiberti M, Pandini A, Fraternali F, Fornili A. In silico identification of rescue sites by double force scanning. Bioinformatics 2018; 34:207-214. [PMID: 28961796 PMCID: PMC5860198 DOI: 10.1093/bioinformatics/btx515] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/15/2017] [Revised: 06/23/2017] [Accepted: 08/10/2017] [Indexed: 01/03/2023]
Abstract
Motivation: A deleterious amino acid change in a protein can be compensated by a second-site rescue mutation. These compensatory mechanisms can be mimicked by drugs. In particular, the location of rescue mutations can be used to identify protein regions that can be targeted by small molecules to reactivate a damaged mutant. Results: We present the first general computational method to detect rescue sites. By mimicking the effect of mutations through the application of forces, the double force scanning (DFS) method identifies the second-site residues that make the protein structure most resilient to the effect of pathogenic mutations. We tested DFS predictions against two datasets containing experimentally validated and putative evolutionary-related rescue sites. A remarkably good agreement was found between predictions and experimental data. Indeed, almost half of the rescue sites in p53 were correctly predicted by DFS, with 65% of the remaining sites in contact with DFS predictions. Similar results were found for other proteins in the evolutionary dataset. Availability and implementation: The DFS code is available under GPL at https://fornililab.github.io/dfs/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matteo Tiberti
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Alessandro Pandini
  - Department of Computer Science, College of Engineering, Design and Physical Sciences and Synthetic Biology Theme, Institute of Environment, Health and Societies, Brunel University London, Uxbridge, London, UK
- Franca Fraternali
  - Randall Division of Cell and Molecular Biophysics, King's College London, London, UK
  - The Francis Crick Institute, London, UK
  - The Thomas Young Centre for Theory and Simulation of Materials, London, UK
- Arianna Fornili
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
  - The Thomas Young Centre for Theory and Simulation of Materials, London, UK
10
Starr TN, Thornton JW. Epistasis in protein evolution. Protein Sci 2016; 25:1204-18. [PMID: 26833806 PMCID: PMC4918427 DOI: 10.1002/pro.2897] [Citation(s) in RCA: 317] [Impact Index Per Article: 39.6] [Received: 12/02/2015] [Revised: 01/25/2016] [Accepted: 01/27/2016] [Indexed: 01/18/2023]
Abstract
The structure, function, and evolution of proteins depend on physical and genetic interactions among amino acids. Recent studies have used new strategies to explore the prevalence, biochemical mechanisms, and evolutionary implications of these interactions (called epistasis) within proteins. Here we describe an emerging picture of pervasive epistasis in which the physical and biological effects of mutations change over the course of evolution in a lineage-specific fashion. Epistasis can restrict the trajectories available to an evolving protein or open new paths to sequences and functions that would otherwise have been inaccessible. We describe two broad classes of epistatic interactions, which arise from different physical mechanisms and have different effects on evolutionary processes. Specific epistasis, in which one mutation influences the phenotypic effect of few other mutations, is caused by direct and indirect physical interactions between mutations, which nonadditively change the protein's physical properties, such as conformation, stability, or affinity for ligands. In contrast, nonspecific epistasis describes mutations that modify the effect of many others; these typically behave additively with respect to the physical properties of a protein but exhibit epistasis because of a nonlinear relationship between the physical properties and their biological effects, such as function or fitness. Both types of interaction are rampant, but specific epistasis has stronger effects on the rate and outcomes of evolution, because it imposes stricter constraints and modulates evolutionary potential more dramatically; it therefore makes evolution more contingent on low-probability historical events and leaves stronger marks on the sequences, structures, and functions of protein families.
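The notion of nonspecific epistasis in this abstract can be made concrete with a small numerical sketch. The two-state folding model, the free-energy values, and the function names below are assumptions chosen for illustration and do not come from the review: two mutations add up exactly on the stability scale, yet show epistasis on the fraction of folded protein because that quantity depends nonlinearly on stability.

```python
import math

RT = 0.593  # kcal/mol, roughly R*T at 298 K


def fraction_folded(dg_fold: float) -> float:
    """Two-state model: folded fraction for a folding free energy in kcal/mol.

    More negative dg_fold means a more stable protein.
    """
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Deviation of the double mutant from the additive expectation."""
    return f_ab - (f_a + f_b - f_wt)


# Hypothetical values: a wild type with -3 kcal/mol folding stability and
# two mutations that each destabilize it by 2 kcal/mol.
DG_WT = -3.0
DDG_A = 2.0
DDG_B = 2.0

f_wt = fraction_folded(DG_WT)
f_a = fraction_folded(DG_WT + DDG_A)
f_b = fraction_folded(DG_WT + DDG_B)
# The mutations are strictly additive on the free-energy scale.
f_ab = fraction_folded(DG_WT + DDG_A + DDG_B)

eps = epistasis(f_wt, f_a, f_b, f_ab)

if __name__ == "__main__":
    print(f"folded fraction: wt={f_wt:.3f} A={f_a:.3f} B={f_b:.3f} AB={f_ab:.3f}")
    print(f"epistasis on folded fraction: {eps:.3f}")
```

In this toy setting each single mutant remains mostly folded while the double mutant is mostly unfolded, so the epistasis term is negative even though no pairwise physical interaction was modeled.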
Affiliation(s)
- Tyler N Starr
  - Graduate Program in Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, 60637
- Joseph W Thornton
  - Departments of Ecology and Evolution and Human Genetics, University of Chicago, Chicago, Illinois, 60637
11
Abstract
To what extent is the convergent evolution of protein function attributable to convergent or parallel changes at the amino acid level? The mutations that contribute to adaptive protein evolution may represent a biased subset of all possible beneficial mutations owing to mutation bias and/or variation in the magnitude of deleterious pleiotropy. A key finding is that the fitness effects of amino acid mutations are often conditional on genetic background. This context dependence (epistasis) can reduce the probability of convergence and parallelism because it reduces the number of possible mutations that are unconditionally acceptable in divergent genetic backgrounds. Here, I review factors that influence the probability of replicated evolution at the molecular level.
Affiliation(s)
- Jay F Storz
  - School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588, USA
12
Mueller SC, Sommer B, Backes C, Haas J, Meder B, Meese E, Keller A. From Single Variants to Protein Cascades: Multiscale Modeling of Single Nucleotide Variant Sets in Genetic Disorders. J Biol Chem 2016; 291:1582-1590. [PMID: 26601959 DOI: 10.1074/jbc.m115.695247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2015] [Indexed: 01/18/2023]
Abstract
Understanding the role of genetics in disease has become a central part of medical research. Non-synonymous single nucleotide variants (nsSNVs) in coding regions of human genes frequently lead to pathological phenotypes. Beyond single variations, the individual combination of nsSNVs may add to pathogenic processes. We developed a multiscale pipeline to systematically analyze the existence of quantitative effects of multiple nsSNVs and gene combinations in single individuals on pathogenicity. Based on this pipeline, we detected in a data set of 842 nsSNVs discovered in 76 genes related to cardiomyopathies, associated nsSNV combinations in seven genes present in at least 70% of all 639 patient samples, but not in a control cohort of healthy humans. Structural analyses of these revealed primarily an influence on the protein stability. For amino acid substitutions located at the protein surface, we generally observed a proximity to putative binding pockets. To computationally analyze cumulative effects and their impact, pathogenicity methods are currently being developed. Our approach supports this process, as shown on the example of a cardiac phenotype but can be likewise applied to other diseases such as cancer.
Affiliation(s)
- Sabine C Mueller
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Department of Human Genetics, Saarland University, 66421 Homburg, Germany
- Björn Sommer
  - Bio-/Medical Informatics Department, Faculty of Technology, Bielefeld University, 33501 Bielefeld, Germany
  - Clayton School of Information Technology, Faculty of Information Technology, Monash University, Melbourne 3800, Australia
- Christina Backes
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Jan Haas
  - Department of Internal Medicine III, Heidelberg University, 69120 Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), 69120 Heidelberg, Germany
- Benjamin Meder
  - Department of Internal Medicine III, Heidelberg University, 69120 Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), 69120 Heidelberg, Germany
- Eckart Meese
  - Department of Human Genetics, Saarland University, 66421 Homburg, Germany
- Andreas Keller
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
Collapse
|
13
|
Abstract
Deleterious or 'disease-associated' mutations are mutations that lead to disease with high phenotype penetrance: they are inherited in a simple Mendelian manner, or, in the case of cancer, accumulate in somatic cells leading directly to disease. However, in some cases, the amino acid that is substituted resulting in disease is the wild-type native residue in the functionally equivalent protein in another species. Such examples are known as 'compensated pathogenic deviations' (CPDs) because, somewhere in the second species, there must be compensatory mutations that allow the protein to function normally despite having a residue which would cause disease in the first species. Depending on the nature of the mutations, compensation can occur in the same protein, or in a different protein with which it interacts. In principle, compensation can be achieved by a single mutation (most probably structurally close to the CPD), or by the cumulative effect of several mutations. Although it is clear that these effects occur in proteins, compensatory mutations are also important in RNA potentially having an impact on disease. As a much simpler molecule, RNA provides an interesting model for understanding mechanisms of compensatory effects, both by looking at naturally occurring RNA molecules and as a means of computational simulation. This review surveys the rather limited literature that has explored these effects. Understanding the nature of CPDs is important in understanding traversal along fitness landscape valleys in evolution. It could also have applications in treating diseases that result from such mutations.
14
Identification of cis-suppression of human disease mutations by comparative genomics. Nature 2015; 524:225-9. [PMID: 26123021 DOI: 10.1038/nature14497] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Received: 10/26/2014] [Accepted: 04/23/2015] [Indexed: 11/08/2022]
Abstract
Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.
15
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Indexed: 01/05/2023]
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
16
Riera C, Lois S, Domínguez C, Fernandez-Cadenas I, Montaner J, Rodríguez-Sureda V, de la Cruz X. Molecular damage in Fabry disease: characterization and prediction of alpha-galactosidase A pathological mutations. Proteins 2014; 83:91-104. [PMID: 25382311 DOI: 10.1002/prot.24708] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/06/2014] [Revised: 09/25/2014] [Accepted: 10/18/2014] [Indexed: 12/12/2022]
Abstract
Loss-of-function mutations of the enzyme alpha-galactosidase A (GLA) cause Fabry disease (FD), a rare and potentially fatal disease. Identification of these pathological mutations by sequencing is important because it allows early treatment of the disease. However, before taking any treatment decision, if the mutation identified is unknown, we first need to establish whether it is pathological or not. General bioinformatic tools (PolyPhen-2, SIFT, Condel, etc.) can be used for this purpose, but their performance is still limited. Here we present a new tool, specifically derived for the assessment of GLA mutations. We first compared mutations of this enzyme known to cause FD with neutral sequence variants, using several structure and sequence properties. Then, we used these properties to develop a family of prediction methods adapted to different quality requirements. Trained and tested on a set of known Fabry mutations, our methods have a performance (Matthews correlation: 0.56-0.72) comparable to or better than that of the more complex method, PolyPhen-2 (Matthews correlation: 0.61), and better than those of SIFT (Matthews correlation: 0.54) and Condel (Matthews correlation: 0.51). This result is validated in an independent set of 65 pathological mutations, for which our method displayed the best success rate (91.0%, 87.7%, and 73.8%, for our method, PolyPhen-2 and SIFT, respectively). These data confirmed that our specific approach can effectively contribute to the identification of pathological mutations in GLA, and therefore enhance the use of sequence information in the identification of undiagnosed Fabry patients.
Affiliation(s)
- Casandra Riera
- Research Unit in Translational Bioinformatics, Institut de Recerca Hospital Vall d'Hebron (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
17
Ivankov DN, Finkelstein AV, Kondrashov FA. A structural perspective of compensatory evolution. Curr Opin Struct Biol 2014; 26:104-12. [PMID: 24981969 PMCID: PMC4141909 DOI: 10.1016/j.sbi.2014.05.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 02/03/2014] [Revised: 04/11/2014] [Accepted: 05/16/2014] [Indexed: 11/25/2022]
Abstract
The study of molecular evolution is important because it reveals how protein functions emerge and evolve. Recently, several types of studies indicated that substitutions in molecular evolution occur in a compensatory manner, whereby the occurrence of a substitution depends on the amino acid residues at other sites. However, a molecular or structural basis behind the compensation often remains obscure. Here, we review studies on the interface of structural biology and molecular evolution that revealed novel aspects of compensatory evolution. In many cases structural studies benefit from evolutionary data while structural data often add a functional dimension to the study of molecular evolution.
Affiliation(s)
- Dmitry N Ivankov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, 4 Institutskaya str., Pushchino, Moscow Region, 142290, Russia
- Alexei V Finkelstein
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, 4 Institutskaya str., Pushchino, Moscow Region, 142290, Russia
- Fyodor A Kondrashov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Pg. Lluís Companys, 08010 Barcelona, Spain
18
Xu J, Zhang J. Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis. Mol Biol Evol 2014; 31:1787-92. [PMID: 24723421 DOI: 10.1093/molbev/msu130] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Many human-disease associated amino acid residues (DARs) appear as the wild-type in other species. This phenomenon is commonly explained by the presence of compensatory residues in these other species that alleviate the deleterious effects of the DARs. The general validity of this hypothesis, however, is unclear, because few compensatory residues have been identified. Here we test the compensation hypothesis by assembling and analyzing 1,077 DARs located in 177 proteins of known crystal structures. Because destabilizing protein structures is a primary reason why DARs are deleterious, we focus on protein stability in this analysis. We discover that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This and other findings provide genome-scale evidence for the compensation hypothesis and have important implications for understanding epistasis in protein evolution and for using animal models of human diseases.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan
19
Riera C, Lois S, de la Cruz X. Prediction of pathological mutations in proteins: the challenge of integrating sequence conservation and structure stability principles. Wiley Interdisciplinary Reviews: Computational Molecular Science 2013. [DOI: 10.1002/wcms.1170] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Affiliation(s)
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience, VHIR, Barcelona, Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience, VHIR, Barcelona, Spain
- Xavier de la Cruz
- Laboratory of Translational Bioinformatics in Neuroscience, VHIR, Barcelona, Spain; Institució Catalana per la Recerca i Estudis Avançats (ICREA), Barcelona, Spain
20
Soylemez O, Kondrashov FA. Estimating the rate of irreversibility in protein evolution. Genome Biol Evol 2013; 4:1213-22. [PMID: 23132897 PMCID: PMC3542581 DOI: 10.1093/gbe/evs096] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Whether or not evolutionary change is inherently irreversible remains a controversial topic. Some examples of evolutionary irreversibility are known; however, this question has not been comprehensively addressed at the molecular level. Here, we use data from 221 human genes with known pathogenic mutations to estimate the rate of irreversibility in protein evolution. For these genes, we reconstruct ancestral amino acid sequences along the mammalian phylogeny and identify ancestral amino acid states that match known pathogenic mutations. Such cases represent inherent evolutionary irreversibility because, at the present moment, reversals to these ancestral amino acid states are impossible for the human lineage. We estimate that approximately 10% of all amino acid substitutions along the mammalian phylogeny are irreversible, such that a return to the ancestral amino acid state would lead to a pathogenic phenotype. For a subset of 51 genes with high rates of irreversibility, as much as 40% of all amino acid evolution was estimated to be irreversible. Because pathogenic phenotypes do not resemble ancestral phenotypes, the molecular nature of the high rate of irreversibility in proteins is best explained by evolution with a high prevalence of compensatory, epistatic interactions between amino acid sites. Under such mode of protein evolution, once an amino acid substitution is fixed, the probability of its reversal declines as the protein sequence accumulates changes that affect the phenotypic manifestation of the ancestral state. The prevalence of epistasis in evolution indicates that the observed high rate of irreversibility in protein evolution is an inherent property of protein structure and function.
Affiliation(s)
- Onuralp Soylemez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
21
Wang C, Huang R, He B, Du Q. Improving the thermostability of alpha-amylase by combinatorial coevolving-site saturation mutagenesis. BMC Bioinformatics 2012; 13:263. [PMID: 23057711 PMCID: PMC3478181 DOI: 10.1186/1471-2105-13-263] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 03/21/2012] [Accepted: 09/11/2012] [Indexed: 11/12/2022]
Abstract
Background: The generation of focused mutant libraries at hotspot residues is an important strategy in directed protein evolution. Existing methods, such as combinatorial active site testing and residual coupling analysis, depend primarily on evolutionarily conserved information to find the hotspot residues. Hardly any attention has been paid to another important functional and structural determinant, the functionally correlated variation information: coevolution. Results: In this paper, we suggest a new method, named combinatorial coevolving-site saturation mutagenesis (CCSM), in which the functionally correlated variation sites of proteins are chosen as the hotspot sites to construct focused mutant libraries. The CCSM approach was used to improve the thermal stability of α-amylase from Bacillus subtilis CN7 (Amy7C). The results indicate that CCSM can identify novel beneficial mutation sites and enhance the thermal stability of wild-type Amy7C by 8°C ($T_{50}^{30}$), which could not be achieved with the ordinarily rational introduction of single or double point mutations. Conclusions: Our method is able to produce more thermostable mutant α-amylases with novel beneficial mutations at new sites. It is also verified that the coevolving sites can be used as the hotspots to construct focused mutant libraries in protein engineering. This study sheds new light on active research into molecular coevolution.
Affiliation(s)
- Chenghua Wang
- Nanjing University of Technology, Nanjing, Jiangsu, China
22
Current challenges in genome annotation through structural biology and bioinformatics. Curr Opin Struct Biol 2012; 22:594-601. [PMID: 22884875 DOI: 10.1016/j.sbi.2012.07.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/11/2012] [Revised: 06/29/2012] [Accepted: 07/09/2012] [Indexed: 01/25/2023]
Abstract
With the huge volume of genomic sequences being generated from high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component of this process is to use experimentally determined structures, which provide a means to detect homology that is not discernible from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry.
23
Zhang G, Pei Z, Ball EV, Mort M, Kehrer-Sawatzki H, Cooper DN. Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations. Hum Genomics 2012; 5:453-84. [PMID: 21807602 PMCID: PMC3525967 DOI: 10.1186/1479-7364-5-5-453] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
The recent publication of the draft genome sequences of the Neanderthal and a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species. Here, in an attempt to identify further 'potentially compensated mutations' (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin.
Affiliation(s)
- Guojie Zhang
- Bioinformatics Department, Beijing Genomics Institute at Shenzhen, China
24
Kumar S, Dudley JT, Filipski A, Liu L. Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet 2011; 27:377-86. [PMID: 21764165 PMCID: PMC3272884 DOI: 10.1016/j.tig.2011.06.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 04/18/2011] [Revised: 06/10/2011] [Accepted: 06/13/2011] [Indexed: 12/30/2022]
Abstract
Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available. Fortunately, the process of natural selection has lent us an almost infinite set of tests in nature. During long-term evolution, new mutations and existing variations have been evaluated for their biological consequences in countless species, and outcomes are readily revealed by multispecies genome comparisons. We review studies that have investigated evolutionary characteristics and in silico functional diagnoses of nSNVs found in thousands of disease-associated genes. We conclude that the patterns of long-term evolutionary conservation and permissible sequence divergence are essential and instructive modalities for functional assessment of human genetic variations.
Affiliation(s)
- Sudhir Kumar
- School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA
25
Gong S, Worth CL, Cheng TMK, Blundell TL. Meet Me Halfway: When Genomics Meets Structural Bioinformatics. J Cardiovasc Transl Res 2011; 4:281-303. [DOI: 10.1007/s12265-011-9259-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/30/2010] [Accepted: 02/08/2011] [Indexed: 01/08/2023]
26
Zhang G, Pei Z, Krawczak M, Ball EV, Mort M, Kehrer-Sawatzki H, Cooper DN. Triangulation of the human, chimpanzee, and Neanderthal genome sequences identifies potentially compensated mutations. Hum Mutat 2010; 31:1286-93. [DOI: 10.1002/humu.21389] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022]
27
Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 2010; 31:631-55. [PMID: 20506564 DOI: 10.1002/humu.21260] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Indexed: 12/12/2022]
Abstract
The number of reported germline mutations in human nuclear genes, either underlying or associated with inherited disease, has now exceeded 100,000 in more than 3,700 different genes. The availability of these data has both revolutionized the study of the morbid anatomy of the human genome and facilitated "personalized genomics." With approximately 300 new "inherited disease genes" (and approximately 10,000 new mutations) being identified annually, it is pertinent to ask how many "inherited disease genes" there are in the human genome, how many mutations reside within them, and where such lesions are likely to be located? To address these questions, it is necessary not only to reconsider how we define human genes but also to explore notions of gene "essentiality" and "dispensability." Answers to these questions are now emerging from recent novel insights into genome structure and function and through complete genome sequence information derived from multiple individual human genomes. However, a change in focus toward screening functional genomic elements as opposed to genes sensu stricto will be required if we are to capitalize fully on recent technical and conceptual advances and identify new types of disease-associated mutation within noncoding regions remote from the genes whose function they disrupt.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom
28
Liang Z, Xu M, Teng M, Niu L, Wu J. Coevolution is a short-distance force at the protein interaction level and correlates with the modular organization of protein networks. FEBS Lett 2010; 584:4237-40. [DOI: 10.1016/j.febslet.2010.09.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/19/2010] [Revised: 09/04/2010] [Accepted: 09/08/2010] [Indexed: 11/17/2022]
29
Talavera D, Taylor MS, Thornton JM. The (non)malignancy of cancerous amino acidic substitutions. Proteins 2010; 78:518-29. [PMID: 19787769 DOI: 10.1002/prot.22574] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/16/2022]
Abstract
The process of natural selection acts both on individual organisms within a population and on individual cells within an organism as they develop into cancer. In this work, we have taken a first step toward understanding the differences in selection pressures exerted on the human genome under these disparate circumstances. Focusing on single amino acid substitutions, we have found that cancer-related mutations (CRMs) are frequent in evolutionarily conserved sites, whereas single amino acid polymorphisms (SAPs) tend to appear in sites having a more relaxed evolutionary pressure. Those CRMs classed as cancer driver mutations show greater enrichment for conserved sites than passenger mutations. Consistent with this, driver mutations are enriched for sites annotated as key functional residues and their neighbors, and are more likely to be located on the surface of proteins than expected by chance. Overall the pattern of CRM and polymorphism is remarkably similar, but we do see a clear signal indicative of diversifying selection for disruptive amino acid substitutions in the cancer driver mutations. The ultimate consequence of the appearance of those mutations must be advantageous for the tumor cell, leading to cell population-growth and migration events similar to those seen in natural ecosystems.
Affiliation(s)
- David Talavera
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
30
Gong S, Blundell TL. Structural and functional restraints on the occurrence of single amino acid variations in human proteins. PLoS One 2010; 5:e9186. [PMID: 20169194 PMCID: PMC2820541 DOI: 10.1371/journal.pone.0009186] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 11/12/2009] [Accepted: 01/24/2010] [Indexed: 11/19/2022]
Abstract
Human genetic variation is the incarnation of diverse evolutionary history, which reflects both selectively advantageous and selectively neutral change. In this study, we catalogue structural and functional features of proteins that restrain genetic variation leading to single amino acid substitutions. Our variation dataset is divided into three categories: i) Mendelian disease-related variants, ii) neutral polymorphisms and iii) cancer somatic mutations. We characterize structural environments of the amino acid variants by the following properties: i) side-chain solvent accessibility, ii) main-chain secondary structure, and iii) hydrogen bonds from a side chain to a main chain or other side chains. To address functional restraints, amino acid substitutions in proteins are examined to see whether they are located at functionally important sites involved in protein-protein interactions, protein-ligand interactions or catalytic activity of enzymes. We also measure the likelihood of amino acid substitutions and the degree of residue conservation where variants occur. We show that various types of variants are under different degrees of structural and functional restraints, which affect their occurrence in human proteome.
Affiliation(s)
- Sungsam Gong
- Biocomputing Group, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Tom L. Blundell
- Biocomputing Group, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
31
Chakrabarti S, Panchenko AR. Structural and functional roles of coevolved sites in proteins. PLoS One 2010; 5:e8591. [PMID: 20066038 PMCID: PMC2797611 DOI: 10.1371/journal.pone.0008591] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 08/12/2009] [Accepted: 10/19/2009] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Understanding the residue covariations between multiple positions in protein families is crucial and can be helpful for designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural-functional integrity while enabling it to acquire specific functional modifications. Despite significant efforts in the field, there is still controversy about the preferred locations of coevolved residues on different regions of protein molecules, the strength of the coevolutionary signal and the role of coevolution in functional diversification. METHODOLOGY: In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We employed a large-scale study to investigate the structural and functional aspects of coevolved residues. We found that the networks representing the coevolutionary residue connections within our dataset are in general of 'small-world' type, as they have clustering coefficient values higher than random networks and mean shortest path lengths similar to and/or lower than those of random and regular networks. We also found that altogether 11% of functionally important sites are coevolved with any other sites. Active sites are found more frequently to coevolve with any other sites (15%) compared to protein (11%) and ligand (9%) binding sites. Metal binding and active sites are also found to be more frequently coevolved with other metal binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other. Moreover, approximately 80% of charge compensatory substitutions within coevolved sites are found at very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges in evolution.
CONCLUSION: Our findings show that a noticeable fraction of functionally important sites undergo coevolution and also point towards compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
32
|
Baresić A, Hopcroft LEM, Rogers HH, Hurst JM, Martin ACR. Compensated pathogenic deviations: analysis of structural effects. J Mol Biol 2009; 396:19-30. [PMID: 19900462 DOI: 10.1016/j.jmb.2009.11.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 06/03/2009] [Revised: 10/29/2009] [Accepted: 11/03/2009] [Indexed: 10/20/2022]
Abstract
Pathogenic deviations (PDs) in humans are disease-causing missense mutations. However, in some cases, these disease-associated residues occur as the wild-type residues in functionally equivalent proteins in other species and these cases are termed 'compensated pathogenic deviations' (CPDs). The lack of pathogenicity in a non-human protein is presumed to be explained in most cases by the presence of compensatory mutations, most commonly within the same protein. Identification of structural features of CPDs and detection of specific compensatory events will help us to understand traversal along fitness landscape valleys in protein evolution. We divided mutations listed in the OMIM (Online Mendelian Inheritance in Man) database into PD and CPD data sets and performed two independent analyses: (i) We searched for potential compensatory mutations spatially close to the CPDs and, (ii) using our SAAPdb database, we examined likely structural effects to try to explain why mutations are pathogenic, comparing PDs and CPDs. Our data sets were obtained from a set of 245 human proteins of known structure and contained a total of 2328 mutations of which 453 (from 85 structures) were seen to be compensated in at least one functionally equivalent protein in another (non-human) species. Structural analysis results confirm previous findings that CPDs are, on average, 'milder' in their likely structural effects than uncompensated PDs and tend to be on the protein surface. We also showed that the residues surrounding the CPD residue in the folded protein are more often mutated than the residues surrounding an uncompensated mutation, supporting the hypothesis that compensation is largely a result of structurally local mutations.
Affiliation(s)
- Anja Baresić
- Institute of Structural and Molecular Biology, Darwin Building, University College London, Gower Street, London WC1E 6BT, UK
|
33
|
Azevedo L, Carneiro J, van Asch B, Moleirinho A, Pereira F, Amorim A. Epistatic interactions modulate the evolution of mammalian mitochondrial respiratory complex components. BMC Genomics 2009; 10:266. [PMID: 19523237 PMCID: PMC2711975 DOI: 10.1186/1471-2164-10-266] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 12/30/2008] [Accepted: 06/13/2009] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND The deleterious effect of a mutation can be reverted by a second-site interacting residue. This is an epistatic compensatory process explaining why mutations that are deleterious in some species are tolerated in phylogenetically related lineages, making evident that those mutations are only deleterious in the species-specific context. Although an extensive and refined theoretical framework on compensatory evolution does exist, the supporting evidence remains limited, especially for protein models. In the current study, we focused on the molecular mechanism underlying the epistatic compensatory process in mammalian mitochondrial OXPHOS proteins using a combination of in-depth structural and sequence analyses. RESULTS Modeled human structures were used in this study to predict the structural impairment and recovery of deleterious mutations alone and combined with an interacting compensatory partner, respectively. In two cases, COI and COIII, intramolecular interactions between spatially linked residues restore the folding pattern impaired by the deleterious mutation. In a third case, intermolecular contact between the mitochondrial CYB and nuclear CYT1 encoded components of the cytochrome bc1 complex is likely to restore protein binding. Moreover, we observed different modes of compensatory evolution that have resulted in either a quasi-simultaneous occurrence of a mutation and corresponding compensatory partner, or in independent occurrences of mutations in distinct lineages that were always preceded by the compensatory site. CONCLUSION Epistatic interactions between individual replacements involving deleterious mutations seem to follow a parsimonious model of evolution in which genomes hold pre-compensating states that subsequently tolerate deleterious mutations. This phenomenon is likely to have been constraining the variability at coevolving sites and shaping the interaction between the mitochondrial and the nuclear genome.
Affiliation(s)
- Luísa Azevedo
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- João Carneiro
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- Faculty of Sciences of the University of Porto, Porto, Portugal
- Barbara van Asch
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- Faculty of Sciences of the University of Porto, Porto, Portugal
- Ana Moleirinho
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- Filipe Pereira
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- Faculty of Sciences of the University of Porto, Porto, Portugal
- António Amorim
- IPATIMUP-Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- Faculty of Sciences of the University of Porto, Porto, Portugal
|
34
|
Jakubowska A, Korona R. Lack of evolutionary conservation at positions important for thermal stability in the yeast ODCase protein. Mol Biol Evol 2009; 26:1431-4. [PMID: 19349645 DOI: 10.1093/molbev/msp066] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023] Open
Abstract
Mutations destabilizing the spatial structure of proteins can persist in populations if they are fixed by drift or compensated by other mutations. The prevalence and dynamics of these processes remain largely unrecognized. A suitable target to screen for both deleterious and compensatory mutations is the URA3 gene in yeast. We identified 13 positions in which a single missense substitution causes strong thermal sensitivity. We then applied mild mutagenesis resulting in roughly one base substitution per gene and found that only reversions to an original amino acid can compensate for the thermal instability. However, the 13 positions are not visibly conserved across 53 species of Ascomycota, even though the gene product is an enzyme of stable function and high efficiency. This shows how strongly the fitness penalties for amino acid substitutions depend on the background, underscoring the role of complex intragenic interactions in the evolution of proteins.
|
35
|
Pazos F, Valencia A. Protein co-evolution, co-adaptation and interactions. EMBO J 2008; 27:2648-55. [PMID: 18818697 PMCID: PMC2556093 DOI: 10.1038/emboj.2008.189] [Citation(s) in RCA: 124] [Impact Index Per Article: 7.8] [Received: 04/03/2008] [Accepted: 08/28/2008] [Indexed: 01/28/2023] Open
Abstract
Co-evolution has an important function in the evolution of species and it is clearly manifested in certain scenarios such as host–parasite and predator–prey interactions, symbiosis and mutualism. The extrapolation of the concepts and methodologies developed for the study of species co-evolution at the molecular level has prompted the development of a variety of computational methods able to predict protein interactions through the characteristics of co-evolution. Particularly successful have been those methods that predict interactions at the genomic level based on the detection of pairs of protein families with similar evolutionary histories (similarity of phylogenetic trees: mirrortree). Future advances in this field will require a better understanding of the molecular basis of the co-evolution of protein families. Thus, it will be important to decipher the molecular mechanisms underlying the similarity observed in phylogenetic trees of interacting proteins, distinguishing direct specific molecular interactions from other general functional constraints. In particular, it will be important to separate the effects of physical interactions within protein complexes ('co-adaptation') from other forces that, in a less specific way, can also create general patterns of co-evolution.
Affiliation(s)
- Florencio Pazos
- Structure of Macromolecules, Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
|
36
|
Juan D, Pazos F, Valencia A. Co-evolution and co-adaptation in protein networks. FEBS Lett 2008; 582:1225-30. [DOI: 10.1016/j.febslet.2008.02.017] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Received: 01/09/2008] [Accepted: 02/08/2008] [Indexed: 10/22/2022]
|
37
|
Musumeci MA, Arakaki AK, Rial DV, Catalano-Dupuy DL, Ceccarelli EA. Modulation of the enzymatic efficiency of ferredoxin-NADP(H) reductase by the amino acid volume around the catalytic site. FEBS J 2008; 275:1350-66. [PMID: 18279389 DOI: 10.1111/j.1742-4658.2008.06298.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Ferredoxin (flavodoxin)-NADP(H) reductases (FNRs) are ubiquitous flavoenzymes that deliver NADPH or low-potential one-electron donors (ferredoxin, flavodoxin, adrenodoxin) to redox-based metabolic reactions in plastids, mitochondria and bacteria. Plastidic FNRs are quite efficient reductases. In contrast, FNRs from organisms possessing a heterotrophic metabolism or anoxygenic photosynthesis display turnover numbers 20- to 100-fold lower than those of their plastidic and cyanobacterial counterparts. Several structural features of these enzymes have yet to be explained. The residue Y308 in pea FNR is stacked nearly parallel to the re-face of the flavin and is highly conserved amongst members of the family. By computing the relative free energy for the lumiflavin-phenol pair at different angles with the relative position found for Y308 in pea FNR, it can be concluded that this amino acid is constrained against the isoalloxazine. This effect is probably caused by amino acids C266 and L268, which face the other side of this tyrosine. Single and double FNR mutants of these amino acids were obtained and characterized. It was observed that a decrease or increase in the amino acid volume resulted in a decrease in the catalytic efficiency of the enzyme without altering the protein structure. Our results provide experimental evidence that the volume of these amino acids participates in the fine-tuning of the catalytic efficiency of the enzyme.
Affiliation(s)
- Matías A Musumeci
- Molecular Biology Division, Instituto de Biología Molecular y Celular de Rosario (IBR), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Argentina
|
38
|
Paulander W, Maisnier-Patin S, Andersson DI. Multiple mechanisms to ameliorate the fitness burden of mupirocin resistance in Salmonella typhimurium. Mol Microbiol 2007; 64:1038-48. [PMID: 17501926 DOI: 10.1111/j.1365-2958.2007.05713.x] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 11/26/2022]
Abstract
We examined how the fitness costs of mupirocin resistance caused by mutations in the chromosomal isoleucyl-tRNA synthetase gene (ileS) can be ameliorated. Mupirocin-resistant mutants were isolated and four different, resistance-conferring point mutations in the chromosomal ileS gene were identified. Fifty independent lineages of the low-fitness, resistant mutants were serially passaged to evolve compensated mutants with increased fitness. In 34/50 of the evolved lineages, the increase in fitness resulted from additional point mutations in isoleucine tRNA synthetase (IleRS). Measurements in vitro of the kinetics of aminoacylation of wild-type and mutant enzymes showed that resistant IleRS had a reduced rate of aminoacylation due to altered interactions with both tRNAIle and ATP. The intragenic compensatory mutations improved IleRS kinetics towards the wild-type enzyme, thereby restoring bacterial fitness. Seven of the 16 lineages that lacked second-site compensatory mutations in ileS showed an increase in ileS gene dosage, suggesting that an increased level of defective IleRS compensates for the decrease in aminoacylation activity. Our findings show that the fitness costs of ileS mutations conferring mupirocin resistance can be reduced by several types of mechanisms that may contribute to the stability of mupirocin resistance in clinical settings.
Affiliation(s)
- Wilhelm Paulander
- Department of Bacteriology, Swedish Institute for Infectious Disease Control and Microbiology, Tumor and Cell Biology Center, Karolinska Institute, S-171 82 Solna, Sweden
|
39
|
Evolutionary anatomies of positions and types of disease-associated and neutral amino acid mutations in the human genome. BMC Genomics 2006; 7:306. [PMID: 17144929 PMCID: PMC1702542 DOI: 10.1186/1471-2164-7-306] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Received: 06/21/2006] [Accepted: 12/05/2006] [Indexed: 02/02/2023] Open
Abstract
Background Amino acid mutations in a large number of human proteins are known to be associated with heritable genetic disease. These disease-associated mutations (DAMs) are known to occur predominantly in positions essential to the structure and function of the proteins. Here, we examine how the relative perpetuation and conservation of amino acid positions modulate the genome-wide patterns of 8,627 human disease-associated mutations (DAMs) reported in 541 genes. We compare these patterns with 5,308 non-synonymous Single Nucleotide Polymorphisms (nSNPs) in 2,592 genes from primary SNP resources. Results The abundance of DAMs shows a negative relationship with the evolutionary rate of the amino acid positions harboring them. An opposite trend describes the distribution of nSNPs. DAMs are also preferentially found in the amino acid positions that are retained (or present) in multiple vertebrate species, whereas the nSNPs are over-abundant in the positions that have been lost (or absent) in the non-human vertebrates. These observations are consistent with the effect of purifying selection on natural variation, which also explains the existence of lower minor nSNP allele frequencies at highly-conserved amino acid positions. The biochemical severity of the inter-specific amino acid changes is also modulated by natural selection, with the fast-evolving positions containing more radical amino acid differences among species. Similarly, DAMs associated with early-onset diseases are more radical than those associated with the late-onset diseases. A small fraction of DAMs (10%) overlap with the amino acid differences between species within the same position, but are biochemically the most conservative group of amino acid differences in our datasets. 
Overlapping DAMs are found disproportionately in fast-evolving amino acid positions, which, along with the conservative nature of the amino acid changes, may have allowed some of them to escape natural selection until compensatory changes occur. Conclusion The consistency and predictability of genome-wide patterns of disease-associated and neutral amino acid variants reported here underscore the importance of considering the evolutionary rates of amino acid positions in clinical and population genetic analyses aimed at understanding the nature and fate of disease-associated and neutral population variation. Establishing such general patterns is an early step in efforts to diagnose the pathogenic potentials of novel amino acid mutations.
|