1
Mandal S, Faizan S, Raghavendra NM, Kumar BRP. Molecular dynamics articulated multilevel virtual screening protocol to discover novel dual PPAR α/γ agonists for anti-diabetic and metabolic applications. Mol Divers 2023; 27:2605-2631. [PMID: 36437421 DOI: 10.1007/s11030-022-10571-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 11/11/2022] [Indexed: 11/29/2022]
Abstract
PPARα and PPARγ are isoforms of the nuclear receptor superfamily that regulate glucose and lipid metabolism. Activation of PPARα and PPARγ by exogenous ligands can transactivate PPARα- and PPARγ-dependent genes, thereby triggering metabolic pathways that help in the treatment of type 2 diabetes mellitus and related metabolic complications. Herein, by understanding the structural requirements for ligands to activate PPARα and PPARγ proteins, we developed a multilevel in silico virtual screening protocol to identify novel chemical scaffolds, and then designed and synthesized two distinct series of glitazone derivatives with advantages over the classical PPARα and PPARγ agonists. The synthesized compounds were biologically evaluated for PPARα and PPARγ transactivation potency using nuclear extracts of 3T3-L1 cells. Furthermore, a glucose uptake assay on L6 cells confirmed the potency of the synthesized compounds toward glucose regulation. Percentage lipid-lowering potency was also assessed through triglyceride estimation from 3T3-L1 cell extracts. The results suggested that the ligands bind in an orthosteric fashion, similar to classical agonists. Molecular docking and molecular dynamics (MD) simulation experiments were therefore executed to validate our hypothesis on the mode of ligand binding and the stability of the protein complexes. Altogether, the present study developed a new protocol for virtual screening that enables the design of novel glitazones for activation of PPARα- and PPARγ-mediated pathways. Accordingly, the present approach will offer benefit as a therapeutic strategy against type 2 diabetes mellitus and associated metabolic complications.
Affiliation(s)
- Subhankar Mandal
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, S. S. Nagar, Mysuru, Karnataka, 570015, India
- JSS Academy of Higher Education and Research, Mysuru, Karnataka, 570015, India
- Syed Faizan
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, S. S. Nagar, Mysuru, Karnataka, 570015, India
- JSS Academy of Higher Education and Research, Mysuru, Karnataka, 570015, India
- B R Prashantha Kumar
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, S. S. Nagar, Mysuru, Karnataka, 570015, India.
- JSS Academy of Higher Education and Research, Mysuru, Karnataka, 570015, India.
2
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Affiliation(s)
- Jesse Horne
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
- Department of Bioengineering, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
- Department of Plant Biology, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
- Cancer Center at Illinois, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana−Champaign, Champaign, Illinois 61801, United States
3
Ose NJ, Butler BM, Kumar A, Kazan IC, Sanderford M, Kumar S, Ozkan SB. Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants. PLoS Comput Biol 2022; 18:e1010006. [PMID: 35389981 PMCID: PMC9017885 DOI: 10.1371/journal.pcbi.1010006] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/20/2021] [Revised: 04/19/2022] [Accepted: 03/09/2022] [Indexed: 01/07/2023] Open
Abstract
Many pathogenic missense mutations are found in protein positions that are neither well-conserved nor fall in any known functional domains. Consequently, we lack any mechanistic underpinning of dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase) in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations on these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of the long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease.
Affiliation(s)
- Nicholas J. Ose
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Brandon M. Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Maxwell Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
4
Serrano C, Teixeira CSS, Cooper DN, Carneiro J, Lopes-Marques M, Stenson PD, Amorim A, Prata MJ, Sousa SF, Azevedo L. Compensatory epistasis explored by molecular dynamics simulations. Hum Genet 2021; 140:1329-1342. [PMID: 34173867 DOI: 10.1007/s00439-021-02307-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/16/2021] [Accepted: 06/20/2021] [Indexed: 11/24/2022]
Abstract
A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations' otherwise pathogenic effects. Examples of compensated variants have been described in the literature but the interacting residue(s) postulated to play a compensatory role have rarely been ascertained. In this study, the examination of five human X-chromosomally encoded proteins (FIX, GLA, HPRT1, NDP and OTC) allowed us to identify several candidate compensated variants. Strong evidence for a compensated/compensatory pair of amino acids in the coagulation FIXa protein (involving residues 270 and 271) was found in a variety of mammalian species. Both amino acid residues are located within the 60-loop, spatially close to the 39-loop that performs a key role in coagulation serine proteases. To understand the nature of the underlying interactions, molecular dynamics simulations were performed. The predicted conformational change in the 39-loop consequent to the Glu270Lys substitution (associated with hemophilia B) appears to impair the protein's interaction with its substrate but, importantly, such steric hindrance is largely mitigated in those proteins that carry the compensatory residue (Pro271) at the neighboring amino acid position.
Affiliation(s)
- Catarina Serrano
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Carla S S Teixeira
- UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, Porto, Portugal
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- João Carneiro
- CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
- Mónica Lopes-Marques
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- António Amorim
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Maria J Prata
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
- Sérgio F Sousa
- UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, Porto, Portugal.
- Luísa Azevedo
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal.
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal.
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal.
5
Campitelli P, Swint-Kruse L, Ozkan SB. Substitutions at Nonconserved Rheostat Positions Modulate Function by Rewiring Long-Range, Dynamic Interactions. Mol Biol Evol 2021; 38:201-214. [PMID: 32780837 PMCID: PMC7783170 DOI: 10.1093/molbev/msaa202] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/15/2022] Open
Abstract
Amino acid substitutions at nonconserved protein positions can have noncanonical and "long-distance" outcomes on protein function. Such outcomes might arise from changes in the internal protein communication network, which is often accompanied by changes in structural flexibility. To test this, we calculated flexibilities and dynamic coupling for positions in the linker region of the lactose repressor protein. This region contains nonconserved positions for which substitutions alter DNA-binding affinity. We first chose to study 11 substitutions at position 52. In computations, substitutions showed long-range effects on flexibilities of DNA-binding positions, and the degree of flexibility change correlated with experimentally measured changes in DNA binding. Substitutions also altered dynamic coupling to DNA-binding positions in a manner that captured other experimentally determined functional changes. Next, we broadened calculations to consider the dynamic coupling between 17 linker positions and the DNA-binding domain. Experimentally, these linker positions exhibited a wide range of substitution outcomes: Four conserved positions tolerated hardly any substitutions ("toggle"), ten nonconserved positions showed progressive changes from a range of substitutions ("rheostat"), and three nonconserved positions tolerated almost all substitutions ("neutral"). In computations with wild-type lactose repressor protein, the dynamic couplings between the DNA-binding domain and these linker positions showed varied degrees of asymmetry that correlated with the observed toggle/rheostat/neutral substitution outcomes. Thus, we propose that long-range and noncanonical substitutions outcomes at nonconserved positions arise from rewiring long-range communication among functionally important positions. Such calculations might enable predictions for substitution outcomes at a range of nonconserved positions.
Affiliation(s)
- Paul Campitelli
- Department of Physics, Center for Biological Physics, Arizona State University, Tempe, AZ
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS
- S Banu Ozkan
- Department of Physics, Center for Biological Physics, Arizona State University, Tempe, AZ
6
An K, Zhou JB, Xiong Y, Han W, Wang T, Ye ZQ, Wu YD. Computational Studies of the Structural Basis of Human RPS19 Mutations Associated With Diamond-Blackfan Anemia. Front Genet 2021; 12:650897. [PMID: 34108988 PMCID: PMC8181406 DOI: 10.3389/fgene.2021.650897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2021] [Accepted: 04/28/2021] [Indexed: 11/13/2022] Open
Abstract
Diamond-Blackfan Anemia (DBA) is a rare inherited disease characterized by severe pure red cell aplasia, and it is caused by defective ribosome biogenesis stemming from the impairment of ribosomal proteins. Among all DBA-associated ribosomal proteins, RPS19 affects the most patients and carries the most DBA mutations. Revealing how these mutations lead to the impairment of RPS19 is important for understanding the pathogenesis of DBA, but a systematic study is currently lacking. In this work, based on the complex structure of the human ribosome, we comprehensively studied the structural basis of DBA mutations of RPS19 by using computational methods. The main structural elements and five conserved surface patches involved in the RPS19-18S rRNA interaction were identified. We further revealed that DBA mutations would destabilize RPS19 by disrupting the hydrophobic core or breaking the helix, or perturb the RPS19-18S rRNA interaction by destroying hydrogen bonds, introducing steric hindrance, or altering the surface electrostatic properties at the interface. Moreover, we trained a machine-learning model to predict the pathogenicity of all possible RPS19 mutations. Our work has laid a foundation for revealing the pathogenesis of DBA from the structural perspective.
Affiliation(s)
- Ke An
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, China
- Jing-Bo Zhou
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, China
- Yao Xiong
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, China
- Wei Han
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, China
- Tao Wang
- Shenzhen Bay Laboratory, Shenzhen, China
- Zhi-Qiang Ye
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, China
- Shenzhen Bay Laboratory, Shenzhen, China
- Yun-Dong Wu
- State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, China
- Shenzhen Bay Laboratory, Shenzhen, China
- College of Chemistry and Molecular Engineering, Peking University, Beijing, China
7
Campitelli P, Modi T, Kumar S, Ozkan SB. The Role of Conformational Dynamics and Allostery in Modulating Protein Evolution. Annu Rev Biophys 2020; 49:267-288. [PMID: 32075411 DOI: 10.1146/annurev-biophys-052118-115517] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Indexed: 11/09/2022]
Abstract
Advances in sequencing techniques and statistical methods have made it possible not only to predict sequences of ancestral proteins but also to identify thousands of mutations in the human exome, some of which are disease associated. These developments have motivated numerous theories and raised many questions regarding the fundamental principles behind protein evolution, which have been traditionally investigated horizontally using the tip of the phylogenetic tree through comparative studies of extant proteins within a family. In this article, we review a vertical comparison of the modern and resurrected ancestral proteins. We focus mainly on the dynamical properties responsible for a protein's ability to adapt new functions in response to environmental changes. Using the Dynamic Flexibility Index and the Dynamic Coupling Index to quantify the relative flexibility and dynamic coupling at a site-specific, single-amino-acid level, we provide evidence that the migration of hinges, which are often functionally critical rigid sites, is a mechanism through which proteins can rapidly evolve. Additionally, we show that disease-associated mutations in proteins often result in flexibility changes even at positions distal from mutational sites, particularly in the modulation of active site dynamics.
Affiliation(s)
- Paul Campitelli
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
- Tushar Modi
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA
- Department of Biology, Temple University, Philadelphia, Pennsylvania 19122, USA
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- S Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
8
Wong KC, Yan S, Lin Q, Li X, Peng C. Deleterious Non-Synonymous Single Nucleotide Polymorphism Predictions on Human Transcription Factors. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:327-333. [PMID: 30475727 DOI: 10.1109/tcbb.2018.2882548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) are major components of human gene regulation. In particular, they bind to specific DNA sequences and regulate neighboring genes in different tissues at different developmental stages. Non-synonymous single nucleotide polymorphisms in their protein-coding sequences could have undesired consequences in humans. Therefore, it is necessary to develop methods for predicting abnormalities among those non-synonymous single nucleotide polymorphisms. To address this, we have developed and compared different strategies to predict deleterious non-synonymous single nucleotide polymorphisms (also known as missense mutations) in the protein-coding sequences of human TFs. Taking advantage of evolutionary conservation signals, we have developed and compared different classifiers with different feature sets computed from different evolutionarily related sequence collections. The results indicate that the classic ensemble algorithm, AdaBoost with decision stumps, with an orthologous sequence collection, performed the best (namely, TFmedic). We have further compared TFmedic with other state-of-the-art methods (i.e., PolyPhen-2 and SIFT) on PolyPhen-2's own datasets, demonstrating that TFmedic can outperform the others. As an application, we have further applied TFmedic to all possible missense mutations on all human transcription factors; the proteome-wide results reveal interesting insights, consistent with existing physicochemical knowledge. A case study with an actual 3D structure is conducted, revealing how TFmedic can contribute to protein-DNA binding complex studies.
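As an illustration of the kind of classifier the abstract names, the following is a minimal sketch of AdaBoost with decision stumps in plain Python. It is not the TFmedic implementation: the function names, the single "conservation score" feature, and the toy labels (+1 deleterious, -1 neutral) are all hypothetical and chosen only to show the technique.

```python
import math

def stump_predict(row, feature, threshold, sign):
    """A decision stump: one threshold test on one feature."""
    return sign if row[feature] >= threshold else -sign

def train_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                err = sum(
                    w for w, row, label in zip(weights, X, y)
                    if stump_predict(row, feature, threshold, sign) != label
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best

def adaboost(X, y, rounds=5):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, feature, threshold, sign = train_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, sign))
        # Up-weight misclassified samples, down-weight correct ones.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, sign))
            for w, row, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * stump_predict(row, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one made-up conservation score per variant.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
```

Exhaustive threshold search keeps the sketch short; a practical implementation would rely on an established library rather than this loop.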
9
Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022] Open
Abstract
Genome-wide association studies have discovered a large number of genetic variants in human patients with the disease. Thus, predicting the impact of these variants is important for sorting disease-associated variants (DVs) from neutral variants. Current methods to predict the mutational impacts depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by the current methods. Here, we present a method to find DVs at less-conserved sites by predicting the mutational impacts using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants on cooperative sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
Affiliation(s)
- Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
10
Patel R, Kumar S. On estimating evolutionary probabilities of population variants. BMC Evol Biol 2019; 19:133. [PMID: 31238981 PMCID: PMC6593550 DOI: 10.1186/s12862-019-1455-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/20/2018] [Accepted: 06/06/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. EP of an allele represents an independent evolutionary expectation of observing an allele in a population based solely on the long-term substitution patterns captured in a multiple sequence alignment. In the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating with high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes a general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate if the modified EP approach produces results that are similar to those from the original method. RESULTS: We compared EP estimates from the original and the modified approaches by using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times. We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. CONCLUSION: We conclude that the modified EP approach will be generally applicable for alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
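The ePerm/eForb cutoff stated in the abstract can be shown with a short sketch. Only the 0.05 EP threshold comes from the abstract; the function names and the `high_freq` cutoff for "high frequencies" are hypothetical, added purely for illustration, and the sketch does not compute EP itself.

```python
EP_CUTOFF = 0.05  # threshold given in the abstract: ePerm if EP >= 0.05, else eForb

def classify_ep(ep: float) -> str:
    """Label an allele as evolutionarily permissible or forbidden from its EP."""
    if not 0.0 <= ep <= 1.0:
        raise ValueError("EP is a probability and must lie in [0, 1]")
    return "ePerm" if ep >= EP_CUTOFF else "eForb"

def is_candidate_adaptive(ep: float, population_freq: float,
                          high_freq: float = 0.05) -> bool:
    """Flag an eForb allele that segregates at high population frequency.

    `high_freq` is an assumed cutoff; the abstract does not define
    what counts as a high frequency.
    """
    return classify_ep(ep) == "eForb" and population_freq >= high_freq
```

For example, under these assumptions an allele with EP 0.01 observed at 30% population frequency would be flagged as a candidate adaptive polymorphism, while the same frequency at EP 0.5 would not.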
Affiliation(s)
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
11
Penning TM. AKR1C3 (type 5 17β-hydroxysteroid dehydrogenase/prostaglandin F synthase): Roles in malignancy and endocrine disorders. Mol Cell Endocrinol 2019; 489:82-91. [PMID: 30012349 PMCID: PMC6422768 DOI: 10.1016/j.mce.2018.07.002] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 12/21/2017] [Revised: 06/12/2018] [Accepted: 07/03/2018] [Indexed: 12/11/2022]
Abstract
Aldo-Keto-Reductase 1C3 (type 5 17β-hydroxysteroid dehydrogenase (HSD)/prostaglandin (PG) F2α synthase) is the only 17β-HSD that is not a short-chain dehydrogenase/reductase. By acting as a 17-ketosteroid reductase, AKR1C3 produces potent androgens in peripheral tissues which activate the androgen receptor (AR) or act as substrates for aromatase. AKR1C3 is implicated in the production of androgens in castration-resistant prostate cancer (CRPC) and polycystic ovarian syndrome, and in the production of aromatase substrates in breast cancer. By acting as an 11-ketoprostaglandin reductase, AKR1C3 generates 11β-PGF2α to activate the FP receptor and deprives peroxisome proliferator-activated receptor γ of its putative PGJ2 ligands. These growth stimulatory signals implicate AKR1C3 in non-hormone-dependent malignancies, e.g., acute myeloid leukemia (AML). AKR1C3 moonlights by acting as a co-activator of the AR and stabilizes ubiquitin ligases. AKR1C3 inhibitors have been used clinically for CRPC and AML and can be used to probe its pluripotency.
Affiliation(s)
- Trevor M Penning
- Department of Systems Pharmacology and Translational Therapeutics and Center of Excellence in Environmental Toxicology, Perelman School of Medicine, University of Pennsylvania, 1315 BRBII/III 421 Curie Blvd, Philadelphia, PA, 19104, USA.
12
Novel, rare and common pathogenic variants in the CFTR gene screened by high-throughput sequencing technology and predicted by in silico tools. Sci Rep 2019; 9:6234. [PMID: 30996306 PMCID: PMC6470152 DOI: 10.1038/s41598-019-42404-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/03/2018] [Accepted: 03/28/2019] [Indexed: 12/13/2022] Open
Abstract
Cystic fibrosis (CF) is caused by ~300 pathogenic CFTR variants, the heterogeneity of which challenges molecular diagnosis and precision medicine approaches in CF. Our objective was to identify CFTR variants through high-throughput sequencing (HTS) and to predict the pathogenicity of novel variants using 8 in silico tools. Two guidelines were followed to deduce pathogenicity. Genomic DNA from a total of 169 CF patients was submitted to targeted gene sequencing, and we identified 63 variants (three patients had three variants). The most frequent alleles were: F508del (n = 192), G542* (n = 26), N1303K (n = 11), R1162* and R334W (n = 9). The screened variants were classified as follows: 41 pathogenic variants [classified as (I) n = 23, (II) n = 6, (III) n = 1, (IV) n = 6, (IV/V) n = 1 and (VI) n = 4]; 14 variants of uncertain significance; and seven novel variants. For the novel variants, we suggested classifying 6b-16 exon duplication, G646* and 3557delA as Class I. The predictors agreed that L935Q, cDNA.5808T>A and I1427I were likely pathogenic, whereas Y325F presented two discordant results among the predictors. HTS and in silico analysis can identify pathogenic CFTR variants and will open the door to integration of precision medicine into routine clinical practice in the near future.
|
13
|
Penning TM, Wangtrakuldee P, Auchus RJ. Structural and Functional Biology of Aldo-Keto Reductase Steroid-Transforming Enzymes. Endocr Rev 2019; 40:447-475. [PMID: 30137266 PMCID: PMC6405412 DOI: 10.1210/er.2018-00089] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 03/08/2018] [Accepted: 06/05/2018] [Indexed: 12/19/2022]
Abstract
Aldo-keto reductases (AKRs) are monomeric NAD(P)(H)-dependent oxidoreductases that play pivotal roles in the biosynthesis and metabolism of steroids in humans. AKR1C enzymes acting as 3-ketosteroid, 17-ketosteroid, and 20-ketosteroid reductases are involved in the prereceptor regulation of ligands for the androgen, estrogen, and progesterone receptors and are considered drug targets to treat steroid hormone-dependent malignancies and endocrine disorders. In contrast, AKR1D1 is the only known steroid 5β-reductase and is essential for bile-acid biosynthesis, the generation of ligands for the farnesoid X receptor, and the 5β-dihydrosteroids that have their own biological activity. In this review we discuss the crystal structures of these AKRs, their kinetic and catalytic mechanisms, AKR genomics (gene expression, splice variants, polymorphic variants, and inherited genetic deficiencies), distribution in steroid target tissues, roles in steroid hormone action and disease, and inhibitor design.
Affiliation(s)
- Trevor M Penning
- Center of Excellence in Environmental Toxicology, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania.,Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania
| | - Phumvadee Wangtrakuldee
- Center of Excellence in Environmental Toxicology, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania.,Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania
| | - Richard J Auchus
- Division of Metabolism, Endocrinology, and Diabetes, Department of Internal Medicine and Department of Pharmacology, University of Michigan School of Medicine, Ann Arbor, Michigan
| |
|
14
|
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022] Open
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists. Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance of predictions and experimental confirmation.
Affiliation(s)
- Li Liu
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Maxwell D Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.,Department of Biology, Temple University, Philadelphia, PA, USA
| | - Pramod Chandrashekar
- College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Greg Gibson
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA.
| |
|
15
|
Abstract
Genetic differences between species and within populations are two sides of the same coin under the neutral theory of molecular evolution. This theory posits that a vast majority of evolutionary substitutions, which appear as differences between species, are (nearly) neutral, that is, these substitutions are permitted without a significantly adverse impact on a species' survival. We refer to them as evolutionarily permissible (ePerm) variation. Evolutionary permissibility of any possible variant can be inferred from multispecies sequence alignments by applying sophisticated statistical methods to the evolutionary tree of species. Here, we explore the evolutionary permissibility of amino acid variants associated with genetic diseases and those observed in personal exomes. Consistent with the predictions of the neutral theory, disease associated amino acid variants are rarely ePerm, much more biochemically radical, and found predominantly at more conserved positions than their non-disease counterparts. Only 10% of amino acid mutations are ePerm, but these variants rise to become two-thirds of all substitutions in the human lineage (a 6-fold enrichment). In contrast, only a minority of the variants in a personal exome are ePerm, a seemingly counterintuitive pattern that results from a combination of mutational and evolutionary processes that are, in fact, broadly consistent with the neutral theory. Evolutionarily forbidden variants outnumber detrimental variants in individual exomes and may play an underappreciated role in protecting against disease. We discuss these observations and conclude that the long-term evolutionary history of species can illuminate functional biomedical properties of variation present in personal exomes.
Affiliation(s)
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
| |
|
16
|
Gray VE, Hause RJ, Luebeck J, Shendure J, Fowler DM. Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. Cell Syst 2017; 6:116-124.e3. [PMID: 29226803 DOI: 10.1016/j.cels.2017.11.003] [Citation(s) in RCA: 115] [Impact Index Per Article: 16.4] [Received: 05/20/2017] [Revised: 08/30/2017] [Accepted: 11/03/2017] [Indexed: 11/26/2022]
Abstract
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
Affiliation(s)
- Vanessa E Gray
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Ronald J Hause
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jens Luebeck
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195, USA.
| |
|
17
|
Karim S, NourEldin HF, Abusamra H, Salem N, Alhathli E, Dudley J, Sanderford M, Scheinfeldt LB, Chaudhary AG, Al-Qahtani MH, Kumar S. e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations. BMC Genomics 2016; 17:770. [PMID: 27766955 PMCID: PMC5073857 DOI: 10.1186/s12864-016-3088-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 03/14/2023] Open
Abstract
Background: Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships. Description: We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations.
Conclusion: By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp.
Affiliation(s)
- Sajjad Karim
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Hend Fakhri NourEldin
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Heba Abusamra
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Nada Salem
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Elham Alhathli
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Joel Dudley
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
| | - Max Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
| | - Laura B Scheinfeldt
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.,Department of Biology, Temple University, Philadelphia, PA, 19122, USA
| | | | | | - Sudhir Kumar
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia; Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA.
| |
|
18
|
Szedlak A, Smith N, Liu L, Paternostro G, Piermarocchi C. Evolutionary and Topological Properties of Genes and Community Structures in Human Gene Regulatory Networks. PLoS Comput Biol 2016; 12:e1005009. [PMID: 27359334 PMCID: PMC4928929 DOI: 10.1371/journal.pcbi.1005009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/06/2016] [Accepted: 05/25/2016] [Indexed: 01/26/2023] Open
Abstract
The diverse, specialized genes present in today's lifeforms evolved from a common core of ancient, elementary genes. However, these genes did not evolve individually: gene expression is controlled by a complex network of interactions, and alterations in one gene may drive reciprocal changes in its proteins' binding partners. Like many complex networks, these gene regulatory networks (GRNs) are composed of communities, or clusters of genes with relatively high connectivity. A deep understanding of the relationship between the evolutionary history of single genes and the topological properties of the underlying GRN is integral to evolutionary genetics. Here, we show that the topological properties of an acute myeloid leukemia GRN and a general human GRN are strongly coupled with its genes' evolutionary properties. Slowly evolving ("cold"), old genes tend to interact with each other, as do rapidly evolving ("hot"), young genes. This naturally causes genes to segregate into community structures with relatively homogeneous evolutionary histories. We argue that gene duplication placed old, cold genes and communities at the center of the networks, and young, hot genes and communities at the periphery. We demonstrate this with single-node centrality measures and two new measures of efficiency, the set efficiency and the interset efficiency. We conclude that these methods for studying the relationships between a GRN's community structures and its genes' evolutionary properties provide new perspectives for understanding evolutionary genetics.
Affiliation(s)
- Anthony Szedlak
- Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, United States of America
| | - Nicholas Smith
- Salgomed Inc., Del Mar, California, United States of America
| | - Li Liu
- College of Health Solutions, Arizona State University, Tempe, Arizona, United States of America
| | - Giovanni Paternostro
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California, United States of America
| | - Carlo Piermarocchi
- Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, United States of America
| |
|
19
|
Kumar A, Butler BM, Kumar S, Ozkan SB. Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine. Curr Opin Struct Biol 2015; 35:135-42. [PMID: 26684487 PMCID: PMC4856467 DOI: 10.1016/j.sbi.2015.11.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/18/2015] [Revised: 11/03/2015] [Accepted: 11/05/2015] [Indexed: 01/08/2023]
Abstract
Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine.
Affiliation(s)
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
| | - Brandon M Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, United States; Department of Biology, Temple University, Philadelphia, PA 19122, United States; Center for Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - S Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States.
| |
|
20
|
Miura S, Tate S, Kumar S. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins. Evol Bioinform Online 2015; 11:245-51. [PMID: 26604664 PMCID: PMC4631161 DOI: 10.4137/ebo.s30594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2015] [Revised: 09/14/2015] [Accepted: 09/18/2015] [Indexed: 11/09/2022] Open
Abstract
Gene duplication enables the functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernable paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced an opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (ie, functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.
Affiliation(s)
- Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - Stephanie Tate
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
21
|
Gerek NZ, Liu L, Gerold K, Biparva P, Thomas ED, Kumar S. Evolutionary Diagnosis of non-synonymous variants involved in differential drug response. BMC Med Genomics 2015; 8 Suppl 1:S6. [PMID: 25952014 PMCID: PMC4315320 DOI: 10.1186/1755-8794-8-s1-s6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
Background: Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response. Results: We found that the available data on the link between drug response and nsSNV is rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulties in correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was not better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly. Conclusions: The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
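The balanced accuracy quoted in this abstract can be checked from the two reported rates. The sketch below assumes the standard definition (the mean of TPR and TNR); the function name is illustrative and does not come from the paper, and the inputs are the rounded percentages from the abstract, so the result only agrees with the reported 69% after rounding.

```python
def balanced_accuracy(tpr: float, tnr: float) -> float:
    """Return the mean of sensitivity (TPR) and specificity (TNR)."""
    return (tpr + tnr) / 2.0


# Rounded rates as reported in the abstract for Drug-EvoD (assumed inputs).
drug_evod = balanced_accuracy(0.84, 0.53)
print(f"Balanced accuracy: {drug_evod:.3f}")
```

Because both inputs are already rounded, the computed value should be read as a consistency check rather than a reproduction of the authors' calculation.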
|
22
|
Butler BM, Gerek ZN, Kumar S, Ozkan SB. Conformational dynamics of nonsynonymous variants at protein interfaces reveals disease association. Proteins 2015; 83:428-35. [PMID: 25546381 DOI: 10.1002/prot.24748] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/17/2014] [Revised: 11/20/2014] [Accepted: 12/10/2014] [Indexed: 12/12/2022]
Abstract
Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease-associated non-synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site-specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non-interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease-associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome-wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease.
|
23
|
Human aldo-keto reductases and the metabolic activation of polycyclic aromatic hydrocarbons. Chem Res Toxicol 2014; 27:1901-17. [PMID: 25279998 PMCID: PMC4237494 DOI: 10.1021/tx500298n] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Indexed: 01/06/2023]
Abstract
Aldo-keto reductases (AKRs) are promiscuous NAD(P)(H) dependent oxidoreductases implicated in the metabolic activation of polycyclic aromatic hydrocarbons (PAH). These enzymes catalyze the oxidation of non-K-region trans-dihydrodiols to the corresponding o-quinones with the concomitant production of reactive oxygen species (ROS). The PAH o-quinones are Michael acceptors and can form adducts but are also redox-active and enter into futile redox cycles to amplify ROS formation. Evidence exists to support this metabolic pathway in humans. The human recombinant AKR1A1 and AKR1C1–AKR1C4 enzymes all catalyze the oxidation of PAH trans-dihydrodiols to PAH o-quinones. Many human AKRs also catalyze the NADPH-dependent reduction of the o-quinone products to air-sensitive catechols, exacerbating ROS formation. Moreover, this pathway of PAH activation occurs in a panel of human lung cell lines, resulting in the production of ROS and oxidative DNA damage in the form of 8-oxo-2′-deoxyguanosine. Using stable-isotope dilution liquid chromatography tandem mass spectrometry, this pathway of benzo[a]pyrene (B[a]P) metabolism was found to contribute equally with the diol-epoxide pathway to the activation of this human carcinogen in human lung cells. Evaluation of the mutagenicity of anti-B[a]P-diol epoxide with B[a]P-7,8-dione on p53 showed that the o-quinone produced by AKRs was the more potent mutagen, provided that it was permitted to redox cycle, and that the mutations observed were G to T transversions, reminiscent of those observed in human lung cancer. It is concluded that there is sufficient evidence to support the role of human AKRs in the metabolic activation of PAH in human lung cell lines and that they may contribute to the causation of human lung cancer.
|
24
|
Li B, Seligman C, Thusberg J, Miller JL, Auer J, Whirl-Carrillo M, Capriotti E, Klein TE, Mooney SD. In silico comparative characterization of pharmacogenomic missense variants. BMC Genomics 2014; 15 Suppl 4:S4. [PMID: 25057096 PMCID: PMC4092878 DOI: 10.1186/1471-2164-15-s4-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Missense pharmacogenomic (PGx) variants refer to amino acid substitutions that potentially affect the pharmacokinetic (PK) or pharmacodynamic (PD) response to drug therapies. The PGx variants, as compared to disease-associated variants, have not been investigated as deeply. The ability to computationally predict future PGx variants is desirable; however, it is not clear what data sets should be used or what features are beneficial to this end. Hence we carried out a comparative characterization of PGx variants with annotated neutral and disease variants from UniProt, to test the predictive power of sequence conservation and structural information in discriminating these three groups. RESULTS: 126 PGx variants of high quality from PharmGKB were selected and two data sets were created: one set contained 416 variants with structural and sequence information, and the other set contained 1,265 variants with sequence information only. In terms of sequence conservation, PGx variants are more conserved than neutral variants and much less conserved than disease variants. A weighted random forest was used to strike a more balanced classification for PGx variants. Generally, structural features are helpful in discriminating PGx variants from the other two groups, but classification of PGx variants from neutral polymorphisms is still much less effective than classification between disease and neutral variants. CONCLUSIONS: We found that PGx variants are much more similar to neutral variants than to disease variants in the feature space consisting of residue conservation, neighboring residue conservation, number of neighbors, and protein solvent accessibility. Such similarity poses great difficulty in the classification of PGx variants and polymorphisms.
|
25
|
Gray VE, Liu L, Nirankari R, Hornbeck PV, Kumar S. Signatures of natural selection on mutations of residues with multiple posttranslational modifications. Mol Biol Evol 2014; 31:1641-5. [PMID: 24739307 DOI: 10.1093/molbev/msu137] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022] Open
Abstract
Posttranslational modifications (PTMs) regulate molecular structures and functions of proteins by covalently binding to amino acids. Hundreds of thousands of PTMs have been reported for the human proteome, with multiple PTMs known to affect tens of thousands of lysine (K) residues. Our molecular evolutionary analyses show that K residues with multiple PTMs exhibit greater conservation than those with a single PTM, but the difference is rather small. In contrast, short-term evolutionary trends revealed in an analysis of human population variation exhibited a much larger difference. Lysine residues with three PTMs show 1.8-fold enrichment of Mendelian disease-associated variants when compared with K residues with two PTMs, with the latter showing 1.7-fold enrichment of these variants when compared with the K residues with one PTM. Rare polymorphisms in humans show a similar trend, which suggests much greater negative selection against mutations of K residues with multiple PTMs within population. Conversely, common polymorphisms are overabundant at unmodified K residues and at K residues with fewer PTMs. The observed difference between inter- and intraspecies patterns of purifying selection on residues with PTMs suggests extensive species-specific drifting of PTM positions. These results suggest that the functionality of a protein is likely conserved, without necessarily conserving the PTM positions over evolutionary time.
Affiliation(s)
- Vanessa E Gray
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University
- Li Liu
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University
- Ronika Nirankari
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University
- Sudhir Kumar
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University; School of Life Sciences, Arizona State University; Center for Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
26
|
Isakov O, Perrone M, Shomron N. Exome sequencing analysis: a guide to disease variant detection. Methods Mol Biol 2014; 1038:137-58. [PMID: 23872973 DOI: 10.1007/978-1-62703-514-9_8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Abstract
Whole exome sequencing presents a powerful tool to study rare genetic disorders. The most challenging part of using exome sequencing for the purpose of disease-causing variant detection is analyzing, interpreting, and filtering the large number of detected variants. In this chapter we provide a comprehensive description of the various steps required for such an analysis. We address strategies in selecting samples to sequence, and technical considerations involved in exome sequencing. We then discuss how to identify variants, and methods for first annotating detected variants using characteristics such as allele frequency, location in the genome, and predicted severity, and then classifying and prioritizing the detected variants based on those annotations. Finally, we review possible gene annotations that may help to establish a relationship between genes carrying high-priority variants and the phenotype in question, in order to identify the most likely causative mutations.
Affiliation(s)
- Ofer Isakov
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
27
|
Wong KC, Zhang Z. SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences. Bioinformatics 2014; 30:1112-1119. [PMID: 24389653 DOI: 10.1093/bioinformatics/btt769] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 08/06/2013] [Accepted: 12/13/2013] [Indexed: 11/12/2022]
Abstract
MOTIVATION: The recent advances in genome sequencing have revealed an abundance of non-synonymous polymorphisms among human individuals; subsequently, it is of immense interest and importance to predict whether such substitutions are functionally neutral or have deleterious effects. The accuracy of such prediction algorithms depends on the quality of the multiple-sequence alignment, which is used to infer how an amino acid substitution is tolerated at a given position. Because of the scarcity of orthologous protein sequences in the past, the existing prediction algorithms all include sequences of protein paralogs in the alignment, which can dilute the conservation signal and affect prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and improve the prediction performance. RESULTS: We have developed a novel prediction algorithm, named SNPdryad, which only includes protein orthologs in building a multiple sequence alignment. Among many other innovations, SNPdryad uses different conservation scoring schemes and uses Random Forest as a classifier. We have tested SNPdryad on several datasets. We found that SNPdryad consistently outperformed other methods in several performance metrics, which is attributed to the exclusion of paralogous sequences. We have run SNPdryad on the complete human proteome, generating prediction scores for all the possible amino acid substitutions. AVAILABILITY AND IMPLEMENTATION: The algorithm and the prediction results can be accessed from the Web site: http://snps.ccbr.utoronto.ca:8080/SNPdryad/ CONTACT: Zhaolei.Zhang@utoronto.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 3G4; The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1; Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1; and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Zhaolei Zhang
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 3G4; The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1; Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1; and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
|
28
|
Schumacher J, Ramljak S, Asif AR, Schaffrath M, Zischler H, Herlyn H. Evolutionary conservation of mammalian sperm proteins associates with overall, not tyrosine, phosphorylation in human spermatozoa. J Proteome Res 2013; 12:5370-82. [PMID: 23919900 DOI: 10.1021/pr400228c] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023]
Abstract
We investigated possible associations between sequence evolution of mammalian sperm proteins and their phosphorylation status in humans. As a reference, spermatozoa from three normozoospermic men were analyzed combining two-dimensional gel electrophoresis, immunoblotting, and mass spectrometry. We identified 99 sperm proteins (thereof 42 newly described) and determined the phosphorylation status for most of them. Sequence evolution was studied across six mammalian species using nonsynonymous/synonymous rate ratios (dN/dS) and amino acid distances. Site-specific purifying selection was assessed employing average ratios of evolutionary rates at phosphorylated versus nonphosphorylated amino acids (α). According to our data, mammalian sperm proteins do not show statistically significant sequence conservation difference, no matter if the human ortholog is a phosphoprotein with or without tyrosine (Y) phosphorylation. In contrast, overall phosphorylation of human sperm proteins, i.e., phosphorylation at serine (S), threonine (T), and/or Y residues, associates with above-average conservation of sequences. Complementary investigations suggest that numerous protein-protein interactants constrain sequence evolution of sperm phosphoproteins. Although our findings reject a special relevance of Y phosphorylation for sperm functioning, they still indicate that overall phosphorylation substantially contributes to proper functioning of sperm proteins. Hence, phosphorylated sperm proteins might be considered as prime candidates for diagnosis and treatment of reduced male fertility.
Affiliation(s)
- Julia Schumacher
- Institute of Anthropology, University Mainz, Anselm-Franz-von-Bentzel-Weg 7, Mainz 55128, Germany
|
29
|
Stefl S, Nishi H, Petukh M, Panchenko AR, Alexov E. Molecular mechanisms of disease-causing missense mutations. J Mol Biol 2013; 425:3919-36. [PMID: 23871686 DOI: 10.1016/j.jmb.2013.07.014] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Received: 05/06/2013] [Revised: 07/04/2013] [Accepted: 07/10/2013] [Indexed: 12/23/2022]
Abstract
Genetic variations resulting in a change of amino acid sequence can have a dramatic effect on stability, hydrogen bond network, conformational dynamics, activity and many other physiologically important properties of proteins. The substitutions of only one residue in a protein sequence, so-called missense mutations, can be related to many pathological conditions and may influence susceptibility to disease and drug treatment. The plausible effects of missense mutations range from affecting the macromolecular stability to perturbing macromolecular interactions and cellular localization. Here we review the individual cases and genome-wide studies that illustrate the association between missense mutations and diseases. In addition, we emphasize that the molecular mechanisms of effects of mutations should be revealed in order to understand the disease origin. Finally, we report the current state-of-the-art methodologies that predict the effects of mutations on protein stability, the hydrogen bond network, pH dependence, conformational dynamics and protein function.
Affiliation(s)
- Shannon Stefl
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, USA
|
30
|
Nishi H, Tyagi M, Teng S, Shoemaker BA, Hashimoto K, Alexov E, Wuchty S, Panchenko AR. Cancer missense mutations alter binding properties of proteins and their interaction networks. PLoS One 2013; 8:e66273. [PMID: 23799087 PMCID: PMC3682950 DOI: 10.1371/journal.pone.0066273] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 12/20/2012] [Accepted: 05/02/2013] [Indexed: 11/18/2022]
Abstract
Many studies have shown that missense mutations might play an important role in carcinogenesis. However, the extent to which cancer mutations might affect biomolecular interactions remains unclear. Here, we map glioblastoma missense mutations on the human protein interactome, model the structures of affected protein complexes and decipher the effect of mutations on protein-protein, protein-nucleic acid and protein-ion binding interfaces. Although some missense mutations over-stabilize protein complexes, we found that the overall effect of mutations is destabilizing, mostly affecting the electrostatic component of binding energy. We also showed that mutations on interfaces resulted in more drastic changes of amino acid physico-chemical properties than mutations occurring outside the interfaces. Analysis of glioblastoma mutations on interfaces allowed us to stratify cancer-related interactions, identify potential driver genes, and propose two dozen additional cancer biomarkers, including those specific to functions of the nervous system. Such an analysis also offered insight into the molecular mechanism of the phenotypic outcomes of mutations, including effects on complex stability, activity, binding and turnover rate. As a result of mutated protein and gene network analysis, we observed that interactions of proteins with mutations mapped on interfaces had higher bottleneck properties compared to interactions with mutations elsewhere on the protein or unaffected interactions. Such observations suggest that genes with mutations directly affecting protein binding properties are preferably located in central network positions and may influence critical nodes and edges in signal transduction networks.
Affiliation(s)
- Hafumi Nishi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Manoj Tyagi
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Shaolei Teng
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, United States of America
- Benjamin A. Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Emil Alexov
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, United States of America
- Stefan Wuchty
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
31
|
Liu L, Kumar S. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants. Mol Biol Evol 2013; 30:1252-7. [PMID: 23462317 DOI: 10.1093/molbev/mst037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/25/2022]
Abstract
Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
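The balancing idea described in this abstract (equal numbers of positive and negative controls within each evolutionary conservation class) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Variant` record, the class labels, the toy data, and the choice to downsample the larger group are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    # Hypothetical record; the field names are illustrative assumptions.
    variant_id: str
    conservation_class: str  # e.g. "slow" or "fast" evolving site
    is_disease: bool         # True = positive control, False = negative control


def balance_by_conservation(variants, seed=0):
    """Downsample so each conservation class has equal positives and negatives."""
    rng = random.Random(seed)
    groups = defaultdict(lambda: {True: [], False: []})
    for v in variants:
        groups[v.conservation_class][v.is_disease].append(v)

    balanced = []
    for cls in sorted(groups):
        pos, neg = groups[cls][True], groups[cls][False]
        n = min(len(pos), len(neg))  # size of the smaller label in this class
        balanced.extend(rng.sample(pos, n))
        balanced.extend(rng.sample(neg, n))
    return balanced


# Toy data (invented): conserved ("slow") sites are dominated by disease
# variants, fast-evolving sites by neutral ones.
toy = (
    [Variant(f"d_slow_{i}", "slow", True) for i in range(8)]
    + [Variant(f"n_slow_{i}", "slow", False) for i in range(2)]
    + [Variant(f"d_fast_{i}", "fast", True) for i in range(3)]
    + [Variant(f"n_fast_{i}", "fast", False) for i in range(9)]
)
training_set = balance_by_conservation(toy)
```

Downsampling is only one way to reach equal counts; reweighting the examples would keep all the data and may be preferable when the smaller group is very small.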
Affiliation(s)
- Li Liu
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, USA
|
32
|
Nevin Gerek Z, Kumar S, Banu Ozkan S. Structural dynamics flexibility informs function and evolution at a proteome scale. Evol Appl 2013; 6:423-33. [PMID: 23745135 PMCID: PMC3673471 DOI: 10.1111/eva.12052] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 10/23/2012] [Accepted: 01/13/2013] [Indexed: 01/04/2023]
Abstract
Protein structures are dynamic entities with a myriad of atomic fluctuations, side-chain rotations, and collective domain movements. Although the importance of these dynamics to proper functioning of proteins is emerging in the studies of many protein families, there is a lack of broad evidence for the critical role of protein dynamics in shaping the biological functions of a substantial fraction of residues for a large number of proteins in the human proteome. Here, we propose a novel dynamic flexibility index (dfi) to quantify the dynamic properties of individual residues in any protein and use it to assess the importance of protein dynamics in 100 human proteins. Our analyses involving functionally critical positions, disease-associated and putatively neutral population variations, and the rate of interspecific substitutions per residue produce concordant patterns at a proteome scale. They establish that the preservation of dynamic properties of residues in a protein structure is critical for maintaining the protein/biological function. Therefore, structural dynamics needs to become a major component of the analysis of protein function and evolution. Such analyses will be facilitated by the dfi, which will also enable the integrative use of structural dynamics with evolutionary conservation in genomic medicine as well as functional genomics investigations.
Affiliation(s)
- Zeynep Nevin Gerek
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ, USA; Department of Physics, Center for Biological Physics, Bateman Physical Sciences F-Wing, Arizona State University, Tempe, AZ, USA
|
33
|
Champion MD, Gray V, Eberhard C, Kumar S. The evolutionary history of amino acid variations mediating increased resistance of S. aureus identifies reversion mutations in metabolic regulators. PLoS One 2013; 8:e56466. [PMID: 23424663 PMCID: PMC3570469 DOI: 10.1371/journal.pone.0056466] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/05/2012] [Accepted: 01/09/2013] [Indexed: 01/11/2023]
Abstract
The evolution of resistance in Staphylococcus aureus occurs rapidly, and in response to all known antimicrobial treatments. Numerous studies of model species describe compensatory roles of mutations in mediating competitive fitness, and there is growing evidence that these mutation types also drive adaptation of S. aureus strains. However, few studies have tracked amino acid changes during the complete evolutionary trajectory of antibiotic adaptation or been able to predict their functional relevance. Here, we have assessed the efficacy of computational methods to predict biological resistance of a collection of clinically known Resistance Associated Mutations (RAMs). We have found that >90% of known RAMs are incorrectly predicted to be functionally neutral by at least one of the prediction methods used. By tracing the evolutionary histories of all of the false negative RAMs, we have discovered that a significant number are reversion mutations to ancestral alleles also carried in the MSSA476 methicillin-sensitive isolate. These genetic reversions are most prevalent in strains following daptomycin treatment and show a tendency to accumulate in biological pathway reactions that are distinct from those accumulating non-reversion mutations. Our studies therefore show that in addition to non-reversion mutations, reversion mutations arise in isolates exposed to new antibiotic treatments. It is possible that acquisition of reversion mutations in the genome may prevent substantial fitness costs during the progression of resistance. Our findings pose an interesting question to be addressed by further clinical studies regarding whether or not these reversion mutations lead to a renewed vulnerability of a vancomycin or daptomycin resistant strain to antibiotics administered at an earlier stage of infection.
Affiliation(s)
- Mia D Champion
- Center for Evolutionary Medicine & Informatics, Biodesign Institute, Arizona State University, Arizona, United States of America.
|
34
|
Maruki T, Kumar S, Kim Y. Purifying selection modulates the estimates of population differentiation and confounds genome-wide comparisons across single-nucleotide polymorphisms. Mol Biol Evol 2012; 29:3617-23. [PMID: 22826460 PMCID: PMC3494274 DOI: 10.1093/molbev/mss187] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 01/12/2023]
Abstract
An improved understanding of the biological and numerical properties of measures of population differentiation across loci is becoming increasingly more important because of their growing use in analyzing genome-wide polymorphism data for detecting population structures, inferring the rates of migration, and identifying local adaptations. In a genome-wide analysis, we discovered that the estimates of population differentiation (e.g., F(ST), θ, and Jost's D) calculated for human single-nucleotide polymorphisms (SNPs) are strongly and positively correlated to the position-specific evolutionary rates measured from multispecies alignments. That is, genomic positions (loci) experiencing higher purifying selection (lower evolutionary rates) produce lower values for the degree of population differentiation than those evolving with faster rates. We show that this pattern is completely mediated by the negative effects of purifying selection on the minor allele frequency (MAF) at individual loci. Our results suggest that inferences and methods relying on the comparison of population differentiation estimates (F(ST), θ, and Jost's D) based on SNPs across genomic positions should be restricted to loci with similar MAFs and/or the rates of evolution in genome scale surveys.
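To make the dependence of differentiation estimates on allele frequency concrete, the sketch below computes a textbook two-population F_ST from expected heterozygosities, F_ST = (H_T - H_S) / H_T, for a single biallelic SNP. The simple form, the equal weighting of the two populations, and the function name are assumptions for illustration; the estimators analyzed in the study (F_ST, θ, and Jost's D) differ in detail.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # monomorphic locus: no differentiation is measurable
    return (h_total - h_within) / h_total


# Most extreme differentiation possible at two pooled minor allele frequencies:
# the allele is absent from population 1 and present only in population 2.
low_maf_max = fst_two_populations(0.0, 0.1)   # pooled MAF = 0.05
high_maf_max = fst_two_populations(0.0, 1.0)  # pooled MAF = 0.50
```

In this simplified model an allele private to one population gives F_ST = p / (1 - p), where p is the pooled allele frequency, so the attainable range of F_ST shrinks as the minor allele frequency falls. That is one reason to compare differentiation estimates only across loci with similar MAFs.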
Affiliation(s)
- Takahiro Maruki
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University
- School of Life Sciences, Arizona State University
- Sudhir Kumar
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University
- School of Life Sciences, Arizona State University
- Yuseob Kim
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University
- School of Life Sciences, Arizona State University
- Department of Life Science, Ewha Womans University, Seoul, Korea
|
35
|
Sunyaev SR. Inferring causality and functional significance of human coding DNA variants. Hum Mol Genet 2012; 21:R10-7. [PMID: 22990389 DOI: 10.1093/hmg/dds385] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 12/22/2022]
Abstract
Sequencing technology enables the complete characterization of human genetic variation. Statistical genetics studies identify numerous loci linked to or associated with phenotypes of direct medical interest. The major remaining challenge is to characterize functionally significant alleles that are causally implicated in the genetic basis of human traits. Here, I review three sources of evidence for the functional significance of human DNA variants in protein-coding genes. These include (i) statistical genetics considerations such as co-segregation with the phenotype, allele frequency in unaffected controls and recurrence; (ii) in vitro functional assays and model organism experiments; and (iii) computational methods for predicting the functional effect of amino acid substitutions. In spite of many successes of recent studies, functional characterization of human allelic variants remains problematic.
Affiliation(s)
- Shamil R Sunyaev
- Genetics Division, Brigham and Women's Hospital, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA.
|
36
|
Gray VE, Kukurba KR, Kumar S. Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations. Bioinformatics 2012; 28:2093-6. [PMID: 22685075 PMCID: PMC3413386 DOI: 10.1093/bioinformatics/bts336] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 11/14/2022]
Abstract
Summary: Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10 000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10 913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92% accuracy). However, these tools consistently fail in correctly annotating laboratory-induced mutations that show no functional impact in the laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for the naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that the laboratory-induced mutations occur predominately at the highly conserved positions in proteins, where the computational tools have the lowest accuracy of correct prediction for variants that do not impact function (neutral). Therefore, the comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change. Contact: s.kumar@asu.edu
Affiliation(s)
- Vanessa E Gray
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
|
37
|
Dudley JT, Kim Y, Liu L, Markov GJ, Gerold K, Chen R, Butte AJ, Kumar S. Human genomic disease variants: a neutral evolutionary explanation. Genome Res 2012; 22:1383-94. [PMID: 22665443 PMCID: PMC3409252 DOI: 10.1101/gr.133702.111] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Many perspectives on the role of evolution in human health include nonempirical assumptions concerning the adaptive evolutionary origins of human diseases. Evolutionary analyses of the increasing wealth of clinical and population genomic data have begun to challenge these presumptions. In order to systematically evaluate such claims, the time has come to build a common framework for an empirical and intellectual unification of evolution and modern medicine. We review the emerging evidence and provide a supporting conceptual framework that establishes the classical neutral theory of molecular evolution (NTME) as the basis for evaluating disease-associated genomic variations in health and medicine. For over a decade, the NTME has already explained the origins and distribution of variants implicated in diseases and has illuminated the power of evolutionary thinking in genomic medicine. We suggest that a majority of disease variants in modern populations will have neutral evolutionary origins (previously neutral), with a relatively smaller fraction exhibiting adaptive evolutionary origins (previously adaptive). This pattern is expected to hold true for common as well as rare disease variants. Ultimately, a neutral evolutionary perspective will provide medicine with an informative and actionable framework that enables objective clinical assessment beyond convenient tendencies to invoke past adaptive events in human history as a root cause of human disease.
Affiliation(s)
- Joel T Dudley
- Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California 94305, USA
|
38
|
High-resolution melting analysis of 15 genes in 60 patients with cytochrome-c oxidase deficiency. J Hum Genet 2012; 57:442-8. [PMID: 22592081 DOI: 10.1038/jhg.2012.49] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
Cytochrome-c oxidase (COX) deficiency is one of the common childhood mitochondrial disorders. Mutations in genes for the assembly factors SURF1 and SCO2 are prevalent in children with COX deficiency in the Slavonic population. Molecular diagnosis is difficult because of the number of genes involved in COX biogenesis and assembly. The aim of this study was to screen for mutations in 15 nuclear genes that encode the 10 structural subunits, their isoforms and two assembly factors of COX in 60 unrelated Czech children with COX deficiency. Nine novel variants were identified in exons and adjacent intronic regions of COX4I2, COX6A1, COX6A2, COX7A1, COX7A2 and COX10 using high-resolution melting (HRM) analysis. Online bioinformatics servers were used to predict the importance of the newly identified amino-acid substitutions. The newly characterized variants updated the contemporary spectrum of known genetic sequence variations that are present in the Czech population, which will be important for further targeted mutation screening in Czech COX-deficient children. HRM and predictive bioinformatics methodologies are advantageous because they are low-cost screening tools that complement large-scale genomic studies and reduce the required time and effort.
|
39
|
Dudley JT, Chen R, Sanderford M, Butte AJ, Kumar S. Evolutionary meta-analysis of association studies reveals ancient constraints affecting disease marker discovery. Mol Biol Evol 2012; 29:2087-94. [PMID: 22389448 DOI: 10.1093/molbev/mss079] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022]
Abstract
Genome-wide disease association studies contrast genetic variation between disease cohorts and healthy populations to discover single nucleotide polymorphisms (SNPs) and other genetic markers revealing underlying genetic architectures of human diseases. Despite scores of efforts over the past decade, many reproducible genetic variants that explain substantial proportions of the heritable risk of common human diseases remain undiscovered. We have conducted a multispecies genomic analysis of 5,831 putative human risk variants for more than 230 disease phenotypes reported in 2,021 studies. We find that the current approaches show a propensity for discovering disease-associated SNPs (dSNPs) at conserved genomic positions because the effect size (odds ratio) and allelic P value of genetic association of an SNP relates strongly to the evolutionary conservation of their genomic position. We propose a new measure for ranking SNPs that integrates evolutionary conservation scores and the P value (E-rank). Using published data from a large case-control study, we demonstrate that E-rank method prioritizes SNPs with a greater likelihood of bona fide and reproducible genetic disease associations, many of which may explain greater proportions of genetic variance. Therefore, long-term evolutionary histories of genomic positions offer key practical utility in reassessing data from existing disease association studies, and in the design and analysis of future studies aimed at revealing the genetic basis of common human diseases.
|
40
|
Hao DC, Feng Y, Xiao R, Xiao PG. Non-neutral nonsynonymous single nucleotide polymorphisms in human ABC transporters: the first comparison of six prediction methods. Pharmacol Rep 2012; 63:924-34. [PMID: 22001980 DOI: 10.1016/s1734-1140(11)70608-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 11/08/2010] [Revised: 02/07/2011] [Indexed: 11/28/2022]
Abstract
Nonsynonymous single nucleotide polymorphisms (nsSNPs) in coding regions that can lead to amino acid changes may cause alteration of protein function and account for susceptibility to disease and altered drug/xenobiotic response. Abundant nsSNPs have been found in genes coding for human ATP-binding cassette (ABC) transporters, but little is known about the relationship between the genotype and phenotype of nsSNPs in these membrane proteins. In addition, it is unknown which prediction method is better suited for the prediction of non-neutral nsSNPs of ABC transporters. We have identified 2,172 validated nsSNPs in 49 human ABC transporter genes from the Ensembl genome database and the NCBI SNP database. Using six different algorithms, 41 to 52% of nsSNPs in ABC transporter genes were predicted to have an impact on protein function. Predictions largely agreed with the available experimental annotations. Overall, 78.5% of non-neutral nsSNPs were predicted correctly as damaging by SNAP, which together with SIFT and PolyPhen, was superior to the prediction methods Pmut, PhD-SNP, and Panther. This study also identified amino acids that were likely to be functionally critical but have not yet been studied experimentally. There was significant concordance between the predicted results of SIFT and PolyPhen. Evolutionarily non-neutral (destabilizing) amino acid substitutions are predicted to be the basis for the pathogenic alteration of ABC transporter activity that is associated with disease susceptibility and altered drug/xenobiotic response.
Affiliation(s)
- Da Cheng Hao
- Laboratory of Biotechnology, College of Environment, Dalian Jiaotong University, Dalian 116028, China.
|
41
|
David A, Razali R, Wass MN, Sternberg MJE. Protein-protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Hum Mutat 2011; 33:359-63. [PMID: 22072597 DOI: 10.1002/humu.21656] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 08/16/2011] [Accepted: 10/31/2011] [Indexed: 11/08/2022]
Abstract
Many nonsynonymous single nucleotide polymorphisms (nsSNPs) are disease causing due to effects at protein-protein interfaces. We have integrated a database of the three-dimensional (3D) structures of human protein/protein complexes and the humsavar database of nsSNPs. We analyzed whether nsSNPs are located in the protein core, at protein-protein interfaces, or on the surface when not at an interface. Disease-causing nsSNPs that do not occur in the protein core are preferentially located at protein-protein interfaces rather than surface noninterface regions when compared to random segregation. The disruption of the protein-protein interaction can be explained by a range of structural effects including the loss of an electrostatic salt bridge, the destabilization due to reduction of the hydrophobic effect, the formation of a steric clash, and the introduction of a proline altering the main-chain conformation.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Division of Molecular Biosciences, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
|
42
|
Kumar S, Dudley JT, Filipski A, Liu L. Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet 2011; 27:377-86. [PMID: 21764165 PMCID: PMC3272884 DOI: 10.1016/j.tig.2011.06.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 04/18/2011] [Revised: 06/10/2011] [Accepted: 06/13/2011] [Indexed: 12/30/2022]
Abstract
Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available. Fortunately, the process of natural selection has lent us an almost infinite set of tests in nature. During long-term evolution, new mutations and existing variations have been evaluated for their biological consequences in countless species, and outcomes are readily revealed by multispecies genome comparisons. We review studies that have investigated evolutionary characteristics and in silico functional diagnoses of nSNVs found in thousands of disease-associated genes. We conclude that the patterns of long-term evolutionary conservation and permissible sequence divergence are essential and instructive modalities for functional assessment of human genetic variations.
Affiliation(s)
- Sudhir Kumar
- School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA.
|
43
|
Sterne-Weiler T, Howard J, Mort M, Cooper DN, Sanford JR. Loss of exon identity is a common mechanism of human inherited disease. Genome Res 2011; 21:1563-71. [PMID: 21750108 DOI: 10.1101/gr.118638.110] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Indexed: 01/12/2023]
Abstract
It is widely accepted that at least 10% of all mutations causing human inherited disease disrupt splice-site consensus sequences. In contrast to splice-site mutations, the role of auxiliary cis-acting elements such as exonic splicing enhancers (ESE) and exonic splicing silencers (ESS) in human inherited disease is still poorly understood. Here we use a top-down approach to determine rates of loss or gain of known human exonic splicing regulatory (ESR) sequences associated with either disease-causing mutations or putatively neutral single nucleotide polymorphisms (SNPs). We observe significant enrichment toward loss of ESEs and gain of ESSs among inherited disease-causing variants relative to neutral polymorphisms, indicating that exon skipping may play a prominent role in aberrant gene regulation. Both computational and biochemical approaches underscore the relevance of exonic splicing enhancer loss and silencer gain in inherited disease. Additionally, we provide direct evidence that both SRp20 (SRSF3) and possibly PTB (PTBP1) are involved in the function of a splicing silencer that is created de novo by a total of 83 different inherited disease mutations in 67 different disease genes. Taken together, we find that ~25% (7,154/27,681) of known missense and nonsense disease-causing mutations alter functional splicing signals within exons, suggesting a much more widespread role for aberrant mRNA processing in causing human inherited disease than has hitherto been appreciated.
Affiliation(s)
- Timothy Sterne-Weiler
- Department of Molecular, Cellular and Developmental Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
|
44
|
Ying H, Huttley G. Exploiting CpG hypermutability to identify phenotypically significant variation within human protein-coding genes. Genome Biol Evol 2011; 3:938-49. [PMID: 21398426 PMCID: PMC3184784 DOI: 10.1093/gbe/evr021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
The CpG dinucleotide is disproportionately represented in human genetic variation due to the hypermutability of 5-methyl-cytosine (5mC). We exploit this hypermutability and a novel codon substitution model to identify candidate functionally important exonic nucleotides. Population genetic theory suggests that codon positions with high cross-species CpG frequency will derive from stronger purifying selection. Using the phylogeny-based maximum likelihood inference framework, we applied codon substitution models with context-dependent parameters to measure the mutagenic and selective processes affecting CpG dinucleotides within exonic sequence. The suitability of these models was validated on >2,000 protein coding genes from a naturally occurring biological control, four yeast species that do not methylate their DNA. As expected, our analyses of yeast revealed no evidence for an elevated CpG transition rate or for substitution suppression affecting CpG-containing codons. Our analyses of >12,000 protein-coding genes from four primate lineages confirm the systemic influence of 5mC hypermutability on the divergence of these genes. After adjusting for confounding influences of mutation and the properties of the encoded amino acids, we confirmed that CpG-containing codons are under greater purifying selection in primates. Genes with significant evidence of enhanced suppression of nonsynonymous CpG changes were also shown to be significantly enriched in Online Mendelian Inheritance in Man. We developed a method for ranking candidate phenotypically influential CpG positions in human genes. Application of this method indicates that of the ∼1 million exonic CpG dinucleotides within humans, ∼20% are strong candidates for both hypermutability and disease association.
Affiliation(s)
- Hua Ying
- Department of Genome Biology, John Curtin School of Medical Research, The Australian National University, Canberra, ACT 0200, Australia
|
45
|
Gray VE, Kumar S. Rampant purifying selection conserves positions with posttranslational modifications in human proteins. Mol Biol Evol 2011; 28:1565-8. [PMID: 21273632 DOI: 10.1093/molbev/msr013] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
Posttranslational modifications (PTMs) are chemical alterations that are critical to protein conformation and activation states. Despite their functional importance and reported involvement in many diseases, evolutionary analyses have produced enigmatic results because only weak or no selective pressures have been attributed to many types of PTMs. In a large-scale analysis of 16,836 PTM positions from 4,484 human proteins, we find that positions harboring PTMs show evidence of higher purifying selection in 70% of the phosphorylated and N-linked glycosylated proteins. The purifying selection is up to 42% more severe at PTM residues as compared with the corresponding unmodified amino acids. These results establish extensive selective pressures in the long-term history of positions that experience PTMs in the human proteins. Our findings will enhance our understanding of the historical function of PTMs over time and help in predicting PTM positions by using evolutionary comparisons.
Affiliation(s)
- Vanessa E Gray
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University
|
46
|
Phenotype prediction of nonsynonymous single nucleotide polymorphisms in human phase II drug/xenobiotic metabolizing enzymes: perspectives on molecular evolution. SCIENCE CHINA-LIFE SCIENCES 2010; 53:1252-62. [DOI: 10.1007/s11427-010-4062-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/18/2010] [Accepted: 05/27/2010] [Indexed: 12/18/2022]
|
47
|
Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 2010; 31:631-55. [PMID: 20506564 DOI: 10.1002/humu.21260] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Indexed: 12/12/2022]
Abstract
The number of reported germline mutations in human nuclear genes, either underlying or associated with inherited disease, has now exceeded 100,000 in more than 3,700 different genes. The availability of these data has both revolutionized the study of the morbid anatomy of the human genome and facilitated "personalized genomics." With approximately 300 new "inherited disease genes" (and approximately 10,000 new mutations) being identified annually, it is pertinent to ask how many "inherited disease genes" there are in the human genome, how many mutations reside within them, and where such lesions are likely to be located? To address these questions, it is necessary not only to reconsider how we define human genes but also to explore notions of gene "essentiality" and "dispensability." Answers to these questions are now emerging from recent novel insights into genome structure and function and through complete genome sequence information derived from multiple individual human genomes. However, a change in focus toward screening functional genomic elements as opposed to genes sensu stricto will be required if we are to capitalize fully on recent technical and conceptual advances and identify new types of disease-associated mutation within noncoding regions remote from the genes whose function they disrupt.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom.
|