1
Campitelli P, Ross D, Swint-Kruse L, Ozkan SB. Dynamics-based protein network features accurately discriminate neutral and rheostat positions. Biophys J 2024:S0006-3495(24)00625-8. [PMID: 39277794 DOI: 10.1016/j.bpj.2024.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 07/03/2024] [Accepted: 09/11/2024] [Indexed: 09/17/2024]
Abstract
In some proteins, a unique class of nonconserved positions is characterized by their ability to generate diverse functional outcomes through single amino acid substitutions. Due to their ability to tune protein function, accurately identifying such "rheostat" positions is crucial for protein design, for understanding the impact of mutations observed in humans, and for predicting the evolution of pathogen drug resistance. However, identifying rheostat positions has been challenging, due in part to the absence of a clear structural relationship with binding sites. In this study, experimental data from our previous study of the Escherichia coli lactose repressor protein (LacI) were used to identify rheostat positions for which mutations tune in vivo EC50 for the allosteric ligand "IPTG." We next used the rheostat assignments to test the hypothesis that rheostat positions have unique dynamic features that will enable their identification. To that end, we integrated all-atom molecular dynamics simulations with perturbation residue response analysis. Results first revealed distinct dynamic behavior in IPTG-bound LacI compared with apo LacI, which was consistent with IPTG's role as an allosteric inducer. Next, we used a variety of dynamic features to build a classification model that discriminates experimentally characterized rheostat positions in LacI from positions with other types of substitution outcomes. In parallel, we built a second classifier model based on the 3D structural "static" network features of LacI. In comparative studies, the dynamic model better identified rheostat positions that were >8 Å from the binding site. In summary, our study provides insights into the dynamic characteristics of rheostat positions and suggests that models built on dynamic features may be useful for predicting the locations of rheostat positions in a wide range of proteins.
Affiliation(s)
- P Campitelli
  - Department of Physics, Center for Biological Physics, Arizona State University, Tempe, Arizona
- D Ross
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland
- L Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas
- S B Ozkan
  - Department of Physics, Center for Biological Physics, Arizona State University, Tempe, Arizona
2
Chen H, Shu J, Maley CC, Liu L. A Mouse-Specific Model to Detect Genes under Selection in Tumors. Cancers (Basel) 2023; 15:5156. [PMID: 37958330 PMCID: PMC10647215 DOI: 10.3390/cancers15215156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023]
Abstract
The mouse is a widely used model organism in cancer research. However, no computational methods exist to identify cancer driver genes in mice due to a lack of labeled training data. To address this knowledge gap, we adapted the GUST (Genes Under Selection in Tumors) model, originally trained on human exomes, to mouse exomes via transfer learning. The resulting tool, called GUST-mouse, can estimate long-term and short-term evolutionary selection in mouse tumors, and distinguish between oncogenes, tumor suppressor genes, and passenger genes using high-throughput sequencing data. We applied GUST-mouse to analyze 65 exomes of mouse primary breast cancer models and 17 exomes of mouse leukemia models. Comparing the predictions between cancer types and between human and mouse tumors revealed common and unique driver genes. The GUST-mouse method is available as an open-source R package on GitHub.
Affiliation(s)
- Hai Chen
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Jingmin Shu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Carlo C. Maley
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85281, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85281, USA
3
Liska O, Boross G, Rocabert C, Szappanos B, Tengölics R, Papp B. Principles of metabolome conservation in animals. Proc Natl Acad Sci U S A 2023; 120:e2302147120. [PMID: 37603743 PMCID: PMC10468614 DOI: 10.1073/pnas.2302147120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 07/16/2023] [Indexed: 08/23/2023]
Abstract
Metabolite levels shape cellular physiology and disease susceptibility, yet the general principles governing metabolome evolution are largely unknown. Here, we introduce a measure of conservation of individual metabolite levels among related species. By analyzing multispecies tissue metabolome datasets in phylogenetically diverse mammals and fruit flies, we show that conservation varies extensively across metabolites. Three major functional properties (metabolite abundance, essentiality, and association with human diseases) predict conservation, highlighting a striking parallel between the evolutionary forces driving metabolome and protein sequence conservation. Metabolic network simulations recapitulated these general patterns and revealed that abundant metabolites are highly conserved due to their strong coupling to key metabolic fluxes in the network. Finally, we show that biomarkers of metabolic diseases can be distinguished from other metabolites simply based on evolutionary conservation, without requiring any prior clinical knowledge. Overall, this study uncovers simple rules that govern metabolic evolution in animals and implies that most tissue metabolome differences between species are permitted, rather than favored, by natural selection. More broadly, our work paves the way toward using evolutionary information to identify biomarkers, as well as to detect pathogenic metabolome alterations in individual patients.
Affiliation(s)
- Orsolya Liska
  - Hungarian Centre of Excellence for Molecular Medicine - Biological Research Centre Metabolic Systems Biology Lab, 6728 Szeged, Hungary
  - National Laboratory of Biotechnology, Synthetic and System Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
  - Doctoral School of Biology, University of Szeged, 6726 Szeged, Hungary
- Gábor Boross
  - National Laboratory of Biotechnology, Synthetic and System Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
  - Department of Biology, Stanford University, Stanford, City of Palo Alto, CA 94305-5020
- Charles Rocabert
  - National Laboratory of Biotechnology, Synthetic and System Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
  - Inria, 78150 Rocquencourt, 69100 Villeurbanne, France
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, 00014 Helsinki, Finland
  - Institute for Computational Cell Biology, Heinrich-Heine Universität, 40225 Düsseldorf, Germany
- Balázs Szappanos
  - Hungarian Centre of Excellence for Molecular Medicine - Biological Research Centre Metabolic Systems Biology Lab, 6728 Szeged, Hungary
  - National Laboratory of Biotechnology, Synthetic and System Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
  - Department of Biotechnology, University of Szeged, 6726 Szeged, Hungary
- Roland Tengölics
  - Hungarian Centre of Excellence for Molecular Medicine - Biological Research Centre Metabolic Systems Biology Lab, 6728 Szeged, Hungary
  - National Laboratory of Biotechnology, Synthetic and System Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
  - Metabolomics Lab, Core facilities, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
- Balázs Papp
  - Hungarian Centre of Excellence for Molecular Medicine - Biological Research Centre Metabolic Systems Biology Lab, 6728 Szeged, Hungary
  - National Laboratory of Biotechnology, Synthetic and System Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
  - National Laboratory for Health Security, Biological Research Centre, Eötvös Loránd Research Network, 6726 Szeged, Hungary
4
Manzoor H, Zahid H, Emerling CA, Kumar KR, Hussain HMJ, Seo GH, Wajid M, Naz S. A biallelic variant of DCAF13 implicated in a neuromuscular disorder in humans. Eur J Hum Genet 2023; 31:629-637. [PMID: 36797467 PMCID: PMC10250411 DOI: 10.1038/s41431-023-01319-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/29/2022] [Revised: 02/05/2023] [Accepted: 02/09/2023] [Indexed: 02/18/2023]
Abstract
Neuromuscular disorders encompass a broad range of phenotypes and genetic causes. We investigated a consanguineous family in which multiple patients had a neuromuscular disorder characterized by a waddling gait, limb deformities, muscular weakness and facial palsy. Exome sequencing was completed on the DNA of three of the four patients. We identified a novel missense variant in DCAF13, ENST00000612750.5, NM_015420.7, c.907G>A:p.(Asp303Asn), ENST00000616836.4, NM_015420.6, c.1363G>A:p.(Asp455Asn) (rs1209794872), segregating with this phenotype, being homozygous in all four affected patients and heterozygous in the unaffected individuals. The variant was extremely rare in the public databases (gnomAD allele frequency 0.000007081), was absent from the DNA of 300 ethnically matched controls, and affected an amino acid that has been conserved across 1-2 billion years of evolution in eukaryotes. DCAF13 contains three WD40 domains and is hypothesized to have roles in both rRNA processing and ubiquitination of proteins. Analysis of DCAF13 with the p.(Asp455Asn) variant predicted that the amino acid change is deleterious and affects a β-hairpin turn within a WD40 domain of the protein, which may decrease protein stability. Previously, a heterozygous variant of DCAF13, NM_015420.6, c.20G>C:p.(Trp7Ser), with or without a heterozygous missense variant in CCN3, was suggested to cause inherited cortical myoclonic tremor with epilepsy. In addition, a heterozygous DCAF13 variant has been associated with autism spectrum disorder. Our study indicates a potential role of biallelic DCAF13 variants in neuromuscular disorders. Screening of additional patients with a similar phenotype may broaden the allelic and phenotypic spectrum due to DCAF13 variants.
Affiliation(s)
- Humera Manzoor
  - School of Biological Sciences, University of the Punjab, Quaid-e-Azam Campus, Lahore, 54590, Pakistan
  - Department of Human Genetics and Molecular Biology, University of Health Sciences, Lahore, Pakistan
- Hafsa Zahid
  - School of Biological Sciences, University of the Punjab, Quaid-e-Azam Campus, Lahore, 54590, Pakistan
- Kishore R Kumar
  - Molecular Medicine Laboratory and Department of Neurology, Concord Repatriation General Hospital, Concord Clinical School Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Muhammad Wajid
  - Department of Zoology, University of Okara, Punjab, Pakistan
- Sadaf Naz
  - School of Biological Sciences, University of the Punjab, Quaid-e-Azam Campus, Lahore, 54590, Pakistan
5
Ose NJ, Butler BM, Kumar A, Kazan IC, Sanderford M, Kumar S, Ozkan SB. Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants. PLoS Comput Biol 2022; 18:e1010006. [PMID: 35389981 PMCID: PMC9017885 DOI: 10.1371/journal.pcbi.1010006] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/20/2021] [Revised: 04/19/2022] [Accepted: 03/09/2022] [Indexed: 01/07/2023]
Abstract
Many pathogenic missense mutations are found in protein positions that are neither well conserved nor located in any known functional domain. Consequently, we lack any mechanistic underpinning of the dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase), in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations at these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease.
Affiliation(s)
- Nicholas J. Ose
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Brandon M. Butler
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Avishek Kumar
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- I. Can Kazan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
  - Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S. Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
6
Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J, Springer MS. Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu Rev Anim Biosci 2020; 9:29-53. [PMID: 33228377 DOI: 10.1146/annurev-animal-061220-023149] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 11/09/2022]
Abstract
The genomes of placental mammals are being sequenced at an unprecedented rate. Alignments of hundreds, and one day thousands, of genomes spanning the rich living and extinct diversity of species offer unparalleled power to resolve phylogenetic controversies, identify genomic innovations of adaptation, and dissect the genetic architecture of reproductive isolation. We highlight outstanding questions about the earliest phases of placental mammal diversification and the promise of newer methods, as well as remaining challenges, toward using whole genome data to resolve placental mammal phylogeny. The next phase of mammalian comparative genomics will see the completion and application of finished-quality, gapless genome assemblies from many ordinal lineages and closely related species. Interspecific comparisons between the most hypervariable genomic loci will likely reveal large, but heretofore mostly underappreciated, effects on population divergence, morphological innovation, and the origin of new species.
Affiliation(s)
- William J Murphy
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Nicole M Foley
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Kevin R Bredemeyer
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S Springer
  - Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, California 92521, USA
7
Allostery and Epistasis: Emergent Properties of Anisotropic Networks. Entropy 2020; 22:e22060667. [PMID: 33286439 PMCID: PMC7517209 DOI: 10.3390/e22060667] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/24/2020] [Revised: 06/02/2020] [Accepted: 06/08/2020] [Indexed: 11/17/2022]
Abstract
Understanding the mechanisms underlying protein allostery and non-additivity of substitution outcomes (i.e., epistasis) is critical when attempting to predict the functional impact of mutations, particularly at non-conserved sites. In an effort to model these two biological properties, we extend the framework of our metric for calculating dynamic coupling between residues, the Dynamic Coupling Index (DCI), to two new metrics: (i) EpiScore, which quantifies the difference between the residue fluctuation response of a functional site when two other positions are perturbed with random Brownian kicks simultaneously versus individually, to capture the degree of cooperativity of these two positions in modulating the dynamics of the functional site; and (ii) DCIasym, which measures the degree of asymmetry between the residue fluctuation responses of two sites when one or the other is perturbed with a random force. Applying these metrics to four independent systems, we show that EpiScore and DCIasym can capture important biophysical properties in dual mutant substitution outcomes. We propose that allosteric regulation and the mechanisms underlying non-additive amino acid substitution outcomes (i.e., epistasis) can be understood as emergent properties of an anisotropic network of interactions, where the inclusion of the full network of interactions is critical for accurate modeling. Consequently, mutations that drive towards a new function may require a fine balance between functional site asymmetry and strength of dynamic coupling with the functional sites. These two tools will provide mechanistic insight into both understanding and predicting the outcome of dual mutations.
8
Guan X, Runger G, Liu L. Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery. BMC Bioinformatics 2020; 21:77. [PMID: 32164534 PMCID: PMC7068914 DOI: 10.1186/s12859-020-3344-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
Abstract
Background: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and the algorithms are optimized for biological knowledge of a specific type. In practice, however, domain knowledge from diverse resources can provide complementary information, yet no current methods can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method that enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection. Results: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forests model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has a high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancers. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge to assemble a panel of biomarkers. We discovered a compact set of biomarkers with significant improvements in prediction accuracy. Conclusions: Know-GRRF is a powerful novel method to incorporate knowledge from multiple domains for feature selection. It has a broad range of applications in biomarker discovery. We implemented this method and released a KnowGRRF package in the R/CRAN archive.
Affiliation(s)
- Xin Guan
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Intel Corporation, Chandler, AZ, 85226, USA
- George Runger
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA
  - Department of Neurology, Mayo Clinic, Scottsdale, AZ, 85259, USA
9
Campitelli P, Modi T, Kumar S, Ozkan SB. The Role of Conformational Dynamics and Allostery in Modulating Protein Evolution. Annu Rev Biophys 2020; 49:267-288. [PMID: 32075411 DOI: 10.1146/annurev-biophys-052118-115517] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Indexed: 11/09/2022]
Abstract
Advances in sequencing techniques and statistical methods have made it possible not only to predict sequences of ancestral proteins but also to identify thousands of mutations in the human exome, some of which are disease associated. These developments have motivated numerous theories and raised many questions regarding the fundamental principles behind protein evolution, which have been traditionally investigated horizontally using the tip of the phylogenetic tree through comparative studies of extant proteins within a family. In this article, we review a vertical comparison of the modern and resurrected ancestral proteins. We focus mainly on the dynamical properties responsible for a protein's ability to adapt new functions in response to environmental changes. Using the Dynamic Flexibility Index and the Dynamic Coupling Index to quantify the relative flexibility and dynamic coupling at a site-specific, single-amino-acid level, we provide evidence that the migration of hinges, which are often functionally critical rigid sites, is a mechanism through which proteins can rapidly evolve. Additionally, we show that disease-associated mutations in proteins often result in flexibility changes even at positions distal from mutational sites, particularly in the modulation of active site dynamics.
Affiliation(s)
- Paul Campitelli
  - Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
- Tushar Modi
  - Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA
  - Department of Biology, Temple University, Philadelphia, Pennsylvania 19122, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- S Banu Ozkan
  - Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
10
Wong KC, Yan S, Lin Q, Li X, Peng C. Deleterious Non-Synonymous Single Nucleotide Polymorphism Predictions on Human Transcription Factors. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:327-333. [PMID: 30475727 DOI: 10.1109/tcbb.2018.2882548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) are major components of human gene regulation. In particular, they bind to specific DNA sequences and regulate neighboring genes in different tissues at different developmental stages. Non-synonymous single nucleotide polymorphisms in their protein-coding sequences could result in undesired consequences in humans. Therefore, it is necessary to develop methods for predicting which of those non-synonymous single nucleotide polymorphisms are deleterious. To address this, we have developed and compared different strategies to predict deleterious non-synonymous single nucleotide polymorphisms (also known as missense mutations) in the protein-coding sequences of human TFs. Taking advantage of evolutionary conservation signals, we have developed and compared different classifiers with different feature sets computed from different evolutionarily related sequence collections. The results indicate that the classic ensemble algorithm, AdaBoost with decision stumps, with an orthologous sequence collection, performed the best (namely, TFmedic). We have further compared TFmedic with other state-of-the-art methods (i.e., PolyPhen-2 and SIFT) on PolyPhen-2's own datasets, demonstrating that TFmedic can outperform the others. As an application, we have further applied TFmedic to all possible missense mutations in all human transcription factors; the proteome-wide results reveal interesting insights, consistent with existing physicochemical knowledge. A case study with an actual 3D structure is conducted, revealing how TFmedic can contribute to studies of protein-DNA binding complexes.
11
Sharma V, Hiller M. Losses of human disease-associated genes in placental mammals. NAR Genom Bioinform 2019; 2:lqz012. [PMID: 33575564 PMCID: PMC7671337 DOI: 10.1093/nargab/lqz012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/04/2019] [Revised: 08/24/2019] [Accepted: 10/08/2019] [Indexed: 02/07/2023]
Abstract
We systematically investigate whether losses of human disease-associated genes occurred in other mammals during evolution. We first show that genes lost in any of 62 non-human mammals generally have a lower degree of pleiotropy, and are highly depleted in essential and disease-associated genes. Despite this under-representation, we discovered multiple genes implicated in human disease that are truly lost in non-human mammals. In most cases, traits resembling human disease symptoms are present but not deleterious in gene-loss species, exemplified by losses of genes causing human eye or teeth disorders in poor-vision or enamel-less mammals. We also found widespread losses of PCSK9 and CETP genes, where loss-of-function mutations in humans protect from atherosclerosis. Unexpectedly, we discovered losses of disease genes (TYMP, TBX22, ABCG5, ABCG8, MEFV, CTSE) where deleterious phenotypes do not manifest in the respective species. A remarkable example is the uric acid-degrading enzyme UOX, which we found to be inactivated in elephants and manatees. While UOX loss in hominoids led to high serum uric acid levels and a predisposition for gout, elephants and manatees exhibit low uric acid levels, suggesting alternative ways of metabolizing uric acid. Together, our results highlight numerous mammals that are 'natural knockouts' of human disease genes.
Affiliation(s)
- Virag Sharma
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
12
Alves LQ, Alves J, Ribeiro R, Ruivo R, Castro F. The dopamine receptor D5 gene shows signs of independent erosion in toothed and baleen whales. PeerJ 2019; 7:e7758. [PMID: 31616587 PMCID: PMC6791347 DOI: 10.7717/peerj.7758] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/21/2019] [Accepted: 08/26/2019] [Indexed: 12/30/2022]
Abstract
Comparing gene loci within a phylogenetic framework is a promising approach to uncovering the genetic basis of human diseases. Imbalance of dopaminergic systems is suspected to underlie some emerging neurological disorders. The physiological functions of dopamine are transduced via G-protein-coupled receptors, including DRD5, which displays a relatively higher affinity toward dopamine. Importantly, DRD5 knockout mice are hypertensive, a condition emerging from an increase in sympathetic tone. We investigated the evolution of DRD5, a high-affinity receptor for dopamine, in mammals. Surprisingly, among 124 investigated mammalian genomes, we found that Cetacea lineages (Mysticeti and Odontoceti) have independently lost this gene, as has the burrowing Chrysochloris asiatica (Cape golden mole). We suggest that DRD5 inactivation parallels hypoxia-induced adaptations, such as the peripheral vasoconstriction required for deep diving in Cetacea, in accordance with the convergent evolution of vasoconstrictor genes in hypoxia-exposed animals. Our findings indicate that Cetacea are natural knockouts for DRD5 and might offer valuable insights into the mechanisms of some forms of vasoconstriction responses and hypertension in humans.
Affiliation(s)
- Luís Q Alves
- CIIMAR-University of Porto, Matosinhos, Portugal; FCUP-University of Porto, Porto, Portugal
| | - Juliana Alves
- CIIMAR-University of Porto, Matosinhos, Portugal; FCUP-University of Porto, Porto, Portugal
| | - Rodrigo Ribeiro
- CIIMAR-University of Porto, Matosinhos, Portugal; FCUP-University of Porto, Porto, Portugal
| | - Raquel Ruivo
- CIIMAR-University of Porto, Matosinhos, Portugal
|
13
|
Chong CS, Kunze M, Hochreiter B, Krenn M, Berger J, Maurer-Stroh S. Rare Human Missense Variants can affect the Function of Disease-Relevant Proteins by Loss and Gain of Peroxisomal Targeting Motifs. Int J Mol Sci 2019; 20:E4609. [PMID: 31533369 PMCID: PMC6770196 DOI: 10.3390/ijms20184609] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/23/2019] [Revised: 09/06/2019] [Accepted: 09/14/2019] [Indexed: 12/30/2022] Open
Abstract
Single nucleotide variants (SNVs) resulting in amino acid substitutions (i.e., missense variants) can affect protein localization by changing or creating new targeting signals. Here, we studied the potential of naturally occurring SNVs from the Genome Aggregation Database (gnomAD) to result in the loss of an existing peroxisomal targeting signal 1 (PTS1) or gain of a novel PTS1 leading to mistargeting of cytosolic proteins to peroxisomes. Filtering down from 32,985 SNVs resulting in missense mutations within the C-terminal tripeptide of 23,064 human proteins, based on gene annotation data and computational prediction, we selected six SNVs for experimental testing of loss of function (LoF) of the PTS1 motif and five SNVs in cytosolic proteins for gain in PTS1-mediated peroxisome import (GoF). Experimental verification by immunofluorescence microscopy for subcellular localization and FRET affinity measurements for interaction with the receptor PEX5 demonstrated that five of the six predicted LoF SNVs resulted in loss of the PTS1 motif while three of five predicted GoF SNVs resulted in de novo PTS1 generation. Overall, we showed that a complementary approach incorporating bioinformatics methods and experimental testing was successful in identifying SNVs capable of altering peroxisome protein import, which may have implications in human disease.
Affiliation(s)
- Cheng-Shoong Chong
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore.
- National University of Singapore Graduate School for Integrative Sciences and Engineering (NGS), National University of Singapore, Singapore 119077, Singapore.
| | - Markus Kunze
- Medical University of Vienna, Center for Brain Research, Department of Pathobiology of the Nervous System, 1090 Vienna, Austria.
| | - Bernhard Hochreiter
- Medical University of Vienna, Center for Physiology and Pharmacology, Institute for Vascular Biology and Thrombosis Research, 1090 Vienna, Austria.
| | - Martin Krenn
- Department of Neurology, Medical University of Vienna, 1090 Vienna, Austria.
- Institute of Human Genetics, Technical University Munich, 81675 Munich, Germany.
| | - Johannes Berger
- Medical University of Vienna, Center for Brain Research, Department of Pathobiology of the Nervous System, 1090 Vienna, Austria.
| | - Sebastian Maurer-Stroh
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore.
- National University of Singapore Graduate School for Integrative Sciences and Engineering (NGS), National University of Singapore, Singapore 119077, Singapore.
- Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore.
- Innovations in Food and Chemical Safety Programme (IFCS), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore.
|
14
|
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022] Open
Abstract
The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures. Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. 
A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
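As a rough illustration of the GNM step described in this abstract, the sketch below builds a Kirchhoff (connectivity) matrix from a list of residue contacts and reads relative B-factors off the diagonal of its pseudo-inverse, which is the standard GNM relation up to a constant prefactor. It is not the authors' Seq-GNM code: the function names, the toy contact list, and the omission of the physical scaling constant are assumptions made here for illustration. In Seq-GNM the contacts would come from coevolving residue pairs; here they are simply supplied by hand.

```python
def kirchhoff(n, contacts):
    """Build the GNM Kirchhoff matrix for n residues from (i, j) contact pairs."""
    gamma = [[0.0] * n for _ in range(n)]
    for i, j in contacts:
        if i == j or gamma[i][j] != 0.0:
            continue  # ignore self-contacts and duplicate pairs
        gamma[i][j] = gamma[j][i] = -1.0
        gamma[i][i] += 1.0
        gamma[j][j] += 1.0
    return gamma


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the contact graph is probably disconnected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def relative_bfactors(n, contacts):
    """Return diag(pinv(Kirchhoff)), proportional to GNM B-factors.

    For a connected contact graph, pinv(G) = inv(G + J/n) - J/n,
    where J is the all-ones matrix. The physical prefactor is omitted.
    """
    gamma = kirchhoff(n, contacts)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inverse = invert(shifted)
    return [inverse[i][i] - 1.0 / n for i in range(n)]


# Toy example (hypothetical contacts): a three-residue chain 0-1-2.
print(relative_bfactors(3, [(0, 1), (1, 2)]))
```

In this toy chain the two terminal residues have one contact each and the middle residue has two, so the terminal residues receive the larger relative B-factor, matching the intuition that less constrained positions fluctuate more.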
Affiliation(s)
- Brandon M. Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
| | - I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
| | - Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
| | - S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
|
15
|
Lopez JV, Kamel B, Medina M, Collins T, Baums IB. Multiple Facets of Marine Invertebrate Conservation Genomics. Annu Rev Anim Biosci 2018; 7:473-497. [PMID: 30485758 DOI: 10.1146/annurev-animal-020518-115034] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
Abstract
Conservation genomics aims to preserve the viability of populations and the biodiversity of living organisms. Invertebrate organisms represent 95% of animal biodiversity; however, few genomic resources currently exist for the group. The subset of marine invertebrates includes the most ancient metazoan lineages and possesses codes for unique gene products and possible keys to adaptation. The benefits of supporting invertebrate conservation genomics research (e.g., likely discovery of novel genes, protein regulatory mechanisms, genomic innovations, and transposable elements) outweigh the various hurdles (rare, small, or polymorphic starting materials). Here we review best conservation genomics practices in the laboratory and in silico when applied to marine invertebrates and also showcase unique features in several case studies of acroporid corals, crown-of-thorns starfish, apple snails, and abalone. Marine conservation genomics should also address how diversity can lead to unique marine innovations, the impact of deleterious variation, and how genomic monitoring and profiling could positively affect broader conservation goals (e.g., value of baseline data for in situ/ex situ genomic stocks).
Affiliation(s)
- Jose V Lopez
- Department of Biological Sciences, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Dania Beach, Florida 33004, USA
| | - Bishoy Kamel
- Department of Biology, Center for Evolutionary and Theoretical Immunology, University of New Mexico, Albuquerque, New Mexico 87131, USA
| | - Mónica Medina
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Timothy Collins
- Department of Biological Sciences, Florida International University, Miami, Florida 33199, USA
| | - Iliana B Baums
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
16
|
Abstract
Increasing our understanding of Earth's biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet's organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project's goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.
|
17
|
Abstract
Genetic differences between species and within populations are two sides of the same coin under the neutral theory of molecular evolution. This theory posits that a vast majority of evolutionary substitutions, which appear as differences between species, are (nearly) neutral, that is, these substitutions are permitted without a significantly adverse impact on a species' survival. We refer to them as evolutionarily permissible (ePerm) variation. Evolutionary permissibility of any possible variant can be inferred from multispecies sequence alignments by applying sophisticated statistical methods to the evolutionary tree of species. Here, we explore the evolutionary permissibility of amino acid variants associated with genetic diseases and those observed in personal exomes. Consistent with the predictions of the neutral theory, disease associated amino acid variants are rarely ePerm, much more biochemically radical, and found predominantly at more conserved positions than their non-disease counterparts. Only 10% of amino acid mutations are ePerm, but these variants rise to become two-thirds of all substitutions in the human lineage (a 6-fold enrichment). In contrast, only a minority of the variants in a personal exome are ePerm, a seemingly counterintuitive pattern that results from a combination of mutational and evolutionary processes that are, in fact, broadly consistent with the neutral theory. Evolutionarily forbidden variants outnumber detrimental variants in individual exomes and may play an underappreciated role in protecting against disease. We discuss these observations and conclude that the long-term evolutionary history of species can illuminate functional biomedical properties of variation present in personal exomes.
Affiliation(s)
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
|
18
|
Scholte LLS, Pascoal-Xavier MA, Nahum LA. Helminths and Cancers From the Evolutionary Perspective. Front Med (Lausanne) 2018; 5:90. [PMID: 29713629 PMCID: PMC5911458 DOI: 10.3389/fmed.2018.00090] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/31/2017] [Accepted: 03/22/2018] [Indexed: 01/20/2023] Open
Abstract
Helminths include free-living and parasitic Platyhelminthes and Nematoda which infect millions of people worldwide. Some Platyhelminthes species of blood flukes (Schistosoma haematobium, Schistosoma japonicum, and Schistosoma mansoni) and liver flukes (Clonorchis sinensis and Opisthorchis viverrini) are known to be involved in human cancers. Other helminths are likely to be carcinogenic. Our main goals are to summarize the current knowledge of human cancers caused by Platyhelminthes, point out some helminth and human biomarkers identified so far, and highlight the potential contributions of phylogenetics and molecular evolution to cancer research. Human cancers caused by helminth infection include cholangiocarcinoma, colorectal hepatocellular carcinoma, squamous cell carcinoma, and urinary bladder cancer. Chronic inflammation is proposed as a common pathway for cancer initiation and development. Furthermore, different bacteria present in gastric, colorectal, and urogenital microbiomes might be responsible for enlarging inflammatory and fibrotic responses in cancers. Studies have suggested that different biomarkers are involved in helminth infection and human cancer development, although the detailed mechanisms remain under debate. Different helminth proteins have been studied by different approaches. However, their evolutionary relationships remain unsolved. Here, we illustrate the strengths of homology identification and function prediction of uncharacterized proteins from genome sequencing projects based on an evolutionary framework. Together, these approaches may help identify new biomarkers for disease diagnostics and intervention measures. This work has potential applications in the field of phylomedicine (evolutionary medicine) and may contribute to parasite and cancer research.
Affiliation(s)
- Larissa L. S. Scholte
- Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
- Vice-Presidência de Pesquisa e Coleções Biológicas, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Brazil
| | - Marcelo A. Pascoal-Xavier
- Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
- Departamento de Anatomia Patológica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Laila A. Nahum
- Instituto René Rachou, Fundação Oswaldo Cruz (FIOCRUZ), Belo Horizonte, Brazil
- Faculdade Promove de Tecnologia, Belo Horizonte, Brazil
|
19
|
Klink GV, Golovin AV, Bazykin GA. Substitutions into amino acids that are pathogenic in human mitochondrial proteins are more frequent in lineages closely related to human than in distant lineages. PeerJ 2017; 5:e4143. [PMID: 29250469 PMCID: PMC5731343 DOI: 10.7717/peerj.4143] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/30/2017] [Accepted: 11/16/2017] [Indexed: 11/23/2022] Open
Abstract
Propensities for different amino acids within a protein site change in the course of evolution, so that an amino acid deleterious in a particular species may be acceptable at the same site in a different species. Here, we study the amino acid-changing variants in human mitochondrial genes, and analyze their occurrence in non-human species. We show that substitutions giving rise to such variants tend to occur in lineages closely related to human more frequently than in more distantly related lineages, indicating that a human variant is more likely to be deleterious in more distant species. Unexpectedly, substitutions giving rise to amino acids that correspond to alleles pathogenic in humans also more frequently occur in more closely related lineages. Therefore, a pathogenic variant still tends to be more acceptable in human mitochondria than a variant that may only be fit after a substantial perturbation of the protein structure.
Affiliation(s)
- Galya V. Klink
- Sector of Molecular Evolution, Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russian Federation
| | - Andrey V. Golovin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russian Federation
| | - Georgii A. Bazykin
- Sector of Molecular Evolution, Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russian Federation
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Skolkovo, Russian Federation
|
20
|
Emerling CA, Widjaja AD, Nguyen NN, Springer MS. Their loss is our gain: regressive evolution in vertebrates provides genomic models for uncovering human disease loci. J Med Genet 2017; 54:787-794. [PMID: 28814606 DOI: 10.1136/jmedgenet-2017-104837] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/29/2017] [Revised: 07/07/2017] [Accepted: 07/10/2017] [Indexed: 12/20/2022]
Abstract
Throughout Earth's history, evolution's numerous natural 'experiments' have resulted in a diverse range of phenotypes. Though de novo phenotypes receive widespread attention, degeneration of traits inherited from an ancestor is a very common, yet frequently neglected, evolutionary path. The latter phenomenon, known as regressive evolution, often results in vertebrates with phenotypes that mimic inherited disease states in humans. Regressive evolution of anatomical and/or physiological traits is typically accompanied by inactivating mutations underlying these traits, which frequently occur at loci identical to those implicated in human diseases. Here we discuss the potential utility of examining the genomes of vertebrates that have experienced regressive evolution to inform human medical genetics. This approach is low cost and high throughput, giving it the potential to rapidly improve knowledge of disease genetics. We discuss two well-described examples, rod monochromacy (congenital achromatopsia) and amelogenesis imperfecta, to demonstrate the utility of this approach, and then suggest methods to equip non-experts with the ability to corroborate candidate genes and uncover new disease loci.
Affiliation(s)
- Christopher A Emerling
- Museum of Vertebrate Zoology, University of California, Berkeley, California, USA
- Department of Biology, University of California, Riverside, California, USA
| | - Andrew D Widjaja
- Department of Biochemistry, University of California, Riverside, California, USA
- Department of Nutritional Sciences and Toxicology, University of California, Berkeley, California, USA
| | - Nancy N Nguyen
- Department of Bioengineering, University of California, Riverside, California, USA
- Department of Bioengineering, University of California, Los Angeles, California, USA
| | - Mark S Springer
- Department of Biology, University of California, Riverside, California, USA
|
21
|
Gasse B, Prasad M, Delgado S, Huckert M, Kawczynski M, Garret-Bernardin A, Lopez-Cazaux S, Bailleul-Forestier I, Manière MC, Stoetzel C, Bloch-Zupan A, Sire JY. Evolutionary Analysis Predicts Sensitive Positions of MMP20 and Validates Newly- and Previously-Identified MMP20 Mutations Causing Amelogenesis Imperfecta. Front Physiol 2017; 8:398. [PMID: 28659819 PMCID: PMC5469888 DOI: 10.3389/fphys.2017.00398] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/10/2017] [Accepted: 05/26/2017] [Indexed: 12/21/2022] Open
Abstract
Amelogenesis imperfecta (AI) designates a group of genetic diseases characterized by a large range of enamel disorders causing important social and health problems. These defects can result from mutations in enamel matrix proteins or protease encoding genes. A range of mutations in the enamel cleavage enzyme matrix metalloproteinase-20 gene (MMP20) produce enamel defects of varying severity. To address how various alterations produce a range of AI phenotypes, we performed a targeted analysis to find MMP20 mutations in French patients diagnosed with non-syndromic AI. Genomic DNA was isolated from saliva and MMP20 exons and exon-intron boundaries sequenced. We identified several homozygous or heterozygous mutations, putatively involved in the AI phenotypes. To validate missense mutations and predict sensitive positions in the MMP20 sequence, we evolutionarily compared 75 sequences extracted from the public databases using the Datamonkey webserver. These sequences were representative of mammalian lineages, covering more than 150 million years of evolution. This analysis allowed us to find 324 sensitive positions (out of the 483 MMP20 residues), pinpoint functionally important domains, and build an evolutionary chart of important conserved MMP20 regions. This is an efficient tool to identify new- and previously-identified mutations. We thus identified six functional MMP20 mutations in unrelated families, finding two novel mutated sites. The genotypes and phenotypes of these six mutations are described and compared. To date, 13 MMP20 mutations causing AI have been reported, making these genotypes and associated hypomature enamel phenotypes the most frequent in AI.
Affiliation(s)
- Barbara Gasse
- Institut de Biologie Paris-Seine, UMR 7138-Evolution Paris-Seine, Sorbonne Universités, Université Pierre et Marie Curie, Paris, France
| | - Megana Prasad
- Laboratoire de Génétique Médicale, Institut National de la Santé et de la Recherche Médicale UMRS_1112, Institut de Génétique Médicale d'Alsace, FMTS, Université de Strasbourg, Strasbourg, France
| | - Sidney Delgado
- Institut de Biologie Paris-Seine, UMR 7138-Evolution Paris-Seine, Sorbonne Universités, Université Pierre et Marie Curie, Paris, France
| | - Mathilde Huckert
- Laboratoire de Génétique Médicale, Institut National de la Santé et de la Recherche Médicale UMRS_1112, Institut de Génétique Médicale d'Alsace, FMTS, Université de Strasbourg, Strasbourg, France; Faculté de Chirurgie Dentaire, Université de Strasbourg, Strasbourg, France
| | - Marzena Kawczynski
- Faculté de Chirurgie Dentaire, Université de Strasbourg, Strasbourg, France; Pôle de Médecine et Chirurgie Bucco-Dentaires, Centre de Référence des Manifestations Odontologiques des Maladies Rares, O-Rares, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | - Annelyse Garret-Bernardin
- Faculté de Chirurgie Dentaire, Université de Strasbourg, Strasbourg, France; Unit of Dentistry, IRCCS, Bambino Gesù Children's Hospital, Rome, Italy
| | - Serena Lopez-Cazaux
- Faculté de Chirurgie Dentaire, Département d'Odontologie Pédiatrique, Centre de Compétences Maladies Rares, CHU Hôtel Dieu, Service d'odontologie Conservatrice et Pédiatrique, Nantes, France
| | - Isabelle Bailleul-Forestier
- Faculté de Chirurgie Dentaire, CHU de Toulouse, Centre de Compétences Maladies Rares, Odontologie Pédiatrique, Université Paul Sabatier, Toulouse, France
| | - Marie-Cécile Manière
- Faculté de Chirurgie Dentaire, Université de Strasbourg, Strasbourg, France; Pôle de Médecine et Chirurgie Bucco-Dentaires, Centre de Référence des Manifestations Odontologiques des Maladies Rares, O-Rares, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | - Corinne Stoetzel
- Laboratoire de Génétique Médicale, Institut National de la Santé et de la Recherche Médicale UMRS_1112, Institut de Génétique Médicale d'Alsace, FMTS, Université de Strasbourg, Strasbourg, France
| | - Agnès Bloch-Zupan
- Faculté de Chirurgie Dentaire, Université de Strasbourg, Strasbourg, France; Pôle de Médecine et Chirurgie Bucco-Dentaires, Centre de Référence des Manifestations Odontologiques des Maladies Rares, O-Rares, Hôpitaux Universitaires de Strasbourg, Strasbourg, France; Centre Européen de Recherche en Biologie et en Médecine, Centre National de la Recherche Scientifique UMR7104, Institut National de la Santé et de la Recherche Médicale U964, Institut de Génétique et de Biologie Moléculaire and Cellulaire, Université de Strasbourg, Illkirch, France; Institut d'Etudes Avancées, Université de Strasbourg, USIAS, Strasbourg, France; Eastman Dental Institute, University College London, London, United Kingdom
| | - Jean-Yves Sire
- Institut de Biologie Paris-Seine, UMR 7138-Evolution Paris-Seine, Sorbonne Universités, Université Pierre et Marie Curie, Paris, France
|
22
|
Tollis M, Schiffman JD, Boddy AM. Evolution of cancer suppression as revealed by mammalian comparative genomics. Curr Opin Genet Dev 2017; 42:40-47. [DOI: 10.1016/j.gde.2016.12.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 10/21/2016] [Revised: 12/19/2016] [Accepted: 12/21/2016] [Indexed: 02/05/2023]
|
23
|
Spataro N, Rodríguez JA, Navarro A, Bosch E. Properties of human disease genes and the role of genes linked to Mendelian disorders in complex disease aetiology. Hum Mol Genet 2017; 26:489-500. [PMID: 28053046 PMCID: PMC5409085 DOI: 10.1093/hmg/ddw405] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/12/2016] [Revised: 11/10/2016] [Accepted: 11/23/2016] [Indexed: 01/19/2023] Open
Abstract
Do genes presenting variation that has been linked to human disease have different biological properties than genes that have never been related to disease? What is the relationship between disease and fitness? Are the evolutionary pressures that affect genes linked to Mendelian diseases the same as those acting on genes whose variation contributes to complex disorders? The answers to these questions could shed light on the architecture of human genetic disorders and may have relevant implications when designing mapping strategies in future genetic studies. Here we show that, relative to non-disease genes, human disease (HD) genes have specific evolutionary profiles and protein network properties. Additionally, our results indicate that the mutation-selection balance renders an insufficient account of the evolutionary history of some HD genes and that adaptive selection could also contribute to shape their genetic architecture. Notably, several biological features of HD genes depend on the type of pathology (complex or Mendelian) with which they are related. For example, genes harbouring both causal variants for Mendelian disorders and risk factors for complex disease traits (Complex-Mendelian genes), tend to present higher functional relevance in the protein network and higher expression levels than genes associated only with complex disorders. Moreover, risk variants in Complex-Mendelian genes tend to present higher odds ratios than those on genes associated with the same complex disorders but with no link to Mendelian diseases. Taken together, our results suggest that genetic variation at genes linked to Mendelian disorders plays an important role in driving susceptibility to complex disease.
Affiliation(s)
- Nino Spataro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Juan Antonio Rodríguez
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Arcadi Navarro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- National Institute for Bioinformatics (INB), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
| | - Elena Bosch
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
|
24
|
Liu L, Chang Y, Yang T, Noren DP, Long B, Kornblau S, Qutub A, Ye J. Evolution-informed modeling improves outcome prediction for cancers. Evol Appl 2016; 10:68-76. [PMID: 28035236 PMCID: PMC5192825 DOI: 10.1111/eva.12417] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 06/08/2016] [Accepted: 08/17/2016] [Indexed: 12/19/2022] Open
Abstract
Despite wide applications of high-throughput biotechnologies in cancer research, many biomarkers discovered by exploring large-scale omics data do not provide satisfactory performance when used to predict cancer treatment outcomes. This problem is partly due to the overlooking of functional implications of molecular markers. Here, we present a novel computational method that uses evolutionary conservation as prior knowledge to discover bona fide biomarkers. Evolutionary selection at the molecular level is nature's test on functional consequences of genetic elements. By prioritizing genes that show significant statistical association and high functional impact, our new method reduces the chances of including spurious markers in the predictive model. When applied to predicting therapeutic responses for patients with acute myeloid leukemia and to predicting metastasis for patients with prostate cancers, the new method gave rise to evolution-informed models that enjoyed low complexity and high accuracy. The identified genetic markers also have significant implications in tumor progression and embrace potential drug targets. Because evolutionary conservation can be estimated as a gene-specific, position-specific, or allele-specific parameter on the nucleotide level and on the protein level, this new method can be extended to apply to miscellaneous "omics" data to accelerate biomarker discoveries.
Affiliation(s)
- Li Liu
- Department of Biomedical Informatics Arizona State University Tempe AZ USA
- Yung Chang
- School of Life Science Arizona State University Tempe AZ USA
- Tao Yang
- Department of Computer Science and Engineering Arizona State University Tempe AZ USA
- David P Noren
- Department of Bioengineering Rice University Houston TX USA
- Byron Long
- Department of Bioengineering Rice University Houston TX USA
- Steven Kornblau
- The University of Texas MD Anderson Cancer Center Houston TX USA
- Amina Qutub
- Department of Bioengineering Rice University Houston TX USA
- Jieping Ye
- Department of Computational Medicine and Bioinformatics University of Michigan Ann Arbor MI USA
|
25
|
Karim S, NourEldin HF, Abusamra H, Salem N, Alhathli E, Dudley J, Sanderford M, Scheinfeldt LB, Chaudhary AG, Al-Qahtani MH, Kumar S. e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations. BMC Genomics 2016; 17:770. [PMID: 27766955 PMCID: PMC5073857 DOI: 10.1186/s12864-016-3088-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 03/14/2023]
Abstract
Background: Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships.
Description: We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations.
Conclusion: By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp.
Affiliation(s)
- Sajjad Karim
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Hend Fakhri NourEldin
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Heba Abusamra
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Nada Salem
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Elham Alhathli
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Dudley
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
- Max Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Laura B Scheinfeldt
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia; Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA; Department of Biology, Temple University, Philadelphia, PA, 19122, USA
|
26
|
|
27
|
Comparative sequence analyses of rhodopsin and RPE65 reveal patterns of selective constraint across hereditary retinal disease mutations. Vis Neurosci 2016; 33:e002. [DOI: 10.1017/s0952523815000322] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Retinitis pigmentosa (RP) comprises several heritable diseases that involve photoreceptor, and ultimately retinal, degeneration. Currently, mutations in over 50 genes have known links to RP. Despite advances in clinical characterization, molecular characterization of RP remains challenging due to the heterogeneous nature of causal genes, mutations, and clinical phenotypes. In this study, we compiled large datasets of two important visual genes associated with RP: rhodopsin, which initiates the phototransduction cascade, and the retinoid isomerase RPE65, which regenerates the visual cycle. We used a comparative evolutionary approach to investigate the relationship between interspecific sequence variation and pathogenic mutations that lead to degenerative retinal disease. Using codon-based likelihood methods, we estimated evolutionary rates (dN/dS) across both genes in a phylogenetic context to investigate differences between pathogenic and nonpathogenic amino acid sites. In both genes, disease-associated sites showed significantly lower evolutionary rates compared to nondisease sites, and were more likely to occur in functionally critical areas of the proteins. The nature of the dataset (e.g., vertebrate or mammalian sequences), as well as selection of pathogenic sites, affected the differences observed between pathogenic and nonpathogenic sites. Our results illustrate that these methods can serve as an intermediate step in understanding protein structure and function in a clinical context, particularly in predicting the relative pathogenicity (i.e., functional impact) of point mutations and their downstream phenotypic effects. Extensions of this approach may also contribute to current methods for predicting the deleterious effects of candidate mutations and to the identification of protein regions under strong constraint where we expect pathogenic mutations to occur.
|
28
|
Kumar A, Butler BM, Kumar S, Ozkan SB. Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine. Curr Opin Struct Biol 2015; 35:135-42. [PMID: 26684487 PMCID: PMC4856467 DOI: 10.1016/j.sbi.2015.11.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/18/2015] [Revised: 11/03/2015] [Accepted: 11/05/2015] [Indexed: 01/08/2023]
Abstract
Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict whether they are benign. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine.
Affiliation(s)
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
- Brandon M Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, United States; Department of Biology, Temple University, Philadelphia, PA 19122, United States; Center for Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
|
29
|
Miura S, Tate S, Kumar S. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins. Evol Bioinform Online 2015; 11:245-51. [PMID: 26604664 PMCID: PMC4631161 DOI: 10.4137/ebo.s30594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2015] [Revised: 09/14/2015] [Accepted: 09/18/2015] [Indexed: 11/09/2022]
Abstract
Gene duplication enables functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated, with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernible paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced the opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (i.e., functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins; rather, an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.
Affiliation(s)
- Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Stephanie Tate
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
30
|
Kumar A, Glembo TJ, Ozkan SB. The Role of Conformational Dynamics and Allostery in the Disease Development of Human Ferritin. Biophys J 2015; 109:1273-81. [PMID: 26255589 PMCID: PMC4576160 DOI: 10.1016/j.bpj.2015.06.060] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 04/09/2015] [Revised: 06/18/2015] [Accepted: 06/30/2015] [Indexed: 12/26/2022]
Abstract
Determining the three-dimensional structure of myoglobin, the first solved structure of a protein, fundamentally changed the way protein function was understood. Even more revolutionary was the information that came afterward: protein dynamics play a critical role in biological functions. Therefore, understanding conformational dynamics is crucial to obtaining a more complete picture of protein evolution. We recently analyzed the evolution of different protein families including green fluorescent proteins (GFPs), β-lactamase inhibitors, and nuclear receptors, and we observed that the alteration of conformational dynamics through allosteric regulation leads to functional changes. Moreover, proteome-wide conformational dynamics analysis of more than 100 human proteins showed that mutations occurring at rigid residue positions are more susceptible to disease than flexible residue positions. These studies suggest that disease-associated mutations may impair dynamic allosteric regulations, leading to loss of function. Thus, in this study, we analyzed the conformational dynamics of the wild-type light chain subunit of human ferritin protein along with the neutral and disease forms. We first performed replica exchange molecular dynamics simulations of wild-type and mutants to obtain equilibrated dynamics and then used perturbation response scanning (PRS), where we introduced a random Brownian kick to a position and computed the fluctuation response of the chain using linear response theory. Using this approach, we computed the dynamic flexibility index (DFI) for each position in the chain for the wild-type and the mutants. DFI quantifies the resilience of a position to a perturbation and provides a flexibility/rigidity measurement for a given position in the chain. 
The DFI analysis reveals that neutral variants and the wild-type exhibit similar flexibility profiles in which experimentally determined functionally critical sites act as hinges in controlling the overall motion. However, disease mutations alter the conformational dynamic profile, making the hinges looser (i.e., softening them), thus impairing the allosterically regulated dynamics.
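The perturbation-response idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the linear response to a force F is approximated as ΔR = C·F, where C is a 3N×3N positional covariance matrix (for example, one estimated from a simulation trajectory), and that the DFI of a position is its total displacement response to unit kicks applied at every position, normalized so the profile sums to 1. The function name `dfi_from_covariance`, the number of kick directions, and the toy covariance are all hypothetical choices made for illustration.

```python
import numpy as np

def dfi_from_covariance(cov, n_directions=7, seed=0):
    """Sketch of a dynamic flexibility index (DFI) profile.

    cov: (3N, 3N) positional covariance matrix (assumed input).
    Returns a length-N array that sums to 1; lower values mean a more
    rigid (hinge-like) response under this sketch's assumptions.
    """
    n_res = cov.shape[0] // 3
    rng = np.random.default_rng(seed)
    # response[i, j]: displacement magnitude of residue i when residue j is kicked
    response = np.zeros((n_res, n_res))
    for _ in range(n_directions):
        # Random unit force direction, standing in for a "Brownian kick"
        force = rng.normal(size=3)
        force /= np.linalg.norm(force)
        for j in range(n_res):
            # Linear response with the force applied only to residue j:
            # delta_R = C[:, block_j] @ F
            delta = cov[:, 3 * j:3 * j + 3] @ force
            response[:, j] += np.linalg.norm(delta.reshape(n_res, 3), axis=1)
    response /= n_directions
    # Total response of each residue to kicks everywhere, normalized
    per_residue = response.sum(axis=1)
    return per_residue / per_residue.sum()

# Toy example (assumed data): three uncoupled residues with isotropic
# variances 1, 2, 1, so the middle residue is the most flexible.
variances = np.array([1.0, 2.0, 1.0])
cov = np.diag(np.repeat(variances, 3))
profile = dfi_from_covariance(cov)
print(profile)  # approximately [0.25, 0.5, 0.25]
```

With an uncoupled (diagonal) covariance, each kick moves only the kicked residue, so the profile reduces to the normalized variances. A realistic covariance has off-diagonal coupling, which is what lets a perturbation at one site register as a response at distant sites.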
Affiliation(s)
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
- Tyler J Glembo
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
- S Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
|
31
|
Cheng F, Liu C, Lin CC, Zhao J, Jia P, Li WH, Zhao Z. A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types. PLoS Comput Biol 2015; 11:e1004497. [PMID: 26352260 PMCID: PMC4564226 DOI: 10.1371/journal.pcbi.1004497] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/28/2015] [Accepted: 08/11/2015] [Indexed: 12/14/2022]
Abstract
Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Chuang Liu
- Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Junfei Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
|
32
|
Evolutionary analysis of selective constraints identifies ameloblastin (AMBN) as a potential candidate for amelogenesis imperfecta. BMC Evol Biol 2015. [PMID: 26223266 PMCID: PMC4518657 DOI: 10.1186/s12862-015-0431-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]
Abstract
Background: Ameloblastin (AMBN) is a phosphorylated, proline/glutamine-rich protein secreted during enamel formation. Previous studies have revealed that this enamel matrix protein was present early in vertebrate evolution and certainly plays important roles during enamel formation although its precise functions remain unclear. We performed evolutionary analyses of AMBN in order to (i) identify residues and motifs important for the protein function, (ii) predict mutations responsible for genetic diseases, and (iii) understand its molecular evolution in mammals.
Results: In silico searches retrieved 56 complete sequences in public databases that were aligned and analyzed computationally. We showed that AMBN is globally evolving under moderate purifying selection in mammals and contains a strong phylogenetic signal. In addition, our analyses revealed codons evolving under significant positive selection. Evidence for positive selection acting on AMBN was observed in catarrhine primates and the aye-aye. We also found that (i) an additional translation initiation site was recruited in the ancestral placental AMBN, (ii) a short exon was duplicated several times in various species including catarrhine primates, and (iii) several polyadenylation sites are present.
Conclusions: AMBN possesses many positions, which have been subjected to strong selective pressure for 200 million years. These positions correspond to several cleavage sites and hydroxylated, O-glycosylated, and phosphorylated residues. We predict that these conserved positions would be potentially responsible for enamel disorder if substituted. Some motifs that were previously identified as potentially important functionally were confirmed, and we found two, highly conserved, new motifs, the function of which should be tested in the near future. This study illustrates the power of evolutionary analyses for characterizing the functional constraints acting on proteins with yet uncharacterized structure.
|
33
|
Webb AE, Gerek ZN, Morgan CC, Walsh TA, Loscher CE, Edwards SV, O'Connell MJ. Adaptive Evolution as a Predictor of Species-Specific Innate Immune Response. Mol Biol Evol 2015; 32:1717-29. [PMID: 25758009 PMCID: PMC4476151 DOI: 10.1093/molbev/msv051] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/24/2022]
Abstract
It has been proposed that positive selection may be associated with protein functional change. For example, human and macaque have different outcomes to HIV infection and it has been shown that residues under positive selection in the macaque TRIM5α receptor locate to the region known to influence species-specific response to HIV. In general, however, the relationship between sequence and function has proven difficult to fully elucidate, and it is the role of large-scale studies to help bridge this gap in our understanding by revealing major patterns in the data that correlate genotype with function or phenotype. In this study, we investigate the level of species-specific positive selection in innate immune genes from human and mouse. In total, we analyzed 456 innate immune genes using codon-based models of evolution, comparing human, mouse, and 19 other vertebrate species to identify putative species-specific positive selection. Then we used population genomic data from the recently completed Neanderthal genome project, the 1000 human genomes project, and the 17 laboratory mouse genomes project to determine whether the residues that were putatively positively selected are fixed or variable in these populations. We find evidence of species-specific positive selection on both the human and the mouse branches and we show that the classes of genes under positive selection cluster by function and by interaction. Data from this study provide us with targets to test the relationship between positive selection and protein function and ultimately to test the relationship between positive selection and discordant phenotypes.
Affiliation(s)
- Andrew E Webb
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Dublin 9, Ireland; Centre for Scientific Computing & Complex Systems Modeling (SCI-SYM), Dublin City University, Dublin 9, Ireland
- Z Nevin Gerek
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia
- Claire C Morgan
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Dublin 9, Ireland; Centre for Scientific Computing & Complex Systems Modeling (SCI-SYM), Dublin City University, Dublin 9, Ireland
- Thomas A Walsh
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Dublin 9, Ireland; Centre for Scientific Computing & Complex Systems Modeling (SCI-SYM), Dublin City University, Dublin 9, Ireland
- Christine E Loscher
- Immunomodulation Research Group, School of Biotechnology, Dublin City University, Glasnevin, Dublin 9, Ireland
- Scott V Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University
- Mary J O'Connell
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Dublin 9, Ireland; Centre for Scientific Computing & Complex Systems Modeling (SCI-SYM), Dublin City University, Dublin 9, Ireland; Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University
|
34
|
Butler BM, Gerek ZN, Kumar S, Ozkan SB. Conformational dynamics of nonsynonymous variants at protein interfaces reveals disease association. Proteins 2015; 83:428-35. [PMID: 25546381 DOI: 10.1002/prot.24748] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 09/17/2014] [Revised: 11/20/2014] [Accepted: 12/10/2014] [Indexed: 12/12/2022]
Abstract
Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease-associated non-synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site-specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non-interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease-associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome-wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease.
|
35
|
Carroll CJ, Brilhante V, Suomalainen A. Next-generation sequencing for mitochondrial disorders. Br J Pharmacol 2014; 171:1837-53. [PMID: 24138576 DOI: 10.1111/bph.12469] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 07/21/2013] [Revised: 10/03/2013] [Accepted: 10/13/2013] [Indexed: 12/30/2022]
Abstract
A great deal of our understanding of mitochondrial function has come from studies of inherited mitochondrial diseases, but the majority of patients still lack a molecular diagnosis. Furthermore, effective treatments for mitochondrial disorders do not exist. Development of therapies has been complicated by the fact that the diseases are extremely heterogeneous, and collecting large enough cohorts of similarly affected individuals to assess new therapies properly has been difficult. Next-generation sequencing technologies have in the last few years been shown to be an effective method for the genetic diagnosis of inherited mitochondrial diseases. Here we review the strategies and findings from studies applying next-generation sequencing methods for the genetic diagnosis of mitochondrial disorders. Detailed knowledge of molecular causes also enables collection of homogeneous cohorts of patients for therapy trials, and therefore boosts the development of interventions.
Affiliation(s)
- C J Carroll
- Research Programs Unit, Molecular Neurology, Biomedicum-Helsinki, University of Helsinki, Helsinki, Finland
|
36
|
Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history. Nat Genet 2014; 46:1303-10. [DOI: 10.1038/ng.3137] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Received: 02/24/2014] [Accepted: 10/09/2014] [Indexed: 11/08/2022]
|
37
|
Silvent J, Gasse B, Mornet E, Sire JY. Molecular evolution of the tissue-nonspecific alkaline phosphatase allows prediction and validation of missense mutations responsible for hypophosphatasia. J Biol Chem 2014; 289:24168-79. [PMID: 25023282 DOI: 10.1074/jbc.m114.576843] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 01/01/2023]
Abstract
ALPL encodes the tissue nonspecific alkaline phosphatase (TNSALP), which removes phosphate groups from various substrates. Its function is essential for bone and tooth mineralization. In humans, ALPL mutations lead to hypophosphatasia, a genetic disorder characterized by defective bone and/or tooth mineralization. To date, 275 ALPL mutations have been reported to cause hypophosphatasia, of which 204 were simple missense mutations. Molecular evolutionary analysis has proved to be an efficient method to highlight residues important for the protein function and to predict or validate sensitive positions for genetic disease. Here we analyzed 58 mammalian TNSALP sequences to identify amino acids unchanged, or only substituted by residues sharing similar properties, through 220 million years of mammalian evolution. We found 469 sensitive positions among the 524 residues of human TNSALP, which indicates a highly constrained protein. Any substitution occurring at one of these positions is predicted to lead to hypophosphatasia. We tested the 204 missense mutations resulting in hypophosphatasia against our predictive chart, and validated 99% of them. Most sensitive positions were located in functionally important regions of TNSALP (active site, homodimeric interface, crown domain, calcium site, …). However, some important positions are located in regions whose structure and/or biological function are still unknown. Our chart of sensitive positions in human TNSALP (i) makes it possible to validate or invalidate, at low cost, any ALPL mutation suspected of being responsible for hypophosphatasia, in contrast to time-consuming and expensive functional tests, and (ii) displays higher predictive power than in silico models of prediction.
Affiliation(s)
- Jérémie Silvent
- Université Pierre & Marie Curie, IBPS, Evolution Paris Seine, 7 quai St-Bernard, Case 05, 75005 Paris, France
- Barbara Gasse
- Université Pierre & Marie Curie, IBPS, Evolution Paris Seine, 7 quai St-Bernard, Case 05, 75005 Paris, France
- Etienne Mornet
- Unité de Pathologie Cellulaire et Génétique, EA2493, Université de Versailles-Saint Quentin en Yvelines, Versailles & Unité de Génétique Constitutionnelle, Centre Hospitalier de Versailles, 78150 Le Chesnay, France
- Jean-Yves Sire
- Université Pierre & Marie Curie, IBPS, Evolution Paris Seine, 7 quai St-Bernard, Case 05, 75005 Paris, France
|
38
|
Cheng F, Jia P, Wang Q, Lin CC, Li WH, Zhao Z. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol Biol Evol 2014; 31:2156-69. [PMID: 24881052 DOI: 10.1093/molbev/msu167] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Indexed: 12/14/2022] Open
Abstract
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationship among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found that cancer proteins have a unique network centrality, which is largely independent of gene essentiality. Cancer genes likely have experienced a lower evolutionary rate and stronger purifying selection than those of noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely originating in early metazoans, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that the protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified from 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, but only a weaker or insignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Quan Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago; Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine; Department of Cancer Biology, Vanderbilt University School of Medicine; Department of Psychiatry, Vanderbilt University School of Medicine; Center for Quantitative Sciences, Vanderbilt University Medical Center
|
39
|
Vona B, Hofrichter MAH, Neuner C, Schröder J, Gehrig A, Hennermann JB, Kraus F, Shehata-Dieler W, Klopocki E, Nanda I, Haaf T. DFNB16 is a frequent cause of congenital hearing impairment: implementation of STRC mutation analysis in routine diagnostics. Clin Genet 2014; 87:49-55. [PMID: 26011646 PMCID: PMC4302246 DOI: 10.1111/cge.12332] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 10/15/2013] [Revised: 11/26/2013] [Accepted: 12/12/2013] [Indexed: 11/29/2022]
Abstract
Increasing attention has been directed toward assessing mutational fallout of stereocilin (STRC), the gene underlying DFNB16. A major challenge is due to a closely linked pseudogene with 99.6% coding sequence identity. In 94 GJB2/GJB6-mutation negative individuals with non-syndromic sensorineural hearing loss (NSHL), we identified two homozygous and six heterozygous deletions, encompassing the STRC region by microarray and/or quantitative polymerase chain reaction (qPCR) analysis. To detect smaller mutations, we developed a Sanger sequencing method for pseudogene exclusion. Three heterozygous deletion carriers exhibited hemizygous mutations predicted as negatively impacting the protein. In 30 NSHL individuals without deletion, we detected one with compound heterozygous and two with heterozygous pathogenic mutations. Of 36 total patients undergoing STRC sequencing, two showed the c.3893A>G variant in conjunction with a heterozygous deletion or mutation and three exhibited the variant in a heterozygous state. Although this variant affects a highly conserved amino acid and is predicted as deleterious, comparable minor allele frequencies (MAFs) (around 10%) in NSHL individuals and controls and homozygous variant carriers without NSHL argue against its pathogenicity. Collectively, six (6%) of 94 NSHL individuals were diagnosed with homozygous or compound heterozygous mutations causing DFNB16 and five (5%) as heterozygous mutation carriers. Besides GJB2/GJB6 (DFNB1), STRC is a major contributor to congenital hearing impairment.
Affiliation(s)
- B Vona
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- M A H Hofrichter
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- C Neuner
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- J Schröder
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- A Gehrig
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- J B Hennermann
- Department of Pediatric Endocrinology, Gastroenterology and Metabolic Diseases, Charité Universitätsmedizin, Berlin, Germany
- F Kraus
- Comprehensive Hearing Center, Department of Otorhinolaryngology, Plastic, Aesthetic and Reconstructive Head and Neck Surgery, University Hospital, Würzburg, Germany
- W Shehata-Dieler
- Comprehensive Hearing Center, Department of Otorhinolaryngology, Plastic, Aesthetic and Reconstructive Head and Neck Surgery, University Hospital, Würzburg, Germany
- E Klopocki
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- I Nanda
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- T Haaf
- Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
|
40
|
Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures. PLoS Comput Biol 2014; 10:e1003429. [PMID: 24453956 PMCID: PMC3894161 DOI: 10.1371/journal.pcbi.1003429] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/31/2013] [Accepted: 11/22/2013] [Indexed: 11/30/2022] Open
Abstract
A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins and the extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to be clustered together to form functional patches. We have developed a new model, GP4Rate, which incorporates the Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with a much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in the proteins with known structures.
To understand how a protein functions, a critical step is to know which regions in its protein tertiary structure may be functionally important. Functionally important protein regions are typically more conserved than other regions because mutations in these regions are more likely to be deleterious. A number of phylogenetic models have been developed to identify conserved sites or regions in proteins by comparing protein sequences from multiple species. However, most of these methods treat amino acid sites independently and do not consider the spatial clustering of conserved sites in the protein tertiary structure. Therefore, their power of identifying functional protein regions is limited. We develop a new statistical model, GP4Rate, which combines the information from the protein sequences and the protein tertiary structure to infer conserved regions. We demonstrate that GP4Rate outperforms Rate4Site, the most widely used phylogenetic software for inferring functional amino acid sites, via simulations with a case study of B7-1 genes. GP4Rate is a potentially useful tool for guiding mutagenesis experiments or providing insights on the relationship between protein structures and functions.
|
41
|
Stecher G, Liu L, Sanderford M, Peterson D, Tamura K, Kumar S. MEGA-MD: molecular evolutionary genetics analysis software with mutational diagnosis of amino acid variation. Bioinformatics 2014; 30:1305-7. [PMID: 24413669 DOI: 10.1093/bioinformatics/btu018] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022] Open
Abstract
Computational diagnosis of amino acid variants in the human exome is the first step in assessing the disruptive impacts of non-synonymous single nucleotide variants (nsSNVs) on human health and disease. The Molecular Evolutionary Genetics Analysis software with mutational diagnosis (MEGA-MD) is a suite of tools developed to forecast the deleteriousness of nsSNVs using multiple methods and to explore nsSNVs in the context of the variability permitted in the long-term evolution of the affected position. In its graphical interface for use on desktops, it enables interactive computational diagnosis and evolutionary exploration of nsSNVs. As a web service, MEGA-MD is suitable for diagnosing variants on an exome scale. The MEGA-MD suite intends to serve the needs for conducting low- and high-throughput analysis of nsSNVs in diverse applications.
Affiliation(s)
- Glen Stecher
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University (ASU), Tempe, AZ 85287, Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University (TMU), Hachioji, Tokyo, Japan, Department of Biological Sciences, TMU, Tokyo, Japan, School of Life Sciences, ASU, Tempe, AZ 85287, USA and Center for Excellence in Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
42
|
Preeprem T, Gibson G. An association-adjusted consensus deleterious scheme to classify homozygous Mis-sense mutations for personal genome interpretation. BioData Min 2013; 6:24. [PMID: 24365473 PMCID: PMC3892026 DOI: 10.1186/1756-0381-6-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/08/2013] [Accepted: 12/17/2013] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious. METHOD This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults. RESULTS Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals. CONCLUSIONS The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.
Affiliation(s)
- Greg Gibson
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
43
|
Gemovic B, Perovic V, Glisic S, Veljkovic N. Feature-based classification of amino acid substitutions outside conserved functional protein domains. ScientificWorldJournal 2013; 2013:948617. [PMID: 24348198 PMCID: PMC3855963 DOI: 10.1155/2013/948617] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/30/2013] [Accepted: 09/24/2013] [Indexed: 01/01/2023] Open
Abstract
There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools irreplaceably contribute to determination of their functional effects. We have developed a feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only the ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.
Affiliation(s)
- Branislava Gemovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Vladimir Perovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Sanja Glisic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Nevena Veljkovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
|
44
|
Christodoulou K, Wiskin AE, Gibson J, Tapper W, Willis C, Afzal NA, Upstill-Goddard R, Holloway JW, Simpson MA, Beattie RM, Collins A, Ennis S. Next generation exome sequencing of paediatric inflammatory bowel disease patients identifies rare and novel variants in candidate genes. Gut 2013; 62:977-84. [PMID: 22543157 PMCID: PMC3686259 DOI: 10.1136/gutjnl-2011-301833] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Indexed: 12/12/2022]
Abstract
BACKGROUND Multiple genes have been implicated by association studies in altering inflammatory bowel disease (IBD) predisposition. Paediatric patients often manifest more extensive disease and a particularly severe disease course. It is likely that genetic predisposition plays a more substantial role in this group. OBJECTIVE To identify the spectrum of rare and novel variation in known IBD susceptibility genes using exome sequencing analysis in eight individual cases of childhood onset severe disease. DESIGN DNA samples from the eight patients underwent targeted exome capture and sequencing. Data were processed through an analytical pipeline to align sequence reads, conduct quality checks, and identify and annotate variants where patient sequence differed from the reference sequence. For each patient, the entire complement of rare variation within strongly associated candidate genes was catalogued. RESULTS Across the panel of 169 known IBD susceptibility genes, approximately 300 variants in 104 genes were found. Excluding splicing and HLA-class variants, 58 variants across 39 of these genes were classified as rare, with an alternative allele frequency of <5%, of which 17 were novel. Only two patients with early onset Crohn's disease exhibited rare deleterious variations within NOD2: the previously described R702W variant was the sole NOD2 variant in one patient, while the second patient also carried the L1007 frameshift insertion. Both patients harboured other potentially damaging mutations in the GSDMB, ERAP2 and SEC16A genes. The two patients severely affected with ulcerative colitis exhibited a distinct profile: both carried potentially detrimental variation in the BACH2 and IL10 genes not seen in other patients. CONCLUSION For each of the eight individuals studied, all non-synonymous, truncating and frameshift mutations across all known IBD genes were identified. A unique profile of rare and potentially damaging variants was evident for each patient with this complex disease.
Affiliation(s)
- Katja Christodoulou
- Genetic Epidemiology and Genomic Informatics Group, Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Anthony E Wiskin
- NIHR Biomedical Research Unit (Nutrition, Diet & Lifestyle), University Hospital Southampton NHS Foundation Trust, Mailpoint 218, Southampton General Hospital, Tremona Road, Southampton, UK
- Jane Gibson
- Genetic Epidemiology and Genomic Informatics Group, Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, UK
- William Tapper
- Genetic Epidemiology and Genomic Informatics Group, Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Claire Willis
- NIHR Biomedical Research Unit (Nutrition, Diet & Lifestyle), University Hospital Southampton NHS Foundation Trust, Mailpoint 218, Southampton General Hospital, Tremona Road, Southampton, UK
- Nadeem A Afzal
- Paediatric Medical Unit, University Hospital Southampton NHS Foundation Trust, Southampton General Hospital, Tremona Road, Southampton, UK
- Rosanna Upstill-Goddard
- Genetic Epidemiology and Genomic Informatics Group, Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, UK
- John W Holloway
- Human Genetics & Genomic Medicine, Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, SO16 6YD, UK
- Michael A Simpson
- Division of Genetics and Molecular Medicine, King's College London School of Medicine, Guy's Hospital, London, UK
- R Mark Beattie
- Paediatric Medical Unit, University Hospital Southampton NHS Foundation Trust, Southampton General Hospital, Tremona Road, Southampton, UK
- Andrew Collins
- Genetic Epidemiology and Genomic Informatics Group, Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Sarah Ennis
- Genetic Epidemiology and Genomic Informatics Group, Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (Mailpoint 808), University Hospital Southampton NHS Foundation Trust, Southampton, UK
|
45
|
Kotaru AR, Shameer K, Sundaramurthy P, Joshi RC. An improved hypergeometric probability method for identification of functionally linked proteins using phylogenetic profiles. Bioinformation 2013; 9:368-74. [PMID: 23750082 PMCID: PMC3669790 DOI: 10.6026/97320630009368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/16/2013] [Accepted: 03/06/2013] [Indexed: 12/04/2022] Open
Abstract
Predicting functions of proteins and alternatively spliced isoforms encoded in a genome is one of the important applications of bioinformatics in the post-genome era. Due to the practical limitation of experimental characterization of all proteins encoded in a genome using biochemical studies, bioinformatics methods provide powerful tools for function annotation and prediction. These methods also help minimize the growing sequence-to-function gap. Phylogenetic profiling is a bioinformatics approach to identify the influence of a trait across species and can be employed to infer the evolutionary history of proteins encoded in genomes. Here we propose an improved phylogenetic profile-based method which considers the co-evolution of the reference genome to derive the basic similarity measure, the background phylogeny of target genomes for profile generation and assigning weights to target genomes. The ordering of genomes and the runs of consecutive matches between the proteins were used to define phylogenetic relationships in the approach. We used Escherichia coli K12 genome as the reference genome and its 4195 proteins were used in the current analysis. We compared our approach with two existing methods and our initial results show that the predictions have outperformed two of the existing approaches. In addition, we have validated our method using a targeted protein-protein interaction network derived from protein-protein interaction database STRING. Our preliminary results indicate that improvement in function prediction can be attained by using coevolution-based similarity measures and the runs on to the same scale instead of computing them in different scales. Our method can be applied at the whole-genome level for annotating hypothetical proteins from prokaryotic genomes.
Affiliation(s)
- Appala Raju Kotaru
- Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, 247667, Roorkee, India
|
46
|
Effect of genetic regions on the correlation between single point mutation variability and morbidity. Comput Biol Med 2013; 43:594-9. [DOI: 10.1016/j.compbiomed.2013.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2011] [Revised: 07/27/2012] [Accepted: 01/19/2013] [Indexed: 11/19/2022]
|
47
|
Kirwan JD, Bekaert M, Commins JM, Davies KTJ, Rossiter SJ, Teeling EC. A phylomedicine approach to understanding the evolution of auditory sensory perception and disease in mammals. Evol Appl 2013; 6:412-22. [PMID: 23745134 PMCID: PMC3673470 DOI: 10.1111/eva.12047] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/31/2012] [Accepted: 12/21/2012] [Indexed: 01/31/2023] Open
Abstract
Hereditary deafness affects 0.1% of individuals globally and is considered one of the most debilitating diseases of man. Despite recent advances, the molecular basis of normal auditory function is not fully understood and little is known about the contribution of single-nucleotide variations to the disease. Using cross-species comparisons of 11 ‘deafness’ genes (Myo15, Ush1g, Strc, Tecta, Tectb, Otog, Col11a2, Gjb2, Cldn14, Kcnq4, Pou3f4) across 69 evolutionarily and ecologically divergent mammals, we elucidated whether there was evidence for: (i) adaptive evolution acting on these genes across mammals with similar hearing capabilities; and, (ii) regions of long-term evolutionary conservation within which we predict disease-associated mutations should occur. We find evidence of adaptive evolution acting on the eutherian mammals in Myo15, Otog and Tecta. Examination of selection pressures in Tecta and Pou3f4 across a taxonomic sample that included a wide representation of auditory specialists, the bats, did not uncover any evidence for a role in echolocation. We generated ‘conservation indices’ based on selection estimates at nucleotide sites and found that known disease mutations fall within sites of high evolutionary conservation. We suggest that methods such as this, derived from estimates of evolutionary conservation using phylogenetically divergent taxa, will help to differentiate between deleterious and benign mutations.
Affiliation(s)
- John D Kirwan
- UCD School of Biology and Environmental Science & UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
|
48
|
Liu L, Kumar S. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants. Mol Biol Evol 2013; 30:1252-7. [PMID: 23462317 DOI: 10.1093/molbev/mst037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/25/2022] Open
Abstract
Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
Affiliation(s)
- Li Liu
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, USA
|
49
|
Kindt ASD, Navarro P, Semple CAM, Haley CS. The genomic signature of trait-associated variants. BMC Genomics 2013; 14:108. [PMID: 23418889 PMCID: PMC3600003 DOI: 10.1186/1471-2164-14-108] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 07/12/2012] [Accepted: 02/11/2013] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Genome-wide association studies have identified thousands of SNP variants associated with hundreds of phenotypes. For most associations the causal variants and the molecular mechanisms underlying pathogenesis remain unknown. Exploration of the underlying functional annotations of trait-associated loci has thrown some light on their potential roles in pathogenesis. However, there are some shortcomings of the methods used to date, which may undermine efforts to prioritize variants for further analyses. Here, we introduce and apply novel methods to rigorously identify annotation classes showing enrichment or depletion of trait-associated variants taking into account the underlying associations due to co-location of different functional annotations and linkage disequilibrium. RESULTS We assessed enrichment and depletion of variants in publicly available annotation classes such as genic regions, regulatory features, measures of conservation, and patterns of histone modifications. We used logistic regression to build a multivariate model that identified the most influential functional annotations for trait-association status of genome-wide significant variants. SNPs associated with all of the enriched annotations were 8 times more likely to be trait-associated variants than SNPs annotated with none of them. Annotations associated with chromatin state together with prior knowledge of the existence of a local expression QTL (eQTL) were the most important factors in the final logistic regression model. Surprisingly, despite the widespread use of evolutionary conservation to prioritize variants for study we find only modest enrichment of trait-associated SNPs in conserved regions. CONCLUSION We established odds ratios of functional annotations that are more likely to contain significantly trait-associated SNPs, for the purpose of prioritizing GWAS hits for further studies. Additionally, we estimated the relative and combined influence of the different genomic annotations, which may facilitate future prioritization methods by adding substantial information.
Affiliation(s)
- Alida S D Kindt
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
- Pau Navarro
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
- Colin A M Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
- Chris S Haley
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
|
50
|
Gerek ZN, Kumar S, Ozkan SB. Structural dynamics flexibility informs function and evolution at a proteome scale. Evol Appl 2013; 6:423-33. [PMID: 23745135 PMCID: PMC3673471 DOI: 10.1111/eva.12052] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 10/23/2012] [Accepted: 01/13/2013] [Indexed: 01/04/2023]
Abstract
Protein structures are dynamic entities with a myriad of atomic fluctuations, side-chain rotations, and collective domain movements. Although the importance of these dynamics to proper functioning of proteins is emerging in the studies of many protein families, there is a lack of broad evidence for the critical role of protein dynamics in shaping the biological functions of a substantial fraction of residues for a large number of proteins in the human proteome. Here, we propose a novel dynamic flexibility index (dfi) to quantify the dynamic properties of individual residues in any protein and use it to assess the importance of protein dynamics in 100 human proteins. Our analyses involving functionally critical positions, disease-associated and putatively neutral population variations, and the rate of interspecific substitutions per residue produce concordant patterns at a proteome scale. They establish that the preservation of dynamic properties of residues in a protein structure is critical for maintaining the protein/biological function. Therefore, structural dynamics needs to become a major component of the analysis of protein function and evolution. Such analyses will be facilitated by the dfi, which will also enable the integrative use of structural dynamics with evolutionary conservation in genomic medicine as well as functional genomics investigations.
Affiliation(s)
- Zeynep Nevin Gerek
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ, USA; Department of Physics, Center for Biological Physics, Bateman Physical Sciences F-Wing, Arizona State University, Tempe, AZ, USA
|