1
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS VIPdb version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs can predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal developments in the field and in individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
2
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results VIPdb version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs can predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal developments in the field and in individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
3
Sivadas A, Rathore S, Sahana S, Jolly B, Bhoyar RC, Jain A, Sharma D, Imran M, Senthilvel V, Divakar MK, Mishra A, Sivasubbu S, Scaria V. The genomic landscape of CYP2D6 variation in the Indian population. Pharmacogenomics 2024; 25:147-160. [PMID: 38426301 DOI: 10.2217/pgs-2023-0233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
Aim: The CYP2D6 gene is highly polymorphic, causing large interindividual variability in the metabolism of several clinically important drugs. Materials & methods: The authors investigated the diversity and distribution of CYP2D6 alleles in Indians using whole genome sequences (N = 1518). Functional consequences were assessed using pathogenicity scores and molecular dynamics simulations. Results: The analysis revealed population-specific CYP2D6 alleles (*86, *7, *111, *112, *113, *99) and remarkable differences in variant and phenotype frequencies compared with global populations. The authors observed that one in three Indians could benefit from a dose alteration for psychiatric drugs with accurate CYP2D6 phenotyping. Molecular dynamics simulations revealed large conformational fluctuations, confirming the predicted reduced function of *86 and *113 alleles. Conclusion: The findings emphasize the utility of comprehensive CYP2D6 profiling for aiding precision public health.
Affiliation(s)
- Ambily Sivadas
- Division of Nutrition, St. John's Research Institute, St. John's National Academy of Health Sciences, Bangalore, Karnataka, 560034, India
- Surabhi Rathore
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- S Sahana
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Bani Jolly
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Rahul C Bhoyar
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Abhinav Jain
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Disha Sharma
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Mohamed Imran
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Vigneshwar Senthilvel
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Mohit Kumar Divakar
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Anushree Mishra
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Sridhar Sivasubbu
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Vishwanath Cancer Care Foundation, B 702, 7th Floor, Neelkanth Business Park Kirol Village, Vidya Vihar, West Mumbai, 400086, India
- Vinod Scaria
- CSIR Institute of Genomics & Integrative Biology, Mathura Road, New Delhi, 110025, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Vishwanath Cancer Care Foundation, B 702, 7th Floor, Neelkanth Business Park Kirol Village, Vidya Vihar, West Mumbai, 400086, India
4
Vaidya K, Rodrigues G, Gupta S, Devarajan A, Yeolekar M, Madhusudhan MS, Kamat SS. Identification of sequence determinants for the ABHD14 enzymes. Proteins 2023. [PMID: 37974539 DOI: 10.1002/prot.26632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/14/2023] [Accepted: 10/24/2023] [Indexed: 11/19/2023]
Abstract
Over the course of evolution, enzymes have developed remarkable functional diversity in catalyzing important chemical reactions across various organisms, and understanding how new enzyme functions might have evolved remains an important question in modern enzymology. To systematically annotate functions based on protein sequences and available biochemical studies, enzymes with similar catalytic mechanisms have been grouped into enzyme superfamilies. Typically, enzymes within a superfamily have similar overall three-dimensional structures and conserved catalytic residues, but large variations in substrate recognition sites and residues that accommodate the diverse biochemical reactions catalyzed within the superfamily. The serine hydrolases are an excellent example of such an enzyme superfamily. Based on known enzymatic activities and protein sequences, they are split almost equally into the serine proteases and the metabolic serine hydrolases. Within the metabolic serine hydrolases, there are two outlying members, ABHD14A and ABHD14B, that have high sequence similarity but whose biological functions remained cryptic until recently. While ABHD14A still lacks any functional annotation, we recently showed that ABHD14B functions as a lysine deacetylase in mammals. Because of their high sequence similarity, automated databases often wrongly assign ABHD14A and ABHD14B as the same enzyme, which has made annotating their functions in various organisms problematic. In this article, we present a bioinformatics study coupled with biochemical experiments that identifies key sequence determinants for both ABHD14A and ABHD14B, enabling better classification of them. In addition, we map these enzymes on an evolutionary timescale and provide a much-needed resource for studying these interesting enzymes in different organisms.
Affiliation(s)
- Kaveri Vaidya
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Golding Rodrigues
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Sonali Gupta
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Archit Devarajan
- Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, Madhya Pradesh, India
- Mihika Yeolekar
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- M S Madhusudhan
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Siddhesh S Kamat
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
5
Arani AA, Sehhati M, Tabatabaiefar MA. Genetic variant effect prediction by supervised nonnegative matrix tri-factorization. Mol Omics 2021; 17:740-751. [PMID: 34164638 DOI: 10.1039/d1mo00038a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Discriminating between deleterious and neutral mutations among the numerous non-synonymous single nucleotide variants (nsSNVs) that may be observed through whole exome sequencing (WES) is considered a great challenge. Many machine learning methods have been developed to predict variant consequences based on the analysis of protein amino acid sequences, protein structures, or their integration with features extracted from various gene-level data and phenotype information. Because of the large number of features and the heterogeneity of their sources, a suitable integration method plays an important role in predictive models. In this study, we proposed a novel supervised nonnegative matrix tri-factorization (sNMTF) algorithm to integrate current variant prediction scores with gene-level data and disease networks. A new feature space was constructed by integrating all input data using sNMTF, providing appropriate inputs for training a classifier. To assess the proposed model, we used two benchmark datasets. The first contained 11 207 deleterious and 19 839 neutral nsSNPs, whereas the second contained 4416 deleterious and 4960 neutral nsSNPs. In general, on both datasets our supervised NMTF method performed better than existing nsSNV effect prediction approaches, whether ensemble-based or not, with prediction accuracy on average 15% higher than other ensemble scores. In addition, excluding any type of data integrated into the final model led to a substantial decrease in deleterious variant prediction performance. The proposed model can be used as an extensible framework for integrating more heterogeneous sources.
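To illustrate the kind of factorization the abstract names, here is a minimal sketch of plain, unsupervised nonnegative matrix tri-factorization, X ≈ F S Gᵀ, fitted with Lee-Seung-style multiplicative updates for the squared Frobenius loss. It is not the authors' sNMTF: the supervised terms, the integration of gene-level data and disease networks, and any orthogonality constraints are all omitted, and the function names and toy matrix are hypothetical.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def rescale(m: Matrix, num: Matrix, den: Matrix, eps: float = 1e-12) -> Matrix:
    """Elementwise multiplicative update m * num / den (stays nonnegative)."""
    return [[v * p / (q + eps) for v, p, q in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def sq_error(x: Matrix, f: Matrix, s: Matrix, g: Matrix) -> float:
    """Squared Frobenius norm ||X - F S G^T||^2."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((a - b) ** 2 for xr, ar in zip(x, approx) for a, b in zip(xr, ar))


def nmtf(x: Matrix, k1: int, k2: int, iters: int = 200, seed: int = 0):
    """Plain (unsupervised) NMTF: X (m x n) ~ F (m x k1) S (k1 x k2) G^T (k2 x n).

    Returns F, S, G and the reconstruction error before and after each iteration.
    """
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> Matrix:
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]

    f, s, g = rand(len(x), k1), rand(k1, k2), rand(len(x[0]), k2)
    xt = transpose(x)
    history = [sq_error(x, f, s, g)]
    for _ in range(iters):
        # F <- F * (X M^T) / (F M M^T), where M = S G^T
        mt = transpose(matmul(s, transpose(g)))
        f = rescale(f, matmul(x, mt), matmul(f, matmul(transpose(mt), mt)))
        # G <- G * (X^T F S) / (G (F S)^T (F S))
        fs = matmul(f, s)
        g = rescale(g, matmul(xt, fs), matmul(g, matmul(transpose(fs), fs)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        ft = transpose(f)
        s = rescale(s, matmul(matmul(ft, x), g),
                    matmul(matmul(matmul(ft, f), s), matmul(transpose(g), g)))
        history.append(sq_error(x, f, s, g))
    return f, s, g, history


if __name__ == "__main__":
    # Toy nonnegative matrix (hypothetical numbers, e.g. genes x features).
    x = [[1.0, 0.5, 0.0, 0.2],
         [0.9, 0.4, 0.1, 0.3],
         [0.0, 0.1, 1.0, 0.8],
         [0.1, 0.0, 0.9, 0.7]]
    f, s, g, history = nmtf(x, k1=2, k2=2)
    print(f"squared error: {history[0]:.3f} -> {history[-1]:.4f}")
```

The three factors play different roles: F and G give low-dimensional, nonnegative embeddings of the rows and columns, and S couples them. Pure Python lists are used only to keep the sketch dependency-free; a real pipeline would use an array library.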
Affiliation(s)
- Asieh Amousoltani Arani
- Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammadreza Sehhati
- Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Amin Tabatabaiefar
- Department of Genetics and Molecular Biology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran; GTaC Corp., Deputy of Research and Technology, Isfahan University of Medical Sciences, Isfahan, Iran
6
Zaucha J, Heinzinger M, Kulandaisamy A, Kataka E, Salvádor ÓL, Popov P, Rost B, Gromiha MM, Zhorov BS, Frishman D. Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Brief Bioinform 2020; 22:5872174. [PMID: 32672331 DOI: 10.1093/bib/bbaa132] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/14/2020] [Revised: 05/26/2020] [Accepted: 05/28/2020] [Indexed: 12/18/2022]
Abstract
Membrane proteins are unique in that they interact with lipid bilayers, which makes them indispensable for transporting molecules and relaying signals between and across cells. Given the significance of their functions, mutations often have profound effects on the fitness of the host. This is apparent both from experimental studies, which have implicated numerous missense variants in diseases, and from evolutionary signals that help elucidate the physicochemical constraints imposed by intermembrane and aqueous environments. In this review, we report on the current state of knowledge on missense variants (referred to as single amino acid variants) affecting membrane proteins, as well as the insights that can be extrapolated from the data already available. This includes an overview of the annotations for membrane protein variants collated in databases dedicated to the topic, bioinformatics approaches that leverage evolutionary information to shed light on previously uncharacterized membrane protein structures or interaction interfaces, tools for predicting the effects of mutations tailored specifically to the characteristics of membrane proteins, and two clinically relevant case studies explaining the implications of mutated membrane proteins in cancer and cardiomyopathy.
Affiliation(s)
- Jan Zaucha
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
- A Kulandaisamy
- Department of Biotechnology of the IIT Bhupat and Jyoti Mehta School of BioSciences in Madras, India
- Evans Kataka
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
- Óscar Llorian Salvádor
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
- Petr Popov
- Center for Computational and Data-Intensive Science and Engineering of the Skolkovo Institute of Science and Technology in Moscow, Russia
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology at the TUM Faculty of Informatics in Garching, Germany
- Boris S Zhorov
- Department of Biochemistry and Biomedical Sciences, McMaster University in Hamilton, Canada
- Dmitrij Frishman
- Department of Bioinformatics at the TUM School of Life Sciences Weihenstephan in Freising, Germany
7
Harris KL, Myers MB, McKim KL, Elespuru RK, Parsons BL. Rationale and Roadmap for Developing Panels of Hotspot Cancer Driver Gene Mutations as Biomarkers of Cancer Risk. Environ Mol Mutagen 2020; 61:152-175. [PMID: 31469467 PMCID: PMC6973253 DOI: 10.1002/em.22326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2019] [Revised: 08/23/2019] [Accepted: 08/26/2019] [Indexed: 05/24/2023]
Abstract
Cancer driver mutations (CDMs) are necessary and causal for carcinogenesis and have advantages as reporters of carcinogenic risk. However, little progress has been made toward developing measurements of CDMs as biomarkers for use in cancer risk assessment. Impediments to using a CDM-based metric to inform cancer risk include the complexity and stochastic nature of carcinogenesis, technical difficulty in quantifying low-frequency CDMs, and lack of established relationships between cancer driver mutant fractions and tumor incidence. Through literature review and database analyses, this review identifies the most promising targets to investigate as biomarkers of cancer risk. Mutational hotspots were discerned within the 20 most mutated genes across the 10 deadliest cancers. Forty genes were identified that encompass 108 mutational hotspot codons overrepresented in the COSMIC database; 424 different mutations within these hotspot codons account for approximately 63,000 tumors, and their prevalence across tumor types is described. The review summarizes the literature on the prevalence of CDMs in normal tissues and suggests that such mutations are direct and indirect substrates for chemical carcinogenesis, which occurs in a spatially stochastic manner. Evidence that hotspot CDMs (hCDMs) frequently occur as tumor subpopulations is presented, indicating that COSMIC data may underestimate mutation prevalence. Analyses of online databases show that genes containing hCDMs are enriched in functions related to intercellular communication. In its totality, the review provides a roadmap for the development of tissue-specific, CDM-based biomarkers of carcinogenic potential, comprising batteries of hCDMs that can be measured by error-corrected next-generation sequencing. Environ. Mol. Mutagen. 61:152-175, 2020. Published 2019. This article is a U.S. Government work and is in the public domain in the USA. Environmental and Molecular Mutagenesis published by Wiley Periodicals, Inc. on behalf of Environmental Mutagen Society.
Affiliation(s)
- Kelly L. Harris
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas
- Meagan B. Myers
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas
- Karen L. McKim
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas
- Rosalie K. Elespuru
- Division of Biology, Chemistry and Materials Science, CDRH/OSEL, US Food and Drug Administration, Silver Spring, Maryland
- Barbara L. Parsons
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas
8
Galano-Frutos JJ, García-Cebollada H, Sancho J. Molecular dynamics simulations for genetic interpretation in protein coding regions: where we are, where to go and when. Brief Bioinform 2019; 22:3-19. [PMID: 31813950 DOI: 10.1093/bib/bbz146] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/23/2019] [Revised: 09/22/2019] [Accepted: 10/25/2019] [Indexed: 12/18/2022]
Abstract
The increasing ease with which massive genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physicochemical properties to predict whether replacing one amino acid residue with another will be tolerated or cause disease. These approaches achieve up to 80-85% accuracy as binary classifiers (neutral/pathogenic). Because such accuracy is insufficient to base medical decisions on, and it does not appear to be increasing, more precise methods, such as full-atom molecular dynamics (MD) simulations in explicit solvent, are also discussed. Then, to describe the goal of interpreting human genetic variation at large scale through MD simulations, we use the term human variome restrictively to refer to all possible protein variants carrying single-amino-acid substitutions arising from single-nucleotide variations. We calculate its size and develop a simple model for the simulation time needed to have a 0.99 probability of observing unfolding events in any unstable variant. Knowing that time makes it possible to perform a binary classification of the variants (stable and potentially neutral vs. unstable and pathogenic). Our model indicates that the human variome cannot be simulated with present computing capabilities. However, if these capabilities continue to increase according to Moore's law, it could be simulated (at 65°C) in only 3 years if we started in 2031. Simulating individual protein variomes is achievable in short times starting now. International coordination seems appropriate to embark upon massive MD simulations of protein variants.
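The abstract does not spell out its model, so the sketch below only illustrates the two calculations it alludes to, under assumptions of our own. First, if unfolding of an unstable variant is assumed to be a first-order (Poisson) process with mean unfolding time τ, the probability of seeing at least one event in simulated time t is 1 - exp(-t/τ), so a 0.99 probability needs t = τ ln 100 ≈ 4.6τ. Second, if computing speed is assumed to double every d years, a job that would take W years today takes W / 2^(s/d) years when started s years later. The function names, the 2-year doubling period, and the workload are hypothetical, not the paper's figures.

```python
import math


def required_sim_time(tau: float, p: float = 0.99) -> float:
    """Simulated time t such that P(at least one unfolding event) = p.

    Hypothetical first-order kinetics with mean unfolding time tau:
    P(t) = 1 - exp(-t / tau)  =>  t = -tau * ln(1 - p).
    """
    return -tau * math.log(1.0 - p)


def finish_year(start: int, now: int, work_years_now: float,
                doubling_years: float = 2.0) -> float:
    """Year a fixed workload finishes if launched in `start`, assuming
    computing speed doubles every `doubling_years` (a Moore's-law reading)."""
    speedup = 2.0 ** ((start - now) / doubling_years)
    return start + work_years_now / speedup


if __name__ == "__main__":
    print(required_sim_time(1.0))  # ln(100), about 4.605 time units
    # Hypothetical workload: 1000 years of compute at today's speed.
    best = min(range(2020, 2060), key=lambda y: finish_year(y, 2020, 1000.0))
    print(best, round(finish_year(best, 2020, 1000.0) - best, 2), "years of running")
```

In this toy model, minimizing s + W·2^(-s/d) gives W·2^(-s/d) = d / ln 2 at the optimum, so the best start date always leaves a run of about d / ln 2 years (roughly 2.9 years for d = 2), whatever the workload. Whether the authors' 2031 estimate rests on the same reasoning is not stated in the abstract.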
Affiliation(s)
- Juan J Galano-Frutos
- Protein Folding and Molecular Design (ProtMol) group at BIFI, University of Zaragoza
- Javier Sancho
- Protein Folding and Molecular Design (ProtMol) group at BIFI, University of Zaragoza
9
Hu Z, Yu C, Furutsuki M, Andreoletti G, Ly M, Hoskins R, Adhikari AN, Brenner SE. VIPdb, a genetic Variant Impact Predictor Database. Hum Mutat 2019; 40:1202-1214. [PMID: 31283070 PMCID: PMC7288905 DOI: 10.1002/humu.23858] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/17/2019] [Accepted: 06/27/2019] [Indexed: 12/30/2022]
Abstract
Genome sequencing identifies a vast number of genetic variants. Predicting these variants' molecular and clinical effects is one of the preeminent challenges in human genetics. Accurate prediction of the impact of genetic variants improves our understanding of how genetic information is conveyed to molecular and cellular functions, and is an essential step toward precision medicine. Over one hundred tools and resources have been developed specifically for this purpose. We summarize these tools and their characteristics in the genetic Variant Impact Predictor Database (VIPdb). This database will help researchers and clinicians explore appropriate tools and inform the development of improved methods. VIPdb can be browsed and downloaded at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Bioengineering, University of California, Berkeley, California 94720, USA
- Mabel Furutsuki
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Melissa Ly
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Division of Data Sciences, University of California, Berkeley, California 94720, USA
- Roger Hoskins
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Aashish N. Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
10
Glusman G, Rose PW, Prlić A, Dougherty J, Duarte JM, Hoffman AS, Barton GJ, Bendixen E, Bergquist T, Bock C, Brunk E, Buljan M, Burley SK, Cai B, Carter H, Gao J, Godzik A, Heuer M, Hicks M, Hrabe T, Karchin R, Leman JK, Lane L, Masica DL, Mooney SD, Moult J, Omenn GS, Pearl F, Pejaver V, Reynolds SM, Rokem A, Schwede T, Song S, Tilgner H, Valasatava Y, Zhang Y, Deutsch EW. Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework. Genome Med 2017; 9:113. [PMID: 29254494 PMCID: PMC5735928 DOI: 10.1186/s13073-017-0509-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 11/10/2022]
Abstract
The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods.
Affiliation(s)
- Peter W Rose
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, 92093, USA
- Andreas Prlić
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, 92093, USA
  - RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 92093, USA
- José M Duarte
  - RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 92093, USA
- Andrew S Hoffman
  - Human Centered Design & Engineering, University of Washington, Seattle, WA, 98195, USA
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Emøke Bendixen
  - Department of Molecular Biology and Genetics, Aarhus University, 8000, Aarhus, Denmark
- Timothy Bergquist
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Christian Bock
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Elizabeth Brunk
  - University of California San Diego, La Jolla, CA, 92093, USA
- Marija Buljan
  - Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Zurich, Switzerland
- Stephen K Burley
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, 92093, USA
  - RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 92093, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Binghuang Cai
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Hannah Carter
  - University of California San Diego, La Jolla, CA, 92093, USA
- JianJiong Gao
  - Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Adam Godzik
  - SBP Medical Discovery Institute, La Jolla, CA, 92037, USA
- Michael Heuer
  - AMPLab, University of California, Berkeley, CA, 94720, USA
- Thomas Hrabe
  - SBP Medical Discovery Institute, La Jolla, CA, 92037, USA
- Rachel Karchin
  - Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA
  - Department of Oncology, Johns Hopkins Medicine, Baltimore, MD, 21287, USA
- Julia Koehler Leman
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA
  - Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Lydie Lane
  - SIB Swiss Institute of Bioinformatics and University of Geneva, CH-1211, Geneva, Switzerland
- David L Masica
  - Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, 20850, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 20742, USA
- Gilbert S Omenn
  - Institute for Systems Biology, Seattle, WA, 98109, USA
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
- Frances Pearl
  - School of Life Sciences, University of Sussex, Brighton, BN1 9QG, UK
- Vikas Pejaver
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
  - The University of Washington eScience Institute, Seattle, WA, 98195, USA
- Ariel Rokem
  - The University of Washington eScience Institute, Seattle, WA, 98195, USA
- Torsten Schwede
  - SIB Swiss Institute of Bioinformatics and Biozentrum, University of Basel, CH-4056, Basel, Switzerland
- Sicheng Song
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Hagen Tilgner
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York City, NY, 10021, USA
- Yana Valasatava
  - RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 92093, USA
- Yang Zhang
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
11
Zhang Q. Associating rare genetic variants with human diseases. Front Genet 2015; 6:133. [PMID: 25904936 PMCID: PMC4389536 DOI: 10.3389/fgene.2015.00133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/15/2015] [Accepted: 03/19/2015] [Indexed: 11/20/2022]
Affiliation(s)
- Qunyuan Zhang
  - Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA
12
Pepin MG, Murray ML, Bailey S, Leistritz-Kessler D, Schwarze U, Byers PH. The challenge of comprehensive and consistent sequence variant interpretation between clinical laboratories. Genet Med 2015; 18:20-4. [PMID: 25834947 DOI: 10.1038/gim.2015.31] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 10/02/2014] [Accepted: 02/10/2015] [Indexed: 11/09/2022]
Abstract
PURPOSE Genetic testing has shifted from academic laboratories with expertise in specific genes to commercial laboratories that offer tests of a diverse array of genes. The purpose of this comparative study was to determine whether one academic laboratory's model of variant interpretation is similar to that of several commercial laboratories. METHODS The Collagen Diagnostic Laboratory (CDL) received, over a 14-month period, 38 requests to interpret variants originally identified by an outside laboratory (OL). The interpretations by the OL and CDL were compared and discrepancies were assessed. RESULTS Interpretations from the OL and CDL were concordant in 11 inquiries (29%); discrepancies were moderate in 11 instances (29%) and significant in 16 (42%). Factors that caused discrepancies included the following: (i) private data were not shared in a public database (n = 9); (ii) publicly available allele frequency data were not referenced and used as evidence (n = 5); and (iii) important aspects of protein structure and function were not taken into account (n = 13). CONCLUSION Comprehensive interpretation of sequence variants depends on good functional tests and well-curated variant databases. Provision of clinical information to the clinical laboratory, mandatory submission of identified variants with phenotype data to common resources, and collaboration between clinical laboratories and recognized experts are likely to improve consistency in variant interpretation among clinical laboratories.
Affiliation(s)
- Melanie G Pepin
  - Department of Pathology, University of Washington, Seattle, Washington, USA
- Mitzi L Murray
  - Department of Pathology, University of Washington, Seattle, Washington, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, USA
- Samuel Bailey
  - Department of Pathology, University of Washington, Seattle, Washington, USA
- Ulrike Schwarze
  - Department of Pathology, University of Washington, Seattle, Washington, USA
- Peter H Byers
  - Department of Pathology, University of Washington, Seattle, Washington, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, USA
13
Insights into the genetic foundations of human communication. Neuropsychol Rev 2015; 25:3-26. [PMID: 25597031 DOI: 10.1007/s11065-014-9277-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/05/2014] [Accepted: 12/22/2014] [Indexed: 12/19/2022]
Abstract
The human capacity to acquire sophisticated language is unmatched in the animal kingdom. Despite the discontinuity in communicative abilities between humans and other primates, language is built on ancient genetic foundations, which are being illuminated by comparative genomics. The genetic architecture of the language faculty is also being uncovered by research into neurodevelopmental disorders that disrupt the normally effortless process of language acquisition. In this article, we discuss the strategies that researchers are using to reveal genetic factors contributing to communicative abilities, and review progress in identifying the relevant genes and genetic variants. The first gene directly implicated in a speech and language disorder was FOXP2. Using this gene as a case study, we illustrate how evidence from genetics, molecular cell biology, animal models and human neuroimaging has converged to build a picture of the role of FOXP2 in neurodevelopment, providing a framework for future endeavors to bridge the gaps between genes, brains and behavior.
14
Katsonis P, Koire A, Wilson SJ, Hsu TK, Lua RC, Wilkins AD, Lichtarge O. Single nucleotide variations: biological impact and theoretical interpretation. Protein Sci 2014; 23:1650-66. [PMID: 25234433 PMCID: PMC4253807 DOI: 10.1002/pro.2552] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 08/06/2014] [Revised: 09/12/2014] [Accepted: 09/15/2014] [Indexed: 12/27/2022]
Abstract
Genome-wide association studies (GWAS) and whole-exome sequencing (WES) generate massive amounts of genomic variant information, and a major challenge is to identify which variations drive disease or contribute to phenotypic traits. Because the majority of known disease-causing mutations are exonic non-synonymous single nucleotide variations (nsSNVs), most studies focus on whether these nsSNVs affect protein function. Computational studies show that the impact of nsSNVs on protein function reflects sequence homology and structural information, and they predict this impact through statistical methods, machine learning techniques, or models of protein evolution. Here, we review impact prediction methods and discuss their underlying principles, their advantages and limitations, and how they compare to and complement one another. Finally, we present current applications and future directions for these methods in biological research and medical genetics.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Amanda Koire
  - Department of Structural and Computational Biology and Molecular Biophysics, Houston, Texas
- Stephen Joseph Wilson
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
- Teng-Kuei Hsu
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
- Rhonald C Lua
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Angela Dawn Wilkins
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Department of Structural and Computational Biology and Molecular Biophysics, Houston, Texas
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
  - Department of Pharmacology, Baylor College of Medicine, Houston, Texas