101
Wong KC, Zhang Z. SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences. Bioinformatics 2014; 30:1112-1119. [PMID: 24389653 DOI: 10.1093/bioinformatics/btt769] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 08/06/2013] [Accepted: 12/13/2013] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Recent advances in genome sequencing have revealed an abundance of non-synonymous polymorphisms among human individuals; consequently, it is of great interest and importance to predict whether such substitutions are functionally neutral or deleterious. The accuracy of such prediction algorithms depends on the quality of the multiple-sequence alignment, which is used to infer how well an amino acid substitution is tolerated at a given position. Because orthologous protein sequences were scarce in the past, existing prediction algorithms all include sequences of protein paralogs in the alignment, which can dilute the conservation signal and affect prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and improve the prediction performance.
RESULTS: We have developed a novel prediction algorithm, named SNPdryad, which includes only protein orthologs in building a multiple sequence alignment. Among many other innovations, SNPdryad uses different conservation scoring schemes and uses Random Forest as a classifier. We have tested SNPdryad on several datasets. We found that SNPdryad consistently outperformed other methods in several performance metrics, which is attributed to the exclusion of paralogous sequences. We have run SNPdryad on the complete human proteome, generating prediction scores for all possible amino acid substitutions.
AVAILABILITY AND IMPLEMENTATION: The algorithm and the prediction results can be accessed from the Web site: http://snps.ccbr.utoronto.ca:8080/SNPdryad/
CONTACT: Zhaolei.Zhang@utoronto.ca
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 3G4, The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1 and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Zhaolei Zhang
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 3G4, The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1 and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
102
Abstract
Moving from a traditional medical model of treating pathologies to an individualized predictive and preventive model of personalized medicine promises to reduce healthcare costs in an overburdened and overwhelmed system. Next-generation sequencing (NGS) has the potential to accelerate the early detection of disorders and the identification of pharmacogenetic markers to customize treatments. This review explains the historical developments that led to NGS, along with the strengths and weaknesses of NGS, with a special emphasis on the analytical aspects of processing NGS data. Solutions exist for all the steps necessary to perform NGS in a clinical context, and most of them are very efficient, but some crucial steps in the process need immediate attention.
Affiliation(s)
- Manuel L. Gonzalez-Garay
- Center for Molecular Imaging, Division of Genomics & Bioinformatics, The Brown Foundation Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
103
Gemovic B, Perovic V, Glisic S, Veljkovic N. Feature-based classification of amino acid substitutions outside conserved functional protein domains. ScientificWorldJournal 2013; 2013:948617. [PMID: 24348198 PMCID: PMC3855963 DOI: 10.1155/2013/948617] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/30/2013] [Accepted: 09/24/2013] [Indexed: 01/01/2023]
Abstract
There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools are indispensable for determining their functional effects. We have developed a feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from the epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies than expected in predicting the effects of amino acid substitutions outside CFDs, with especially low sensitivity. In contrast, only the ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.
Affiliation(s)
- Branislava Gemovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Vladimir Perovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Sanja Glisic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Nevena Veljkovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
104
Cheng WC, Chung IF, Chen CY, Sun HJ, Fen JJ, Tang WC, Chang TY, Wong TT, Wang HW. DriverDB: an exome sequencing database for cancer driver gene identification. Nucleic Acids Res 2013; 42:D1048-54. [PMID: 24214964 PMCID: PMC3965046 DOI: 10.1093/nar/gkt1025] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/21/2022]
Abstract
Exome sequencing (exome-seq) has aided in the discovery of a huge number of mutations in cancers, yet challenges remain in converting oncogenomics data into information that is interpretable and accessible for clinical care. We constructed DriverDB (http://ngs.ym.edu.tw/driverdb/), a database which incorporates 6079 cases of exome-seq data, annotation databases (such as dbSNP, 1000 Genomes and COSMIC) and published bioinformatics algorithms dedicated to driver gene/mutation identification. We provide two points of view, ‘Cancer’ and ‘Gene’, to help researchers visualize the relationships between cancers and driver genes/mutations. The ‘Cancer’ section summarizes the driver genes calculated by eight computational methods for a specific cancer type/dataset and provides three levels of biological interpretation to clarify the relationships between driver genes. The ‘Gene’ section is designed to visualize the mutation information of a driver gene in five different aspects. Moreover, a ‘Meta-Analysis’ function is provided so researchers may identify driver genes in customer-defined samples. The novel driver genes/mutations identified hold potential for both basic research and biotech applications.
Affiliation(s)
- Wei-Chung Cheng
- Pediatric Neurosurgery, Department of Surgery, Cheng Hsin General Hospital, Taipei 11220, Taiwan, VGH-YM Genomic Research Center, National Yang-Ming University, Taipei 11221, Taiwan, Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan, Information Technology Office, Taipei Veterans General Hospital, Taipei 11217, Taiwan, Institute of Microbiology and Immunology, National Yang-Ming University, Taipei 11221, Taiwan and Department of Education and Research, Taipei City Hospital, Taipei 10341, Taiwan
105
Robinson PN, Köhler S, Oellrich A, Wang K, Mungall CJ, Lewis SE, Washington N, Bauer S, Seelow D, Krawitz P, Gilissen C, Haendel M, Smedley D. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res 2013; 24:340-8. [PMID: 24162188 PMCID: PMC3912424 DOI: 10.1101/gr.160325.113] [Citation(s) in RCA: 245] [Impact Index Per Article: 22.3] [Indexed: 12/29/2022]
Abstract
Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.
Affiliation(s)
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
106
Riera C, Lois S, de la Cruz X. Prediction of pathological mutations in proteins: the challenge of integrating sequence conservation and structure stability principles. Wiley Interdisciplinary Reviews: Computational Molecular Science 2013. [DOI: 10.1002/wcms.1170] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Affiliation(s)
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Xavier de la Cruz
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Institució Catalana per la Recerca i Estudis Avançats (ICREA); Barcelona Spain
107
Yates CM, Sternberg MJE. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol 2013; 425:3949-63. [PMID: 23867278 DOI: 10.1016/j.jmb.2013.07.012] [Citation(s) in RCA: 152] [Impact Index Per Article: 13.8] [Received: 04/28/2013] [Revised: 07/02/2013] [Accepted: 07/09/2013] [Indexed: 12/23/2022]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Sir Ernst Chain Building, Imperial College London, South Kensington, SW7 2AZ, UK.
108
Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat 2013; 34:E2393-402. [PMID: 23843252 DOI: 10.1002/humu.22376] [Citation(s) in RCA: 488] [Impact Index Per Article: 44.4] [Received: 04/19/2013] [Accepted: 06/24/2013] [Indexed: 12/18/2022]
Abstract
dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. This database significantly facilitates the process of querying predictions and annotations from different databases/web-servers for large amounts of nsSNVs discovered in exome-sequencing studies. Here we report a recent major update of the database to version 2.0. We have rebuilt the SNV collection based on GENCODE 9 and currently the database includes 87,347,043 nsSNVs and 2,270,742 essential splice site SNVs (an 18% increase compared to dbNSFP v1.0). For each nsSNV dbNSFP v2.0 has added two prediction scores (MutationAssessor and FATHMM) and two conservation scores (GERP++ and SiPhy). The original five prediction and conservation scores in v1.0 (SIFT, Polyphen2, LRT, MutationTaster and PhyloP) have been updated. Rich functional annotations for SNVs and genes have also been added into the new version, including allele frequencies observed in the 1000 Genomes Project phase 1 data and the NHLBI Exome Sequencing Project, various gene IDs from different databases, functional descriptions of genes, gene expression and gene interaction information, among others. dbNSFP v2.0 is freely available for download at http://sites.google.com/site/jpopgen/dbNSFP.
Affiliation(s)
- Xiaoming Liu
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA.
109
Merello E, Kibar Z, Allache R, Piatelli G, Cama A, Capra V, De Marco P. Rare missense variants in DVL1, one of the human counterparts of the Drosophila dishevelled gene, do not confer increased risk for neural tube defects. Birth Defects Res A Clin Mol Teratol 2013; 97:452-5. [DOI: 10.1002/bdra.23157] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/27/2013] [Revised: 05/15/2013] [Accepted: 05/17/2013] [Indexed: 12/20/2022]
Affiliation(s)
- Elisa Merello
- UOC Neurochirurgia, Istituto Giannina Gaslini; Genova Italia
- Zoha Kibar
- Department of Obstetrics and Gynecology; CHU Sainte Justine Research Center and University of Montreal; Montreal Quebec Canada
- Redouane Allache
- Department of Obstetrics and Gynecology; CHU Sainte Justine Research Center and University of Montreal; Montreal Quebec Canada
- Armando Cama
- UOC Neurochirurgia, Istituto Giannina Gaslini; Genova Italia
- Valeria Capra
- UOC Neurochirurgia, Istituto Giannina Gaslini; Genova Italia
110
Quiles F, Fernández-Rodríguez J, Mosca R, Feliubadaló L, Tornero E, Brunet J, Blanco I, Capellá G, Pujana MÀ, Aloy P, Monteiro A, Lázaro C. Functional and structural analysis of C-terminal BRCA1 missense variants. PLoS One 2013; 8:e61302. [PMID: 23613828 PMCID: PMC3629201 DOI: 10.1371/journal.pone.0061302] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 01/29/2013] [Accepted: 03/07/2013] [Indexed: 11/18/2022]
Abstract
Germline inactivating mutations in the BRCA1 and BRCA2 genes are responsible for Hereditary Breast and Ovarian Cancer Syndrome (HBOCS). Genetic testing of these genes is available, although approximately 15% of tests identify variants of uncertain significance (VUS). Classifying these variants as pathogenic or non-pathogenic is an important challenge in genetic diagnosis and counseling. The aim of the present study is to functionally assess a set of 7 missense VUS (Q1409L, S1473P, E1586G, R1589H, Y1703S, W1718L and G1770V) located in the C-terminal region of BRCA1 by combining in silico prediction tools and structural analysis with a transcription activation (TA) assay. The in silico prediction programs gave discrepant results, making their interpretation difficult. Structural analysis of the three variants located in the BRCT domains (Y1703S, W1718L and G1770V) reveals significant alterations of BRCT structure. The TA assay shows that variants Y1703S, W1718L and G1770V dramatically compromise the transcriptional activity of BRCA1, while variants Q1409L, S1473P, E1586G and R1589H behave like wild-type BRCA1. In conclusion, our results suggest that variants Y1703S, W1718L and G1770V can be classified as likely pathogenic BRCA1 mutations.
Affiliation(s)
- Francisco Quiles
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Juana Fernández-Rodríguez
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Roberto Mosca
- Institute for Research in Biomedicine (IRB) Barcelona, Joint IRB-BSC Program in Computational Biology, Barcelona, Spain
- Lídia Feliubadaló
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Eva Tornero
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Joan Brunet
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Ignacio Blanco
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Gabriel Capellá
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain
- Miquel Àngel Pujana
- Breast Cancer Unit, Translational Research Laboratory, Catalan Institute of Oncology (ICO), Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
- Patrick Aloy
- Institute for Research in Biomedicine (IRB) Barcelona, Joint IRB-BSC Program in Computational Biology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Alvaro Monteiro
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida, United States of America
- Conxi Lázaro
- Hereditary Cancer Program, Catalan Institute of Oncology-(Bellvitge Institute for Biomedical Research; Girona Institute for Biomedical Research; Germans Trias i Pujol Research Institute) (ICO-IDIBELL, ICO-IdIBGi, ICO-IGTP), L'Hospitalet de Llobregat, Barcelona, Spain