1. Diabate M, Islam MM, Nagy G, Banerjee T, Dhar S, Smith N, Adamovich AI, Starita LM, Parvin JD. DNA repair function scores for 2172 variants in the BRCA1 amino-terminus. PLoS Genet 2023; 19:e1010739. [PMID: 37578980 PMCID: PMC10449183 DOI: 10.1371/journal.pgen.1010739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 08/24/2023] [Accepted: 07/16/2023] [Indexed: 08/16/2023]
Abstract
Single nucleotide variants are the most frequent type of sequence changes detected in the genome and these are frequently variants of uncertain significance (VUS). VUS are changes in DNA for which disease risk association is unknown. Thus, methods that classify the functional impact of a VUS can be used as evidence for variant interpretation. In the case of the breast and ovarian cancer specific tumor suppressor protein, BRCA1, pathogenic missense variants frequently score as loss of function in an assay for homology-directed repair (HDR) of DNA double-strand breaks. We previously published functional results using a multiplexed assay for 1056 amino acid substitutions in residues 2-192 in the amino terminus of BRCA1. In this study, we have re-assessed the data from this multiplexed assay using an improved analysis pipeline. These new analysis methods yield functional scores for more variants in the first 192 amino acids of BRCA1, plus we report new results for BRCA1 amino acid residues 193-302. We now present the functional classification of 2172 BRCA1 variants in BRCA1 residues 2-302 using the multiplexed HDR assay. Comparison of the functional determinations of the missense variants with clinically known benign or pathogenic variants indicated 93% sensitivity and 100% specificity for this assay. The results from BRCA1 variants tested in this assay are a resource for clinical geneticists for evidence to evaluate VUS in BRCA1.
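As a reading aid for the sensitivity and specificity figures quoted in this abstract, the following is a minimal sketch of how such values are derived from paired clinical labels and assay calls. The label strings, the function name `sensitivity_specificity`, and the toy counts are all assumptions made for illustration; they are not the study's data or pipeline.

```python
def sensitivity_specificity(records):
    """Compute (sensitivity, specificity) from (clinical_label, assay_call) pairs.

    clinical_label: "pathogenic" or "benign" (hypothetical label strings).
    assay_call: "loss_of_function" or "functional" (hypothetical label strings).
    """
    tp = fn = tn = fp = 0
    for label, call in records:
        if label == "pathogenic":
            if call == "loss_of_function":
                tp += 1  # pathogenic variant correctly scored as loss of function
            else:
                fn += 1  # pathogenic variant missed by the assay
        elif label == "benign":
            if call == "functional":
                tn += 1  # benign variant correctly scored as functional
            else:
                fp += 1  # benign variant wrongly scored as loss of function
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data, NOT the counts from the study.
toy = (
    [("pathogenic", "loss_of_function")] * 9
    + [("pathogenic", "functional")] * 1
    + [("benign", "functional")] * 5
)
sens, spec = sensitivity_specificity(toy)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the fraction of known pathogenic variants that the assay scores as loss of function; specificity is the fraction of known benign variants that it scores as functional.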
Affiliation(s)
- Mariame Diabate
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Muhtadi M. Islam
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Gregory Nagy
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Tapahsama Banerjee
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Shruti Dhar
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Nahum Smith
  - The University of Washington, Department of Genome Sciences, Seattle, Washington, United States of America
  - Brotman Baty Institute for Precision Medicine, Seattle, Washington, United States of America
- Aleksandra I. Adamovich
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
- Lea M. Starita
  - The University of Washington, Department of Genome Sciences, Seattle, Washington, United States of America
  - Brotman Baty Institute for Precision Medicine, Seattle, Washington, United States of America
- Jeffrey D. Parvin
  - The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America
  - The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, United States of America
2. Diabate M, Islam MM, Nagy G, Banerjee T, Dhar S, Smith N, Adamovich AI, Starita LM, Parvin JD. DNA Repair Function Scores for 2172 Variants in the BRCA1 Amino-Terminus. bioRxiv 2023:2023.04.10.536331. [PMID: 37090572 PMCID: PMC10120616 DOI: 10.1101/2023.04.10.536331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Single nucleotide variants are the most frequent type of sequence changes detected in the genome and these are frequently variants of uncertain significance (VUS). VUS are changes in DNA for which disease risk association is unknown. Thus, methods that classify the functional impact of a VUS can be used as evidence for variant interpretation. In the case of the breast and ovarian cancer specific tumor suppressor protein, BRCA1, pathogenic missense variants frequently score as loss of function in an assay for homology-directed repair (HDR) of DNA double-strand breaks. We previously published functional results using a multiplexed assay for 1056 amino acid substitutions in residues 2-192 in the amino terminus of BRCA1. In this study, we have re-assessed the data from this multiplexed assay using an improved analysis pipeline. These new analysis methods yield functional scores for more variants in the first 192 amino acids of BRCA1, plus we report new results for BRCA1 amino acid residues 193-302. We now present the functional classification of 2172 BRCA1 variants in BRCA1 residues 2-302 using the multiplexed HDR assay. Comparison of the functional determinations of the missense variants with clinically known benign or pathogenic variants indicated 93% sensitivity and 100% specificity for this assay. The results from BRCA1 variants tested in this assay are a resource for clinical geneticists for evidence to evaluate VUS in BRCA1.
AUTHOR SUMMARY: Most missense substitutions in BRCA1 are variants of unknown significance (VUS), and individuals with a VUS in BRCA1 cannot know from genetic information alone whether this variant predisposes to breast or ovarian cancer. We apply a multiplexed functional assay for homology directed repair of DNA double strand breaks to assess variant impact on this important BRCA1 protein function. We analyzed 2172 variants in the amino-terminus of BRCA1 and demonstrate that variants that are known as pathogenic have a loss of function in the DNA repair assay. Conversely, variants that are known to be benign are functionally normal in the multiplexed assay. We suggest that these functional determinations of BRCA1 variants can be used to augment the information that clinical cancer geneticists provide to patients who have a VUS in BRCA1.
Affiliation(s)
- Mariame Diabate
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
- Muhtadi M Islam
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
- Gregory Nagy
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
- Tapahsama Banerjee
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
- Shruti Dhar
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
- Nahum Smith
  - The University of Washington, Department of Genome Sciences, Seattle, WA 98195
  - Brotman Baty Institute for Precision Medicine, Seattle, WA 98195
- Aleksandra I Adamovich
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
- Lea M Starita
  - The University of Washington, Department of Genome Sciences, Seattle, WA 98195
  - Brotman Baty Institute for Precision Medicine, Seattle, WA 98195
- Jeffrey D Parvin
  - The Ohio State University, Department of Biomedical Informatics, and The Ohio State University Comprehensive Center, Columbus, OH 43210
3. The structure-based cancer-related single amino acid variation prediction. Sci Rep 2021; 11:13599. [PMID: 34193921 PMCID: PMC8245468 DOI: 10.1038/s41598-021-92793-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/18/2021] [Accepted: 06/16/2021] [Indexed: 11/09/2022]
Abstract
Single amino acid variation (SAV) is an amino acid substitution of the protein sequence that can potentially influence the entire protein structure or function, as well as its binding affinity. Protein destabilization is related to diseases, including several cancers, although using traditional experiments to clarify the relationship between SAVs and cancer uses much time and resources. Some SAV prediction methods use computational approaches, with most predicting SAV-induced changes in protein stability. In this investigation, all SAV characteristics generated from protein sequences, structures and the microenvironment were converted into feature vectors and fed into an integrated predicting system using a support vector machine and genetic algorithm. Critical features were used to estimate the relationship between their properties and cancers caused by SAVs. We describe how we developed a prediction system based on protein sequences and structure that is capable of distinguishing if the SAV is related to cancer or not. The five-fold cross-validation performance of our system is 89.73% for the accuracy, 0.74 for the Matthews correlation coefficient, and 0.81 for the F1 score. We have built an online prediction server, CanSavPre ( http://bioinfo.cmu.edu.tw/CanSavPre/ ), which is expected to become a useful, practical tool for cancer research and precision medicine.
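The three performance measures reported in this abstract (accuracy, Matthews correlation coefficient, and F1 score) are all functions of a binary confusion matrix. The sketch below shows the standard formulas; the function name `classification_metrics` and the confusion-matrix counts are hypothetical and are not taken from the CanSavPre evaluation.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, F1 and Matthews correlation coefficient (MCC)
    for a binary confusion matrix given as raw counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else float("nan")

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # MCC uses all four cells; by convention it is 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

MCC is often reported alongside accuracy because it remains informative when the two classes are imbalanced, which is common in variant datasets.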
4. Gemović B, Perović V, Davidović R, Drljača T, Veljkovic N. Alignment-free method for functional annotation of amino acid substitutions: Application on epigenetic factors involved in hematologic malignancies. PLoS One 2021; 16:e0244948. [PMID: 33395407 PMCID: PMC7781373 DOI: 10.1371/journal.pone.0244948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2020] [Accepted: 12/21/2020] [Indexed: 11/19/2022]
Abstract
For the last couple of decades, there has been a significant growth in sequencing data, leading to an extraordinary increase in the number of gene variants. This places a challenge on the bioinformatics research community to develop and improve computational tools for functional annotation of new variants. Genes coding for epigenetic regulators have important roles in cancer pathogenesis and mutations in these genes show great potential as clinical biomarkers, especially in hematologic malignancies. Therefore, we developed a model that specifically focuses on these genes, with an assumption that it would outperform general models in predicting the functional effects of amino acid substitutions. EpiMut is standalone software that implements a sequence-based, alignment-free method. We applied a two-step approach for generating sequence-based features, relying on the biophysical and biochemical indices of amino acids and the Fourier Transform as a sequence transformation method. For each gene in the dataset, the Naïve Bayes machine learning algorithm was used to build a model for predicting the neutral or disease-related status of variants. EpiMut outperformed the state-of-the-art tools used for comparison: PolyPhen-2, SIFT and SNAP2. Additionally, EpiMut showed the highest performance on the subset of variants positioned outside conserved functional domains of analysed proteins, which represents an important group of cancer-related variants. These results imply that EpiMut can be applied as a first-choice tool in research on the impact of gene variants in epigenetic regulators, especially in light of their biomarker role in hematologic malignancies. EpiMut is freely available at https://www.vin.bg.ac.rs/180/tools/epimut.php.
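The two-step feature generation described in this abstract (amino acid indices followed by a Fourier transform) can be sketched as follows. This is an illustration only: it assumes the Kyte-Doolittle hydropathy scale as the amino acid index and a plain discrete Fourier transform, neither of which is confirmed by the abstract as EpiMut's actual choice. The sequence, the function names, and the wild-type-minus-mutant feature definition are likewise hypothetical, and the Naïve Bayes step is not shown.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here as an ASSUMED amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Step 1: turn a protein sequence into a numeric signal."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def dft_magnitudes(signal):
    """Step 2: normalized magnitude spectrum of the signal (frequencies 0..n/2)."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def substitution_features(sequence, position, new_residue):
    """Spectrum difference (mutant minus wild type) for a substitution
    at a 1-based position. Alignment-free: only the sequence is needed."""
    mutant = sequence[: position - 1] + new_residue + sequence[position:]
    wild_type_spectrum = dft_magnitudes(encode(sequence))
    mutant_spectrum = dft_magnitudes(encode(mutant))
    return [m - w for w, m in zip(wild_type_spectrum, mutant_spectrum)]


# Hypothetical 8-residue peptide; substitute L4 with D.
features = substitution_features("MKTLLVAG", 4, "D")
print([round(value, 4) for value in features])
```

A feature vector of this kind could then be passed to any classifier; because no multiple sequence alignment is required, the approach also applies to positions outside conserved domains.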
Affiliation(s)
- Branislava Gemović
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Vladimir Perović
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Radoslav Davidović
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Tamara Drljača
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Nevena Veljkovic
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
  - Heliant d.o.o., Belgrade, Serbia
5. Sarkar A, Yang Y, Vihinen M. Variation benchmark datasets: update, criteria, quality and applications. Database: The Journal of Biological Databases and Curation 2020; 2020:5710862. [PMID: 32016318 PMCID: PMC6997940 DOI: 10.1093/database/baz117] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/12/2019] [Revised: 06/03/2019] [Accepted: 07/01/2019] [Indexed: 02/07/2023]
Abstract
Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. 
We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies, and show that such comparisons are feasible and useful; however, there may be limitations due to a lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench
Affiliation(s)
- Anasua Sarkar
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
- Yang Yang
  - School of Computer Science and Technology, Soochow University, No1. Shizi Street, Suzhou, 215006 Jiangsu, China
  - Provincial Key Laboratory for Computer Information Processing Technology, No1. Shizi Street, Soochow University, Suzhou, 215006 Jiangsu, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
6. Zimmerman L, Zelichov O, Aizenmann A, Barbash Z, Vidne M, Tarcic G. A Novel System for Functional Determination of Variants of Uncertain Significance using Deep Convolutional Neural Networks. Sci Rep 2020; 10:4192. [PMID: 32144301 PMCID: PMC7060242 DOI: 10.1038/s41598-020-61173-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/18/2019] [Accepted: 02/24/2020] [Indexed: 11/08/2022]
Abstract
Many drugs are developed for commonly occurring, well-studied cancer drivers, such as vemurafenib for BRAF V600E and erlotinib for EGFR exon 19 mutations. However, most tumors also harbor mutations which have an uncertain role in disease formation, commonly called Variants of Uncertain Significance (VUS), which are not studied or characterized and could play a significant role in drug resistance and relapse. Therefore, the determination of the functional significance of VUS and their response to Molecularly Targeted Agents (MTAs) is essential for developing new drugs and predicting the response of patients. Here we present a multi-scale deep convolutional neural network (DCNN) architecture combined with an in vitro functional assay to investigate the functional role of VUS and their response to MTAs. Our method achieved high accuracy and precision on a hold-out set of examples (0.98 mean AUC for all tested genes) and was used to predict the oncogenicity of 195 VUS in 6 genes. 63 (32%) of the assayed VUS were classified as pathway activating, many of them to a similar extent as known driver mutations. Finally, we show that responses of various mutations to FDA-approved MTAs are accurately predicted by our platform in a dose-dependent manner. Taken together, this novel system can uncover the treatable mutational landscape of a drug and be a useful tool in drug development.
7. Chen H, Li J, Wang Y, Ng PKS, Tsang YH, Shaw KR, Mills GB, Liang H. Comprehensive assessment of computational algorithms in predicting cancer driver mutations. Genome Biol 2020; 21:43. [PMID: 32079540 PMCID: PMC7033911 DOI: 10.1186/s13059-020-01954-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 05/20/2019] [Accepted: 02/07/2020] [Indexed: 01/02/2023]
Abstract
BACKGROUND: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed. RESULTS: We construct five complementary benchmark datasets: mutation clustering patterns in the protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose. CONCLUSIONS: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.
Affiliation(s)
- Hu Chen
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, 77030, USA
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jun Li
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Yumeng Wang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Patrick Kwok-Shing Ng
  - Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Yiu Huen Tsang
  - Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University, Portland, OR, 97239, USA
- Kenna R Shaw
  - Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Gordon B Mills
  - Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University, Portland, OR, 97239, USA
- Han Liang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
8. Li S, Qian D, Thompson BA, Gutierrez S, Wu S, Pesaran T, LaDuca H, Lu HM, Chao EC, Black MH. Tumour characteristics provide evidence for germline mismatch repair missense variant pathogenicity. J Med Genet 2019; 57:62-69. [PMID: 31391288 DOI: 10.1136/jmedgenet-2019-106096] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/20/2019] [Revised: 06/28/2019] [Accepted: 07/09/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Pathogenic variants in mismatch repair (MMR) genes (MLH1, MSH2, MSH6 and PMS2) increase risk for Lynch syndrome and related cancers. We quantified tumour characteristics to assess variant pathogenicity for germline MMR genes. METHODS: Among 4740 patients with cancer with microsatellite instability (MSI) and immunohistochemical (IHC) results, we tested MMR pathogenic variant association with MSI/IHC status, and estimated likelihood ratios which we used to compute a tumour characteristic likelihood ratio (TCLR) for each variant. We assessed the predictive performance of TCLR in combination with in silico predictors, and of a multifactorial variant prediction (MVP) model that included allele frequency, co-occurrence, co-segregation, and clinical and family history information. RESULTS: Compared with non-carriers, carriers of germline pathogenic/likely pathogenic (P/LP) variants were more likely to have abnormal MSI/IHC status (p<0.0001). Among 150 classified missense variants, 73.3% were accurately predicted with TCLR alone. Models leveraging in silico scores as prior probabilities accurately classified >76.7% of variants. Adding TCLR as quantitative evidence in an MVP model (MVP + TCLR Pred) increased the proportion of accurately classified variants from 88.0% (MVP alone) to 98.0% and generated optimal performance statistics among all models tested. Importantly, MVP + TCLR Pred resulted in a high yield of predicted classifications for missense variants of unknown significance (VUS); among 193 VUS, 62.7% were predicted as P/LP or benign/likely benign (B/LB) when assessed according to American College of Medical Genetics and Genomics/Association for Molecular Pathology guidelines. CONCLUSION: Our study demonstrates that when used separately or in conjunction with other evidence, tumour characteristics provide evidence for germline MMR missense variant assessment, which may have important implications for genetic testing and clinical management.
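The abstract describes using in silico scores as prior probabilities and adding likelihood ratios as quantitative evidence. A minimal sketch of the generic Bayesian odds update behind that idea is given below. It assumes the pieces of evidence are independent so that their likelihood ratios multiply; the function name `posterior_probability` and the numeric values are hypothetical and do not reproduce the study's MVP or TCLR models.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability of pathogenicity with likelihood ratios.

    posterior odds = prior odds * LR_1 * LR_2 * ...
    Assumes the evidence sources are independent (an assumption of this sketch).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical values: a neutral prior of 0.5 and two pieces of evidence,
# e.g. a tumour-characteristic LR of 4.0 and a co-segregation LR of 2.5.
posterior = posterior_probability(0.5, [4.0, 2.5])
print(f"posterior probability of pathogenicity: {posterior:.3f}")
```

A likelihood ratio above 1 shifts the posterior toward pathogenic, a ratio below 1 shifts it toward benign, and a ratio of exactly 1 leaves the prior unchanged.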
Affiliation(s)
- Shuwei Li
  - Bioinformatics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Dajun Qian
  - Bioinformatics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Bryony A Thompson
  - Royal Melbourne Hospital, Melbourne, Victoria, Australia
  - Department of Clinical Pathology, University of Melbourne, Parkville, Victoria, Australia
- Sitao Wu
  - Bioinformatics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Tina Pesaran
  - Clinical Diagnostics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Holly LaDuca
  - Clinical Diagnostics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Hsiao-Mei Lu
  - Bioinformatics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Elizabeth C Chao
  - Clinical Diagnostics, Ambry Genetics Corp, Aliso Viejo, California, USA
- Mary Helen Black
  - Bioinformatics, Ambry Genetics Corp, Aliso Viejo, California, USA
9. McGarvey PB, Nightingale A, Luo J, Huang H, Martin MJ, Wu C, UniProt Consortium. UniProt genomic mapping for deciphering functional effects of missense variants. Hum Mutat 2019; 40:694-705. [PMID: 30840782 PMCID: PMC6563471 DOI: 10.1002/humu.23738] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/23/2018] [Revised: 12/17/2018] [Accepted: 02/17/2019] [Indexed: 01/08/2023]
Abstract
Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
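The colocation comparison mentioned in this abstract amounts to joining two variant sets on shared genomic coordinates. The sketch below illustrates that join with a dictionary keyed on chromosome and position. The record layout, identifiers, and coordinates are invented for illustration and do not reflect the UniProtKB or ClinVar file formats.

```python
def colocated_pairs(protein_variants, clinical_snps):
    """Return (protein_variant_id, snp_id) pairs that share a genomic position.

    Both inputs are lists of dicts with hypothetical keys: id, chrom, pos.
    """
    snps_by_position = {}
    for snp in clinical_snps:
        key = (snp["chrom"], snp["pos"])
        snps_by_position.setdefault(key, []).append(snp["id"])

    pairs = []
    for variant in protein_variants:
        key = (variant["chrom"], variant["pos"])
        for snp_id in snps_by_position.get(key, []):
            pairs.append((variant["id"], snp_id))
    return pairs


# Invented records, for illustration only.
protein_variants = [
    {"id": "VAR_A", "chrom": "17", "pos": 100},
    {"id": "VAR_B", "chrom": "17", "pos": 250},
    {"id": "VAR_C", "chrom": "13", "pos": 100},
]
clinical_snps = [
    {"id": "CV_1", "chrom": "17", "pos": 100},
    {"id": "CV_2", "chrom": "13", "pos": 999},
]

pairs = colocated_pairs(protein_variants, clinical_snps)
matched_variants = {variant_id for variant_id, _ in pairs}
fraction = len(matched_variants) / len(protein_variants)
print(pairs, f"{fraction:.0%} of protein variants colocate")
```

Keying on both chromosome and position matters: VAR_C and CV_1 share the position 100 but sit on different chromosomes, so they are not reported as colocated.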
Affiliation(s)
- Peter B. McGarvey
  - Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC
  - Protein Information Resource, Georgetown Medical Center, Washington, DC
- Andrew Nightingale
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Jie Luo
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Hongzhan Huang
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware
- Maria J. Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Cathy Wu
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware
- UniProt Consortium
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
  - Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
  - Protein Information Resource, Georgetown Medical Center, Washington, DC