1
Villani RM, McKenzie ME, Davidson AL, Spurdle AB. Regional-specific calibration enables application of computational evidence for clinical classification of 5' cis-regulatory variants in Mendelian disease. Am J Hum Genet 2024; 111:1301-1315. [PMID: 38815586 PMCID: PMC11267523 DOI: 10.1016/j.ajhg.2024.05.002] [Received: 12/21/2023] [Revised: 05/02/2024] [Accepted: 05/03/2024] [Indexed: 06/01/2024]
Abstract
To date, clinical genetic testing for Mendelian disease variants has focused heavily on exonic coding and intronic gene regions. This multi-step study was undertaken to provide an evidence base for selecting and applying computational approaches for use in clinical classification of 5' cis-regulatory region variants. Curated datasets of clinically reported disease-causing 5' cis-regulatory region variants and variants from matched genomic regions in population controls were used to calibrate six bioinformatic tools as predictors of variant pathogenicity. Likelihood ratio estimates were aligned to code weights following ClinGen recommendations for application of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) classification scheme. Considering code assignment across all reference dataset variants, performance was best for CADD (81.2%) and REMM (81.5%). Optimized thresholds provided moderate evidence toward pathogenicity (CADD, REMM) and moderate (CADD) or supporting (REMM) evidence against pathogenicity. Both sensitivity and specificity of prediction were improved when further categorizing variants based on location in an EPDnew-defined promoter region. Combining predictions (CADD, REMM, and location in a promoter region) increased specificity at the expense of sensitivity. Importantly, the optimal CADD thresholds for assigning ACMG/AMP codes PP3 (≥10) and BP4 (≤8) were vastly different from recommendations for protein-coding variants (PP3 ≥25.3; BP4 ≤22.7); CADD <22.7 would incorrectly assign BP4 for >90% of reported disease-causing cis-regulatory region variants. Our results demonstrate the need to consider a tiered approach and tailored score thresholds to optimize bioinformatic impact prediction for clinical classification of 5' cis-regulatory region variants.
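The threshold contrast reported in this abstract can be illustrated with a minimal sketch. It uses only the CADD cut-offs quoted above (PP3 ≥10 and BP4 ≤8 for 5' cis-regulatory variants; PP3 ≥25.3 and BP4 ≤22.7 for protein-coding variants). The function name, the dictionary layout, and the example score are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional

# CADD thresholds quoted in the abstract (PHRED-scaled scores).
CIS_REGULATORY: Dict[str, float] = {"PP3": 10.0, "BP4": 8.0}
PROTEIN_CODING: Dict[str, float] = {"PP3": 25.3, "BP4": 22.7}


def assign_code(cadd: float, thresholds: Dict[str, float]) -> Optional[str]:
    """Return 'PP3', 'BP4', or None when the score falls between thresholds."""
    if cadd >= thresholds["PP3"]:
        return "PP3"  # computational evidence toward pathogenicity
    if cadd <= thresholds["BP4"]:
        return "BP4"  # computational evidence against pathogenicity
    return None  # indeterminate: no computational code applied


# Hypothetical CADD score for a 5' cis-regulatory variant.
score = 15.0
print(assign_code(score, CIS_REGULATORY))  # PP3
print(assign_code(score, PROTEIN_CODING))  # BP4
```

The same score receives opposite codes under the two threshold sets, which is the misclassification risk the abstract describes when protein-coding thresholds are applied to cis-regulatory variants.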
Affiliation(s)
- Rehan M Villani
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Maddison E McKenzie
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Aimee L Davidson
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Amanda B Spurdle
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia; University of Queensland, Brisbane, Queensland, Australia
2
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. Genomics Proteomics Bioinformatics 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods are developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinical relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar with the area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033 and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. Summarily, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
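The AUROC values quoted in this abstract can be read as the probability that a randomly chosen positive (for example, pathogenic) variant receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition; the function name and the example scores are illustrative and do not come from the benchmark.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This is the quadratic-time
    pairwise form, which is adequate for small illustrative inputs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predictor scores for three pathogenic and three benign variants.
pathogenic = [0.9, 0.8, 0.4]
benign = [0.7, 0.3, 0.2]
print(round(auroc(pathogenic, benign), 3))  # 0.889
```

A value near 0.5, as reported above for the GWAS benchmark, means the predictor ranks positives above negatives about as often as chance would.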
Affiliation(s)
- Zheng Wang
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Guihu Zhao
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Bin Li
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Zhenghuan Fang
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qian Chen
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xiaomeng Wang
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Tengfei Luo
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yijing Wang
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qiao Zhou
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Kuokuo Li
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Lu Xia
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yi Zhang
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xun Zhou
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Hongxu Pan
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yuwen Zhao
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yige Wang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Lin Wang
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
- Jifeng Guo
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Beisha Tang
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Kun Xia
  - Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Jinchen Li
  - National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
3
Rasheed S, Bouley RA, Yoder RJ, Petreaca RC. Protein Arginine Methyltransferase 5 (PRMT5) Mutations in Cancer Cells. Int J Mol Sci 2023; 24:6042. [PMID: 37047013 PMCID: PMC10094674 DOI: 10.3390/ijms24076042] [Received: 01/24/2023] [Revised: 03/15/2023] [Accepted: 03/20/2023] [Indexed: 04/14/2023]
Abstract
Arginine methylation is a form of posttranslational modification that regulates many cellular functions such as development, DNA damage repair, inflammatory response, splicing, and signal transduction, among others. Protein arginine methyltransferase 5 (PRMT5) is one of nine identified methyltransferases, and it can methylate both histone and non-histone targets. It has pleiotropic functions, including recruitment of repair machinery to a chromosomal DNA double strand break (DSB) and coordinating the interplay between repair and checkpoint activation. Thus, PRMT5 has been actively studied as a cancer treatment target, and small molecule inhibitors of its enzymatic activity have already been developed. In this report, we analyzed all reported PRMT5 mutations appearing in cancer cells using data from the Catalogue of Somatic Mutations in Cancers (COSMIC). Our goal is to classify mutations as either drivers or passengers to understand which ones are likely to promote cellular transformation. Using gold standard artificial intelligence algorithms, we uncovered several key driver mutations in the active site of the enzyme (D306H, L315P, and N318K). In silico protein modeling shows that these mutations may affect the affinity of PRMT5 for S-adenosylmethionine (SAM), which is required as a methyl donor. Electrostatic analysis of the enzyme active site shows that one of these mutations creates a tunnel in the vicinity of the SAM binding site, which may allow interfering molecules to enter the enzyme active site and decrease its activity. We also identified several non-coding mutations that appear to affect PRMT5 splicing. Our analyses provide insights into the role of PRMT5 mutations in cancer cells. Additionally, since PRMT5 single molecule inhibitors have already been developed, this work may uncover future directions in how mutations can affect targeted inhibition.
Affiliation(s)
- Shayaan Rasheed
  - James Comprehensive Cancer Center, The Ohio State University Columbus, Columbus, OH 43210, USA
  - Biology Program, The Ohio State University, Columbus, OH 43210, USA
- Renee A. Bouley
  - Department of Chemistry and Biochemistry, The Ohio State University, Marion, OH 43302, USA
- Ryan J. Yoder
  - Department of Chemistry and Biochemistry, The Ohio State University, Marion, OH 43302, USA
- Ruben C. Petreaca
  - James Comprehensive Cancer Center, The Ohio State University Columbus, Columbus, OH 43210, USA
  - Department of Molecular Genetics, The Ohio State University, Marion, OH 43302, USA
4
Morova T, Ding Y, Huang CCF, Sar F, Schwarz T, Giambartolomei C, Baca S, Grishin D, Hach F, Gusev A, Freedman M, Pasaniuc B, Lack N. Optimized high-throughput screening of non-coding variants identified from genome-wide association studies. Nucleic Acids Res 2022; 51:e18. [PMID: 36546757 PMCID: PMC9943666 DOI: 10.1093/nar/gkac1198] [Received: 09/26/2022] [Revised: 11/19/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022]
Abstract
The vast majority of disease-associated single nucleotide polymorphisms (SNP) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants impact transcription factors binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants we developed snpSTARRseq, a high-throughput experimental method that can interrogate the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per loci. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked to orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.
Affiliation(s)
- Tunc Morova
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Yi Ding
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Funda Sar
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Tommer Schwarz
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Claudia Giambartolomei
  - Central RNA Lab, Istituto Italiano di Tecnologia, Genova 16163, Italy; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Sylvan C Baca
  - Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Dennis Grishin
  - Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Faraz Hach
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada; Department of Urologic Science, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
- Alexander Gusev
  - Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Matthew L Freedman
  - Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA; The Center for Cancer Genome Discovery, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nathan A Lack
  - To whom correspondence should be addressed. Tel: +1 604 875 4411;
5
Nayara Góes de Araújo J, Fernandes de Oliveira V, Bassani Borges J, Dagli-Hernandez C, da Silva Rodrigues Marçal E, Caroline Costa de Freitas R, Medeiros Bastos G, Marques Gonçalves R, Arpad Faludi A, Elim Jannes C, da Costa Pereira A, Dominguez Crespo Hirata R, Hiroyuki Hirata M, Ducati Luchessi A, Nogueira Silbiger V. In silico analysis of upstream variants in Brazilian patients with Familial Hypercholesterolemia. Gene 2022; 849:146908. [PMID: 36167182 DOI: 10.1016/j.gene.2022.146908] [Received: 02/21/2022] [Revised: 08/16/2022] [Accepted: 09/19/2022] [Indexed: 10/14/2022]
Abstract
Familial hypercholesterolemia (FH) is a prevalent autosomal genetic disease associated with increased risk of early cardiovascular events and death due to chronic exposure to very high levels of low-density lipoprotein cholesterol (LDL-c). Pathogenic variants in the coding regions of LDLR, APOB and PCSK9 account for most FH cases, and variants in non-coding regions may be involved in FH as well. Variants in the upstream region of LDLR, APOB and PCSK9 were screened by targeted next-generation sequencing and their effects were explored using in silico tools. Twenty-five patients without pathogenic variants in FH-related genes were selected. The 3 kb upstream regions of LDLR, APOB and PCSK9 were sequenced using the AmpliSeq (Illumina) and Miseq Reagent Nano Kit v2 (Illumina). Sequencing data were analyzed using variant discovery and functional annotation tools. Potentially regulatory variants were selected by integrating data from public databases, published data and context-dependent regulatory prediction score. Thirty-four single nucleotide variants (SNVs) in upstream regions were identified (6 in LDLR, 15 in APOB, and 13 in PCSK9). Five SNVs were prioritized as potentially regulatory variants (rs934197, rs9282606, rs36218923, rs538300761, g.55038486A>G). APOB rs934197 was previously associated with an increased rate of transcription, which in silico analysis suggests could be due to reduced binding affinity of a transcriptional repressor. Our findings highlight the importance of variant screening outside of coding regions of all relevant genes. Further functional studies are necessary to confirm that prioritized variants could impact gene regulation and contribute to the FH phenotype.
Affiliation(s)
- Jéssica Nayara Góes de Araújo
  - Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil
- Victor Fernandes de Oliveira
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
- Jéssica Bassani Borges
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil; Laboratory of Molecular Research in Cardiology, Institute Dante Pazzanese of Cardiology, Sao Paulo, 04012-909, Brazil
- Carolina Dagli-Hernandez
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
- Renata Caroline Costa de Freitas
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
- Gisele Medeiros Bastos
  - Laboratory of Molecular Research in Cardiology, Institute Dante Pazzanese of Cardiology, Sao Paulo, 04012-909, Brazil; Medical Clinic Division, Institute Dante Pazzanese of Cardiology, Sao Paulo 04012-909, Brazil
- André Arpad Faludi
  - Medical Clinic Division, Institute Dante Pazzanese of Cardiology, Sao Paulo 04012-909, Brazil
- Cinthia Elim Jannes
  - Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of Sao Paulo 05403-900, Brazil
- Alexandre da Costa Pereira
  - Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of Sao Paulo 05403-900, Brazil
- Rosario Dominguez Crespo Hirata
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
- Mario Hiroyuki Hirata
  - Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
- André Ducati Luchessi
  - Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil; Department of Clinical and Toxicological Analyses, Federal University of Rio Grande do Norte, Natal 59012-570, Brazil
- Vivian Nogueira Silbiger
  - Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil; Department of Clinical and Toxicological Analyses, Federal University of Rio Grande do Norte, Natal 59012-570, Brazil
6
Schipper M, Posthuma D. Demystifying non-coding GWAS variants: an overview of computational tools and methods. Hum Mol Genet 2022; 31:R73-R83. [PMID: 35972862 PMCID: PMC9585674 DOI: 10.1093/hmg/ddac198] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/11/2022] [Indexed: 02/01/2023]
Abstract
Genome-wide association studies (GWAS) have found the majority of disease-associated variants to be non-coding. Major efforts into the charting of the non-coding regulatory landscapes have allowed for the development of tools and methods which aim to aid in the identification of causal variants and their mechanism of action. In this review, we give an overview of current tools and methods for the analysis of non-coding GWAS variants in disease. We provide a workflow that allows for the accumulation of in silico evidence to generate novel hypotheses on mechanisms underlying disease and prioritize targets for follow-up study using non-coding GWAS variants. Lastly, we discuss the need for comprehensive benchmarks and novel tools for the analysis of non-coding variants.
Affiliation(s)
- Marijn Schipper
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105 1081HV Amsterdam, The Netherlands
- Danielle Posthuma
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105 1081HV Amsterdam, The Netherlands
7
Classification of non-coding variants with high pathogenic impact. PLoS Genet 2022; 18:e1010191. [PMID: 35486646 PMCID: PMC9094564 DOI: 10.1371/journal.pgen.1010191] [Received: 06/27/2021] [Revised: 05/11/2022] [Accepted: 04/05/2022] [Indexed: 01/22/2023]
Abstract
Whole genome sequencing (WGS) is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants represents a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20–80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing in particular to optimized control variants selection during training. In addition to ranking candidate variants, FINSURF breaks down the score for each variant into contributions from individual annotations, facilitating the evaluation of their functional relevance. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the ten top hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.
8
Ruscheinski A, Reimler AL, Ewald R, Uhrmacher AM. VPMBench: a test bench for variant prioritization methods. BMC Bioinformatics 2021; 22:543. [PMID: 34749640 PMCID: PMC8576923 DOI: 10.1186/s12859-021-04458-0] [Received: 03/10/2021] [Accepted: 10/23/2021] [Indexed: 11/18/2022]
Abstract
Background: Clinical diagnostics of whole-exome and whole-genome sequencing data requires geneticists to consider thousands of genetic variants for each patient. Various variant prioritization methods have been developed over the last years to aid clinicians in identifying variants that are likely disease-causing. Each time a new method is developed, its effectiveness must be evaluated and compared to other approaches based on the most recently available evaluation data. Doing so in an unbiased, systematic, and replicable manner requires significant effort. Results: The open-source test bench “VPMBench” automates the evaluation of variant prioritization methods. VPMBench introduces a standardized interface for prioritization methods and provides a plugin system that makes it easy to evaluate new methods. It supports different input data formats and custom output data preparation. VPMBench exploits declaratively specified information about the methods, e.g., the variants supported by the methods. Plugins may also be provided in a technology-agnostic manner via containerization. Conclusions: VPMBench significantly simplifies the evaluation of both custom and published variant prioritization methods. As we expect variant prioritization methods to become ever more critical with the advent of whole-genome sequencing in clinical diagnostics, such tool support is crucial to facilitate methodological research.
Affiliation(s)
- Andreas Ruscheinski
  - Modeling and Simulation Group, Institute for Visual and Analytic Computing, University of Rostock, Albert-Einstein-Straße 22, 18051, Rostock, Germany
- Anna Lena Reimler
  - Modeling and Simulation Group, Institute for Visual and Analytic Computing, University of Rostock, Albert-Einstein-Straße 22, 18051, Rostock, Germany
- Roland Ewald
  - Limbus Medical Technologies GmbH, Lindenstraße 2, 18055, Rostock, Germany
- Adelinde M Uhrmacher
  - Modeling and Simulation Group, Institute for Visual and Analytic Computing, University of Rostock, Albert-Einstein-Straße 22, 18051, Rostock, Germany
9
Benevenuta S, Capriotti E, Fariselli P. Calibrating variant-scoring methods for clinical decision making. Bioinformatics 2021; 36:5709-5711. [PMID: 33492342 PMCID: PMC8023678 DOI: 10.1093/bioinformatics/btaa943] [Received: 06/28/2020] [Revised: 09/27/2020] [Accepted: 10/28/2020] [Indexed: 12/22/2022]
Abstract
Summary: Identifying pathogenic variants and annotating them is a major challenge in human genetics, especially for the non-coding ones. Several tools have been developed and used to predict the functional effect of genetic variants. However, the calibration assessment of the predictions has received little attention. Calibration refers to the idea that if a model predicts a group of variants to be pathogenic with a probability P, it is expected that the same fraction P of true positives is found in the observed set. For instance, a well-calibrated classifier should label the variants such that among the ones to which it gave a probability value close to 0.7, approximately 70% actually belong to the pathogenic class. Poorly calibrated algorithms can be misleading and potentially harmful for clinical decision making. Availability and implementation: The dataset used for testing the methods is available through DOI: 10.5281/zenodo.4448197. Supplementary information: Supplementary data are available at Bioinformatics online.
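The notion of calibration described in this abstract can be checked with a simple binning procedure: group variants by predicted probability and compare each group's mean prediction with the observed pathogenic fraction. The sketch below is a generic reliability table, not the authors' evaluation code; the function name, the equal-width binning, and the example data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Tuple[int, int, float, float]]:
    """Return (bin index, count, mean predicted prob, observed pathogenic fraction).

    probs are predicted pathogenicity probabilities in [0, 1];
    labels are 1 for pathogenic and 0 for benign. Empty bins are skipped.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((idx, len(members), mean_p, observed))
    return rows


# Hypothetical data: ten variants scored 0.72, seven of them truly pathogenic.
probs = [0.72] * 10
labels = [1] * 7 + [0] * 3
for idx, count, mean_p, observed in calibration_table(probs, labels):
    print(idx, count, round(mean_p, 2), round(observed, 2))
```

For a well-calibrated predictor the last two columns stay close in every bin; a large gap in any bin is the kind of miscalibration the abstract warns about.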
Affiliation(s)
- Silvia Benevenuta
  - Department of Medical Sciences, University of Torino, Via Santena, 19, 10126, Torino, Italy
- Emidio Capriotti
  - BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via Selmi 3, 40126, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Via Santena, 19, 10126, Torino, Italy
10
Biggs H, Parthasarathy P, Gavryushkina A, Gardner PP. ncVarDB: a manually curated database for pathogenic non-coding variants and benign controls. Database: The Journal of Biological Databases and Curation 2020; 2020:6013764. [PMID: 33258967 PMCID: PMC7706182 DOI: 10.1093/database/baaa105] [Received: 04/28/2020] [Revised: 10/13/2020] [Accepted: 11/12/2020] [Indexed: 11/22/2022]
Abstract
Variants within the non-coding genome are frequently associated with phenotypes in genome-wide association studies. These non-coding regions may be involved in the regulation of gene expression, encode functional non-coding RNAs, or influence splicing and other cellular functions. We have curated a list of characterized non-coding human genome variants based on the published evidence that indicates phenotypic consequences of the variation. In order to minimize annotation errors, two curators have independently verified the supporting evidence for pathogenicity of each non-coding variant in the published literature. The database consists of 721 non-coding variants linked to the published literature describing the evidence of functional consequences. We have also sampled 7228 covariate-matched benign controls, that have a population frequency of over 5%, from the single nucleotide polymorphism database (dbSNP151) database. These were sampled controlling for potential confounding factors such as linkage with pathogenic variants, annotation type (untranslated region, intron, intergenic, etc.) and variant type (substitution or indel). The dataset presented here represents a curated repository, with a potential use for the training or evaluation of algorithms used in the prediction of non-coding variant functionality. Database URL: https://github.com/Gardner-BinfLab/ncVarDB.
Affiliation(s)
- Harry Biggs
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Padmini Parthasarathy
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Alexandra Gavryushkina
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand; Bio-Protection Research Centre, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Paul P Gardner
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand; Bio-Protection Research Centre, University of Otago, PO Box 56, Dunedin 9054, New Zealand
11
Schwarz JM, Hombach D, Köhler S, Cooper DN, Schuelke M, Seelow D. RegulationSpotter: annotation and interpretation of extratranscriptic DNA variants. Nucleic Acids Res 2019; 47:W106-W113. [PMID: 31106382 PMCID: PMC6602480 DOI: 10.1093/nar/gkz327] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/11/2019] [Revised: 04/17/2019] [Accepted: 05/09/2019] [Indexed: 02/07/2023]
Abstract
RegulationSpotter is a web-based tool for the user-friendly annotation and interpretation of DNA variants located outside of protein-coding transcripts (extratranscriptic variants). It is designed for clinicians and researchers who wish to assess the potential impact of the considerable number of non-coding variants found in Whole Genome Sequencing runs. It annotates individual variants with underlying regulatory features in an intuitive way by assessing over 100 genome-wide annotations. Additionally, it calculates a score, which reflects the regulatory potential of the variant region. Its dichotomous classifications, ‘functional’ or ‘non-functional’, and a human-readable presentation of the underlying evidence allow a biologically meaningful interpretation of the score. The output shows key aspects of every variant and allows rapid access to more detailed information about its possible role in gene regulation. RegulationSpotter can either analyse single variants or complete VCF files. Variants located within protein-coding transcripts are automatically assessed by MutationTaster as well as by RegulationSpotter to account for possible intragenic regulatory effects. RegulationSpotter offers the possibility of using phenotypic data to focus on known disease genes or genomic elements interacting with them. RegulationSpotter is freely available at https://www.regulationspotter.org.
Affiliation(s)
- Jana Marie Schwarz
- Department of Neuropediatrics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany; Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany; NeuroCure Cluster of Excellence and NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany
- Daniela Hombach
- Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany; NeuroCure Cluster of Excellence and NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany
- Sebastian Köhler
- Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; Einstein Center for Digital Future, Berlin, Germany
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff, UK
- Markus Schuelke
- Department of Neuropediatrics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany; NeuroCure Cluster of Excellence and NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany
- Dominik Seelow
- Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany
12
Zhang S, He Y, Liu H, Zhai H, Huang D, Yi X, Dong X, Wang Z, Zhao K, Zhou Y, Wang J, Yao H, Xu H, Yang Z, Sham PC, Chen K, Li MJ. regBase: whole genome base-wise aggregation and functional prediction for human non-coding regulatory variants. Nucleic Acids Res 2019; 47:e134. [PMID: 31511901 PMCID: PMC6868349 DOI: 10.1093/nar/gkz774] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 08/01/2019] [Accepted: 08/29/2019] [Indexed: 12/19/2022]
Abstract
Predicting the functional or pathogenic regulatory variants in the human non-coding genome facilitates the interpretation of disease causation. While numerous prediction methods are available, their performance is inconsistent or restricted to specific tasks, which raises the demand for a comprehensive integration of those methods. Here, we compile whole genome base-wise aggregations, regBase, that incorporate the largest set of prediction scores. Building on different assumptions of causality, we train three composite models to score functional, pathogenic and cancer driver non-coding regulatory variants, respectively. We demonstrate the superior and stable performance of our models using independent benchmarks and show great success in fine-mapping causal regulatory variants at specific loci or at base-wise resolution. We believe that the regBase database, together with the three composite models, will be useful in different areas of human genetic studies, such as annotation-based causal variant fine-mapping, pathogenic variant discovery and cancer driver mutation identification. regBase is freely available at https://github.com/mulinlab/regBase.
Affiliation(s)
- Shijie Zhang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Yukun He
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Huanhuan Liu
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Haoyu Zhai
- Department of Computer Science, University of Illinois Urbana-Champaign, IL, USA
- Dandan Huang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China; Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Xiaobao Dong
- Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Zhao Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Ke Zhao
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Yao Zhou
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Jianhua Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Hongcheng Yao
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hang Xu
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Zhenglu Yang
- College of Computer Science, Nankai University, Tianjin, China
- Pak Chung Sham
- Centre of Genomics Sciences, State Key Laboratory of Brain and Cognitive Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Key Laboratory of Inflammation Biology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China; Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
13
Marcath LA, Kidwell KM, Vangipuram K, Gersch CL, Rae JM, Burness ML, Griggs JJ, Van Poznak C, Hayes DF, Smith EML, Henry NL, Beutler AS, Hertz DL. Genetic variation in EPHA contributes to sensitivity to paclitaxel-induced peripheral neuropathy. Br J Clin Pharmacol 2020; 86:880-890. [PMID: 31823378 DOI: 10.1111/bcp.14192] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/22/2019] [Revised: 11/12/2019] [Accepted: 11/20/2019] [Indexed: 12/23/2022]
Abstract
AIMS: Chemotherapy-induced peripheral neuropathy (PN) is a treatment-limiting toxicity of paclitaxel. We evaluated whether EPHA genetic variation (EPHA4, EPHA5, EPHA6, and EPHA8) is associated with PN sensitivity by accounting for variability in systemic paclitaxel exposure (time above threshold). METHODS: Germline DNA from 60 patients with breast cancer was sequenced. PN was measured using the 8-item sensory subscale (CIPN8) of the patient-reported CIPN20. Associations for 3 genetic models were tested by incorporating genetics into previously published PN prediction models integrating measured paclitaxel exposure and cumulative treatment. Significant associations were then tested for association with PN-related treatment disruption. RESULTS: EPHA5 rs7349683 (minor allele frequency = 0.32) was associated with increased PN sensitivity (β-coefficient = 0.39, 95% confidence interval 0.11 to 0.67, p = 0.007). Setting a maximum tolerable threshold of CIPN8 = 30, the optimal paclitaxel exposure target is shorter for rs7349683 homozygous (11.6 h) than heterozygous (12.6 h) or wild-type (13.6 h) patients. Total number of missense variants (median = 0, range 0-2) was associated with decreased PN sensitivity (β-coefficient = -0.42, 95% confidence interval -0.72 to -0.12, p = 0.006). No association with treatment disruption was detected for the total number of missense variants or rs7349683. CONCLUSION: Isolating toxicity sensitivity by accounting for exposure is a novel approach, and rs7349683 represents a promising marker for PN sensitivity that may be used to individualize paclitaxel treatment.
Affiliation(s)
- Lauren A Marcath
- Department of Pharmacotherapy, Washington State University College of Pharmacy and Pharmaceutical Sciences, Spokane, WA, USA
- Kelley M Kidwell
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA; Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Kiran Vangipuram
- Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, MI, USA
- James M Rae
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA
- Monika L Burness
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA; Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, USA
- Jennifer J Griggs
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA; Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, USA
- Catherine Van Poznak
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA; Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, USA
- Daniel F Hayes
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA; Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, USA
- Ellen M Lavoie Smith
- Department of Health Behavior and Biological Sciences, University of Michigan School of Nursing, Ann Arbor, MI, USA
- N Lynn Henry
- Department of Internal Medicine, Division of Oncology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Andreas S Beutler
- Department of Anesthesiology, Mayo Clinic, Rochester, MN, USA; Department of Oncology, Mayo Clinic, Rochester, MN, USA
- Daniel L Hertz
- Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, MI, USA
14
Rojano E, Seoane P, Ranea JAG, Perkins JR. Regulatory variants: from detection to predicting impact. Brief Bioinform 2019; 20:1639-1654. [PMID: 29893792 PMCID: PMC6917219 DOI: 10.1093/bib/bby039] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/13/2018] [Revised: 04/18/2018] [Indexed: 02/01/2023]
Abstract
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin-chromatin and chromatin-protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they impact the underlying mechanisms.
Affiliation(s)
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Juan A G Ranea
- CIBER de Enfermedades Raras, ISCIII, Madrid, Spain and Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- James R Perkins
- Research laboratory, IBIMA-Regional University Hospital of Malaga, UMA, Malaga 29009, Spain
15
Yang H, Chen R, Wang Q, Wei Q, Ji Y, Zheng G, Zhong X, Cox NJ, Li B. De novo pattern discovery enables robust assessment of functional consequences of non-coding variants. Bioinformatics 2019; 35:1453-1460. [PMID: 30256891 DOI: 10.1093/bioinformatics/bty826] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/15/2018] [Revised: 08/17/2018] [Accepted: 09/25/2018] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Given the complexity of genome regions, prioritizing the functional effects of non-coding variants remains a challenge. Although several frameworks have been proposed for evaluating the functionality of non-coding variants, most of them use 'black box' methods that reduce the task to a pathogenic/benign classification problem, which ignores the distinct regulatory mechanisms of variants and leads to less desirable performance. In this study, we developed DVAR, an unsupervised framework that leverages various biochemical and evolutionary evidence to distinguish the gene regulatory categories of variants and assess their comprehensive functional impact simultaneously. RESULTS: DVAR performed de novo pattern discovery in high-dimensional data and identified five regulatory clusters of non-coding variants. Leveraging the new insights into the multiple functional patterns, it measures both the between-class and the within-class functional implication of the variants to achieve accurate prioritization. Compared to other two-class learning methods, it showed improved performance in identification of clinically significant variants, fine-mapped GWAS variants, eQTLs and expression-modulating variants. Moreover, it has superior performance on disease causal variants verified by genome editing (such as CRISPR-Cas9), which could provide a pre-selection strategy for genome-editing technologies across the whole genome. Finally, evaluated in BioVU and UK Biobank, two large-scale DNA biobanks linked to complete electronic health records, DVAR demonstrated its effectiveness in prioritizing non-coding variants associated with medical phenotypes. AVAILABILITY AND IMPLEMENTATION: The C++ and Python source codes, the pre-computed DVAR-cluster labels and DVAR-scores across the whole genome are available at https://www.vumc.org/cgg/dvar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hai Yang
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Rui Chen
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Quan Wang
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Qiang Wei
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Ying Ji
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Guangze Zheng
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Xue Zhong
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Bingshan Li
- Department of Molecular Physiology & Biophysics, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
16
Vidal EA, Moyano TC, Bustos BI, Pérez-Palma E, Moraga C, Riveras E, Montecinos A, Azócar L, Soto DC, Vidal M, Di Genova A, Puschel K, Nürnberg P, Buch S, Hampe J, Allende ML, Cambiazo V, González M, Hodar C, Montecino M, Muñoz-Espinoza C, Orellana A, Reyes-Jara A, Travisany D, Vizoso P, Moraga M, Eyheramendy S, Maass A, De Ferrari GV, Miquel JF, Gutiérrez RA. Whole Genome Sequence, Variant Discovery and Annotation in Mapuche-Huilliche Native South Americans. Sci Rep 2019; 9:2132. [PMID: 30765821 PMCID: PMC6376018 DOI: 10.1038/s41598-019-39391-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/20/2017] [Accepted: 01/23/2019] [Indexed: 12/15/2022]
Abstract
Whole human genome sequencing initiatives help us understand population history and the basis of genetic diseases. Current data mostly focus on Old World populations, and information on the genomic structure of Native Americans, especially those from the Southern Cone, is scant. Here we present annotation and variant discovery from high-quality complete genome sequences of a cohort of 11 Mapuche-Huilliche individuals (HUI) from Southern Chile. We found approximately 3.1 × 10^6 single nucleotide variants (SNVs) per individual and identified 403,383 (6.9%) novel SNV events. Analyses of large-scale genomic events detected 680 copy number variants (CNVs) and 4,514 structural variants (SVs), including 398 and 1,910 novel events, respectively. Global ancestry composition of HUI genomes revealed that the cohort represents a sample from a marginally admixed population from the Southern Cone, whose main genetic component derives from Native American ancestors. Additionally, we found that HUI genomes contain variants in genes associated with 5 of the 6 leading causes of noncommunicable diseases in Chile, which may have an impact on the risk of prevalent diseases in Chilean and Amerindian populations. Our data represent a useful resource that can contribute to population-based studies and to the design of early diagnostics or prevention tools for Native and admixed Latin American populations.
Affiliation(s)
- Elena A Vidal
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Tomás C Moyano
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Bernabé I Bustos
- FONDAP Center for Genome Regulation, Santiago, Chile
- Centro de Investigaciones Biomédicas, Facultad de Ciencias Biológicas y Facultad de Medicina, Universidad Andres Bello, Santiago, Chile
- Eduardo Pérez-Palma
- FONDAP Center for Genome Regulation, Santiago, Chile
- Centro de Investigaciones Biomédicas, Facultad de Ciencias Biológicas y Facultad de Medicina, Universidad Andres Bello, Santiago, Chile
- Carol Moraga
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Eleodoro Riveras
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Gastroenterología, Facultad de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Alejandro Montecinos
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Lorena Azócar
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Gastroenterología, Facultad de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Daniela C Soto
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Mabel Vidal
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Alex Di Genova
- FONDAP Center for Genome Regulation, Santiago, Chile
- Laboratorio de Bioinformática y Matemática del Genoma (LBMG-Mathomics), Centro de Modelamiento Matemático, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Klaus Puschel
- Departamento de Medicina Familiar, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Peter Nürnberg
- Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
- Stephan Buch
- Medical Department I, University Hospital Dresden, TU Dresden, Germany
- Jochen Hampe
- Medical Department I, University Hospital Dresden, TU Dresden, Germany
- Miguel L Allende
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Biología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Verónica Cambiazo
- FONDAP Center for Genome Regulation, Santiago, Chile
- Laboratorio de Bioinformática y Expresión Génica, Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago, Chile
- Mauricio González
- FONDAP Center for Genome Regulation, Santiago, Chile
- Laboratorio de Bioinformática y Expresión Génica, Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago, Chile
- Christian Hodar
- FONDAP Center for Genome Regulation, Santiago, Chile
- Laboratorio de Bioinformática y Expresión Génica, Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago, Chile
- Martín Montecino
- FONDAP Center for Genome Regulation, Santiago, Chile
- Centro de Investigaciones Biomédicas, Facultad de Ciencias Biológicas y Facultad de Medicina, Universidad Andres Bello, Santiago, Chile
- Claudia Muñoz-Espinoza
- FONDAP Center for Genome Regulation, Santiago, Chile
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andrés Bello, Santiago, Chile
- Ariel Orellana
- FONDAP Center for Genome Regulation, Santiago, Chile
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andrés Bello, Santiago, Chile
- Angélica Reyes-Jara
- FONDAP Center for Genome Regulation, Santiago, Chile
- Laboratorio de Bioinformática y Expresión Génica, Instituto de Nutrición y Tecnología de los Alimentos, Universidad de Chile, Santiago, Chile
- Dante Travisany
- FONDAP Center for Genome Regulation, Santiago, Chile
- Laboratorio de Bioinformática y Matemática del Genoma (LBMG-Mathomics), Centro de Modelamiento Matemático, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Paula Vizoso
- FONDAP Center for Genome Regulation, Santiago, Chile
- Centro de Propagación y Conservación Vegetal (CEPROVEG), Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Mauricio Moraga
- Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Departamento de Antropología, Facultad de Ciencias Sociales, Universidad de Chile, Santiago, Chile
- Susana Eyheramendy
- Departamento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Alejandro Maass
- FONDAP Center for Genome Regulation, Santiago, Chile
- Departamento de Medicina Familiar, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Giancarlo V De Ferrari
- FONDAP Center for Genome Regulation, Santiago, Chile.
- Centro de Investigaciones Biomédicas, Facultad de Ciencias Biológicas y Facultad de Medicina, Universidad Andres Bello, Santiago, Chile.
- Juan Francisco Miquel
- FONDAP Center for Genome Regulation, Santiago, Chile.
- Departamento de Gastroenterología, Facultad de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile.
- Rodrigo A Gutiérrez
- FONDAP Center for Genome Regulation, Santiago, Chile.
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
17
Almlöf JC, Nystedt S, Leonard D, Eloranta ML, Grosso G, Sjöwall C, Bengtsson AA, Jönsen A, Gunnarsson I, Svenungsson E, Rönnblom L, Sandling JK, Syvänen AC. Whole-genome sequencing identifies complex contributions to genetic risk by variants in genes causing monogenic systemic lupus erythematosus. Hum Genet 2019; 138:141-150. [PMID: 30707351 PMCID: PMC6373277 DOI: 10.1007/s00439-018-01966-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 10/01/2018] [Accepted: 12/13/2018] [Indexed: 01/01/2023]
Abstract
Systemic lupus erythematosus (SLE, OMIM 152700) is a systemic autoimmune disease with a complex etiology. The mode of inheritance of the genetic risk beyond familial SLE cases is currently unknown. Additionally, the contribution of heterozygous variants in genes known to cause monogenic SLE is not fully understood. Whole-genome sequencing of DNA samples from 71 Swedish patients with SLE and their healthy biological parents was performed to investigate the general genetic risk of SLE using known SLE GWAS risk loci identified using the ImmunoChip, variants in genes associated with monogenic SLE, and the mode of inheritance of SLE risk alleles in these families. A random forest model for predicting genetic risk for SLE showed that the SLE risk variants were mainly inherited from one of the parents. In the 71 patients, we detected a significant enrichment of ultra-rare (≤0.1%) missense and nonsense mutations in 22 genes known to cause monogenic forms of SLE. We identified one previously reported homozygous nonsense mutation in the C1QC (Complement C1q C Chain) gene, which explains the immunodeficiency and severe SLE phenotype of that patient. We also identified seven ultra-rare, coding heterozygous variants in five genes (C1S, DNASE1L3, DNASE1, IFIH1, and RNASEH2A) involved in monogenic SLE. Our findings indicate a complex contribution to the overall genetic risk of SLE by rare variants in genes associated with monogenic forms of SLE. The rare variants were inherited from the parent other than the one who passed on the more common risk variants, leading to an increased genetic burden for SLE in the child. Higher frequency SLE risk variants are mostly passed from one of the parents to the offspring affected with SLE. In contrast, the other parent, in seven cases, contributed heterozygous rare variants in genes associated with monogenic forms of SLE, suggesting a larger impact of rare variants in SLE than hitherto reported.
Affiliation(s)
- Jonas Carlsson Almlöf: Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Sara Nystedt: Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Dag Leonard: Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Maija-Leena Eloranta: Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Giorgia Grosso: Rheumatology Unit, Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Christopher Sjöwall: Division of Neuro and Inflammation Sciences, Department of Clinical and Experimental Medicine, Rheumatology, Linköping University, 581 83, Linköping, Sweden
- Anders A Bengtsson: Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
- Andreas Jönsen: Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
- Iva Gunnarsson: Rheumatology Unit, Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Elisabet Svenungsson: Rheumatology Unit, Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Lars Rönnblom: Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Johanna K Sandling: Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Ann-Christine Syvänen: Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden