1
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
2
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
3
Lynn N, Tuller T. Detecting and understanding meaningful cancerous mutations based on computational models of mRNA splicing. NPJ Syst Biol Appl 2024; 10:25. [PMID: 38453965 PMCID: PMC10920900 DOI: 10.1038/s41540-024-00351-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 02/22/2024] [Indexed: 03/09/2024]
Abstract
Cancer research has long focused on non-silent mutations. Yet it has become overwhelmingly clear that silent mutations can affect gene expression and cancer cell fitness. One fundamental mechanism that apparently silent mutations can severely disrupt is alternative splicing. Here we introduce Oncosplice, a tool that scores mutations based on models of proteomes generated using aberrant splicing predictions. Oncosplice leverages a highly accurate neural network that predicts splice sites within arbitrary mRNA sequences, a greedy transcript constructor that considers alternate arrangements of splicing blueprints, and an algorithm that grades the functional divergence between proteins based on evolutionary conservation. By applying this tool to 12M somatic mutations, we identify 8K deleterious variants that are significantly depleted within the healthy population; we demonstrate the tool's ability to identify clinically validated pathogenic variants with a positive predictive value of 94%; and we show strong enrichment of predicted deleterious mutations across pan-cancer drivers. We also achieve improved patient survival estimation using a proposed set of novel cancer-involved genes. Ultimately, this pipeline accelerates insight into the sequence-specific consequences of a class of understudied mutations and provides an efficient way to filter massive variant datasets, functionalities with immediate experimental and clinical applications.
Affiliation(s)
- Nicolas Lynn
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, 69978, Israel
- Tamir Tuller
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, 69978, Israel
4
Clay S, Evans A, Zambrano R, Otohinoyi D, Hicks C, Tsien F. Bioinformatics characterization of variants of uncertain significance in pediatric sensorineural hearing loss. Front Pediatr 2024; 12:1299341. [PMID: 38450295 PMCID: PMC10915201 DOI: 10.3389/fped.2024.1299341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 01/31/2024] [Indexed: 03/08/2024]
Abstract
Introduction Rapid advancements in Next Generation Sequencing (NGS) and bioinformatics tools have allowed physicians to obtain genetic testing results more rapidly, cost-effectively, and comprehensively than ever before. Around 50% of pediatric sensorineural hearing loss (SNHL) cases have a genetic etiology, so physicians regularly use targeted sequencing panels that identify variants in genes related to SNHL. These panels allow early detection of pathogenic variants, which enables physicians to provide anticipatory guidance to families. Molecular testing does not always reveal a clear etiology because of the presence of multigenic variants with varying classifications, including Variants of Uncertain Significance (VUS). This study aims to perform a preliminary bioinformatics characterization of patients with variants associated with Type II Usher Syndrome in the presence of other multigenic variants. We also provide an interpretation algorithm for physicians reviewing molecular results with medical geneticists. Methods A review of records with multigenic and/or VUS results identified several potential subjects of interest. For the purposes of this study, two ADGRV1 compound heterozygotes met the inclusion criteria. Sequencing, data processing, and variant calling (the process by which variants are identified from sequence data) were performed at Invitae (San Francisco, CA). The preliminary analysis followed the recommendations outlined by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) in 2015 and 2019. The present study uses computational analysis, predictive data, and population data, as well as clinical information from chart review and publicly available information in the ClinVar database. Results Two subjects were identified as compound heterozygotes for variants in the gene ADGRV1. Subject 1's variants were predicted to be deleterious, while Subject 2's variants were predicted to be non-deleterious. These results were based on known information about the variants from ClinVar, multiple lines of computational data, population databases, and the clinical presentation. Discussion Early molecular diagnosis through NGS is ideal, as families are then able to access a wide range of resources that will support the child as the condition progresses. We recommend that physicians build strong relationships with medical geneticists and carefully review their interpretation before making recommendations to families, particularly when addressing VUS. Efforts to reclassify VUS are supported by studies like ours that provide evidence of pathogenic or benign effects of variants.
Affiliation(s)
- Sloane Clay
- Department of Genetics, Louisiana State University Health Sciences Center, New Orleans, LA, United States
- Adele Evans
- Department of Otolaryngology, Children's Hospital of New Orleans, New Orleans, LA, United States
- Regina Zambrano
- Department of Pediatrics, Division of Clinical Genetics, Louisiana State University Health Sciences Center and Children’s Hospital of New Orleans, New Orleans, LA, United States
- David Otohinoyi
- Department of Genetics, Bioinformatics and Genomics Program, Louisiana State University Health Sciences Center, New Orleans, LA, United States
- Chindo Hicks
- Department of Genetics, Bioinformatics and Genomics Program, Louisiana State University Health Sciences Center, New Orleans, LA, United States
- Fern Tsien
- Department of Genetics, Louisiana State University Health Sciences Center, New Orleans, LA, United States
5
Hipólito A, Xavier R, Brito C, Tomás A, Lemos I, Cabaço LC, Silva F, Oliva A, Barral DC, Vicente JB, Gonçalves LG, Pojo M, Serpa J. BRD9 status is a major contributor for cysteine metabolic remodeling through MST and EAAT3 modulation in malignant melanoma. Biochim Biophys Acta Mol Basis Dis 2024; 1870:166983. [PMID: 38070581 DOI: 10.1016/j.bbadis.2023.166983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/31/2023] [Accepted: 11/30/2023] [Indexed: 12/17/2023]
Abstract
Cutaneous melanoma (CM) is the most aggressive skin cancer, and its incidence is increasing globally. Hereditary CM accounts for a significant percentage (5-15 %) of all CM cases. However, most familial cases remain without a known genetic cause. BRD9 has been associated with CM as a susceptibility gene, but the molecular events following BRD9 mutagenesis are still not completely understood. In this study, we identified BRD9 as a key regulator of cysteine metabolism and associated altered BRD9 with increased cell proliferation, migration and invasiveness, as well as with altered melanin levels, inducing higher susceptibility to melanomagenesis. BRD9 WT and mutated BRD9 (c.183G>C) have different impacts on cysteine metabolism, respectively inhibiting and activating MPST expression in the metastatic A375 cell line. The effect of the mutated BRD9 variant was more evident in A375 cells than in the less invasive WM115 line. Our data point to novel molecular and metabolic mechanisms dependent on BRD9 status that potentially account for the increased risk of developing CM and for enhanced CM aggressiveness. Moreover, our findings emphasize the role of cysteine metabolism remodeling in melanoma progression and open new avenues to explore the role of BRD9 as a melanoma susceptibility or cancer-related gene.
Affiliation(s)
- Ana Hipólito
- iNOVA4Health, NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa, Campo dos Mártires da Pátria, 130, 1169-056 Lisboa, Portugal; Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
- Renato Xavier
- Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
- Cheila Brito
- Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
- Ana Tomás
- iNOVA4Health, NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa, Campo dos Mártires da Pátria, 130, 1169-056 Lisboa, Portugal; Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
- Isabel Lemos
- Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal; Instituto de Tecnologia Química e Biológica (ITQB) António Xavier da Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Luís C Cabaço
- iNOVA4Health, NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa, Campo dos Mártires da Pátria, 130, 1169-056 Lisboa, Portugal
- Fernanda Silva
- iNOVA4Health, NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa, Campo dos Mártires da Pátria, 130, 1169-056 Lisboa, Portugal; Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
- Abel Oliva
- Instituto de Tecnologia Química e Biológica (ITQB) António Xavier da Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Duarte C Barral
- iNOVA4Health, NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa, Campo dos Mártires da Pátria, 130, 1169-056 Lisboa, Portugal
- João B Vicente
- Instituto de Tecnologia Química e Biológica (ITQB) António Xavier da Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Luís G Gonçalves
- Instituto de Tecnologia Química e Biológica (ITQB) António Xavier da Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Marta Pojo
- Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
- Jacinta Serpa
- iNOVA4Health, NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa, Campo dos Mártires da Pátria, 130, 1169-056 Lisboa, Portugal; Instituto Português de Oncologia de Lisboa Francisco Gentil (IPOLFG), Rua Prof Lima Basto, 1099-023 Lisboa, Portugal
6
Joudaki A, Takeda JI, Masuda A, Ode R, Fujiwara K, Ohno K. FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon. Genes (Basel) 2023; 14:1765. [PMID: 37761905 PMCID: PMC10531444 DOI: 10.3390/genes14091765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 08/30/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023]
Abstract
Single nucleotide variants (SNVs) affecting the first nucleotide G of an exon (Fex-SNVs) identified in various diseases are mostly recognized as missense or nonsense variants. Their effect on pre-mRNA splicing has seldom been analyzed, and no curated database is available. We previously reported that Fex-SNVs affect splicing when the polypyrimidine tract is short or degenerate. However, the splicing effects of Fex-SNVs cannot yet be readily predicted. Here we scrutinized the available literature and identified 106 splicing-affecting Fex-SNVs based on experimental evidence. We similarly identified 106 neutral Fex-SNVs in the dbSNP database with a global minor allele frequency (MAF) greater than 0.01 and less than 0.50. We extracted 115 features representing the strength of splicing cis-elements and developed machine-learning models with support vector machine, random forest, and gradient boosting to discriminate splicing-affecting from neutral Fex-SNVs. The gradient boosting-based LightGBM model outperformed the other two, and the length and nucleotide composition of the polypyrimidine tract played critical roles in the discrimination. Recursive feature elimination showed that the LightGBM model using 15 features achieved the best performance in 10-fold cross-validation, with an accuracy of 0.80 ± 0.12 (mean ± SD), a Matthews Correlation Coefficient (MCC) of 0.57 ± 0.15, an area under the receiver operating characteristic curve (AUROC) of 0.86 ± 0.08, and an area under the precision-recall curve (AUPRC) of 0.87 ± 0.09. We developed a web service, named FexSplice, that accepts a genomic coordinate on either GRCh37/hg19 or GRCh38/hg38 and returns the predicted probability of aberrant splicing for A, C, and T variants.
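Two of the scores reported above, accuracy and MCC, are both computed from a binary confusion matrix, and each is then summarized as mean ± SD over the 10 folds. The Python below is a minimal sketch of that computation. It assumes that splicing-affecting Fex-SNVs are the positive class. It is not FexSplice's code: the function names and per-fold counts are hypothetical. AUROC and AUPRC are omitted because they require ranked probabilities rather than counts.

```python
import math
from statistics import mean, stdev


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of variants (splicing-affecting or neutral) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 by convention when any row or column sum is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def summarize(fold_scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a metric across CV folds."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Hypothetical per-fold confusion matrices (tp, tn, fp, fn);
    # illustrative only, not data from the FexSplice study.
    folds = [(8, 7, 3, 2), (9, 8, 2, 2), (7, 9, 1, 4)]
    acc_mean, acc_sd = summarize([accuracy(*f) for f in folds])
    mcc_mean, mcc_sd = summarize([mcc(*f) for f in folds])
    print(f"accuracy {acc_mean:.2f} ± {acc_sd:.2f}, MCC {mcc_mean:.2f} ± {mcc_sd:.2f}")
```

The sketch uses the sample standard deviation across folds. The abstract does not say whether sample or population SD was reported, so that choice is an assumption.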
Affiliation(s)
- Atefeh Joudaki
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Jun-ichi Takeda
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Akio Masuda
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
- Rikumo Ode
- Department of Materials Science and Engineering, Nagoya University Graduate School of Engineering, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
- Koichi Fujiwara
- Department of Materials Science and Engineering, Nagoya University Graduate School of Engineering, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
- Kinji Ohno
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya 466-8550, Japan
7
Barbosa P, Savisaar R, Carmo-Fonseca M, Fonseca A. Computational prediction of human deep intronic variation. Gigascience 2022; 12:giad085. [PMID: 37878682 PMCID: PMC10599398 DOI: 10.1093/gigascience/giad085] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/21/2023] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND The adoption of whole-genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to discriminate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce. RESULTS In this study, we provide an overview of the available computational methods and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that potentially affect splicing regulatory elements. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground-truth information, but that these tools have lower predictive power than black box methods. CONCLUSIONS Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.
Affiliation(s)
- Pedro Barbosa
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Maria Carmo-Fonseca
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Alcides Fonseca
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal