1
Stenton SL, O'Leary MC, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O'Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson MW, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JOB, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O'Donnell-Luria A. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project. Hum Genomics 2024; 18:44. [PMID: 38685113 PMCID: PMC11057178 DOI: 10.1186/s40246-024-00604-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 04/02/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams with identifying causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
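The maximum F-measure described in the abstract above can be illustrated with a minimal Python sketch. It assumes the F-measure is the usual harmonic mean of precision and recall (F1) and that each threshold is one of the submitted EPCR values; the challenge's exact definition may differ. The function name `max_f_measure` and the input layout are hypothetical.

```python
from typing import List, Tuple


def max_f_measure(predictions: List[Tuple[float, bool]], n_causal: int) -> Tuple[float, float]:
    """Return (best F1, EPCR threshold at which it is reached).

    predictions: (epcr, is_causal) pairs for every submitted variant (assumed layout).
    n_causal: number of true causal variants, including any never submitted.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_f, best_threshold = 0.0, 1.0
    true_positives = 0
    for i, (epcr, is_causal) in enumerate(ranked, start=1):
        true_positives += int(is_causal)
        # Variants sharing an EPCR value are accepted together, so only
        # evaluate once the whole tie group has been counted.
        if i < len(ranked) and ranked[i][0] == epcr:
            continue
        precision = true_positives / i
        recall = true_positives / n_causal
        if precision + recall > 0:
            f1 = 2 * precision * recall / (precision + recall)
            if f1 > best_f:
                best_f, best_threshold = f1, epcr
    return best_f, best_threshold


# Toy data, invented for illustration only.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.2, False)]
best_f, best_threshold = max_f_measure(toy, n_causal=2)
```

Sweeping the threshold over the submitted EPCR values is enough here, because precision and recall only change when a variant crosses the threshold.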

2
Shah RA, Chahal CAA, Ranjha S, Sharaf Dabbagh G, Asatryan B, Limongelli I, Khanji M, Ricci F, De Paoli F, Zucca S, Tristani-Firouzi M, St Louis EK, So EL, Somers VK. Cardiovascular Disease Burden, Mortality, and Sudden Death Risk in Epilepsy: A UK Biobank Study. Can J Cardiol 2024; 40:688-695. [PMID: 38013064 DOI: 10.1016/j.cjca.2023.11.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/19/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Sudden death is the leading cause of mortality in medically refractory epilepsy. Middle-aged persons with epilepsy (PWE) are underinvestigated regarding their mortality risk and burden of cardiovascular disease (CVD). METHODS Using UK Biobank, we identified 7786 (1.6%) participants with diagnoses of epilepsy and 6,171,803 person-years of follow-up (mean 12.30 years, standard deviation 1.74); 566 patients with previous histories of stroke were excluded. The 7220 PWE comprised the study cohort with the remaining 494,676 without epilepsy as the comparator group. Prevalence of CVD was determined using validated diagnostic codes. Cox proportional hazards regression was used to assess all-cause mortality and sudden death risk. RESULTS Hypertension, coronary artery disease, heart failure, valvular heart disease, and congenital heart disease were more prevalent in PWE. Arrhythmias including atrial fibrillation/flutter (12.2% vs 6.9%; P < 0.01), bradyarrhythmias (7.7% vs 3.5%; P < 0.01), conduction defects (6.1% vs 2.6%; P < 0.01), and ventricular arrhythmias (2.3% vs 1.0%; P < 0.01), as well as cardiac implantable electronic devices (4.6% vs 2.0%; P < 0.01), were more prevalent in PWE. PWE had higher adjusted all-cause mortality (hazard ratio [HR], 3.9; 95% confidence interval [CI], 3.01-3.39) and sudden death-specific mortality (HR, 6.65; 95% CI, 4.53-9.77), and were almost 2 years younger at death (68.1 vs 69.8; P < 0.001). CONCLUSIONS Middle-aged PWE have increased all-cause and sudden death-specific mortality and a higher burden of CVD, including arrhythmias and heart failure. Further work is required to elucidate mechanisms underlying all-cause mortality and sudden death risk in PWE of middle age, to identify prognostic biomarkers, and to develop preventative therapies in PWE.

3
De Paoli F, Berardelli S, Limongelli I, Rizzo E, Zucca S. VarChat: the generative AI assistant for the interpretation of human genomic variations. Bioinformatics 2024; 40:btae183. [PMID: 38579245 PMCID: PMC11055464 DOI: 10.1093/bioinformatics/btae183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 03/05/2024] [Accepted: 04/04/2024] [Indexed: 04/07/2024] Open
Abstract
MOTIVATION In the modern era of genomic research, the scientific community is witnessing an explosive growth in the volume of published findings. While this abundance of data offers invaluable insights, it also places a pressing responsibility on genetic professionals and researchers to stay informed about the latest findings and their clinical significance. Genomic variant interpretation currently faces the challenge of identifying the most up-to-date and relevant scientific papers, while also extracting meaningful information to accelerate the process from clinical assessment to reporting. Computer-aided literature search and summarization can play a pivotal role in this context. By synthesizing complex genomic findings into concise, interpretable summaries, this approach facilitates the translation of extensive genomic datasets into clinically relevant insights. RESULTS To bridge this gap, we present VarChat (varchat.engenome.com), an innovative tool based on generative AI, developed to find and summarize the fragmented scientific literature associated with genomic variants into brief yet informative texts. VarChat provides users with a concise description of specific genetic variants, detailing their impact on related proteins and possible effects on human health. In addition, VarChat offers direct links to related, trustworthy scientific sources and encourages deeper research. AVAILABILITY AND IMPLEMENTATION varchat.engenome.com.

4
Zucca S, Nicora G, De Paoli F, Carta MG, Bellazzi R, Magni P, Rizzo E, Limongelli I. An AI-based approach driven by genotypes and phenotypes to uplift the diagnostic yield of genetic diseases. Hum Genet 2024:10.1007/s00439-023-02638-x. [PMID: 38520562 DOI: 10.1007/s00439-023-02638-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 12/27/2023] [Indexed: 03/25/2024]
Abstract
Identifying disease-causing variants in the genomes of rare disease patients is a challenging problem. To accomplish this task, we describe a machine learning framework, which we call "Suggested Diagnosis", whose aim is to prioritize genetic variants in an exome/genome based on the probability of being disease-causing. To do so, our method leverages standard guidelines for germline variant interpretation as defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), inheritance information, phenotypic similarity, and variant quality. Starting from (1) the VCF file containing the proband's variants, (2) the list of the proband's phenotypes encoded in Human Phenotype Ontology terms, and optionally (3) the information about family members (if available), "Suggested Diagnosis" ranks all the variants according to their machine learning prediction. This method significantly reduces the number of variants that need to be evaluated by geneticists by pinpointing causative variants in the very first positions of the prioritized list. Most importantly, our approach proved to be among the top performers within the CAGI6 Rare Genomes Project Challenge, where it was able to rank the true causative variant among the first positions and, uniquely among all the challenge participants, increased the diagnostic yield by 12.5% by solving 2 undiagnosed cases.
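One way to quantify "pinpointing causative variants in the very first positions" is to count the families whose causal variant appears within the top k of the ranked list. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: the function name, the dictionary layout, and the rule that a single causal variant in the top k counts as a hit are all hypothetical.

```python
from typing import Dict, List, Set


def families_solved_in_top_k(
    ranked_by_family: Dict[str, List[str]],
    causal_by_family: Dict[str, Set[str]],
    k: int = 5,
) -> int:
    """Count families with at least one causal variant among the top k ranked variants.

    ranked_by_family: family ID -> variant IDs, best-scoring first (assumed layout).
    causal_by_family: family ID -> set of known causal variant IDs.
    """
    solved = 0
    for family, causal in causal_by_family.items():
        top_k = ranked_by_family.get(family, [])[:k]
        if any(variant in causal for variant in top_k):
            solved += 1
    return solved


# Toy identifiers, invented for illustration only.
ranked = {"FAM1": ["v3", "v1", "v9"], "FAM2": ["v5", "v6"]}
causal = {"FAM1": {"v1"}, "FAM2": {"v7"}}
solved_top2 = families_solved_in_top_k(ranked, causal, k=2)
```

A stricter variant of this check could require every causal variant of a family (for example both alleles of a compound heterozygous diagnosis) to fall within the top k.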

5
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.

6
Stenton SL, O’Leary M, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O’Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson M, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JO, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O’Donnell-Luria A. Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project. medRxiv 2023:2023.08.02.23293212. [PMID: 37577678 PMCID: PMC10418577 DOI: 10.1101/2023.08.02.23293212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Background A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods Predictors were provided a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on rank position of true positive causal variants and maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. 
In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.

7
Carraro M, Monzon AM, Chiricosta L, Reggiani F, Aspromonte MC, Bellini M, Pagel K, Jiang Y, Radivojac P, Kundu K, Pal LR, Yin Y, Limongelli I, Andreoletti G, Moult J, Wilson SJ, Katsonis P, Lichtarge O, Chen J, Wang Y, Hu Z, Brenner SE, Ferrari C, Murgia A, Tosatto SC, Leonardi E. Assessment of patient clinical descriptions and pathogenic variants from gene panel sequences in the CAGI-5 intellectual disability challenge. Hum Mutat 2019; 40:1330-1345. [PMID: 31144778 PMCID: PMC7341177 DOI: 10.1002/humu.23823] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 05/27/2019] [Indexed: 12/15/2022]
Abstract
The Critical Assessment of Genome Interpretation-5 intellectual disability challenge asked participants to use computational methods to predict patient clinical phenotypes and the causal variant(s) based on an analysis of their gene panel sequence data. Sequence data for 74 genes associated with intellectual disability (ID) and/or autism spectrum disorders (ASD) from a cohort of 150 patients with a range of neurodevelopmental manifestations (i.e. ID, autism, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) were made available for this challenge. For each patient, predictors had to report the causative variants and which of the seven phenotypes were present. Since neurodevelopmental disorders are characterized by strong comorbidity, tested individuals often present more than one pathological condition. Considering the overall clinical manifestation of each patient, the correct phenotype was predicted by at least one group for 93 individuals (62%). ID and ASD were the best predicted among the seven phenotypic traits. Also, causative or potentially pathogenic variants were predicted correctly by at least one group. However, predicting the correct causative variant seems to be insufficient to predict the correct phenotype. In some cases, the correct prediction was supported by rare or common variants in genes different from the causative one.

8
Nicora G, Limongelli I, Cova R, Della Porta MG, Malcovati L, Cazzola M, Bellazzi R. A Rule-Based Expert System for Automatic Implementation of Somatic Variant Clinical Interpretation Guidelines. Artif Intell Med 2019. [DOI: 10.1007/978-3-030-21642-9_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]

9
Nicora G, Limongelli I, Gambelli P, Memmi M, Malovini A, Mazzanti A, Napolitano C, Priori S, Bellazzi R. CardioVAI: An automatic implementation of ACMG-AMP variant interpretation guidelines in the diagnosis of cardiovascular diseases. Hum Mutat 2018; 39:1835-1846. [PMID: 30298955 DOI: 10.1002/humu.23665] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/18/2018] [Revised: 09/24/2018] [Accepted: 10/04/2018] [Indexed: 11/09/2022]
Abstract
Variant interpretation for the diagnosis of genetic diseases is a complex process. The American College of Medical Genetics and Genomics, with the Association for Molecular Pathology, has proposed a set of evidence-based guidelines to support variant pathogenicity assessment and reporting in Mendelian diseases. Cardiovascular disorders are a field of application of these guidelines, but practical implementation is challenging due to the genetic heterogeneity of the diseases and the complexity of the information sources that need to be integrated. Decision support systems able to automate variant interpretation in the light of specific disease domains are needed. We implemented CardioVAI (Cardio Variant Interpreter), an automated system for guideline-based variant classification in cardiovascular-related genes. Different omics resources were integrated to assess the pathogenicity of every genomic variant in 72 genes related to cardiovascular diseases. We validated our method on benchmark datasets of variants assessed with high confidence, reaching pathogenicity and benignity concordance of up to 83 and 97.08%, respectively. We compared CardioVAI to similar methods and analyzed the main differences in terms of guideline implementation. Finally, we made CardioVAI available as a web resource (http://cardiovai.engenome.com/) that allows users to further specialize guideline recommendations.
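The concordance figures above can be read as per-class agreement with a benchmark. The following sketch assumes that "pathogenicity concordance" means the fraction of benchmark-pathogenic variants that the tool also calls pathogenic (and likewise for benign); the abstract does not spell out the definition, so this is an assumption. The function name, labels, and variant IDs are hypothetical.

```python
from typing import Dict


def concordance(benchmark: Dict[str, str], predicted: Dict[str, str], label: str) -> float:
    """Fraction of benchmark variants carrying `label` that received the same predicted label."""
    relevant = [variant for variant, truth in benchmark.items() if truth == label]
    if not relevant:
        raise ValueError(f"no benchmark variants labelled {label!r}")
    agreeing = sum(1 for variant in relevant if predicted.get(variant) == label)
    return agreeing / len(relevant)


# Toy labels, invented for illustration only.
benchmark = {"a": "pathogenic", "b": "pathogenic", "c": "benign", "d": "benign"}
predicted = {"a": "pathogenic", "b": "uncertain", "c": "benign", "d": "benign"}
pathogenic_concordance = concordance(benchmark, predicted, "pathogenic")
benign_concordance = concordance(benchmark, predicted, "benign")
```

Reporting the two classes separately, as the abstract does, keeps a large benign set from masking missed pathogenic variants.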

10
Vetro A, Goidin D, Lesende I, Limongelli I, Ranzani GN, Novara F, Bonaglia MC, Rinaldi B, Franchi F, Manolakos E, Lonardo F, Scarano F, Scarano G, Costantino L, Tedeschi S, Giglio S, Zuffardi O. Diagnostic application of a capture based NGS test for the concurrent detection of variants in sequence and copy number as well as LOH. Clin Genet 2017; 93:545-556. [PMID: 28556904 DOI: 10.1111/cge.13060] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/31/2017] [Revised: 05/11/2017] [Accepted: 05/12/2017] [Indexed: 01/08/2023]
Abstract
Whole exome sequencing (WES) has made the identification of causative SNVs/InDels associated with rare Mendelian conditions increasingly accessible. Incorporating software for CNV detection into WES bioinformatics pipelines may increase the diagnostic yield. However, no standard protocols for this analysis are available so far, and CNVs in non-coding regions are missed entirely by WES, in spite of their possible role in regulating the expression of flanking genes. Therefore, in a number of cases the diagnostic workflow involves an initial investigation by genomic arrays followed, in the negative cases, by WES. The opposite workflow may also be applied, according to the familial segregation of the disease. We show preliminary results for a diagnostic application of a single next generation sequencing panel permitting the concurrent detection of LOH and variations in sequence and copy number. This approach allowed us to highlight compound heterozygosity for a CNV and a sequence variant in a number of cases, the duplication of a non-coding region responsible for sex reversal, and a whole-chromosome isodisomy causing reduction to homozygosity for a WFS1 variant. Moreover, the panel enabled us to detect deletions, duplications, and amplifications with sensitivity comparable to that of the most widely used array-CGH platforms.

11
Marini S, Limongelli I, Rizzo E, Malovini A, Errichiello E, Vetro A, Da T, Zuffardi O, Bellazzi R. A Data Fusion Approach to Enhance Association Study in Epilepsy. PLoS One 2016; 11:e0164940. [PMID: 27984588 PMCID: PMC5161322 DOI: 10.1371/journal.pone.0164940] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/21/2016] [Accepted: 10/04/2016] [Indexed: 11/25/2022] Open
Abstract
Among the scientific challenges posed by complex diseases with a strong genetic component, two stand out. One is unveiling the role of rare and common genetic variants; the other is the design of classification models to improve clinical diagnosis and predictive models for prognosis and personalized therapies. In this paper, we present a data fusion framework merging gene, domain, pathway and protein-protein interaction data related to a next generation sequencing epilepsy gene panel. Our method integrates association information from multiple genomic sources and aims to highlight the set of common and rare variants that are capable of triggering the occurrence of a complex disease. When compared to other approaches, our method shows better performance in classifying patients affected by epilepsy.

12
Della Porta MG, Gallì A, Bacigalupo A, Zibellini S, Bernardi M, Rizzo E, Allione B, van Lint MT, Pioltelli P, Marenco P, Bosi A, Voso MT, Sica S, Cuzzola M, Angelucci E, Rossi M, Ubezio M, Malovini A, Limongelli I, Ferretti VV, Spinelli O, Tresoldi C, Pozzi S, Luchetti S, Pezzetti L, Catricalà S, Milanesi C, Riva A, Bruno B, Ciceri F, Bonifazi F, Bellazzi R, Papaemmanuil E, Santoro A, Alessandrino EP, Rambaldi A, Cazzola M. Clinical Effects of Driver Somatic Mutations on the Outcomes of Patients With Myelodysplastic Syndromes Treated With Allogeneic Hematopoietic Stem-Cell Transplantation. J Clin Oncol 2016; 34:3627-3637. [PMID: 27601546 DOI: 10.1200/jco.2016.67.3616] [Citation(s) in RCA: 177] [Impact Index Per Article: 22.1] [Indexed: 12/24/2022] Open
Abstract
PURPOSE The genetic basis of myelodysplastic syndromes (MDS) is heterogeneous, and various combinations of somatic mutations are associated with different clinical phenotypes and outcomes. Whether the genetic basis of MDS influences the outcome of allogeneic hematopoietic stem-cell transplantation (HSCT) is unclear. PATIENTS AND METHODS We studied 401 patients with MDS or acute myeloid leukemia (AML) evolving from MDS (MDS/AML). We used massively parallel sequencing to examine tumor samples collected before HSCT for somatic mutations in 34 recurrently mutated genes in myeloid neoplasms. We then analyzed the impact of mutations on the outcome of HSCT. RESULTS Overall, 87% of patients carried one or more oncogenic mutations. Somatic mutations of ASXL1, RUNX1, and TP53 were independent predictors of relapse and overall survival after HSCT in both patients with MDS and patients with MDS/AML (P values ranging from .003 to .035). In patients with MDS/AML, gene ontology (ie, secondary-type AML carrying mutations in genes of RNA splicing machinery, TP53-mutated AML, or de novo AML) was an independent predictor of posttransplantation outcome (P = .013). The impact of ASXL1, RUNX1, and TP53 mutations on posttransplantation survival was independent of the revised International Prognostic Scoring System (IPSS-R). Combining somatic mutations and IPSS-R risk improved the ability to stratify patients by capturing more prognostic information at an individual level. Accounting for various combinations of IPSS-R risk and somatic mutations, the 5-year probability of survival after HSCT ranged from 0% to 73%. CONCLUSION Somatic mutation in ASXL1, RUNX1, or TP53 is independently associated with unfavorable outcomes and shorter survival after allogeneic HSCT for patients with MDS and MDS/AML. Accounting for these genetic lesions may improve the prognostication precision in clinical practice and in designing clinical trials.

13
Gabetta M, Limongelli I, Rizzo E, Riva A, Segagni D, Bellazzi R. BigQ: a NoSQL based framework to handle genomic variants in i2b2. BMC Bioinformatics 2015; 16:415. [PMID: 26714792 PMCID: PMC4696314 DOI: 10.1186/s12859-015-0861-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 09/26/2015] [Accepted: 12/15/2015] [Indexed: 12/25/2022] Open
Abstract
Background Precision medicine requires the tight integration of clinical and molecular data. To this end, it is essential to define technological solutions able to manage the overwhelming amount of high throughput genomic data needed to test associations between genomic signatures and human phenotypes. The i2b2 Center (Informatics for Integrating Biology and the Bedside) has developed a framework, widely adopted internationally, that uses existing clinical data for discovery research and can help define precision medicine interventions when coupled with genetic data. i2b2 can be significantly advanced by designing efficient management solutions for Next Generation Sequencing data. Results We developed BigQ, an extension of the i2b2 framework, which integrates patient clinical phenotypes with genomic variant profiles generated by Next Generation Sequencing. A visual programming i2b2 plugin allows retrieving variants belonging to the patients in a cohort by applying filters on genomic variant annotations. We report an evaluation of the query performance of our system on more than 11 million variants, showing that the implemented solution scales linearly in terms of query time and disk space with the number of variants. Conclusions In this paper we describe a new i2b2 web service composed of an efficient and scalable document-based database that manages annotations of genomic variants and of a visual programming plug-in designed to dynamically perform queries on clinical and genetic data. The system therefore allows managing the fast-growing volume of genomic variants and can be used to integrate heterogeneous genomic annotations.
|
14
|
Limongelli I, Marini S, Bellazzi R. PaPI: pseudo amino acid composition to score human protein-coding variants. BMC Bioinformatics 2015; 16:123. [PMID: 25928477 PMCID: PMC4411653 DOI: 10.1186/s12859-015-0554-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 07/20/2014] [Accepted: 01/15/2015] [Indexed: 12/31/2022] Open
Abstract
Background High-throughput sequencing technologies are able to identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments are mainly focused on coding sequence variants affecting single or multiple nucleotides. The analysis of the biological significance of this multitude of genomic variants is challenging and computationally demanding. Results We present PaPI, a new machine-learning approach to classify and score human coding variants by estimating the probability that they damage protein function. The novelty of this approach consists in using pseudo amino acid composition, through which wild-type and mutated protein sequences are represented in a discrete model. A machine-learning classifier has been trained on a set of known deleterious and benign coding variants with the aim of scoring unobserved variants by taking into account hidden sequence patterns in the human genome that potentially lead to disease. We show that the combination of amphiphilic pseudo amino acid composition, evolutionary conservation, and homologous-protein-based methods outperforms several prediction algorithms and is also able to score complex variants such as deletions, insertions and indels. Conclusions This paper describes a machine-learning approach to predict the deleteriousness of human coding variants. A freely available web application (http://papi.unipv.it) has been developed with the presented method, able to score up to thousands of variants in a single run.
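To make "pseudo amino acid composition" concrete, the sketch below computes a basic (type 1) pseudo amino acid composition vector: 20 amino acid frequencies followed by `lam` sequence-order correlation factors, all jointly normalized. It is a simplified illustration and not PaPI's feature pipeline. The abstract refers to the amphiphilic variant, whereas this sketch uses a single property, the Kyte-Doolittle hydropathy scale, as a stand-in. The function name `pseaac`, the defaults `lam=3` and `w=0.05`, and the example peptides are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used here as a single illustrative property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)

# Standardize the scale to zero mean and unit standard deviation over the 20 residues.
_MU = mean(KD.values())
_SIGMA = pstdev(KD.values())
H = {aa: (value - _MU) / _SIGMA for aa, value in KD.items()}


def pseaac(seq, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    seq = seq.upper()
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")

    # Conventional composition: relative frequency of each amino acid.
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # Sequence-order correlation factors: mean squared property difference
    # between residues separated by k positions, for k = 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Hypothetical wild-type and mutated peptides (last residue R changed to D).
wild = pseaac("MKTAYIAKQR")
mutant = pseaac("MKTAYIAKQD")
delta = [m - x for x, m in zip(wild, mutant)]
print(len(wild), round(sum(wild), 6))
```

A classifier like the one the abstract describes could take vectors such as `wild`, `mutant`, or their difference as input features. Which representation PaPI actually feeds to its classifier is not stated in the abstract.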
|
15
|
Vetro A, Iascone M, Limongelli I, Ameziane N, Gana S, Della Mina E, Giussani U, Ciccone R, Forlino A, Pezzoli L, Rooimans MA, van Essen AJ, Messa J, Rizzuti T, Bianchi P, Dorsman J, de Winter JP, Lalatta F, Zuffardi O. Loss-of-Function FANCL Mutations Associate with Severe Fanconi Anemia Overlapping the VACTERL Association. Hum Mutat 2015; 36:562-8. [PMID: 25754594 DOI: 10.1002/humu.22784] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/19/2014] [Accepted: 02/26/2015] [Indexed: 11/08/2022]
Abstract
The diagnosis of VACTERL syndrome can be elusive, especially in prenatal life, because its malformations overlap those present in other genetic conditions, including Fanconi anemia (FA). We report on three VACTERL cases within two families, in which the two who were born died shortly after birth due to severe organ malformations. The suspicion of VACTERL association was based on prenatal ultrasound assessment and postnatal features. Subsequent chromosome breakage analysis suggested the diagnosis of FA. Finally, by next-generation sequencing based on the analysis of the exome in one family and of a panel of Fanconi genes in the second one, we identified novel FANCL truncating mutations in both families. We used ectopic expression of wild-type FANCL to functionally correct the cellular FA phenotype for both mutations. Our study emphasizes that the diagnosis of FA should be considered when VACTERL association is suspected. Furthermore, we show that loss-of-function mutations in FANCL result in a severe clinical phenotype characterized by early postnatal death.
|
16
|
Decio A, Tonduti D, Pichiecchio A, Vetro A, Ciccone R, Limongelli I, Giorda R, Caffi L, Balottin U, Zuffardi O, Orcesi S. A novel mutation in COL4A1 gene: a possible cause of early postnatal cerebrovascular events. Am J Med Genet A 2015; 167A:810-5. [PMID: 25706114 DOI: 10.1002/ajmg.a.36907] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/22/2014] [Accepted: 11/14/2014] [Indexed: 11/09/2022]
Abstract
COL4A1 is located in humans on chromosome 13q34 and encodes the alpha 1 chain of type IV collagen, a component of the basement membrane. It is expressed mainly in the brain, muscles, kidneys and eyes. Different COL4A1 mutations have been reported in many patients who present a very wide spectrum of clinical symptoms. They typically show a multisystemic phenotype. Here we report on the case of a patient carrying a novel de novo splicing mutation of COL4A1 associated with a distinctive clinical picture characterized by onset in infancy and an unusual evolution of the neuroradiological features. At three months of age, the child was diagnosed with a congenital cataract, while his brain MRI was normal. Over the following years, the patient developed focal epilepsy, mild diplegia, asymptomatic microhematuria, raised creatine kinase levels, MRI white matter abnormalities and brain calcification on CT. During the neuroradiological follow-up, the extent and intensity of the brain lesions progressively decreased. The significance of a second variant in COL4A1 carried by the child and inherited from his father remains to be clarified. In conclusion, our patient shows new aspects of this collagenopathy and possibly a COL4A1 compound heterozygosity.
|
17
|
Desbats MA, Vetro A, Limongelli I, Lunardi G, Casarin A, Doimo M, Spinazzi M, Angelini C, Cenacchi G, Burlina A, Rodriguez Hernandez MA, Chiandetti L, Clementi M, Trevisson E, Navas P, Zuffardi O, Salviati L. Primary coenzyme Q10 deficiency presenting as fatal neonatal multiorgan failure. Eur J Hum Genet 2015; 23:1254-8. [PMID: 25564041 PMCID: PMC4430297 DOI: 10.1038/ejhg.2014.277] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/04/2014] [Revised: 09/04/2014] [Accepted: 10/21/2014] [Indexed: 11/09/2022] Open
Abstract
Coenzyme Q10 deficiency is a clinically and genetically heterogeneous disorder, with manifestations that may range from fatal neonatal multisystem failure to adult-onset encephalopathy. We report a patient who presented at birth with severe lactic acidosis, proteinuria, dicarboxylic aciduria, and hepatic insufficiency. She also had dilation of the left ventricle on echocardiography. Her neurological condition rapidly worsened and despite aggressive care she died at 23 h of life. Muscle histology displayed lipid accumulation. Electron microscopy showed markedly swollen mitochondria with fragmented cristae. Respiratory-chain enzymatic assays showed a reduction of combined activities of complex I+III and II+III with normal activities of isolated complexes. The defect was confirmed in fibroblasts, where it could be rescued by supplementing the culture medium with 10 μM coenzyme Q10. Coenzyme Q10 levels were reduced (28% of controls) in these cells. We performed exome sequencing and focused the analysis on genes involved in coenzyme Q10 biosynthesis. The patient harbored a homozygous c.545T>G, p.(Met182Arg) alteration in COQ2, which was validated by functional complementation in yeast. In this case the biochemical and morphological features were essential to direct the genetic diagnosis. The parents had another pregnancy after the biochemical diagnosis was established, but before the identification of the genetic defect. Because of the potentially high recurrence risk, and given the importance of early CoQ10 supplementation, we decided to treat the newborn child with CoQ10 pending the results of the biochemical assays. Clinicians should consider similar management in siblings of patients with CoQ10 deficiency without a genetic diagnosis.
|
18
|
Marinoni A, Rizzo E, Limongelli I, Gamba P, Bellazzi R. A kinetic model-based algorithm to classify NGS short reads by their allele origin. J Biomed Inform 2014; 53:121-7. [PMID: 25311269 DOI: 10.1016/j.jbi.2014.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2014] [Revised: 09/29/2014] [Accepted: 10/01/2014] [Indexed: 11/17/2022]
Abstract
Genotyping Next Generation Sequencing (NGS) data of a diploid genome aims to assign the zygosity of identified variants through comparison with a reference genome. Current methods typically employ probabilistic models that rely on the pileup of bases at each locus and on a priori knowledge. We present a new algorithm, called Kimimila (KInetic Modeling based on InforMation theory to Infer Labels of Alleles), which is able to assign reads to alleles by using a distance geometry approach and to infer the variant genotypes accurately, without any kind of assumption. The performance of the model has been assessed on simulated data and on real data from the 1000 Genomes Project, and the results have been compared with several commonly used genotyping methods, i.e., GATK, Samtools, VarScan, FreeBayes and Atlas2. Although our algorithm does not use a priori knowledge, the percentage of correctly genotyped variants is comparable to that of these algorithms. Furthermore, our method allows the user to split the read pool according to the inferred allele origin.
|
19
|
Di Fonzo A, Ronchi D, Gallia F, Cribiù FM, Trezzi I, Vetro A, Della Mina E, Limongelli I, Bellazzi R, Ricca I, Micieli G, Fassone E, Rizzuti M, Bordoni A, Fortunato F, Salani S, Mora G, Corti S, Ceroni M, Bosari S, Zuffardi O, Bresolin N, Nobile-Orazio E, Comi GP. Lower motor neuron disease with respiratory failure caused by a novel MAPT mutation. Neurology 2014; 82:1990-8. [PMID: 24808015 DOI: 10.1212/wnl.0000000000000476] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022] Open
Abstract
OBJECTIVE To investigate the molecular defect underlying a large Italian kindred with progressive adult-onset respiratory failure, proximal weakness of the upper limbs, and evidence of lower motor neuron degeneration. METHODS We describe the clinical features of 5 patients presenting with prominent respiratory insufficiency, proximal weakness of the upper limbs, and no signs of frontotemporal lobar degeneration or semantic dementia. Molecular analysis was performed combining linkage and exome sequencing analyses. Further investigations included transcript analysis and immunocytochemical and protein studies on established cell models. RESULTS Genome-wide linkage analysis showed an association with chromosome 17q21. Exome analysis disclosed a missense change in MAPT segregating dominantly with the disease and resulting in D348G-mutated tau protein. Motor neuron cell lines overexpressing mutated D348G tau isoforms displayed a consistent reduction in neurite length and arborization. The mutation does not seem to modify tau interactions with microtubules. Neuropathologic studies were performed in one affected subject, who exhibited α-motoneuron loss and atrophy of the spinal anterior horns with accumulation of phosphorylated tau within the surviving motor neurons. Staining for 3R- and 4R-tau revealed pathology similar to that observed in familial cases harboring MAPT mutations. CONCLUSION Our study broadens the phenotype of tauopathies to include lower motor neuron disease and implicates tau degradation pathway defects in motor neuron degeneration.
|