1
Yin R, Gutierrez A, Kobren SN, Avillach P. VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders. medRxiv 2024:2024.04.15.24305876. [PMID: 38699371 PMCID: PMC11065012 DOI: 10.1101/2024.04.15.24305876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Rare and ultra-rare genetic conditions are estimated to impact nearly 1 in 17 people worldwide, yet accurately pinpointing the diagnostic variants underlying each of these conditions remains a formidable challenge. Because comprehensive, in vivo functional assessment of all possible genetic variants is infeasible, clinicians instead consider in silico variant pathogenicity predictions to distinguish plausibly disease-causing from benign variants across the genome. However, in the most difficult undiagnosed cases, such as those accepted to the Undiagnosed Diseases Network (UDN), existing pathogenicity predictions cannot reliably discern true etiological variant(s) from other deleterious candidate variants that were prioritized through N-of-1 efforts. Pinpointing the disease-causing variant from a pool of plausible candidates remains a largely manual effort requiring extensive clinical workups, functional and experimental assays, and eventual identification of genotype- and phenotype-matched individuals. Here, we introduce VarPPUD, a tool trained on prioritized variants from UDN cases, that leverages gene-, amino acid-, and nucleotide-level features to discern pathogenic variants from other deleterious variants that are unlikely to be confirmed as disease relevant. VarPPUD achieves a cross-validated accuracy of 79.3% and precision of 77.5% on a held-out subset of uniquely challenging UDN cases, respectively representing an average 18.6% and 23.4% improvement over nine traditional pathogenicity prediction approaches on this task. We validate VarPPUD's ability to discriminate likely from unlikely pathogenic variants on synthetic, GAN-generated candidate variants as well. Finally, we show how VarPPUD can be probed to evaluate each input feature's importance and contribution toward prediction, an essential step toward understanding the distinct characteristics of newly uncovered disease-causing variants.
Significance Statement Patients with chronic, undiagnosed and underdiagnosed genetic conditions often endure expensive and excruciating years-long diagnostic odysseys without clear results. In many instances, clinical genome sequencing of patients and their family members fails to reveal known disease-causing variants, although compelling variants of uncertain significance are frequently encountered. Existing computational tools struggle to reliably differentiate truly disease-causing variants from other plausible candidate variants within these prioritized sets. Consequently, the confirmation of disease-causing variants often necessitates extensive experimental follow-up, including studies in model organisms and identification of other similarly presenting genotype-matched individuals, a process that can extend for several years. Here, we present VarPPUD, a tool trained specifically to distinguish likely from unlikely to be confirmed pathogenic variants that were prioritized across cases in the Undiagnosed Diseases Network. By evaluating the importance and impact of different input feature values on prediction, we gain deeper insights into the distinctive attributes of difficult-to-identify diagnostic variants. For patients who remain undiagnosed following comprehensive whole genome sequencing, our new method VarPPUD may reveal pathogenic variants amid a pool of candidate variants, thereby advancing diagnostic efforts where progress has otherwise stalled.
2
Kothari C, Srivastava S, Kousa Y, Izem R, Gierdalski M, Kim D, Good A, Dies KA, Geisel G, Morizono H, Gallo V, Pomeroy SL, Garden GA, Guay-Woodford L, Sahin M, Avillach P. Validation of a computational phenotype for finding patients eligible for genetic testing for pathogenic PTEN variants across three centers. J Neurodev Disord 2022; 14:24. [PMID: 35321655 PMCID: PMC8943944 DOI: 10.1186/s11689-022-09434-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/10/2021] [Accepted: 03/04/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Computational phenotypes are most often combinations of patient billing codes that are highly predictive of disease using electronic health records (EHR). In the case of rare diseases that can only be diagnosed by genetic testing, computational phenotypes identify patient cohorts for genetic testing and possible diagnosis. This article details the validation of a computational phenotype for PTEN hamartoma tumor syndrome (PHTS) against the EHR of patients at three collaborating clinical research centers: Boston Children's Hospital, Children's National Hospital, and the University of Washington. METHODS A combination of billing codes from the International Classification of Diseases versions 9 and 10 (ICD-9 and ICD-10) for diagnostic criteria postulated by a research team at Cleveland Clinic was used to identify patient cohorts for genetic testing from the clinical data warehouses at the three research centers. Subsequently, the EHR (including billing codes, clinical notes, and genetic reports) of these patients were reviewed by clinical experts to identify patients with PHTS. RESULTS The PTEN genetic testing yield of the computational phenotype, the number of patients who needed to be genetically tested for incidence of pathogenic PTEN gene variants, ranged from 82 to 94% at the three centers. CONCLUSIONS Computational phenotypes have the potential to enable the timely and accurate diagnosis of rare genetic diseases such as PHTS by identifying patient cohorts for genetic sequencing and testing.
Affiliation(s)
- Cartik Kothari: Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Siddharth Srivastava: Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Youssef Kousa: Division of Neurology, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, The George Washington University School of Medicine and Health Sciences, Washington, DC, 20052, USA
- Rima Izem: Division of Biostatistics and Study Methodology, Children's National Research Institute, Silver Spring, MD, 20910, USA
- Marcin Gierdalski: Division of Biostatistics and Study Methodology, Children's National Hospital, Washington, DC, 20010, USA
- Dongkyu Kim: Division of Biostatistics and Study Methodology, Children's National Hospital, Washington, DC, 20010, USA
- Amy Good: Institute for Translational Health Sciences, University of Washington, Seattle, WA, 98195, USA
- Kira A Dies: Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Gregory Geisel: Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Hiroki Morizono: Center for Genetic Medicine Research, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, The George Washington University School of Medicine and Health Sciences, Washington, DC, 20052, USA
- Vittorio Gallo: Center for Neuroscience Research, Children's National Research Institute, Children's National Hospital, Washington, DC, 20010, USA
- Scott L Pomeroy: Department of Neurology, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Gwenn A Garden: Department of Neurology and Center on Human Development and Disability, University of Washington, Seattle, WA, 98195, USA; Department of Neurology, University of North Carolina, Chapel Hill, NC, 27599, USA
- Lisa Guay-Woodford: Center for Translational Research, Children's National Hospital, Washington, DC, 20010, USA
- Mustafa Sahin: Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Paul Avillach: Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
3
Amiri H, Kohane IS. Machine Learning of Patient Characteristics to Predict Admission Outcomes in the Undiagnosed Diseases Network. JAMA Netw Open 2021; 4:e2036220. [PMID: 33630084 PMCID: PMC7907957 DOI: 10.1001/jamanetworkopen.2020.36220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
IMPORTANCE The Undiagnosed Diseases Network (UDN) is a national network that evaluates individual patients whose signs and symptoms have been refractory to diagnosis. Providing reliable estimates of admission outcomes may assist clinical evaluators to distinguish, prioritize, and accelerate admission to the UDN for patients with undiagnosed diseases. OBJECTIVE To develop computational models that effectively predict admission outcomes for applicants seeking UDN evaluation and to rank the applications based on the likelihood of patient admission to the UDN. DESIGN, SETTING, AND PARTICIPANTS This prognostic study included all applications submitted to the UDN from July 2014 to June 2019, with 1209 applications accepted and 1212 applications not accepted. The main inclusion criterion was an undiagnosed condition despite thorough evaluation by a health care professional; the main exclusion criteria were a diagnosis that explained the objective findings or a review of the records that suggested a diagnosis. A classifier was trained using information extracted from application forms, referral letters from health care professionals, and semantic similarity between referral letters and textual description of known mendelian disorders. The admission labels were provided by the case review committee of the UDN. In addition to retrospective analysis, the classifier was prospectively tested on another 288 applications that were not evaluated at the time of classifier development. MAIN OUTCOMES AND MEASURES The primary outcomes were whether a patient was accepted or not accepted to the UDN and application order ranked based on likelihood of admission. The performance of the classifier was assessed by comparing its predictions against the UDN admission outcomes and by measuring improvement in the mean processing time for accepted applications. 
RESULTS The best classifier obtained sensitivity of 0.843, specificity of 0.738, and area under the receiver operating characteristic curve of 0.844 for predicting admission outcomes among 1212 accepted and 1210 not accepted applications. In addition, the classifier can decrease the current mean (SD) UDN processing time for accepted applications from 3.29 (3.17) months to 1.05 (3.82) months (68% improvement) by ordering applications based on their likelihood of acceptance. CONCLUSIONS AND RELEVANCE A classification system was developed that may assist clinical evaluators to distinguish, prioritize, and accelerate admission to the UDN for patients with undiagnosed diseases. Accelerating the admission process may improve the diagnostic journeys for these patients and serve as a model for partial automation of triaging or referral for other resource-constrained applications. Such classification models make explicit some of the considerations that currently inform the use of whole-genome sequencing for undiagnosed disease and thereby invite a broader discussion in the clinical genetics community.
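The reported 68% improvement can be checked against the two mean processing times given in the abstract. The following is a minimal sketch in Python, assuming that "improvement" means the relative reduction in mean processing time (the abstract does not state the formula); the function name is illustrative and not taken from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`.

    Assumption: this is what the abstract calls "improvement".
    """
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Mean processing times reported in the abstract, in months.
reduction = relative_reduction(3.29, 1.05)
print(f"Relative reduction in mean processing time: {reduction:.1%}")
```

Under that assumption the result rounds to the 68% quoted in the abstract.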
Affiliation(s)
- Hadi Amiri: Department of Biomedical Informatics, Harvard University, Boston, Massachusetts; Department of Computer Science, University of Massachusetts, Lowell
- Isaac S. Kohane: Department of Biomedical Informatics, Harvard University, Boston, Massachusetts
4
Ramirez-Cheyne J, Moreno M, Mosquera S, Duque S, Holguín J, Camacho A. Primeros 2 años del registro municipal de enfermedades huérfanas-raras de Cali e identificación de algunas variables sociodemográficas y clínicas asociadas a mortalidad [First 2 years of the Cali municipal registry of orphan-rare diseases and identification of some sociodemographic and clinical variables associated with mortality]. IATREIA 2020. [DOI: 10.17533/udea.iatreia.37] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Introduction: In recent years, Colombia recognized orphan-rare diseases as a public health concern and made their notification mandatory.
Objective: To describe the information on orphan-rare diseases obtained in Cali through SIVIGILA during the first 2 years of the registry.
Materials and methods: Analytical cross-sectional observational study. Absolute and relative frequencies were calculated. Normality was assessed with the Shapiro-Wilk test. Prevalences were calculated. The relationship between sociodemographic and clinical variables and the risk of mortality was evaluated using generalized linear models with a Poisson distribution family, a logarithmic link function, and variance models.
Results: A total of 635 cases were reported: 78 in 2016 (prevalence 3.25/100,000) and 557 in 2017 (prevalence 23.01/100,000). Most cases belonged to the contributory health insurance regime. The communes with the highest number of cases and the highest prevalence were communes 17 and 22. Sickle cell disease was the most frequently reported orphan-rare disease in Cali, with 25 cases in 2016 (prevalence 1.04/100,000) and 77 cases in 2017 (prevalence 3.1/100,000). The estimated crude mortality rate for the study period was 0.83/100,000; the diseases with the highest mortality were sickle cell disease in women (0.12/100,000) and polyneuropathy in men (0.13/100,000).
Discussion: More in-depth analyses should be carried out and published in the future, through detailed review of medical records and the incorporation of other available sources such as the Individual Registry of Health Service Provision (Registro Individual de la Prestación de Servicios, RIPS) and the Single Registry of Affiliates (Registro Único de Afiliados, RUAF), in order to reduce under-reporting and give the whole community more precise and detailed information.
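The prevalences in the results are expressed as cases per 100,000 inhabitants. A minimal Python sketch of that calculation follows. The abstract does not give the population denominator, so the value of 2,400,000 used here is an assumption back-calculated from the 2016 figures (78 cases at 3.25/100,000); the function and constant names are illustrative.

```python
def prevalence_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Assumption: denominator implied by 78 cases at 3.25/100,000 (not stated in the abstract).
ASSUMED_POPULATION_2016 = 2_400_000

all_cases_2016 = prevalence_per_100k(78, ASSUMED_POPULATION_2016)
sickle_cell_2016 = prevalence_per_100k(25, ASSUMED_POPULATION_2016)
print(f"All reported cases, 2016: {all_cases_2016:.2f} per 100,000")
print(f"Sickle cell disease, 2016: {sickle_cell_2016:.2f} per 100,000")
```

With this assumed denominator, the 25 sickle cell cases reported for 2016 give a value consistent with the 1.04/100,000 stated in the abstract.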
5
van Karnebeek CDM, Beumer D, Pawliuk C, Goez H, Mostafavi S, Andrews G, Steele R, Siden H. A novel classification system for research reporting in rare and progressive genetic conditions. Dev Med Child Neurol 2019; 61:1208-1213. [PMID: 30868573 DOI: 10.1111/dmcn.14180] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 12/18/2018] [Indexed: 01/01/2023]
Abstract
AIM To create a classification system for severe, rare, and progressive genetic conditions for use in research reporting. METHOD A modified Delphi consensus technique was used to create and reach agreement on a new system of condition categories. Interrater reliability was tested via two rounds of an online survey whereby physicians classified a subset of conditions using our novel system. Overall percentage agreement and agreement above chance were calculated using Fleiss' kappa (κ). RESULTS Eleven physicians completed the first Delphi, with an overall agreement of 76.4%; the κ value was 0.57 (95% confidence interval 0.51-0.63), indicating moderate agreement (0.41-0.60) above chance. Based on the first survey, several categories were described in more detail. The second survey confirmed a classification system with 12 categories, with an overall percentage agreement among the participants of 82.6%. The overall mean κ value was 0.71 (95% confidence interval 0.65-0.77), indicating substantial agreement (0.61-0.80). INTERPRETATION Our new system was useful in categorizing a broad range of rare childhood diseases and may be applicable to other rare disease studies; further validation in larger cohorts is required. WHAT THIS PAPER ADDS This novel 12-category classification system can be used in research reporting in rare and progressive genetic conditions.
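For readers unfamiliar with the statistic, the standard Fleiss' kappa calculation can be sketched in Python as below. This is a textbook formulation, not the authors' code: the toy ratings are invented for illustration, the function name is hypothetical, and treating "overall percentage agreement" as the mean share of agreeing rater pairs is an assumption, since the abstract does not define it. Confidence intervals are not computed here.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Return (mean observed agreement, Fleiss' kappa).

    counts[i][j] is the number of raters who put item i in category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of rater pairs that agree, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement derived from the overall category proportions.
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    expected = sum(p * p for p in proportions)

    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return observed, (observed - expected) / (1.0 - expected)


# Toy data (not from the study): 4 conditions, 3 raters, 3 categories.
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 1, 2],
]
agreement, kappa = fleiss_kappa(toy_counts)
print(f"agreement={agreement:.3f}, kappa={kappa:.3f}")
```

Kappa discounts the agreement expected by chance, which is why the abstract reports κ values (0.57 and 0.71) lower than the corresponding raw agreement percentages (76.4% and 82.6%).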
Affiliation(s)
- Clara D M van Karnebeek: Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Departments of Pediatrics and Clinical Genetics, Amsterdam University Medical Centres, Amsterdam, the Netherlands; Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Daniël Beumer: Department of Anesthesiology, Amsterdam University Medical Centres, Amsterdam, the Netherlands
- Colleen Pawliuk: BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Helly Goez: Division of Pediatric Neurology, Department of Pediatrics, University of Alberta, Stollery Children's Hospital, Edmonton, AB, Canada
- Sara Mostafavi: Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Gail Andrews: BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Rose Steele: School of Nursing, Faculty of Health, York University, Toronto, ON, Canada
- Harold Siden: Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; BC Children's Hospital Research Institute, Vancouver, BC, Canada
6
Mateus HE, Pérez AM, Mesa ML, Escobar G, Gálvez JM, Montaño JI, Ospina ML, Laissue P. A first description of the Colombian national registry for rare diseases. BMC Res Notes 2017; 10:514. [PMID: 29073918 PMCID: PMC5659024 DOI: 10.1186/s13104-017-2840-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/23/2017] [Accepted: 10/23/2017] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Orphan diseases must be considered a public health concern, underlying country-specific challenges for their accurate and opportune diagnosis, classification and management. Orphan disease registries have not yet been created in South America, a continent with a population of ~415 million inhabitants. In Colombia, ~3 million patients are affected by rare diseases. The aim of the present study was to establish the first Colombian national registry for rare diseases. The registry was created after the establishment of laws promoting the development of clinical guidelines for diagnosis, management, census and registry of patients suffering from rare diseases. RESULTS In total, 13,215 patients were recorded in the Colombian registry. The survey reported 653 rare diseases. The most common diseases were congenital factor VIII deficiency (hemophilia A) (8.5%), myasthenia gravis (6.4%), von Willebrand disease (5.9%), short stature due to growth hormone qualitative anomaly (4.2%), bronchopulmonary dysplasia (3.9%) and cystic fibrosis (3.2%). Although a marked under-reporting of cases was observed, some pathologies displayed similar behavior to that reported by other initiatives and databases. The data currently available in the registry provide a baseline for improving local and regional surveys and a starting point for better understanding rare diseases in Colombia.
Affiliation(s)
- Heidi Eliana Mateus: Center For Research in Genetics and Genomics-CIGGUR, GENIUROS Research Group, School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 No. 63C-69, Bogotá, Colombia
- Ana María Pérez: Center For Research in Genetics and Genomics-CIGGUR, GENIUROS Research Group, School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 No. 63C-69, Bogotá, Colombia
- Germán Escobar: Ministerio de Salud y Protección Social, Bogotá, Colombia
- Jubby Marcela Gálvez: Center For Research in Genetics and Genomics-CIGGUR, GENIUROS Research Group, School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 No. 63C-69, Bogotá, Colombia
- Paul Laissue: Center For Research in Genetics and Genomics-CIGGUR, GENIUROS Research Group, School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24 No. 63C-69, Bogotá, Colombia
7
Zoni A, Domínguez Berjón M, Barceló E, Esteban Vasallo M, Abaitua I, Jiménez Villa J, Margolles Martins M, Navarro C, Posada M, Ramos Aceitero J, Vázquez Santos C, Zurriaga Llorens O, Astray Mochales J. Identifying data sources for a national population-based registry: the experience of the Spanish Rare Diseases Registry. Public Health 2015; 129:271-5. [DOI: 10.1016/j.puhe.2014.12.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/28/2013] [Revised: 12/09/2014] [Accepted: 12/12/2014] [Indexed: 10/24/2022]